Just enough Scala for Spark
In this tutorial you will learn just enough Scala for Spark: a quick guide to the Scala basics needed for Spark programming, covering Scala syntax and a few Scala examples.
You can't become a Scala expert in a day, but after reading this post you will be able to write Spark programs. I will be using spark-shell to run Scala commands, so no extra installation is needed if you already have the Spark shell running on your machine. I encourage you to run these commands side by side on your machine.
Starting with printing "Hello World", for example,
scala> println("Hello World")
Hello World
For single-line comments use a double forward slash; multi-line comments use the same /* ... */ syntax as Java. For example (ignore the pipe character, it is only the spark-shell continuation prompt),
scala> // Hello Data Nebulae - This is single line comment
scala> /* Hello World
| This is multi-line comment
| Data Nebulae
| */
There are two types of variables in Scala: mutable and immutable. Mutable variables are defined with the var keyword and immutable variables with the val keyword.
You can't re-assign an immutable variable. For example,
scala> val myNumber :Int = 7
myNumber: Int = 7
scala> var myWord :String = "Hello"
myWord: String = Hello
Because myNumber is an immutable variable, re-assignment fails, while the mutable myWord can be re-assigned,
scala> myNumber = 10
<console>:25: error: reassignment to val
myNumber = 10
scala> myWord = "Dataneb"
myWord: String = Dataneb
You can specify the datatype (Int, Double, Boolean, String) after the variable name; if you don't, the Scala compiler will assign the type automatically (this is called type inference).
scala> val myNumber :Int = 10
myNumber: Int = 10
scala> val myFlag = true
myFlag: Boolean = true
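One detail that is easy to miss: val fixes the binding, not necessarily the contents it points to. The following is a minimal sketch of that (the names nums, counter and ratio are made up for illustration); it is written as plain statements without the scala> prompt so it can be pasted into spark-shell.

```scala
// val fixes the binding: nums can never point at a different array...
val nums = Array(1, 2, 3)
// ...but an Array is itself mutable, so its elements can still change.
nums(0) = 10

// var allows re-assignment, but only to a value of the same type.
var counter = 0
counter = counter + 1

// Type inference: no annotation needed, the compiler picks Double here.
val ratio = 2.5
```

Trying nums = Array(4, 5) afterwards would fail with the same "reassignment to val" error shown above.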
You can also assign several variables at once using tuples, similar to Python,
scala> val (x, y) = (1, 5)
x: Int = 1
y: Int = 5
The same works for more than two variables and for mixed types,
scala> var (x, y, z) = (1, 2, "Hello")
x: Int = 1
y: Int = 2
z: String = Hello
You can pass these variables to the println function,
scala> println (x)
1
String interpolation works as in other languages; prefix the double-quoted string with s,
scala> println(s"Value of x is: $x")
Value of x is: 1
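As a small extension of the above, here is a sketch of two more interpolation features: ${...} for whole expressions and the raw interpolator. The variable names are only for illustration.

```scala
val name = "Spark"
val version = 3

// ${...} evaluates any expression inside an s-string.
val banner = s"$name has ${name.length} letters, next version is ${version + 1}"

// raw"..." still substitutes variables but leaves escapes such as \n untouched,
// so the result ends with a backslash followed by the letter n.
val rawText = raw"$name\n"
```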
Similar to other languages, you can create a range (optionally with a step size) and print each element.
scala> (1 to 5).foreach(println)
1
2
3
4
5
scala> (5 to 1 by -1)
res144: scala.collection.immutable.Range = Range(5, 4, 3, 2, 1)
scala> (5 to 1 by -2)
res145: scala.collection.immutable.Range = Range(5, 3, 1)
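A few edge cases of ranges are worth trying yourself. This sketch converts each range to a List so the contents are easy to compare; the value names are illustrative.

```scala
// to includes the upper bound, until excludes it.
val inclusive = (1 to 5).toList
val exclusive = (1 until 5).toList

// A step that overshoots the bound simply stops early.
val stepped = (0 to 10 by 4).toList

// Counting down without a negative step gives an empty range.
val empty = (5 to 1).toList
```

Here inclusive has five elements, exclusive stops at 4, stepped is List(0, 4, 8) and empty has no elements at all.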
Strings are surrounded by double quotes and characters by single quotes, for example,
scala> "Hello Word"
res111: String = Hello Word
scala> 'H'
res112: Char = H
scala> :type ('H')
Char
You can apply the same kinds of methods as in other languages (length, substring, replace, etc.), for example,
scala> "Hello World".length
res113: Int = 11
scala> "Hello World".size
res1: Int = 11
scala> "Hello World".toUpperCase
res2: String = HELLO WORLD
scala> "Hello World".contains('H')
res5: Boolean = true
scala> 19.toHexString
res4: String = 13
scala> "Hello World".take(3)
res114: String = Hel
scala> "Hello World".drop(3)
res115: String = lo World
scala> "Hello World".substring(3,6)
res116: String = "lo "
scala> "Hello World".replace("H","3")
res123: String = 3ello World
scala> "Hello".map(x=>(x,1))
res7: scala.collection.immutable.IndexedSeq[(Char, Int)] = Vector((H,1), (e,1), (l,1), (l,1), (o,1))
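The map call above, which pairs each element with 1, is the first half of the classic word count. The sketch below does the whole thing on a plain Scala string, with no Spark involved; the sample sentence and names are assumptions for illustration only.

```scala
val line = "spark scala spark hello scala spark"

// Split on spaces, then pair every word with a count of 1.
val pairs = line.split(" ").map(word => (word, 1))

// Group the pairs by word and add up the 1s in each group.
val counts = pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2).sum) }
```

The result is a Map from each word to how many times it appears (spark 3 times, scala twice, hello once).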
Array, List, Map and Set behave much like the corresponding data structures in other languages.
scala> val a = Array("Hello", "World", "Scala", "Spark")
a: Array[String] = Array(Hello, World, Scala, Spark)
// you can access the elements with index positions
scala> a(0)
res159: String = Hello
scala> (a(0),a(3))
res160: (String, String) = (Hello,Spark)
Similarly for List,
// List of Integers
scala> val l = List(1, 2, 3, 4, 5)
l: List[Int] = List(1, 2, 3, 4, 5)
// List of strings
scala> val strings = List("Hello", "World", "Dataneb", "Spark")
strings: List[String] = List(Hello, World, Dataneb, Spark)
// List of List
scala> val listOfList = List(List(1,2,3), List(2,6,7), List(2,5,3))
listOfList: List[List[Int]] = List(List(1, 2, 3), List(2, 6, 7), List(2, 5, 3))
scala> val emptyList = List()
emptyList: List[Nothing] = List()
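Most of the work you do with lists goes through a handful of methods. The following sketch shows a few of them on lists like the ones above; the value names are illustrative.

```scala
val numbers = List(1, 2, 3, 4, 5)
val nested = List(List(1, 2, 3), List(2, 6, 7), List(2, 5, 3))

val doubled = numbers.map(_ * 2)     // transform every element
val total = numbers.reduce(_ + _)    // combine all elements into one value
val flat = nested.flatten            // collapse one level of nesting
val rowSums = nested.map(_.sum)      // one sum per inner list
val prepended = 0 :: numbers         // lists are immutable, so this builds a new list
```

Note that numbers itself is unchanged after all of these calls; each one returns a new collection.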
Similarly for Map,
scala> val m = Map("one" -> 1, "two" -> 2 )
m: scala.collection.immutable.Map[String,Int] = Map(one -> 1, two -> 2)
scala> m("two")
res163: Int = 2
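Looking up a key that does not exist with m("...") throws an exception, so it helps to know the safer alternatives. A minimal sketch, with illustrative names:

```scala
import scala.util.Try

val numbers = Map("one" -> 1, "two" -> 2)

// Direct lookup throws NoSuchElementException for a missing key;
// Try captures that failure instead of letting it propagate.
val missing = Try(numbers("three"))

// get returns an Option, getOrElse returns a default value.
val maybeThree = numbers.get("three")
val threeOrZero = numbers.getOrElse("three", 0)

// An immutable map is "updated" by building a new map.
val extended = numbers + ("three" -> 3)
```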
A Set returns a boolean when you look up an element,
scala> val s = Set("Apple", "Orange", "Banana")
s: scala.collection.immutable.Set[String] = Set(Apple, Orange, Banana)
scala> s("Apple")
res164: Boolean = true
scala> s("Grapes")
res165: Boolean = false
Arithmetic operators: + (add), - (subtract), * (multiply), / (divide), % (remainder), for example,
scala> val (x, y) = (5, 8)
x: Int = 5
y: Int = 8
scala> y%x
res95: Int = 3
scala> res95 + 7
res110: Int = 10
scala> "Hello" + " World"
res0: String = Hello World
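One thing to watch with / is that dividing two Int values gives an Int. A short sketch of the difference, using the same numbers as above (the value names are illustrative):

```scala
val intDiv = 8 / 5             // both operands are Int, so the result is truncated
val remainder = 8 % 5          // what is left over after the integer division
val realDiv = 8.toDouble / 5   // converting one operand gives a Double result
```

Here intDiv is 1 rather than 1.6; only realDiv keeps the fractional part.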
Relational operators ==, !=, <, >, >=, <= for example,
scala> y > x
res96: Boolean = true
Logical operators &&, ||, ! for example,
scala> !(y>x && x>y)
res98: Boolean = true
Assignment operators =, +=, %= etc. As in other languages, x += y is the same as x = x + y,
scala> var (x, y) = (5, 8)
x: Int = 5
y: Int = 8
scala> x+=y
scala> x
res102: Int = 13
Array of integers, with println and index
scala> val a = Array(1, 2, 3)
a: Array[Int] = Array(1, 2, 3)
scala> println(s"Sum is ${a(0) + a(1) + a(2)}")
Sum is 6
Defining a function has similar syntax (ignore the | character). Here the function takes two integer arguments and returns a pair of integers; as a function type this would be written (Int, Int) => (Int, Int).
scala> def squareOfNumbers(x: Int, y: Int): (Int, Int) = {
| // a multi-line body must be wrapped in curly braces {}
| (x*x, y*y)
| }
squareOfNumbers: (x: Int, y: Int)(Int, Int)
scala> squareOfNumbers(2,3)
res131: (Int, Int) = (4,9)
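Since the function returns a tuple, you can unpack the result the same way variables were assigned in pairs earlier. A minimal sketch that redefines the function so it is self-contained:

```scala
def squareOfNumbers(x: Int, y: Int): (Int, Int) = (x * x, y * y)

// Unpack the returned tuple straight into two values.
val (xSquared, ySquared) = squareOfNumbers(2, 3)

// Or keep the tuple and read its fields by position (positions start at 1).
val result = squareOfNumbers(4, 5)
val first = result._1
val second = result._2
```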
Lambda (anonymous) functions: if you do not mention a datatype, the Scala compiler will infer it where it can.
scala> (x:Int) => x+x
res132: Int => Int = <function1>
Int => Int means the function takes an integer and returns an integer,
scala> val func: Int => Int = x => x + x
func: Int => Int = <function1>
scala> func(3)
res133: Int = 6
The next function takes two integers and returns one integer; the first _ stands for the first input, the second _ for the second input, and so on,
scala> val underscoreFunc: (Int, Int) => Int = (_ * 3 + _ * 2)
underscoreFunc: (Int, Int) => Int = <function2>
scala> underscoreFunc(7, 5)
res134: Int = 31
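Function values like these become useful once you pass them to other functions, which is how you will mostly use them in Spark. The sketch below uses plain Scala collections; applyTwice is a made-up helper for illustration.

```scala
val twice: Int => Int = x => x + x

// A function that takes another function as its first argument.
def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

val twelve = applyTwice(twice, 3)         // twice(twice(3))

// Collection methods accept function values and underscore shorthand alike.
val viaValue = List(1, 2, 3).map(twice)
val viaUnderscore = List(1, 2, 3).map(_ * 3)
val summed = List(1, 2, 3).reduce(_ + _)
```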
if-else statements, for example
scala> x
res139: Int = 5
scala> if (x==5) { println("five") } // curly braces are optional here; they are needed for a multi-line body
five
scala> println(if (x==4) println("Hello") else "Bye")
Bye
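The last command works because if-else in Scala is an expression that produces a value, so its result can be assigned directly. A short sketch, with illustrative names:

```scala
val x = 5

// The value of the chosen branch becomes the value of the whole expression.
val size = if (x > 3) "big" else "small"

// else-if chains work the same way.
val sign = if (x > 0) "positive" else if (x < 0) "negative" else "zero"
```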
Loops: while, do-while and for loop. The counter i has to be declared as a var first,
scala> var i = 0
i: Int = 0
scala> while (i<5) {println(i); i+=1}
0
1
2
3
4
scala> do {println(i); i-=1} while (i>0)
5
4
3
2
1
In Scala, <- is a generator; read for (x <- 1 to 5) like "for x in range(1, 6)" in Python,
scala> for (x <- 1 to 5) println(x)
1
2
3
4
5
Pattern matching, for example;
scala> def patternMatch (x: Int) :String = x match {
| case 1 => "one"
| case 2 => "two"
| case _ => "unknown"
| }
patternMatch: (x: Int)String
scala> patternMatch(2)
res40: String = two
scala> patternMatch(4)
res41: String = unknown
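Pattern matching is not limited to literal values. The following sketch matches on types, guards and tuples as well; the function describe is hypothetical and only meant to show the syntax.

```scala
def describe(value: Any): String = value match {
  case 0               => "zero"                              // literal pattern
  case n: Int if n < 0 => "negative int"                      // type pattern with a guard
  case n: Int          => s"int $n"                           // any other Int
  case s: String       => s"string of length ${s.length}"     // type pattern
  case (a, b)          => s"pair of $a and $b"                // tuple pattern
  case _               => "something else"                    // default case
}
```

Cases are tried from top to bottom, so the more specific patterns have to come before the general ones.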
Classes can be defined as in other languages, for example,
scala> class Dog(breed: String){
| var br: String = breed
| def bark = "Woof woof!"
| private def eat(food: String) =
| println(s"I am eating $food")
| }
defined class Dog
scala> val myDog = new Dog("pitbull")
myDog: Dog = Dog@62882596
scala> myDog.br
res155: String = pitbull
scala> myDog.bark
res156: String = Woof woof!
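In the Dog class the constructor parameter had to be copied into the field br before it could be read from outside. Marking a constructor parameter with val or var exposes it directly; a minimal sketch with a made-up Cat class:

```scala
// val makes breed a read-only field, var makes age a re-assignable one.
class Cat(val breed: String, var age: Int) {
  def describe: String = s"$breed, $age years old"
}

val myCat = new Cat("siamese", 2)
myCat.age += 1                     // allowed because age is a var
val summary = myCat.describe
```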
Case classes will be useful when performing data operations, for example,
scala> case class Order(orderNum: Int, orderItem: String)
defined class Order
scala> val myOrder = Order(123, "iPhone")
myOrder: Order = Order(123,iPhone)
scala> val anotherOrder = Order(124, "macBook")
anotherOrder: Order = Order(124,macBook)
scala> myOrder.orderItem
res158: String = iPhone
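Case classes also give you value equality, a copy method and pattern matching for free. A short sketch using the same Order definition (the extra orders are invented sample data):

```scala
case class Order(orderNum: Int, orderItem: String)

val orders = List(Order(123, "iPhone"), Order(124, "macBook"), Order(125, "iPhone"))

// Two case class instances with the same fields are equal.
val same = Order(123, "iPhone") == orders.head

// copy builds a new instance with some fields changed.
val changed = orders.head.copy(orderItem = "iPad")

// Pattern matching pulls the fields apart; here we keep only iPhone orders.
val phoneOrderNums = orders.collect { case Order(num, "iPhone") => num }
```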
Exercise
For Spark, most of the time you will be writing lambda functions. I have hardly seen complex functions written to transform data in Spark. Spark has built-in transformations that take care of the complex work, which you will learn about soon.
For practice, try these examples.
Example 1: Area of Circle
scala> def areaCircle(radius:Double ) : Double = 3.14 * radius * radius
areaCircle: (radius: Double)Double
scala> areaCircle(5)
res17: Double = 78.5
Example 2: Sum of Squares of input numbers
scala> def sumOfSquares(x: Int, y:Int) : Int = x*x + y*y
sumOfSquares: (x: Int, y: Int)Int
scala> sumOfSquares(2,3)
res18: Int = 13
Example 3: Reverse the Sign of input number
scala> def reverseTheSign (x: Int) : Int = -x
reverseTheSign: (x: Int)Int
scala> reverseTheSign(-6)
res23: Int = 6
scala> reverseTheSign(6)
res24: Int = -6
Example 4: Factorial of a number (to explain recursion); note how the function calls itself,
scala> def factorial (x: Int) :Int = if (x<=1) 1 else factorial(x-1)*x
factorial: (x: Int)Int
scala> factorial(4)
res26: Int = 24
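With an Int result the factorial overflows from 13 onwards, and very deep recursion can exhaust the stack. One possible variation, sketched below, uses BigInt and a tail-recursive helper; the inner function name loop is just a convention, not something from this tutorial.

```scala
import scala.annotation.tailrec

def factorial(n: Int): BigInt = {
  // @tailrec makes the compiler verify that the recursive call is in tail
  // position, so it can be compiled into a loop instead of nested calls.
  @tailrec
  def loop(remaining: Int, acc: BigInt): BigInt =
    if (remaining <= 1) acc else loop(remaining - 1, acc * remaining)

  loop(n, BigInt(1))
}
```

The accumulator acc carries the running product, which is what lets the recursive call be the last operation.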
Example 5: Defining objects and methods (ignore the | character),
scala> object MyObject{
| val MyVal = 1
| def MyMethod = "Hello"
| }
defined object MyObject
scala> MyObject.MyMethod
res30: String = Hello
Objects can also refer to each other's members, for example,
scala> object Foo {val x = 1}
defined object Foo
scala> object Bar {val x = 2}
defined object Bar
scala> object fooBar {
| val y = Bar.x
| }
defined object fooBar
scala> fooBar.y
res31: Int = 2
Example 6: Sum of Squares using Lambda or anonymous func
scala> val z = (x:Int, y:Int) => x*x + y*y
z: (Int, Int) => Int = <function2>
scala> z(2,3)
res34: Int = 13
Example 7: Filtering the list with anonymous func
scala> List(1,2,3,4,5,6).filter(x => x % 2 == 0)
res39: List[Int] = List(2, 4, 6)
Example 8: For loops with yield
scala> for (x <- 1 to 5) yield x
res42: scala.collection.immutable.IndexedSeq[Int] = Vector(1, 2, 3, 4, 5)
scala> for (x <- 1 to 3; y <- Array("Hello","World")) yield (x, y)
res47: scala.collection.immutable.IndexedSeq[(Int, String)] = Vector((1,Hello), (1,World), (2,Hello), (2,World), (3,Hello), (3,World))
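A for loop with yield can also filter with an if guard, which makes it equivalent to a filter followed by a map. A minimal sketch comparing the two forms (value names are illustrative):

```scala
// Keep only the even numbers, then square them.
val evensSquared = for (x <- 1 to 10 if x % 2 == 0) yield x * x

// The same computation written with filter and map.
val viaFilterMap = (1 to 10).filter(_ % 2 == 0).map(x => x * x)
```

Both forms produce the same sequence, so which one to use is mostly a matter of readability.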
That's all guys! If you have any questions, please mention them in the comments section below. Thank you!