Jason Sando

Wednesday, April 13, 2011

First steps in Scala


I've followed Scala for the last couple years, watching other engineers' excitement and confusion, looking for some opportunity to try it myself.

I've worked in assembler, C, C++, C#, HTML, JavaScript, VB, Fortran, RPG, etc. ... a pretty eclectic mix. Recently I did some work in Vala, a new experimental language based on C# but compiling to native code using no VM or garbage collection, just reference counting. I mention this just for context: I'm used to learning new languages and platforms. Most of them are C-syntax based, though, so that probably colors my expectations a bit.

Scala compiles to Java bytecode and runs on any JVM. I've been working in Java since 1996, around Java 1.0.2. I'm currently using Java in several embedded devices, and although the benefits of having the whole JRE are fantastic in terms of saving development time, the cost is a bloated runtime and slow startup times. I'm constantly nitpicking how many jars our projects depend on, analyzing which libraries are doing lots of classloading or garbage collection, etc. I found, for example, that the standard Java XML parsers create lots of garbage, and that JAXB generates thousands of classes at runtime ... which end up as garbage.

In order to use Scala in this type of environment, I'd need to add its 6 MB jar file. That's one strike against Scala already. However, it's easy enough to stomach the 6 MB on the server side, where we already put up with a plethora of bloatware like Hibernate. Bloatware sucks, but the server has enough RAM to handle it, so the productivity gains outweigh the cost. Assuming there is a productivity gain.

I set out to write a trivial program in Scala, just to break the ice. I decided a simple Number Guessing Game would do. Here's the final program, and a summary of my notes:

object NumberGuess {

    def main(args: Array[String]) {

        val number = new java.util.Random().nextInt(100) + 1
        printf("I'm thinking of a number between 1 and 100, can you guess it?\n")

        def playGame: Boolean = {
            for (i <- 0 until 5) {
                printf("Guess %d - enter a number: ", i + 1)
                val guess = readLine().toInt

                if (guess == number) {
                    return true
                } else if (guess < number) {
                    printf("Too low!\n")
                } else {
                    printf("Too high!\n")
                }
            }
            return false
        }

        if (playGame)
            printf("You guessed it! My number was %d\n\n", number)
        else
            printf("Sorry, better luck next time!\n")
    }
}


I think this is more complicated than it needed to be, as a result of Scala not having a 'break' keyword. That's right, Scala has no 'break' or 'continue' within a 'for' or 'while' loop. The reason given is that they are allegedly bad form, leading to difficult-to-read code. The 'workaround' is to pull the 'for' loop into a function, and then 'return' from it where you wanted to break.

Wait ... what? After spending 30 years trying to refactor code to have a single exit point, now the only way to make this work is to return from the middle of a function, because 'break' is bad?

I expected the lack of 'break' to be mentioned in one of the references I was using (Programming in Scala, Scala by Example, and the Scala Language Reference), but I didn't find it.
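For comparison, here is a minimal sketch of another way to get the early exit without 'break' or a 'return' in the middle of a loop: let a collection method such as exists stop at the first match. The names GuessWithoutBreak, hint, and play are made up for illustration, and the guesses are passed in as an Iterator rather than read from the console so the logic can be checked on its own.

```scala
object GuessWithoutBreak {
  // Hypothetical helper: the message for one guess relative to the target number.
  def hint(guess: Int, number: Int): String =
    if (guess == number) "You guessed it!"
    else if (guess < number) "Too low!"
    else "Too high!"

  // Returns true if one of the first maxTries guesses matches the number.
  // exists stops consuming guesses at the first match, which is the early exit.
  def play(number: Int, guesses: Iterator[Int], maxTries: Int = 5): Boolean =
    guesses.take(maxTries).exists { guess =>
      println(hint(guess, number))
      guess == number
    }
}
```

Whether this reads better than the loop-in-a-function version is a matter of taste; it mostly shows that the early exit can live in the library call instead of in the loop.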

Scala 2.8 adds an extra import that would let me wrap the loop in a breakable {} block and then call 'break', which essentially just throws an exception. For performance-critical code, that does not sound practical to me at all. Performance bottlenecks in Java are generally tied to object creation / garbage collection, and that's what throwing exceptions does.
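A minimal sketch of the breakable approach, using scala.util.control.Breaks from the standard library. The object BreakableDemo and the method firstIndexOf are hypothetical names for illustration, and the call is written as break() the way current Scala releases expect.

```scala
import scala.util.control.Breaks.{break, breakable}

object BreakableDemo {
  // Returns the index of the first element equal to target, or -1 if absent.
  def firstIndexOf(xs: Seq[Int], target: Int): Int = {
    var found = -1
    breakable {
      for (i <- xs.indices) {
        if (xs(i) == target) {
          found = i
          break() // throws a control-flow exception that breakable catches
        }
      }
    }
    found
  }
}
```

How much that exception costs in a hot loop is something to measure rather than assume.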

I found by looking at sample code (the code for the Scala compiler, which is included in the binary distribution) that some functions are predefined and available without any import. These include printf, println, readLine, and others. Looking at the source for these in Predef.scala, I see they just call Console.printf, Console.println, and so on. However, I don't see Console.flush(). Maybe an oversight? I've had more than one program whose output didn't show up at the right time because it wasn't flushed immediately. To clarify, my point is that this is inconsistent. Of course we can call Console.flush(), but then why have all these predefined functions? Why not just pre-import Console, so that it's all more explicit and consistent?
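A small sketch of the explicit version: a prompt helper that prints through Console and then flushes, so the text is visible before the program blocks on input. PromptDemo and prompt are hypothetical names, not part of the Scala library; Console.printf and Console.flush are the real calls.

```scala
object PromptDemo {
  // Hypothetical helper: print a formatted prompt and flush right away.
  def prompt(format: String, args: Any*): Unit = {
    Console.printf(format, args: _*) // expand the varargs into individual arguments
    Console.flush()                  // push any buffered output to the terminal now
  }
}
```

Usage would look like PromptDemo.prompt("Guess %d - enter a number: ", i + 1), followed by the read.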

In the code above you can see I call printf() and then readLine(). Why not just call readLine() with a printf spec? Because that threw a funky Scala exception:

java.util.IllegalFormatConversionException: d != scala.collection.mutable.WrappedArray$ofRef

What's worse, the line number in the exception was the line number of the for() loop, not the line number of the call to readLine(). Very bad!
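The WrappedArray in that message suggests a whole collection reached the %d conversion instead of a single number. One plausible way to get there is a varargs forwarding slip, sketched below. This is an assumption for illustration; I have not confirmed that it is what happens inside the library's readLine. VarargsDemo, formatBroken, and formatFixed are made-up names.

```scala
object VarargsDemo {
  // Forwards the varargs as ONE argument (a single Seq), so "%d" is handed
  // a collection rather than an Int and the formatter rejects it.
  def formatBroken(text: String, args: Any*): String = text.format(args)

  // Expands the sequence back into individual arguments with ": _*".
  def formatFixed(text: String, args: Any*): String = text.format(args: _*)
}
```

With these definitions, formatFixed formats the number as expected, while formatBroken throws java.util.IllegalFormatConversionException when the format contains %d.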

The Scala compiler is VERY SLOW compared to javac. Compiler speed counts. Ask any Flex developer if they're happy with mxmlc's compile speed ;)

One thing I'm confused about is why readLine() has optional empty parentheses after it (you can safely omit them and it compiles and runs fine), BUT the call to toInt does not allow parentheses. Coming from a C syntax background (C, Java, C#, etc.), I'm used to the parens being there to differentiate between a function call and a field access.
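The difference comes from how each method is declared. A sketch with hypothetical names (ParensDemo, nextTicket, ticketsIssued): a method declared with () is conventionally one that has a side effect, while one declared without () reads like a field.

```scala
object ParensDemo {
  private var issued = 0

  // Declared with (): by convention this signals a side effect (the counter changes).
  def nextTicket(): Int = { issued += 1; issued }

  // Declared without (): no side effect, so it is called like a field access.
  // Writing ticketsIssued() does not compile, the same as toInt().
  def ticketsIssued: Int = issued
}
```

The compiler I used here lets a call drop the () from a method declared with them, which is why readLine works both ways. Newer Scala releases are stricter about that, so the example calls each method in the form it was declared.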

Final little nitpick ... the Scala sources use an indent of 2. I did that way back in the day, oh, about 15 years ago. Pretty much everything settled on an indent of 4 in the last 15 years. I remember I used to like 2 because of the density of code, but now it looks too compact to me. Like, too easy to misread it. We'll see if I can get used to that again.

That last point reminds me how much I would really like to have a code formatting IDE that doesn't allow alternate formatting, sort of like the old VB tokenizer used to do. You'd type in some text, hit 'enter', and VB would reformat it all to the 'official' format. Developers spend too much time doing a piss-poor job formatting their code; if the IDE could just take care of that, it would be for the best.

1 Comment:

  • Nice post!

    The explanation I've seen for the () at the end of a parameterless call in Scala is that you include them if the call has a side effect and drop them if it does not. That confused me too, and it still throws me coming from Java and C#.

    I agree, the Scala compiler is very, very slow. I wrote a small application to spit out the bit sequences for each byte from standard in, for a class I was teaching. It was only 30 lines with good use of white-space, and it seemed to take forever to compile. You can use fsc, similar to the fast compiler in Flex, and that will speed it up considerably. I need to try that on something much bigger, though.

    The lack of a break statement struck me as an odd omission as well and I cannot figure out how I would intuitively get around it.

    By Blogger Me, at 10:03 AM  
