Jason Sando

Wednesday, April 13, 2011

First steps in Scala


I've followed Scala for the last couple of years, watching other engineers' excitement and confusion and looking for an opportunity to try it myself.

I've worked in assembler, C, C++, C#, HTML, JavaScript, VB, Fortran, RPG, etc ... a pretty eclectic mix. Recently I did some work in Vala, a new experimental language based on C# that compiles to native code with no VM and no garbage collection, just reference counting. I mention this only for context: I'm used to learning new languages and platforms, although most of them have been C-syntax based, which probably colors my expectations a bit.

Scala compiles to Java bytecode and runs on any JVM. I've been working in Java since 1996, around Java 1.0.2. I'm currently using Java in several embedded devices, and although the benefits of having the whole JRE are fantastic in terms of saving development time, the cost is a bloated runtime and slow startup. I'm constantly nitpicking how many jars our projects depend on, analyzing which libraries do lots of classloading or garbage collection, etc. I found, for example, that the standard Java XML parsers create lots of garbage, and that JAXB generates thousands of classes at runtime ... which end up as garbage.

To use Scala in this type of environment, I'd need to add its 6 MB jar file. That's one strike against Scala already. However, it's easy enough to stomach the 6 MB on the server side, where we already put up with a plethora of bloatware like Hibernate. Bloatware sucks, but the server has enough RAM to handle it, so the productivity gains outweigh the cost. Assuming there is a productivity gain.

I set out to write a trivial program in Scala, just to break the ice. I decided a simple Number Guessing Game would do. Here's the final program, and a summary of my notes:

object NumberGuess {

    def main(args: Array[String]): Unit = {

        val number = new java.util.Random().nextInt(100) + 1
        printf("I'm thinking of a number between 1 and 100, can you guess it?\n")

        def playGame: Boolean = {
            for (i <- 0 until 5) {
                printf("Guess %d - enter a number: ", i + 1)
                val guess = readLine().toInt

                if (guess == number) {
                    return true
                } else if (guess < number) {
                    printf("Too low!\n")
                } else {
                    printf("Too high!\n")
                }
            }
            return false
        }

        if (playGame)
            printf("You guessed it! My number was %d\n\n", number)
        else
            printf("Sorry, better luck next time!\n")
    }
}


I think this is more complicated than it needs to be, because Scala has no 'break' keyword. That's right: Scala has no 'break' or 'continue' for 'for' or 'while' loops. The reason given is that these are allegedly bad form and lead to hard-to-read code. The 'workaround' is to pull the 'for' loop into a function and then 'return' from it wherever you wanted to break.

Wait ... what? After spending 30 years trying to refactor code to have a single exit point, now the only way to make this work is to return from the middle of a function, because 'break' is bad?
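For comparison, here is a minimal sketch of the same game loop with no 'break', no 'return', and a single result expression, written as a tail-recursive helper. It assumes a hypothetical `readGuess` parameter that supplies each guess (a real game would read from the console), which also makes the logic easy to test. Worth knowing: in Scala 2, a `return` inside the `for` body, as in my program above, is compiled into a throw of `scala.runtime.NonLocalReturnControl`, because the loop body is a closure passed to `foreach`.

```scala
import scala.annotation.tailrec

object NumberGuessRec {
    // `readGuess` is a hypothetical parameter that supplies each guess; a real
    // game might pass something like () => scala.io.StdIn.readLine().trim.toInt
    def playGame(number: Int, maxTries: Int, readGuess: () => Int): Boolean = {
        @tailrec
        def loop(attempt: Int): Boolean =
            if (attempt > maxTries) false // out of guesses: the "loop" ends
            else {
                printf("Guess %d - enter a number: ", attempt)
                val guess = readGuess()
                if (guess == number) true // correct: stop recursing, like a break
                else {
                    println(if (guess < number) "Too low!" else "Too high!")
                    loop(attempt + 1) // next iteration, like a continue
                }
            }

        loop(1) // single exit point: the value of the whole recursion
    }
}
```

@tailrec makes the compiler confirm that `loop` compiles down to a plain jump, so this runs in constant stack space just like the original `for`.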

I expected the lack of 'break' to be mentioned in one of the references I was using (Programming in Scala, Scala by Example, and the Scala Language Reference), but I didn't find it.

Scala 2.8 adds an extra import I could use (scala.util.control.Breaks._), which lets me wrap the loop in a breakable { } block and then call break(), which works by throwing an exception. For performance-critical code, that doesn't sound practical to me at all. Performance bottlenecks in Java are generally tied to object creation and garbage collection, and throwing exceptions usually means creating objects.
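For reference, this is roughly what the Scala 2.8 alternative looks like: a minimal sketch that finds the index of the first matching element using `breakable` and `break()` from `scala.util.control.Breaks`. The object and function names are my own. On the cost question, as far as I can tell from the library source, `Breaks` throws the same preallocated `ControlThrowable` instance (which carries no stack trace) on every `break()`, so a break does not allocate a fresh exception object each time; I haven't benchmarked it.

```scala
import scala.util.control.Breaks._

object BreakableDemo {
    // Returns the index of the first element satisfying `p`, or -1.
    // break() throws a control exception that the enclosing breakable catches.
    def indexWhereBreakable[A](xs: Seq[A])(p: A => Boolean): Int = {
        var found = -1
        breakable {
            for (i <- xs.indices) {
                if (p(xs(i))) {
                    found = i
                    break() // jump out of the for loop
                }
            }
        }
        found
    }
}
```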

By looking at sample code (the code for the Scala compiler, which is included in the binary distribution), I found that some functions are predefined, i.e. imported automatically from scala.Predef. These include printf, println, readLine, and others. Looking at their source in Predef.scala, I see they just call Console.printf, Console.println, and so on. However, there is no predefined flush() to match Console.flush(). Maybe an oversight? More than one of my programs has had its output show up late because it wasn't flushed immediately. To be clear, my complaint is the inconsistency. Of course we can call Console.flush(), but then why have all these predefined functions at all? Why not just pre-import Console, so that everything is explicit and consistent?
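Until there's a predefined flush(), a small helper can keep the prompt and the flush together. This is a minimal sketch; `Prompt`, `prompt`, and `askInt` are hypothetical names, and it reads input with `scala.io.StdIn.readLine()`, which is where `readLine` lives from Scala 2.11 on (the Predef version was deprecated there).

```scala
import scala.io.StdIn

object Prompt {
    // Prints `text` without a trailing newline, then flushes so the prompt
    // is visible before the program blocks waiting for input.
    def prompt(text: String): Unit = {
        Console.print(text)
        Console.flush()
    }

    // Prompts, reads one line from standard input, and parses it as an Int.
    // StdIn.readLine() returns null at end of input, and toInt throws
    // NumberFormatException for non-numeric text; both are left unhandled.
    def askInt(text: String): Int = {
        prompt(text)
        StdIn.readLine().trim.toInt
    }
}
```

In the game, `val guess = Prompt.askInt("Guess %d - enter a number: ".format(i + 1))` would replace the separate printf and readLine calls.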

In my game program above, you can see I call printf() and then readLine(). Why not just call readLine() with a printf-style format string and arguments? Because that threw a funky Scala exception:

java.util.IllegalFormatConversionException: d != scala.collection.mutable.WrappedArray$ofRef

What's worse, the line number in the exception was the line number of the for() loop, not the line number of the call to readLine(). Very bad!
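The exception text is a clue: `%d` was handed a `WrappedArray`, i.e. the entire argument list, rather than the number. That is the symptom you get when a method taking `args: Any*` passes `args` on to another varargs method without `: _*`, which suggests (though I haven't traced it in the library source) that readLine was forwarding its arguments that way. A minimal sketch that reproduces the symptom, with hypothetical names:

```scala
object VarargsForwarding {
    // Broken: `args` (a Seq) is passed as ONE argument, so "%d" receives the
    // whole collection and String.format throws IllegalFormatConversionException.
    def formatBroken(text: String, args: Any*): String =
        text.format(args)

    // Fixed: `: _*` expands the Seq back into individual arguments.
    def formatFixed(text: String, args: Any*): String =
        text.format(args: _*)
}
```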

The Scala compiler is VERY SLOW compared to javac. Compiler speed counts. Ask any Flex developer if they're happy with mxmlc's compile speed ;)

One thing that confuses me is why readLine() takes optional empty parentheses (you can safely omit them and it compiles and runs fine), BUT the call to toInt does not allow parentheses at all. Coming from a C-syntax background (C, Java, C#, etc.), I'm used to the parentheses distinguishing a function call from a field access.
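As I understand it, the difference comes from how each method is declared. A method declared without a parameter list, like `toInt` on strings, must be called without parentheses, so it reads like a field (Scala's "uniform access principle"). A method declared with an empty parameter list, like `readLine()`, may also be called without them in Scala 2, although newer compilers warn about or reject that shortcut. The convention is `()` for methods with side effects and no parentheses for pure accessors. A small sketch with hypothetical class names:

```scala
object UniformAccess {
    class Counter {
        private var count = 0

        // No parameter list: callers write `c.current`, exactly like a field,
        // and `c.current()` does not compile.
        def current: Int = count

        // Empty parameter list: signals a side effect; callers write `c.increment()`.
        def increment(): Unit = count += 1
    }

    // Same call syntax, backed by a val: switching `current` between a def
    // and a val does not change any call site.
    class FixedCounter(val current: Int)
}
```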

Final little nitpick ... the Scala sources use an indent of 2. I did that way back in the day, oh, about 15 years ago. Pretty much everything has settled on an indent of 4 since then. I remember liking 2 for the density of code, but now it looks too compact to me, too easy to misread. We'll see if I can get used to it again.

That last point reminds me how much I would like a code-formatting IDE that doesn't allow alternate formatting, sort of like the old VB tokenizer: you'd type in some text, hit 'enter', and VB would reformat it all to the 'official' format. Developers spend too much time doing a piss-poor job of formatting their code; if the IDE could just take care of that, it would be for the best.

Wednesday, June 02, 2010

There's a transcript of the Steve Jobs interview from D8 available at http://d8.allthingsd.com/20100601/steve-jobs-session/

It's interesting that Steve says “… Apple TV is a hobby,” because “the television industry fundamentally has a subsidized business model that gives everyone a set-top box.” He goes on to say this stifles innovation, because there’s no “go-to-market strategy.”

I know, from working in digital signage for the Casino Gaming industry, that there is a tremendous amount of innovation across many competing vendors. The Apple TV is a terrific, polished product, in a sexy little box, for an incredibly low retail price. There are dozens of 3rd-party boxes now available. I’m always on the lookout for interesting ARM-based products for use in industrial environments, but even in the x86 space I find, for example, the NVidia ION products very interesting. Zotac, Jetway, and Viewsonic (?!) all have interesting products. I’ve had a much more difficult time getting AMD or Intel video hardware acceleration working under Linux, whereas NVidia works well, decoding and scaling 1080p streams while compositing in OSD.

These are all barebones systems for under $300, all released within the last 6-12 months. So there is clearly hardware innovation, and it must be driven by something.

I’m guessing that “something” is the hobbyist HTPC (Home Theatre PC) market. There is a plethora of HTPC software projects. MythTV is one of my personal favorites; I’ve used a spin-off called “MiniMyth” for several years. MiniMyth is a custom BusyBox Linux build, with MythTV, optimized for VIA Mini-ITX boards.

But, as I’ve seen at my own house, the HTPC is an add-on product. MiniMyth can’t tune directly into my digital cable service (Cox Communications), so I have to use a capture card to capture the video output from a cable box, then transcode and rebroadcast it within my house, which creates lag, especially when changing channels. MiniMyth also can’t interact with Cox Communications’ VOD system. It’s basically a hobbyist system, i.e. for someone who is willing to put up with all the extra pieces (and inherent instability!) this solution creates. The AppleTV works with iTunes and can do some great stuff, like making it super easy to listen to your music on your entertainment system or watch your pictures or movies … but not live TV from your cable system.

It seems like the real problem in this space is not that cable boxes are “subsidized”, but that the innovative HTPC market can’t plug directly into the cable system. That is, the problem seems to be the lack of an open standard for digital cable TV services.

But the trajectory of the home media system market now looks like a matter of bandwidth. Once I can tune my HTPC directly into abc.com to watch Lost re-runs, what do I need the cable box for? I can sense many people nodding their heads … but consider this: right now, with the cable box, I have a guaranteed service level across every channel. If any of those channels isn’t rendering correctly, I can call Cox. If consumers have to tune into every TV station via its website, and one works and one doesn’t, and the only recourse is to email admin@somesite.com and hope for a reply … well, that won’t be much fun.

This last point is a perfect argument for services that aggregate radio, TV, and movie content, sites like Netflix or Hulu.

Hmm. Time to go make an investment?


Sunday, February 10, 2008

Java 7 Language Enhancements

There's a lot of discussion about closures in Java 7, and there are quite a few well-thought-out objections to them as well (see http://weblogs.java.net/blog/kirillcool/archive/2008/02/evolving_the_la.html).

I think Ruby, Groovy, Python, and Scala already have a lot of the new and interesting features, and I'd like to see the IDEs catch up with support for these languages rather than drain resources implementing "me too" features in Java.

And while I'm on the subject of alternate languages ... the languages above all target the JVM, and some target other environments as well:
  • Ruby has JRuby, MRI (native) Ruby, and now IronRuby (.NET DLR)
  • Python has Jython and the native Python interpreter
  • Scala and Groovy are both JVM-only, although Groovy (and probably Scala) can apparently be run on the .NET CLR using IKVM.
I've read books, looked at lots of code, and tried my hand at writing code in the above languages, and here's my take:
  • Ruby has a nice syntax and some outstanding frameworks (like Rails), and the JRuby implementation is terrific! However, it may feel a little alien to Java/C# developers. Still, I think it is worth learning as an alternate scripting language that can now be used on almost any platform.
  • Python is easy to learn and has terrific libraries. However, the syntax will again be a little strange to someone coming from a C-style language. It relies heavily on native libraries, which makes it harder to port and run under Jython, and Jython lags behind the main native interpreter. If you're going to learn one other language besides Java/C#, I would make it Ruby at this point. But if you use software written in Python (Mailman, Trac), it is well worth the learning investment.
  • Scala ... has great features on paper. I get so excited reading all the articles about them. But then I look at the code and ... I'm not really sure what I'm looking at. It is VERY foreign to C-style aficionados.
  • Groovy gets you many of the features of all these languages with a C-style syntax ... so it's very usable for that crowd (and me). Plus it offers almost total compatibility with Java, so you can copy/paste Java code and then 'upgrade' it line by line to less verbose Groovy code. It is also designed for the Java platform (as is Scala), so it leverages the vast Java runtime APIs in a natural way (versus Ruby and Python, which are ... 'bolted on' to the JVM).
I'm very surprised by Groovy, as I had written it off as a toy over a year ago. The binary packaging is very good.

I would rather see effort invested in the Groovy parser and Groovy's code generation (and maybe even a new name for it :) than see the same features forced into Java 7.


Friday, February 01, 2008

Making WSDL more ... relaxing.

I have a new proof-of-concept project over at Google Code that tries to take the grunt work out of creating standards-compliant WSDL. The project is at http://code.google.com/p/relax-ws/

I started with RelaxNG Compact syntax, since I feel it is important to use the XSD type library for maximum compatibility (rather than making a DSL in some other language). The rest of the syntax came out of a desire for consistency.

It is a proof of concept, but we're already using it at work to start new WSDLs for some of our .NET development (so I know it works with wsdl.exe). To me, success means I can go a week without using the tool, then open a text editor and just start typing. There's no way I could ever do that with WSDL!

Let me know what you think!


Sunday, October 22, 2006

Jetty 5.1 and 6.0 Performance on OS/400

At work we ran into some questions about Jetty (v5.1.6) and performance. We had a servlet that was building a large-ish SOAP response of about 400k as a String, and then using:

response.getWriter().write(result);

to send the entire 400k String back to our VB.NET client. We assumed that modifying the code to write the output a little at a time would actually improve performance, since Jetty could send packets while our servlet was waiting on local database resources to build the rest of the response. But that's not what happened.

I should mention the relevant details:
  • Jetty 5.1.6
  • OS/400 v5r2
  • JDK 1.4.2
So we expected better performance, but instead it slowed to a crawl: a request that had been taking 2.5 seconds now took 30 seconds.

In response, my colleague Kris Jacobson wrapped the response writer in a BufferedWriter with the default size. I haven't checked on 1.4.2, but on 1.5 the default is 8k (8192 characters). And, voilà! Performance came back to where it was before we rewrote the code (about 2.5 seconds).
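One plausible reason a BufferedWriter helps is that it turns many small writes into a few large ones. This is a minimal sketch of just that mechanism, not a reproduction of the OS/400 behavior: `CountingWriter` is a hypothetical Writer that discards its data but counts how often its bulk `write` method is invoked, standing in for the container's output path, and the 400,000-character response written in 100-character pieces is an assumption.

```scala
import java.io.{BufferedWriter, Writer}

object BufferingDemo {
    // Hypothetical sink: discards the data but counts calls to the bulk
    // write method, each standing in for a trip toward the socket layer.
    class CountingWriter extends Writer {
        var writeCalls = 0
        var charsWritten = 0L

        override def write(cbuf: Array[Char], off: Int, len: Int): Unit = {
            writeCalls += 1
            charsWritten += len
        }
        override def flush(): Unit = ()
        override def close(): Unit = ()
    }

    // Writes `total` characters in `chunk`-sized pieces, either straight to
    // the sink or through a BufferedWriter with its default buffer size.
    def run(total: Int, chunk: Int, buffered: Boolean): CountingWriter = {
        val sink = new CountingWriter
        val out: Writer = if (buffered) new BufferedWriter(sink) else sink
        val piece = "x" * chunk
        for (_ <- 0 until total / chunk) out.write(piece)
        out.flush() // push out whatever is left in the buffer
        sink
    }
}
```

Unbuffered, the sink sees one call per 100-character piece (4,000 in all); through the BufferedWriter it sees roughly one call per full 8192-character buffer.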

So I wrote a quick servlet, and an HttpClient console application to hit it, to test various combinations of buffer sizes and content lengths. The results rather surprised me.

In fact, Jetty 5.1.6 already has a default buffer size of 8192, at least on Windows and Linux under JDK 1.5 and 1.6rc2. I tested on my local 100 Mb/s network, with my Fedora Core 5 machine hosting both Jetty 5.1.6 and Jetty 6.0.1 and the command-line client running on my Windows XP box under JDK 1.6rc2. Results were nearly identical for both Jetty releases: about 1.4 seconds to transfer 16 megabytes. And that's with absolutely no change to the default buffer size, and without wrapping response.getWriter() in another BufferedWriter.

So, methinks something is wonky on OS/400's JVM. I'll have to deploy my test servlet on the '400 and recheck everything.

I did learn through this that you generally don't need to add more buffering on top of what the servlet container already provides. You should verify what the defaults are, and be aware of these calls:

HttpServletResponse.getBufferSize()
HttpServletResponse.setBufferSize(int)

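Just because you call setBufferSize() doesn't mean the servlet container will set the buffer to exactly that size; it has some discretion about what makes sense. The servlet API only promises a buffer at least as large as the size requested, and getBufferSize() reports the size actually used.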

We also ran into some problems with Keep-Alive connections from our VB.NET client to Jetty on the '400 that I need to research.