Go Readability

If you haven’t seen them, do take a look at these slides on Go readability, from a member of Google’s Go readability group.

Readability is an important process at Google. In theory, it’s about ensuring the style guide for a language is applied. In practice, it’s also about ensuring that idiomatic code is produced. This is highly language specific, and not something that can easily be done with tooling.

In the case of Go readability, it feels like a mentoring process over a series of code reviews (other languages take a more “big-bang” approach). The end result is that I have a better idea of not just how to write Go, but how we like Go to be written at Google. I really appreciate the strong emphasis on simplicity in Go code. Hopefully, that comes through in the slides.

Rescuing Data with F-Script

I have an old iMac. It’s not in a good state.

[Image: Unwell iMac]

I don’t care that much; it’s old, slow, and not a lot of use. The only remaining thing of value is the large collection of photos I have on there. I have an external drive, so it’s easy to copy them over.

All the photos are stored in iPhoto 5, which (from googling around) appears to be the version before they moved all the metadata into a big binary blob. And looking in the iPhoto Library folder revealed a file AlbumData.xml which contained details of all 12,000+ photos. Result!

Apart from one slight problem. I’d never used iPhoto’s “albums.” Well, I’d made one or two (mostly by accident). But for the most part, I’d left the photos in the import “rolls”, just renaming (e.g.) “Roll 42” to “Trip to Wales”. Now the rolls are represented in AlbumData.xml, but without the roll names. Just “Roll 1”, “Roll 2”, etc. <facepalm/>

So like any good engineer, I’ll spend a huge amount of time to avoid having to re-enter 340 roll names, which I could have done in a couple of hours. This information is necessary for whatever system I’ll be pulling the photos into afterwards, dammit!

iPhoto obviously knows what the name of each roll is, or it wouldn’t be able to display them. They’re just stored in some different place (a binary blob, as it happens). Thankfully, I can still run iPhoto. So long as I avoid the middle of the screen, anyway. Now I forget exactly where I heard about it (Core Intuition, perhaps?) but I remembered that F-Script Anywhere allowed inspecting a running Cocoa program.

F-Script itself is a Smalltalk-based language, built on top of Objective-C and the Cocoa runtime. The tutorial includes a nice side-by-side comparison with Objective-C. It’s pretty simple for the most part. F-Script Anywhere is an add-on that allows you to inject the F-Script runtime into any running process. I tried it with iPhoto and immediately had a command line and browser with complete access to all of iPhoto. It helped having a minimal knowledge of Cocoa (I could find the app delegate!).

However, this wasn’t enough on its own. I didn’t know my way around all the classes inside iPhoto. To that end, I used class-dump to piece together the classes and their relations.

Eventually, after a few hours[1], I came up with this.

app := NSApplication sharedApplication
appCtrl := app delegate
albumMgr := appCtrl archiveDocument albumMgr

"Each roll (album) has an imageRec whose name is what we need."
rolls := albumMgr rollAlbums

"Pull out the mapping from pseudo-album name to roll name."
keys := rolls name
values := rolls imageRec name
rollNames := NSDictionary dictionaryWithObjects:values forKeys:keys

"Write the result to a plist."
rollNames writeToFile:'/tmp/roll_names.xml' atomically:YES

One very interesting feature of F-Script is the loop. Or lack thereof. Did you see the rolls name above? rolls is an array, which doesn’t respond to the name method. So F-Script has magic to say “send the name message to each element of the rolls array”. There’s more on this in the messaging patterns section of the language guide. But it feels rather nice.

The end result is that I do have all the data I need to move my photos into the next system. I can now safely dispose of the broken iMac, much to the relief of the room that’s been harboring it.

As an aside, this whole business of Albums vs Rolls very much feels like an example of technical debt in a system which was rushed. There was no obvious reason to have rolls and albums represented so differently. But instead, they were bolted on in a strange fashion.

[1] Actual time period is undefined.

Happy Fifth Anniversary, Go

So, Go is five years old. Looking back through version control, my first bit of Go was in May 2012. I’ve been using it as my preferred language for the last year or so. It still feels very pleasant and easy to work with. In no small part that’s due to the excellent tooling. The well-written standard library also helps. Go is perhaps the first language where I’ve not had a burning desire to write my own URL type because the builtin one is so awful (as it is in, e.g., Python and Java). Inside Google, it’s been said that if you want to understand a piece of infrastructure, you should look at the Go implementation.

Looking back at that first Go code I wrote (a small tool of about 500 lines), it’s far from embarrassing. Certainly when compared to my efforts in other languages. The code has a few useful types, and methods that act upon them. It’s pretty readable (thanks to gofmt). The one bad thing is that I relied on panic far too heavily, instead of returning errors. The code ran happily for a year or so until the need for it was removed.
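To illustrate the difference, here is a minimal sketch of the error-returning style, as opposed to calling panic. It isn't taken from that original tool; parsePort is a made-up helper and the port-number scenario is purely an assumption for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// parsePort is a hypothetical helper. Rather than panicking on bad input,
// it hands the problem back to the caller as an error value.
func parsePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		// Wrap the underlying error with some context for the caller.
		return 0, fmt.Errorf("parse port %q: %w", s, err)
	}
	if n < 1 || n > 65535 {
		return 0, errors.New("port out of range")
	}
	return n, nil
}

func main() {
	for _, s := range []string{"8080", "http"} {
		port, err := parsePort(s)
		if err != nil {
			// The caller decides what to do; here we report and carry on.
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("port:", port)
	}
}
```

The point is that the caller gets to choose between retrying, reporting or giving up, whereas a panic makes that choice for everyone.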

Anyway, if you’ve not already done so, try writing your next program in Go. You may be surprised at what a pleasant and productive experience it is.

Go Strings

I’ve been looking at Go recently. It’s a pleasant language, with few surprises. However, I wondered (as always) what the encoding of a string is supposed to be. For example:

  • Python 2 has two types: str, and unicode. Python 3 has sensibly renamed these to bytes and str, respectively.
  • Perl has a magic bit which gets set to state that the string contains characters as opposed to bytes (it’s called the UTF-8 bit, but it means characters).

So how does Go deal with characters in strings? Given that the authors of Go also invented UTF-8, we can hope it’s been thought about.

There are three types to think about.

[]byte

A slice of bytes.

string

A (possibly empty) sequence of bytes. Strings are immutable.

rune

A single Unicode code point. Rune literals are written as characters in single quotes.
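As a rough sketch of how the three types relate, the conversions between them look like this. The helper name roundTrip is made up for illustration, and the accent is written as a \u00e9 escape so the example doesn't depend on how the source file happens to encode it.

```go
package main

import "fmt"

// roundTrip is a hypothetical helper: it converts a string to each of the
// other two representations and back again.
func roundTrip(s string) (viaBytes, viaRunes string) {
	b := []byte(s) // a mutable copy of the string's bytes
	r := []rune(s) // the string decoded into code points
	return string(b), string(r)
}

func main() {
	s := "caf\u00e9" // "café", spelled with an escape for the accent
	viaBytes, viaRunes := roundTrip(s)
	fmt.Println(viaBytes == s, viaRunes == s) // true true

	// A rune literal is just an integer code point (rune is an alias for int32).
	fmt.Printf("%T %U\n", '\u00e9', '\u00e9') // int32 U+00E9
}
```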

There’s no explicit encoding in the above. Nonetheless, there’s an implicit preference for UTF-8: Go source files are defined to be UTF-8, so string literals end up UTF-8 encoded.

But this doesn’t help the common case:

package main

import "fmt"

func main() {
  s := "café"
  fmt.Printf("%q has length %d\n", s, len(s))
}

// "café" has length 5

The unicode/utf8 package can do what’s needed though. This provides functions for, amongst other things, picking runes out of strings.

package main

import (
  "fmt"
  "unicode/utf8"
)

func main() {
  s := "café"
  fmt.Printf("%q has length %d\n", s, utf8.RuneCountInString(s))
}

// "café" has length 4

This is very Go-like. The default is somewhat low-level, but the types and libraries build on top of it. For example, text/scanner provides a nice way of iterating over runes in a UTF-8 input stream.
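For what it's worth, here is a minimal sketch of that kind of iteration with text/scanner. The collectRunes helper is my own invention; Scanner.Init, Scanner.Next and scanner.EOF are the standard library pieces it leans on.

```go
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

// collectRunes is a hypothetical helper: it reads UTF-8 text from a string
// via text/scanner and returns the runes one by one.
func collectRunes(input string) []rune {
	var s scanner.Scanner
	s.Init(strings.NewReader(input)) // any io.Reader will do

	var out []rune
	// Next returns the next code point, or scanner.EOF at the end.
	for r := s.Next(); r != scanner.EOF; r = s.Next() {
		out = append(out, r)
	}
	return out
}

func main() {
	for _, r := range collectRunes("caf\u00e9") {
		fmt.Printf("%q\n", r)
	}
}
```

Because the scanner reads from an io.Reader, the same loop should work for a file or a network stream without holding the whole input in a string.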

On a whim, I took a look at the internals of utf8.RuneCountInString(). It’s deceptively simple.

func RuneCountInString(s string) (n int) {
  for _ = range s {
    n++
  }
  return
}

This relies on the spec defining how a string interacts with a for loop: ranging over a string decodes it as UTF-8 and yields one Unicode code point (or rune) per iteration.
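To see that behaviour directly, here is a small sketch that ranges over a string and prints what comes back on each iteration. The runeOffsets helper is made up for the example; the index is a byte offset, so it jumps by two past the accented character.

```go
package main

import "fmt"

// runeOffsets is a hypothetical helper: it returns the byte offset at which
// each rune of s starts, as reported by a for/range loop.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "caf\u00e9!" // é (U+00E9) takes two bytes in UTF-8
	for i, r := range s {
		fmt.Printf("byte %d: %q\n", i, r)
	}
	fmt.Println(runeOffsets(s)) // [0 1 2 3 5]
}
```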

jslint4java 2.0.2

After another long period of having nothing to say, I’ve pushed out an update to jslint4java. Version 2.0.2 doesn’t include any major new features, but does update JSLint to the latest version (2012-02-03) and fix a couple of small bugs here and there.

  • issue 75: Handle BOMs when using the CLI.
  • issue 74: Document the technique for construction of JSLint objects.
  • issue 73: Better examples for maven configuration.
  • issue 72: Add all formatters to the maven plugin automatically.
  • issue 67: Fix maven docs.
  • Update to JSLint 2012-02-03.
    • This removes the adsafe, confusion and safe options.
    • This adds the anon option.

One minor point: Now that google code supports git repositories, I’m also pushing the source code there again. Github is still my “primary” but there’s another copy. More copies are good.

I did spend a bit of time testing the CLI interface properly. This isn’t really noteworthy, but it was entertaining for me, and hopefully results in fewer bugs like issue 75.

jslint4java 2.0.0

I’ve finally released jslint4java 2.0.0. It’s now available at code.google.com/p/jslint4java. The main new feature is that it now sports a maven plugin in addition to the ant task.

There is also a breaking change that’s been inherited from JSLint. The meaning of several options has been inverted. Now, the default is to behave strictly, with options turned off. For example, if you want to turn off JSLint’s checking of whitespace, you now have to specify --white. Previously, this would enable checking of whitespace. See the release notes for details, and please take care when updating.

The maven plugin should behave much like any other maven plugin: you add it to your <build><plugins> section. Here’s an example:

<plugin>
  <groupId>com.googlecode.jslint4java</groupId>
  <artifactId>jslint4java-maven-plugin</artifactId>
  <version>2.0.0</version>
  <executions>
    <execution>
      <id>lint</id>
      <phase>process-resources</phase>
      <goals>
        <goal>lint</goal>
      </goals>
      <configuration>
        <failOnError>true</failOnError>
        <options>
          <undef>true</undef>
        </options>
      </configuration>
    </execution>
  </executions>
</plugin>

I’d love feedback on how well this works.

Python configuration

At $WORK, there is a program that uses Python as its configuration. Leaving aside for the moment the question of whether or not this is a good idea, I wanted to look at how it does this.

All the program really needs is a dictionary of configuration items. But you can take advantage of it being Python to reduce duplication, generate some parts and so on.

# A much simplified example.
 
name = 'bob'
 
project = {
  'name': name,
  'branch': name + '_release_branch',
  'packages': [
    name + '_frontend',
    name + '_backend',
    name + '_middleend',
  ],
}

How do you read this configuration file without it having any untoward effects on your program? Python 2 has the execfile builtin to do just this.

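def read_config(conf_file):
  scope = {}
  execfile(conf_file, scope)  # Python 2 builtin; removed in Python 3
  return scope.get('project', {})

project = read_config('bob.conf')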

Where it gets really interesting is when there are similar configs that want to share amongst themselves; you have to start importing. Ideally, you’d like to be able to import from the same directory, so as to keep configuration together. This leads to something like:

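import os
import sys

def read_config(conf_file):
  oldsyspath = sys.path
  try:
    # Let the config import modules that live alongside it.
    sys.path = [os.path.dirname(conf_file)] + sys.path
    scope = {}
    execfile(conf_file, scope)
    return scope.get('project', {})
  finally:
    sys.path = oldsyspath

project = read_config('/some/where/bob.conf')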

Of course, this leads to pollution. If bob.conf imports shared.py, a permanent record is kept in sys.modules. So, if another .conf imports shared.py, you’d not load it from disk again; it would refer to the already imported file.

Which is probably OK, unless you’re dealing with different directories full of configuration. Then, import shared may refer to different modules. Yes, this is messy. Yes, this is exactly what I was working on today. :)

Now, we need to throw away any imports that are done by the config file. Thankfully this is fairly easy.

import os
import sys

def read_config(conf_file):
  oldsyspath = sys.path
  oldsysmodules = set(sys.modules)
  try:
    sys.path = [os.path.dirname(conf_file)] + sys.path
    scope = {}
    execfile(conf_file, scope)
    return scope.get('project', {})
  finally:
    sys.path = oldsyspath
    # Forget any modules the config imported, so the next config starts clean.
    for name in set(sys.modules) - oldsysmodules:
      del sys.modules[name]

project = read_config('/some/where/bob.conf')

Phew! Now, I can read in all my configuration files from all over the system.

It’s not the end though. It turned out that some of the configuration files did silly things with stdin, so we had to capture stdin, redirect to /dev/null and restore it after the execfile().

Discussing with colleagues also revealed that the technique of cleaning up sys.modules could potentially cause trouble with modules that load .so files by not giving them a chance to clean up. The suggested workaround was to use the multiprocessing module to load the configuration in a separate process each time. Thankfully, none of the configuration files in this system were affected by this.

Nonetheless, by this point, I can now read in all configuration files, and write out a big list of them as a pickle file. Which lets me do some interesting analyses.

I guess the moral of this tale is that if you allow users access to a full programming language, they will use it! The system that this originated in has several thousand configuration files, dating back up to five years. There are a number of oddities lurking inside.

django & appengine

Last night I went to j4amie’s brightonpy talk Python and Django for PHP Refugees (slides). It was a really good talk, though I knew most of the Python stuff. The django intro was great however.

What I was really interested in was using Django together with appengine. I’ve used appengine before with the builtin webapp framework. Whilst it’s good, it’s simplistic and I found myself building layers on top quickly.

Looking through the docs, the first thing I see is Running Django on Google App Engine. But this says that the builtin django is obsolete and I should be using django-nonrel. There is further documentation on this, Running Pure Django Projects on Google App Engine. This approach is interesting. It’s encouraging you to not be appengine specific, the way that you are with webapp’s default setup.

django-nonrel is made up of several components; you should start by looking at djangoappengine. You’ll need to download all five components.

You’ll also need the appengine SDK, if you don’t already have it.

Once you’ve downloaded everything, import the necessary bits into a project you made with the appengine SDK.

% pwd
/Users/dom/work
% cp -r $APPENGINE_SDK/new_project_template hellodjango
% cd hellodjango
% mv ~/Downloads/wkornewald-django-nonrel-c73e6ca3843d/django .
% mv ~/Downloads/wkornewald-djangotoolbox-f79fecb60e6d/djangotoolbox .
% mv ~/Downloads/wkornewald-django-dbindexer-48589f5faad4/dbindexer . 
% mv ~/Downloads/wkornewald-djangoappengine-f9175cf4c8bd djangoappengine
% ls -l
total 24
-rwxr-x---@  1 dom  5000   106 13 Apr 12:09 app.yaml*
drwxr-xr-x@ 12 dom  5000   408 13 Apr 12:43 dbindexer/
drwxr-xr-x@ 18 dom  5000   612 13 Apr 12:33 django/
drwxr-xr-x@ 23 dom  5000   782 13 Apr 12:43 djangoappengine/
drwxr-xr-x@ 15 dom  5000   510 13 Apr 12:43 djangotoolbox/
-rwxr-x---   1 dom  5000   472 24 Mar 23:38 index.yaml*
-rwxr-x---   1 dom  5000  1002 24 Mar 23:38 main.py*

You’ll have to bundle all of this with your app. You may want to delete some bits of django/contrib that you don’t use.

Now, how to get started with my app? I’ll need to create a django project. Normally I use the installed django-admin.py. In this case, I’d like to use the version I’ve imported to my project.

% PYTHONPATH=. django/bin/django-admin.py
Usage: django-admin.py subcommand [options] [args]
% PYTHONPATH=. django/bin/django-admin.py startproject hellodjango
% mv hellodjango/* .
%

So now how do I hook that up to app.yaml? There’s no documentation, but there is a test app. And that contains the magic snippet:

handlers:
- url: /.*
  script: djangoappengine/main/main.py

Now, how do I run this? The appengine launcher I’m using has a “play” button. My first attempt broke: because I’d made the project in the hellodjango directory, the settings contained a reference to hellodjango.urls, which should be just urls. With that fixed, I get an “It worked!” page. Result!

The dev_appserver.py approach (aka the play button) worked for me, but the djangoappengine docs say to use ./manage.py runserver, so I’ll do that.

Now, I have an empty app. Let’s add in a minimal hello world view. First, I create views.py

from django.http import HttpResponse
 
def home(request):
  return HttpResponse('<h1>Hello World</h1>')

And then adjust urls.py to point to it.

from django.conf.urls.defaults import patterns, include, url
 
import views
 
urlpatterns = patterns('',
  url(r'^$', views.home, name='home'),
)

I now see Hello World displayed in my browser. Next, I’d like to get a nice template working, so I’ll update my views to look like this:

from django.shortcuts import render
 
def home(request):
  return render(request, 'home.html')

templates/home.html is as you would expect.

<h1>Hello World!</h1>

The final piece of the puzzle: how does django know where to find the template? In settings.py, there’s a TEMPLATE_DIRS setting.

import os  # needed at the top of settings.py for the path handling below

TEMPLATE_DIRS = (
  os.path.join(os.path.dirname(__file__), 'templates'),
)

At this point, you’re using regular django, and should be able to use the regular docs to carry on. Although, please read the list of djangoappengine caveats.

OSGI Intro

On Tuesday, I attended the OSGI: Let’s Get Started session with Simon Maple and Zoë Slattery, courtesy of SkillsMatter and LJC. I figured it was time to work out what I’m supposed to be doing with it. :)

For the last release I enabled OSGI headers for jslint4java. I was hoping that this session would show me how I fared in that.

First, what is OSGI? At the most basic, it’s a way of providing some order and structure to the traditional Java classpath. OSGI achieves this by using bundles.

A bundle is a regular jar file, but with additional metadata in META-INF/MANIFEST.MF: details like the name, version and dependencies. The dependencies are interesting. A bundle can depend directly on other bundles, but that’s discouraged. A better approach is to specify that you depend on Java packages. That way you don’t have to tie yourself to a particular provider of a package.

When OSGI loads in a bundle, it gives each bundle a unique ClassLoader. This means that:

  • You can have multiple versions of bundles loaded simultaneously. You don’t have to force everything to the same version.

  • Each bundle can only see classes that have been explicitly exported by its dependencies, not the whole transitive closure. This is very good for keeping your code clean.

    This also leads to a pattern I’ve seen before in the maven world: separate artifacts for APIs vs implementation. Pulling out interfaces is generally a good idea. But by putting them in a separate OSGI bundle, you enforce that your implementation can remain invisible. Even the “hello OSGI world” demo was shown this way.

On top of this metadata, OSGI provides a runtime for loading and unloading bundles. The runtime also supports the concept of services, where you can ask the runtime for various services. This looks cool, but the dynamicity of it can be hard to deal with—that service you got from the runtime can disappear at any point. There was a demo of something called blueprint, which aims to help, but it looked almost exactly like “more Spring XML” to me. If I was doing this, I’d look at peaberry instead.

How do you go about getting started with OSGI? Well, you could manage the bundle metadata yourself, but it’s much easier to use a tool to do it for you. One such tool was demo’d: bnd. The maven-bundle-plugin that I used for jslint4java builds on bnd.

If you need a runtime for your app, there are two in common use: Equinox and Felix. Equinox is the runtime used by Eclipse.

For followup detail, they recommended checking out anything by Neil Bartlett. It’s a shame he couldn’t make it.

Overall I was pretty impressed. It made me realise that I got the basics right, and I know where I need to go when I need more. Thanks, guys!

Having written all this, I’ve just realised that the wikipedia page on OSGI demonstrates nearly all of it, and with examples.