Go Strings

I’ve been looking at Go recently. It’s a pleasant language, with few surprises. However, I wondered (as always) what the encoding of a string is supposed to be. Other languages have answered this in different ways:

  • Python 2 has two types: str and unicode. Python 3 has sensibly renamed these to bytes and str, respectively.
  • Perl has a magic bit which gets set to state that the string contains characters as opposed to bytes (it’s called the UTF-8 bit, but it means characters).

So how does Go deal with characters in strings? Given that the authors of Go also invented UTF-8, we can hope it’s been thought about.

There are three types to think about.

[]byte

A slice of bytes.

string

A (possibly empty) sequence of bytes. Strings are immutable.

rune

A single Unicode code point (rune is an alias for int32). Rune literals are written as characters in single quotes.

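There’s no explicit encoding in the above. Nonetheless, there’s an implicit preference for UTF-8: the spec defines Go source code as UTF-8 text, so a string literal written in source ends up holding UTF-8 bytes.

But this doesn’t help the common case of asking how long a string is, because len() counts bytes: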

package main

import "fmt"

func main() {
  s := "café"
  fmt.Printf("%q has length %d\n", s, len(s))
}

// "café" has length 5

The unicode/utf8 package can do what’s needed though. This provides functions for, amongst other things, picking runes out of strings.

package main

import (
  "fmt"
  "unicode/utf8"
)

func main() {
  s := "café"
  fmt.Printf("%q has length %d\n", s, utf8.RuneCountInString(s))
}

// "café" has length 4

This is very Go-like. The default is somewhat low-level, but the types and libraries build on top of it. For example, text/scanner provides a nice way of iterating over runes in a UTF-8 input stream.

On a whim, I took a look at the internals of utf8.RuneCountInString(). It’s deceptively simple.

func RuneCountInString(s string) (n int) {
  for _ = range s {
    n++
  }
  return
}

This relies on the spec defining how a string interacts with a for loop: ranging over a string decodes its UTF-8 bytes and yields one Unicode code point (rune) per iteration, rather than one byte.
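To make the byte-versus-rune distinction concrete outside of Go, here’s a rough Python 3 sketch of what that range loop hands you: the byte offset at which each code point starts, plus the code point itself. The name runes_with_offsets is mine. Unlike Go, which yields U+FFFD for invalid bytes, this version assumes the input is valid UTF-8.

```python
def runes_with_offsets(data: bytes):
    """Yield (byte_offset, code_point) pairs for valid UTF-8 input.

    Illustrative only: Go's range loop yields U+FFFD for invalid
    bytes, whereas bytes.decode() here raises UnicodeDecodeError.
    """
    offset = 0
    for ch in data.decode("utf-8"):
        yield offset, ord(ch)
        # Advance by the number of bytes this code point occupies.
        offset += len(ch.encode("utf-8"))


# "caf\u00e9" is "café" with a precomposed e-acute.
s = "caf\u00e9".encode("utf-8")
pairs = list(runes_with_offsets(s))
print(len(s))      # number of bytes
print(len(pairs))  # number of code points
print(pairs)
```

The byte count is 5 and the code point count is 4, matching the two Go programs above.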

jslint4java 2.0.2

After another long period of having nothing to say, I’ve pushed out an update to jslint4java. Version 2.0.2 doesn’t include any major new features, but does update JSLint to the latest version (2012-02-03) and fix a couple of small bugs here and there.

  • issue 75: Handle BOMs when using the CLI.
  • issue 74: Document the technique for construction of JSLint objects.
  • issue 73: Better examples for maven configuration.
  • issue 72: Add all formatters to the maven plugin automatically.
  • issue 67: Fix maven docs.
  • Update to JSLint 2012-02-03.
    • This removes the adsafe, confusion and safe options.
    • This adds the anon option.

One minor point: now that Google Code supports git repositories, I’m also pushing the source code there again. GitHub is still my “primary”, but there’s another copy. More copies are good.

I did spend a bit of time testing the CLI properly. This isn’t really noteworthy, but it was entertaining for me, and hopefully results in fewer bugs like issue 75.

jslint4java 2.0.0

I’ve finally released jslint4java 2.0.0. It’s now available at code.google.com/p/jslint4java. The main new feature is that it now sports a maven plugin in addition to the ant task.

There is also a breaking change that’s been inherited from JSLint: the meaning of several options has been inverted. Now, the default is to behave strictly, with options turned off. For example, if you want to turn off JSLint’s checking of whitespace, you now have to specify --white; previously, this would have enabled it. See the release notes for details, and please take care when updating.

The maven plugin should behave much like any other maven plugin: you add it to your <build><plugins> section. Here’s an example:

<plugin>
  <groupId>com.googlecode.jslint4java</groupId>
  <artifactId>jslint4java-maven-plugin</artifactId>
  <version>2.0.0</version>
  <executions>
    <execution>
      <id>lint</id>
      <phase>process-resources</phase>
      <goals>
        <goal>lint</goal>
      </goals>
      <configuration>
        <failOnError>true</failOnError>
        <options>
          <undef>true</undef>
        </options>
      </configuration>
    </execution>
  </executions>
</plugin>

I’d love feedback on how well this works.

Python configuration

At $WORK, there is a program that uses Python as its configuration. Leaving aside for the moment the question of whether or not this is a good idea, I wanted to look at how it does this.

All the program really needs is a dictionary of configuration items. But you can take advantage of it being Python to reduce duplication, generate some parts and so on.

# A much simplified example.
 
name = 'bob'
 
project = {
  'name': name,
  'branch': name + '_release_branch',
  'packages': [
    name + '_frontend',
    name + '_backend',
    name + '_middleend',
  ],
}

How do you read this configuration file without it having any untoward effects on your program? Python has the execfile builtin to do just this.

def read_config(conf_file):
  scope = {}
  # Python 2: run the file with `scope` as its global namespace.
  execfile(conf_file, scope)
  return scope.get('project', {})
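execfile is a Python 2 builtin and was removed in Python 3. Here’s a minimal sketch of the same idea for Python 3, assuming the config defines a top-level project dictionary as above. The read_config name and the throwaway bob.conf written to a temporary directory are only there to make the example runnable.

```python
import os
import tempfile


def read_config(conf_file):
    """Run conf_file in a fresh namespace and return its `project` dict."""
    scope = {}
    with open(conf_file) as f:
        # Passing the real filename to compile() gives useful tracebacks.
        code = compile(f.read(), conf_file, "exec")
    exec(code, scope)
    return scope.get("project", {})


# Demo: write a tiny config file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bob.conf")
    with open(path, "w") as f:
        f.write("name = 'bob'\n"
                "project = {'name': name, 'branch': name + '_release_branch'}\n")
    project = read_config(path)

print(project["branch"])
```

The names the config defines stay inside scope rather than leaking into the caller, but this is not a sandbox: the config still runs with the full privileges of the program.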

Where it gets really interesting is when there are similar configs that want to share amongst themselves; you have to start importing. Ideally, you’d like to be able to import from the same directory, so as to keep configuration together. This leads to something like:

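import os
import sys

def read_config(conf_file):
  oldsyspath = sys.path
  try:
    # Let the config import modules that sit next to it.
    sys.path = [os.path.dirname(conf_file)] + sys.path
    scope = {}
    execfile(conf_file, scope)
    return scope.get('project', {})
  finally:
    sys.path = oldsyspath

project = read_config('/some/where/bob.conf')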

Of course, this leads to pollution. If bob.conf imports shared.py, a permanent record is kept in sys.modules. So, if another .conf imports shared.py, you’d not load it from disk again; it would refer to the already imported file.

Which is probably OK, unless you’re dealing with different directories full of configuration. Then, import shared may refer to different modules. Yes, this is messy. Yes, this is exactly what I was working on today. :)

Now, we need to throw away any imports that are done by the config file. Thankfully this is fairly easy.

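import os
import sys

def read_config(conf_file):
  oldsyspath = sys.path
  # Remember what was already imported before the config runs.
  oldsysmodules = set(sys.modules)
  try:
    sys.path = [os.path.dirname(conf_file)] + sys.path
    scope = {}
    execfile(conf_file, scope)
    return scope.get('project', {})
  finally:
    sys.path = oldsyspath
    # Forget anything the config imported.
    for name in set(sys.modules) - oldsysmodules:
      del sys.modules[name]

project = read_config('/some/where/bob.conf')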

Phew! Now, I can read in all my configuration files from all over the system.
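For what it’s worth, the same save-and-restore dance can be packaged as a context manager. This is a Python 3 sketch rather than the code from $WORK. The names isolated_imports and read_config, and the two made-up directories that each carry their own shared.py, are assumptions for the sake of a runnable demo.

```python
import contextlib
import os
import sys
import tempfile


@contextlib.contextmanager
def isolated_imports(directory):
    """Temporarily put `directory` first on sys.path, then undo it and
    forget any modules that were imported in the meantime."""
    old_path = sys.path[:]
    old_modules = set(sys.modules)
    sys.path.insert(0, directory)
    try:
        yield
    finally:
        sys.path[:] = old_path
        for name in set(sys.modules) - old_modules:
            del sys.modules[name]


def read_config(conf_file):
    scope = {}
    with isolated_imports(os.path.dirname(os.path.abspath(conf_file))):
        with open(conf_file) as f:
            exec(compile(f.read(), conf_file, "exec"), scope)
    return scope.get("project", {})


# Demo: two directories, each with its own shared.py.
results = []
with tempfile.TemporaryDirectory() as tmp:
    for team in ("alice", "bob"):
        d = os.path.join(tmp, team)
        os.mkdir(d)
        with open(os.path.join(d, "shared.py"), "w") as f:
            f.write("owner = %r\n" % team)
        with open(os.path.join(d, "app.conf"), "w") as f:
            f.write("import shared\nproject = {'owner': shared.owner}\n")
        results.append(read_config(os.path.join(d, "app.conf")))

print(results)
```

Each config sees the shared module from its own directory, and nothing called shared is left behind in sys.modules afterwards.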

It’s not the end though. It turned out that some of the configuration files did silly things with stdin, so we had to capture stdin, redirect to /dev/null and restore it after the execfile().
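A sketch of that stdin handling in Python 3, using os.devnull so it isn’t tied to the /dev/null path. The helper name stdin_from_devnull is made up for this example, and the exec’d string stands in for a badly behaved config file.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def stdin_from_devnull():
    """Point sys.stdin at the null device for the duration of the block."""
    old_stdin = sys.stdin
    with open(os.devnull) as devnull:
        sys.stdin = devnull
        try:
            yield
        finally:
            sys.stdin = old_stdin


# Demo: a config that tries to read stdin just sees end-of-file.
scope = {}
with stdin_from_devnull():
    exec("import sys\nanswer = sys.stdin.read()", scope)

print(repr(scope["answer"]))
```

This only swaps the Python-level sys.stdin object. Anything that reads file descriptor 0 directly, or spawns a child process, would still see the real stdin.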

Discussing with colleagues also revealed that the technique of cleaning up sys.modules could potentially cause trouble with modules that load .so files by not giving them a chance to clean up. The suggested workaround was to use the multiprocessing module to load the configuration in a separate process each time. Thankfully, none of the configuration files in this system were affected by this.
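The suggestion was to use multiprocessing. The sketch below swaps in subprocess with a JSON hand-off instead, purely because that’s easier to show in a self-contained Python 3 snippet. It assumes the config values are JSON-serialisable, and LOADER and read_config_in_subprocess are names invented for this example.

```python
import json
import os
import subprocess
import sys
import tempfile

# Child-side script: exec the config, then print `project` as JSON.
LOADER = r"""
import contextlib, json, os, sys
conf_file = sys.argv[1]
sys.path.insert(0, os.path.dirname(conf_file))
scope = {}
with open(conf_file) as f:
    code = compile(f.read(), conf_file, "exec")
# Keep anything the config prints away from the JSON on stdout.
with contextlib.redirect_stdout(sys.stderr):
    exec(code, scope)
json.dump(scope.get("project", {}), sys.stdout)
"""


def read_config_in_subprocess(conf_file):
    """Load conf_file in a throwaway interpreter; return its `project`."""
    result = subprocess.run(
        [sys.executable, "-c", LOADER, os.path.abspath(conf_file)],
        stdin=subprocess.DEVNULL,   # the config never sees our stdin
        stdout=subprocess.PIPE,
        check=True,
    )
    return json.loads(result.stdout.decode("utf-8"))


# Demo: whatever the config imports lives and dies with the child process.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bob.conf")
    with open(path, "w") as f:
        f.write("import fractions\nproject = {'name': 'bob'}\n")
    project = read_config_in_subprocess(path)

print(project)
```

Because the child exits after every load, there is no sys.path or sys.modules cleanup to do in the parent, at the cost of starting an interpreter per file.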

Nonetheless, by this point, I can now read in all configuration files, and write out a big list of them as a pickle file. Which lets me do some interesting analyses.

I guess the moral of this tale is that if you allow users access to a full programming language, they will use it! The system that this originated in has several thousand configuration files, dating back up to five years. There are a number of oddities lurking inside.

django & appengine

Last night I went to j4amie’s brightonpy talk Python and Django for PHP Refugees (slides). It was a really good talk, though I knew most of the Python stuff already. The django intro was great, however.

What I was really interested in was using Django together with appengine. I’ve used appengine before with the builtin webapp framework. Whilst it’s good, it’s simplistic and I found myself building layers on top quickly.

Looking through the docs, the first thing I see is Running Django on Google App Engine. But this says that the builtin django is obsolete and I should be using django-nonrel. There is further documentation on this, Running Pure Django Projects on Google App Engine. This approach is interesting. It’s encouraging you to not be appengine specific, the way that you are with webapp’s default setup.

django-nonrel is made up of several components; you should start by looking at djangoappengine. You’ll need to download all five components.

You’ll also need the appengine SDK in case you don’t have it.

Once you’ve downloaded everything, import the necessary bits into a project you made with the appengine SDK.

% pwd
/Users/dom/work
% cp -r $APPENGINE_SDK/new_project_template hellodjango
% cd hellodjango
% mv ~/Downloads/wkornewald-django-nonrel-c73e6ca3843d/django .
% mv ~/Downloads/wkornewald-djangotoolbox-f79fecb60e6d/djangotoolbox .
% mv ~/Downloads/wkornewald-django-dbindexer-48589f5faad4/dbindexer . 
% mv ~/Downloads/wkornewald-djangoappengine-f9175cf4c8bd djangoappengine
% ls -l
total 24
-rwxr-x---@  1 dom  5000   106 13 Apr 12:09 app.yaml*
drwxr-xr-x@ 12 dom  5000   408 13 Apr 12:43 dbindexer/
drwxr-xr-x@ 18 dom  5000   612 13 Apr 12:33 django/
drwxr-xr-x@ 23 dom  5000   782 13 Apr 12:43 djangoappengine/
drwxr-xr-x@ 15 dom  5000   510 13 Apr 12:43 djangotoolbox/
-rwxr-x---   1 dom  5000   472 24 Mar 23:38 index.yaml*
-rwxr-x---   1 dom  5000  1002 24 Mar 23:38 main.py*

You’ll have to bundle all of this with your app. You may want to delete some bits of django/contrib that you don’t use.

Now, how to get started with my app? I’ll need to create a django project. Normally I use the installed django-admin.py. In this case, I’d like to use the version I’ve imported to my project.

% PYTHONPATH=. django/bin/django-admin.py
Usage: django-admin.py subcommand [options] [args]
% PYTHONPATH=. django/bin/django-admin.py startproject hellodjango
% mv hellodjango/* .
%

So now how do I hook that up to app.yaml? There’s no documentation, but there is a test app. And that contains the magic snippet:

handlers:
- url: /.*
  script: djangoappengine/main/main.py

Now, how do I run this? The appengine launcher I’m using has a “play” button. My first attempt broke: because I’d run startproject inside the hellodjango directory, the settings contained a reference to hellodjango.urls, which should be just urls. With that fixed, I get an “It worked!” page. Result!

The dev_appserver.py approach (aka the play button) worked for me, but the djangoappengine docs say to use ./manage.py runserver, so I’ll do that.

Now, I have an empty app. Let’s add in a minimal hello world view. First, I create views.py:

from django.http import HttpResponse
 
def home(request):
  return HttpResponse('<h1>Hello World</h1>')

And then adjust urls.py to point to it.

from django.conf.urls.defaults import patterns, include, url
 
import views
 
urlpatterns = patterns('',
  url(r'^$', views.home, name='home'),
)

I now see Hello World displayed in my browser. I’d like to get a nice template working. I’ll update my views to look like this:

from django.shortcuts import render
 
def home(request):
  return render(request, 'home.html')

templates/home.html is as you would expect.

<h1>Hello World!</h1>

The final piece of the puzzle: how does django know where to find the template? In settings.py, there’s a TEMPLATE_DIRS setting.

import os

TEMPLATE_DIRS = (
  os.path.join(os.path.dirname(__file__), 'templates'),
)

At this point, you’re using regular django, and should be able to use the regular docs to carry on. Do read the list of djangoappengine caveats, though.

OSGI Intro

On Tuesday, I attended the OSGI: Let’s Get Started session with Simon Maple and Zoë Slattery, courtesy of SkillsMatter and LJC. I figured it’s time to work out what I’m supposed to be doing with it. :)

For the last release I enabled OSGI headers for jslint4java. I was hoping that this session would show me how I fared in that.

First, what is OSGI? At the most basic, it’s a way of providing some order and structure to the traditional Java classpath. OSGI achieves this by using bundles.

A bundle is a regular jar file, but with additional metadata in META-INF/MANIFEST.MF: details like the name, version and dependencies. The dependencies are interesting. A bundle can depend directly on other bundles, but that’s discouraged. A better approach is to specify that you depend on Java packages. That way you don’t have to tie yourself to a particular provider of a package.

When OSGI loads in a bundle, it gives each bundle a unique ClassLoader. This means that:

  • You can have multiple versions of bundles loaded simultaneously. You don’t have to force everything to the same version.

  • Each bundle can only see classes that have been explicitly exported by its dependencies, not the whole transitive closure. This is very good for keeping your code clean.

    This also leads to a pattern I’ve seen before in the maven world: separate artifacts for APIs vs implementation. Pulling out interfaces is generally a good idea. But by putting them in a separate OSGI bundle, you enforce that your implementation can remain invisible. Even the “hello OSGI world” demo was shown this way.

On top of this metadata, OSGI provides a runtime for loading and unloading bundles. The runtime also supports the concept of services, where you can ask the runtime for various services. This looks cool, but the dynamicity of it can be hard to deal with—that service you got from the runtime can disappear at any point. There was a demo of something called blueprint, which aims to help, but it looked almost exactly like “more Spring XML” to me. If I was doing this, I’d look at peaberry instead.

How do you go about getting started with OSGI? Well, you could manage the bundle metadata yourself, but it’s much easier to use a tool to do it for you. One such tool was demo’d: bnd. The maven-bundle-plugin that I used for jslint4java builds on bnd.

If you need a runtime for your app, there are two in common use: Equinox and Felix. Equinox is the runtime used by Eclipse.

For followup detail, they recommended checking out anything by Neil Bartlett. It’s a shame he couldn’t make it.

Overall I was pretty impressed. It made me realise that I got the basics right, and I know where I need to go when I need more. Thanks, guys!

Having written all this, I’ve just realised that the Wikipedia page on OSGI demonstrates nearly all of it, with examples.

jslint4java-1.4.7

I’ve released a minor update, 1.4.7. It’s available from the usual place.

What’s new?

  • Added OSGI bundle headers.
    • I’m an OSGI novice; please let me know if these are wrong.
  • issue 52: Add checkstyle xml formatter.
  • issue 53: No files passed to the ant task is no longer an error (just an info message).
  • Update to JSLint 2011-03-07.
    • This adds the continue option, whilst removing eqeqeq, immed and laxbreak options.
    • JSLint’s interpretation of line and column numbers has changed. I’ve tried to keep up. Please file a bug if errors aren’t reported at the expected place.

pruning your tree

This is from a mailing list post I’ve just replied to. Since I had to look it up, it’s worth blogging. :)

It seems like a simple task. Find all the files in the current directory, excluding .svn directories. I’ve mocked up a simple layout.

% find .
.
./.svn
./.svn/README.txt
./README.txt
./src
./src/.svn
./src/.svn/foo.c
./src/foo.c

By default, find prints out everything. But we only want files.

% find . -type f
./.svn/README.txt
./README.txt
./src/.svn/foo.c
./src/foo.c

Now, we want to exclude everything under .svn. Easy.

% find . -name .svn -prune -type f

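Oops. That’s not good. What happened here? Well, the default for find is to “and” adjacent expressions together, so this asks for things that are named .svn and are also regular files: nothing matches. If we “or” the two halves instead, we get closer to what we want.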

% find . -name .svn -prune -or -type f
./.svn
./README.txt
./src/.svn
./src/foo.c

Again, not so good. The problem is the default action, which is to print everything that matches. Because we’ve specified no action, find prints each match, and that includes the .svn directories (even though it correctly stops descending into them).

The answer is to provide an explicit action instead.

% find . -name .svn -prune -or -type f -print
./README.txt
./src/foo.c

This works, because now there is no default action, and the explicit action is only associated with the -type f predicate.
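If you ever need the same thing from a script rather than the shell, os.walk has its own version of -prune: remove a directory from the dirnames list in place and the walk won’t descend into it. Here’s a Python 3 sketch that rebuilds the mocked-up layout above in a temporary directory. The find_files name is mine, and unlike -type f it returns every non-directory entry, symlinks included.

```python
import os
import tempfile


def find_files(top, prune=".svn"):
    """Return paths of non-directory entries under `top`, never
    descending into directories named `prune`."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Editing dirnames in place is os.walk's equivalent of -prune.
        dirnames[:] = [d for d in dirnames if d != prune]
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)


# Demo: recreate the example tree, then list it relative to its root.
with tempfile.TemporaryDirectory() as tmp:
    for rel in (".svn/README.txt", "README.txt",
                "src/.svn/foo.c", "src/foo.c"):
        path = os.path.join(tmp, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    files = [os.path.relpath(p, tmp).replace(os.sep, "/")
             for p in find_files(tmp)]

print(files)
```

The result is README.txt and src/foo.c, the same two files the final find command prints.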

jslint4java status

I’ve done a few releases of jslint4java whilst this blog has been down. We’re presently at 1.4.6. It’s mostly been bug fixes and JSLint upgrades.

What’s really interesting has been paying attention to the integrations that people have come up with. It’s never been easier to have lint-free JavaScript!

Sonar

If you use the sonar code quality tool, check out the javascript-plugin-for-sonar which uses jslint4java.

Hudson/Jenkins

The hudson violations plugin can display JSLint errors in your project. You still have to arrange to run jslint4java as part of your build though.

Emacs

Want to run JSLint inside Emacs? Have a look at this gist.

Gradle

Do you use gradle for your builds? kellyrob99 has produced a gradle-jslint-plugin.

Mercurial

Want to run JSLint automatically when using mercurial? Take a look at Running JSLint as Mercurial precommit hook.

Netbeans

Check out Integrating JSLint more tightly into NetBeans.

Maven

Whilst there’s a jslint4java maven plugin in the works, this stackoverflow post describes several ways of integrating JSLint with Maven.

Phonegap

If you’re doing Phonegap development, the latest version of the eclipse plugin comes with JSLint.

If you develop an open source project, then you really should set up a google alert for its name. You will be surprised.

What’s coming up?

  • Update to the latest JSLint (as always). Doug Crockford recently did a major rewrite. I think I’ve got that mostly integrated now, but not released.
  • There are a few outstanding bugs that I need to pay attention to.
  • I’ve also been working on my own jslint4java-eclipse plugin, which feels nearly complete enough to release.
  • Assuming I can ever figure out the integration testing, I’ll go back and finish off the jslint4java-maven-plugin.

Plenty to keep busy with!