External Tests

I saw Eli Bendersky’s File-driven testing in Go post, and really like it. I was using something very similar yesterday. I’ve been attempting to replace a custom parser written in Python with an ANTLR one, with the goal of running the same parser in both Python and Go. To do that, we need test cases that verify that the old regex-based parser and the new ANTLR-based parser produce the same results, so I moved all the test cases into a textproto (which is a common answer inside Google!). The output of the parser is another protobuf message, so we can include the expected output directly.

test_case: {
  input: "some/valid/input"
  output: { … expected proto message inline … }
  want_error: false
}
test_case: { … }
# repeat as needed

The schema is pretty simple.

message TestCase {
  string input = 1;
  MyMessage output = 2;
  bool want_error = 3;
}

message TestCases {
  repeated TestCase test_case = 1;
}

This is trivially usable in the Go code, generating a subtest for each case.

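import (
  "os"
  "testing"
  "github.com/google/go-cmp/cmp"
  "google.golang.org/protobuf/encoding/prototext"
  "google.golang.org/protobuf/testing/protocmp"
  // pb is the Go package generated from the schema above; its import path is omitted here.
)

func TestParse(t *testing.T) {
  data, err := os.ReadFile("testdata/parser_test_cases.textproto")
  if err != nil { t.Fatal(err) }

  cases := &pb.TestCases{}
  if err := prototext.Unmarshal(data, cases); err != nil { t.Fatal(err) }

  for _, tc := range cases.GetTestCase() {
    t.Run(tc.GetInput(), func(t *testing.T) {
      got, err := Parse(tc.GetInput())
      if gotErr := (err != nil); gotErr != tc.GetWantError() {
        t.Errorf("Parse() got err? %t want err? %t (err: %v)", gotErr, tc.GetWantError(), err)
      }
      if diffs := cmp.Diff(tc.GetOutput(), got, protocmp.Transform()); diffs != "" {
        t.Errorf("Parse() had unexpected differences (-want +got):\n%s", diffs)
      }
    })
  }
}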

This makes it really easy to add new test cases, as I develop new parts of the ANTLR parser. The Python code is very similar to the Go code (both in the ANTLR and regex case). I was pleased to discover that Python has subtests along the way!
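For the Python side, here is a minimal sketch of the same loop using unittest's subTest. The parse function and the inline CASES list are stand-ins made up for illustration; the real test would load the cases from the textproto and call the actual parser.

```python
import unittest


def parse(text):
    """Hypothetical stand-in for the real parser: splits a path on '/'."""
    if not text:
        raise ValueError("empty input")
    return text.split("/")


# Stand-in for the cases that would be loaded from the textproto file.
CASES = [
    {"input": "some/valid/input", "output": ["some", "valid", "input"], "want_error": False},
    {"input": "", "output": None, "want_error": True},
]


class ParseTest(unittest.TestCase):
    def test_parse(self):
        for tc in CASES:
            # Each case is reported separately, and a failure in one
            # does not stop the remaining cases from running.
            with self.subTest(input=tc["input"]):
                if tc["want_error"]:
                    with self.assertRaises(ValueError):
                        parse(tc["input"])
                else:
                    self.assertEqual(parse(tc["input"]), tc["output"])
```

The subTest context manager plays roughly the role that t.Run plays in the Go version: one test method, many individually reported cases.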


Python configuration

At $WORK, there is a program that uses Python as its configuration. Leaving aside for the moment the question of whether or not this is a good idea, I wanted to look at how it does this.

All the program really needs is a dictionary of configuration items. But you can take advantage of it being Python to reduce duplication, generate some parts and so on.

# A much simplified example.

name = 'bob'

project = {
  'name': name,
  'branch': name + '_release_branch',
  'packages': [
    name + '_frontend',
    name + '_backend',
    name + '_middleend',
  ],
}

How do you read this configuration file without it having any untoward effects on your program? Python 2 has the execfile builtin to do just this.

def read_config(conf_file):
  scope = {}
  execfile(conf_file, scope)  # e.g. conf_file = 'bob.conf'
  return scope.get('project', {})
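execfile is gone in Python 3. A rough equivalent, assuming the same 'project' convention, is to compile and exec the file yourself. The helper name read_config and the temporary bob.conf below are only for illustration.

```python
import os
import tempfile


def read_config(conf_file):
    """Run conf_file in a fresh namespace and return its 'project' dict."""
    scope = {}
    with open(conf_file) as f:
        # Passing the real filename to compile() keeps tracebacks pointing at the config.
        code = compile(f.read(), conf_file, "exec")
    exec(code, scope)
    return scope.get("project", {})


# Illustration only: write a tiny config to a temp directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bob.conf")
    with open(path, "w") as f:
        f.write("name = 'bob'\nproject = {'name': name, 'branch': name + '_release_branch'}\n")
    project = read_config(path)

print(project)
```

Note that this only keeps the config's names out of your own namespace. The config still runs with full privileges, so it is no protection against a hostile file.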

Where it gets really interesting is when there are similar configs that want to share amongst themselves; you have to start importing. Ideally, you’d like to be able to import from the same directory, so as to keep configuration together. This leads to something like:

import os
import sys

def read_config(conf_file):  # e.g. '/some/where/bob.conf'
  oldsyspath = sys.path
  try:
    sys.path = [os.path.dirname(conf_file)] + sys.path
    scope = {}
    execfile(conf_file, scope)
    return scope.get('project', {})
  finally:
    sys.path = oldsyspath

Of course, this leads to pollution. If bob.conf imports anything, a permanent record is kept in sys.modules. So, if another .conf imports a module of the same name, it is not loaded from disk again; the import refers to the already imported module.

Which is probably OK, unless you’re dealing with different directories full of configuration. Then, import shared may refer to different modules. Yes, this is messy. Yes, this is exactly what I was working on today. 🙂

Now, we need to throw away any imports that are done by the config file. Thankfully this is fairly easy.

import os
import sys

def read_config(conf_file):  # e.g. '/some/where/bob.conf'
  oldsyspath = sys.path
  oldsysmodules = set(sys.modules)
  try:
    sys.path = [os.path.dirname(conf_file)] + sys.path
    scope = {}
    execfile(conf_file, scope)
    return scope.get('project', {})
  finally:
    sys.path = oldsyspath
    # Forget everything the config imported, so the next config starts clean.
    for name in set(sys.modules) - oldsysmodules:
      del sys.modules[name]

Phew! Now, I can read in all my configuration files from all over the system.
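As a sketch of how the same cleanup might look in Python 3, the save-and-restore steps fit naturally into a context manager. The names isolated_imports and read_config, and the alpha/beta directories, are invented for this illustration.

```python
import contextlib
import importlib
import os
import sys
import tempfile


@contextlib.contextmanager
def isolated_imports(directory):
    """Temporarily put directory first on sys.path, then undo any imports."""
    old_path = sys.path
    old_modules = set(sys.modules)
    sys.path = [directory] + sys.path
    try:
        yield
    finally:
        sys.path = old_path
        for name in set(sys.modules) - old_modules:
            del sys.modules[name]
        importlib.invalidate_caches()


def read_config(conf_file):
    scope = {}
    with open(conf_file) as f:
        code = compile(f.read(), conf_file, "exec")
    with isolated_imports(os.path.dirname(conf_file)):
        exec(code, scope)
    return scope.get("project", {})


# Illustration: two directories, each with its own (different) shared.py.
results = []
with tempfile.TemporaryDirectory() as tmp:
    for team in ("alpha", "beta"):
        d = os.path.join(tmp, team)
        os.mkdir(d)
        with open(os.path.join(d, "shared.py"), "w") as f:
            f.write("BRANCH = %r\n" % (team + "_release_branch"))
        with open(os.path.join(d, "app.conf"), "w") as f:
            f.write("import shared\nproject = {'branch': shared.BRANCH}\n")
        results.append(read_config(os.path.join(d, "app.conf")))

print(results)
```

Without the sys.modules cleanup, the second config would silently reuse the first directory's shared module.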

It’s not the end though. It turned out that some of the configuration files did silly things with stdin, so we had to capture stdin, redirect to /dev/null and restore it after the execfile().
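A minimal sketch of that stdin dance, assuming it is enough to swap the Python-level sys.stdin object (code that reads file descriptor 0 directly would not be affected). The helper name stdin_from_devnull is made up here.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def stdin_from_devnull():
    """Point sys.stdin at the null device for the duration of the block."""
    old_stdin = sys.stdin
    with open(os.devnull) as devnull:
        sys.stdin = devnull
        try:
            yield
        finally:
            sys.stdin = old_stdin


with stdin_from_devnull():
    # A config that reads stdin now gets end-of-file immediately instead of blocking.
    data = sys.stdin.read()

print(repr(data))
```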

Discussing with colleagues also revealed that the technique of cleaning up sys.modules could potentially cause trouble with modules that load .so files by not giving them a chance to clean up. The suggested workaround was to use the multiprocessing module to load the configuration in a separate process each time. Thankfully, none of the configuration files in this system were affected by this.
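To give a feel for the separate-process idea, here is a sketch that uses subprocess rather than multiprocessing, purely to keep the example short and self-contained. It assumes the 'project' value is JSON-serializable and that the config prints nothing to stdout; the function name and the temporary bob.conf are illustrative.

```python
import json
import os
import subprocess
import sys
import tempfile

# Child script: run the config in a fresh interpreter and print 'project' as JSON.
_CHILD = """
import json, os, sys
conf_file = sys.argv[1]
sys.path.insert(0, os.path.dirname(conf_file))
scope = {}
with open(conf_file) as f:
    exec(compile(f.read(), conf_file, "exec"), scope)
json.dump(scope.get("project", {}), sys.stdout)
"""


def read_config_in_subprocess(conf_file):
    """Load conf_file in a throwaway process so its imports never touch ours."""
    out = subprocess.run(
        [sys.executable, "-c", _CHILD, conf_file],
        stdin=subprocess.DEVNULL,  # also deals with configs that poke at stdin
        stdout=subprocess.PIPE,
        check=True,
    ).stdout
    return json.loads(out)


# Illustration only: a tiny config in a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bob.conf")
    with open(path, "w") as f:
        f.write("name = 'bob'\nproject = {'name': name}\n")
    project = read_config_in_subprocess(path)

print(project)
```

Because the child process exits after each file, there is no sys.path or sys.modules state to restore, and any loaded .so files are released by the operating system.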

Nonetheless, I can now read in all the configuration files and write them out as one big list in a pickle file, which lets me do some interesting analyses.
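As a rough idea of what that looks like, here is a sketch with a made-up list of project dicts and a made-up analysis (counting package suffixes); neither is taken from the real system.

```python
import os
import pickle
import tempfile
from collections import Counter

# Stand-in for the list of 'project' dicts collected from every config file.
projects = [
    {"name": "bob", "packages": ["bob_frontend", "bob_backend"]},
    {"name": "alice", "packages": ["alice_frontend"]},
]

with tempfile.TemporaryDirectory() as tmp:
    dump_path = os.path.join(tmp, "configs.pickle")
    with open(dump_path, "wb") as f:
        pickle.dump(projects, f)
    with open(dump_path, "rb") as f:
        loaded = pickle.load(f)

# Example analysis: how many projects ship each kind of package?
suffixes = Counter(pkg.rsplit("_", 1)[-1] for p in loaded for pkg in p["packages"])
print(suffixes.most_common())
```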

I guess the moral of this tale is that if you allow users access to a full programming language, they will use it! The system that this originated in has several thousand configuration files, dating back up to five years. There are a number of oddities lurking inside.


django & appengine

Last night I went to j4amie’s brightonpy talk Python and Django for PHP Refugees (slides). It was a really good talk, though I knew most of the Python stuff. The django intro was great, however.

What I was really interested in was using Django together with appengine. I’ve used appengine before with the builtin webapp framework. Whilst it’s good, it’s simplistic and I found myself building layers on top quickly.

Looking through the docs, the first thing I see is Running Django on Google App Engine. But this says that the builtin django is obsolete and I should be using django-nonrel. There is further documentation on this, Running Pure Django Projects on Google App Engine. This approach is interesting. It’s encouraging you to not be appengine specific, the way that you are with webapp’s default setup.

django-nonrel is made up of several components; you should start by looking at djangoappengine. You’ll need to download all five components.

You’ll also need the appengine SDK, if you don’t already have it.

Once you’ve downloaded everything, import the necessary bits into a project you made with the appengine SDK.

% pwd
% cp -r $APPENGINE_SDK/new_project_template hellodjango
% cd hellodjango
% mv ~/Downloads/wkornewald-django-nonrel-c73e6ca3843d/django .
% mv ~/Downloads/wkornewald-djangotoolbox-f79fecb60e6d/djangotoolbox .
% mv ~/Downloads/wkornewald-django-dbindexer-48589f5faad4/dbindexer . 
% mv ~/Downloads/wkornewald-djangoappengine-f9175cf4c8bd djangoappengine
% ls -l
total 24
-rwxr-x---@  1 dom  5000   106 13 Apr 12:09 app.yaml*
drwxr-xr-x@ 12 dom  5000   408 13 Apr 12:43 dbindexer/
drwxr-xr-x@ 18 dom  5000   612 13 Apr 12:33 django/
drwxr-xr-x@ 23 dom  5000   782 13 Apr 12:43 djangoappengine/
drwxr-xr-x@ 15 dom  5000   510 13 Apr 12:43 djangotoolbox/
-rwxr-x---   1 dom  5000   472 24 Mar 23:38 index.yaml*
-rwxr-x---   1 dom  5000  1002 24 Mar 23:38*

You’ll have to bundle all of this with your app. You may want to delete some bits of django/contrib that you don’t use.

Now, how to get started with my app? I’ll need to create a django project. Normally I use the installed version. In this case, I’d like to use the version I’ve imported to my project.

% PYTHONPATH=. django/bin/ 
Usage: subcommand [options] [args]
% PYTHONPATH=. django/bin/ startproject hellodjango
% mv hellodjango/* .

So now how do I hook that up to app.yaml? There’s no documentation, but there is a test app. And that contains the magic snippet:

- url: /.*
  script: djangoappengine/main/

Now, how do I run this? The appengine launcher I’m using has a “play” button. My first attempt broke: because I’d made the app in the hellodjango directory, the settings contained a reference to hellodjango.urls, which should be just urls. With that fixed, I get an “It worked!” page. Result!

The approach (aka the play button) worked for me, but the djangoappengine docs say to use ./ runserver, so I’ll do that.

Now, I have an empty app. Let’s add in a minimal hello world view. First, I create a views module:

from django.http import HttpResponse

def home(request):
  return HttpResponse('<h1>Hello World</h1>')

And then adjust the URL configuration to point to it.

from django.conf.urls.defaults import patterns, include, url

import views

urlpatterns = patterns('',
  url(r'^$', views.home, name='home'),
)

I now see the Hello World! displayed in my browser. I’d like to get a nice template working. I’ll update my views to look like this:

from django.shortcuts import render

def home(request):
  return render(request, 'home.html')

templates/home.html is as you would expect.

<h1>Hello World!</h1>

The final piece of the puzzle: how does django know where to find the template? In the settings, there’s a TEMPLATE_DIRS setting.

TEMPLATE_DIRS = (os.path.join(os.path.dirname(__file__), 'templates'),)

At this point, you’re using regular django, and should be able to use the regular docs to carry on. Although, please read the list of djangoappengine caveats.


Python Indentation

I like Python’s notion of using indentation instead of braces. All it does is force you to be consistent with yourself. Not a big deal, and it produces very reasonable code. I was very amused to see this however:

  % python
  Python 2.4.3 (#2, Mar 31 2006, 09:12:16)
  [GCC 3.4.4 [FreeBSD] 20050518] on freebsd6
  Type "help", "copyright", "credits" or "license" for more information.
  >>> from __future__ import braces
    File "<stdin>", line 1
  SyntaxError: not a chance