Python configuration

At $WORK, there is a program that uses Python as its configuration. Leaving aside the moment of whether or not this is a good idea, I wanted to look at how it does this.

All the program really needs is a dictionary of configuration items. But you can take advantage of it being Python to reduce duplication, generate some parts and so on.

# A much simplified example.
 
name = 'bob'
 
project = {
  'name': name,
  'branch': name + '_release_branch',
  'packages': [
    name + '_frontend',
    name + '_backend',
    name + '_middleend',
  ],
}

How do you read this configuration file, without it having any untowards effects on your program? Python has the execfile builtin to do just this.

scope = {}
execfile('bob.conf', scope)
return scope.get('project', {})

Where it gets really interesting is when there are similar configs that want to share amongst themselves; you have to start importing. Ideally, you’d like to be able to import from the same directory, so as to keep configuration together. This leads to something like:

conf_file = '/some/where/bob.conf'
oldsyspath = sys.path
try:
  sys.path = [os.path.dirname(conf_file)] + sys.path
  scope = {}
  execfile('bob.conf', scope)
  return scope.get('project', {})
finally:
  sys.path = oldsyspath

Of course, this leads to pollution. If bob.conf imports shared.py, a permanent record is kept in sys.modules. So, if another .conf imports shared.py, you’d not load it from disk again; it would refer to the already imported file.

Which is probably OK, unless you’re dealing with different directories full of configuration. Then, import shared may refer to different modules. Yes, this is messy. Yes, this is exactly what I was working on today. :)

Now, we need to throw away any imports that are done by the config file. Thankfully this is fairly easy.

conf_file = '/some/where/bob.conf'
oldsyspath = sys.path
oldsysmodules = set(sys.modules)
try:
  sys.path = [os.path.dirname(conf_file)] + sys.path
  scope = {}
  execfile('bob.conf', scope)
  return scope.get('project', {})
finally:
  sys.path = oldsyspath
  for name in set(sys.modules) - oldsysmodules:
    del sys.modules[name]

Phew! Now, I can read in all my configuration files from all over the system.

It’s not the end though. It turned out that some of the configuration files did silly things with stdin, so we had to capture stdin, redirect to /dev/null and restore it after the execfile().

Discussing with colleagues also revealed that the technique of cleaning up sys.modules could potentially cause trouble with modules that load .so files by not giving them a chance to clean up. The suggested workaround was to use the multiprocessing module to load the configuration in a separate process each time. Thankfully, none of the configuration files in this system were affected by this.

Nonetheless, by this point, I can now read in all configuration files, and write out a big list of them as a pickle file. Which lets me do some interesting analyses.

I guess the moral of this tale is that if you allow users access to a full programming language, they will use it! The system that this originated in has several thousand configuration files, dating back up to five years. There are a number of oddities lurking inside.

2 thoughts on “Python configuration

  1. Kamil Kisiel

    If you want to just use python-like syntax for defining data you can avoid these kinds of pitfalls by using ast.literal_eval(). It provides a safe way to interpret Python literals without actually executing code. I use it in particular with the ConfigParser library to allow values which are more complex data structures to be interpreted.

Comments are closed.