At $WORK, there is a program that uses Python as its configuration language. Leaving aside for the moment the question of whether or not this is a good idea, I wanted to look at how it does this.
All the program really needs is a dictionary of configuration items. But you can take advantage of it being Python to reduce duplication, generate some parts and so on.
    # A much simplified example.
    name = 'bob'
    project = {
        'name': name,
        'branch': name + '_release_branch',
        'packages': [
            name + '_frontend',
            name + '_backend',
            name + '_middleend',
        ],
    }
How do you read this configuration file without it having any untoward effects on your program? Python has the execfile builtin to do just this.
    scope = {}
    execfile('bob.conf', scope)
    return scope.get('project', {})
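One caveat: execfile() only exists in Python 2 and was removed in Python 3. A minimal sketch of the same idea for Python 3, using exec() on the compiled source, might look like the following. The function name read_config is made up for illustration and is not part of the original program.

```python
def read_config(conf_file):
    """Run conf_file in a fresh namespace and return its 'project' dict.

    read_config is a hypothetical helper name, not taken from the original program.
    """
    scope = {}
    with open(conf_file) as f:
        source = f.read()
    # Passing the filename to compile() keeps tracebacks pointing at the config file.
    exec(compile(source, conf_file, "exec"), scope)
    return scope.get("project", {})
```

The config still runs with the full privileges of the calling program; the separate dictionary only keeps its names out of your own namespace.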
Where it gets really interesting is when similar configs want to share settings among themselves; you have to start importing. Ideally, you’d like to be able to import from the same directory as the config file, so as to keep configuration together. This leads to something like:
    import os
    import sys

    conf_file = '/some/where/bob.conf'
    oldsyspath = sys.path
    try:
        # Let the config import modules that sit next to it.
        sys.path = [os.path.dirname(conf_file)] + sys.path
        scope = {}
        execfile(conf_file, scope)
        return scope.get('project', {})
    finally:
        sys.path = oldsyspath
Of course, this leads to pollution. If bob.conf imports shared.py, a permanent record is kept in sys.modules. So, if another .conf imports shared.py, it would not be loaded from disk again; the import would refer to the already imported module.
Which is probably OK, unless you’re dealing with different directories full of configuration. Then, import shared may refer to different modules. Yes, this is messy. Yes, this is exactly what I was working on today. 🙂
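To see the problem in isolation, here is a small sketch (Python 3) that creates two temporary directories, each with its own shared.py, and imports shared from each in turn. The helper names and the VALUE attribute are invented for the demonstration.

```python
import importlib
import os
import sys
import tempfile


def import_shared_from(directory):
    """Import 'shared' with directory at the front of sys.path; return its VALUE."""
    old_path = sys.path
    try:
        sys.path = [directory] + sys.path
        return importlib.import_module("shared").VALUE
    finally:
        sys.path = old_path


def demo():
    """Import two different shared.py files without cleaning up sys.modules."""
    values = []
    for label in ("first", "second"):
        directory = tempfile.mkdtemp()
        with open(os.path.join(directory, "shared.py"), "w") as f:
            f.write("VALUE = %r\n" % label)
        values.append(import_shared_from(directory))
    sys.modules.pop("shared", None)  # tidy up after the demonstration
    return values
```

Calling demo() returns ['first', 'first']: the second import never touches the second directory, because sys.modules already holds an entry named shared.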
Now, we need to throw away any imports that are done by the config file. Thankfully this is fairly easy.
    import os
    import sys

    conf_file = '/some/where/bob.conf'
    oldsyspath = sys.path
    # Remember which modules were loaded before the config ran.
    oldsysmodules = set(sys.modules)
    try:
        sys.path = [os.path.dirname(conf_file)] + sys.path
        scope = {}
        execfile(conf_file, scope)
        return scope.get('project', {})
    finally:
        sys.path = oldsyspath
        # Forget anything the config imported.
        for name in set(sys.modules) - oldsysmodules:
            del sys.modules[name]
Phew! Now, I can read in all my configuration files from all over the system.
It’s not the end though. It turned out that some of the configuration files did silly things with stdin, so we had to capture stdin, redirect it to /dev/null and restore it after the execfile().
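A rough sketch of that stdin juggling as a Python 3 context manager follows. The name stdin_from_devnull is invented here, and this version only swaps the Python-level sys.stdin object; it does not touch file descriptor 0, so a config that spawns a subprocess would still see the real stdin.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def stdin_from_devnull():
    """Temporarily point sys.stdin at os.devnull (hypothetical helper)."""
    old_stdin = sys.stdin
    with open(os.devnull) as devnull:
        sys.stdin = devnull
        try:
            yield
        finally:
            # Always put the original stdin back, even if the config raised.
            sys.stdin = old_stdin
```

Wrapping the exec of the config file in `with stdin_from_devnull():` means any read from sys.stdin inside the config sees end-of-file immediately instead of blocking.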
Discussing with colleagues also revealed that the technique of cleaning up sys.modules could potentially cause trouble with modules that load .so files by not giving them a chance to clean up. The suggested workaround was to use the multiprocessing module to load the configuration in a separate process each time. Thankfully, none of the configuration files in this system were affected by this.
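We never needed the multiprocessing route, so the following is only a sketch of how it could look in Python 3, not what the system does. It assumes a POSIX platform where the "fork" start method is available, and that the project dictionary can be pickled to travel back through the pipe. The function names are made up.

```python
import multiprocessing
import os
import sys


def _load(conf_file, conn):
    """Child process: run the config and send back its 'project' dict or an error."""
    try:
        # No need to restore sys.path or sys.modules; the process exits afterwards.
        sys.path.insert(0, os.path.dirname(conf_file))
        scope = {}
        with open(conf_file) as f:
            exec(compile(f.read(), conf_file, "exec"), scope)
        conn.send(("ok", scope.get("project", {})))
    except Exception as exc:
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def read_config_isolated(conf_file):
    """Load conf_file in a throwaway child process (hypothetical helper)."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX only
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_load, args=(conf_file, child_conn))
    proc.start()
    child_conn.close()  # the parent keeps only the receiving end
    try:
        status, payload = parent_conn.recv()
    finally:
        proc.join()
    if status == "error":
        raise RuntimeError(payload)
    return payload
```

Everything the config imports, including any .so files, lives and dies with the child process, so the parent's sys.modules is never touched.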
Nonetheless, at this point I can read in all the configuration files and write out a big list of them as a pickle file, which lets me do some interesting analyses.
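As an illustration of the kind of thing this enables, here is a small Python 3 sketch. It assumes the collected configs are kept as a list of (path, project) pairs; that layout, the function names and the sample analysis are all made up for the example.

```python
import pickle
from collections import Counter


def dump_configs(configs, out_path):
    """Write the list of (path, project) pairs to out_path as a pickle."""
    with open(out_path, "wb") as f:
        pickle.dump(configs, f)


def load_configs(in_path):
    """Read the list back; only unpickle files you created yourself."""
    with open(in_path, "rb") as f:
        return pickle.load(f)


def key_usage(configs):
    """Count how many configs define each top-level key."""
    counts = Counter()
    for _path, project in configs:
        counts.update(project.keys())
    return counts
```

A tally like key_usage() is a quick way to spot keys that only a handful of configuration files ever set.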
I guess the moral of this tale is that if you allow users access to a full programming language, they will use it! The system that this originated in has several thousand configuration files, dating back up to five years. There are a number of oddities lurking inside.
2 replies on “Python configuration”
If you want to just use python-like syntax for defining data you can avoid these kinds of pitfalls by using ast.literal_eval(). It provides a safe way to interpret Python literals without actually executing code. I use it in particular with the ConfigParser library to allow values which are more complex data structures to be interpreted.
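A minimal sketch of that suggestion in Python 3, where the ConfigParser library is spelled configparser. The function name and the sample section are invented for illustration.

```python
import ast
import configparser


def read_literal_config(text):
    """Parse INI-style text and interpret every value as a Python literal."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        section: {
            key: ast.literal_eval(value)
            for key, value in parser[section].items()
        }
        for section in parser.sections()
    }


SAMPLE = """
[project]
name = 'bob'
packages = ['bob_frontend', 'bob_backend']
"""
```

read_literal_config(SAMPLE) gives a nested dictionary with a real list under 'packages'. Anything that is not a plain literal, such as name + '_frontend' or a function call, makes ast.literal_eval() raise ValueError rather than run, which is the trade-off: no code execution, but no computed values either.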
That’s a good idea, I’ll bear it in mind. Unfortunately, when this system was designed, ast wasn’t around.