HHC: Config Refactor

I’ve had the HugeHonkinConsole set up to derive its configuration information from the place where it’s executed, which is…convenient in a sense, but really a pain in the ass long-term. With multiple versions of the code in the source tree and deployed on different machines, it creates this strange split of data repositories that’s not reasonable to keep track of.

So today’s task is to take the configuration class (which I’ve taken to calling Environment) and add a filename to the constructor so that it can optionally populate the internal configuration dictionary from that instead of just deriving all the file locations and such from argv[0], scriptroot() and such.

With what should prove to be a truly trivial change (yadda yadda “contact with the enemy” notwithstanding) I’ll gain a massive amount of flexibility.

The original code is dead simple because it just doesn’t need to do much at all. In fact, here’s the important bits of the source file (allow for some shitty wordpress code formatting.):

-— Environment.py

import socket
import os
import sys

class Environment:
def __init__(self):
    self.env = dict()


def populate_environment(self):

    # Independent (system derived) values
    self.env['hostname']   = self._gethostname()
    self.env['scriptroot'] = self._getscriptroot()

    # Derived values
    self.env['dataroot'] = self._get_data_path()

def _gethostname(self):
    return socket.gethostname().lower()

def _getscriptroot(self):
    return os.path.dirname(os.path.realpath(sys.argv[0]))

def _get_data_path(self):
    return self.env['scriptroot'] + '/data/'


It’s really not rocket surgery. A couple system methods to grab the current host name, local run location, and build a config entry or two off of that.

Now I don’t want to change anything that’s IN there because I’ve got a bunch of nonsense that’s relying on it, simple as it is. So I’m really just going to add a couple lines of code to the constructor and a single new method and I…THINK it’ll be done.

So the constructor now looks like this:

def __init__(self,filename=None):
    self.env = dict()


    if filename:
        self.env['config_file'] = filename

It takes an optional new argument, the configuration filename and only changes behavior if it’s been provided. That, insofar as it goes, guarantees it won’t break any existing code.

Now let’s take a look at the new load_config_file method.


def load_config_file(self,filename):
    fh = open(filename,'r')
    config = json.load(fh)
    self.env = config


As you can see there’s almost nothing in there.

  • Open a file
  • Pass the file handle to json.load
  • take the result and assign it to the internal ‘env’ dictionary
  • close the file.

Now…this presumes a fair bit. It’s definitely “happy path” code.

  • The file needs to exist or the code will unceremoniously die.
  • The file needs to be well formed json or the code will unceremoniously die.
  • The implicit assumption is that the json file contains a top-level dictionary with the keys and values we want.

But I’m going to be manually writing the json file. So the danger is really so low as to be nonexistent. Also, you should really only capture errors you can do something about. If this code blows up on a bad json file, I don’t want it doing anything else. Die with an error message. Once these blocks of functionality are out and running on all kinds of different machines with the possibility of complex errors and auto-distributed configuration changes (because why not?) then I may want a more sophisticated error reporting system. But there’s no reason to engineer it to be more resilient than that at this stage. It works or it dies.

So great! I can now wire this in by passing a configuration file name in at the very beginning when the Environment object is created and I’ll be done. Now I just have to go write the config file.

Well…no I don’t. Not really.

Let’s add another method to the same class for shits and giggles:


def save_config_file(self,filename):
    fh = open(filename,'w')


That saves whatever’s in the current object’s ‘self.env’ dictionary to a well formatted json file.

So NOW I just drop to the command line and interactively execute the following:

> import environment
> env = environment.Environment()
> env.save_config_file(‘foo.json’)
> quit()

There. Now there’s a config file named foo.json with the default parameters already there. It clears up any issues I might have with how to format the file, what parameters need to be fulfilled, and how I can add more. I’ll take the system-specific values out, certainly. But it makes for a great little template.

By putting the configuration, all the magic constants in one single location, then abstracting that information into an external file I’ve taken total control of how the code behaves. The environment object is available all over the code base, so if I need something that looks like a constant or a conceivably configurable parameter, I can just add it to the json file and it will be automatically picked up and populated into the environment object and I can start using it.

Nothing else has to change.

Now here’s an idea: Since I have access to the system routines that tell me exactly where this code is being executed from, I COULD add code that says “IF this Environment object was created with no configuration file name, go to the root directory of this application and look for a file named ‘config.json’ and load that if it’s there.” THEN I wouldn’t even have to specify a filename. I wouldn’t have to change a single line of code outside that file in order to have it start picking up the new config file.

It’s tempting. But I’m not inclined to do it, and here’s why.

Fast forward six months (or, you know, three hours. Because that’s how this stuff happens.) I’m plugging away on something and suddenly the code just dies. Now let’s say the configuration file has been renamed or moved or something.

Think about what the diagnostic process looks will look like, because there’s no way I’ll remember I actually had the Environment module figure out the name of the config file on its own. So the first thing I’ll do is open the main script file and look there. It creates an environment object with no parameters. So now I have to go spelunking through the code to see what and where it gets that filename from. I’ll have to reread the constructor (once I’ve even gotten there) to come up with what the problem must be.

The added convenience of not requiring an explicit filename is outweighed by the forensic overhead that will (not might, will) be required to diagnose a config issue.

WITH the explicit filename in the top-level of the code I’d open that file, see the config filename sitting there, then go look at that file, assuming the Environment class code to be essentially stable. Then I’d far more quickly get to the actual issue.

Coding for simplicity isn’t just about removing lines of code and cleaning up the boundaries of responsibility, it’s about anticipating future maintenance load and understanding the long-term costs of “simplicity.”

So yeah. I’m not going to do that.