Advanced Use Cases¶
Providing Type Converters¶
When working with untyped sources like INI files or environment variables there are a few approaches to convert values to the correct type.
- Guess a type based on the content
- Distill information from other sources
- Manually provide type information for each key
Each approach has its pros and cons, ranging from more failures to more manual labor. Because failures are unacceptable, only the second approach is applied whenever possible and everything else stays untouched. However, configstacker makes it fairly easy for you to provide additional information to convert values.
Converters for Specific Keys¶
Consider the following setup.
import os
import configstacker as cs
os.environ['MYAPP|SOME_NUMBER'] = '100'
os.environ['MYAPP|SOME_BOOL'] = 'False'
os.environ['MYAPP|NESTED|SOME_BOOL'] = 'True'
untyped_env = cs.Environment('MYAPP', subsection_token='|')
Accessing the values will return them as strings because configstacker has no other source of knowledge.
>>> type(untyped_env.some_number)
<class 'str'>
To solve that issue we can provide a list of converters. A converter consists of a value name and two converter functions: one that turns the value into the expected format (here int) and another one that turns it back to a storable version like str.
converter_list = [
    cs.Converter('some_number', int, str)
]
partly_typed_env = cs.Environment('MYAPP', converters=converter_list, subsection_token='|')
Now accessing some_number will return it as an integer while some_bool is still a string.
>>> partly_typed_env.some_number
100
>>> type(partly_typed_env.some_number)
<class 'int'>
To also convert the boolean value we could just compare the value to strings and return the result.
def to_bool(value):
    return value == 'True'
assert to_bool('True')
However, that is not very safe and we should be a bit smarter about it. So more elaborate versions might even use other libraries to do the heavy lifting for us.
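# Note: distutils is deprecated and was removed from the standard
# library in Python 3.12, so this only works on older interpreters.
import distutils.util
def to_bool(value):
    return bool(distutils.util.strtobool(value))
assert to_bool('yes')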
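Note that distutils is deprecated and no longer part of the standard library as of Python 3.12. Where it is unavailable, a small hand-written converter is one alternative. The following is only a sketch: the name to_bool_strict and the sets of accepted spellings are assumptions chosen for illustration (they follow the spellings that strtobool accepts), not something configstacker provides.

```python
# Accepted spellings, compared case-insensitively (illustrative choice).
_TRUE_VALUES = {'y', 'yes', 't', 'true', 'on', '1'}
_FALSE_VALUES = {'n', 'no', 'f', 'false', 'off', '0'}


def to_bool_strict(value):
    """Convert a common truthy or falsy string to a bool.

    Raises ValueError for anything else instead of guessing.
    """
    normalized = value.strip().lower()
    if normalized in _TRUE_VALUES:
        return True
    if normalized in _FALSE_VALUES:
        return False
    raise ValueError('invalid truth value %r' % value)
```

Because the comparison is case-insensitive, the string produced by str(True) converts back to True, so the function can be paired with str as the resetting function.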
After we choose an approach let’s put everything together. For convenience we can provide the converters as simple tuples. Configstacker will convert them internally. Just make sure to stick to the correct order of elements. First the key to convert, then the customizing function and finally the resetting function.
converter_list = [
    ('some_bool', to_bool, str),
    ('some_number', int, str),
]
typed_env = cs.Environment('MYAPP', converters=converter_list, subsection_token='|')
Now all values including the nested ones are typed. Also the value assignment works as expected.
>>> typed_env.some_number
100
>>> typed_env.some_bool
False
>>> typed_env.some_bool = True
>>> typed_env.some_bool
True
>>> typed_env.some_bool = 'False'
>>> typed_env.some_bool
False
>>> typed_env.nested.some_bool
True
Converters With Wildcards¶
The previous method only works if you know the value names in advance. Suppose we have a simple map of libraries with their name and version string.
DATA = {
    'libraries': {
        'six': '3.6.0',
        'requests': '1.2.1',
    }
}
When accessing a library it would be really nice to get its version as a named tuple rather than a simple string. The issue is that we don’t want to specify a converter for each library. Instead we can make use of a wildcard to apply a converter to all matching keys. Let’s first create the version class and its serializers.
import collections
Version = collections.namedtuple('Version', 'major minor patch')
def to_version(version_string):
    parts = version_string.split('.')
    return Version(*map(int, parts))
def to_str(version):
    return '.'.join(map(str, version))
With that in place we can create our config object with a converter that uses a wildcard to match all nested elements of libraries.
import configstacker as cs
config = cs.DictSource(DATA, converters=[
    ('libraries.*', to_version, to_str)
])
Now we can access a library by its name and get a nice version object returned.
>>> config.libraries.six
Version(major=3, minor=6, patch=0)
>>> config.libraries.requests.major
1
It is important to understand that changing a value on the converted object will not be stored in the configuration automatically. This is because configstacker doesn’t know when the custom object changes. To save changes to the object you can simply reassign it to the same key that you used to access it in the first place. You can also assign the object to a new key as long as it is covered by a converter or the underlying source handler is capable of storing it.
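Since Version is a named tuple it cannot be modified in place anyway; _replace returns a modified copy which then has to be stored again. The sketch below illustrates that round trip with a plain dictionary standing in for the configuration source. The raw dictionary and the manual calls to the converter functions are assumptions made for illustration only; with configstacker the corresponding step is the reassignment to the original key described above.

```python
import collections

Version = collections.namedtuple('Version', 'major minor patch')


def to_version(version_string):
    return Version(*map(int, version_string.split('.')))


def to_str(version):
    return '.'.join(map(str, version))


# Plain dict standing in for the underlying source (illustration only).
raw = {'six': '3.6.0'}

version = to_version(raw['six'])
bumped = version._replace(minor=version.minor + 1)

# The converted object is an independent copy, so nothing was stored yet.
assert raw['six'] == '3.6.0'

# Reassigning runs the resetting function and stores the new value.
raw['six'] = to_str(bumped)
```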
Note
The previous example only showcased the idea. For real use cases you should probably use a library that knows how to properly handle version strings.
Converting Lists¶
In the previous section we had a very simple data dictionary where all libraries consisted of a name-to-version mapping. In this example we will have a list of json-like objects instead. The information is the same as before. Each item consists of a name and version pair.
DATA = {
    'libraries': [{
        'name': 'six',
        'version': '3.6.0',
    }, {
        'name': 'requests',
        'version': '1.2.1',
    }]
}
As before, we will set up our classes and serializers first. We also assume a very simple version bump logic where a major update resets the minor and patch numbers to zero and a minor update resets the patch number to zero.
import collections
import configstacker as cs
import six
Version = collections.namedtuple('Version', 'major minor patch')
class Library(object):
    def __init__(self, name, version):
        self.name = name
        self.version = version
    def bump(self, part):
        major, minor, patch = self.version
        if part == 'major':
            new_parts = major + 1, 0, 0
        elif part == 'minor':
            new_parts = major, minor + 1, 0
        elif part == 'patch':
            new_parts = major, minor, patch + 1
        else:
            raise ValueError('part must be major, minor or patch')
        self.version = Version(*new_parts)
Because configstacker doesn’t know anything about the type of the values, it has no idea how to treat lists, in particular how to parse and iterate over the individual items. So instead of just creating converter functions for single items we additionally have to create wrappers that convert and reset the whole list.
def _load_library(library_spec):
    version_parts = library_spec['version'].split('.')
    version = Version(*map(int, version_parts))
    return Library(library_spec['name'], version)
def load_list_of_libraries(library_specifications):
    """Convert list of json objects to Library objects."""
    return [_load_library(specification) for specification in library_specifications]
def _dump_library(library):
    version_str = '.'.join(map(str, library.version))
    return {'name': library.name, 'version': version_str}
def dump_list_of_libraries(libraries):
    """Dump list of Library objects to json objects."""
    return [_dump_library(library) for library in libraries]
The final config object will be created as follows:
config = cs.DictSource(DATA, converters=[
    ('libraries', load_list_of_libraries, dump_list_of_libraries)
])
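It can be worth checking that such a pair of list converters is symmetric, meaning that dumping a freshly loaded list reproduces the stored form. The following sketch does this without configstacker. To stay short it replaces the Library class with a simple named tuple, which is an assumption made for this illustration only.

```python
import collections

Version = collections.namedtuple('Version', 'major minor patch')
# Simplified stand-in for the Library class (no bump logic).
Library = collections.namedtuple('Library', 'name version')


def load_list_of_libraries(specs):
    """Convert a list of dicts into Library objects."""
    return [
        Library(spec['name'], Version(*map(int, spec['version'].split('.'))))
        for spec in specs
    ]


def dump_list_of_libraries(libraries):
    """Convert Library objects back into a list of dicts."""
    return [
        {'name': lib.name, 'version': '.'.join(map(str, lib.version))}
        for lib in libraries
    ]


specs = [{'name': 'six', 'version': '3.6.0'}]
libraries = load_list_of_libraries(specs)

# Loading and dumping again should give back the original data.
assert dump_list_of_libraries(libraries) == specs
```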
Converting Subsections¶
Usually a subsection, or a nested dict in Python terms, is handled in a special way as it needs to be converted into a configstacker instance. However, you can change that behavior by providing a converter for a subsection. This is useful if you want to use the nested information to assemble a larger object or only load an object if it is accessed.
As an example, suppose we have a todo application that stores todos in a database, a caldav resource or anything else. To keep the application flexible it needs to be ignorant of how the storage system works internally but it has to know how it can be used. For that reason the storage should implement an interface that is known to the application. Also it shouldn’t be hardcoded into the application but injected into it at startup or at runtime. With this setup it is pretty simple for the user to specify which storage to use and how it should be configured. When starting the application we will then dispatch the configuration to the specific storage factory and assemble it there.
The next snippet contains the interface IStorage that enforces the existence of a save and a get_by_name method. Additionally you will find two dummy implementations for a database and a caldav storage.
import abc
class IStorage(abc.ABC):
    @abc.abstractmethod
    def save(self, todo):
        pass
    @abc.abstractmethod
    def get_by_name(self, name):
        pass
class DB(IStorage):
    def __init__(self, user, password, host='localhost', port=3306):
        # some setup code
        self.host = host
    def get_by_name(self, name):
        # return self._connection.select(...)
        pass
    def save(self, todo):
        # self._connection.insert(...)
        pass
    def __repr__(self):
        return '<DB(host="%s")>' % self.host
class CalDav(IStorage):
    def __init__(self, url):
        # some setup code
        self.url = url
    def get_by_name(self, name):
        # return self._resource.parse(...)
        pass
    def save(self, todo):
        # self._resource.update(...)
        pass
    def __repr__(self):
        return '<CalDav(url="%s")>' % self.url
The following two json files are examples of what a storage configuration could look like. They follow a simple convention: type is the class that should be loaded, while the content of the optional setup key will be passed to the class on instantiation.
{
    "storage": {
        "type": "DB",
        "setup": {
            "user": "myuser",
            "password": "mypwd"
        }
    }
}
{
    "storage": {
        "type": "CalDav",
        "setup": {
            "url": "http://localhost/caldav.php"
        }
    }
}
First let’s continue without a converter to see the difference between both versions. We read the config files as usual and assemble a storage object with the respective class from the storage_module. Finally it gets injected into the todo application.
import configstacker as cs
import storage_module
config = cs.YAMLFile('/path/to/config.yml')
storage_class = getattr(storage_module, config.storage.type)
storage = storage_class(**config.storage.get('setup', {}))
app = TodoApp(storage)
To make the code cleaner we could refactor the storage related setup into its own function and assign it to a converter instead.
import configstacker as cs
import storage_module
def load_storage(spec):
    storage_class = getattr(storage_module, spec.type)
    return storage_class(**spec.get('setup', {}))
config = cs.YAMLFile('/path/to/config.yml',
                     converters=[('storage', load_storage, None)])
app = TodoApp(config.storage)
With the converter in place accessing the storage returns us a fully constructed storage object instead of a nested subsection.
>>> config.storage
<CalDav(url="http://localhost/caldav.php")>
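One thing to keep in mind is that getattr on a module returns any attribute named in the configuration file, not only storage classes. If that is a concern, an explicit registry is one possible alternative. The following sketch is self-contained and makes several assumptions for illustration: the STORAGE_TYPES mapping, the two stub classes and the plain dictionary passed as spec are not part of configstacker.

```python
class DB(object):
    def __init__(self, user, password, host='localhost', port=3306):
        self.host = host


class CalDav(object):
    def __init__(self, url):
        self.url = url


# Explicit list of classes that may be named in the configuration.
STORAGE_TYPES = {'DB': DB, 'CalDav': CalDav}


def load_storage(spec):
    """Build a storage object from a mapping with 'type' and optional 'setup'."""
    try:
        storage_class = STORAGE_TYPES[spec['type']]
    except KeyError:
        raise ValueError('unknown storage type: %r' % spec.get('type'))
    return storage_class(**spec.get('setup', {}))


storage = load_storage({
    'type': 'CalDav',
    'setup': {'url': 'http://localhost/caldav.php'},
})
```

An unknown or missing type then fails with a clear ValueError instead of an AttributeError or an unexpected object.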
Using Merge Strategies¶
Often configurations are meant to override each other depending on their priority. However, there are cases where consecutive values should not be overridden but handled differently, for example collected into a list.
Suppose we are building a tool that allows users to specify multiple paths. For that we want to define a set of default paths and enable our users to add additional paths if they want to.
import os
import configstacker as cs
# our predefined defaults
DEFAULTS = {
    'path': '/path/to/default/file'
}
# a user set variable
os.environ['MYAPP|PATH'] = '/path/to/user/file'
config = cs.StackedConfig(
    cs.DictSource(DEFAULTS),
    cs.Environment('MYAPP', subsection_token='|'),
)
When we try to access path we will only get the value from the source with the highest priority which in this case is the environment variable.
>>> config.path
'/path/to/user/file'
To solve this problem we can use a merge strategy that simply collects all values into a list. For that we create a strategy map which contains the value’s name and its merge function. All occurrences of the specified key will now be merged consecutively with the previous merge result.
config = cs.StackedConfig(
    cs.DictSource(DEFAULTS),
    cs.Environment('MYAPP', subsection_token='|'),
    strategy_map={
        'path': cs.strategies.collect
    }
)
Here we use collect which is one of the builtin strategies and perfectly fits our needs. Now when we access path it returns a list of values in the prioritized order.
>>> config.path
['/path/to/user/file', '/path/to/default/file']
Let’s say instead of merging the paths into a list we want to join all paths with a colon (or semicolon if you are on Windows). Create a function that accepts a previous and a next_ parameter and joins both values together.
def join_paths(previous, next_):
    if previous is cs.strategies.EMPTY:
        return next_
    return ':'.join([previous, next_])
assert join_paths(cs.strategies.EMPTY, '/a/path') == '/a/path'
assert join_paths('/a/path', '/other/path') == '/a/path:/other/path'
Some things to note:
- next_ ends with an underscore to prevent name clashes with the builtin next() function.
- When the merge function is called previous contains the result from the last call and next_ contains the current value.
- If the merge function is called for the first time configstacker will pass EMPTY to previous. This is a good time to return a default value which in our case is next_.
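The calling convention described in these notes amounts to a left fold over the values in priority order. The following sketch simulates it without configstacker. The EMPTY sentinel and the merge_all helper are stand-ins written for this illustration and are not the library's implementation. The sketch also uses os.pathsep so that the separator matches the platform (a colon on POSIX systems, a semicolon on Windows).

```python
import os

EMPTY = object()  # stand-in for cs.strategies.EMPTY


def join_paths(previous, next_):
    if previous is EMPTY:
        return next_
    return os.pathsep.join([previous, next_])


def merge_all(values, strategy):
    """Feed the values to the strategy one by one, starting from EMPTY."""
    result = EMPTY
    for value in values:
        result = strategy(result, value)
    return result


# Values listed in priority order, highest first.
merged = merge_all(['/path/to/user/file', '/path/to/default/file'], join_paths)
```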
To have something to play with we will also access the system environment variables so that we can make use of our global path variable. To not accidentally change anything we will load them in a read-only mode.
config = cs.StackedConfig(
    cs.Environment('', readonly=True),
    cs.DictSource(DEFAULTS),
    cs.Environment('MYAPP', subsection_token='|'),
    strategy_map={
        'path': join_paths
    }
)
path should now return a single string with the user defined path, the default path and the system path joined together with a colon.
>>> config.path
'/path/to/user/file:/path/to/default/file:/...'
Warning
This is a demonstration only. Be extra cautious when accessing the system variables like that.
Note
When using an empty prefix to access the system variables understand that MYAPP variables will also show up unparsed as myapp|... in the config object. This is because the source handler with the empty prefix doesn’t know anything about the special meaning of MYAPP.
Extending Source Handlers¶
Configstacker already ships with a couple of source handlers. However, there are always reasons why you want to override an existing handler or create a completely new one. Maybe the builtin handlers are not working as required or there are simply no handlers for a specific type of source available.
Therefore configstacker makes it fairly easy for you to create new handlers. You only have to extend the base Source class and create at least a _read() method that returns a dictionary. If you also want to make it writable just add a _write(data) method which in turn accepts a dictionary and stores the data in the underlying source.
Assume you want to read information from a command line parser. There are a couple of ways to accomplish that. The easiest one would be to hand over already parsed cli parameters to configstacker as a DictSource which is readonly and has the highest priority in a stacked configuration. This method is great because it allows you to easily incorporate any cli parser you like. It just has to return a dictionary. Another way would be to create a parser that hooks into sys.argv and strips out the information itself. For demonstration purposes we will implement the latter case.
Note
Handling the cli is a complex task and varies greatly between applications and use cases. As such there is no default cli handler integrated into configstacker.
If you are building a cli for your application it is probably easier to just go with the DictSource approach and make use of great tools like click.
The following handler will create an argparse.ArgumentParser internally. Because cli parameters cannot be changed after the script or application has been started we don’t need a _write method to save changes. Additionally because they are only entered once at the startup of the application we also don’t need to lazy load them. Therefore the arguments can already be parsed in the __init__ method which makes _read very simple.
import argparse
import sys
import configstacker as cs
class CliSource(cs.Source):
    def __init__(self, argv=None):
        self._parser = argparse.ArgumentParser()
        self._parser.add_argument('job_name')
        self._parser.add_argument('--job-cache', action='store_true')
        self._parser.add_argument('-r', '--job-retries', type=int, default=0)
        self._parser.add_argument('--host-url', default='localhost')
        self._parser.add_argument('--host-port', type=int, default=5050)
        self._parser.add_argument('-v', '--verbose', action='count', default=0)
        self._parsed_data = {}
        parsed = self._parser.parse_args(argv or sys.argv[1:])
        for (argument, value) in parsed._get_kwargs():
            tokens = argument.split('_')
            subsections, key_name = tokens[:-1], tokens[-1]
            last_subdict = cs.utils.make_subdicts(self._parsed_data, subsections)
            last_subdict[key_name] = value
        super(CliSource, self).__init__()
    def _read(self):
        return self._parsed_data
def main():
    cfg = CliSource()
    # just some demonstration code
    if cfg.verbose > 0:
        print('Job runner:\t{url}:{port}'.format(**cfg.host))
    if cfg.verbose > 1:
        cache_state = 'enabled' if cfg.job.cache else 'disabled'
        print('Job cache:\t%s' % cache_state)
        print('Max retries:\t%s' % cfg.job.retries)
    print('Start job %s' % cfg.job.name)
if __name__ == '__main__':
    main()
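The loop in __init__ turns flat argument names into nested sections by splitting them at underscores, so the positional argument job_name ends up as name inside a job subsection. The sketch below reproduces only that step with the standard library. Its make_subdicts function is a hypothetical stand-in whose behavior (create nested dictionaries along the given keys and return the innermost one) is an assumption about what cs.utils.make_subdicts does.

```python
import argparse


def make_subdicts(base, keys):
    """Create nested dicts along keys and return the innermost one.

    Hypothetical stand-in for cs.utils.make_subdicts.
    """
    current = base
    for key in keys:
        current = current.setdefault(key, {})
    return current


parser = argparse.ArgumentParser()
parser.add_argument('job_name')
parser.add_argument('--job-cache', action='store_true')
parser.add_argument('-v', '--verbose', action='count', default=0)

parsed = parser.parse_args(['-vv', 'some_job'])

data = {}
for argument, value in vars(parsed).items():
    tokens = argument.split('_')
    subsections, key_name = tokens[:-1], tokens[-1]
    make_subdicts(data, subsections)[key_name] = value

# data == {'job': {'name': 'some_job', 'cache': False}, 'verbose': 2}
```

Note that this naming scheme splits on every underscore, so an option such as --host-port always becomes the key port inside a host subsection.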
We can test it by invoking our handler and providing some arguments to it.
>>> cfg = CliSource(['-vv', 'some_job'])
>>> cfg.job.name
'some_job'
>>> cfg.verbose
2
>>> cfg.job.cache
False
Now let’s call it as a script to make use of the example main function. First we ask it for the help text.
$ python cli-source.py -h
usage: cli-source.py [-h] [--job-cache] [-r JOB_RETRIES] [--host-url HOST_URL]
                     [--host-port HOST_PORT] [-v]
                     job_name

positional arguments:
  job_name

optional arguments:
  -h, --help            show this help message and exit
  --job-cache
  -r JOB_RETRIES, --job-retries JOB_RETRIES
  --host-url HOST_URL
  --host-port HOST_PORT
  -v, --verbose
And finally run a pseudo job.
$ python cli-source.py -vv some_job
Job runner: localhost:5050
Job cache: disabled
Max retries: 0
Start job some_job