digip.org blog

New features of Jansson 2.0, part 1

By Petri Lehtinen on 2011-03-01

This post starts a series of articles that give insight into the new features of Jansson 2.0.

First up is the json_unpack() API. I think it's the most powerful new feature, allowing the user to perform two things on a JSON value: data extraction, and validation against a simple schema. The idea has been stolen from Python's C API.

Example:

/* Assume that obj is the following JSON object:
 *   {"x": 15.4, "y": 99.8, "z": 42.0}
 */
json_t *obj;
double x, y, z;

if(json_unpack(obj, "{s:f, s:f, s:f}", "x", &x, "y", &y, "z", &z))
    return -1;  /* error */

assert(x == 15.4 && y == 99.8 && z == 42);

The format string passed to json_unpack() describes the structure of the object. The s format denotes an object key, and the f format means a real number value. Whitespace, : and , are ignored, so {sfsfsf} would be an equivalent format string to the one above.

After the format string, there's one argument for each s and f format character. For object keys, a string specifies which key is accessed, and for real numbers, a pointer to a double gives the address where the value is stored.
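The two rules above are mechanical enough to model in a few lines. What follows is a toy Python sketch, not part of Jansson. `normalize()` and `count_args()` are hypothetical helpers. The sketch assumes that only `s`, `f` and `i` consume an argument, as they do in the examples in this post.

```python
# Characters that the format-string parser skips, per the rules above.
IGNORED = set(" \t\n:,")
# Assumed subset of format characters that consume one argument each:
# 's' (object key), 'f' (pointer to double), 'i' (pointer to int).
CONSUMES_ARG = set("sfi")


def normalize(fmt):
    """Return fmt with all ignored characters removed."""
    return "".join(ch for ch in fmt if ch not in IGNORED)


def count_args(fmt):
    """Count the arguments that must follow fmt in an extracting call."""
    return sum(1 for ch in normalize(fmt) if ch in CONSUMES_ARG)


if __name__ == "__main__":
    print(normalize("{s:f, s:f, s:f}"))   # {sfsfsf}
    print(count_args("{s:f, s:f, s:f}"))  # 6
```

For the nested format string `{s:{s:[iii]}}` used later in this post, `count_args()` gives 5: two keys and three integer pointers.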

The equivalent code without json_unpack() would be something like this:

json_t *obj, *tmp;
double x, y, z;

tmp = json_object_get(obj, "x");
if(!json_is_real(tmp))
    return -1;  /* error */
x = json_real_value(tmp);

/* repeat for y and z */
/* ... */

printf("x: %g, y: %g, z: %g\n", x, y, z);
/* ==> x: 15.4, y: 99.8, z: 42 */

The code that uses json_unpack() is much shorter and cleaner, and it's easier to see what it's doing.

Nested values are supported, too:

/* Assume that nested is the following JSON object:
 *   {"foo": {"bar": [11, 12, 13]}}
 */
json_t *nested;
int i1, i2, i3;

if(json_unpack(nested, "{s:{s:[iii]}}", "foo", "bar", &i1, &i2, &i3))
    return -1;  /* error */

assert(i1 == 11 && i2 == 12 && i3 == 13);

This time, the format string has two nested objects and a nested array. There's no limit on the nesting levels. The variable arguments are used in the "flat" order in which they appear in the format string.
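To make the "flat" argument order concrete, here is a toy Python model of how a nested format string can be walked against a value. It is not Jansson's implementation. `unpack()` is a hypothetical function that supports only `{`, `}`, `[`, `]`, `s`, `f` and `i`. Python has no output pointers, so the model differs from the C call in two ways: object keys are passed as arguments in format order, and the extracted values are returned as a list in that same order.

```python
import json

IGNORED = " \t\n:,"


def unpack(value, fmt, *keys):
    """Toy model: match `value` against `fmt`, return extracted values.

    Keys for each 's' are consumed from `keys` in format order. A type
    mismatch or missing key raises ValueError, so the same walk also
    validates the structure.
    """
    tokens = [ch for ch in fmt if ch not in IGNORED]
    pos = 0
    key_iter = iter(keys)
    out = []

    def walk(val):
        nonlocal pos
        ch = tokens[pos]
        pos += 1
        if ch == "{":
            if not isinstance(val, dict):
                raise ValueError("expected object")
            while tokens[pos] != "}":
                if tokens[pos] != "s":
                    raise ValueError("expected 's' before an object value")
                pos += 1
                key = next(key_iter)
                if key not in val:
                    raise ValueError(f"key {key!r} not found")
                walk(val[key])
            pos += 1  # step past '}'
        elif ch == "[":
            if not isinstance(val, list):
                raise ValueError("expected array")
            index = 0
            while tokens[pos] != "]":
                if index >= len(val):
                    raise ValueError("array too short")
                walk(val[index])
                index += 1
            pos += 1  # step past ']'
        elif ch == "f":
            if not isinstance(val, float):
                raise ValueError("expected real")
            out.append(val)
        elif ch == "i":
            # bool is a subclass of int in Python, so rule it out explicitly
            if isinstance(val, bool) or not isinstance(val, int):
                raise ValueError("expected integer")
            out.append(val)
        else:
            raise ValueError(f"unsupported format character {ch!r}")

    walk(value)
    return out


if __name__ == "__main__":
    nested = json.loads('{"foo": {"bar": [11, 12, 13]}}')
    print(unpack(nested, "{s:{s:[iii]}}", "foo", "bar"))  # [11, 12, 13]
```

The recursion mirrors the nesting of the format string. That recursion is why there is no fixed limit on depth, and why the arguments line up with the format characters left to right.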

The same API can also be used in a validation-only mode, i.e. without extracting any values. Error messages are also available:

/* Assume the same JSON object as in the previous example */
json_t *nested;
json_error_t error;

if(json_unpack_ex(nested, &error, JSON_VALIDATE_ONLY,
   "{s:{s:[iii]}}", "foo", "bar"))
{
    fprintf(stderr, "Error: %d:%d: %s\n", error.line, error.column, error.text);
    return -1;
}

The json_unpack_ex() function is the extended version of json_unpack(). It takes an error parameter, similar to the decoding functions, and optional flags to control its behaviour. The JSON_VALIDATE_ONLY flag tells it to only validate and not extract anything, so extra arguments after the format string are required only for object keys. The available validation is quite simple: only the object/array structure and value types can be checked. Even so, it usually saves a lot of code.
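The validation-only mode can be modelled the same way. The sketch below is hypothetical Python, not Jansson. `validate()` takes only the object keys as extra arguments and checks structure and value types. It returns `None` on success or an error string on failure. The `$.foo.bar[1]`-style location in the message is an invented convention for illustration; it is not the text Jansson stores in `json_error_t`.

```python
import json

IGNORED = " \t\n:,"


def validate(value, fmt, *keys):
    """Check `value` against `fmt` without extracting anything.

    Only object keys are passed as extra arguments. Returns None if the
    value matches, otherwise a message naming the offending location.
    """
    tokens = [ch for ch in fmt if ch not in IGNORED]
    pos = 0
    key_iter = iter(keys)

    def check(val, path):
        nonlocal pos
        ch = tokens[pos]
        pos += 1
        if ch == "{":
            if not isinstance(val, dict):
                return f"{path}: expected object"
            while tokens[pos] != "}":
                if tokens[pos] != "s":
                    return "format: expected 's' inside an object"
                pos += 1
                key = next(key_iter)
                if key not in val:
                    return f"{path}: key {key!r} not found"
                error = check(val[key], f"{path}.{key}")
                if error:
                    return error
            pos += 1  # step past '}'
        elif ch == "[":
            if not isinstance(val, list):
                return f"{path}: expected array"
            index = 0
            while tokens[pos] != "]":
                if index >= len(val):
                    return f"{path}: array has only {len(val)} items"
                error = check(val[index], f"{path}[{index}]")
                if error:
                    return error
                index += 1
            pos += 1  # step past ']'
        elif ch == "f":
            if not isinstance(val, float):
                return f"{path}: expected real"
        elif ch == "i":
            if isinstance(val, bool) or not isinstance(val, int):
                return f"{path}: expected integer"
        else:
            return f"format: unsupported character {ch!r}"
        return None

    return check(value, "$")


if __name__ == "__main__":
    nested = json.loads('{"foo": {"bar": [11, 12, 13]}}')
    error = validate(nested, "{s:{s:[iii]}}", "foo", "bar")
    print("Error: " + error if error else "valid")  # valid
```

Returning the first error as a string plays roughly the role that `error.text` plays in the C example. Checking `"{s:{s:[ifi]}}"` against the same object reports `$.foo.bar[1]: expected real`.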

I strongly believe that this feature, along with the json_pack() API described in the next part, will make it an order of magnitude more pleasant to manipulate JSON data in C. Many thanks to Graeme Smecher for suggesting this and providing the initial implementation.

This article only gave a few examples. For full details, all available format characters and flags, see the documentation.

Tags: jansson

Generating data files in setup.py

By Petri Lehtinen on 2011-01-28

In a work project, I have a few JavaScript files that are generated from a bunch of other files. The project is a Django website, so I simply have views that generate the files on the fly when running in debug mode, and everything works nicely and smoothly.

For production, though, I needed flat files that could be served from disk. I figured the best approach would be to generate the files in setup.py upon installation, but I could only find very superficial documentation on how to do that.

A brief intro to setup.py: in every project's setup.py file, the setup function from the Python standard library's distutils.core module is used to define the project's files and metadata. (setup can also be imported from setuptools or distribute, but they're compatible with distutils.) With the standard commands that setup.py provides, the files and metadata can be compiled to an egg, distributed as a source tarball, uploaded to PyPI, and so on.

The entry point to altering setup.py's behaviour is the optional cmdclass argument to the setup function. Its value is a dict mapping command names to distutils.command.Command subclasses that implement the commands. The build_py command is where the package data files are installed, so I overrode build_py by creating the class my_build_py and registering it, like this:

from distutils.core import setup
from distutils.command.build_py import build_py

class my_build_py(build_py):
    pass  # the run() override is shown below

setup(
    # Define metadata, files, etc.
    # ...
    cmdclass={'build_py': my_build_py}
)

The run method of build_py, along with copying and compiling the Python source files, is responsible for copying the package data files to the build directory build/lib.<platform>. (The actual directory name is stored in the build_py instance's self.build_lib attribute.)

To install your own files, just override the run method. Remember to call the superclass's run method after you're done with your own files.

import os
from distutils.command.build_py import build_py

def generate_content():
    # generate the file content...
    content = ''  # placeholder: build the real file content here
    return content

class my_build_py(build_py):
    def run(self):
        # honor the --dry-run flag
        if not self.dry_run:
            target_dir = os.path.join(self.build_lib, 'mypkg/media')

            # mkpath is a distutils helper to create directories
            self.mkpath(target_dir)

            with open(os.path.join(target_dir, 'myfile.js'), 'w') as fobj:
                fobj.write(generate_content())

        # distutils uses old-style classes, so no super()
        build_py.run(self)

And that's it! A later phase of the installation copies everything from build/lib.<platform> to the correct place, so your generated file gets in, too.
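The build-then-copy flow can be simulated without distutils, which makes it easy to check where the generated file ends up. This is a stdlib-only Python 3 sketch. `build_step()` stands in for `my_build_py.run()`, with `os.makedirs()` in place of `mkpath`. `install_step()` stands in for the later install phase. The directory names and file content are hypothetical.

```python
import os
import shutil
import tempfile


def generate_content():
    # Hypothetical stand-in for the real generator.
    return "var generated = true;\n"


def build_step(build_lib, dry_run=False):
    """Mimic my_build_py.run(): write the generated file under build_lib."""
    if dry_run:
        return  # skip generation, as my_build_py does on --dry-run
    target_dir = os.path.join(build_lib, "mypkg", "media")
    os.makedirs(target_dir, exist_ok=True)  # plays the role of self.mkpath()
    with open(os.path.join(target_dir, "myfile.js"), "w") as fobj:
        fobj.write(generate_content())


def install_step(build_lib, install_dir):
    """Mimic the later phase: copy the whole build tree into place."""
    shutil.copytree(build_lib, install_dir, dirs_exist_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_lib = os.path.join(tmp, "build", "lib")
        site_packages = os.path.join(tmp, "site-packages")
        build_step(build_lib)
        install_step(build_lib, site_packages)
        print(os.listdir(os.path.join(site_packages, "mypkg", "media")))  # ['myfile.js']
```

`dirs_exist_ok` needs Python 3.8 or newer. The real install phase does more than a plain copy, which this sketch ignores.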

Sala 1.0 released

By Petri Lehtinen on 2011-01-19

For some time now, I've been unsatisfied with the state of user names and passwords for the numerous services I use on the web. I have a hard time remembering all the services I have signed up for, and I have a bad habit of using a few common passwords for all of them.

So, I hacked for a while, and yesterday I pushed the first release of sala to PyPI. It is a simple, filesystem-based, encrypted password storage system that uses GnuPG's symmetric encryption. As usual, a git repository is available at GitHub.

The main idea of sala is to store passwords (or other tiny, plain-text secrets) in encrypted plain-text files in a directory hierarchy, like this:

/path/to/passwords
|-- example-service.com
|   |-- +webmail
|   |   |-- @myuser
|   |   `-- @otheruser
|   `-- +adminpanel
|       `-- @admin
`-- my-linux-box
    |-- @myuser
    `-- @root

As sala is a command line utility and there's one file per password, tab completion and other shell goodies are available. The custom of prefixing user names with @ and category/group/subservice names with + is my own preference and not enforced by the program. You may come up with your own scheme, for example if you want to protect user names as well as the actual passwords. For more information, see the PyPI page.
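Since every secret is just a file, listing them needs nothing more than a directory walk. The following is a hypothetical Python helper, not part of sala. `list_entries()` yields each entry's path relative to the store. `describe()` splits a path according to the optional `+group`/`@user` custom. Both only look at file names and never read or decrypt the files.

```python
import os
import tempfile


def list_entries(store_dir):
    """Yield each file under store_dir as a '/'-separated relative path."""
    for root, dirs, files in os.walk(store_dir):
        dirs.sort()  # sorting in place makes os.walk visit subdirs in order
        for name in sorted(files):
            rel = os.path.relpath(os.path.join(root, name), store_dir)
            yield rel.replace(os.sep, "/")


def describe(entry):
    """Split an entry using the +group / @user naming custom."""
    parts = entry.split("/")
    return {
        "service": parts[0],
        "groups": [p[1:] for p in parts[1:] if p.startswith("+")],
        "user": next((p[1:] for p in parts[1:] if p.startswith("@")), None),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as store:
        # Recreate part of the example tree with empty placeholder files.
        for entry in ["example-service.com/+webmail/@myuser",
                      "my-linux-box/@root"]:
            path = os.path.join(store, *entry.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()
        for entry in list_entries(store):
            print(entry, describe(entry))
```

Because the convention is not enforced, `describe()` simply returns an empty group list or `None` for entries that don't use the prefixes.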

Use pip install sala to install, or download the source, unpack it, and invoke python setup.py install. In addition to Python 2.5 or newer, sala requires gpg and GnuPGInterface. Python 3 is currently not supported because GnuPGInterface doesn't support it.

Update: While writing this blog entry, I found a few packaging bugs, and released version 1.0.1.