PyTables as a Computing Kernel and Compressed Container for Arbitrarily Large Datasets

With the addition of the tables.Expr class and the Blosc compressor in the 2.2 release, PyTables can be used as a very efficient computing kernel for both out-of-memory and in-memory computations, and becomes an excellent replacement for memmap-like solutions (such as the numpy.memmap module), with some important advantages.

First, tables.Expr combines the highly efficient Numexpr JIT with the great I/O capabilities of PyTables/HDF5 to evaluate complex expressions over homogeneous data containers in PyTables (like Array, CArray, EArray or Column) instead of the usual NumPy array objects. Second, PyTables offers enhanced compressed containers via Blosc that reduce memory consumption while barely affecting computing performance.
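For instance, here is a minimal sketch of how the pieces fit together (PyTables 3.x method names are used; the 2.x series spelled them in camelCase, e.g. openFile instead of open_file; the file name and array size are arbitrary choices for illustration):

    import numpy as np
    import tables as tb

    # Create a Blosc-compressed CArray holding one operand on disk.
    f = tb.open_file("operands.h5", mode="w")
    filters = tb.Filters(complevel=1, complib="blosc")
    a = f.create_carray(f.root, "a", atom=tb.Float64Atom(),
                        shape=(1_000_000,), filters=filters)
    a[:] = np.random.rand(1_000_000)

    # tables.Expr evaluates the expression blockwise with Numexpr;
    # eval() gathers the result into a regular NumPy array in memory.
    result = tb.Expr("2*a**2 + 3*a - 1").eval()

    f.close()

Note that tables.Expr picks up the operand a from the surrounding namespace, just as numexpr.evaluate() does with NumPy arrays.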

This combination is so powerful that the evaluation of expressions is generally faster (and sometimes much faster) than a numpy.memmap approach for out-of-memory computations. Even when datasets are small enough to fit easily in memory, tables.Expr can be faster than a pure NumPy computation. This is because PyTables performs I/O for operands and results (using compressed containers) against the OS filesystem cache (i.e. in memory), not directly to disk. Also, when using compressed containers, the I/O to the filesystem cache is significantly reduced, compensating for the additional compression/decompression time.

Example: Computing a Polynomial

A comparison between NumPy and tables.Expr computation paradigms.

To show how effective this combination can be, consider the problem of evaluating a given polynomial (".25*x**3 + .75*x**2 - 1.5*x - 2") over a certain range ([-1, 1]) with some granularity (2e-7) on the x axis. The working set for this is around 160 MB, so everything can be done in memory on a reasonably modern computer. We are going to perform this computation using pure numpy, numpy.memmap and tables.Expr methods (a PyTables version newer than 2.2b3 is required). The results discussed below were obtained on an Intel Core2 E8400 @ 3GHz processor; a sketch of the tables.Expr version appears right after this paragraph.
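The tables.Expr part of such a benchmark could look as follows. This is a sketch, not the original script: the file name and chunk size are illustrative, while N follows from the stated range and granularity (2 / 2e-7 = 10**7 points, i.e. two 80 MB float64 arrays):

    import numpy as np
    import tables as tb

    N = 10**7                    # points in [-1, 1] at 2e-7 granularity
    f = tb.open_file("poly.h5", mode="w")
    filters = tb.Filters(complevel=1, complib="blosc")

    # Operand: x sampled over [-1, 1], filled in chunks to bound memory use.
    x = f.create_carray(f.root, "x", atom=tb.Float64Atom(),
                        shape=(N,), filters=filters)
    chunk = 10**6
    for i in range(0, N, chunk):
        stop = min(i + chunk, N)
        x[i:stop] = -1.0 + 2e-7 * np.arange(i, stop)

    # Output container, also compressed.
    r = f.create_carray(f.root, "r", atom=tb.Float64Atom(),
                        shape=(N,), filters=filters)

    # Evaluate the polynomial blockwise, streaming results into r.
    expr = tb.Expr(".25*x**3 + .75*x**2 - 1.5*x - 2")
    expr.set_output(r)
    expr.eval()

    f.close()

The numpy.memmap variant would allocate x and r as memory-mapped arrays instead, and the pure NumPy one would simply build both arrays in memory.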

These results show that tables.Expr is about 10x faster than numpy.memmap and more than 5x faster than the pure numpy approach. This is basically due to the combination of Numexpr and the high I/O capabilities of PyTables. Secondly, it can be noted that, when using the Blosc compressor at compression level 1 (i.e. the fastest), tables.Expr performs computations at approximately the same speed (barely 5% slower) as in the uncompressed case. Higher compression levels cause a perceptible slowdown in computation time (around 70%), but the compression ratio improves considerably.

Pre-existing compressors achieve better compression ratios (especially zlib and bzip2), but they slow down these in-memory computations considerably. Even LZO, which was one of the fastest compressors available before Blosc was introduced, slows down computations by about 170%.
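Switching compressors is only a matter of the Filters specification passed when the containers are created. A minimal sketch (the variable names are illustrative; "zlib", "lzo", "bzip2" and "blosc" are the standard complib values registered in PyTables):

    import tables as tb

    # Fast, computation-friendly setting used in the benchmark above.
    blosc1 = tb.Filters(complevel=1, complib="blosc")

    # Higher Blosc levels: better ratio, perceptibly slower computations.
    blosc9 = tb.Filters(complevel=9, complib="blosc")

    # Classic compressors: best ratios, but much slower for in-memory work.
    zlib9 = tb.Filters(complevel=9, complib="zlib")
    bzip2_9 = tb.Filters(complevel=9, complib="bzip2")
    lzo1 = tb.Filters(complevel=1, complib="lzo")

    # The chosen Filters instance is passed at container creation time, e.g.:
    # f.create_carray(f.root, "x", atom=tb.Float64Atom(), shape=(N,),
    #                 filters=blosc1)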

The Out-Of-Core Computation Paradigm Is Here To Stay

This new paradigm is a great achievement in the computing world, and its underlying techniques (making efficient use of memory hierarchies and using compression to reduce I/O to the memory subsystem) will probably be used extensively in the future for performing computations while reducing memory usage.

Another advantage of this approach is that it can perform computations whether or not the datasets fit in memory: since it uses an out-of-core paradigm, all the memory layers in modern computers (L1, L2 and L3 caches, RAM and SSD disks) can be effectively used as caches during data manipulation. So you won't need to implement different versions of your code depending on whether your data fits in memory or not; just write a single version using this out-of-core paradigm and you are done (unless you have to deal with very small datasets that fit in your processor caches, where NumPy will still probably deliver better performance).

Now, do you still think that using out-of-core computations and compression would slow down your computations? Think twice!
