PyTables Pro Professional Edition
PyTables Professional Edition (PyTables Pro) is a commercial, much enhanced version of the free, open source PyTables Standard Edition. In brief, PyTables Pro is designed to get the most out of the hardware behind it, allowing you to perform complex data analysis on datasets that are typically larger (and sometimes much larger) than your available memory. With typical usage and similar resources, PyTables Pro can cope with tables up to 5x larger than traditional databases would allow.
Main features
It comes with OPSI included
OPSI is a powerful and innovative indexing engine. The technology behind this engine allows PyTables Pro to perform really fast queries on arbitrarily large tables. Moreover, the cost of the indexing process is quite low (much lower than in traditional relational databases) and it grows linearly with the length of the table, no matter how large it is. The indexing code also leverages the vectorization capabilities of the NumPy and Numexpr packages to ensure really short indexing and search times.
By using OPSI it is possible to complete a query on a table on the order of ten thousand million (10,000,000,000) rows in no more than a few hundredths of a second (i.e. a little more than the time taken by a single disk seek). Moreover, it can index columns in times typically 10x shorter than other databases, while reducing the size of indexes by between 3x and 15x (depending on the desired optimization level).
The main reason behind this performance and compactness is that OPSI is geared towards tables that are mostly used for read-only or append-only purposes, and this is the scenario where it absolutely shines. If the user needs to frequently update or delete values in indexed columns, then OPSI takes much more time than other solutions to keep its indexes synchronized. So, for situations that don't require fast updates or deletions, OPSI is probably one of the best indexing engines available.
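As a rough picture of why a sorted index makes range queries fast, here is a minimal sketch in plain Python. It is not OPSI's implementation, which this page does not describe; the function names (build_index, range_query) and the sample data are hypothetical and chosen only for illustration. The idea shown is the generic one: once the column values are kept in sorted order next to their row numbers, a range query needs two binary searches instead of a scan of the whole table.

```python
import bisect


def build_index(column):
    """Return (sorted_values, row_numbers) for a list of column values.

    Illustrative only: a generic sorted index, not OPSI's on-disk layout.
    """
    order = sorted(range(len(column)), key=column.__getitem__)
    return [column[i] for i in order], order


def range_query(index, low, high):
    """Return the row numbers whose value v satisfies low <= v < high."""
    values, rows = index
    start = bisect.bisect_left(values, low)   # first value >= low
    stop = bisect.bisect_left(values, high)   # first value >= high
    return sorted(rows[start:stop])


# Hypothetical column of six values; row numbers are list positions.
temperatures = [21, 35, 18, 27, 35, 12]
index = build_index(temperatures)
hot_rows = range_query(index, 25, 40)
print(hot_rows)
```

Keeping such a structure sorted is cheap when it is built once or extended in batches, but every in-place update or deletion forces entries to move, which is consistent with the trade-off described above.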
For more information about the operational details and benchmarks of this innovative indexing engine, see the OPSI White Paper.
Improved cache implementation
PyTables Pro leverages a fine-tuned LRU cache, coded in Pyrex, for both metadata (nodes) and regular data. It lets you achieve maximum speed for intensive object tree browsing and for repetitive patterns of data reads and queries. It complements the already efficient cache present in HDF5 by discovering repetition patterns in high-level structures that are specific to PyTables Pro and that HDF5 cannot catch as efficiently.
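For readers unfamiliar with the term, the following is a minimal sketch of a least-recently-used (LRU) cache in plain Python. It only illustrates the eviction policy; it is not the Pyrex implementation shipped with PyTables Pro, and the LRUCache class and the load_node function are hypothetical names invented for this example.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache (illustrative only)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, key, loader):
        """Return the cached value for key, calling loader(key) on a miss."""
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = loader(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used
        return value


def load_node(path):
    # Hypothetical stand-in for an expensive metadata read from disk.
    return {"path": path}


cache = LRUCache(capacity=2)
for path in ["/a", "/b", "/a", "/c", "/b"]:
    cache.get(path, load_node)
print(cache.hits, cache.misses)
```

With a capacity of two, the repeated access to "/a" is served from the cache, while "/b" has already been evicted by the time it is requested again.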
Professional installers
An all-in-one PyTables Pro installer for Windows is provided, so the user only has to download and run it to get the job done. All the software prerequisites (except Python itself) are included in this package, minimizing the risk of installing wrong versions on the user side. Although all-in-one installers are not available for other platforms, PyTables Pro can still be quickly deployed there by using distutils.
Meant for production
More than 50,000 carefully designed unit tests (in version 2.1) check every detail and feature of PyTables Pro. In addition, for every new version of the product, all the tests are verified to pass on the most common platforms (Windows, Mac OS X, Linux 32-bit, Linux 64-bit). This way, you can relax and concentrate your efforts on solving your own problems.
What is new in forthcoming PyTables Pro 2.1
For release 2.1, a series of new and exciting improvements to OPSI have been worked out. This release is currently undergoing its final cycles and is scheduled for sometime during 2008. Among the main improvements you will find:
New light indexes that can take up to 4x less space than 2.0 indexes, and more than 15x less space than indexes in traditional databases. Four levels of index "lightness", namely ultralight, light, medium and full (the latter being the one implemented in version 2.0), are available so that the user can choose the most appropriate one for her needs.
The index query code has been completely revamped and is now based on the concept of chunkmaps. This allows for a much more effective way to retrieve table data in queries that have low selectivity, while retaining good performance for high selectivity ones.
A new query optimizer that can use several indexes simultaneously in a broad range of complex queries. For example, in the query:
(((c_int32 == 3) | (c_bool == True)) & (c_int32 == 5)) & (c_extra > 0)
if the c_int32 and c_bool columns are indexed but c_extra is not, both the c_int32 and c_bool indexes will be used. This can greatly improve the response times of potentially complicated queries.
Last but not least, an additional optimization in the index creation process makes it possible to build completely sorted indexes, which not only gives better performance in queries but also allows creating completely sorted tables ordered by a specific field.
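To make the multi-index query above more concrete, here is a small sketch in plain Python of how two indexes can narrow down the candidate rows before the non-indexed condition is checked. It is only an assumption-laden illustration of the general idea, not the PyTables Pro optimizer: the sample rows, the dictionary-based indexes and the helper names (build_index, lookup) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical table: each row is a dict, the row number is its list position.
rows = [
    {"c_int32": 5, "c_bool": True, "c_extra": 2},
    {"c_int32": 5, "c_bool": False, "c_extra": 7},
    {"c_int32": 3, "c_bool": True, "c_extra": 1},
    {"c_int32": 5, "c_bool": True, "c_extra": 0},
    {"c_int32": 5, "c_bool": True, "c_extra": 4},
]


def build_index(rows, column):
    """Map each value of the column to the set of row numbers holding it."""
    index = defaultdict(set)
    for rownum, row in enumerate(rows):
        index[row[column]].add(rownum)
    return index


def lookup(index, value):
    """Row numbers for value, or an empty set if the value never occurs."""
    return index.get(value, set())


# c_int32 and c_bool are indexed; c_extra is deliberately left without an index.
int32_index = build_index(rows, "c_int32")
bool_index = build_index(rows, "c_bool")

# ((c_int32 == 3) | (c_bool == True)) & (c_int32 == 5), using only the indexes.
candidates = (lookup(int32_index, 3) | lookup(bool_index, True)) & lookup(int32_index, 5)

# (c_extra > 0) is then checked only on the surviving candidate rows.
hits = sorted(r for r in candidates if rows[r]["c_extra"] > 0)

# Reference answer from a full scan, to confirm both approaches agree.
full_scan = [
    r for r, row in enumerate(rows)
    if ((row["c_int32"] == 3) or row["c_bool"]) and row["c_int32"] == 5 and row["c_extra"] > 0
]
print(hits, full_scan)
```

The point of the sketch is that the non-indexed column only has to be read for the rows that the indexed conditions could not rule out.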
Just to whet your appetite, click on the attached plots to see the kind of improvement that you can expect over the 2.0 release. In particular, note the high performance that can now be achieved in low selectivity scenarios (i.e. queries with a large number of hits). You can get more detailed information about these powerful new developments in the informal talk that I gave at The HDF Group headquarters in Urbana-Champaign (Illinois) back in 2007.
Getting PyTables Pro
Please go to the pricing page for PyTables Pro and, if it fits your budget, follow the instructions there. Needless to say, by acquiring a PyTables Pro license, you are not only making Francesc Alted (the main person responsible for the beast) happier but also helping to secure the future of the PyTables project.
You can download the evaluation version if you want to check first that PyTables Pro actually meets your expectations. Please be sure to read the evaluation license before using this version.




