What is PyTables?

Example of hierarchically structured datasets.

PyTables is a package for managing hierarchical datasets, designed to cope efficiently and easily with extremely large amounts of data. You can download PyTables and use it for free. Documentation, online examples and presentations are available in the HowToUse section.

PyTables is built on top of the HDF5 library, using the Python language and the NumPy package (it also supports numarray and Numeric out of the box). It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code (generated using Pyrex), makes it a fast yet extremely easy-to-use tool for interactively processing and searching very large amounts of data. One important feature of PyTables is that it optimizes memory and disk resources so that data takes much less space (especially if on-the-fly compression is used) than in other solutions such as relational or object-oriented databases.
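As a rough illustration of the object-oriented interface, on-the-fly compression and searching described above, here is a minimal sketch. It assumes the snake_case method names of PyTables 3.x (`open_file`, `create_table`); older releases spelled them in camelCase (`openFile`, `createTable`). The `Reading` record layout, the file name and the query condition are made up for the example.

```python
import os
import tempfile

import tables


class Reading(tables.IsDescription):
    # Hypothetical record layout, for illustration only.
    sensor_id = tables.Int32Col()
    value = tables.Float64Col()


# Write to a scratch directory so the example leaves no files behind in the cwd.
path = os.path.join(tempfile.mkdtemp(), "readings.h5")

# Ask HDF5 to compress the data with zlib as it is written.
filters = tables.Filters(complevel=5, complib="zlib")

with tables.open_file(path, mode="w", title="Example file") as h5:
    table = h5.create_table("/", "readings", Reading,
                            title="Sensor readings", filters=filters)

    # Fill the table one row at a time through the Row accessor.
    row = table.row
    for i in range(1000):
        row["sensor_id"] = i % 10
        row["value"] = i * 0.5
        row.append()
    table.flush()

    # Search the table with a condition string instead of a Python loop
    # over every row.
    selected = [r["value"]
                for r in table.where("(sensor_id == 3) & (value > 400)")]

print(len(selected), "matching rows")
```

The condition is passed as a string so that PyTables can evaluate it while reading the data in chunks, rather than first loading the whole table into memory.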

You can have a look at the MainFeatures of PyTables, and find more information in the PyTables FAQ.

PyTables is developed, maintained and supported by Francesc Alted, with contributions from Ivan Vilata and the community.

Where can PyTables be used?

A view of some table objects.

PyTables can be used in any scenario where you need to save and retrieve large amounts of data and provide metadata (that is, data about the actual data) for it. Whether you want to work with large (potentially multidimensional) datasets, save and structure your NumPy datasets, or just provide a categorized structure for some portions of your cluttered RDBMS, give PyTables a try. It works well for storing data from data acquisition systems, sensors in geosciences, simulation software and network data monitoring systems, or as a centralized repository for system logs, to name only a few possible uses.
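Saving a NumPy dataset together with its metadata might look like the sketch below. It assumes the PyTables 3.x method names (`open_file`, `create_array`); the node name `temperature` and the attribute names and values are hypothetical.

```python
import os
import tempfile

import numpy as np
import tables

path = os.path.join(tempfile.mkdtemp(), "metadata.h5")

# A small two-dimensional NumPy dataset to store.
data = np.arange(12, dtype=np.float64).reshape(3, 4)

with tables.open_file(path, mode="w") as h5:
    arr = h5.create_array("/", "temperature", data, title="Temperature grid")
    # Attach metadata (data about the data) as attributes of the node.
    arr.attrs.units = "kelvin"
    arr.attrs.instrument = "hypothetical-sensor-01"

# Reopen the file read-only and get back both the data and its metadata.
with tables.open_file(path, mode="r") as h5:
    node = h5.root.temperature
    units = node.attrs.units
    restored = node.read()

print(units, restored.shape)
```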

However, it is important to emphasize that PyTables is not designed to compete with relational databases, but rather to work alongside them. For example, if you have very large tables in your existing relational database, you can move those tables to PyTables to reduce the load on that database while still keeping those huge tables efficiently on disk.
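One possible way to move a table out of a relational database is sketched below, using an in-memory SQLite database from the Python standard library as a stand-in for the existing RDBMS. The `logs` table, its columns and the `LogEntry` description are invented for the example, and the method names assume PyTables 3.x.

```python
import os
import sqlite3
import tempfile

import tables


class LogEntry(tables.IsDescription):
    # Hypothetical layout mirroring the columns of the SQL table below.
    entry_id = tables.Int64Col()
    level = tables.StringCol(8)
    latency_ms = tables.Float64Col()


# Stand-in for the existing relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (entry_id INTEGER, level TEXT, latency_ms REAL)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?)",
    [(i, "INFO" if i % 2 else "WARN", i * 1.5) for i in range(100)],
)

path = os.path.join(tempfile.mkdtemp(), "logs.h5")

with tables.open_file(path, mode="w") as h5:
    table = h5.create_table("/", "logs", LogEntry, expectedrows=100)

    # Stream the rows out of the database and append them to the table.
    row = table.row
    query = "SELECT entry_id, level, latency_ms FROM logs"
    for entry_id, level, latency_ms in conn.execute(query):
        row["entry_id"] = entry_id
        row["level"] = level.encode("ascii")  # StringCol stores bytes
        row["latency_ms"] = latency_ms
        row.append()
    table.flush()

    copied = table.nrows
    # The moved data can still be searched from PyTables.
    slow = [r["entry_id"] for r in table.where("latency_ms > 100")]

conn.close()
print(copied, "rows copied,", len(slow), "slow entries")
```

Once the rows have been copied and checked, the original table could be dropped from the database; that step is left out here.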

Finally, remember that PyTables is Open Source software, so you are free to adapt it to your own needs, and thanks to its liberal BSD license you can include it in any software you like (even commercial software). Users requiring extreme speed and optimal use of resources should consider getting a license for PyTables Professional Edition, its commercial counterpart; by doing so you will be helping to ensure a long life for the project.

Design goals

PyTables has been designed to fulfill the following requirements:

  1. Allow you to structure your data in a hierarchical way.

  2. Be easy to use. It implements the NaturalNaming scheme for convenient access to the data.

  3. Allow every cell in a dataset to be a multidimensional entity.

  4. The speed of most I/O operations should be limited only by the underlying I/O subsystem.

  5. Enable the end user to save large datasets efficiently, i.e. each byte of data on disk should take up one byte, plus a small overhead, when loaded in memory.
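The first three goals can be sketched in a few lines: a group provides the hierarchy, attribute-style access illustrates NaturalNaming, and a column with a `shape` makes each cell multidimensional. The group, table and column names are hypothetical, and the method names assume PyTables 3.x.

```python
import os
import tempfile

import numpy as np
import tables


class Frame(tables.IsDescription):
    # Hypothetical layout: every cell of the "pixels" column is a 2x3 array.
    frame_id = tables.Int32Col()
    pixels = tables.Float32Col(shape=(2, 3))


path = os.path.join(tempfile.mkdtemp(), "goals.h5")

with tables.open_file(path, mode="w") as h5:
    # Goal 1: hierarchical structure, with a table living inside a group.
    group = h5.create_group("/", "detector", title="Detector data")
    table = h5.create_table(group, "frames", Frame)

    row = table.row
    for i in range(4):
        row["frame_id"] = i
        row["pixels"] = np.full((2, 3), i, dtype=np.float32)
        row.append()
    table.flush()

    # Goal 2: NaturalNaming, where nodes are reached as Python attributes.
    frames = h5.root.detector.frames
    nrows = frames.nrows

    # Goal 3: a single cell is itself a multidimensional array.
    pixels = frames.cols.pixels[2]

    # List the paths of all nodes in the hierarchy.
    paths = [node._v_pathname for node in h5.walk_nodes("/")]

print(nrows, pixels.shape, paths)
```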

PyTables (last edited 2008-09-05 17:52:14 by FrancescAlted)