PyTables main features

PyTables has several characteristics that, taken together, set it apart from other tools:

* Ease of use

You can save native data containers from the NumPy package in a straightforward manner. Some regular Python containers (mainly lists and tuples) are also supported in a transparent way.

* Supports a hierarchical data model
It lets you give a clear structure to all your data.
* Natural naming support

PyTables builds up an object tree in memory that replicates the underlying file data structure. Access to the datasets is achieved by walking through and manipulating the attributes of the objects in the tree. See the NaturalNaming section for more info.

* Support for table entities
It allows you to create tables by merely defining a simple class that describes the record fields. You can then save large numbers of rows and search them using extremely fast iterators. You can also tailor your data by adding or deleting records in your tables.
* Indexing support for columns of tables
Very useful if you have large tables and want to quickly look up values in columns that satisfy some criteria.
* Multidimensional and nested table cells
You can declare a column to consist of multidimensional cells as well as scalar cells, which is the only dimensionality allowed by most relational databases. You can even declare columns that are made of other columns (of different types), which is known as nested types.
* Flexible data containers

Not only tables (Table object) but also homogeneous data containers (arrays) are supported for greater flexibility. From classes for quick-and-dirty manipulation of datasets (Array) to others that allow compression (CArray), enlargeability (EArray) or rows of variable length (VLArray), PyTables offers the appropriate container for your storage needs.

* User defined metadata
Besides supporting system metadata (number of rows of a table, shape, flavor, ...), PyTables lets you specify your own metadata (for example, the room temperature, or the protocol of the IP traffic that was collected) to complement the meaning of your actual data.
* Unlimited dataset size
Allows working with tables and/or arrays with a very large number of rows (up to 2**63), that is, with datasets that do not fit in memory.
* On-line data compression

It supports data compression (through the zlib, Blosc, LZO and bzip2 libraries) out of the box. This becomes important when you have repetitive data patterns.

* High performance I/O
On modern systems, and for large amounts of data, table and array objects can be read and written at a speed limited only by the performance of the underlying I/O subsystem (either disk or memory). Moreover, if your data is compressible, even this limit can be surpassed, because fewer bytes have to travel through the I/O subsystem.
* Support for files bigger than 2 GB

You won't be limited if you want to deal with very large datasets. In fact, PyTables supports full 64-bit file addressing even on 32-bit platforms (provided that the underlying filesystem does so too, of course).

* Architecture-independent

PyTables has been carefully coded (like HDF5 itself) with little-endian/big-endian byte ordering issues in mind. So you can write a file on a big-endian machine (like a PowerPC, Sparc or MIPS) and read it on a little-endian one (like Intel or Alpha) without problems.

* Portability

PyTables has been ported to many platforms, namely GNU/Linux, Windows, MacOSX, FreeBSD, Solaris and IRIX, and it probably works on many more. Moreover, it also runs just fine on 64-bit platforms (like AMD64, Intel64, UltraSparc or MIPS RXX000 processors).
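
Several of the features above (hierarchical layout, natural naming, tables described by a class, multidimensional cells, column indexing, user metadata and compression) can be illustrated together. What follows is only a minimal sketch, assuming the snake_case API of PyTables 3.x (`open_file`, `create_table` and so on; older releases used different spellings) and NumPy. The names `Reading`, `lab`, `readings`, `grid`, `room_temperature`, `build_demo_file` and `query_demo_file` are hypothetical and chosen just for this example.

```python
import numpy as np
import tables


class Reading(tables.IsDescription):
    """Record layout: one class attribute per table column."""
    sensor = tables.StringCol(8)           # fixed-width byte string
    step = tables.Int32Col()               # scalar cell
    xyz = tables.Float64Col(shape=(3,))    # multidimensional cell


def build_demo_file(path):
    """Write a small file with one group, one table and one compressed array."""
    zlib = tables.Filters(complevel=5, complib="zlib")
    with tables.open_file(path, mode="w", title="Demo") as h5:
        # Hierarchical data model: a group under the root node.
        group = h5.create_group("/", "lab", "Hypothetical group")

        # Table entity built from the description class above.
        table = h5.create_table(group, "readings", Reading, filters=zlib)
        row = table.row
        for i in range(100):
            row["sensor"] = b"a" if i % 2 == 0 else b"b"
            row["step"] = i
            row["xyz"] = np.array([i, 2 * i, 3 * i], dtype="float64")
            row.append()
        table.flush()

        # Indexing support for a column of the table.
        table.cols.step.create_index()

        # User-defined metadata stored next to the data.
        table.attrs.room_temperature = 21.5

        # Homogeneous, compressible container.
        h5.create_carray(group, "grid", obj=np.zeros((50, 50)), filters=zlib)


def query_demo_file(path):
    """Read the file back using natural naming and an in-kernel query."""
    with tables.open_file(path, mode="r") as h5:
        table = h5.root.lab.readings       # natural naming: walk the tree
        steps = [int(r["step"]) for r in table.where("step >= 95")]
        temperature = float(table.attrs.room_temperature)
        grid_shape = h5.root.lab.grid.shape
    return steps, temperature, grid_shape
```

Calling `build_demo_file("demo.h5")` and then `query_demo_file("demo.h5")` returns the matching `step` values, the stored attribute and the shape of the compressed array. The zlib filter is used here only because it is commonly available; the other compressors mentioned above could be selected through the same `Filters` object if the installed build includes them.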

MainFeatures (last edited 2010-06-16 07:42:40 by FrancescAlted)