Release notes for PyTables 3.0 series

Author:

PyTables Developers

Contact:

pytables@googlemail.com

Changes from 2.4 to 3.0

New features

  • As of this release, PyTables fully supports Python 3 (closes gh-188).

  • The entire code base is now more compliant with the coding style guidelines described in PEP8 (closes gh-103 and gh-224). See API changes for more details.

  • Basic support for HDF5 drivers. Now it is possible to open/create an HDF5 file using one of the SEC2, DIRECT, LOG, WINDOWS, STDIO or CORE drivers. Users can also set the main driver parameters (closes gh-166). Thanks to Michal Slonina.

  • Basic support for in-memory image files. An HDF5 file can be set from or copied into a memory buffer (thanks to Michal Slonina). This feature is only available if PyTables is built against HDF5 1.8.9 or newer. Closes gh-165 and gh-173.

  • New File.get_filesize() method for retrieving the HDF5 file size.

  • Implemented methods to get/set the user block size in an HDF5 file (closes gh-123)

  • Improved support for PyInstaller. Now it is easier to pack frozen applications that use the PyTables package (closes: gh-177). Thanks to Stuart Mentzer and Christoph Gohlke.

  • All read methods now have an optional out argument that allows passing a pre-allocated array in which to store the data (closes gh-192)

  • Added support for the floating point data types with extended precision (Float96, Float128, Complex192 and Complex256). This feature is only available if numpy provides it as well. Closes gh-51 and gh-214. Many thanks to Andrea Bedini.

  • Consistent create_xxx() signatures. It is now possible to create all data sets (Array, CArray, EArray, VLArray, and Table) from existing Python objects (closes gh-61 and gh-249). See also the API changes section.

  • Complete rewrite of the nodes.filenode module. Now it is fully compliant with the interfaces defined in the standard io module. Only non-buffered binary I/O is supported currently. See also the API changes section. Closes gh-244.

  • A new pt2to3 tool is provided to help users port their applications to the new API (see API changes section).
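As a rough illustration of the driver and in-memory image features listed above, the following sketch creates a file with the CORE driver, copies it into a memory buffer and reopens it from that buffer. It assumes PyTables 3.x built against HDF5 1.8.9 or newer and an installed numpy; the file names and the node name "values" are placeholders chosen for this example only.

```python
import numpy as np
import tables

# Create the file entirely in memory. With driver_core_backing_store=0
# nothing is written to disk when the file is closed.
h5file = tables.open_file("sketch.h5", mode="w", driver="H5FD_CORE",
                          driver_core_backing_store=0)
h5file.create_array("/", "values", obj=np.arange(5))
h5file.flush()

# Copy the whole HDF5 file into a memory buffer.
image = h5file.get_file_image()
h5file.close()

# Open a second in-memory file initialised from that buffer.
h5copy = tables.open_file("sketch-copy.h5", mode="r", driver="H5FD_CORE",
                          driver_core_image=image,
                          driver_core_backing_store=0)
restored = h5copy.root.values.read()
h5copy.close()

print(restored)
```

Because the backing store is disabled in both calls, neither file name has to exist on disk; the names only identify the files inside the running process.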

Improvements

  • Improved runtime checks on dynamic loading of libraries: meaningful error messages are generated in case of failure. Also, PyTables no longer alters the system PATH. Closes gh-178 and gh-179 (thanks to Christoph Gohlke).

  • Improved list of search paths for libraries as suggested by Nicholaus Halecky (see gh-219).

  • Removed deprecated Cython include (.pxi) files. Contents of convtypetables.pxi have been moved into utilsextension.pyx. Closes gh-217.

  • The internal Blosc library has been upgraded to version 1.2.3.

  • Pre-load the bzip2 library on Windows (closes gh-205)

  • The File.get_node() method now accepts unicode paths (closes gh-203)

  • Improved compatibility with Cython 0.19 (see gh-220 and gh-221)

  • Improved compatibility with numexpr 2.1 (see also gh-199 and gh-241)

  • Improved compatibility with development versions of numpy (see gh-193)

  • Packaging: as of this release the standard tarball package no longer includes the PDF version of the “PyTables User Guide”, so it is slightly smaller. The complete, pre-built documentation in both HTML and PDF format is available in the file download area on SourceForge.net. Closes: gh-172.

  • PyTables now also uses Travis-CI as its continuous integration service. All branches and all pull requests are automatically tested with different Python versions. Closes gh-212.

Other changes

  • PyTables now requires Python 2.6 or newer.

  • Minimum supported version of Numexpr is now 2.0.

API changes

The entire PyTables API has been made more PEP8 compliant (see gh-224).

This means that many methods, attributes, module global variables and keyword parameters have been renamed to comply with the PEP8 style guidelines (e.g. the tables.hdf5Version constant has been renamed to tables.hdf5_version).

We have made our best effort to maintain compatibility with the old API for existing applications. In most cases, the old 2.x API is still available and usable even though it is now deprecated (see the Deprecations section).

The only important backwards incompatible API changes concern the names of function and method arguments. All uses of keyword arguments should be checked and fixed to use the new naming convention.

The new pt2to3 tool can be used to port PyTables based applications to the new API.

Many deprecated features and support for obsolete modules have been dropped:

  • The deprecated is_pro module constant has been removed

  • The nra module and support for the obsolete numarray module have been removed. The numarray flavor is no longer supported either (closes gh-107).

  • Support for the obsolete Numeric module has been removed. The numeric flavor is no longer available (closes gh-108).

  • The tables.netcdf3 module has been removed (closes gh-68).

  • The deprecated exceptions.Incompat16Warning exception has been removed

  • The File.create_external_link() method no longer has a keyword parameter named warn16incompat. It was deprecated in PyTables 2.4.

Moreover:

  • The File.create_array(), File.create_carray(), File.create_earray(), File.create_vlarray(), and File.create_table() methods of the File objects gained a new (optional) keyword argument named obj. It can be used to initialize the newly created dataset with an existing Python object, though normally these are numpy arrays.

    The atom/descriptor and shape parameters are now optional if the obj argument is provided.

  • The nodes.filenode has been completely rewritten to be fully compliant with the interfaces defined in the io module.

    The FileNode classes currently implemented are intended for binary I/O.

    Main changes:

    • the FileNode base class is no longer available,

    • the new versions of the nodes.filenode.ROFileNode and nodes.filenode.RAFileNode objects no longer expose the offset attribute (the seek and tell methods can be used instead),

    • the lineSeparator property is no longer available and the \n character is always used as line separator.

  • The __version__ module constant has been removed from almost all modules (it was not used after the switch to Git). Of course the package level constant (tables.__version__) still remains. Closes gh-112.

  • The lrange() function has been dropped in favor of xrange (gh-181)

  • The parameters.MAX_THREADS configuration parameter has been dropped in favor of parameters.MAX_BLOSC_THREADS and parameters.MAX_NUMEXPR_THREADS (closes gh-147).

  • The conditions.compile_condition() function no longer has a copycols argument; it has not been necessary since Numexpr 1.3.1. Closes gh-117.

  • The expectedsizeinMB parameter of the File.create_vlarray() and VLArray.__init__() methods has been replaced by expectedrows. See also gh-35.

  • The Table.whereAppend() method has been renamed to Table.append_where() (closes gh-248).
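The obj keyword argument described in the list above can be sketched as follows. This is a minimal example, assuming PyTables 3.x and numpy are installed; the temporary path, the node names and the sample data are invented for the illustration, and VLArray is left out for brevity.

```python
import os
import tempfile

import numpy as np
import tables

# Placeholder location for the example file.
path = os.path.join(tempfile.mkdtemp(), "obj_sketch.h5")

data = np.arange(12, dtype="int32").reshape(3, 4)
records = np.array([(1, 2.5), (2, 3.5)],
                   dtype=[("id", "i4"), ("value", "f8")])

with tables.open_file(path, mode="w") as h5file:
    # No atom or shape is passed: both are taken from ``obj``.
    array = h5file.create_array("/", "array", obj=data)
    carray = h5file.create_carray("/", "carray", obj=data)
    earray = h5file.create_earray("/", "earray", obj=data)
    # For tables the description is derived from the structured dtype.
    table = h5file.create_table("/", "table", obj=records)

    # Collect the results while the file is still open.
    shapes = (array.shape, carray.shape, earray.shape)
    nrows = table.nrows
    first_row = carray[0].tolist()

print(shapes, nrows, first_row)
```

In each call the dataset is both created and filled in one step, which is why the atom/descriptor and shape arguments can be omitted.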

Please refer to the Migrating from PyTables 2.x to 3.x document for more details about API changes and for some useful hints about the migration process from the 2.x API to the new one.

Other possibly incompatible changes

  • All methods of the Table class that take start, stop and step parameters (including Table.read(), Table.where(), Table.iterrows(), etc.) have been redesigned to behave consistently. The start, stop and step parameters and their default values now always work exactly as in standard slice objects. Closes gh-44 and gh-255.

  • Unicode attributes are no longer stored in the HDF5 file as pickled strings. They are now saved in the HDF5 file as UTF-8 encoded strings.

    Although this does not introduce any API breakage, files produced are different (for unicode attributes) from the ones produced by earlier versions of PyTables.

  • System attributes are now stored in the HDF5 file using the character set that reflects the native string behaviour: ASCII for Python 2 and UTF-8 for Python 3. In any case, system attributes are represented as Python strings.

  • The iterrows() method of *Array and Table, as well as Table.itersorted(), now behave like functions in the standard itertools module. If the start parameter is provided and stop is None, the array/table is iterated from start to the last row. In PyTables < 3.0 only one element was returned.
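The slice-like semantics described in this section can be previewed with plain Python, without calling PyTables at all. The sketch below uses only the standard library; resolve_range is a hypothetical helper written for this illustration, not a PyTables function, and a sequence of 10 rows is assumed.

```python
from itertools import islice


def resolve_range(nrows, start=None, stop=None, step=None):
    """Hypothetical helper: row indices selected by slice(start, stop, step)."""
    return list(range(*slice(start, stop, step).indices(nrows)))


nrows = 10

# Only start is given: the selection runs up to the last row.
tail = resolve_range(nrows, start=7)

# Explicit start, stop and step, as in rows[1:8:3].
strided = resolve_range(nrows, 1, 8, 3)

# No arguments at all selects every row.
everything = resolve_range(nrows)

# itertools.islice gives the same selection for start=7, stop=None.
tail_islice = list(islice(range(nrows), 7, None))

print(tail, strided, tail_islice)
```

Code written for PyTables 2.x that relied on a lone start argument returning a single element would need an explicit stop of start + 1 to keep that behaviour.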

Deprecations

  • As described in API changes, all function, method and attribute names that were not compliant with the PEP8 guidelines have been changed. Old names are still available but they are deprecated.

  • The use of upper-case keyword arguments in the open_file() function and the File class initializer is now deprecated. All parameters defined in the tables/parameters.py module can still be passed as keyword arguments to the open_file() function, simply by using a lower-case version of the parameter name.
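A short sketch of the lower-case keyword convention follows. It assumes PyTables 3.x, uses NODE_CACHE_SLOTS as an example parameter from tables/parameters.py, and assumes that the effective settings can be inspected through the params dictionary of the File object; the path and the value 32 are arbitrary choices for the example.

```python
import os
import tempfile

import tables

# Placeholder location for the example file.
path = os.path.join(tempfile.mkdtemp(), "params_sketch.h5")

# Lower-case spelling of the NODE_CACHE_SLOTS parameter; the upper-case
# keyword form is the one deprecated by this release.
with tables.open_file(path, mode="w", node_cache_slots=32) as h5file:
    # The settings in effect are kept under their upper-case names.
    slots = h5file.params["NODE_CACHE_SLOTS"]

print(slots)
```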

Bugs fixed

  • Better check access on closed files (closes gh-62)

  • Fix for File.renameNode() where in certain cases File._g_updateLocation() was wrongly called (closes gh-208). Thanks to Michka Popoff.

  • Fixed ptdump failure on data with nested columns (closes gh-213). Thanks to Alexander Ford.

  • Fixed an error in open_file() when filename is a numpy.str_ (closes gh-204)

  • Fixed gh-119, gh-230 and gh-232, where an index on a Time64Col (only; Time32Col was ok) hid the data on selection from a Table. Thanks to Jeff Reback.

  • Fixed tables.tests.test_nestedtypes.ColsTestCase.test_00a_repr test method. Now the repr of cols on big-endian platforms is correctly handled (closes gh-237).

  • Fixed a bug with completely sorted indexes where nrowsinbuf must be equal to or greater than the chunksize (thanks to Thadeus Burgess). Closes gh-206 and gh-238.

  • Fixed an issue of the Table.itersorted() with reverse iteration (closes gh-252 and gh-253).

Enjoy data!

—The PyTables Developers