Changes from 3.1.0 to 3.1.1

Bugs fixed

  • Fixed a critical bug that caused an exception at import time. The error was triggered when the buggy long-double detection in the HDF5 library was identified (see gh-275) and numpy exposes neither float96 nor float128. Closes gh-344.

  • The internal Blosc library has been updated to version 1.3.5. This fixes a false buffer overrun condition that caused c-blosc to fail even though no actual overrun had occurred.

Changes from 3.0 to 3.1.0

New features

  • PyTables is now able to save and restore the default value of EnumAtom types (closes gh-234).

  • Implemented support for the H5FD_SPLIT driver (closes gh-288, gh-289 and gh-295). Many thanks to simleo.

  • New quantization filter: the filter truncates floating point data to a specified precision before writing to disk, which can significantly improve the performance of compressors (closes gh-261); see the sketch after this list. Thanks to Andreas Hilboll.

  • Added a new VLArray.get_row_size() method for querying the number of atoms in a VLArray row. Closes gh-24 and gh-315.

  • The internal Blosc library has been updated to version 1.3.2. All new features introduced in the Blosc 1.3.x series, and in particular the ability to leverage different compressors within Blosc (see the Blosc Release Notes), are now available in PyTables via the blosc filter (closes gh-324); see the example below. A big thank you to Francesc.
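
Both of these filter-related additions are configured through the tables.Filters constructor. The following is a minimal sketch, with file and node names made up for illustration: least_significant_digit enables the quantization filter, while the "blosc:" prefix in complib selects one of the compressors shipped with Blosc 1.3.x:

import numpy as np
import tables

# Keep roughly 3 decimal digits of precision (quantization) and
# compress the result with the LZ4 codec through Blosc.
filters = tables.Filters(complevel=5, complib='blosc:lz4',
                         least_significant_digit=3)

with tables.open_file('quantized_demo.h5', mode='w') as h5:
    data = np.random.random((1000, 1000))
    h5.create_carray('/', 'noise', obj=data, filters=filters)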

Improvements

  • The node caching mechanism has been completely redesigned to be simpler and less dependent on the specific behaviour of the __del__ method. PyTables is now compatible with the forthcoming Python 3.4. Closes gh-306.

  • PyTables no longer uses shared/cached file handles. This change somewhat improves support for concurrent reading, allowing the user to safely open the same file in different threads for reading (requires HDF5 >= 1.8.7). More details about this change can be found in the Backward incompatible changes section. See also gh-130, gh-129, gh-292 and gh-216.

  • PyTables is now able to detect and use external installations of the Blosc library (closes gh-104). If Blosc is not found on the system and the user does not specify a custom installation directory, an internal copy of the Blosc source code is used.

  • Automatically disable extended float support if a buggy version of HDF5 is detected (see also Issues with H5T_NATIVE_LDOUBLE). See also gh-275, gh-290 and gh-300.

  • Documented an unexpected behaviour with string literals in query conditions on Python 3 (closes gh-265); see the example after this list.

  • The deprecated getopt module has been dropped in favour of argparse in all command line utilities (closes gh-251).

  • Improved the installation section of the PyTables User’s Guide.

  • Enabled Travis-CI builds for Python 3.3.

  • Table.read_coordinates() now also works with boolean arrays as input (see the sketch after this list). Closes gh-287 and gh-298.

  • Improved compatibility with numpy >= 1.8 (see gh-259).

  • The code of the benchmark programs (bench directory) has been updated. Closes gh-114.

  • Fixed some warnings related to non-unicode file names (the Windows bytes API has been deprecated in Python 3.4).
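
The string-literal issue mentioned above (gh-265) stems from the fact that, on Python 3, the values stored in string columns are bytes, so a plain str literal in a condition may silently fail to match. Below is a minimal sketch of the pitfall and the bytes-literal workaround; the table layout and file name are made up for illustration:

import tables

class Particle(tables.IsDescription):
    name = tables.StringCol(16)   # string columns store bytes on disk

with tables.open_file('query_demo.h5', mode='w') as h5:
    table = h5.create_table('/', 'particles', Particle)
    row = table.row
    row['name'] = b'proton'
    row.append()
    table.flush()

    # On Python 3 the condition 'name == "proton"' compares the stored
    # bytes against a str and typically finds nothing; a bytes literal
    # matches as expected:
    names = [r['name'] for r in table.where('name == b"proton"')]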

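In the same spirit, here is a minimal sketch of the boolean input now accepted by Table.read_coordinates(); the file and node names are hypothetical and assume a table created as in the previous example:

import numpy as np
import tables

with tables.open_file('query_demo.h5', mode='r') as h5:
    table = h5.root.particles
    mask = np.zeros(table.nrows, dtype=bool)
    mask[::2] = True                      # select every other row
    rows = table.read_coordinates(mask)   # boolean arrays now accepted
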
Bugs fixed

  • Fixed detection of platforms supporting Blosc.

  • Fixed a crash that occurred when attempting to write a numpy array to an Atom (closes gh-209 and gh-296).

  • Prevent creation of a table with no columns (closes gh-18 and gh-299).

  • Fixed a memory leak that occurred when iterating over CArray/EArray objects (closes gh-308, see also gh-309). Many thanks to Alistair Muldal.

  • Made NaN values sort to the end. Closes gh-282 and gh-313.

  • Fixed selection on float columns when NaNs are present (closes gh-327 and gh-330).

  • Fixed the computation of the buffer size for iterations on rows. The buffer size was overestimated, resulting in a MemoryError in some cases. Closes gh-316. Thanks to bbudescu.

  • Better check of file open mode. Closes gh-318.

  • The Blosc filter now works correctly together with fletcher32. Closes gh-21.

  • Close the file handle before trying to delete the corresponding file. Fixes a test failure on Windows.

  • Use integer division for computing indices (fixes some warnings on Windows).

Deprecations

Following the plan for the complete transition to the new (PEP8 compliant) API, all calls to the old API will raise a DeprecationWarning.

The new API was introduced in PyTables 3.0 and is backward incompatible. In order to guarantee a smoother transition, the old API is still usable, even though it is now deprecated.

The plan for the complete transition to the new API is outlined in gh-224.
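
For instance, the old camelCase names still work but now emit a DeprecationWarning. A minimal sketch (the file name is made up for illustration):

import warnings
import tables

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    h5 = tables.openFile('demo.h5', 'w')   # old name for open_file()
    h5.close()

# At least one of the captured warnings is a DeprecationWarning.
assert any(issubclass(w.category, DeprecationWarning) for w in caught)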

Backward incompatible changes

In PyTables <= 3.0, file handles (the objects returned by the open_file() function) were stored in an internal registry and re-used when possible.

Two subsequent attempts to open the same file (with compatible open mode) returned the same file handle in PyTables <= 3.0:

In [1]: import tables
In [2]: print(tables.__version__)
3.0.0
In [3]: a = tables.open_file('test.h5', 'a')
In [4]: b = tables.open_file('test.h5', 'a')
In [5]: a is b
Out[5]: True

All this was an implementation detail: it happened under the hood, and the user had no control over the process.

This behaviour was considered a feature, since it can speed up the opening of files in the case of repeated opens, and it avoids potential problems related to multiple opens, a practice that the HDF5 developers recommend avoiding (see also the H5Fopen reference page).

The trick, of course, is that files are not opened multiple times at the HDF5 level; rather, a single open file is referenced several times.

The big drawback of this approach is that there is very little chance of using PyTables safely in a multi-threaded program. Several bug reports have been filed regarding this topic.

After long discussions about the possibility of actually achieving concurrent I/O, and about the patterns that should be used for I/O in concurrent programs, the PyTables developers decided to remove the black magic under the hood and allow users to implement whatever patterns they want.

Starting from PyTables 3.1, file handles are no longer re-used (shared), and each call to the open_file() function returns a new file handle:

In [1]: import tables
In [2]: print(tables.__version__)
3.1.0
In [3]: a = tables.open_file('test.h5', 'a')
In [4]: b = tables.open_file('test.h5', 'a')
In [5]: a is b
Out[5]: False

It is important to stress that the new implementation still has an internal registry (an implementation detail) and that it is still not thread safe. The difference is that now a careful developer should be able to use PyTables in a multi-threaded program without too many headaches; one possible pattern is sketched below.
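
The sketch assumes HDF5 >= 1.8.7 and reuses the hypothetical file and node names from the earlier illustrations: each thread opens, uses and closes its own read-only handle instead of sharing one:

import threading
import tables

def reader(path):
    # Since PyTables 3.1 each open_file() call returns a distinct
    # handle, so every thread works on its own file object.
    with tables.open_file(path, mode='r') as h5:
        _ = h5.root.noise[:10]   # independent read in this thread

threads = [threading.Thread(target=reader, args=('quantized_demo.h5',))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()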

The new implementation behaves differently from the previous one, even though the API has not changed. Users should now pay more attention when opening a file multiple times (as recommended in the HDF5 reference) and should take care to use the resulting handles appropriately.

Please note that the File.open_count property was originally intended to keep track of the number of references to the same file handle. In PyTables >= 3.1, despite the name, it maintains the same semantics; it is just that now its value should never be higher than 1.
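
For example, continuing the session above (the values assume PyTables >= 3.1):

In [6]: a.open_count
Out[6]: 1
In [7]: b.open_count
Out[7]: 1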

Note

HDF5 versions lower than 1.8.7 are not fully compatible with PyTables 3.1. Partial support for HDF5 < 1.8.7 is still provided, but in that case multiple file opens are not allowed at all (not even in read-only mode).