Structured storage classes

The Table class

class tables.Table(parentnode, name, description=None, title='', filters=None, expectedrows=None, chunkshape=None, byteorder=None, _log=True, track_times=True)

This class represents heterogeneous datasets in an HDF5 file.

Tables are leaves (see the Leaf class in The Leaf class) whose data consists of a unidimensional sequence of rows, where each row contains one or more fields. Fields have an associated unique name and position, with the first field having position 0. All rows have the same fields, which are arranged in columns.

Fields can have any type supported by the Col class (see The Col class and its descendants) and its descendants, which support multidimensional data. Moreover, a field can be nested (to an arbitrary depth), meaning that it includes further fields inside. A field named x inside a nested field a in a table can be accessed as the field a/x (its path name) from the table.

The structure of a table is declared by its description, which is made available in the Table.description attribute (see Table).

This class provides new methods to read, write and search table data efficiently. It also provides special Python methods to allow accessing the table as a normal sequence or array (with extended slicing supported).

PyTables supports in-kernel searches working simultaneously on several columns using complex conditions. These are faster than selections using Python expressions. See the Table.where() method for more information on in-kernel searches.

Non-nested columns can be indexed. Searching an indexed column can be several times faster than searching a non-indexed one. Search methods automatically take advantage of indexing where available.

When iterating a table, an object from the Row class (see The Row class) is used. This object lets you read and write data one row at a time, and perform queries that the in-kernel syntax does not support (at a much lower speed, of course).

Objects of this class support access to individual columns via natural naming through the Table.cols accessor. Nested columns are mapped to Cols instances, and non-nested ones to Column instances. See the Column class in The Column class for examples of this feature.

Parameters:
  • parentnode

    The parent Group object.

    Changed in version 3.0: Renamed from parentNode to parentnode.

  • name (str) – The name of this node in its parent group.

  • description – An IsDescription subclass or a dictionary where the keys are the field names, and the values the type definitions. In addition, a pure NumPy dtype is accepted. If None, the table metadata is read from disk; otherwise, it is taken from the previous parameters.

  • title – Sets a TITLE attribute on the HDF5 table entity.

  • filters (Filters) – An instance of the Filters class that provides information about the desired I/O filters to be applied during the life of this object.

  • expectedrows – A user estimate of the number of rows that will be in the table. If not provided, the default value is EXPECTED_ROWS_TABLE (see tables/parameters.py). If you plan to save bigger tables, try providing a guess; this helps optimize the time and memory spent creating and managing the HDF5 B-tree.

  • chunkshape – The shape of the data chunk to be read or written as a single HDF5 I/O operation. The filters are applied to those chunks of data. Its rank for tables has to be 1. If None, a sensible value is calculated based on the expectedrows parameter (which is recommended).

  • byteorder – The byteorder of the data on-disk, specified as ‘little’ or ‘big’. If this is not specified, the byteorder is that of the platform, unless you passed a recarray as the description, in which case the recarray byteorder will be chosen.

  • track_times

    Whether time data associated with the leaf are recorded (object access time, raw data modification time, metadata change time, object birth time); default True. Semantics of these times depend on their implementation in the HDF5 library: refer to documentation of the H5O_info_t data structure. As of HDF5 1.8.15, only ctime (metadata change time) is implemented.

    New in version 3.4.3.
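
In practice, tables are usually created through File.create_table() rather than by instantiating Table directly. The following is a minimal sketch of how the parameters above fit together; the Reading description, the node name readings and the temporary file path are made up for illustration.

```python
import os
import tempfile

import tables as tb


class Reading(tb.IsDescription):
    """Hypothetical description with one nested field (position)."""
    sensor = tb.StringCol(8, pos=0)   # 8-byte string
    value = tb.Float32Col(pos=1)      # single-precision float

    class position(tb.IsDescription):
        _v_pos = 2                    # position of the nested field
        x = tb.Float64Col(pos=0)
        y = tb.Float64Col(pos=1)


with tempfile.TemporaryDirectory() as tmpdir:
    with tb.open_file(os.path.join(tmpdir, "sketch.h5"), mode="w") as h5:
        table = h5.create_table(
            h5.root, "readings", Reading,
            title="Sensor readings",
            filters=tb.Filters(complevel=1, complib="zlib"),
            expectedrows=1000,
        )
        row = table.row
        row["sensor"] = b"s1"
        row["value"] = 1.5
        row["position/x"] = 0.25      # nested fields use their path name
        row["position/y"] = 0.75
        row.append()
        table.flush()

        nrows = table.nrows
        title = table.title
        x_values = table.read(field="position/x").tolist()
```

Leaving chunkshape as None lets PyTables derive it from expectedrows, which is the route the parameter description recommends.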

Notes

The instance variables below are provided in addition to those in Leaf (see The Leaf class). Please note that there are several col* dictionaries to ease retrieving information about a column directly by its path name, avoiding the need to walk through Table.description or Table.cols.

Table attributes

coldescrs

Maps the name of a column to its Col description (see The Col class and its descendants).

coldflts

Maps the name of a column to its default value.

coldtypes

Maps the name of a column to its NumPy data type.

colindexed

Is the column whose name is used as a key indexed?

colinstances

Maps the name of a column to its Column (see The Column class) or Cols (see The Cols class) instance.

colnames

A list containing the names of top-level columns in the table.

colpathnames

A list containing the pathnames of bottom-level columns in the table.

These are the leaf columns obtained when walking the table description left-to-right, bottom-first. Columns inside a nested column have slashes (/) separating name components in their pathname.

cols

A Cols instance that provides natural naming access to non-nested (Column, see The Column class) and nested (Cols, see The Cols class) columns.

coltypes

Maps the name of a column to its PyTables data type.

description

A Description instance (see The Description class) reflecting the structure of the table.

extdim

The index of the enlargeable dimension (always 0 for tables).

indexed

Does this table have any indexed columns?

nrows

The current number of rows in the table.
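
As a rough illustration of the col* dictionaries, the sketch below builds a small table with one nested field and inspects a few of the attributes listed above. The Reading description and the file name are assumptions made for the example.

```python
import os
import tempfile

import tables as tb


class Reading(tb.IsDescription):
    """Hypothetical description with one nested field (position)."""
    sensor = tb.StringCol(8, pos=0)
    value = tb.Float32Col(pos=1, dflt=-1.0)

    class position(tb.IsDescription):
        _v_pos = 2
        x = tb.Float64Col(pos=0)
        y = tb.Float64Col(pos=1)


with tempfile.TemporaryDirectory() as tmpdir:
    with tb.open_file(os.path.join(tmpdir, "sketch.h5"), mode="w") as h5:
        table = h5.create_table(h5.root, "readings", Reading)

        colnames = list(table.colnames)          # top-level columns only
        colpathnames = list(table.colpathnames)  # leaf columns, '/'-separated
        value_dtype = str(table.coldtypes["value"])   # NumPy data type
        value_type = str(table.coltypes["value"])     # PyTables type name
        value_default = float(table.coldflts["value"])
        nested_is_cols = isinstance(table.colinstances["position"], tb.Cols)
        any_indexed = bool(table.indexed)
```

The dictionaries are keyed by path name, so a nested leaf such as position/x can be looked up directly without walking Table.description.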

Table properties

Table.autoindex

Automatically keep column indexes up to date?

Setting this value states whether existing indexes should be automatically updated after an append operation or recomputed after an index-invalidating operation (i.e. removal and modification of rows). The default is true.

This value takes effect whenever a column is altered. If automatic indexing is not activated and you want an immediate update, use Table.flush_rows_to_index(); for an immediate reindexing of invalidated indexes, use Table.reindex_dirty().

This value is persistent.

Changed in version 3.0: The autoIndex property has been renamed into autoindex.

Table.colindexes

A dictionary with the indexes of the indexed columns.

Table.indexedcolpathnames

List of pathnames of indexed columns in the table.

Table.row

The associated Row instance (see The Row class).

Table.rowsize

The size in bytes of each row in the table.

Table methods - reading

Table.col(name)

Get a column from the table.

If a column called name exists in the table, it is read and returned as a NumPy object. If it does not exist, a KeyError is raised.

Examples

narray = table.col('var2')

That statement is equivalent to:

narray = table.read(field='var2')

Here you can see how this method can be used as a shorthand for the Table.read() method.

Table.iterrows(start=None, stop=None, step=None)

Iterate over the table using a Row instance.

If a range is not supplied, all the rows in the table are iterated upon - you can also use the Table.__iter__() special method for that purpose. If you want to iterate over a given range of rows in the table, you may use the start, stop and step parameters.

Warning

When in the middle of a table row iterator, you should not use methods that can change the number of rows in the table (like Table.append() or Table.remove_rows()) or unexpected errors will happen.

See also

tableextension.Row

the table row iterator and field accessor

Examples

result = [ row['var2'] for row in table.iterrows(step=5)
                                        if row['var1'] <= 20 ]

Changed in version 3.0: If the start parameter is provided and stop is None then the table is iterated from start to the last line. In PyTables < 3.0 only one element was returned.

Table.itersequence(sequence)

Iterate over a sequence of row coordinates.
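
A minimal sketch of visiting only selected rows, assuming a hypothetical two-column Sample description and a throwaway file:

```python
import os
import tempfile

import tables as tb


class Sample(tb.IsDescription):
    """Hypothetical two-column description used only for this sketch."""
    idx = tb.Int32Col(pos=0)
    value = tb.Float64Col(pos=1)


with tempfile.TemporaryDirectory() as tmpdir:
    with tb.open_file(os.path.join(tmpdir, "sketch.h5"), mode="w") as h5:
        table = h5.create_table(h5.root, "samples", Sample)
        table.append([(i, i * 0.5) for i in range(10)])
        table.flush()

        coords = [2, 5, 7]  # row coordinates to visit
        picked = [int(row["idx"]) for row in table.itersequence(coords)]
```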

Table.itersorted(sortby, checkCSI=False, start=None, stop=None, step=None)

Iterate table data following the order of the index of sortby column.

The sortby column must have an associated full index. If you want to ensure a fully sorted order, the index must be a CSI one. You may want to use the checkCSI argument in order to explicitly check for the existence of a CSI index.

The meaning of the start, stop and step arguments is the same as in Table.read().

Changed in version 3.0: If the start parameter is provided and stop is None then the table is iterated from start to the last line. In PyTables < 3.0 only one element was returned.

Table.read(start=None, stop=None, step=None, field=None, out=None)

Get data in the table as a (record) array.

The start, stop and step parameters can be used to select only a range of rows in the table. Their meanings are the same as in the built-in Python slices.

If field is supplied only the named column will be selected. If the column is not nested, an array of the current flavor will be returned; if it is, a structured array will be used instead. If no field is specified, all the columns will be returned in a structured array of the current flavor.

Columns under a nested column can be specified in the field parameter by using a slash character (/) as a separator (e.g. ‘position/x’).

The out parameter may be used to specify a NumPy array to receive the output data. Note that the array must have the same size as the data selected with the other parameters. Note that the array’s datatype is not checked and no type casting is performed, so if it does not match the datatype on disk, the output will not be correct.

When specifying a single nested column with the field parameter, and supplying an output buffer with the out parameter, the output buffer must contain all columns in the table. The data in all columns will be read into the output buffer. However, only the specified nested column will be returned from the method call.

When data is read from disk in NumPy format, the output will be in the current system’s byteorder, regardless of how it is stored on disk. If the out parameter is specified, the output array also must be in the current system’s byteorder.

Changed in version 3.0: Added the out parameter. Also the start, stop and step parameters now behave like in slice.
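
The field and out parameters can be sketched as follows. The Sample description is hypothetical, and the output buffer is allocated from Table.description._v_dtype so that its size and layout match the rows being read.

```python
import os
import tempfile

import numpy as np
import tables as tb


class Sample(tb.IsDescription):
    """Hypothetical two-column description used only for this sketch."""
    idx = tb.Int32Col(pos=0)
    value = tb.Float64Col(pos=1)


with tempfile.TemporaryDirectory() as tmpdir:
    with tb.open_file(os.path.join(tmpdir, "sketch.h5"), mode="w") as h5:
        table = h5.create_table(h5.root, "samples", Sample)
        table.append([(i, i * 0.5) for i in range(10)])
        table.flush()

        # Slice-like selection: every other row, as a structured array
        every_other = table.read(start=0, stop=10, step=2)

        # A single non-nested column comes back as a plain array
        values = table.read(start=2, stop=5, field="value")

        # Read the whole table into a preallocated buffer
        buf = np.empty(table.nrows, dtype=table.description._v_dtype)
        table.read(out=buf)
```

Since the datatype of out is not checked, building the buffer from the table's own dtype is a simple way to avoid silently wrong output.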

Examples

Reading the entire table:

t.read()

Reading record n. 6:

t.read(6, 7)

Reading from record n. 6 to the end of the table:

t.read(6)

Table.read_coordinates(coords, field=None)

Get a set of rows given their indexes as a (record) array.

This method works much like the Table.read() method, but it uses a sequence (coords) of row indexes to select the wanted rows, instead of a row range.

The selected rows are returned in an array or structured array of the current flavor.
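
A short sketch of point selection with explicit coordinates, again assuming a hypothetical Sample description:

```python
import os
import tempfile

import tables as tb


class Sample(tb.IsDescription):
    """Hypothetical two-column description used only for this sketch."""
    idx = tb.Int32Col(pos=0)
    value = tb.Float64Col(pos=1)


with tempfile.TemporaryDirectory() as tmpdir:
    with tb.open_file(os.path.join(tmpdir, "sketch.h5"), mode="w") as h5:
        table = h5.create_table(h5.root, "samples", Sample)
        table.append([(i, i * 0.5) for i in range(10)])
        table.flush()

        coords = [1, 4, 8]
        rows = table.read_coordinates(coords)                  # structured array
        values = table.read_coordinates(coords, field="value")  # one column
```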

Table.read_sorted(sortby, checkCSI=False, field=None, start=None, stop=None, step=None)

Read table data following the order of the index of sortby column.

The sortby column must have an associated full index. If you want to ensure a fully sorted order, the index must be a CSI one. You may want to use the checkCSI argument in order to explicitly check for the existence of a CSI index.

If field is supplied only the named column will be selected. If the column is not nested, an array of the current flavor will be returned; if it is, a structured array will be used instead. If no field is specified, all the columns will be returned in a structured array of the current flavor.

The meaning of the start, stop and step arguments is the same as in Table.read().

Changed in version 3.0: The start, stop and step parameters now behave like in slice.
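
The sketch below shows one way to obtain sorted reads: it creates a CSI index on the sort column with Column.create_csindex() and then uses Table.read_sorted() and Table.itersorted(). The Sample description and the data are illustrative assumptions.

```python
import os
import tempfile

import tables as tb


class Sample(tb.IsDescription):
    """Hypothetical two-column description used only for this sketch."""
    idx = tb.Int32Col(pos=0)
    value = tb.Float64Col(pos=1)


with tempfile.TemporaryDirectory() as tmpdir:
    with tb.open_file(os.path.join(tmpdir, "sketch.h5"), mode="w") as h5:
        table = h5.create_table(h5.root, "samples", Sample)
        table.append([(3, 0.3), (1, 0.1), (2, 0.2), (0, 0.0)])  # unsorted
        table.flush()

        # A completely sorted index (CSI) guarantees a fully sorted order
        table.cols.idx.create_csindex()

        sorted_rows = table.read_sorted("idx", checkCSI=True)
        sorted_values = table.read_sorted("idx", field="value")
        iterated = [int(row["idx"]) for row in table.itersorted("idx")]
```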

Table.__getitem__(key)

Get a row or a range of rows from the table.

If key argument is an integer, the corresponding table row is returned as a record of the current flavor. If key is a slice, the range of rows determined by it is returned as a structured array of the current flavor.

In addition, NumPy-style point selections are supported. In particular, if key is a list of row coordinates, the set of rows determined by it is returned. Furthermore, if key is an array of boolean values, only the coordinates where key is True are returned. Note that for the latter to work, key must contain exactly as many elements as the table has rows.

Examples

record = table[4]
recarray = table[4:1000:2]
recarray = table[[4,1000]]   # only retrieves rows 4 and 1000
recarray = table[[True, False, ..., True]]

Those statements are equivalent to:

record = table.read(start=4)[0]
recarray = table.read(start=4, stop=1000, step=2)
recarray = table.read_coordinates([4,1000])
recarray = table.read_coordinates([True, False, ..., True])

Here, you can see how indexing can be used as a shorthand for the Table.read() and Table.read_coordinates() methods.

Table.__iter__()

Iterate over the table using a Row instance.

This is equivalent to calling Table.iterrows() with default arguments, i.e. it iterates over all the rows in the table.

See also

tableextension.Row

the table row iterator and field accessor

Examples

result = [ row['var2'] for row in table if row['var1'] <= 20 ]

Which is equivalent to:

result = [ row['var2'] for row in table.iterrows()
                                        if row['var1'] <= 20 ]

Table methods - writing

Table.append(rows)

Append a sequence of rows to the end of the table.

The rows argument may be any object which can be converted to a structured array compliant with the table structure (otherwise, a ValueError is raised). This includes NumPy structured arrays, lists of tuples or array records, and a string or Python buffer.

Examples

import tables as tb

class Particle(tb.IsDescription):
    name        = tb.StringCol(16, pos=1) # 16-character String
    lati        = tb.IntCol(pos=2)        # integer
    longi       = tb.IntCol(pos=3)        # integer
    pressure    = tb.Float32Col(pos=4)  # float  (single-precision)
    temperature = tb.FloatCol(pos=5)    # double (double-precision)

fileh = tb.open_file('test4.h5', mode='w')
table = fileh.create_table(fileh.root, 'table', Particle,
                           "A table")

# Append several rows in only one call
table.append([("Particle:     10", 10, 0, 10 * 10, 10**2),
              ("Particle:     11", 11, -1, 11 * 11, 11**2),
              ("Particle:     12", 12, -2, 12 * 12, 12**2)])
fileh.close()

Table.modify_column(start=None, stop=None, step=None, column=None, colname=None)

Modify one single column in the row slice [start:stop:step].

The colname argument specifies the name of the column in the table to be modified with the data given in column. This method returns the number of rows modified. Should the modification exceed the length of the table, an IndexError is raised before changing data.

The column argument may be any object which can be converted to a (record) array compliant with the structure of the column to be modified (otherwise, a ValueError is raised). This includes NumPy (record) arrays, lists of scalars, tuples or array records, and a string or Python buffer.

Table.modify_columns(start=None, stop=None, step=None, columns=None, names=None)

Modify a series of columns in the row slice [start:stop:step].

The names argument specifies the names of the columns in the table to be modified with the data given in columns. This method returns the number of rows modified. Should the modification exceed the length of the table, an IndexError is raised before changing data.

The columns argument may be any object which can be converted to a structured array compliant with the structure of the columns to be modified (otherwise, a ValueError is raised). This includes NumPy structured arrays, lists of tuples or array records, and a string or Python buffer.
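
As a sketch of both column-modification methods, the example below overwrites part of one column and then two columns at once. The Sample description is hypothetical, and the columns argument is passed column-wise (one sequence per name), which is one of the accepted forms.

```python
import os
import tempfile

import tables as tb


class Sample(tb.IsDescription):
    """Hypothetical two-column description used only for this sketch."""
    idx = tb.Int32Col(pos=0)
    value = tb.Float64Col(pos=1)


with tempfile.TemporaryDirectory() as tmpdir:
    with tb.open_file(os.path.join(tmpdir, "sketch.h5"), mode="w") as h5:
        table = h5.create_table(h5.root, "samples", Sample)
        table.append([(i, 0.0) for i in range(5)])
        table.flush()

        # Overwrite 'idx' in rows 1..3 with a list of scalars
        table.modify_column(start=1, stop=4, column=[10, 20, 30],
                            colname="idx")

        # Overwrite 'idx' and 'value' in rows 0..1, one sequence per column
        table.modify_columns(start=0, stop=2,
                             columns=[[100, 200], [1.5, 2.5]],
                             names=["idx", "value"])

        after = table.read()
```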

Table.modify_coordinates(coords, rows)

Modify a series of rows in positions specified in coords.

The values in the selected rows will be modified with the data given in rows. This method returns the number of rows modified.

The possible values for the rows argument are the same as in Table.append().

Table.modify_rows(start=None, stop=None, step=None, rows=None)

Modify a series of rows in the slice [start:stop:step].

The values in the selected rows will be modified with the data given in rows. This method returns the number of rows modified. Should the modification exceed the length of the table, an IndexError is raised before changing data.

The possible values for the rows argument are the same as in Table.append().
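
The two row-modification methods can be sketched side by side: Table.modify_rows() addresses a slice, Table.modify_coordinates() addresses arbitrary positions. The Sample description and values are illustrative only.

```python
import os
import tempfile

import tables as tb


class Sample(tb.IsDescription):
    """Hypothetical two-column description used only for this sketch."""
    idx = tb.Int32Col(pos=0)
    value = tb.Float64Col(pos=1)


with tempfile.TemporaryDirectory() as tmpdir:
    with tb.open_file(os.path.join(tmpdir, "sketch.h5"), mode="w") as h5:
        table = h5.create_table(h5.root, "samples", Sample)
        table.append([(i, 0.0) for i in range(6)])
        table.flush()

        # Replace rows 0 and 1 (slice [0:2])
        table.modify_rows(start=0, stop=2, rows=[(10, 1.0), (11, 1.1)])

        # Replace rows 3 and 5 (explicit coordinates)
        table.modify_coordinates([3, 5], [(13, 1.3), (15, 1.5)])

        after = table.read()
```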

Table.remove_rows(start=None, stop=None, step=None)

Remove a range of rows in the table.

If only start is supplied, that row and all following will be deleted. If a range is supplied, i.e. both the start and stop parameters are passed, all the rows in the range are removed.

Changed in version 3.0: The start, stop and step parameters now behave like in slice.

See also

remove_row()

Parameters:
  • start (int) – Sets the starting row to be removed. It accepts negative values meaning that the count starts from the end. A value of 0 means the first row.

  • stop (int) – Sets the last row to be removed to stop-1, i.e. the end point is omitted (in the Python range() tradition). Negative values are also accepted. If None all rows after start will be removed.

  • step (int) –

    The step size between rows to remove.

    New in version 3.0.

Examples

Removing rows from 5 to 10 (excluded):

t.remove_rows(5, 10)

Removing all rows starting from row 10:

t.remove_rows(10)

Removing row 6:

t.remove_rows(6, 7)

Note

Removing a single row can be done using the specific remove_row() method.

Table.remove_row(n)

Removes a row from the table.

Parameters:

n (int) – The index of the row to remove.

New in version 3.0.

Examples

Remove row 15:

table.remove_row(15)

Which is equivalent to:

table.remove_rows(15, 16)

Warning

This is not equivalent to:

table.remove_rows(15)

Table.__setitem__(key, value)

Set a row or a range of rows in the table.

It takes different actions depending on the type of the key parameter: if it is an integer, the corresponding table row is set to value (a record or sequence capable of being converted to the table structure). If key is a slice, the row slice determined by it is set to value (a record array or sequence capable of being converted to the table structure).

In addition, NumPy-style point selections are supported. In particular, if key is a list of row coordinates, the set of rows determined by it is set to value. Furthermore, if key is an array of boolean values, only the coordinates where key is True are set to values from value. Note that for the latter to work, key must contain exactly as many elements as the table has rows.

Examples

import numpy as np

# Modify just one existing row
table[2] = [456,'db2',1.2]

# Modify two existing rows
rows = np.rec.array(
    [(457,'db1',1.2),(6,'de2',1.3)], formats='i4,S3,f8'
)
table[1:4:2] = rows              # modify a table slice (rows 1 and 3)
table[[1,3]] = rows              # only modifies rows 1 and 3
table[[True,False,True]] = rows  # only modifies rows 0 and 2 (3-row table)

Which is equivalent to:

table.modify_rows(start=2, rows=[(456,'db2',1.2)])
rows = np.rec.array(
    [(457,'db1',1.2),(6,'de2',1.3)], formats='i4,S3,f8'
)
table.modify_rows(start=1, stop=4, step=2, rows=rows)
table.modify_coordinates([1,3], rows)
table.modify_coordinates([True, False, True], rows)

Here, you can see how indexing can be used as a shorthand for the Table.modify_rows() and Table.modify_coordinates() methods.

Table methods - querying

Table.get_where_list(condition, condvars=None, sort=False, start=None, stop=None, step=None)

Get the row coordinates fulfilling the given condition.

The coordinates are returned as a list of the current flavor. Set sort to True to retrieve the coordinates in sorted order; by default they are not sorted.

The meaning of the other arguments is the same as in the Table.where() method.

Table.read_where(condition, condvars=None, field=None, start=None, stop=None, step=None)

Read table data fulfilling the given condition.

This method is similar to Table.read(): their common arguments and return values have the same meanings. However, only the rows fulfilling the condition are included in the result.

The meaning of the other arguments is the same as in the Table.where() method.
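
A minimal sketch of both query helpers, using an explicit condvars mapping for the threshold. The Sample description, the variable name lower and the data are assumptions made for the example.

```python
import os
import tempfile

import tables as tb


class Sample(tb.IsDescription):
    """Hypothetical two-column description used only for this sketch."""
    idx = tb.Int32Col(pos=0)
    value = tb.Float64Col(pos=1)


with tempfile.TemporaryDirectory() as tmpdir:
    with tb.open_file(os.path.join(tmpdir, "sketch.h5"), mode="w") as h5:
        table = h5.create_table(h5.root, "samples", Sample)
        table.append([(i, i * 0.5) for i in range(10)])
        table.flush()

        condition = "(value >= lower) & (idx < 8)"
        condvars = {"lower": 2.0}  # columns are available by default

        # Row coordinates of the matching rows, in sorted order
        coords = table.get_where_list(condition, condvars=condvars,
                                      sort=True).tolist()

        # The matching data itself, restricted to one column
        hits = table.read_where(condition, condvars=condvars,
                                field="idx").tolist()
```

Passing condvars explicitly also keeps the query independent of whatever names happen to exist in the calling namespace.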

Table.where(condition, condvars=None, start=None, stop=None, step=None)

Iterate over values fulfilling a condition.

This method returns a Row iterator (see The Row class) which only selects rows in the table that satisfy the given condition (an expression-like string).

The condvars mapping may be used to define the variable names appearing in the condition. condvars should consist of identifier-like strings pointing to Column (see The Column class) instances of this table, or to other values (which will be converted to arrays). A default set of condition variables is provided where each top-level, non-nested column with an identifier-like name appears. Variables in condvars override the default ones.

When condvars is not provided or None, the current local and global namespaces are searched instead. This mechanism is mostly intended for interactive usage. To disable it, just specify a (maybe empty) mapping as condvars.

If a range is supplied (by setting some of the start, stop or step parameters), only the rows in that range and fulfilling the condition are used. The meaning of the start, stop and step parameters is the same as for Python slices.

When possible, indexed columns participating in the condition will be used to speed up the search. It is recommended that you place the indexed columns as far to the left and as far out in the condition as possible. In any case, this method always performs better than regular Python selections on the table.

You can mix this method with regular Python selections in order to support even more complex queries. It is strongly recommended that you pass the most restrictive condition as the parameter to this method if you want to achieve maximum performance.

Warning

When in the middle of a table row iterator, you should not use methods that can change the number of rows in the table (like Table.append() or Table.remove_rows()) or unexpected errors will happen.

Examples

passvalues = [ row['col3'] for row in
               table.where('(col1 > 0) & (col2 <= 20)', step=5)
               if your_function(row['col2']) ]
print("Values that pass the cuts:", passvalues)

Note

Special care should be taken when the query condition includes string literals.

Let’s assume that the table table has the following structure:

class Record(IsDescription):
    col1 = StringCol(4)  # 4-character String of bytes
    col2 = IntCol()
    col3 = FloatCol()

The type of “col1” corresponds to strings of bytes.

Any condition involving “col1” should be written using the appropriate type for string literals in order to avoid TypeErrors.

The code below will fail with a TypeError:

condition = 'col1 == "AAAA"'
for record in table.where(condition):  # TypeError in Python 3
    pass  # do something with "record"

The reason is that in Python 3 “condition” implies a comparison between a string of bytes (“col1” contents) and a unicode literal (“AAAA”).

The correct way to write the condition is:

condition = 'col1 == b"AAAA"'

Changed in version 3.0: The start, stop and step parameters now behave like in slice.

Table.append_where(dstTable, condition=None, condvars=None, start=None, stop=None, step=None)

Append rows fulfilling the condition to the dstTable table.

dstTable must be capable of taking the rows resulting from the query, i.e. it must have columns with the expected names and compatible types. The meaning of the other arguments is the same as in the Table.where() method.

The number of rows appended to dstTable is returned as a result.

Changed in version 3.0: The whereAppend method has been renamed into append_where.
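
A short sketch of copying matching rows into a second table with the same structure. Both tables and the Sample description are hypothetical; the destination is passed positionally.

```python
import os
import tempfile

import tables as tb


class Sample(tb.IsDescription):
    """Hypothetical two-column description used only for this sketch."""
    idx = tb.Int32Col(pos=0)
    value = tb.Float64Col(pos=1)


with tempfile.TemporaryDirectory() as tmpdir:
    with tb.open_file(os.path.join(tmpdir, "sketch.h5"), mode="w") as h5:
        src = h5.create_table(h5.root, "src", Sample)
        dst = h5.create_table(h5.root, "dst", Sample)
        src.append([(i, i * 0.5) for i in range(10)])
        src.flush()

        # Copy only the rows with value > 3.0 into dst
        appended = src.append_where(dst, "value > 3.0")
        dst.flush()

        copied = dst.read(field="idx").tolist()
```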

Table.will_query_use_indexing(condition, condvars=None)

Will a query for the condition use indexing?

The meaning of the condition and condvars arguments is the same as in the Table.where() method. If condition can use indexing, this method returns a frozenset with the path names of the columns whose index is usable. Otherwise, it returns an empty list.

This method is mainly intended for testing. Keep in mind that changing the set of indexed columns or their dirtiness may make this method return different values for the same arguments at different times.
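
The sketch below checks the same condition before and after an index is created on the column, using Column.create_index(). The Sample description is an assumption made for illustration.

```python
import os
import tempfile

import tables as tb


class Sample(tb.IsDescription):
    """Hypothetical two-column description used only for this sketch."""
    idx = tb.Int32Col(pos=0)
    value = tb.Float64Col(pos=1)


with tempfile.TemporaryDirectory() as tmpdir:
    with tb.open_file(os.path.join(tmpdir, "sketch.h5"), mode="w") as h5:
        table = h5.create_table(h5.root, "samples", Sample)
        table.append([(i, i * 0.5) for i in range(10)])
        table.flush()

        # No index yet: nothing usable is reported
        before = table.will_query_use_indexing("idx > 5")

        table.cols.idx.create_index()

        # With a clean index on 'idx', its path name is reported
        after = table.will_query_use_indexing("idx > 5")
```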

Table methods - other

Table.copy(newparent=None, newname=None, overwrite=False, createparents=False, **kwargs)

Copy this table and return the new one.

This method has the behavior and keywords described in Leaf.copy(). Moreover, it recognises the following additional keyword arguments.

Parameters:
  • sortby – If specified, and sortby corresponds to a column with an index, then the copy will be sorted by this index. If you want to ensure a fully sorted order, the index must be a CSI one. A reverse sorted copy can be achieved by specifying a negative value for the step keyword. If sortby is omitted or None, the original table order is used.

  • checkCSI – If true and a CSI index does not exist for the sortby column, an error will be raised. If false (the default), it does nothing. You can use this flag in order to explicitly check for the existence of a CSI index.

  • propindexes – If true, the existing indexes in the source table are propagated (created) to the new one. If false (the default), the indexes are not propagated.

Table.flush_rows_to_index(_lastrow=True)

Add remaining rows in buffers to non-dirty indexes.

This can be useful when you have chosen non-automatic indexing for the table (see the Table.autoindex property in Table) and you want to update the indexes on it.

Table.get_enum(colname)

Get the enumerated type associated with the named column.

If the column named colname (a string) exists and is of an enumerated type, the corresponding Enum instance (see The Enum class) is returned. If it is not of an enumerated type, a TypeError is raised. If the column does not exist, a KeyError is raised.
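
A minimal sketch of retrieving and using a column's enumerated type. The Ball description and the colour names are invented for the example.

```python
import os
import tempfile

import tables as tb

colors = tb.Enum(["red", "green", "blue"])


class Ball(tb.IsDescription):
    """Hypothetical description with one enumerated column."""
    color = tb.EnumCol(colors, "red", base="uint8", pos=0)
    radius = tb.Float32Col(pos=1)


with tempfile.TemporaryDirectory() as tmpdir:
    with tb.open_file(os.path.join(tmpdir, "sketch.h5"), mode="w") as h5:
        table = h5.create_table(h5.root, "balls", Ball)

        enum = table.get_enum("color")

        row = table.row
        row["color"] = enum["green"]   # store the concrete value
        row["radius"] = 1.0
        row.append()
        table.flush()

        stored = int(table[0]["color"])
        stored_name = enum(stored)     # map the value back to its name

        # A column that is not of an enumerated type raises TypeError
        try:
            table.get_enum("radius")
            radius_is_enum = True
        except TypeError:
            radius_is_enum = False
```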

Table.reindex()

Recompute all the existing indexes in the table.

This can be useful when you suspect that, for any reason, the index information for columns is no longer valid and want to rebuild the indexes on it.

Table.reindex_dirty()

Recompute the existing indexes in table, if they are dirty.

This can be useful when you have set Table.autoindex (see Table) to false for the table and you want to update the indexes after an index-invalidating operation (Table.remove_rows(), for example).
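
The manual indexing workflow might look like the sketch below: automatic indexing is switched off, appended rows are pushed to the index with Table.flush_rows_to_index(), and the index is rebuilt with Table.reindex_dirty() after rows are removed. The Sample description and the data are assumptions made for illustration.

```python
import os
import tempfile

import tables as tb


class Sample(tb.IsDescription):
    """Hypothetical two-column description used only for this sketch."""
    idx = tb.Int32Col(pos=0)
    value = tb.Float64Col(pos=1)


with tempfile.TemporaryDirectory() as tmpdir:
    with tb.open_file(os.path.join(tmpdir, "sketch.h5"), mode="w") as h5:
        table = h5.create_table(h5.root, "samples", Sample)
        table.append([(i, i * 0.5) for i in range(10)])
        table.flush()
        table.cols.idx.create_index()

        # Take over index maintenance by hand
        table.autoindex = False

        table.append([(10, 5.0), (11, 5.5)])
        table.flush()
        table.flush_rows_to_index()   # bring the index up to date

        table.remove_rows(0, 2)       # invalidates the index
        table.reindex_dirty()         # rebuild only what is dirty

        remaining = table.nrows
        hits = table.read_where("idx >= 10", field="idx").tolist()
```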

The Description class

class tables.Description(classdict, nestedlvl=-1, validate=True, ptparams=None)

This class represents descriptions of the structure of tables.

An instance of this class is automatically bound to Table (see The Table class) objects when they are created. It provides a browseable representation of the structure of the table, made of non-nested (Col - see The Col class and its descendants) and nested (Description) columns.

Column definitions under a description can be accessed as attributes of it (natural naming). For instance, if table.description is a Description instance with a column named col1 under it, the latter can be accessed as table.description.col1. If col1 is nested and contains a col2 column, this can be accessed as table.description.col1.col2. Because of natural naming, the names of members start with special prefixes, like in the Group class (see The Group class).

Description attributes

_v_colobjects

A dictionary mapping the names of the columns hanging directly from the associated table or nested column to their respective descriptions (Col - see The Col class and its descendants or Description - see The Description class instances).

Changed in version 3.0: The _v_colObjects attribute has been renamed into _v_colobjects.

_v_dflts

A dictionary mapping the names of non-nested columns hanging directly from the associated table or nested column to their respective default values.

_v_dtype

The NumPy type which reflects the structure of this table or nested column. You can use this as the dtype argument of NumPy array factories.

_v_dtypes

A dictionary mapping the names of non-nested columns hanging directly from the associated table or nested column to their respective NumPy types.

_v_is_nested

Whether the associated table or nested column contains further nested columns or not.

_v_itemsize

The size in bytes of an item in this table or nested column.

_v_name

The name of this description group. The name of the root group is ‘/’.

_v_names

A list of the names of the columns hanging directly from the associated table or nested column. The order of the names matches the order of their respective columns in the containing table.

_v_nested_descr

A nested list of pairs of (name, format) tuples for all the columns under this table or nested column. You can use this as the dtype and descr arguments of NumPy array factories.

Changed in version 3.0: The _v_nestedDescr attribute has been renamed into _v_nested_descr.

_v_nested_formats

A nested list of the NumPy string formats (and shapes) of all the columns under this table or nested column. You can use this as the formats argument of NumPy array factories.

Changed in version 3.0: The _v_nestedFormats attribute has been renamed into _v_nested_formats.

_v_nestedlvl

The level of the associated table or nested column in the nested datatype.

_v_nested_names

A nested list of the names of all the columns under this table or nested column. You can use this as the names argument of NumPy array factories.

Changed in version 3.0: The _v_nestedNames attribute has been renamed into _v_nested_names.

_v_pathname

Pathname of the table or nested column.

_v_pathnames

A list of the pathnames of all the columns under this table or nested column (in preorder). If it does not contain nested columns, this is exactly the same as the Description._v_names attribute.

_v_types

A dictionary mapping the names of non-nested columns hanging directly from the associated table or nested column to their respective PyTables types.

_v_offsets

A list of offsets for all the columns. An empty list means that there is no padding in the data structure. However, the support for offsets is currently limited to flat tables; for nested tables, the potential padding is always removed (exactly the same as in pre-3.5 versions), and this variable is set to empty.

New in version 3.5: Previous to this version all the compound types were converted internally to ‘packed’ types, i.e. with no padding between the component types. Starting with 3.5, the holes in native HDF5 types (non-nested) are honored and replicated during dataset and attribute copies.

Description methods

Description._f_walk(type='All')

Iterate over nested columns.

If type is ‘All’ (the default), all column description objects (Col and Description instances) are yielded in top-to-bottom order (preorder).

If type is ‘Col’ or ‘Description’, only column descriptions of that type are yielded.

The Row class

class tables.tableextension.Row

Table row iterator and field accessor.

Instances of this class are used to fetch and set the values of individual table fields. It works very much like a dictionary, where keys are the pathnames or positions (extended slicing is supported) of the fields in the associated table in a specific row.

This class provides an iterator interface so that you can use the same Row instance to access successive table rows one after the other. There are also some important methods that are useful for accessing, adding and modifying values in tables.

Row attributes

nrow

The current row number.

This property is useful for knowing which row is being dealt with in the middle of a loop or iterator.

Row methods

Row.append()

Add a new row of data to the end of the dataset.

Once you have filled the proper fields for the current row, calling this method actually appends the new data to the output buffer (which will eventually be dumped to disk). If you have not set the value of a field, the default value of the column will be used.

Warning

After the loop in which Row.append() has been called completes, it is good practice to call Table.flush() so that the last rows, which may still remain in internal buffers, are not lost.

Examples

row = table.row
for i in range(nrows):
    row['col1'] = i-1
    row['col2'] = 'a'
    row['col3'] = -1.0
    row.append()
table.flush()

Row.fetch_all_fields()

Retrieve all the fields in the current row.

Unlike row[:] (see Row special methods), which returns a tuple, this returns the row data as a NumPy void scalar. For instance:

[row.fetch_all_fields() for row in table.where('col1 < 3')]

will select all the rows that fulfill the given condition as a list of NumPy records.
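The difference from row[:] can be sketched as follows. The Sample layout, the file name and the variable names are assumptions made for this illustration; only the col1 < 3 condition comes from the example above.

```python
import os
import tempfile

import numpy as np
import tables as tb


class Sample(tb.IsDescription):
    # Hypothetical two-column layout, used only for this illustration.
    col1 = tb.Int32Col(pos=0)
    col2 = tb.Float64Col(pos=1)


with tempfile.TemporaryDirectory() as tmp, \
        tb.open_file(os.path.join(tmp, "fetch_demo.h5"), "w") as h5:
    table = h5.create_table("/", "sample", Sample)
    row = table.row
    for i in range(6):
        row["col1"] = i
        row["col2"] = i / 2
        row.append()
    table.flush()

    # row[:] yields one plain tuple per selected row ...
    as_tuples = [r[:] for r in table.where("col1 < 3")]
    # ... while fetch_all_fields() yields one NumPy void scalar per row.
    as_records = [r.fetch_all_fields() for r in table.where("col1 < 3")]

first = as_records[0]
print(type(as_tuples[0]).__name__, type(first).__name__)
# Fields of a void scalar are addressed by name.
print(first["col1"], first["col2"])
```

A void scalar keeps the field names, which is convenient when the records are later stacked into a structured array.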

Row.update()

Change the data of the current row in the dataset.

This method allows you to modify values in a table while iterating over it with a table iterator such as Table.iterrows() or Table.where().

Once you have filled the proper fields for the current row, calling this method actually changes data in the output buffer (which will eventually be dumped to disk). If you have not set the value of a field, its original value will be used.

Warning

After the loop in which Row.update() has been called completes, it is good practice to call Table.flush() so that changed rows, which may still remain in internal buffers, are not lost.

Examples

for row in table.iterrows(step=10):
    row['col1'] = row.nrow
    row['col2'] = 'b'
    row['col3'] = 0.0
    row.update()
table.flush()

which modifies every tenth row in table. Or:

for row in table.where('col1 > 3'):
    row['col1'] = row.nrow
    row['col2'] = 'b'
    row['col3'] = 0.0
    row.update()
table.flush()

which updates only the rows whose col1 value is greater than 3.

Row special methods

Row.__contains__(item)

A true value is returned if item is found in current row, false otherwise.

Row.__getitem__(key)

Get the row field specified by the key.

The key can be a string (the name of the field), an integer (the position of the field) or a slice (the range of field positions). When key is a slice, the returned value is a tuple containing the values of the specified fields.

Examples

res = [row['var3'] for row in table.where('var2 < 20')]

which selects the var3 field for all the rows that fulfil the condition. Or:

res = [row[4] for row in table if row[1] < 20]

which selects the field at position 4 (the fifth field, since positions start at 0) for all the rows that fulfil the condition. Or:

res = [row[:] for row in table if row['var2'] < 20]

which selects all the fields (in the form of a tuple) for all the rows that fulfil the condition. Or:

res = [row[1::2] for row in table.iterrows(2, 3000, 3)]

which selects the fields at odd positions (1, 3, 5 and so on, in the form of a tuple) for all the rows in the slice [2:3000:3].

Row.__setitem__(key, value)

Set the key row field to the specified value.

Unlike its __getitem__() counterpart, in this case key can only be a string (the name of the field). Changes made via __setitem__() do not take effect on the data on disk until either Row.append() or Row.update() is called.

Examples

for row in table.iterrows(step=10):
    row['col1'] = row.nrow
    row['col2'] = 'b'
    row['col3'] = 0.0
    row.update()
table.flush()

which modifies every tenth row in the table.

The Cols class

class tables.Cols(table, desc)[source]

Container for columns in a table or nested column.

This class is used as an accessor to the columns in a table or nested column. It supports the natural naming convention, so that you can access the different columns as attributes which lead to Column instances (for non-nested columns) or other Cols instances (for nested columns).

For instance, if table.cols is a Cols instance with a column named col1 under it, the latter can be accessed as table.cols.col1. If col1 is nested and contains a col2 column, this can be accessed as table.cols.col1.col2 and so on. Because of natural naming, the names of members start with special prefixes, like in the Group class (see The Group class).

Like the Column class (see The Column class), Cols supports item access to read and write ranges of values in the table or nested column.

Cols attributes

_v_colnames

A list of the names of the columns hanging directly from the associated table or nested column. The order of the names matches the order of their respective columns in the containing table.

_v_colpathnames

A list of the pathnames of all the columns under the associated table or nested column (in preorder). If it does not contain nested columns, this is exactly the same as the Cols._v_colnames attribute.

_v_desc

The associated Description instance (see The Description class).

Cols properties

Cols._v_table

The parent Table instance (see The Table class).

Cols methods

Cols._f_col(colname)[source]

Get an accessor to the column colname.

This method returns a Column instance (see The Column class) if the requested column is not nested, and a Cols instance (see The Cols class) if it is. You may use full column pathnames in colname.

Calling cols._f_col('col1/col2') is equivalent to using cols.col1.col2. However, the first syntax is better suited to programmatic use. It is also the better choice when you need to access columns whose names are not valid Python identifiers.

Cols.__getitem__(key)[source]

Get a row or a range of rows from a table or nested column.

If the key argument is an integer, the corresponding nested type row is returned as a record of the current flavor. If key is a slice, the range of rows determined by it is returned as a structured array of the current flavor.

Examples

record = table.cols[4]  # equivalent to table[4]
recarray = table.cols.Info[4:1000:2]

Those statements are equivalent to:

nrecord = table.read(start=4)[0]
nrecarray = table.read(start=4, stop=1000, step=2)['Info']

Here you can see how a mix of natural naming, indexing and slicing can be used as shorthands for the Table.read() method.

Cols.__len__()[source]

Get the number of top level columns in table.

Cols.__setitem__(key, value)[source]

Set a row or a range of rows in a table or nested column.

If the key argument is an integer, the corresponding row is set to value. If key is a slice, the range of rows determined by it is set to value.

Examples

table.cols[4] = record
table.cols.Info[4:1000:2] = recarray

Those statements are equivalent to:

table.modify_rows(4, rows=record)
table.modify_column(4, 1000, 2, colname='Info', column=recarray)

Here you can see how a mix of natural naming, indexing and slicing can be used as shorthands for the Table.modify_rows() and Table.modify_column() methods.

The Column class

class tables.Column(table, name, descr)[source]

Accessor for a non-nested column in a table.

Each instance of this class is associated with one non-nested column of a table. These instances are mainly used to read and write data from the table columns using item access (like the Cols class - see The Cols class), but there are a few other associated methods to deal with indexes.

Column attributes

descr

The Description (see The Description class) instance of the parent table or nested column.

name

The name of the associated column.

pathname

The complete pathname of the associated column (the same as Column.name if the column is not inside a nested column).

attrs

Column attributes (see The Col class and its descendants).

Parameters:
  • table – The parent table instance

  • name – The name of the column that is associated with this object

  • descr – The parent description object

Column instance variables

Column.dtype

The NumPy dtype that most closely matches this column.

Column.index

The Index instance (see The Index class) associated with this column (None if the column is not indexed).

Column.is_indexed

True if the column is indexed, false otherwise.

Column.maindim

The dimension along which iterators work. Its value is 0 (i.e. the first dimension).

Column.shape

The shape of this column.

Column.table

The parent Table instance (see The Table class).

Column.type

The PyTables type of the column (a string).
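A short sketch of how these instance variables look for a small multidimensional column. The Reading layout, the file name and the summary dictionary are assumptions made for this illustration.

```python
import os
import tempfile

import tables as tb


class Reading(tb.IsDescription):
    # Hypothetical layout, used only for this illustration.
    sensor = tb.StringCol(8, pos=0)
    vector = tb.Int32Col(shape=(2,), pos=1)


with tempfile.TemporaryDirectory() as tmp, \
        tb.open_file(os.path.join(tmp, "column_demo.h5"), "w") as h5:
    table = h5.create_table("/", "readings", Reading)
    row = table.row
    for i in range(3):
        row["sensor"] = "s%d" % i
        row["vector"] = [i, i + 1]
        row.append()
    table.flush()

    col = table.cols.vector
    # Gather the values as plain Python objects before the file closes.
    summary = {
        "name": col.name,
        "pathname": col.pathname,
        "dtype": str(col.dtype),
        "type": col.type,
        "shape": tuple(col.shape),
        "maindim": col.maindim,
        "is_indexed": col.is_indexed,
        "has_index": col.index is not None,
        "length": len(col),
    }

for key, value in summary.items():
    print(key, "->", value)
```

No index has been created here, so is_indexed is false and index is None.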

Column methods

Column.create_index(optlevel=6, kind='medium', filters=None, tmp_dir=None, _blocksizes=None, _testmode=False, _verbose=False)[source]

Create an index for this column.

Warning

In some situations it is useful to get a completely sorted index (CSI). For those cases, it is best to use the Column.create_csindex() method instead.

Parameters:
  • optlevel (int) – The optimization level for building the index. The levels range from 0 (no optimization) up to 9 (maximum optimization). Higher levels of optimization mean better chances for reducing the entropy of the index at the price of using more CPU, memory and I/O resources for creating the index.

  • kind (str) –

    The kind of the index to be built. It can take the ‘ultralight’, ‘light’, ‘medium’ or ‘full’ values. Lighter kinds (‘ultralight’ and ‘light’) mean that the index takes less space on disk, but will perform queries slower. Heavier kinds (‘medium’ and ‘full’) mean better chances for reducing the entropy of the index (increasing the query speed) at the price of using more disk space as well as more CPU, memory and I/O resources for creating the index.

    Note that selecting a full kind with an optlevel of 9 (the maximum) guarantees the creation of an index with zero entropy, that is, a completely sorted index (CSI) - provided that the number of rows in the table does not exceed 2**48 (that is, more than 100 trillion rows). See the Column.create_csindex() method for a more direct way to create a CSI index.

  • filters (Filters) – Specify the Filters instance used to compress the index. If None, default index filters will be used (currently, zlib level 1 with shuffling).

  • tmp_dir – When kind is other than ‘ultralight’, a temporary file is created during the index build process. You can use the tmp_dir argument to specify the directory for this temporary file. The default is to create it in the same directory as the file containing the original table.
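As a minimal sketch of these parameters in use, the following indexes one column and then runs an in-kernel query on it. The Reading layout, the file name, the generated values and the choice of the system temporary directory for tmp_dir are all assumptions made for this illustration.

```python
import os
import tempfile

import tables as tb


class Reading(tb.IsDescription):
    # Hypothetical layout, used only for this illustration.
    ident = tb.Int32Col(pos=0)
    value = tb.Int32Col(pos=1)


with tempfile.TemporaryDirectory() as tmp, \
        tb.open_file(os.path.join(tmp, "index_demo.h5"), "w") as h5:
    table = h5.create_table("/", "readings", Reading)
    row = table.row
    for i in range(1000):
        row["ident"] = i
        row["value"] = (i * 37) % 101
        row.append()
    table.flush()

    column = table.cols.value
    indexed_before = column.is_indexed
    # The values passed here are the documented defaults, spelled out.
    column.create_index(optlevel=6, kind="medium",
                        tmp_dir=tempfile.gettempdir())
    indexed_after = column.is_indexed

    # Search methods pick up the index automatically where available.
    hits = sorted(int(r["ident"]) for r in table.where("value == 7"))

# The same selection computed in plain Python, for comparison.
expected = [i for i in range(1000) if (i * 37) % 101 == 7]
print(indexed_before, indexed_after, hits == expected)
```

On a table this small the index brings no measurable speed-up; the sketch only shows the call sequence.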

Column.create_csindex(filters=None, tmp_dir=None, _blocksizes=None, _testmode=False, _verbose=False)[source]

Create a completely sorted index (CSI) for this column.

This method guarantees the creation of an index with zero entropy, that is, a completely sorted index (CSI) – provided that the number of rows in the table does not exceed 2**48 (that is, more than 100 trillion rows). A CSI index is needed for some table methods (like Table.itersorted() or Table.read_sorted()) in order to ensure completely sorted results.

For the meaning of filters and tmp_dir arguments see Column.create_index().

Notes

This method is equivalent to Column.create_index(optlevel=9, kind='full', …).
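A minimal sketch of a CSI being used for sorted reads through Table.read_sorted() and Table.itersorted(). The Reading layout, the file name and the generated values are hypothetical; checkCSI=True asks both methods to insist on a completely sorted index.

```python
import os
import tempfile

import tables as tb


class Reading(tb.IsDescription):
    # Hypothetical layout, used only for this illustration.
    ident = tb.Int32Col(pos=0)
    value = tb.Int32Col(pos=1)


with tempfile.TemporaryDirectory() as tmp, \
        tb.open_file(os.path.join(tmp, "csi_demo.h5"), "w") as h5:
    table = h5.create_table("/", "readings", Reading)
    row = table.row
    for i in range(1000):
        row["ident"] = i
        row["value"] = (i * 37) % 101
        row.append()
    table.flush()

    column = table.cols.value
    column.create_csindex(tmp_dir=tempfile.gettempdir())
    is_csi = bool(column.index.is_csi)

    # Read the whole 'value' column in sorted order.
    sorted_values = table.read_sorted(
        "value", checkCSI=True, field="value").tolist()
    # Iterate over the first three rows in sorted order.
    smallest = [int(r["value"])
                for r in table.itersorted("value", checkCSI=True, stop=3)]

expected = sorted((i * 37) % 101 for i in range(1000))
print(is_csi, sorted_values == expected, smallest)
```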

Column.reindex()[source]

Recompute the index associated with this column.

This can be useful when you suspect that, for any reason, the index information is no longer valid and you want to rebuild it.

This method does nothing if the column is not indexed.

Column.reindex_dirty()[source]

Recompute the associated index only if it is dirty.

This can be useful when you have set Table.autoindex to false for the table and you want to update the column’s index after an operation that invalidates it (like Table.remove_rows()).

This method does nothing if the column is not indexed.
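The scenario described above can be sketched as follows: automatic index maintenance is switched off, rows are removed, and the index is then rebuilt on demand. The Reading layout, the file name and the generated values are assumptions made for this illustration.

```python
import os
import tempfile

import tables as tb


class Reading(tb.IsDescription):
    # Hypothetical layout, used only for this illustration.
    ident = tb.Int32Col(pos=0)
    value = tb.Int32Col(pos=1)


with tempfile.TemporaryDirectory() as tmp, \
        tb.open_file(os.path.join(tmp, "reindex_demo.h5"), "w") as h5:
    table = h5.create_table("/", "readings", Reading)
    row = table.row
    for i in range(1000):
        row["ident"] = i
        row["value"] = (i * 37) % 101
        row.append()
    table.flush()

    table.cols.value.create_index()
    table.autoindex = False       # no automatic index updates from now on
    table.remove_rows(0, 100)     # removing rows invalidates the index

    dirty_before = bool(table.cols.value.index.dirty)
    table.cols.value.reindex_dirty()
    dirty_after = bool(table.cols.value.index.dirty)

    hits = sorted(int(r["ident"]) for r in table.where("value == 7"))

# Rows 0..99 were removed, so only the later matches remain.
expected = [i for i in range(100, 1000) if (i * 37) % 101 == 7]
print(dirty_before, dirty_after, hits == expected)
```

Deferring the rebuild this way can pay off when several invalidating operations are performed in a row, since the index is then recomputed once at the end.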

Column.remove_index()[source]

Remove the index associated with this column.

This method does nothing if the column is not indexed. The removed index can be created again by calling the Column.create_index() method.

Column special methods

Column.__getitem__(key)[source]

Get a row or a range of rows from a column.

If the key argument is an integer, the corresponding element in the column is returned as an object of the current flavor. If key is a slice, the range of elements determined by it is returned as an array of the current flavor.

Examples

print("Column handlers:")
for name in table.colnames:
    print(table.cols._f_col(name))

print("Select table.cols.name[1]-->", table.cols.name[1])
print("Select table.cols.name[1:2]-->", table.cols.name[1:2])
print("Select table.cols.name[:]-->", table.cols.name[:])
print("Select table.cols._f_col('name')[:]-->",
      table.cols._f_col('name')[:])

The output of this for a certain arbitrary table is:

Column handlers:
/table.cols.name (Column(), string, idx=None)
/table.cols.lati (Column(), int32, idx=None)
/table.cols.longi (Column(), int32, idx=None)
/table.cols.vector (Column(2,), int32, idx=None)
/table.cols.matrix2D (Column(2, 2), float64, idx=None)
Select table.cols.name[1]--> Particle:     11
Select table.cols.name[1:2]--> ['Particle:     11']
Select table.cols.name[:]--> ['Particle:     10'
 'Particle:     11' 'Particle:     12'
 'Particle:     13' 'Particle:     14']
Select table.cols._f_col('name')[:]--> ['Particle:     10'
 'Particle:     11' 'Particle:     12'
 'Particle:     13' 'Particle:     14']

See the examples/table2.py file for a more complete example.

Column.__len__()[source]

Get the number of elements in the column.

This matches the length in rows of the parent table.

Column.__setitem__(key, value)[source]

Set a row or a range of rows in a column.

If the key argument is an integer, the corresponding element is set to value. If key is a slice, the range of elements determined by it is set to value.

Examples

# Modify row 1
table.cols.col1[1] = -1

# Modify rows 1 and 3
table.cols.col1[1::2] = [2,3]

Which is equivalent to:

import numpy as np

# Modify row 1
table.modify_columns(start=1, columns=[[-1]], names=['col1'])

# Modify rows 1 and 3
columns = np.rec.fromarrays([[2, 3]], formats='i4')
table.modify_columns(start=1, step=2, columns=columns,
                     names=['col1'])