Structured storage classes

The Table class

This class represents heterogeneous datasets in an HDF5 file.

Tables are leaves (see The Leaf class) whose data consists of a unidimensional sequence of rows, where each row contains one or more fields. Fields have an associated unique name and position, with the first field having position 0. All rows have the same fields, which are arranged in columns.

Fields can have any type supported by the Col class and its descendants (see The Col class and its descendants), which support multidimensional data. Moreover, a field can be nested (to an arbitrary depth), meaning that it includes further fields inside. A field named x inside a nested field a in a table can be accessed as the field a/x (its path name) from the table.
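The following is a minimal sketch of a nested field and its path name. The Point description, the file name and the field values are made up for illustration; only the a/x naming comes from the text above.

```python
import os
import tempfile

import tables as tb


class Point(tb.IsDescription):
    # Hypothetical description: one plain field and one nested field "a".
    ident = tb.Int32Col(pos=0)

    class a(tb.IsDescription):
        _v_pos = 1
        x = tb.Float64Col(pos=0)
        y = tb.Float64Col(pos=1)


path = os.path.join(tempfile.mkdtemp(), "nested.h5")
with tb.open_file(path, mode="w") as h5:
    table = h5.create_table(h5.root, "points", Point)
    row = table.row
    row["ident"] = 1
    row["a/x"] = 2.5   # nested field addressed by its path name
    row["a/y"] = -1.0
    row.append()
    table.flush()

    pathnames = list(table.colpathnames)  # bottom-level columns
    x_values = table.cols.a.x[:]          # same column via natural naming
```

The path name a/x and the natural-naming form table.cols.a.x refer to the same column.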

The structure of a table is declared by its description, which is made available in the Table.description attribute (see Table).

This class provides new methods to read, write and search table data efficiently. It also provides special Python methods to allow accessing the table as a normal sequence or array (with extended slicing supported).

PyTables supports in-kernel searches working simultaneously on several columns using complex conditions. These are faster than selections using Python expressions. See the Table.where() method for more information on in-kernel searches.

Non-nested columns can be indexed. Searching an indexed column can be several times faster than searching a non-indexed one. Search methods automatically take advantage of indexing where available.

When iterating over a table, an object of the Row class (see The Row class) is used. This object lets you read and write data one row at a time, and perform queries that the in-kernel syntax does not support (at a much lower speed, of course).

Objects of this class support access to individual columns via natural naming through the Table.cols accessor. Nested columns are mapped to Cols instances, and non-nested ones to Column instances. See The Column class for examples of this feature.

Parameters:

parentnode –
The parent Group object.

Changed in version 3.0: Renamed from parentNode to parentnode.
name (str) – The name of this node in its parent group.
description – An IsDescription subclass or a dictionary where the keys are the field names, and the values the type definitions. In addition, a pure NumPy dtype is accepted. If None, the table metadata is read from disk; otherwise it is taken from the previous parameters.
title – Sets a TITLE attribute on the HDF5 table entity.
filters (Filters) – An instance of the Filters class that provides information about the desired I/O filters to be applied during the life of this object.
expectedrows – A user estimate of the number of rows that will be in the table. If not provided, the default value is EXPECTED_ROWS_TABLE (see tables/parameters.py). If you plan to save bigger tables, try providing a guess; this will optimize the time and memory used by the HDF5 B-Tree creation and management process.
chunkshape – The shape of the data chunk to be read or written as a single HDF5 I/O operation. The filters are applied to those chunks of data. Its rank for tables has to be 1. If None, a sensible value is calculated based on the expectedrows parameter (which is recommended).
byteorder – The byteorder of the data on-disk, specified as ‘little’ or ‘big’. If this is not specified, the byteorder is that of the platform, unless you passed a recarray as the description, in which case the recarray byteorder will be chosen.
track_times –
Whether time data associated with the leaf are recorded (object access time, raw data modification time, metadata change time, object birth time); default True. Semantics of these times depend on their implementation in the HDF5 library: refer to documentation of the H5O_info_t data structure. As of HDF5 1.8.15, only ctime (metadata change time) is implemented.

Added in version 3.4.3.

Notes

The instance variables below are provided in addition to those in Leaf (see The Leaf class). Please note that there are several col* dictionaries to ease retrieving information about a column directly by its path name, avoiding the need to walk through Table.description or Table.cols.

Table attributes

coldescrs: Maps the name of a column to its Col description (see The Col class and its descendants).

coldflts: Maps the name of a column to its default value.

coldtypes: Maps the name of a column to its NumPy data type.

colindexed: Is the column which name is used as a key indexed?

colinstances: Maps the name of a column to its Column (see The Column class) or Cols (see The Cols class) instance.

colnames: A list containing the names of top-level columns in the table.

colpathnames

A list containing the pathnames of bottom-level columns in the table.

These are the leaf columns obtained when walking the table description left-to-right, bottom-first. Columns inside a nested column have slashes (/) separating name components in their pathname.

cols: A Cols instance that provides natural naming access to non-nested (Column, see The Column class) and nested (Cols, see The Cols class) columns.

coltypes: Maps the name of a column to its PyTables data type.

description: A Description instance (see The Description class) reflecting the structure of the table.

extdim: The index of the enlargeable dimension (always 0 for tables).

indexed: Does this table have any indexed columns?

nrows: The current number of rows in the table.

Table properties

Table.autoindex

Is True if the Table automatically keeps column indexes up to date.

Setting this value states whether existing indexes should be automatically updated after an append operation or recomputed after an index-invalidating operation (i.e. removal and modification of rows). The default is true.

This value takes effect whenever a column is altered. If you don’t have automatic indexing activated and you want to do an immediate update, use Table.flush_rows_to_index(); for an immediate reindexing of invalidated indexes, use Table.reindex_dirty().

This value is persistent.

Changed in version 3.0: The autoIndex property has been renamed into autoindex.

Table.colindexes: Return a dictionary with the indexes of the indexed columns.

Table.indexedcolpathnames: List of pathnames of indexed columns in the table.

Table.row: Row instance (see The Row class) associated to the Table.

Table.rowsize: Size in bytes of each row in the table.

Table methods - reading

Table.col(name: str) → ndarray

Get a column from the table.

If a column called name exists in the table, it is read and returned as a NumPy object. If it does not exist, a KeyError is raised.

Examples

narray = table.col('var2')

That statement is equivalent to:

narray = table.read(field='var2')

Here you can see how this method can be used as a shorthand for the Table.read() method.

Table.iterrows(start: int | None = None, stop: int | None = None, step: int | None = None) → Iterator[Row]

Iterate over the table using a Row instance.

If a range is not supplied, all the rows in the table are iterated over; you can also use the Table.__iter__() special method for that purpose. If you want to iterate over a given range of rows in the table, you may use the start, stop and step parameters.

Warning

When in the middle of a table row iterator, you should not use methods that can change the number of rows in the table (like Table.append() or Table.remove_rows()) or unexpected errors will happen.
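As a small illustration of the start, stop and step parameters, the sketch below iterates over every third row of a made-up table. The file name, table name and column names are assumptions chosen for this example.

```python
import os
import tempfile

import tables as tb

path = os.path.join(tempfile.mkdtemp(), "iterrows.h5")
with tb.open_file(path, mode="w") as h5:
    # Hypothetical two-column table holding n and n squared.
    table = h5.create_table(
        h5.root, "squares",
        {"n": tb.Int32Col(pos=0), "sq": tb.Int64Col(pos=1)})
    table.append([(i, i * i) for i in range(10)])
    table.flush()

    # Rows 2, 5 and 8: the range works like a Python slice.
    picked = [(int(row.nrow), int(row["sq"]))
              for row in table.iterrows(start=2, stop=9, step=3)]
```

Values are copied out of each Row inside the loop because the same Row instance is reused for every iteration.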

Table methods - writing

Table.append(rows: list | ndarray) → None

Append a sequence of rows to the end of the table.

The rows argument may be any object which can be converted to a structured array compliant with the table structure (otherwise, a ValueError is raised). This includes NumPy structured arrays, lists of tuples or array records, and a string or Python buffer.

Examples

import tables as tb

class Particle(tb.IsDescription):
    name        = tb.StringCol(16, pos=1) # 16-character String
    lati        = tb.IntCol(pos=2)        # integer
    longi       = tb.IntCol(pos=3)        # integer
    pressure    = tb.Float32Col(pos=4)  # float  (single-precision)
    temperature = tb.FloatCol(pos=5)    # double (double-precision)

fileh = tb.open_file('test4.h5', mode='w')
table = fileh.create_table(fileh.root, 'table', Particle,
                           "A table")

# Append several rows in only one call
table.append([("Particle:     10", 10, 0, 10 * 10, 10**2),
              ("Particle:     11", 11, -1, 11 * 11, 11**2),
              ("Particle:     12", 12, -2, 12 * 12, 12**2)])
fileh.close()

Table.modify_column(start=None, stop=None, step=None, column=None, colname=None)

Modify one single column in the row slice [start:stop:step].

The colname argument specifies the name of the column in the table to be modified with the data given in column. This method returns the number of rows modified. Should the modification exceed the length of the table, an IndexError is raised before changing data.

The column argument may be any object which can be converted to a (record) array compliant with the structure of the column to be modified (otherwise, a ValueError is raised). This includes NumPy (record) arrays, lists of scalars, tuples or array records, and a string or Python buffer.

Table.modify_columns(start=None, stop=None, step=None, columns=None, names=None)

Modify a series of columns in the row slice [start:stop:step].

The names argument specifies the names of the columns in the table to be modified with the data given in columns. This method returns the number of rows modified. Should the modification exceed the length of the table, an IndexError is raised before changing data.

The columns argument may be any object which can be converted to a structured array compliant with the structure of the columns to be modified (otherwise, a ValueError is raised). This includes NumPy structured arrays, lists of tuples or array records, and a string or Python buffer.
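The sketch below shows one way the two column-modification methods might be called. The table layout and the replacement values are invented for the example; both calls return the number of rows modified, as described above.

```python
import os
import tempfile

import tables as tb

path = os.path.join(tempfile.mkdtemp(), "modify_columns.h5")
with tb.open_file(path, mode="w") as h5:
    # Hypothetical table with an integer and a float column.
    table = h5.create_table(
        h5.root, "t",
        {"col1": tb.Int32Col(pos=0), "col2": tb.Float64Col(pos=1)})
    table.append([(i, float(i)) for i in range(5)])
    table.flush()

    # Overwrite col1 in rows 1 and 2 with a list of scalars.
    n_one = table.modify_column(start=1, stop=3,
                                colname="col1", column=[100, 200])

    # Overwrite col1 and col2 in rows 3 and 4, one sequence per column.
    n_many = table.modify_columns(start=3, stop=5,
                                  columns=[[30, 40], [3.5, 4.5]],
                                  names=["col1", "col2"])

    col1 = table.col("col1").tolist()
    col2 = table.col("col2").tolist()
```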

Table.modify_coordinates(coords: list | tuple | ndarray, rows: Sequence) → int

Modify a series of rows in positions specified in coords.

The values in the selected rows will be modified with the data given in rows. This method returns the number of rows modified.

The possible values for the rows argument are the same as in Table.append().

Table.modify_rows(start: int | None = None, stop: int | None = None, step: int | None = None, rows: Sequence | None = None) → int

Modify a series of rows in the slice [start:stop:step].

The values in the selected rows will be modified with the data given in rows. This method returns the number of rows modified. Should the modification exceed the length of the table, an IndexError is raised before changing data.

The possible values for the rows argument are the same as in Table.append().
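To contrast slice-based and coordinate-based row modification, here is a short sketch. The table contents and the new row values are assumptions made for illustration.

```python
import os
import tempfile

import tables as tb

path = os.path.join(tempfile.mkdtemp(), "modify_rows.h5")
with tb.open_file(path, mode="w") as h5:
    # Hypothetical table with an integer and a float column.
    table = h5.create_table(
        h5.root, "t",
        {"col1": tb.Int32Col(pos=0), "col2": tb.Float64Col(pos=1)})
    table.append([(i, float(i)) for i in range(5)])
    table.flush()

    # Replace the contiguous slice [1:3] with two complete rows.
    n_slice = table.modify_rows(start=1, stop=3,
                                rows=[(10, 1.5), (20, 2.5)])

    # Replace the rows at positions 0 and 4, which are not contiguous.
    n_coords = table.modify_coordinates([0, 4],
                                        [(-1, -1.0), (-5, -5.0)])

    col1 = table.col("col1").tolist()
```

In both calls each element of rows is a full row, given in the same form that Table.append() accepts.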

Table.remove_rows(start: int | None = None, stop: int | None = None, step: int | None = None) → int

Remove a range of rows in the table.

If only start is supplied, that row and all following will be deleted. If a range is supplied, i.e. both the start and stop parameters are passed, all the rows in the range are removed.

Changed in version 3.0: The start, stop and step parameters now behave like in slice.
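A brief sketch of the slice-like behaviour described above, using a made-up single-column table. It assumes a PyTables 3.x installation, where start, stop and step behave as in a Python slice.

```python
import os
import tempfile

import tables as tb

path = os.path.join(tempfile.mkdtemp(), "remove_rows.h5")
with tb.open_file(path, mode="w") as h5:
    # Hypothetical table holding the integers 0..9.
    table = h5.create_table(h5.root, "t", {"n": tb.Int32Col(pos=0)})
    table.append([(i,) for i in range(10)])
    table.flush()

    # Remove rows 2, 3 and 4.
    removed_range = table.remove_rows(2, 5)

    # Only start given: remove that row and all the following ones.
    removed_tail = table.remove_rows(5)

    remaining = table.col("n").tolist()
```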

Table methods - querying

Table.get_where_list(condition, condvars=None, sort=False, start=None, stop=None, step=None)

Get the row coordinates fulfilling the given condition.

The coordinates are returned as a list of the current flavor. sort means that you want to retrieve the coordinates ordered. The default is to not sort them.

The meaning of the other arguments is the same as in the Table.where() method.

Table.read_where(condition, condvars=None, field=None, start=None, stop=None, step=None)

Read table data fulfilling the given condition.

This method is similar to Table.read(); their common arguments and return values have the same meanings. However, only the rows fulfilling the condition are included in the result.

The meaning of the other arguments is the same as in the Table.where() method.
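The two condition-based readers above can be compared with a short sketch. The table layout and the condition are assumptions for this example; Table.get_where_list() and Table.read_where() are the PyTables methods for row coordinates and row data respectively.

```python
import os
import tempfile

import tables as tb

path = os.path.join(tempfile.mkdtemp(), "read_where.h5")
with tb.open_file(path, mode="w") as h5:
    # Hypothetical table: col1 = 0..9, col2 = col1 / 2.
    table = h5.create_table(
        h5.root, "t",
        {"col1": tb.Int32Col(pos=0), "col2": tb.Float64Col(pos=1)})
    table.append([(i, i * 0.5) for i in range(10)])
    table.flush()

    # Coordinates of the matching rows, in ascending order.
    coords = table.get_where_list("col1 > 6", sort=True).tolist()

    # Matching rows as a structured array.
    selected = table.read_where("col1 > 6")

    # Only one field of the matching rows.
    only_col2 = table.read_where("col1 > 6", field="col2").tolist()
```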

Table.where(condition, condvars=None, start=None, stop=None, step=None)

Iterate over values fulfilling a condition.

This method returns a Row iterator (see The Row class) which only selects rows in the table that satisfy the given condition (an expression-like string).

The condvars mapping may be used to define the variable names appearing in the condition. condvars should consist of identifier-like strings pointing to Column (see The Column class) instances of this table, or to other values (which will be converted to arrays). A default set of condition variables is provided where each top-level, non-nested column with an identifier-like name appears. Variables in condvars override the default ones.

When condvars is not provided or is None, the current local and global namespaces are searched instead. This mechanism is mostly intended for interactive usage. To disable it, just specify a (maybe empty) mapping as condvars.
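As an illustration of an explicit condvars mapping, the sketch below supplies both a plain value and a Column instance under names chosen for the example (limit and c are hypothetical variable names, and the table layout is made up).

```python
import os
import tempfile

import tables as tb

path = os.path.join(tempfile.mkdtemp(), "condvars.h5")
with tb.open_file(path, mode="w") as h5:
    # Hypothetical table: col1 = 0..9, col2 = col1 / 2.
    table = h5.create_table(
        h5.root, "t",
        {"col1": tb.Int32Col(pos=0), "col2": tb.Float64Col(pos=1)})
    table.append([(i, i * 0.5) for i in range(10)])
    table.flush()

    # "col1" comes from the default variables, "limit" from condvars.
    with_value = [int(r["col1"])
                  for r in table.where("col1 > limit",
                                       condvars={"limit": 6})]

    # A Column instance can be passed under another name as well.
    with_alias = [int(r["col1"])
                  for r in table.where(
                      "c > limit",
                      condvars={"c": table.cols.col1, "limit": 6})]
```

Passing a mapping also keeps the query independent of whatever names happen to exist in the calling namespace.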

If a range is supplied (by setting some of the start, stop or step parameters), only the rows in that range and fulfilling the condition are used. The meaning of the start, stop and step parameters is the same as for Python slices.

When possible, indexed columns participating in the condition will be used to speed up the search. It is recommended that you place the indexed columns as far left and as far out in the condition as possible. In any case, this method always performs better than regular Python selections on the table.

You can mix this method with regular Python selections in order to support even more complex queries. It is strongly recommended that you pass the most restrictive condition as the parameter to this method if you want to achieve maximum performance.

Warning

When in the middle of a table row iterator, you should not use methods that can change the number of rows in the table (like Table.append() or Table.remove_rows()) or unexpected errors will happen.

Examples

passvalues = [ row['col3'] for row in
               table.where('(col1 > 0) & (col2 <= 20)', step=5)
               if your_function(row['col2']) ]
print("Values that pass the cuts:", passvalues)

Note

Special care should be taken when the query condition includes string literals.

Let’s assume that the table table has the following structure:

from tables import IsDescription, StringCol, IntCol, FloatCol

class Record(IsDescription):
    col1 = StringCol(4)  # 4-character String of bytes
    col2 = IntCol()
    col3 = FloatCol()

The type of “col1” corresponds to strings of bytes.

Any condition involving “col1” should be written using the appropriate type for string literals in order to avoid TypeErrors.

The code below will fail with a TypeError:

condition = 'col1 == "AAAA"'
for record in table.where(condition):  # TypeError in Python 3
    pass  # do something with "record"

The reason is that in Python 3 “condition” implies a comparison between a string of bytes (“col1” contents) and a unicode literal (“AAAA”).

The correct way to write the condition is:

condition = 'col1 == b"AAAA"'

Changed in version 3.0: The start, stop and step parameters now behave like in slice.

Table.append_where(dstTable, condition=None, condvars=None, start=None, stop=None, step=None)

Append rows fulfilling the condition to the dstTable table.

dstTable must be capable of taking the rows resulting from the query, i.e. it must have columns with the expected names and compatible types. The meaning of the other arguments is the same as in the Table.where() method.

The number of rows appended to dstTable is returned as a result.

Changed in version 3.0: The whereAppend method has been renamed into append_where.
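A minimal sketch of copying the matching rows into a second table with Table.append_where(). The Reading description and both table names are hypothetical; the destination simply reuses the same description so that names and types are compatible.

```python
import os
import tempfile

import tables as tb


class Reading(tb.IsDescription):
    # Hypothetical description shared by source and destination.
    col1 = tb.Int32Col(pos=0)
    col2 = tb.Float64Col(pos=1)


path = os.path.join(tempfile.mkdtemp(), "append_where.h5")
with tb.open_file(path, mode="w") as h5:
    src = h5.create_table(h5.root, "src", Reading)
    dst = h5.create_table(h5.root, "dst", Reading)
    src.append([(i, i * 0.5) for i in range(10)])
    src.flush()

    # Append to dst only the rows of src that satisfy the condition.
    appended = src.append_where(dst, "col1 > 6")
    dst.flush()

    dst_rows = dst.nrows
    dst_col1 = dst.col("col1").tolist()
```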

Table.will_query_use_indexing(condition: str, condvars: dict[str, Column | ndarray] | None = None) → frozenset

Return the columns whose indexes a query for the condition would use.

The meaning of the condition and condvars arguments is the same as in the Table.where() method. If condition can use indexing, this method returns a frozenset with the path names of the columns whose index is usable. Otherwise, it returns an empty frozenset.

This method is mainly intended for testing. Keep in mind that changing the set of indexed columns or their dirtiness may make this method return different values for the same arguments at different times.
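The sketch below checks the same condition before and after an index is created. The table layout and the condition are assumptions for illustration; the index is built with the default Column.create_index() settings.

```python
import os
import tempfile

import tables as tb

path = os.path.join(tempfile.mkdtemp(), "use_indexing.h5")
with tb.open_file(path, mode="w") as h5:
    # Hypothetical table with 1000 rows.
    table = h5.create_table(
        h5.root, "t",
        {"col1": tb.Int32Col(pos=0), "col2": tb.Float64Col(pos=1)})
    table.append([(i, float(i)) for i in range(1000)])
    table.flush()

    # No index yet, so no column index is usable.
    before = table.will_query_use_indexing("col1 > 990")

    table.cols.col1.create_index()

    # With an index on col1, the same condition can use it.
    after = table.will_query_use_indexing("col1 > 990")
    matches = table.get_where_list("col1 > 990")
```

Since an empty frozenset is falsy, the result can also be used directly in a boolean test.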

Table methods - other

Table.copy(newparent: Group | None = None, newname: str | None = None, overwrite: bool = False, createparents: bool = False, **kwargs) → Table

Copy this table and return the new one.

This method has the behavior and keywords described in Leaf.copy(). Moreover, it recognises the following additional keyword arguments.

Parameters:

sortby – If specified, and sortby corresponds to a column with an index, then the copy will be sorted by this index. If you want to ensure a fully sorted order, the index must be a CSI one. A reverse sorted copy can be achieved by specifying a negative value for the step keyword. If sortby is omitted or None, the original table order is used.
checkCSI – If true and a CSI index does not exist for the sortby column, an error will be raised. If false (the default), it does nothing. You can use this flag in order to explicitly check for the existence of a CSI index.
propindexes – If true, the existing indexes in the source table are propagated (created) to the new one. If false (the default), the indexes are not propagated.
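As a sketch of a sorted copy, the example below builds a CSI index on a made-up key column and then copies the table ordered by it. The table name, column names and the way the keys are scrambled are assumptions for this illustration.

```python
import os
import tempfile

import tables as tb

path = os.path.join(tempfile.mkdtemp(), "sorted_copy.h5")
with tb.open_file(path, mode="w") as h5:
    # Hypothetical table whose key column is a permutation of 0..99.
    table = h5.create_table(
        h5.root, "unsorted",
        {"key": tb.Int32Col(pos=0), "value": tb.Float64Col(pos=1)})
    table.append([((i * 37) % 100, float(i)) for i in range(100)])
    table.flush()

    # A completely sorted index is needed for a fully sorted copy.
    table.cols.key.create_csindex()

    # checkCSI=True raises an error if the index is not a CSI one.
    sorted_copy = table.copy(newname="sorted", sortby="key",
                             checkCSI=True)
    keys = sorted_copy.col("key").tolist()
```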

Table.flush_rows_to_index(_lastrow: bool = True) → int

Add remaining rows in buffers to non-dirty indexes.

This can be useful when you have chosen non-automatic indexing for the table (see the Table.autoindex property in Table) and you want to update the indexes on it.

Table.get_enum(colname: str) → Enum

Get the enumerated type associated with the named column.

If the column named colname (a string) exists and is of an enumerated type, the corresponding Enum instance (see The Enum class) is returned. If it is not of an enumerated type, a TypeError is raised. If the column does not exist, a KeyError is raised.

Table.reindex() → None

Recompute all the existing indexes in the table.

This can be useful when you suspect that, for any reason, the index information for columns is no longer valid and want to rebuild the indexes on it.

Table.reindex_dirty() → None

Recompute the existing indexes in table, if they are dirty.

This can be useful when you have set Table.autoindex (see Table) to false for the table and you want to update the indexes after an index-invalidating operation (Table.remove_rows(), for example).
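The following sketch walks through manual index maintenance with automatic indexing switched off. The table layout is made up, and reading the dirty flag of the Index instance is used here only to make the state visible.

```python
import os
import tempfile

import tables as tb

path = os.path.join(tempfile.mkdtemp(), "reindex_dirty.h5")
with tb.open_file(path, mode="w") as h5:
    # Hypothetical table with an indexed integer column.
    table = h5.create_table(
        h5.root, "t",
        {"col1": tb.Int32Col(pos=0), "col2": tb.Float64Col(pos=1)})
    table.append([(i, float(i)) for i in range(100)])
    table.flush()
    table.cols.col1.create_index()

    # Switch off automatic index maintenance.
    table.autoindex = False

    # Removing rows invalidates the index, which is now left dirty.
    table.remove_rows(0, 10)
    dirty_after_remove = bool(table.cols.col1.index.dirty)

    # Rebuild only the indexes that are dirty.
    table.reindex_dirty()
    dirty_after_reindex = bool(table.cols.col1.index.dirty)

    hits = table.get_where_list("col1 < 15", sort=True).tolist()
```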

The Description class

class tables.Description(classdict: dict[str, Any], nestedlvl: int = -1, validate: bool = True, ptparams: dict[str, Any] | None = None)

This class represents descriptions of the structure of tables.

An instance of this class is automatically bound to Table (see The Table class) objects when they are created. It provides a browseable representation of the structure of the table, made of non-nested (Col - see The Col class and its descendants) and nested (Description) columns.

Column definitions under a description can be accessed as attributes of it (natural naming). For instance, if table.description is a Description instance with a column named col1 under it, the latter can be accessed as table.description.col1. If col1 is nested and contains a col2 column, this can be accessed as table.description.col1.col2. Because of natural naming, the names of members start with special prefixes, like in the Group class (see The Group class).

Description attributes

_v_colobjects: A dictionary mapping the names of the columns hanging directly from the associated table or nested column to their respective descriptions (Col - see The Col class and its descendants or Description - see The Description class instances).

Changed in version 3.0: The _v_colObjects attribute has been renamed into _v_colobjects.

_v_dflts: A dictionary mapping the names of non-nested columns hanging directly from the associated table or nested column to their respective default values.

_v_dtype: The NumPy type which reflects the structure of this table or nested column. You can use this as the dtype argument of NumPy array factories.

_v_dtypes: A dictionary mapping the names of non-nested columns hanging directly from the associated table or nested column to their respective NumPy types.

_v_is_nested: Whether the associated table or nested column contains further nested columns or not.

_v_itemsize: The size in bytes of an item in this table or nested column.

_v_name: The name of this description group. The name of the root group is ‘/’.

_v_names: A list of the names of the columns hanging directly from the associated table or nested column. The order of the names matches the order of their respective columns in the containing table.

_v_nested_descr: A nested list of pairs of (name, format) tuples for all the columns under this table or nested column. You can use this as the dtype and descr arguments of NumPy array factories.

Changed in version 3.0: The _v_nestedDescr attribute has been renamed into _v_nested_descr.

_v_nested_formats: A nested list of the NumPy string formats (and shapes) of all the columns under this table or nested column. You can use this as the formats argument of NumPy array factories.

Changed in version 3.0: The _v_nestedFormats attribute has been renamed into _v_nested_formats.

_v_nestedlvl: The level of the associated table or nested column in the nested datatype.

_v_nested_names: A nested list of the names of all the columns under this table or nested column. You can use this as the names argument of NumPy array factories.

Changed in version 3.0: The _v_nestedNames attribute has been renamed into _v_nested_names.

_v_pathname: Pathname of the table or nested column.

_v_pathnames: A list of the pathnames of all the columns under this table or nested column (in preorder). If it does not contain nested columns, this is exactly the same as the Description._v_names attribute.

_v_types: A dictionary mapping the names of non-nested columns hanging directly from the associated table or nested column to their respective PyTables types.

_v_offsets: A list of offsets for all the columns. If the list is empty, means that there are no padding in the data structure. However, the support for offsets is currently limited to flat tables; for nested tables, the potential padding is always removed (exactly the same as in pre-3.5 versions), and this variable is set to empty.

Added in version 3.5: Previous to this version all the compound types were converted internally to ‘packed’ types, i.e. with no padding between the component types. Starting with 3.5, the holes in native HDF5 types (non-nested) are honored and replicated during dataset and attribute copies.

Description methods

Description._f_walk(type: Literal['All', 'Col', 'Description'] = 'All') → Generator[Col | Description]

Iterate over nested columns.

If type is ‘All’ (the default), all column description objects (Col and Description instances) are yielded in top-to-bottom order (preorder).

If type is ‘Col’ or ‘Description’, only column descriptions of that type are yielded.
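A short sketch of walking a nested description. The Point description is hypothetical; the example only counts the leaf columns and reads a few of the attributes listed above.

```python
import os
import tempfile

import tables as tb


class Point(tb.IsDescription):
    # Hypothetical description: one plain column and one nested column.
    ident = tb.Int32Col(pos=0)

    class a(tb.IsDescription):
        _v_pos = 1
        x = tb.Float64Col(pos=0)
        y = tb.Float64Col(pos=1)


path = os.path.join(tempfile.mkdtemp(), "walk.h5")
with tb.open_file(path, mode="w") as h5:
    table = h5.create_table(h5.root, "points", Point)
    desc = table.description

    # Only the non-nested (Col) descriptions, in preorder.
    leaf_columns = list(desc._f_walk(type="Col"))

    top_level_names = list(desc._v_names)
    nested_names = list(desc.a._v_names)  # natural naming into "a"
    pathnames = list(desc._v_pathnames)
```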

The Row class

class tables.tableextension.Row

Table row iterator and field accessor.

Instances of this class are used to fetch and set the values of individual table fields. They work very much like dictionaries, where keys are the pathnames or positions (extended slicing is supported) of the fields in the associated table in a specific row.

This class provides an iterator interface so that you can use the same Row instance to access successive table rows one after the other. There are also some important methods that are useful for accessing, adding and modifying values in tables.

Row attributes

nrow

The current row number.

This property is useful for knowing which row is being dealt with in the middle of a loop or iterator.

Row methods

Row.append()

Add a new row of data to the end of the dataset.

Once you have filled the proper fields for the current row, calling this method actually appends the new data to the output buffer (which will eventually be dumped to disk). If you have not set the value of a field, the default value of the column will be used.

Warning

After completion of the loop in which Row.append() has been called, it is always convenient to make a call to Table.flush() in order to avoid losing the last rows that may still remain in internal buffers.

Examples

row = table.row
for i in range(nrows):
    row['col1'] = i - 1
    row['col2'] = 'a'
    row['col3'] = -1.0
    row.append()
table.flush()

Row.fetch_all_fields()

Retrieve all the fields in the current row.

Unlike row[:] (see Row special methods), this returns row data as a NumPy void scalar. For instance:

[row.fetch_all_fields() for row in table.where('col1 < 3')]

will select all the rows that fulfill the given condition as a list of NumPy records.

Row.update()

Change the data of the current row in the dataset.

This method allows you to modify values in a table when you are in the middle of a table iterator like Table.iterrows() or Table.where().

Once you have filled the proper fields for the current row, calling this method actually changes data in the output buffer (which will eventually be dumped to disk). If you have not set the value of a field, its original value will be used.

Warning

After completion of the loop in which Row.update() has been called, it is always convenient to make a call to Table.flush() in order to avoid losing changed rows that may still remain in internal buffers.

Examples

for row in table.iterrows(step=10):
    row['col1'] = row.nrow
    row['col2'] = 'b'
    row['col3'] = 0.0
    row.update()
table.flush()

which modifies every tenth row in table. Or:

for row in table.where('col1 > 3'):
    row['col1'] = row.nrow
    row['col2'] = 'b'
    row['col3'] = 0.0
    row.update()
table.flush()

which just updates the rows with values bigger than 3 in the first column.

Row special methods

Row.__contains__(key, /): Return bool(key in self).

Row.__getitem__(key, /): Return self[key].

Row.__setitem__(key, value, /): Set self[key] to value.

The Cols class

class tables.Cols(table: Table, desc: Description)

Container for columns in a table or nested column.

This class is used as an accessor to the columns in a table or nested column. It supports the natural naming convention, so that you can access the different columns as attributes which lead to Column instances (for non-nested columns) or other Cols instances (for nested columns).

For instance, if table.cols is a Cols instance with a column named col1 under it, the latter can be accessed as table.cols.col1. If col1 is nested and contains a col2 column, this can be accessed as table.cols.col1.col2 and so on. Because of natural naming, the names of members start with special prefixes, like in the Group class (see The Group class).

Like the Column class (see The Column class), Cols supports item access to read and write ranges of values in the table or nested column.

Cols attributes

_v_colnames: A list of the names of the columns hanging directly from the associated table or nested column. The order of the names matches the order of their respective columns in the containing table.

_v_colpathnames: A list of the pathnames of all the columns under the associated table or nested column (in preorder). If it does not contain nested columns, this is exactly the same as the Cols._v_colnames attribute.

_v_desc: The associated Description instance (see The Description class).

Cols properties

Cols._v_table: Return the parent Table instance (see The Table class).

Cols methods

Cols._f_col(colname: str) → Cols

Get an accessor to the column colname.

This method returns a Column instance (see The Column class) if the requested column is not nested, and a Cols instance (see The Cols class) if it is. You may use full column pathnames in colname.

Calling cols._f_col('col1/col2') is equivalent to using cols.col1.col2. However, the first syntax is intended more for programmatic use. It is also better if you want to access columns with names that are not valid Python identifiers.

Cols.__getitem__(key: int | slice) → Any

Get a row or a range of rows from a table or nested column.

If the key argument is an integer, the corresponding nested type row is returned as a record of the current flavor. If key is a slice, the range of rows determined by it is returned as a structured array of the current flavor.

Examples

record = table.cols[4]  # equivalent to table[4]
recarray = table.cols.Info[4:1000:2]

Those statements are equivalent to:

nrecord = table.read(start=4)[0]
nrecarray = table.read(start=4, stop=1000, step=2)['Info']

Here you can see how a mix of natural naming, indexing and slicing can be used as shorthands for the Table.read() method.

Cols.__len__() → int: Get the number of top-level columns in the table.

Cols.__setitem__(key: int | slice, value: Any) → None

Set a row or a range of rows in a table or nested column.

If the key argument is an integer, the corresponding row is set to value. If key is a slice, the range of rows determined by it is set to value.

Examples

table.cols[4] = record
table.cols.Info[4:1000:2] = recarray

Those statements are equivalent to:

table.modify_rows(4, rows=record)
table.modify_column(4, 1000, 2, colname='Info', column=recarray)

Here you can see how a mix of natural naming, indexing and slicing can be used as shorthands for the Table.modify_rows() and Table.modify_column() methods.

The Column class

class tables.Column(table: Table, name: str, descr: Description)

Accessor for a non-nested column in a table.

Each instance of this class is associated with one non-nested column of a table. These instances are mainly used to read and write data from the table columns using item access (like the Cols class - see The Cols class), but there are a few other associated methods to deal with indexes.

Column attributes

descr: The Description (see The Description class) instance of the parent table or nested column.

name: The name of the associated column.

pathname: The complete pathname of the associated column (the same as Column.name if the column is not inside a nested column).

attrs: Column attributes (see The Col class and its descendants).

Parameters:

table – The parent table instance
name – The name of the column that is associated with this object
descr – The parent description object

Column instance variables

Column.dtype: Return the NumPy dtype that most closely matches this column.

Column.index

Return the Index instance associated with this column.

Return None if the column is not indexed.

See The Index class.

Column.is_indexed: Return True if the column is indexed, false otherwise.

Column.maindim

Return the dimension along which iterators work.

Its value is 0 (i.e. the first dimension).

Column.shape: Return the shape of this column.

Column.table: Return the parent Table instance (see The Table class).

Column.type: Return the PyTables type of the column (a string).

Column methods

Column.create_index(optlevel: int = 6, kind: str = 'medium', filters: Filters | None = None, tmp_dir: str | None = None, _blocksizes: tuple[int, int, int, int] | None = None, _testmode: bool = False, _verbose: bool = False) → int

Create an index for this column.

Warning

In some situations it is useful to get a completely sorted index (CSI). For those cases, it is best to use the Column.create_csindex() method instead.

Parameters:

optlevel (int) – The optimization level for building the index. The levels range from 0 (no optimization) up to 9 (maximum optimization). Higher levels of optimization mean better chances for reducing the entropy of the index at the price of using more CPU, memory and I/O resources for creating the index.
kind (str) –
The kind of index to be built. It can take the ‘ultralight’, ‘light’, ‘medium’ or ‘full’ values. Lighter kinds (‘ultralight’ and ‘light’) take less space on disk, but queries run more slowly. Heavier kinds (‘medium’ and ‘full’) mean better chances of reducing the entropy of the index (increasing the query speed) at the price of using more disk space as well as more CPU, memory and I/O resources for creating the index.

Note that selecting a full kind with an optlevel of 9 (the maximum) guarantees the creation of an index with zero entropy, that is, a completely sorted index (CSI), provided that the number of rows in the table does not exceed 2**48 (more than 100 trillion rows). See the Column.create_csindex() method for a more direct way to create a CSI.
filters (Filters) – Specify the Filters instance used to compress the index. If None, default index filters will be used (currently, zlib level 1 with shuffling).
tmp_dir (str) – When kind is other than ‘ultralight’, a temporary file is created during the index build process. You can use the tmp_dir argument to specify the directory for this temporary file. The default is to create it in the same directory as the file containing the original table.
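The parameters above might be used as in the following minimal sketch. The Particle description, the file name and the data are hypothetical; the optlevel and kind values shown are simply the documented defaults written out explicitly.

```python
import os
import tempfile

import tables as tb


# Hypothetical table description, used only for this illustration.
class Particle(tb.IsDescription):
    name = tb.StringCol(16)
    lati = tb.Int32Col()


path = os.path.join(tempfile.mkdtemp(), "index_demo.h5")

with tb.open_file(path, mode="w") as h5:
    table = h5.create_table("/", "particles", Particle)
    row = table.row
    for i in range(1000):
        row["name"] = f"Particle: {i:6d}"
        row["lati"] = i % 100
        row.append()
    table.flush()

    # Build an index on the 'lati' column with the default settings.
    table.cols.lati.create_index(optlevel=6, kind="medium")
    is_indexed = table.cols.lati.is_indexed

    # In-kernel queries use the index automatically where available.
    hits = [r["name"] for r in table.where("lati == 42")]
    print(is_indexed, len(hits))
```

A lighter kind such as 'light' could be substituted when disk space matters more than query speed.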

Column.create_csindex(filters: Filters | None = None, tmp_dir: str | None = None, _blocksizes: tuple[int, int, int, int] | None = None, _testmode: bool = False, _verbose: bool = False) → int

Create a completely sorted index (CSI) for this column.

This method guarantees the creation of an index with zero entropy, that is, a completely sorted index (CSI), provided that the number of rows in the table does not exceed 2**48 (more than 100 trillion rows). A CSI is needed for some table methods (like Table.itersorted() or Table.read_sorted()) in order to ensure completely sorted results.

For the meaning of the filters and tmp_dir arguments, see Column.create_index().

Notes

This method is equivalent to Column.create_index(optlevel=9, kind='full', …).
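A small sketch of how a CSI might be combined with Table.read_sorted() follows. The table layout, file name and values are invented for the example; checkCSI=True asks read_sorted() to insist on a completely sorted index.

```python
import os
import tempfile

import tables as tb

path = os.path.join(tempfile.mkdtemp(), "csindex_demo.h5")

with tb.open_file(path, mode="w") as h5:
    # Hypothetical single-column table, described with a plain dict.
    table = h5.create_table("/", "t", {"value": tb.Int32Col()})
    table.append([(v,) for v in [5, 3, 9, 1, 7]])
    table.flush()

    # Create a completely sorted index on the 'value' column.
    table.cols.value.create_csindex()

    # Read the 'value' field back in index order.
    sorted_values = table.read_sorted("value", checkCSI=True, field="value")
    print(sorted_values)
```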

Column.reindex() → None

Recompute the index associated with this column.

This can be useful when you suspect that, for any reason, the index information is no longer valid and you want to rebuild it.

This method does nothing if the column is not indexed.

Column.reindex_dirty() → None

Recompute the associated index only if it is dirty.

This can be useful when you have set Table.autoindex to False for the table and you want to update the column’s index after an operation that invalidates it (like Table.remove_rows()).

This method does nothing if the column is not indexed.

Column.remove_index() → None

Remove the index associated with this column.

This method does nothing if the column is not indexed. The removed index can be created again by calling the Column.create_index() method.
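The index maintenance methods above could be exercised roughly as follows. This is a sketch under assumed data: the table layout, file name and row counts are hypothetical, and the comments describe the documented intent of each call rather than any guaranteed internal state.

```python
import os
import tempfile

import tables as tb

path = os.path.join(tempfile.mkdtemp(), "reindex_demo.h5")

with tb.open_file(path, mode="w") as h5:
    # Hypothetical single-column table with 200 rows.
    table = h5.create_table("/", "t", {"value": tb.Int32Col()})
    table.append([(i % 10,) for i in range(200)])
    table.flush()

    col = table.cols.value
    col.create_index()

    # Disable automatic index updates, then perform an operation
    # that invalidates the index.
    table.autoindex = False
    table.remove_rows(0, 100)

    # Rebuild the index only if it is dirty.
    col.reindex_dirty()
    hits_after_reindex = len(table.read_where("value == 3"))

    # Rebuild the index unconditionally.
    col.reindex()

    # Drop the index; the column then reports itself as not indexed.
    col.remove_index()
    indexed_after_removal = col.is_indexed
    index_after_removal = col.index
    print(hits_after_reindex, indexed_after_removal, index_after_removal)
```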

Column special methods

Column.__getitem__(key: int | slice) → ndarray

Get a row or a range of rows from a column.

If the key argument is an integer, the corresponding element in the column is returned as an object of the current flavor. If key is a slice, the range of elements determined by it is returned as an array of the current flavor.

Examples

print("Column handlers:")
for name in table.colnames:
    print(table.cols._f_col(name))

# The selections below run once, after the loop, matching the output shown.
print("Select table.cols.name[1]-->", table.cols.name[1])
print("Select table.cols.name[1:2]-->", table.cols.name[1:2])
print("Select table.cols.name[:]-->", table.cols.name[:])
print("Select table.cols._f_col('name')[:]-->",
      table.cols._f_col('name')[:])

The output of this for a certain arbitrary table is:

Column handlers:
/table.cols.name (Column(), string, idx=None)
/table.cols.lati (Column(), int32, idx=None)
/table.cols.longi (Column(), int32, idx=None)
/table.cols.vector (Column(2,), int32, idx=None)
/table.cols.matrix2D (Column(2, 2), float64, idx=None)
Select table.cols.name[1]--> Particle:     11
Select table.cols.name[1:2]--> ['Particle:     11']
Select table.cols.name[:]--> ['Particle:     10'
 'Particle:     11' 'Particle:     12'
 'Particle:     13' 'Particle:     14']
Select table.cols._f_col('name')[:]--> ['Particle:     10'
 'Particle:     11' 'Particle:     12'
 'Particle:     13' 'Particle:     14']

See the examples/table2.py file for a more complete example.

Column.__len__() → int

Get the number of elements in the column.

This matches the length in rows of the parent table.

Column.__setitem__(key: int | slice, value: Any) → int

Set a row or a range of rows in a column.

If the key argument is an integer, the corresponding element is set to value. If key is a slice, the range of elements determined by it is set to value.

Examples

# Modify row 1
table.cols.col1[1] = -1

# Modify rows 1 and 3
table.cols.col1[1::2] = [2,3]

This is equivalent to:

# Modify row 1
table.modify_columns(start=1, columns=[[-1]], names=['col1'])

# Modify rows 1 and 3 (assumes NumPy was imported as np)
columns = np.rec.fromarrays([[2, 3]], formats='i4')
table.modify_columns(start=1, step=2, columns=columns,
                     names=['col1'])
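For a self-contained view of the special methods, the sketch below applies the same assignments to a hypothetical five-row table and reads the column back. The file name and the initial values are assumptions made for the illustration.

```python
import os
import tempfile

import tables as tb

path = os.path.join(tempfile.mkdtemp(), "setitem_demo.h5")

with tb.open_file(path, mode="w") as h5:
    # Hypothetical table with a single int32 column holding 0..4.
    table = h5.create_table("/", "t", {"col1": tb.Int32Col()})
    table.append([(i,) for i in range(5)])
    table.flush()

    # Modify row 1
    table.cols.col1[1] = -1

    # Modify rows 1 and 3 (the slice 1::2 selects exactly these two
    # rows in a five-row table)
    table.cols.col1[1::2] = [2, 3]
    table.flush()

    n_elements = len(table.cols.col1)                # __len__
    second = int(table.cols.col1[1])                 # __getitem__, integer key
    values = [int(v) for v in table.cols.col1[:]]    # __getitem__, slice key
    print(n_elements, second, values)
```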