General purpose expression evaluator class
The Expr class
- class tables.Expr(expr, uservars=None, **kwargs)
A class for evaluating expressions with arbitrary array-like objects.
Expr is a class for evaluating expressions containing array-like objects. With it, you can evaluate expressions (like “3 * a + 4 * b”) that operate on arbitrarily large arrays while optimizing the resources required to perform them (basically main memory and CPU cache memory). It is similar to the Numexpr package (see [NUMEXPR]), but in addition to NumPy objects, it also accepts disk-based homogeneous arrays, like the Array, CArray, EArray and Column PyTables objects.
The Expr class only offers a subset of the Numexpr features, due to the complexity of implementing some of them when dealing with huge amounts of data.
All the internal computations are performed via the Numexpr package, so all the broadcast and upcasting rules of Numexpr apply here too. These rules are very similar to the NumPy ones, but with some exceptions due to the particularities of having to deal with potentially very large disk-based arrays. Be sure to read the documentation of the Expr constructor and methods, as well as that of Numexpr, if you want to fully grasp these particularities.
expr (str) – This specifies the expression to be evaluated, such as “2 * a + 3 * b”.
uservars (dict) – This can be used to define the variable names appearing in expr. This mapping should consist of identifier-like strings pointing to any Array, CArray, EArray, Column or NumPy ndarray instances (or other objects, which Expr will try to convert to ndarrays). When uservars is not provided or is None, the current local and global namespaces are searched instead. It is also possible to pass only some of the variables in the expression via the uservars mapping; the rest will be retrieved from the current local and global namespaces.
kwargs (dict) – This is meant to pass additional parameters to the Numexpr kernel. This is basically the same as the kwargs argument in Numexpr.evaluate(), and is mainly meant for advanced use.
The following shows an example of using Expr:
>>> import numpy as np
>>> import tables as tb
>>> f = tb.open_file('/tmp/test_expr.h5', 'w')
>>> a = f.create_array('/', 'a', np.array([1,2,3]))
>>> b = f.create_array('/', 'b', np.array([3,4,5]))
>>> c = np.array([4,5,6])
>>> expr = tb.Expr("2 * a + b * c")   # initialize the expression
>>> expr.eval()                       # evaluate it
array([14, 24, 36], dtype=int64)
>>> sum(expr)                         # use as an iterator
74
As this shows, different container types can be mixed in one expression, as long as their shapes are consistent.
You can also work with multidimensional arrays:
>>> a2 = f.create_array('/', 'a2', np.array([[1,2],[3,4]]))
>>> b2 = f.create_array('/', 'b2', np.array([[3,4],[5,6]]))
>>> c2 = np.array([4,5])              # This will be broadcast
>>> expr = tb.Expr("2 * a2 + b2-c2")
>>> expr.eval()
array([[1, 3],
       [7, 9]], dtype=int64)
>>> sum(expr)
array([ 8, 12], dtype=int64)
>>> f.close()
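The examples above rely on variable lookup in the calling namespace. As a complement, the following minimal sketch passes the operands explicitly through the uservars mapping described earlier. The file name, node name and the variable names x and y are illustrative assumptions, not part of the original documentation.

```python
import os
import tempfile

import numpy as np
import tables as tb

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical scratch file, removed together with the temporary directory
    path = os.path.join(tmpdir, "uservars_example.h5")
    with tb.open_file(path, "w") as f:
        on_disk = f.create_array("/", "on_disk", np.array([1, 2, 3], dtype="int64"))
        in_memory = np.array([10, 20, 30], dtype="int64")
        # The names used in the expression are bound explicitly via uservars,
        # so they need not match any variable in the calling namespace.
        expr = tb.Expr("x + 2 * y", uservars={"x": on_disk, "y": in_memory})
        result = expr.eval()

print(result)
```

Binding names explicitly can be convenient when the expression string is built elsewhere and the local variable names are not under your control.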
The append mode for user-provided output containers.
Common main dimension for inputs in expression.
The names of variables in expression (list).
The user-provided container (if any) for the expression outcome.
The start range selection for the user-provided output.
The stop range selection for the user-provided output.
The step range selection for the user-provided output.
Common shape for the arrays in expression.
The values of variables in expression (list).
- Expr.eval()
Evaluate the expression and return the outcome.
For performance reasons, the computation order tries to go along the common main dimension of all inputs. If no such common main dimension is found, the iteration goes along the leading dimension instead.
For non-consistent shapes in inputs (i.e. shapes having a different number of dimensions), the regular NumPy broadcast rules apply. There is one exception to this rule though: when the dimensions orthogonal to the main dimension of the expression are consistent, but the main dimension itself differs among the inputs, then the shortest one is chosen for doing the computations. This is so because trying to expand very large on-disk arrays could be too expensive or simply not possible.
Also, the regular Numexpr casting rules (which are similar to those of NumPy, although you should check the Numexpr manual for the exceptions) are applied to determine the output type.
Finally, if the set_output() method has already been called to specify a user container, the output is sent to that container. If not, a fresh NumPy container is returned instead.
When dealing with large on-disk inputs, failing to specify an on-disk container may consume all your available memory.
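The exception to the broadcast rules described above (the shortest main dimension wins) can be illustrated with a small sketch. It assumes two one-dimensional on-disk arrays of different lengths; the file and node names are hypothetical.

```python
import os
import tempfile

import numpy as np
import tables as tb

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical scratch file for the illustration
    path = os.path.join(tmpdir, "diff_length_example.h5")
    with tb.open_file(path, "w") as f:
        long_arr = f.create_array("/", "long_arr", np.arange(5, dtype="int64"))
        short_arr = f.create_array(
            "/", "short_arr", np.array([10, 20, 30], dtype="int64"))
        # The main dimension differs (5 vs. 3 rows), so only the first
        # 3 rows of each input take part in the computation.
        expr = tb.Expr("long_arr + short_arr")
        shortest = expr.eval()

print(shortest.shape)
```

Note that this differs from plain NumPy, where adding arrays of lengths 5 and 3 raises a broadcasting error.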
- Expr.set_inputs_range(start=None, stop=None, step=None)
Define a range for all inputs in expression.
The computation will only take place for the range defined by the start, stop and step parameters in the main dimension of inputs (or the leading one, if the object lacks the concept of main dimension, like a NumPy container). If no common main dimension exists for all inputs, the leading dimension will be used instead.
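A minimal sketch of restricting the inputs to a range follows. The array contents, file name and node name are assumptions chosen for illustration; the range behaves like the slice [2:8:2] applied to every input along the main dimension.

```python
import os
import tempfile

import numpy as np
import tables as tb

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical scratch file for the illustration
    path = os.path.join(tmpdir, "inputs_range_example.h5")
    with tb.open_file(path, "w") as f:
        a = f.create_array("/", "a", np.arange(10, dtype="int64"))
        b = np.arange(10, dtype="int64") * 10
        expr = tb.Expr("a + b")
        # Only rows 2, 4 and 6 of each input are read and combined
        expr.set_inputs_range(start=2, stop=8, step=2)
        ranged = expr.eval()

print(ranged)
```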
- Expr.set_output(out, append_mode=False)
Set out as container for output as well as the append_mode.
The out object must be a container meant to keep the outcome of the expression. It should be a homogeneous type container and can typically be an Array, CArray, EArray, Column or a NumPy ndarray.
The append_mode specifies the way in which the output is filled. If true, the rows of the outcome are appended to the out container. For this to work, out must have an append() method (like an EArray, for example).
If append_mode is false, the output is set via the __setitem__() method (see Expr.set_output_range() for info on how to select the rows to be updated). If out is smaller than what is required by the expression, only the computations that are needed to fill up the container are carried out. If it is larger, the excess elements are unaffected.
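The two filling modes can be sketched as follows, assuming a fixed-size CArray for the __setitem__() path and an initially empty EArray for the append() path. The file name, node names and atom type are illustrative choices, not requirements.

```python
import os
import tempfile

import numpy as np
import tables as tb

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical scratch file for the illustration
    path = os.path.join(tmpdir, "set_output_example.h5")
    with tb.open_file(path, "w") as f:
        a = f.create_array("/", "a", np.arange(5, dtype="int64"))
        expr = tb.Expr("a * a")

        # append_mode=False (default): the container is filled in place
        out_fixed = f.create_carray(
            "/", "out_fixed", atom=tb.Int64Atom(), shape=(5,))
        expr.set_output(out_fixed)
        expr.eval()
        fixed_values = out_fixed[:].tolist()

        # append_mode=True: rows are appended to an enlargeable container
        out_grow = f.create_earray(
            "/", "out_grow", atom=tb.Int64Atom(), shape=(0,))
        expr.set_output(out_grow, append_mode=True)
        expr.eval()
        grown_values = out_grow[:].tolist()

print(fixed_values, grown_values)
```

Sending the result to an on-disk container like this avoids materializing the whole outcome in memory, which is the concern raised for large inputs.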
- Expr.set_output_range(start=None, stop=None, step=None)
Define a range for user-provided output object.
The output object will only be modified in the range specified by the start, stop and step parameters in the main dimension of output (or the leading one, if the object does not have the concept of main dimension, like a NumPy container).
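As a sketch of how the output range interacts with a user-provided container, the example below writes three computed rows into every second slot of a larger NumPy array. The sizes, file name and node name are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
import tables as tb

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical scratch file for the illustration
    path = os.path.join(tmpdir, "output_range_example.h5")
    with tb.open_file(path, "w") as f:
        a = f.create_array("/", "a", np.array([1, 2, 3], dtype="int64"))
        out = np.zeros(10, dtype="int64")
        expr = tb.Expr("a * 10")
        expr.set_output(out)
        # Only positions 2, 4 and 6 of the output are modified
        expr.set_output_range(start=2, stop=8, step=2)
        expr.eval()

print(out)
```

Elements of out outside the selected range keep their previous values.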