
Quotes from users

This is what people are saying about their experience with PyTables. You can leave your own quote here or cite other people's quotes. Also, it would be really cool if you could add the Powered by PyTables button to your project website :-)



-- Armando Serrano, Universidad Politécnica de Valencia



-- Anand Patil


-- Maarten Sneep



-- Vincent Schut, Sarvision


-- Jan Strube, Rutherford Appleton Laboratory



-- Gabriel J.L. Beckers, Group Neurobiology of Behaviour, Max Planck Institute for Ornithology



-- Elias Collas, Stress Methods, Gulfstream Aerospace



-- Ernesto Rodríguez, Group Supervisor in the Radar Science and Engineering Section, Jet Propulsion Laboratory



-- Berthold Höllmann, Germanischer Lloyd



-- Farzin Shakib, President, ACUSIM Software, Inc.



-- Lou Wicker, National Severe Storms Lab, Norman OK USA



-- Bryan Cole, TeraView Ltd.


Success stories

Have you been using PyTables in your work? Has it been useful to you? Then let the world know by telling your story on this page. Explain what your problem was, how PyTables helped you to solve it, and how the solution fared. (The source text of this page contains a sample success story.)

PyTables in the computational methods and simulation toolkit Proteus

The Proteus toolkit is a Python package for solving partial differential equations -- mainly model equations arising in civil and environmental engineering. Proteus includes parallel implementations of a variety of numerical methods, including classical Galerkin, discontinuous Galerkin, and non-conforming finite element methods. These methods can be applied on a fairly wide range of mesh topologies for large-scale two- and three-dimensional simulations. Proteus stores field data and meshes using XDMF, an open data format based on XML and HDF5. PyTables has made it easy to manipulate the HDF5 files safely and efficiently on a wide range of HPC machines.

-- Chris Kees

PyTables Pro and the Galaxy Zoo Project

I am reducing and analysing data from the Galaxy Zoo project (http://galaxyzoo.org). Essentially, this website asks users questions about galaxy images and records their answers. The result is a MySQL table containing records for 60 million answers for 250 thousand galaxies by 100 thousand users. The same images are presented to many different users, to improve the accuracy and enable us to quantify the uncertainty of the observable properties of each galaxy.

Much of the initial data reduction is performed in MySQL, including some quite computationally demanding steps, such as iterating over all clicks for each galaxy. However, I had a couple of specific steps which I couldn't work out how to make any faster and which would have taken about a month of CPU time! I abandoned MySQL and tried numpy, which I have used extensively for the past nine years (i.e. since Numeric, via numarray). However, the size of the datasets and numpy's need to load everything into RAM meant that I kept running out of memory (I couldn't get memory mapping in numpy to work). I therefore looked around for a fast array library which didn't need to keep everything in RAM. PyTables (which I had briefly experimented with a while ago) seemed like the perfect solution, and worked very well.

However, even with various tricks, PyTables was going to take a month to do this. I realised I needed indexing, so I bought PyTables Pro. I'm glad to say that while my data still took a while to process (20 hours), that is around 100 times faster than I could get it to run in MySQL or in PyTables without indexing!

It is possible that by going back to MySQL, and/or spending much more time experimenting with different approaches, I might have found something faster. However, I had already spent a week on something for which I could write down a perfectly good (but, as it turned out, very slow) algorithm in an hour. I'll be going straight to PyTables Pro next time, as it seems to be fast without my having to spend ages experimenting.

-- Steven Bamford
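The indexing speed-up Steven describes can be sketched roughly as follows. This is a minimal illustration, not his actual pipeline: the schema, file name, and data below are invented, and the scale is tiny compared with his 60 million rows. Creating an index on the queried column lets `table.where()` use the index instead of scanning every row.

```python
import numpy as np
import tables

# Hypothetical schema loosely modelled on a "user answers about galaxies"
# table; not the real Galaxy Zoo layout.
class Answer(tables.IsDescription):
    galaxy_id = tables.Int64Col()
    user_id = tables.Int64Col()
    answer = tables.Int32Col()

rng = np.random.default_rng(0)
with tables.open_file("zoo_demo.h5", mode="w") as h5:
    table = h5.create_table("/", "answers", Answer)
    row = table.row
    for _ in range(10000):
        row["galaxy_id"] = int(rng.integers(0, 250))
        row["user_id"] = int(rng.integers(0, 100))
        row["answer"] = int(rng.integers(0, 3))
        row.append()
    table.flush()

    # Index the column we filter on; subsequent where() queries on
    # galaxy_id are served by the index rather than a full scan.
    table.cols.galaxy_id.create_index()

    hits = [r["user_id"] for r in table.where("galaxy_id == 42")]
    # Cross-check against a plain full-column comparison.
    expected = int((table.col("galaxy_id") == 42).sum())
```

(In current open-source PyTables, `Column.create_index()` is available to everyone; at the time of this quote, indexing was a PyTables Pro feature.)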

PyTables in Multi-camera tracking of flying flies

We use PyTables extensively for saving data from our multi-camera realtime fly tracking system. A typical experiment tracks multiple flies in a flight arena for durations of several hours or more and generates roughly 2 GB of uncompressed data in PyTables format (in addition to video footage). PyTables has proven very amenable for logging data in this environment, but its real benefit comes later. Its integration with Python's numerical array packages and its fast searching set it apart from other possible solutions. The ease and speed with which this significant amount of data can be analyzed interactively far surpasses other systems I've worked with. Carabos has been extremely responsive to bug reports and feature requests, both paid and unpaid. It's clear to me that the HDF5 format is a wonderful beast, but without PyTables there's no way I, as a scientist first and programmer second, would have learned to master its low levels; I would simply be stuck with a far lesser solution.

-- Andrew Straw

PyTables as a high-performance container for multi-gigabyte logfiles

PyTables ROCKS! I work for one of the largest online travel sites, and we produce many gigabytes of logfiles each month with data on how the various services are performing. Originally we loaded all the data into a database to generate SLA reports at the end of the month. PostgreSQL just wasn't up to the task -- or at least I wasn't up to the task of tuning Postgres to handle the load. The code was a combination of Python and the proprietary Postgres procedural language, and it was hideous. Not only was it difficult to maintain, but the report would take hours (sometimes more than a day) to run and brought our server to its knees.

When I stumbled on PyTables, I had to write a prototype to see how it would perform in our situation. In just a few days I had rewritten the whole system. The number of lines was drastically reduced. It was much easier to read. It was all Python. But best of all -- it worked really well and really fast! Because of the ability to turn on compression, our disk space consumption was drastically reduced. It now takes me 90 minutes to convert one month's worth of log files into HDF5 format and then about 3 minutes to do all the computations. And it can run easily on any developer's machine, since a database isn't needed!

I used to be saddled with the original system, but since I rewrote it using PyTables I have handed it off to another developer to maintain, and it has since been successfully transitioned to two more developers because it is so much easier to understand and work with now.

I didn't have any major issues with PyTables and you were fantastic about replying to any questions I did have. Word has gotten around our organization and PyTables is now a serious contender for quite a few different applications. Keep up the good work!

-- Chuck Clark
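Turning on compression for a log table like the one Chuck describes takes one extra argument at table creation. This is a rough sketch under invented assumptions: the field names, file name, and choice of the zlib codec are illustrative, not his actual system. Compression and decompression are transparent; reads and queries work exactly as on an uncompressed table.

```python
import tables

# Compression is configured once, via a Filters instance; zlib is
# always available, and other codecs (e.g. blosc) can be swapped in.
filters = tables.Filters(complevel=5, complib="zlib")

# Hypothetical log-record schema for illustration.
class LogRecord(tables.IsDescription):
    timestamp = tables.Float64Col()
    service = tables.StringCol(16)
    latency_ms = tables.Float32Col()

with tables.open_file("logs_demo.h5", mode="w") as h5:
    table = h5.create_table("/", "requests", LogRecord, filters=filters)
    row = table.row
    for i in range(1000):
        row["timestamp"] = 1.0e6 + i
        row["service"] = b"search"
        row["latency_ms"] = 12.5
        row.append()
    table.flush()

    # Rows are decompressed transparently on read.
    mean_latency = float(table.cols.latency_ms[:].mean())
```

The on-disk chunks are compressed individually, so computations like the mean above never need the whole table in RAM.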

PyTables in animal communication research

Social animals such as birds and humans can produce tens of thousands of vocalizations per day. An essential first step in getting insight into how vocal communication is organized is to record complete acoustic scenes for extended periods of time, and to organize this data in such a way that it can be examined efficiently and in flexible ways. We use PyTables to store both the primary data (sound recordings) and measurements (pitch, duration, etc.) of each sound present in acoustic communication scenes that are recorded continuously for weeks. PyTables allows us to very rapidly select sets of vocalizations based on those measurements and evaluate patterns, or perform additional analyses on the actual sounds. Given that the data sets are very large (say 20 GB for a week of communication between two birds), this would be very impractical with traditional methods. One of the great things about PyTables is that it is very easy to work with; it essentially allows anyone with at least basic Python skills to work with a very high-performance system to organize, store, access, discover, analyze, and share huge amounts of complex data. Highly recommended!

-- Gabriel Beckers
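Selecting vocalizations by their measurements, as described above, maps directly onto PyTables' query syntax. A minimal sketch, with an invented measurement table (the column names, file name, and values are hypothetical, not the lab's actual layout):

```python
import tables

# Hypothetical per-vocalization measurement table.
class Vocalization(tables.IsDescription):
    start = tables.Float64Col()     # onset time (s)
    duration = tables.Float64Col()  # length (s)
    pitch = tables.Float64Col()     # fundamental frequency (Hz)

with tables.open_file("sounds_demo.h5", mode="w") as h5:
    tbl = h5.create_table("/", "vocalizations", Vocalization)
    row = tbl.row
    for i in range(100):
        row["start"] = i * 1.5
        row["duration"] = 0.1
        row["pitch"] = 1000.0 + 20.0 * i
        row.append()
    tbl.flush()

    # Pull only the calls in a pitch band, as a NumPy structured array,
    # without loading the whole table into memory.
    calls = tbl.read_where("(pitch > 2000) & (duration < 0.5)")
```

The condition string is evaluated row-chunk by row-chunk on disk, which is what keeps this practical for multi-gigabyte recordings.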

PyTables in multi-physics micromagnetic simulator Nmag

The Nmag simulation package is a finite element solver for micromagnetic problems. Ferromagnetic nanostructures are discretised using a tetrahedral mesh, and the temporal behaviour of a multitude of physical fields is computed on the mesh. Nmag takes a novel (multi-physics) approach: we do not know at coding and compile time what type of data the user may use, or how often they will decide to save the data (or just part of it), so flexibility is crucial. PyTables provides just that flexibility.

We also use PyTables to save the mesh on which the calculation is done. The built-in compression allows us to reduce disk space consumption significantly without slowing the process down: saving a mesh is approximately 4 times more space-efficient than saving it in an ASCII-based file format (see some data). We get much more significant space savings when saving field data, as we have some fields that change very little over space or time and thus compress excellently.

In summary, PyTables allows one to quickly write code that saves complicated and hard-to-predict data structures with very reasonable compression.

-- Hans Fangohr, University of Southampton, United Kingdom
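Saving array data such as mesh node coordinates with PyTables' built-in compression can be sketched as follows. This is an illustration under invented assumptions (the file name, node name, and array shape are made up; Nmag's actual on-disk layout will differ):

```python
import numpy as np
import tables

# Fake mesh node coordinates: 1000 nodes in 3-D.
coords = np.random.default_rng(1).random((1000, 3))
filters = tables.Filters(complevel=5, complib="zlib")

# A chunked, compressed array (CArray) holding the coordinates.
with tables.open_file("mesh_demo.h5", mode="w") as h5:
    h5.create_carray("/", "coordinates", obj=coords, filters=filters)

# Reading decompresses transparently, chunk by chunk.
with tables.open_file("mesh_demo.h5", mode="r") as h5:
    back = h5.root.coordinates[:]
```

Fields that vary slowly over space or time, as mentioned above, contain long runs of similar values within each chunk, which is exactly where chunk-wise compressors pay off most.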

UserQuotes (last edited 2012-03-16 15:12:12 by FrancescAlted)