- Quotes from users
- PyTables in the computational methods and simulation toolkit Proteus
- PyTables Pro and the Galaxy Zoo Project
- PyTables in Multi-camera tracking of flying flies
- PyTables as a high-performance container for multi-gigabyte logfiles
- PyTables in animal communication research
- PyTables in multi-physics micromagnetic simulator Nmag
Quotes from users
This is what people are saying about their experience with PyTables. You can leave your own quote here or cite other people's quotes. It would also be really cool if you could add the Powered by PyTables button to your project website.
At UPV we are developing software for risk analysis in dam safety. When analysing complex problems, we need a database capable of dealing with several GB of information. PyTables is working great for us. The best things for us have been how easy it was to integrate PyTables with our existing code, its speed, the possibility of seamlessly working with compressed data, and the level of support we've received when something didn't work as we expected.
-- Armando Serrano, Universidad Politécnica de Valencia
PyTables has been a huge help to pymc. Pymc produces lots of samples from certain probability distributions, some of which can be high-dimensional. David Huard wrote a PyTables-based backend that compresses and saves the samples as they're created. The backend makes it possible to work with much larger models and to recover long simulations after crashes... and when you're using it, you can hardly tell the data aren't in memory. Thanks for a terrific product.
-- Anand Patil
PyTables offers the best programming interface for using HDF5 files available in any language. It is the most elegant HDF5 API around, far better than the native HDF5 C interface. Don't be fooled by its tables and database moniker: if you have to deal with HDF5 files, you'll enjoy PyTables.
-- Maarten Sneep
The nice thing about PyTables, compared to TIFF, for me, is that I can use my data as one large memmapped NumPy array and let PyTables do the rest, while still saving lots of disk space compared to normal memmapped arrays.
-- Vincent Schut, Sarvision
I am very happy that I found PyTables. I have used it extensively for data analysis on the BaBar experiment and I must say that it is by far my favorite package for data storage. Please keep up the good work! You helped me enjoy my data again.
-- Jan Strube, Rutherford Appleton Laboratory
Many, many thanks for making such an extraordinary and excellent library freely available. I use it a lot for my research, and I simply can't work without it anymore.
-- Gabriel J.L. Beckers, Group Neurobiology of Behaviour, Max Planck Institute for Ornithology
I have been using PyTables with great success in a shared data access application for quite some time now and I am pleased to say that it has never let me down. My praise is endless regarding this excellent package.
-- Elias Collas, Stress Methods, Gulfstream Aerospace
PyTables is a very well designed interface to the HDF5 libraries. It fills a gap for people using Python/Numeric/NumPy/numarray who need to deal with large data sets and want convenient, fast data analysis tools.
-- Ernesto Rodríguez, Group Supervisor in the Radar Science and Engineering Section, Jet Propulsion Laboratory
We are very pleased with the PyTables functionality, and especially with your prompt replies to problem reports. We had evaluated other products for storing our data in HDF5 files, but we are happy we chose PyTables.
-- Berthold Höllmann, Germanischer Lloyd
We are developing a new engineering application, with PyTables as its core data base. We have found PyTables to be well designed, fast, well integrated into Python, and, perhaps more importantly, very robust.
-- Farzin Shakib, President, ACUSIM Software, Inc.
For large arrays and our raid 5 server, I can get read speeds approaching 1 GB per second. That is just awesome performance! Thanks to the PyTables team.
-- Lou Wicker, National Severe Storms Lab, Norman OK USA
I've recently started using PyTables for storing large datasets and I'd give it 10/10! Access is fast enough that you can read just the data you need and leave the full array on disk.
-- Bryan Cole, TeraView Ltd.
Have you been using PyTables in your work? Has it been useful to you? Then let the world know by telling your story on this page. Explain what your problem was, how PyTables helped you to solve it, and how the solution fared. (The source text of this page contains a sample success story.)
PyTables in the computational methods and simulation toolkit Proteus
The Proteus toolkit is a Python package for solving partial differential equations, mainly model equations arising in civil and environmental engineering. Proteus includes parallel implementations of a variety of numerical methods, including classical Galerkin, discontinuous Galerkin, and non-conforming finite element methods. These methods can be applied on a fairly wide range of mesh topologies for large-scale two- and three-dimensional simulations. Proteus stores field data and meshes using XDMF, an open data format based on XML and HDF5. PyTables has made it easy to manipulate the HDF5 files safely and efficiently on a wide range of HPC machines.
-- Chris Kees
PyTables Pro and the Galaxy Zoo Project
I am reducing and analysing data from the Galaxy Zoo project (http://galaxyzoo.org). Essentially, this website asks users questions about galaxy images and records their answers. The result is a MySQL table containing records for 60 million answers for 250 thousand galaxies by 100 thousand users. The same images are presented to many different users, to improve the accuracy and enable us to quantify the uncertainty of the observable properties of each galaxy.
Much of the initial data reduction is performed in MySQL, including some quite computationally demanding steps, such as iterating over all clicks for each galaxy. However, I had a couple of specific steps which I couldn't work out how to make any faster and which would have taken about a month of CPU time! I abandoned MySQL and tried numpy, which I have used extensively for the past nine years (i.e. since Numeric, via numarray). However, the size of the datasets and numpy's need to load everything into RAM meant that I kept running out of memory (I couldn't get memory mapping in numpy to work). I therefore looked around for a fast array library which didn't need to keep everything in RAM. PyTables (which I had briefly experimented with a while ago) seemed like the perfect solution and worked very well.
However, even with various tricks, PyTables was going to take a month to do this. I realised I needed indexing, so I bought PyTables Pro. I'm glad to say that while my data still took a while to process (20 hours), that is around 100 times faster than I could get it to run in MySQL or in PyTables without indexing!
It is possible that by going back to MySQL, and/or much more time spent experimenting with different approaches, I might have found something faster. However, I had already spent a week on something I could actually write down a perfectly good (but, as it turned out, very slow) algorithm for in an hour. I'll be going straight to PyTables Pro next time, as it seems to be fast without having to spend ages experimenting.
-- Steven Bamford
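The column indexing this story relies on can be sketched with the modern PyTables API, where the former Pro indexing is built in. The table layout and field names below (galaxy_id, user_id, answer) are invented for illustration, not taken from the Galaxy Zoo pipeline:

```python
# Hypothetical sketch of indexed queries on a large "answers" table.
# Indexing, formerly a PyTables Pro feature, is part of modern PyTables.
import os
import tempfile

import tables


class Answer(tables.IsDescription):
    galaxy_id = tables.Int64Col()
    user_id = tables.Int64Col()
    answer = tables.Int32Col()


fname = os.path.join(tempfile.mkdtemp(), "zoo.h5")
with tables.open_file(fname, mode="w") as h5:
    table = h5.create_table("/", "answers", Answer)
    row = table.row
    for g in range(1000):
        row["galaxy_id"] = g % 100
        row["user_id"] = g % 37
        row["answer"] = g % 5
        row.append()
    table.flush()

    # Index the column used in selections; subsequent where() queries on
    # galaxy_id use the index instead of scanning every row.
    table.cols.galaxy_id.create_index()
    hits = [r["user_id"] for r in table.where("galaxy_id == 42")]
```

The key call is `Column.create_index()`; once the index exists, `Table.where()` conditions on that column are answered from the index rather than by a full scan, which is where the speed-up described above comes from.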
PyTables in Multi-camera tracking of flying flies
We use PyTables extensively for saving data from our multi-camera realtime fly tracking system. A typical experiment tracks multiple flies in a flight arena for durations of several hours or more and generates roughly 2 GB of uncompressed data in PyTables format (in addition to video footage). PyTables has proven very amenable for logging data in this environment, but its real benefit comes later. Its integration with Python's numerical array packages and its fast searching set it apart from other possible solutions. The ease and speed with which this significant amount of data can be analyzed interactively far surpasses other systems I've worked with. Carabos has been extremely responsive to bug reports and feature requests, in both a paid and unpaid manner. It's clear to me that the HDF5 format is a wonderful beast, but without PyTables there's no way I, as a scientist first and programmer second, would have learned to master its low levels, and I would simply be stuck with a far lesser solution.
PyTables as a high-performance container for multi-gigabyte logfiles
PyTables ROCKS! I work for one of the largest online travel sites and we produce many gigabytes of logfiles each month with data on how the various services are performing. Originally we loaded all the data into a database to generate SLA reports at the end of the month. PostgreSQL just wasn't up to the task -- or at least I wasn't up to the task of tuning Postgres to handle the load. The code was a combination of Python and the proprietary Postgres procedural language, and it was hideous. Not only was it difficult to maintain, but the report would take hours (sometimes more than a day) to run and brought our server to its knees.
When I stumbled on PyTables I had to write a prototype to see how it would perform in our situation. In just a few days I had rewritten the whole system. The number of lines was drastically reduced. It was much easier to read. It was all Python. But best of all -- it worked really well and really fast! Because of the ability to turn on compression, our disk space consumption was drastically reduced. It now takes me 90 minutes to convert one month's worth of log files into HDF5 format and then about 3 minutes to do all the computations. And it can run easily on any developer's machine since a database isn't needed!
I used to be saddled with the original system but since I rewrote it using PyTables I handed it off to another developer to maintain and it has successfully been transitioned to two other developers since then because it is so much easier to understand and work with now.
I didn't have any major issues with PyTables and you were fantastic about replying to any questions I did have. Word has gotten around our organization and PyTables is now a serious contender for quite a few different applications. Keep up the good work!
-- Chuck Clark
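A workflow like the one above can be sketched as follows. The record layout and the numbers are hypothetical, but the compression mechanism (`tables.Filters`) is the one PyTables actually provides:

```python
# Hypothetical sketch: log records stored in a zlib-compressed table,
# then aggregated straight from disk. Field names are invented.
import os
import tempfile

import tables


class LogRecord(tables.IsDescription):
    timestamp = tables.Float64Col()
    service = tables.StringCol(16)
    latency_ms = tables.Float64Col()


# Enable on-the-fly zlib compression for everything under this table.
filters = tables.Filters(complevel=5, complib="zlib")
fname = os.path.join(tempfile.mkdtemp(), "logs.h5")
with tables.open_file(fname, mode="w") as h5:
    table = h5.create_table("/", "requests", LogRecord, filters=filters)
    row = table.row
    for i in range(10000):
        row["timestamp"] = float(i)
        row["service"] = b"search"
        row["latency_ms"] = 5.0 + (i % 7)
        row.append()
    table.flush()

    # Aggregate by iterating over the table; the whole dataset never
    # needs to fit in RAM at once.
    mean_latency = sum(r["latency_ms"] for r in table.iterrows()) / table.nrows
```

Because the compression is applied transparently by the HDF5 layer, the reporting code reads the table exactly as if it were uncompressed.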
PyTables in animal communication research
Social animals such as birds and humans can produce tens of thousands of vocalizations per day. An essential first step in getting insight into how vocal communication is organized is to record complete acoustic scenes for extended periods of time, and to organize this data in such a way that it can be examined efficiently and in flexible ways. We use PyTables to store both the primary data (sound recordings) and measurements (pitch, duration, etc.) of each sound present in acoustic communication scenes that are recorded continuously for weeks. PyTables allows us to very rapidly select sets of vocalizations based on those measurements and evaluate patterns, or perform additional analyses on the actual sounds. Given that the data sets are very large (say 20 GB for a week of communication between two birds), this would be very impractical with traditional methods. One of the great things about PyTables is that it is very easy to work with; it essentially allows anyone with at least basic Python skills to work with a very high-performance system to organize, store, access, discover, analyze, and share huge amounts of complex data. Highly recommended!
-- Gabriel Beckers
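The selection step described above maps naturally onto PyTables in-kernel queries. This sketch uses invented field names (`pitch_hz`, `duration_s`) and synthetic data rather than the actual recording schema:

```python
# Hypothetical sketch: select vocalizations by measured properties with
# an in-kernel where() query. Field names and values are made up.
import os
import tempfile

import tables


class Vocalization(tables.IsDescription):
    pitch_hz = tables.Float64Col()
    duration_s = tables.Float64Col()


fname = os.path.join(tempfile.mkdtemp(), "calls.h5")
with tables.open_file(fname, mode="w") as h5:
    table = h5.create_table("/", "calls", Vocalization)
    row = table.row
    for i in range(500):
        row["pitch_hz"] = 1000.0 + i * 10.0
        row["duration_s"] = 0.05 + (i % 10) * 0.01
        row.append()
    table.flush()

    # The condition is evaluated on the disk-resident data, not on an
    # in-memory copy of the whole table.
    short_high = [r.nrow for r in
                  table.where("(pitch_hz > 4000) & (duration_s < 0.075)")]
```

`Table.where()` evaluates the condition with numexpr as rows stream off disk, which is what makes selections over weeks of recordings practical.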
PyTables in multi-physics micromagnetic simulator Nmag
The Nmag simulation package is a finite element solver for micromagnetic problems. Ferromagnetic nanostructures are discretised using a tetrahedral mesh, and the temporal behaviour of a multitude of physical fields is computed on the mesh. Nmag takes a novel (multi-physics) approach where we do not know at compile and coding time what type of data the user may use, or how often they decide to save the data (or just part of it), so flexibility is crucial. PyTables provides just that flexibility.
We also use PyTables to save the mesh on which the calculation is done. The built-in compression allows us to reduce disk usage significantly without slowing the process down: saving a mesh this way is approximately 4 times more space efficient than saving it in an ASCII-based file format (see some data). We get much more significant space savings when saving field data, as we have some fields that change very little over space or time and thus compress excellently.
In summary, PyTables allows us to quickly write code that saves complicated and hard-to-predict data structures with very reasonable compression.
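As a rough illustration of the kind of saving described above, this sketch stores a synthetic, highly repetitive "mesh" array with zlib compression and compares the file size against the raw array size. The data are invented and unrelated to Nmag's actual meshes; real compression ratios (such as the 4x figure quoted above) depend entirely on the data:

```python
# Hypothetical sketch: save a repetitive mesh-like array compressed.
# The array is synthetic; actual ratios depend on the mesh contents.
import os
import tempfile

import numpy as np
import tables

points = np.zeros((10000, 3))            # a synthetic, very compressible "mesh"
points[:, 0] = np.arange(10000) % 100

filters = tables.Filters(complevel=9, complib="zlib")
fname = os.path.join(tempfile.mkdtemp(), "mesh.h5")
with tables.open_file(fname, mode="w") as h5:
    # A chunked, compressed array created directly from the NumPy array.
    h5.create_carray("/", "points", obj=points, filters=filters)

compressed_size = os.path.getsize(fname)
raw_size = points.nbytes                 # 10000 * 3 * 8 = 240000 bytes
```

Since the filters are attached to the node, anything that later reads `/points` gets decompression for free.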