Stress Tests

To give an indication of whether PyTables can be safely used in production environments that have to deal with very large amounts of data, some stress tests have been conducted. I hope the results reassure those who want to use PyTables in such demanding scenarios.

I've run these tests on a variety of platforms, and two of them are described next: one run on an AMD Opteron machine and the other on a Silicon Graphics (SGI) machine. Both have 64-bit architectures.

AMD Opteron platform

For this one, I set up a Python script that, in a single run, generated one file of more than 5 TB (more than 5000 GB). The file consists of 400 groups, each of which has 300 tables. Each table has 100000 rows. Here is the definition of each table:

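from tables import IsDescription, IntCol, StringCol

class Test(IsDescription):
    ngroup = IntCol(pos=1)    # group number
    ntable = IntCol(pos=2)    # table number within the group
    nrow = IntCol(pos=3)      # row number within the table
    # length= is the PyTables 0.8 spelling; later releases use itemsize=
    string = StringCol(length=500, pos=4)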

where the (ngroup, ntable, nrow) triad gives the group, table and row numbers that identify the row in the hierarchy. The string field is added just to make the row bigger (and more compressible). Each row is 512 bytes long.
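The 512-byte row size follows from the column widths, assuming that IntCol is a 4-byte integer (which is what the reported row size implies). Here is a quick check with the standard struct module, not PyTables itself, together with the row count and raw size that result from 400 groups of 300 tables of 100000 rows:

```python
import struct

# Three 4-byte integers followed by a 500-byte string, with no padding ("<").
# Assumption: IntCol maps to a 32-bit integer, as the 512-byte row size implies.
ROW_FORMAT = "<iii500s"
ROW_SIZE = struct.calcsize(ROW_FORMAT)

GROUPS, TABLES_PER_GROUP, ROWS_PER_TABLE = 400, 300, 100000
total_rows = GROUPS * TABLES_PER_GROUP * ROWS_PER_TABLE
raw_bytes = total_rows * ROW_SIZE

print(ROW_SIZE)                     # bytes per row
print(total_rows)                   # rows in the whole file
print(round(raw_bytes / 2**40, 2))  # raw size in units of 2**40 bytes
```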

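The raw figures for the test were 12000000000 rows of 512 bytes each, that is, 6144000000000 bytes of uncompressed data (about 5.6 TB, taking 1 TB as 2^40 bytes).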

However, as transparent compression was used (the ZLIB compressor together with the shuffle filter), the file created was a mere 11 GB. This is what made such a large stress test possible, because finding a filesystem with more than 4 TB available is still hard today.
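To give a feel for why rows like these compress so well, here is a small sketch that uses only Python's standard zlib module. It is not PyTables or HDF5 code: the shuffle function is a simplified stand-in for the shuffle filter, and the filler text (500 copies of the letter "a") is an assumption, since the actual contents of the string field are not given here.

```python
import struct
import zlib

ROW = struct.Struct("<iii500s")  # same 512-byte layout as the table rows

def make_rows(ngroup, ntable, nrows):
    """Pack nrows rows; the filler text is an assumption, not the test's data."""
    filler = b"a" * 500
    return b"".join(ROW.pack(ngroup, ntable, n, filler) for n in range(nrows))

def shuffle(buf, itemsize):
    """Group byte 0 of every item, then byte 1, and so on (a byte shuffle)."""
    return b"".join(buf[i::itemsize] for i in range(itemsize))

raw = make_rows(ngroup=0, ntable=0, nrows=2000)
plain = zlib.compress(raw, 6)                        # zlib level 6 only
shuffled = zlib.compress(shuffle(raw, ROW.size), 6)  # shuffle, then zlib level 6

# Sizes in bytes: uncompressed, zlib only, shuffle plus zlib.
print(len(raw), len(plain), len(shuffled))
```

The exact ratios depend entirely on the filler chosen, so the numbers printed here say nothing about the ratio obtained in the real test.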

After creating this file, the same Python script read back all the data written and ran some checks to ensure that the data in the file was not corrupted.
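One way such a check can work, sketched under assumptions: since every row stores its own (ngroup, ntable, nrow) triad, a reader can compare those fields with the position it is currently visiting, without keeping a second copy of the data. The sketch below uses the standard struct module and an in-memory buffer in place of PyTables. The function names write_rows and check_rows are hypothetical, and the actual checks made by stress-test3.py may differ.

```python
import io
import struct

ROW = struct.Struct("<iii500s")  # stand-in for one 512-byte table row

def write_rows(stream, ngroups, ntables, nrows):
    """Write rows whose first three fields record their own position."""
    filler = b"a" * 500  # assumed filler; the real string contents are not given
    for g in range(ngroups):
        for t in range(ntables):
            for r in range(nrows):
                stream.write(ROW.pack(g, t, r, filler))

def check_rows(stream, ngroups, ntables, nrows):
    """Re-read every row and count those that disagree with their position."""
    errors = 0
    for g in range(ngroups):
        for t in range(ntables):
            for r in range(nrows):
                data = stream.read(ROW.size)
                if len(data) != ROW.size or ROW.unpack(data)[:3] != (g, t, r):
                    errors += 1
    return errors

buf = io.BytesIO()
write_rows(buf, ngroups=4, ntables=3, nrows=10)
buf.seek(0)
print(check_rows(buf, ngroups=4, ntables=3, nrows=10))  # number of bad rows
```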

Here you can see the parameters passed to the stress test program and its output:

$ time python stress-test3.py -l zlib -c 6 -g400 -t 300 -i 100000 test-big.h5
$ ls -lh test-big.h5

Compression level: 6
Compression library: zlib
Rows written: 12000000000  Row size: 512
Time writing rows: 262930.901 s (real) 262619.72 s (cpu)  100%
Write rows/sec:  45639
Write KB/s : 22819
Rows read: 12000000000  Row size: 512 Buf size: 49664
Time reading rows: 143171.761 s (real) 141560.42 s (cpu)  99%
Read rows/sec:  83815
Read KB/s : 41907

real    6768m34.076s
user    6183m38.690s
sys     552m51.150s
-rw-r--r--    1 5350     users         11G 2004-02-09 00:57 test-big.h5

This stress test took a little more than 112 hours to run and used 300 MB of RAM. The software versions used were PyTables 0.8, HDF5 1.6.2 and numarray 0.8. The hardware was a 64-bit machine with 4 AMD Opteron processors @ 1.6 GHz, 8 GB of shared memory and IDE disks @ 7200 rpm. The operating system was SuSE Linux Enterprise Server 8 with ReiserFS as the filesystem.
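The throughput figures in the output can be re-derived from the row counts and timings. The sketch below does so for the Opteron run. The helper name throughput is hypothetical, and truncating to integers is an assumption inferred from the printed values, not taken from the script itself.

```python
def throughput(rows, row_size, seconds):
    """Return (rows per second, KB per second), truncated to integers."""
    rows_per_sec = rows / seconds
    kb_per_sec = rows_per_sec * row_size / 1024
    return int(rows_per_sec), int(kb_per_sec)

ROWS, ROW_SIZE = 12_000_000_000, 512

print(throughput(ROWS, ROW_SIZE, 262930.901))  # write phase
print(throughput(ROWS, ROW_SIZE, 143171.761))  # read phase

# Wall-clock total reported by time (6768m34.076s), expressed in hours.
print(round((6768 * 60 + 34.076) / 3600, 1))
```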

The outcome of the test was that no difference between the read data and the original data was found.

I would like to thank the SourceForge team for providing the hardware needed to conduct this test.

SGI R14000 platform

A year or so before the test above, I ran a similar test on an SGI machine. The version of PyTables used in this case was 0.7.1, and the amount of data written was five times smaller than on the Opteron machine. The results are described next.

The file consisted of 400 groups, each of which had 300 tables. Each table had 20000 rows. The table definition was the same as in the Opteron case.

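The raw figures for the test were 2400000000 rows of 512 bytes each, that is, 1228800000000 bytes of uncompressed data (about 1.1 TB, taking 1 TB as 2^40 bytes).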

However, as transparent compression was used (the LZO compressor), the file created was only 15 GB.

After creating this file, the same Python script read back all the data written and ran some checks to ensure that the data in the file was not corrupted.

Here you can see the parameters passed to the stress test program and its output:

$ time python stress-test3.py -l lzo -c 1 -g400 -t 300 -i 20000 test-big.h5
Compression level: 1
Compression library: lzo
Rows written: 2400000000  Row size: 512
Time writing rows: 73888.388 s (real) 557.196 s (cpu)  1%
Write rows/sec:  32481
Write KB/s : 16240
Rows read: 2400000000  Row size: 512 Buf size: 39936
Time reading rows: 36137.205 s (real) 1525.982 s (cpu)  4%
Read rows/sec:  66413
Read KB/s : 33206

real    1833m56.738s
user    1813m49.823s
sys     10m38.536s

$ ls -lh test-big.h5
-rw-------    1 falted   qtc           14G Aug 13 07:24 test-big.h5

This stress test took a little more than 30 hours to run and used 500 MB of RAM. The software versions used were PyTables 0.7.1, HDF5 1.6.0 and numarray 0.6.1. The hardware was an SGI Origin 3000 with 16 R14000 processors @ 600 MHz, 12 GB of shared memory and SCSI disks @ 10000 rpm. The operating system was IRIX 6.5 with XFS as the filesystem.

The outcome of the test was that no difference between the read data and the original data was found.

I would like to thank the Grup de Quimica Teorica i Computacional of Universitat Jaume I for providing the hardware needed to conduct this test.

StressTests (last edited 2008-04-21 11:12:44 by localhost)