SPM benchmarks
We tested three high-specification machines by recording the time taken for some standard SPM99 processing steps. We tested the same Pentium 4 machine running both Linux and Windows 2000.
Machines tested
Name | Manufacturer / Model | OS | CPU no. x speed (MHz) | Memory (GB) |
V480 | Sun V480 | Solaris 8 | 4 x 900 | 6 |
ES45 | HP ES45 | Tru64 | 4 x 1000 | 8 |
Linux P4 | Advantec Pentium 4 | Mandrake Linux 9.0 | 1 x 2530 | 1 |
Windows P4 | Advantec Pentium 4 | Windows 2000 | 1 x 2530 | 1 |
For reference, here are the integer and floating point benchmark results for each machine, from www.spec.org:
Machine | CFP2000 base | CFP2000 rate | CINT2000 base | CINT2000 rate |
V480 | 637 | 7.16 | 469 | 5.39 |
ES45 | 776 | 9.00 | 621 | 7.20 |
Linux P4 | 992 | 11.0 | 944 | 11.5 |
Results for the Linux P4 are taken from the most similar machine that had been tested (Dell Precision WorkStation 350 2.53 GHz P4). Rates are for 1 processor, as Matlab is a single-threaded application. The V480 has not been tested with one processor; the results are estimated from (V480 rate for 2 processors) * (rate for one processor on ES45 / rate for two processors on ES45).
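The V480 estimate described above is a simple ratio scaling. A minimal sketch follows; the function and argument names are illustrative, and the two-processor rates are passed in as arguments because they are not listed on this page.

```python
def estimate_single_cpu_rate(target_rate_2cpu, ref_rate_1cpu, ref_rate_2cpu):
    """Estimate a 1-processor rate for a machine only measured with 2 processors.

    Scales the target machine's 2-processor rate by the 1:2 processor ratio
    observed on a reference machine (here the ES45 plays that role).
    """
    if ref_rate_2cpu <= 0:
        raise ValueError("reference 2-processor rate must be positive")
    return target_rate_2cpu * (ref_rate_1cpu / ref_rate_2cpu)
```

The sketch assumes that both machines lose the same fraction of throughput when going from two processors to one, which is the same assumption the estimate in the text makes.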
Tests
The tests were designed to assess speed for a typical SPM analysis on a single subject. The data consisted of four sessions of fMRI, with 235 images per session. Matrix size was 128x128x21. We timed the analyses with the Matlab tic and toc functions placed around SPM99 batch mode scripts. The tests were:
Realignment and reslicing
Calculation of realignment parameters and reslicing of all images 1..N, trilinear interpolation.
Smoothing
Smoothing of original images with 8mm FWHM.
Model estimation
Estimation of standard 4 session statistical model, applying low- (hrf) and high- (120 second) pass filters. The Linux P4 proved surprisingly slow on the model calculation, which was due to unusual slowness of multiplication of sparse by full matrices (see the SPM Intel tuning page). We found that model estimation was considerably faster in general if we avoided the sparse matrix multiplication. We therefore ran the following model estimation test:
Model estimation: optimized
Here we removed the use of sparse matrices from the model estimation.
We used Matlab 5.3 on the V480 and ES45, and Matlab 6.5 for the Linux P4. Our Matlab licensing meant that we could not use the same version on all three machines. We did compare the speed of the realignment process using Matlab 5.3 and Matlab 6.0 on the V480 and the ES45; differences were ~1%. Note that Matlab and SPM need to be optimized for the Pentium 4 machine because of a problem with the default P4 handling of not-a-number values in floating point calculations. This is described in the SPM Intel tuning page.
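The tic/toc approach described above measures wall-clock time around each batch step. A rough Python analogue of that pattern is sketched here for illustration only; the actual tests used Matlab, and `placeholder_step` is a hypothetical stand-in, not an SPM call.

```python
import time


def timed_minutes(step, *args, **kwargs):
    """Run step(*args, **kwargs); return (result, elapsed wall-clock minutes)."""
    start = time.perf_counter()              # plays the role of Matlab's tic
    result = step(*args, **kwargs)
    elapsed = time.perf_counter() - start    # plays the role of toc
    return result, elapsed / 60.0


def placeholder_step(n):
    """Hypothetical stand-in for one batch processing step."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, minutes = timed_minutes(placeholder_step, 100_000)
    print(f"placeholder step took {minutes:.4f} min")
```

Wall-clock time, rather than CPU time, is the relevant measure here because the steps include disk and network access as well as computation.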
Results
We tested the machines in two situations: with the data stored on the local hard disk, and with the data stored on a disk mounted using NFS. The values reported are times in minutes.
Data on local disk
Machine | Realign | Smooth | Model: standard | Model: optimized |
V480 | 55.0 | 16.4 | 20.9 | 15.1 |
ES45 | 32.7 | Not tested | 18.7 | Not tested |
Linux P4 | 16.2 | 5.4 | 24.2 | 5.7 |
Windows P4 | 23.0 | 5.8 | 23.4 | 5.5 |
Data on local disk vs data via NFS
Machine | Realign: local | Realign: NFS | NFS / Local |
V480 | 55.0 | 60 | 1.09 |
ES45 | 32.7 | 33.7 | 1.03 |
Linux P4 | 16.2 | 18.3 | 1.13 |
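The NFS / Local column is the NFS realignment time divided by the local-disk time. A short sketch that recomputes it from the values in the table (the variable and function names are illustrative):

```python
# Realignment times in minutes, taken from the table: (local disk, NFS)
realign_minutes = {
    "V480": (55.0, 60.0),
    "ES45": (32.7, 33.7),
    "Linux P4": (16.2, 18.3),
}


def nfs_ratio(local, nfs):
    """Return the NFS time relative to the local-disk time."""
    return nfs / local


for machine, (local, nfs) in realign_minutes.items():
    print(f"{machine}: {nfs_ratio(local, nfs):.2f}")
```

A ratio of 1.09 means the NFS run took about 9% longer than the local-disk run.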
The V480 and ES45 were connected to a Sun/Solaris NFS SCSI server via a switch. The Linux P4 was connected via a hub to an NFS IDE server running Red Hat Linux 7.3.
We also timed the V480 and ES45 when running 6 simultaneous realignment jobs, comparing NFS and local storage. The slowdown attributable to NFS varied between 3 and 20%; the variation may have been due to unrelated NFS and CPU loads on the NFS server, which were sometimes heavy.
In addition to the tests listed in the table, we ran the following tests: mutual information coregistration (Linux P4: 2.0 minutes); normalization of structural image only (Linux P4: 46 seconds); normalization of structural image and reslicing of 960 fMRI images (Linux P4: 12.5 minutes, Windows P4: 20.0 minutes).
The tests imply that most of a standard single-subject analysis (realignment, coregistration, normalization, smoothing, statistical analysis, writing contrasts) would take 16.2 + 2.0 + 12.5 + 5.4 + 5.7 + 3.2 = 45 minutes on the Linux P4.
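The sum above can be checked, and repeated for the Windows P4, with a few lines of Python. This is a sketch under stated assumptions: the step names are illustrative, and the Windows coregistration and contrast-writing times were not measured, so they are assumed here to equal the Linux values.

```python
# Per-step times in minutes for the Linux P4, as used in the sum above
linux_p4 = {
    "realign": 16.2,
    "coregister": 2.0,
    "normalize_and_reslice": 12.5,
    "smooth": 5.4,
    "model_optimized": 5.7,
    "contrasts": 3.2,
}

# Windows P4: measured values where reported on this page.
# ASSUMPTION: coregistration and contrasts take the same time as on Linux.
windows_p4 = dict(
    linux_p4,
    realign=23.0,
    normalize_and_reslice=20.0,
    smooth=5.8,
    model_optimized=5.5,
)


def total_minutes(steps):
    """Sum the per-step times for one machine."""
    return sum(steps.values())


def percent_slower(slow, fast):
    """Return how much longer `slow` takes than `fast`, in percent."""
    return 100.0 * (slow / fast - 1.0)


linux_total = total_minutes(linux_p4)
windows_total = total_minutes(windows_p4)
print(f"Linux P4: {linux_total:.1f} min")
print(f"Windows P4: {windows_total:.1f} min "
      f"({percent_slower(windows_total, linux_total):.0f}% slower)")
```

The same `percent_slower` helper can be applied to a single step, for example to the realignment times of the two operating systems.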
Conclusions
As expected from published integer and floating point benchmarks, the Intel solution was the best performer on these real-world tests of SPM performance. Keeping data on the local hard disk results in a speed gain of the order of 10%.
Linux or Windows?
The Pentium machine is fast running SPM under Linux or Windows. Realignment/reslicing is 42% slower on Windows, and normalization/reslicing is 60% slower. Both procedures involve a large amount of image writing and resampling; Windows may be slower because of slower disk access and/or less effective caching. Assuming coregistration takes the same time on Linux and Windows, the whole processing stream for Windows would take around 59 minutes, which is 32% slower than Linux. Of course the choice between Linux and Windows is likely to be dictated by other factors, among which are NFS speed, multitasking performance, and the other applications you want to run.

Matthew Brett
Rhodri Cusack
7th April 2003