Load sharing tool for SPM
Overview of commands
showload | Shows the load on all machines
spm | See separate help on SpmVersions
bestmegmachine | Shows the best machine for launching MEG software
all Elekta/MNE/FreeSurfer commands | Launched with load sharing using the runmne wrapper
Technical details
On our site, we have a number of central Linux servers that users access through VNC. This is a tool I set up to launch SPM or our MEG software on whichever machine is currently least loaded, balancing the load across them. SPM jobs are started either manually or by the parallel scheduling part of AutomaticAnalysis. In the past, a script scanned every machine each time an SPM job was launched. However, we now have 36 machines, scanning them all takes an unpleasant amount of time, and the scan would hang if any single machine did. The new version runs a cron job every 5 minutes that launches a small job on each machine; each job checks that machine's load and writes the result to a file. For robustness, the cron job runs on every machine, but the script exits immediately if the previous scan was less than 30 s ago. Going further, each time the script runs it reinstalls the cron job on every machine, so provided at least one machine keeps running, the system will revive itself.
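The scheduling described above can be sketched in Python as follows. This is not the code in scanmachines.py: the function names, the shared timestamp file and its path, and the exact cron line are all assumptions, made on the premise that every machine can see the same filesystem. should_scan is the 30-second throttle, and add_cron_line returns a crontab with the scan entry present exactly once, so reinstalling it on every run is harmless.

```python
import os
import time

# Hypothetical shared timestamp file; assumed to sit on a filesystem that
# every machine mounts. The real location is not given on this page.
STAMP_FILE = "/shared/loadshare/status/lastscan"
MIN_SCAN_INTERVAL = 30  # seconds; mirrors loadsharesettings.minscaninterval

# Hypothetical cron entry: run the scan every 5 minutes.
CRON_LINE = "*/5 * * * * python /shared/loadshare/scanmachines.py"


def should_scan(stamp_file=STAMP_FILE, min_interval=MIN_SCAN_INTERVAL, now=None):
    """Return True if the last recorded scan is at least min_interval s old.

    Every machine's cron job calls this; only the first to arrive in each
    window goes on to scan, the others exit straight away.
    """
    now = time.time() if now is None else now
    try:
        last = os.path.getmtime(stamp_file)
    except OSError:
        return True  # no stamp yet, so nobody has scanned
    return now - last >= min_interval


def mark_scanned(stamp_file=STAMP_FILE, now=None):
    """Record the time of this scan by setting the stamp file's mtime."""
    now = time.time() if now is None else now
    with open(stamp_file, "a"):
        pass  # create the file if it does not exist yet
    os.utime(stamp_file, (now, now))


def add_cron_line(crontab_text, line=CRON_LINE):
    """Return crontab_text with `line` present exactly once (idempotent)."""
    if line in crontab_text.splitlines():
        return crontab_text  # already installed: leave the crontab untouched
    if crontab_text and not crontab_text.endswith("\n"):
        crontab_text += "\n"
    return crontab_text + line + "\n"
```

To reinstall on a host, one could read `crontab -l` over ssh, pass the text through add_cron_line, and write the result back with `crontab -`. The throttle is not a lock: two machines firing at the same instant can both pass the check, which only costs a duplicate scan.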
The tool also controls Matlab license usage. When the same user runs several Matlab jobs on the same machine, they use only one license. The load sharing tool therefore allows each user a fixed number of Matlab licenses, i.e. machines running their Matlab jobs (set by loadsharesettings.maxlicensesperuser). Until a user reaches this limit, each new job goes to the least loaded machine in the whole pool. Once they reach it, new jobs go to the least loaded of the machines on which they already have jobs.
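The license rule above can be sketched as a pure function. The name pick_machine and its arguments are hypothetical; the real entry point is getbestmachine in loadshare.py, whose internals are not shown here. How the tool finds out which machines a user already runs Matlab on is also left out.

```python
def pick_machine(loads, available, user_machines, uses_matlab, max_licenses):
    """Choose a host for a new job (illustrative sketch, not getbestmachine).

    loads:         dict mapping host -> load figure (lower is better)
    available:     hosts allowed for this kind of software
    user_machines: hosts where this user already runs Matlab jobs
    uses_matlab:   True if the job needs a Matlab license
    max_licenses:  per-user cap, cf. loadsharesettings.maxlicensesperuser
    """
    # Only consider hosts that are allowed and have a recent load reading.
    candidates = [h for h in available if h in loads]
    if uses_matlab and len(user_machines) >= max_licenses:
        # At the cap: stay on hosts where the user already holds a license,
        # because further jobs there do not check out another one.
        candidates = [h for h in candidates if h in user_machines]
    if not candidates:
        return None
    return min(candidates, key=lambda h: loads[h])
```

Returning None when a capped user has no Matlab jobs on any host allowed for this software is a choice made for the sketch; the real tool may handle that case differently.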
The scripts are written in Python, with a couple of shell scripts that help launch SPM. All settings are in loadsharesettings.py, including the list of all machines to scan for load and the lists of machines available for each kind of software.
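A minimal sketch of what loadsharesettings.py might contain. Only machines, minscaninterval and maxlicensesperuser are named on this page; the host names, the value of maxlicensesperuser, and the per-software lists spmmachines and megmachines are hypothetical.

```python
# loadsharesettings.py: illustrative sketch only.
# Only `machines`, `minscaninterval` and `maxlicensesperuser` are named in
# the documentation; all other names and all host names are hypothetical.

# Every machine whose load is scanned (the real site lists 36).
machines = ["node01", "node02", "node03", "node04"]

# Minimum time between scans, in seconds; scanmachines.py exits if the
# previous scan is more recent than this.
minscaninterval = 30

# Number of Matlab licenses a single user may hold across the pool.
maxlicensesperuser = 4  # hypothetical value

# Hypothetical per-software subsets of `machines`.
spmmachines = list(machines)
megmachines = ["node03", "node04"]
```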
File listing
General settings
loadsharesettings.py | Various settings for the loadshare system
Checking machine load
crontab.txt | A sample crontab (install with: crontab crontab.txt)
scanmachines.py | Goes through all of the machines listed in loadsharesettings.machines and runs checkmachineload on each. Exits if the previous scan ran less than loadsharesettings.minscaninterval ago (default 30 s)
checkmachineload.py | Checks the load on a single machine
status | Directory of files recording the load on each machine and the last time scanmachines was launched from each machine
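The per-machine check and the read side of the status directory might look like the sketch below. The file naming (`<host>.load`), the one-line `<timestamp> <load>` format, and the use of the 1-minute load average from os.getloadavg() are assumptions, not a description of what checkmachineload.py actually does. Ignoring stale entries means a machine that has hung simply drops out of the pool instead of stalling everyone's launches.

```python
import os
import socket
import time


def write_load(status_dir, host=None, load=None, now=None):
    """Record one machine's load as '<timestamp> <load>' in <host>.load.

    The file name and format are hypothetical.
    """
    host = host or socket.gethostname()
    load = os.getloadavg()[0] if load is None else load  # 1-minute average
    now = time.time() if now is None else now
    tmp = os.path.join(status_dir, ".%s.tmp" % host)
    with open(tmp, "w") as f:
        f.write("%f %f\n" % (now, load))
    # Swap in the new file in one step so readers never see a partial write.
    os.replace(tmp, os.path.join(status_dir, host + ".load"))


def read_loads(status_dir, max_age, now=None):
    """Return {host: load} for .load files written within max_age seconds."""
    now = time.time() if now is None else now
    loads = {}
    for name in os.listdir(status_dir):
        if not name.endswith(".load"):
            continue  # skip temp files and other status records
        try:
            with open(os.path.join(status_dir, name)) as f:
                stamp, load = map(float, f.read().split())
        except (OSError, ValueError):
            continue  # unreadable or malformed: leave this host out
        if now - stamp <= max_age:
            loads[name[: -len(".load")]] = load
    return loads
```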
Launching software
loadshare.py | Main function: getbestmachine(availablemachinelist, usesmatlab). Parameters: availablemachinelist is a list of machine names; usesmatlab is 0 or 1, depending on whether the task requires Matlab (used in machine selection to control license issuing). All of the scripts below now use it
launchspm.py | The main Python launch script. Accepts various parameters (see SpmVersions). It also accepts some hidden parameters, including "workerdesktop", which launches without the Matlab Java desktop and with a funky yellow-on-black colour scheme
launchspm_inner | Inner script for the SPM launcher
launchspm_inner_unlimit | Special version of the inner script
meg_runcommand.py | Wrapper for Elekta Neuromag tools. All are load balanced; one (mce) requires Matlab
run_mne.py | Wrapper for MNE or FreeSurfer
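A launcher in the style of meg_runcommand.py or run_mne.py might combine a machine chooser with ssh, as sketched below. Passing getbestmachine in as `choose` matches its documented (availablemachinelist, usesmatlab) signature, but the function names here and the use of `ssh -X` to reach the chosen host are assumptions about how the real wrappers work.

```python
import shlex
import subprocess


def build_remote_command(host, argv, x11=True):
    """Build an ssh argv that runs `argv` on `host` (hypothetical helper)."""
    cmd = ["ssh"]
    if x11:
        cmd.append("-X")  # forward X11 so GUI tools appear in the user's session
    cmd += [host, shlex.join(argv)]  # quote arguments for the remote shell
    return cmd


def run_load_balanced(argv, available, uses_matlab, choose):
    """Pick a host with `choose` (e.g. getbestmachine) and run argv there."""
    host = choose(available, uses_matlab)
    if not host:
        raise RuntimeError("no machine available for this job")
    return subprocess.call(build_remote_command(host, argv))
```

For example, `run_load_balanced(["freeview", "my scan.mgz"], mnemachines, 0, loadshare.getbestmachine)` would open FreeSurfer's viewer on the least loaded machine, where mnemachines is a hypothetical machine list from loadsharesettings.py.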
Updating known hosts
update_known_hosts.py | Updates the list of SSH host keys using the list in the file known_hosts
known_hosts | A list of host keys. Can be a copy of the known_hosts file in your ~/.ssh folder
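One plausible reason for distributing host keys is that ssh to an unfamiliar host stops to ask the user to confirm its key, which would stall an unattended launch. The sketch below merges a shared known_hosts list into ~/.ssh/known_hosts, appending only lines that are not already present. It is an assumption about what update_known_hosts.py does, not its source. Lines are compared verbatim, so a host already recorded in hashed form (HashKnownHosts, which uses a random salt) may end up listed twice; ssh tolerates this.

```python
import os


def merge_known_hosts(existing_text, shared_text):
    """Return existing_text plus any shared host-key lines it lacks.

    Comments and blank lines in the shared list are skipped.
    """
    have = set(existing_text.splitlines())
    out = existing_text
    if out and not out.endswith("\n"):
        out += "\n"
    for line in shared_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and line not in have:
            out += line + "\n"
            have.add(line)
    return out


def update_known_hosts(shared_path, target=None):
    """Merge shared_path into target (default ~/.ssh/known_hosts)."""
    target = target or os.path.expanduser("~/.ssh/known_hosts")
    os.makedirs(os.path.dirname(target), mode=0o700, exist_ok=True)
    existing = ""
    if os.path.exists(target):
        with open(target) as f:
            existing = f.read()
    with open(shared_path) as f:
        merged = merge_known_hosts(existing, f.read())
    if merged != existing:  # only rewrite the file when something was added
        with open(target, "w") as f:
            f.write(merged)
```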
Download
Download from http://www.mrc-cbu.cam.ac.uk/~rhodri/loadshare.tar