I thought I’d share a little success story. A researcher recently approached me to say he’d been having difficulty getting his bioinformatics workflow (based on this) working on the University of Sheffield’s ShARC HPC cluster, and that he urgently needed to get it running to meet a pending deadline. He’s primarily using GATK, the ‘genome analysis toolkit’. He appealed for help on the GATK forum, where members suggested that the issue might be due to the way in which GATK was installed.
After you submit a job to a Grid Engine (GE) cluster such as ShARC, GE iterates through all job queue instances on all nodes (in a particular order) to see if there are sufficient free resources to start your job. If so (and GE isn’t holding resources back for a Resource Reservation) then your job will start running. If not, then your job is added to the list of pending jobs. Each pending job has an associated dynamically-calculated priority, which the GE scheduler uses to determine the order in which to consider pending jobs.
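To make that flow a little more concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not Grid Engine’s actual algorithm: it tracks nothing but free slots, ignores Resource Reservations entirely, and the class names, function names and queue instance names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class QueueInstance:
    name: str        # hypothetical name, e.g. "all.q@node1"
    free_slots: int  # the only resource this toy model tracks


@dataclass
class Job:
    name: str
    slots: int
    priority: float = 0.0  # in real GE this is recalculated dynamically


def try_start(job, queue_instances):
    """Walk the queue instances in order; start the job on the first that fits.

    Returns the name of the chosen queue instance, or None if nothing fits.
    """
    for qi in queue_instances:
        if qi.free_slots >= job.slots:
            qi.free_slots -= job.slots
            return qi.name
    return None


def submit(job, queue_instances, pending):
    """Start the job straight away if possible, otherwise add it to pending."""
    started_on = try_start(job, queue_instances)
    if started_on is None:
        pending.append(job)
    return started_on


def schedule_pending(queue_instances, pending):
    """Consider pending jobs in descending priority order.

    Returns a list of (job name, queue instance name) for jobs that started.
    """
    started = []
    for job in sorted(pending, key=lambda j: j.priority, reverse=True):
        started_on = try_start(job, queue_instances)
        if started_on is not None:
            started.append((job.name, started_on))
            pending.remove(job)
    return started
```

In this model a job that cannot start at submission time simply waits in `pending` until a later call to `schedule_pending` finds room for it, with higher-priority jobs getting the first look at any freed slots.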
HPC clusters running the Grid Engine distributed resource manager (job scheduling) software only allow jobs to be submitted from a whitelist of ‘submit hosts’. With the ShARC and Iceberg clusters here at the University of Sheffield the per-cluster lists of permitted submit hosts include all login nodes and all worker nodes; for ease of management and security, unmanaged hosts (e.g. researchers’ workstations) are not added to the list. If you really want to be able to automate the process of submitting jobs from your own machine then one option is to write a script that logs into the cluster via SSH and then submits a job from there.
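As a rough illustration of that last option, here is a minimal Python sketch that wraps the standard `ssh` client. It makes several assumptions: that you have key-based SSH access to a login node, that `qsub` is on the `PATH` of a non-interactive SSH session (on some clusters you may need to run it via a login shell instead), and that the job script already exists on the cluster. The function names are made up for this example, and the username, hostname and script path in the usage note are placeholders.

```python
import shlex
import subprocess


def build_remote_qsub_command(user, login_host, remote_script):
    """Build the argument list for: ssh user@login_host "qsub remote_script".

    The script path is quoted so that the remote shell treats it as a
    single argument even if it contains spaces.
    """
    remote_cmd = "qsub " + shlex.quote(remote_script)
    return ["ssh", f"{user}@{login_host}", remote_cmd]


def submit_job(user, login_host, remote_script):
    """Run qsub on the login node over SSH and return whatever it prints.

    Raises subprocess.CalledProcessError if ssh or qsub exits non-zero.
    """
    cmd = build_remote_qsub_command(user, login_host, remote_script)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

You might then call something like `submit_job("alice", "cluster.example.org", "my_job.sh")` from your workstation, substituting your own username, your cluster’s login node and the path to your job script.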