I’ve been running Arch Linux on my personal and work laptops for a couple of years and love it: it’s very configurable, cutting edge, (surprisingly) stable and has awesome documentation (useful for all Linux users). However, there’s one aspect of it that really bugs me: files related to the current kernel and kernel modules (including device drivers) are aggressively purged as soon as you upgrade the linux kernel package, i.e. the kernel modules built against the running kernel are no longer on disk, so they cannot be dynamically loaded.
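A quick way to see whether you are in this state is to check that a modules directory still exists for the running kernel. The snippet below is a minimal sketch, assuming modules live under `/usr/lib/modules/<release>` (as they do on Arch); the function names are mine, not part of any tool.

```python
import os
import platform


def modules_dir_for(release, base="/usr/lib/modules"):
    """Return the directory where modules for a kernel release are expected."""
    return os.path.join(base, release)


def running_kernel_modules_present(release=None, base="/usr/lib/modules"):
    """Return True if a modules directory exists for the given kernel release.

    With no release given, the release of the running kernel is used
    (the same string that `uname -r` prints).
    """
    release = release or platform.release()
    return os.path.isdir(modules_dir_for(release, base))
```

If `running_kernel_modules_present()` returns `False` after an upgrade, newly plugged-in devices that need a not-yet-loaded module will probably not work until you reboot into the new kernel.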
Why learn Latin in 2018 if you’re not a member of the Catholic Church? Or Greek if you’re not Greek? Well, both are great at putting things in context and highlighting commonality: Latin does this for the Romance languages and Greek does this for much scientific terminology. And Ed, a positively archaic 46-47 year old text editor, does this for parts of the Unix userland that are still very much in common use (specifically sed, awk, grep, vi), as I learned from reading Michael W Lucas' new book Ed Mastery.
I thought I’d share a little success story. A researcher recently approached me to say he’d been having difficulty getting his bioinformatics workflow (based on this) working on the University of Sheffield’s ShARC HPC cluster and had an urgent need to get it running to meet a pending deadline. He’s primarily using the GATK ‘genome analysis toolkit’. He appealed for help on the GATK forum, members of which suggested that the issue may be due to the way in which GATK was installed.
When I started as a Research Software Engineer at the Uni of Sheffield a year ago I was given a lovely Dell XPS 9550 laptop to work on. The first thing I did was to install Arch Linux on it, which has so far proved to be extremely stable despite the rolling release model and the main Arch repository offering very recent versions of most FOSS packages.
Large engineering apps say no to bleeding edge Linux distros
However, the one main issue with running Arch at work is that some commercial engineering software supports Linux but not all flavours: there are a fair few commercial packages that are only supported and only really work with RHEL/CentOS/SLES and Ubuntu.
After submitting a job to a Grid Engine (GE) cluster such as ShARC, GE iterates through all job queue instances on all nodes (in a particular order) to see if there are sufficient free resources to start your job. If so (and GE isn’t holding resources back for a Resource Reservation) then your job will start running. If not, then your job is added to the list of pending jobs. Each pending job has an associated dynamically-calculated priority, which the GE scheduler uses to determine the order in which to consider pending jobs.
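To make the ordering idea concrete, here is a toy sketch of sorting pending jobs by priority. It does not reproduce how GE calculates priorities; the `PendingJob` type is hypothetical, and breaking ties by job ID (i.e. submission order) is an assumption made purely for illustration.

```python
from typing import NamedTuple


class PendingJob(NamedTuple):
    """Hypothetical record for a pending job: its ID and current priority."""
    job_id: int
    priority: float


def scheduling_order(jobs):
    """Return jobs sorted from highest to lowest priority.

    Ties are broken by lowest job ID first (an assumption for this sketch).
    """
    return sorted(jobs, key=lambda job: (-job.priority, job.job_id))


pending = [
    PendingJob(job_id=101, priority=0.50),
    PendingJob(job_id=102, priority=0.75),
    PendingJob(job_id=103, priority=0.50),
]
ordered_ids = [job.job_id for job in scheduling_order(pending)]
```

Because priorities are recalculated dynamically, an ordering like this would only be valid for a single scheduling pass.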
HPC clusters running the Grid Engine distributed resource manager (job scheduling) software allow jobs to be submitted from a whitelist of ‘submit hosts’. With the ShARC and Iceberg clusters here at the University of Sheffield the per-cluster lists of permitted submit hosts include all login nodes and all worker nodes; for ease of management and security, unmanaged hosts (e.g. researchers' workstations) are not added to the list. If you really want to be able to automate the process of submitting jobs from your own machine then one option is to write a script that logs into the cluster via SSH then submits a job from there.
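As a rough illustration of that option, the sketch below builds an `ssh` command that runs `qsub` on a login node and returns whatever `qsub` prints. The host name, user name and script path in the comment are placeholders, not real details; it also assumes key-based SSH authentication is already set up and that `qsub` is on the remote shell's path.

```python
import shlex
import subprocess


def build_submit_command(host, script_path, user=None):
    """Build the ssh command that runs `qsub <script_path>` on the remote host.

    ssh passes the remote command through a shell on the far side, so the
    script path is quoted to survive spaces and special characters.
    """
    target = f"{user}@{host}" if user else host
    return ["ssh", target, "qsub " + shlex.quote(script_path)]


def submit_job(host, script_path, user=None):
    """Submit a job script that already exists on the cluster.

    Returns qsub's standard output; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        build_submit_command(host, script_path, user),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Example (placeholder host, user and path):
# print(submit_job("cluster.example.org", "jobs/my_job.sh", user="alice"))
```

Note that the job script has to live on the cluster's filesystem; copying it there first (e.g. with `scp`) would be a separate step.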
I’m Will Furnass, a Research Software Engineer in the University of Sheffield’s Research Software Engineering team. My boss, Mike Croucher, is a big believer in blogs as a means of getting ideas out there and has been prodding me for some time about setting up my own blog to talk about research software, systems administration and teaching so I thought I’d start 2018 by finally doing just that.
‘Learning Patterns’ - why the name?