I like my job, so no screenshots. Sorry.
Notes:
- sbatch is the Slurm command for submitting batch jobs to high-performance compute nodes
- the huge-n128-512g node uses 128 cores and has 512GiB of memory
- This is occurring in a medical research nonprofit
User: Hello everyone, this is the first time I'm using GCP. I'm trying to run a job, but it keeps failing. These are the sbatch headers I'm using:
#SBATCH --partition=huge-n128-512g
#SBATCH --nodes=8
#SBATCH --mail-user=user@institute.org
#SBATCH --mail-type=FAIL
#SBATCH --mem-per-cpu=32G
IT: Please make sure you need to use that node. Each one costs $4500/month to use. Can you describe the job you're trying to do?
User: I'm doing high-depth genetic sequencing using 3GB bam files.
(additional note: there's usually only 1 bam file per chromosome, so 69GB total. Nice.)
IT: Those bam files are pretty small. I'd recommend starting with the med-n16-64g node and moving up if needed. We're only billed for run time. If the jobs take the same amount of time, it would be 13% of the cost.
The astute among you will notice that an 8-node swarm with 32GiB of memory per core comes to 32TiB total. The job was failing because --mem-per-cpu=32G across 128 cores asks for 4TiB on each node, eight times the 512GiB a node actually has. Even without that flag, the swarm would have used 4TiB of memory. Holy overallocation, Batman!
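For anyone who wants to check the math, here is a minimal Python sketch of it. The numbers come straight from the headers and notes above; the variable names are mine, and it assumes the job claims all 128 cores on every node, which is what the 32TiB figure implies.

```python
# Figures from the post; names are illustrative only.
NODES = 8                # --nodes=8
CORES_PER_NODE = 128     # huge-n128-512g
NODE_MEMORY_GIB = 512    # huge-n128-512g
MEM_PER_CPU_GIB = 32     # --mem-per-cpu=32G

# Assumption: every core on every node is allocated to the job.
requested_per_node_gib = CORES_PER_NODE * MEM_PER_CPU_GIB
requested_total_tib = NODES * requested_per_node_gib // 1024

# What the hardware actually offers.
available_total_tib = NODES * NODE_MEMORY_GIB // 1024
max_mem_per_cpu_gib = NODE_MEMORY_GIB // CORES_PER_NODE

print(f"requested per node: {requested_per_node_gib} GiB "
      f"(node has {NODE_MEMORY_GIB} GiB)")
print(f"requested in total: {requested_total_tib} TiB")
print(f"available in total: {available_total_tib} TiB")
print(f"largest per-core request that fits: {max_mem_per_cpu_gib} GiB")
```

Under that assumption, the most memory per core that fits on one of these nodes is 4GiB, an eighth of what was requested.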