Last Updated: 2020-11-05

What you will build

In this codelab, you are going to deploy an auto-scaling HPC cluster on Google Cloud that comes with the Slurm job scheduler. You will customize this system to deploy compute nodes with OpenFOAM® installed and then use this infrastructure to simulate compressible flow past a NACA0012 aerofoil.

What you will learn

What you will need

Set IAM Policies

In HPC, there are clear distinctions between system administrators and system users. System administrators generally have "root access", enabling them to manage and operate compute resources. System users are generally researchers, scientists, and application engineers who only need to leverage the resources to execute jobs.

On Google Cloud Platform, the OS Login API provisions POSIX user information from GSuite, Cloud Identity, and Gmail accounts. Additionally, OS Login integrates with GCP's Identity and Access Management (IAM) system to determine if users should be allowed to escalate privileges on Linux systems.
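As an illustration of the API described above, OS Login can be enabled project-wide with a single metadata flag using a standard gcloud command. Whether you need to run this for the Marketplace deployment used later in this codelab is not covered here, so treat it as an optional sketch.

gcloud compute project-info add-metadata \
    --metadata enable-oslogin=TRUE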

In this tutorial, we assume you are filling both the system administrator and Compute Engine administrator roles. We will configure IAM policies to give you sufficient permissions to complete this codelab.

To give yourself the necessary IAM roles to complete this tutorial (a gcloud CLI alternative is sketched after these steps):

  1. Navigate to IAM & Admin > IAM in the Products and Services menu.
  2. Click "+Add" near the top of the page.
  3. Type in your GSuite account, Cloud Identity account, or Gmail account under "Members".
  4. Add the following roles: Compute Admin, Compute OS Admin Login, and Service Account User.
  5. Click "Save".
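If you prefer the command line, the same role bindings can be granted with gcloud. The project ID and member email below are placeholders; the role IDs correspond to the roles listed above.

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:you@example.com" --role="roles/compute.admin"
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:you@example.com" --role="roles/compute.osAdminLogin"
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:you@example.com" --role="roles/iam.serviceAccountUser"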

In this section, you will deploy the Fluid-Slurm-GCP solution, an auto-scaling HPC cluster with the Slurm job scheduler and software that supports computational fluid dynamics workflows, including ParaView.

  1. Open https://console.cloud.google.com/marketplace/details/fluid-cluster-ops/cfd-gcp.
  2. Click "Launch".
  3. Give the deployment a name (e.g. openfoam-demo) and select the GCP zone where you want to deploy your cluster.

  4. Leave the Controller and Login settings at their default values.
  5. In the Partition Configuration section, set the partition name to "openfoam", the Machine Type to `n1-standard-8`, and the Disk Size to 50 GB.
  6. Click "Deploy" and wait for the cluster to be created. You can verify the deployment as shown below.
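Once the deployment finishes, you can confirm that the cluster instances exist with a standard gcloud listing. The filter below assumes you named the deployment openfoam-demo; adjust it to the name you chose.

gcloud compute instances list --filter="name~openfoam-demo"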

In this section of the codelab, you will configure the openfoam partition to use the openfoam-gcp image. Note that this image is provided as part of Fluid-Slurm-GCP and is licensed to you under the Fluid-Slurm-GCP EULA.

  1. Log in to your cluster controller instance using SSH.
  2. Switch to the root user.
sudo su
  3. Create a cluster-configuration file using the cluster-services CLI.
cluster-services list all > config.yaml
  4. Open config.yaml in a text editor and navigate to the partitions[0].machines[0] block. Insert an image definition for the machine block that points to projects/fluid-cluster-ops/global/images/openfoam-gcp. Your machine block should look similar to the example block below.
  machines:
  - disable_hyperthreading: false
    disk_size_gb: 50
    disk_type: pd-standard
    external_ip: false
    gpu_count: 0
    gpu_type: nvidia-tesla-p4
    image: projects/fluid-cluster-ops/global/images/openfoam-gcp
    local_ssd_mount_directory: /scratch
    machine_type: n1-standard-8
    max_node_count: 10
    n_local_ssds: 0
    name: openfoam
    preemptible_bursting: false
    static_node_count: 0
    vpc_subnet: https://www.googleapis.com/compute/v1/projects/cloud-hpc-demo/regions/us-east4/subnetworks/default
    zone: us-east4-a
  5. Save the config.yaml file and exit your text editor.
  6. Use cluster-services to preview the changes to your openfoam partition.
cluster-services update partitions --config=config.yaml --preview
  7. Apply the changes. You can verify the new partition configuration as shown below.
cluster-services update partitions --config=config.yaml
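After the update is applied, Slurm on the controller should report the partition. sinfo is a standard Slurm command; the partition name matches the one configured above.

sinfo -p openfoam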

In this section, we will access the cluster's login node to configure Slurm accounting, so that you can submit jobs using the Slurm job scheduler.

  1. SSH into the cluster's login node.
  2. Switch to the root user.
sudo su
  3. Append a sample slurm_accounts block to the end of the config.yaml file.
cluster-services sample slurm_accounts >> config.yaml
  4. Edit the cluster-configuration file so that you are allowed to submit to the openfoam partition. Make sure you remove the empty slurm_accounts: [] entry that is pre-populated in the cluster-configuration file.
    The example slurm_accounts configuration below creates a Slurm account called cfd with the user joe added to it. Users in this account are allowed to submit jobs to the meshing, openfoam, and paraview partitions.
slurm_accounts:
  - allowed_partitions:
    - meshing
    - openfoam
    - paraview
    name: cfd
    users:
    - joe
  5. Preview the changes for updating the slurm_accounts. Verify that you have entered the Slurm accounting information correctly.
cluster-services update slurm_accounts --config=config.yaml --preview
  6. Apply the changes.
cluster-services update slurm_accounts --config=config.yaml
  7. Exit from root. (A quick check of the new Slurm account is sketched below.)
exit
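To confirm that the account was created, you can query Slurm's accounting database from the login node. sacctmgr is part of Slurm; the cfd account name is taken from the example above.

sacctmgr show assoc format=Account,User,Partition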

In this section, you will submit a Slurm batch job to run the NACA0012 tutorial included with OpenFOAM®. To help you with this, the Fluid-Slurm-GCP solution comes with an example Slurm batch script (/apps/share/openfoam.slurm). This example batch script can also be used as a starting point for other OpenFOAM® jobs on the cluster.
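
For orientation, the sketch below outlines the general shape of such a batch script. It is illustrative only and is not the script shipped at /apps/share/openfoam.slurm: the OpenFOAM install path, the rhoSimpleFoam solver, and the decomposition step are assumptions, so check the shipped script for the exact workflow used on the image.

#!/bin/bash
# Partition and Slurm account configured earlier in this codelab.
#SBATCH --partition=openfoam
#SBATCH --account=cfd
# One MPI rank per vCPU on an n1-standard-8 machine.
#SBATCH --ntasks=8
#SBATCH --time=02:00:00

# Load the OpenFOAM environment; the install path is an assumption for this sketch.
source /opt/OpenFOAM/OpenFOAM-v2006/etc/bashrc

# Assumes the job is submitted from inside an OpenFOAM case directory:
# decompose the case across the allocated tasks and run the solver in parallel.
decomposePar
mpirun -np $SLURM_NTASKS rhoSimpleFoam -parallel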

  1. On the cluster login node, clone the fluid-slurm-gcp_custom-image-bakery repository and copy the example Slurm batch script to your home directory.
git clone https://github.com/fluidnumerics/fluid-slurm-gcp_custom-image-bakery.git
cp fluid-slurm-gcp_custom-image-bakery/examples/openfoam/openfoam.slurm ./
  2. Open the Slurm batch script in a text editor. Set the --account parameter to the Slurm account name that you created in the previous section of this codelab. Save the file when you are done and exit the text editor.
#SBATCH --account=cfd
  3. Submit the batch job using sbatch.
sbatch openfoam.slurm
  4. Wait for the job to complete. You can monitor its progress as shown below.
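You can check the queue while the job is running and, once it finishes, look up its accounting record with standard Slurm commands. Replace JOBID with the job ID printed by sbatch.

squeue -u $USER
sacct -j JOBID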

When the job completes, you will have the aerofoilNACA0012 OpenFOAM® simulation case directory in your home directory.

ls aerofoilNACA0012/
0     1050  1200  1350  150  300  450  550  700  850  Allclean  dynamicCode      log.transformPoints  processor1  processor4  processor7
100   1100  1250  1400  200  350  50   600  750  900  Allrun    log.blockMesh    postProcessing       processor2  processor5  system
1000  1150  1300  1410  250  400  500  650  800  950  constant  log.extrudeMesh  processor0           processor3  processor6
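The processor0 through processor7 directories show that the case ran in parallel across 8 ranks. If you want a single merged solution for post-processing, for example in ParaView, one option is sketched below; reconstructPar is a standard OpenFOAM utility, and the empty case.foam marker file is the usual way to open a case with ParaView's built-in OpenFOAM reader. Run these from the login node with the OpenFOAM environment loaded.

cd aerofoilNACA0012
# Merge the decomposed solution if the top-level time directories are incomplete.
reconstructPar -latestTime
# Create an empty marker file that ParaView's OpenFOAM reader can open.
touch case.foam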

In this codelab, you created an auto-scaling, cloud-native HPC cluster and ran a parallel OpenFOAM® simulation on Google Cloud Platform!

Further reading

Tell us your feedback