Last Updated: 2020-03-16

Slurm-GCP Infrastructure

fluid-slurm-gcp provides cloud infrastructure that is similar in look-and-feel to on-premises HPC systems. A deployment has a number of static login nodes, where users access the cluster, and a controller instance that hosts the Slurm job scheduler. Compute nodes are either static or ephemeral.

Static compute nodes persist as long as the deployment is active and remain ready to receive workloads from Slurm. Ephemeral compute nodes are created on the fly, as needed, to meet variable demand for compute capacity. When ephemeral nodes become idle, Slurm automatically removes them from GCP so that you pay only for the compute cycles you use.
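
For example, once a cluster is running, sinfo shows both kinds of nodes; in Slurm's output, a "~" suffix on the state marks cloud (ephemeral) nodes that are currently powered down. The partition and node names below are illustrative and will differ on your deployment.

$ sinfo
PARTITION   AVAIL  TIMELIMIT  NODES  STATE  NODELIST
v-compute*  up     infinite       2  idle   demo-compute-static-[0-1]
v-compute*  up     infinite       8  idle~  demo-compute-[0-7]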

Package Management

In the basic deployment of slurm-gcp, the controller instance hosts the /home and /apps directories, which are mounted on the login and compute nodes over NFS. Because the system spans multiple VMs, running yum install or apt-get install on the login or controller nodes does not make new packages available on the compute nodes.
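
You can confirm this from any login or compute node; /home and /apps both show up as NFS mounts served by the controller (exact device names depend on your deployment):

$ df -h /home /apps

Both filesystems should list the controller instance as the NFS server in the Filesystem column.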

Centralized Workflow

On this system, packages can be installed under /apps to make them available to all instances in the cluster. Environment modules can then be used to dynamically configure PATH, LD_LIBRARY_PATH, and other environment variables. This strategy requires some effort on the system administrator's part to ensure software is available for all users. However, fluid-slurm-gcp comes with Python 2.7 and Python 3.8 installed under /apps, removing some of the hurdles associated with a centralized workflow.
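
As a sketch of how an administrator might expose a new package this way (the tool name, version, and paths below are hypothetical; the modulefiles that ship with fluid-slurm-gcp may look different), it is enough to install the software under /apps and add a matching modulefile under /apps/modulefiles:

# Install a hypothetical tool under /apps (run as root)
$ sudo mkdir -p /apps/mytool/1.0
$ sudo tar -xzf mytool-1.0.tar.gz -C /apps/mytool/1.0

# Add a modulefile so that "module load mytool/1.0" works on every node
$ sudo mkdir -p /apps/modulefiles/mytool
$ sudo tee /apps/modulefiles/mytool/1.0 > /dev/null <<'EOF'
#%Module1.0
prepend-path PATH            /apps/mytool/1.0/bin
prepend-path LD_LIBRARY_PATH /apps/mytool/1.0/lib
EOF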

Virtual Environments

For Python applications, virtualenv or venv allows developers to install Python packages in an isolated location for a given application. The venv module is included with Python 3.3 and later by default. The virtualenv toolkit must be installed separately from Python, but supports Python 2.7+ as well as Python 3.3+.

Like containerization, this puts control of software dependency installation in the developer's hands without requiring global system changes. Unlike containerization, developers don't need to write recipe files that capture the whole environment for their application. Instead, a simple requirements.txt file specifies which Python packages the application depends on.

Container Workflow

Another approach is to use Singularity container images to run applications on Slurm-GCP. Singularity provides a container platform that does not require root escalation to run (unlike Docker). This is beneficial on a shared computing system like Fluid-Slurm-GCP, where regular users should not have root access (only system admins should).

In this approach, developers can build Docker or Singularity images on their own platforms or using Google Cloud Build and deploy them on Fluid-Slurm-GCP.

What you will build

In this codelab, you are going to try three different approaches to Python package management on fluid-slurm-gcp: centralized package management under /apps with environment modules, Python virtual environments, and application containerization with Singularity.

What you will learn

  1. How to use environment modules to work with the cluster-wide Python install under /apps
  2. How to install Python packages for all users with pip3, and for a single application with venv
  3. How to build a Docker image with Cloud Build and run it on the cluster with Singularity

What you will need

  1. A fluid-slurm-gcp cluster deployed in a GCP project
  2. The Compute OS Admin Login role, so that you can escalate privileges on the cluster
  3. The gcloud SDK installed on your local system

Fluid-Slurm-GCP comes with python/3.8.2 already installed under /apps/python/3.8.2, which includes pip3 and venv. This install of Python is visible to all instances in your cluster, and any user can access it through environment modules. Environment modules allow you to easily manipulate PATH, LD_LIBRARY_PATH, and other environment variables using module commands.

In this section, we'll walk through logging in to the cluster, loading the python/3.8.2 environment module, installing Python packages under /apps so that they are available to all users, and running a test application through Slurm.

To complete the steps in this section, you will need to be able to escalate privileges ("go root") on your cluster. This requires the Compute OS Admin Login role.

Log in to the cluster

  1. Open https://console.cloud.google.com/compute/instances. Verify that you are in the GCP project that is hosting your fluid-slurm-gcp cluster.
  2. SSH into the login node of your cluster by clicking the SSH button next to the login node.

Working with Python3 through Environment Modules

Once you are on the login node of your cluster, you can use module commands to dynamically load software packages into your default path. You can view which modules are available by running module avail.

$ module avail

------------------------------------------ /usr/share/Modules/modulefiles ------------------------------------------
dot         module-git  module-info modules     null        use.own

------------------------------------------------- /etc/modulefiles -------------------------------------------------
mpi/openmpi-x86_64

--------------------------------------------------------------------------- /apps/modulefiles ---------------------------------------------------------------------------
python/2.7.9      python/3.8.2      singularity/3.2.1

The directory /apps/modulefiles hosts module files that are visible across the cluster.

You can use the module load PACKAGE command to make additional software packages visible. For this codelab, we will load the python/3.8.2 module.

$ module load python/3.8.2 
$ module list
Currently Loaded Modulefiles:
  1) python/3.8.2

At any time, you can use module list to display which modules are currently loaded.

You can use the which command to determine the full path of the python3 binary that is being used. After loading the python/3.8.2 module, this should point to /apps/python/3.8.2/bin/python3.

$ which python3
/apps/python/3.8.2/bin/python3
$ which pip3
/apps/python/3.8.2/bin/pip3

Python3 Package Installation

To install packages under the /apps install of python3, so that they are available to all developers on your system, you will need to be root. For this codelab, you will install numpy, scipy, and matplotlib in order to run a demo application.

  1. Go root
$ sudo su
[root]#
  2. Install numpy, scipy, and matplotlib with the /apps install of pip3.
[root]# module load python/3.8.2
[root]# pip3 install numpy scipy matplotlib
  3. Exit root
[root]# exit 

Application Testing

We've provided a simple python3 application in the pythondemo-slurm_gcp repository that reports the version and module path for scipy, numpy, and matplotlib. This application is meant to verify that we are indeed using the site-packages under the /apps/python/3.8.2 installation path.

In this step, we'll show how to run a batch job on Slurm that uses the /apps install of python3.
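
For reference, the test.batch script provided in the repository should look roughly like the sketch below (check the file itself for the exact contents). It loads the python/3.8.2 module and then runs test.py with the /apps install of python3:

#!/bin/bash
#
#SBATCH --account=default
#
#SBATCH --partition=v-compute
#
#SBATCH --ntasks=1
#
#SBATCH -e test.out
#
#SBATCH -o test.out
#
#//////////////////////////////////////#

module purge
module load python/3.8.2

which python3
python3 ./test.py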

  1. Clone the pythondemo-slurm_gcp repository
$ git clone https://bitbucket.org/fluidnumerics/pythondemo-slurm_gcp.git
$ cd pythondemo-slurm_gcp
  2. Edit the account and partition settings in test.batch to match your Slurm account and partition.
  3. Submit the batch job and wait for the job to complete. At the end of the job, STDERR and STDOUT will be written to a file called test.out
$ sbatch test.batch
  4. When the job is complete, you can check the contents of test.out.
$ cat test.out 
/apps/python/3.8.2/bin/python3
scipy version : 1.4.1
scipy module : /apps/python/3.8.2/lib/python3.8/site-packages/scipy/__init__.py
numpy version : 1.18.2
numpy module : /apps/python/3.8.2/lib/python3.8/site-packages/numpy/__init__.py
matplotlib version : 3.2.1
matplotlib module : /apps/python/3.8.2/lib/python3.8/site-packages/matplotlib/__init__.py

Notice that the paths are all prefixed with /apps/python/3.8.2.

With python3 installed under /apps, you can provide python packages to all users by running pip3 install with the python/3.8.2 module loaded. Alternatively, users can take advantage of python3 virtual environments to manage their application dependencies in local isolated environments.

To illustrate how this works, we'll use virtual environments on our test application in the pythondemo-slurm_gcp repository.

Install packages with venv and pip3

  1. Load the python/3.8.2 module
$ module load python/3.8.2
  2. Create the virtual environment with venv. Notice that this creates a new subdirectory ./env/
$ python3 -m venv env
  3. Activate the virtual environment. The activation step is what configures your environment to install python packages in your isolated ./env/lib/python3.8/site-packages/ path.
$ source ./env/bin/activate
  4. Install packages with pip3
(env) $ pip3 install numpy scipy matplotlib

Notice that root privileges are not required to install packages under virtual environments.

Saving Dependencies

To create a file with your application dependencies, run

(env) $ pip3 freeze > requirements.txt

and commit requirements.txt to your repository.
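
The generated file pins the version of every package installed in the virtual environment, including transitive dependencies. For the demo application it will contain entries along these lines (exact versions and dependency sets will vary):

cycler==0.10.0
kiwisolver==1.1.0
matplotlib==3.2.1
numpy==1.18.2
pyparsing==2.4.6
python-dateutil==2.8.1
scipy==1.4.1
six==1.14.0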

Application users, colleagues, and team members can then use python3 -m venv and your requirements.txt file to run your application on any system with python3 installed. After cloning your repository, users can run the following commands, provided python3 is on their path:

$ python3 -m venv env
$ source ./env/bin/activate
(env) $ pip3 install -r requirements.txt

Application Testing

To run the application within the virtual environment, you will modify the test.batch script to activate the virtual environment on the compute node prior to running the application.

  1. Modify test.batch by adding a line to activate the virtual environment after loading the python/3.8.2 module. You should also deactivate the virtual environment at the end of the batch file. Your batch file should look similar to the code block below:
#!/bin/bash
#
#SBATCH --account=default
#
#SBATCH --partition=v-compute
#
#SBATCH --ntasks=1
#
#SBATCH -e test.out
#
#SBATCH -o test.out
#
#//////////////////////////////////////#

module purge
module load python/3.8.2

# Activate the current virtual environment
source ./env/bin/activate

which python3
python3 ./test.py
deactivate
  2. Submit the batch job and wait for the job to complete. At the end of the job, STDERR and STDOUT will be written to a file called test.out
$ sbatch test.batch
  3. When the job is complete, you can check the contents of test.out.
$ cat test.out 
/home/joe/pythondemo-slurm_gcp/env/bin/python3
scipy version : 1.4.1
scipy module : /home/joe/pythondemo-slurm_gcp/env/lib/python3.8/site-packages/scipy/__init__.py
numpy version : 1.18.2
numpy module : /home/joe/pythondemo-slurm_gcp/env/lib/python3.8/site-packages/numpy/__init__.py
matplotlib version : 3.2.0
matplotlib module : /home/joe/pythondemo-slurm_gcp/env/lib/python3.8/site-packages/matplotlib/__init__.py

Notice that the test application indicates that we are using the python packages that are installed under the isolated virtual environment.

In the container workflow, developers use a combination of Docker, Cloud Build, and Singularity to launch their applications on fluid-slurm-gcp.

In this section of the codelab, you will use a Dockerfile for a python application and build the container image with Cloud Build. We do this so that we can use the Google Container Registry, which only supports Docker images. From here, you will use Singularity to pull the Docker image onto your cluster. This gives you a container image that you can run without requiring privilege escalation on the cluster.

Build a Docker Image

  1. On your local system, configure gcloud to use the same project where your fluid-slurm-gcp cluster is hosted.
$ gcloud config set project PROJECT-ID
  2. On your local system, clone the pythondemo-slurm_gcp repository and check out the feature/singularity-workflow branch
$ git clone https://bitbucket.org/fluidnumerics/pythondemo-slurm_gcp.git
$ cd pythondemo-slurm_gcp
$ git checkout feature/singularity-workflow
  3. Examine the Dockerfile provided in the pythondemo-slurm_gcp repository. This Dockerfile contains a recipe to install the Python application dependencies and to copy our application from the repository into the /apps directory inside the container image.
FROM python:3.8.2-buster

# Install Python application dependencies
RUN pip3 install numpy scipy matplotlib

# Copy the current directory from the host to /apps/ within the container image
COPY . /apps
  4. Examine the cloudbuild.yaml provided in the pythondemo-slurm_gcp repository. This file instructs Cloud Build to create a Docker image and to host it in your project's private Google Container Registry.
steps:

- id: Application Image Build
  name: 'gcr.io/cloud-builders/docker'
  args: ['build',
         '.',
         '-t',
         'gcr.io/${PROJECT_ID}/pythondemo_slurm-gcp:latest',
         ]

images: ['gcr.io/${PROJECT_ID}/pythondemo_slurm-gcp:latest']
  5. Create the image using Cloud Build. This step can take up to 5 minutes to complete.
$ gcloud builds submit .

In these steps, you used Cloud Build to create a Docker image from a Dockerfile. This image was then pushed to the Google Container Registry in your GCP project at gcr.io/PROJECT-ID/pythondemo_slurm-gcp:latest.
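
If you want to confirm that the image is available before moving on, you can list the images in your project's Container Registry from your local system (replace PROJECT-ID with your GCP project id):

$ gcloud container images list --repository=gcr.io/PROJECT-ID
NAME
gcr.io/PROJECT-ID/pythondemo_slurm-gcp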

Build the Singularity image

  1. On your fluid-slurm-gcp login node, navigate to your local download of the pythondemo-slurm_gcp repository and check out the feature/singularity-workflow branch
$ cd ~/pythondemo-slurm_gcp
$ git checkout feature/singularity-workflow
  2. On your fluid-slurm-gcp login node, configure the Docker credential helper. Singularity will use the file ~/.docker/config.json to obtain credentials for the Container Registry.
$ gcloud auth configure-docker 
  3. Load the singularity module
$ module load singularity
  4. Pull the Docker image with Singularity, replacing PROJECT-ID with your GCP project ID
$ singularity pull docker://gcr.io/PROJECT-ID/pythondemo_slurm-gcp
  5. After the image pull is complete, you can inspect the image metadata
$ singularity inspect pythondemo_slurm-gcp_latest.sif 
==labels==
org.label-schema.build-date: Tuesday_17_March_2020_1:59:20_UTC
org.label-schema.schema-version: 1.0
org.label-schema.usage.singularity.deffile.bootstrap: docker
org.label-schema.usage.singularity.deffile.from: gcr.io/fluid-cluster-ops/pythondemo_slurm-gcp
org.label-schema.usage.singularity.version: 3.2.1
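
As a quick sanity check, you can also run a command inside the image interactively on the login node before submitting any jobs; since the image was built from python:3.8.2-buster, the container's python3 should report that version:

$ singularity exec pythondemo_slurm-gcp_latest.sif python3 --version
Python 3.8.2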

Application Testing
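
In the feature/singularity-workflow branch, test.batch runs the application through the Singularity image rather than a host Python install. It should look roughly like the sketch below (a sketch only; the image path and account/partition settings must match your setup, and the file in the repository is authoritative):

#!/bin/bash
#
#SBATCH --account=default
#
#SBATCH --partition=v-compute
#
#SBATCH --ntasks=1
#
#SBATCH -e test.out
#
#SBATCH -o test.out
#
#//////////////////////////////////////#

module purge
module load singularity

# Report the python interpreter inside the container, then run the test application
singularity exec ./pythondemo_slurm-gcp_latest.sif which python
singularity exec ./pythondemo_slurm-gcp_latest.sif python3 /apps/test.py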

  1. Edit the account and partition settings in test.batch to match your Slurm account and partition.
  2. Submit the batch job and wait for the job to complete. At the end of the job, STDERR and STDOUT will be written to test.out
$ sbatch test.batch
  3. When the job is complete, you can check the contents of test.out.
$ cat test.out 
/usr/local/bin/python
scipy version : 1.4.1
scipy module : /usr/local/lib/python3.8/site-packages/scipy/__init__.py
numpy version : 1.18.1
numpy module : /usr/local/lib/python3.8/site-packages/numpy/__init__.py
matplotlib version : 3.2.0
matplotlib module : /usr/local/lib/python3.8/site-packages/matplotlib/__init__.py

Notice that the paths are all prefixed with /usr/local/lib. Keep in mind that these are paths within the Singularity container, not on the host system (the fluid-slurm-gcp compute nodes).
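
By default, Singularity bind-mounts your home directory, /tmp, and the current working directory into the container, which is why the job can still write test.out to your working directory on the host. If your application needs other host directories, they can be mounted with the --bind flag; the /data path below is purely illustrative:

$ singularity exec --bind /data:/data pythondemo_slurm-gcp_latest.sif python3 /apps/test.py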

In this codelab, you experimented with three different strategies for handling python package dependencies. These strategies covered centralized package management, python virtual environments, and application containerization.

Let us know how we did

Submit your feedback and request new codelabs

Additional Codelabs

Learn how to configure a high availability compute partition (multi-zone)

Learn how to configure a globally scalable compute partition (multi-region)

Reference docs

https://help.fluidnumerics.com/slurm-gcp

https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/