AlphaFold2 / RoseTTAFold

📣

AlphaFold 2 relies on old toolchains that will no longer be supported in the near future. Please consider using AlphaFold 3 instead.


                            | Container | Native
AlphaFold                   | ✅        | ✅
AlphaFold 2.1.1 (Multimer)  |           | ✅
RoseTTAFold                 | 🚧 WIP    | ✅

Available versions: 2.0.0, 2.1.1, 2.1.2
Node architectures: skylake (gpuv100); zen3 (gpu2080, gputitanrtx, gpu3090, gpuv100, gpuhgx)

Version 2.1.2 is loaded with:
module load palma/2021a
module load foss/2021a
module load AlphaFold/2.1.2

AlphaFold 2

Detailed information can be found at: https://github.com/deepmind/alphafold

Genetic Databases

AlphaFold and RoseTTAFold use distinct databases, each optimized for its respective algorithm. The AlphaFold databases are located here:

/Applic.HPC/data/alphafold/ 
|-- bfd
|-- mgnify
|-- params
|-- pdb70
|-- pdb_mmcif
|-- pdb_seqres
|-- small_bfd
|-- uniclust30
|-- uniprot
`-- uniref90

The complete databases total around 5 TB, and downloading and unpacking them takes more than 50 hours. Therefore: PLEASE DO NOT DOWNLOAD THESE DATABASES AGAIN!
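Before starting a run, you can check that the shared database tree is reachable from your node instead of fetching your own copy. The following is a minimal sketch; the helper name check_alphafold_db is hypothetical (it is not provided by the AlphaFold module), and the subdirectory names are taken from the listing above.

```shell
# Hypothetical helper (not part of the AlphaFold module): report any missing
# subdirectory of the shared AlphaFold database tree.
check_alphafold_db() {
  local db_dir="${1:-/Applic.HPC/data/alphafold}"
  local missing=0 sub
  # Subdirectory names as listed in the database tree above
  for sub in bfd mgnify params pdb70 pdb_mmcif pdb_seqres \
             small_bfd uniclust30 uniprot uniref90; do
    if [ ! -d "$db_dir/$sub" ]; then
      echo "missing: $db_dir/$sub" >&2
      missing=1
    fi
  done
  # 0 if every subdirectory exists, 1 otherwise
  return "$missing"
}
```

For example, `check_alphafold_db && echo "databases found"` prints each missing subdirectory to stderr and returns a non-zero status if any is absent.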

Native

Interactive session

AlphaFold has been updated to version 2.1.1, which includes the multimer feature, and compiled for the Skylake GPU nodes as well as the Zen3 nodes.

Before you start, do the following:

  • Create a suitable directory for your calculations on scratch, e.g. /scratch/tmp/$USER/AlphaFold/
  • Create sub-directories for any additional locations you want to use (e.g. a results folder and a folder for the input FASTA file)
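The preparation steps above can be sketched as a small shell function. It assumes the layout used in the container example on this page (a results and a fasta folder below your working directory); the function name prepare_af_workdir is hypothetical.

```shell
# Hypothetical helper: create <base>/results and <base>/fasta and copy the
# input FASTA file into <base>/fasta. Prints the path of the copied file.
prepare_af_workdir() {
  local base="$1" fasta="$2"
  if [ -z "$base" ] || [ ! -f "$fasta" ]; then
    echo "usage: prepare_af_workdir <base-dir> <fasta-file>" >&2
    return 1
  fi
  mkdir -p "$base/results" "$base/fasta" || return 1
  cp -- "$fasta" "$base/fasta/" || return 1
  echo "$base/fasta/$(basename -- "$fasta")"
}

# Example:
# prepare_af_workdir "/scratch/tmp/$USER/AlphaFold" ./my_protein.fasta
```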

For an interactive session on a GPU node, load the AlphaFold module:

module load palma/2020b
module load fosscuda
module load AlphaFold/2.1.1

To run AlphaFold 2.1.1, create a folder in your scratch directory and copy your sequence file (e.g. a FASTA file) into it.
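A minimal sketch of an interactive run, assuming you already hold a GPU node and have loaded the modules above. The wrapper name run_af_monomer and the srun resource values in the comment are assumptions; the flags are the ones used in the batch script on this page, here with the monomer preset instead of multimer.

```shell
# Hypothetical wrapper for an interactive AlphaFold 2.1.1 run (monomer preset).
# First obtain a GPU node, e.g. (resource values are assumptions):
#   srun --partition=gpuv100 --gres=gpu:1 --cpus-per-task=6 --mem=60G --time=02:00:00 --pty bash
# and load the modules shown above.
run_af_monomer() {
  local fasta="$1" outdir="$2" max_date="${3:-2021-11-25}"
  # Shared database location used throughout this page
  export ALPHAFOLD_DATA_DIR="${ALPHAFOLD_DATA_DIR:-/Applic.HPC/data/alphafold}"
  alphafold \
    --fasta_paths="$fasta" \
    --model_preset=monomer \
    --output_dir="$outdir" \
    --max_template_date="$max_date" \
    --db_preset=reduced_dbs \
    --data_dir="$ALPHAFOLD_DATA_DIR"
}

# Example:
# run_af_monomer "/scratch/tmp/$USER/AlphaFold/fasta/my_protein.fasta" \
#                "/scratch/tmp/$USER/AlphaFold/results"
```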

Submission to the batch system

To submit a job to the batch system, adapt the following script:

⚠️

Adjust the job script for your data! Don't just copy-paste it and expect it to work.

#!/bin/bash
#SBATCH --partition=gpuv100
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=6
#SBATCH --mem=60G
#SBATCH --time=1-23:59:00
#SBATCH --job-name=alphafold
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your_account@uni-muenster.de

module load palma/2021a
module load foss
module load AlphaFold/2.1.1-CUDA-11.3.1

# Shared AlphaFold databases: use these, do not download your own copy
export ALPHAFOLD_DATA_DIR=/Applic.HPC/data/alphafold

# Replace Input_path with the path to your FASTA file.
# --model_preset defaults to monomer; multimer predicts protein complexes.
alphafold \
 --fasta_paths=Input_path \
 --model_preset=multimer \
 --output_dir=/scratch/tmp/$USER/AlphaFold/Results \
 --max_template_date=2021-11-25 \
 --is_prokaryote_list=false \
 --db_preset=reduced_dbs \
 --data_dir=/Applic.HPC/data/alphafold
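Once adapted, the script is submitted with sbatch. A minimal sketch, assuming the script is saved as alphafold.sh (a hypothetical name): `sbatch --parsable` prints only the job ID (followed by `;cluster` on multi-cluster setups), which makes it easy to pass on to squeue. The helper name submit_af_job is hypothetical.

```shell
# Hypothetical helper: submit a job script and print only its job ID.
# sbatch --parsable prints "<jobid>" or "<jobid>;<cluster>".
submit_af_job() {
  local script="$1" out
  out="$(sbatch --parsable "$script")" || return 1
  # Strip an optional ";<cluster>" suffix
  echo "${out%%;*}"
}

# Example:
# jid="$(submit_af_job alphafold.sh)" && squeue -j "$jid"
```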

Container

Running Docker containers on the cluster is not allowed for security reasons. Therefore, we provide container images for Singularity (containerization software for HPC):

  • for skylake nodes (normal, gpuv100): /Applic.HPC/container/alphafold_skylake-latest.sif
  • for ivybridge/sandybridge nodes (gputitanrtx, gpu2080): /Applic.HPC/container/alphafold_ivybridge-latest.sif

We created an AlphaFold module that automatically loads Singularity and sets the environment variable $ALPHAFOLD_SIFIMAGE to the path of the correct image.
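To confirm that the module has set $ALPHAFOLD_SIFIMAGE before you start a long job, a check like the following can help. This is a sketch: check_sif_image is a hypothetical helper, and it only tests that the variable is set and that the image file is readable.

```shell
# Hypothetical helper: confirm that the AlphaFold container module set
# $ALPHAFOLD_SIFIMAGE and that the image file is readable.
check_sif_image() {
  if [ -z "${ALPHAFOLD_SIFIMAGE:-}" ]; then
    echo "ALPHAFOLD_SIFIMAGE is not set; load the AlphaFold module first" >&2
    return 1
  fi
  if [ ! -r "$ALPHAFOLD_SIFIMAGE" ]; then
    echo "image not readable: $ALPHAFOLD_SIFIMAGE" >&2
    return 1
  fi
  echo "$ALPHAFOLD_SIFIMAGE"
}

# Example:
# module load AlphaFold/2.0.0-singularity && check_sif_image
```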

Starting AlphaFold

An example job script for running AlphaFold on PALMA is shown below. Before you start, do the following:

  • Create a suitable directory for your calculations on scratch, e.g. /scratch/tmp/$USER/AlphaFold/
  • Create sub-directories for any locations you additionally want to use inside the container (here we create a results folder as well as a folder for storing the initial fasta file)
    • These directories have to be bind-mounted into the container (the -B flag of the singularity run command)!
  • Create a job script or use an interactive Slurm session to request resources on the cluster. Request at least 8 cores and 64 GB of memory. GPUs are supported as well.

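For the interactive route, a session with the minimum resources recommended above could be requested as sketched below. The helper name af_container_session, the single GPU, and the default partition and time limit are assumptions; adjust them to your job.

```shell
# Hypothetical helper: interactive shell on a GPU node with the recommended
# minimum of 8 cores and 64 GB of memory. Defaults are assumptions.
af_container_session() {
  local partition="${1:-gpu2080}" time_limit="${2:-04:00:00}"
  srun --partition="$partition" --nodes=1 --cpus-per-task=8 --mem=64G \
       --gres=gpu:1 --time="$time_limit" --pty bash
}

# Example:
# af_container_session gpuv100 02:00:00
# then, inside the session: module load AlphaFold/2.0.0-singularity
```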
⚠️

Adjust the job script for your data! Don't just copy-paste it and expect it to work.

AlphaFold Example Job Script
#!/bin/bash

#SBATCH --partition=gpu2080
#SBATCH --nodes=1
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=24
#SBATCH --mem=170G
#SBATCH --time=12:00:00
#SBATCH --job-name=alphafold

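# Loads Singularity and sets $ALPHAFOLD_SIFIMAGE
module load AlphaFold/2.0.0-singularity

# Run from your calculation directory; ./results and ./fasta must exist there.
# Each -B host_path:container_path bind-mounts a host directory into the container.
# TF_FORCE_UNIFIED_MEMORY and XLA_PYTHON_CLIENT_MEM_FRACTION let large targets
# use more memory than a single GPU provides (as in AlphaFold's Docker setup).
singularity run \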
 --env TF_FORCE_UNIFIED_MEMORY=1,XLA_PYTHON_CLIENT_MEM_FRACTION=4.0 \
 -B /Applic.HPC/data/alphafold:/data \
 -B .:/etc \
 -B ./results:/results \
 -B ./fasta:/fasta \
 --pwd /app/alphafold \
 --nv $ALPHAFOLD_SIFIMAGE \
 --fasta_paths /fasta/Chitin-synthase-deacetylase.fasta \
 --output_dir /results/ \
 --max_template_date 2021-07-31 \
 --data_dir /data/ \
 --uniref90_database_path /data/uniref90/uniref90.fasta \
 --mgnify_database_path /data/mgnify/mgy_clusters.fa \
 --small_bfd_database_path /data/small_bfd/bfd-first_non_consensus_sequences.fasta \
 --pdb70_database_path /data/pdb70/pdb70 \
 --template_mmcif_dir /data/pdb_mmcif/mmcif_files \
 --obsolete_pdbs_path /data/pdb_mmcif/obsolete.dat \
 --model_names model_1,model_2,model_3,model_4,model_5 \
 --preset reduced_dbs
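After the job finishes, AlphaFold writes one subdirectory per input into the output directory, named after the FASTA file without its extension, and ranked_0.pdb in it is the highest-confidence model. The sketch below locates it on the host side (./results is mounted as /results in the job above); best_model is a hypothetical helper.

```shell
# Hypothetical helper: print the path of the top-ranked model of a finished run.
# Output layout: <output_dir>/<FASTA name without extension>/ranked_0.pdb
best_model() {
  local outdir="$1" fasta="$2" name model
  name="$(basename -- "$fasta")"
  name="${name%.*}"          # drop the extension, e.g. ".fasta"
  model="$outdir/$name/ranked_0.pdb"
  if [ ! -f "$model" ]; then
    echo "no ranked_0.pdb in $outdir/$name" >&2
    return 1
  fi
  echo "$model"
}

# Example, from the directory the job was submitted in:
# best_model ./results /fasta/Chitin-synthase-deacetylase.fasta
```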