AlphaFold2 / RoseTTAFold
AlphaFold2 relies on old toolchains that will no longer be supported in the near future. Please consider using AlphaFold 3 instead.
| | Container | Native |
|---|---|---|
| AlphaFold | ✓ | ✓ |
| AlphaFold 2.1.1 (Multimer) | ✓ | |
| RoseTTAFold | 🚧 WIP | ✓ |
| Version | skylake (gpuv100) | zen3 (gpu2080, gputitanrtx, gpu3090, gpuv100, gpuhgx) |
|---|---|---|
| 2.0.0 | | |
| 2.1.1 | | |
| 2.1.2 | module load palma/2021a module load foss/2021a module load AlphaFold/2.1.2 |
AlphaFold 2
Detailed information can be found at: https://github.com/deepmind/alphafold
Genetic Databases
AlphaFold and RoseTTAFold use distinct databases, each optimized for the corresponding algorithm. The AlphaFold databases can be found here:
/Applic.HPC/data/alphafold/
|-- bfd
|-- mgnify
|-- params
|-- pdb70
|-- pdb_mmcif
|-- pdb_seqres
|-- small_bfd
|-- uniclust30
|-- uniprot
`-- uniref90
The complete set of databases is around 5 TB in size, and downloading and unpacking it takes more than 50 hours. Therefore: PLEASE DO NOT DOWNLOAD THESE DATABASES AGAIN!
Native
Interactive session
AlphaFold has been updated to version 2.1.1, which includes the multimer feature, and has been compiled for the skylake GPU nodes as well as the zen3 nodes.
Before you start, do the following steps:
- Create a suitable directory for your calculations on scratch, e.g. /scratch/tmp/$USER/AlphaFold/
- Create sub-directories for any locations you additionally want to use inside the container (here we create a results folder as well as a folder for storing the initial fasta file)
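The two preparation steps above can be sketched as a small shell function. This is only an illustration: the function name `prepare_alphafold_dirs` and the sub-directory names `results` and `fasta` are assumptions chosen to match the description, not names required by AlphaFold.

```shell
#!/bin/bash
# Sketch: create the working directory layout described above.
# The function name and the sub-directory names are illustrative choices.
prepare_alphafold_dirs() {
    # Default base directory follows the example path from the text.
    local base="${1:-/scratch/tmp/$USER/AlphaFold}"
    # -p creates missing parent directories and does not fail if they exist.
    mkdir -p "$base/results" "$base/fasta" || return 1
    # Print the base directory so callers can reuse it.
    echo "$base"
}
```

Calling `prepare_alphafold_dirs` without an argument uses /scratch/tmp/$USER/AlphaFold; pass a different path as the first argument to set up another location.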
For an interactive session on a GPGPU node, the AlphaFold module can be loaded as follows:
module load palma/2020b
module load fosscuda
module load AlphaFold/2.1.1
To run AlphaFold 2.1.1, create a folder in your scratch directory and copy your sequence file (e.g. in FASTA format) into it.
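Copying the sequence file could look like the sketch below. The helper `stage_fasta` is hypothetical. It copies the file into a target folder and prints the number of FASTA records, which can serve as a quick sanity check before choosing between a monomer and a multimer run.

```shell
#!/bin/bash
# Hypothetical helper: copy a FASTA file into a working folder and
# print how many sequences (records) it contains.
stage_fasta() {
    local src="$1"       # path to the input FASTA file
    local dest_dir="$2"  # folder on scratch that should receive the copy
    [ -f "$src" ] || { echo "input file not found: $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/" || return 1
    # Every FASTA record starts with a '>' header line, so count those.
    grep -c '^>' "$dest_dir/$(basename "$src")"
}
```

For example, `stage_fasta my_protein.fasta /scratch/tmp/$USER/AlphaFold/fasta` (with `my_protein.fasta` as a placeholder name) copies the file and prints its record count.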
Submission to the batch system
For submission to the batch system, the following script can be adapted:
Adjust the job script for your data! Don't just copy-paste it and expect it to work.
#!/bin/bash
#SBATCH --partition=gpuv100
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --gpus=1
#SBATCH --gpus-per-node=1
#SBATCH --cpus-per-task=6
#SBATCH --mem=60G
#SBATCH --time=1-23:59:00
#SBATCH --job-name=alphafold
#SBATCH --mail-type=ALL
#SBATCH --mail-user=your_account@uni-muenster.de
module load palma/2021a
module load foss
module load AlphaFold/2.1.1-CUDA-11.3.1
wait
export ALPHAFOLD_DATA_DIR=/Applic.HPC/data/alphafold
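# Replace Input_path with the path to your FASTA file.
# --model_preset=multimer selects the multimer model; the default is monomer.
# (A comment after a trailing backslash would break the line continuation.)
alphafold \
--fasta_paths=Input_path \
--model_preset=multimer \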
--output_dir=/scratch/tmp/$USER/Alphafold/Results \
--max_template_date=2021-11-25 \
--is_prokaryote_list=false \
--db_preset=reduced_dbs \
--data_dir=/Applic.HPC/data/alphafold
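Once the job script has been adapted, it is handed to Slurm with `sbatch`. The wrapper below is a minimal sketch: the function name `submit_alphafold_job` and the `DRY_RUN` variable are illustrative assumptions and not part of Slurm. Only `sbatch` itself is the real command.

```shell
#!/bin/bash
# Sketch: submit a job script with sbatch.
# Set DRY_RUN=1 to print the command instead of submitting.
submit_alphafold_job() {
    local script="$1"
    # Refuse to submit a file that does not exist.
    [ -f "$script" ] || { echo "job script not found: $script" >&2; return 1; }
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "sbatch $script"
    else
        sbatch "$script"
    fi
}
```

After a real submission, `squeue -u $USER` lists your pending and running jobs.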
Container
Running Docker containers on the cluster is not allowed for security reasons. Therefore, we provide a container image for Singularity (a containerization software for HPC purposes):
- for skylake nodes (normal, gpuv100): /Applic.HPC/container/alphafold_skylake-latest.sif
- for ivybridge/sandybridge nodes (gputitanrtx, gpu2080): /Applic.HPC/container/alphafold_ivybridge-latest.sif
We created an AlphaFold module that automatically loads Singularity and sets the environment variable $ALPHAFOLD_SIFIMAGE to the correct image path.
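The module normally takes care of selecting the image, so the following is only a sketch of the partition-to-image mapping listed above, for cases where you want to check which image applies. The function name `sif_image_for_partition` is hypothetical. The paths and partition names are the ones given in the list.

```shell
#!/bin/bash
# Sketch: map a partition name to the container image listed for it.
sif_image_for_partition() {
    case "$1" in
        normal|gpuv100)
            echo "/Applic.HPC/container/alphafold_skylake-latest.sif" ;;
        gputitanrtx|gpu2080)
            echo "/Applic.HPC/container/alphafold_ivybridge-latest.sif" ;;
        *)
            # Partitions not named in the list have no known image here.
            echo "no AlphaFold image listed for partition: $1" >&2
            return 1 ;;
    esac
}
```

For example, `sif_image_for_partition gpu2080` prints the ivybridge image path.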
Starting AlphaFold
You can find an example job script showing how to run AlphaFold on PALMA below. Before you start, do the following steps:
- Create a suitable directory for your calculations on scratch, e.g. /scratch/tmp/$USER/AlphaFold/
- Create sub-directories for any locations you additionally want to use inside the container (here we create a results folder as well as a folder for storing the initial fasta file)
  - Those directories have to be bind-mounted into the container! (The -B flag in the singularity run command)
- Create a job script or use an interactive SLURM session to request resources on the cluster. You should request a minimum of 8 cores and 64 GB of memory. GPUs are supported as well.
Adjust the job script for your data! Don't just copy-paste it and expect it to work.
#!/bin/bash
#SBATCH --partition=gpu2080
#SBATCH --nodes=1
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=24
#SBATCH --mem=170G
#SBATCH --time=12:00:00
#SBATCH --job-name=alphafold
module load AlphaFold/2.0.0-singularity
singularity run \
--env TF_FORCE_UNIFIED_MEMORY=1,XLA_PYTHON_CLIENT_MEM_FRACTION=4.0 \
-B /Applic.HPC/data/alphafold:/data \
-B .:/etc \
-B ./results:/results \
-B ./fasta:/fasta \
--pwd /app/alphafold \
--nv $ALPHAFOLD_SIFIMAGE \
--fasta_paths /fasta/Chitin-synthase-deacetylase.fasta \
--output_dir /results/ \
--max_template_date 2021-07-31 \
--data_dir /data/ \
--uniref90_database_path /data/uniref90/uniref90.fasta \
--mgnify_database_path /data/mgnify/mgy_clusters.fa \
--small_bfd_database_path /data/small_bfd/bfd-first_non_consensus_sequences.fasta \
--pdb70_database_path /data/pdb70/pdb70 \
--template_mmcif_dir /data/pdb_mmcif/mmcif_files \
--obsolete_pdbs_path /data/pdb_mmcif/obsolete.dat \
--model_names model_1,model_2,model_3,model_4,model_5 \
--preset reduced_dbs