Data transfers 🔗

Data Transfers via the command line 🔗

The usual ways to transfer your data to the cluster are:

  • scp: The standard way of transferring data on Linux
  • rsync: Also a standard Linux tool like scp, but with the advantage that it can resume interrupted transfers, since it keeps track of file metadata
  • rclone: A powerful tool for data transfers, typically used with cloud storage. It can also be used for transfers to Palma and may be faster than the tools above when transferring many small files
  • WinSCP: The standard tool when using Windows
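As a sketch of the first two options, here are typical scp and rsync invocations. The hostname and paths are assumptions based on this page; replace them with the login node you normally use for SSH and your own user id:

```shell
# Copy a single file to your scratch directory
# (replace youruserid and the hostname with your own values)
scp results.tar.gz youruserid@palma.uni-muenster.de:/scratch/tmp/youruserid/

# Recursively sync a directory; -a preserves metadata, --partial lets
# interrupted transfers resume, -P shows progress
rsync -a --partial -P ./myproject/ \
    youruserid@palma.uni-muenster.de:/scratch/tmp/youruserid/myproject/
```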

Data Transfers via the web interface 🔗

This part is still in a testing phase! Please be careful when using it! 🔗

If you have access to Palma, you can use Nextcloud for data transfers. To make use of it, do the following:

  • Create a directory called "transfer" at /scratch/tmp/$USER: "mkdir /scratch/tmp/$USER/transfer"
  • Log in at https://palma.uni-muenster.de/nextcloud/
  • Click on the "scratch" directory (you might have to click multiple times on the first run). Data you put here will be stored in the scratch directory; data that you put in /scratch/tmp/$USER/transfer will also be visible here
  • The recommended way to use this is to upload your data via the web interface and move it to another location on the cluster afterwards. The Nextcloud will probably slow down if the directory becomes too crowded.
  • In future iterations, this is intended to be used for realizing data transfers between different HPC clusters.

Transferring data to the Nextcloud via rclone 🔗

⚠️

If you are trying to use rclone from outside of the university, you should not connect via jumphost

If you want to copy data to and from Palma from outside and have rclone available, you can use it for fast data transfers:

Setting up rclone 🔗

  • rclone config
    • n) New remote
    • name> palma-web
    • Storage> WebDAV
    • url> https://palma-web.uni-muenster.de/remote.php/dav/files/$username (replace $username with your user id)
    • vendor> 2
    • user> $username (replace $username with your user id)
    • y/g/n> y (Yes, type in my own password)
    • password: Create an app password in the web interface ( https://palma.uni-muenster.de/nextcloud/ ) at Settings → Security and put it here.
    • bearer_token> (Leave empty)
    • n) No (default)
    • y) Yes this is OK (default)
    • q) Quit config
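The interactive steps above can also be scripted. Here is a minimal sketch using `rclone config create` and `rclone config password`; the remote name palma-web matches the example above, $username is a placeholder for your user id, and the app password must still be created in the Nextcloud web interface first:

```shell
# Create the WebDAV remote non-interactively (replace $username with your user id)
rclone config create palma-web webdav \
    url "https://palma-web.uni-muenster.de/remote.php/dav/files/$username" \
    vendor nextcloud \
    user "$username"

# Store the app password (created under Settings → Security in the web interface)
rclone config password palma-web pass "your-app-password"
```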

Data transfer with rclone 🔗

  • From remote to Palma:

    rclone copy folder palma-web:scratch/ --progress --verbose

  • From Palma to remote:

    rclone copy palma-web:scratch/folder . --progress --verbose

See also: https://linuxpip.org/rclone-examples
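After a transfer, you can verify that everything arrived intact. A sketch, using the palma-web remote and folder names from the examples above:

```shell
# List what arrived in the scratch directory
rclone ls palma-web:scratch/folder

# Compare source and destination (checks sizes, and hashes where available)
rclone check folder palma-web:scratch/folder --progress
```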

Mounting webdav (e.g. sciebo or Nextcloud) as a filesystem with rclone 🔗

  1. Load the necessary modules: module load palma/2023a rclone

  2. Set up the remote host with the rclone config interactive command-line tool, as shown above:

    1. type in the name (for example sciebo)
    2. select the storage type: WebDAV (option 47)
    3. type in the URL: https://uni-muenster.sciebo.de/remote.php/webdav/
    4. select the vendor: Owncloud (option 3)
    5. type in the username: your sciebo username (@uni-muenster.de)
    6. type in the password: your sciebo password (it will be stored encrypted in a config file)

    Skip all other settings and save. At the end there should be a new entry in the

    ~/.config/rclone/rclone.conf

    file that looks something like this

    [sciebo]
    type = webdav
    url = https://uni-muenster.sciebo.de/remote.php/webdav/
    vendor = owncloud
    user = <user>@uni-muenster.de
    pass = xxx
    
  3. Create a directory where you want to mount sciebo, for example

    mkdir ~/sciebo

  4. Mount sciebo using rclone mount command

    rclone mount sciebo: ~/sciebo

The mount stays active while the command runs; when you interrupt it with Ctrl+C, the mount will be unmounted.

  5. All content on sciebo will appear in the ~/sciebo directory as local files; you can copy them, edit them, create new files, etc.

  6. Unmount by interrupting the call from step 4

  7. In case unmounting fails for any reason (for example, if you were still using the directory when rclone was stopped), remounting will not work (not even with --allow-non-empty) and you will have to unmount manually using: fusermount -u ~/sciebo
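If you prefer not to keep a terminal blocked, the mount from step 4 can also run in the background. A sketch; the --vfs-cache-mode flag is optional but makes editing files through the mount more reliable:

```shell
# Mount in the background instead of blocking the terminal
rclone mount sciebo: ~/sciebo --daemon --vfs-cache-mode writes

# ... work with files in ~/sciebo ...

# Unmount when done (the same command as the manual fallback above)
fusermount -u ~/sciebo
```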

Transferring files from Palma to storage server outside uni-muenster 🔗

Another scenario is uploading your research data to a server outside the University of Münster. As an example, we cover an upload to the public server of the National Center for Biotechnology Information (NCBI), but it should generalize well to other use cases, too. As usual, if in doubt, feel free to contact us at hpc@uni-muenster.de or on Mattermost. You can use rclone in this case as well.

Setting up rclone 🔗

First, you will need to set up your remote by running:

rclone config

This will start an interface that will guide you in setting up your remote, asking you step-by-step for the following:

  • The name of the new remote > your free choice, but the more informative, the better. In our example, NCBI.
  • The type of storage > Check on the info page of the remote storage server. Here, ftp
  • Username > the name of the user account on the remote server that you want to use for the upload
  • Password > type "y" first, then you will be prompted for the password to the host server
  • Leave the other entries at their defaults (press Enter)
  • Quit config > q
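Once the remote is configured, you can verify it before uploading anything. A quick sanity check, using the remote name NCBI from the example above:

```shell
# List the remotes known to rclone; NCBI should appear in the output
rclone listremotes

# List top-level directories on the remote to confirm the connection works
rclone lsd NCBI:
```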

Create a folder at the target location (optional) 🔗

Easily done with rclone mkdir:

rclone mkdir NCBI:path/to/your/usershare/yourfolder

Files Upload 🔗

You can copy your files directly from the login node by adapting the rclone copy command explained in the previous sections. Alternatively, you can submit the command via sbatch, leveraging the advantage of a non-interactive job that keeps running even when you close your session or if the login node goes down (e.g., due to maintenance).

For the latter option, here is a template of a submission script:

#!/bin/bash
#SBATCH --partition=normal
#SBATCH --nodes=1
#SBATCH --mem=20G
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
# [!!!] CHANGE THE TIME BELOW ACCORDING TO YOUR EXPECTED TRANSFER DURATION
#SBATCH --time=00:01:00
#SBATCH --job-name=my-transfer
#SBATCH --output=my-transfer.log
#SBATCH --mail-type=ALL
# Note: $USER is not expanded in #SBATCH directives; write your user id literally
#SBATCH --mail-user=your_userid@uni-muenster.de

rclone -v --progress --bwlimit=100M copy /scratch/tmp/path/to/data NCBI:/path/to/your/usershare/yourfolder
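Assuming the template above is saved as transfer.sh (the filename is your choice), it can be submitted and monitored like this; the log name follows the --output setting in the template:

```shell
sbatch transfer.sh          # submit the transfer job
squeue -u "$USER"           # check whether the job is pending or running
tail -f my-transfer.log     # follow the rclone output while the job runs
```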