In this tutorial, you will learn how to run a Nextflow script that uses containers.
Containerised applications are highly portable and reproducible, which makes them well suited to scientific computing. Nextflow integrates with popular container engines (e.g., Docker and Singularity), which provide a lightweight virtualisation layer for running software. You can either build your own Docker/Singularity image or download an existing one from a container registry. Note that only Singularity containers can be used on Puhti: Docker containers require privileged access, which CSC users do not have on Puhti.
When working with Nextflow scripts using containers, pay attention to the following things:
Let’s download the material needed for this tutorial from GitHub:
cd /scratch/project_xxxx/$USER/nextflow_tutorial 
module load git 
git clone https://github.com/yetulaxman/nf_coverage_demo.git
cd nf_coverage_demo
git clone https://github.com/iarcbioinfo/data_test
Here is the basic syntax for running a script with Docker or Singularity containers (for an alternative approach, see the profiles section below):
## For Docker
nextflow run <nextflow_script> -with-docker <image_name>        # e.g., image_name = biocontainers/fastqc:v0.11.9_cv7
## For Singularity
nextflow run <nextflow_script> -with-singularity <image_path>   # e.g., image_path = docker://biocontainers/fastqc:v0.11.9_cv7
Because Nextflow runs each task inside the container, you don’t need to have the software (e.g., fastqc) installed on your machine. Nextflow downloads the container image and uses the fastqc from that image.
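The difference between the two image references above can be captured in a small helper. This is a minimal sketch, not part of the tutorial material: `image_uri` is a hypothetical function name, and it assumes the image lives on Docker Hub, where Docker takes the plain image name and Singularity addresses the same image with a `docker://` URI.

```shell
# image_uri <engine> <image>
# Hypothetical helper: print the image reference in the form each engine expects
# for an image hosted on Docker Hub.
image_uri() {
    engine="$1"
    image="$2"
    case "$engine" in
        docker)
            # Docker takes the plain repository/name:tag form.
            echo "$image"
            ;;
        singularity)
            # Singularity refers to Docker Hub images through a docker:// URI.
            echo "docker://$image"
            ;;
        *)
            echo "unknown engine: $engine" >&2
            return 1
            ;;
    esac
}
```

For example, `nextflow run <nextflow_script> -with-singularity "$(image_uri singularity biocontainers/fastqc:v0.11.9_cv7)"` would pass the `docker://` form to Nextflow.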
We often need to set other attributes besides the container flag mentioned above. This is done using profiles. A profile is a set of configuration attributes that can be selected when launching a pipeline. When a workflow script is launched, Nextflow looks for a file named nextflow.config in the current directory and in the workflow (script base) directory, if that differs from the current directory. Finally, it checks for the file $HOME/.nextflow/config. A configuration file can define one or more profiles.
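To see which of these configuration files actually exist before launching, a quick check along the following lines can help. It is only a sketch: `list_nextflow_configs` is a hypothetical helper, and it assumes the workflow script sits in the launch directory, so the current directory and the script base directory are the same.

```shell
# list_nextflow_configs [launch_dir]
# Hypothetical helper: print the configuration files that exist among the
# locations described above (user-level config and launch-directory config).
list_nextflow_configs() {
    launch_dir="${1:-$PWD}"
    for cfg in "$HOME/.nextflow/config" "$launch_dir/nextflow.config"; do
        if [ -f "$cfg" ]; then
            echo "$cfg"
        fi
    done
}
```

Running `list_nextflow_configs` inside the nf_coverage_demo folder prints one path per file found, or nothing if neither file exists.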
Example profiles are shown below:
profiles {
  docker {
    docker.enabled = true
    process.container = 'iarcbioinfo/nf_coverage_demo:v2.3'
    pullTimeout = "200 min"
  }

  singularity {
    singularity.enabled = true
    singularity.autoMounts = true
    process.container = 'shub://IARCbioinfo/nf_coverage_demo:v2.3'
    // scope-qualified so that the timeout applies to the Singularity image pull
    singularity.pullTimeout = "200 min"
  }
}
Copy the profiles block above and paste it into the nextflow.config file in the current directory.
You can then launch the nf_coverage workflow (from the nf_coverage_demo folder) with the profile defined above:
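Before launching, it can be worth confirming that the profile you intend to use is present in the file. The sketch below is a crude text check, not a parser of the Nextflow config syntax, and `has_profile` and `require_profile` are hypothetical helper names.

```shell
# has_profile <config_file> <profile_name>
# Crude text check: succeeds if some line has the form "<profile_name> {".
has_profile() {
    grep -Eq "^[[:space:]]*$2[[:space:]]*\{" "$1"
}

# require_profile <config_file> <profile_name>
# Hypothetical pre-flight wrapper: report the result and fail if the profile is absent.
require_profile() {
    if has_profile "$1" "$2"; then
        echo "profile '$2' found in $1"
    else
        echo "profile '$2' not found in $1" >&2
        return 1
    fi
}
```

A call such as `require_profile nextflow.config singularity` can precede the `nextflow run` command. Nextflow's own `nextflow config -profile singularity` command prints the resolved configuration and is the more reliable check when Nextflow is already loaded.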
module load bioconda
source activate nextflow
nextflow run plot_coverage.nf  \
          -profile singularity \
          --bam_folder data_test/BAM/BAM_multiple/ \
          --bed data_test/BED/TP53_exon2_11.bed
Nextflow provides options for reporting on and visualising your pipeline through the following flags:
-with-dag
-with-timeline
-with-report
-with-trace
You can either pass the flags on the command line or enable each feature in the config file, as described below:
dag
Either use the -with-dag flag when launching the script:
nextflow run <nextflow_script> -with-dag <file-name>.dot
or add the following block to the end of the nextflow.config file:
dag {
  enabled = true
  file = "dag.png"
}
timeline
Either use the -with-timeline flag when launching the script:
nextflow run <nextflow_script> -with-timeline <file-name>.html
or add the following block to the end of the nextflow.config file:
timeline {
  enabled = true
}
report
Either use the -with-report flag when launching the script:
nextflow run <nextflow_script> -with-report <file-name>.html
or add the following block to the end of the nextflow.config file:
report {
  enabled = true
}
trace
Either use the -with-trace flag when launching the script:
nextflow run <nextflow_script> -with-trace <file-name>.txt
or add the following block to the end of the nextflow.config file:
trace {
  enabled = true
}
For convenience in this tutorial, configure all four reporting features (dag, timeline, report and trace) in the nextflow.config file.
Once you have configured the singularity profile and enabled the reporting/visualisation features in nextflow.config, you can submit the workflow on Puhti with the following batch script:
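One way to add all four blocks in a single step is to append them from the shell. The following is a sketch under assumptions: `append_reporting_config` is a hypothetical helper, and the explicit output names `timeline.html`, `report.html` and `trace.txt` are chosen here for illustration (only `dag.png` comes from the blocks above).

```shell
# append_reporting_config [config_file]
# Hypothetical helper: append the dag/timeline/report/trace blocks to the
# given config file (default: nextflow.config in the current directory).
append_reporting_config() {
    config_file="${1:-nextflow.config}"
    cat >> "$config_file" <<'EOF'

dag {
  enabled = true
  file = "dag.png"
}
timeline {
  enabled = true
  file = "timeline.html"
}
report {
  enabled = true
  file = "report.html"
}
trace {
  enabled = true
  file = "trace.txt"
}
EOF
}
```

Calling `append_reporting_config nextflow.config` once from the nf_coverage_demo folder leaves the existing profiles untouched and adds the reporting blocks after them. Calling it twice would duplicate the blocks, so run it only once.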
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --partition=test
#SBATCH --account=project_xxx
export TMPDIR=$PWD
export SINGULARITY_TMPDIR=$PWD
export SINGULARITY_CACHEDIR=$PWD
unset XDG_RUNTIME_DIR
# Activate  Nextflow on Puhti
module load bioconda
source activate nextflow
# Nextflow command here
nextflow run plot_coverage.nf  \
       -profile singularity  \
       --bam_folder data_test/BAM/BAM_multiple/  \
       --bed data_test/BED/TP53_exon2_11.bed 
Copy and paste the script above into a file (nf_coverage.sh), replace the project number in the Slurm directives with your own, and submit the batch script to the Puhti cluster:
rm -fr work/                  # remove previous analysis results
rm -f *.html *.png trace.txt  # remove earlier report/visualisation files, if any
sbatch nf_coverage.sh         # start a fresh job
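If you want to follow the job after submission, it helps to capture its ID. The sketch below wraps the submission in a hypothetical `submit_job` function; it relies on the `--parsable` option of `sbatch`, which prints only the job ID (followed by `;cluster` on multi-cluster setups).

```shell
# submit_job [batch_script]
# Hypothetical helper: submit the batch script and print only the Slurm job ID.
submit_job() {
    batch_script="${1:-nf_coverage.sh}"
    # --parsable makes sbatch print "<jobid>" or "<jobid>;<cluster>".
    submit_out=$(sbatch --parsable "$batch_script") || return 1
    # Keep the part before the first ';' (the job ID).
    echo "${submit_out%%;*}"
}
```

With `job_id=$(submit_job nf_coverage.sh)`, the job state can then be checked with `squeue -j "$job_id"`, and unless the batch script sets another output file, Slurm writes the job log to `slurm-$job_id.out` in the submission directory.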
Copy the Nextflow report and visualisation files from the working directory (i.e., the .html, .png, .txt and .pdf files) to your home directory so that you can view them in your local browser:
mkdir -p $HOME/nextflow_output
cp *.html *.png *.txt *.pdf  $HOME/nextflow_output
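The `cp` command above reports an error for any pattern that matches no file, for example when no .pdf file was produced. A more tolerant variant is sketched here; `collect_outputs` is a hypothetical helper and the default destination mirrors the directory used above.

```shell
# collect_outputs [source_dir] [destination_dir]
# Hypothetical helper: copy report/visualisation files that exist and silently
# skip patterns without a match.
collect_outputs() {
    src_dir="${1:-.}"
    dest_dir="${2:-$HOME/nextflow_output}"
    mkdir -p "$dest_dir"
    for f in "$src_dir"/*.html "$src_dir"/*.png "$src_dir"/*.txt "$src_dir"/*.pdf; do
        # An unmatched glob stays as the literal pattern, so test for a real file.
        if [ -f "$f" ]; then
            cp "$f" "$dest_dir"/
        fi
    done
}
```

Running `collect_outputs` from the nf_coverage_demo folder copies whatever matching files are present into `$HOME/nextflow_output`.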
To view the files in your Puhti home directory in a browser on your local computer, you need to forward a port between your local machine and the Puhti login node over SSH. In this course, every participant should use a unique port number on the Puhti login node. Open a new terminal on your local machine and, before executing the following command, replace $port with a number of your choice (e.g., a number between 7000 and 9000):
ssh -L $port:localhost:$port <your_csc_username>@puhti.csc.fi  # e.g., with port number: 7077 
                                                              # ssh -L 7077:localhost:7077 <username>@puhti.csc.fi 
Then run the following command on the login node, using the same port number that you selected above:
python3 -m http.server $port # with port number: 7077 -> python3 -m http.server 7077
Point the browser on your local machine to http://localhost:$port (remember to replace $port with your port number). The server lists the files in the directory where it was started, which is your Puhti home directory if you ran the command directly after logging in.
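Choosing the port number can also be scripted. The following sketch draws a number from the 7000 to 9000 range suggested above; `pick_port` is a hypothetical helper, and it does not check whether another participant is already using the same port.

```shell
# pick_port
# Hypothetical helper: print a pseudo-random integer between 7000 and 9000 (inclusive).
pick_port() {
    # rand() is in [0, 1), so int(rand() * 2001) lies in 0..2000.
    awk 'BEGIN { srand(); print 7000 + int(rand() * 2001) }'
}
```

On your local machine, `port=$(pick_port)` followed by `ssh -L $port:localhost:$port <your_csc_username>@puhti.csc.fi` opens the tunnel; note the printed value and use the same number for `python3 -m http.server` on the login node. Because awk seeds the generator from the clock, two calls within the same second return the same number.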