diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index f60257e03..6269fe644 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -138,7 +138,7 @@ Please see the [official docs](https://squidfunk.github.io/mkdocs-material/refer
- Note that we have two custom admonitions: `exercise` and `result` (alias `solution`).
- `!!!` does a regular admonition, `???` makes it collapsed (click to expand).
-- Intendation is important! Make sure you check the rendered site, as it's easy to make a mistake.
+- Indentation is important! Make sure you check the rendered site, as it's easy to make a mistake.
## Known limitations
diff --git a/docs/advanced/index.md b/docs/advanced/index.md
index 89e363df1..a2576c3fc 100644
--- a/docs/advanced/index.md
+++ b/docs/advanced/index.md
@@ -2,9 +2,10 @@
Welcome to our Nextflow workshop for intermediate and advanced users!
-In this workshop, we will explore the advanced features of the Nextflow language and runtime, and learn how to use them to write efficient and scalable data-intensive workflows. We will cover topics such as parallel execution, error handling, and workflow customization.
+In this workshop, we will explore the advanced features of the Nextflow language and runtime, and learn how to use them to write efficient and scalable data-intensive workflows.
+We will cover topics such as parallel execution, error handling, and workflow customization.
-Please note that this is not an introductory workshop, and we will assume some basic familiarity with Nextflow.
+Please note that this is not an introductory workshop, and we will assume extensive familiarity with Nextflow.
By the end of this workshop, you will have the skills and knowledge to create complex and powerful Nextflow pipelines for your own data analysis projects.
@@ -34,9 +35,12 @@ Please note that this is **not** a beginner's workshop and familiarity with Next
- Familiarity with Nextflow and Groovy
- An understanding of common file formats
-## Follow the training video
+## Follow the training videos and get help
-We run a free online training event for this course approximately every six months. Videos are streamed to YouTube and questions are handled in the nf-core Slack community. You can watch the recording of the most recent training ([September, 2023](https://nf-co.re/events/2023/training-sept-2023/)) below:
+Video recordings are available for this course.
+You can ask questions in the [Seqera community forum](https://community.seqera.io/).
+
+You can watch the recording of the most recent training ([September, 2023](https://nf-co.re/events/2023/training-sept-2023/)) below:
diff --git a/docs/basic_training/cache_and_resume.md b/docs/basic_training/cache_and_resume.md
index 3d4713c08..a21903677 100644
--- a/docs/basic_training/cache_and_resume.md
+++ b/docs/basic_training/cache_and_resume.md
@@ -349,7 +349,7 @@ So D is matched with 'a' here, which was not the intention. That order will like
??? solution
- You should see that while FOO and BAR reliably re-use their cache, FOOBAR will re-run at least a subset of its tasks due to differences in the combinations of inputs it recieves.
+ You should see that while FOO and BAR reliably re-use their cache, FOOBAR will re-run at least a subset of its tasks due to differences in the combinations of inputs it receives.
The output will look like this:
diff --git a/docs/basic_training/index.md b/docs/basic_training/index.md
index bf224d32c..686a30a99 100644
--- a/docs/basic_training/index.md
+++ b/docs/basic_training/index.md
@@ -1,6 +1,7 @@
# Fundamentals Training
-You are now on the path to writing reproducible and scalable scientific workflows using Nextflow. This guide complements the full [Nextflow documentation](https://www.nextflow.io/docs/latest) - if you ever have any doubts, please refer to that.
+You are now on the path to writing reproducible and scalable scientific workflows using Nextflow.
+This guide complements the full [Nextflow documentation](https://www.nextflow.io/docs/latest) - if you ever have any doubts, please refer to that.
Let's get started!
@@ -18,7 +19,10 @@ By the end of this course you should:
## Audience & prerequisites
-Please note that this is **not** a beginner's workshop and familiarity with Nextflow, the command line, and common file formats is assumed.
+Please note that this is **not** a beginner's workshop.
+Familiarity with Nextflow, the command line, and common file formats is assumed.
+
+For a beginner's introduction to Nextflow, please see the [Hello Nextflow](../hello_nextflow/) course.
**Prerequisites**
@@ -26,9 +30,10 @@ Please note that this is **not** a beginner's workshop and familiarity with Next
- Experience with command line
- An understanding of common file formats
-## Follow the training videos
+## Follow the training videos and get help
-Free online training events for this course are run approximately every six months. Videos are streamed to YouTube and questions are handled in the nf-core Slack community.
+Video recordings are available for this course.
+You can ask questions in the [Seqera community forum](https://community.seqera.io/).
You can watch the recording of the most recent training ([March, 2024](https://nf-co.re/events/2024/training-foundational-march)) in the [YouTube playlist](https://youtu.be/dbOKB3VRpuE?si=MYBy4-gjRfEYkVRM) below:
@@ -38,7 +43,7 @@ You can watch the recording of the most recent training ([March, 2024](https://n
!!! warning
- Please note that the training material is updated regularly and that the videos may be out of date.
+ Please note that the training material is updated regularly and that the videos linked below may be out of date.
If English is not your preferred language, you may find it useful to follow the training from the [March 2023 event](https://nf-co.re/events/2023/training-march-2023), which is available in multiple languages.
diff --git a/docs/envsetup/01_setup.md b/docs/envsetup/01_setup.md
index 820689d21..c9571dc48 100644
--- a/docs/envsetup/01_setup.md
+++ b/docs/envsetup/01_setup.md
@@ -1,32 +1,34 @@
# Gitpod
-Gitpod is a cloud development environment for teams to efficiently and securely develop software. It can improve your developer experience by coding in a cloud development environment.
+Gitpod is a cloud-based development environment for teams to efficiently and securely develop software.
+We use it to provide a consistent training environment for everyone.
## Creating a Gitpod account
-You can create a free [Gitpod](https://gitpod.io/) account using your GitLab, GitHub, or Bitbucket account.
+You can create a free [Gitpod](https://gitpod.io/) account from the [Gitpod login page](https://gitpod.io/login/).
-You can create an account using the [Gitpod login page](https://gitpod.io/login/).
+You will be prompted to choose between 'Gitpod Flex' and 'Gitpod Classic'.
+Select 'Gitpod Classic' and click 'Continue'.
-![Gitpod log in](img/login.png)
-
-It is best to connect your LinkedIn account to receive a full 50 hours usage allocation.
+![Select 'Gitpod Classic'](img/select_gitpod_classic.png)
-![Gitpod log in one step](img/onestepaway.png)
+Next, log in using your GitHub account.
-After selecting your preferred editor, theme, and profile details, click continue and your account will be created and ready to use.
+![Gitpod log in](img/login.png)
-!!! note
+You may need to fill out an additional form or two.
+When prompted to connect a LinkedIn account, we recommend doing so if you have one, to receive the extra 50 hours usage allocation.
+Don't worry too much if you don't have one; the basic allocation is more than enough to work through the introductory training course.
- It is recommended to use the VS code editor.
+If you are prompted to select your preferred editor, we strongly recommend choosing the VSCode editor, as that is what we use for Nextflow development in general and for trainings in particular.
## Running Gitpod
-Click the following URL to run Gitpod:
+Once you are logged in to Gitpod, open the following link in your browser to launch the training environment:
-This URL is the Nextflow training repository prefixed with `https://gitpod.io/#`.
+This URL is the address of the Nextflow training repository, prefixed with `https://gitpod.io/#`.
-Alternatively, you can click on the button below.
+Alternatively, you can click the button shown below, which is displayed on many pages throughout the training portal.
[![Open Gitpod](https://img.shields.io/badge/Gitpod-%20Open%20in%20Gitpod-908a85?logo=gitpod)](https://gitpod.io/#https://github.com/nextflow-io/training)
@@ -34,18 +36,20 @@ If you are already logged in, your Gitpod environment will start to load.
### Explore your Gitpod IDE
-After Gitpod has loaded, you should see something similar to the following:
+After Gitpod has loaded, you should see something similar to the following (which may appear in light mode depending on your account preferences):
![Gitpod welcome](img/gitpod.welcome.png)
+This is the interface of the VSCode IDE, a popular code development application that we recommend using for Nextflow development.
+
- **The sidebar** allows you to customize your Gitpod environment and perform basic tasks (copy, paste, open files, search, git, etc.). You can click the explorer button to see which files are in this repository.
- **The terminal** allows you to run all the programs in the repository. For example, both `nextflow` and `docker` are installed and can be executed.
- **The file explorer** allows you to view and edit files. Clicking on a file in the explorer will open it within the main window.
-- **The Simple Browser** lets you view the nf-training material browser (). If you close it by accident, you can load the simple browser again by typing the following in the terminal: `gp preview https://training.nextflow.io`.
+- **The Simple Browser** lets you view the training instructions in a web browser (). If you close it by accident, you can load the simple browser again by typing the following in the terminal: `gp preview https://training.nextflow.io`.
### Gitpod resources
-Gitpod gives you 500 free credits per month, which is equivalent to 50 hours of free environment runtime using the standard workspace (up to 4 cores, 8 GB RAM, and 30 GB storage).
+Gitpod gives you up to 500 free credits per month, which is equivalent to 50 hours of free environment runtime using the standard workspace (up to 4 cores, 8 GB RAM, and 30 GB storage).
There is also a large workspace option that gives you up to 8 cores, 16GB RAM, and 50GB storage. However, the large workspace will use your free credits quicker and you will have fewer hours of access to this space.
diff --git a/docs/envsetup/02_local.md b/docs/envsetup/02_local.md
index 919d31a2d..b31b91614 100644
--- a/docs/envsetup/02_local.md
+++ b/docs/envsetup/02_local.md
@@ -1,6 +1,6 @@
# Local installation
-If you **can not** access Gitpod you can also install everything locally.
+If you **cannot** use Gitpod for any reason, you have the option of installing everything locally instead.
Some requirements may be different depending on your local machine.
@@ -59,7 +59,7 @@ To download the material, execute this command:
git clone https://github.com/nextflow-io/training.git
```
-Then `cd` into the `nf-training` directory.
+Then `cd` into the relevant directory. By default, that is `hello-nextflow`.
## Checking your installation
@@ -79,7 +79,7 @@ This should print the current version, system, and runtime.
nextflow info
```
- This should come up with the Nextflow version and runtime information:
+ This should come up with the Nextflow version and runtime information (actual versions may differ):
```console
Version: 23.10.1 build 5891
diff --git a/docs/envsetup/img/gitpod.welcome.png b/docs/envsetup/img/gitpod.welcome.png
index d5d4827a4..b95fff44d 100644
Binary files a/docs/envsetup/img/gitpod.welcome.png and b/docs/envsetup/img/gitpod.welcome.png differ
diff --git a/docs/envsetup/img/login.png b/docs/envsetup/img/login.png
index 45501c87d..ca2a05a0a 100644
Binary files a/docs/envsetup/img/login.png and b/docs/envsetup/img/login.png differ
diff --git a/docs/envsetup/img/onestepaway.png b/docs/envsetup/img/onestepaway.png
deleted file mode 100644
index 2280b2c18..000000000
Binary files a/docs/envsetup/img/onestepaway.png and /dev/null differ
diff --git a/docs/envsetup/img/select_gitpod_classic.png b/docs/envsetup/img/select_gitpod_classic.png
new file mode 100644
index 000000000..90ca641a0
Binary files /dev/null and b/docs/envsetup/img/select_gitpod_classic.png differ
diff --git a/docs/hello_nextflow/01_orientation.md b/docs/hello_nextflow/00_orientation.md
similarity index 53%
rename from docs/hello_nextflow/01_orientation.md
rename to docs/hello_nextflow/00_orientation.md
index 99e3dfff7..671b467e2 100644
--- a/docs/hello_nextflow/01_orientation.md
+++ b/docs/hello_nextflow/00_orientation.md
@@ -1,16 +1,16 @@
# Orientation
-The Gitpod environment contains all the software, code and data necessary to work through this training course, so you don't need to install anything yourself.
+The training environment contains all the software, code and data necessary to work through this training course, so you don't need to install anything yourself.
However, you do need a (free) account to log in, and you should take a few minutes to familiarize yourself with the interface.
If you have not yet done so, please follow [this link](../../envsetup/) before going any further.
## Materials provided
-Throughout this training course, we'll be working in the `hello-nextflow/` directory, which loads by default when you open the Gitpod workspace.
+Throughout this training course, we'll be working in the `hello-nextflow/` directory, which loads by default when you open the training workspace.
This directory contains all the code files, test data and accessory files you will need.
-Feel free to explore the contents of this directory; the easiest way to do so is to use the file explorer on the left-hand side of the Gitpod workspace.
+Feel free to explore the contents of this directory; the easiest way to do so is to use the file explorer on the left-hand side of the training workspace.
Alternatively, you can use the `tree` command.
Throughout the course, we use the output of `tree` to represent directory structure and contents in a readable form, sometimes with minor modifications for clarity.
@@ -24,64 +24,36 @@ If you run this inside `hello-nextflow`, you should see the following output:
```console title="Directory contents"
.
-├── containers
-│ ├── build
-│ ├── data
-│ ├── results
-│ └── scripts
-├── data
-│ ├── bam
-│ ├── greetings.csv
-│ ├── ref
-│ ├── sample_bams.txt
-│ └── samplesheet.csv
-├── hello-config
-│ ├── demo-params.json
-│ ├── main.nf
-│ └── nextflow.config
+├── greetings.csv
+├── hello-channels.nf
+├── hello-config.nf
├── hello-containers.nf
-├── hello-genomics.nf
-├── hello-modules
-│ ├── demo-params.json
-│ ├── main.nf
-│ └── nextflow.config
-├── hello-nf-core
-│ ├── data
-│ └── solution
-├── hello-nf-test
-│ ├── demo-params.json
-│ ├── main.nf
-│ ├── modules
-│ └── nextflow.config
-├── hello-operators.nf
+├── hello-modules.nf
+├── hello-workflow.nf
├── hello-world.nf
├── nextflow.config
-└── solutions
- ├── hello-config
- ├── hello-genomics
- ├── hello-modules
- ├── hello-nf-test
- ├── hello-operators
- └── hello-world
-
-18 directories, 17 files
+├── solutions
+│ ├── 1-hello-world
+│ ├── 2-hello-channels
+│ ├── 3-hello-workflow
+│ ├── 4-hello-modules
+│ ├── 5-hello-containers
+│ └── 6-hello-config
+└── test-params.json
+
+7 directories, 9 files
```
-!!!note
-
- Don't worry if this seems like a lot; we'll go through the relevant pieces at each step of the course.
- This is just meant to give you an overview.
-
**Here's a summary of what you should know to get started:**
- **The `.nf` files** are workflow scripts that are named based on what part of the course they're used in.
-- **The `hello-*` directories** are directories used in the later Parts of the course where we are working with more than just one workflow file.
-
- **The file `nextflow.config`** is a configuration file that sets minimal environment properties.
You can ignore it for now.
-- **The `data` directory** contains the input data we'll use in most of the course. The dataset is described in detail in Part 3, when we introduce it for the first time.
+- **The file `greetings.csv`** contains input data we'll use in most of the course. It is described in Part 1, when we introduce it for the first time.
+
+- **The file `test-params.json`** is a file we'll use in Part 6. You can ignore it for now.
- **The `solutions` directory** contains the completed workflow scripts that result from each step of the course.
They are intended to be used as a reference to check your work and troubleshoot any issues.
@@ -90,7 +62,7 @@ If you run this inside `hello-nextflow`, you should see the following output:
!!!tip
- If for whatever reason you move out of this directory, you can always run this command to return to it:
+ If for whatever reason you move out of this directory, you can always run this command to return to it (within the training environment):
```bash
cd /workspace/gitpod/hello-nextflow
diff --git a/docs/hello_nextflow/01_orientation.pt.md b/docs/hello_nextflow/00_orientation.pt.md
similarity index 100%
rename from docs/hello_nextflow/01_orientation.pt.md
rename to docs/hello_nextflow/00_orientation.pt.md
diff --git a/docs/hello_nextflow/01_hello_world.md b/docs/hello_nextflow/01_hello_world.md
new file mode 100644
index 000000000..634f98936
--- /dev/null
+++ b/docs/hello_nextflow/01_hello_world.md
@@ -0,0 +1,686 @@
+# Part 1: Hello World
+
+In this first part of the Hello Nextflow training course, we ease into the topic with a very basic domain-agnostic Hello World example, which we'll progressively build up to demonstrate the usage of foundational Nextflow logic and components.
+
+!!! note
+
+ A "Hello World!" is a minimalist example that is meant to demonstrate the basic syntax and structure of a programming language or software framework. The example typically consists of printing the phrase "Hello, World!" to the output device, such as the console or terminal, or writing it to a file.
+
+---
+
+## 0. Warmup: Run Hello World directly
+
+Let's demonstrate this with a simple command that we run directly in the terminal, to show what it does before we wrap it in Nextflow.
+
+### 0.1. Make the terminal say hello
+
+```bash
+echo 'Hello World!'
+```
+
+This outputs the text 'Hello World!' to the terminal.
+
+```console title="Output"
+Hello World!
+```
+
+### 0.2. Now make it write the text output to a file
+
+```bash
+echo 'Hello World!' > output.txt
+```
+
+This does not output anything to the terminal.
+
+```console title="Output"
+
+```
+
+### 0.3. Show the file contents
+
+```bash
+cat output.txt
+```
+
+The text 'Hello World!' is now in the output file we specified.
+
+```console title="output.txt" linenums="1"
+Hello World!
+```
+
+!!! tip
+
+ In the training environment, you can also find the output file in the file explorer, and view its contents by clicking on it. Alternatively, you can use the `code` command to open the file for viewing.
+
+ ```bash
+ code output.txt
+ ```
+
+### Takeaway
+
+You now know how to run a simple command in the terminal that outputs some text, and optionally, how to make it write the output to a file.
+
+### What's next?
+
+Find out what that would look like written as a Nextflow workflow.
+
+---
+
+## 1. Examine the Hello World workflow starter script
+
+As mentioned in the orientation, we provide you with a fully functional if minimalist workflow script named `hello-world.nf` that does the same thing as before (write out 'Hello World!') but with Nextflow.
+
+To get you started, we'll first open up the workflow script so you can get a sense of how it's structured.
+
+### 1.1. Examine the overall code structure
+
+Let's open the `hello-world.nf` script in the editor pane.
+
+!!! note
+
+ The file is in the `hello-nextflow` directory, which should be your current working directory.
+    You can either click on the file in the file explorer, or type `ls` in the terminal and Cmd+Click (macOS) or Ctrl+Click (PC) on the file to open it.
+
+```groovy title="hello-world.nf" linenums="1"
+#!/usr/bin/env nextflow
+
+/*
+ * Use echo to print 'Hello World!' to a file
+ */
+process sayHello {
+
+ output:
+ path 'output.txt'
+
+ script:
+ """
+ echo 'Hello World!' > output.txt
+ """
+}
+
+workflow {
+
+ // emit a greeting
+ sayHello()
+}
+```
+
+As you can see, a Nextflow script involves two main types of core components: one or more **processes**, and the **workflow** itself.
+Each **process** describes what operation(s) the corresponding step in the pipeline should accomplish, while the **workflow** describes the dataflow logic that connects the various steps.
+
+Let's take a closer look at the **process** block first, then we'll look at the **workflow** block.
+
+### 1.2. The `process` definition
+
+The first block of code describes a **process**.
+The process definition starts with the keyword `process`, followed by the process name and finally the process body delimited by curly braces.
+The process body must contain a script block that specifies the command to run, which can be anything you would be able to run in a command-line terminal.
+
+Here we have a **process** called `sayHello` that writes its **output** to a file named `output.txt`.
+
+```groovy title="hello-world.nf" linenums="3"
+/*
+ * Use echo to print 'Hello World!' to a file
+ */
+process sayHello {
+
+ output:
+ path 'output.txt'
+
+ script:
+ """
+ echo 'Hello World!' > output.txt
+ """
+}
+```
+
+This is a very minimal process definition that contains just an `output` definition and the `script` to execute.
+
+The `output` definition includes the `path` qualifier, which tells Nextflow this should be handled as a path (this covers both directories and files).
+Another common qualifier is `val`.
+
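+As an illustration only (this process is hypothetical and not part of the training script), a minimal sketch of a value output declared with `val` might look like this:
+
+```groovy title="Example (not in hello-world.nf)"
+process sayHelloValue {
+
+    output:
+    val greeting_out
+
+    script:
+    greeting_out = 'Hello World!'
+    """
+    echo 'Hello World!'
+    """
+}
+```
+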
+!!! note
+
+ The output definition does not _determine_ what output will be created.
+    It simply _declares_ the expected output, so that Nextflow can look for it once execution is complete.
+ This is necessary for verifying that the command was executed successfully and for passing the output to downstream processes if needed.
+
+!!! warning
+
+ This example is brittle because we hardcoded the output filename in two separate places (the script and the output blocks).
+ If we change one but not the other, the script will break.
+ Later, you'll learn how to use variables to avoid this problem.
+
+In a real-world pipeline, a process usually contains additional blocks such as directives, inputs, and conditional clauses, which we'll introduce later in this training course.
+
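+As noted above, the script block can contain anything you could run in a terminal. Purely as an illustration (this hypothetical process is not part of the training script), it could chain multiple commands:
+
+```groovy title="Example (not in hello-world.nf)"
+process sayHelloTwice {
+
+    output:
+    path 'output.txt'
+
+    script:
+    """
+    echo 'Hello World!' > output.txt
+    echo 'Hello again!' >> output.txt
+    """
+}
+```
+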
+### 1.3. The `workflow` definition
+
+The second block of code describes the **workflow** itself.
+The workflow definition starts with the keyword `workflow`, followed by an optional name, then the workflow body delimited by curly braces.
+
+Here we have a **workflow** that consists of one call to the `sayHello` process.
+
+```groovy title="hello-world.nf" linenums="17"
+workflow {
+
+ // emit a greeting
+ sayHello()
+}
+```
+
+This is a very minimal **workflow** definition.
+In a real-world pipeline, the workflow typically contains multiple calls to **processes** connected by **channels**, and the processes expect one or more variable **input(s)**.
+
+You'll learn how to add variable inputs later in this training module, and you'll learn how to add more processes and connect them by channels in Part 3 of this course.
+
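+Purely as a preview of what's to come, a two-step workflow with hypothetical processes connected through their outputs might look like this:
+
+```groovy title="Example (not in hello-world.nf)"
+workflow {
+
+    // create a channel of inputs
+    greeting_ch = Channel.of('Hello World!')
+
+    // the output of one process feeds the next
+    sayHello(greeting_ch)
+    convertToUpper(sayHello.out)
+}
+```
+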
+### Takeaway
+
+You now know how a simple Nextflow workflow is structured.
+
+### What's next?
+
+Learn to launch the workflow, monitor execution and find your outputs.
+
+---
+
+## 2. Run the workflow
+
+Looking at code is not nearly as fun as running it, so let's try this out in practice.
+
+### 2.1. Launch the workflow and monitor execution
+
+In the terminal, run the following command:
+
+```bash
+nextflow run hello-world.nf
+```
+
+Your console output should look something like this:
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-world.nf` [goofy_torvalds] DSL2 - revision: c33d41f479
+
+executor > local (1)
+[a3/7be2fa] sayHello | 1 of 1 ✔
+```
+
+Congratulations, you just ran your first Nextflow workflow!
+
+The most important output here is the last line (line 6):
+
+```console title="Output" linenums="6"
+[a3/7be2fa] sayHello | 1 of 1 ✔
+```
+
+This tells us that the `sayHello` process was successfully executed once (`1 of 1 ✔`).
+
+Importantly, this line also tells you where to find the output of the `sayHello` process call.
+Let's look at that now.
+
+### 2.2. Find the output and logs in the `work` directory
+
+When you run Nextflow for the first time in a given directory, it creates a directory called `work` where it will write all files (and any symlinks) generated in the course of execution.
+
+Within the `work` directory, Nextflow organizes outputs and logs per process call.
+For each process call, Nextflow creates a nested subdirectory, named with a hash in order to make it unique, where it will stage all necessary inputs (using symlinks by default), write helper files, and write out logs and any outputs of the process.
+
+The path to that subdirectory is shown in truncated form in square brackets in the console output.
+Looking at what we got for the run shown above, the console log line for the `sayHello` process starts with `[a3/7be2fa]`. That corresponds to the following directory path: `work/`**`a3/7be2fa`**`d5e71e5f49998f795677fd68`
+
+Let's take a look at what's in there.
+
+!!! tip
+
+ If you browse the contents of the task subdirectory in the VSCode file explorer, you'll see all the files right away.
+    However, the log files are hidden files (their names start with a `.`), so if you want to use `ls` or `tree` to view them in the terminal, you'll need to use the relevant option for displaying hidden files.
+
+ ```bash
+ tree -a work
+ ```
+
+You should see something like this, though the exact subdirectory names will be different on your system:
+
+```console title="Directory contents"
+work
+└── a3
+ └── 7be2fad5e71e5f49998f795677fd68
+ ├── .command.begin
+ ├── .command.err
+ ├── .command.log
+ ├── .command.out
+ ├── .command.run
+ ├── .command.sh
+ ├── .exitcode
+ └── output.txt
+```
+
+These are the helper and log files:
+
+- **`.command.begin`**: Metadata related to the beginning of the execution of the process call
+- **`.command.err`**: Error messages (`stderr`) emitted by the process call
+- **`.command.log`**: Complete log output emitted by the process call
+- **`.command.out`**: Regular output (`stdout`) by the process call
+- **`.command.run`**: Full script run by Nextflow to execute the process call
+- **`.command.sh`**: The command that was run by the process call
+- **`.exitcode`**: The exit code resulting from the command
+
+The `.command.sh` file is especially useful because it tells you what command Nextflow actually executed.
+In this case it's very straightforward, but later in the course you'll see commands that involve some interpolation of variables.
+When you're dealing with that, you need to be able to check exactly what was run, especially when troubleshooting an issue.
+
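+For example, for the run shown above you could inspect it as follows (your subdirectory hash will differ):
+
+```bash
+cat work/a3/7be2fad5e71e5f49998f795677fd68/.command.sh
+```
+
+This should show something like the following (the exact header line may vary):
+
+```console title=".command.sh"
+#!/bin/bash -ue
+echo 'Hello World!' > output.txt
+```
+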
+The actual output of the `sayHello` process is `output.txt`.
+Open it and you will find the `Hello World!` greeting, which was the expected result of our minimalist workflow.
+
+```console title="output.txt" linenums="1"
+Hello World!
+```
+
+### Takeaway
+
+You know how to decipher a simple Nextflow script, run it and find the output and relevant log files in the work directory.
+
+### What's next?
+
+Learn how to manage your workflow executions conveniently.
+
+---
+
+## 3. Manage workflow executions
+
+Knowing how to launch workflows and retrieve outputs is great, but you'll quickly find there are a few other aspects of workflow management that will make your life easier, especially if you're developing your own workflows.
+
+Here we show you how to use the `publishDir` directive to publish the outputs you want to keep, the `-resume` feature for when you need to re-launch the same workflow, and how to delete older work directories with `nextflow clean`.
+
+### 3.1. Publish outputs
+
+As you have just learned, the output produced by our pipeline is buried in a working directory several layers deep.
+This is done on purpose; Nextflow is in control of this directory and we are not supposed to interact with it.
+
+However, that makes it inconvenient to retrieve outputs that we care about.
+
+Fortunately, Nextflow provides a way to manage this more conveniently, called the `publishDir` directive, which acts at the process level.
+This directive tells Nextflow to copy the output(s) of the process to a designated output directory.
+It allows us to retrieve the desired output file without having to dig down into the work directory.
+
+#### 3.1.1. Add a `publishDir` directive to the `sayHello` process
+
+In the workflow script file `hello-world.nf`, make the following code modification:
+
+_Before:_
+
+```groovy title="hello-world.nf" linenums="6"
+process sayHello {
+
+ output:
+ path 'output.txt'
+```
+
+_After:_
+
+```groovy title="hello-world.nf" linenums="6"
+process sayHello {
+
+ publishDir 'results', mode: 'copy'
+
+ output:
+ path 'output.txt'
+```
+
+#### 3.1.2. Run the workflow again
+
+Now run the modified workflow script:
+
+```bash
+nextflow run hello-world.nf
+```
+
+The log output should look very familiar:
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-world.nf` [jovial_mayer] DSL2 - revision: 35bd3425e5
+
+executor > local (1)
+[62/49a1f8] sayHello | 1 of 1 ✔
+```
+
+This time, Nextflow has created a new directory called `results/`.
+Our `output.txt` file is in this directory.
+If you check the contents it should match the output in the work subdirectory.
+This is how we move results files outside of the working directories conveniently.
+
+It is also possible to set the `publishDir` directive to make a symbolic link to the file instead of actually copying it.
+This is preferable when you're dealing with very large files you don't need to retain longer term.
+However, if you delete the work directory as part of a cleanup operation, you will lose access to the file, so always make sure you have actual copies of everything you care about before deleting anything.
+
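+As a sketch, switching to symlink mode would look something like this in the process block:
+
+```groovy title="Example"
+publishDir 'results', mode: 'symlink'
+```
+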
+!!! note
+
+    A newer syntax option has been proposed to make it possible to declare and publish workflow-level outputs, documented [here](https://www.nextflow.io/docs/latest/workflow.html#publishing-outputs).
+ This will eventually make using `publishDir` at the process level redundant for completed pipelines.
+ However, we expect that `publishDir` will still remain very useful during pipeline development.
+
+### 3.2. Re-launch a workflow with `-resume`
+
+Sometimes, you're going to want to re-run a pipeline that you've already launched previously without redoing any steps that already completed successfully.
+
+Nextflow has an option called `-resume` that allows you to do this.
+Specifically, in this mode, any processes that have already been run with the exact same code, settings and inputs will be skipped.
+This means Nextflow will only run processes that you've added or modified since the last run, or to which you're providing new settings or inputs.
+
+There are two key advantages to doing this:
+
+- If you're in the middle of developing your pipeline, you can iterate more rapidly since you only have to run the process(es) you're actively working on in order to test your changes.
+- If you're running a pipeline in production and something goes wrong, in many cases you can fix the issue and relaunch the pipeline, and it will resume running from the point of failure, which can save you a lot of time and compute.
+
+To use it, simply add `-resume` to your command and run it:
+
+```bash
+nextflow run hello-world.nf -resume
+```
+
+The console output should look similar.
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-world.nf` [golden_cantor] DSL2 - revision: 35bd3425e5
+
+[62/49a1f8] sayHello | 1 of 1, cached: 1 ✔
+```
+
+Look for the `cached:` bit that has been added in the process status line (line 5), which means that Nextflow has recognized that it has already done this work and simply re-used the result from the previous successful run.
+
+You can also see that the work subdirectory hash is the same as in the previous run.
+Nextflow is literally pointing you to the previous execution and saying "I already did that over there."
+
+!!! note
+
+    When you re-run a pipeline with `-resume`, Nextflow does not overwrite any files written to a `publishDir` directory by any process call that was previously run successfully.
+
+### 3.3. Delete older work directories
+
+During the development process, you'll typically run your draft pipelines a large number of times, which can lead to an accumulation of very many files across many subdirectories.
+Since the subdirectories are named randomly, it is difficult to tell from their names which runs are older and which are more recent.
+
+Nextflow includes a convenient `nextflow clean` command that can automatically delete the work subdirectories for past runs that you no longer care about, with several [options](https://www.nextflow.io/docs/latest/reference/cli.html#clean) to control what will be deleted.
+
+Here we show you an example that deletes all subdirectories from runs before a given run, specified using its run name.
+The run name is the machine-generated two-part string shown in square brackets in the `Launching (...)` console output line.
+
+First we use the dry run flag `-n` to check what will be deleted given the command:
+
+```bash
+nextflow clean -before golden_cantor -n
+```
+
+The output should look like this:
+
+```console title="Output"
+Would remove /workspace/gitpod/hello-nextflow/work/a3/7be2fad5e71e5f49998f795677fd68
+```
+
+If you don't see any lines output, you either did not provide a valid run name or there are no past runs to delete.
+
+If the output looks as expected and you want to proceed with the deletion, re-run the command with the `-f` flag instead of `-n`:
+
+```bash
+nextflow clean -before golden_cantor -f
+```
+
+You should now see the following:
+
+```console title="Output"
+Removed /workspace/gitpod/hello-nextflow/work/a3/7be2fad5e71e5f49998f795677fd68
+```
+
+!!! warning
+
+ Deleting work subdirectories from past runs removes them from Nextflow's cache and deletes any outputs that were stored in those directories.
+ That means it breaks Nextflow's ability to resume execution without re-running the corresponding processes.
+
+ You are responsible for saving any outputs that you care about or plan to rely on! If you're using the `publishDir` directive for that purpose, make sure to use the `copy` mode, not the `symlink` mode.
+
+### Takeaway
+
+You know how to publish outputs to a specific directory, relaunch a pipeline without repeating steps that were already run in an identical way, and use the `nextflow clean` command to clean up old work directories.
+
+### What's next?
+
+Learn to provide a variable input via a command-line parameter and utilize default values effectively.
+
+---
+
+## 4. Use a variable input passed on the command line
+
+In its current state, our workflow uses a greeting hardcoded into the process command.
+We want to add some flexibility by using an input variable, so that we can more easily change the greeting at runtime.
+
+### 4.1. Modify the workflow to take and use a variable input
+
+This requires us to make three changes to our script:
+
+1. Tell the process to expect a variable input by adding an `input:` block
+2. Edit the process to use the input
+3. Set up a command-line parameter and provide its value as an input to the process call
+
+Let's make these changes one at a time.
+
+#### 4.1.1. Add an input block to the process definition
+
+First we need to adapt the process definition to accept an input called `greeting`.
+
+In the process block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-channels.nf" linenums="6"
+process sayHello {
+
+ publishDir 'results', mode: 'copy'
+
+ output:
+ path 'output.txt'
+```
+
+_After:_
+
+```groovy title="hello-channels.nf" linenums="6"
+process sayHello {
+
+ publishDir 'results', mode: 'copy'
+
+ input:
+ val greeting
+
+ output:
+ path 'output.txt'
+```
+
+The `greeting` variable is prefixed by `val` to tell Nextflow it's a value (not a path).
+
+#### 4.1.2. Edit the process command to use the input variable
+
+Now we swap the original hardcoded value for the value of the input variable we expect to receive.
+
+In the process block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-channels.nf" linenums="16"
+script:
+"""
+echo 'Hello World!' > output.txt
+"""
+```
+
+_After:_
+
+```groovy title="hello-channels.nf" linenums="16"
+script:
+"""
+echo '$greeting' > output.txt
+"""
+```
+
+Make sure to prefix the variable name with the `$` symbol to tell Nextflow that it is a variable name that needs to be replaced with the actual value (interpolated) at runtime.
+
+#### 4.1.3. Set up a CLI parameter and provide it as input to the process call
+
+Now we need to actually set up a way to provide an input value to the `sayHello()` process call.
+
+We could simply hardcode it directly by writing `sayHello('Hello World!')`.
+However, when we're doing real work with our workflow, we're often going to want to be able to control its inputs from the command line.
+
+Good news: Nextflow has a built-in workflow parameter system called `params`, which makes it easy to declare and use CLI parameters. The general syntax is to declare `params.<parameter_name>` in the script to tell Nextflow to expect a `--<parameter_name>` parameter on the command line.
+
+Here, we want to create a parameter called `--greeting`, so we need to declare `params.greeting` somewhere in the workflow.
+In principle we can write it anywhere; but since we're going to want to give it to the `sayHello()` process call, we can plug it in there directly by writing `sayHello(params.greeting)`.
+
+!!! note
+
+ The parameter name (at the workflow level) does not have to match the input variable name (at the process level).
+ We're just using the same word because that's what makes sense and keeps the code readable.
+
+In the workflow block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-world.nf" linenums="24"
+// emit a greeting
+sayHello()
+```
+
+_After:_
+
+```groovy title="hello-world.nf" linenums="24"
+// emit a greeting
+sayHello(params.greeting)
+```
+
+This tells Nextflow to run the `sayHello` process on the value provided through the `--greeting` parameter.
+
+#### 4.1.4. Run the workflow command again
+
+Let's run it!
+
+```bash
+nextflow run hello-world.nf --greeting 'Bonjour le monde!'
+```
+
+If you made all three edits correctly, you should get another successful execution:
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-world.nf` [elated_lavoisier] DSL2 - revision: 7c031b42ea
+
+executor > local (1)
+[4b/654319] sayHello | 1 of 1 ✔
+```
+
+Be sure to open up the output file to check that you now have the new version of the greeting.
+
+```console title="output.txt" linenums="1"
+Bonjour le monde!
+```
+
+Voilà!
+
+!!! tip
+
+ You can readily distinguish Nextflow-level parameters from pipeline-level parameters.
+
+ - Parameters that apply to a pipeline always take a double hyphen (`--`).
+ - Parameters that modify a Nextflow setting, _e.g._ the `-resume` feature we used earlier, take a single hyphen (`-`).
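+
+    For example, both kinds can appear in the same command line:
+
+    ```bash
+    nextflow run hello-world.nf -resume --greeting 'Bonjour le monde!'
+    ```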
+
+### 4.2. Use default values for command line parameters
+
+In many cases, it makes sense to supply a default value for a given parameter so that you don't have to specify it for every run.
+
+#### 4.2.1. Set a default value for the CLI parameter
+
+Let's give the `greeting` parameter a default value by declaring it before the workflow definition.
+
+```groovy title="hello-world.nf" linenums="22"
+/*
+ * Pipeline parameters
+ */
+params.greeting = 'Holà mundo!'
+```
+
+!!! tip
+
+ You can put the parameter declaration inside the workflow block if you prefer. Whatever you choose, try to group similar things in the same place so you don't end up with declarations all over the place.
+
+#### 4.2.2. Run the workflow again without specifying the parameter
+
+Now that you have a default value set, you can run the workflow again without having to specify a value in the command line.
+
+```bash
+nextflow run hello-world.nf
+```
+
+The console output should look the same.
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-world.nf` [determined_edison] DSL2 - revision: 3539118582
+
+executor > local (1)
+[72/394147] sayHello | 1 of 1 ✔
+```
+
+Check the output in the results directory:
+
+```console title="output.txt" linenums="1"
+Holà mundo!
+```
+
+Nextflow used the default value for the greeting.
+
+#### 4.2.3. Run the workflow again with the parameter to override the default value
+
+If you provide the parameter on the command line, the CLI value will override the default value.
+
+Try it out:
+
+```bash
+nextflow run hello-world.nf --greeting 'Konnichiwa!'
+```
+
+The console output should look the same.
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-world.nf` [elegant_faraday] DSL2 - revision: 3539118582
+
+executor > local (1)
+[6f/a12a91] sayHello | 1 of 1 ✔
+```
+
+Now you will have the corresponding new output in your results directory.
+
+```console title="output.txt" linenums="1"
+Konnichiwa!
+```
+
+!!! note
+
+ In Nextflow, there are multiple places where you can specify values for parameters.
+    If the same parameter is set to different values in multiple places, Nextflow will determine what value to use based on the order of precedence that is described [here](https://www.nextflow.io/docs/latest/config.html).
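+
+    For example, a default declared in `nextflow.config` (hypothetical snippet) would be overridden by a value supplied on the command line:
+
+    ```groovy title="nextflow.config (example)"
+    params.greeting = 'Hello from the config file!'
+    ```
+
+    ```bash
+    nextflow run hello-world.nf --greeting 'The CLI value wins!'
+    ```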
+
+### Takeaway
+
+You know how to use a simple variable input provided at runtime via a command-line parameter, as well as set up, use and override default values.
+
+More generally, you know how to interpret a simple Nextflow workflow, manage its execution, and retrieve outputs.
+
+### What's next?
+
+Take a little break, you've earned it!
+When you're ready, move on to Part 2 to learn how to use channels to feed inputs into your workflow, which will allow you to take advantage of Nextflow's built-in dataflow parallelism and other powerful features.
diff --git a/docs/hello_nextflow/02_hello_channels.md b/docs/hello_nextflow/02_hello_channels.md
new file mode 100644
index 000000000..54eba99d7
--- /dev/null
+++ b/docs/hello_nextflow/02_hello_channels.md
@@ -0,0 +1,880 @@
+# Part 2: Hello Channels
+
+In Part 1 of this course (Hello World), we showed you how to provide a variable input to a process by providing the input in the process call directly: `sayHello(params.greeting)`.
+That was a deliberately simplified approach.
+In practice, that approach has major limitations, namely that it only works for very simple cases where we only want to run the process once, on a single value.
+In most realistic workflow use cases, we want to process multiple values (experimental data for multiple samples, for example), so we need a more sophisticated way to handle inputs.
+
+That is what Nextflow **channels** are for.
+Channels are queues designed to handle inputs efficiently and shuttle them from one step to another in multi-step workflows, while providing built-in parallelism and many additional benefits.
+
+In this part of the course, you will learn how to use a channel to handle multiple inputs from a variety of different sources.
+You will also learn to use **operators** to transform channel contents as needed.
+
+_For training on using channels to connect steps in a multi-step workflow, see Part 3 of this course._
+
+---
+
+## 0. Warmup: Run `hello-channels.nf`
+
+We're going to use the workflow script `hello-channels.nf` as a starting point.
+It is equivalent to the script produced by working through Part 1 of this training course.
+
+Just to make sure everything is working, run the script once before making any changes:
+
+```bash
+nextflow run hello-channels.nf --greeting 'Hello Channels!'
+```
+
+```console title="Output"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-channels.nf` [insane_lichterman] DSL2 - revision: c33d41f479
+
+executor > local (1)
+[86/9efa08] sayHello | 1 of 1 ✔
+```
+
+As before, you will find the output file named `output.txt` in the `results` directory (specified by the `publishDir` directive).
+
+```console title="output.txt" linenums="1"
+Hello Channels!
+```
+
+If that worked for you, you're ready to learn about channels.
+
+---
+
+## 1. Provide variable inputs via a channel explicitly
+
+We are going to create a **channel** to pass the variable input to the `sayHello()` process instead of relying on the implicit handling, which has certain limitations.
+
+### 1.1. Create an input channel
+
+There are a variety of **channel factories** that we can use to set up a channel.
+To keep things simple for now, we are going to use the most basic channel factory, called `Channel.of`, which will create a channel containing a single value.
+Functionally this will be exactly equivalent to how we had it set up before, but explicit instead of implicit.
+
+This is the line of code we're going to use:
+
+```console title="Syntax"
+greeting_ch = Channel.of('Hello Channels!')
+```
+
+This creates a channel called `greeting_ch` using the `Channel.of()` channel factory, which sets up a simple value channel, and loads the string `'Hello Channels!'` to use as the greeting value.
+
+!!! note
+
+ We are temporarily switching back to hardcoded strings instead of using a CLI parameter for the sake of readability. We'll go back to using CLI parameters once we've covered what's happening at the level of the channel.
+
+In the workflow block, add the channel factory code:
+
+_Before:_
+
+```groovy title="hello-channels.nf" linenums="27"
+workflow {
+
+ // emit a greeting
+ sayHello(params.greeting)
+}
+```
+
+_After:_
+
+```groovy title="hello-channels.nf" linenums="27"
+workflow {
+
+ // create a channel for inputs
+ greeting_ch = Channel.of('Hello Channels!')
+
+ // emit a greeting
+ sayHello(params.greeting)
+}
+```
+
+This is not yet functional, since we haven't switched the process call to use the channel.
+
+### 1.2. Add the channel as input to the process call
+
+Now we need to actually plug our newly created channel into the `sayHello()` process call, replacing the CLI parameter which we were providing directly before.
+
+In the workflow block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-channels.nf" linenums="27"
+workflow {
+
+ // create a channel for inputs
+ greeting_ch = Channel.of('Hello Channels!')
+
+ // emit a greeting
+ sayHello(params.greeting)
+}
+```
+
+_After:_
+
+```groovy title="hello-channels.nf" linenums="27"
+workflow {
+
+ // create a channel for inputs
+ greeting_ch = Channel.of('Hello Channels!')
+
+ // emit a greeting
+ sayHello(greeting_ch)
+}
+```
+
+This tells Nextflow to run the `sayHello` process on the contents of the `greeting_ch` channel.
+
+Now our workflow is properly functional; it is the explicit equivalent of writing `sayHello('Hello Channels!')`.
+
+### 1.3. Run the workflow command again
+
+Let's run it!
+
+```bash
+nextflow run hello-channels.nf
+```
+
+If you made both edits correctly, you should get another successful execution:
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-channels.nf` [nice_heisenberg] DSL2 - revision: 41b4aeb7e9
+
+executor > local (1)
+[3b/f2b109] sayHello (1) | 1 of 1 ✔
+```
+
+You can check the results directory to satisfy yourself that the outcome is still the same as previously.
+
+```console title="output.txt" linenums="1"
+Hello Channels!
+```
+
+So far we're just progressively tweaking the code to increase the flexibility of our workflow while achieving the same end result.
+
+!!! note
+
+ This may seem like we're writing more code for no tangible benefit, but the value will become clear as soon as we start handling more inputs.
+
+### Takeaway
+
+You know how to use a basic channel factory to provide an input to a process.
+
+### What's next?
+
+Learn how to use channels to make the workflow iterate over multiple input values.
+
+---
+
+## 2. Modify the workflow to run on multiple input values
+
+Workflows typically run on batches of inputs that are meant to be processed in bulk, so we want to upgrade the workflow to accept multiple input values.
+
+### 2.1. Load multiple greetings into the input channel
+
+Conveniently, the `Channel.of()` channel factory we've been using is quite happy to accept more than one value, so we don't need to modify that at all.
+We just have to load more values into the channel.
+
+#### 2.1.1. Add more greetings
+
+In the workflow block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-channels.nf" linenums="29"
+// create a channel for inputs
+greeting_ch = Channel.of('Hello Channels!')
+```
+
+_After:_
+
+```groovy title="hello-channels.nf" linenums="29"
+// create a channel for inputs
+greeting_ch = Channel.of('Hello','Bonjour','Holà')
+```
+
+The documentation tells us this should work. Can it really be so simple?
+
+#### 2.1.2. Run the command and look at the log output
+
+Let's try it.
+
+```bash
+nextflow run hello-channels.nf
+```
+
+It certainly seems to run just fine:
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-channels.nf` [suspicious_lamport] DSL2 - revision: 778deadaea
+
+executor > local (3)
+[cd/77a81f] sayHello (3) | 3 of 3 ✔
+```
+
+However... This seems to indicate that '3 of 3' calls were made for the process, which is encouraging, but this only shows us a single run of the process, with one subdirectory path (`cd/77a81f`).
+What's going on?
+
+By default, the ANSI logging system writes the logging from multiple calls to the same process on the same line.
+Fortunately, we can disable that behavior to see the full list of process calls.
+
+#### 2.1.3. Run the command again with the `-ansi-log false` option
+
+To expand the logging to display one line per process call, add `-ansi-log false` to the command.
+
+```bash
+nextflow run hello-channels.nf -ansi-log false
+```
+
+This time we see all three process runs and their associated work subdirectories listed in the output:
+
+```console title="Output" linenums="1"
+N E X T F L O W ~ version 24.10.0
+Launching `hello-channels.nf` [pensive_poitras] DSL2 - revision: 778deadaea
+[76/f61695] Submitted process > sayHello (1)
+[6e/d12e35] Submitted process > sayHello (3)
+[c1/097679] Submitted process > sayHello (2)
+```
+
+That's much better; at least for a simple workflow.
+For a complex workflow, or a large number of inputs, having the full list output to the terminal might get a bit overwhelming, so you might not choose to use `-ansi-log false` in those cases.
+
+!!! note
+
+ The way the status is reported is a bit different between the two logging modes.
+ In the condensed mode, Nextflow reports whether calls were completed successfully or not.
+ In this expanded mode, it only reports that they were submitted.
+
+That being said, we have another problem. If you look in the `results` directory, there is only one file: `output.txt`!
+
+```console title="Directory contents"
+results
+└── output.txt
+```
+
+What's up with that? Shouldn't we be expecting a separate file per input greeting, so three files in all?
+Did all three greetings go into a single file?
+
+You can check the contents of `output.txt`; you will find only one of the three greetings we provided.
+
+```console title="output.txt" linenums="1"
+Bonjour
+```
+
+You may recall that we hardcoded the output file name for the `sayHello` process, so all three calls produced a file called `output.txt`.
+You can check the work subdirectories for each of the three processes; each of them contains a file called `output.txt` as expected.
+
+As long as the output files stay there, isolated from the other processes, that is okay.
+But when the `publishDir` directive copies each of them to the same `results` directory, whichever got copied there first gets overwritten by the next one, and so on.
+
+### 2.2. Ensure the output file names will be unique
+
+We can continue publishing all the outputs to the same results directory, but we need to ensure they will have unique names.
+Specifically, we need to modify the `sayHello` process to generate a file name dynamically so that the final file names will be unique.
+
+So how do we make the file names unique?
+A common way to do that is to use some unique piece of metadata from the inputs (received from the input channel) as part of the output file name.
+Here, for convenience, we'll use the greeting itself, since it's a short string, and prepend it to the base output filename.
+
+#### 2.2.1. Construct a dynamic output file name
+
+In the process block, make the following code changes:
+
+_Before:_
+
+```groovy title="hello-channels.nf" linenums="6"
+process sayHello {
+
+ publishDir 'results', mode: 'copy'
+
+ input:
+ val greeting
+
+ output:
+ path 'output.txt'
+
+ script:
+ """
+ echo '$greeting' > output.txt
+ """
+}
+```
+
+_After:_
+
+```groovy title="hello-channels.nf" linenums="6"
+process sayHello {
+
+ publishDir 'results', mode: 'copy'
+
+ input:
+ val greeting
+
+ output:
+ path "${greeting}-output.txt"
+
+ script:
+ """
+ echo '$greeting' > '$greeting-output.txt'
+ """
+}
+```
+
+Make sure to replace `output.txt` in both the output definition and in the `script:` command block.
+
+!!! tip
+
+    In the output definition, you MUST use double quotes around the output filename expression (NOT single quotes), otherwise it will fail: Groovy only interpolates variables inside double-quoted strings.
+
+This should produce a unique output file name every time the process is called, so that it can be distinguished from the outputs from other iterations of the same process in the output directory.
+
+#### 2.2.2. Run the workflow
+
+Let's run it:
+
+```bash
+nextflow run hello-channels.nf
+```
+
+Back in the default summary view, the output looks like this:
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-channels.nf` [astonishing_bell] DSL2 - revision: f57ff44a69
+
+executor > local (3)
+[2d/90a2e2] sayHello (1) | 3 of 3 ✔
+```
+
+Importantly, now we have three new files in addition to the one we already had in the `results` directory:
+
+```console title="Directory contents"
+results
+├── Bonjour-output.txt
+├── Hello-output.txt
+├── Holà-output.txt
+└── output.txt
+```
+
+They each have the expected contents:
+
+```console title="Bonjour-output.txt" linenums="1"
+Bonjour
+```
+
+```console title="Hello-output.txt" linenums="1"
+Hello
+```
+
+```console title="Holà-output.txt" linenums="1"
+Holà
+```
+
+Success! Now we can add as many greetings as we like without worrying about output files being overwritten.
+
+!!! note
+
+ In practice, naming files based on the input data itself is almost always impractical.
+ The better way to generate dynamic filenames is to pass metadata to a process along with the input files.
+ The metadata is typically provided via a 'sample sheet' or equivalents.
+ You'll learn how to do that later in your Nextflow training.
+
+### Takeaway
+
+You know how to feed multiple input elements through a channel.
+
+### What's next?
+
+Learn to use an operator to transform the contents of a channel.
+
+---
+
+## 3. Use an operator to transform the contents of a channel
+
+In Nextflow, [operators](https://www.nextflow.io/docs/latest/reference/operator.html) allow us to transform the contents of a channel.
+
+We just showed you how to handle multiple input elements that were hardcoded directly in the channel factory.
+What if we wanted to provide those multiple inputs in a different form?
+
+For example, imagine we set up an input variable containing an array of elements like this:
+
+`greetings_array = ['Hello','Bonjour','Holà']`
+
+Can we load that into our input channel and expect it to work? Let's find out.
+
+### 3.1. Provide an array of values as input to the channel
+
+Common sense suggests we should be able to simply pass in an array of values instead of a single value. Right?
+
+#### 3.1.1. Set up the input variable
+
+Let's take the `greetings_array` variable we just imagined and make it a reality by adding it to the workflow block:
+
+_Before:_
+
+```groovy title="hello-channels.nf" linenums="27"
+workflow {
+
+ // create a channel for inputs
+ greeting_ch = Channel.of('Hello','Bonjour','Holà')
+```
+
+_After:_
+
+```groovy title="hello-channels.nf" linenums="27"
+workflow {
+
+ // declare an array of input greetings
+ greetings_array = ['Hello','Bonjour','Holà']
+
+ // create a channel for inputs
+ greeting_ch = Channel.of('Hello','Bonjour','Holà')
+```
+
+#### 3.1.2. Set array of greetings as the input to the channel factory
+
+We're going to replace the values `'Hello','Bonjour','Holà'` currently hardcoded in the channel factory with the `greetings_array` we just created.
+
+In the workflow block, make the following change:
+
+_Before:_
+
+```groovy title="hello-channels.nf" linenums="32"
+ // create a channel for inputs
+ greeting_ch = Channel.of('Hello','Bonjour','Holà')
+```
+
+_After:_
+
+```groovy title="hello-channels.nf" linenums="32"
+ // create a channel for inputs
+ greeting_ch = Channel.of(greetings_array)
+```
+
+#### 3.1.3. Run the workflow
+
+Let's try running this:
+
+```bash
+nextflow run hello-channels.nf
+```
+
+Oh no! Nextflow throws an error that starts like this:
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-channels.nf` [friendly_koch] DSL2 - revision: 97256837a7
+
+executor > local (1)
+[22/57e015] sayHello (1) | 0 of 1
+ERROR ~ Error executing process > 'sayHello (1)'
+
+Caused by:
+ Missing output file(s) `[Hello, Bonjour, Holà]-output.txt` expected by process `sayHello (1)`
+```
+
+It looks like Nextflow tried to run a single process call, using `[Hello, Bonjour, Holà]` as if it were a single string value, instead of using the three strings in the array as separate values.
+
+How do we get Nextflow to unpack the array and load the individual strings into the channel?
+
+### 3.2. Use an operator to transform channel contents
+
+This is where **operators** come in.
+
+If you skim through the [list of operators](https://www.nextflow.io/docs/latest/reference/operator.html) in the Nextflow documentation, you'll find [`flatten()`](https://www.nextflow.io/docs/latest/reference/operator.html#flatten), which does exactly what we need: it unpacks the contents of an array and emits them as individual items.
+
+!!! note
+
+ It is technically possible to achieve the same results by using a different channel factory, [`Channel.fromList`](https://nextflow.io/docs/latest/reference/channel.html#fromlist), which includes an implicit mapping step in its operation.
+ Here we chose not to use that in order to demonstrate the use of an operator on a fairly simple use case.
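+
+    For reference, that alternative would look something like this:
+
+    ```groovy title="Syntax"
+    // emits 'Hello', 'Bonjour' and 'Holà' as individual items
+    greeting_ch = Channel.fromList(greetings_array)
+    ```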
+
+#### 3.2.1. Add the `flatten()` operator
+
+To apply the `flatten()` operator to our input channel, we append it to the channel factory declaration.
+
+In the workflow block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-channels.nf" linenums="31"
+ // create a channel for inputs
+ greeting_ch = Channel.of(greetings_array)
+```
+
+_After:_
+
+```groovy title="hello-channels.nf" linenums="31"
+ // create a channel for inputs
+ greeting_ch = Channel.of(greetings_array)
+ .flatten()
+```
+
+Here we added the operator on the next line for readability, but you can add operators on the same line as the channel factory if you prefer, like this: `greeting_ch = Channel.of(greetings_array).flatten()`.
+
+#### 3.2.2. Add `view()` to inspect channel contents
+
+We could run this right away to test if it works, but while we're at it, we're also going to add a couple of [`view()`](https://www.nextflow.io/docs/latest/reference/operator.html#view) operators, which allow us to inspect the contents of a channel.
+You can think of `view()` as a debugging tool, like a `print()` statement in Python, or its equivalent in other languages.
+
+In the workflow block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-channels.nf" linenums="31"
+ // create a channel for inputs
+ greeting_ch = Channel.of(greetings_array)
+ .flatten()
+```
+
+_After:_
+
+```groovy title="hello-channels.nf" linenums="31"
+ // create a channel for inputs
+ greeting_ch = Channel.of(greetings_array)
+ .view { "Before flatten: $it" }
+ .flatten()
+ .view { "After flatten: $it" }
+```
+
+Here `$it` is an implicit variable that represents each individual item loaded in a channel.
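+
+If you find `$it` cryptic, you can name the closure parameter explicitly instead. This sketch is equivalent to the code above:
+
+```groovy title="Syntax"
+greeting_ch = Channel.of(greetings_array)
+    .view { greeting -> "Before flatten: $greeting" }
+    .flatten()
+    .view { greeting -> "After flatten: $greeting" }
+```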
+
+#### 3.2.3. Run the workflow
+
+Finally, you can try running the workflow again!
+
+```bash
+nextflow run hello-channels.nf
+```
+
+This time it works AND gives us additional insight into what the contents of the channel look like before and after we run the `flatten()` operator:
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-channels.nf` [tiny_elion] DSL2 - revision: 1d834f23d2
+
+executor > local (3)
+[8e/bb08f3] sayHello (2) | 3 of 3 ✔
+Before flatten: [Hello, Bonjour, Holà]
+After flatten: Hello
+After flatten: Bonjour
+After flatten: Holà
+```
+
+You see that we get a single `Before flatten:` statement because at that point the channel contains one item, the original array.
+Then we get three separate `After flatten:` statements, one for each greeting, which are now individual items in the channel.
+
+Importantly, this means each item can now be processed separately by the workflow.
+
+!!! tip
+
+ You should delete or comment out the `view()` statements before moving on.
+
+ ```groovy title="hello-channels.nf" linenums="31"
+ // create a channel for inputs
+ greeting_ch = Channel.of(greetings_array)
+ .flatten()
+ ```
+
+ We left them in the `hello-channels-3.nf` solution file for reference purposes.
+
+### Takeaway
+
+You know how to use an operator like `flatten()` to transform the contents of a channel, and how to use `view()` to inspect channel contents before and after applying an operator.
+
+### What's next?
+
+Learn how to make the workflow take a file as its source of input values.
+
+---
+
+## 4. Use an operator to parse input values from a CSV file
+
+It's often the case that, when we want to run on multiple inputs, the input values are contained in a file.
+As an example, we prepared a CSV file called `greetings.csv` containing several greetings, one on each line (like a column of data).
+
+```csv title="greetings.csv" linenums="1"
+Hello
+Bonjour
+Holà
+```
+
+So now we need to modify our workflow to read in the values from a file like that.
+
+### 4.1. Modify the script to expect a CSV file as the source of greetings
+
+To get started, we're going to need to make two key changes to the script:
+
+- Switch the input parameter to point to the CSV file
+- Switch to a channel factory designed to handle a file
+
+#### 4.1.1. Switch the input parameter to point to the CSV file
+
+Remember the `params.greeting` parameter we set up in Part 1?
+We're going to update it to point to the CSV file containing our greetings.
+
+In the pipeline parameters section, make the following code change:
+
+_Before:_
+
+```groovy title="hello-channels.nf" linenums="25"
+/*
+ * Pipeline parameters
+ */
+params.greeting = ['Hello','Bonjour','Holà']
+```
+
+_After:_
+
+```groovy title="hello-channels.nf" linenums="25"
+/*
+ * Pipeline parameters
+ */
+params.greeting = 'greetings.csv'
+```
+
+#### 4.1.2. Switch to a channel factory designed to handle a file
+
+Since we now want to use a file instead of simple values as the input, we can't use the `Channel.of()` channel factory from before.
+We need to switch to using a new channel factory, [`Channel.fromPath()`](https://www.nextflow.io/docs/latest/reference/channel.html#channel-path), which has some built-in functionality for handling file paths.
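+
+As a side note, `Channel.fromPath()` also accepts glob patterns, in which case it emits one path per matching file. A quick sketch with a hypothetical pattern:
+
+```groovy title="Syntax"
+csv_ch = Channel.fromPath('data/*.csv')
+```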
+
+In the workflow block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-channels.nf" linenums="31"
+ // create a channel for inputs
+ greeting_ch = Channel.of(greetings_array)
+ .flatten()
+```
+
+_After:_
+
+```groovy title="hello-channels.nf" linenums="31"
+ // create a channel for inputs from a CSV file
+ greeting_ch = Channel.fromPath(params.greeting)
+```
+
+#### 4.1.3. Run the workflow
+
+Let's try running the workflow with the new channel factory and the input file.
+
+```bash
+nextflow run hello-channels.nf
+```
+
+Oh no, it doesn't work. Here's the start of the console output and error message:
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-channels.nf` [adoring_bhabha] DSL2 - revision: 8ce25edc39
+
+[- ] sayHello | 0 of 1
+ERROR ~ Error executing process > 'sayHello (1)'
+
+Caused by:
+ File `/workspace/gitpod/hello-nextflow/data/greetings.csv-output.txt` is outside the scope of the process work directory: /workspace/gitpod/hello-nextflow/work/e3/c459b3c8f4029094cc778c89a4393d
+
+
+Command executed:
+
+ echo '/workspace/gitpod/hello-nextflow/data/greetings.csv' > '/workspace/gitpod/hello-nextflow/data/greetings.
+```
+
+The `Command executed:` bit (lines 12-14) is especially helpful here.
+
+This may look a little bit familiar.
+It looks like Nextflow tried to run a single process call using the file path itself as a string value.
+So Nextflow resolved the file path correctly, but it didn't parse the file's contents, which is what we actually wanted.
+
+How do we get Nextflow to open the file and load its contents into the channel?
+
+Sounds like we need another [operator](https://www.nextflow.io/docs/latest/reference/operator.html)!
+
+### 4.2. Use the `splitCsv()` operator to parse the file
+
+Looking through the list of operators again, we find [`splitCsv()`](https://www.nextflow.io/docs/latest/reference/operator.html#splitCsv), which is designed to parse and split CSV-formatted text.
+
+#### 4.2.1. Apply `splitCsv()` to the channel
+
+To apply the operator, we append it to the channel factory declaration, as we did previously.
+
+In the workflow block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-channels.nf" linenums="31"
+    // create a channel for inputs from a CSV file
+    greeting_ch = Channel.fromPath(params.greeting)
+```
+
+_After:_
+
+```groovy title="hello-channels.nf" linenums="31"
+    // create a channel for inputs from a CSV file
+    greeting_ch = Channel.fromPath(params.greeting)
+        .view { "Before splitCsv: $it" }
+        .splitCsv()
+        .view { "After splitCsv: $it" }
+```
+
+As you can see, we also include before/after `view()` statements while we're at it.
+
+#### 4.2.2. Run the workflow again
+
+Let's try running the workflow with the added CSV-parsing logic.
+
+```bash
+nextflow run hello-channels.nf
+```
+
+Interestingly, this fails too, but with a different error. The console output and error message starts like this:
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-channels.nf` [stoic_ride] DSL2 - revision: a0e5de507e
+
+executor > local (3)
+[42/8fea64] sayHello (1) | 0 of 3
+Before splitCsv: /workspace/gitpod/hello-nextflow/greetings.csv
+After splitCsv: [Hello]
+After splitCsv: [Bonjour]
+After splitCsv: [Holà]
+ERROR ~ Error executing process > 'sayHello (2)'
+
+Caused by:
+ Missing output file(s) `[Bonjour]-output.txt` expected by process `sayHello (2)`
+
+
+Command executed:
+
+ echo '[Bonjour]' > '[Bonjour]-output.txt'
+```
+
+This time Nextflow has parsed the contents of the file (yay!) but it's added brackets around the greetings.
+
+Long story short, `splitCsv()` reads each line into an array, and each comma-separated value in the line becomes an element in the array.
+So here it gives us three arrays containing one element each.
+
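+To see this behavior in isolation, here is a small standalone sketch (with hypothetical values); `splitCsv()` can parse CSV-formatted strings as well as files:
+
+```groovy title="Syntax"
+// emits [Hello, English], then [Bonjour, French]
+Channel.of('Hello,English', 'Bonjour,French')
+    .splitCsv()
+    .view()
+```
+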
+!!! note
+
+ Even if this behavior feels inconvenient right now, it's going to be extremely useful later when we deal with input files with multiple columns of data.
+
+We could solve this by using `flatten()`, which you already know.
+However, there's another operator called `map()` that's more appropriate to use here and is really useful to know; it pops up a lot in Nextflow pipelines.
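+
+For the record, the `flatten()` version would be a one-line change (shown as a sketch of the alternative, not what we'll use here):
+
+```groovy title="Syntax"
+greeting_ch = Channel.fromPath(params.greeting)
+    .splitCsv()
+    .flatten()
+```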
+
+### 4.3. Use the `map()` operator to extract the greetings
+
+The `map()` operator is a very handy little tool that allows us to apply all kinds of transformations to the contents of a channel.
+
+In this case, we're going to use it to extract that one element that we want from each line of our file.
+This is what the syntax looks like:
+
+```groovy title="Syntax"
+.map { item -> item[0] }
+```
+
+This means 'for each item in the channel, take the first of any elements it contains'.
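+
+The closure can apply any transformation, not just indexing. For example, this sketch (assuming a channel of plain strings, unlike our arrays) would uppercase each item:
+
+```groovy title="Syntax"
+.map { item -> item.toUpperCase() }
+```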
+
+So let's apply that to our CSV parsing.
+
+#### 4.3.1. Apply `map()` to the channel
+
+In the workflow block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-channels.nf" linenums="31"
+    // create a channel for inputs from a CSV file
+    greeting_ch = Channel.fromPath(params.greeting)
+        .view { "Before splitCsv: $it" }
+        .splitCsv()
+        .view { "After splitCsv: $it" }
+```
+
+_After:_
+
+```groovy title="hello-channels.nf" linenums="31"
+    // create a channel for inputs from a CSV file
+    greeting_ch = Channel.fromPath(params.greeting)
+        .view { "Before splitCsv: $it" }
+        .splitCsv()
+        .view { "After splitCsv: $it" }
+        .map { item -> item[0] }
+        .view { "After map: $it" }
+```
+
+Once again, we include a `view()` call to confirm that the operator does what we expect.
+
+#### 4.3.2. Run the workflow one more time
+
+Let's run it one more time:
+
+```bash
+nextflow run hello-channels.nf
+```
+
+This time it should run without error.
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-channels.nf` [tiny_heisenberg] DSL2 - revision: 845b471427
+
+executor > local (3)
+[1a/1d19ab] sayHello (2) | 3 of 3 ✔
+Before splitCsv: /workspace/gitpod/hello-nextflow/greetings.csv
+After splitCsv: [Hello]
+After splitCsv: [Bonjour]
+After splitCsv: [Holà]
+After map: Hello
+After map: Bonjour
+After map: Holà
+```
+
+Looking at the output of the `view()` statements, we see the following:
+
+- A single `Before splitCsv:` statement: at that point the channel contains one item, the original file path.
+- Three separate `After splitCsv:` statements: one for each greeting, but each is contained within an array that corresponds to that line in the file.
+- Three separate `After map:` statements: one for each greeting, which are now individual items in the channel.
+
+You can also look at the output files to verify that each greeting was correctly extracted and processed through the workflow.
+
+We've achieved the same result as previously, but now we have a lot more flexibility to add more elements to the channel of greetings we want to process by modifying an input file, without modifying any code.
+
+!!! note
+
+    Here the CSV file contained a single column of data, with one greeting per line.
+ You can try adding more columns to the CSV file and see what happens; for example, try the following:
+
+ ```csv title="greetings.csv"
+ Hello,English
+ Bonjour,French
+ Holà,Spanish
+ ```
+
+ You can also try replacing `.map { item -> item[0] }` with `.flatten()` and see what happens depending on how many lines and columns you have in the input file.
+
+    You'll learn more advanced approaches for handling complex inputs in a later training.
+
+### Takeaway
+
+You know how to use the operators `splitCsv()` and `map()` to read in a file of input values and handle them appropriately.
+
+More generally, you have a basic understanding of how Nextflow uses **channels** to manage inputs to processes and **operators** to transform their contents.
+
+### What's next?
+
+Take a big break; you worked hard in this one!
+When you're ready, move on to Part 3 to learn how to add more steps and connect them together into a proper workflow.
diff --git a/docs/hello_nextflow/02_hello_world.md b/docs/hello_nextflow/02_hello_world.md
deleted file mode 100644
index 8d6588d03..000000000
--- a/docs/hello_nextflow/02_hello_world.md
+++ /dev/null
@@ -1,1186 +0,0 @@
-# Part 1: Hello World
-
-A "Hello World!" is a minimalist example that is meant to demonstrate the basic syntax and structure of a programming language or software framework. The example typically consists of printing the phrase "Hello, World!" to the output device, such as the console or terminal, or writing it to a file.
-
-In this first part of the Hello Nextflow training course, we ease into the topic with a very simple domain-agnostic Hello World example, which we'll progressively build up to demonstrate the usage of foundational Nextflow logic and components.
-
----
-
-## 0. Warmup: Run Hello World directly
-
-Let's demonstrate this with a simple command that we run directly in the terminal, to show what it does before we wrap it in Nextflow.
-
-### 0.1. Make the terminal say hello
-
-```bash
-echo 'Hello World!'
-```
-
-### 0.2. Now make it write the text output to a file
-
-```bash
-echo 'Hello World!' > output.txt
-```
-
-### 0.3. Verify that the output file is there using the `ls` command
-
-```bash
-ls
-```
-
-### 0.4. Show the file contents
-
-```bash
-cat output.txt
-```
-
-!!! tip
-
- In the Gitpod environment, you can also find the output file in the file explorer, and view its contents by clicking on it. Alternatively, you can use the `code` command to open the file for viewing.
-
- ```bash
- code output.txt
- ```
-
-### Takeaway
-
-You now know how to run a simple command in the terminal that outputs some text, and optionally, how to make it write the output to a file.
-
-### What's next?
-
-Discover what that would look like written as a Nextflow workflow.
-
----
-
-## 1. Try the Hello World workflow starter script
-
-As mentioned in the orientation, we provide you with a fully functional if minimalist workflow script named `hello-world.nf` that does the same thing as before (write out 'Hello World!') but with Nextflow.
-
-To get you started, we'll first open up the workflow script so you can get a sense of how it's structured, then we'll run it (before trying to make any modifications) to verify that it does what we expect.
-
-### 1.1. Decipher the code structure
-
-Let's open the `hello-world.nf` script in the editor pane.
-
-!!! note
-
- The file is in the `hello-nextflow` directory, which should be your current working directory.
- You can either click on the file in the file explorer, or type `ls` in the terminal and Cmd+Click (MacOS) or Ctrl+Click (PC) on the file to open it.
-
-```groovy title="hello-world.nf" linenums="1"
-#!/usr/bin/env nextflow
-
-/*
- * Use echo to print 'Hello World!' to standard out
- */
-process sayHello {
-
- output:
- stdout
-
- script:
- """
- echo 'Hello World!'
- """
-}
-
-workflow {
-
- // emit a greeting
- sayHello()
-}
-```
-
-As you can see, a Nextflow script involves two main types of core components: one or more **processes**, and the **workflow** itself.
-Each **process** describes what operation(s) the corresponding step in the pipeline should accomplish, while the **workflow** describes the dataflow logic that connects the various steps.
-
-Let's take a closer look at the **process** block first, then we'll look at the **workflow** block.
-
-#### 1.1.1 The `process` definition
-
-The first block of code describes a **process**.
-The process definition starts with the keyword `process`, followed by the process name and finally the process body delimited by curly braces.
-The process body must contain a script block which specifies the command to run, which can be anything you would be able to run in a command line terminal.
-
-Here we have a **process** called `sayHello` that writes its **output** to `stdout`.
-
-```groovy title="hello-world.nf" linenums="3"
-/*
- * Use echo to print 'Hello World!' to standard out
- */
-process sayHello {
-
- output:
- stdout
-
- script:
- """
- echo 'Hello World!'
- """
-}
-```
-
-This a very minimal process definition that just contains an output definition and the script itself.
-In a real-world pipeline, a process usually contains additional blocks such as directives, inputs, and conditional clauses, which we'll introduce later in this training course.
-
-!!! note
-
- The output definition does not _determine_ what output will be created.
- It simply _declares_ what is the expected output, so that Nextflow can look for it once execution is complete.
- This is necessary for verifying that the command was executed successfully and for passing the output to downstream processes if needed.
-
-#### 1.1.2 The `workflow` definition
-
-The second block of code describes the **workflow** itself.
-The workflow definition starts with the keyword `workflow`, followed by an optional name, then the workflow body delimited by curly braces.
-
-Here we have a **workflow** that consists of one call to the `sayHello` process.
-
-```groovy title="hello-world.nf" linenums="16"
-workflow {
-
- // emit a greeting
- sayHello()
-}
-```
-
-This a very minimal **workflow** definition.
-In a real-world pipeline, the workflow typically contains multiple calls to **processes** connected by **channels**.
-You'll learn how to add more processes and connect them by channels in a little bit.
-
-### 1.2. Run the workflow
-
-Looking at code is not nearly as fun as running it, so let's try this out in practice.
-
-```bash
-nextflow run hello-world.nf
-```
-
-You console output should look something like this:
-
-```console title="Output"
- N E X T F L O W ~ version 24.10.0
-
- ┃ Launching `hello-world.nf` [reverent_carson] DSL2 - revision: 463b611a35
-
-executor > local (1)
-[1c/7d08e6] sayHello [100%] 1 of 1 ✔
-```
-
-Congratulations, you just ran your first Nextflow workflow!
-
-The most important output here is the last line (line 6), which reports that the `sayHello` process was successfully executed once.
-
-Okay, that's great, but where do we find the output?
-The `sayHello` process definition said that the output would be sent to standard out, but nothing got printed in the console, did it?
-
-### 1.3. Find the output and logs in the `work` directory
-
-When you run Nextflow for the first time in a given directory, it creates a directory called `work` where it will write all files (and symlinks) generated in the course of execution.
-Have a look inside; you'll find a subdirectory named with a hash (in order to make it unique; we'll discuss why in a bit), nested two levels deep and containing a handful of log files.
-
-!!! tip
-
- If you browse the contents of the task subdirectory in the Gitpod's VSCode file explorer, you'll see all these files right away.
- However, these files are set to be invisible in the terminal, so if you want to use `ls` or `tree` to view them, you'll need to set the relevant option for displaying invisible files.
-
- ```bash
- tree -a work
- ```
-
- You should see something like this, though the exact subdirectory names will be different on your system.
-
- ```console title="Directory contents"
- work
- └── 1c
- └── 7d08e685a7aa7060b9c21667924824
- ├── .command.begin
- ├── .command.err
- ├── .command.log
- ├── .command.out
- ├── .command.run
- ├── .command.sh
- └── .exitcode
- ```
-
-You may have noticed that the subdirectory names appeared (in truncated form) in the output from the workflow run, in the line that says:
-
-```console title="Output"
-[1c/7d08e6] sayHello [100%] 1 of 1 ✔
-```
-
-This tells you what is the subdirectory path for that specific process call (sometimes called task).
-
-!!! note
-
- Nextflow creates a separate unique subdirectory for each process call.
- It stages the relevant input files, script, and other helper files there, and writes any output files and logs there as well.
-
-If we look inside the subdirectory, we find the following log files:
-
-- **`.command.begin`**: Metadata related to the beginning of the execution of the process task
-- **`.command.err`**: Error messages (stderr) emitted by the process task
-- **`.command.log`**: Complete log output emitted by the process task
-- **`.command.out`**: Regular output (stdout) by the process task
-- **`.command.sh`**: The command that was run by the process task call
-- **`.exitcode`**: The exit code resulting from the command
-
-In this case, you can look for your output in the `.command.out` file, since that's where stdout output is captured.
-If you open it, you'll find the `Hello World!` greeting, which was the expected result of our minimalist workflow.
-
-It's also worth having a look at the `.command.sh` file, which tells you what command Nextflow actually executed. In this case it's very straightforward, but later in the course you'll see commands that involve some interpolation of variables. When you're dealing with that, you need to be able to check exactly what was run, especially when troubleshooting an issue.
-
-### Takeaway
-
-You know how to decipher a simple Nextflow script, run it and find the output and logs in the work directory.
-
-### What's next?
-
-Learn how to make the script output a named file.
-
----
-
-## 3. Send the output to a file
-
-Instead of printing "Hello World!" to standard output, we'd prefer to save that output to a specific file, just like we did when running in the terminal earlier.
-This is how most tools that you'll run as part of real-world pipelines typically behave; we'll see examples of that later.
-
-To achieve this result, both the script and the output definition blocks need to be updated.
-
-### 3.1. Change the process command to output a named file
-
-This is the same change we made when we ran the command directly in the terminal earlier.
-
-_Before:_
-
-```groovy title="hello-world.nf" linenums="11"
-"""
-echo 'Hello World!'
-"""
-```
-
-_After:_
-
-```groovy title="hello-world.nf" linenums="11"
-"""
-echo 'Hello World!' > output.txt
-"""
-```
-
-### 3.2. Change the output declaration in the `sayHello` process
-
-We need to tell Nextflow that it should now look for a specific file to be produced by the process execution.
-
-_Before:_
-
-```groovy title="hello-world.nf" linenums="8"
-output:
- stdout
-```
-
-_After:_
-
-```groovy title="hello-world.nf" linenums="8"
-output:
- path 'output.txt'
-```
-
-!!! note
-
- Inputs and outputs in the process blocks typically require a qualifier and a variable name:
-
- ```
-
- ```
-
- The qualifier defines the type of data to be received.
- This information is used by Nextflow to apply the semantic rules associated with each qualifier, and handle it properly.
- Common qualifiers include `val` and `path`.
- In the example above, `stdout` is an exception since it is not associated with a name.
-
-### 3.3. Run the workflow again
-
-```bash
-nextflow run hello-world.nf
-```
-
-The log output should be very similar to the first time your ran the workflow:
-
-```console title="Output"
- N E X T F L O W ~ version 24.10.0
-
- ┃ Launching `hello-world.nf` [cranky_sinoussi] DSL2 - revision: 30b437bb96
-
-executor > local (1)
-[7a/6bd54c] sayHello [100%] 1 of 1 ✔
-```
-
-Like you did before, find the `work` directory in the file explorer.
-There, find the `output.txt` output file and click on it to open it, and verify that it contains the greeting as expected.
-
-!!! warning
-
- This example is brittle because we hardcoded the output filename in two separate places (the script and the output blocks).
- If we change one but not the other, the script will break.
- Later, you'll learn how to use variables to avoid this problem.
-
-### 3.4. Add a `publishDir` directive to the process
-
-You'll have noticed that the output is buried in a working directory several layers deep.
-Nextflow is in control of this directory and we are not supposed to interact with it.
-To make the output file more accessible, we can utilize the `publishDir` directive.
-By specifying this directive, we are telling Nextflow to automatically copy the output file to a designated output directory.
-This allows us to leave the working directory alone, while still having easy access to the desired output file.
-
-_Before:_
-
-```groovy title="hello-world.nf" linenums="6"
-process sayHello {
-
- output:
- path 'output.txt'
-```
-
-_After:_
-
-```groovy title="hello-world.nf" linenums="6"
-process sayHello {
-
- publishDir 'results', mode: 'copy'
-
- output:
- path 'output.txt'
-```
-
-!!! note
-
- There is a newer syntax option that makes it possible to declare and publish workflow-level outputs, documented [here](https://www.nextflow.io/docs/latest/workflow.html#publishing-outputs), which makes using `publishDir` at the process level redundant once your pipeline is fully operational.
- However, `publishDir` is still very useful during pipeline development; that is why we include it in this training series.
- This will also ensure that you can read and understand the large number of pipelines that have already been written with `publishDir`.
-
- You'll learn how to use the workflow-level outputs syntax later in this training series.
-
-### 3.5. Run the workflow again
-
-```bash
-nextflow run hello-world.nf
-```
-
-The log output should start looking very familiar:
-
-```console title="Output"
- N E X T F L O W ~ version 24.10.0
-
- ┃ Launching `hello-world.nf` [mighty_lovelace] DSL2 - revision: 6654bc1327
-
-executor > local (1)
-[10/15498d] sayHello [100%] 1 of 1 ✔
-```
-
-This time, Nextflow will have created a new directory called `results/`.
-In this directory is our `output.txt` file.
-If you check the contents it should match the output in our work/task directory.
-This is how we move results files outside of the working directories.
-
-### Takeaway
-
-You know how to send outputs to a specific named file and use the `publishDir` directive to move files outside of the Nextflow working directory.
-
-### What's next?
-
-Learn how to make Nextflow resume running a pipeline using cached results from a prior run to skip any steps it had already completed successfully.
-
----
-
-## 4. Use the Nextflow resume feature
-
-Nextflow has an option called `-resume` that allows you to re-run a pipeline you've already launched previously.
-When launched with `-resume` any processes that have already been run with the exact same code, settings and inputs will be skipped.
-Using this mode means Nextflow will only run processes that are either new, have been modified or are being provided new settings or inputs.
-
-There are two key advantages to doing this:
-
-- If you're in the middle of developing your pipeline, you can iterate more rapidly since you only effectively have to run the process(es) you're actively working on in order to test your changes.
-- If you're running a pipeline in production and something goes wrong, in many cases you can fix the issue and relaunch the pipeline, and it will resume running from the point of failure, which can save you a lot of time and compute.
-
-### 4.1. Run the workflow again with `-resume`
-
-```bash
-nextflow run hello-world.nf -resume
-```
-
-The console output should look similar.
-
-```console title="Output"
- N E X T F L O W ~ version 24.10.0
-
- ┃ Launching `hello-world.nf` [thirsty_gautier] DSL2 - revision: 6654bc1327
-
-[10/15498d] sayHello [100%] 1 of 1, cached: 1 ✔
-```
-
-Notice the additional `cached:` bit in the process status line, which means that Nextflow has recognized that it has already done this work and simply re-used the result from the last run.
-
-!!! note
-
- When your re-run a pipeline with `resume`, Nextflow does not overwrite any files written to a publishDir directory by any process call that was previously run successfully.
-
-### Takeaway
-
-You know how to to relaunch a pipeline without repeating steps that were already run in an identical way.
-
-### What's next?
-
-Learn how to add in variable inputs.
-
----
-
-## 5. Add in variable inputs using a channel
-
-So far, we've been emitting a greeting hardcoded into the process command.
-Now we're going to add some flexibility by using an input variable, so that we can easily change the greeting.
-
-This requires us to make a series of inter-related changes:
-
-1. Tell the process about expected variable inputs using the `input:` block
-2. Edit the process to use the input
-3. Create a **channel** to pass input to the process (more on that in a minute)
-4. Add the channel as input to the process call
-
-### 5.1. Add an input definition to the process block
-
-First we need to adapt the process definition to accept an input.
-
-_Before:_
-
-```groovy title="hello-world.nf" linenums="6"
-process sayHello {
-
- publishDir 'results', mode: 'copy'
-
- output:
- path "output.txt"
-```
-
-_After:_
-
-```groovy title="hello-world.nf" linenums="6"
-process sayHello {
-
- publishDir 'results', mode: 'copy'
-
- input:
- val greeting
-
- output:
- path "output.txt"
-```
-
-### 5.2. Edit the process command to use the input variable
-
-Now we swap the original hardcoded value for the input variable.
-
-_Before:_
-
-```groovy title="hello-world.nf" linenums="16"
-"""
-echo 'Hello World!' > output.txt
-"""
-```
-
-_After:_
-
-```groovy title="hello-world.nf" linenums="16"
-"""
-echo '$greeting' > output.txt
-"""
-```
-
-### 5.3. Create an input channel
-
-Now that our process expects an input, we need to set up that input in the workflow body.
-This is where channels come in: Nextflow uses channels to feed inputs to processes and ferry data between processes that are connected together.
-
-There are multiple ways to do this, but for now, we're just going to use the simplest possible channel, containing a single value.
-
-We're going to create the channel using the `of()` channel factory, which sets up a simple value channel, and give it a hardcoded string to use as greeting by declaring `greeting_ch = Channel.of('Hello world!')`.
-
-_Before:_
-
-```groovy title="hello-world.nf" linenums="21"
-workflow {
-
- // emit a greeting
- sayHello()
-}
-```
-
-_After:_
-
-```groovy title="hello-world.nf" linenums="21"
-workflow {
-
- // create a channel for inputs
- greeting_ch = Channel.of('Hello world!')
-
- // emit a greeting
- sayHello()
-}
-```
-
-### 5.4. Add the channel as input to the process call
-
-Now we need to actually plug our newly created channel into the `sayHello()` process call.
-
-_Before:_
-
-```groovy title="hello-world.nf" linenums="26"
-// emit a greeting
-sayHello()
-```
-
-_After:_
-
-```groovy title="hello-world.nf" linenums="26"
-// emit a greeting
-sayHello(greeting_ch)
-```
-
-### 5.5. Run the workflow command again
-
-Let's run it!
-
-```bash
-nextflow run hello-world.nf
-```
-
-If you made all four edits correctly, you should get another successful execution:
-
-```console title="Output"
- N E X T F L O W ~ version 24.10.0
-
- ┃ Launching `hello-world.nf` [prickly_avogadro] DSL2 - revision: b58b6ab94b
-
-executor > local (1)
-[1f/50efd5] sayHello (1) [100%] 1 of 1 ✔
-```
-
-Feel free to check the results directory to satisfy yourself that the outcome is still the same as previously; so far we're just progressively tweaking the internal plumbing to increase the flexibility of our workflow while achieving the same end result.
-
-### Takeaway
-
-You know how to use a simple channel to provide an input to a process.
-
-### What's next?
-
-Learn how to pass inputs from the command line.
-
----
-
-## 6. Use CLI parameters for inputs
-
-We want to be able to specify the input from the command line, since that is the piece that will almost always be different in subsequent runs of the workflow.
-Good news: Nextflow has a built-in workflow parameter system called `params`, which makes it easy to declare and use CLI parameters.
-
-### 6.1. Edit the input channel declaration to use a parameter
-
-Here we swap out the hardcoded string for `params.greeting` in the channel creation line.
-
-_Before:_
-
-```groovy title="hello-world.nf" linenums="23"
-// create a channel for inputs
-greeting_ch = Channel.of('Hello world!')
-```
-
-_After:_
-
-```groovy title="hello-world.nf" linenums="23"
-// create a channel for inputs
-greeting_ch = Channel.of(params.greeting)
-```
-
-This automatically creates a parameter called `greeting` that you can use to provide a value in the command line.
-
-### 6.2. Run the workflow again with the `--greeting` parameter
-
-To provide a value for this parameter, simply add `--greeting ` to your command line.
-
-```bash
-nextflow run hello-world.nf --greeting 'Bonjour le monde!'
-```
-
-Running this should feel extremely familiar by now.
-
-```console title="Output"
- N E X T F L O W ~ version 24.10.0
-
- ┃ Launching `hello-world.nf` [cheesy_engelbart] DSL2 - revision: b58b6ab94b
-
-executor > local (1)
-[1c/9b6dc9] sayHello (1) [100%] 1 of 1 ✔
-```
-
-Be sure to open up the output file to check that you now have the new version of the greeting. Voilà!
-
-!!! tip
-
- It's helpful to distinguish Nextflow-level parameters from pipeline-level parameters.
- For parameters that apply to a pipeline, we use a double hyphen (`--`), whereas we use a single hyphen (`-`) for parameters that modify a specific Nextflow setting, _e.g._ the `-resume` feature we used earlier.
-
-### 6.3. Set a default value for a command line parameter
-
-In many cases, it makes sense to supply a default value for a given parameter so that you don't have to specify it for every run.
-
-Let's initialize the `greeting` parameter with a default value by adding the parameter declaration at the top of the script (with a comment block as a free bonus).
-
-```groovy title="hello-world.nf" linenums="3"
-/*
- * Pipeline parameters
- */
-params.greeting = "Holà mundo!"
-```
-
-### 6.4. Run the workflow again without specifying the parameter
-
-Now that you have a default value set, you can run the workflow again without having to specify a value in the command line.
-
-```bash
-nextflow run hello-world.nf
-```
-
-The output should look the same.
-
-```console title="Output"
- N E X T F L O W ~ version 24.10.0
-
- ┃ Launching `hello-world.nf` [wise_waddington] DSL2 - revision: 988fc779cf
-
-executor > local (1)
-[c0/8b8332] sayHello (1) [100%] 1 of 1 ✔
-```
-
-Check the output in the results directory, and... Tadaa! It works! Nextflow used the default value to name the output. But wait, what happens now if we provide the parameter in the command line?
-
-### 6.5. Run the workflow again with the `--greeting` parameter on the command line using a different greeting
-
-```bash
-nextflow run hello-world.nf --greeting 'Konnichiwa!'
-```
-
-Nextflow's not complaining, that's a good sign:
-
-```console title="Output"
- N E X T F L O W ~ version 24.10.0
-
- ┃ Launching `hello-world.nf` [prickly_miescher] DSL2 - revision: 988fc779cf
-
-executor > local (1)
-[56/f88a56] sayHello (1) [100%] 1 of 1 ✔
-```
-
-Check the results directory and look at the contents of `output.txt`. Tadaa again!
-
-The value of the parameter we passed on the command line overrode the value we gave the variable in the script. In fact, parameters can be set in several different ways; if the same parameter is set in multiple places, its value is determined based on the order of precedence that is described [here](https://www.nextflow.io/docs/latest/config.html).
-
-!!! tip
-
- You can put the parameter declaration inside the workflow block if you prefer. Whatever you choose, try to group similar things in the same place so you don't end up with declarations all over the place.
-
-### Takeaway
-
-You know how to set up an input variable for a process and supply a value in the command line.
-
-### What's next?
-
-Learn how to add in a second process and chain them together.
-
----
-
-## 7. Add a second step to the workflow
-
-Most real-world workflows involve more than one step. Here we introduce a second process that converts the text to uppercase (all-caps), using the classic UNIX one-liner:
-
-```bash
-tr '[a-z]' '[A-Z]'
-```
-
-We're going to run the command by itself in the terminal first to verify that it works as expected without any of the workflow code getting in the way of clarity, just like we did at the start with `echo 'Hello World'`. Then we'll write a process that does the same thing, and finally we'll connect the two processes so the output of the first serves as input to the second.
-
-### 7.1. Run the command in the terminal by itself
-
-```bash
-echo 'Hello World' | tr '[a-z]' '[A-Z]'
-```
-
-The output is simply the uppercase version of the text string:
-
-```console title="Output"
-HELLO WORLD
-```
-
-!!! note
-
- This is a very naive text replacement one-liner that does not account for accented letters, so for example 'Holà' will become 'HOLà'. This is expected.
-
-### 7.2. Make the command take a file as input and write the output to a file
-
-As previously, we want to output results to a dedicated file, which we name by prepending the original filename with `UPPER-`.
-
-```bash
-cat output.txt | tr '[a-z]' '[A-Z]' > UPPER-output.txt
-```
-
-Now the `HELLO WORLD` output is in the new output file, `UPPER-output.txt`.
-
-### 7.3. Wrap the command in a new Nextflow process definition
-
-We can model our new process on the first one, since we want to use all the same components.
-
-```groovy title="hello-world.nf" linenums="26"
-/*
- * Use a text replace utility to convert the greeting to uppercase
- */
-process convertToUpper {
-
- publishDir 'results', mode: 'copy'
-
- input:
- path input_file
-
- output:
- path "UPPER-${input_file}"
-
- script:
- """
- cat '$input_file' | tr '[a-z]' '[A-Z]' > UPPER-${input_file}
- """
-}
-```
-
-As a little bonus, here we composed the second output filename based on the first one.
-
-!!! tip
-
- Very important to remember: you have to use double quotes around the output filename expression (NOT single quotes) or it will fail.
-
-### 7.4. Add a call to the new process in the workflow body
-
-Don't forget we need to tell Nextflow to actually call the process we just created! To do that, we add it to the `workflow` body.
-
-```groovy title="hello-world.nf" linenums="44"
-workflow {
-
- // create a channel for inputs
- greeting_ch = Channel.of(params.greeting)
-
- // emit a greeting
- sayHello(greeting_ch)
-
- // convert the greeting to uppercase
- convertToUpper()
-}
-```
-
-Looking good! But we still need to wire up the `convertToUpper` process call to run on the output of `sayHello`.
-
-### 7.5. Pass the output of the first process to the second process
-
-The output of the `sayHello` process is automatically packaged as a channel called `sayHello.out`, so all we need to do is pass that as the input to the `convertToUpper` process.
-
-```groovy title="hello-world.nf" linenums="52"
-// convert the greeting to uppercase
-convertToUpper(sayHello.out)
-```
-
-For a simple case like this, that's all we need to do to connect two processes!
-
-### 7.6. Run the same workflow command as before
-
-Let's make sure this works:
-
-```bash
-nextflow run hello-world.nf --greeting 'Hello World!'
-```
-
-Oh, how exciting! There is now an extra line in the log output, which corresponds to the new process we just added:
-
-```console title="Output"
- N E X T F L O W ~ version 24.10.0
-
- ┃ Launching `hello-world.nf` [magical_brenner] DSL2 - revision: 0e18f34798
-
-executor > local (2)
-[57/3836c0] sayHello (1) [100%] 1 of 1 ✔
-[ee/bb3cc8] convertToUpper (1) [100%] 1 of 1 ✔
-```
-
-You'll notice that this time the workflow produced two new work subdirectories; one per process call.
-Check out the work directory of the call to the second process, where you should find two different output files listed. If you look carefully, you'll notice one of them (the output of the first process) has a little arrow icon on the right; that signifies it's a symbolic link.
-It points to the location where that file lives in the work directory of the first process.
-By default, Nextflow uses symbolic links to stage input files whenever possible, to avoid making duplicate copies.
-
-!!! note
-
- All we did was connect the output of `sayHello` to the input of `convertToUpper` and the two processes could be run in serial.
- Nextflow did the hard work of handling input and output files and passing them between the two commands for us.
- This is the power of channels in Nextflow, doing the busywork of connecting our pipeline steps together.
-
- What's more, Nextflow will automatically determine which call needs to be executed first based on how they're connected, so the order in which they're written in the workflow body does not matter.
- However, we do recommend you be kind to your collaborators and to your future self, and try to write them in a logical order!
-
-### Takeaway
-
-You know how to add a second step that takes the output of the first step as input.
-
-### What's next?
-
-Learn how to make the workflow run on a batch of input values.
-
----
-
-## 8. Modify the workflow to run on a batch of input values
-
-Workflows typically run on batches of inputs that are meant to be processed in bulk, so we want to upgrade the workflow to accept multiple input values.
-
-Conveniently, the `of()` channel factory we've been using is quite happy to accept more than one value, so we don't need to modify that at all; we just have to load more values into the channel.
-
-### 8.1. Load multiple greetings into the input channel
-
-To keep things simple, we go back to hardcoding the greetings in the channel factory instead of using a parameter for the input, but we'll improve on that shortly.
-
-_Before:_
-
-```groovy title="hello-world.nf" linenums="46"
-// create a channel for inputs
-greeting_ch = Channel.of(params.greeting)
-```
-
-_After:_
-
-```groovy title="hello-world.nf" linenums="46"
-// create a channel for inputs
-greeting_ch = Channel.of('Hello','Bonjour','Holà')
-```
-
-The documentation tells us this should work. Can it really be so simple?
-
-### 8.2. Run the command and look at the log output
-
-Let's try it.
-
-```bash
-nextflow run hello-world.nf
-```
-
-Well, it certainly seems to run just fine.
-
-```console title="Output"
- N E X T F L O W ~ version 24.10.0
-
- ┃ Launching `hello-world.nf` [lonely_pare] DSL2 - revision: b9f1d96905
-
-executor > local (6)
-[3d/1fe62c] sayHello (2) [100%] 3 of 3 ✔
-[86/695813] convertToUpper (3) [100%] 3 of 3 ✔
-```
-
-However... This seems to indicate that '3 of 3' calls were made for each process, which is encouraging, but this only give us one subdirectory path for each. What's going on?
-
-By default, the ANSI logging system writes the logging from multiple calls to the same process on the same line. Fortunately, we can disable that behavior.
-
-### 8.3. Run the command again with the `-ansi-log false` option
-
-To expand the logging to display one line per process call, just add `-ansi-log false` to the command.
-
-```bash
-nextflow run hello-world.nf -ansi-log false
-```
-
-This time we see all six work subdirectories listed in the output:
-
-```console title="Output"
-N E X T F L O W ~ version 24.02.0-edge
-Launching `hello-world.nf` [big_woese] DSL2 - revision: 53f20aeb70
-[62/d81e63] Submitted process > sayHello (1)
-[19/507af3] Submitted process > sayHello (2)
-[8a/3126e6] Submitted process > sayHello (3)
-[12/48a5c6] Submitted process > convertToUpper (1)
-[73/e6e746] Submitted process > convertToUpper (2)
-[c5/4fedda] Submitted process > convertToUpper (3)
-```
-
-That's much better; at least for this number of processes.
-For a complex workflow, or a large number of inputs, having the full list output to the terminal might get a bit overwhelming.
-
-That being said, we have another problem. If you look in the `results` directory, there are only two files: `output.txt` and `UPPER-output.txt`!
-
-```console title="Directory contents"
-results
-├── output.txt
-└── UPPER-output.txt
-```
-
-What's up with that? Shouldn't we be expecting two files per input greeting, so six files in all?
-
-You may recall that we hardcoded the output file name for the first process.
-This was fine as long as there was only a single call made per process, but when we start processing multiple input values and publishing the outputs into the same directory of results, it becomes a problem.
-For a given process, every call produces an output with the same file name, so Nextflow just overwrites the previous output file every time a new one is produced.
-
-### 8.4. Ensure the output file names will be unique
-
-Since we're going to be publishing all the outputs to the same results directory, we need to ensure they will have unique names.
-Specifically, we need to modify the first process to generate a file name dynamically so that the final file names will be unique.
-
-So how do we make the file names unique? A common way to do that is to use some unique piece of metadata as part of the file name.
-Here, for convenience, we'll just use the greeting itself.
-
-_Before:_
-
-```groovy title="hello-world.nf" linenums="11"
-process sayHello {
-
- publishDir 'results', mode: 'copy'
-
- input:
- val greeting
-
- output:
- path "output.txt"
-
- script:
- """
- echo '$greeting' > "output.txt"
- """
-}
-```
-
-_After:_
-
-```groovy title="hello-world.nf" linenums="11"
-process sayHello {
-
- publishDir 'results', mode: 'copy'
-
- input:
- val greeting
-
- output:
- path "${greeting}-output.txt"
-
- script:
- """
- echo '$greeting' > '$greeting-output.txt'
- """
-}
-```
-
-This should produce a unique output file name for every call of each process.
-
-### 8.5. Run the workflow and look at the results directory
-
-Let's run it and check that it works.
-
-```bash
-nextflow run hello-world.nf
-```
-
-Reverting back to the summary view, the output looks like this again:
-
-```console title="Output"
- N E X T F L O W ~ version 24.10.0
-
- ┃ Launching `hello-world.nf` [jovial_mccarthy] DSL2 - revision: 53f20aeb70
-
-executor > local (6)
-[03/f007f2] sayHello (1) [100%] 3 of 3 ✔
-[e5/dd2890] convertToUpper (3) [100%] 3 of 3 ✔
-```
-
-But more importantly, now we have six new files in addition to the two we already had in the `results` directory:
-
-```console title="Directory contents"
-results
-├── Bonjour-output.txt
-├── Hello-output.txt
-├── Holà-output.txt
-├── output.txt
-├── UPPER-Bonjour-output.txt
-├── UPPER-Hello-output.txt
-├── UPPER-Holà-output.txt
-└── UPPER-output.txt
-```
-
-Success! Now we can add as many greetings as we like without worrying about output files being overwritten.
-
-!!! note
-
- In practice, naming files based on the input data itself is almost always impractical. The better way to generate dynamic filenames is to use a samplesheet contain relevant metadata (such as unique sample IDs) and create a data structure called a 'map', which we pass to processes, and from which we can grab an appropriate identifier to generate the filenames.
- We'll show you how to do that later in this training course.
-
-### Takeaway
-
-You know how to feed a batch of multiple input elements through a channel.
-
-### What's next?
-
-Learn how to make the workflow take a file as its source of input values.
-
----
-
-## 9. Modify the workflow to take a file as its source of input values
-
-It's often the case that, when we want to run on a batch of multiple input elements, the input values are contained in a file.
-As an example, we have provided you with a CSV file called `greetings.csv` in the `data/` directory, containing several greetings separated by commas.
-
-```csv title="greetings.csv"
-Hello,Bonjour,Holà
-```
-
-So we just need to modify our workflow to read in the values from a file like that.
-
-### 9.1. Set up a CLI parameter with a default value pointing to an input file
-
-First, let's use the `params` system to set up a new parameter called `input_file`, replacing the now useless `greeting` parameter, with a default value pointing to the `greetings.csv` file.
-
-_Before:_
-
-```groovy title="hello-world.nf" linenums="6"
-/*
- * Pipeline parameters
- */
-params.greeting = "Holà mundo!"
-```
-
-_After:_
-
-```groovy title="hello-world.nf" linenums="6"
-/*
- * Pipeline parameters
- */
-params.input_file = "data/greetings.csv"
-```
-
-### 9.2. Update the channel declaration to handle the input file
-
-At this point we introduce a new channel factory, `fromPath()`, which has some built-in functionality for handling file paths.
-We're going to use that instead of the `of()` channel factory we used previously; the base syntax looks like this:
-
-```groovy title="channel construction syntax"
-Channel.fromPath(params.input_file)
-```
-
-Now, we are going to deploy a new concept, an 'operator' to transform that CSV file into channel content. You'll learn more about operators later, but for now just understand them as ways of transforming channels in a variety of ways.
-
-Since our goal is to read in the contents of a `.csv` file, we're going to add the `.splitCsv()` operator to make Nextflow parse the file contents accordingly, as well as the `.flatten()` operator to turn the array element produced by `.splitCsv()` into a channel of individual elements.
-
-So the channel construction instruction becomes:
-
-```groovy title="channel construction syntax"
-Channel.fromPath(params.input_file)
- .splitCsv()
- .flatten()
-```
-
-And here it is in the context of the workflow body:
-
-_Before:_
-
-```groovy title="hello-world.nf" linenums="46"
-// create a channel for inputs
-greeting_ch = Channel.of('Hello','Bonjour','Holà')
-```
-
-_After:_
-
-```groovy title="hello-world.nf" linenums="46"
-// create a channel for inputs from a CSV file
-greeting_ch = Channel.fromPath(params.input_file)
- .splitCsv()
- .flatten()
-```
-
-If you want to see the impact of `.flatten()`, we can make use of `.view()`, another operator, to demonstrate. Edit that section of code so it looks like:
-
-```groovy title="flatten usage"
-// create a channel for inputs from a CSV file
-greeting_ch = Channel.fromPath(params.input_file)
- .splitCsv()
- .view{ "After splitCsv: $it" }
- .flatten()
- .view{ "After flatten: $it" }
-```
-
-When you run this updated workflow, you'll see the difference:
-
-```console title="view output with and without flatten"
-After splitCsv: [Hello, Bonjour, Holà]
-After flatten: Hello
-After flatten: Bonjour
-After flatten: Holà
-[d3/1a6e23] Submitted process > sayHello (3)
-[8f/d9e431] Submitted process > sayHello (1)
-[e7/a088af] Submitted process > sayHello (2)
-[1a/776e2e] Submitted process > convertToUpper (1)
-[83/fb8eba] Submitted process > convertToUpper (2)
-[ee/280f93] Submitted process > convertToUpper (3)
-```
-
-As you can see, the `flatten()` operator has transformed the channel from containing arrays to containing individual elements. This can be useful when you want to process each item separately in your workflow.
-
-Remove the `.view()` operations before you continue.
-
-!!! tip
-
- While you're developing your pipeline, you can inspect the contents of any channel by adding the `.view()` operator to the name of the channel.
- For example, if you add `greeting_ch.view()` anywhere in the workflow body, when you run the script, Nextflow will print the channel contents to standard out.
-
- You can also use this to inspect the effect of the operators.
- For example, the output of `Channel.fromPath(params.input_file).splitCsv().view()` will look like this:
-
- ```console title="Output"
- [Hello, Bonjour, Holà]
- ```
-
- While the output of `Channel.fromPath(params.input_file).splitCsv().flatten().view()` will look like this:
-
- ```console title="Output"
- Hello
- Bonjour
- Holà
- ```
-
-### 9.3. Run the workflow (one last time!)
-
-```bash
-nextflow run hello-world.nf
-```
-
-Once again we see each process get executed three times:
-
-```console title="Output"
- N E X T F L O W ~ version 24.10.0
-
- ┃ Launching `hello-world.nf` [angry_spence] DSL2 - revision: d171cc0193
-
-executor > local (6)
-[0e/ceb175] sayHello (2) [100%] 3 of 3 ✔
-[01/046714] convertToUpper (3) [100%] 3 of 3 ✔
-```
-
-Looking at the outputs, we see each greeting was correctly extracted and processed through the workflow. We've achieved the same result as the previous step, but now we have a lot more flexibility to add more elements to the channel of greetings we want to process.
-
-### Takeaway
-
-You know how to provide the input values to the workflow via a file.
-
-More generally, you've learned how to use the essential components of Nextflow and you have a basic grasp of the logic of how to build a workflow and manage inputs and outputs.
-
-### What's next?
-
-Celebrate your success and take a break!
-
-Don't worry if the channel types and operators feel like a lot to grapple with the first time you encounter them.
-You'll get more opportunities to practice using these components in various settings as you work through this training course.
-
-When you're ready, move on to Part 2 to learn about another important concept: provisioning the software required for each process.
diff --git a/docs/hello_nextflow/03_hello_containers.md b/docs/hello_nextflow/03_hello_containers.md
deleted file mode 100644
index ec975b17a..000000000
--- a/docs/hello_nextflow/03_hello_containers.md
+++ /dev/null
@@ -1,471 +0,0 @@
-# Part 2: Hello Containers
-
-In Part 1, you learned how to use the basic building blocks of Nextflow to assemble a simple pipeline capable of processing some text and parallelizing execution if there were multiple inputs.
-
-However, you were limited to basic UNIX tools available in your environment.
-Real-world tasks often require various tools and packages not included by default.
-Typically, you'd need to install these tools, manage their dependencies, and resolve any conflicts.
-
-That is all very tedious and annoying, so we're going to show you how to use **containers** to solve this problem much more conveniently.
-
-!!! note
-
- We'll be teaching this using the technology [Docker](https://www.docker.com/get-started/), but Nextflow supports [several other container technologies](https://www.nextflow.io/docs/latest/container.html#) as well.
-
----
-
-## 1. Use a container directly
-
-A **container** is a lightweight, standalone, executable unit of software created from a container **image** that includes everything needed to run an application including code, system libraries and settings.
-To use a container you usually download or "pull" a container image from a container registry, and then run the container image to create a container instance.
-
-### 1.1. Pull the container image
-
-Let's pull a container image that contains the `cowsay` command so we can use it to display some text in a fun way.
-
-```bash
-docker pull 'community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65'
-```
-
-### 1.2. Use the container to execute a single command
-
-The `docker run` command is used to spin up a container instance from a container image and execute a command in it.
-The `--rm` flag tells Docker to remove the container instance after the command has completed.
-
-```bash
-docker run --rm 'community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65' cowsay -t "Hello World"
-```
-
-```console title="Output"
- _____________
-< Hello World >
- -------------
- \ ^__^
- \ (oo)\_______
- (__)\ )\/\
- ||----w |
- || ||
-```
-
-### 1.3. Spin up the container interactively
-
-You can also run a container interactively, which will give you a shell prompt inside the container.
-
-```bash
-docker run --rm -it 'community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65' /bin/bash
-```
-
-Notice that the prompt has changed to `(base) root@b645838b3314:/tmp#`, which indicates that you are now inside the container.
-If we run:
-
-```console title="Output"
-(base) root@b645838b3314:/tmp# ls /
-bin dev etc home lib media mnt opt proc root run sbin srv sys tmp usr var
-```
-
-You can see that the filesystem inside the container is different from the filesystem on your host system.
-
-### 1.4. Run the command
-
-Now that you are inside the container, you can run the `cowsay` command directly.
-
-!!! tip
-
-    Use the `-c` flag to pick a different "cow" from this list:
- `beavis`, `cheese`, `cow`, `daemon`, `dragon`, `fox`, `ghostbusters`, `kitty`, `meow`, `miki`, `milk`, `octopus`, `pig`, `stegosaurus`, `stimpy`, `trex`, `turkey`, `turtle`, `tux`
-
-```bash
-cowsay -t "Hello World" -c tux
-```
-
-Output:
-
-```console title="Output"
- ___________
-| Hello World |
- ===========
- \
- \
- \
- .--.
- |o_o |
- |:_/ |
- // \ \
- (| | )
- /'\_ _/`\
- \___)=(___/
-```
-
-### 1.5. Exit the container
-
-To exit the container, you can type `exit` at the prompt or use the ++ctrl+d++ keyboard shortcut.
-
-```bash
-exit
-```
-
-Your prompt should now be back to what it was before you started the container.
-
-### 1.6. Mounting data into containers
-
-When you run a container, it is isolated from the host system by default.
-This means that the container can't access any files on the host system unless you explicitly tell it to.
-One way to do this is to **mount** a **volume** from the host system into the container.
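-
-The general shape of the mount flag is as follows (the angle-bracketed names are generic placeholders, not literal values):
-
-```bash title="Syntax"
-docker run -v <path_on_host>:<path_in_container> <container_image_URI> <command>
-```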
-
-Prior to working on the next task, confirm that you are in the `hello-nextflow` directory. The last part of the path shown when you type `pwd` should be `hello-nextflow`.
-
-Then run:
-
-```bash
-docker run --rm -it -v $(pwd)/containers/data:/data 'community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65' /bin/bash
-```
-
-Let's explore the contents of the container.
-Note that we need to navigate to the `/data` directory inside the container to see the contents of the `data` directory on the host system.
-
-```console title="Output"
-(base) root@08dd2d3efbd4:/tmp# ls
-conda.yml environment.lock
-(base) root@08dd2d3efbd4:/tmp# cd /data
-(base) root@08dd2d3efbd4:/data# ls
-greetings.csv pioneers.csv
-```
-
-### 1.7. Use the mounted data
-
-Now that we have mounted the `data` directory into the container, we can use the `cowsay` command to display the contents of the `greetings.csv` file.
-To do this we'll use the syntax `-t "$(cat /data/greetings.csv)"` to feed the contents of the file into the `cowsay` command.
-
-```bash
-cowsay -t "$(cat /data/greetings.csv)" -c pig
-```
-
-Output:
-
-```console title="Output"
- __________________
-| Hello,Bonjour,Holà |
- ==================
- \
- \
- \
- \
- ,.
- (_|,.
- ,' /, )_______ _
- __j o``-' `.'-)'
- (") \'
- `-j |
- `-._( /
- |_\ |--^. /
- /_]'|_| /_)_/
- /_]' /_]'
-```
-
-Now exit the container once again:
-
-```bash
-exit
-```
-
-### Takeaway
-
-You know how to pull a container image, run it interactively, and make your data accessible inside it, which lets you try out commands without having to install any software on your system.
-
-### What's next?
-
-Learn how to get a container image for any pip/conda-installable tool.
-
----
-
-## 2. Use containers in Nextflow
-
-Nextflow has built-in support for running processes inside containers to let you run tools you don't have installed in your compute environment.
-This means that you can use any container image you like to run your processes, and Nextflow will take care of pulling the image, mounting the data, and running the process inside it.
-
-### 2.1. Add a container directive to your process
-
-Edit the `hello-containers.nf` script to add a `container` directive to the `cowsay` process.
-
-_Before:_
-
-```groovy title="hello-containers.nf"
-process cowSay {
-
- publishDir 'containers/results', mode: 'copy'
-```
-
-_After:_
-
-```groovy title="hello-containers.nf"
-process cowSay {
-
- publishDir 'containers/results', mode: 'copy'
- container 'community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65'
-```
-
-### 2.2. Run Nextflow pipelines using containers
-
-Run the script to see the container in action.
-
-```bash
-nextflow run hello-containers.nf
-```
-
-!!! note
-
- The `nextflow.config` in our current working directory contains `docker.enabled = true`, which tells Nextflow to use Docker to run processes.
- Without that configuration we would have to specify the `-with-docker` flag when running the script.
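-
-    In other words, the config file contains a line like this minimal sketch:
-
-    ```groovy title="nextflow.config"
-    docker.enabled = true
-    ```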
-
-### 2.3. Check the results
-
-You should see a new directory called `containers/results` that contains the output of the `cowsay` process.
-
-```console title="containers/results/cowsay-output-Bonjour.txt"
- _______
-| Bonjour |
- =======
- \
- \
- ^__^
- (oo)\_______
- (__)\ )\/\
- ||----w |
- || ||
-```
-
-### 2.4. Explore how Nextflow launched the containerized task
-
-Let's take a look at the task directory for one of the cowsay tasks to see how Nextflow works with containers under the hood.
-
-Check the output from your `nextflow run` command to find the task ID for the `cowsay` process.
-Then check out the task directory for that task.
-
-```bash
-tree -a work/8c/738ac55b80e7b6170aa84a68412454
-work/8c/738ac55b80e7b6170aa84a68412454
-├── .command.begin
-├── .command.err
-├── .command.log
-├── .command.out
-├── .command.run
-├── .command.sh
-├── .exitcode
-├── cowsay-output-Bonjour.txt
-└── output-Bonjour.txt -> /workspace/gitpod/nf-training/hello-nextflow/work/0e/e96c123cb7ae9ff7b7bed1c5444009/output-Bonjour.txt
-
-1 directory, 9 files
-```
-
-Open the `.command.run` file which holds all the busywork that Nextflow does under the hood.
-
-```bash
-code work/8c/738ac55b80e7b6170aa84a68412454/.command.run
-```
-
-Search for `nxf_launch` and you should see something like this:
-
-```bash
-nxf_launch() {
- docker run -i --cpu-shares 1024 -e "NXF_TASK_WORKDIR" -v /workspace/gitpod/nf-training/hello-nextflow/work:/workspace/gitpod/nf-training/hello-nextflow/work -w "$NXF_TASK_WORKDIR" --name $NXF_BOXID community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65 /bin/bash -ue /workspace/gitpod/nf-training/hello-nextflow/work/8c/738ac55b80e7b6170aa84a68412454/.command.sh
-}
-```
-
-As you can see, Nextflow is using the `docker run` command to launch the task.
-It also mounts the task's working directory into the container, sets the working directory inside the container to the task's working directory, and runs our templated bash script in the `.command.sh` file.
-All the hard work we learned about in the previous sections is done for us by Nextflow!
-
-### Takeaway
-
-You know how to use containers in Nextflow to run processes.
-
-### What's next?
-
-You have everything you need to continue to the [next chapter](./04_hello_genomics.md) of this training series.
-Optionally, continue on to learn how to get container images for tools you want to use in your Nextflow pipelines.
-
----
-
-## 3. Optional Topic: How to find or make container images
-
-Some software developers provide container images for their software that are available on container registries like Docker Hub, but many do not.
-In this optional section, we'll show you two ways to get a container image for tools you want to use in your Nextflow pipelines: using Seqera Containers and building the container image yourself.
-
-You'll be getting/building a container image for the `quote` pip package, which will be used in the exercise at the end of this section.
-
-### 3.1. Get a container image from Seqera Containers
-
-Seqera Containers is a free service that builds container images for pip- and conda-installable tools (including bioconda).
-Navigate to [Seqera Containers](https://www.seqera.io/containers/) and search for the `quote` pip package.
-
-![Seqera Containers](img/seqera-containers-1.png)
-
-Click on "+Add" and then "Get Container" to request a container image for the `quote` pip package.
-
-![Seqera Containers](img/seqera-containers-2.png)
-
-If this is the first time a community container has been built for this version of the package, it may take a few minutes to complete.
-Click to copy the URI (e.g. `community.wave.seqera.io/library/pip_quote:ae07804021465ee9`) of the container image that was created for you.
-
-You can now use the container image to run the `quote` command and get a random saying from Grace Hopper.
-
-```bash
-docker run --rm community.wave.seqera.io/library/pip_quote:ae07804021465ee9 quote "Grace Hopper"
-```
-
-Output:
-
-```console title="Output"
-Humans are allergic to change. They love to say, 'We've always done it
-this way.' I try to fight that. That's why I have a clock on my wall
-that runs counter-clockwise.
-```
-
-### 3.2. Build the container image yourself
-
-Let's use some build details from the Seqera Containers website to build the container image for the `quote` pip package ourselves.
-Return to the Seqera Containers website and click on the "Build Details" button.
-
-The first item we'll look at is the `Dockerfile`, a type of script file that contains all the commands needed to build the container image.
-We've added some explanatory comments to the Dockerfile below to help you understand what each part does.
-
-```Dockerfile title="Dockerfile"
-# Start from the micromamba base docker image
-FROM mambaorg/micromamba:1.5.10-noble
-# Copy the conda.yml file into the container
-COPY --chown=$MAMBA_USER:$MAMBA_USER conda.yml /tmp/conda.yml
-# Install various utilities for Nextflow to use and the packages in the conda.yml file
-RUN micromamba install -y -n base -f /tmp/conda.yml \
- && micromamba install -y -n base conda-forge::procps-ng \
- && micromamba env export --name base --explicit > environment.lock \
- && echo ">> CONDA_LOCK_START" \
- && cat environment.lock \
- && echo "<< CONDA_LOCK_END" \
- && micromamba clean -a -y
-# Run the container as the root user
-USER root
-# Set the PATH environment variable to include the micromamba installation directory
-ENV PATH="$MAMBA_ROOT_PREFIX/bin:$PATH"
-```
-
-The second item we'll look at is the `conda.yml` file, which contains the list of packages that need to be installed in the container image.
-
-```yaml title="conda.yml"
-channels:
-- conda-forge
-- bioconda
-dependencies:
-- pip
-- pip:
-  - quote==3.0.0
-```
-
-Copy the contents of these files into the stubs located in the `containers/build` directory, then run the following command to build the container image yourself.
-
-!!! note
-
- We use the `-t quote:latest` flag to tag the container image with the name `quote` and the tag `latest`.
- We will be able to use this tag to refer to the container image when running it on this system.
-
-```bash
-docker build -t quote:latest containers/build
-```
-
-After it has finished building, you can run the container image you just built.
-
-```bash
-docker run --rm quote:latest quote "Margaret Oakley Dayhoff"
-```
-
-### Takeaway
-
-You've learned two different ways to get a container image for a tool you want to use in your Nextflow pipelines: using Seqera Containers and building the container image yourself.
-
-### What's next?
-
-You have everything you need to continue to the [next chapter](./04_hello_genomics.md) of this training series.
-You can also continue on with an optional exercise to fetch quotes on computer/biology pioneers using the `quote` container and output them using the `cowsay` container.
-
----
-
-## 4. Bonus Exercise: Make the cow quote famous scientists
-
-This section contains some stretch exercises, to practice what you've learned so far.
-Doing these exercises is _not required_ to understand later parts of the training, but they provide a fun way to reinforce what you've learned by figuring out how to make the cow quote famous scientists.
-
-```console title="cowsay-output-Grace-Hopper.txt"
- _________________________________________________
- / \
-| Humans are allergic to change. They love to |
-| say, 'We've always done it this way.' I try to fi |
-| ght that. That's why I have a clock on my wall th |
-| at runs counter-clockwise. |
-| -Grace Hopper |
- \ /
- =================================================
- \
- \
- ^__^
- (oo)\_______
- (__)\ )\/\
- ||----w |
- || ||
-```
-
-### 4.1. Modify the `hello-containers.nf` script to use a getQuote process
-
-We have a list of computer and biology pioneers in the `containers/data/pioneers.csv` file.
-At a high level, to complete this exercise you will need to:
-
-- Modify the default `params.input_file` to point to the `pioneers.csv` file.
-- Create a `getQuote` process that uses the `quote` container to fetch a quote for each input.
-- Connect the output of the `getQuote` process to the `cowsay` process to display the quote.
-
-For the `quote` container image, you can either use the one you built yourself in the previous stretch exercise or use the one you got from Seqera Containers.
-
-!!! hint
-
-    A good choice for the `script` block of your `getQuote` process might be:
- ```groovy
- script:
- def safe_author = author.tokenize(' ').join('-')
- """
- quote "$author" > quote-${safe_author}.txt
- echo "-${author}" >> quote-${safe_author}.txt
- """
- ```
-
-You can find a solution to this exercise in `containers/solutions/hello-containers-4.1.nf`.
-
-### 4.2. Modify your Nextflow pipeline to allow it to execute in `quote` and `sayHello` modes
-
-Add some branching logic to your pipeline to allow it to accept inputs intended for both `quote` and `sayHello`.
-Here's an example of how to use an `if` statement in a Nextflow workflow:
-
-```groovy title="hello-containers.nf"
-workflow {
- if (params.quote) {
- ...
- }
- else {
- ...
- }
- cowSay(text_ch)
-}
-```
-
-!!! hint
-
- You can use `new_ch = processName.out` to assign a name to the output channel of a process.
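-
-    For instance, the workflow body could look something like this sketch (the channel and process names are assumptions based on the exercise description, not the actual solution):
-
-    ```groovy
-    workflow {
-        // read the CSV rows and flatten them into individual items
-        input_ch = Channel.fromPath(params.input_file).splitCsv().flatten()
-
-        if (params.quote) {
-            getQuote(input_ch)
-            text_ch = getQuote.out
-        }
-        else {
-            sayHello(input_ch)
-            text_ch = sayHello.out
-        }
-        cowSay(text_ch)
-    }
-    ```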
-
-You can find a solution to this exercise in `containers/solutions/hello-containers-4.2.nf`.
-
-### Takeaway
-
-You know how to use containers in Nextflow to run processes, and how to build some branching logic into your pipelines!
-
-### What's next?
-
-Celebrate, take a stretch break and drink some water!
-
-When you are ready, move on to Part 3 of this training series to learn how to apply what you've learned so far to a more realistic data analysis use case.
diff --git a/docs/hello_nextflow/03_hello_workflow.md b/docs/hello_nextflow/03_hello_workflow.md
new file mode 100644
index 000000000..665d3e8a3
--- /dev/null
+++ b/docs/hello_nextflow/03_hello_workflow.md
@@ -0,0 +1,861 @@
+# Part 3: Hello Workflow
+
+Most real-world workflows involve more than one step.
+In this training module, you'll learn how to connect processes together in a multi-step workflow.
+
+This will teach you the Nextflow way of achieving the following:
+
+1. Making data flow from one process to the next
+2. Collecting outputs from multiple process calls into a single process call
+3. Passing more than one input to a process
+4. Handling multiple outputs coming out of a process
+
+To demonstrate, we will continue building on the domain-agnostic Hello World example from Parts 1 and 2.
+This time, we're going to make the following changes to our workflow to better reflect how people build actual workflows:
+
+1. Add a second step that converts the greeting to uppercase.
+2. Add a third step that collects all the transformed greetings and writes them into a single file.
+3. Add a parameter to name the final output file and pass that as a secondary input to the collection step.
+4. Make the collection step also output a simple statistic about what was processed.
+
+---
+
+## 0. Warmup: Run `hello-workflow.nf`
+
+We're going to use the workflow script `hello-workflow.nf` as a starting point.
+It is equivalent to the script produced by working through Part 2 of this training course.
+
+Just to make sure everything is working, run the script once before making any changes:
+
+```bash
+nextflow run hello-workflow.nf
+```
+
+```console title="Output"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-workflow.nf` [stupefied_sammet] DSL2 - revision: b9e466930b
+
+executor > local (3)
+[2a/324ce6] sayHello (3) | 3 of 3 ✔
+```
+
+As previously, you will find the output files in the `results` directory (specified by the `publishDir` directive).
+
+```console title="Directory contents"
+results
+├── Bonjour-output.txt
+├── Hello-output.txt
+└── Holà-output.txt
+```
+
+!!! note
+
+ There may also be a file named `output.txt` left over if you worked through Part 2 in the same environment.
+
+If that worked for you, you're ready to learn how to assemble a multi-step workflow.
+
+---
+
+## 1. Add a second step to the workflow
+
+We're going to add a step to convert the greeting to uppercase.
+To that end, we need to do three things:
+
+- Define the command we're going to use to do the uppercase conversion.
+- Write a new process that wraps the uppercasing command.
+- Add the new process to the workflow and set it up to take the output of the `sayHello()` process as input.
+
+### 1.1. Define the uppercasing command and test it in the terminal
+
+To do the conversion of the greetings to uppercase, we're going to use a classic UNIX tool called `tr` for 'text replacement', with the following syntax:
+
+```bash title="Syntax"
+tr '[a-z]' '[A-Z]'
+```
+
+This is a very naive text replacement one-liner that does not account for accented letters, so for example 'Holà' will become 'HOLà', but it will do a good enough job for demonstrating the Nextflow concepts and that's what matters.
+
+To test it out, we can run the `echo 'Hello World'` command and pipe its output to the `tr` command:
+
+```bash
+echo 'Hello World' | tr '[a-z]' '[A-Z]' > UPPER-output.txt
+```
+
+The output is a text file called `UPPER-output.txt` that contains the uppercase version of the `Hello World` string:
+
+```console title="UPPER-output.txt"
+HELLO WORLD
+```
+
+That's basically what we're going to try to do with our workflow.
+
+### 1.2. Write the uppercasing step as a Nextflow process
+
+We can model our new process on the first one, since we want to use all the same components.
+
+Add the following process definition to the workflow script:
+
+```groovy title="hello-workflow.nf" linenums="22"
+/*
+ * Use a text replacement tool to convert the greeting to uppercase
+ */
+process convertToUpper {
+
+ publishDir 'results', mode: 'copy'
+
+ input:
+ path input_file
+
+ output:
+ path "UPPER-${input_file}"
+
+ script:
+ """
+ cat '$input_file' | tr '[a-z]' '[A-Z]' > 'UPPER-${input_file}'
+ """
+}
+```
+
+Here, we compose the second output filename based on the input filename, similarly to what we did originally for the output of the first process.
+
+!!! note
+
+ Nextflow will determine the order of operations based on the chaining of inputs and outputs, so the order of the process definitions in the workflow script does not matter.
+ However, we do recommend you be kind to your collaborators and to your future self, and try to write them in a logical order for the sake of readability.
+
+### 1.3. Add a call to the new process in the workflow block
+
+Now we need to tell Nextflow to actually call the process that we just defined.
+
+In the workflow block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-workflow.nf" linenums="53"
+ // emit a greeting
+ sayHello(greeting_ch)
+}
+```
+
+_After:_
+
+```groovy title="hello-workflow.nf" linenums="53"
+ // emit a greeting
+ sayHello(greeting_ch)
+
+ // convert the greeting to uppercase
+ convertToUpper()
+}
+```
+
+This is not yet functional because we have not specified what should be input to the `convertToUpper()` process.
+
+### 1.4. Pass the output of the first process to the second process
+
+Now we need to make the output of the `sayHello()` process flow into the `convertToUpper()` process.
+
+Conveniently, Nextflow automatically packages the output of a process into a channel called `.out`.
+So the output of the `sayHello` process is a channel called `sayHello.out`, which we can plug straight into the call to `convertToUpper()`.
+
+In the workflow block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-workflow.nf" linenums="56"
+ // convert the greeting to uppercase
+ convertToUpper()
+}
+```
+
+_After:_
+
+```groovy title="hello-workflow.nf" linenums="56"
+ // convert the greeting to uppercase
+ convertToUpper(sayHello.out)
+}
+```
+
+For a simple case like this (one output to one input), that's all we need to do to connect two processes!
+
+### 1.5. Run the workflow again with `-resume`
+
+Let's run this using the `-resume` flag, since we've already run the first step of the workflow successfully.
+
+```bash
+nextflow run hello-workflow.nf -resume
+```
+
+You should see the following output:
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-workflow.nf` [disturbed_darwin] DSL2 - revision: 4e252c048f
+
+executor > local (3)
+[79/33b2f0] sayHello (2) | 3 of 3, cached: 3 ✔
+[b3/d52708] convertToUpper (3) | 3 of 3 ✔
+```
+
+There is now an extra line in the console output (line 7), which corresponds to the new process we just added.
+
+Let's have a look inside the work directory of one of the calls to the second process.
+
+```console title="Directory contents"
+work/b3/d52708edba8b864024589285cb3445/
+├── Bonjour-output.txt -> /workspace/gitpod/hello-nextflow/work/79/33b2f0af8438486258d200045bd9e8/Bonjour-output.txt
+└── UPPER-Bonjour-output.txt
+```
+
+We find two output files: the output of the first process AND the output of the second.
+
+The output of the first process is in there because Nextflow staged it there in order to have everything needed for execution within the same subdirectory.
+However, it is actually a symbolic link pointing to the original file in the subdirectory of the first process call.
+By default, when running on a single machine as we're doing here, Nextflow uses symbolic links rather than copies to stage input and intermediate files.
+
+You'll also find the final outputs in the `results` directory since we used the `publishDir` directive in the second process too.
+
+```console title="Directory contents"
+results
+├── Bonjour-output.txt
+├── Hello-output.txt
+├── Holà-output.txt
+├── UPPER-Bonjour-output.txt
+├── UPPER-Hello-output.txt
+└── UPPER-Holà-output.txt
+```
+
+Think about how all we did was connect the output of `sayHello` to the input of `convertToUpper` and the two processes could be run in series.
+Nextflow did the hard work of handling individual input and output files and passing them between the two commands for us.
+
+This is one of the reasons Nextflow channels are so powerful: they take care of the busywork involved in connecting workflow steps together.
+
+### Takeaway
+
+You know how to add a second step that takes the output of the first step as input.
+
+### What's next?
+
+Learn how to collect outputs from batched process calls and feed them into a single process.
+
+---
+
+## 2. Add a third step to collect all the greetings
+
+When we use a process to apply a transformation to each of the elements in a channel, like we're doing here to the multiple greetings, we sometimes want to collect elements from the output channel of that process, and feed them into another process that performs some kind of analysis or summation.
+
+In the next step we're simply going to write all the elements of a channel to a single file, using the UNIX `cat` command.
+
+### 2.1. Define the collection command and test it in the terminal
+
+The collection step we want to add to our workflow will use the `cat` command to concatenate multiple uppercased greetings into a single file.
+
+Let's run the command by itself in the terminal to verify that it works as expected, just like we've done previously.
+
+Run the following in your terminal:
+
+```bash
+echo 'Hello' | tr '[a-z]' '[A-Z]' > UPPER-Hello-output.txt
+echo 'Bonjour' | tr '[a-z]' '[A-Z]' > UPPER-Bonjour-output.txt
+echo 'Holà' | tr '[a-z]' '[A-Z]' > UPPER-Holà-output.txt
+cat UPPER-Hello-output.txt UPPER-Bonjour-output.txt UPPER-Holà-output.txt > COLLECTED-output.txt
+```
+
+The output is a text file called `COLLECTED-output.txt` that contains the uppercase versions of the original greetings.
+
+```console title="COLLECTED-output.txt"
+HELLO
+BONJOUR
+HOLà
+```
+
+That is the result we want to achieve with our workflow.
+
+### 2.2. Create a new process to do the collection step
+
+Let's create a new process and call it `collectGreetings()`.
+We can start writing it based on the previous one.
+
+#### 2.2.1. Write the 'obvious' parts of the process
+
+Add the following process definition to the workflow script:
+
+```groovy title="hello-workflow.nf" linenums="41"
+/*
+ * Collect uppercase greetings into a single output file
+ */
+process collectGreetings {
+
+ publishDir 'results', mode: 'copy'
+
+ input:
+ ???
+
+ output:
+ path "COLLECTED-output.txt"
+
+ script:
+ """
+ ??? > 'COLLECTED-output.txt'
+ """
+}
+```
+
+This is what we can write with confidence based on what we've learned so far.
+But this is not functional!
+It leaves out the input definition(s) and the first half of the script command, because we still need to figure out how to write those.
+
+#### 2.2.2. Define inputs to `collectGreetings()`
+
+We need to collect the greetings from all the calls to the `convertToUpper()` process.
+What do we know we can get from the previous step in the workflow?
+
+The channel output by `convertToUpper()` will contain the paths to the individual files containing the uppercased greetings.
+That amounts to one input slot; let's call it `input_files` for simplicity.
+
+In the process block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-workflow.nf" linenums="48"
+ input:
+ ???
+```
+
+_After:_
+
+```groovy title="hello-workflow.nf" linenums="48"
+ input:
+ path input_files
+```
+
+Notice we use the `path` prefix even though we expect this input to contain multiple files.
+Nextflow doesn't mind; a `path` input can hold a single file or a collection of files.
+
+#### 2.2.3. Compose the concatenation command
+
+This is where things could get a little tricky, because we need to be able to handle an arbitrary number of input files.
+Specifically, we can't write the command up front, so we need to tell Nextflow how to compose it at runtime based on what inputs flow into the process.
+
+In other words, if we have an input channel containing the item `[file1.txt, file2.txt, file3.txt]`, we need Nextflow to turn that into `cat file1.txt file2.txt file3.txt`.
+
+Fortunately, Nextflow is quite happy to do that for us if we simply write `cat ${input_files}` in the script command.
+
+In the process block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-workflow.nf" linenums="54"
+ script:
+ """
+ ??? > 'COLLECTED-output.txt'
+ """
+```
+
+_After:_
+
+```groovy title="hello-workflow.nf" linenums="54"
+ script:
+ """
+ cat ${input_files} > 'COLLECTED-output.txt'
+ """
+```
+
+In theory this should handle any arbitrary number of input files.
+
+!!! tip
+
+ Some command-line tools require providing an argument (like `-input`) for each input file.
+ In that case, we would have to do a little bit of extra work to compose the command.
+ You can see an example of this in the 'Nextflow for Genomics' training course.
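+
+    As a sketch of what that extra work could look like, assuming a hypothetical tool called `sometool` that requires `-input` before each file:
+
+    ```groovy
+    script:
+    // build "-input file1 -input file2 ..." from the list of staged input files
+    def input_args = input_files.collect { "-input ${it}" }.join(' ')
+    """
+    sometool ${input_args} > 'COLLECTED-output.txt'
+    """
+    ```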
+
+### 2.3. Add the collection step to the workflow
+
+Now we should just need to call the collection process on the output of the uppercasing step.
+
+#### 2.3.1. Connect the process calls
+
+In the workflow block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-workflow.nf" linenums="75"
+ // convert the greeting to uppercase
+ convertToUpper(sayHello.out)
+}
+```
+
+_After:_
+
+```groovy title="hello-workflow.nf" linenums="75"
+ // convert the greeting to uppercase
+ convertToUpper(sayHello.out)
+
+ // collect all the greetings into one file
+ collectGreetings(convertToUpper.out)
+}
+```
+
+This connects the output of `convertToUpper()` to the input of `collectGreetings()`.
+
+#### 2.3.2. Run the workflow with `-resume`
+
+Let's try it.
+
+```bash
+nextflow run hello-workflow.nf -resume
+```
+
+It runs successfully, including the third step:
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-workflow.nf` [mad_gilbert] DSL2 - revision: 6acfd5e28d
+
+executor > local (3)
+[79/33b2f0] sayHello (2) | 3 of 3, cached: 3 ✔
+[99/79394f] convertToUpper (3) | 3 of 3, cached: 3 ✔
+[47/50fe4a] collectGreetings (1) | 3 of 3 ✔
+```
+
+However, look at the number of calls for `collectGreetings()` on line 8.
+We were only expecting one, but there are three.
+
+And have a look at the contents of the final output file too:
+
+```console title="COLLECTED-output.txt"
+Holà
+```
+
+Oh no. The collection step was run individually on each greeting, which is NOT what we wanted.
+
+We need to do something to tell Nextflow explicitly that we want that third step to run on all the items in the channel output by `convertToUpper()`.
+
+### 2.4. Use an operator to collect the greetings into a single input
+
+Yes, once again the answer to our problem is an operator.
+
+Specifically, we are going to use the aptly-named [`collect()`](https://www.nextflow.io/docs/latest/reference/operator.html#collect) operator.
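+
+As a standalone illustration of what `collect()` does, here is a minimal sketch you could run on its own:
+
+```groovy
+// three separate items go in...
+Channel.of(1, 2, 3).collect().view()
+// ...and a single item comes out: [1, 2, 3]
+```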
+
+#### 2.4.1. Add the `collect()` operator
+
+This time it's going to look a bit different because we're not adding the operator in the context of a channel factory, but to an output channel.
+
+We take the `convertToUpper.out` and append the `collect()` operator, which gives us `convertToUpper.out.collect()`.
+We can plug that directly into the `collectGreetings()` process call.
+
+In the workflow block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-workflow.nf" linenums="78"
+ // collect all the greetings into one file
+ collectGreetings(convertToUpper.out)
+}
+```
+
+_After:_
+
+```groovy title="hello-workflow.nf" linenums="78"
+ // collect all the greetings into one file
+ collectGreetings(convertToUpper.out.collect())
+}
+```
+
+#### 2.4.2. Add some `view()` statements
+
+Let's also include a couple of `view()` statements to visualize the before and after states of the channel contents.
+
+_Before:_
+
+```groovy title="hello-workflow.nf" linenums="78"
+ // collect all the greetings into one file
+ collectGreetings(convertToUpper.out.collect())
+}
+```
+
+_After:_
+
+```groovy title="hello-workflow.nf" linenums="78"
+ // collect all the greetings into one file
+ collectGreetings(convertToUpper.out.collect())
+
+ // optional view statements
+ convertToUpper.out.view { "Before collect: $it" }
+ convertToUpper.out.collect().view { "After collect: $it" }
+}
+```
+
+The `view()` statements can go anywhere you want; we put them after the call for readability.
+
+#### 2.4.3. Run the workflow again with `-resume`
+
+Let's try it:
+
+```bash
+nextflow run hello-workflow.nf -resume
+```
+
+It runs successfully, although the log output may look a little messier than this (we cleaned it up for readability).
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-workflow.nf` [soggy_franklin] DSL2 - revision: bc8e1b2726
+
+[d6/cdf466] sayHello (1) | 3 of 3, cached: 3 ✔
+[99/79394f] convertToUpper (2) | 3 of 3, cached: 3 ✔
+[1e/83586c] collectGreetings | 1 of 1 ✔
+Before collect: /workspace/gitpod/hello-nextflow/work/b3/d52708edba8b864024589285cb3445/UPPER-Bonjour-output.txt
+Before collect: /workspace/gitpod/hello-nextflow/work/99/79394f549e3040dfc2440f69ede1fc/UPPER-Hello-output.txt
+Before collect: /workspace/gitpod/hello-nextflow/work/aa/56bfe7cf00239dc5badc1d04b60ac4/UPPER-Holà-output.txt
+After collect: [/workspace/gitpod/hello-nextflow/work/b3/d52708edba8b864024589285cb3445/UPPER-Bonjour-output.txt, /workspace/gitpod/hello-nextflow/work/99/79394f549e3040dfc2440f69ede1fc/UPPER-Hello-output.txt, /workspace/gitpod/hello-nextflow/work/aa/56bfe7cf00239dc5badc1d04b60ac4/UPPER-Holà-output.txt]
+```
+
+This time the third step was only called once!
+
+Looking at the output of the `view()` statements, we see the following:
+
+- Three `Before collect:` statements, one for each greeting: at that point the file paths are individual items in the channel.
+- A single `After collect:` statement: the three file paths are now packaged into a single item.
+
+Have a look at the contents of the final output file too:
+
+```console title="COLLECTED-output.txt"
+BONJOUR
+HELLO
+HOLà
+```
+
+This time we have all three greetings in the final output file. Success!
+
+!!! note
+
+ If you run this several times without `-resume`, you will see that the order of the greetings changes from one run to the next.
+ This shows you that the order in which items flow through the pipeline is not guaranteed to be consistent.
+
+### Takeaway
+
+You know how to collect outputs from a batch of process calls and feed them into a joint analysis or summation step.
+
+### What's next?
+
+Learn how to pass more than one input to a process.
+
+---
+
+## 3. Pass more than one input to a process in order to name the final output file uniquely
+
+We want to be able to name the final output file something specific in order to process subsequent batches of greetings without overwriting the final results.
+
+To that end, we're going to make the following refinements to the workflow:
+
+- Modify the collector process to accept a user-defined name for the output file
+- Add a command-line parameter to the workflow and pass it to the collector process
+
+### 3.1. Modify the collector process to accept a user-defined name for the output file
+
+We're going to need to declare the additional input and integrate it into the output file name.
+
+#### 3.1.1. Declare the additional input in the process definition
+
+Good news: we can declare as many input variables as we want.
+Let's call this one `batch_name`.
+
+In the process block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-workflow.nf" linenums="48"
+ input:
+ path input_files
+```
+
+_After:_
+
+```groovy title="hello-workflow.nf" linenums="48"
+ input:
+ path input_files
+ val batch_name
+```
+
+You can set up your processes to expect as many inputs as you want.
+Later on, you will learn how to manage required vs. optional inputs.
+
+#### 3.1.2. Use the `batch_name` variable in the output file name
+
+In the process block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-workflow.nf" linenums="52"
+ output:
+ path "COLLECTED-output.txt"
+
+ script:
+ """
+ cat ${input_files} > 'COLLECTED-output.txt'
+ """
+```
+
+_After:_
+
+```groovy title="hello-workflow.nf" linenums="52"
+ output:
+ path "COLLECTED-${batch_name}-output.txt"
+
+ script:
+ """
+ cat ${input_files} > 'COLLECTED-${batch_name}-output.txt'
+ """
+```
+
+This sets up the process to use the `batch_name` value to generate a specific filename for the final output of the workflow.
+
+### 3.2. Add a `batch` command-line parameter
+
+Now we need a way to supply the value for `batch_name` and feed it to the process call.
+
+#### 3.2.1. Use `params` to set up the parameter
+
+You already know how to use the `params` system to declare CLI parameters.
+Let's use that to declare a `batch` parameter (with a default value because we are lazy).
+
+In the pipeline parameters section, make the following code changes:
+
+_Before:_
+
+```groovy title="hello-workflow.nf" linenums="61"
+/*
+ * Pipeline parameters
+ */
+params.greeting = 'greetings.csv'
+```
+
+_After:_
+
+```groovy title="hello-workflow.nf" linenums="61"
+/*
+ * Pipeline parameters
+ */
+params.greeting = 'greetings.csv'
+params.batch = 'test-batch'
+```
+
+Remember you can override that default value by specifying a value with `--batch` on the command line.
+
+#### 3.2.2. Pass the `batch` parameter to the process
+
+To provide the value of the parameter to the process, we need to add it in the process call.
+
+In the workflow block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-workflow.nf" linenums="80"
+ // collect all the greetings into one file
+ collectGreetings(convertToUpper.out.collect())
+```
+
+_After:_
+
+```groovy title="hello-workflow.nf" linenums="80"
+ // collect all the greetings into one file
+ collectGreetings(convertToUpper.out.collect(), params.batch)
+```
+
+!!! warning
+
+ You MUST provide the inputs to a process in the EXACT SAME ORDER as they are listed in the input definition block of the process.
+
+### 3.3. Run the workflow
+
+Let's try running this with a batch name on the command line.
+
+```bash
+nextflow run hello-workflow.nf -resume --batch trio
+```
+
+It runs successfully:
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-workflow.nf` [confident_rutherford] DSL2 - revision: bc58af409c
+
+executor > local (1)
+[79/33b2f0] sayHello (2) | 3 of 3, cached: 3 ✔
+[99/79394f] convertToUpper (2) | 3 of 3, cached: 3 ✔
+[b5/f19efe] collectGreetings | 1 of 1 ✔
+```
+
+And produces the desired output:
+
+```console title="bash"
+cat results/COLLECTED-trio-output.txt
+```
+
+```console title="Output"
+HELLO
+BONJOUR
+HOLà
+```
+
+Now, subsequent runs on other batches of inputs won't clobber previous results (as long as we specify the parameter appropriately).
+
+### Takeaway
+
+You know how to pass more than one input to a process.
+
+### What's next?
+
+Learn how to emit multiple outputs and handle them conveniently.
+
+---
+
+## 4. Add an output to the collector step
+
+When a process produces only one output, it's easy to access it (in the workflow block) using the `.out` syntax.
+When there are two or more outputs, the default way to select a specific output is to use the corresponding (zero-based) index; for example, you would use `.out[0]` to get the first output.
+This is not terribly convenient; it's too easy to grab the wrong index.
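+
+For example, assuming a hypothetical process `myProcess` with two outputs, positional selection would look like this sketch:
+
+```groovy
+// zero-based indexing: first output channel of the process
+myProcess.out[0].view()
+// second output channel
+myProcess.out[1].view()
+```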
+
+Let's have a look at how we can select and use a specific output of a process when there are more than one.
+
+For demonstration purposes, let's say we want to count and report the number of greetings that are being collected for a given batch of inputs.
+
+To that end, we're going to make the following refinements to the workflow:
+
+- Modify the process to count and output the number of greetings
+- Once the process has run, select the count and report it using `view` (in the workflow block)
+
+### 4.1. Modify the process to count and output the number of greetings
+
+This will require two key changes to the process definition: we need a way to count the greetings, then we need to add that count to the `output` block of the process.
+
+#### 4.1.1. Count the number of greetings collected
+
+Conveniently, Nextflow lets us add arbitrary code in the `script:` block of the process definition, which comes in really handy for doing things like this.
+
+That means we can use the built-in `size()` function to get the number of files in the `input_files` array.
+
+In the process block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-workflow.nf" linenums="55"
+ script:
+ """
+    cat ${input_files} > 'COLLECTED-${batch_name}-output.txt'
+ """
+```
+
+_After:_
+
+```groovy title="hello-workflow.nf" linenums="55"
+ script:
+ count_greetings = input_files.size()
+ """
+    cat ${input_files} > 'COLLECTED-${batch_name}-output.txt'
+ """
+```
+
+The `count_greetings` variable will be computed at runtime.
+
+#### 4.1.2. Emit the count as a named output
+
+In principle all we need to do is to add the `count_greetings` variable to the `output:` block.
+
+However, while we're at it, we're also going to add some `emit:` tags to our output declarations. These will enable us to select the outputs by name instead of having to use positional indices.
+
+In the process block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-workflow.nf" linenums="52"
+ output:
+ path "COLLECTED-${batch_id}-output.txt"
+```
+
+_After:_
+
+```groovy title="hello-workflow.nf" linenums="52"
+ output:
+ path "COLLECTED-${batch_id}-output.txt" , emit: outfile
+ val count_greetings , emit: count
+```
+
+The `emit:` tags are optional, and we could have added a tag to only one of the outputs.
+But as the saying goes, why not both?
+
+### 4.2. Report the output at the end of the workflow
+
+Now that we have two outputs coming out of the `collectGreetings` process, the `collectGreetings.out` output channel contains two 'tracks':
+
+- `collectGreetings.out.outfile` contains the final output file
+- `collectGreetings.out.count` contains the count of greetings
+
+We could send either or both of these to another process for further work. However, in the interest of wrapping this up, we're just going to use `view()` to demonstrate that we can access and report the count of greetings.
+
+In the workflow block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-workflow.nf" linenums="82"
+ // collect all the greetings into one file
+ collectGreetings(convertToUpper.out.collect(), params.batch)
+```
+
+_After:_
+
+```groovy title="hello-workflow.nf" linenums="82"
+ // collect all the greetings into one file
+ collectGreetings(convertToUpper.out.collect(), params.batch)
+
+ // emit a message about the size of the batch
+ collectGreetings.out.count.view { "There were $it greetings in this batch" }
+```
+
+Here we are using `$it` in the same way we did earlier, as an implicit variable to access the contents of the channel.
+
+!!! note
+
+ There are a few other ways we could achieve a similar result, including some more elegant ones like the `count()` operator, but this allows us to show how to handle multiple outputs, which is what we care about.
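+
+    As a sketch (not needed for this exercise), the `count()` operator approach could look like this:
+
+    ```groovy
+    // count the items flowing out of convertToUpper instead of using size() in the process
+    convertToUpper.out.count().view { "There were $it greetings in this batch" }
+    ```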
+
+### 4.3. Run the workflow
+
+Let's try running this with the current batch of greetings.
+
+```bash
+nextflow run hello-workflow.nf -resume --batch trio
+```
+
+This runs successfully:
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-workflow.nf` [evil_sinoussi] DSL2 - revision: eeca64cdb1
+
+[d6/cdf466] sayHello (1) | 3 of 3, cached: 3 ✔
+[99/79394f] convertToUpper (2) | 3 of 3, cached: 3 ✔
+[9e/1dfda7] collectGreetings | 1 of 1, cached: 1 ✔
+There were 3 greetings in this batch
+```
+
+The last line (line 8) shows that we correctly retrieved the count of greetings processed.
+Feel free to add more greetings to the CSV and see what happens.
+
+### Takeaway
+
+You know how to make a process emit a named output and how to access it from the workflow block.
+
+More generally, you understand the key principles involved in connecting processes together in common ways.
+
+### What's next?
+
+Take an extra long break, you've earned it.
+When you're ready, move on to Part 4 to learn how to modularize your code for better maintainability and code efficiency.
diff --git a/docs/hello_nextflow/04_hello_modules.md b/docs/hello_nextflow/04_hello_modules.md
new file mode 100644
index 000000000..d156239b5
--- /dev/null
+++ b/docs/hello_nextflow/04_hello_modules.md
@@ -0,0 +1,374 @@
+# Part 4: Hello Modules
+
+This section covers how to organize your workflow code to make development and maintenance of your pipeline more efficient and sustainable.
+Specifically, we are going to demonstrate how to use **modules**.
+
+In Nextflow, a **module** is a single process definition that is encapsulated by itself in a standalone code file.
+To use a module in a workflow, you just add a single-line import statement to your workflow code file; then you can integrate the process into the workflow the same way you normally would.
+
+When we started developing our workflow, we put everything in one single code file.
+
+Putting processes into individual modules makes it possible to reuse process definitions in multiple workflows without producing multiple copies of the code.
+This makes the code more shareable, flexible and maintainable.
+
+!!! note
+
+ It is also possible to encapsulate a section of a workflow as a 'subworkflow' that can be imported into a larger pipeline, but that is outside the scope of this course.
+
+---
+
+## 0. Warmup: Run `hello-modules.nf`
+
+We're going to use the workflow script `hello-modules.nf` as a starting point.
+It is equivalent to the script produced by working through Part 3 of this training course.
+
+Just to make sure everything is working, run the script once before making any changes:
+
+```bash
+nextflow run hello-modules.nf
+```
+
+```console title="Output"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-modules.nf` [festering_nobel] DSL2 - revision: eeca64cdb1
+
+executor > local (7)
+[25/648bdd] sayHello (2) | 3 of 3 ✔
+[60/bc6831] convertToUpper (1) | 3 of 3 ✔
+[1a/bc5901] collectGreetings | 1 of 1 ✔
+There were 3 greetings in this batch
+```
+
+As previously, you will find the output files in the `results` directory (specified by the `publishDir` directive).
+
+```console title="Directory contents"
+results
+├── Bonjour-output.txt
+├── COLLECTED-output.txt
+├── COLLECTED-test-batch-output.txt
+├── COLLECTED-trio-output.txt
+├── Hello-output.txt
+├── Holà-output.txt
+├── UPPER-Bonjour-output.txt
+├── UPPER-Hello-output.txt
+└── UPPER-Holà-output.txt
+```
+
+!!! note
+
+ There may also be a file named `output.txt` left over if you worked through Part 2 in the same environment.
+
+If that worked for you, you're ready to learn how to modularize your workflow code.
+
+---
+
+## 1. Create a directory to store modules
+
+It is best practice to store your modules in a specific directory.
+You can call that directory anything you want, but the convention is to call it `modules/`.
+
+```bash
+mkdir modules
+```
+
+!!! note
+
+ Here we are showing how to use local modules, meaning modules stored locally in the same repository as the rest of the workflow code, in contrast to remote modules, which are stored in other (remote) repositories. For more information about remote modules, see the [documentation](https://www.nextflow.io/docs/latest/module.html).
+
+---
+
+## 2. Create a module for `sayHello()`
+
+In its simplest form, turning an existing process into a module is little more than a copy-paste operation.
+We're going to create a file stub for the module, copy the relevant code over, then delete it from the main workflow file.
+
+Then all we'll need to do is add an import statement so that Nextflow will know to pull in the relevant code at runtime.
+
+### 2.1. Create a file stub for the new module
+
+Let's create an empty file for the module called `sayHello.nf`.
+
+```bash
+touch modules/sayHello.nf
+```
+
+This gives us a place to put the process code.
+
+### 2.2. Move the `sayHello` process code to the module file
+
+Copy the whole process definition over from the workflow file to the module file, making sure to copy over the `#!/usr/bin/env nextflow` shebang too.
+
+```groovy title="modules/sayHello.nf" linenums="1"
+#!/usr/bin/env nextflow
+
+/*
+ * Use echo to print 'Hello World!' to a file
+ */
+process sayHello {
+
+ publishDir 'results', mode: 'copy'
+
+ input:
+ val greeting
+
+ output:
+ path "${greeting}-output.txt"
+
+ script:
+ """
+ echo '$greeting' > '$greeting-output.txt'
+ """
+}
+```
+
+Once that is done, delete the process definition from the workflow file, but make sure to leave the shebang in place.
+
+### 2.3. Add an import declaration before the workflow block
+
+The syntax for importing a local module is fairly straightforward:
+
+```groovy title="Syntax: Import declaration"
+include { <MODULE_NAME> } from '<path_to_module>'
+```
+
+Let's insert that above the workflow block and fill it out appropriately.
+
+_Before:_
+
+```groovy title="hello-modules.nf" linenums="50"
+workflow {
+```
+
+_After:_
+
+```groovy title="hello-modules.nf" linenums="50"
+// Include modules
+include { sayHello } from './modules/sayHello.nf'
+
+workflow {
+```
+
+### 2.4. Run the workflow to verify that it does the same thing as before
+
+We're running the workflow with essentially the same code and inputs as before, so let's run with the `-resume` flag and see what happens.
+
+```bash
+nextflow run hello-modules.nf -resume
+```
+
+This runs very quickly because everything is cached.
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-modules.nf` [romantic_poisson] DSL2 - revision: 96edfa9ad3
+
+[f6/cc0107] sayHello (1) | 3 of 3, cached: 3 ✔
+[3c/4058ba] convertToUpper (2) | 3 of 3, cached: 3 ✔
+[1a/bc5901] collectGreetings | 1 of 1, cached: 1 ✔
+There were 3 greetings in this batch
+```
+
+Nextflow recognized that it's still all the same work to be done, even if the code is split up into multiple files.
+
+### Takeaway
+
+You know how to extract a process into a local module and you know doing this doesn't break the resumability of the workflow.
+
+### What's next?
+
+Practice making more modules.
+Once you've done one, you can do a million modules...
+But let's just do two more for now.
+
+---
+
+## 3. Modularize the `convertToUpper()` process
+
+### 3.1. Create a file stub for the new module
+
+Create an empty file for the module called `convertToUpper.nf`.
+
+```bash
+touch modules/convertToUpper.nf
+```
+
+### 3.2. Move the `convertToUpper` process code to the module file
+
+Copy the whole process definition over from the workflow file to the module file, making sure to copy over the `#!/usr/bin/env nextflow` shebang too.
+
+```groovy title="modules/convertToUpper.nf" linenums="1"
+#!/usr/bin/env nextflow
+
+/*
+ * Use a text replacement tool to convert the greeting to uppercase
+ */
+process convertToUpper {
+
+ publishDir 'results', mode: 'copy'
+
+ input:
+ path input_file
+
+ output:
+ path "UPPER-${input_file}"
+
+ script:
+ """
+ cat '$input_file' | tr '[a-z]' '[A-Z]' > 'UPPER-${input_file}'
+ """
+}
+```
+
+Once that is done, delete the process definition from the workflow file, but make sure to leave the shebang in place.
+
+### 3.3. Add an import declaration before the workflow block
+
+Insert the import declaration above the workflow block and fill it out appropriately.
+
+_Before:_
+
+```groovy title="hello-modules.nf" linenums="31"
+// Include modules
+include { sayHello } from './modules/sayHello.nf'
+
+workflow {
+```
+
+_After:_
+
+```groovy title="hello-modules.nf" linenums="31"
+// Include modules
+include { sayHello } from './modules/sayHello.nf'
+include { convertToUpper } from './modules/convertToUpper.nf'
+
+workflow {
+```
+
+### 3.4. Run the workflow to verify that it does the same thing as before
+
+Run this with the `-resume` flag.
+
+```bash
+nextflow run hello-modules.nf -resume
+```
+
+This should still produce the same output as previously.
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-modules.nf` [nauseous_heisenberg] DSL2 - revision: a04a9f2da0
+
+[c9/763d42] sayHello (3) | 3 of 3, cached: 3 ✔
+[60/bc6831] convertToUpper (3) | 3 of 3, cached: 3 ✔
+[1a/bc5901] collectGreetings | 1 of 1, cached: 1 ✔
+There were 3 greetings in this batch
+```
+
+Two done, one more to go!
+
+---
+
+## 4. Modularize the `collectGreetings()` process
+
+### 4.1. Create a file stub for the new module
+
+Create an empty file for the module called `collectGreetings.nf`.
+
+```bash
+touch modules/collectGreetings.nf
+```
+
+### 4.2. Move the `collectGreetings` process code to the module file
+
+Copy the whole process definition over from the workflow file to the module file, making sure to copy over the `#!/usr/bin/env nextflow` shebang too.
+
+```groovy title="modules/collectGreetings.nf" linenums="1"
+#!/usr/bin/env nextflow
+
+/*
+ * Collect uppercase greetings into a single output file
+ */
+process collectGreetings {
+
+ publishDir 'results', mode: 'copy'
+
+ input:
+ path input_files
+ val batch_name
+
+ output:
+ path "COLLECTED-${batch_name}-output.txt" , emit: outfile
+ val count_greetings , emit: count
+
+ script:
+ count_greetings = input_files.size()
+ """
+ cat ${input_files} > 'COLLECTED-${batch_name}-output.txt'
+ """
+}
+```
+
+Once that is done, delete the process definition from the workflow file, but make sure to leave the shebang in place.
+
+### 4.3. Add an import declaration before the workflow block
+
+Insert the import declaration above the workflow block and fill it out appropriately.
+
+_Before:_
+
+```groovy title="hello-modules.nf" linenums="9"
+// Include modules
+include { sayHello } from './modules/sayHello.nf'
+include { convertToUpper } from './modules/convertToUpper.nf'
+
+workflow {
+```
+
+_After:_
+
+```groovy title="hello-modules.nf" linenums="9"
+// Include modules
+include { sayHello } from './modules/sayHello.nf'
+include { convertToUpper } from './modules/convertToUpper.nf'
+include { collectGreetings } from './modules/collectGreetings.nf'
+
+workflow {
+```
+
+### 4.4. Run the workflow to verify that it does the same thing as before
+
+Run this with the `-resume` flag.
+
+```bash
+nextflow run hello-modules.nf -resume
+```
+
+This should still produce the same output as previously.
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-modules.nf` [friendly_coulomb] DSL2 - revision: 7aa2b9bc0f
+
+[f6/cc0107] sayHello (1) | 3 of 3, cached: 3 ✔
+[3c/4058ba] convertToUpper (2) | 3 of 3, cached: 3 ✔
+[1a/bc5901] collectGreetings | 1 of 1, cached: 1 ✔
+There were 3 greetings in this batch
+```
+
+### Takeaway
+
+You know how to modularize multiple processes in a workflow.
+
+Congratulations, you've done all this work and absolutely nothing has changed in how the pipeline works!
+
+Jokes aside, now your code is more modular, and if you decide to write another pipeline that calls on one of those processes, you just need to type one short import statement to use the relevant module.
+This is better than just copy-pasting the code, because if later you decide to improve the module, all your pipelines will inherit the improvements.
+
+### What's next?
+
+Take a short break if you feel like it.
+When you're ready, move on to Part 5 to learn how to use containers to manage software dependencies more conveniently and reproducibly.
diff --git a/docs/hello_nextflow/05_hello_containers.md b/docs/hello_nextflow/05_hello_containers.md
new file mode 100644
index 000000000..414625050
--- /dev/null
+++ b/docs/hello_nextflow/05_hello_containers.md
@@ -0,0 +1,649 @@
+# Part 5: Hello Containers
+
+In Parts 1-4 of this training course, you learned how to use the basic building blocks of Nextflow to assemble a simple workflow capable of processing some text, parallelizing execution if there were multiple inputs, and collecting the results for further processing.
+
+However, you were limited to basic UNIX tools available in your environment.
+Real-world tasks often require various tools and packages not included by default.
+Typically, you'd need to install these tools, manage their dependencies, and resolve any conflicts.
+
+That is all very tedious and annoying, so we're going to show you how to use **containers** to solve this problem much more conveniently.
+
+A **container** is a lightweight, standalone, executable unit of software created from a container **image** that includes everything needed to run an application, including code, system libraries and settings.
+
+!!! note
+
+ We'll be teaching this using the technology [Docker](https://www.docker.com/get-started/), but Nextflow supports [several other container technologies](https://www.nextflow.io/docs/latest/container.html#) as well.
+
+---
+
+## 0. Warmup: Run `hello-containers.nf`
+
+We're going to use the workflow script `hello-containers.nf` as a starting point for the second section.
+It is equivalent to the script produced by working through Part 4 of this training course.
+
+Just to make sure everything is working, run the script once before making any changes:
+
+```bash
+nextflow run hello-containers.nf
+```
+
+This should produce the following output:
+
+```console title="Output"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-containers.nf` [tender_becquerel] DSL2 - revision: f7cat8e223
+
+executor > local (7)
+[bd/4bb541] sayHello (1) [100%] 3 of 3 ✔
+[85/b627e8] convertToUpper (3) [100%] 3 of 3 ✔
+[7d/f7961c] collectGreetings [100%] 1 of 1 ✔
+```
+
+As previously, you will find the output files in the `results` directory (specified by the `publishDir` directive).
+
+```console title="Directory contents"
+results
+├── Bonjour-output.txt
+├── COLLECTED-output.txt
+├── COLLECTED-test-batch-output.txt
+├── COLLECTED-trio-output.txt
+├── Hello-output.txt
+├── Holà-output.txt
+├── UPPER-Bonjour-output.txt
+├── UPPER-Hello-output.txt
+└── UPPER-Holà-output.txt
+```
+
+!!! note
+
+ There may also be a file named `output.txt` left over if you worked through Part 2 in the same environment.
+
+If that worked for you, you're ready to learn how to use containers.
+
+---
+
+## 1. Use a container 'manually'
+
+What we want to do is add a step to our workflow that will use a container for execution.
+
+However, we are first going to go over some basic concepts and operations to solidify your understanding of what containers are before we start using them in Nextflow.
+
+### 1.1. Pull the container image
+
+To use a container, you usually download or "pull" a container image from a container registry, and then run the container image to create a container instance.
+
+The general syntax is as follows:
+
+```bash title="Syntax"
+docker pull '<container>'
+```
+
+The `docker pull` part is the instruction to the container system to pull a container image from a repository.
+
+The `'<container>'` part is the URI address of the container image.
+
+As an example, let's pull a container image that contains [cowpy](https://github.com/jeffbuttars/cowpy), a Python implementation of a tool called `cowsay` that generates ASCII art to display arbitrary text inputs in a fun way.
+
+There are various repositories where you can find published containers.
+We used the [Seqera Containers](https://seqera.io/containers/) service to generate this Docker container from the `cowpy` Conda package: `'community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273'`.
+
+Run the complete pull command:
+
+```bash
+docker pull 'community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273'
+```
+
+This gives you the following console output as the system downloads the image:
+
+```console title="Output"
+Unable to find image 'community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273' locally
+131d6a1b707a8e65: Pulling from library/cowpy
+dafa2b0c44d2: Pull complete
+dec6b097362e: Pull complete
+f88da01cff0b: Pull complete
+4f4fb700ef54: Pull complete
+92dc97a3ef36: Pull complete
+403f74b0f85e: Pull complete
+10b8c00c10a5: Pull complete
+17dc7ea432cc: Pull complete
+bb36d6c3110d: Pull complete
+0ea1a16bbe82: Pull complete
+030a47592a0a: Pull complete
+622dd7f15040: Pull complete
+895fb5d0f4df: Pull complete
+Digest: sha256:fa50498b32534d83e0a89bb21fec0c47cc03933ac95c6b6587df82aaa9d68db3
+Status: Downloaded newer image for community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273
+community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273
+```
+
+Once the download is complete, you have a local copy of the container image.
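+
+If you want to confirm that the image is now stored locally, you can list your local images (an optional check; the exact columns and sizes will vary):
+
+```bash
+docker images
+```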
+
+### 1.2. Use the container to run `cowpy` as a one-off command
+
+One very common way that people use containers is to run them directly, _i.e._ non-interactively.
+This is great for running one-off commands.
+
+The general syntax is as follows:
+
+```bash title="Syntax"
+docker run --rm '<container>' [tool command]
+```
+
+The `docker run --rm '<container>'` part is the instruction to the container system to spin up a container instance from a container image and execute a command in it.
+The `--rm` flag tells the system to shut down the container instance after the command has completed.
+
+The `[tool command]` syntax depends on the tool you are using and how the container is set up.
+Let's just start with `cowpy`.
+
+Fully assembled, the container execution command looks like this:
+
+```bash
+docker run --rm 'community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273' cowpy
+```
+
+Run it to produce the following output:
+
+```console title="Output"
+ ______________________________________________________
+< Cowacter, eyes:default, tongue:False, thoughts:False >
+ ------------------------------------------------------
+ \ ^__^
+ \ (oo)\_______
+ (__)\ )\/\
+ ||----w |
+ || ||
+```
+
+The system spun up the container, ran the `cowpy` command with its parameters, sent the output to the console and finally, shut down the container instance.
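+
+You can pass arguments to the tool in the same way; for example, the following should display a custom message non-interactively, using the same `-c` option we'll explore in the interactive section below:
+
+```bash
+docker run --rm 'community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273' cowpy "Hello Containers" -c tux
+```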
+
+### 1.3. Use the container to run `cowpy` interactively
+
+You can also run a container interactively, which gives you a shell prompt inside the container and allows you to play with the command.
+
+#### 1.3.1. Spin up the container
+
+To run interactively, we just add `-it` to the `docker run` command.
+Optionally, we can specify the shell we want to use inside the container by appending _e.g._ `/bin/bash` to the command.
+
+```bash
+docker run --rm -it 'community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273' /bin/bash
+```
+
+Notice that your prompt changes to something like `(base) root@b645838b3314:/tmp#`, which indicates that you are now inside the container.
+
+You can verify this by running `ls` to list directory contents:
+
+```bash
+ls /
+```
+
+```console title="Output"
+bin dev etc home lib media mnt opt proc root run sbin srv sys tmp usr var
+```
+
+You can see that the filesystem inside the container is different from the filesystem on your host system.
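+
+As another optional check (assuming the image provides the `which` utility), you can confirm that the tool we pulled this image for is on the `PATH`:
+
+```bash
+which cowpy
+```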
+
+!!! note
+
+ When you run a container, it is isolated from the host system by default.
+ This means that the container can't access any files on the host system unless you explicitly allow it to do so.
+
+You will learn how to do that in a minute.
+
+#### 1.3.2. Run the desired tool command(s)
+
+Now that you are inside the container, you can run the `cowpy` command directly and give it some parameters.
+For example, the tool documentation says we can change the character ('cowacter') with `-c`.
+
+```bash
+cowpy "Hello Containers" -c tux
+```
+
+Now the output shows the Linux penguin, Tux, instead of the default cow, because we specified the `-c tux` parameter.
+
+```console title="Output"
+ __________________
+< Hello Containers >
+ ------------------
+ \
+ \
+ .--.
+ |o_o |
+ |:_/ |
+ // \ \
+ (| | )
+ /'\_ _/`\
+ \___)=(___/
+```
+
+Because you're inside the container, you can run the `cowpy` command as many times as you like, varying the input parameters, without having to bother with Docker commands.
+
+!!! tip
+
+    Use the `-c` flag to pick a different character, including:
+    `beavis`, `cheese`, `daemon`, `dragonandcow`, `ghostbusters`, `kitty`, `moose`, `milk`, `stegosaurus`, `turkey`, `turtle`, `tux`
+
+This is neat. What would be even neater is if we could feed our `greetings.csv` as input into this.
+But since we don't have access to the filesystem, we can't.
+
+Let's fix that.
+
+#### 1.3.3. Exit the container
+
+To exit the container, you can type `exit` at the prompt or use the ++ctrl+d++ keyboard shortcut.
+
+```bash
+exit
+```
+
+Your prompt should now be back to what it was before you started the container.
+
+#### 1.3.4. Mount data into the container
+
+When you run a container, it is isolated from the host system by default.
+This means that the container can't access any files on the host system unless you explicitly allow it to do so.
+
+One way to do this is to **mount** a **volume** from the host system into the container using the following syntax:
+
+```bash title="Syntax"
+-v <outside_path>:<inside_path>
+```
+
+In our case, `<outside_path>` will be the current working directory, so we can just use a dot (`.`), and `<inside_path>` is just a name we make up; let's call it `/data`.
+
+To mount a volume, we replace the paths and add the volume mounting argument to the docker run command as follows:
+
+```bash
+docker run --rm -it -v .:/data 'community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273' /bin/bash
+```
+
+This mounts the current working directory as a volume that will be accessible under `/data` inside the container.
+
+You can check that it works by listing the contents of `/data`:
+
+```bash
+ls /data
+```
+
+```console title="Output"
+demo-params.json hello-channels.nf hello-workflow.nf modules results
+greetings.csv hello-modules.nf hello-world.nf nextflow.config work
+```
+
+You can now see the contents of your working directory from inside the container, including the `greetings.csv` file.
+
+This effectively established a tunnel through the container wall that you can use to access that part of your filesystem.
+
+#### 1.3.5. Use the mounted data
+
+Now that we have mounted our working directory into the container as `/data`, we can use the `cowpy` command to display the contents of the `greetings.csv` file.
+
+To do this, we'll use `cat` to read the CSV file and pipe its contents into the `cowpy` command.
+
+```bash
+cat /data/greetings.csv | cowpy -c turkey
+```
+
+This produces the desired ASCII art of a turkey rattling off our example greetings:
+
+```console title="Output"
+ _________
+/ HOLà \
+| HELLO |
+\ BONJOUR /
+ ---------
+ \ ,+*^^*+___+++_
+ \ ,*^^^^ )
+ \ _+* ^**+_
+ \ +^ _ _++*+_+++_, )
+ _+^^*+_ ( ,+*^ ^ \+_ )
+ { ) ( ,( ,_+--+--, ^) ^\
+ { (\@) } f ,( ,+-^ __*_*_ ^^\_ ^\ )
+ {:;-/ (_+*-+^^^^^+*+*<_ _++_)_ ) ) /
+ ( / ( ( ,___ ^*+_+* ) < < \
+ U _/ ) *--< ) ^\-----++__) ) ) )
+ ( ) _(^)^^)) ) )\^^^^^))^*+/ / /
+ ( / (_))_^)) ) ) ))^^^^^))^^^)__/ +^^
+ ( ,/ (^))^)) ) ) ))^^^^^^^))^^) _)
+ *+__+* (_))^) ) ) ))^^^^^^))^^^^^)____*^
+ \ \_)^)_)) ))^^^^^^^^^^))^^^^)
+ (_ ^\__^^^^^^^^^^^^))^^^^^^^)
+ ^\___ ^\__^^^^^^))^^^^^^^^)\\
+ ^^^^^\uuu/^^\uuu/^^^^\^\^\^\^\^\^\^\
+ ___) >____) >___ ^\_\_\_\_\_\_\)
+ ^^^//\\_^^//\\_^ ^(\_\_\_\)
+ ^^^ ^^ ^^^ ^
+```
+
+Feel free to play around with this command.
+When you're done, exit the container as previously:
+
+```bash
+exit
+```
+
+You will find yourself back in your normal shell.
+
+### Takeaway
+
+You know how to pull a container and run it either as a one-off command or interactively.
+You also know how to make your data accessible from within your container, which lets you try out any tool you're interested in on real data without having to install any software on your system.
+
+### What's next?
+
+Learn how to use containers for the execution of Nextflow processes.
+
+---
+
+## 2. Use containers in Nextflow
+
+Nextflow has built-in support for running processes inside containers to let you run tools you don't have installed in your compute environment.
+This means that you can use any container image you like to run your processes, and Nextflow will take care of pulling the image, mounting the data, and running the process inside it.
+
+To demonstrate this, we are going to add a `cowpy` step to the pipeline we've been developing, after the `collectGreetings` step.
+
+### 2.1. Write a `cowpy` module
+
+#### 2.1.1. Create a file stub for the new module
+
+Create an empty file for the module called `cowpy.nf`.
+
+```bash
+touch modules/cowpy.nf
+```
+
+This gives us a place to put the process code.
+
+#### 2.1.2. Copy the `cowpy` process code into the module file
+
+We can model our `cowpy` process on the other processes we've written previously.
+
+```groovy title="modules/cowpy.nf" linenums="1"
+#!/usr/bin/env nextflow
+
+// Generate ASCII art with cowpy
+process cowpy {
+
+ publishDir 'results', mode: 'copy'
+
+ input:
+ path input_file
+ val character
+
+ output:
+ path "cowpy-${input_file}"
+
+ script:
+ """
+ cat $input_file | cowpy -c "$character" > cowpy-${input_file}
+ """
+}
+```
+
+The output will be a new text file containing the ASCII art generated by the `cowpy` tool.
+
+### 2.2. Add cowpy to the workflow
+
+Now we need to import the module and call the process.
+
+#### 2.2.1. Import the `cowpy` process into `hello-containers.nf`
+
+Insert the import declaration above the workflow block and fill it out appropriately.
+
+_Before:_
+
+```groovy title="hello-containers.nf" linenums="9"
+// Include modules
+include { sayHello } from './modules/sayHello.nf'
+include { convertToUpper } from './modules/convertToUpper.nf'
+include { collectGreetings } from './modules/collectGreetings.nf'
+
+workflow {
+```
+
+_After:_
+
+```groovy title="hello-containers.nf" linenums="9"
+// Include modules
+include { sayHello } from './modules/sayHello.nf'
+include { convertToUpper } from './modules/convertToUpper.nf'
+include { collectGreetings } from './modules/collectGreetings.nf'
+include { cowpy } from './modules/cowpy.nf'
+
+workflow {
+```
+
+#### 2.2.2. Add a call to the `cowpy` process in the workflow
+
+Let's connect the `cowpy()` process to the output of the `collectGreetings()` process, which as you may recall produces two outputs:
+
+- `collectGreetings.out.outfile` contains the output file
+- `collectGreetings.out.count` contains the count of greetings per batch
+
+In the workflow block, make the following code change:
+
+_Before:_
+
+```groovy title="hello-containers.nf" linenums="28"
+ // collect all the greetings into one file
+ collectGreetings(convertToUpper.out.collect(), params.batch)
+
+ // emit a message about the size of the batch
+ collectGreetings.out.count.view{ "There were $it greetings in this batch" }
+```
+
+_After:_
+
+```groovy title="hello-containers.nf" linenums="28"
+ // collect all the greetings into one file
+ collectGreetings(convertToUpper.out.collect(), params.batch)
+
+ // emit a message about the size of the batch
+ collectGreetings.out.count.view{ "There were $it greetings in this batch" }
+
+ // generate ASCII art of the greetings with cowpy
+ cowpy(collectGreetings.out.outfile, params.character)
+```
+
+Notice that we include a new CLI parameter, `params.character`, in order to specify which character we want to have say the greetings.
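+
+Parameters declared with `params.` can be set at runtime with a double-dash flag, so until we give this one a default value (which we'll do next), we would have to supply it on the command line, along these lines:
+
+```bash
+nextflow run hello-containers.nf --character turkey
+```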
+
+#### 2.2.3. Set a default value for `params.character`
+
+We like to be lazy and skip typing parameters in our command lines.
+
+_Before:_
+
+```groovy title="hello-containers.nf" linenums="3"
+/*
+ * Pipeline parameters
+ */
+params.greeting = 'greetings.csv'
+params.batch = 'test-batch'
+```
+
+_After:_
+
+```groovy title="hello-containers.nf" linenums="3"
+/*
+ * Pipeline parameters
+ */
+params.greeting = 'greetings.csv'
+params.batch = 'test-batch'
+params.character = 'turkey'
+```
+
+That should be all we need to make this work.
+
+#### 2.2.4. Run the workflow to verify that it works
+
+Run this with the `-resume` flag.
+
+```bash
+nextflow run hello-containers.nf -resume
+```
+
+Oh no, there's an error!
+
+```console title="Output"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-containers.nf` [special_lovelace] DSL2 - revision: 028a841db1
+
+executor > local (1)
+[f6/cc0107] sayHello (1) | 3 of 3, cached: 3 ✔
+[2c/67a06b] convertToUpper (3) | 3 of 3, cached: 3 ✔
+[1a/bc5901] collectGreetings | 1 of 1, cached: 1 ✔
+[b2/488871] cowpy | 0 of 1
+There were 3 greetings in this batch
+ERROR ~ Error executing process > 'cowpy'
+
+Caused by:
+ Process `cowpy` terminated with an error exit status (127)
+```
+
+This exit status, `127`, means the executable we asked for was not found.
+
+Of course: we're calling the `cowpy` tool, but we haven't yet told Nextflow to run it in a container, so the tool isn't available in the local environment.
+
+### 2.3. Use a container to run it
+
+We need to specify a container and tell Nextflow to use it for the `cowpy()` process.
+
+#### 2.3.1. Specify a container for the `cowpy` process to use
+
+Edit the `cowpy.nf` module to add the `container` directive to the process definition as follows:
+
+_Before:_
+
+```groovy title="modules/cowpy.nf" linenums="4"
+process cowpy {
+
+    publishDir 'results', mode: 'copy'
+```
+
+_After:_
+
+```groovy title="modules/cowpy.nf" linenums="4"
+process cowpy {
+
+    publishDir 'results', mode: 'copy'
+
+ container 'community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273'
+```
+
+This tells Nextflow that if the use of Docker is enabled, it should use the container image specified here to execute the process.
+
+#### 2.3.2. Enable use of Docker via the `nextflow.config` file
+
+Here we are going to slightly anticipate the topic of the next and last part of this course (Part 6), which covers configuration.
+
+One of the main ways Nextflow offers for configuring workflow execution is the `nextflow.config` file.
+When such a file is present in the current directory, Nextflow will automatically load it and apply any configuration it contains.
+
+We provided a `nextflow.config` file with a single line of code that disables Docker: `docker.enabled = false`.
+
+Now, let's switch that to `true` to enable Docker:
+
+_Before:_
+
+```console title="nextflow.config" linenums="1"
+docker.enabled = false
+```
+
+_After:_
+
+```console title="nextflow.config" linenums="1"
+docker.enabled = true
+```
+
+!!! note
+
+    It is possible to enable Docker execution from the command line, on a per-run basis, using the `-with-docker <container>` parameter.
+ However, that only allows us to specify one container for the entire workflow, whereas the approach we just showed you allows us to specify a different container per process.
+ This is better for modularity, code maintenance and reproducibility.
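+
+    For reference, that per-run flag would look something like this, shown here with the `cowpy` image (not the approach we recommend for multi-process workflows):
+
+    ```bash
+    nextflow run hello-containers.nf -with-docker 'community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273'
+    ```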
+
+#### 2.3.3. Run the workflow with Docker enabled
+
+Run the workflow with the `-resume` flag:
+
+```bash
+nextflow run hello-containers.nf -resume
+```
+
+This time it does indeed work.
+
+```console title="Output" linenums="1"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-containers.nf` [elegant_brattain] DSL2 - revision: 028a841db1
+
+executor > local (1)
+[95/fa0bac] sayHello (3) | 3 of 3, cached: 3 ✔
+[92/32533f] convertToUpper (3) | 3 of 3, cached: 3 ✔
+[aa/e697a2] collectGreetings | 1 of 1, cached: 1 ✔
+[7f/caf718] cowpy | 1 of 1 ✔
+There were 3 greetings in this batch
+```
+
+You can find the cowpy'ed output in the `results` directory.
+
+```console title="results/cowpy-COLLECTED-test-batch-output.txt"
+ _______
+ / \
+| HELLO |
+| HOLà |
+| BONJOUR |
+ \ /
+ =======
+ \
+ \
+ \
+ \
+ ,.
+ (_|,.
+ ,' /, )_______ _
+ __j o``-' `.'-)'
+ (") \'
+ `-j |
+ `-._( /
+ |_\ |--^. /
+ /_]'|_| /_)_/
+ /_]' /_]'
+```
+
+You see that the character is saying all the greetings, just as it did when we ran the `cowpy` command on the `greetings.csv` file from inside the container.
+
+#### 2.3.4. Inspect how Nextflow launched the containerized task
+
+Let's take a look at the work subdirectory for one of the `cowpy` process calls to get a bit more insight on how Nextflow works with containers under the hood.
+
+Check the output from your `nextflow run` command to find the call ID for the `cowpy` process.
+Then navigate to the work subdirectory.
+In it, you will find the `.command.run` file that contains all the commands Nextflow ran on your behalf in the course of executing the pipeline.
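+
+For example, based on the run shown above, something like this would take you there (the hash in your own output will differ):
+
+```bash
+cd work/7f/caf718*
+```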
+
+Open the `.command.run` file and search for `nxf_launch`; you should see something like this:
+
+```bash
+nxf_launch() {
+ docker run -i --cpu-shares 1024 -e "NXF_TASK_WORKDIR" -v /workspace/gitpod/hello-nextflow/work:/workspace/gitpod/hello-nextflow/work -w "$NXF_TASK_WORKDIR" --name $NXF_BOXID community.wave.seqera.io/library/pip_cowpy:131d6a1b707a8e65 /bin/bash -ue /workspace/gitpod/hello-nextflow/work/7f/caf7189fca6c56ba627b75749edcb3/.command.sh
+}
+```
+
+As you can see, Nextflow is using the `docker run` command to launch the process call.
+It also mounts the corresponding work subdirectory into the container, sets the working directory inside the container accordingly, and runs our templated bash script in the `.command.sh` file.
+
+All the hard work we had to do manually in the previous section is done for us by Nextflow!
+
+### Takeaway
+
+You know how to use containers in Nextflow to run processes.
+
+### What's next?
+
+Take a break!
+When you're ready, move on to Part 6 to learn how to configure the execution of your pipeline to fit your infrastructure as well as manage configuration of inputs and parameters.
+It's the very last part and then you're done!
diff --git a/docs/hello_nextflow/06_hello_config.md b/docs/hello_nextflow/06_hello_config.md
index a780aaef4..24791db1a 100644
--- a/docs/hello_nextflow/06_hello_config.md
+++ b/docs/hello_nextflow/06_hello_config.md
@@ -1,341 +1,161 @@
-# Part 5: Hello Config
+# Part 6: Hello Config
This section will explore how to set up and manage the configuration of your Nextflow pipeline so that you'll be able to customize its behavior, adapt it to different environments, and optimize resource usage _without altering a single line of the workflow code itself_.
-We're going to cover essential components of Nextflow configuration such as config files, profiles, process directives, executors, and parameter files.
-By learning to utilize these configuration options effectively, you can enhance the flexibility, scalability, and performance of your pipelines.
-
----
-
-## 0. Warmup: Moving to a formal project structure
-
-So far we've been working with a very loose structure, with just one workflow code file and a tiny configuration file that we've mostly ignored, because we were very focused on learning how to implement the workflow itself.
-However, we're now moving into the phase of this training series that is more focused on code development and maintenance practices.
-
-As part of that, we're going to adopt a formal project structure.
-We're going to work inside a dedicated project directory called `hello-config`, and we've renamed the workflow file `main.nf` to match the recommended Nextflow convention.
-
-### 0.1. Explore the `hello-config` directory
-
-We want to launch the workflow from inside the `hello-config` directory, so let's move into it now.
-
-```bash
-cd hello-config
-```
-
-Let's take a look at the contents.
-You can use the file explorer or the terminal; here we're using the output of `tree` to display the top-level directory contents.
-
-```console title="Directory contents"
-hello-config
-├── demo-params.json
-├── main.nf
-└── nextflow.config
-```
-
-- **`main.nf`** is a workflow based on `hello-operators.nf`, the workflow produced by completing Part 4 of this training course;
-
-- **`nextflow.config`** is a copy of the original `nextflow.config` file from the `hello-nextflow` directory, one level up (where we've been working so far).
- Whenever there is a file named `nextflow.config` in the current directory, Nextflow will automatically load configuration from it. The one we have been using contains the following lines:
-
- ```console title="nextflow.config" linenums="1"
- docker.fixOwnership = true
- docker.enabled = true
- ```
-
- The `docker.fixOwnership = true` line is not really interesting.
- It's a workaround for an issue that sometimes occur with containerized tools that set the wrong permissions on the files they write (which is the case with GenomicsDBImport in the GATK container image in our workflow).
-
- The `docker.enabled = true` line is what we care about here.
- It specifies that Nextflow should use Docker to run process calls that specify a container image.
- We're going to be playing with that shortly.
+There are multiple ways to do this; here we are going to use the simplest and most common configuration file mechanism, the `nextflow.config` file.
+Whenever there is a file named `nextflow.config` in the current directory, Nextflow will automatically load configuration from it.
!!!note
Anything you put into the `nextflow.config` can be overridden at runtime by providing the relevant process directives or parameters and values on the command line, or by importing another configuration file, according to the order of precedence described [here](https://www.nextflow.io/docs/latest/config.html).
-- **`demo-params.json`** is a parameter file intended for supplying parameter values to a workflow.
- We will use it in section 5 of this tutorial.
-
-The one thing that's missing is a way to point to the original data without making a copy of it or updating the file paths wherever they're specified.
-The simplest solution is to link to the data location.
-
-### 0.2. Create a symbolic link to the data
-
-Run this command from inside the `hello-config` directory:
-
-```bash
-ln -s ../data data
-```
-
-This creates a symbolic link called `data` pointing to the data directory, which allows us to avoid having to change anything to how the file paths are set up.
-
-```console title="Directory contents"
-hello-config
-├── data -> ../data
-├── demo-params.json
-├── main.nf
-└── nextflow.config
-```
-
-Later we'll cover a better way of handling this, but this will do for now.
+In this part of the training, we're going to use the `nextflow.config` file to demonstrate essential components of Nextflow configuration such as process directives, executors, profiles, and parameter files.
-### 0.3. Verify that the initial workflow runs properly
-
-Now that everything is in place, we should be able to run the workflow successfully.
-
-```bash
-nextflow run main.nf
-```
-
-This should run successfully:
-
-```console title="Output"
-Nextflow 24.09.2-edge is available - Please consider updating your version to it
-
- N E X T F L O W ~ version 24.10.0
-
- ┃ Launching `main.nf` [tender_brahmagupta] DSL2 - revision: 848ff2f9b5
-
-executor > local (7)
-[fb/f755b1] SAMTOOLS_INDEX (1) [100%] 3 of 3 ✔
-[d8/467767] GATK_HAPLOTYPECALLER (1) [100%] 3 of 3 ✔
-[ee/2c7855] GATK_JOINTGENOTYPING [100%] 1 of 1 ✔
-```
-
-There will now be a `work` directory and a `results_genomics` directory inside your `hello-config` directory.
-
-### Takeaway
-
-You know what are the two most important files in a Nextflow project: `main.nf` and its `nextflow.config`.
-
-### What's next?
-
-Learn how to modify basic configuration properties to adapt to your compute environment's requirements.
+By learning to utilize these configuration options effectively, you can enhance the flexibility, scalability, and performance of your pipelines.
---
-## 1. Determine what software packaging technology to use
-
-In the very first part of this training course (Part 1: Hello World) we just used locally installed software in our workflow. Then from Part 2 onward, we've been using Docker containers.
-
-Now, let's pretend we're working on an HPC cluster and the admin doesn't allow the use of Docker for security reasons.
+## 0. Warmup: Check that Docker is enabled and run the Hello Config workflow
-### 1.1. Disable Docker in the config file
+First, a quick check. There is a `nextflow.config` file in the current directory that contains the line `docker.enabled = <setting>`, where `<setting>` is either `true` or `false` depending on whether or not you've worked through Part 5 of this course in the same environment.
-First, we have to switch the value of `docker.enabled` to false.
+If it is set to `true`, you don't need to do anything.
-_Before:_
+If it is set to `false`, switch it to `true` now.
```console title="nextflow.config" linenums="1"
-docker.fixOwnership = true
docker.enabled = true
```
-_After:_
-
-```console title="nextflow.config" linenums="1"
-docker.fixOwnership = true
-docker.enabled = false
-```
-
-Let's see what happens if we run that.
-
-### 1.2. Run the workflow without Docker
-
-We are now launching the `main.nf` workflow from inside the `hello-config` directory.
+Once you've done that, verify that the initial workflow runs properly:
```bash
-nextflow run main.nf
+nextflow run hello-config.nf
```
-As expected, the run fails with an error message that looks like this:
-
```console title="Output"
N E X T F L O W ~ version 24.10.0
- ┃ Launching `hello-config/main.nf` [silly_ramanujan] DSL2 - revision: 9129bc4618
+Launching `hello-config.nf` [reverent_heisenberg] DSL2 - revision: 028a841db1
+
+executor > local (8)
+[7f/0da515] sayHello (1) | 3 of 3 ✔
+[f3/42f5a5] convertToUpper (3) | 3 of 3 ✔
+[04/fe90e4] collectGreetings | 1 of 1 ✔
+[81/4f5fa9] cowpy | 1 of 1 ✔
+There were 3 greetings in this batch
+```
-executor > local (3)
-[93/4417d0] SAMTOOLS_INDEX (1) [ 0%] 0 of 3
-[- ] GATK_HAPLOTYPECALLER -
-[- ] GATK_JOINTGENOTYPING -
-ERROR ~ Error executing process > 'SAMTOOLS_INDEX (2)'
+If everything works, you're ready to learn how to modify basic configuration properties to adapt to your compute environment's requirements.
-Caused by:
- Process `SAMTOOLS_INDEX (2)` terminated with an error exit status (127)
+---
-Command executed:
+## 1. Determine what software packaging technology to use
- samtools index 'reads_father.bam'
+The first step toward adapting your workflow configuration to your compute environment is specifying where the software packages run by each step will come from.
+Are they already installed in the local compute environment? Do we need to retrieve images and run them via a container system? Or do we need to retrieve Conda packages and build a local Conda environment?
-Command exit status:
- 127
+In the very first part of this training course (Parts 1-4) we just used locally installed software in our workflow.
+Then in Part 5, we introduced Docker containers and the `nextflow.config` file, which we used to enable the use of Docker containers.
-Command output:
- (empty)
+In the warmup to this section, you checked that Docker was enabled in the `nextflow.config` file and ran the workflow, which used a Docker container to execute the `cowpy()` process.
-Command error:
- .command.sh: line 2: samtools: command not found
-```
+!!! note
-Command not found? Of course, we don't have Samtools installed in our environment, and we can no longer use the Docker container. What to do?
+ If that doesn't sound familiar, you should probably go back and work through Part 5 before continuing.
-!!!note
+Now let's see how we can configure an alternative software packaging option via the `nextflow.config` file.
- Nextflow supports multiple other container technologies such as including Singularity (which is more widely used on HPC), and software package managers such as Conda.
+### 1.1. Disable Docker and enable Conda in the config file
-Let's try using Conda environments for our workflow.
+Let's pretend we're working on an HPC cluster and the admin doesn't allow the use of Docker for security reasons.
-### 1.3. Enable Conda in the configuration file
+Fortunately for us, Nextflow supports multiple other container technologies, including Singularity (which is more widely used on HPC), as well as software package managers such as Conda.
-First, we need to add a directive enabling the use of Conda, right after the line that controls the use of Docker.
-And while we're at it, let's put a blank line before those two to emphasize the logical grouping.
+We can change our configuration file to use Conda instead of Docker.
+To do so, we switch the value of `docker.enabled` to `false`, and add a directive enabling the use of Conda:
_Before:_
```groovy title="nextflow.config" linenums="1"
-docker.fixOwnership = true
-docker.enabled = false
+docker.enabled = true
```
_After:_
```groovy title="nextflow.config" linenums="1"
-docker.fixOwnership = true
-
docker.enabled = false
conda.enabled = true
```
-This should allow Nextflow to create and utilize Conda environments for processes that have Conda packages specified. Which means we now need to add those to our processes!
+This will allow Nextflow to create and utilize Conda environments for processes that have Conda packages specified.
+That means we now need to add a Conda package specification to our `cowpy` process!
-### 1.4. Specify Conda packages in the process definitions
+### 1.2. Specify a Conda package in the process definition
-We know that the Bioconda project provides Conda packages for Samtools and GATK, so we just need to retrieve their URIs and add them to the corresponding process definitions using the `conda` directive.
+We've already retrieved the URI for a Conda package containing the `cowpy` tool: `conda-forge::cowpy==1.1.5`.
!!! note
There are a few different ways to get the URI for a given conda package.
- We recommend using the [Seqera Containers](https://seqera.io/containers/) search query, which will give you a URI that you can copy paste, even if you're not creating a container.
-
-For your convenience, we are providing the URIs below. Just make sure to _add_ the `conda` directive.
-To be clear, we're not _replacing_ the `docker` directive, just adding an alternative option.
-
-#### 1.4.1. Update SAMTOOLS_INDEX
-
-The URI is `"bioconda::samtools=1.20"`.
-
-_Before:_
-
-```console title="main.nf" linenums="22"
-process SAMTOOLS_INDEX {
-
- container 'community.wave.seqera.io/library/samtools:1.20--b5dfbd93de237464'
-
- publishDir params.outdir, mode: 'symlink'
-```
-
-_After:_
-
-```console title="main.nf" linenums="22"
-process SAMTOOLS_INDEX {
-
- container "community.wave.seqera.io/library/samtools:1.20--b5dfbd93de237464"
- conda "bioconda::samtools=1.20"
+ We recommend using the [Seqera Containers](https://seqera.io/containers/) search query, which will give you a URI that you can copy and paste, even if you're not planning to create a container from it.
- publishDir params.outdir, mode: 'symlink'
-```
-
-#### 1.4.2. Update GATK_HAPLOTYPECALLER
-
-The URI is `"bioconda::gatk4=4.5.0.0"`.
-
-_Before:_
-
-```console title="main.nf" linenums="43"
-process GATK_HAPLOTYPECALLER {
-
- container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"
-
- publishDir params.outdir, mode: 'symlink'
-```
-
-_After:_
-
-```console title="main.nf" linenums="43"
-process GATK_HAPLOTYPECALLER {
-
- container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"
- conda "bioconda::gatk4=4.5.0.0"
-
- publishDir params.outdir, mode: 'symlink'
-```
-
-#### 1.4.3. Update GATK_JOINTGENOTYPING
-
-The URI is `"bioconda::gatk4=4.5.0.0"`.
+Now we add the URI to the `cowpy` process definition using the `conda` directive:
_Before:_
-```console title="main.nf" linenums="74"
-process GATK_JOINTGENOTYPING {
+```console title="modules/cowpy.nf" linenums="4"
+process cowpy {
- container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"
+ container 'community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273'
- publishDir params.outdir, mode: 'symlink'
+ publishDir 'results', mode: 'copy'
```
_After:_
-```console title="main.nf" linenums="74"
-process GATK_JOINTGENOTYPING {
+```console title="modules/cowpy.nf" linenums="4"
+process cowpy {
- container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"
- conda "bioconda::gatk4=4.5.0.0"
+ container 'community.wave.seqera.io/library/cowpy:1.1.5--3db457ae1977a273'
+ conda 'conda-forge::cowpy==1.1.5'
- publishDir params.outdir, mode: 'symlink'
+ publishDir 'results', mode: 'copy'
```
-Once all three processes are updated, we can try running the workflow again.
+To be clear, we're not _replacing_ the `docker` directive; we're _adding_ an alternative option.
-### 1.5. Run the workflow to verify that it can use Conda
+### 1.3. Run the workflow to verify that it can use Conda
Let's try it out.
```bash
-nextflow run main.nf
+nextflow run hello-config.nf
```
-This will take a bit longer than usual the first time, and you might see the console output stay 'stuck' at this stage for a minute or so:
+This should work without issue.
```console title="Output"
N E X T F L O W ~ version 24.10.0
- ┃ Launching `main.nf` [extravagant_thompson] DSL2 - revision: 848ff2f9b5
+Launching `hello-config.nf` [trusting_lovelace] DSL2 - revision: 028a841db1
-[- ] SAMTOOLS_INDEX -
-[- ] GATK_HAPLOTYPECALLER -
-[- ] GATK_JOINTGENOTYPING -
-Creating env using conda: bioconda::samtools=1.20 [cache /workspace/gitpod/hello-nextflow/hello-config/work/conda/env-6684ea23d69ceb1742019ff36904f612]
+executor > local (8)
+[ee/4ca1f2] sayHello (3) | 3 of 3 ✔
+[20/2596a7] convertToUpper (1) | 3 of 3 ✔
+[b3/e15de5] collectGreetings | 1 of 1 ✔
+[c5/af5f88] cowpy | 1 of 1 ✔
+There were 3 greetings in this batch
```
-That's because Nextflow has to retrieve the Conda packages and create the environment, which takes a bit of work behind the scenes. The good news is that you don't need to deal with any of it yourself!
-
-After a few moments, it should spit out some more output, and eventually complete without error.
-
-```console title="Output"
- N E X T F L O W ~ version 24.10.0
+Behind the scenes, Nextflow has retrieved the Conda packages and created the environment, which normally takes a bit of work, so it's nice that we don't have to do any of that ourselves!
- ┃ Launching `main.nf` [silly_goldstine] DSL2 - revision: a60f9fd6af
+!!! note
-executor > local (7)
-[23/b59106] SAMTOOLS_INDEX (1) [100%] 3 of 3 ✔
-[da/e1bf1d] GATK_HAPLOTYPECALLER (1) [100%] 3 of 3 ✔
-[2e/e6ffca] GATK_JOINTGENOTYPING [100%] 1 of 1 ✔
-```
+ This runs quickly because the `cowpy` package is quite small, but if you're working with large packages, it may take a bit longer than usual the first time, and you might see the console output stay 'stuck' for a minute or so before completing.
+ This is normal and is due to the extra work Nextflow does the first time you use a new package.
-And from our standpoint, it looks like it works exactly the same as running with Docker, even though on the backend the mechanics are a bit different.
+From our standpoint, it looks like it works exactly the same as running with Docker, even though on the backend the mechanics are a bit different.
This means we're all set to run with Conda environments if needed.
@@ -353,432 +173,103 @@ You know how to configure which software package each process should use, and ho
### What's next?
-Learn how to use profiles to make selecting an option easier.
-
----
-
-## 2. Use profiles to select preset configurations
-
-Profiles are a great way to adapt your workflow configuration by selecting preset options at runtime, to avoid having to edit a file every time you want to run something differently.
-
-### 2.1. Create profiles for switching between Docker and Conda
-
-Setting up these profiles mainly involves restructuring how we specify the `docker` and `conda` directives.
-
-_Before:_
-
-```groovy title="nextflow.config" linenums="1"
-docker.fixOwnership = true
-
-docker.enabled = false
-conda.enabled = true
-```
-
-_After:_
-
-```groovy title="nextflow.config" linenums="1"
-docker.fixOwnership = true
-
-profiles {
- docker_on {
- docker.enabled = true
- }
- conda_on {
- conda.enabled = true
- }
-}
-```
-
-This makes it possible to activate one or the other by specifying a profile in our Nextflow run command line.
-
-### 2.2. Run the workflow with a profile
-
-Let's try running the workflow with Conda.
-
-```bash
-nextflow run main.nf -profile conda_on
-```
-
-It works! Convenient, isn't it?
-
-```
- N E X T F L O W ~ version 24.10.0
-
- ┃ Launching `main.nf` [sharp_gauss] DSL2 - revision: 66cd7c255a
-
-executor > local (7)
-[f4/ef2cb6] SAMTOOLS_INDEX (1) [100%] 3 of 3 ✔
-[70/77152c] GATK_HAPLOTYPECALLER (1) [100%] 3 of 3 ✔
-[a6/0f72fd] GATK_JOINTGENOTYPING [100%] 1 of 1 ✔
-```
-
-Feel free to try it out with the Docker profile too. You just have to switch `-profile conda_on` to `-profile docker_on` in the command.
-
-### Takeaway
-
-You know how to use profiles to select a preset configuration at runtime with minimal hassle.
-
-### What's next?
-
Learn how to change the executor used by Nextflow to actually do the work.
---
-## 3. Determine what executor(s) should be used to do the work
-
-Until now, we have been running our pipeline with the local executor.
-This runs each step on the same machine that Nextflow is running on.
-However, for large workloads, you will typically want to use a distributed executor such as an HPC or cloud.
-Nextflow supports several different distributed executors, including:
+## 2. Allocate compute resources with process directives
-- HPC (SLURM, PBS, SGE)
-- AWS Batch
-- Google Batch
-- Azure Batch
-- Kubernetes
-- GA4GH TES
+Most high-performance computing platforms allow (and sometimes require) you to specify certain resource allocation parameters, such as the number of CPUs and the amount of memory.
-The executor is subject to a process directive called `executor`. By default it is set to `local`, so the following configuration is implied:
+By default, Nextflow will use a single CPU and 2GB of memory for each process.
+The corresponding process directives are called `cpus` and `memory`, so the following configuration is implied:
-```groovy title="Built-in configuration"
+```groovy title="Built-in configuration" linenums="1"
process {
- executor = 'local'
+ cpus = 1
+ memory = 2.GB
}
```
-Let's look at what it would take to using a Slurm scheduler, assuming we had a connection to a cluster and Slurm was installed appropriately.
-
-!!! warning
-
- What follows is for demonstration purposes but **will not execute the work** since we don't have access to an external executor.
-
-### 3.1. Set up a Slurm executor
-
-Add the following lines to the `nextflow.config` file:
+You can modify these values, either for all processes or for specific named processes, using additional process directives in your configuration file.
+Nextflow will translate them into the appropriate instructions for the chosen executor.
-```groovy title="nextflow.config" linenums="12"
-process {
- executor = 'slurm'
-}
-```
+But how do you know what values to use?
-And... that's it! As noted before, this does assume that Slurm itself is already set up for you, but this is really all Nextflow itself needs to know.
+### 2.1. Run the workflow to generate a resource utilization report
-Basically we are telling Nextflow to generate a Slurm submission script and submit it using an `sbatch` command.
+If you don't know up front how much CPU and memory your processes are likely to need, you can do some resource profiling, meaning you run the workflow with some default allocations, record how much each process used, and from there, estimate how to adjust the base allocations.
-### 3.2. Launch the workflow to generate the job submission script
+Conveniently, Nextflow includes built-in tools for doing this, and will happily generate a report for you on request.
-Let's try running this; even though we know it won't execute (since we don't have Slurm set up in this Gitpod environment) we'll be able to see what the submission script looks like.
+To do so, add `-with-report <filename>.html` to your command line.
```bash
-nextflow run main.nf -profile conda_on
-```
-
-As expected, this fails with a fairly unambiguous error:
-
-```console title="Output"
-nextflow
- N E X T F L O W ~ version 24.10.0
-
- ┃ Launching `main.nf` [grave_gauss] DSL2 - revision: 66cd7c255a
-
-[- ] SAMTOOLS_INDEX [ 0%] 0 of 3
-[eb/2962ce] SAMTOOLS_INDEX (3) [ 33%] 1 of 3, failed: 1
-[- ] GATK_HAPLOTYPECALLER -
-[- ] GATK_JOINTGENOTYPING -
-ERROR ~ Error executing process > 'SAMTOOLS_INDEX (3)'
-
-Caused by:
- java.io.IOException: Cannot run program "sbatch" (in directory "/workspace/gitpod/hello-nextflow/hello-config/work/eb/2962ce167b3025a41ece6ce6d7efc2"): error=2, No such file or directory
-
-Command executed:
-
- sbatch .command.run
-```
-
-However, it did produce what we are looking for: the `.command.run` file that Nextflow tried to submit to Slurm via the `sbatch` command.
-
-Let's take a look inside.
-
-```bash title=".command.run" linenums="1"
-#!/bin/bash
-#SBATCH -J nf-SAMTOOLS_INDEX_(1)
-#SBATCH -o /home/gitpod/work/34/850fe31af0eb62a0eb1643ed77b84f/.command.log
-#SBATCH --no-requeue
-#SBATCH --signal B:USR2@30
-NXF_CHDIR=/home/gitpod/work/34/850fe31af0eb62a0eb1643ed77b84f
-### ---
-### name: 'SAMTOOLS_INDEX (1)'
-### container: 'community.wave.seqera.io/library/samtools:1.20--b5dfbd93de237464'
-### outputs:
-### - 'reads_father.bam'
-### - 'reads_father.bam.bai'
-### ...
+nextflow run hello-config.nf -with-report report-config-1.html
```
-This shows the job submission details that Nextflow is trying to hand over to Slurm.
-
-!!!note
-
- There other options that we could additionally set using other process directives to control resource allocations, which we'll get to in a little bit.
- These would also be included in the `.command.run` file and directly passed to the Slurm execution.
+The report is an HTML file, which you can download and open in your browser.
+You can also right-click it in the file explorer on the left and select `Show preview` to view it in the training environment.
-You can try using any of the other supported executors in the same way. Nextflow will translate the values submitted to the executor into the appropriate equivalent instructions.
-
-Conveniently, you can also set up profiles to select which executor you want to use at runtime, just like we did for the Docker vs. Conda environments selection earlier.
+Take a few minutes to look through the report and see if you can identify some opportunities for adjusting resources.
+Make sure to click on the tabs that show the utilization results as a percentage of what was allocated.
+There is some [documentation](https://www.nextflow.io/docs/latest/reports.html) describing all the available features.
-### 3.3. Set up profiles for executors too
+
-Let's replace the process block we had added with the executor selection profiles.
+### 2.2. Set resource allocations for all processes
-_Before:_
+The profiling shows that the processes in our training workflow are very lightweight, so let's reduce the default memory allocation to 1GB per process.
-```groovy title="nextflow.config" linenums="3"
-profiles {
- docker_on {
- docker.enabled = true
- }
- conda_on {
- conda.enabled = true
- }
-}
+Add the following to your `nextflow.config` file:
+
+```groovy title="nextflow.config" linenums="4"
process {
- executor = 'slurm'
-}
-```
-
-_After:_
-
-```groovy title="nextflow.config" linenums="3"
-profiles {
- docker_on {
- docker.enabled = true
- }
- conda_on {
- conda.enabled = true
- }
- local_exec {
- process.executor = 'local'
- }
- slurm_exec {
- process.executor = 'slurm'
- }
+ memory = 1.GB
}
```
-Although it may look like these are going to be mutually exclusive, you can actually combine multiple profiles.
-Let's try that now.
-
-### 3.4. Run with a combination of profiles
-
-To use two profiles at the same time, simply give both to the `-profile` parameter, separated by a comma.
-
-```bash
-nextflow run main.nf -profile docker_on,local_exec
-```
-
-With that, we've returned to the original configuration of using Docker containers with local execution, not that you can tell from the console output:
+### 2.3. Set resource allocations for an individual process
-```console title="Output"
- N E X T F L O W ~ version 24.10.0
-
- ┃ Launching `main.nf` [irreverent_bassi] DSL2 - revision: 66cd7c255a
-
-executor > local (7)
-[17/82bbc4] SAMTOOLS_INDEX (2) [100%] 3 of 3 ✔
-[8e/93609c] GATK_HAPLOTYPECALLER (2) [100%] 3 of 3 ✔
-[e6/df6740] GATK_JOINTGENOTYPING [100%] 1 of 1 ✔
-```
-
-The point is, we can now use profiles to switch to a different software packaging system (Conda) or a different executor (such as Slurm) with a single command-line option.
-For example, if we were back on our hypothetical HPC from earlier, we would switch to using `-profile conda_on,slurm_exec` in our Nextflow command line.
-
-Feel free to test that on your own to satisfy yourself that it works as expected.
-
-Moving on, we're going to take this logic a step further, and set up dedicated profiles for groups of configuration elements that we usually want to activate together.
-
-### 3.5. Create profiles that combine several configuration elements
-
-Let's set up some dedicated profiles for the two case figures we've been envisioning: running locally with Docker, which we'll call `my_laptop`, and running on the HPC cluster with Conda, which we'll call `univ_hpc`.
+At the same time, we're going to pretend that the `cowpy` process requires more resources than the others, just so we can demonstrate how to adjust allocations for an individual process.
_Before:_
-```groovy title="nextflow.config" linenums="3"
-profiles {
- docker_on {
- docker.enabled = true
- }
- conda_on {
- conda.enabled = true
- }
- local_exec {
- process.executor = 'local'
- }
- slurm_exec {
- process.executor = 'slurm'
- }
-}
-```
-
-_After:_
-
-```groovy title="nextflow.config" linenums="3"
-profiles {
- docker_on {
- docker.enabled = true
- }
- conda_on {
- conda.enabled = true
- }
- my_laptop {
- process.executor = 'local'
- docker.enabled = true
- }
- univ_hpc {
- process.executor = 'slurm'
- conda.enabled = true
- }
-}
-```
-
-Now we have profiles for the two main case figures we've been considering.
-If in the future we find other elements of configuration that are always co-occurring with these, we can simply add them to the corresponding profile(s).
-
-Feel free to test these new profiles on your own using either `-profile my_laptop` or `-profile univ_hpc`.
-Just remember that the `univ_hpc` one won't work unless you run it in an environment that is set up appropriately to use Slurm.
-
-!!!note
-
- You'll notice we've removed the two profiles that _only_ specified the executor, because in those cases we're always going to want to specify the software packaging technology too.
-
- We're leaving in the Docker and Conda profiles because those ones come in handy by themselves, although there are also some dedicated command line flags for those, and it's a nice illustration of the fact that you can have the same directives set in multiple profiles.
- Just keep in mind that if you combine profiles with conflicting settings for the same directives, you might be surprised by the results.
-
-### Takeaway
-
-You now know how to change the executor and combine that with other environment settings using profiles.
-
-### What's next?
-
-Learn how to control the resources allocated for executing processes.
-
----
-
-## 4. Allocate compute resources with process directives
-
-We've covered how to control what compute environment Nextflow is going to use to run the workflow, so now the next logical question is, how do we control the resources (CPU, memory etc) that will be allocated?
-
-The answer may not surprise you; it's process directives again.
-
-### 4.1. Increase default process resource allocations
-
-By default, Nextflow will use a single CPU and 2GB of memory for each process.
-Let's say we decide to double that.
-
-We can modify this behavior by setting the `cpu` and `memory` directives in the `process` block. Add the following to the end of your `nextflow.config` file:
-
-```groovy title="nextflow.config" linenums="20"
+```groovy title="nextflow.config" linenums="4"
process {
- // defaults for all processes
- cpus = 2
- memory = 4.GB
+ memory = 1.GB
}
```
-### 4.2. Run the workflow with the increased defaults
-
-Let's try that out, bearing in mind that we need to keep `-profile my_laptop` in the command going forward.
-
-```bash
-nextflow run main.nf -profile my_laptop
-```
-
-You may not notice any real difference in how quickly this runs, since this is such a small workload.
-But if you have a machine with few CPUs and you allocate a high number per process, you might see process calls getting queued behind each other.
-This is because Nextflow will ensure we aren't using more CPUs than are available.
-
-!!! tip
-
- You can check the number of CPUs allocated to a given process by looking at the `.command.run` log in its work directory.
- There will be a function called `nxf_launch()` that includes the command `docker run—i—-CPU 1024`, where `--cpu-shares` refers to the CPU time given to this process' tasks. Setting one task's cpu_share to 512 and another to 1024 means that the second task will get double the amount of CPU time as the first.
-
-You're probably wondering if you can set resource allocations per individual process, and the answer is of course yes, yes you can!
-We'll show you how to do that in a moment.
-
-But first, let's talk about how you can find out how much CPU and memory your processes are likely to need.
-The classic approach is to do resource profiling, meaning you run the workflow with some default allocations, record how much each process used, and from there, estimate how to adjust the base allocations.
-
-The truly excellent news on this front is that Nextflow includes built-in tools for doing this, and will happily generate a report for you on request.
-Let's try that out.
-
-### 4.3. Run the workflow to generate a resource utilization report
-
-To have Nextflow generate the report automatically, simply add `-with-report .html` to your command line.
-
-```bash
-nextflow run main.nf -profile my_laptop -with-report report-config-1.html
-```
-
-The report is an html file, which you can download and open in your browser. You can also right click it in the file explorer on the left and click on `Show preview` in order to view it on Gitpod.
-
-Take a few minutes to look through the report and see if you can identify some opportunities for adjusting resources.
-Make sure to click on the tabs that show the utilization results as a percentage of what was allocated.
-There is some [documentation](https://www.nextflow.io/docs/latest/reports.html) describing all the available features.
-
-
-
-One observation is that the `GATK_JOINTGENOTYPING` seems to be very hungry for CPU, which makes sense since it performs a lot of complex calculations.
-So we could try boosting that and see if it cuts down on runtime.
-
-However, we seem to have overshot the mark with the memory allocations; all processes are only using a fraction of what we're giving them.
-We should dial that back down and save some resources.
-
-### 4.4. Adjust resource allocations for a specific process
-
-We can specify resource allocations for a given process using the `withName` process selector.
-The syntax looks like this when it's by itself in a process block:
+_After:_
-```groovy title="Syntax"
+```groovy title="nextflow.config" linenums="4"
process {
- withName: 'GATK_JOINTGENOTYPING' {
- cpus = 4
+ memory = 1.GB
+ withName: 'cowpy' {
+ memory = 2.GB
+ cpus = 2
}
}
```
-Let's add that to the existing process block in the `nextflow.config` file.
+With this configuration, all processes will request 1GB of memory and a single CPU (the implied default), except the `cowpy` process, which will request 2GB and 2 CPUs.
-```groovy title="nextflow.config" linenums="11"
-process {
- // defaults for all processes
- cpus = 2
- memory = 2.GB
- // allocations for a specific process
- withName: 'GATK_JOINTGENOTYPING' {
- cpus = 4
- }
-}
-```
+!!! note
-With that specified, the default settings will apply to all processes **except** the `GATK_JOINTGENOTYPING` process, which is a special snowflake that gets a lot more CPU.
-Hopefully that should have an effect.
+ If you have a machine with few CPUs and you allocate a high number per process, you might see process calls getting queued behind each other.
+ This is because Nextflow ensures we don't request more CPUs than are available.
-### 4.5. Run again with the modified configuration
+### 2.4. Run the workflow with the modified configuration
-Let's run the workflow again with the modified configuration and with the reporting flag turned on, but notice we're giving the report a different name so we can differentiate them.
+Let's try that out, supplying a different filename for the profiling report so we can compare performance before and after the configuration changes.
```bash
-nextflow run main.nf -profile my_laptop -with-report report-config-2.html
+nextflow run hello-config.nf -with-report report-config-2.html
```
-Once again, you probably won't notice a substantial difference in runtime, because this is such a small workload and the tools spend more time in ancillary tasks than in performing the 'real' work.
-
-However, the second report shows that our resource utilization is more balanced now.
+You will probably not notice any real difference since this is such a small workload, but this is the approach you would use to analyze the performance and resource requirements of a real-world workflow.
-
-
-As you can see, this approach is useful when your processes have different resource requirements. It empowers you to right-size the resource allocations you set up for each process based on actual data, not guesswork.
+This approach is very useful when your processes have different resource requirements, as it empowers you to right-size the resource allocations for each process based on actual data, not guesswork.
!!!note
@@ -788,13 +279,14 @@ As you can see, this approach is useful when your processes have different resou
We'll cover both of those approaches in an upcoming part of this training course.
-That being said, there may be some constraints on what you can (or must) allocate depending on what computing executor and compute infrastructure you're using. For example, your cluster may require you to stay within certain limits that don't apply when you're running elsewhere.
+### 2.5. Add resource limits
-### 4.6. Add resource limits to an HPC profile
+Depending on what computing executor and compute infrastructure you're using, there may be some constraints on what you can (or must) allocate.
+For example, your cluster may require you to stay within certain limits.
You can use the `resourceLimits` directive to set the relevant limitations. The syntax looks like this when it's by itself in a process block:
-```groovy title="Syntax"
+```groovy title="Syntax example"
process {
resourceLimits = [
memory: 750.GB,
@@ -804,256 +296,179 @@ process {
}
```
-Let's add this to the `univ_hpc` profile we set up earlier.
-
-_Before:_
-
-```groovy title="nextflow.config"
- univ_hpc {
- process.executor = 'slurm'
- conda.enabled = true
- }
-```
+Nextflow will translate these values into the appropriate instructions depending on the executor that you specified.
-_After:_
-
-```groovy title="nextflow.config"
- univ_hpc {
- process.executor = 'slurm'
- conda.enabled = true
- process.resourceLimits = [
- memory: 750.GB,
- cpus: 200,
- time: 30.d
- ]
- }
-```
-
-We can't test this since we don't have a live connection to Slurm in the Gitpod environment.
-However, you can try running the workflow with resource allocations that exceed these limits, then look up the `sbatch` command in the `.command.run` script file.
-You should see that the requests that actually get sent to the executor are capped at the values specified by `resourceLimits`.
+We're not going to run this, since we don't have access to relevant infrastructure in the training environment.
+However, if you were to try running the workflow with resource allocations that exceed these limits, then look up the `sbatch` command in the `.command.run` script file, you would see that the requests that actually get sent to the executor are capped at the values specified by `resourceLimits`.
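+For instance, here is a hedged sketch of what such an over-sized request might look like (the values are hypothetical, purely for illustration):
+
+```groovy title="Syntax example"
+process {
+    resourceLimits = [memory: 750.GB, cpus: 200, time: 30.d]
+    withName: 'cowpy' {
+        cpus = 500        // would be capped to 200 at submission time
+        memory = 1000.GB  // would be capped to 750.GB
+    }
+}
+```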
!!!note
The nf-core project has compiled a [collection of configuration files](https://nf-co.re/configs/) shared by various institutions around the world, covering a wide range of HPC and cloud executors.
- Those shared configs are valuable both for people who work there and can therefore just utilize their institution's configuration out of the box, and for people who are looking to develop a configuration for their own infrastructure.
+ Those shared configs are valuable both for people who work there and can therefore just utilize their institution's configuration out of the box, and as a model for people who are looking to develop a configuration for their own infrastructure.
### Takeaway
-You know how to allocate process resources, tweak those allocations based on the utilization report, and use a profile to adapt the allocations to the compute environment.
+You know how to generate a profiling report to assess resource utilization, modify resource allocations globally or for individual processes, and set resource limits for running on HPC.
### What's next?
-Configuring the parameters destined for the tools and operations wrapped within processes.
+Learn to use a parameter file to store workflow parameters.
---
-## 5. Configure workflow parameters
-
-So far we've been exploring options for configuring how Nextflow behaves in terms of executing the work.
-That's all well and good, but how do we manage the parameters that are meant for the workflow itself, and the tools it calls within the processes?
-That is also something we should be able to do without editing code files every time we want to run on some new data or switch to a different set of reference files.
-
-As it turns out, there's a lot of overlap between this kind of configuration and the infrastructure configuration, starting with the `nextflow.config` file, which can also house default values for command line parameters.
-
-### 5.1. Move the default parameter declarations to the configuration file
-
-We originally stored all our default parameter values in the workflow script itself, but we can move them out into the `nextflow.config` file if we prefer.
+## 3. Use a parameter file to store workflow parameters
-So let's cut this set of params out of `main.nf`:
+So far we've been looking at configuration from the technical point of view of the compute infrastructure.
+Now let's consider another aspect of workflow configuration that is very important for reproducibility: the configuration of the workflow parameters.
-```groovy title="main.nf" linenums="3"
-/*
- * Pipeline parameters
- */
+Currently, our workflow is set up to accept several parameter values via the command line, with default values set in the workflow script itself.
+This is fine for a simple workflow with very few parameters that need to be set for a given run.
+However, real-world workflows often have many more parameters that may be run-specific, and putting all of them on the command line would be tedious and error-prone.
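+For reference, here is a minimal sketch of what such defaults look like when declared in the workflow script itself (the parameter names match our parameter file; the values are assumptions for illustration):
+
+```groovy title="Syntax example"
+params.greeting = 'greetings.csv'
+params.batch = 'batch'
+params.character = 'cow'
+```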
-// Primary input (file of input files, one per line)
-params.reads_bam = "${projectDir}/data/sample_bams.txt"
+Nextflow allows us to specify parameters via a parameter file in JSON format, which makes it very convenient to manage and distribute alternative sets of default values, as well as run-specific parameter values.
-// Output directory
-params.outdir = 'results_genomics'
+We provide an example parameter file in the current directory, called `test-params.json`:
-// Accessory files
-params.reference = "${projectDir}/data/ref/ref.fasta"
-params.reference_index = "${projectDir}/data/ref/ref.fasta.fai"
-params.reference_dict = "${projectDir}/data/ref/ref.dict"
-params.intervals = "${projectDir}/data/ref/intervals.bed"
-
-// Base name for final output file
-params.cohort_name = "family_trio"
+```json title="test-params.json" linenums="1"
+{
+ "greeting": "greetings.csv",
+ "batch": "Trio",
+ "character": "turkey"
+}
```
-And let's stick it into the `nextflow.config` file.
+This parameter file contains a key-value pair for each of the inputs our workflow expects.
-!!!note
+### 3.1. Run the workflow using a parameter file
- It doesn't really matter where we put it into the file, as long as we keep the params together and avoid mixing them in with the infrastructure configuration, for the sake of readability.
- So putting it at the end will do just fine.
-
-### 5.2. Run the workflow with `-resume` to verify that it still works
-
-Let's check that we haven't broken anything, and let's include the `-resume` flag.
+To run the workflow with this parameter file, simply add `-params-file test-params.json` to the base command.
```bash
-nextflow run main.nf -profile my_laptop -resume
+nextflow run hello-config.nf -params-file test-params.json
```
-Not only does everything work, but all of the process calls are recognized as having been run previously.
+It works! And as expected, this produces the same outputs as previously.
```console title="Output"
N E X T F L O W ~ version 24.10.0
- ┃ Launching `main.nf` [modest_kay] DSL2 - revision: 328869237b
+Launching `hello-config.nf` [disturbed_sammet] DSL2 - revision: ede9037d02
-[d6/353bb0] SAMTOOLS_INDEX (3) [100%] 3 of 3, cached: 3 ✔
-[dc/2a9e3f] GATK_HAPLOTYPECALLER (2) [100%] 3 of 3, cached: 3 ✔
-[fe/a940b2] GATK_JOINTGENOTYPING [100%] 1 of 1, cached: 1 ✔
+executor > local (8)
+[f0/35723c] sayHello (2) | 3 of 3 ✔
+[40/3efd1a] convertToUpper (3) | 3 of 3 ✔
+[17/e97d32] collectGreetings | 1 of 1 ✔
+[98/c6b57b] cowpy | 1 of 1 ✔
+There were 3 greetings in this batch
```
-Indeed, having moved the parameter values to a different file changes nothing to the command submission that Nextflow generates. The resumability of the pipeline is preserved.
+This may seem like overkill when you only have a few parameters to specify, but some pipelines expect dozens of parameters.
+In those cases, using a parameter file will allow us to provide parameter values at runtime without having to type massive command lines and without modifying the workflow script.
-### 5.3. Streamline the syntax of the parameter defaults
-
-Now that our default parameter declarations are in `nextflow.config`, we can switch to using a more structured syntax using a `params` block. That allows us to remove the repeated `params.`.
-
-```groovy title="nextflow.config" linenums="35"
-/*
- * Pipeline parameters
- */
-
-params {
- // Primary input (file of input files, one per line)
- reads_bam = "${projectDir}/data/sample_bams.txt"
-
- // Output directory
- outdir = 'results_genomics'
+### Takeaway
- // Accessory files
- reference = "${projectDir}/data/ref/ref.fasta"
- reference_index = "${projectDir}/data/ref/ref.fasta.fai"
- reference_dict = "${projectDir}/data/ref/ref.dict"
- intervals = "${projectDir}/data/ref/intervals.bed"
+You know how to manage parameter defaults and override them at runtime using a parameter file.
- // Base name for final output file
- cohort_name = "family_trio"
-}
-```
+### What's next?
-Feel free to re-run this with the same command as above to verify that it works and still preserves the resumability of the pipeline.
+Learn how to use profiles to conveniently switch between alternative configurations.
-At this point, you may be wondering how to provide actual data and reference files to run this workflow for real, since what we've put in here is just a tiny test set.
+---
-There are several options.
-As we mentioned earlier (see note at the start of this page), you can override the defaults specified in the `nextflow.config` file by providing directives or parameter values on the command line, or by importing other configuration files.
+## 3. Determine what executor(s) should be used to do the work
-In this particular case, the best solution is to use a parameter file, which is a JSON file containing key-value pairs for all of the parameters you want to supply values for.
+Until now, we have been running our pipeline with the local executor.
+This executes each task on the machine that Nextflow is running on.
+When Nextflow begins, it looks at the available CPUs and memory.
+If the resources required by the tasks that are ready to run exceed the available resources, Nextflow will hold the remaining tasks back until one or more of the earlier tasks have finished, freeing up the necessary resources.
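+Incidentally, you can cap what the local executor considers available using the `executor` configuration scope; a minimal sketch (the values are arbitrary):
+
+```groovy title="Syntax example"
+executor {
+    cpus = 4      // treat the machine as having at most 4 CPUs
+    memory = 8.GB // and at most 8 GB of memory
+}
+```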
-### 5.4. Using a parameter file to override defaults
+For very large workloads, you may discover that your local machine is a bottleneck, either because you have a single task that requires more resources than you have available, or because you have so many tasks that waiting for a single machine to run them would take too long.
+The local executor is convenient and efficient, but is limited to that single machine.
+Nextflow supports [many different execution backends](https://www.nextflow.io/docs/latest/executor.html), including HPC schedulers (Slurm, LSF, SGE, PBS, Moab, OAR, Bridge, HTCondor and others) as well as cloud execution backends (AWS Batch, Google Cloud Batch, Azure Batch, Kubernetes and more).
-We provide a parameter file in the current directory, called `demo-params.json`, which contains key-value pairs for all of the parameters our workflow expects.
-The values are the same input files and reference files we've been using so far.
+Each of these systems uses different technologies, syntaxes and configurations for defining how a job should be run. For example, _if we didn't have Nextflow_, a job requiring 8 CPUs and 4GB of RAM to be executed on the queue "my-science-work" would need the following configuration on Slurm, submitted with `sbatch`:
-```json title="demo-params.json" linenums="1"
-{
- "reads_bam": "data/sample_bams.txt",
- "outdir": "results_genomics",
- "reference": "data/ref/ref.fasta",
- "reference_index": "data/ref/ref.fasta.fai",
- "reference_dict": "data/ref/ref.dict",
- "intervals": "data/ref/intervals.bed",
- "cohort_name": "family_trio"
-}
+```bash
+#SBATCH -o /path/to/my/task/directory/my-task-1.log
+#SBATCH --no-requeue
+#SBATCH -c 8
+#SBATCH --mem 4096M
+#SBATCH -p my-science-work
```
-To run the workflow with this parameter file, simply add `-params-file demo-params.json` to the base command.
+If I wanted to make the workflow available to a colleague running on PBS, I'd need to remember to use a different submission program, `qsub`, and I'd need to change my scripts to use a new syntax for resource requests:
```bash
-nextflow run main.nf -profile my_laptop -params-file demo-params.json
+#PBS -o /path/to/my/task/directory/my-task-1.log
+#PBS -j oe
+#PBS -q my-science-work
+#PBS -l nodes=1:ppn=5
+#PBS -l mem=4gb
```
-It works! And as expected, this produces the same outputs as previously.
+If I wanted to use SGE, the configuration would be slightly different again:
-```console title="Output"
- N E X T F L O W ~ version 24.10.0
-
- ┃ Launching `main.nf` [marvelous_mandelbrot] DSL2 - revision: 328869237b
-
-executor > local (7)
-[63/23a827] SAMTOOLS_INDEX (1) [100%] 3 of 3 ✔
-[aa/60aa4a] GATK_HAPLOTYPECALLER (2) [100%] 3 of 3 ✔
-[35/bda5eb] GATK_JOINTGENOTYPING [100%] 1 of 1 ✔
+```bash
+#$ -o /path/to/my/task/directory/my-task-1.log
+#$ -j y
+#$ -terse
+#$ -notify
+#$ -q my-science-work
+#$ -l slots=5
+#$ -l h_rss=4096M,mem_free=4096M
```
-However, you may be thinking, well, did we really override the configuration? How would we know, since those were the same files?
+Running on a cloud execution engine would require yet another approach, likely using an SDK built on the cloud platform's APIs.
-### 5.5. Remove or generalize default values from `nextflow.config`
+Nextflow makes it easy to write a single workflow that can be run on each of these different infrastructures and systems, without having to modify the code.
+The executor is controlled by a process directive called `executor`.
+By default it is set to `local`, so the following configuration is implied:
-Let's strip out all the file paths from the `params` block in `nextflow.config`, replacing them with `null`, and replace the `cohort_name` value with something more generic.
-
-_Before:_
-
-```groovy title="nextflow.config" linenums="39"
-params {
- // Primary input (file of input files, one per line)
- reads_bam = "${projectDir}/data/sample_bams.txt"
+```groovy title="Built-in configuration"
+process {
+ executor = 'local'
+}
+```
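+Since `executor` is a process directive, it can also be set for just a subset of processes using a selector. A hedged sketch (illustrative only; we won't mix executors like this in this course):
+
+```groovy title="Syntax example"
+process {
+    executor = 'local'
+    withName: 'cowpy' {
+        executor = 'slurm' // send only this process to the cluster
+    }
+}
+```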
- // Output directory
- outdir = 'results_genomics'
+### 3.1. Targeting a different backend
- // Accessory files
- reference = "${projectDir}/data/ref/ref.fasta"
- reference_index = "${projectDir}/data/ref/ref.fasta.fai"
- reference_dict = "${projectDir}/data/ref/ref.dict"
- intervals = "${projectDir}/data/ref/intervals.bed"
+By default, this training environment does not include a running HPC scheduler, but if you were running on a system with Slurm installed, for example, you could have Nextflow convert the `cpus`, `memory`, `queue` and other process directives into the correct syntax at runtime by adding the following lines to the `nextflow.config` file:
- // Base name for final output file
- cohort_name = "family_trio"
+```groovy title="nextflow.config"
+process {
+ executor = 'slurm'
}
```
-_After:_
+And... that's it! As noted before, this does assume that Slurm itself is already set up for you, but this is really all Nextflow itself needs to know.
+
+Basically we are telling Nextflow to generate a Slurm submission script and submit it using an `sbatch` command.
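+In practice you would typically also specify which queue to submit to. A minimal sketch, reusing the hypothetical "my-science-work" queue from the examples above:
+
+```groovy title="Syntax example"
+process {
+    executor = 'slurm'
+    queue = 'my-science-work' // hypothetical Slurm partition/queue
+}
+```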
-```groovy title="nextflow.config" linenums="39"
-params {
- // Primary input (file of input files, one per line)
- reads_bam = null
+### Takeaway
- // Output directory
- outdir = null
+You now know how to change the executor to use different kinds of computing infrastructure.
- // Accessory files
- reference = null
- reference_index = null
- reference_dict = null
- intervals = null
+### What's next?
- // Base name for final output file
- cohort_name = "my_cohort"
-}
-```
+Learn how to control the resources allocated for executing processes.
+
+---
-Now, if you run the same command again, it will still work.
-So yes, we're definitely able to pull those parameter values from the parameter file.
+## 4. Use profiles to select preset configurations
-This is great because, with the parameter file in hand, we'll now be able to provide parameter values at runtime without having to type massive command lines **and** without modifying the workflow nor the default configuration.
+You may want to switch between alternative settings depending on what computing infrastructure you're using. For example, you might want to develop and run small-scale tests locally on your laptop, then run full-scale workloads on HPC or cloud.
-That being said, it was nice to be able to demo the workflow without having to keep track of filenames and such. Let's see if we can use a profile to replicate that behavior.
+Nextflow lets you set up profiles that describe different configurations, which you can then select at runtime using a command-line argument, rather than having to modify the configuration file itself.
-### 5.6. Create a demo profile
+### 4.1. Create profiles for switching between local development and execution on HPC
-Yes we can! We just need to retrieve the default parameter declarations as they were written in the original workflow (with the `params.*` syntax) and copy them into a new profile that we'll call `demo`.
+Let's set up two alternative profiles: one for running small-scale loads on a regular computer, where we'll use Docker containers, and one for running on a university HPC with a Slurm scheduler, where we'll use Conda packages.
-_Before:_
+Add the following to your `nextflow.config` file:
```groovy title="nextflow.config" linenums="3"
profiles {
- docker_on {
- docker.enabled = true
- }
- conda_on {
- conda.enabled = true
- }
my_laptop {
process.executor = 'local'
docker.enabled = true
@@ -1070,16 +485,62 @@ profiles {
}
```
-_After:_
+Note that for the university HPC, we're also specifying resource limitations.
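+For reference, the `univ_hpc` profile should look something like this:
+
+```groovy title="nextflow.config"
+    univ_hpc {
+        process.executor = 'slurm'
+        conda.enabled = true
+        process.resourceLimits = [
+            memory: 750.GB,
+            cpus: 200,
+            time: 30.d
+        ]
+    }
+```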
-```groovy title="nextflow.config" linenums="3"
-profiles {
- docker_on {
- docker.enabled = true
- }
- conda_on {
- conda.enabled = true
+### 4.2. Run the workflow with a profile
+
+To specify a profile in our Nextflow command line, we use the `-profile` argument.
+
+Let's try running the workflow with the `my_laptop` configuration.
+
+```bash
+nextflow run hello-config.nf -profile my_laptop
+```
+
+This still produces the following output:
+
+```console title="Output"
+ N E X T F L O W ~ version 24.10.0
+
+Launching `hello-config.nf` [gigantic_brazil] DSL2 - revision: ede9037d02
+
+executor > local (8)
+[58/da9437] sayHello (3) | 3 of 3 ✔
+[35/9cbe77] convertToUpper (2) | 3 of 3 ✔
+[67/857d05] collectGreetings | 1 of 1 ✔
+[37/7b51b5] cowpy | 1 of 1 ✔
+There were 3 greetings in this batch
+```
+
+As you can see, this allows us to toggle between configurations very conveniently at runtime.
+
+!!! warning
+
+ The `univ_hpc` profile will not run properly in the training environment since we do not have access to a Slurm scheduler.
+
+If in the future we find other elements of configuration that are always co-occurring with these, we can simply add them to the corresponding profile(s).
+We can also create additional profiles if there are other elements of configuration that we want to group together.
+
+### 4.3. Create a test profile
+
+Profiles are not only for infrastructure configuration.
+We can also use them to set default values for workflow parameters, to make it easier for others to try out the workflow without having to gather appropriate input values themselves.
+This is intended as an alternative to using a parameter file.
+
+The syntax for expressing default values is the same as when writing them into the workflow file itself, except we wrap them in a block named `test`:
+
+```groovy title="Syntax example"
+ test {
+        params.<parameter_1> = <value_1>
+        params.<parameter_2> = <value_2>
+ ...
}
+```
+
+If we add a test profile for our workflow, the `profiles` block becomes:
+
+```groovy title="nextflow.config" linenums="4"
+profiles {
my_laptop {
process.executor = 'local'
docker.enabled = true
@@ -1093,55 +554,60 @@ profiles {
time: 30.d
]
}
- demo {
- // Primary input (file of input files, one per line)
- params.reads_bam = "data/sample_bams.txt"
-
- // Output directory
- params.outdir = 'results_genomics'
-
- // Accessory files
- params.reference = "data/ref/ref.fasta"
- params.reference_index = "data/ref/ref.fasta.fai"
- params.reference_dict = "data/ref/ref.dict"
- params.intervals = "data/ref/intervals.bed"
-
- // Base name for final output file
- params.cohort_name = "family_trio"
+ test {
+ params.greeting = 'greetings.csv'
+ params.batch = 'test-batch'
+ params.character = 'turkey'
}
}
```
-As long as we distribute the data bundle with the workflow code, this will enable anyone to quickly try out the workflow without having to supply their own inputs or pointing to the parameter file. Besides, we can provide URLs to where files are stored and Nextflow will download them automatically.
+Just like for technical configuration profiles, you can set up multiple profiles specifying parameters, under any names you like.
+
+### 4.4. Run the workflow locally with the test profile
-### 5.7. Run with the demo profile
+Conveniently, profiles are not mutually exclusive, so we can specify multiple profiles on the command line using the syntax `-profile <profile1>,<profile2>` (for any number of profiles).
-Let's try that out:
+!!! note
+
+ If you combine profiles that set values for the same elements of configuration and are described in the same configuration file, Nextflow will resolve the conflict by using whichever value it read in last (_i.e._ whatever comes later in the file).
+ If the conflicting settings are set in different configuration sources, the default [order of precedence](https://www.nextflow.io/docs/latest/config.html) applies.
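+To illustrate the note above, here is a hedged sketch of two conflicting profiles defined in the same file (the profile names are hypothetical):
+
+```groovy title="Syntax example"
+profiles {
+    alpha { params.character = 'cow' }
+    beta  { params.character = 'tux' } // read later, so this value wins
+}
+```
+
+If both `alpha` and `beta` are activated, the `beta` value applies because it comes later in the file.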
+
+Let's try adding the test profile to our previous command:
```bash
-nextflow run main.nf -profile my_laptop,demo
+nextflow run hello-config.nf -profile my_laptop,test
```
-And it works perfectly!
+This should produce the following:
```console title="Output"
N E X T F L O W ~ version 24.10.0
- ┃ Launching `main.nf` [cheesy_shaw] DSL2 - revision: 328869237b
+Launching `hello-config.nf` [gigantic_brazil] DSL2 - revision: ede9037d02
-executor > local (7)
-[4f/5ea14f] SAMTOOLS_INDEX (1) [100%] 3 of 3 ✔
-[fc/761e86] GATK_HAPLOTYPECALLER (3) [100%] 3 of 3 ✔
-[8a/2f498f] GATK_JOINTGENOTYPING [100%] 1 of 1 ✔
+executor > local (8)
+[58/da9437] sayHello (3) | 3 of 3 ✔
+[35/9cbe77] convertToUpper (2) | 3 of 3 ✔
+[67/857d05] collectGreetings | 1 of 1 ✔
+[37/7b51b5] cowpy | 1 of 1 ✔
+There were 3 greetings in this batch
```
-Imagine what we can do with this tooling in place.
-For example, we could also add profiles with popular sets of reference files to save people the trouble of providing their own.
+
+
+This means that as long as we distribute any test data files with the workflow code, anyone can quickly try out the workflow without having to supply their own inputs via the command line or a parameter file.
+
+!!! note
+
+ We can even point to URLs for larger files that are stored externally.
+ Nextflow will download them automatically as long as there is an open connection.
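+For example, a hedged sketch of a test profile pointing at remotely hosted data (the URL is hypothetical):
+
+```groovy title="Syntax example"
+    remote_test {
+        params.greeting = 'https://example.com/data/greetings.csv'
+    }
+```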
### Takeaway
-You know how to manage parameter defaults, override them at runtime using a parameter file, and set up profiles.
+You know how to use profiles to select a preset configuration at runtime with minimal hassle. More generally, you know how to configure your workflow executions to suit different compute platforms and enhance the reproducibility of your analyses.
### What's next?
-Celebrate and relax. Then we'll move on to learning how to modularize the workflow code for optimal maintainability and reuse.
+Celebrate and give yourself a big pat on the back! You have completed your very first Nextflow developer course.
+Then check out the training portal homepage for more training content that may be of interest.
diff --git a/docs/hello_nextflow/07_hello_modules.md b/docs/hello_nextflow/07_hello_modules.md
deleted file mode 100644
index 5dfdd78ab..000000000
--- a/docs/hello_nextflow/07_hello_modules.md
+++ /dev/null
@@ -1,411 +0,0 @@
-# Part 6: Hello Modules
-
-This section covers how to organize your workflow code to make development and maintenance of your pipeline more efficient and sustainable.
-Specifically, we are going to demonstrate how to use **modules**.
-
-In Nextflow, a **module** is a single process definition that is encapsulated by itself in a standalone code file.
-To use a module in a workflow, you just add a single-line import statement to your workflow code file; then you can integrate the process into the workflow the same way you normally would.
-
-Putting processes into individual modules makes it possible to reuse process definitions in multiple workflows without producing multiple copies of the code.
-This makes the code more shareable, flexible and maintainable.
-
-!!!note
-
- It is also possible to encapsulate a section of a workflow as a 'subworkflow' that can be imported into a larger pipeline, but that is outside the scope of this training.
-
----
-
-## 0. Warmup
-
-When we started developing our workflow, we put everything in one single code file.
-In Part 5 (Hello Config), we started turning our one-file workflow into a proper pipeline project.
-We moved to the standard Nextflow convention of naming the workflow file `main.nf`, fleshed out the configuration file, and added a parameter file.
-
-Now it's time to tackle **modularizing** our code, _i.e._ extracting the process definitions into modules.
-
-We're going to be working with a clean set of project files inside the project directory called `hello-modules` (for Modules).
-
-### 0.1. Explore the `hello-modules` directory
-
-Let's move into the project directory.
-
-```bash
-cd hello-modules
-```
-
-!!! warning
-
- If you're continuing on directly from Part 5, you'll need to move up one directory first.
- ```
- cd ../hello-modules
- ```
-
-The `hello-modules` directory has the same content and structure that you're expected to end up with in `hello-config` on completion of Part 5.
-
-```console title="Directory contents"
-hello-modules/
-├── demo-params.json
-├── main.nf
-└── nextflow.config
-```
-
-For a detailed description of these files, see the warmup section in Part 5.
-
-### 0.2. Create a symbolic link to the data
-
-Just like last time, we need to set up a symlink to the data.
-To do so, run this command from inside the `hello-modules` directory:
-
-```bash
-ln -s ../data data
-```
-
-This creates a symbolic link called `data` pointing to the data directory one level up.
-
-### 0.3 Run the workflow using the appropriate profiles
-
-Now that everything is in place, we should be able to run the workflow using the profiles we set up in Part 5.
-
-```bash
-nextflow run main.nf -profile my_laptop,demo
-```
-
-And so it does.
-
-```console title="Output"
- N E X T F L O W ~ version 24.10.0
-
- ┃ Launching `main.nf` [special_brenner] DSL2 - revision: 5a07b4894b
-
-executor > local (7)
-[26/60774a] SAMTOOLS_INDEX (1) | 3 of 3 ✔
-[5a/eb40c4] GATK_HAPLOTYPECALLER (2) | 3 of 3 ✔
-[8f/94ac86] GATK_JOINTGENOTYPING | 1 of 1 ✔
-```
-
-Like previously, there will now be a `work` directory and a `results_genomics` directory inside your project directory.
-
-### Takeaway
-
-You're ready to start modularizing your workflow.
-
-### What's next?
-
-Learn how to create your first module following conventions inspired by the nf-core project.
-
----
-
-## 1. Create a module for the `SAMTOOLS_INDEX` process
-
-From a technical standpoint, you can create a module simply by copying the process definition into its own file, and you can name that file anything you want.
-However, the Nextflow community has adopted certain conventions for code organization, influenced in large part by the [nf-core](https://nf-co.re) project (which we'll cover later in this training series).
-
-The convention for Nextflow modules is that the process definition should be written to a standalone file named `main.nf`, stored in a directory structure with three to four levels:
-
-```console title="Directory structure"
-modules
-└── local
-    └── (<toolkit>)
-        └── <tool>
- └── main.nf
-```
-
-By convention, all modules are stored in a directory named `modules`.
-Additionally, the convention distinguishes _local_ modules (which are part of your project) from _remote_ modules contained in remote repositories.
-
-The next levels down are named after the toolkit (if there is one) then the tool itself.
-If the process defined in the module invokes more than one tool, as the GATK_JOINTGENOTYPING does in our example workflow, the name of the module can be the name of the method, or something to that effect.
-
-For example, the module we create for the `SAMTOOLS_INDEX` process will live under `modules/local/samtools/index/`.
-
-```console title="Directory structure"
-modules
-└── local
- └── samtools
- └── index
- └── main.nf
-```
-
-!!!note
-
- We will cover remote modules later in this training, when we introduce the [nf-core library of modules](https://nf-co.re/modules/).
-
-So let's get started.
-
-### 2.1. Create a directory to house the local module code for the `SAMTOOLS_INDEX` process
-
-Run this command to create the appropriate directory structure:
-
-```bash
-mkdir -p modules/local/samtools/index
-```
-
-The `-p` flag takes care of creating parent directories as needed.
-
-### 2.2. Create a file stub for the `SAMTOOLS_INDEX` process module
-
-Now let's create an empty `main.nf` file for the module.
-
-```bash
-touch modules/local/samtools/index/main.nf
-```
-
-This gives us a place to put the process code.
-
-### 2.3. Move the `SAMTOOLS_INDEX` process code to the module file
-
-Copy the whole process definition over from the workflow's `main.nf` file to the module's `main.nf` file, making sure to copy over the `#!/usr/bin/env nextflow` shebang too.
-
-```groovy title="hello-modules/modules/local/samtools/index/main.nf" linenums="1"
-#!/usr/bin/env nextflow
-
-/*
- * Generate BAM index file
- */
-process SAMTOOLS_INDEX {
-
- container 'community.wave.seqera.io/library/samtools:1.20--b5dfbd93de237464'
- conda "bioconda::samtools=1.20"
-
- publishDir params.outdir, mode: 'symlink'
-
- input:
- path input_bam
-
- output:
- tuple path(input_bam), path("${input_bam}.bai")
-
- script:
- """
- samtools index '$input_bam'
- """
-}
-```
-
-Once that is done, delete the process definition from the workflow's `main.nf` file, but make sure to leave the shebang in place.
-
-### 2.4. Add an import declaration before the workflow block
-
-The syntax for importing a local module is fairly straightforward:
-
-```groovy title="Import declaration syntax"
-include { <PROCESS_NAME> } from './modules/local/<toolkit>/<tool>/main.nf'
-```
-
-Let's insert that above the workflow block and fill it out appropriately.
-
-_Before:_
-
-```groovy title="hello-modules/main.nf" linenums="73"
-workflow {
-```
-
-_After:_
-
-```groovy title="hello-modules/main.nf" linenums="73"
-// Include modules
-include { SAMTOOLS_INDEX } from './modules/local/samtools/index/main.nf'
-
-workflow {
-```
-
-### 2.5. Run the workflow to verify that it does the same thing as before
-
-We're running the workflow with essentially the same code and inputs as before, so let's add the `-resume` flag and see what happens.
-
-```bash
-nextflow run main.nf -profile my_laptop,demo -resume
-```
-
-Sure enough, Nextflow recognizes that it's still all the same work to be done, even if the code is split up into multiple files.
-
-```console title="Output"
- N E X T F L O W ~ version 24.10.0
-
- ┃ Launching `main.nf` [agitated_cuvier] DSL2 - revision: 0ce0cd0c04
-
-[c3/0d53a4] SAMTOOLS_INDEX (3) | 3 of 3, cached: 3 ✔
-[c6/8c6c30] GATK_HAPLOTYPECALLER (1) | 3 of 3, cached: 3 ✔
-[38/82b2e2] GATK_JOINTGENOTYPING | 1 of 1, cached: 1 ✔
-```
-
-So modularizing the code in the course of development does not break resumability!
-
-### Takeaway
-
-You know how to extract a process into a local module.
-
-### What's next?
-
-Practice making more modules.
-
----
-
-## 3. Repeat procedure for the remaining processes
-
-Once you've done one, you can do a million modules...
-But let's just do two more for now.
-
-### 3.1. Create directories to house the code for the two GATK modules
-
-Since GATK_HAPLOTYPECALLER and GATK_JOINTGENOTYPING both run GATK tools, we'll house them both under a shared `gatk` directory.
-
-```bash
-mkdir -p modules/local/gatk/haplotypecaller
-mkdir -p modules/local/gatk/jointgenotyping
-```
-
-You can imagine how it'll be useful to have that optional directory for grouping modules at the toolkit level.
-
-### 3.2. Create file stubs for the process modules
-
-Now let's make the file stubs to put the code into.
-
-```bash
-touch modules/local/gatk/haplotypecaller/main.nf
-touch modules/local/gatk/jointgenotyping/main.nf
-```
-
-### 3.3. Move the process code to the module files
-
-And finally, move the code for each process to the corresponding `main.nf` file, making sure to copy the shebang line too each time.
-
-### 3.3.1. GATK_HAPLOTYPECALLER module
-
-```groovy title="hello-modules/modules/local/gatk/haplotypecaller/main.nf" linenums="1"
-#!/usr/bin/env nextflow
-
-/*
- * Call variants with GATK HaplotypeCaller
- */
-process GATK_HAPLOTYPECALLER {
-
- container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"
- conda "bioconda::gatk4=4.5.0.0"
-
- publishDir params.outdir, mode: 'symlink'
-
- input:
- tuple path(input_bam), path(input_bam_index)
- path ref_fasta
- path ref_index
- path ref_dict
- path interval_list
-
- output:
- path "${input_bam}.g.vcf" , emit: vcf
- path "${input_bam}.g.vcf.idx" , emit: idx
-
- script:
- """
- gatk HaplotypeCaller \
- -R ${ref_fasta} \
- -I ${input_bam} \
- -O ${input_bam}.g.vcf \
- -L ${interval_list} \
- -ERC GVCF
- """
-}
-```
-
-### 3.3.2. GATK_JOINTGENOTYPING module
-
-```groovy title="hello-modules/modules/local/gatk/jointgenotyping/main.nf" linenums="1"
-#!/usr/bin/env nextflow
-
-/*
- * Combine GVCFs into GenomicsDB datastore and run joint genotyping to produce cohort-level calls
- */
-process GATK_JOINTGENOTYPING {
-
- container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"
- conda "bioconda::gatk4=4.5.0.0"
-
- publishDir params.outdir, mode: 'symlink'
-
- input:
- path all_gvcfs
- path all_idxs
- path interval_list
- val cohort_name
- path ref_fasta
- path ref_index
- path ref_dict
-
- output:
- path "${cohort_name}.joint.vcf" , emit: vcf
- path "${cohort_name}.joint.vcf.idx" , emit: idx
-
- script:
- def gvcfs_line = all_gvcfs.collect { gvcf -> "-V ${gvcf}" }.join(' ')
- """
- gatk GenomicsDBImport \
- ${gvcfs_line} \
- -L ${interval_list} \
- --genomicsdb-workspace-path ${cohort_name}_gdb
-
- gatk GenotypeGVCFs \
- -R ${ref_fasta} \
- -V gendb://${cohort_name}_gdb \
- -L ${interval_list} \
- -O ${cohort_name}.joint.vcf
- """
-}
-```
-
-### 3.4. Add import declarations to the workflow `main.nf` file
-
-Now all that remains is to add the import statements:
-
-_Before:_
-
-```groovy title="hello-modules/main.nf" linenums="3"
-// Include modules
-include { SAMTOOLS_INDEX } from './modules/local/samtools/index/main.nf'
-
-workflow {
-```
-
-_After:_
-
-```groovy title="hello-modules/main.nf" linenums="3"
-// Include modules
-include { SAMTOOLS_INDEX } from './modules/local/samtools/index/main.nf'
-include { GATK_HAPLOTYPECALLER } from './modules/local/gatk/haplotypecaller/main.nf'
-include { GATK_JOINTGENOTYPING } from './modules/local/gatk/jointgenotyping/main.nf'
-
-workflow {
-```
-
-### 3.5. Run the workflow to verify that everything still works as expected
-
-Look at that short `main.nf` file! Let's run it one last time.
-
-```bash
-nextflow run main.nf -profile my_laptop,demo -resume
-```
-
-Yep, everything still works, including the resumability of the pipeline.
-
-```console title="Output"
-N E X T F L O W ~ version 24.02.0-edge
-
-┃ Launching `main.nf` [tiny_blackwell] DSL2 - revision: 0ce0cd0c04
-
-[62/21cdc5] SAMTOOLS_INDEX (1) | 3 of 3, cached: 3 ✔
-[c6/8c6c30] GATK_HAPLOTYPECALLER (2) | 3 of 3, cached: 3 ✔
-[38/82b2e2] GATK_JOINTGENOTYPING | 1 of 1, cached: 1 ✔
-```
-
-Congratulations, you've done all this work and absolutely nothing has changed to how the pipeline works!
-
-Jokes aside, now your code is more modular, and if you decide to write another pipeline that calls on one of those processes, you just need to type one short import statement to use the relevant module.
-This is better than just copy-pasting the code, because if later you decide to improve the module, all your pipelines will inherit the improvements.
-
-### Takeaway
-
-You know how to modularize multiple processes in a workflow.
-
-### What's next?
-
-Learn to add tests to your pipeline using the nf-test framework.
diff --git a/docs/hello_nextflow/index.md b/docs/hello_nextflow/index.md
index cfcb9c8bc..2be88cd58 100644
--- a/docs/hello_nextflow/index.md
+++ b/docs/hello_nextflow/index.md
@@ -12,7 +12,7 @@ The rise of big data has made it increasingly necessary to be able to analyze an
During this training, you will be introduced to Nextflow in a series of complementary hands-on workshops.
-Let's get started!
+Let's get started! Click on the "Open in Gitpod" button below.
[![Open in Gitpod](https://img.shields.io/badge/Gitpod-%20Open%20in%20Gitpod-908a85?logo=gitpod)](https://gitpod.io/#https://github.com/nextflow-io/training)
@@ -34,5 +34,5 @@ This is a workshop for those who are completely new to Nextflow. Some basic fami
**Prerequisites**
-- A GitHub account
-- Experience with command line
+- A GitHub account and Gitpod login OR a local installation as described [here](envsetup/02_local).
+- Experience with command line and basic scripting
diff --git a/docs/index.md b/docs/index.md
index d1a01d491..9b8f25914 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -13,7 +13,7 @@ Welcome to the Nextflow community training portal!
We have several distinct training courses available on this website. Scroll down to find the one that's right for you!
-The training courses listed below are designed to be useable as a self-service resource; you can work through them on your own at any time (see Environment Setup for practical details). However, you may get even more out of them by joining a group training event.
+The training courses listed below are designed to be usable as a self-service resource; you can work through them on your own at any time (see Environment Setup for practical details). However, you may get even more out of them by joining a group training event.
- Free online events are run regularly by the nf-core community, see the [nf-core events page](https://nf-co.re/events) for more.
- Seqera (the company that develops Nextflow) runs a variety of training events, see the [Seqera Events](https://seqera.io/events/) page and look for 'Seqera Sessions' and 'Nextflow Summit'.
@@ -29,87 +29,117 @@ When you're ready to get down to work, click on the 'Open in Gitpod' button, eit
!!! quote inline end ""
- :material-lightbulb: Essential for setting up your environment for the first time.
+ :material-lightbulb: Set up your environment for the first time.
- Instructions for setting up your environment to work through training materials (all courses). Provides an orientation to Gitpod as well as alternate installation instructions for working on your own local machine.
+ Instructions for setting up your environment to work through training materials (all courses). Provides an orientation to the training platform as well as alternate installation instructions for working on your own local machine.
[Launch the Environment Setup training :material-arrow-right:](envsetup/index.md){ .md-button .md-button--primary }
## Nextflow for Newcomers
+These are foundational, domain-agnostic courses intended for those who are completely new to Nextflow. Each course consists of a series of training modules that are designed to help learners build up their skills progressively.
+
!!! exercise "Hello Nextflow"
!!! quote inline end ""
- :material-run-fast: A modular training series for getting started with Nextflow.
+ :material-run-fast: Learn to develop pipelines in Nextflow.
- This is a foundational course for those who are completely new to Nextflow. It consists of a series of training modules that are designed to help learners build up their skills progressively. The series covers the core components of the Nextflow language as well as essential pipeline design and development practices, and effective use of third-party resources.
+ This is a course for newcomers who wish to learn how to develop their own pipelines. The course covers the core components of the Nextflow language in enough detail to enable developing simple but fully functional pipelines. It also covers key elements of pipeline design, development and configuration practices.
- [Launch the Hello Nextflow training :material-arrow-right:](hello_nextflow/index.md){ .md-button }
+ [Launch the Hello Nextflow training :material-arrow-right:](hello_nextflow/index.md){ .md-button .md-button--primary }
-## In-depth Nextflow Training
+**Coming soon:** "Nextflow Run" — Learn to run Nextflow pipelines (run only, no code development)
-!!! exercise "Fundamentals Training"
+
-!!! exercise "Advanced Training"
+## Nextflow for Science
+
+These are courses that demonstrate how to apply the concepts and components presented in 'Hello Nextflow' (see above) to specific scientific use cases. Each course consists of a series of training modules that are designed to help learners build up their skills progressively.
+
+!!! exercise "Nextflow for Genomics"
!!! quote inline end ""
- :material-lightbulb: Advanced training material for mastering Nextflow.
+ :material-run-fast: Learn to develop a pipeline for genomics in Nextflow.
- Advanced material exploring the more advanced features of the Nextflow language and runtime, and how to use them to write efficient and scalable data-intensive workflows.
+ This is a course for researchers who wish to learn how to develop their own genomics pipelines. The course uses a variant calling use case to demonstrate how to develop a simple but functional genomics pipeline.
- [Launch the Advanced Training :material-arrow-right:](advanced/index.md){ .md-button .md-button--primary }
+ [Launch the Nextflow for Genomics training :material-arrow-right:](nf4_science/genomics/){ .md-button .md-button--primary }
-## Other/Experimental
+**Coming soon:** "Nextflow for RNAseq" — Learn to develop a pipeline for bulk RNAseq analysis in Nextflow
-!!! exercise "Configure the execution of an nf-core pipeline"
+
+
+## In-depth Nextflow Training
- [Launch the nf-core configuration training :material-arrow-right:](nf_customize/index.md){ .md-button }
+These are courses that demonstrate how to use Nextflow features in more detail or at a more advanced level. Each course consists of one or more training modules that are designed to help learners hone their skills on the corresponding topics.
-!!! exercise "Develop a pipeline with the nf-core template"
+
-!!! exercise "Troubleshooting exercises"
+!!! exercise "Fundamentals Training"
+
+ !!! tip inline end ""
+
+ :material-lightbulb: Comprehensive training material for exploring the full scope of Nextflow's capabilities.
+
+ The fundamentals training material covers all things Nextflow. Intended as a reference material for anyone looking to build complex workflows with Nextflow.
+
+ [Launch the Fundamentals Training :material-arrow-right:](basic_training/index.md){ .md-button .md-button--primary }
+
+!!! exercise "Advanced Training"
!!! quote inline end ""
- :material-run-fast: This course will help you troubleshooting common pipeline errors.
+ :material-lightbulb: Advanced training material for mastering Nextflow.
+
+ Advanced material exploring the more advanced features of the Nextflow language and runtime, and how to use them to write efficient and scalable data-intensive workflows.
- A "learn by doing" troubleshooting tutorial for pipeline developers and users.
+ [Launch the Advanced Training :material-arrow-right:](advanced/index.md){ .md-button .md-button--primary }
- [Launch the Troubleshooting training :material-arrow-right:](troubleshoot/index.md){ .md-button }
+## Other/Experimental
-## Deprecated
+These are training courses that are not being actively taught/maintained and that we may repurpose elsewhere or delete in the near future.
+The corresponding materials are not available within the training environment.
+You can still find the materials in the GitHub repository and download them for local use.
-!!! exercise "Simple RNA-seq variant calling"
+- **nf-customize** — Configuring nf-core pipelines ([docs](other/nf_customize) / [code](../other/nf-customize))
- !!! quote inline end ""
+- **nf-develop** — Developing a pipeline with the nf-core template ([docs](other/nf_develop) / [code](../other/nf-develop))
- :material-run-fast: A short hands-on tutorial focused on a concrete analysis pipeline example.
+- **troubleshoot** — Troubleshooting exercises ([docs](other/troubleshoot) / [code](../other/troubleshoot))
- This course was developed as a "learn by doing" tutorial intended as a fast, hands-on way to get to grips with Nextflow using a very concrete analysis pipeline example. You can still find the materials in the GitHub repository, but it is no longer being maintained and can no longer be launched in Gitpod or in the training portal.
+- **hands-on (rnaseq)** — Developing a pipeline for bulk RNAseq (deprecated) ([docs](other/hands_on) / [code](../other/hands-on))
## Resources
diff --git a/docs/nextflow_run/01_orientation.md b/docs/nextflow_run/01_orientation.md
new file mode 100644
index 000000000..4353bdd7f
--- /dev/null
+++ b/docs/nextflow_run/01_orientation.md
@@ -0,0 +1,46 @@
+# Orientation
+
+The Gitpod environment contains all the software, code and data necessary to work through this training course, so you don't need to install anything yourself.
+However, you do need a (free) account to log in, and you should take a few minutes to familiarize yourself with the interface.
+
+If you have not yet done so, please follow [this link](../../envsetup/) before going any further.
+
+## Materials provided
+
+Throughout this training course, we'll be working in the `run-nextflow/` directory.
+This directory contains all the code files, test data and accessory files you will need.
+
+Feel free to explore the contents of this directory; the easiest way to do so is to use the file explorer on the left-hand side of the Gitpod workspace.
+Alternatively, you can use the `tree` command.
+Throughout the course, we use the output of `tree` to represent directory structure and contents in a readable form, sometimes with minor modifications for clarity.
+
+Here we generate a table of contents to the second level down:
+
+```bash
+tree . -L 2
+```
+
+If you run this inside `run-nextflow`, you should see the following output: [TODO]
+
+```console title="Directory contents"
+.
+```
+
+!!!note
+
+ Don't worry if this seems like a lot; we'll go through the relevant pieces at each step of the course.
+ This is just meant to give you an overview.
+
+**Here's a summary of what you should know to get started:**
+
+[TODO]
+
+!!!tip
+
+ If for whatever reason you move out of this directory, you can always run this command to return to it:
+
+ ```bash
+ cd /workspace/gitpod/run-nextflow
+ ```
+
+Now, to begin the course, click on the arrow in the bottom right corner of this page.
diff --git a/docs/nextflow_run/02_run_basics.md b/docs/nextflow_run/02_run_basics.md
new file mode 100644
index 000000000..14f0b7e56
--- /dev/null
+++ b/docs/nextflow_run/02_run_basics.md
@@ -0,0 +1,10 @@
+# Part 1: Run Basics
+
+[TODO]
+
+Should cover:
+
+- basic project structure (main.nf, modules, nextflow.config)
+- run from CLI (re-use from Hello-World)
+- basic config elements (refer to hello-config) including profiles
+- running resource profiling and adapting the config
diff --git a/docs/hello_nextflow/09_hello_nf-core.md b/docs/nextflow_run/03_run_nf-core.md
similarity index 99%
rename from docs/hello_nextflow/09_hello_nf-core.md
rename to docs/nextflow_run/03_run_nf-core.md
index c0e3ade52..b017884df 100644
--- a/docs/hello_nextflow/09_hello_nf-core.md
+++ b/docs/nextflow_run/03_run_nf-core.md
@@ -1,4 +1,4 @@
-# Part 8: Hello nf-core
+# Part 3: Run nf-core
nf-core is a community effort to develop and maintain a curated set of analysis pipelines built using Nextflow.
@@ -1194,7 +1194,7 @@ This section should feel familiar to the `hello_modules` section.
If you have a module that you would like to contribute back to the community, reach out on the nf-core slack or open a pull request to the modules repository.
-Start by using the nf-core tooling to create a sceleton local module:
+Start by using the nf-core tooling to create a skeleton local module:
```console
nf-core modules create
diff --git a/docs/hello_nextflow/10_hello_seqera.md b/docs/nextflow_run/04_run_seqera.md
similarity index 100%
rename from docs/hello_nextflow/10_hello_seqera.md
rename to docs/nextflow_run/04_run_seqera.md
diff --git a/docs/nextflow_run/img/cpu-after.png b/docs/nextflow_run/img/cpu-after.png
new file mode 100644
index 000000000..f173ef020
Binary files /dev/null and b/docs/nextflow_run/img/cpu-after.png differ
diff --git a/docs/nextflow_run/img/cpu-before.png b/docs/nextflow_run/img/cpu-before.png
new file mode 100644
index 000000000..e0b8c76c8
Binary files /dev/null and b/docs/nextflow_run/img/cpu-before.png differ
diff --git a/docs/nextflow_run/img/memory-after.png b/docs/nextflow_run/img/memory-after.png
new file mode 100644
index 000000000..d61b4a7c5
Binary files /dev/null and b/docs/nextflow_run/img/memory-after.png differ
diff --git a/docs/nextflow_run/img/memory-before.png b/docs/nextflow_run/img/memory-before.png
new file mode 100644
index 000000000..ce0f7ac27
Binary files /dev/null and b/docs/nextflow_run/img/memory-before.png differ
diff --git a/docs/hello_nextflow/img/nested.excalidraw.svg b/docs/nextflow_run/img/nested.excalidraw.svg
similarity index 100%
rename from docs/hello_nextflow/img/nested.excalidraw.svg
rename to docs/nextflow_run/img/nested.excalidraw.svg
diff --git a/docs/hello_nextflow/img/nf-core-modules.png b/docs/nextflow_run/img/nf-core-modules.png
similarity index 100%
rename from docs/hello_nextflow/img/nf-core-modules.png
rename to docs/nextflow_run/img/nf-core-modules.png
diff --git a/docs/hello_nextflow/img/pipeline.excalidraw.svg b/docs/nextflow_run/img/pipeline.excalidraw.svg
similarity index 100%
rename from docs/hello_nextflow/img/pipeline.excalidraw.svg
rename to docs/nextflow_run/img/pipeline.excalidraw.svg
diff --git a/docs/hello_nextflow/img/pipeline_schema.png b/docs/nextflow_run/img/pipeline_schema.png
similarity index 100%
rename from docs/hello_nextflow/img/pipeline_schema.png
rename to docs/nextflow_run/img/pipeline_schema.png
diff --git a/docs/nextflow_run/img/report_cover.png b/docs/nextflow_run/img/report_cover.png
new file mode 100644
index 000000000..9feb1792c
Binary files /dev/null and b/docs/nextflow_run/img/report_cover.png differ
diff --git a/docs/hello_nextflow/img/seqera-containers-1.png b/docs/nextflow_run/img/seqera-containers-1.png
similarity index 100%
rename from docs/hello_nextflow/img/seqera-containers-1.png
rename to docs/nextflow_run/img/seqera-containers-1.png
diff --git a/docs/hello_nextflow/img/seqera-containers-2.png b/docs/nextflow_run/img/seqera-containers-2.png
similarity index 100%
rename from docs/hello_nextflow/img/seqera-containers-2.png
rename to docs/nextflow_run/img/seqera-containers-2.png
diff --git a/docs/nextflow_run/index.md b/docs/nextflow_run/index.md
new file mode 100644
index 000000000..66eaedaa1
--- /dev/null
+++ b/docs/nextflow_run/index.md
@@ -0,0 +1,39 @@
+---
+title: Run Nextflow
+hide:
+ - toc
+---
+
+# Run Nextflow
+
+Hello! You are now on the path to running reproducible and scalable scientific workflows using Nextflow.
+
+[TODO] NEED TO DISTINGUISH CLEARLY FROM HELLO NEXTFLOW
+
+The rise of big data has made it increasingly necessary to be able to analyze and perform experiments on large datasets in a portable and reproducible manner. Parallelization and distributed computing are the best ways to tackle this challenge, but the tools commonly available to computational scientists often lack good support for these techniques, or they provide a model that fits poorly with their needs. Nextflow was created specifically to address these challenges.
+
+During this training, you will be introduced to Nextflow in a series of complementary hands-on workshops.
+
+Let's get started!
+
+[![Open in Gitpod](https://img.shields.io/badge/Gitpod-%20Open%20in%20Gitpod-908a85?logo=gitpod)](https://gitpod.io/#https://github.com/nextflow-io/training)
+
+## Learning objectives
+
+In this workshop, you will learn foundational concepts for running pipelines.
+
+By the end of this workshop you will be able to:
+
+- Launch a Nextflow workflow locally
+- Find and interpret outputs (results) and log files generated by Nextflow
+- Troubleshoot basic issues
+- [TODO]
+
+## Audience & prerequisites
+
+This is a workshop for those who are completely new to Nextflow. Some basic familiarity with the command line, and common file formats is assumed.
+
+**Prerequisites**
+
+- A GitHub account
+- Experience with command line
diff --git a/docs/hello_nextflow/seqera/01_run_with_cli.md b/docs/nextflow_run/seqera/01_run_with_cli.md
similarity index 100%
rename from docs/hello_nextflow/seqera/01_run_with_cli.md
rename to docs/nextflow_run/seqera/01_run_with_cli.md
diff --git a/docs/hello_nextflow/seqera/02_run_with_launchpad.md b/docs/nextflow_run/seqera/02_run_with_launchpad.md
similarity index 100%
rename from docs/hello_nextflow/seqera/02_run_with_launchpad.md
rename to docs/nextflow_run/seqera/02_run_with_launchpad.md
diff --git a/docs/hello_nextflow/seqera/img/compute_env_platforms.png b/docs/nextflow_run/seqera/img/compute_env_platforms.png
similarity index 100%
rename from docs/hello_nextflow/seqera/img/compute_env_platforms.png
rename to docs/nextflow_run/seqera/img/compute_env_platforms.png
diff --git a/docs/hello_nextflow/seqera/img/launchpad.gif b/docs/nextflow_run/seqera/img/launchpad.gif
similarity index 100%
rename from docs/hello_nextflow/seqera/img/launchpad.gif
rename to docs/nextflow_run/seqera/img/launchpad.gif
diff --git a/docs/hello_nextflow/seqera/img/resolved_configuration.png b/docs/nextflow_run/seqera/img/resolved_configuration.png
similarity index 100%
rename from docs/hello_nextflow/seqera/img/resolved_configuration.png
rename to docs/nextflow_run/seqera/img/resolved_configuration.png
diff --git a/docs/hello_nextflow/seqera/img/run_with_tower.png b/docs/nextflow_run/seqera/img/run_with_tower.png
similarity index 100%
rename from docs/hello_nextflow/seqera/img/run_with_tower.png
rename to docs/nextflow_run/seqera/img/run_with_tower.png
diff --git a/docs/hello_nextflow/seqera/img/running_pipeline.png b/docs/nextflow_run/seqera/img/running_pipeline.png
similarity index 100%
rename from docs/hello_nextflow/seqera/img/running_pipeline.png
rename to docs/nextflow_run/seqera/img/running_pipeline.png
diff --git a/docs/hello_nextflow/seqera/img/task_details.png b/docs/nextflow_run/seqera/img/task_details.png
similarity index 100%
rename from docs/hello_nextflow/seqera/img/task_details.png
rename to docs/nextflow_run/seqera/img/task_details.png
diff --git a/docs/hello_nextflow/seqera/img/usage_create_token.png b/docs/nextflow_run/seqera/img/usage_create_token.png
similarity index 100%
rename from docs/hello_nextflow/seqera/img/usage_create_token.png
rename to docs/nextflow_run/seqera/img/usage_create_token.png
diff --git a/docs/hello_nextflow/seqera/img/usage_name_token.png b/docs/nextflow_run/seqera/img/usage_name_token.png
similarity index 100%
rename from docs/hello_nextflow/seqera/img/usage_name_token.png
rename to docs/nextflow_run/seqera/img/usage_name_token.png
diff --git a/docs/hello_nextflow/seqera/img/usage_token.png b/docs/nextflow_run/seqera/img/usage_token.png
similarity index 100%
rename from docs/hello_nextflow/seqera/img/usage_token.png
rename to docs/nextflow_run/seqera/img/usage_token.png
diff --git a/docs/nf4_science/genomics/00_orientation.md b/docs/nf4_science/genomics/00_orientation.md
new file mode 100644
index 000000000..990854331
--- /dev/null
+++ b/docs/nf4_science/genomics/00_orientation.md
@@ -0,0 +1,68 @@
+# Orientation
+
+The training environment contains all the software, code and data necessary to work through this training course, so you don't need to install anything yourself.
+However, you do need a (free) account to log in, and you should take a few minutes to familiarize yourself with the interface.
+
+If you have not yet done so, please follow [this link](../../envsetup/) before going any further.
+
+## Materials provided
+
+Throughout this training course, we'll be working in the `nf4-science/genomics/` directory, which you need to move into when you open the training workspace.
+This directory contains all the code files, test data and accessory files you will need.
+
+Feel free to explore the contents of this directory; the easiest way to do so is to use the file explorer on the left-hand side of the training workspace in the VSCode interface.
+Alternatively, you can use the `tree` command.
+Throughout the course, we use the output of `tree` to represent directory structure and contents in a readable form, sometimes with minor modifications for clarity.
+
+Here we generate a table of contents to the second level down:
+
+```bash
+tree . -L 2
+```
+
+If you run this inside `nf4-science/genomics`, you should see the following output:
+
+```console title="Directory contents"
+.
+├── data
+│   ├── bam
+│   ├── ref
+│   ├── sample_bams.txt
+│   └── samplesheet.csv
+├── genomics-1.nf
+├── genomics-2.nf
+└── solutions
+
+4 directories, 4 files
+```
+
+!!!note
+
+ Don't worry if this seems like a lot; we'll go through the relevant pieces at each step of the course.
+ This is just meant to give you an overview.
+
+**Here's a summary of what you should know to get started:**
+
+- **The `.nf` files** are workflow scripts that are named based on what part of the course they're used in.
+
+- **The file `nextflow.config`** is a configuration file that sets minimal environment properties.
+ You can ignore it for now.
+
+- **The `data` directory** contains input data and related resources, described later in the course.
+
+- **The `solutions` directory** contains the completed workflow scripts that result from each step of the course.
+ They are intended to be used as a reference to check your work and troubleshoot any issues.
+ The name and number in the filename correspond to the step of the relevant part of the course.
+ For example, the file `TODO.nf` is the expected result of completing steps X through Y of Part 1: TODO.
+
+!!!tip
+
+ If for whatever reason you move out of this directory, you can always run this command to return to it:
+
+ ```bash
+ cd /workspace/gitpod/nf4-science/genomics
+ ```
+
+Now, to begin the course, click on the arrow in the bottom right corner of this page.
diff --git a/docs/hello_nextflow/04_hello_genomics.md b/docs/nf4_science/genomics/01_per_sample_variant_calling.md
similarity index 80%
rename from docs/hello_nextflow/04_hello_genomics.md
rename to docs/nf4_science/genomics/01_per_sample_variant_calling.md
index b1b3e5227..1a26b96a4 100644
--- a/docs/hello_nextflow/04_hello_genomics.md
+++ b/docs/nf4_science/genomics/01_per_sample_variant_calling.md
@@ -1,15 +1,6 @@
-# Part 3: Hello Genomics
+# Part 1: Per-sample variant calling
-In Part 1, you learned how to use the basic building blocks of Nextflow to assemble a simple pipeline capable of processing some text and parallelizing execution if there were multiple inputs.
-Then in Part 2, you learned how to use containers to pull in command line tools to test them and integrate them into your pipelines without having to deal with software dependency issues.
-
-Now, we show you how to use the same components and principles to build a pipeline that does something a bit more interesting, and hopefully a bit more relatable to your work.
-Specifically, we show you how to implement a simple variant calling pipeline with [GATK](https://gatk.broadinstitute.org/) (Genome Analysis Toolkit), a widely used software package for analyzing high-throughput sequencing data.
-
-!!! note
-
- Don't worry if you're not familiar with GATK or genomics in general.
- We'll summarize the necessary concepts as we go, and the workflow implementation principles we demonstrate here apply broadly to any command line tool that takes in some input files and produce some output files.
+In the first part of this course, we show you how to build a simple pipeline that applies GATK variant calling to individual sequencing samples.
### Method overview
@@ -18,43 +9,44 @@ Here we are going to use tools and methods designed for calling short variants,
![GATK pipeline](img/gatk-pipeline.png)
-A full variant calling pipeline typically involves a lot of steps, including mapping to the reference and variant filtering and prioritization.
-For simplicity, we are going to focus on the core variant calling step, which takes as its main input a file of short-read sequencing data in [BAM](https://samtools.github.io/hts-specs/SAMv1.pdf) format (Binary Alignment Map, a compressed version of SAM, Sequence Alignment Map), as well as a reference genome and a list of genomic intervals to analyze.
+A full variant calling pipeline typically involves a lot of steps, including mapping to the reference (sometimes referred to as genome alignment) and variant filtering and prioritization.
+For simplicity, in this part of the course we are going to focus on just the variant calling part.
-For this exercise, we provide you with three samples in BAM format (see Dataset below).
-However, GATK requires an index file for each BAM file, which we did not provide (on purpose), so the workflow will have to create one as a preliminary step.
+### Dataset
-!!! note
+We provide the following data and related resources:
- Index files are a common feature of bioinformatics file formats; they contain information about the structure of the main file that allows tools like GATK to access a subset of the data without having to read through the whole file.
- This is important because of how big these files can get.
+- **A reference genome** consisting of a small region of the human chromosome 20 (from hg19/b37) and its accessory files (index and sequence dictionary).
+- **Three whole genome sequencing samples** corresponding to a family trio (mother, father and son), which have been subset to a small slice of data on chromosome 20 to keep the file sizes small.
+ This is Illumina short-read sequencing data that have already been mapped to the reference genome, provided in [BAM](https://samtools.github.io/hts-specs/SAMv1.pdf) format (Binary Alignment Map, a compressed version of SAM, Sequence Alignment Map).
+- **A list of genomic intervals**, i.e. coordinates on the genome where our samples have data suitable for calling variants, provided in BED format.
+
+### Workflow
-So to recap, we're going to develop a workflow that does the following:
+In this part of the course, we're going to develop a workflow that does the following:
1. Generate an index file for each BAM input file using [Samtools](https://www.htslib.org/)
2. Run the GATK HaplotypeCaller on each BAM input file to generate per-sample variant calls in VCF (Variant Call Format)
---8<-- "docs/hello_nextflow/img/hello-gatk-1.svg"
+--8<-- "docs/nf4_science/genomics/img/hello-gatk-1.svg"
-### Dataset
+!!! note
-- **A reference genome** consisting of a small region of the human chromosome 20 (from hg19/b37) and its accessory files (index and sequence dictionary).
-- **Three whole genome sequencing samples** corresponding to a family trio (mother, father and son), which have been subset to a small slice of data on chromosome 20 to keep the file sizes small.
- The sequencing data is in (Binary Alignment Map) format, i.e. genome sequencing reads that have already been mapped to the reference genome.
-- **A list of genomic intervals**, i.e. coordinates on the genome where our samples have data suitable for calling variants, provided in BED format.
+ Index files are a common feature of bioinformatics file formats; they contain information about the structure of the main file that allows tools like GATK to access a subset of the data without having to read through the whole file.
+ This is important because of how big these files can get.
---
## 0. Warmup: Test the Samtools and GATK commands interactively
-Just like in the Hello World example, we want to try out the commands manually before we attempt to wrap them in a workflow.
-The tools we need (Samtools and GATK) are not installed in the Gitpod environment, but that's not a problem since you learned how to work with containers in Part 2 of this training series (Hello Containers).
+First we want to try out the commands manually before we attempt to wrap them in a workflow.
+The tools we need (Samtools and GATK) are not installed in the Gitpod environment, so we'll use them via containers (see [Hello Containers](../../hello_nextflow/05_hello_containers.md)).
!!! note
- Make sure you're in the `hello-nextflow` directory so that the last part of the path shown when you type `pwd` is `hello-nextflow`.
+ Make sure you're in the `nf4-science/genomics` directory so that the last part of the path shown when you type `pwd` is `genomics`.
### 0.1. Index a BAM input file with Samtools
@@ -165,14 +157,14 @@ Learn how to wrap those same commands into a two-step workflow that uses contain
## 1. Write a single-stage workflow that runs Samtools index on a BAM file
-We provide you with a workflow file, `hello-genomics.nf`, that outlines the main parts of the workflow.
+We provide you with a workflow file, `genomics-1.nf`, that outlines the main parts of the workflow.
It's not functional; its purpose is just to serve as a skeleton that you'll use to write the actual workflow.
### 1.1. Define the indexing process
Let's start by writing a process, which we'll call `SAMTOOLS_INDEX`, describing the indexing operation.
-```groovy title="hello-genomics.nf" linenums="9"
+```groovy title="genomics-1.nf" linenums="9"
/*
* Generate BAM index file
*/
@@ -208,7 +200,7 @@ This process is going to require us to pass in a file path via the `input_bam` i
At the top of the file, under the `Pipeline parameters` section, we declare a CLI parameter called `reads_bam` and give it a default value.
That way, we can be lazy and not specify the input when we type the command to launch the pipeline (for development purposes). We're also going to set `params.outdir` with a default value for the output directory.
-```groovy title="hello-genomics.nf" linenums="3"
+```groovy title="genomics-1.nf" linenums="3"
/*
* Pipeline parameters
*/
@@ -220,9 +212,11 @@ params.outdir = "results_genomics"
Now we have a process ready, as well as a parameter to give it an input to run on, so let's wire those things up together.
+
+
!!! note
- `${projectDir}` is a built-in Nextflow variable that points to the directory where the current Nextflow workflow script (`hello-genomics.nf`) is located.
+ `${projectDir}` is a built-in Nextflow variable that points to the directory where the current Nextflow workflow script (`genomics-1.nf`) is located.
This makes it easy to reference files, data directories, and other resources included in the workflow repository without hardcoding absolute paths.
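+
+    For instance, this is exactly how the primary input's default value is declared at the top of the script:
+
+    ```groovy
+    // Resolves relative to where genomics-1.nf lives, no matter where you launch from
+    params.reads_bam = "${projectDir}/data/bam/reads_mother.bam"
+    ```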
@@ -230,7 +224,7 @@ Now we have a process ready, as well as a parameter to give it an input to run o
In the `workflow` block, we need to set up a **channel** to feed the input to the `SAMTOOLS_INDEX` process; then we can call the process itself to run on the contents of that channel.
-```groovy title="hello-genomics.nf" linenums="30"
+```groovy title="genomics-1.nf" linenums="30"
workflow {
// Create input channel (single file via CLI parameter)
@@ -241,16 +235,16 @@ workflow {
}
```
-You'll notice we're using the same `.fromPath` channel factory as we used at the end of Part 1 (Hello World) of this training series.
+You'll notice we're using the same `.fromPath` channel factory as we used in [Hello Channels](../../hello_nextflow/02_hello_channels.md).
Indeed, we're doing something very similar.
-The difference is that this time we're telling Nextflow to load the file path itself into the channel as an input element, rather than reading in its contents.
+The difference is that we're telling Nextflow to just load the file path itself into the channel as an input element, rather than reading in its contents.
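+
+To make the distinction concrete, here is a minimal sketch (not part of the course files; the `.view()` call is purely diagnostic):
+
+```groovy
+// .fromPath() emits the path itself as the channel element;
+// nothing reads the file's contents at this point.
+workflow {
+    Channel
+        .fromPath("${projectDir}/data/bam/reads_mother.bam")
+        .view { bam -> "Channel element: ${bam}" }
+}
+```
+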
### 1.4. Run the workflow to verify that the indexing step works
Let's run the workflow! As a reminder, we don't need to specify an input in the command line because we set up a default value for the input when we declared the input parameter.
```bash
-nextflow run hello-genomics.nf
+nextflow run genomics-1.nf
```
The command should produce something like this:
@@ -258,7 +252,7 @@ The command should produce something like this:
```console title="Output"
N E X T F L O W ~ version 24.10.0
- ┃ Launching `hello-genomics.nf` [reverent_sinoussi] DSL2 - revision: 41d43ad7fe
+ ┃ Launching `genomics-1.nf` [reverent_sinoussi] DSL2 - revision: 41d43ad7fe
executor > local (1)
[2a/e69536] SAMTOOLS_INDEX (1) | 1 of 1 ✔
@@ -268,7 +262,7 @@ You can check that the index file has been generated correctly by looking in the
```console title="Directory contents"
work/2a/e695367b2f60df09cf826b07192dc3
-├── reads_mother.bam -> /workspace/gitpod/hello-nextflow/data/bam/reads_mother.bam
+├── reads_mother.bam -> /workspace/gitpod/nf4-science/genomics/data/bam/reads_mother.bam
└── reads_mother.bam.bai
```
@@ -281,7 +275,7 @@ There it is!
### Takeaway
-You know how to wrap a real bioinformatics tool in a single-step Nextflow workflow and have it run using a container.
+You know how to wrap a genomics tool in a single-step Nextflow workflow and have it run using a container.
### What's next?
@@ -297,7 +291,7 @@ Now that we have an index for our input file, we can move on to setting up the v
Let's write a process, which we'll call `GATK_HAPLOTYPECALLER`, describing the variant calling operation.
-```groovy title="hello-genomics.nf" linenums="30"
+```groovy title="genomics-1.nf" linenums="30"
/*
* Call variants with GATK HaplotypeCaller
*/
@@ -348,7 +342,7 @@ Similarly, we have to list the output VCF's index file (the `"${input_bam}.vcf.i
Since our new process expects a handful of additional files to be provided, we set up some CLI parameters for them under the `Pipeline parameters` section, along with some default values (same reasons as before).
-```groovy title="hello-genomics.nf" linenums="10"
+```groovy title="genomics-1.nf" linenums="10"
// Accessory files
params.reference = "${projectDir}/data/ref/ref.fasta"
params.reference_index = "${projectDir}/data/ref/ref.fasta.fai"
@@ -362,7 +356,7 @@ While main data inputs are streamed dynamically through channels, there are two
Add this to the workflow block (after the `reads_ch` creation):
-```groovy title="hello-genomics.nf" linenums="71"
+```groovy title="genomics-1.nf" linenums="71"
// Load the file paths for the accessory files (reference and intervals)
ref_file = file(params.reference)
ref_index_file = file(params.reference_index)
@@ -376,7 +370,7 @@ This will make the accessory file paths available for providing as input to any
Now that we've got our second process set up and all the inputs and accessory files are ready and available, we can add a call to the `GATK_HAPLOTYPECALLER` process in the workflow body.
-```groovy title="hello-genomics.nf" linenums="80"
+```groovy title="genomics-1.nf" linenums="80"
// Call variants from the indexed BAM file
GATK_HAPLOTYPECALLER(
reads_ch,
@@ -400,7 +394,7 @@ You should recognize the `*.out` syntax from Part 1 of this training series; we
Let's run the expanded workflow with `-resume` so that we don't have to run the indexing step again.
```bash
-nextflow run hello-genomics.nf -resume
+nextflow run genomics-1.nf -resume
```
Now if we look at the console output, we see the two processes listed:
@@ -408,7 +402,7 @@ Now if we look at the console output, we see the two processes listed:
```console title="Output"
N E X T F L O W ~ version 24.10.0
- ┃ Launching `hello-genomics.nf` [grave_volta] DSL2 - revision: 4790abc96a
+ ┃ Launching `genomics-1.nf` [grave_volta] DSL2 - revision: 4790abc96a
executor > local (1)
[2a/e69536] SAMTOOLS_INDEX (1) | 1 of 1, cached: 1 ✔
@@ -422,8 +416,8 @@ You'll find the output file `reads_mother.bam.vcf` in the results directory, as
```console title="Directory contents"
results_genomics/
├── reads_mother.bam.bai
-├── reads_mother.bam.vcf -> /workspace/gitpod/hello-nextflow/work/53/e18e987d56c47f59b7dd268649ec01/reads_mother.bam.vcf
-└── reads_mother.bam.vcf.idx -> /workspace/gitpod/hello-nextflow/work/53/e18e987d56c47f59b7dd268649ec01/reads_mother.bam.vcf.idx
+├── reads_mother.bam.vcf -> /workspace/gitpod/nf4-science/genomics/work/53/e18e987d56c47f59b7dd268649ec01/reads_mother.bam.vcf
+└── reads_mother.bam.vcf.idx -> /workspace/gitpod/nf4-science/genomics/work/53/e18e987d56c47f59b7dd268649ec01/reads_mother.bam.vcf.idx
```
If you open the VCF file, you should see the same contents as in the file you generated by running the GATK command directly in the container.
@@ -439,7 +433,7 @@ This is the output we care about generating for each sample in our study.
### Takeaway
-You know how to make a very basic two-step workflow that does real analysis work and is capable of dealing with bioinformatics idiosyncrasies like the accessory files.
+You know how to make a very basic two-step workflow that does real analysis work and is capable of dealing with genomics file format idiosyncrasies like the accessory files.
### What's next?
@@ -460,14 +454,14 @@ Let's turn that default file path in the input BAM file declaration into an arra
_Before:_
-```groovy title="hello-genomics.nf" linenums="7"
+```groovy title="genomics-1.nf" linenums="7"
// Primary input
params.reads_bam = "${projectDir}/data/bam/reads_mother.bam"
```
_After:_
-```groovy title="hello-genomics.nf" linenums="7"
+```groovy title="genomics-1.nf" linenums="7"
// Primary input (array of three samples)
params.reads_bam = [
"${projectDir}/data/bam/reads_mother.bam",
@@ -488,7 +482,7 @@ And that's actually all we need to do, because the channel factory we use in the
Let's try running the workflow now that the plumbing is set up to run on all three test samples.
```bash
-nextflow run hello-genomics.nf -resume
+nextflow run genomics-1.nf -resume
```
Funny thing: this _might work_, OR it _might fail_.
@@ -497,7 +491,7 @@ If your workflow run succeeded, run it again until you get an error like this:
```console title="Output"
N E X T F L O W ~ version 24.10.0
- ┃ Launching `hello-genomics.nf` [loving_pasteur] DSL2 - revision: d2a8e63076
+ ┃ Launching `genomics-1.nf` [loving_pasteur] DSL2 - revision: d2a8e63076
executor > local (4)
[01/eea165] SAMTOOLS_INDEX (2) | 3 of 3, cached: 1 ✔
@@ -531,14 +525,14 @@ Let's take a look inside the work directory for the failed `GATK_HAPLOTYPECALLER
```console title="Directory contents"
work/a5/fa9fd0994b6beede5fb9ea073596c2
-├── intervals.bed -> /workspace/gitpod/hello-nextflow/data/ref/intervals.bed
-├── reads_father.bam.bai -> /workspace/gitpod/hello-nextflow/work/01/eea16597bd6e810fb4cf89e60f8c2d/reads_father.bam.bai
-├── reads_son.bam -> /workspace/gitpod/hello-nextflow/data/bam/reads_son.bam
+├── intervals.bed -> /workspace/gitpod/nf4-science/genomics/data/ref/intervals.bed
+├── reads_father.bam.bai -> /workspace/gitpod/nf4-science/genomics/work/01/eea16597bd6e810fb4cf89e60f8c2d/reads_father.bam.bai
+├── reads_son.bam -> /workspace/gitpod/nf4-science/genomics/data/bam/reads_son.bam
├── reads_son.bam.vcf
├── reads_son.bam.vcf.idx
-├── ref.dict -> /workspace/gitpod/hello-nextflow/data/ref/ref.dict
-├── ref.fasta -> /workspace/gitpod/hello-nextflow/data/ref/ref.fasta
-└── ref.fasta.fai -> /workspace/gitpod/hello-nextflow/data/ref/ref.fasta.fai
+├── ref.dict -> /workspace/gitpod/nf4-science/genomics/data/ref/ref.dict
+├── ref.fasta -> /workspace/gitpod/nf4-science/genomics/data/ref/ref.fasta
+└── ref.fasta.fai -> /workspace/gitpod/nf4-science/genomics/data/ref/ref.fasta.fai
```
Pay particular attention to the names of the BAM file and the BAM index that are listed in this directory: `reads_son.bam` and `reads_father.bam.bai`.
@@ -549,7 +543,7 @@ What the heck? Nextflow has staged an index file in this process call's work dir
Add these two lines in the workflow body before the `GATK_HAPLOTYPER` process call:
-```groovy title="hello-genomics.nf" linenums="84"
+```groovy title="genomics-1.nf" linenums="84"
// temporary diagnostics
reads_ch.view()
SAMTOOLS_INDEX.out.view()
@@ -558,7 +552,7 @@ Add these two lines in the workflow body before the `GATK_HAPLOTYPER` process ca
Then run the workflow command again.
```bash
-nextflow run hello-genomics.nf
+nextflow run genomics-1.nf
```
You may need to run it several times for it to fail again.
@@ -567,12 +561,12 @@ This error will not reproduce consistently because it is dependent on some varia
This is what the output of the two `.view()` calls we added looks like for a failed run:
```console title="Output"
-/workspace/gitpod/hello-nextflow/data/bam/reads_mother.bam
-/workspace/gitpod/hello-nextflow/data/bam/reads_father.bam
-/workspace/gitpod/hello-nextflow/data/bam/reads_son.bam
-/workspace/gitpod/hello-nextflow/work/9c/53492e3518447b75363e1cd951be4b/reads_father.bam.bai
-/workspace/gitpod/hello-nextflow/work/cc/37894fffdf6cc84c3b0b47f9b536b7/reads_son.bam.bai
-/workspace/gitpod/hello-nextflow/work/4d/dff681a3d137ba7d9866e3d9307bd0/reads_mother.bam.bai
+/workspace/gitpod/nf4-science/genomics/data/bam/reads_mother.bam
+/workspace/gitpod/nf4-science/genomics/data/bam/reads_father.bam
+/workspace/gitpod/nf4-science/genomics/data/bam/reads_son.bam
+/workspace/gitpod/nf4-science/genomics/work/9c/53492e3518447b75363e1cd951be4b/reads_father.bam.bai
+/workspace/gitpod/nf4-science/genomics/work/cc/37894fffdf6cc84c3b0b47f9b536b7/reads_son.bam.bai
+/workspace/gitpod/nf4-science/genomics/work/4d/dff681a3d137ba7d9866e3d9307bd0/reads_mother.bam.bai
```
The first three lines correspond to the input channel, and the last three to the output channel.
@@ -605,14 +599,14 @@ First, let's change the output of the `SAMTOOLS_INDEX` process to include the BA
_Before:_
-```groovy title="hello-genomics.nf" linenums="32"
+```groovy title="genomics-1.nf" linenums="32"
output:
path "${input_bam}.bai"
```
_After:_
-```groovy title="hello-genomics.nf" linenums="32"
+```groovy title="genomics-1.nf" linenums="32"
output:
tuple path(input_bam), path("${input_bam}.bai")
```
@@ -627,7 +621,7 @@ Specifically, where we previously declared two separate input paths in the input
_Before:_
-```groovy title="hello-genomics.nf" linenums="49"
+```groovy title="genomics-1.nf" linenums="49"
input:
path input_bam
path input_bam_index
@@ -635,7 +629,7 @@ input:
_After:_
-```groovy title="hello-genomics.nf" linenums="49"
+```groovy title="genomics-1.nf" linenums="49"
input:
tuple path(input_bam), path(input_bam_index)
```
@@ -650,7 +644,7 @@ As a result, we can simply delete that line.
_Before:_
-```groovy title="hello-genomics.nf" linenums="84"
+```groovy title="genomics-1.nf" linenums="84"
GATK_HAPLOTYPECALLER(
reads_ch,
SAMTOOLS_INDEX.out,
@@ -658,7 +652,7 @@ GATK_HAPLOTYPECALLER(
_After:_
-```groovy title="hello-genomics.nf" linenums="84"
+```groovy title="genomics-1.nf" linenums="84"
GATK_HAPLOTYPECALLER(
SAMTOOLS_INDEX.out,
```
@@ -670,7 +664,7 @@ That is all the re-wiring that is necessary to solve the index mismatch problem.
Of course, the proof is in the pudding, so let's run the workflow again a few times to make sure this will work reliably going forward.
```bash
-nextflow run hello-genomics.nf
+nextflow run genomics-1.nf
```
This time (and every time) everything should run correctly:
@@ -678,7 +672,7 @@ This time (and every time) everything should run correctly:
```console title="Output"
N E X T F L O W ~ version 24.10.0
- ┃ Launching `hello-genomics.nf` [special_goldstine] DSL2 - revision: 4cbbf6ea3e
+ ┃ Launching `genomics-1.nf` [special_goldstine] DSL2 - revision: 4cbbf6ea3e
executor > local (6)
[d6/10c2c4] SAMTOOLS_INDEX (1) | 3 of 3 ✔
@@ -687,7 +681,7 @@ executor > local (6)
If you'd like, you can use `.view()` again to peek at what the contents of the `SAMTOOLS_INDEX` output channel looks like:
-```groovy title="hello-genomics.nf" linenums="92"
+```groovy title="genomics-1.nf" linenums="92"
SAMTOOLS_INDEX.out.view()
```
@@ -723,9 +717,9 @@ Here we are going to show you how to do the simple case.
We already made a text file listing the input file paths, called `sample_bams.txt`, which you can find in the `data/` directory.
```txt title="sample_bams.txt"
-/workspace/gitpod/hello-nextflow/data/bam/reads_mother.bam
-/workspace/gitpod/hello-nextflow/data/bam/reads_father.bam
-/workspace/gitpod/hello-nextflow/data/bam/reads_son.bam
+/workspace/gitpod/nf4-science/genomics/data/bam/reads_mother.bam
+/workspace/gitpod/nf4-science/genomics/data/bam/reads_father.bam
+/workspace/gitpod/nf4-science/genomics/data/bam/reads_son.bam
```
As you can see, we listed one file path per line, and they are absolute paths.
@@ -740,7 +734,7 @@ Let's switch the default value for our `reads_bam` input parameter to point to t
_Before:_
-```groovy title="hello-genomics.nf" linenums="7"
+```groovy title="genomics-1.nf" linenums="7"
// Primary input
params.reads_bam = [
"${projectDir}/data/bam/reads_mother.bam",
@@ -751,7 +745,7 @@ params.reads_bam = [
_After:_
-```groovy title="hello-genomics.nf" linenums="7"
+```groovy title="genomics-1.nf" linenums="7"
// Primary input (file of input files, one per line)
params.reads_bam = "${projectDir}/data/sample_bams.txt"
```
@@ -767,14 +761,14 @@ Fortunately we can do that very simply, just by adding the [`.splitText()` opera
_Before:_
-```groovy title="hello-genomics.nf" linenums="68"
+```groovy title="genomics-1.nf" linenums="68"
// Create input channel (single file via CLI parameter)
reads_ch = Channel.fromPath(params.reads_bam)
```
_After:_
-```groovy title="hello-genomics.nf" linenums="68"
+```groovy title="genomics-1.nf" linenums="68"
// Create input channel from a text file listing input file paths
reads_ch = Channel.fromPath(params.reads_bam).splitText()
```
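+
+If you want to see exactly what the channel emits after the split, a quick (hypothetical) diagnostic is to append `.view()`:
+
+```groovy
+// Prints each element produced by .splitText(), quoted so any stray whitespace is visible
+Channel
+    .fromPath(params.reads_bam)
+    .splitText()
+    .view { line -> "element: '${line}'" }
+```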
@@ -788,7 +782,7 @@ reads_ch = Channel.fromPath(params.reads_bam).splitText()
Let's run the workflow one more time.
```bash
-nextflow run hello-genomics.nf -resume
+nextflow run genomics-1.nf -resume
```
This should produce the same result as before, right?
@@ -796,7 +790,7 @@ This should produce the same result as before, right?
```console title="Output"
N E X T F L O W ~ version 24.10.0
- ┃ Launching `hello-genomics.nf` [sick_albattani] DSL2 - revision: 46d84642f6
+ ┃ Launching `genomics-1.nf` [sick_albattani] DSL2 - revision: 46d84642f6
[18/23b4bb] SAMTOOLS_INDEX (1) | 3 of 3, cached: 3 ✔
[12/f727bb] GATK_HAPLOTYPECALLER (3) | 3 of 3, cached: 3 ✔
@@ -808,12 +802,12 @@ And that's it! Our simple variant calling workflow has all the basic features we
### Takeaway
-You know how to make a multi-step linear workflow handle a file containing input file paths.
+You know how to make a multi-step linear workflow to index a BAM file and apply per-sample variant calling using GATK.
-More generally, you've learned how to use essential Nextflow components and logic to build a pipeline that does real work, taking into account the idiosyncrasies of bioinformatics file formats and tool requirements.
+More generally, you've learned how to use essential Nextflow components and logic to build a simple genomics pipeline that does real work, taking into account the idiosyncrasies of genomics file formats and tool requirements.
### What's next?
Celebrate your success and take an extra long break!
-In the next training module, you'll learn how to use a few additional Nextflow features (including more channel operators) to develop pipelines with more complex plumbing.
+In the next part of this course, you'll learn how to use a few additional Nextflow features (including more channel operators) to apply joint variant calling to the data.
diff --git a/docs/hello_nextflow/05_hello_operators.md b/docs/nf4_science/genomics/02_joint_calling.md
similarity index 92%
rename from docs/hello_nextflow/05_hello_operators.md
rename to docs/nf4_science/genomics/02_joint_calling.md
index fcceff174..77ca32353 100644
--- a/docs/hello_nextflow/05_hello_operators.md
+++ b/docs/nf4_science/genomics/02_joint_calling.md
@@ -1,18 +1,13 @@
-# Part 4: Hello Operators
+# Part 2: Joint calling on a cohort
-In Part 3, you built a pipeline that was completely linear and processed each sample's data independently of the others.
-However, in real pipelines, you may need to combine data from multiple samples, or combine different kinds of data.
-Here we show you how to use channels and channel operators to implement a pipeline with more interesting plumbing.
+In the first part of this course, you built a variant calling pipeline that was completely linear and processed each sample's data independently of the others.
+However, in a real genomics use case, you'll typically need to look at the variant calls of multiple samples together.
-Specifically, we show you how to implement joint variant calling with GATK, building on the pipeline from Part 2.
-
-!!! note
-
- Don't worry if you're not familiar with GATK or genomics in general. We'll summarize the necessary concepts as we go, and the workflow implementation principles we demonstrate here apply broadly to any use case that follows a similar pattern.
+In this second part, we show you how to use channels and channel operators to implement joint variant calling with GATK, building on the pipeline from Part 1.
### Method overview
-The GATK variant calling method we used in Part 3 simply generated variant calls per sample.
+The GATK variant calling method we used in the first part of this course simply generated variant calls per sample.
That's fine if you only want to look at the variants from each sample in isolation, but that yields limited information.
It's often more interesting to look at how variant calls differ across multiple samples, and to do so, GATK offers an alternative method called joint variant calling, which we demonstrate here.
@@ -23,13 +18,15 @@ Joint variant calling involves generating a special kind of variant output calle
What's special about a sample's GVCF is that it contains records summarizing sequence data statistics about all positions in the targeted area of the genome, not just the positions where the program found evidence of variation.
This is critical for the joint genotyping calculation ([further reading](https://gatk.broadinstitute.org/hc/en-us/articles/360035890431-The-logic-of-joint-calling-for-germline-short-variants)).
-The GVCF is produced by GATK HaplotypeCaller, the same tool we used in Part 3, with an additional parameter (`-ERC GVCF`).
+The GVCF is produced by GATK HaplotypeCaller, the same tool we used in Part 1, with an additional parameter (`-ERC GVCF`).
Combining the GVCFs is done with GATK GenomicsDBImport, which combines the per-sample calls into a data store (analogous to a database), then the actual 'joint genotyping' analysis is done with GATK GenotypeGVCFs.
-So to recap, we're going to develop a workflow that does the following:
+### Workflow
+
+In this part of the course, we're going to develop a workflow that does the following:
---8<-- "docs/hello_nextflow/img/hello-gatk-2.svg"
+--8<-- "docs/nf4_science/genomics/img/hello-gatk-2.svg"
1. Generate an index file for each BAM input file using Samtools
@@ -37,12 +34,7 @@ So to recap, we're going to develop a workflow that does the following:
3. Collect all the GVCFs and combine them into a GenomicsDB data store
4. Run joint genotyping on the combined GVCF data store to produce a cohort-level VCF
-### Dataset
-
-- **A reference genome** consisting of a small region of the human chromosome 20 (from hg19/b37) and its accessory files (index and sequence dictionary).
-- **Three whole genome sequencing samples** corresponding to a family trio (mother, father and son), which have been subset to a small portion on chromosome 20 to keep the file sizes small.
- The sequencing data is in [BAM](https://samtools.github.io/hts-specs/SAMv1.pdf) (Binary Alignment Map) format, _i.e._ genome sequencing reads that have already been mapped to the reference genome.
-- **A list of genomic intervals**, _i.e._ coordinates on the genome where our samples have data suitable for calling variants, provided in BED format.
+We'll apply this to the same dataset as in Part 1.
---
@@ -53,11 +45,11 @@ Just like previously, we want to try out the commands manually before we attempt
!!! note
Make sure you're in the correct working directory:
- `cd /workspace/gitpod/hello-nextflow`
+ `cd /workspace/gitpod/nf4-science/genomics`
### 0.1. Index a BAM input file with Samtools
-This first step is the same as in Part 3: Hello Genomics, so it should feel very familiar, but this time we need to do it for all three samples.
+This first step is the same as in Part 1, so it should feel very familiar, but this time we need to do it for all three samples.
!!! note
@@ -99,7 +91,7 @@ exit
### 0.2. Call variants with GATK HaplotypeCaller in GVCF mode
-This second step is very similar to what we did Part 3: Hello Genomics, but we are now going to run GATK in 'GVCF mode'.
+This second step is very similar to what we did in Part 1, but we are now going to run GATK in 'GVCF mode'.
#### 0.2.1. Spin up the GATK container interactively
@@ -125,7 +117,7 @@ gatk HaplotypeCaller \
This creates the GVCF output file `reads_mother.g.vcf` in the current working directory in the container.
-If you `cat` it to view the contents, you'll see it's much longer than the equivalent VCF we generated in Part 3. You can't even scroll up to the start of the file, and most of the lines look quite different from what we saw in the VCF in Part 3.
+If you `cat` it to view the contents, you'll see it's much longer than the equivalent VCF we generated in Part 1. You can't even scroll up to the start of the file, and most of the lines look quite different from what we saw in that VCF.
```console title="Output" linenums="1674"
20_10037292_10066351 14714 . T . . END=14718 GT:DP:GQ:MIN_DP:PL 0/0:37:99:37:0,99,1192
@@ -143,7 +135,7 @@ In a GVCF, there are typically lots of such non-variant lines, with a smaller nu
20_10037292_10066351 3481 . T . . END=3481 GT:DP:GQ:MIN_DP:PL 0/0:21:51:21:0,51,765
```
-The second line shows the first variant record in the file, which corresponds to the first variant in the VCF file we looked at in Part 3.
+The second line shows the first variant record in the file, which corresponds to the first variant in the VCF file we looked at in Part 1.
Just like the original VCF was, the output GVCF file is also accompanied by an index file, called `reads_mother.g.vcf.idx`.
@@ -217,7 +209,7 @@ It's another reasonably small file so you can `cat` this file to view its conten
20_10037292_10066351 3529 . T A 154.29 . AC=1;AF=0.167;AN=6;BaseQRankSum=-5.440e-01;DP=104;ExcessHet=0.0000;FS=1.871;MLEAC=1;MLEAF=0.167;MQ=60.00;MQRankSum=0.00;QD=7.71;ReadPosRankSum=-1.158e+00;SOR=1.034 GT:AD:DP:GQ:PL 0/0:44,0:44:99:0,112,1347 0/1:12,8:20:99:163,0,328 0/0:39,0:39:99:0,105,1194
```
-This looks more like the original VCF we generated in Part 3, except this time we have genotype-level information for all three samples.
+This looks more like the original VCF we generated in Part 1, except this time we have genotype-level information for all three samples.
The last three columns in the file are the genotype blocks for the samples, listed in alphabetical order.
If we look at the genotypes called for our test family trio for the very first variant, we see that the father is heterozygous-variant (`0/1`), and the mother and son are both homozygous-variant (`1/1`).
@@ -232,7 +224,7 @@ exit
### Takeaway
-You know how to run the individual commands in the terminal to verify that they will produce the information you want.
+You know how to run the individual commands involved in joint variant calling in the terminal to verify that they will produce the information you want.
### What's next?
@@ -242,13 +234,13 @@ Wrap these commands into an actual pipeline.
## 1. Modify the per-sample variant calling step to produce a GVCF
-The good news is that we don't need to start all over, since we already wrote a workflow that does some of this work in Part 3.
+The good news is that we don't need to start all over, since we already wrote a workflow that does some of this work in Part 1.
However, that pipeline produces VCF files, whereas now we want GVCF files in order to do the joint genotyping.
So we need to start by switching on the GVCF variant calling mode and updating the output file extension.
!!! note
- For convenience, we are going to work with a fresh copy of the GATK workflow as it stands at the end of Part 3, but under a different name: `hello-operators.nf`.
+    For convenience, we are going to work with a fresh copy of the GATK workflow as it stands at the end of Part 1, but under a different name: `genomics-2.nf`.
### 1.1. Tell HaplotypeCaller to emit a GVCF and update the output extension
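+
+The change itself is small. As a minimal sketch of the relevant process fragments (the `-ERC GVCF` flag and `.g.vcf` extension come from the method overview above; the input variable names are illustrative):
+
+```groovy
+// In GATK_HAPLOTYPECALLER: emit a GVCF and rename the outputs accordingly
+output:
+path "${input_bam}.g.vcf"     // was "${input_bam}.vcf"
+path "${input_bam}.g.vcf.idx" // was "${input_bam}.vcf.idx"
+
+script:
+"""
+gatk HaplotypeCaller \
+    -R ${ref_fasta} \
+    -I ${input_bam} \
+    -O ${input_bam}.g.vcf \
+    -L ${interval_list} \
+    -ERC GVCF
+"""
+```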
@@ -473,7 +465,7 @@ The resulting `all_gvcfs_ch` and `all_idxs_ch` channels are what we're going to
!!!note
- In case you were wondering, we collect the GVCFs and their index files separately because the GATK GenomicsDBImport command only wants to see the GVCF file paths. Fortunately, since Nextflow will stage all the files together for execution, we don't have to worry about the order of files like we did for BAMs and their index in Part 3.
+ In case you were wondering, we collect the GVCFs and their index files separately because the GATK GenomicsDBImport command only wants to see the GVCF file paths. Fortunately, since Nextflow will stage all the files together for execution, we don't have to worry about the order of files like we did for BAMs and their index in Part 1.
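+
+For reference, a minimal sketch of that collection step (the channel names come from the text above; how the process outputs are addressed is illustrative):
+
+```groovy
+// Gather all per-sample GVCFs and their indexes into single-element list channels
+all_gvcfs_ch = GATK_HAPLOTYPECALLER.out[0].collect()
+all_idxs_ch  = GATK_HAPLOTYPECALLER.out[1].collect()
+```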
### 2.4. Add a call to the workflow block to run GATK_GENOMICSDB
@@ -844,7 +836,7 @@ You'll find the final output file, `family_trio.joint.vcf` (and its file index),
20_10037292_10066351 3529 . T A 154.29 . AC=1;AF=0.167;AN=6;BaseQRankSum=-5.440e-01;DP=104;ExcessHet=0.0000;FS=1.871;MLEAC=1;MLEAF=0.167;MQ=60.00;MQRankSum=0.00;QD=7.71;ReadPosRankSum=-1.158e+00;SOR=1.034 GT:AD:DP:GQ:PL 0/0:44,0:44:99:0,112,1347 0/1:12,8:20:99:163,0,328 0/0:39,0:39:99:0,105,1194
```
-You now have an automated, fully reproducible variant calling workflow!
+You now have an automated, fully reproducible joint variant calling workflow!
!!!note
@@ -860,6 +852,6 @@ You know how to use some common operators as well as Groovy closures to control
Celebrate your success and take an extra super mega long break! This was tough and you deserve it.
-In the next training, you'll learn how to leverage commonly used workflow configuration options.
+When you're ready to move on, have a look at our training portal to browse available training courses and select your next step.
**Good luck!**
diff --git a/docs/nf4_science/genomics/03_configuration.md b/docs/nf4_science/genomics/03_configuration.md
new file mode 100644
index 000000000..485ef6d22
--- /dev/null
+++ b/docs/nf4_science/genomics/03_configuration.md
@@ -0,0 +1,89 @@
+# Part 3: Resource profiling and optimization
+
+THIS IS A PLACEHOLDER
+
+!!!note
+
+ This training module is under redevelopment.
+
+---
+
+TODO
+
+### 4.3. Run the workflow to generate a resource utilization report
+
+To have Nextflow generate the report automatically, simply add `-with-report <filename>.html` to your command line.
+
+```bash
+nextflow run main.nf -profile my_laptop -with-report report-config-1.html
+```
+
+The report is an html file, which you can download and open in your browser. You can also right click it in the file explorer on the left and click on `Show preview` in order to view it on Gitpod.
+
+Take a few minutes to look through the report and see if you can identify some opportunities for adjusting resources.
+Make sure to click on the tabs that show the utilization results as a percentage of what was allocated.
+There is some [documentation](https://www.nextflow.io/docs/latest/reports.html) describing all the available features.
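+
+If you'd rather not type the flag every time, the same report can be switched on from configuration instead; a minimal sketch for `nextflow.config`, based on Nextflow's documented reporting options:
+
+```groovy
+// Equivalent to passing -with-report on the command line
+report {
+    enabled   = true
+    file      = 'report-config-1.html'
+    // overwrite = true  // uncomment to allow replacing an existing report file
+}
+```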
+
+![Resource utilization report before optimization](img/memory-before.png)
+
+One observation is that the `GATK_JOINTGENOTYPING` seems to be very hungry for CPU, which makes sense since it performs a lot of complex calculations.
+So we could try boosting that and see if it cuts down on runtime.
+
+However, we seem to have overshot the mark with the memory allocations; all processes are only using a fraction of what we're giving them.
+We should dial that back down and save some resources.
+
+### 4.4. Adjust resource allocations for a specific process
+
+We can specify resource allocations for a given process using the `withName` process selector.
+The syntax looks like this when it's by itself in a process block:
+
+```groovy title="Syntax"
+process {
+ withName: 'GATK_JOINTGENOTYPING' {
+ cpus = 4
+ }
+}
+```
+
+Let's add that to the existing process block in the `nextflow.config` file.
+
+```groovy title="nextflow.config" linenums="11"
+process {
+ // defaults for all processes
+ cpus = 2
+ memory = 2.GB
+ // allocations for a specific process
+ withName: 'GATK_JOINTGENOTYPING' {
+ cpus = 4
+ }
+}
+```
+
+With that specified, the default settings will apply to all processes **except** the `GATK_JOINTGENOTYPING` process, which is a special snowflake that gets a lot more CPU.
+Hopefully that should have an effect.
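+
+Relatedly, when several processes share the same requirements, you can tag each of them with a `label` directive and target the label instead of individual names; a minimal sketch (the label name is hypothetical):
+
+```groovy
+process {
+    // Applies to every process that declares: label 'compute_heavy'
+    withLabel: 'compute_heavy' {
+        cpus = 4
+    }
+}
+```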
+
+### 4.5. Run again with the modified configuration
+
+Let's run the workflow again with the modified configuration and with the reporting flag turned on, but notice we're giving the report a different name so we can differentiate them.
+
+```bash
+nextflow run main.nf -profile my_laptop -with-report report-config-2.html
+```
+
+Once again, you probably won't notice a substantial difference in runtime, because this is such a small workload and the tools spend more time in ancillary tasks than in performing the 'real' work.
+
+However, the second report shows that our resource utilization is more balanced now.
+
+![Resource utilization report after optimization](img/memory-after.png)
+
+As you can see, this approach is useful when your processes have different resource requirements. It empowers you to right-size the resource allocations you set up for each process based on actual data, not guesswork.
+
+!!!note
+
+ This is just a tiny taster of what you can do to optimize your use of resources.
+ Nextflow itself has some really neat [dynamic retry logic](https://training.nextflow.io/basic_training/debugging/#dynamic-resources-allocation) built in to retry jobs that fail due to resource limitations.
+ Additionally, the Seqera Platform offers AI-driven tooling for optimizing your resource allocations automatically as well.
+
+ We'll cover both of those approaches in an upcoming part of this training course.
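+
+As a tiny taster of that retry logic, here is a config sketch (values illustrative) that re-runs a failed task with more memory on each attempt:
+
+```groovy
+process {
+    withName: 'GATK_JOINTGENOTYPING' {
+        errorStrategy = 'retry'
+        maxRetries    = 2
+        // task.attempt is 1 on the first try, 2 on the first retry, and so on
+        memory        = { 2.GB * task.attempt }
+    }
+}
+```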
+
+That being said, there may be some constraints on what you can (or must) allocate depending on what executor and compute infrastructure you're using. For example, your cluster may require you to stay within certain limits that don't apply when you're running elsewhere.
diff --git a/docs/hello_nextflow/08_hello_nf-test.md b/docs/nf4_science/genomics/04_testing.md
similarity index 99%
rename from docs/hello_nextflow/08_hello_nf-test.md
rename to docs/nf4_science/genomics/04_testing.md
index 1665d1818..9d1a9ef74 100644
--- a/docs/hello_nextflow/08_hello_nf-test.md
+++ b/docs/nf4_science/genomics/04_testing.md
@@ -1,4 +1,10 @@
-# Part 7: Hello nf-test
+# Part 4: Hello nf-test
+
+THIS IS A PLACEHOLDER
+
+!!!note
+
+ This training module is under redevelopment.
Being able to systematically test that every part of your workflow is doing what it's supposed to do is critical for reproducibility and long-term maintenance.
And it's also helpful during the development process!
diff --git a/docs/hello_nextflow/img/gatk-pipeline.png b/docs/nf4_science/genomics/img/gatk-pipeline.png
similarity index 100%
rename from docs/hello_nextflow/img/gatk-pipeline.png
rename to docs/nf4_science/genomics/img/gatk-pipeline.png
diff --git a/docs/hello_nextflow/img/haplotype-caller.excalidraw.svg b/docs/nf4_science/genomics/img/haplotype-caller.excalidraw.svg
similarity index 100%
rename from docs/hello_nextflow/img/haplotype-caller.excalidraw.svg
rename to docs/nf4_science/genomics/img/haplotype-caller.excalidraw.svg
diff --git a/docs/hello_nextflow/img/hello-gatk-1.svg b/docs/nf4_science/genomics/img/hello-gatk-1.svg
similarity index 100%
rename from docs/hello_nextflow/img/hello-gatk-1.svg
rename to docs/nf4_science/genomics/img/hello-gatk-1.svg
diff --git a/docs/hello_nextflow/img/hello-gatk-2.svg b/docs/nf4_science/genomics/img/hello-gatk-2.svg
similarity index 100%
rename from docs/hello_nextflow/img/hello-gatk-2.svg
rename to docs/nf4_science/genomics/img/hello-gatk-2.svg
diff --git a/docs/hello_nextflow/img/joint-calling.png b/docs/nf4_science/genomics/img/joint-calling.png
similarity index 100%
rename from docs/hello_nextflow/img/joint-calling.png
rename to docs/nf4_science/genomics/img/joint-calling.png
diff --git a/docs/nf4_science/genomics/img/memory-after.png b/docs/nf4_science/genomics/img/memory-after.png
new file mode 100644
index 000000000..d61b4a7c5
Binary files /dev/null and b/docs/nf4_science/genomics/img/memory-after.png differ
diff --git a/docs/nf4_science/genomics/img/memory-before.png b/docs/nf4_science/genomics/img/memory-before.png
new file mode 100644
index 000000000..ce0f7ac27
Binary files /dev/null and b/docs/nf4_science/genomics/img/memory-before.png differ
diff --git a/docs/hello_nextflow/img/variants.png b/docs/nf4_science/genomics/img/variants.png
similarity index 100%
rename from docs/hello_nextflow/img/variants.png
rename to docs/nf4_science/genomics/img/variants.png
diff --git a/docs/nf4_science/genomics/index.md b/docs/nf4_science/genomics/index.md
new file mode 100644
index 000000000..4ae6f3233
--- /dev/null
+++ b/docs/nf4_science/genomics/index.md
@@ -0,0 +1,48 @@
+---
+title: Nextflow for Genomics
+hide:
+ - toc
+---
+
+# Nextflow for Genomics
+
+This training course is intended for researchers in genomics and related fields who are interested in developing or customizing data analysis pipelines.
+It builds on the [Hello Nextflow](../hello_nextflow/) beginner training and demonstrates how to use Nextflow in the specific context of the genomics domain.
+
+Specifically, this course demonstrates how to implement a simple variant calling pipeline with [GATK](https://gatk.broadinstitute.org/) (Genome Analysis Toolkit), a widely used software package for analyzing high-throughput sequencing data.
+
+!!! note
+
+ Don't worry if you're not familiar with GATK specifically.
+ We'll summarize the necessary concepts as we go, and the workflow implementation principles we demonstrate here apply broadly to any command line tool that processes genomics data.
+
+## Learning objectives
+
+By working through this course, you will learn how to apply foundational Nextflow concepts and tooling to a typical genomics use case.
+
+By the end of this workshop you will be able to:
+
+- Write a linear workflow to apply variant calling to a single sample
+- Handle accessory files such as index files and reference genome resources appropriately
+- Leverage Nextflow's dataflow paradigm to parallelize per-sample variant calling
+- Implement multi-sample variant calling using relevant channel operators
+- Configure pipeline execution and manage and optimize resource allocations
+- Implement per-step and end-to-end pipeline tests that handle genomics-specific idiosyncrasies appropriately
+
+## Prerequisites
+
+The course assumes some minimal familiarity with the following:
+
+- Tools and file formats commonly used in this scientific domain
+- The command line
+- Foundational Nextflow concepts and tooling covered in the [Hello Nextflow](../hello_nextflow/) beginner training
+
+For technical requirements and environment setup, see the [Environment Setup](../envsetup/) mini-course.
+
+## Get started
+
+To get started, open the training environment by clicking the 'Open in Gitpod' button below.
+
+[![Open in Gitpod](https://img.shields.io/badge/Gitpod-%20Open%20in%20Gitpod-908a85?logo=gitpod)](https://gitpod.io/#https://github.com/nextflow-io/training)
diff --git a/docs/nf4_science/index.md b/docs/nf4_science/index.md
new file mode 100644
index 000000000..b8bbdb3d4
--- /dev/null
+++ b/docs/nf4_science/index.md
@@ -0,0 +1,33 @@
+---
+title: Nextflow for Science
+hide:
+ - toc
+---
+
+# Nextflow for Science
+
+These are courses that demonstrate how to apply the concepts and components presented in the [Hello Nextflow](../hello_nextflow/) beginner course to specific scientific use cases. Each course consists of a series of training modules that are designed to help learners build up their skills progressively.
+
+!!! exercise "Nextflow for Genomics"
+
+ !!! quote inline end ""
+
+ :material-run-fast: Learn to develop a pipeline for genomics in Nextflow.
+
+ This is a course for researchers who wish to learn how to develop their own genomics pipelines. The course uses a variant calling use case to demonstrate how to develop a simple but functional genomics pipeline.
+
+ [Launch the Nextflow for Genomics training :material-arrow-right:](genomics/){ .md-button .md-button--primary }
+
+**Coming soon:** "Nextflow for RNAseq" — Learn to develop a pipeline for bulk RNAseq analysis in Nextflow
diff --git a/docs/other/README.md b/docs/other/README.md
new file mode 100644
index 000000000..aa353d56a
--- /dev/null
+++ b/docs/other/README.md
@@ -0,0 +1 @@
+This directory contains older training courses that are not actively maintained and that we may repurpose elsewhere or delete in the near future. The corresponding materials are not available within the training environment. You can still find the materials in the GitHub repository and download them for local use.
diff --git a/docs/hands_on/01_datasets.md b/docs/other/hands_on/01_datasets.md
similarity index 100%
rename from docs/hands_on/01_datasets.md
rename to docs/other/hands_on/01_datasets.md
diff --git a/docs/hands_on/02_workflow.md b/docs/other/hands_on/02_workflow.md
similarity index 100%
rename from docs/hands_on/02_workflow.md
rename to docs/other/hands_on/02_workflow.md
diff --git a/docs/hands_on/03_setup.md b/docs/other/hands_on/03_setup.md
similarity index 100%
rename from docs/hands_on/03_setup.md
rename to docs/other/hands_on/03_setup.md
diff --git a/docs/hands_on/04_implementation.md b/docs/other/hands_on/04_implementation.md
similarity index 100%
rename from docs/hands_on/04_implementation.md
rename to docs/other/hands_on/04_implementation.md
diff --git a/docs/hands_on/index.md b/docs/other/hands_on/index.md
similarity index 100%
rename from docs/hands_on/index.md
rename to docs/other/hands_on/index.md
diff --git a/docs/nf_customize/01_orientation.md b/docs/other/nf_customize/01_orientation.md
similarity index 100%
rename from docs/nf_customize/01_orientation.md
rename to docs/other/nf_customize/01_orientation.md
diff --git a/docs/nf_customize/02_nf-core.md b/docs/other/nf_customize/02_nf-core.md
similarity index 100%
rename from docs/nf_customize/02_nf-core.md
rename to docs/other/nf_customize/02_nf-core.md
diff --git a/docs/nf_customize/03_execution.md b/docs/other/nf_customize/03_execution.md
similarity index 100%
rename from docs/nf_customize/03_execution.md
rename to docs/other/nf_customize/03_execution.md
diff --git a/docs/nf_customize/04_config.md b/docs/other/nf_customize/04_config.md
similarity index 100%
rename from docs/nf_customize/04_config.md
rename to docs/other/nf_customize/04_config.md
diff --git a/docs/nf_customize/05_tools.md b/docs/other/nf_customize/05_tools.md
similarity index 100%
rename from docs/nf_customize/05_tools.md
rename to docs/other/nf_customize/05_tools.md
diff --git a/docs/nf_customize/img/args.excalidraw.svg b/docs/other/nf_customize/img/args.excalidraw.svg
similarity index 100%
rename from docs/nf_customize/img/args.excalidraw.svg
rename to docs/other/nf_customize/img/args.excalidraw.svg
diff --git a/docs/nf_customize/img/demo-parameters.png b/docs/other/nf_customize/img/demo-parameters.png
similarity index 100%
rename from docs/nf_customize/img/demo-parameters.png
rename to docs/other/nf_customize/img/demo-parameters.png
diff --git a/docs/nf_customize/img/gitpod.welcome.png b/docs/other/nf_customize/img/gitpod.welcome.png
similarity index 100%
rename from docs/nf_customize/img/gitpod.welcome.png
rename to docs/other/nf_customize/img/gitpod.welcome.png
diff --git a/docs/nf_customize/img/launch.png b/docs/other/nf_customize/img/launch.png
similarity index 100%
rename from docs/nf_customize/img/launch.png
rename to docs/other/nf_customize/img/launch.png
diff --git a/docs/nf_customize/img/nested.excalidraw.svg b/docs/other/nf_customize/img/nested.excalidraw.svg
similarity index 100%
rename from docs/nf_customize/img/nested.excalidraw.svg
rename to docs/other/nf_customize/img/nested.excalidraw.svg
diff --git a/docs/nf_customize/img/nf-core-logo.png b/docs/other/nf_customize/img/nf-core-logo.png
similarity index 100%
rename from docs/nf_customize/img/nf-core-logo.png
rename to docs/other/nf_customize/img/nf-core-logo.png
diff --git a/docs/nf_customize/img/pipelines.png b/docs/other/nf_customize/img/pipelines.png
similarity index 100%
rename from docs/nf_customize/img/pipelines.png
rename to docs/other/nf_customize/img/pipelines.png
diff --git a/docs/nf_customize/img/subway.excalidraw.svg b/docs/other/nf_customize/img/subway.excalidraw.svg
similarity index 100%
rename from docs/nf_customize/img/subway.excalidraw.svg
rename to docs/other/nf_customize/img/subway.excalidraw.svg
diff --git a/docs/nf_customize/index.md b/docs/other/nf_customize/index.md
similarity index 100%
rename from docs/nf_customize/index.md
rename to docs/other/nf_customize/index.md
diff --git a/docs/nf_develop/1_01_orientation.md b/docs/other/nf_develop/1_01_orientation.md
similarity index 100%
rename from docs/nf_develop/1_01_orientation.md
rename to docs/other/nf_develop/1_01_orientation.md
diff --git a/docs/nf_develop/1_02_create.md b/docs/other/nf_develop/1_02_create.md
similarity index 100%
rename from docs/nf_develop/1_02_create.md
rename to docs/other/nf_develop/1_02_create.md
diff --git a/docs/nf_develop/1_03_pipeline.md b/docs/other/nf_develop/1_03_pipeline.md
similarity index 100%
rename from docs/nf_develop/1_03_pipeline.md
rename to docs/other/nf_develop/1_03_pipeline.md
diff --git a/docs/nf_develop/1_04_parameters.md b/docs/other/nf_develop/1_04_parameters.md
similarity index 100%
rename from docs/nf_develop/1_04_parameters.md
rename to docs/other/nf_develop/1_04_parameters.md
diff --git a/docs/nf_develop/2_00_introduction.md b/docs/other/nf_develop/2_00_introduction.md
similarity index 100%
rename from docs/nf_develop/2_00_introduction.md
rename to docs/other/nf_develop/2_00_introduction.md
diff --git a/docs/nf_develop/2_01_orientation.md b/docs/other/nf_develop/2_01_orientation.md
similarity index 100%
rename from docs/nf_develop/2_01_orientation.md
rename to docs/other/nf_develop/2_01_orientation.md
diff --git a/docs/nf_develop/2_02_custom.md b/docs/other/nf_develop/2_02_custom.md
similarity index 100%
rename from docs/nf_develop/2_02_custom.md
rename to docs/other/nf_develop/2_02_custom.md
diff --git a/docs/nf_develop/extra.md b/docs/other/nf_develop/extra.md
similarity index 100%
rename from docs/nf_develop/extra.md
rename to docs/other/nf_develop/extra.md
diff --git a/docs/nf_develop/img/branches.excalidraw.svg b/docs/other/nf_develop/img/branches.excalidraw.svg
similarity index 100%
rename from docs/nf_develop/img/branches.excalidraw.svg
rename to docs/other/nf_develop/img/branches.excalidraw.svg
diff --git a/docs/nf_develop/img/create_1.png b/docs/other/nf_develop/img/create_1.png
similarity index 100%
rename from docs/nf_develop/img/create_1.png
rename to docs/other/nf_develop/img/create_1.png
diff --git a/docs/nf_develop/img/create_2.png b/docs/other/nf_develop/img/create_2.png
similarity index 100%
rename from docs/nf_develop/img/create_2.png
rename to docs/other/nf_develop/img/create_2.png
diff --git a/docs/nf_develop/img/create_3.png b/docs/other/nf_develop/img/create_3.png
similarity index 100%
rename from docs/nf_develop/img/create_3.png
rename to docs/other/nf_develop/img/create_3.png
diff --git a/docs/nf_develop/img/create_4.png b/docs/other/nf_develop/img/create_4.png
similarity index 100%
rename from docs/nf_develop/img/create_4.png
rename to docs/other/nf_develop/img/create_4.png
diff --git a/docs/nf_develop/img/create_5.png b/docs/other/nf_develop/img/create_5.png
similarity index 100%
rename from docs/nf_develop/img/create_5.png
rename to docs/other/nf_develop/img/create_5.png
diff --git a/docs/nf_develop/img/create_6.png b/docs/other/nf_develop/img/create_6.png
similarity index 100%
rename from docs/nf_develop/img/create_6.png
rename to docs/other/nf_develop/img/create_6.png
diff --git a/docs/nf_develop/img/create_7.png b/docs/other/nf_develop/img/create_7.png
similarity index 100%
rename from docs/nf_develop/img/create_7.png
rename to docs/other/nf_develop/img/create_7.png
diff --git a/docs/nf_develop/img/github.actions.png b/docs/other/nf_develop/img/github.actions.png
similarity index 100%
rename from docs/nf_develop/img/github.actions.png
rename to docs/other/nf_develop/img/github.actions.png
diff --git a/docs/nf_develop/img/github.new.png b/docs/other/nf_develop/img/github.new.png
similarity index 100%
rename from docs/nf_develop/img/github.new.png
rename to docs/other/nf_develop/img/github.new.png
diff --git a/docs/nf_develop/img/gitpod.dashboard.png b/docs/other/nf_develop/img/gitpod.dashboard.png
similarity index 100%
rename from docs/nf_develop/img/gitpod.dashboard.png
rename to docs/other/nf_develop/img/gitpod.dashboard.png
diff --git a/docs/nf_develop/img/gitpod.opendashboard.png b/docs/other/nf_develop/img/gitpod.opendashboard.png
similarity index 100%
rename from docs/nf_develop/img/gitpod.opendashboard.png
rename to docs/other/nf_develop/img/gitpod.opendashboard.png
diff --git a/docs/nf_develop/img/gitpod.permissions.png b/docs/other/nf_develop/img/gitpod.permissions.png
similarity index 100%
rename from docs/nf_develop/img/gitpod.permissions.png
rename to docs/other/nf_develop/img/gitpod.permissions.png
diff --git a/docs/nf_develop/img/gitpod.providers.png b/docs/other/nf_develop/img/gitpod.providers.png
similarity index 100%
rename from docs/nf_develop/img/gitpod.providers.png
rename to docs/other/nf_develop/img/gitpod.providers.png
diff --git a/docs/nf_develop/img/gitpod.usersettings.png b/docs/other/nf_develop/img/gitpod.usersettings.png
similarity index 100%
rename from docs/nf_develop/img/gitpod.usersettings.png
rename to docs/other/nf_develop/img/gitpod.usersettings.png
diff --git a/docs/nf_develop/img/gitpod.welcome.png b/docs/other/nf_develop/img/gitpod.welcome.png
similarity index 100%
rename from docs/nf_develop/img/gitpod.welcome.png
rename to docs/other/nf_develop/img/gitpod.welcome.png
diff --git a/docs/nf_develop/img/nested.excalidraw.svg b/docs/other/nf_develop/img/nested.excalidraw.svg
similarity index 100%
rename from docs/nf_develop/img/nested.excalidraw.svg
rename to docs/other/nf_develop/img/nested.excalidraw.svg
diff --git a/docs/nf_develop/img/pipeline.excalidraw.svg b/docs/other/nf_develop/img/pipeline.excalidraw.svg
similarity index 100%
rename from docs/nf_develop/img/pipeline.excalidraw.svg
rename to docs/other/nf_develop/img/pipeline.excalidraw.svg
diff --git a/docs/nf_develop/img/schemabuild.png b/docs/other/nf_develop/img/schemabuild.png
similarity index 100%
rename from docs/nf_develop/img/schemabuild.png
rename to docs/other/nf_develop/img/schemabuild.png
diff --git a/docs/nf_develop/img/template.png b/docs/other/nf_develop/img/template.png
similarity index 100%
rename from docs/nf_develop/img/template.png
rename to docs/other/nf_develop/img/template.png
diff --git a/docs/nf_develop/index.md b/docs/other/nf_develop/index.md
similarity index 100%
rename from docs/nf_develop/index.md
rename to docs/other/nf_develop/index.md
diff --git a/docs/troubleshoot/01_exercise.md b/docs/other/troubleshoot/01_exercise.md
similarity index 100%
rename from docs/troubleshoot/01_exercise.md
rename to docs/other/troubleshoot/01_exercise.md
diff --git a/docs/troubleshoot/01_orientation.md b/docs/other/troubleshoot/01_orientation.md
similarity index 100%
rename from docs/troubleshoot/01_orientation.md
rename to docs/other/troubleshoot/01_orientation.md
diff --git a/docs/troubleshoot/02_exercise.md b/docs/other/troubleshoot/02_exercise.md
similarity index 100%
rename from docs/troubleshoot/02_exercise.md
rename to docs/other/troubleshoot/02_exercise.md
diff --git a/docs/troubleshoot/03_exercise.md b/docs/other/troubleshoot/03_exercise.md
similarity index 100%
rename from docs/troubleshoot/03_exercise.md
rename to docs/other/troubleshoot/03_exercise.md
diff --git a/docs/troubleshoot/04_exercise.md b/docs/other/troubleshoot/04_exercise.md
similarity index 100%
rename from docs/troubleshoot/04_exercise.md
rename to docs/other/troubleshoot/04_exercise.md
diff --git a/docs/troubleshoot/05_exercise.md b/docs/other/troubleshoot/05_exercise.md
similarity index 100%
rename from docs/troubleshoot/05_exercise.md
rename to docs/other/troubleshoot/05_exercise.md
diff --git a/docs/troubleshoot/06_exercise.md b/docs/other/troubleshoot/06_exercise.md
similarity index 100%
rename from docs/troubleshoot/06_exercise.md
rename to docs/other/troubleshoot/06_exercise.md
diff --git a/docs/troubleshoot/img/gitpod.welcome.png b/docs/other/troubleshoot/img/gitpod.welcome.png
similarity index 100%
rename from docs/troubleshoot/img/gitpod.welcome.png
rename to docs/other/troubleshoot/img/gitpod.welcome.png
diff --git a/docs/troubleshoot/index.md b/docs/other/troubleshoot/index.md
similarity index 100%
rename from docs/troubleshoot/index.md
rename to docs/other/troubleshoot/index.md
diff --git a/docs/side_quests/README.md b/docs/side_quests/README.md
new file mode 100644
index 000000000..cffa39fdf
--- /dev/null
+++ b/docs/side_quests/README.md
@@ -0,0 +1 @@
+This is a placeholder for the future Side Quests (deep-dive trainings). The docs currently in here are stubs based on content recycled from elsewhere.
diff --git a/docs/side_quests/containers.md b/docs/side_quests/containers.md
new file mode 100644
index 000000000..ff62a0bba
--- /dev/null
+++ b/docs/side_quests/containers.md
@@ -0,0 +1,189 @@
+# Part 1: More Containers
+
+[TODO]
+
+---
+
+## 1. How to find or make container images
+
+Some software developers provide container images for their software that are available on container registries like Docker Hub, but many do not.
+In this optional section, we'll show you two ways to get a container image for tools you want to use in your Nextflow pipelines: using Seqera Containers and building the container image yourself.
+
+You'll get or build a container image for the `quote` pip package, which will be used in the exercise at the end of this section.
+
+### 1.1. Get a container image from Seqera Containers
+
+Seqera Containers is a free service that builds container images for pip- and conda-installable tools (including bioconda).
+Navigate to [Seqera Containers](https://www.seqera.io/containers/) and search for the `quote` pip package.
+
+![Seqera Containers](img/seqera-containers-1.png)
+
+Click on "+Add" and then "Get Container" to request a container image for the `quote` pip package.
+
+![Seqera Containers](img/seqera-containers-2.png)
+
+If this is the first time a community container has been built for this version of the package, it may take a few minutes to complete.
+Click to copy the URI (e.g. `community.wave.seqera.io/library/pip_quote:ae07804021465ee9`) of the container image that was created for you.
+
+You can now use the container image to run the `quote` command and get a random saying from Grace Hopper.
+
+```bash
+docker run --rm community.wave.seqera.io/library/pip_quote:ae07804021465ee9 quote "Grace Hopper"
+```
+
+Output:
+
+```console title="Output"
+Humans are allergic to change. They love to say, 'We've always done it
+this way.' I try to fight that. That's why I have a clock on my wall
+that runs counter-clockwise.
+```
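+
+The same URI can also be used directly in a Nextflow pipeline via the `container` directive. Here is a minimal sketch of what that could look like — the process name and inputs/outputs are illustrative placeholders, and you'll build a complete version in the exercise at the end of this section:
+
+```groovy
+// Illustrative sketch only: the process shape is a placeholder,
+// but the container URI is the one obtained from Seqera Containers above
+process getQuote {
+
+    container 'community.wave.seqera.io/library/pip_quote:ae07804021465ee9'
+
+    input:
+    val author
+
+    output:
+    path 'quote.txt'
+
+    script:
+    """
+    quote "$author" > quote.txt
+    """
+}
+```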
+
+### 1.2. Build the container image yourself
+
+Let's use some build details from the Seqera Containers website to build the container image for the `quote` pip package ourselves.
+Return to the Seqera Containers website and click on the "Build Details" button.
+
+The first item we'll look at is the `Dockerfile`, a type of script file that contains all the commands needed to build the container image.
+We've added some explanatory comments to the Dockerfile below to help you understand what each part does.
+
+```Dockerfile title="Dockerfile"
+# Start from the micromamba base docker image
+FROM mambaorg/micromamba:1.5.10-noble
+# Copy the conda.yml file into the container
+COPY --chown=$MAMBA_USER:$MAMBA_USER conda.yml /tmp/conda.yml
+# Install various utilities for Nextflow to use and the packages in the conda.yml file
+RUN micromamba install -y -n base -f /tmp/conda.yml \
+ && micromamba install -y -n base conda-forge::procps-ng \
+ && micromamba env export --name base --explicit > environment.lock \
+ && echo ">> CONDA_LOCK_START" \
+ && cat environment.lock \
+ && echo "<< CONDA_LOCK_END" \
+ && micromamba clean -a -y
+# Run the container as the root user
+USER root
+# Set the PATH environment variable to include the micromamba installation directory
+ENV PATH="$MAMBA_ROOT_PREFIX/bin:$PATH"
+```
+
+The second item we'll look at is the `conda.yml` file, which contains the list of packages that need to be installed in the container image.
+
+```yaml title="conda.yml"
+channels:
+- conda-forge
+- bioconda
+dependencies:
+- pip
+- pip:
+  - quote==3.0.0
+```
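+
+As an aside, a spec file like this can generally also be consumed by micromamba directly if you want to try the same environment locally without Docker. This is a hypothetical local-usage sketch (it assumes micromamba is installed; it is not needed for this exercise):
+
+```bash
+# Hypothetical: create a local environment from the same spec file,
+# then run the tool inside it (not required for the Docker exercise)
+micromamba create -n quote-env -f conda.yml
+micromamba run -n quote-env quote "Grace Hopper"
+```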
+
+Copy the contents of these files into the stubs located in the `containers/build` directory, then run the following command to build the container image yourself.
+
+!!! note
+
+ We use the `-t quote:latest` flag to tag the container image with the name `quote` and the tag `latest`.
+ We will be able to use this tag to refer to the container image when running it on this system.
+
+```bash
+docker build -t quote:latest containers/build
+```
+
+After it has finished building, you can run the container image you just built.
+
+```bash
+docker run --rm quote:latest quote "Margaret Oakley Dayhoff"
+```
+
+### Takeaway
+
+You've learned two different ways to get a container image for a tool you want to use in your Nextflow pipelines: using Seqera Containers and building the container image yourself.
+
+### What's next?
+
+You have everything you need to continue to the [next chapter](./04_hello_genomics.md) of this training series.
+You can also continue with an optional exercise to fetch quotes from computer and biology pioneers using the `quote` container and output them using the `cowsay` container.
+
+---
+
+## 2. Make the cow quote famous scientists
+
+This section contains some stretch exercises to practice what you've learned so far.
+Doing these exercises is _not required_ to understand later parts of the training, but they provide a fun way to reinforce what you've learned by figuring out how to make the cow quote famous scientists.
+
+```console title="cowsay-output-Grace-Hopper.txt"
+ _________________________________________________
+ / \
+| Humans are allergic to change. They love to |
+| say, 'We've always done it this way.' I try to fi |
+| ght that. That's why I have a clock on my wall th |
+| at runs counter-clockwise. |
+| -Grace Hopper |
+ \ /
+ =================================================
+ \
+ \
+ ^__^
+ (oo)\_______
+ (__)\ )\/\
+ ||----w |
+ || ||
+```
+
+### 2.1. Modify the `hello-containers.nf` script to use a `getQuote` process
+
+We have a list of computer and biology pioneers in the `containers/data/pioneers.csv` file.
+At a high level, to complete this exercise you will need to:
+
+- Modify the default `params.input_file` to point to the `pioneers.csv` file.
+- Create a `getQuote` process that uses the `quote` container to fetch a quote for each input.
+- Connect the output of the `getQuote` process to the `cowsay` process to display the quote.
+
+For the `quote` container image, you can either use the one you built yourself in the previous stretch exercise or use the one you got from Seqera Containers.
+
+!!! hint
+
+    A good choice for the `script` block of your `getQuote` process might be:
+ ```groovy
+ script:
+ def safe_author = author.tokenize(' ').join('-')
+ """
+ quote "$author" > quote-${safe_author}.txt
+ echo "-${author}" >> quote-${safe_author}.txt
+ """
+ ```
+
+You can find a solution to this exercise in `containers/solutions/hello-containers-4.1.nf`.
+
+### 2.2. Modify your Nextflow pipeline to allow it to execute in `quote` and `sayHello` modes
+
+Add some branching logic to your pipeline so that it can accept inputs intended for either `quote` or `sayHello`.
+Here's an example of how to use an `if` statement in a Nextflow workflow:
+
+```groovy title="hello-containers.nf"
+workflow {
+ if (params.quote) {
+ ...
+ }
+ else {
+ ...
+ }
+ cowSay(text_ch)
+}
+```
+
+!!! hint
+
+ You can use `new_ch = processName.out` to assign a name to the output channel of a process.
+
+You can find a solution to this exercise in `containers/solutions/hello-containers-4.2.nf`.
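+
+For reference, here is a sketch of what the filled-in branching logic could look like, mirroring the solution file. It assumes `params.quote` is defined (e.g. `params.quote = true`) and that the `getQuote` and `sayHello` processes exist as described in the previous exercise:
+
+```groovy
+workflow {
+
+    // create a channel for inputs from a CSV file
+    input_ch = Channel.fromPath(params.input_file)
+        .splitCsv()
+        .flatten()
+
+    // pick the process that generates the text based on params.quote
+    // (assumes getQuote and sayHello are defined as in the previous exercise)
+    if (params.quote) {
+        getQuote(input_ch)
+        text_ch = getQuote.out
+    } else {
+        sayHello(input_ch)
+        text_ch = sayHello.out
+    }
+
+    // cowSay the text
+    cowSay(text_ch)
+}
+```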
+
+### Takeaway
+
+You know how to use containers in Nextflow to run processes, and how to build some branching logic into your pipelines!
+
+### What's next?
+
+Celebrate, take a stretch break and drink some water!
+
+When you are ready, move on to Part 3 of this training series to learn how to apply what you've learned so far to a more realistic data analysis use case.
diff --git a/docs/side_quests/if_else.md b/docs/side_quests/if_else.md
new file mode 100644
index 000000000..62bb74853
--- /dev/null
+++ b/docs/side_quests/if_else.md
@@ -0,0 +1,87 @@
+# Part 2: If - Else
+
+[TODO]
+
+---
+
+## 1. Make the cow quote famous scientists
+
+This section contains some stretch exercises to practice what you've learned so far.
+Doing these exercises is _not required_ to understand later parts of the training, but they provide a fun way to reinforce what you've learned by figuring out how to make the cow quote famous scientists.
+
+```console title="cowsay-output-Grace-Hopper.txt"
+ _________________________________________________
+ / \
+| Humans are allergic to change. They love to |
+| say, 'We've always done it this way.' I try to fi |
+| ght that. That's why I have a clock on my wall th |
+| at runs counter-clockwise. |
+| -Grace Hopper |
+ \ /
+ =================================================
+ \
+ \
+ ^__^
+ (oo)\_______
+ (__)\ )\/\
+ ||----w |
+ || ||
+```
+
+### 1.1. Modify the `hello-containers.nf` script to use a `getQuote` process
+
+We have a list of computer and biology pioneers in the `containers/data/pioneers.csv` file.
+At a high level, to complete this exercise you will need to:
+
+- Modify the default `params.input_file` to point to the `pioneers.csv` file.
+- Create a `getQuote` process that uses the `quote` container to fetch a quote for each input.
+- Connect the output of the `getQuote` process to the `cowsay` process to display the quote.
+
+For the `quote` container image, you can either use the one you built yourself in the previous stretch exercise or use the one you got from Seqera Containers.
+
+!!! hint
+
+    A good choice for the `script` block of your `getQuote` process might be:
+ ```groovy
+ script:
+ def safe_author = author.tokenize(' ').join('-')
+ """
+ quote "$author" > quote-${safe_author}.txt
+ echo "-${author}" >> quote-${safe_author}.txt
+ """
+ ```
+
+You can find a solution to this exercise in `containers/solutions/hello-containers-4.1.nf`.
+
+### 1.2. Modify your Nextflow pipeline to allow it to execute in `quote` and `sayHello` modes
+
+Add some branching logic to your pipeline so that it can accept inputs intended for either `quote` or `sayHello`.
+Here's an example of how to use an `if` statement in a Nextflow workflow:
+
+```groovy title="hello-containers.nf"
+workflow {
+ if (params.quote) {
+ ...
+ }
+ else {
+ ...
+ }
+ cowSay(text_ch)
+}
+```
+
+!!! hint
+
+ You can use `new_ch = processName.out` to assign a name to the output channel of a process.
+
+You can find a solution to this exercise in `containers/solutions/hello-containers-4.2.nf`.
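+
+As a reference point, here is a sketch of the filled-in branching, mirroring the logic of the solution file. It assumes `params.quote` is defined (e.g. `params.quote = true`) and that the `getQuote` and `sayHello` processes exist as described in the previous exercise:
+
+```groovy
+workflow {
+
+    // create a channel for inputs from a CSV file
+    input_ch = Channel.fromPath(params.input_file)
+        .splitCsv()
+        .flatten()
+
+    // route inputs to getQuote or sayHello depending on params.quote
+    // (assumes both processes are defined as in the previous exercise)
+    if (params.quote) {
+        getQuote(input_ch)
+        text_ch = getQuote.out
+    } else {
+        sayHello(input_ch)
+        text_ch = sayHello.out
+    }
+
+    // cowSay the text
+    cowSay(text_ch)
+}
+```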
+
+### Takeaway
+
+You know how to use containers in Nextflow to run processes, and how to build some branching logic into your pipelines!
+
+### What's next?
+
+Celebrate, take a stretch break and drink some water!
+
+When you are ready, move on to Part 3 of this training series to learn how to apply what you've learned so far to a more realistic data analysis use case.
diff --git a/docs/side_quests/index.md b/docs/side_quests/index.md
new file mode 100644
index 000000000..16e484e91
--- /dev/null
+++ b/docs/side_quests/index.md
@@ -0,0 +1,27 @@
+---
+title: Side Quests
+hide:
+ - toc
+---
+
+# Side Quests
+
+[TODO] This is a collection of standalone training modules that go deeper into specific topics. You can go through them in any order.
+
+Let's get started!
+
+[![Open in Gitpod](https://img.shields.io/badge/Gitpod-%20Open%20in%20Gitpod-908a85?logo=gitpod)](https://gitpod.io/#https://github.com/nextflow-io/training)
+
+## Learning objectives
+
+The learning objectives are specific to each section. [TODO]
+
+## Audience & prerequisites
+
+[TODO]
+
+**Prerequisites**
+
+- A GitHub account
+- Experience with the command line
+ [TODO]
diff --git a/docs/side_quests/orientation.md b/docs/side_quests/orientation.md
new file mode 100644
index 000000000..4a2047847
--- /dev/null
+++ b/docs/side_quests/orientation.md
@@ -0,0 +1,46 @@
+# Orientation
+
+The Gitpod environment contains all the software, code and data necessary to work through this training course, so you don't need to install anything yourself.
+However, you do need a (free) account to log in, and you should take a few minutes to familiarize yourself with the interface.
+
+If you have not yet done so, please follow [this link](../../envsetup/) before going any further.
+
+## Materials provided
+
+Throughout this training course, we'll be working in the `tapas/` directory.
+This directory contains all the code files, test data and accessory files you will need.
+
+Feel free to explore the contents of this directory; the easiest way to do so is to use the file explorer on the left-hand side of the Gitpod workspace.
+Alternatively, you can use the `tree` command.
+Throughout the course, we use the output of `tree` to represent directory structure and contents in a readable form, sometimes with minor modifications for clarity.
+
+Here we generate a table of contents to the second level down:
+
+```bash
+tree . -L 2
+```
+
+If you run this inside `tapas`, you should see the following output: [TODO]
+
+```console title="Directory contents"
+.
+```
+
+!!!note
+
+ Don't worry if this seems like a lot; we'll go through the relevant pieces at each step of the course.
+ This is just meant to give you an overview.
+
+**Here's a summary of what you should know to get started:**
+
+[TODO]
+
+!!!tip
+
+ If for whatever reason you move out of this directory, you can always run this command to return to it:
+
+ ```bash
+ cd /workspace/gitpod/tapas
+ ```
+
+Now, to begin the course, click on the arrow in the bottom right corner of this page.
diff --git a/gitpod-ws.code-workspace b/gitpod-ws.code-workspace
index eac21106c..f6865101f 100644
--- a/gitpod-ws.code-workspace
+++ b/gitpod-ws.code-workspace
@@ -1,11 +1,9 @@
{
"folders": [
{ "path": "./hello-nextflow" },
+ { "path": "./nf4-science" },
{ "path": "./nf-training" },
{ "path": "./nf-training-advanced" },
- { "path": "./nf-customize" },
- { "path": "./nf-develop" },
- { "path": "./troubleshoot" },
],
"settings": {
"typescript.tsdk": "gitpod/node_modules/typescript/lib",
diff --git a/hello-nextflow/containers/build/Dockerfile b/hello-nextflow/containers/build/Dockerfile
deleted file mode 100644
index e69de29bb..000000000
diff --git a/hello-nextflow/containers/build/conda.yml b/hello-nextflow/containers/build/conda.yml
deleted file mode 100644
index e69de29bb..000000000
diff --git a/hello-nextflow/containers/data/greetings.csv b/hello-nextflow/containers/data/greetings.csv
deleted file mode 100644
index ab8a5901d..000000000
--- a/hello-nextflow/containers/data/greetings.csv
+++ /dev/null
@@ -1 +0,0 @@
-Hello,Bonjour,Holà
diff --git a/hello-nextflow/containers/data/pioneers.csv b/hello-nextflow/containers/data/pioneers.csv
deleted file mode 100644
index d97928012..000000000
--- a/hello-nextflow/containers/data/pioneers.csv
+++ /dev/null
@@ -1,4 +0,0 @@
-Grace Hopper
-Alan Turing
-Margaret Oakley Dayhoff
-Rosalind Franklin
diff --git a/hello-nextflow/containers/results/.gitignore b/hello-nextflow/containers/results/.gitignore
deleted file mode 100644
index d6b7ef32c..000000000
--- a/hello-nextflow/containers/results/.gitignore
+++ /dev/null
@@ -1,2 +0,0 @@
-*
-!.gitignore
diff --git a/hello-nextflow/containers/solutions/hello-containers-3.nf b/hello-nextflow/containers/solutions/hello-containers-3.nf
deleted file mode 100644
index cdefb478d..000000000
--- a/hello-nextflow/containers/solutions/hello-containers-3.nf
+++ /dev/null
@@ -1,63 +0,0 @@
-#!/usr/bin/env nextflow
-
-/*
- * Pipeline parameters
- */
-params.input_file = "containers/data/greetings.csv"
-// params.character can be any of 'beavis', 'cheese', 'cow', 'daemon', 'dragon', 'fox', 'ghostbusters', 'kitty',
-// 'meow', 'miki', 'milk', 'octopus', 'pig', 'stegosaurus', 'stimpy', 'trex', 'turkey', 'turtle', 'tux'
-params.character = "cow"
-
-/*
- * Use echo to print 'Hello World!' to standard out
- */
-process sayHello {
-
- publishDir 'containers/results', mode: 'copy'
-
- input:
- val greeting
-
- output:
- path "output-*.txt"
-
- script:
- // Replace the spaces in the greeting with hyphens for the output filename
- def safe_greeting = greeting.tokenize(' ').join('-')
- """
- echo '$greeting' > 'output-${safe_greeting}.txt'
- """
-}
-
-/*
- * Use a cow (or other character) to say some text
- */
-process cowSay {
-
- publishDir 'containers/results', mode: 'copy'
- container 'community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65'
-
- input:
- path input_file
-
- output:
- path "cowsay-*"
-
- script:
- """
- cowsay -c "$params.character" -t "\$(cat $input_file)" > cowsay-${input_file}
- """
-}
-
-workflow {
-
- // create a channel for inputs from a CSV file
- input_ch = Channel.fromPath(params.input_file)
- .splitCsv()
- .flatten()
-
- sayHello(input_ch)
-
- // cowSay the text
- cowSay(sayHello.out)
-}
diff --git a/hello-nextflow/containers/solutions/hello-containers-4.1.nf b/hello-nextflow/containers/solutions/hello-containers-4.1.nf
deleted file mode 100644
index fc2e9ba7f..000000000
--- a/hello-nextflow/containers/solutions/hello-containers-4.1.nf
+++ /dev/null
@@ -1,83 +0,0 @@
-#!/usr/bin/env nextflow
-
-/*
- * Pipeline parameters
- */
-params.input_file = "containers/data/pioneers.csv"
-// params.character can be any of 'beavis', 'cheese', 'cow', 'daemon', 'dragon', 'fox', 'ghostbusters', 'kitty',
-// 'meow', 'miki', 'milk', 'octopus', 'pig', 'stegosaurus', 'stimpy', 'trex', 'turkey', 'turtle', 'tux'
-params.character = "cow"
-
-/*
- * Use echo to print 'Hello World!' to standard out
- */
-process sayHello {
-
- publishDir 'containers/results', mode: 'copy'
-
- input:
- val greeting
-
- output:
- path "output-*.txt"
-
- script:
- // Replace the spaces in the greeting with hyphens for the output filename
- def safe_greeting = greeting.tokenize(' ').join('-')
- """
- echo '$greeting' > 'output-${safe_greeting}.txt'
- """
-}
-
-/*
- * Use a cow (or other character) to say some text
- */
-process cowSay {
-
- publishDir 'containers/results', mode: 'copy'
- container 'community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65'
-
- input:
- path input_file
-
- output:
- path "cowsay-*"
-
- script:
- """
- cowsay -c "$params.character" -t "\$(cat $input_file)" > cowsay-${input_file}
- """
-}
-
-process getQuote {
-
- publishDir 'containers/results', mode: 'copy'
- container 'community.wave.seqera.io/library/pip_quote:ae07804021465ee9'
-
- input:
- val author
-
- output:
- path "quote-*.txt"
-
- script:
- // Replace the spaces in the author with hyphens for the output filename
- def safe_author = author.tokenize(' ').join('-')
- """
- quote "$author" > quote-${safe_author}.txt
- echo "-${author}" >> quote-${safe_author}.txt
- """
-}
-
-workflow {
-
- // create a channel for inputs from a CSV file
- input_ch = Channel.fromPath(params.input_file)
- .splitCsv()
- .flatten()
-
- getQuote(input_ch)
-
- // cowSay the text
- cowSay(getQuote.out)
-}
diff --git a/hello-nextflow/containers/solutions/hello-containers-4.2.nf b/hello-nextflow/containers/solutions/hello-containers-4.2.nf
deleted file mode 100644
index 9a38316c8..000000000
--- a/hello-nextflow/containers/solutions/hello-containers-4.2.nf
+++ /dev/null
@@ -1,94 +0,0 @@
-#!/usr/bin/env nextflow
-
-/*
- * Pipeline parameters
- */
-params.input_file = "containers/data/pioneers.csv"
-params.quote = true
-// params.character can be any of 'beavis', 'cheese', 'cow', 'daemon', 'dragon', 'fox', 'ghostbusters', 'kitty',
-// 'meow', 'miki', 'milk', 'octopus', 'pig', 'stegosaurus', 'stimpy', 'trex', 'turkey', 'turtle', 'tux'
-params.character = "cow"
-
-/*
- * Use echo to print 'Hello World!' to standard out
- */
-process sayHello {
-
- publishDir 'containers/results', mode: 'copy'
-
- input:
- val greeting
-
- output:
- path "output-*.txt"
-
- script:
- // Replace the spaces in the greeting with hyphens for the output filename
- def safe_greeting = greeting.tokenize(' ').join('-')
- """
- echo '$greeting' > 'output-${safe_greeting}.txt'
- """
-}
-
-/*
- * Use a cow (or other character) to say some text
- */
-process cowSay {
-
- publishDir 'containers/results', mode: 'copy'
- container 'community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65'
-
- input:
- path input_file
-
- output:
- path "cowsay-*"
-
- script:
- """
- cowsay -c "$params.character" -t "\$(cat $input_file)" > cowsay-${input_file}
- """
-}
-
-/*
- * Get a quote by author name from the goodreads API
- */
-process getQuote {
-
- publishDir 'containers/results', mode: 'copy'
- container 'community.wave.seqera.io/library/pip_quote:ae07804021465ee9'
-
- input:
- val author
-
- output:
- path "quote-*.txt"
-
- script:
- // Replace the spaces in the author with hyphens for the output filename
- def safe_author = author.tokenize(' ').join('-')
- """
- quote "$author" > quote-${safe_author}.txt
- echo "-${author}" >> quote-${safe_author}.txt
- """
-}
-
-workflow {
-
- // create a channel for inputs from a CSV file
- input_ch = Channel.fromPath(params.input_file)
- .splitCsv()
- .flatten()
-
- // create a channel for the text to be processed
- if (params.quote) {
- getQuote(input_ch)
- text_ch = getQuote.out
- } else {
- sayHello(input_ch)
- text_ch = sayHello.out
- }
-
- // cowSay the text
- cowSay(text_ch)
-}
diff --git a/hello-nextflow/data/greetings.csv b/hello-nextflow/data/greetings.csv
deleted file mode 100644
index ab8a5901d..000000000
--- a/hello-nextflow/data/greetings.csv
+++ /dev/null
@@ -1 +0,0 @@
-Hello,Bonjour,Holà
diff --git a/hello-nextflow/data/sample_bams.txt b/hello-nextflow/data/sample_bams.txt
deleted file mode 100644
index 64e2ce928..000000000
--- a/hello-nextflow/data/sample_bams.txt
+++ /dev/null
@@ -1,3 +0,0 @@
-/workspace/gitpod/hello-nextflow/data/bam/reads_mother.bam
-/workspace/gitpod/hello-nextflow/data/bam/reads_father.bam
-/workspace/gitpod/hello-nextflow/data/bam/reads_son.bam
diff --git a/hello-nextflow/data/samplesheet.csv b/hello-nextflow/data/samplesheet.csv
deleted file mode 100644
index 17fb15a46..000000000
--- a/hello-nextflow/data/samplesheet.csv
+++ /dev/null
@@ -1,4 +0,0 @@
-id,reads_bam
-NA12878,/workspace/gitpod/hello-nextflow/data/bam/reads_mother.bam
-NA12877,/workspace/gitpod/hello-nextflow/data/bam/reads_father.bam
-NA12882,/workspace/gitpod/hello-nextflow/data/bam/reads_son.bam
diff --git a/hello-nextflow/greetings.csv b/hello-nextflow/greetings.csv
new file mode 100644
index 000000000..c5889e19a
--- /dev/null
+++ b/hello-nextflow/greetings.csv
@@ -0,0 +1,3 @@
+Hello
+Bonjour
+Holà
diff --git a/hello-nextflow/hello-channels.nf b/hello-nextflow/hello-channels.nf
new file mode 100644
index 000000000..6236eea21
--- /dev/null
+++ b/hello-nextflow/hello-channels.nf
@@ -0,0 +1,31 @@
+#!/usr/bin/env nextflow
+
+/*
+ * Use echo to print 'Hello World!' to a file
+ */
+process sayHello {
+
+ publishDir 'results', mode: 'copy'
+
+ input:
+ val greeting
+
+ output:
+ path 'output.txt'
+
+ script:
+ """
+ echo '$greeting' > output.txt
+ """
+}
+
+/*
+ * Pipeline parameters
+ */
+params.greeting = 'Holà mundo!'
+
+workflow {
+
+ // emit a greeting
+ sayHello(params.greeting)
+}
diff --git a/hello-nextflow/hello-config.nf b/hello-nextflow/hello-config.nf
new file mode 100644
index 000000000..f187831d1
--- /dev/null
+++ b/hello-nextflow/hello-config.nf
@@ -0,0 +1,37 @@
+#!/usr/bin/env nextflow
+
+/*
+ * Pipeline parameters
+ */
+params.greeting = 'greetings.csv'
+params.batch = 'test-batch'
+params.character = 'turkey'
+
+// Include modules
+include { sayHello } from './modules/sayHello.nf'
+include { convertToUpper } from './modules/convertToUpper.nf'
+include { collectGreetings } from './modules/collectGreetings.nf'
+include { cowpy } from './modules/cowpy.nf'
+
+workflow {
+
+ // create a channel for inputs from a CSV file
+ greeting_ch = Channel.fromPath(params.greeting)
+ .splitCsv()
+ .map { line -> line[0] }
+
+ // emit a greeting
+ sayHello(greeting_ch)
+
+ // convert the greeting to uppercase
+ convertToUpper(sayHello.out)
+
+ // collect all the greetings into one file
+ collectGreetings(convertToUpper.out.collect(), params.batch)
+
+ // emit a message about the size of the batch
+ collectGreetings.out.count.view { "There were $it greetings in this batch" }
+
+ // generate ASCII art of the greetings with cowpy
+ cowpy(collectGreetings.out.outfile, params.character)
+}
diff --git a/hello-nextflow/hello-config/demo-params.json b/hello-nextflow/hello-config/demo-params.json
deleted file mode 100644
index 59be1e1d4..000000000
--- a/hello-nextflow/hello-config/demo-params.json
+++ /dev/null
@@ -1,9 +0,0 @@
-{
- "reads_bam": "data/sample_bams.txt",
- "outdir": "results_genomics",
- "reference": "data/ref/ref.fasta",
- "reference_index": "data/ref/ref.fasta.fai",
- "reference_dict": "data/ref/ref.dict",
- "intervals": "data/ref/intervals.bed",
- "cohort_name": "family_trio"
-}
diff --git a/hello-nextflow/hello-config/main.nf b/hello-nextflow/hello-config/main.nf
deleted file mode 100644
index 755b55455..000000000
--- a/hello-nextflow/hello-config/main.nf
+++ /dev/null
@@ -1,149 +0,0 @@
-#!/usr/bin/env nextflow
-
-/*
- * Pipeline parameters
- */
-
-// Primary input (file of input files, one per line)
-params.reads_bam = "${projectDir}/data/sample_bams.txt"
-
-// Output directory
-params.outdir = "results_genomics"
-
-// Accessory files
-params.reference = "${projectDir}/data/ref/ref.fasta"
-params.reference_index = "${projectDir}/data/ref/ref.fasta.fai"
-params.reference_dict = "${projectDir}/data/ref/ref.dict"
-params.intervals = "${projectDir}/data/ref/intervals.bed"
-
-// Base name for final output file
-params.cohort_name = "family_trio"
-
-/*
- * Generate BAM index file
- */
-process SAMTOOLS_INDEX {
-
- container 'community.wave.seqera.io/library/samtools:1.20--b5dfbd93de237464'
-
- publishDir params.outdir, mode: 'symlink'
-
- input:
- path input_bam
-
- output:
- tuple path(input_bam), path("${input_bam}.bai")
-
- script:
- """
- samtools index '$input_bam'
- """
-}
-
-/*
- * Call variants with GATK HaplotypeCaller
- */
-process GATK_HAPLOTYPECALLER {
-
- container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"
-
- publishDir params.outdir, mode: 'symlink'
-
- input:
- tuple path(input_bam), path(input_bam_index)
- path ref_fasta
- path ref_index
- path ref_dict
- path interval_list
-
- output:
- path "${input_bam}.g.vcf" , emit: vcf
- path "${input_bam}.g.vcf.idx" , emit: idx
-
- script:
- """
- gatk HaplotypeCaller \
- -R ${ref_fasta} \
- -I ${input_bam} \
- -O ${input_bam}.g.vcf \
- -L ${interval_list} \
- -ERC GVCF
- """
-}
-
-/*
- * Combine GVCFs into GenomicsDB datastore and run joint genotyping to produce cohort-level calls
- */
-process GATK_JOINTGENOTYPING {
-
- container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"
-
- publishDir params.outdir, mode: 'symlink'
-
- input:
- path all_gvcfs
- path all_idxs
- path interval_list
- val cohort_name
- path ref_fasta
- path ref_index
- path ref_dict
-
- output:
- path "${cohort_name}.joint.vcf"
- path "${cohort_name}.joint.vcf.idx"
-
- script:
- def gvcfs_line = all_gvcfs.collect { gvcf -> "-V ${gvcf}" }.join(' ')
- """
- gatk GenomicsDBImport \
- ${gvcfs_line} \
- -L ${interval_list} \
- --genomicsdb-workspace-path ${cohort_name}_gdb
-
- gatk GenotypeGVCFs \
- -R ${ref_fasta} \
- -V gendb://${cohort_name}_gdb \
- -L ${interval_list} \
- -O ${cohort_name}.joint.vcf
- """
-}
-
-workflow {
-
- // Create input channel from a text file listing input file paths
- reads_ch = Channel.fromPath(params.reads_bam).splitText()
-
- // Load the file paths for the accessory files (reference and intervals)
- ref_file = file(params.reference)
- ref_index_file = file(params.reference_index)
- ref_dict_file = file(params.reference_dict)
- intervals_file = file(params.intervals)
-
- // Create index file for input BAM file
- SAMTOOLS_INDEX(reads_ch)
-
- // Call variants from the indexed BAM file
- GATK_HAPLOTYPECALLER(
- SAMTOOLS_INDEX.out,
- ref_file,
- ref_index_file,
- ref_dict_file,
- intervals_file
- )
-
- // Collect variant calling outputs across samples
- all_gvcfs_ch = GATK_HAPLOTYPECALLER.out.vcf.collect()
- all_idxs_ch = GATK_HAPLOTYPECALLER.out.idx.collect()
-
- // Combine GVCFs into a GenomicsDB data store and apply joint genotyping
- GATK_JOINTGENOTYPING(
- all_gvcfs_ch,
- all_idxs_ch,
- intervals_file,
- params.cohort_name,
- ref_file,
- ref_index_file,
- ref_dict_file
- )
-}
diff --git a/hello-nextflow/hello-config/nextflow.config b/hello-nextflow/hello-config/nextflow.config
deleted file mode 100644
index 187d8f216..000000000
--- a/hello-nextflow/hello-config/nextflow.config
+++ /dev/null
@@ -1,2 +0,0 @@
-docker.fixOwnership = true
-docker.enabled = true
diff --git a/hello-nextflow/hello-containers.nf b/hello-nextflow/hello-containers.nf
index fa799b78b..8764606bf 100644
--- a/hello-nextflow/hello-containers.nf
+++ b/hello-nextflow/hello-containers.nf
@@ -3,60 +3,30 @@
/*
* Pipeline parameters
*/
-params.input_file = "containers/data/greetings.csv"
-// params.character can be any of 'beavis', 'cheese', 'cow', 'daemon', 'dragon', 'fox', 'ghostbusters', 'kitty',
-// 'meow', 'miki', 'milk', 'octopus', 'pig', 'stegosaurus', 'stimpy', 'trex', 'turkey', 'turtle', 'tux'
-params.character = "cow"
+params.greeting = 'greetings.csv'
+params.batch = 'test-batch'
-/*
- * Use echo to print 'Hello World!' to standard out
- */
-process sayHello {
-
- publishDir 'containers/results', mode: 'copy'
-
- input:
- val greeting
-
- output:
- path "output-*.txt"
-
- script:
- // Replace the spaces in the greeting with hyphens for the output filename
- def safe_greeting = greeting.tokenize(' ').join('-')
- """
- echo '$greeting' > 'output-${safe_greeting}.txt'
- """
-}
-
-/*
- * Use a cow (or other character) to say some text
- */
-process cowSay {
-
- publishDir 'containers/results', mode: 'copy'
-
- input:
- path input_file
-
- output:
- path "cowsay-*"
-
- script:
- """
- cowsay -c "$params.character" -t "\$(cat $input_file)" > cowsay-${input_file}
- """
-}
+// Include modules
+include { sayHello } from './modules/sayHello.nf'
+include { convertToUpper } from './modules/convertToUpper.nf'
+include { collectGreetings } from './modules/collectGreetings.nf'
workflow {
// create a channel for inputs from a CSV file
- input_ch = Channel.fromPath(params.input_file)
- .splitCsv()
- .flatten()
+ greeting_ch = Channel.fromPath(params.greeting)
+ .splitCsv()
+ .map { line -> line[0] }
+
+ // emit a greeting
+ sayHello(greeting_ch)
+
+ // convert the greeting to uppercase
+ convertToUpper(sayHello.out)
- sayHello(input_ch)
+ // collect all the greetings into one file
+ collectGreetings(convertToUpper.out.collect(), params.batch)
- // cowSay the text
- cowSay(sayHello.out)
+ // emit a message about the size of the batch
+ collectGreetings.out.count.view { "There were $it greetings in this batch" }
}
diff --git a/hello-nextflow/hello-modules.nf b/hello-nextflow/hello-modules.nf
new file mode 100644
index 000000000..ccf29abd6
--- /dev/null
+++ b/hello-nextflow/hello-modules.nf
@@ -0,0 +1,87 @@
+#!/usr/bin/env nextflow
+
+/*
+ * Use echo to print 'Hello World!' to a file
+ */
+process sayHello {
+
+ publishDir 'results', mode: 'copy'
+
+ input:
+ val greeting
+
+ output:
+ path "${greeting}-output.txt"
+
+ script:
+ """
+ echo '$greeting' > '$greeting-output.txt'
+ """
+}
+
+/*
+ * Use a text replacement tool to convert the greeting to uppercase
+ */
+process convertToUpper {
+
+ publishDir 'results', mode: 'copy'
+
+ input:
+ path input_file
+
+ output:
+ path "UPPER-${input_file}"
+
+ script:
+ """
+ cat '$input_file' | tr '[a-z]' '[A-Z]' > 'UPPER-${input_file}'
+ """
+}
+
+/*
+ * Collect uppercase greetings into a single output file
+ */
+process collectGreetings {
+
+ publishDir 'results', mode: 'copy'
+
+ input:
+ path input_files
+ val batch_name
+
+ output:
+ path "COLLECTED-${batch_name}-output.txt" , emit: outfile
+ val count_greetings , emit: count
+
+ script:
+ count_greetings = input_files.size()
+ """
+ cat ${input_files} > 'COLLECTED-${batch_name}-output.txt'
+ """
+}
+
+/*
+ * Pipeline parameters
+ */
+params.greeting = 'greetings.csv'
+params.batch = 'test-batch'
+
+workflow {
+
+ // create a channel for inputs from a CSV file
+ greeting_ch = Channel.fromPath(params.greeting)
+ .splitCsv()
+ .map { line -> line[0] }
+
+ // emit a greeting
+ sayHello(greeting_ch)
+
+ // convert the greeting to uppercase
+ convertToUpper(sayHello.out)
+
+ // collect all the greetings into one file
+ collectGreetings(convertToUpper.out.collect(), params.batch)
+
+ // emit a message about the size of the batch
+ collectGreetings.out.count.view { "There were $it greetings in this batch" }
+}
diff --git a/hello-nextflow/hello-modules/demo-params.json b/hello-nextflow/hello-modules/demo-params.json
deleted file mode 100644
index 59be1e1d4..000000000
--- a/hello-nextflow/hello-modules/demo-params.json
+++ /dev/null
@@ -1,9 +0,0 @@
-{
- "reads_bam": "data/sample_bams.txt",
- "outdir": "results_genomics",
- "reference": "data/ref/ref.fasta",
- "reference_index": "data/ref/ref.fasta.fai",
- "reference_dict": "data/ref/ref.dict",
- "intervals": "data/ref/intervals.bed",
- "cohort_name": "family_trio"
-}
diff --git a/hello-nextflow/hello-modules/main.nf b/hello-nextflow/hello-modules/main.nf
deleted file mode 100644
index 316b0c3a8..000000000
--- a/hello-nextflow/hello-modules/main.nf
+++ /dev/null
@@ -1,133 +0,0 @@
-#!/usr/bin/env nextflow
-
-/*
- * Generate BAM index file
- */
-process SAMTOOLS_INDEX {
-
- container 'community.wave.seqera.io/library/samtools:1.20--b5dfbd93de237464'
- conda "bioconda::samtools=1.20"
-
- publishDir params.outdir, mode: 'symlink'
-
- input:
- path input_bam
-
- output:
- tuple path(input_bam), path("${input_bam}.bai")
-
- script:
- """
- samtools index '$input_bam'
- """
-}
-
-/*
- * Call variants with GATK HaplotypeCaller
- */
-process GATK_HAPLOTYPECALLER {
-
- container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"
- conda "bioconda::gatk4=4.5.0.0"
-
- publishDir params.outdir, mode: 'symlink'
-
- input:
- tuple path(input_bam), path(input_bam_index)
- path ref_fasta
- path ref_index
- path ref_dict
- path interval_list
-
- output:
- path "${input_bam}.g.vcf" , emit: vcf
- path "${input_bam}.g.vcf.idx" , emit: idx
-
- script:
- """
- gatk HaplotypeCaller \
- -R ${ref_fasta} \
- -I ${input_bam} \
- -O ${input_bam}.g.vcf \
- -L ${interval_list} \
- -ERC GVCF
- """
-}
-
-/*
- * Combine GVCFs into GenomicsDB datastore and run joint genotyping to produce cohort-level calls
- */
-process GATK_JOINTGENOTYPING {
-
- container "community.wave.seqera.io/library/gatk4:4.5.0.0--730ee8817e436867"
- conda "bioconda::gatk4=4.5.0.0"
-
- publishDir params.outdir, mode: 'symlink'
-
- input:
- path all_gvcfs
- path all_idxs
- path interval_list
- val cohort_name
- path ref_fasta
- path ref_index
- path ref_dict
-
- output:
- path "${cohort_name}.joint.vcf"
- path "${cohort_name}.joint.vcf.idx"
-
- script:
- def gvcfs_line = all_gvcfs.collect { gvcf -> "-V ${gvcf}" }.join(' ')
- """
- gatk GenomicsDBImport \
- ${gvcfs_line} \
- -L ${interval_list} \
- --genomicsdb-workspace-path ${cohort_name}_gdb
-
- gatk GenotypeGVCFs \
- -R ${ref_fasta} \
- -V gendb://${cohort_name}_gdb \
- -L ${interval_list} \
- -O ${cohort_name}.joint.vcf
- """
-}
-
-workflow {
-
- // Create input channel from a text file listing input file paths
- reads_ch = Channel.fromPath(params.reads_bam).splitText()
-
- // Load the file paths for the accessory files (reference and intervals)
- ref_file = file(params.reference)
- ref_index_file = file(params.reference_index)
- ref_dict_file = file(params.reference_dict)
- intervals_file = file(params.intervals)
-
- // Create index file for input BAM file
- SAMTOOLS_INDEX(reads_ch)
-
- // Call variants from the indexed BAM file
- GATK_HAPLOTYPECALLER(
- SAMTOOLS_INDEX.out,
- ref_file,
- ref_index_file,
- ref_dict_file,
- intervals_file
- )
-
- // Collect variant calling outputs across samples
- all_gvcfs_ch = GATK_HAPLOTYPECALLER.out.vcf.collect()
- all_idxs_ch = GATK_HAPLOTYPECALLER.out.idx.collect()
-
- // Combine GVCFs into a GenomicsDB data store and apply joint genotyping
- GATK_JOINTGENOTYPING(
- all_gvcfs_ch,
- all_idxs_ch,
- intervals_file,
- params.cohort_name,
- ref_file,
- ref_index_file,
- ref_dict_file
- )
-}
diff --git a/hello-nextflow/hello-modules/nextflow.config b/hello-nextflow/hello-modules/nextflow.config
deleted file mode 100644
index 86142d64b..000000000
--- a/hello-nextflow/hello-modules/nextflow.config
+++ /dev/null
@@ -1,70 +0,0 @@
-docker.fixOwnership = true
-
-/*
- * Pipeline parameters
- */
-
-params {
- // Primary input (file of input files, one per line)
- reads_bam = null
-
- // Output directory
- params.outdir = "results_genomics"
-
- // Accessory files
- reference = null
- reference_index = null
- reference_dict = null
- intervals = null
-
- // Base name for final output file
- cohort_name = "my_cohort"
-}
-
-profiles {
- docker_on {
- docker.enabled = true
- }
- conda_on {
- conda.enabled = true
- }
- my_laptop {
- process.executor = 'local'
- docker.enabled = true
- }
- univ_hpc {
- process.executor = 'slurm'
- conda.enabled = true
- process.resourceLimits = [
- memory: 750.GB,
- cpus: 200,
- time: 30.d
- ]
- }
- demo {
- // Primary input (file of input files, one per line)
- params.reads_bam = "data/sample_bams.txt"
-
- // Output directory
- params.outdir = "results_genomics"
-
- // Accessory files
- params.reference = "data/ref/ref.fasta"
- params.reference_index = "data/ref/ref.fasta.fai"
- params.reference_dict = "data/ref/ref.dict"
- params.intervals = "data/ref/intervals.bed"
-
- // Base name for final output file
- params.cohort_name = "family_trio"
- }
-}
-
-process {
- // defaults for all processes
- cpus = 2
- memory = 2.GB
- // allocations for a specific process
- withName: 'GATK_JOINTGENOTYPING' {
- cpus = 4
- }
-}
diff --git a/hello-nextflow/hello-nf-core/data/sequencer_samplesheet.csv b/hello-nextflow/hello-nf-core/data/sequencer_samplesheet.csv
deleted file mode 100644
index f2c3b7805..000000000
--- a/hello-nextflow/hello-nf-core/data/sequencer_samplesheet.csv
+++ /dev/null
@@ -1,5 +0,0 @@
-sample,sequencer,fastq_1,fastq_2
-SAMPLE1_PE,sequencer1,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample1_R1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample1_R2.fastq.gz
-SAMPLE2_PE,sequencer2,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample2_R1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample2_R2.fastq.gz
-SAMPLE3_SE,sequencer3,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample1_R1.fastq.gz,
-SAMPLE3_SE,sequencer3,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample2_R1.fastq.gz,
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/.editorconfig b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/.editorconfig
deleted file mode 100644
index 72dda289a..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/.editorconfig
+++ /dev/null
@@ -1,33 +0,0 @@
-root = true
-
-[*]
-charset = utf-8
-end_of_line = lf
-insert_final_newline = true
-trim_trailing_whitespace = true
-indent_size = 4
-indent_style = space
-
-[*.{md,yml,yaml,html,css,scss,js}]
-indent_size = 2
-
-# These files are edited and tested upstream in nf-core/modules
-[/modules/nf-core/**]
-charset = unset
-end_of_line = unset
-insert_final_newline = unset
-trim_trailing_whitespace = unset
-indent_style = unset
-[/subworkflows/nf-core/**]
-charset = unset
-end_of_line = unset
-insert_final_newline = unset
-trim_trailing_whitespace = unset
-indent_style = unset
-
-[/assets/email*]
-indent_size = unset
-
-# ignore python and markdown
-[*.{py,md}]
-indent_style = unset
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/.nf-core.yml b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/.nf-core.yml
deleted file mode 100644
index be6f132c5..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/.nf-core.yml
+++ /dev/null
@@ -1,104 +0,0 @@
-bump_version: null
-lint:
- files_exist:
- - .github/ISSUE_TEMPLATE/bug_report.yml
- - .github/ISSUE_TEMPLATE/feature_request.yml
- - .github/PULL_REQUEST_TEMPLATE.md
- - .github/CONTRIBUTING.md
- - .github/.dockstore.yml
- - .github/workflows/branch.yml
- - .github/workflows/ci.yml
- - .github/workflows/linting_comment.yml
- - .github/workflows/linting.yml
- - conf/igenomes.config
- - conf/igenomes_ignored.config
- - CITATIONS.md
- - CHANGELOG.md
- - .github/ISSUE_TEMPLATE/bug_report.yml
- - .github/ISSUE_TEMPLATE/feature_request.yml
- - .github/PULL_REQUEST_TEMPLATE.md
- - .github/CONTRIBUTING.md
- - .github/.dockstore.yml
- - .github/workflows/branch.yml
- - .github/workflows/ci.yml
- - .github/workflows/linting_comment.yml
- - .github/workflows/linting.yml
- - conf/igenomes.config
- - conf/igenomes_ignored.config
- - CITATIONS.md
- - CHANGELOG.md
- - CODE_OF_CONDUCT.md
- - assets/nf-core-myfirstpipeline_logo_light.png
- - docs/images/nf-core-myfirstpipeline_logo_light.png
- - docs/images/nf-core-myfirstpipeline_logo_dark.png
- - .github/ISSUE_TEMPLATE/config.yml
- - .github/workflows/awstest.yml
- - .github/workflows/awsfulltest.yml
- files_unchanged:
- - .github/ISSUE_TEMPLATE/bug_report.yml
- - .github/ISSUE_TEMPLATE/config.yml
- - .github/ISSUE_TEMPLATE/feature_request.yml
- - .github/PULL_REQUEST_TEMPLATE.md
- - .github/workflows/branch.yml
- - .github/workflows/linting_comment.yml
- - .github/workflows/linting.yml
- - .github/CONTRIBUTING.md
- - .github/.dockstore.yml
- - .github/CONTRIBUTING.md
- - .prettierignore
- - .prettierignore
- - .github/ISSUE_TEMPLATE/bug_report.yml
- - .github/ISSUE_TEMPLATE/config.yml
- - .github/ISSUE_TEMPLATE/feature_request.yml
- - .github/PULL_REQUEST_TEMPLATE.md
- - .github/workflows/branch.yml
- - .github/workflows/linting_comment.yml
- - .github/workflows/linting.yml
- - .github/CONTRIBUTING.md
- - .github/.dockstore.yml
- - .github/CONTRIBUTING.md
- - .prettierignore
- - .prettierignore
- - CODE_OF_CONDUCT.md
- - assets/nf-core-myfirstpipeline_logo_light.png
- - docs/images/nf-core-myfirstpipeline_logo_light.png
- - docs/images/nf-core-myfirstpipeline_logo_dark.png
- - .github/ISSUE_TEMPLATE/bug_report.yml
- multiqc_config:
- - report_comment
- nextflow_config:
- - manifest.name
- - manifest.homePage
- - validation.help.beforeText
- - validation.help.afterText
- - validation.summary.beforeText
- - validation.summary.afterText
- readme:
- - nextflow_badge
- - nextflow_badge
- - nextflow_badge
-nf_core_version: 3.0.2
-org_path: null
-repository_type: pipeline
-template:
- author: gitpod
- description: My first pipeline
- force: true
- is_nfcore: false
- name: myfirstpipeline
- org: myorg
- outdir: .
- skip_features:
- - github
- - ci
- - igenomes
- - github_badges
- - citations
- - gitpod
- - codespaces
- - fastqc
- - changelog
- - adaptivecard
- - slackreport
- version: 1.0.0dev
-update: null
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/.pre-commit-config.yaml b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/.pre-commit-config.yaml
deleted file mode 100644
index 9e9f0e1c4..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/.pre-commit-config.yaml
+++ /dev/null
@@ -1,13 +0,0 @@
-repos:
- - repo: https://github.com/pre-commit/mirrors-prettier
- rev: "v3.1.0"
- hooks:
- - id: prettier
- additional_dependencies:
- - prettier@3.2.5
-
- - repo: https://github.com/editorconfig-checker/editorconfig-checker.python
- rev: "3.0.3"
- hooks:
- - id: editorconfig-checker
- alias: ec
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/.prettierignore b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/.prettierignore
deleted file mode 100644
index e702cf0b9..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/.prettierignore
+++ /dev/null
@@ -1,10 +0,0 @@
-email_template.html
-.nextflow*
-work/
-data/
-results/
-.DS_Store
-testing/
-testing*
-*.pyc
-bin/
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/.prettierrc.yml b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/.prettierrc.yml
deleted file mode 100644
index c81f9a766..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/.prettierrc.yml
+++ /dev/null
@@ -1 +0,0 @@
-printWidth: 120
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/LICENSE b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/LICENSE
deleted file mode 100644
index 6f6f81482..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/LICENSE
+++ /dev/null
@@ -1,21 +0,0 @@
-MIT License
-
-Copyright (c) gitpod
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/README.md b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/README.md
deleted file mode 100644
index 7c215977b..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/README.md
+++ /dev/null
@@ -1,77 +0,0 @@
-# myorg/myfirstpipeline
-
-## Introduction
-
-**myorg/myfirstpipeline** is a bioinformatics pipeline that ...
-
-
-
-
-
-
-2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
-
-## Usage
-
-> [!NOTE]
-> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
-
-
-
-Now, you can run the pipeline using:
-
-
-
-```bash
-nextflow run myorg/myfirstpipeline \
- -profile \
- --input samplesheet.csv \
- --outdir
-```
-
-> [!WARNING]
-> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).
-
-## Credits
-
-myorg/myfirstpipeline was originally written by gitpod.
-
-We thank the following people for their extensive assistance in the development of this pipeline:
-
-
-
-## Contributions and Support
-
-If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).
-
-## Citations
-
-
-
-
-This pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [MIT license](https://github.com/nf-core/tools/blob/main/LICENSE).
-
-> **The nf-core framework for community-curated bioinformatics pipelines.**
->
-> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
->
-> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/assets/email_template.html b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/assets/email_template.html
deleted file mode 100644
index 68fe28ceb..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/assets/email_template.html
+++ /dev/null
@@ -1,110 +0,0 @@
-
-
-
-
-
-
-
- myorg/myfirstpipeline Pipeline Report
-
-
-
-
-
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/assets/email_template.txt b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/assets/email_template.txt
deleted file mode 100644
index 03310b67d..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/assets/email_template.txt
+++ /dev/null
@@ -1,31 +0,0 @@
-Run Name: $runName
-
-<% if (success){
- out << "## myorg/myfirstpipeline execution completed successfully! ##"
-} else {
- out << """####################################################
-## myorg/myfirstpipeline execution completed unsuccessfully! ##
-####################################################
-The exit status of the task that caused the workflow execution to fail was: $exitStatus.
-The full error message was:
-
-${errorReport}
-"""
-} %>
-
-
-The workflow was completed at $dateComplete (duration: $duration)
-
-The command used to launch the workflow was as follows:
-
- $commandLine
-
-
-
-Pipeline Configuration:
------------------------
-<% out << summary.collect{ k,v -> " - $k: $v" }.join("\n") %>
-
---
-myorg/myfirstpipeline
-https://github.com/myorg/myfirstpipeline
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/assets/multiqc_config.yml b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/assets/multiqc_config.yml
deleted file mode 100644
index 2306d17cf..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/assets/multiqc_config.yml
+++ /dev/null
@@ -1,14 +0,0 @@
-report_comment: >
- This report has been generated by the myorg/myfirstpipeline
- analysis pipeline.
-report_section_order:
- "myorg-myfirstpipeline-methods-description":
- order: -1000
- software_versions:
- order: -1001
- "myorg-myfirstpipeline-summary":
- order: -1002
-
-export_plots: true
-
-disable_version_detection: true
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/assets/samplesheet.csv b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/assets/samplesheet.csv
deleted file mode 100644
index 5f653ab7b..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/assets/samplesheet.csv
+++ /dev/null
@@ -1,3 +0,0 @@
-sample,fastq_1,fastq_2
-SAMPLE_PAIRED_END,/path/to/fastq/files/AEG588A1_S1_L002_R1_001.fastq.gz,/path/to/fastq/files/AEG588A1_S1_L002_R2_001.fastq.gz
-SAMPLE_SINGLE_END,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq.gz,
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/assets/schema_input.json b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/assets/schema_input.json
deleted file mode 100644
index ab54904b1..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/assets/schema_input.json
+++ /dev/null
@@ -1,38 +0,0 @@
-{
- "$schema": "https://json-schema.org/draft/2020-12/schema",
- "$id": "https://raw.githubusercontent.com/myorg/myfirstpipeline/master/assets/schema_input.json",
- "title": "myorg/myfirstpipeline pipeline - params.input schema",
- "description": "Schema for the file provided with params.input",
- "type": "array",
- "items": {
- "type": "object",
- "properties": {
- "sample": {
- "type": "string",
- "pattern": "^\\S+$",
- "errorMessage": "Sample name must be provided and cannot contain spaces",
- "meta": ["id"]
- },
- "sequencer": {
- "type": "string",
- "pattern": "^\\S+$",
- "meta": ["sequencer"]
- },
- "fastq_1": {
- "type": "string",
- "format": "file-path",
- "exists": true,
- "pattern": "^\\S+\\.f(ast)?q\\.gz$",
- "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
- },
- "fastq_2": {
- "type": "string",
- "format": "file-path",
- "exists": true,
- "pattern": "^\\S+\\.f(ast)?q\\.gz$",
- "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
- }
- },
- "required": ["sample", "fastq_1"]
- }
-}
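Worth noting about the schema just removed: only `sample` and `fastq_1` are required, and the `meta` mappings mean validated values for `sample` and `sequencer` end up in each sample's meta map. A hypothetical samplesheet exercising the optional `sequencer` column might look like this (paths are illustrative):

```csv title="samplesheet.csv"
sample,sequencer,fastq_1,fastq_2
SAMPLE_A,novaseq,/data/sampleA_R1.fastq.gz,/data/sampleA_R2.fastq.gz
SAMPLE_B,miseq,/data/sampleB_R1.fastq.gz,
```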
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/assets/sendmail_template.txt b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/assets/sendmail_template.txt
deleted file mode 100644
index 6fd27c517..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/assets/sendmail_template.txt
+++ /dev/null
@@ -1,53 +0,0 @@
-To: $email
-Subject: $subject
-Mime-Version: 1.0
-Content-Type: multipart/related;boundary="nfcoremimeboundary"
-
---nfcoremimeboundary
-Content-Type: text/html; charset=utf-8
-
-$email_html
-
---nfcoremimeboundary
-Content-Type: image/png;name="myorg-myfirstpipeline_logo.png"
-Content-Transfer-Encoding: base64
-Content-ID: <nfcorepipelinelogo>
-Content-Disposition: inline; filename="myorg-myfirstpipeline_logo_light.png"
-
-<% out << new File("$projectDir/assets/myorg-myfirstpipeline_logo_light.png").
- bytes.
- encodeBase64().
- toString().
- tokenize( '\n' )*.
- toList()*.
- collate( 76 )*.
- collect { it.join() }.
- flatten().
- join( '\n' ) %>
-
-<%
-if (mqcFile){
-def mqcFileObj = new File("$mqcFile")
-if (mqcFileObj.length() < mqcMaxSize){
-out << """
---nfcoremimeboundary
-Content-Type: text/html; name=\"multiqc_report\"
-Content-Transfer-Encoding: base64
-Content-ID: <mqcreport>
-Content-Disposition: attachment; filename=\"${mqcFileObj.getName()}\"
-
-${mqcFileObj.
- bytes.
- encodeBase64().
- toString().
- tokenize( '\n' )*.
- toList()*.
- collate( 76 )*.
- collect { it.join() }.
- flatten().
- join( '\n' )}
-"""
-}}
-%>
-
---nfcoremimeboundary--
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/conf/base.config b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/conf/base.config
deleted file mode 100644
index 34cad808f..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/conf/base.config
+++ /dev/null
@@ -1,62 +0,0 @@
-/*
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- myorg/myfirstpipeline Nextflow base config file
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- A 'blank slate' config file, appropriate for general use on most high performance
- compute environments. Assumes that all software is installed and available on
- the PATH. Runs in `local` mode - all jobs will be run on the logged in environment.
-----------------------------------------------------------------------------------------
-*/
-
-process {
-
- // TODO nf-core: Check the defaults for all processes
- cpus = { 1 * task.attempt }
- memory = { 6.GB * task.attempt }
- time = { 4.h * task.attempt }
-
- errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : 'finish' }
- maxRetries = 1
- maxErrors = '-1'
-
- // Process-specific resource requirements
- // NOTE - Please try and re-use the labels below as much as possible.
- // These labels are used and recognised by default in DSL2 files hosted on nf-core/modules.
- // If possible, it would be nice to keep the same label naming convention when
- // adding in your local modules too.
- // TODO nf-core: Customise requirements for specific processes.
- // See https://www.nextflow.io/docs/latest/config.html#config-process-selectors
- withLabel:process_single {
- cpus = { 1 }
- memory = { 6.GB * task.attempt }
- time = { 4.h * task.attempt }
- }
- withLabel:process_low {
- cpus = { 2 * task.attempt }
- memory = { 12.GB * task.attempt }
- time = { 4.h * task.attempt }
- }
- withLabel:process_medium {
- cpus = { 6 * task.attempt }
- memory = { 36.GB * task.attempt }
- time = { 8.h * task.attempt }
- }
- withLabel:process_high {
- cpus = { 12 * task.attempt }
- memory = { 72.GB * task.attempt }
- time = { 16.h * task.attempt }
- }
- withLabel:process_long {
- time = { 20.h * task.attempt }
- }
- withLabel:process_high_memory {
- memory = { 200.GB * task.attempt }
- }
- withLabel:error_ignore {
- errorStrategy = 'ignore'
- }
- withLabel:error_retry {
- errorStrategy = 'retry'
- maxRetries = 2
- }
-}
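For context, the labels in `base.config` only take effect when a process opts in. A minimal sketch of a hypothetical local module using one of the labels above, so that it inherits 2 CPUs / 12 GB / 4 h, scaled by `task.attempt` on retry:

```groovy
// Hypothetical module -- illustrates how base.config labels attach to a process.
process COUNT_READS {
    tag "$meta.id"
    label 'process_low'   // resources come from base.config's withLabel:process_low

    input:
    tuple val(meta), path(reads)

    output:
    tuple val(meta), path("*.count"), emit: count

    script:
    """
    zcat $reads | wc -l | awk '{print \$1/4}' > ${meta.id}.count
    """
}
```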
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/conf/modules.config b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/conf/modules.config
deleted file mode 100644
index 1da0fcec6..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/conf/modules.config
+++ /dev/null
@@ -1,33 +0,0 @@
-/*
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Config file for defining DSL2 per module options and publishing paths
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Available keys to override module options:
- ext.args = Additional arguments appended to command in module.
- ext.args2 = Second set of arguments appended to command in module (multi-tool modules).
- ext.args3 = Third set of arguments appended to command in module (multi-tool modules).
- ext.prefix = File name prefix for output files.
-----------------------------------------------------------------------------------------
-*/
-
-process {
-
- publishDir = [
- path: { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
- mode: params.publish_dir_mode,
- saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
- ]
-
- withName: 'SEQTK_TRIM' {
- ext.args = "-b 5"
- }
- withName: 'MULTIQC' {
- ext.args = { params.multiqc_title ? "--title \"$params.multiqc_title\"" : '' }
- publishDir = [
- path: { "${params.outdir}/multiqc" },
- mode: params.publish_dir_mode,
- saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
- ]
- }
-
-}
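Since `ext.args` is ordinary configuration, the trimming arguments above can be overridden at launch time without editing the pipeline, for example with a hypothetical `custom.config` supplied via `-c`:

```groovy
// custom.config (hypothetical) -- apply with: nextflow run ... -c custom.config
process {
    withName: 'SEQTK_TRIM' {
        ext.args = '-b 10 -e 5'   // seqtk trimfq: trim 10 bp at the start, 5 bp at the end
    }
}
```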
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/conf/test.config b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/conf/test.config
deleted file mode 100644
index 474ef3196..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/conf/test.config
+++ /dev/null
@@ -1,31 +0,0 @@
-/*
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Nextflow config file for running minimal tests
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Defines input files and everything required to run a fast and simple pipeline test.
-
- Use as follows:
-        nextflow run myorg/myfirstpipeline -profile test,<docker/singularity> --outdir <OUTDIR>
-
-----------------------------------------------------------------------------------------
-*/
-
-process {
- resourceLimits = [
- cpus: 4,
- memory: '15.GB',
- time: '1.h'
- ]
-}
-
-params {
- config_profile_name = 'Test profile'
- config_profile_description = 'Minimal test dataset to check pipeline function'
-
- // Input data
- // TODO nf-core: Specify the paths to your test data on nf-core/test-datasets
- // TODO nf-core: Give any required params for the test so that command line flags are not needed
- input = params.pipelines_testdata_base_path + 'viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv'
-
-
-}
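With this profile in place, the whole pipeline can be smoke-tested with a single command; the container profile and output directory below are just one possible choice:

```bash
nextflow run myorg/myfirstpipeline -profile test,docker --outdir results
```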
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/conf/test_full.config b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/conf/test_full.config
deleted file mode 100644
index 9b520728a..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/conf/test_full.config
+++ /dev/null
@@ -1,24 +0,0 @@
-/*
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Nextflow config file for running full-size tests
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Defines input files and everything required to run a full size pipeline test.
-
- Use as follows:
-        nextflow run myorg/myfirstpipeline -profile test_full,<docker/singularity> --outdir <OUTDIR>
-
-----------------------------------------------------------------------------------------
-*/
-
-params {
- config_profile_name = 'Full test profile'
- config_profile_description = 'Full test dataset to check pipeline function'
-
- // Input data for full size test
- // TODO nf-core: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
- // TODO nf-core: Give any required params for the test so that command line flags are not needed
- input = params.pipelines_testdata_base_path + 'viralrecon/samplesheet/samplesheet_full_illumina_amplicon.csv'
-
- // Fasta references
- fasta = params.pipelines_testdata_base_path + 'viralrecon/genome/NC_045512.2/GCF_009858895.2_ASM985889v3_genomic.200409.fna.gz'
-}
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/docs/README.md b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/docs/README.md
deleted file mode 100644
index 18062fa6b..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/docs/README.md
+++ /dev/null
@@ -1,8 +0,0 @@
-# myorg/myfirstpipeline: Documentation
-
-The myorg/myfirstpipeline documentation is split into the following pages:
-
-- [Usage](usage.md)
- - An overview of how the pipeline works, how to run it and a description of all of the different command-line flags.
-- [Output](output.md)
- - An overview of the different results produced by the pipeline and how to interpret them.
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/docs/output.md b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/docs/output.md
deleted file mode 100644
index 593ee5685..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/docs/output.md
+++ /dev/null
@@ -1,47 +0,0 @@
-# myorg/myfirstpipeline: Output
-
-## Introduction
-
-This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline.
-
-The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
-
-
-
-## Pipeline overview
-
-The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:
-
-- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
-- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
-
-### MultiQC
-
-
-<details markdown="1">
-<summary>Output files</summary>
-
-- `multiqc/`
- - `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
- - `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
- - `multiqc_plots/`: directory containing static images from the report in various formats.
-
-</details>
-
-[MultiQC](http://multiqc.info) is a visualization tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory.
-
-Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see <http://multiqc.info>.
-
-### Pipeline information
-
-
-<details markdown="1">
-<summary>Output files</summary>
-
-- `pipeline_info/`
- - Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`.
-  - Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.yml`. The `pipeline_report*` files will only be present if the `--email` / `--email_on_fail` parameters are used when running the pipeline.
- - Reformatted samplesheet files used as input to the pipeline: `samplesheet.valid.csv`.
- - Parameters used by the pipeline run: `params.json`.
-
-</details>
-
-[Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/docs/usage.md b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/docs/usage.md
deleted file mode 100644
index 40429377b..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/docs/usage.md
+++ /dev/null
@@ -1,216 +0,0 @@
-# myorg/myfirstpipeline: Usage
-
-> _Documentation of pipeline parameters is generated automatically from the pipeline schema and can no longer be found in markdown files._
-
-## Introduction
-
-
-
-## Samplesheet input
-
-You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below.
-
-```bash
---input '[path to samplesheet file]'
-```
-
-### Multiple runs of the same sample
-
-The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is an example for the same sample sequenced across 3 lanes:
-
-```csv title="samplesheet.csv"
-sample,fastq_1,fastq_2
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
-CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz
-CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz
-```
-
-### Full samplesheet
-
-The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet can have as many columns as you desire; however, there is a strict requirement for the first 3 columns to match those defined in the table below.
-
-A final samplesheet file consisting of both single- and paired-end data may look something like the one below. This is for 6 samples, where `TREATMENT_REP3` has been sequenced twice.
-
-```csv title="samplesheet.csv"
-sample,fastq_1,fastq_2
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
-CONTROL_REP2,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz
-CONTROL_REP3,AEG588A3_S3_L002_R1_001.fastq.gz,AEG588A3_S3_L002_R2_001.fastq.gz
-TREATMENT_REP1,AEG588A4_S4_L003_R1_001.fastq.gz,
-TREATMENT_REP2,AEG588A5_S5_L003_R1_001.fastq.gz,
-TREATMENT_REP3,AEG588A6_S6_L003_R1_001.fastq.gz,
-TREATMENT_REP3,AEG588A6_S6_L004_R1_001.fastq.gz,
-```
-
-| Column | Description |
-| --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `sample` | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). |
-| `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
-| `fastq_2` | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
-
-An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline.
-
-## Running the pipeline
-
-The typical command for running the pipeline is as follows:
-
-```bash
-nextflow run myorg/myfirstpipeline --input ./samplesheet.csv --outdir ./results --genome GRCh37 -profile docker
-```
-
-This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.
-
-Note that the pipeline will create the following files in your working directory:
-
-```bash
-work            # Directory containing the Nextflow working files
-<OUTDIR>        # Finished results in specified location (defined with --outdir)
-.nextflow.log   # Log file from Nextflow
-# Other Nextflow hidden files, e.g. history of pipeline runs and old logs.
-```
-
-If you wish to repeatedly use the same parameters for multiple runs, rather than specifying each flag in the command, you can specify these in a params file.
-
-Pipeline settings can be provided in a `yaml` or `json` file via `-params-file <file>`.
-
-:::warning
-Do not use `-c <file>` to specify parameters as this will result in errors. Custom config files specified with `-c` must only be used for [tuning process resource specifications](https://nf-co.re/docs/usage/configuration#tuning-workflow-resources), other infrastructural tweaks (such as output directories), or module arguments (args).
-:::
-
-The above pipeline run specified with a params file in yaml format:
-
-```bash
-nextflow run myorg/myfirstpipeline -profile docker -params-file params.yaml
-```
-
-with:
-
-```yaml title="params.yaml"
-input: './samplesheet.csv'
-outdir: './results/'
-genome: 'GRCh37'
-<...>
-```
-
-You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-co.re/launch).
-
-### Updating the pipeline
-
-When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, regularly update the cached version:
-
-```bash
-nextflow pull myorg/myfirstpipeline
-```
-
-### Reproducibility
-
-It is a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since.
-
-First, go to the [myorg/myfirstpipeline releases page](https://github.com/myorg/myfirstpipeline/releases) and find the latest pipeline version - numeric only (e.g. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - e.g. `-r 1.3.1`. Of course, you can switch to another version by changing the number after the `-r` flag.
-
-This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future. For example, at the bottom of the MultiQC reports.
-
-To further assist in reproducibility, you can share and re-use [parameter files](#running-the-pipeline) to repeat pipeline runs with the same settings without having to write out a command with every single parameter.
-
-:::tip
-If you wish to share such a profile (such as uploading it as supplementary material for an academic publication), make sure to NOT include cluster-specific paths to files, nor institution-specific profiles.
-:::
-
-## Core Nextflow arguments
-
-:::note
-These options are part of Nextflow and use a _single_ hyphen (pipeline parameters use a double-hyphen).
-:::
-
-### `-profile`
-
-Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments.
-
-Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Shifter, Charliecloud, Apptainer, Conda) - see below.
-
-:::info
-We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility; however, when this is not possible, Conda is also supported.
-:::
-
-The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation).
-
-Note that multiple profiles can be loaded, for example: `-profile test,docker` - the order of arguments is important!
-They are loaded in sequence, so later profiles can overwrite earlier profiles.
-
-If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended, since it can lead to different results on different machines dependent on the computer environment.
-
-- `test`
- - A profile with a complete configuration for automated testing
- - Includes links to test data so needs no other parameters
-- `docker`
- - A generic configuration profile to be used with [Docker](https://docker.com/)
-- `singularity`
- - A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/)
-- `podman`
- - A generic configuration profile to be used with [Podman](https://podman.io/)
-- `shifter`
- - A generic configuration profile to be used with [Shifter](https://nersc.gitlab.io/development/shifter/how-to-use/)
-- `charliecloud`
- - A generic configuration profile to be used with [Charliecloud](https://hpc.github.io/charliecloud/)
-- `apptainer`
- - A generic configuration profile to be used with [Apptainer](https://apptainer.org/)
-- `wave`
-  - A generic configuration profile to enable [Wave](https://seqera.io/wave/) containers. Use together with one of the above (requires Nextflow `24.03.0-edge` or later).
-- `conda`
- - A generic configuration profile to be used with [Conda](https://conda.io/docs/). Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter, Charliecloud, or Apptainer.
-
-### `-resume`
-
-Specify this when restarting a pipeline. Nextflow will use cached results from any pipeline steps where the inputs are the same, continuing from where it got to previously. For input to be considered the same, not only must the names be identical but also the files' contents. For more info about this parameter, see [this blog post](https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html).
-
-You can also supply a run name to resume a specific run: `-resume [run-name]`. Use the `nextflow log` command to show previous run names.
-
-### `-c`
-
-Specify the path to a specific config file (this is a core Nextflow command). See the [nf-core website documentation](https://nf-co.re/usage/configuration) for more information.
-
-## Custom configuration
-
-### Resource requests
-
-Whilst the default requirements set within the pipeline will hopefully work for most people and with most input data, you may find that you want to customise the compute resources that the pipeline requests. Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with any of the error codes specified [here](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L18) it will automatically be resubmitted with higher requests (2 x original, then 3 x original). If it still fails after the third attempt then the pipeline execution is stopped.
-
-To change the resource requests, please see the [max resources](https://nf-co.re/docs/usage/configuration#max-resources) and [tuning workflow resources](https://nf-co.re/docs/usage/configuration#tuning-workflow-resources) section of the nf-core website.
-
-### Custom Containers
-
-In some cases you may wish to change which container or conda environment a step of the pipeline uses for a particular tool. By default nf-core pipelines use containers and software from the [biocontainers](https://biocontainers.pro/) or [bioconda](https://bioconda.github.io/) projects. However, in some cases the version specified by the pipeline may be out of date.
-
-To use a different container from the default container or conda environment specified in a pipeline, please see the [updating tool versions](https://nf-co.re/docs/usage/configuration#updating-tool-versions) section of the nf-core website.
-
-### Custom Tool Arguments
-
-A pipeline might not always support every possible argument or option of a particular tool used in the pipeline. Fortunately, nf-core pipelines provide some freedom for users to insert additional parameters that the pipeline does not include by default.
-
-To learn how to provide additional arguments to a particular tool of the pipeline, please see the [customising tool arguments](https://nf-co.re/docs/usage/configuration#customising-tool-arguments) section of the nf-core website.
-
-### nf-core/configs
-
-In most cases, you will only need to create a custom config as a one-off, but if you and others within your organisation are likely to be running nf-core pipelines regularly and need to use the same settings, it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this, please test that the config file works with your pipeline of choice using the `-c` parameter. You can then create a pull request to the `nf-core/configs` repository with the addition of your config file, associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile.
-
-See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for more information about creating your own configuration files.
-
-If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack) on the [`#configs` channel](https://nfcore.slack.com/channels/configs).
-
-## Running in the background
-
-Nextflow handles job submissions and supervises the running jobs. The Nextflow process must run until the pipeline is finished.
-
-The Nextflow `-bg` flag launches Nextflow in the background, detached from your terminal so that the workflow does not stop if you log out of your session. The logs are saved to a file.
-
-Alternatively, you can use `screen` / `tmux` or a similar tool to create a detached session which you can log back into at a later time.
-Some HPC setups also allow you to run Nextflow within a cluster job submitted to your job scheduler (from where it submits more jobs).
-
-## Nextflow memory requirements
-
-In some cases, the Nextflow Java virtual machines can start to request a large amount of memory.
-We recommend adding the following line to your environment to limit this (typically in `~/.bashrc` or `~/.bash_profile`):
-
-```bash
-export NXF_OPTS='-Xms1g -Xmx4g'
-```
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/main.nf b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/main.nf
deleted file mode 100644
index de383a8bd..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/main.nf
+++ /dev/null
@@ -1,89 +0,0 @@
-#!/usr/bin/env nextflow
-/*
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- myorg/myfirstpipeline
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Github : https://github.com/myorg/myfirstpipeline
-----------------------------------------------------------------------------------------
-*/
-
-/*
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- IMPORT FUNCTIONS / MODULES / SUBWORKFLOWS / WORKFLOWS
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-*/
-
-include { MYFIRSTPIPELINE } from './workflows/myfirstpipeline'
-include { PIPELINE_INITIALISATION } from './subworkflows/local/utils_nfcore_myfirstpipeline_pipeline'
-include { PIPELINE_COMPLETION } from './subworkflows/local/utils_nfcore_myfirstpipeline_pipeline'
-/*
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- NAMED WORKFLOWS FOR PIPELINE
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-*/
-
-//
-// WORKFLOW: Run main analysis pipeline depending on type of input
-//
-workflow MYORG_MYFIRSTPIPELINE {
-
- take:
- samplesheet // channel: samplesheet read in from --input
-
- main:
-
- //
- // WORKFLOW: Run pipeline
- //
- MYFIRSTPIPELINE (
- samplesheet
- )
- emit:
- multiqc_report = MYFIRSTPIPELINE.out.multiqc_report // channel: /path/to/multiqc_report.html
-}
-/*
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- RUN MAIN WORKFLOW
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-*/
-
-workflow {
-
- main:
- //
- // SUBWORKFLOW: Run initialisation tasks
- //
- PIPELINE_INITIALISATION (
- params.version,
- params.validate_params,
- params.monochrome_logs,
- args,
- params.outdir,
- params.input
- )
-
- //
- // WORKFLOW: Run main workflow
- //
- MYORG_MYFIRSTPIPELINE (
- PIPELINE_INITIALISATION.out.samplesheet
- )
- //
- // SUBWORKFLOW: Run completion tasks
- //
- PIPELINE_COMPLETION (
- params.email,
- params.email_on_fail,
- params.plaintext_email,
- params.outdir,
-        params.monochrome_logs,
-        MYORG_MYFIRSTPIPELINE.out.multiqc_report
- )
-}
-
-/*
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- THE END
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-*/
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules.json b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules.json
deleted file mode 100644
index e37067859..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules.json
+++ /dev/null
@@ -1,41 +0,0 @@
-{
- "name": "myorg/myfirstpipeline",
- "homePage": "https://github.com/myorg/myfirstpipeline",
- "repos": {
- "https://github.com/nf-core/modules.git": {
- "modules": {
- "nf-core": {
- "multiqc": {
- "branch": "master",
- "git_sha": "cf17ca47590cc578dfb47db1c2a44ef86f89976d",
- "installed_by": ["modules"]
- },
- "seqtk/trim": {
- "branch": "master",
- "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
- "installed_by": ["modules"]
- }
- }
- },
- "subworkflows": {
- "nf-core": {
- "utils_nextflow_pipeline": {
- "branch": "master",
- "git_sha": "3aa0aec1d52d492fe241919f0c6100ebf0074082",
- "installed_by": ["subworkflows"]
- },
- "utils_nfcore_pipeline": {
- "branch": "master",
- "git_sha": "1b6b9a3338d011367137808b49b923515080e3ba",
- "installed_by": ["subworkflows"]
- },
- "utils_nfschema_plugin": {
- "branch": "master",
- "git_sha": "bbd5a41f4535a8defafe6080e00ea74c45f4f96c",
- "installed_by": ["subworkflows"]
- }
- }
- }
- }
- }
-}
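`modules.json` is maintained by nf-core/tools rather than edited by hand; entries like the two modules above are written by the module commands, for example:

```bash
nf-core modules install seqtk/trim   # adds the module and records its git_sha in modules.json
nf-core modules update multiqc       # bumps the recorded git_sha to the latest commit
```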
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/local/fastqe.nf b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/local/fastqe.nf
deleted file mode 100644
index 827843d98..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/local/fastqe.nf
+++ /dev/null
@@ -1,35 +0,0 @@
-process FASTQE {
- tag "$meta.id"
- label 'process_single'
-
- conda "${moduleDir}/environment.yml"
- container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
- 'https://depot.galaxyproject.org/singularity/fastqe:0.3.3--pyhdfd78af_0':
- 'biocontainers/fastqe:0.3.3--pyhdfd78af_0' }"
-
- input:
- tuple val(meta), path(reads)
-
- output:
- tuple val(meta), path("*.tsv"), emit: tsv
- path "versions.yml" , emit: versions
-
- when:
- task.ext.when == null || task.ext.when
-
- script:
- def args = task.ext.args ?: ''
- def prefix = task.ext.prefix ?: "${meta.id}"
- def VERSION = '0.3.3'
- """
- fastqe \\
- $args \\
- $reads \\
- --output ${prefix}.tsv
-
- cat <<-END_VERSIONS > versions.yml
- "${task.process}":
- fastqe: $VERSION
- END_VERSIONS
- """
-}
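A local module like this is included in the workflow the same way as an nf-core module; assuming the standard template layout, the wiring in `workflows/myfirstpipeline.nf` would look something like this (channel name hypothetical):

```groovy
// Hypothetical wiring inside workflows/myfirstpipeline.nf
include { FASTQE } from '../modules/local/fastqe'

// ... then, in the workflow body, run it on the per-sample reads channel:
FASTQE(ch_samplesheet)
```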
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/multiqc/environment.yml b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/multiqc/environment.yml
deleted file mode 100644
index 6f5b867b7..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/multiqc/environment.yml
+++ /dev/null
@@ -1,5 +0,0 @@
-channels:
- - conda-forge
- - bioconda
-dependencies:
- - bioconda::multiqc=1.25.1
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/multiqc/main.nf b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/multiqc/main.nf
deleted file mode 100644
index cc0643e1d..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/multiqc/main.nf
+++ /dev/null
@@ -1,63 +0,0 @@
-process MULTIQC {
- label 'process_single'
-
- conda "${moduleDir}/environment.yml"
- container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
- 'https://depot.galaxyproject.org/singularity/multiqc:1.25.1--pyhdfd78af_0' :
- 'biocontainers/multiqc:1.25.1--pyhdfd78af_0' }"
-
- input:
- path multiqc_files, stageAs: "?/*"
- path(multiqc_config)
- path(extra_multiqc_config)
- path(multiqc_logo)
- path(replace_names)
- path(sample_names)
-
- output:
- path "*multiqc_report.html", emit: report
- path "*_data" , emit: data
- path "*_plots" , optional:true, emit: plots
- path "versions.yml" , emit: versions
-
- when:
- task.ext.when == null || task.ext.when
-
- script:
- def args = task.ext.args ?: ''
- def prefix = task.ext.prefix ? "--filename ${task.ext.prefix}.html" : ''
- def config = multiqc_config ? "--config $multiqc_config" : ''
- def extra_config = extra_multiqc_config ? "--config $extra_multiqc_config" : ''
- def logo = multiqc_logo ? "--cl-config 'custom_logo: \"${multiqc_logo}\"'" : ''
- def replace = replace_names ? "--replace-names ${replace_names}" : ''
- def samples = sample_names ? "--sample-names ${sample_names}" : ''
- """
- multiqc \\
- --force \\
- $args \\
- $config \\
- $prefix \\
- $extra_config \\
- $logo \\
- $replace \\
- $samples \\
- .
-
- cat <<-END_VERSIONS > versions.yml
- "${task.process}":
- multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" )
- END_VERSIONS
- """
-
- stub:
- """
- mkdir multiqc_data
- mkdir multiqc_plots
- touch multiqc_report.html
-
- cat <<-END_VERSIONS > versions.yml
- "${task.process}":
- multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" )
- END_VERSIONS
- """
-}
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/multiqc/meta.yml b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/multiqc/meta.yml
deleted file mode 100644
index b16c18792..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/multiqc/meta.yml
+++ /dev/null
@@ -1,78 +0,0 @@
-name: multiqc
-description: Aggregate results from bioinformatics analyses across many samples into
- a single report
-keywords:
- - QC
- - bioinformatics tools
- - Beautiful stand-alone HTML report
-tools:
- - multiqc:
- description: |
-      MultiQC searches a given directory for analysis logs and compiles an HTML report.
- It's a general use tool, perfect for summarising the output from numerous bioinformatics tools.
- homepage: https://multiqc.info/
- documentation: https://multiqc.info/docs/
- licence: ["GPL-3.0-or-later"]
- identifier: biotools:multiqc
-input:
- - - multiqc_files:
- type: file
- description: |
- List of reports / files recognised by MultiQC, for example the html and zip output of FastQC
- - - multiqc_config:
- type: file
- description: Optional config yml for MultiQC
- pattern: "*.{yml,yaml}"
- - - extra_multiqc_config:
- type: file
- description: Second optional config yml for MultiQC. Will override common sections
- in multiqc_config.
- pattern: "*.{yml,yaml}"
- - - multiqc_logo:
- type: file
- description: Optional logo file for MultiQC
- pattern: "*.{png}"
- - - replace_names:
- type: file
- description: |
- Optional two-column sample renaming file. First column a set of
- patterns, second column a set of corresponding replacements. Passed via
- MultiQC's `--replace-names` option.
- pattern: "*.{tsv}"
- - - sample_names:
- type: file
- description: |
-          Optional TSV file with headers, passed to the MultiQC --sample-names
- argument.
- pattern: "*.{tsv}"
-output:
- - report:
- - "*multiqc_report.html":
- type: file
- description: MultiQC report file
- pattern: "multiqc_report.html"
- - data:
- - "*_data":
- type: directory
- description: MultiQC data dir
- pattern: "multiqc_data"
- - plots:
- - "*_plots":
- type: file
- description: Plots created by MultiQC
-          pattern: "*_plots"
- - versions:
- - versions.yml:
- type: file
- description: File containing software versions
- pattern: "versions.yml"
-authors:
- - "@abhi18av"
- - "@bunop"
- - "@drpatelh"
- - "@jfy133"
-maintainers:
- - "@abhi18av"
- - "@bunop"
- - "@drpatelh"
- - "@jfy133"
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/multiqc/tests/main.nf.test b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/multiqc/tests/main.nf.test
deleted file mode 100644
index 33316a7dd..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/multiqc/tests/main.nf.test
+++ /dev/null
@@ -1,92 +0,0 @@
-nextflow_process {
-
- name "Test Process MULTIQC"
- script "../main.nf"
- process "MULTIQC"
-
- tag "modules"
- tag "modules_nfcore"
- tag "multiqc"
-
- config "./nextflow.config"
-
- test("sarscov2 single-end [fastqc]") {
-
- when {
- process {
- """
- input[0] = Channel.of(file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastqc/test_fastqc.zip', checkIfExists: true))
- input[1] = []
- input[2] = []
- input[3] = []
- input[4] = []
- input[5] = []
- """
- }
- }
-
- then {
- assertAll(
- { assert process.success },
- { assert process.out.report[0] ==~ ".*/multiqc_report.html" },
- { assert process.out.data[0] ==~ ".*/multiqc_data" },
- { assert snapshot(process.out.versions).match("multiqc_versions_single") }
- )
- }
-
- }
-
- test("sarscov2 single-end [fastqc] [config]") {
-
- when {
- process {
- """
- input[0] = Channel.of(file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastqc/test_fastqc.zip', checkIfExists: true))
- input[1] = Channel.of(file("https://github.com/nf-core/tools/raw/dev/nf_core/pipeline-template/assets/multiqc_config.yml", checkIfExists: true))
- input[2] = []
- input[3] = []
- input[4] = []
- input[5] = []
- """
- }
- }
-
- then {
- assertAll(
- { assert process.success },
- { assert process.out.report[0] ==~ ".*/multiqc_report.html" },
- { assert process.out.data[0] ==~ ".*/multiqc_data" },
- { assert snapshot(process.out.versions).match("multiqc_versions_config") }
- )
- }
- }
-
- test("sarscov2 single-end [fastqc] - stub") {
-
- options "-stub"
-
- when {
- process {
- """
- input[0] = Channel.of(file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastqc/test_fastqc.zip', checkIfExists: true))
- input[1] = []
- input[2] = []
- input[3] = []
- input[4] = []
- input[5] = []
- """
- }
- }
-
- then {
- assertAll(
- { assert process.success },
- { assert snapshot(process.out.report.collect { file(it).getName() } +
- process.out.data.collect { file(it).getName() } +
- process.out.plots.collect { file(it).getName() } +
- process.out.versions ).match("multiqc_stub") }
- )
- }
-
- }
-}
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/multiqc/tests/main.nf.test.snap b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/multiqc/tests/main.nf.test.snap
deleted file mode 100644
index 261dc0fac..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/multiqc/tests/main.nf.test.snap
+++ /dev/null
@@ -1,41 +0,0 @@
-{
- "multiqc_versions_single": {
- "content": [
- [
- "versions.yml:md5,41f391dcedce7f93ca188f3a3ffa0916"
- ]
- ],
- "meta": {
- "nf-test": "0.9.0",
- "nextflow": "24.04.4"
- },
- "timestamp": "2024-10-02T17:51:46.317523"
- },
- "multiqc_stub": {
- "content": [
- [
- "multiqc_report.html",
- "multiqc_data",
- "multiqc_plots",
- "versions.yml:md5,41f391dcedce7f93ca188f3a3ffa0916"
- ]
- ],
- "meta": {
- "nf-test": "0.9.0",
- "nextflow": "24.04.4"
- },
- "timestamp": "2024-10-02T17:52:20.680978"
- },
- "multiqc_versions_config": {
- "content": [
- [
- "versions.yml:md5,41f391dcedce7f93ca188f3a3ffa0916"
- ]
- ],
- "meta": {
- "nf-test": "0.9.0",
- "nextflow": "24.04.4"
- },
- "timestamp": "2024-10-02T17:52:09.185842"
- }
-}
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/multiqc/tests/nextflow.config b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/multiqc/tests/nextflow.config
deleted file mode 100644
index c537a6a3e..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/multiqc/tests/nextflow.config
+++ /dev/null
@@ -1,5 +0,0 @@
-process {
- withName: 'MULTIQC' {
- ext.prefix = null
- }
-}
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/multiqc/tests/tags.yml b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/multiqc/tests/tags.yml
deleted file mode 100644
index bea6c0d37..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/multiqc/tests/tags.yml
+++ /dev/null
@@ -1,2 +0,0 @@
-multiqc:
- - modules/nf-core/multiqc/**
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/seqtk/trim/environment.yml b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/seqtk/trim/environment.yml
deleted file mode 100644
index 693aa5c17..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/seqtk/trim/environment.yml
+++ /dev/null
@@ -1,5 +0,0 @@
-channels:
- - conda-forge
- - bioconda
-dependencies:
- - bioconda::seqtk=1.4
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/seqtk/trim/main.nf b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/seqtk/trim/main.nf
deleted file mode 100644
index 0f7e4d7f8..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/seqtk/trim/main.nf
+++ /dev/null
@@ -1,38 +0,0 @@
-process SEQTK_TRIM {
- tag "$meta.id"
- label 'process_low'
-
- conda "${moduleDir}/environment.yml"
- container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
- 'https://depot.galaxyproject.org/singularity/seqtk:1.4--he4a0461_1' :
- 'biocontainers/seqtk:1.4--he4a0461_1' }"
-
- input:
- tuple val(meta), path(reads)
-
- output:
- tuple val(meta), path("*.fastq.gz"), emit: reads
- path "versions.yml" , emit: versions
-
- when:
- task.ext.when == null || task.ext.when
-
- script:
- def args = task.ext.args ?: ''
- def prefix = task.ext.prefix ?: "${meta.id}"
- """
- printf "%s\\n" $reads | while read f;
- do
- seqtk \\
- trimfq \\
- $args \\
- \$f \\
- | gzip --no-name > ${prefix}_\$(basename \$f)
- done
-
- cat <<-END_VERSIONS > versions.yml
- "${task.process}":
- seqtk: \$(echo \$(seqtk 2>&1) | sed 's/^.*Version: //; s/ .*\$//')
- END_VERSIONS
- """
-}
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/seqtk/trim/meta.yml b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/seqtk/trim/meta.yml
deleted file mode 100644
index 3a0198ac0..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/seqtk/trim/meta.yml
+++ /dev/null
@@ -1,45 +0,0 @@
-# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/yaml-schema.json
-name: seqtk_trim
-description: Trim low quality bases from FastQ files
-keywords:
- - trimfq
- - fastq
- - seqtk
-tools:
- - "seqtk":
- description: "Seqtk is a fast and lightweight tool for processing sequences in
- the FASTA or FASTQ format"
- homepage: https://github.com/lh3/seqtk
- documentation: https://docs.csc.fi/apps/seqtk/
- tool_dev_url: https://github.com/lh3/seqtk
- licence: ["MIT"]
- identifier: biotools:seqtk
-
-input:
- - - meta:
- type: map
- description: |
- Groovy Map containing sample information
- e.g. [ id:'test', single_end:false ]
- - reads:
- type: file
- description: List of input FastQ files
- pattern: "*.{fastq.gz}"
-output:
- - reads:
- - meta:
- type: map
- description: |
- Groovy Map containing sample information
- e.g. [ id:'test', single_end:false ]
- - "*.fastq.gz":
- type: file
- description: Filtered FastQ files
- pattern: "*.{fastq.gz}"
- - versions:
- - versions.yml:
- type: file
- description: File containing software versions
- pattern: "versions.yml"
-authors:
- - "@laramiellindsey"
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/seqtk/trim/tests/main.nf.test b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/seqtk/trim/tests/main.nf.test
deleted file mode 100644
index be53186d8..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/seqtk/trim/tests/main.nf.test
+++ /dev/null
@@ -1,65 +0,0 @@
-nextflow_process {
-
- name "Test Process SEQTK_TRIM"
- script "modules/nf-core/seqtk/trim/main.nf"
- process "SEQTK_TRIM"
-
- tag "modules"
- tag "modules_nfcore"
- tag "seqtk"
- tag "seqtk/trim"
-
- test("Single-end") {
-
- when {
- params {
- outdir = $outputDir
- }
- process {
- """
- input[0] = [
- [ id:'test', single_end:true ], // meta map
- file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true)
- ]
- """
- }
- }
-
- then {
- assertAll (
- { assert process.success },
- { assert snapshot(process.out).match()}
- )
- }
-
- }
-
-test("Paired-end") {
-
- when {
- params {
- outdir = $outputDir
- }
- process {
- """
- input[0] = [
- [ id:'test', single_end:false ], // meta map
- [
- file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
- file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true)
- ]
- ]
- """
- }
- }
-
- then {
- assertAll (
- { assert process.success },
- { assert snapshot(process.out).match()}
- )
- }
-
- }
-
-}
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/seqtk/trim/tests/main.nf.test.snap b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/seqtk/trim/tests/main.nf.test.snap
deleted file mode 100644
index 90da25d2b..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/seqtk/trim/tests/main.nf.test.snap
+++ /dev/null
@@ -1,78 +0,0 @@
-{
- "Single-end": {
- "content": [
- {
- "0": [
- [
- {
- "id": "test",
- "single_end": true
- },
- "test_test_1.fastq.gz:md5,4161df271f9bfcd25d5845a1e220dbec"
- ]
- ],
- "1": [
- "versions.yml:md5,d061ca0231d089b087e22d2001cd7c32"
- ],
- "reads": [
- [
- {
- "id": "test",
- "single_end": true
- },
- "test_test_1.fastq.gz:md5,4161df271f9bfcd25d5845a1e220dbec"
- ]
- ],
- "versions": [
- "versions.yml:md5,d061ca0231d089b087e22d2001cd7c32"
- ]
- }
- ],
- "meta": {
- "nf-test": "0.8.4",
- "nextflow": "23.10.1"
- },
- "timestamp": "2024-05-03T06:10:55.544977"
- },
- "Paired-end": {
- "content": [
- {
- "0": [
- [
- {
- "id": "test",
- "single_end": false
- },
- [
- "test_test_1.fastq.gz:md5,4161df271f9bfcd25d5845a1e220dbec",
- "test_test_2.fastq.gz:md5,2ebae722295ea66d84075a3b042e2b42"
- ]
- ]
- ],
- "1": [
- "versions.yml:md5,d061ca0231d089b087e22d2001cd7c32"
- ],
- "reads": [
- [
- {
- "id": "test",
- "single_end": false
- },
- [
- "test_test_1.fastq.gz:md5,4161df271f9bfcd25d5845a1e220dbec",
- "test_test_2.fastq.gz:md5,2ebae722295ea66d84075a3b042e2b42"
- ]
- ]
- ],
- "versions": [
- "versions.yml:md5,d061ca0231d089b087e22d2001cd7c32"
- ]
- }
- ],
- "meta": {
- "nf-test": "0.8.4",
- "nextflow": "23.10.1"
- },
- "timestamp": "2024-05-03T06:11:38.487227"
- }
-}
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/seqtk/trim/tests/tags.yml b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/seqtk/trim/tests/tags.yml
deleted file mode 100644
index 250a1382f..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/modules/nf-core/seqtk/trim/tests/tags.yml
+++ /dev/null
@@ -1,2 +0,0 @@
-seqtk/trim:
- - "modules/nf-core/seqtk/trim/**"
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/nextflow.config b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/nextflow.config
deleted file mode 100644
index e94053c29..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/nextflow.config
+++ /dev/null
@@ -1,242 +0,0 @@
-/*
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- myorg/myfirstpipeline Nextflow config file
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Default config options for all compute environments
-----------------------------------------------------------------------------------------
-*/
-
-// Global default params, used in configs
-params {
-
- // TODO nf-core: Specify your pipeline's command line flags
- // Input options
- input = null
-
- skip_trim = false
-
- // MultiQC options
- multiqc_config = null
- multiqc_title = null
- multiqc_logo = null
- max_multiqc_email_size = '25.MB'
-
-
- // Boilerplate options
- outdir = null
- publish_dir_mode = 'copy'
- email = null
- email_on_fail = null
- plaintext_email = false
- monochrome_logs = false
-
- help = false
- help_full = false
- show_hidden = false
- version = false
- pipelines_testdata_base_path = 'https://raw.githubusercontent.com/nf-core/test-datasets/'
-
- // Config options
- config_profile_name = null
- config_profile_description = null
-
- custom_config_version = 'master'
- custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}"
- config_profile_contact = null
- config_profile_url = null
-
- // Schema validation default options
- validate_params = true
-}
-
-// Load base.config by default for all pipelines
-includeConfig 'conf/base.config'
-
-profiles {
- debug {
- dumpHashes = true
- process.beforeScript = 'echo $HOSTNAME'
- cleanup = false
- nextflow.enable.configProcessNamesValidation = true
- }
- conda {
- conda.enabled = true
- docker.enabled = false
- singularity.enabled = false
- podman.enabled = false
- shifter.enabled = false
- charliecloud.enabled = false
- conda.channels = ['conda-forge', 'bioconda']
- apptainer.enabled = false
- }
- mamba {
- conda.enabled = true
- conda.useMamba = true
- docker.enabled = false
- singularity.enabled = false
- podman.enabled = false
- shifter.enabled = false
- charliecloud.enabled = false
- apptainer.enabled = false
- }
- docker {
- docker.enabled = true
- conda.enabled = false
- singularity.enabled = false
- podman.enabled = false
- shifter.enabled = false
- charliecloud.enabled = false
- apptainer.enabled = false
- docker.runOptions = '-u $(id -u):$(id -g)'
- }
- arm {
- docker.runOptions = '-u $(id -u):$(id -g) --platform=linux/amd64'
- }
- singularity {
- singularity.enabled = true
- singularity.autoMounts = true
- conda.enabled = false
- docker.enabled = false
- podman.enabled = false
- shifter.enabled = false
- charliecloud.enabled = false
- apptainer.enabled = false
- }
- podman {
- podman.enabled = true
- conda.enabled = false
- docker.enabled = false
- singularity.enabled = false
- shifter.enabled = false
- charliecloud.enabled = false
- apptainer.enabled = false
- }
- shifter {
- shifter.enabled = true
- conda.enabled = false
- docker.enabled = false
- singularity.enabled = false
- podman.enabled = false
- charliecloud.enabled = false
- apptainer.enabled = false
- }
- charliecloud {
- charliecloud.enabled = true
- conda.enabled = false
- docker.enabled = false
- singularity.enabled = false
- podman.enabled = false
- shifter.enabled = false
- apptainer.enabled = false
- }
- apptainer {
- apptainer.enabled = true
- apptainer.autoMounts = true
- conda.enabled = false
- docker.enabled = false
- singularity.enabled = false
- podman.enabled = false
- shifter.enabled = false
- charliecloud.enabled = false
- }
- wave {
- apptainer.ociAutoPull = true
- singularity.ociAutoPull = true
- wave.enabled = true
- wave.freeze = true
- wave.strategy = 'conda,container'
- }
-
- test { includeConfig 'conf/test.config' }
- test_full { includeConfig 'conf/test_full.config' }
-}
-
-// Load nf-core custom profiles from different Institutions
-includeConfig !System.getenv('NXF_OFFLINE') && params.custom_config_base ? "${params.custom_config_base}/nfcore_custom.config" : "/dev/null"
-
-// Load myorg/myfirstpipeline custom profiles from different institutions.
-// TODO nf-core: Optionally, you can add a pipeline-specific nf-core config at https://github.com/nf-core/configs
-// includeConfig !System.getenv('NXF_OFFLINE') && params.custom_config_base ? "${params.custom_config_base}/pipeline/myfirstpipeline.config" : "/dev/null"
-
-// Set default registry for Apptainer, Docker, Podman, Charliecloud and Singularity independent of -profile
-// Will not be used unless Apptainer / Docker / Podman / Charliecloud / Singularity are enabled
-// Set to your registry if you have a mirror of containers
-apptainer.registry = 'quay.io'
-docker.registry = 'quay.io'
-podman.registry = 'quay.io'
-singularity.registry = 'quay.io'
-charliecloud.registry = 'quay.io'
-
-
-
-// Export these variables to prevent local Python/R libraries from conflicting with those in the container
-// The JULIA depot path has been adjusted to a fixed path `/usr/local/share/julia` that needs to be used for packages in the container.
-// See https://apeltzer.github.io/post/03-julia-lang-nextflow/ for details on that. Once we have a common agreement on where to keep Julia packages, this is adjustable.
-
-env {
- PYTHONNOUSERSITE = 1
- R_PROFILE_USER = "/.Rprofile"
- R_ENVIRON_USER = "/.Renviron"
- JULIA_DEPOT_PATH = "/usr/local/share/julia"
-}
-
-// Set bash options
-process.shell = """\
-bash
-
-set -e # Exit if a tool returns a non-zero status/exit code
-set -u # Treat unset variables and parameters as an error
-set -o pipefail # Returns the status of the last command to exit with a non-zero status or zero if all successfully execute
-set -C # No clobber - prevent output redirection from overwriting files.
-"""
-
-// Disable process selector warnings by default. Use debug profile to enable warnings.
-nextflow.enable.configProcessNamesValidation = false
-
-def trace_timestamp = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss')
-timeline {
- enabled = true
- file = "${params.outdir}/pipeline_info/execution_timeline_${trace_timestamp}.html"
-}
-report {
- enabled = true
- file = "${params.outdir}/pipeline_info/execution_report_${trace_timestamp}.html"
-}
-trace {
- enabled = true
- file = "${params.outdir}/pipeline_info/execution_trace_${trace_timestamp}.txt"
-}
-dag {
- enabled = true
- file = "${params.outdir}/pipeline_info/pipeline_dag_${trace_timestamp}.html"
-}
-
-manifest {
- name = 'myorg/myfirstpipeline'
- author = """gitpod"""
- homePage = 'https://github.com/myorg/myfirstpipeline'
- description = """My first pipeline"""
- mainScript = 'main.nf'
- nextflowVersion = '!>=24.04.2'
- version = '1.0.0dev'
- doi = ''
-}
-
-// Nextflow plugins
-plugins {
- id 'nf-schema@2.1.1' // Validation of pipeline parameters and creation of an input channel from a sample sheet
-}
-
-validation {
- defaultIgnoreParams = ["genomes"]
- help {
- enabled = true
-        command = "nextflow run $manifest.name -profile <docker/singularity/.../institute> --input samplesheet.csv --outdir <OUTDIR>"
- fullParameter = "help_full"
- showHiddenParameter = "show_hidden"
-
- }
-}
-
-// Load modules.config for DSL2 module specific options
-includeConfig 'conf/modules.config'
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/nextflow_schema.json b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/nextflow_schema.json
deleted file mode 100644
index 327a92417..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/nextflow_schema.json
+++ /dev/null
@@ -1,188 +0,0 @@
-{
- "$schema": "https://json-schema.org/draft/2020-12/schema",
- "$id": "https://raw.githubusercontent.com/myorg/myfirstpipeline/master/nextflow_schema.json",
- "title": "myorg/myfirstpipeline pipeline parameters",
- "description": "My first pipeline",
- "type": "object",
- "$defs": {
- "input_output_options": {
- "title": "Input/output options",
- "type": "object",
- "fa_icon": "fas fa-terminal",
- "description": "Define where the pipeline should find input data and save output data.",
- "required": ["input", "outdir"],
- "properties": {
- "input": {
- "type": "string",
- "format": "file-path",
- "exists": true,
- "schema": "assets/schema_input.json",
- "mimetype": "text/csv",
- "pattern": "^\\S+\\.csv$",
- "description": "Path to comma-separated file containing information about the samples in the experiment.",
- "help_text": "You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row.",
- "fa_icon": "fas fa-file-csv"
- },
- "outdir": {
- "type": "string",
- "format": "directory-path",
- "description": "The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.",
- "fa_icon": "fas fa-folder-open"
- },
- "skip_trim": {
- "type": "boolean"
- },
- "email": {
- "type": "string",
- "description": "Email address for completion summary.",
- "fa_icon": "fas fa-envelope",
- "help_text": "Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run.",
- "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$"
- },
- "multiqc_title": {
- "type": "string",
- "description": "MultiQC report title. Printed as page header, used for filename if not otherwise specified.",
- "fa_icon": "fas fa-file-signature"
- }
- }
- },
- "institutional_config_options": {
- "title": "Institutional config options",
- "type": "object",
- "fa_icon": "fas fa-university",
- "description": "Parameters used to describe centralised config profiles. These should not be edited.",
- "help_text": "The centralised nf-core configuration profiles use a handful of pipeline parameters to describe themselves. This information is then printed to the Nextflow log when you run a pipeline. You should not need to change these values when you run a pipeline.",
- "properties": {
- "custom_config_version": {
- "type": "string",
- "description": "Git commit id for Institutional configs.",
- "default": "master",
- "hidden": true,
- "fa_icon": "fas fa-users-cog"
- },
- "custom_config_base": {
- "type": "string",
- "description": "Base directory for Institutional configs.",
- "default": "https://raw.githubusercontent.com/nf-core/configs/master",
- "hidden": true,
- "help_text": "If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.",
- "fa_icon": "fas fa-users-cog"
- },
- "config_profile_name": {
- "type": "string",
- "description": "Institutional config name.",
- "hidden": true,
- "fa_icon": "fas fa-users-cog"
- },
- "config_profile_description": {
- "type": "string",
- "description": "Institutional config description.",
- "hidden": true,
- "fa_icon": "fas fa-users-cog"
- },
- "config_profile_contact": {
- "type": "string",
- "description": "Institutional config contact information.",
- "hidden": true,
- "fa_icon": "fas fa-users-cog"
- },
- "config_profile_url": {
- "type": "string",
- "description": "Institutional config URL link.",
- "hidden": true,
- "fa_icon": "fas fa-users-cog"
- }
- }
- },
- "generic_options": {
- "title": "Generic options",
- "type": "object",
- "fa_icon": "fas fa-file-import",
- "description": "Less common options for the pipeline, typically set in a config file.",
- "help_text": "These options are common to all nf-core pipelines and allow you to customise some of the core preferences for how the pipeline runs.\n\nTypically these options would be set in a Nextflow config file loaded for all pipeline runs, such as `~/.nextflow/config`.",
- "properties": {
- "version": {
- "type": "boolean",
- "description": "Display version and exit.",
- "fa_icon": "fas fa-question-circle",
- "hidden": true
- },
- "publish_dir_mode": {
- "type": "string",
- "default": "copy",
- "description": "Method used to save pipeline results to output directory.",
- "help_text": "The Nextflow `publishDir` option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See [Nextflow docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for details.",
- "fa_icon": "fas fa-copy",
- "enum": ["symlink", "rellink", "link", "copy", "copyNoFollow", "move"],
- "hidden": true
- },
- "email_on_fail": {
- "type": "string",
- "description": "Email address for completion summary, only when pipeline fails.",
- "fa_icon": "fas fa-exclamation-triangle",
- "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$",
- "help_text": "An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.",
- "hidden": true
- },
- "plaintext_email": {
- "type": "boolean",
- "description": "Send plain-text email instead of HTML.",
- "fa_icon": "fas fa-remove-format",
- "hidden": true
- },
- "max_multiqc_email_size": {
- "type": "string",
- "description": "File size limit when attaching MultiQC reports to summary emails.",
- "pattern": "^\\d+(\\.\\d+)?\\.?\\s*(K|M|G|T)?B$",
- "default": "25.MB",
- "fa_icon": "fas fa-file-upload",
- "hidden": true
- },
- "monochrome_logs": {
- "type": "boolean",
- "description": "Do not use coloured log outputs.",
- "fa_icon": "fas fa-palette",
- "hidden": true
- },
- "multiqc_config": {
- "type": "string",
- "format": "file-path",
- "description": "Custom config file to supply to MultiQC.",
- "fa_icon": "fas fa-cog",
- "hidden": true
- },
- "multiqc_logo": {
- "type": "string",
- "description": "Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file",
- "fa_icon": "fas fa-image",
- "hidden": true
- },
- "validate_params": {
- "type": "boolean",
- "description": "Boolean whether to validate parameters against the schema at runtime",
- "default": true,
- "fa_icon": "fas fa-check-square",
- "hidden": true
- },
- "pipelines_testdata_base_path": {
- "type": "string",
- "fa_icon": "far fa-check-circle",
- "description": "Base URL or local path to location of pipeline test dataset files",
- "default": "https://raw.githubusercontent.com/nf-core/test-datasets/",
- "hidden": true
- }
- }
- }
- },
- "allOf": [
- {
- "$ref": "#/$defs/input_output_options"
- },
- {
- "$ref": "#/$defs/institutional_config_options"
- },
- {
- "$ref": "#/$defs/generic_options"
- }
- ]
-}
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/local/utils_nfcore_myfirstpipeline_pipeline/main.nf b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/local/utils_nfcore_myfirstpipeline_pipeline/main.nf
deleted file mode 100644
index 4e4a4ece4..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/local/utils_nfcore_myfirstpipeline_pipeline/main.nf
+++ /dev/null
@@ -1,222 +0,0 @@
-//
-// Subworkflow with functionality specific to the myorg/myfirstpipeline pipeline
-//
-
-/*
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- IMPORT FUNCTIONS / MODULES / SUBWORKFLOWS
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-*/
-
-include { UTILS_NFSCHEMA_PLUGIN } from '../../nf-core/utils_nfschema_plugin'
-include { paramsSummaryMap } from 'plugin/nf-schema'
-include { samplesheetToList } from 'plugin/nf-schema'
-include { completionEmail } from '../../nf-core/utils_nfcore_pipeline'
-include { completionSummary } from '../../nf-core/utils_nfcore_pipeline'
-include { UTILS_NFCORE_PIPELINE } from '../../nf-core/utils_nfcore_pipeline'
-include { UTILS_NEXTFLOW_PIPELINE } from '../../nf-core/utils_nextflow_pipeline'
-
-/*
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- SUBWORKFLOW TO INITIALISE PIPELINE
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-*/
-
-workflow PIPELINE_INITIALISATION {
-
- take:
- version // boolean: Display version and exit
- validate_params // boolean: Validate parameters against the schema at runtime
- monochrome_logs // boolean: Do not use coloured log outputs
- nextflow_cli_args // array: List of positional nextflow CLI args
- outdir // string: The output directory where the results will be saved
- input // string: Path to input samplesheet
-
- main:
-
- ch_versions = Channel.empty()
-
- //
- // Print version and exit if required and dump pipeline parameters to JSON file
- //
- UTILS_NEXTFLOW_PIPELINE (
- version,
- true,
- outdir,
- workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1
- )
-
- //
- // Validate parameters and generate parameter summary to stdout
- //
- UTILS_NFSCHEMA_PLUGIN (
- workflow,
- validate_params,
- null
- )
-
- //
- // Check config provided to the pipeline
- //
- UTILS_NFCORE_PIPELINE (
- nextflow_cli_args
- )
-
- //
- // Create channel from input file provided through params.input
- //
-
- Channel
- .fromList(samplesheetToList(params.input, "${projectDir}/assets/schema_input.json"))
- .map {
- meta, fastq_1, fastq_2 ->
- if (!fastq_2) {
- return [ meta.id, meta + [ single_end:true ], [ fastq_1 ] ]
- } else {
- return [ meta.id, meta + [ single_end:false ], [ fastq_1, fastq_2 ] ]
- }
- }
- .groupTuple()
- .map { samplesheet ->
- validateInputSamplesheet(samplesheet)
- }
- .map {
- meta, fastqs ->
- return [ meta, fastqs.flatten() ]
- }
- .set { ch_samplesheet }
-
- emit:
- samplesheet = ch_samplesheet
- versions = ch_versions
-}
-
-/*
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- SUBWORKFLOW FOR PIPELINE COMPLETION
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-*/
-
-workflow PIPELINE_COMPLETION {
-
- take:
- email // string: email address
- email_on_fail // string: email address sent on pipeline failure
- plaintext_email // boolean: Send plain-text email instead of HTML
- outdir // path: Path to output directory where results will be published
- monochrome_logs // boolean: Disable ANSI colour codes in log output
-
- multiqc_report // string: Path to MultiQC report
-
- main:
- summary_params = paramsSummaryMap(workflow, parameters_schema: "nextflow_schema.json")
-
- //
- // Completion email and summary
- //
- workflow.onComplete {
- if (email || email_on_fail) {
- completionEmail(
- summary_params,
- email,
- email_on_fail,
- plaintext_email,
- outdir,
- monochrome_logs,
- multiqc_report.toList()
- )
- }
-
- completionSummary(monochrome_logs)
- }
-
- workflow.onError {
- log.error "Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting"
- }
-}
-
-/*
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- FUNCTIONS
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-*/
-
-//
-// Validate channels from input samplesheet
-//
-def validateInputSamplesheet(input) {
- def (metas, fastqs) = input[1..2]
-
- // Check that multiple runs of the same sample are of the same datatype i.e. single-end / paired-end
- def endedness_ok = metas.collect{ meta -> meta.single_end }.unique().size == 1
- if (!endedness_ok) {
- error("Please check input samplesheet -> Multiple runs of a sample must be of the same datatype i.e. single-end or paired-end: ${metas[0].id}")
- }
-
- return [ metas[0], fastqs ]
-}
-//
-// Generate methods description for MultiQC
-//
-def toolCitationText() {
- // TODO nf-core: Optionally add in-text citation tools to this list.
- // Can use ternary operators to dynamically construct based on conditions, e.g. params["run_xyz"] ? "Tool (Foo et al. 2023)" : "",
- // Uncomment function in methodsDescriptionText to render in MultiQC report
- def citation_text = [
- "Tools used in the workflow included:",
-
- "MultiQC (Ewels et al. 2016)",
- "."
- ].join(' ').trim()
-
- return citation_text
-}
-
-def toolBibliographyText() {
- // TODO nf-core: Optionally add bibliographic entries to this list.
- // Can use ternary operators to dynamically construct based on conditions, e.g. params["run_xyz"] ? "<li>Author (2023) Pub name, Journal, DOI</li>" : "",
- // Uncomment function in methodsDescriptionText to render in MultiQC report
- def reference_text = [
-
- "
Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics , 32(19), 3047–3048. doi: /10.1093/bioinformatics/btw354
"
- ].join(' ').trim()
-
- return reference_text
-}
-
-def methodsDescriptionText(mqc_methods_yaml) {
- // Convert to a named map so it can be used with the familiar NXF ${workflow} variable syntax in the MultiQC YML file
- def meta = [:]
- meta.workflow = workflow.toMap()
- meta["manifest_map"] = workflow.manifest.toMap()
-
- // Pipeline DOI
- if (meta.manifest_map.doi) {
- // Using a loop to handle multiple DOIs
- // Removing `https://doi.org/` to handle pipelines using DOIs vs DOI resolvers
- // Removing ` ` since the manifest.doi is a string and not a proper list
- def temp_doi_ref = ""
- def manifest_doi = meta.manifest_map.doi.tokenize(",")
- manifest_doi.each { doi_ref ->
- temp_doi_ref += "(doi: ${doi_ref.replace("https://doi.org/", "").replace(" ", "")}), "
- }
- meta["doi_text"] = temp_doi_ref.substring(0, temp_doi_ref.length() - 2)
- } else meta["doi_text"] = ""
- meta["nodoi_text"] = meta.manifest_map.doi ? "" : "
If available, make sure to update the text to include the Zenodo DOI of version of the pipeline used.
"
-
- // Tool references
- meta["tool_citations"] = ""
- meta["tool_bibliography"] = ""
-
- // TODO nf-core: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
- // meta["tool_citations"] = toolCitationText().replaceAll(", \\.", ".").replaceAll("\\. \\.", ".").replaceAll(", \\.", ".")
- // meta["tool_bibliography"] = toolBibliographyText()
-
-
- def methods_text = mqc_methods_yaml.text
-
- def engine = new groovy.text.SimpleTemplateEngine()
- def description_html = engine.createTemplate(methods_text).make(meta)
-
- return description_html.toString()
-}
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nextflow_pipeline/main.nf b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nextflow_pipeline/main.nf
deleted file mode 100644
index 0fcbf7b3f..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nextflow_pipeline/main.nf
+++ /dev/null
@@ -1,124 +0,0 @@
-//
-// Subworkflow with functionality that may be useful for any Nextflow pipeline
-//
-
-/*
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- SUBWORKFLOW DEFINITION
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-*/
-
-workflow UTILS_NEXTFLOW_PIPELINE {
- take:
- print_version // boolean: print version
- dump_parameters // boolean: dump parameters
- outdir // path: base directory used to publish pipeline results
- check_conda_channels // boolean: check conda channels
-
- main:
-
- //
- // Print workflow version and exit on --version
- //
- if (print_version) {
- log.info("${workflow.manifest.name} ${getWorkflowVersion()}")
- System.exit(0)
- }
-
- //
- // Dump pipeline parameters to a JSON file
- //
- if (dump_parameters && outdir) {
- dumpParametersToJSON(outdir)
- }
-
- //
- // When running with Conda, warn if channels have not been set up appropriately
- //
- if (check_conda_channels) {
- checkCondaChannels()
- }
-
- emit:
- dummy_emit = true
-}
-
-/*
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- FUNCTIONS
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-*/
-
-//
-// Generate version string
-//
-def getWorkflowVersion() {
- def version_string = "" as String
- if (workflow.manifest.version) {
- def prefix_v = workflow.manifest.version[0] != 'v' ? 'v' : ''
- version_string += "${prefix_v}${workflow.manifest.version}"
- }
-
- if (workflow.commitId) {
- def git_shortsha = workflow.commitId.substring(0, 7)
- version_string += "-g${git_shortsha}"
- }
-
- return version_string
-}
-
-//
-// Dump pipeline parameters to a JSON file
-//
-def dumpParametersToJSON(outdir) {
- def timestamp = new java.util.Date().format('yyyy-MM-dd_HH-mm-ss')
- def filename = "params_${timestamp}.json"
- def temp_pf = new File(workflow.launchDir.toString(), ".${filename}")
- def jsonStr = groovy.json.JsonOutput.toJson(params)
- temp_pf.text = groovy.json.JsonOutput.prettyPrint(jsonStr)
-
- nextflow.extension.FilesEx.copyTo(temp_pf.toPath(), "${outdir}/pipeline_info/params_${timestamp}.json")
- temp_pf.delete()
-}
-
-//
-// When running with -profile conda, warn if channels have not been set up appropriately
-//
-def checkCondaChannels() {
- def parser = new org.yaml.snakeyaml.Yaml()
- def channels = []
- try {
- def config = parser.load("conda config --show channels".execute().text)
- channels = config.channels
- }
- catch (NullPointerException e) {
- log.warn("Could not verify conda channel configuration.")
- return null
- }
- catch (IOException e) {
- log.warn("Could not verify conda channel configuration.")
- return null
- }
-
- // Check that all channels are present
- // This channel list is ordered by required channel priority.
- def required_channels_in_order = ['conda-forge', 'bioconda']
- def channels_missing = ((required_channels_in_order as Set) - (channels as Set)) as Boolean
-
- // Check that they are in the right order
- def channel_priority_violation = required_channels_in_order != channels.findAll { ch -> ch in required_channels_in_order }
-
- if (channels_missing | channel_priority_violation) {
- log.warn """\
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- There is a problem with your Conda configuration!
- You will need to set up the conda-forge and bioconda channels correctly.
- Please refer to https://bioconda.github.io/
- The observed channel order is
- ${channels}
- but the following channel order is required:
- ${required_channels_in_order}
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- """.stripIndent(true)
- }
-}
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nextflow_pipeline/meta.yml b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nextflow_pipeline/meta.yml
deleted file mode 100644
index e5c3a0a82..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nextflow_pipeline/meta.yml
+++ /dev/null
@@ -1,38 +0,0 @@
-# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json
-name: "UTILS_NEXTFLOW_PIPELINE"
-description: Subworkflow with functionality that may be useful for any Nextflow pipeline
-keywords:
- - utility
- - pipeline
- - initialise
- - version
-components: []
-input:
- - print_version:
- type: boolean
- description: |
- Print the version of the pipeline and exit
- - dump_parameters:
- type: boolean
- description: |
- Dump the parameters of the pipeline to a JSON file
- - output_directory:
- type: directory
- description: Path to output dir to write JSON file to.
- pattern: "results/"
- - check_conda_channel:
- type: boolean
- description: |
- Check if the conda channel priority is correct.
-output:
- - dummy_emit:
- type: boolean
- description: |
- Dummy emit to make nf-core subworkflows lint happy
-authors:
- - "@adamrtalbot"
- - "@drpatelh"
-maintainers:
- - "@adamrtalbot"
- - "@drpatelh"
- - "@maxulysse"
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nextflow_pipeline/tests/main.function.nf.test b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nextflow_pipeline/tests/main.function.nf.test
deleted file mode 100644
index 68718e4f5..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nextflow_pipeline/tests/main.function.nf.test
+++ /dev/null
@@ -1,54 +0,0 @@
-
-nextflow_function {
-
- name "Test Functions"
- script "subworkflows/nf-core/utils_nextflow_pipeline/main.nf"
- config "subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config"
- tag 'subworkflows'
- tag 'utils_nextflow_pipeline'
- tag 'subworkflows/utils_nextflow_pipeline'
-
- test("Test Function getWorkflowVersion") {
-
- function "getWorkflowVersion"
-
- then {
- assertAll(
- { assert function.success },
- { assert snapshot(function.result).match() }
- )
- }
- }
-
- test("Test Function dumpParametersToJSON") {
-
- function "dumpParametersToJSON"
-
- when {
- function {
- """
- // define inputs of the function here. Example:
- input[0] = "$outputDir"
- """.stripIndent()
- }
- }
-
- then {
- assertAll(
- { assert function.success }
- )
- }
- }
-
- test("Test Function checkCondaChannels") {
-
- function "checkCondaChannels"
-
- then {
- assertAll(
- { assert function.success },
- { assert snapshot(function.result).match() }
- )
- }
- }
-}
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nextflow_pipeline/tests/main.function.nf.test.snap b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nextflow_pipeline/tests/main.function.nf.test.snap
deleted file mode 100644
index 846287c41..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nextflow_pipeline/tests/main.function.nf.test.snap
+++ /dev/null
@@ -1,20 +0,0 @@
-{
- "Test Function getWorkflowVersion": {
- "content": [
- "v9.9.9"
- ],
- "meta": {
- "nf-test": "0.8.4",
- "nextflow": "23.10.1"
- },
- "timestamp": "2024-02-28T12:02:05.308243"
- },
- "Test Function checkCondaChannels": {
- "content": null,
- "meta": {
- "nf-test": "0.8.4",
- "nextflow": "23.10.1"
- },
- "timestamp": "2024-02-28T12:02:12.425833"
- }
-}
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nextflow_pipeline/tests/main.workflow.nf.test b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nextflow_pipeline/tests/main.workflow.nf.test
deleted file mode 100644
index ca964ce8e..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nextflow_pipeline/tests/main.workflow.nf.test
+++ /dev/null
@@ -1,111 +0,0 @@
-nextflow_workflow {
-
- name "Test Workflow UTILS_NEXTFLOW_PIPELINE"
- script "../main.nf"
- config "subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config"
- workflow "UTILS_NEXTFLOW_PIPELINE"
- tag 'subworkflows'
- tag 'utils_nextflow_pipeline'
- tag 'subworkflows/utils_nextflow_pipeline'
-
- test("Should run no inputs") {
-
- when {
- workflow {
- """
- print_version = false
- dump_parameters = false
- outdir = null
- check_conda_channels = false
-
- input[0] = print_version
- input[1] = dump_parameters
- input[2] = outdir
- input[3] = check_conda_channels
- """
- }
- }
-
- then {
- assertAll(
- { assert workflow.success }
- )
- }
- }
-
- test("Should print version") {
-
- when {
- workflow {
- """
- print_version = true
- dump_parameters = false
- outdir = null
- check_conda_channels = false
-
- input[0] = print_version
- input[1] = dump_parameters
- input[2] = outdir
- input[3] = check_conda_channels
- """
- }
- }
-
- then {
- assertAll(
- { assert workflow.success },
- { assert workflow.stdout.contains("nextflow_workflow v9.9.9") }
- )
- }
- }
-
- test("Should dump params") {
-
- when {
- workflow {
- """
- print_version = false
- dump_parameters = true
- outdir = 'results'
- check_conda_channels = false
-
- input[0] = false
- input[1] = true
- input[2] = outdir
- input[3] = false
- """
- }
- }
-
- then {
- assertAll(
- { assert workflow.success }
- )
- }
- }
-
- test("Should not create params JSON if no output directory") {
-
- when {
- workflow {
- """
- print_version = false
- dump_parameters = true
- outdir = null
- check_conda_channels = false
-
- input[0] = false
- input[1] = true
- input[2] = outdir
- input[3] = false
- """
- }
- }
-
- then {
- assertAll(
- { assert workflow.success }
- )
- }
- }
-}
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config
deleted file mode 100644
index a09572e5b..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config
+++ /dev/null
@@ -1,9 +0,0 @@
-manifest {
- name = 'nextflow_workflow'
- author = """nf-core"""
- homePage = 'https://127.0.0.1'
- description = """Dummy pipeline"""
- nextflowVersion = '!>=23.04.0'
- version = '9.9.9'
- doi = 'https://doi.org/10.5281/zenodo.5070524'
-}
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nextflow_pipeline/tests/tags.yml b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nextflow_pipeline/tests/tags.yml
deleted file mode 100644
index f84761125..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nextflow_pipeline/tests/tags.yml
+++ /dev/null
@@ -1,2 +0,0 @@
-subworkflows/utils_nextflow_pipeline:
- - subworkflows/nf-core/utils_nextflow_pipeline/**
diff --git a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nfcore_pipeline/main.nf b/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nfcore_pipeline/main.nf
deleted file mode 100644
index 5cb7bafef..000000000
--- a/hello-nextflow/hello-nf-core/solution/myorg-myfirstpipeline/subworkflows/nf-core/utils_nfcore_pipeline/main.nf
+++ /dev/null
@@ -1,462 +0,0 @@
-//
-// Subworkflow with utility functions specific to the nf-core pipeline template
-//
-
-/*
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- SUBWORKFLOW DEFINITION
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-*/
-
-workflow UTILS_NFCORE_PIPELINE {
- take:
- nextflow_cli_args
-
- main:
- valid_config = checkConfigProvided()
- checkProfileProvided(nextflow_cli_args)
-
- emit:
- valid_config
-}
-
-/*
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- FUNCTIONS
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-*/
-
-//
-// Warn if a -profile or Nextflow config has not been provided to run the pipeline
-//
-def checkConfigProvided() {
- def valid_config = true as Boolean
- if (workflow.profile == 'standard' && workflow.configFiles.size() <= 1) {
- log.warn(
- "[${workflow.manifest.name}] You are attempting to run the pipeline without any custom configuration!\n\n" + "This will be dependent on your local compute environment but can be achieved via one or more of the following:\n" + " (1) Using an existing pipeline profile e.g. `-profile docker` or `-profile singularity`\n" + " (2) Using an existing nf-core/configs for your Institution e.g. `-profile crick` or `-profile uppmax`\n" + " (3) Using your own local custom config e.g. `-c /path/to/your/custom.config`\n\n" + "Please refer to the quick start section and usage docs for the pipeline.\n "
- )
- valid_config = false
- }
- return valid_config
-}
-
-//
-// Exit pipeline if -profile is malformed (e.g. ends with a trailing comma) and warn about positional Nextflow CLI args
-//
-def checkProfileProvided(nextflow_cli_args) {
- if (workflow.profile.endsWith(',')) {
- error(
- "The `-profile` option cannot end with a trailing comma, please remove it and re-run the pipeline!\n" + "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n"
- )
- }
- if (nextflow_cli_args[0]) {
- log.warn(
- "nf-core pipelines do not accept positional arguments. The positional argument `${nextflow_cli_args[0]}` has been detected.\n" + "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n"
- )
- }
-}
-
-//
-// Citation string for pipeline
-//
-def workflowCitation() {
- def temp_doi_ref = ""
- def manifest_doi = workflow.manifest.doi.tokenize(",")
- // Handling multiple DOIs
- // Removing `https://doi.org/` to handle pipelines using DOIs vs DOI resolvers
- // Removing ` ` since the manifest.doi is a string and not a proper list
- manifest_doi.each { doi_ref ->
- temp_doi_ref += " https://doi.org/${doi_ref.replace('https://doi.org/', '').replace(' ', '')}\n"
- }
- return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" + "* The pipeline\n" + temp_doi_ref + "\n" + "* The nf-core framework\n" + " https://doi.org/10.1038/s41587-020-0439-x\n\n" + "* Software dependencies\n" + " https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md"
-}
-
-//
-// Generate workflow version string
-//
-def getWorkflowVersion() {
- def version_string = "" as String
- if (workflow.manifest.version) {
- def prefix_v = workflow.manifest.version[0] != 'v' ? 'v' : ''
- version_string += "${prefix_v}${workflow.manifest.version}"
- }
-
- if (workflow.commitId) {
- def git_shortsha = workflow.commitId.substring(0, 7)
- version_string += "-g${git_shortsha}"
- }
-
- return version_string
-}
-
-//
-// Get software versions for pipeline
-//
-def processVersionsFromYAML(yaml_file) {
- def yaml = new org.yaml.snakeyaml.Yaml()
- def versions = yaml.load(yaml_file).collectEntries { k, v -> [k.tokenize(':')[-1], v] }
- return yaml.dumpAsMap(versions).trim()
-}
-
-//
-// Get workflow version for pipeline
-//
-def workflowVersionToYAML() {
- return """
- Workflow:
- ${workflow.manifest.name}: ${getWorkflowVersion()}
- Nextflow: ${workflow.nextflow.version}
- """.stripIndent().trim()
-}
-
-//
-// Get channel of software versions used in pipeline in YAML format
-//
-def softwareVersionsToYAML(ch_versions) {
- return ch_versions.unique().map { version -> processVersionsFromYAML(version) }.unique().mix(Channel.of(workflowVersionToYAML()))
-}
-
-//
-// Get workflow summary for MultiQC
-//
-def paramsSummaryMultiqc(summary_params) {
- def summary_section = ''
- summary_params
- .keySet()
- .each { group ->
- def group_params = summary_params.get(group)
- // This gets the parameters of that particular group
- if (group_params) {
- summary_section += "