Namaste user manual

Quick start

Install dependencies: git, mamba and Snakemake.

Download the repository:

git clone https://github.com/UtrechtUniversity/MEGAISurv-Namaste.git

Move into the downloaded directory:

cd MEGAISurv-Namaste

Collect long-read metagenomes for input, for example by using sracha to download public metagenomes from the European Nucleotide Archive (ENA):

sracha get --output-dir data SRR28879900
sracha get --output-dir data SRR28879905
sracha get --output-dir data SRR28879907

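Or create symbolic links to files stored in a different location:

cd data
ln -s /path/to/metagenomes/*fastq.gz .
cd ..

(Replace "/path/to/metagenomes/" with the actual path on your system. The final "cd .." returns you to the repository root, from where the workflow is started.)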

Input files in the directory data/ should be automatically recognised. Test this by doing a dry-run:

snakemake --profile config -n

If that returns no errors, proceed with running the actual workflow:

snakemake --profile config

1. Before you start

The Namaste workflow processes long-read metagenomes from the specified input folder (default='data/'). It is based on the Snakemake workflow management system and uses mamba to install dependencies. You will also need git, because cloning the repository from GitHub is currently the only way to install Namaste.

Input files are detected automatically as long as they are in the specified input folder, which is defined in config/parameters.yaml.
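To preview which samples are likely to be picked up before doing a dry-run, the input folder can be listed with a few lines of Python. This is a minimal sketch, not the detection logic Namaste itself uses: the function name list_input_samples and the '.fastq.gz' suffix are assumptions for illustration, the latter taken from the symbolic-link example in the quick start.

```python
from pathlib import Path


def list_input_samples(input_directory="data", suffix=".fastq.gz"):
    """Return sorted sample names for files ending in `suffix` in `input_directory`.

    Hypothetical helper: the suffix is an assumption, and the workflow's
    own pattern matching may differ.
    """
    directory = Path(input_directory)
    return sorted(
        path.name[: -len(suffix)]
        for path in directory.glob(f"*{suffix}")
        if path.is_file()  # follows symbolic links, so linked reads count too
    )


if __name__ == "__main__":
    # Only list the default folder if it exists in the current directory.
    if Path("data").is_dir():
        for sample in list_input_samples("data"):
            print(sample)
```

If the list is empty or misses a sample, check the file names and the input folder setting before starting the workflow.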

Estimated disk use

Besides the input metagenomes, which may be large, Namaste needs a number of databases. These include:

AMRFinder: ~10MB
Centrifuger 'cfr_hpv+gbsarscov2': 43GB
geNomad default database: 1.4GB
MetaPointFinder: same database as AMRFinder
ResFinder: ~100MB
NCBI Blast taxonomy (taxdump): ~500MB

Total: ~45GB
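The total follows from adding up the listed sizes, and it can be compared against the free space on your disk before the databases are downloaded. The sketch below uses only the Python standard library. The dictionary and function names are made up for illustration, and MetaPointFinder is left out on the assumption that it reuses the AMRFinder database, as the list suggests.

```python
import shutil

# Approximate database sizes in GB, copied from the list above.
# Assumption: MetaPointFinder needs no separate database.
DATABASE_SIZES_GB = {
    "AMRFinder": 0.01,
    "Centrifuger cfr_hpv+gbsarscov2": 43.0,
    "geNomad": 1.4,
    "ResFinder": 0.1,
    "NCBI taxdump": 0.5,
}


def total_database_size_gb(sizes=DATABASE_SIZES_GB):
    """Sum the approximate database sizes in GB."""
    return sum(sizes.values())


def has_enough_space(path=".", required_gb=None):
    """Return True if the file system holding `path` has `required_gb` free."""
    if required_gb is None:
        required_gb = total_database_size_gb()
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


if __name__ == "__main__":
    print(f"Databases need about {total_database_size_gb():.1f} GB")
    print("Enough free space here:", has_enough_space("."))
```

Remember that the input metagenomes and the workflow output need space on top of this estimate.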

Download and install software

Before you begin, you need to install the following tools (see each project's own documentation for installation instructions):

git
mamba
Snakemake

We recommend installing Snakemake via mamba, which is also its default installation method. Namaste has been tested with Snakemake version 9.3.0 and is expected to work with any version >=6.
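The version requirement can be checked from Python, as sketched below. The helper names are hypothetical. The only external call is "snakemake --version", which prints the installed version.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Turn the leading version number in `text` into a tuple, e.g. '9.3.0' -> (9, 3, 0)."""
    match = re.match(r"\s*(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"No version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def version_at_least(text, minimum=(6,)):
    """Return True if the version in `text` is at least `minimum`."""
    return parse_version(text) >= minimum


def installed_snakemake_version():
    """Return the output of `snakemake --version`, or None if Snakemake is not on the PATH."""
    if shutil.which("snakemake") is None:
        return None
    result = subprocess.run(
        ["snakemake", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    version = installed_snakemake_version()
    if version is None:
        print("Snakemake was not found; is the right environment active?")
    else:
        print(version, "is recent enough:", version_at_least(version))
```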

When you have these tools installed, you can download Namaste:

git clone https://github.com/UtrechtUniversity/MEGAISurv-Namaste.git

Change directory into the newly downloaded folder to get started:

cd MEGAISurv-Namaste

You may rename this folder if you want to, for example:

mv MEGAISurv-Namaste namaste
cd namaste

Adjusting parameters

Namaste has a few options that may be modified by the user. These are listed in two configuration files:

config
├── config.yaml
└── parameters.yaml

The most important are the input directory ('input_directory' in config/parameters.yaml, default='data/') and the number of CPU threads to use ('jobs' in config/config.yaml, default=72).

Please modify these in your favourite text editor to fit your setup.
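To confirm that an edit was saved as intended, the two settings can be read back from the configuration files. The sketch below uses only the standard library and handles nothing more than plain, unindented "key: value" lines. The function name is hypothetical and the exact layout of the files is an assumption; for anything more complex, use a real YAML parser such as PyYAML.

```python
import re
from pathlib import Path


def read_top_level_value(yaml_text, key):
    """Return the value of an unindented `key: value` line, or None if absent.

    Sketch only: nested keys, lists, multi-line values and '#' characters
    inside values are not handled.
    """
    pattern = re.compile(rf"^{re.escape(key)}\s*:\s*(.*?)\s*(?:#.*)?$")
    for line in yaml_text.splitlines():
        match = pattern.match(line)
        if match:
            return match.group(1).strip("'\"")
    return None


if __name__ == "__main__":
    # File names and keys as mentioned in this manual.
    for file_name, key in [
        ("config/parameters.yaml", "input_directory"),
        ("config/config.yaml", "jobs"),
    ]:
        path = Path(file_name)
        if path.is_file():
            print(f"{key} = {read_top_level_value(path.read_text(), key)}")
```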

2. Running the workflow

The workflow is fully automated and should complete with one command. For details on what happens under the hood, see the tab 'Workflow details'.

Do a dry-run first to check that all preparations are complete:

snakemake --profile config -n

To run the actual workflow:

snakemake --profile config
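If you start the workflow from a script, the two commands can be chained so that the real run only starts after a successful dry-run. This is a minimal sketch with hypothetical function names. It uses only the commands shown above and assumes it is called from the repository root, where the config profile directory lives.

```python
import subprocess


def build_snakemake_command(dry_run=False, profile="config"):
    """Build the Snakemake command line used in this manual."""
    command = ["snakemake", "--profile", profile]
    if dry_run:
        command.append("-n")
    return command


def run_workflow(profile="config"):
    """Run a dry-run first and start the real workflow only if it succeeds."""
    dry_run = subprocess.run(build_snakemake_command(dry_run=True, profile=profile))
    if dry_run.returncode != 0:
        raise SystemExit("Dry-run failed; fix the reported errors before running the workflow.")
    # check=True raises an error if the workflow itself exits with a failure.
    subprocess.run(build_snakemake_command(profile=profile), check=True)
```

Calling run_workflow() then behaves like typing both commands by hand, stopping after the dry-run when it reports errors.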

Exceptions: failed assembly

Sometimes a metagenome does not contain enough reads to generate a de novo assembly, for example when negative controls (blanks) are included. The workflow cannot complete the analysis of these samples and returns errors for the steps downstream of the assembly. An included script automatically flags these samples and moves them to a subdirectory, so that the workflow can exit successfully:

python scripts/exclude_failed_assemblies.py

This script reads config/parameters.yaml to determine the input directory. Input reads of samples that failed to assemble are moved to a subdirectory named cannot_assemble. The script also generates a simple QC report listing which samples did and which did not yield a working assembly (fasta) file.
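The idea behind this step can be illustrated with a short sketch. This is not the code of scripts/exclude_failed_assemblies.py. The function name, the assembly_directory argument, the file suffixes and the rule that a missing or empty fasta file counts as a failed assembly are all assumptions made for the example.

```python
import shutil
from pathlib import Path


def exclude_failed_assemblies(input_directory, assembly_directory,
                              read_suffix=".fastq.gz", assembly_suffix=".fasta"):
    """Move reads without a usable assembly to `cannot_assemble`.

    Hypothetical sketch: an assembly counts as usable when a non-empty
    `<sample><assembly_suffix>` file exists in `assembly_directory`.
    Returns a dict mapping each sample name to True (assembled) or False.
    """
    input_directory = Path(input_directory)
    assembly_directory = Path(assembly_directory)
    excluded_directory = input_directory / "cannot_assemble"
    report = {}
    # sorted() collects the file list first, so moving files is safe.
    for reads in sorted(input_directory.glob(f"*{read_suffix}")):
        sample = reads.name[: -len(read_suffix)]
        assembly = assembly_directory / f"{sample}{assembly_suffix}"
        assembled = assembly.is_file() and assembly.stat().st_size > 0
        report[sample] = assembled
        if not assembled:
            excluded_directory.mkdir(exist_ok=True)
            shutil.move(str(reads), str(excluded_directory / reads.name))
    return report
```

The returned dictionary plays the role of the QC report: samples mapped to False are the ones whose reads were moved aside.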

3. Interpreting results

After running the workflow, the user is presented with a number of output files. These are described in detail under the tab 'Output files'.