Metagenomic assembly
The metagenomic workflow implemented in Namaste is based on de novo assembly.
Assembly starts from the high-quality reads produced by the
preprocessing step and generates longer, contiguous
sequences (contigs).
The reads are assembled using
Flye (version 2.9.2).
Namaste runs Flye with default parameters in its high-quality Nanopore mode (option --nano-hq),
which is intended for reads basecalled with Guppy5+ in SUP mode or sequenced with
Q20 chemistry, i.e. reads with an error rate below 5%.
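As an illustration, an equivalent stand-alone Flye call can be assembled as below. This is a minimal sketch, assuming Flye is on the PATH; the read file, output directory and thread count are hypothetical placeholders, and Namaste's own workflow rule may pass additional options.

```python
import shlex
from pathlib import Path


def flye_command(reads: Path, out_dir: Path, threads: int = 8) -> list[str]:
    """Build a Flye call in --nano-hq mode with otherwise default settings.

    Paths and thread count are illustrative placeholders, not values
    taken from the Namaste configuration.
    """
    return [
        "flye",
        "--nano-hq", str(reads),    # ONT high-quality reads (Guppy5+ SUP / Q20)
        "--out-dir", str(out_dir),  # Flye writes assembly.fasta and assembly_info.txt here
        "--threads", str(threads),
    ]


if __name__ == "__main__":
    cmd = flye_command(Path("sample1.fastq.gz"), Path("results/assembly/sample1"))
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Returning the command as a list (rather than a shell string) avoids quoting problems when it is passed to subprocess.run.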
Flye also reports contig length and coverage statistics. These statistics are based on read mapping with minimap2 and include some filtering to arrive at the final coverage values. To also provide 'raw' coverage values, Namaste maps all high-quality reads back to the assemblies with minimap2 (version 2.30) and calculates coverage statistics with samtools coverage (version 1.22.1). Both sets of coverage values are reported to the end user. They are usually very similar and both should be valid for downstream analyses, so the choice of which one to use is left to the user.
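To compare the two coverage estimates per contig, the Flye and samtools tables can be joined on the contig name. The sketch below assumes the standard layouts of Flye's tab-separated assembly_info.txt (header starting with `#seq_name`, coverage in `cov.`) and of `samtools coverage` output (header starting with `#rname`, mean depth in `meandepth`). The mapping commands in the docstring, including the minimap2 preset `map-ont`, are assumptions rather than Namaste's exact invocation.

```python
"""Join Flye coverage with 'raw' samtools coverage per contig.

Assumed upstream commands (illustrative, not Namaste's exact rule):
    minimap2 -ax map-ont assembly.fasta reads.fastq.gz | samtools sort -o sample.bam
    samtools index sample.bam
    samtools coverage sample.bam > sample-coverage.tsv
"""
from __future__ import annotations

import csv
from typing import Optional


def _rows(text: str) -> list[dict[str, str]]:
    """Parse a TSV whose header line is prefixed with '#'."""
    lines = text.splitlines()
    if not lines:
        return []
    lines[0] = lines[0].lstrip("#")  # '#seq_name' -> 'seq_name', '#rname' -> 'rname'
    return list(csv.DictReader(lines, delimiter="\t"))


def merge_coverage(
    flye_info: str, samtools_cov: str
) -> list[tuple[str, float, Optional[float]]]:
    """Return (contig, Flye coverage, samtools mean depth) for each Flye contig.

    The samtools value is None if a contig is missing from the samtools table.
    """
    flye = {r["seq_name"]: float(r["cov."]) for r in _rows(flye_info)}
    raw = {r["rname"]: float(r["meandepth"]) for r in _rows(samtools_cov)}
    return [(name, cov, raw.get(name)) for name, cov in flye.items()]
```

Keeping both values side by side makes it easy to spot contigs where Flye's filtering changes the coverage noticeably.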
Additionally, assemblies are summarised with seqkit stats (version 2.9.0), which reports the number of sequences, the total assembly length and the N50 per sample.
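For reference, the three summary values can be reproduced from a FASTA file as sketched below. This follows the conventional N50 definition (the length L such that contigs of length at least L together cover at least half of the total assembly length); it is an illustration, not seqkit's implementation, which is run with `seqkit stats -a` to include N50 and may differ in edge cases. The column names mirror seqkit's `num_seqs`, `sum_len` and `N50`.

```python
def read_lengths(fasta_text: str) -> list[int]:
    """Return the length of every sequence in a FASTA string (multi-line records allowed)."""
    lengths: list[int] = []
    current = None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0  # start a new record
        elif current is not None:
            current += len(line.strip())
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: list[int]) -> int:
    """Smallest length L such that sequences >= L sum to at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def summarise(fasta_text: str) -> dict[str, int]:
    lengths = read_lengths(fasta_text)
    return {"num_seqs": len(lengths), "sum_len": sum(lengths), "N50": n50(lengths)}
```

For contigs of length 10, 9 and 5 (total 24), the two longest already exceed half the total, so the N50 is 9.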
Output files
The assembly step yields de novo assembled contigs as a FASTA file, as well as several statistics reports.
results/
assembly/
assembly_statistics-seqkit.tsv # simple sequence statistics (seqkit stats)
{sample}/
assembly.fasta # contigs generated by Flye
assembly_info.txt # contig statistics by Flye
mapped_back/
{sample}.bam # reads mapped back to contigs (minimap2)
{sample}-coverage.tsv # coverage statistics per sample (minimap2+samtools)
contig_coverage.csv # overall coverage statistics (minimap2+samtools)
For details, please see the output documentation.
Next steps
Assembled metagenomic contigs are further used to:
→ Screen for antibiotic resistance mutations