Output files
Namaste includes a handful of processing steps, each using a few
bioinformatics tools to get the job done. All of these tools generate
output files, yielding a plethora of files in your results/ directory!
Below is a description of the most relevant files.
1. Preprocessing
As described in preprocess, raw reads are quality filtered, producing a file with high-quality reads and a report.
The quality-filtered reads are written to the directory results/filtered_reads/,
for example:
Each of these files will be slightly smaller than the original, raw read file.
For each sample, there will be a JSON and an HTML report of the filtering
process, of which the 'summary' part is parsed for later use. These
files are saved under results/read_qc/ and results/read_qc/summary/.
As an example, these files look like:
$ cat results/read_qc/summary/ERR10188468.json
{
"summary": {
"fastplong_version": "0.2.2",
"before_filtering": {
"total_reads":1295276,
"total_bases":4898452401,
"q20_bases":1985830150,
"q30_bases":644728976,
"q20_rate":0.405399,
"q30_rate":0.131619,
"read_mean_length":3781,
"gc_content":0.411915
},
"after_filtering": {
"total_reads":490786,
"total_bases":1801486902,
"q20_bases":899296433,
"q30_bases":331205707,
"q20_rate":0.499197,
"q30_rate":0.183851,
"read_mean_length":3670,
"gc_content":0.388144
}
}
}
Finally, read quality control (QC) statistics are collected for all samples and
stored in a CSV format table: results/read_qc_summary.csv. It summarises
the number of reads and basepairs before and after quality filtering, as well
as some other quality indicators:
$ head -2 results/read_qc_summary.csv
sample,before_total_reads,before_total_bases,before_q20_bases,before_q30_bases,before_q20_rate,before_q30_rate,before_read_mean_length,before_gc_content,after_total_reads,after_total_bases,after_q20_bases,after_q30_bases,after_q20_rate,after_q30_rate,after_read_mean_length,after_gc_content
SRR17232957_1,315586,1285925859,769615133,342928711,0.598491,0.266678,4074,0.507556,263485,1083609370,696137892,320329347,0.642425,0.295613,4112,0.50747
2. Assembly
The assembly itself generates sequences in fasta format along with a
tab-separated table with basic statistics per contig. These are stored under
results/assembly/{sample}/ where {sample} indicates the base file name of
the raw reads file that was used to generate it. As described in
assembly, Namaste uses Flye to assemble reads, which generates
many different output files. For convenience, we only show the relevant ones.
$ ls -F results/assembly/SRR17232957_1/
assembly.fasta assembly_info.txt mapped_back/
$ head -2 results/assembly/SRR17232957_1/assembly_info.txt
#seq_name length cov. circ. repeat mult. alt_group graph_path
contig_1702 65004 6 N N 1 * *,1702,*
The assembly_info.txt file contains useful information such as contig length,
depth of coverage (cov.) and whether the contig is circular (circ.).
Output files from Flye are also described
on its GitHub page.
Furthermore, Namaste maps reads back to contigs, which can be found in the
subdirectory mapped_back/.
$ ls results/assembly/SRR17232957_1/mapped_back/
SRR17232957_1.bam SRR17232957_1-coverage.tsv
$ head -2 results/assembly/SRR17232957_1/mapped_back/SRR17232957_1-coverage.tsv
#rname startpos endpos numreads covbases coverage meandepth meanbaseq meanmapq
contig_1 1 17948 33 17948 100 4.92378 22.1 57.5
Per sample coverage files are generated with samtools coverage and are
slightly different from the assembly_info.txt by Flye in terms of
values reported (depth of coverage = meandepth).
Finally, these per sample coverage files are combined into one file for all
samples: results/contig_coverage.tsv.
$ head -2 results/contig_coverage.csv
Sample,Contig,startpos,endpos,numreads,covbases,coverage,meandepth,meanbaseq,meanmapq
SRR17232957_1,contig_1,1,17948,33,17948,100,4.92378,22.1,57.5
And the coverage statistics of both Flye and samtools are combined in one
overall statistics file: results/assembly_stats-concatenated.tsv.gz.
$ zless results/assembly_stats-concatenated.tsv.gz | head -3
sample contig contig_length contig_depth_flye circular contig_mapped_percent contig_depth_samtools contig_mapped_reads
SRR17232957_1 contig_1702 65004 6 N 100 6.54354 207
SRR17232957_1 contig_515 63366 6 N 100 5.86153 150
3. Antibiotic resistance genes
Screening of antibiotic resistance genes (ARGs) is done
with KMA, the output files of which are briefly described on
its GitHub page.
Namaste uses the files with .frag.gz and .res extension, which are tabular
files of hits per contig and per sample, respectively.
Files for the contig-based screening of ARGs are stored in
results/resistance_genes/. Note that the .frag.gz file includes complete
sequences, which quickly becomes unwieldy when using contigs as input.
$ zless results/resistance_genes/SRR17232957_1.hmm.frag.gz | head -2
GTATCTGAACAACTATCTTGTGTGGAATAACCTTGTAAATTACGCCAAAGAAAGCGACATGGAGAAAAGGAACATCTTCTTAACTTTCGTTTTGGCAACATTGAAAACTGCTAAATGCAGAGATTTATCAAACAGACCAGCAGTTCCTCTGGTCGCCTAATTAGAATTTGTGGAGATGATAAGATGGTCAATATAACAGATGTAAAACAGATTCTTCAATTTGCAATAGATGCGGAGATTAAAGTCTTTCTTGATGGTGGCTGGGGTGTAGATGCTCTTCTTGGATATCAGTCAAGAGCCCATAATGATATTGACATTTTTGTAGAAAAGAACGATTATCAGAACTTTATAGAAATAATGAAAGCTAATGGCTTTTATGAGATTAAGATGGAATATACAACATTGAACCATACTGTATGGGAAGATTTGAAAAACAGAATTATTGATTTGCATTGTTTTGAATATACGGACGAAGGTGAAATTCTTTATGATGGGGATTGTTTTCCGGTAGAAACTTTTTCGGGTAAAGGAAGAATTGAGGAAATAGAGGTTTCCTGTATTGAACCATATAGTCAAGTAATGTTCCATCTGGGATACGAGTTTGATGAAAATGATGCACATGATGTGAAGTTATTGTGTGAGACACTTCATATCGAAATTCCAAATGAGTATAGATAACTGCAAATAACAAGTTTGTAGGGGAGTTCTGATACTCCCCTATAAAAATGGCTATTTATCAACTGTTTGTTGTGACATAGCCTTTCATATAATATATCTTTGCATGGATTCCACGGGCCTTACTCCGTGTGTTTTTGCGAATCCACGATAACTCTTCAGGTCGCTCGACCTTCAAAATTACCATGC 1 471 0 495 lnu(C)_1_AY928180 contig_418 4000 4864
AAGATGGTCAATATAACAGATGTAAAACAGATTCTTCAATTTGCAATAGATGCGGAGATTAAAGTCTTTCTTGATGGTGGCTTCTGGGGTGTAGATACTCTTCTTGGATATCAGTCAAGAGCCCATAATGATATTACGACATTTTGTAGAAAAGAACGATTATAAAGGGCACTTTATAGAAATAATGAAGCTAATGGCTTTTATGATTAAGATGGAATATACAACATTGAACCATACTGTATGGGAAGATTTGAAAAACAGACAATGATTTGCATTGTTTTGAATATACGGACGAAGGTGAAATTCTTTATGATGGGGATTGTTTTCCGGTAGAAACTTTTTCGAGGTAAAGGGAAGAATTGAGGAAATAGGGGTTTCCTGTATTGAACCATATAGTCAAGTAATGTTCCATCTGGGATACGGAAGTTTGATGAAAATGATGCACATGATGTGAAGTTATTGTGTGAGACACTTCATATCGAAATTCCCAATGAGTATAGATAACTACCAAATAACAAGTTTGTAGGGGAGTTCTGATACTCCCCTATAAAGAAAACAACATTTATCAACTGTTTGTGTGAGCACAGCCTAAAAAAAATACCTTGCCATCGACAGGGGTATTTTTGTTGACAGTAGCAATACTTAATCTTTGTATATTTCGTGGAGCTCTACCCTCGTAGATAATCGGGGCGTAGCGC 1 406 0 495 lnu(C)_1_AY928180 contig_1422 15104 15802
$ head -2 results/resistance_genes/SRR17232957_1.hmm.res
#Template Score Expected Template_length Template_Identity Template_Coverage Query_Identity Query_Coverage Depth q_value p_value
lnu(C)_1_AY928180 901 1 495 99.19 102.42 96.84 97.63 2.02 896.92 1.0e-26
Contigs in which the ARGs are masked are written back to the assembly folder:
results/assembly/{sample}/assembly_ARG_masked.fasta.
For reads, the results are written to results/resistance_genes-reads/.
$ head -2 results/resistance_genes-reads/SRR17232957_1.hmm.res
#Template Score Expected Template_length Template_Identity Template_Coverage Query_Identity Query_Coverage Depth q_value p_value
lnu(C)_1_AY928180 7599 21 495 99.19 100.00 99.19 100.00 19.28 7535.38 1.0e-26
$ zless results/resistance_genes-reads/SRR17232957_1.hmm.frag.gz | head -2
AAAGATGGTCAATATAACAGATGTAAAACAGATTCTTCAATTTGCAATAGATGCGGAGATTAAAGTCTTTCTTGATGGTGGCTTCTGGGGTGTAGATACTCTTCTTGGATATCAGTCAAGAGCCCATAATGATATTACGACATTTTGTAGAAAAGAACGATTATAAAGGGCACTTTATAGAAATAATGAAGCTAATGGCTTTTATGATTAAGATGGAATATACAACATTGAACCATACTGTATGGGAAGATTTGAAAAACAGACAATGATTTGCATTGTTTTGAATATACGGACGAAGGTGAAATTCTTTATGATGGGGATTGTTTTCCGGTAGAAACTTTTTCGAGGTAAAGGGAAGAATTGAGGAAATAGGGGTTTCCTGTATTGAACCATATAGTCAAGTAATGTTCCATCTGGGATACGGAAGTTTGATGAAAATGATGCACATGATGTGAAGTTATTGTGTGAGACACTTCATATCGAAATTCCCAATGAGTATAGATAACTACCAAATAACAAGTTTGTAGGGGAGTTCTGATACTCCCCTATAAAGAAAACAACATTTATCAACTGTTTGTGTGAGCACAGCCTAAAAAAAATACCTTGCCATCGACAGGGGTATTTTTGTTGACAGTAGCAATACTTAATCTTTGTATATTTCGTGGAGCTCTA 1 406 0 495 lnu(C)_1_AY928180 SRR17232957.300718 300718/1 0 672
AACGGTGTTTCTACCAAGTATCTGAACAACTATCTTGTGTGGAATAACCTTGTAAATTACGCCAAAGAAAGCGACATGGAGAAAAGGAACATCTTCTTAACTTTCGTTTTGGCAACATTGAAAACTGCCAGCCGTGTAGATTGATCAAACAGACCAGCAGTTCCTCTGGTCGCTTAATTAGAATTTTGTAGATGATAAGATGGTCAATATAACAGATGTAAAACAGATTTTCAATTTGCAATAGATGCGGAGATTAAAGTCTTTCTTGCAGGCTGGGGTGGATGCTCTTCTGATATCAGTCGAGAGCCCATAATGATATTGACATTTTTGTAGAAAGAACGATTATCAGAACTTTATAGAAATAATGAAAGCTAATGGCTTTTTATGAGATTAAGATGGAATATACAATACGAACCATATCAGGCAAAGATTTTGAAAACAGAATTATTGATTTAAATGTTTTGAATATACGGACGAAGGTGAAATTCTTTATGATGGGGATTGTTTTCCGGTAGAAACTTTTCGGGTAAAGGAAGAATTGAGGAAATAGAGGTTTCCCGTATTGAACCATAGTCAATGGGCTGTTCCATCTGGGATACGAGTTTGATGAAAATGATGCCATGATGTACAGTTATTGTGAGACACTTCATATCGAAATTCCAATGAGGAGTGATAACTGCAAATAACAAGTTTGTAGGGGAGTTCTGATACCCCCTATAAAAATGGCTATTTATCAATCGTTGGTTGTGACATAGCCTTCATATAATATATCTTTGCATGGATTCCACGGCCTTACCCGTGTGTTTTTGCGAATCCACGATAACTCTTCAGGTCGCCTGACCTTCAAAAATTACCATGCAAAAA 1 341 0 495 lnu(C)_1_AY928180 SRR17232957.245932 245932/1 2528 3392
And a summary of the read-based ARG screening approach for all samples is
stored as TSV table: results/resistance_genes-reads/summary.tsv.
$ head -2 results/resistance_genes-reads/summary.tsv
Sample Gene Score Expected Template_length Template_Identity Template_Coverage Query_Identity Query_Coverage Depth q_value p_value
SRR17232957_1 lnu(C)_1_AY928180 7599 21 495 99.19 100 99.19 100 19.28 7535.38 1e-26
And a complete resistome dataset with ARG info, assembly statistics, taxonomy,
and plasmid/virus predictions (CSV table with 39 columns) is saved as:
results/assembly_database.csv.gz.
$ zless results/assembly_database.csv.gz | head -3
sample,arg,arg_identity,arg_length,arg_depth,contig,start_position,end_position,hit_length,arg_short,arg_coverage,Class,Phenotype,Resistance_mechanism,medical_importance,mean_assembly_depth_flye,mean_assembly_depth_samtools,total_assembly_length,contig_length,contig_depth_flye,circular,contig_mapped_percent,contig_depth_samtools,contig_mapped_reads,Species,Taxonomy,Species_strict,Taxonomy_strict,chromosome_score,plasmid_score,virus_score,plasmid_topology,plasmid_genes,conjugation_genes,amr_genes,virus_topology,virus_genes,virus_taxonomy,genomad_prediction
SRR17232957_1,lnu(C)_1_AY928180,99.19,495,2.02,contig_418,4000,4864,496,lnu(C),100.20202020202021,Lincosamide,Lincomycin,Enzymatic inactivation,HIA,5.768397033656589,7.170406012549915,18310891,22412,5,N,100,6.5833,67,Campylobacter coli,unclassified cellular organisms superkingdom;Campylobacterota;Epsilonproteobacteria;Campylobacterales;Campylobacteraceae;Campylobacter;Campylobacter coli,Campylobacter coli,unclassified cellular organisms superkingdom;Campylobacterota;Epsilonproteobacteria;Campylobacterales;Campylobacteraceae;Campylobacter;Campylobacter coli,0.7905,0.1618,0.0477,NA,NA,NA,NA,NA,NA,NA,chromosome
SRR17232957_1,lnu(C)_1_AY928180,99.19,495,2.02,contig_1422,15104,15802,496,lnu(C),100.20202020202021,Lincosamide,Lincomycin,Enzymatic inactivation,HIA,5.768397033656589,7.170406012549915,18310891,15802,4,N,100,4.42469,19,unclassified cellular organisms species,unclassified cellular organisms superkingdom;unclassified cellular organisms phylum;unclassified cellular organisms class;unclassified cellular organisms order;unclassified cellular organisms family;unclassified cellular organisms genus;unclassified cellular organisms species,,;;;;;;,0.9389,0.0446,0.0166,NA,NA,NA,NA,NA,NA,NA,chromosome
4. Antibiotic resistance mutations
Namaste screens metagenomes for the presence of antibiotic resistance-conferring mutations using MetaPointFinder. Its output is explained on GitHub.
Results are stored under results/resistance_mutations/{sample}/:
$ ls results/resistance_mutations/SRR17232957_1/
accession SRR17232957_1.class.dna.summary.txt SRR17232957_1.frag.gz SRR17232957_1.prot.input.tsv
class SRR17232957_1.class.prot.summary.txt SRR17232957_1.gene.dna.summary.txt SRR17232957_1.prot.updated_table_with_scores_and_mutations.tsv
dna_accession SRR17232957_1.dna.input.tsv SRR17232957_1.gene.prot.summary.txt SRR17232957_1.res
dna_class SRR17232957_1.dna.updated_table_with_scores_and_mutations.tsv SRR17232957_1.log
matched_to_contigs.tsv SRR17232957_1.error SRR17232957_1.prot.hits.txt
These results are combined in the end to make one big table with all mutations
and contig annotations from the other analysis steps:
results/mutation_database.csv.gz. (A table with 33 columns.)
$ zless results/mutation_database.csv.gz | head -3
sample,contig,class,gene,mutations,Unknown,Wildtype,Resistant,fraction_resistant,mean_assembly_depth_flye,mean_assembly_depth_samtools,total_assembly_length,contig_length,contig_depth_flye,circular,contig_mapped_percent,contig_depth_samtools,contig_mapped_reads,Species,Taxonomy,Species_strict,Taxonomy_strict,chromosome_score,plasmid_score,virus_score,plasmid_topology,plasmid_genes,conjugation_genes,amr_genes,virus_topology,virus_genes,virus_taxonomy,genomad_prediction
ERR10188468,contig_20,MACROLIDE,NZ_CP018347.1@23S_ribosomal_RNA@23S:32347-35250,None,1,NA,0,0,7.500926081831958,7.098058050597744,115259403,370730,26,N,99.9957,24.7572,2715,Streptococcus sp. LPB0220,unclassified cellular organisms superkingdom;Bacillota;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;Streptococcus sp. LPB0220,Streptococcus sp. LPB0220,unclassified cellular organisms superkingdom;Bacillota;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;Streptococcus sp. LPB0220,0.8733,0.0833,0.0434,NA,NA,NA,NA,NA,NA,NA,chromosome
ERR10188468,contig_20,MACROLIDE_STREPTOGRAMIN,NZ_CP018347.1@23S_ribosomal_RNA@23S:32347-35250,None,1,NA,0,0,7.500926081831958,7.098058050597744,115259403,370730,26,N,99.9957,24.7572,2715,Streptococcus sp. LPB0220,unclassified cellular organisms superkingdom;Bacillota;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;Streptococcus sp. LPB0220,Streptococcus sp. LPB0220,unclassified cellular organisms superkingdom;Bacillota;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;Streptococcus sp. LPB0220,0.8733,0.0833,0.0434,NA,NA,NA,NA,NA,NA,NA,chromosome
5. Taxonomic classifications
Namaste classifies contigs with Centrifuger and its database of
microbiota + human. Centrifuger produces a table in TSV format and has a
quant function to calculate microbiota profiles. Furthermore, Namaste uses
both default and 'strict' settings and assigns scientific names to each using
TaxonKit. All of these are stored as TSV files in
results/taxonomic_classification/{sample}/.
For details on the file format, consult Centrifuger's documentation.
$ ls results/taxonomic_classification/SRR17232957_1/
centrifuger_masked-quant.tsv centrifuger_masked-strict-quant.tsv centrifuger_masked-strict+taxa.tsv centrifuger_masked-strict.tsv centrifuger_masked+taxa.tsv centrifuger_masked.tsv
$ head -3 results/taxonomic_classification/SRR17232957_1/*
==> results/taxonomic_classification/SRR17232957_1/centrifuger_masked-quant.tsv <==
name taxID taxRank genomeSize numReads numUniqueReads abundance
root 1 no rank 2070143 1696 1018 1.000000
Bacteria 2 superkingdom 4534370 1628 989 0.911828
==> results/taxonomic_classification/SRR17232957_1/centrifuger_masked-strict-quant.tsv <==
name taxID taxRank genomeSize numReads numUniqueReads abundance
root 1 no rank 2070143 39 14 1.000000
Bacteria 2 superkingdom 4534370 39 14 1.000000
==> results/taxonomic_classification/SRR17232957_1/centrifuger_masked-strict+taxa.tsv <==
readID seqID taxID score 2ndBestScore hitLength queryLength numMatches
contig_1 unclassified 0 0 0 0 17948 1 ;;;;;;
contig_10 unclassified 0 0 0 0 5781 1 ;;;;;;
==> results/taxonomic_classification/SRR17232957_1/centrifuger_masked-strict.tsv <==
readID seqID taxID score 2ndBestScore hitLength queryLength numMatches
contig_1 unclassified 0 0 0 0 17948 1
contig_10 unclassified 0 0 0 0 5781 1
==> results/taxonomic_classification/SRR17232957_1/centrifuger_masked+taxa.tsv <==
readID seqID taxID score 2ndBestScore hitLength queryLength numMatches
contig_1 NZ_CP035464.1 78344 121 64 26 17948 1 Bifidobacterium pullorum unclassified cellular organisms superkingdom;Actinomycetota;Actinomycetes;Bifidobacteriales;Bifidobacteriaceae;Bifidobacterium;Bifidobacterium pullorum
contig_10 NZ_CP102287.1 679190 100 0 25 5781 1 Hoylesella buccalis unclassified cellular organisms superkingdom;Bacteroidota;Bacteroidia;Bacteroidales;Prevotellaceae;Hoylesella;Hoylesella buccalis
==> results/taxonomic_classification/SRR17232957_1/centrifuger_masked.tsv <==
readID seqID taxID score 2ndBestScore hitLength queryLength numMatches
contig_1 NZ_CP035464.1 78344 121 64 26 17948 1
contig_10 NZ_CP102287.1 679190 100 0 25 5781 1
Also, for each of these Namaste produces microbiota profiles as TSV in
results/microbiota_profile/.
$ ls results/microbiota_profile/SRR17232957_1-*
results/microbiota_profile/SRR17232957_1-per_contig.tsv results/microbiota_profile/SRR17232957_1-strict-per_contig.tsv
results/microbiota_profile/SRR17232957_1-per_species.tsv results/microbiota_profile/SRR17232957_1-strict-per_species.tsv
$ head -2 results/microbiota_profile/SRR17232957_1-*
==> results/microbiota_profile/SRR17232957_1-per_contig.tsv <==
readID seqID taxID score 2ndBestScore hitLength queryLength numMatches Species Lineage length depth total_bases grand_total Sample Percentage
contig_282 NZ_LN879430.1 1679721 66113 52062 295 22345 1 Herbinix luporum unclassified cellular organisms superkingdom;Bacillota;Clostridia;Lachnospirales;Lachnospiraceae;Herbinix;Herbinix luporum 22345 61.431 1372675.6949999998 132539452.71505 SRR17232957_1 1.0356732783189853
==> results/microbiota_profile/SRR17232957_1-per_species.tsv <==
Sample Species Lineage total_bases Percentage
SRR17232957_1 unclassified cellular organisms species unclassified cellular organisms superkingdom;unclassified cellular organisms phylum;unclassified cellular organisms class;unclassified cellular organisms order;unclassified cellular organisms family;unclassified cellular organisms genus;unclassified cellular organisms species 8955780.64689 6.757067773732448
==> results/microbiota_profile/SRR17232957_1-strict-per_contig.tsv <==
readID seqID taxID score 2ndBestScore hitLength queryLength numMatches Species Lineage length depth total_bases grand_total Sample Percentage
contig_282 family 186803 59536 59536 259 22345 1 unclassified Lachnospiraceae species unclassified cellular organisms superkingdom;Bacillota;Clostridia;Lachnospirales;Lachnospiraceae;unclassified Lachnospiraceae genus;unclassified Lachnospiraceae species 22345 61.431 1372675.6949999998 132539452.71505 SRR17232957_1 1.0356732783189853
==> results/microbiota_profile/SRR17232957_1-strict-per_species.tsv <==
Sample Species Lineage total_bases Percentage
SRR17232957_1 NA ;;;;;; 123421656.91637 93.1206930375044
In the end, per contig classifications of all samples are collected in two files:
results/classifications-concatenated.tsv.gz and
results/strict_classifications-concatenated.tsv.gz.
6. Plasmid predictions
Namaste uses geNomad to predict whether contigs derive from
chromosomal DNA, plasmids or viruses. Per sample results are saved in
results/plasmid_prediction/{sample}/.
$ ls -F results/plasmid_prediction/SRR17232957_1/
assembly_aggregated_classification/ assembly_annotate.log assembly_marker_classification/ assembly_nn_classification.log assembly_summary/
assembly_aggregated_classification.log assembly_find_proviruses/ assembly_marker_classification.log assembly_score_calibration/ assembly_summary.log
assembly_annotate/ assembly_find_proviruses.log assembly_nn_classification/ assembly_score_calibration.log
A detailed description of the outputs created by geNomad can be found in its documentation.
Output for contigs predicted to derive from plasmids and viruses are
concatenated for all samples and saved under 'results':
results/plasmid_predictions-concatenated.tsv.gz and
results/virus_predictions-concatenated.tsv.gz.
$ zless results/plasmid_predictions-concatenated.tsv.gz | head -3
contig length plasmid_topology plasmid_genes genetic_code plasmid_score fdr n_hallmarks marker_enrichment conjugation_genes amr_genes sample
contig_1753 4328 No terminal repeats 10 11 0.999 0.001 0 2.1513 NA NA SRR17232957_1
contig_1750 7896 No terminal repeats 11 11 0.999 0.001 0 0.8884 NA NA SRR17232957_1
$ zless results/virus_predictions-concatenated.tsv.gz | head -3
contig length virus_topology coordinates virus_genes genetic_code virus_score fdr n_hallmarks marker_enrichment virus_taxonomy sample
contig_941 8070 No terminal repeats NA 13 11 0.9989 0.0011 4 13.2017 Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;; SRR17232957_1
contig_837 12674 No terminal repeats NA 20 11 0.9989 0.0011 6 17.1828 Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;; SRR17232957_1