Step1: Installation of kraken2, wrt https://anaconda.org/bioconda/kraken2
$ conda install -c bioconda kraken2
Step2: Installing the taxonomy, wrt
$ kraken2-build --download-taxonomy --db kraken2
Step3: I am interested in the bacteria custom database, since Brucella is a bacterial genus,
wrt https://en.wikipedia.org/wiki/Brucella
$ kraken2-build --download-library bacteria --db kraken2
$ cd kraken2
hash.k2d: Contains the minimizer to taxon mappings
opts.k2d: Contains information about the options used to build the database
taxo.k2d: Contains taxonomy information used to build the database
$ cat ftpfilepaths | parallel -j 20 --verbose --progress "cd all && curl -O {}"
$ gunzip *.fna.gz    # decompress in place; "gunzip -c" only prints to STDOUT and leaves no *.fna files
$ cat *.fna | tee brucellome.fa
] # End of the scripts sub-section [...] required to construct "brucellome.fa"
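As a cross-check on the sequence count in brucellome.fa, here is a minimal Python sketch (not the shell pipeline used above) that concatenates every *.fna.gz in a directory into one multi-FASTA file and reports how many records each input file contributed. It assumes the downloads sit in one directory and that every record starts with a ">" header line; concatenate_fasta is a hypothetical name.

```python
import gzip
from pathlib import Path


def concatenate_fasta(input_dir: str, output_path: str) -> dict:
    """Concatenate every *.fna.gz in input_dir into one multi-FASTA file.

    Returns {file name: number of '>' header lines} so the total number of
    sequences written can be checked against expectations.
    """
    counts = {}
    with open(output_path, "w") as out:
        for path in sorted(Path(input_dir).glob("*.fna.gz")):
            n_records = 0
            last_line = "\n"
            # "rt" decompresses on the fly and yields text lines
            with gzip.open(path, "rt") as handle:
                for line in handle:
                    if line.startswith(">"):
                        n_records += 1
                    out.write(line)
                    last_line = line
            # Guard against a file without a trailing newline, which would
            # otherwise fuse its last line with the next file's first header
            if not last_line.endswith("\n"):
                out.write("\n")
            counts[path.name] = n_records
    return counts
```

For example, concatenate_fasta("all", "brucellome.fa") returns per-file record counts; summing the values gives the number of sequences written, which makes it easy to see how many chromosomes each assembly contributes.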
Step-6: The STDOUT from the previous step was refined (to print classified sequences to a file):
$ kraken2 --db . brucellome.fa --classified-out bruce47.out
Step-7: To print a report with aggregate counts per clade to a file, I did:
$ kraken2 --db . --report bruce47.txt bruce47.out
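Besides the report, kraken2 writes one tab-delimited line per sequence to STDOUT (or to a file given with --output): a C/U flag, the sequence ID, the assigned taxonomy ID, the sequence length, and the k-mer LCA mapping. A minimal sketch, assuming that standard per-sequence format, to tally classified vs. unclassified sequences and sequences per taxid; summarize_kraken2_output is a hypothetical helper. If --use-names was given, the taxid column holds the name and taxid together, and the sketch simply counts that text as-is.

```python
from collections import Counter


def summarize_kraken2_output(lines):
    """Tally Kraken 2 per-sequence output lines.

    Each line is tab-delimited: C/U flag, sequence ID, taxonomy ID,
    sequence length, k-mer LCA mapping. Returns
    (classified count, unclassified count, Counter of taxid column values).
    """
    classified = unclassified = 0
    per_taxid = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        status, taxid = fields[0], fields[2]
        if status == "C":
            classified += 1
            per_taxid[taxid] += 1
        else:
            unclassified += 1
    return classified, unclassified, per_taxid
```

For example, summarize_kraken2_output(open("bruce47.kraken")) would work on a file saved with --output bruce47.kraken (a hypothetical file name).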
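The output of the Kraken 2 report (--report), as we know, is tab-delimited, with one line per taxon. The fields of the output, from left to right, are as follows:
(1) Percentage of fragments covered by the clade rooted at this taxon
(2) Number of fragments covered by the clade rooted at this taxon
(3) Number of fragments assigned directly to this taxon
(4) A rank code, indicating (U)nclassified, (R)oot, (D)omain, (K)ingdom, (P)hylum, (C)lass, (O)rder, (F)amily, (G)enus, or (S)pecies. Taxa that are not at any of these 10 ranks have a rank code formed from the rank code of the closest ancestor rank plus a number indicating the distance from that rank (e.g., "G2")
(5) NCBI taxonomic ID number
(6) Indented scientific name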
The scientific names are indented using spaces, according to the tree structure specified by the taxonomy.
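To make bruce47.txt scriptable (for example, to pull out genus- or species-level rows), here is a minimal parsing sketch. It assumes the standard six tab-delimited columns (percentage, clade count, direct count, rank code, taxid, indented name) and two spaces of indentation per taxonomy level; parse_kraken2_report and ReportRow are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ReportRow:
    percent: float
    clade_reads: int
    direct_reads: int
    rank: str
    taxid: int
    name: str
    depth: int  # taxonomy depth inferred from leading spaces (assumed 2 per level)


def parse_kraken2_report(lines):
    """Parse standard six-column Kraken 2 report lines into ReportRow objects."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            continue  # skip lines that do not have the standard six columns
        pct, clade, direct, rank, taxid, raw_name = fields
        name = raw_name.lstrip(" ")
        depth = (len(raw_name) - len(name)) // 2
        rows.append(ReportRow(float(pct), int(clade), int(direct),
                              rank.strip(), int(taxid), name, depth))
    return rows
```

For example, [row.name for row in parse_kraken2_report(open("bruce47.txt")) if row.rank == "S"] would list the species-level names.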
By default, the values of k and l are 35 and 31, respectively (or
15 and 12 for protein databases). These values can be explicitly set
with the --kmer-len and --minimizer-len options, however. Note that
the minimizer length must be no more than 31 for nucleotide databases,
and 15 for protein databases. Additionally, the minimizer length l
must be no more than the k-mer length. There is no upper bound on
the value of k, but sequences less than k bp in length cannot be
classified.
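The length rules quoted above can be checked before launching a build, e.g. when trying k-mer lengths other than the default. This is a hypothetical helper written only from the manual's text quoted here; it is not part of kraken2-build.

```python
def check_kraken2_lengths(k: int, l: int, protein: bool = False) -> list:
    """Return rule violations for k-mer length k and minimizer length l
    (an empty list means the combination satisfies the quoted limits)."""
    problems = []
    max_l = 15 if protein else 31  # nucleotide: l <= 31, protein: l <= 15
    if l > max_l:
        problems.append(f"minimizer length {l} exceeds {max_l}")
    if l > k:
        problems.append(f"minimizer length {l} exceeds k-mer length {k}")
    return problems


def classifiable(seq_len: int, k: int) -> bool:
    """Sequences shorter than k bp cannot be classified."""
    return seq_len >= k
```

For example, check_kraken2_lengths(47, 31) returns an empty list (there is no upper bound on k), while classifiable(40, 47) is False.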
Kraken 2 also utilizes a simple spaced seed approach to increase accuracy. A number s < l/4 can be chosen, and s positions in the minimizer will be masked out during all comparisons. Masked positions are chosen to alternate from the second-to-last position in the minimizer; e.g., s = 5 and l = 31 will result in masking out the 0 positions shown here:
111 1111 1111 1111 1111 1101 0101 0101
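To see how the masking rule produces the pattern above, here is a minimal sketch that rebuilds the mask string from s and l purely from the manual's description (masked positions alternate leftwards from the second-to-last position, with s < l/4). It illustrates the description only; it is not Kraken 2's internal code, and spaced_seed_mask is a hypothetical name.

```python
def spaced_seed_mask(l: int, s: int) -> str:
    """Render a spaced-seed mask: '1' = compared position, '0' = masked.

    Masked positions start at the second-to-last position and skip every
    other position moving left. Output is grouped in blocks of four from
    the right, matching the layout in the manual.
    """
    if not 0 <= s < l / 4:
        raise ValueError("the manual requires s < l/4")
    bits = ["1"] * l
    for i in range(s):
        bits[l - 2 - 2 * i] = "0"
    mask = "".join(bits)
    groups = []
    end = len(mask)
    while end > 0:
        groups.append(mask[max(0, end - 4):end])
        end -= 4
    return " ".join(reversed(groups))
```

spaced_seed_mask(31, 5) reproduces the line shown above, and spaced_seed_mask(31, 7) gives the nucleotide default.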
By default, s = 7 for nucleotide databases, and s = 0 for
protein databases. This can be changed using the --minimizer-spaces
option along with the --build task of kraken2-build. Reference:
https://github.com/DerrickWood/kraken2/wiki/Manual#custom-databases [Section 3].
Dear Member, it is after Step-7 that I am stuck. How do I:
(1) perform binning, using sourmash for instance?
(2) generate heatmaps?
(3) detect prevalence?
(4) estimate relative abundances?
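For (2) to (4), one simple starting point is to take the clade read counts of the species-level rows (rank code S) from each sample's Kraken 2 report as that sample's profile. The minimal sketch below assumes those counts are already collected into one dict per sample (all names are hypothetical): it normalises them into relative abundances, arranges a taxon-by-sample matrix that any plotting library could render as a heatmap, and computes prevalence as the fraction of samples in which a taxon exceeds an abundance threshold. This is plain count normalisation, not a statistical abundance re-estimation (dedicated tools such as Bracken exist for that), and it does not cover binning with sourmash. Prevalence only becomes meaningful across several samples, e.g. several DRA runs rather than the single brucellome.fa.

```python
from __future__ import annotations


def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Normalise read counts so they sum to 1 within one sample."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


def abundance_matrix(samples: dict[str, dict[str, int]]):
    """Build a taxon x sample matrix of relative abundances.

    Returns (taxa, sample_names, matrix) where matrix[i][j] is the relative
    abundance of taxa[i] in sample_names[j]; a taxon absent from a sample gets 0.
    """
    sample_names = sorted(samples)
    taxa = sorted({taxon for counts in samples.values() for taxon in counts})
    rel = {name: relative_abundance(samples[name]) for name in sample_names}
    matrix = [[rel[name].get(taxon, 0.0) for name in sample_names] for taxon in taxa]
    return taxa, sample_names, matrix


def prevalence(matrix: list[list[float]], min_abundance: float = 0.0) -> list[float]:
    """Fraction of samples in which each taxon's relative abundance exceeds min_abundance."""
    return [
        sum(value > min_abundance for value in row) / len(row) if row else 0.0
        for row in matrix
    ]
```

The matrix rows line up with taxa and the columns with sample_names, which is the layout most heatmap functions expect.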
Also, is it technically (and biologically) correct to inflate 140 -> 277 sequences in a multi-FASTA file by merging all the chromosomes of the individual Brucella species?
I sincerely seek your help to proceed further.
Also, as a purely computational exploration, I wish to do trial-and-error experiments based on:
(1) various k-mer lengths, apart from 47
(2) simulated whole-genome shotgun metagenomic reads generated with the
dwgsim tool, besides Brucella isolate FASTQ files from DDBJ-DRA
https://ddbj.nig.ac.jp/DRASearch/query?keyword=Brucella+metagenome&show=20
https://davetang.org/wiki/tiki-index.php?page=DWGSIM
Investigating the diversity of Brucella isolates,
which corresponds to the NCBI BioProject,
Typical results for the 47-mer Brucellome classification are enclosed for reference.
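For the k-mer sweep in (1), one quick sanity check follows from the manual's rule that sequences shorter than k bp cannot be classified: for each candidate k, compute the fraction of reads that are at least k bp long. The sketch below assumes uncompressed FASTQ with exactly four lines per record; the function names are hypothetical, and the result is only an upper bound on classifiable reads, not a prediction of classification rates.

```python
def fastq_read_lengths(path: str) -> list:
    """Sequence lengths from a plain FASTQ file with four lines per record."""
    lengths = []
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the second line of each record holds the sequence
                lengths.append(len(line.rstrip("\n")))
    return lengths


def classifiable_fraction(read_lengths: list, k_values: list) -> dict:
    """For each candidate k, the fraction of reads at least k bp long."""
    n = len(read_lengths)
    return {
        k: (sum(length >= k for length in read_lengths) / n if n else 0.0)
        for k in k_values
    }
```

For example, classifiable_fraction(fastq_read_lengths("sim.fastq"), [31, 35, 47]) compares candidate k values for a hypothetical simulated read file.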

Thanking you. Sincerely yours,
Praharshit Sharma https://www.linkedin.com/in/bioinformaticsharma/
+91 93 511 755 63 "He who can, Does. He who cannot, Teaches."
--
Bioclues is India's largest bioinformatics society working for bridging mentor mentee relationships. We are an affiliate of APBioNet.org and ISCB.org
http://www.bioclues.org
Facebook: https://www.facebook.com/groups/bioclues/
Linkedin: http://www.linkedin.com/groups?gid=1339327
We respect your privacy. Should you feel you are not interested, please unsubscribe from the googlegroup. Alternatively, you can send a request to the moderator. Aside, we would highly appreciate if you respect open access.
---
You received this message because you are subscribed to the Google Groups "Bioclues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bioclues+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bioclues/CALrM1gAU7ZYThdfQgRqKxDBVoGq1CnySSznGze2SgbQ8epUNJQ%40mail.gmail.com.