So, my first doubt is about how to implement the UniRef90 reference database. I've been looking through the documentation and I am not sure if I have to create a brand new module for it or simply add the lines below to the human_gut_profiler.ngl script:

    references:
        - name: 'Uniref'
          fasta-file: '../humann3/uniref/uniref90.fa'
    ...

    igc_mapped = map(non_human_reads, reference='Uniref', mode_all=True)
My second doubt is about the results. In the specI.raw.counts.txt, motus.counts.txt and specI.scaled.counts.txt files, the results are organised by cluster rather than by microbial species. I have tried to look those clusters up in the mOTUs database, but without success, so I don't really know where those clusters come from, how to interpret them, or what they are referring to.
Similar problem with the eggNOG.traditional.counts.txt file: I don't really know where to look up the codes (for instance, 004S6@aciNOG) to retrieve information from the results.
No, this is mixing up two different ways of doing things. If you want to just use it in the script, you can pass the FASTA file directly:

    uniref_mapped = map(non_human_reads, fafile='../humann3/uniref/uniref90.fa', mode_all=True)

Alternatively, you can build a "module" that encodes these paths using the YAML format.
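The module route would look roughly like the sketch below. This is only a sketch: the field names follow the references block quoted in the question, and the module name and version are made up — please check the NGLess external-modules documentation for the exact schema.

```yaml
# module.yaml — hypothetical sketch of an NGLess external module
# wrapping a local UniRef90 FASTA file. Field names mirror the
# 'references' block from the question; consult the NGLess
# external-modules docs for the authoritative schema.
name: 'UnirefModule'
version: '0.0.1'
references:
    -
        name: 'Uniref'
        fasta-file: '../humann3/uniref/uniref90.fa'
```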
I don't know if you have followed some of the most recent discussions, but we are now advising people to use a more recent version of motus: https://github.com/ngless-toolkit/ngless-contrib/tree/master/motus.ngm
These refer to eggnog OGs: http://eggnog5.embl.de/#/app/results
> These refer to eggnog OGs: http://eggnog5.embl.de/#/app/results

I found it in other threads of the forum, so sorry for asking before looking through them. Anyhow, I can't find some of the codes at that link; I guess that's because it is eggNOG 5, while NGLess is built with eggNOG 4.5 (if I remember well). So, do you know where I can find the eggNOG 4.5 website, annotations, database, or whatever?
Also, I would like to raise a doubt I have about the preprocessing step. I am working with paired-end metatranscriptomics data that we processed with the kneaddata software to remove human RNA and ribosomal RNA, so I can skip the preprocessing step in the ngl scripts. I have been trying different combinations of functions and steps in the scripts, and I have noticed that, when I skip the preprocessing step, the total number of reads reported during the mapping step is similar to the sum of the sequences in the two fastq.gz files. Meanwhile, if I keep the preprocessing step, the number of reads is a bit lower than the number of sequences in just one of the files. In both cases, load_mocat_sample recognised the two fastq.gz files as paired-end files. So it seems like the preprocessing step puts the paired reads together. Do you know what this is due to, or what the rationale behind it is?
Please, if you have any doubt about the question or need the log files, don't hesitate to ask for them.
Thank you very much, best regards.
Iñigo.
--You received this message because you are subscribed to the Google Groups "NGLess" group.To unsubscribe from this group and stop receiving emails from it, send an email to ngless+un...@googlegroups.com.To view this discussion on the web visit https://groups.google.com/d/msgid/ngless/75050f22-e095-4c31-b9bd-3b36771c9826n%40googlegroups.com.
I also have the same concern about database creation using my own data, for example, the genes from my de novo assembly of a metagenomics dataset. Are there any requirements for the gene nucleotide FASTA file, e.g., naming or special characters?
And am I supposed to use bwa to create the index and database from that FASTA file, or will it be created automatically when NGLess finds that there is no bwa index for it?
Moreover, for the current reference database, you provide the annotation file OM-RGC.functional.map.gz. May I know how you link the annotation file to the FASTA file? Is it via the module.yaml file?
I also noticed that you have the Makefile there.
Please kindly suggest how I should link them on the command line, or how to create a module like you did. Do you have any documents for that?
Regards,
Xianghui

On Tuesday, 29 September 2020 at 17:39:17 UTC+8, lu...@luispedro.org wrote:
> Hi Iñigo,
> (Sorry for the delay, your message got lost in the shuffle a bit.)
> [...]
> Information as to what the clusters are can be found on the original motus tool website.
> [...]
ngless-count.py [-h] -i INPUT -o OUTPUT [-f FEATURES] [-m {dist1,all1,1overN,unique_only}] [--auto-install] [--debug]
ngless-map.py [-h] -i INPUT [-i2 INPUT_REVERSE] [-s INPUT_SINGLES] -o OUTPUT [--auto-install] [--debug] (-r {sacCer3,susScr11,ce10,dm3,gg4,canFam2,rn4,bosTau4,mm10,hg19} | -f FASTA)
My question is how to specify the reference file. -r uses the reference database name, so do we still need to create the database first?
Moreover, if I only have the database FASTA file and do not have an annotation, will the map function skip checking for an annotation file, and will the count function only look for an annotation file when features are specified?
Please kindly suggest,
Regards
Xianghui
I did not use parallel, but I got:

    The collect() call at line 16 could not be executed as there are partial results missing.

What does that mean? That a gene does not have an annotation? And line 16 of which file?
1. If I run the job without an annotation file and summarise the results per sequence, then keep only the sequences with mapped reads and annotate those genes with eggNOG (this can save the time of annotating the whole database with eggNOG), can I combine the per-sequence summary with the eggNOG annotations of those genes to generate a new summary table based on KOs?
2. I am running the job on an HPC using the PBS qsub system. I have multiple samples and I managed to create a sample file like tara.demo.sampled; however, I do not know how to work with your 'parallel'. Could you kindly give an example sh file together with an ngl file?

3. It seems that ngless processes one sample from the igc.demo.short list each time, right? Since there are 3 samples in the list, I should run ngless 3 times, so this is my qsub sh file. If a result file, e.g. igc.profiles.txt, already exists, will it be overwritten, or will the program check first?
#!/bin/bash
### Specify the job name
#PBS -N sra
### Specify the project code
#PBS -P all
### Specify the queue
#PBS -q std
### The directive below merges standard output & error
#PBS -j oe
### Specify the requested cpu resource
#PBS -l select=1:ncpus=48
### Specify the standard output file / folder
#PBS -o sra.logs
### Change to current working directory
cd $PBS_O_WORKDIR
pwd
which ngless
ngless gut-demo.ngl
ngless gut-demo.ngl
ngless gut-demo.ngl
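On point 1, the combining step itself is just a join-and-sum. A rough Python sketch (the dict layouts here are assumptions for illustration, not the actual NGLess or eggNOG-mapper output formats) of folding a per-gene count table through a gene-to-KO map:

```python
from collections import defaultdict

def counts_by_ko(gene_counts, gene2ko):
    """Sum per-gene counts into per-KO counts.

    gene_counts: dict gene_id -> count
                 (e.g. parsed from the per-sequence summary table)
    gene2ko:     dict gene_id -> list of KO ids
                 (e.g. parsed from eggNOG-mapper output for the
                 genes that had reads mapped)
    """
    ko_counts = defaultdict(float)
    for gene, count in gene_counts.items():
        # a gene with no annotation simply contributes to no KO
        for ko in gene2ko.get(gene, []):
            ko_counts[ko] += count
    return dict(ko_counts)
```

Note that a gene annotated with several KOs is counted once per KO here; whether you want that, or a fractional split, is a methodological choice.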
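On points 2 and 3, one simple pattern is to loop over the samples inside the job script rather than repeating the same command. A sketch only (sample names and the script name are placeholders; the idea of passing the sample name as an argument assumes the .ngl script reads it, e.g. via NGLess's ARGV):

```shell
# Sketch: print one ngless invocation per sample.
# In a real job script, run the command instead of echoing it.
for sample in SAMPLE_A SAMPLE_B SAMPLE_C; do
    echo "ngless gut-demo.ngl $sample"
done
```

This makes each run target a different sample, so the runs no longer all write the same output file.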