Re: SeqMule for Analysis of Arabidopsis thaliana

Juan Manuel Luque Sánchez

Jun 5, 2021, 7:39:03 AM
to WGLab/SeqMule, yunfei guo, seqmu...@googlegroups.com
Dear Yunfei,

It has been a long time since I last wrote to you. I would like to ask why, about a year ago, SeqMule worked well for me with Java 1.8 (no problems with GATK), whereas now (yesterday), with the same Java 1.8 and no changes to my SeqMule installation, I get an error with GATK:
SeqMule Execution Status: step 7 FAILED, gatklite realn
ERROR: command failed.
(The same as issue #123 in the SeqMule wiki.)

Why was there no problem a year ago, but there is one now? Should I install Java 1.7 on my computer to solve the problem, and can I then delete Java 1.8?

I look forward to your answer. Thank you very much for your help.

Best regards

Juan Manuel Luque

On Thu, Sep 5, 2019 at 10:02, Juan Manuel Luque Sánchez (<j616...@gmail.com>) wrote:
Yunfei, sorry, I forgot: the GitHub issue number is #176.

---------- Forwarded message ---------
From: Juan Manuel Luque Sánchez <j616...@gmail.com>
Date: Wed, Sep 4, 2019 at 14:44
Subject: Re: SeqMule for Analysis of Arabidopsis thaliana
To: yunfei guo <guoyun...@outlook.com>


Hi Yunfei,
I have just sent you the following question via GitHub (WGLab/SeqMule):


As you know, we are trying to apply SeqMule to Arabidopsis thaliana, and we would like to remove these messages:
NOTICE: Reference genome build is hg19
and
NOTICE: dbsnp138 will be used for variant calling and recalibration
which appear when the pipeline starts (or to change the hg19 message so that it names a different reference genome build).
We have changed the following parameters:
go_dbsnp=0 #dbSNP file for GATK only
go_hapmap=0 #HapMap file for GATK
go_dbsnpver=0 #dbSNP version for variant recalibration and annotation, default is 137
go_kg=0 #1000 Genome project VCF file
go_buildver=0 #genome build version, default is hg19
and also:
o_gatklite_forceSNPHardFilter=1 #if set to 1, force using hard filtering on SNP, otherwise seqmule use VQSR only if BAM size is larger than 1GB. CAUTION: do NOT set o_gatklite_forceSNPVQSR and o_gatklite_forceSNPHardFilter both to 1!
and:
o_gatklite_forceINDELHardFilter=1 #if set to 1, force using hard filtering on INDEL, otherwise seqmule use VQSR only if BAM size is larger than 15GB. CAUTION: do NOT set o_gatklite_forceINDELVQSR and o_gatklite_forceINDELHardFilter both to 1!

But the NOTICE messages are still the same (hg19 and dbsnp138) when we run the pipeline.

What can we do to solve this?
Thank you very much.
Best.
Juan M.


On Sat, Aug 24, 2019 at 15:13, yunfei guo (<guoyun...@outlook.com>) wrote:
Hi Juan,

The seqmule version used to generate the script is different from the version you are executing. Please rerun the script generation. Thanks.

On Aug 23, 2019, at 1:57 AM, Juan Manuel Luque Sánchez <j616...@gmail.com> wrote:

Hi Yunfei,
I need to ask you about an error with SeqMule (I have searched the SeqMule FAQ and the internet, but found nothing about it):

When I run  $ perl seqmule run PRUEBA1.script  (PRUEBA1 is an unfinished pipeline that I want to restart), the output is:
Current version: 1.2.3
ERROR: incompatible execution script version
Supported versions: 1.2

Yesterday I did the same with a different script without any problems, and it ran through to the end of the pipeline.

Could you help me, please?

Thank you very much.

Juan M.

On Thu, Aug 22, 2019 at 17:47, yunfei guo (<guoyun...@outlook.com>) wrote:
Hi Juan,

Thanks for sharing the good news. Glad to hear that. Enjoy your research!

Best,
Yunfei

On Aug 22, 2019, at 4:52 AM, Juan Manuel Luque Sánchez <j616...@gmail.com> wrote:

Hi Yunfei,
I have good news for you. For the first time, the whole SeqMule pipeline for Arabidopsis thaliana ran to the end without problems and produced variant results. This is the command we used:

$ perl seqmule pipeline -a Arabthcontrol_R1.fastq.gz -b Arabthcontrol_R2.fastq.gz -capture TAIR10_chr.all.bed -t 1 -prefix  PRUEBA1 --ref TAIR10_chr.all.fasta -e


We had to rename Chr to chr (using commands) in TAIR10_chr.all.bed and TAIR10_chr.all.fasta, because SeqMule gave us ERROR: NO UPPERCASE ALLOWED IN chr OF A CONTIG NAME: Chr C..... Once we changed those letters in these files, the process ran fine to the end and gave results. We will now analyze the results and continue testing.
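
The renaming step described above could be done along these lines. This is only a minimal Python sketch, not the commands actually used in the thread (those are not shown). The function names are hypothetical, and it assumes every affected contig name appears as `Chr` at the very start of a FASTA header (after `>`) or of a BED line.

```python
def lowercase_chr_prefix(line: str) -> str:
    """Turn a leading 'Chr' into 'chr' in a FASTA header or BED line.

    FASTA headers start with '>', BED lines start with the contig name.
    Any other line (for example sequence data) is returned unchanged.
    """
    if line.startswith(">Chr"):
        return ">chr" + line[4:]
    if line.startswith("Chr"):
        return "chr" + line[3:]
    return line


def rename_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path line by line, fixing the contig prefix."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(lowercase_chr_prefix(line))
```

After renaming, any index built from the old FASTA (such as the .fai file or the bwa index files) would need to be regenerated, since those files store the contig names.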

Meanwhile, thank you very much for your help. Best.

Juan M.


On Wed, Aug 21, 2019 at 5:44, yunfei guo (<guoyun...@outlook.com>) wrote:
TAIR10_Chr.all.dict is NOT a .bed file but a .dict file. Please generate the correct .bed file from the .fai file using fai2bed.pl. Thanks.
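
For readers in the same situation, the idea of deriving a whole-contig BED file from a .fai index can be sketched as follows. This is not the fai2bed.pl script itself, whose exact output is not shown in the thread. It assumes the standard samtools faidx layout (contig name in column 1, contig length in column 2, tab-separated) and emits one 0-based interval per contig. The function name and the sample lengths in the usage note are illustrative only.

```python
from typing import List


def fai_to_bed_lines(fai_text: str) -> List[str]:
    """Convert the text of a .fai index into whole-contig BED lines.

    Each .fai row is: name, length, offset, linebases, linewidth.
    Each BED row produced is: name, 0, length (0-based, end-exclusive).
    """
    bed_lines = []
    for row in fai_text.splitlines():
        if not row.strip():
            continue  # skip blank lines
        fields = row.split("\t")
        name, length = fields[0], int(fields[1])
        bed_lines.append(f"{name}\t0\t{length}")
    return bed_lines
```

For instance, `fai_to_bed_lines("chr1\t1000\t6\t60\t61\n")` yields a single BED line covering positions 0 to 1000 of `chr1`. The resulting lines can then be written to a plain-text .bed file.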

On Aug 20, 2019, at 2:55 PM, Juan Manuel Luque Sánchez <j616...@gmail.com> wrote:

Yunfei, I ran the fai2bed.pl script on a .fai file of TAIR10 Arabidopsis and generated a TAIR10_Chr.all.dict file (the same one I attached to my last mail). But now I don't know how to convert this .dict file into a BED file (.bed). I have consulted several sources, but I still don't know how to do it. Could you please help me get this BED file for Arabidopsis, so that I can continue testing SeqMule with Arabidopsis?

Thank you very much

Juan M.

On Tue, Aug 20, 2019 at 19:34, yunfei guo (<guoyun...@outlook.com>) wrote:
The bed file contains non-standard characters and cannot be used.

Please use the .fai file to generate a BED file.
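
The thread does not say why Arabth.bed contained non-text bytes. One possibility, stated here purely as an assumption, is that the file was saved in compressed form. A quick check along the following lines could tell a gzip-compressed file from a plain-text BED file and flag obviously malformed lines. The function names are hypothetical, and the line check covers only the first three BED columns (contig, start, end).

```python
from typing import Optional

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def looks_like_gzip(head: bytes) -> bool:
    """Return True if the leading bytes carry the gzip magic number."""
    return head[:2] == GZIP_MAGIC


def bed_line_problem(line: str) -> Optional[str]:
    """Return a short description of what is wrong with a BED line, or None."""
    if not line.strip() or line.startswith(("#", "track", "browser")):
        return None  # blank, comment and header lines are not intervals
    fields = line.split()
    if len(fields) < 3:
        return "fewer than 3 fields"
    try:
        start, end = int(fields[1]), int(fields[2])
    except ValueError:
        return "start/end are not integers"
    if start < 0 or start > end:
        return "start is negative or greater than end"
    return None
```

To use it on a real file, read the first two bytes in binary mode (`open(path, "rb").read(2)`) for `looks_like_gzip`, and only if that returns False, read the file as text and pass each line to `bed_line_problem`.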

On Aug 20, 2019, at 9:50 AM, Juan Manuel Luque Sánchez <j616...@gmail.com> wrote:

Hi Yunfei,
Here is the command you asked me to run, and its output:
ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~$ head /media/ubuntu/DATOS/SeqMule-master/bin/Arabth.bed
[binary, non-text output omitted]
ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~$

Separately, I generated another file for Arabidopsis with the fai2bed.pl script you recommended; this is TAIR10_Chr.all.dict (attached so that you can see it). I don't know whether I can use this .dict file as the Arabidopsis .bed file in the SeqMule pipeline, or whether it must be converted to a .bed file. Could you please answer this question? The Arabth.bed file was generated from the UCSC Table Browser.

Thank you very much

Juan M.

On Tue, Aug 20, 2019 at 15:13, yunfei guo (<guoyun...@outlook.com>) wrote:
Could you show me the first few lines of your bed file? i.e. run

head /media/ubuntu/DATOS/SeqMule-master/bin/Arabth.bed

The bed file apparently contains some malformed lines.

On Aug 20, 2019, at 4:25 AM, Juan Manuel Luque Sánchez <j616...@gmail.com> wrote:

Hi Yunfei,
I have put the reference FASTA and its index files in the same bin folder (previously they were in the database folder), and the pipeline now starts without problems and runs until step 6, where it gives some errors. Here is the last part of the output:

[M::main_mem] read 28714 sequences (2167099 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (3, 10189, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (124, 143, 162)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (48, 238)
[M::mem_pestat] mean and std.dev: (143.70, 28.53)
[M::mem_pestat] low and high boundaries for proper pairs: (10, 276)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 28714 reads in 10.316 CPU sec, 10.359 real sec
[main] Version: 0.7.8-r455
[main] CMD: /media/ubuntu/DATOS/SeqMule-master/exe/bwa/bwa mem -M -T 0 -A 1 -B 4 -O 6 -E 1 -L 5 -U 17 -R @RG\tID:READGROUP\tSM:PRUEBA1\tPL:ILLUMINA\tLB:LIBRARY TAIR10_Chr.all.fasta PRUEBA1_result/PRUEBA1.0.fastq.gz PRUEBA1_result/PRUEBA1.1.fastq.gz
[main] Real time: 5750.395 sec; CPU: 5357.148 sec
[bam_sort_core] merging from 7 files...

----------NOTICE----------
[ => SeqMule Execution Status: Running 5 of 19 steps: Filter BAM file by mapping quality, at mar ago 20 11:59:35 CEST 2019, Time Elapsed: 1 hr 42 min 16 s]
[ => SeqMule Execution Status: step 4 is finished at mar ago 20 11:59:35 CEST 2019, bwamem alignment]

[samopen] SAM header is present: 7 sequences.

----------NOTICE----------
[ => SeqMule Execution Status: Running 6 of 19 steps: gatklite realn, at mar ago 20 12:05:26 CEST 2019, Time Elapsed: 1 hr 48 min 7 s]
[ => SeqMule Execution Status: step 5 is finished at mar ago 20 12:05:26 CEST 2019, Filter BAM file by mapping quality]

INFO  11:05:30,333 ArgumentTypeDescriptor - Dynamically determined type of /media/ubuntu/DATOS/SeqMule-master/bin/Arabth.bed to be BED
INFO  11:05:30,373 HelpFormatter - --------------------------------------------------------------------------------
INFO  11:05:30,378 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.3-9-gdcdccbb, Compiled 2013/01/11 20:03:13
INFO  11:05:30,379 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO  11:05:30,379 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO  11:05:30,384 HelpFormatter - Program Args: -T RealignerTargetCreator -I PRUEBA1_result/PRUEBA1.0_bwamem.sort.readfiltered.bam -R TAIR10_Chr.all.fasta -o /tmp/9785.47397996627.tmp.intervals -L /media/ubuntu/DATOS/SeqMule-master/bin/Arabth.bed
INFO  11:05:30,389 HelpFormatter - Date/Time: 2019/08/20 11:05:30
INFO  11:05:30,389 HelpFormatter - --------------------------------------------------------------------------------
INFO  11:05:30,389 HelpFormatter - --------------------------------------------------------------------------------
INFO  11:05:30,404 GenomeAnalysisEngine - Strictness is SILENT
INFO  11:05:30,413 ReferenceDataSource - Dict file /media/ubuntu/DATOS/SeqMule-master/bin/TAIR10_Chr.all.dict does not exist. Trying to create it now.
[Tue Aug 20 11:05:30 GMT+01:00 2019] net.sf.picard.sam.CreateSequenceDictionary REFERENCE=/media/ubuntu/DATOS/SeqMule-master/bin/TAIR10_Chr.all.fasta OUTPUT=/media/ubuntu/DATOS/SeqMule-master/bin/dict4544612316941911242.tmp    TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Tue Aug 20 11:05:30 GMT+01:00 2019] Executing as ubuntu@ubuntu-Compaq-CQ58-Notebook-PC on Linux 4.4.0-157-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_95-b00; Picard version: null
[Tue Aug 20 11:05:32 GMT+01:00 2019] net.sf.picard.sam.CreateSequenceDictionary done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=224919552
INFO  11:05:32,919 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000, Using the new downsampling implementation
INFO  11:05:32,927 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO  11:05:32,942 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01
WARN  11:05:34,774 RestStorageService - Error Response: PUT '/GATK_Run_Reports/N4luPhsFtHlJVINNdiniUW1wonsQ6Yt3.report.xml.gz' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Content-Length: 1319, Content-MD5: FWO1KXgQ4Biz3jmVwmMsZQ==, Content-Type: application/octet-stream, x-amz-meta-md5-hash: 1563b5297810e018b3de3995c2632c65, Date: Tue, 20 Aug 2019 10:05:33 GMT, Authorization: AWS AKIAJXU7VIHBPDW4TDSQ:ipaptihLbLa5PHalhfBn0yU6xTA=, User-Agent: JetS3t/0.8.1 (Linux/4.4.0-157-generic; amd64; es; JVM 1.7.0_95), Host: s3.amazonaws.com, Expect: 100-continue], Response Headers: [x-amz-request-id: E538A8F2F2342832, x-amz-id-2: 9cI/rc7m+bAVFqwbqtkK8p5rC/9nIImfXXZfirwgGpw71aew7ipCm6/f9BXCMbY3tppWfYR1bfk=, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Tue, 20 Aug 2019 10:05:34 GMT, Connection: close, Server: AmazonS3]
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 2.3-9-gdcdccbb):
##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
##### ERROR Please do not post this error to the GATK forum
##### ERROR
##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: File associated with name /media/ubuntu/DATOS/SeqMule-master/bin/Arabth.bed is malformed: Problem reading the interval file caused by
##### ERROR Line: [binary, non-text content omitted]
##### ERROR ------------------------------------------------------------------------------------------

----------ERROR----------
[ => SeqMule Execution Status: step 6 FAILED at mar ago 20 12:05:35 CEST 2019, gatklite realn]
ERROR: command failed
/media/ubuntu/DATOS/SeqMule-master/bin/secondary/../../bin/secondary/worker /media/ubuntu/DATOS/SeqMule-master/bin/seqmule.08202019.3989.logs 6 "/media/ubuntu/DATOS/SeqMule-master/bin/secondary/../../bin/secondary/runGATKLITEREALN -advanced PRUEBA1.config -n 5 -ref TAIR10_Chr.all.fasta -java java -jmem 1750m -gatk /media/ubuntu/DATOS/SeqMule-master/exe/gatklite/GenomeAnalysisTKLite.jar -threads 1 -gatk-nt 2 -tmpdir /tmp  -bed /media/ubuntu/DATOS/SeqMule-master/bin/Arabth.bed -goldindel /media/ubuntu/DATOS/SeqMule-master/bin/secondary/../../database/Mills_and_1000G_gold_standard.indels.b37.vcf  -dbsnp /media/ubuntu/DATOS/SeqMule-master/bin/secondary/../../database/dbsnp_hg19_138.vcf  -samtools /media/ubuntu/DATOS/SeqMule-master/exe/samtools/samtools  -pl ILLUMINA  -bam PRUEBA1_result/PRUEBA1.0_bwamem.sort.readfiltered.bam -out PRUEBA1_result/PRUEBA1.0_bwamem.sort.readfiltered.realn.bam "
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
After fixing the problem, please execute 'cd //media/ubuntu/DATOS/SeqMule-master/bin' and 'seqmule run PRUEBA1.script' to resume analysis.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
ubuntu@ubuntu-Compaq-CQ58-Notebook-PC://media/ubuntu/DATOS/SeqMule-master/bin$ perl seqmule pipeline -a Arabthcontrol_R1.fastq.gz -b Arabthcontrol_R2.fastq.gz -capture Arabth.bed -t 1 -prefix  PRUEBA1 --ref TAIR10_Chr.all.fasta -e

The output also says that the Arabth.bed file is malformed. I will try to get a different BED file for Arabidopsis thaliana, but please review the output and tell me whether I need to change or correct anything.
Thank you very much. Best.

Juan Manuel

On Tue, Aug 20, 2019 at 6:48, yunfei guo (<guoyun...@outlook.com>) wrote:
Hi Juan,

Please make sure TAIR10_Chr.all.fasta and the index files are in the same folder. They don’t necessarily have to be placed in the database folder.
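
A quick way to confirm that the index files sit next to the reference is sketched below. It assumes the five suffixes that `bwa index` normally writes beside the FASTA (.amb, .ann, .bwt, .pac, .sa); whether SeqMule checks for exactly this set is an assumption, and the function name is hypothetical.

```python
import os
from typing import Callable, List

# Files that `bwa index ref.fasta` normally creates next to ref.fasta.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(
    ref_path: str, exists: Callable[[str], bool] = os.path.exists
) -> List[str]:
    """List the expected bwa index files that are absent beside ref_path.

    `exists` defaults to os.path.exists and can be replaced for testing.
    """
    expected = [ref_path + suffix for suffix in BWA_INDEX_SUFFIXES]
    return [path for path in expected if not exists(path)]
```

Calling `missing_bwa_index_files("TAIR10_Chr.all.fasta")` from the folder that holds the reference returns an empty list when all five files are present there, and otherwise names the ones that still need to be generated or moved.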

Best,
Yunfei

On Aug 19, 2019, at 5:40 AM, Juan Manuel Luque Sánchez <j616...@gmail.com> wrote:

Hi Yunfei,

When I try the pipeline command you recommended in this mail, I get the following output:

ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~$ cd /media/ubuntu/DATOS/SeqMule-master/bin
ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:/media/ubuntu/DATOS/SeqMule-master/bin$ perl seqmule pipeline -a Arabthcontrol_R1.fastq.gz -b Arabthcontrol_R2.fastq.gz -capture Arabth.bed -m -advanced seqmule/misc/predefined_config/bwa_samtools.config -t 1 -prefix  PRUEBA1 --ref TAIR10_Chr.all.fasta
Current version: 1.2.3
Cannot open seqmule/misc/predefined_config/bwa_samtools.config: Not a directory

Then, when I try a variant of the command, the output is:
ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:/media/ubuntu/DATOS/SeqMule-master/bin$ perl seqmule pipeline -a Arabthcontrol_R1.fastq.gz -b Arabthcontrol_R2.fastq.gz -capture Arabth.bed -m -advanced Seqmule-master/misc/predefined_config/bwa_samtools.config -t 1 -prefix  PRUEBA1 --ref TAIR10_Chr.all.fasta
Current version: 1.2.3
Cannot open Seqmule-master/misc/predefined_config/bwa_samtools.config: No such file or directory
ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:/media/ubuntu/DATOS/SeqMule-master/bin$ cd
ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~$ /media/ubuntu/DATOS/SeqMule-master
bash: /media/ubuntu/DATOS/SeqMule-master: Is a directory
ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~$ perl seqmule pipeline -a Arabthcontrol_R1.fastq.gz -b Arabthcontrol_R2.fastq.gz -capture Arabth.bed -m -advanced Seqmule-master/misc/predefined_config/bwa_samtools.config -t 1 -prefix  PRUEBA1 --ref TAIR10_Chr.all.fasta
Can't open perl script "seqmule": No such file or directory
ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:~$ cd /media/ubuntu/DATOS/SeqMule-master/bin
ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:/media/ubuntu/DATOS/SeqMule-master/bin$ perl seqmule pipeline -a Arabthcontrol_R1.fastq.gz -b Arabthcontrol_R2.fastq.gz -capture Arabth.bed -t 1 -prefix  PRUEBA1 --ref TAIR10_Chr.all.fasta
Current version: 1.2.3
Reading configuration file...
Done
NOTICE: Commandline options will override advanced configuration.
NOTICE: Parsing global settings...
Done
NOTICE: checking contig(chromosome) name consistency in TAIR10_Chr.all.fasta and /media/ubuntu/DATOS/SeqMule-master/bin/Arabth.bed.
NOTICE: use --no-check-chr to skip chromosome name checking
NOTICE: contig names in capture file and reference match with each other.
NOTICE: Input BED file detected, only variants in corresponding regions will be generated.
CAUTION: You used your own reference file or index file, there is no guarantee that it will work with all programs.
ERROR: use -g for whole-genome data, -e for exome (or captured) sequencing data
ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:/media/ubuntu/DATOS/SeqMule-master/bin$ perl seqmule pipeline -a Arabthcontrol_R1.fastq.gz -b Arabthcontrol_R2.fastq.gz -capture Arabth.bed -t 1 -prefix  PRUEBA1 --ref TAIR10_Chr.all.fasta -e
Current version: 1.2.3
Reading configuration file...
Done
NOTICE: Commandline options will override advanced configuration.
NOTICE: Parsing global settings...
Done
NOTICE: checking contig(chromosome) name consistency in TAIR10_Chr.all.fasta and /media/ubuntu/DATOS/SeqMule-master/bin/Arabth.bed.
NOTICE: use --no-check-chr to skip chromosome name checking
NOTICE: contig names in capture file and reference match with each other.
NOTICE: Input BED file detected, only variants in corresponding regions will be generated.
CAUTION: You used your own reference file or index file, there is no guarantee that it will work with all programs.
Checking Phred score scheme: PRUEBA1_result/PRUEBA1.0.fastq.gz PRUEBA1_result/PRUEBA1.1.fastq.gz
NOTICE: Analysis name: PRUEBA1
NOTICE: Input is exome (or captured) sequencing data
NOTICE: BED file used for caculating coverage statistics and extracting variants: Arabth.bed
NOTICE: Readgroup : READGROUP_PRUEBA1
NOTICE: Sequencing platform: ILLUMINA
NOTICE: Library : LIBRARY
NOTICE: Phred scoring scheme : 33
NOTICE: Reference genome build is hg19
NOTICE: dbsnp138 will be used for variant calling and recalibration.
NOTICE: Java memory usage is limited to 1750m
NOTICE: java executable path: java
NOTICE: Max number of processes: 1
NOTICE: /tmp will be used for storing temporary files
Generating script...
ERROR: bwamem index file(s) missing (TAIR10_Chr.all.fasta)
ubuntu@ubuntu-Compaq-CQ58-Notebook-PC:/media/ubuntu/DATOS/SeqMule-master/bin$

However, I did generate the bwa index files for TAIR10_Arabidopsis.fasta (and put them in the database folder). I also used a BED file for Arabidopsis (Arabth.bed), as you can see in the command.
Please, Yunfei, what can I do now?

Thank you very much for your help. Best regards.

Juan Manuel


On Tue, Aug 13, 2019 at 17:57, yunfei guo (<guoyun...@outlook.com>) wrote:
Hi Juan,

1) if you have a region of interest, i.e. a BED file for Arabidopsis, then use it, otherwise, you can use https://github.com/yunfeiguo/bioinfo_toolbox/blob/master/utilities/seq_related/fai2bed.pl to generate one from *.fai file.
2) you’ll need to generate index files for your reference genome, e.g. for bwa, bowtie2 etc. Then run the analysis. A simple example would be

seqmule pipeline -a sample_lane1_R1.fq.gz,sample_lane2_R1.fq.gz -b sample_lane1_R2.fq.gz,sample_lane2_R2.fq.gz -capture your_own.bed -m -advanced seqmule/misc/predefined_config/bwa_samtools.config -quick -t 4 -prefix mySample --ref arabidopsis.fasta

Note, you need to replace the illustrative filenames with your own. For more, check out http://seqmule.openbioinformatics.org/en/latest/Manuals/pipeline/

Best,
Yunfei

> On Aug 13, 2019, at 2:18 AM, Juan Manuel Luque Sánchez <j616...@gmail.com> wrote:
>
> Hi Yunfei, how are you?
>
> I am Juan Manuel Luque (from CNB-CSIC in Madrid), and I am writing by e-mail to ask you about SeqMule for the analysis of Arabidopsis thaliana (as you may remember, we are trying it). I am running my first test of SeqMule with Arabidopsis thaliana. I downloaded vcf.gz files for Arabidopsis thaliana and the TAIR10_Chr.all.fasta.gz file, to serve as databases and reference genome, and put them in the SeqMule database folder. At the start of the pipeline, SeqMule reports that it will use the hg19 BED files. This first test run has not finished yet.
>
> 1) Do you think I need to get BED files for Arabidopsis thaliana and put them in the database folder? If so, what command or script must I write so that SeqMule uses the Arabidopsis BED file by default instead of the hg19 BED file when the pipeline starts?
>
> 2) What other commands or scripts are necessary for SeqMule to analyze Arabidopsis correctly, without problems? For example, I don't know whether I need a command or script to make SeqMule use the Arabidopsis databases I downloaded by default (instead of the hg19 databases).
>
> If you could answer in detail, including the commands or scripts I asked about above (where needed), I would be very grateful.
> Thank you very much for your help, and best regards.
>
> Juan Manuel Luque



<TAIR10_Chr.all.dict>

