[maker-devel] MPI MAKER hanging NFS


Heywood, Todd

May 14, 2013, 4:42:33 PM
to maker...@yandell-lab.org
We have been getting hung NFS mounts on some nodes when running MPI MAKER (version 2.27). Processes go into a "D" state and cannot be killed. We end up having to reboot nodes to recover them. We are running MPICH2 version 1.4.1p1 with RHEL 6.3. Questions:

(1) Does MPI MAKER use MPI-IO (ROMIO)? The state "D" processes are hung on a sync_page system call under NFS. That *might* imply some locking issues.

(2) Has anyone else seen this?

(3) The root directory (parent of genome.maker.output directory) has lots of mpi***** files, all of which have the first line "pst0Process::MpiChunk". Is this expected?

I'm able to reproducibly hang NFS on some nodes when using at least 4 32-core nodes and 128 running MPI tasks.

Thanks,

Todd Heywood
CSHL


_______________________________________________
maker-devel mailing list
maker...@box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org
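
For anyone trying to confirm the same symptom, the stuck tasks can be listed together with the kernel function they are waiting in. The following is a minimal sketch, assuming the procps version of ps that ships with RHEL; the function names filter_d_state and list_d_state are made up for illustration.

```shell
#!/bin/sh
# Keep only processes whose state starts with "D" (uninterruptible sleep).
# Expects ps-style lines on stdin: PID STAT WCHAN COMMAND...
filter_d_state() {
    awk '$2 ~ /^D/ { print }'
}

# List D-state processes on the local node with their wait channel.
# "wchan:32" widens the WCHAN column (procps syntax, an assumption here).
list_d_state() {
    ps -eo pid,stat,wchan:32,args | filter_d_state
}
```

Calling list_d_state on an affected node should show the hung tasks; for the hang described above, the WCHAN column would be expected to read sync_page.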

Carson Holt

May 14, 2013, 9:01:00 PM
to Heywood, Todd, maker...@yandell-lab.org
No, it does not use ROMIO.

The locking may be due to how your NFS is implemented. MAKER does a lot of
small writes. Some NFS implementations do not handle that well and only
like large, infrequent writes and frequent reads.
MAKER also uses a variant of the File::NFSLock module, which uses
hard links to force a flush of the NFS IO cache when asynchronous IO is
enabled (described here
http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html).
I know that the FhGFS file system has broken hard link functionality.


Also make sure you do not set TMP= in the maker_opts.ctl file to an NFS
mounted location. It must be local (/tmp for example). This is because
certain types of operations are not always NFS safe and need a local
location to work with (anything involving Berkeley DB or SQLite, for
example). Make sure you are not setting that to an NFS mounted scratch
location. The mpi**** files are examples of short-lived files that
should not be in NFS. They hold chunks of data from threads that are
processing the genome and are very rapidly created and deleted. They will
be cleaned up automatically when MAKER finishes or is killed by standard
signals, such as when you hit ^C or use kill -15.


Thanks,
Carson
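
The advice to keep TMP on a node-local disk can be checked before a run. The following is a minimal sketch, assuming GNU coreutils stat is available; the function names and the list of network filesystem type names are illustrative assumptions, not part of MAKER.

```shell
#!/bin/sh
# Return success if a filesystem type name looks like a network filesystem.
# The list of names is illustrative, not exhaustive.
is_network_fs_type() {
    case "$1" in
        nfs*|cifs|smb*|lustre|gpfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the filesystem type of a path (GNU coreutils "stat -f -c %T").
fs_type_of() {
    stat -f -c %T "$1"
}

# Report whether a candidate TMP directory looks local.
# Returns 0 if local, 1 if on a network filesystem, 2 if it cannot be checked.
check_tmp_dir() {
    fstype=$(fs_type_of "$1") || return 2
    if is_network_fs_type "$fstype"; then
        echo "WARNING: $1 is on $fstype; choose a node-local TMP" >&2
        return 1
    fi
    echo "OK: $1 is on $fstype"
}
```

A job script could call check_tmp_dir "$TMPDIR" on each node before launching MAKER and stop if the directory turns out to be network mounted.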

Evan Ernst

May 15, 2013, 1:08:08 PM
to Carson Holt, maker...@yandell-lab.org
Hi Carson,

For these runs, -TMP is set to the $TMPDIR environment variable via a MAKER command-line argument in the cluster job script, so that the local disk on each node is used. We can see files being generated in those locations on each node, so it seems this is working as expected.

In maker_opts.ctl, I commented out the TMP line. I'm not sure if this is relevant, but I'm also setting mpi_blastdb= to consolidate the databases onto a different, faster NFS mount than the working dir where the mpi**** files are being written.

Thanks,
Evan




Carson Holt

May 15, 2013, 1:15:52 PM
to Evan Ernst, maker...@yandell-lab.org
The mpi**** files should be generated in the $TMPDIR or TMP= location. If they are appearing in the working directory, then there is a problem. If you are not setting TMP=, perhaps TMPDIR is not being exported when 'mpiexec' is launched. You may have to manually specify that it needs to be exported to the other nodes using the mpiexec command line flags. OpenMPI, for example, does not export all environment variables to the other nodes by default.

Thanks,
Carson
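
As an illustration of the mpiexec flags mentioned above: MPICH2's Hydra launcher accepts -genvlist with a list of variable names, and Open MPI accepts -x with a variable name. The helper below is a hypothetical sketch that only prints the flag for a given launcher. Forwarding TMPDIR this way assumes the same path exists on every node, which may not hold when the scheduler sets a per-job directory.

```shell
#!/bin/sh
# Print the mpiexec option that forwards one environment variable to all ranks.
# $1 = MPI flavor ("mpich" or "openmpi"), $2 = variable name.
mpi_export_flag() {
    case "$1" in
        mpich)   printf '%s\n' "-genvlist $2" ;;   # MPICH2 Hydra launcher
        openmpi) printf '%s\n' "-x $2" ;;          # Open MPI mpirun/mpiexec
        *)       echo "unknown MPI flavor: $1" >&2; return 1 ;;
    esac
}
```

With MPICH2 a launch line might then look like: mpiexec $(mpi_export_flag mpich TMPDIR) -n 8 maker, followed by the usual control files.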

Uma Maheswari

May 16, 2013, 12:08:43 PM
to maker...@yandell-lab.org
Hi Carson,

When I was trying to load the MAKER 2.27 results into Ensembl, I found
a few hundred genes with 'duplicate exons'. When I looked in the
GFF file, I found cases like this, where the exons are not actually
duplicated but have two Parents with the same mRNA ID. Could this be a
potential alternate transcript, attached to the same transcript by mistake?

Many thanks
Uma





3 maker gene 524271 525467 . - . ID=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179
3 maker mRNA 524271 525467 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1;Parent=augustus_masked-3-processed-gene-6.179;Name=augustus_masked-3-processed-gene-6.179-mRNA-1;_AED=0.50;_eAED=0.63;_QI=1476|0.33|0.75|1|0|0.25|4|0|406
3 maker exon 524271 524480 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:573;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1
3 maker exon 524538 525182 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:572;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1,augustus_masked-3-processed-gene-6.179-mRNA-1
3 maker exon 524271 525467 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:exon:571;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1
3 maker CDS 524538 524903 . - 0 ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1
3 maker CDS 524538 525182 . - 0 ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1
3 maker CDS 524271 524480 . - 0 ID=augustus_masked-3-processed-gene-6.179-mRNA-1:cds;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1
3 maker five_prime_UTR 524271 525467 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1
3 maker five_prime_UTR 524904 525182 . - . ID=augustus_masked-3-processed-gene-6.179-mRNA-1:five_prime_utr;Parent=augustus_masked-3-processed-gene-6.179-mRNA-1
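
Records like the exon above, where the Parent attribute names the same mRNA twice, can be located mechanically. The following is a minimal sketch, assuming a standard tab-separated GFF3 file with the attributes in column 9; the function name find_duplicate_parents is made up for illustration and is not part of MAKER.

```shell
#!/bin/sh
# Print "line<TAB>feature type<TAB>parent ID" for every GFF3 record whose
# Parent attribute lists the same ID more than once.
find_duplicate_parents() {
    awk -F '\t' '
        /^#/ || NF < 9 { next }              # skip comments and short lines
        {
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                if (attrs[i] !~ /^Parent=/) continue
                m = split(substr(attrs[i], 8), parents, ",")
                split("", seen)              # portable way to empty an array
                for (j = 1; j <= m; j++) {
                    if (parents[j] in seen) {
                        print FNR "\t" $3 "\t" parents[j]
                        break
                    }
                    seen[parents[j]] = 1
                }
            }
        }
    ' "$@"
}
```

The function reads the files named as arguments, or standard input when none are given, so it can be pointed at a MAKER GFF3 output file to list the affected records.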

Carson Holt

May 16, 2013, 12:13:05 PM
to Uma Maheswari, maker...@yandell-lab.org
I've had one other report of this on the devel list, but haven't gotten
data to test with. Do you have the run files that produced the duplicate
exon?

If so, could you send me the theVoid directory for the contig that shows the
duplicate, and the maker_opts.ctl file?

Thanks,
Carson

Carson Holt

May 16, 2013, 12:25:36 PM
to Uma Maheswari, maker...@yandell-lab.org
I think this may also be a result of using GFF3 pass-through. If that
is the case, could you also send me any GFF3 files you gave MAKER, in
addition to the other files I asked for?

Thanks,
Carson




Daniel Hughes

May 16, 2013, 12:38:35 PM
to Carson Holt, maker...@yandell-lab.org
Hiya, are you using the same instance as Michael at EBI? This sounds like the same problem he had last week, and he wasn't running pass-through. I've run 2.27 here 30+ times and not seen this. Is something very strange corrupted?

dan.

Daniel S. T. Hughes M.Biochem (Hons; Oxford), Ph.D (Cambridge)
-------------------------------------------------------------------------------------
ds...@cantab.net
ds...@cpan.org



Heywood, Todd

May 17, 2013, 9:25:16 AM
to Carson Holt, Ernst, Evan, maker...@yandell-lab.org
It appears that a kernel bug caused the NFS hang, at least for limited-scale testing (6 nodes, 192 tasks). I upgraded the kernel from 2.6.32-279.9.1.el6.x86_64 to 2.6.32-358.6.1.el6.x86_64 on 6 nodes and cannot reproduce the hangs.

As far as TMPDIR goes, I'm not really sure I understand. We use SGE, and the TMPDIR we are referring to is set by SGE within a job to be /tmp/uge/JobID.TaskID.QueueName. Have you run via SGE?

Todd
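
To see which nodes are still on the older kernel, release strings can be compared with a version sort. The following is a minimal sketch, assuming GNU sort with the -V option; the function name kernel_at_least is made up for illustration. Note that the thread only reports that 2.6.32-358.6.1.el6.x86_64 did not reproduce the hang; it does not establish that release as the first one containing a fix.

```shell
#!/bin/sh
# Succeed if kernel release $1 is the same as or newer than release $2.
# Uses GNU "sort -V" (version sort); the smaller of the two sorts first.
kernel_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

On each node this could be used as: kernel_at_least "$(uname -r)" 2.6.32-358.6.1.el6.x86_64 || echo "$(uname -n) is on an older kernel".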





Carson Holt

May 17, 2013, 9:40:50 AM
to Heywood, Todd, Ernst, Evan, maker...@yandell-lab.org
I'm glad you're getting better results.

With respect to environment variables, one common error in MPI
execution is that the environment will not always be the same on
the other nodes, since only the root node is attached to a terminal, so
variables set in startup scripts (.bashrc etc.) may not be available on all
nodes. Many clusters that are part of the XSEDE network and use SGE, for
example, have scripts that wrap mpiexec to guarantee export of all
environment variables when using MPI, to avoid just this type of common
error. So, like anything, you start with the most common cause of errors
and then work toward the less common. Kernel bugs usually rank low on the
list :-) But I'm glad it's working for you now.

Thanks,
Carson

Evan Ernst

May 20, 2013, 4:36:38 PM
to Carson Holt, maker...@yandell-lab.org, Heywood, Todd
Hi Carson,

The SGE launch script looks like this (sans SGE args):

mpiexec -n 8 maker -TMP $TMPDIR maker_opts.ctl maker_bopts.ctl maker_exe.ctl >>logs/final.$SGE_TASK_ID.mpi.log 2>&1

Snooping on the running jobs (see attached image), it looks like $TMPDIR is evaluated to a local directory by the shell of the MPI master node as intended, so the evaluated path, not the env var reference, is being passed to the MPI workers. 

Despite this, the mpi*** files are still being created in the working directory. 

If I understand correctly, these mpi*** files are meant to be written to the directory given by TMP= (maker_opts.ctl) or -TMP (command line arg), which should be equivalent, but this doesn't seem to be the case.

Thanks,
Evan





Attachment: Screen Shot 2013-05-20 at 4.14.09 PM.png

Carson Holt

May 20, 2013, 7:50:28 PM
to Evan Ernst, Carson Holt, maker...@yandell-lab.org, Heywood, Todd
Could you run the following command for me and share the output with me?

mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"'

Thanks,
Carson
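
The one-liner above asks Perl's File::Spec for its temporary directory on every rank; on Unix that lookup consults the TMPDIR environment variable before falling back to /tmp. A shell counterpart that also shows which node each line came from might look like the sketch below. The function name report_tmpdir is an assumption made for illustration.

```shell
#!/bin/sh
# Print this node's name and the TMPDIR value seen by the current process.
# "<unset>" is printed when TMPDIR is missing or empty in the environment.
report_tmpdir() {
    printf '%s\tTMPDIR=%s\n' "$(uname -n)" "${TMPDIR:-<unset>}"
}
```

Saved in a script that ends by calling report_tmpdir (say tmpdir_report.sh, a hypothetical name), it could be launched with mpiexec -n 8 sh tmpdir_report.sh to compare what each rank sees.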




Evan Ernst

May 20, 2013, 8:20:22 PM
to Carson Holt, Carson Holt, Heywood, Todd, maker...@yandell-lab.org
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/opt/uge/default/common/starter_with_limit.sh: line 4: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": No such file or directory
/opt/uge/default/common/starter_with_limit.sh: line 4: exec: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": cannot execute: No such file or directory


Todd, are these errors from the starter_with_limit.sh wrapper harmless?

Thanks,
Evan



Carson Holt

May 20, 2013, 8:38:41 PM
to Heywood, Todd, Ernst, Evan, Carson Holt, maker...@yandell-lab.org
It may have just been a random failure. Try launching it again.
Basically one instance failed to launch hydra_pmi_proxy which wraps the
command being called via mpiexec. So you get 7 lines of output instead of
the 8 that should be there.

--Carson



Heywood, Todd

May 20, 2013, 8:33:32 PM
to Ernst, Evan, Carson Holt, Carson Holt, maker...@yandell-lab.org
All starter_with_limit.sh does is set a ulimit for the top process for the job, then start it passing all parameters:

#!/bin/sh
ulimit -c 0
exec $*



Heywood, Todd

May 20, 2013, 8:34:48 PM
to Ernst, Evan, Carson Holt, Carson Holt, maker...@yandell-lab.org
Actually, line 4 is the exec (one line is commented out):

#!/bin/sh
ulimit -c 0
#ulimit -n 262144
exec $*


From: Evan Ernst <eer...@cshl.edu>
Date: Monday, May 20, 2013 8:20 PM
To: Carson Holt <cars...@gmail.com>
Cc: Carson Holt <Carso...@oicr.on.ca>, "maker...@yandell-lab.org" <maker...@yandell-lab.org>, Todd Heywood <hey...@cshl.edu>
Subject: Re: [maker-devel] MPI MAKER hanging NFS

/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/tmp/uge/1031236.1.primary.q
/opt/uge/default/common/starter_with_limit.sh: line 4: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": No such file or directory
/opt/uge/default/common/starter_with_limit.sh: line 4: exec: /sonas-hs/it/hpc/data/eernst/maker_carson_debug/"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy": cannot execute: No such file or directory


Todd, are these errors from the starter_with_limit.sh wrapper harmless?

Thanks,
Evan
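
A side note on the wrapper script quoted above: `exec $*` relies on unquoted word splitting, and the shell does not re-interpret quote characters that arrive inside an argument. The sketch below only illustrates that behavior. show_words is a hypothetical helper (not part of UGE or MPICH2), and the idea that the launcher passes the proxy path wrapped in literal quotes is an assumption that fits the error text, not something confirmed in this thread.

```sh
#!/bin/sh
# show_words is a hypothetical helper: it prints, one per line, the words
# that an unquoted $* expands to (the same words `exec $*` would receive).
show_words() {
    for word in $*; do
        printf '%s\n' "$word"
    done
}

# Assumed input: a program path that arrives wrapped in literal double quotes,
# followed by a made-up argument.
show_words '"/opt/hpc/lib64/mpich2/bin/hydra_pmi_proxy"' --some-arg
```

The first word keeps its quote characters, so the shell would look for a file whose name literally begins with `"`. Because such a name does not start with `/`, it is treated as relative to the working directory, which may be why the working directory shows up in front of the path in the error messages.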


On Mon, May 20, 2013 at 7:50 PM, Carson Holt <cars...@gmail.com> wrote:
Could you run the following command for me and share the output with me?

mpiexec -n 8 perl -e 'use File::Spec; print File::Spec->tmpdir()."\n"'

Thanks,
Carson




Sea...@csiro.au

May 21, 2013, 3:36:37 AM
to maker...@yandell-lab.org

Hi Carson,

We are currently working on the annotation of the Helicoverpa genome project. MAKER has been chosen as the preliminary tool for the task. Checking the annotation results from MAKER 2.10, we saw that some loci have a fusion problem: two separate neighbouring genes appear to be fused together and reported by MAKER as a single gene model. When we look at the output of each individual de novo algorithm, e.g. Augustus or SNAP, the prediction is correct. We are also using an RNA-Seq assembly from Cufflinks and some protein evidence from closely related insects.

We noticed that the parameters “pred_flank” in MAKER v2.10 and “correct_est_fusion” in MAKER v2.27 might help MAKER decide whether or not to merge models. If possible, could you please explain how these two parameters affect the handling of predicted genes, RNA-Seq and protein evidence?

Also, our current plan is to install MAKER 2.27, train the algorithms to predict UTRs, enlarge the protein evidence datasets and input our previous annotations as model_gff. We are facing a critical question: how can we most effectively reduce the gene fusion problem? 1) Set pred_flank lower than 100? 2) Turn correct_est_fusion on? 3) Anything else?

Thank you.

With best regards,

Xi (Sean) Li, Ph. D.

Bioinformatics Analyst, Bioinformatics Core,
CSIRO Mathematics, Informatics and Statistics
Phone: +61 2 6216 7138
Address: GPO Box 664, Canberra, ACT 2601

Barry Moore

May 21, 2013, 7:54:40 PM
to Sea...@csiro.au, maker...@yandell-lab.org
Hi Sean,

I think you want to be careful with dropping the pred_flank parameter too low.  This controls how much flanking sequence (for a given cluster of evidence) MAKER will pass to the gene predictor.  Some (maybe all?) of the gene predictors have an initial state in their HMM for intergenic sequence and if you do not have some intergenic sequence for them to consider first they can't transition to their next state.  The correct_est_fusion option can help (at the cost of losing some UTR annotations) - Carson will likely give you a better description of the intricacies of the correct_est_fusion.

Don't know how you are assembling your RNASeq, but there is an option in Trinity - I forget the name - that will instruct Trinity to be more restrictive in merging neighboring clusters of reads into a longer transcript and this can help as well.

B


Barry Moore
Research Scientist
Dept. of Human Genetics
University of Utah
Salt Lake City, UT 84112
--------------------------------------------




Carson Holt

May 21, 2013, 8:59:09 PM
to Barry Moore, Sea...@csiro.au, maker...@yandell-lab.org
Yes. Barry gave a good overview. The correct_est_fusion option basically clips UTR when there are two neighboring genes that only overlap in the UTR (so you still get both gene models). Since the primary effect of falsely merged mRNA-seq is overly long UTR, this tends to fix many cases. Of course, avoiding merging the mRNA-seq reads in the first place also works. So using Trinity's extra options to control that, together with the correct_est_fusion option in MAKER, is probably the way to go.

I think you can lower pred_flank to 100, but below that you might start to get weird behavior from the gene predictors (they need some upstream and downstream sequence or the HMMs don't work well).

Thanks,
Carson
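
For reference, the two settings discussed above would look roughly like this in maker_opts.ctl. This is a sketch that assumes the option names exactly as they are used in this thread; the values follow the advice given here (keep pred_flank at 100 or higher, turn correct_est_fusion on), and the comments are illustrative rather than the stock control-file text.

```
#-----sketch of the relevant maker_opts.ctl lines (illustrative values)
pred_flank=100 #flanking sequence passed to the gene predictors; going lower is discouraged above
correct_est_fusion=1 #1 = clip UTR that appears to come from merged neighboring transcripts
```

All other lines of the control file would stay as they are.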

Sea...@csiro.au

May 21, 2013, 9:23:48 PM
to cars...@gmail.com, barry...@genetics.utah.edu, maker...@yandell-lab.org

Thanks Barry and Carson for your detailed explanation. Now I have a better understanding of “pred_flank”.

1. To run MAKER, we use transcripts produced by the TopHat + Cufflinks approach instead of de novo Trinity. Will that avoid the possible merging of RNA-Seq reads?

2. If my understanding is correct, the “correct_est_fusion” parameter needs to be turned off when we don't ask MAKER or the prediction algorithms to predict UTRs? It also makes me wonder: in that case, when UTR prediction is turned off in MAKER but our RNA-Seq data has long UTRs, will these UTRs be added to the MAKER gene model?

Regards,

Sean

Carson Holt

May 21, 2013, 9:39:01 PM
to Sea...@csiro.au, barry...@genetics.utah.edu, maker...@yandell-lab.org
One more time, but I fixed a few obvious spelling errors -->

>1. To run MAKER, we use transcripts produced by the TopHat + Cufflinks approach instead of de novo Trinity. Will that avoid the possible merging of RNA-Seq reads?

No. Trinity would probably be a better approach to avoid merging.

>2. If my understanding is correct, the “correct_est_fusion” parameter needs to be turned off when we don't ask MAKER or the prediction algorithms to predict UTRs? It also makes me wonder: in that case, when UTR prediction is turned off in MAKER but our RNA-Seq data has long UTRs, will these UTRs be added to the MAKER gene model?

MAKER will always try to add UTR if the EST evidence suggests it. Technically it does a little more than that: it can also add missing exons and extend CDS. The correct_est_fusion option just causes it to clip really long UTR if it looks like it was added due to merged evidence and is probably not really a contiguous part of the gene. The long UTRs that can result from mRNA-seq are often false. You are basically expanding the UTR by assembling into exons from the neighboring gene. This is especially common in organisms like fungi, where the UTRs of neighboring genes often overlap and mRNA-seq assemblies falsely make it look like one transcript encompasses one, two, or more gene loci (you lose the true UTR boundaries).

--Carson

Carson Holt

May 21, 2013, 9:37:02 PM
to Sea...@csiro.au, barry...@genetics.utah.edu, maker...@yandell-lab.org
1.       To run maker, we use transcripts produced by tophat+cufflink approach instead of de novo trinity. Will it avoid the possible merging of RNA-Seq reads?  

No.  Trinity would probably be a better approach to avoid merging.


2.       If my understanding is correct, the “correct_est_fusion” parameter needs to be turned off when we don’t ask Maker/prediction algorithms to predict UTRs? Also, it makes me wonder, in such case, when Maker turn off UTRs, but our RNA-Seq data has got long UTRs, will these UTRs been added into the maker gene model?  

MAKER will always try to add UTR if the EST evidence suggests it.  Technically it's a little bit more than that, it can also add missing exons and extend CDS. The correct_est_fusion, just causes it to clip really long UTR if it looks like it was added due to merged evidence, and is probably not really a contiguous part of the gene.  The long UTRs that can result from mRNA-seq are often false.  You are basically expending the UTR by assembling into exons from the neighboring gene.  This is especially common in organisms like fungi where UTR of neighboring genes often overlap, and mRNA-seq assemblies falsely make it look like one transcript encompasses 1, 2 , or more genes loci (you loose the true UTR boundaries).

--Carson






Sea...@csiro.au

May 21, 2013, 10:23:26 PM
to cars...@gmail.com, barry...@genetics.utah.edu, maker...@yandell-lab.org

Thank you Carson. It has been a very helpful conversation with you! I will pass this information back to our group.

Best regards,

Barry Moore

May 21, 2013, 11:37:30 PM
to Sea...@csiro.au, maker-devel@yandell-lab.org List
Sean,

The Trinity option to manage fusion transcripts is --jaccard_clip and is described here:


Trinity has also added functionality to use a hybrid reference-guided/de-novo assembly approach which you might also consider:


B