[maker-devel] errors in final gff

Anurag Priyam

Jun 18, 2014, 1:45:20 PM
to maker...@yandell-lab.org
Hi,

I compiled all annotations generated by MAKER into a single GFF file
using the gff3_merge script distributed with MAKER. While formatting
this GFF for use with JBrowse, I found a few errors:

1. Three instances where two features were assigned the same id.
2. One instance where a group of three subfeatures refer to a
non-existent parent.

Here is the relevant portion of the GFF file:
https://gist.github.com/yeban/ffaf5cd419639dd073a7

I worked around the issue temporarily for the job at hand, but I am
left wondering why these errors would creep in.

-- Priyam

_______________________________________________
maker-devel mailing list
maker...@box290.bluehost.com
http://box290.bluehost.com/mailman/listinfo/maker-devel_yandell-lab.org

Carson Holt

Jun 18, 2014, 2:12:07 PM
to Anurag Priyam, maker...@yandell-lab.org
What MAKER version are you using?

--Carson

Carson Holt

Jun 18, 2014, 5:33:32 PM
to Anurag Priyam, maker...@yandell-lab.org
Are you passing in old data via GFF3?

--Carson


On 6/18/14, 12:15 PM, "Anurag Priyam" <anurag0...@gmail.com> wrote:

>It's version 2.31.
>
>-- Priyam

Anurag Priyam

Jun 20, 2014, 5:11:48 PM
to Carson Holt, maker...@yandell-lab.org
It's version 2.31.

-- Priyam


Carson Holt

Jun 20, 2014, 5:50:57 PM
to Anurag Priyam, maker...@yandell-lab.org
Did you use est_forward? Also, in the example you showed, all the IDs are
unique (one says hit and the other hsp in the ID, so they are different).
Could you find the non-unique IDs causing the error?

--Carson


On 6/19/14, 2:05 AM, "Anurag Priyam" <anurag0...@gmail.com> wrote:

>I used est_gff= option, which refers to a GFF file generated by
>cufflinks2gff3. The erroneous annotations didn't come from this GFF.
>
>-- Priyam

Carson Holt

Jun 20, 2014, 5:57:07 PM
to Anurag Priyam, maker...@yandell-lab.org
Also note that ID= must be unique. Name= does not have to be, and won't be
if, for example, the same protein or repeat element aligns to more than
one location.
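
The ID= rule above, together with the missing-parent problem reported
earlier in the thread, can be checked with a short script. What follows is
a minimal sketch, not part of MAKER: the function names are hypothetical,
and it assumes a tab-separated GFF3 file with attributes in column 9.

```python
from collections import Counter


def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of key -> value."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def check_ids(gff_lines):
    """Return (duplicate_ids, orphan_parents) for an iterable of GFF3 lines."""
    id_counts = Counter()
    parents = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # blank line, comment, or directive
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # malformed line; out of scope for this check
        attrs = parse_attributes(columns[8])
        if "ID" in attrs:
            id_counts[attrs["ID"]] += 1
        if "Parent" in attrs:
            # A feature may list several parents, separated by commas.
            parents.update(attrs["Parent"].split(","))
    duplicates = sorted(i for i, n in id_counts.items() if n > 1)
    orphans = sorted(p for p in parents if p not in id_counts)
    return duplicates, orphans
```

Calling check_ids(open("merged.gff")) (the file name is a placeholder)
returns the repeated IDs and the Parent values that match no ID. Name= is
deliberately ignored. Note that GFF3 lets one discontinuous feature span
several lines that share an ID, so a reported duplicate is a candidate to
inspect by hand rather than proof of an error.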

Thanks,
Carson

Anurag Priyam

Jun 24, 2014, 2:57:25 PM
to Carson Holt, maker...@yandell-lab.org
I am sorry. I have updated the gist:
https://gist.github.com/yeban/ffaf5cd419639dd073a7
1. The first two chunks contain the annotations with duplicate ids. (4 rows)
2. The last chunk contains the annotations that refer to a
non-existent parent, along with what looks like an incomplete line of
annotation (I forgot to mention this in my original email).

No, I didn't use est_forward. I am not passing in any old data via GFF3.

-- Priyam

Carson Holt

Jun 24, 2014, 4:05:21 PM
to Anurag Priyam, maker...@yandell-lab.org
Thanks. For the first two --> scaffold00002:hit:1026:1.3.0.12

The value 1026 is held in a global iterator, so it cannot repeat within
the life of a process. And 1.3.0.12 is derived from the point in the code
where the ID is generated. This means that two distinct processes had to
write to the same file at the same point in the code, which should
normally be impossible.

However, there are ways to make this happen. First, if you turn file locks
off (the -nolock option) and then run MAKER multiple times on the same
dataset, you can get process collisions (because you disabled the locks
that stop this). Second, if your NFS file system does not support hard
links (FhGFS, for example), then you cannot lock the files (which is the
same as setting -nolock). Or you may have other serious IO failures over
NFS. Note that NFS (Network File System) here refers to your
network-mounted storage.
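
To illustrate why locking depends on hard links, here is a minimal sketch
of the general hard-link locking technique often used on network storage.
It is not MAKER's own code (MAKER is written in Perl); the function names
and the lock file name are hypothetical.

```python
import os
import socket
import uuid


def acquire_lock(lock_path):
    """Try to take lock_path via a hard link; return True on success."""
    # Unique per-process file beside the lock, so both share a filesystem.
    unique = f"{lock_path}.{socket.gethostname()}.{os.getpid()}.{uuid.uuid4().hex}"
    with open(unique, "w") as handle:
        handle.write(unique)
    try:
        try:
            os.link(unique, lock_path)  # fails if lock_path already exists
        except OSError:
            pass  # lost the race, or the filesystem has no hard links
        # Trust the link count rather than link()'s result: a count of 2
        # means our file gained a second name, so lock_path is ours.
        return os.stat(unique).st_nlink == 2
    finally:
        os.unlink(unique)


def release_lock(lock_path):
    """Drop the lock so another process can take it."""
    os.unlink(lock_path)
```

On a file system without hard-link support, os.link always fails, so this
sketch can never acquire a lock. What a program does at that point
(refuse to run, or carry on unlocked) is a separate policy decision.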

The last example you give shows the preceding line being truncated. This
suggests that two processes are trying to write to the same file
simultaneously (inserting lines in between other lines), or that serious
IO failures are occurring where writes do not complete even though the
operations report success (can happen on unreliable NFS implementations).
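
Truncated lines like this can be located with a simple column check. The
following is a rough sketch with a hypothetical function name, assuming
standard nine-column, tab-separated GFF3 feature lines.

```python
def find_malformed_lines(gff_lines):
    """Yield (line_number, reason) for feature lines that look damaged."""
    for number, line in enumerate(gff_lines, start=1):
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # blank line, comment, or directive
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            yield number, f"expected 9 columns, found {len(columns)}"
        elif not (columns[3].isdigit() and columns[4].isdigit()):
            yield number, "start/end are not integers"
```

This only catches damage that changes the column count or the coordinate
fields. A line cut off inside the attributes column still has nine columns
and would pass, so treat a clean result as a weak signal.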

So in summary, either your NFS storage implementation is giving IO errors,
you have run MAKER with -nolock set and then started MAKER multiple times
in the same directory (process collisions), or your NFS implementation
doesn't support hard links and won't allow MAKER to lock files (process
collisions). If it is one of the latter two, you will have to make sure
you never start MAKER more than once simultaneously on the same dataset.
You can still run via MPI for parallelization, but you won't be able to
start a second MPI process while the first one is still running.

Thanks,
Carson

Anurag Priyam

Jun 25, 2014, 5:12:05 PM
to Carson Holt, maker...@yandell-lab.org
Mmm ... I didn't use the -nolock option. But I did launch some 10 MAKER
processes in the same directory.

I feel it's unlikely that my file system doesn't allow hardlinks
because a few processes quit earlier than the others, saying something
to the tune of "Another MAKER process is processing this scaffold
already."

I remember one process in particular had _just_ crashed. I don't
remember how: I might have Ctrl-C'ed by mistake instead of detaching
screen? admin killed it? temporary system glitch? Could this have
caused the same issue?

-- Priyam

Carson Holt

Jun 25, 2014, 5:27:02 PM
to Anurag Priyam, maker...@yandell-lab.org
Maybe if it died in a weird way some of the processes could have continued
briefly without active locks, but I'd more likely attribute this to NFS
weirdness. Because of how network storage works, some implementations
take shortcuts (like returning success on an IO operation even though it
has not completed and may even fail later on). Or an IO operation can be
buffered and completed several seconds later (the process that called the
write operation may not even be active anymore). This is extremely common
on NFS. You should probably just start MAKER fewer times in the same
directory on your system. You may also want to start a single MAKER job
(you should use MPI to parallelize it, though) and use the -a flag. This
will cause that job to just rebuild the current GFF3 and FASTA files.
That way you can clean up your current results without having to rerun
everything. It should run relatively quickly since MAKER will be able to
make use of the existing BLAST reports etc. that are already there
(exonerate will run again though, but it shouldn't take too long).

--Carson

Anurag Priyam

Jun 25, 2014, 5:38:59 PM
to Carson Holt, maker...@yandell-lab.org
The -a option looks like just the thing I need.

I will forward concerns about NFS to our IT team. And definitely use
MPI for parallelisation next time.

Thanks a lot :).

-- Priyam