ginfo error


Wagner Magalhães

Jan 20, 2012, 1:04:33 PM
to glu-users
Dear Glu-users

Does anybody know how to solve this error? I have tried to exclude
this locus, but that didn't work either.

Thanks,

Wagner



[wagner@servidorldgh hapmap]$ /opt/glu-1.0b2-prerelease-Linux_x86-84/bin/glu -v ginfo -f hapmap genotypes_chr11_CEU_r28_nr.b36_fwd.txt.gz --outputloci=hapmap_chr11_CEU_snps.lst
GLU (unknown) module: ginfo
Copyright (c) 2007-2009, BioInformed LLC and the U.S. Department of
Health & Human Services. Funded by NCI under Contract N01-CO-12400.

Materializing genotypes.

Well, that could have gone better.

Execution aborted due to a problem with the program input, parameters
supplied, an error in the program. Please examine the following failure
trace for clues as to what may have gone wrong. When in doubt, please send
this message and a complete description of the analysis you are attempting
to perform to the software developers.

Error: Locus record rs2155163 merging incompatible locations (126029459 != 126029461)

Command line:
glu ginfo -f hapmap genotypes_chr11_CEU_r28_nr.b36_fwd.txt.gz --excludeloci=chr11.txt --outputloci=hapmap_chr11_CEU_snps.lst

Well, this is embarrassing.

Traceback: Traceback (most recent call last):
  File "glu/lib/glu_launcher.py", line 212, in main
  File "glu/modules/ginfo.py", line 114, in main
  File "glu/modules/ginfo.py", line 50, in ginfo
  File "glu/lib/genolib/streams.py", line 1248, in materialize
  File "glu/lib/genolib/streams.py", line 3911, in _filter
  File "glu/lib/genolib/streams.py", line 1719, in _check_unique
  File "glu/lib/genolib/encode.py", line 693, in _encode
  File "glu/lib/genolib/formats/hapmap.py", line 118, in _load_hapmap
  File "glu/lib/genolib/locus.py", line 139, in merge_locus
ValueError: Locus record rs2155163 merging incompatible locations (126029459 != 126029461)


[Fri Jan 20 15:50:25 2012] Execution aborted due to a fatal error

gzip: stdout: Broken pipe

Jacobs, Kevin (NIH/NCI) [C]

Jan 21, 2012, 11:00:08 AM
to glu-...@googlegroups.com
Hi Wagner,

The problem appears to be a duplicated locus: the same name occurs more than once with different genome mappings. Unfortunately, GLU doesn't handle this condition well and complains, as you've seen. I have a few experimental fixes, but it is not clear what the correct behavior should be. Obviously, being able to exclude the locus before the name and position uniqueness checks run is one potential workaround.

Thoughts?

-Kevin


Nicholas Orr

Jan 23, 2012, 5:03:19 AM
to glu-...@googlegroups.com
Hi Kevin,

What about a module that checks the genotype file for these kinds of inconsistencies and produces an output file listing the problematic loci, perhaps with a reason for flagging each one? That way the user could quickly exclude markers and complete analyses of the good loci, but would have a reference list of the duplicates, multiple hits, etc. that could be investigated.

Nick
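
A rough sketch of what such a checker might look like, written as a standalone Python script rather than a GLU module. It is not part of GLU and uses none of GLU's APIs. The function names and the two-column report layout are made up for illustration, and the reader assumes the usual whitespace-delimited HapMap layout with an "rs#" header line, the locus name in column 1, the chromosome in column 3 and the position in column 4.

```python
import gzip
from collections import Counter, defaultdict


def read_hapmap_loci(path):
    """Yield (name, chrom, pos) per data line.

    Assumes a HapMap-style layout: whitespace-delimited, header line
    starting with "rs#", columns: name, alleles, chrom, pos, ...
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 4 or fields[0] == "rs#":
                continue  # skip blank, short, and header lines
            yield fields[0], fields[2], int(fields[3])


def find_problem_loci(records):
    """Return a list of (name, reason) for loci that look inconsistent."""
    counts = Counter()
    locations_by_name = defaultdict(set)
    names_by_location = defaultdict(set)
    for name, chrom, pos in records:
        counts[name] += 1
        locations_by_name[name].add((chrom, pos))
        names_by_location[(chrom, pos)].add(name)

    problems = []
    for name, locations in locations_by_name.items():
        if len(locations) > 1:
            listing = ", ".join(f"{c}:{p}" for c, p in sorted(locations))
            problems.append((name, f"multiple locations: {listing}"))
        elif counts[name] > 1:
            problems.append((name, "duplicate name at the same location"))
    for (chrom, pos), names in names_by_location.items():
        if len(names) > 1:
            for name in sorted(names):
                problems.append((name, f"shares location {chrom}:{pos}"))
    return problems


def write_report(problems, path):
    """Write one tab-separated 'name<TAB>reason' line per flagged locus."""
    with open(path, "w") as out:
        for name, reason in problems:
            out.write(f"{name}\t{reason}\n")
```

The first column of the report could serve as an exclusion list. Whether GLU can use it for this particular error depends on exclusion being applied before the uniqueness check, which, per Kevin's message, is not yet the case.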

Jacobs, Kevin (NIH/NCI) [C]

Jan 23, 2012, 8:11:53 AM
to glu-...@googlegroups.com

Hi Nick,

The problem with that approach is that GLU has a very specific, perhaps overly specific, and unforgiving concept of a "genotype file". The file-format parsing portion of GLU consists of "plug-ins" for many different formats, but it applies essentially the same infrastructure to build the data structures used by every analysis. The advantage of this approach is that all genotype file formats get the same level of support and the same capabilities; the downside is that they also share the same limitations.

This means we have to fix this problem globally.

What should GLU do when:

1. it finds more than one SNP with the same name?
2. it finds more than one SNP at a single genomic coordinate?
3. multiple input files have SNPs with unambiguously reverse-complemented alleles?
4. multiple input files have SNPs with incompatible alleles (too many, non-reverse complement compatible, ambiguous reverse-complement, etc.)?

Some options:

a. Is it enough to report on one error and then stop?
b. Should GLU report all errors and then stop?
c. Should GLU be able to issue warnings, but ignore errors and continue processing the remaining data (for some of the above or all of the above errors)?

Option (a) is essentially what GLU does today. I have preliminary support for option (b) in the code, but it isn’t yet user-visible. Option (c) can be implemented, but there are some tricky issues. For one, most of GLU’s code processes loci as a stream, and much of it has no idea how long the stream will be or which loci it will see next. I know this sounds silly, but it is necessary in order to efficiently use file formats that do not include metadata on the samples or loci to expect, without exhaustively reading the file. Thus, much of GLU cannot exclude the first instance of a problematic locus, since it cannot “look ahead” to see where conflicts may occur.
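
To make the look-ahead limitation concrete, here is a toy Python sketch. It is not GLU's code: the function names are invented, loci are reduced to (name, position) pairs, and the error text merely imitates the message from the traceback earlier in the thread. A single-pass generator has already handed the first instance of a conflicting locus downstream by the time the second instance arrives, whereas dropping every instance requires reading the input twice.

```python
def stream_unique(loci):
    """Single pass: yield each locus as it arrives, fail on the first conflict."""
    seen = {}
    for name, pos in loci:
        if name in seen and seen[name] != pos:
            raise ValueError(
                f"Locus record {name} merging incompatible locations "
                f"({seen[name]} != {pos})"
            )
        seen[name] = pos
        yield name, pos


def consume(loci):
    """Collect what a downstream consumer saw before any error was raised."""
    emitted, error = [], None
    try:
        for locus in stream_unique(iter(loci)):
            emitted.append(locus)
    except ValueError as exc:
        error = str(exc)
    return emitted, error


def two_pass_exclude(make_stream):
    """Two passes: find conflicting names first, then drop all their records.

    make_stream is a callable returning a fresh iterator each time, which
    is exactly what a one-shot stream cannot provide.
    """
    first_pos = {}
    conflicting = set()
    for name, pos in make_stream():
        if first_pos.setdefault(name, pos) != pos:
            conflicting.add(name)
    return [(n, p) for n, p in make_stream() if n not in conflicting]


loci = [
    ("rs1", 100),
    ("rs2155163", 126029459),
    ("rs2", 200),
    ("rs2155163", 126029461),
]
emitted, error = consume(loci)
# emitted already contains the first rs2155163 record when the error fires.
clean = two_pass_exclude(lambda: iter(loci))
```

In this toy run, `emitted` holds the first `rs2155163` record even though the locus later turns out to be bad, while `clean` contains neither instance at the cost of a second read.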

So is option (b) good enough, provided that I also add the ability to exclude loci prior to the error checking?

Thanks,

-Kevin

Nicholas Orr

Jan 24, 2012, 3:51:53 AM
to glu-...@googlegroups.com
Hi Kevin,

Clearly 1-4 are all potentially troublesome if ignored. Option (a) is fine for studies with low numbers of markers, but it is a pain for GWAS, especially when pooling data from multiple arrays, where multiple conflicts can occur. Option (b) seems like a good compromise: flag all the errors and stop, then let the user rerun the command with a list of loci to exclude.
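
The report-everything-then-rerun workflow might look roughly like the Python sketch below. It is only an illustration: `LocusErrors` and `check_loci` are hypothetical names, not GLU functions, loci are reduced to (name, position) pairs, and treating a repeated name at the same position as an error is an assumption, since that question is still open above.

```python
class LocusErrors(Exception):
    """Raised once, after the whole input was read, listing every problem."""

    def __init__(self, errors):
        self.errors = list(errors)  # list of (name, message) pairs
        lines = [f"{name}: {message}" for name, message in self.errors]
        super().__init__(f"{len(lines)} locus error(s):\n" + "\n".join(lines))


def check_loci(loci, exclude=()):
    """Return the accepted loci, or raise LocusErrors with all problems found."""
    exclude = set(exclude)
    seen = {}
    kept = []
    errors = []
    for name, pos in loci:
        if name in exclude:
            continue  # exclusion happens before any uniqueness check
        if name in seen:
            if seen[name] != pos:
                errors.append(
                    (name, f"incompatible locations ({seen[name]} != {pos})")
                )
            else:
                errors.append((name, "duplicate record"))
            continue
        seen[name] = pos
        kept.append((name, pos))
    if errors:
        raise LocusErrors(errors)
    return kept


loci = [
    ("rs1", 100),
    ("rs2155163", 126029459),
    ("rs7", 500),
    ("rs2155163", 126029461),
    ("rs7", 500),
]

try:
    check_loci(loci)
    to_exclude = set()
except LocusErrors as exc:
    # First run: every problem is reported at once.
    to_exclude = {name for name, _ in exc.errors}

# Second run: the same input, with the flagged loci excluded up front.
accepted = check_loci(loci, exclude=to_exclude)
```

One run reports every conflict, and the flagged names feed directly into the rerun, so a GWAS-sized input would need two invocations rather than one per conflict.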

Nick

Nick Orr, PhD | Staff Scientist | Breakthrough Breast Cancer Research Centre | Division of Breast Cancer Research | Institute of Cancer Research | 237 Fulham Road | London | SW3 6JB | United Kingdom | nichol...@icr.ac.uk | +44 20 7153 5330 |