glu - trouble with transform

Dtae

Oct 13, 2008, 1:28:31 PM
to glu-users, gha...@northwestern.edu
Hello,

I am having some trouble using the transform function to convert a
HapMap-format file (ldat) into sdat format.

I enter the command as (where chrY.geno is in ldat):
glu transform /home/dte874/chrY.geno -f ldat, -F sdat -o /home/dte874/test.txt

and get the following error message:
GLU 1.0a5 module: transform
Copyright (c) 2008, BioInformed LLC and the U.S. Department of Health
& Human Services. Funded by NCI under Contract N01-CO-12400.


Execution aborted due to a problem with the program input, parameters
supplied, an error in the program. Please examine the following failure
trace for clues as to what may have gone wrong. When in doubt, please send
this message and a complete description of the analysis you are attempting
to perform to the software developers.

Traceback:
Traceback (most recent call last):
File "glu/lib/glu_launcher.py", line 203, in main
File "glu/modules/transform.py", line 68, in main
File "glu/lib/genolib/io.py", line 415, in transform_files
File "glu/lib/genolib/io.py", line 235, in load_genostream
NotImplementedError: File format 'ldat,' is not supported


[Mon Oct 13 12:23:25 2008] Execution aborted due to unhandled error


Does anyone know what I am doing wrong?

Thanks,
Dan

Kevin Jacobs <jacobs@bioinformed.com>

Oct 13, 2008, 3:13:18 PM
to glu-...@googlegroups.com, gha...@northwestern.edu
On Mon, Oct 13, 2008 at 1:28 PM, Dtae <dana...@gmail.com> wrote:

> Hello,
>
> I am having some trouble using the transform function to convert a
> HapMap format file (ldat) into a sdat format.
>
> I enter the command as (where chrY.geno is in ldat):
> glu transform /home/dte874/chrY.geno -f ldat,  -F sdat -o /home/dte874/test.txt

Hi Dan,

HapMap files are conceptually in locus-major format, which we call 'ldat' in the abstract, but they are not in the concrete GLU 'ldat' file format.  To transform such a file to sdat, the command is:

  glu transform -f hapmap chrY.geno -o chrY.sdat

Please let me know if you are still having problems.  The error message you posted doesn't make sense, so I'll look into it too.

-Kevin
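
A note on the error itself: the 'ldat,' in the traceback appears to come from the trailing comma typed after "-f ldat" in the original command. The shell splits arguments on whitespace, not on commas, so the comma stays attached to the format name. The sketch below uses Python's shlex module only to illustrate shell-style word splitting; it is not GLU's option parser.

```python
import shlex

# The command as originally typed, with the stray comma after "ldat".
command = "glu transform /home/dte874/chrY.geno -f ldat, -F sdat -o /home/dte874/test.txt"

# shlex.split follows POSIX shell word-splitting rules: whitespace separates
# arguments, and a comma is just an ordinary character.
args = shlex.split(command)

# The value handed to -f is whatever token follows it, comma included.
input_format = args[args.index("-f") + 1]

print(args)
print(repr(input_format))
```

Dropping the comma ("-f ldat -F sdat") would presumably get past the format-name check, although, as the reply above explains, this particular file is not in GLU's concrete ldat format anyway.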


Dtae

Nov 3, 2008, 10:16:44 PM
to glu-users
For some reason the below message didn't get posted on the web and I
am not sure if it ever went out. If anyone has an answer it would be
much appreciated.

Thanks,
Dan

On Mon, Oct 13, 2008 at 6:29 PM, Dan Eisenberg <dt...@dtae.net> wrote:

Thanks for the prompt reply, Kevin.
I tried running it and got the output below. I think my file is not in
true HapMap format either. I am attaching it. I don't recall seeing
HapMap among the file formats listed in "intro.formats". Is there
perchance an option for this type of tab-separated, ldat-like format?

Thanks,
D

[dte874@fisher ~]$ glu transform -f hapmap /home/dte874/chrY.geno -o /home/dte874/chrY.sdat
GLU 1.0a5 module: transform
Copyright (c) 2008, BioInformed LLC and the U.S. Department of Health
& Human Services. Funded by NCI under Contract N01-CO-12400.


Execution aborted due to a problem with the program input, parameters
supplied, an error in the program. Please examine the following failure
trace for clues as to what may have gone wrong. When in doubt, please send
this message and a complete description of the analysis you are attempting
to perform to the software developers.

Traceback:
Traceback (most recent call last):
File "glu/lib/glu_launcher.py", line 203, in main
File "glu/modules/transform.py", line 68, in main
File "glu/lib/genolib/io.py", line 415, in transform_files
File "glu/lib/genolib/io.py", line 205, in load_genostream
File "glu/lib/genolib/formats/hapmap.py", line 63, in load_hapmap
ValueError: Input file '/home/dte874/chrY.geno' does not appear to be in HapMap format.


[Mon Oct 13 19:20:40 2008] Execution aborted due to unhandled error



--
Dan Eisenberg
Department of Anthropology
Northwestern University
www.dtae.net





bioinformed

Nov 3, 2008, 10:28:38 PM
to glu-users
On Nov 3, 10:16 pm, Dtae <danaf...@gmail.com> wrote:
> For some reason the below message didn't get posted on the web and I
> am not sure if it ever went out. If anyone has an answer it would be
> much appreciated.


Hi Dan,

Your attachment was also lost. Could you please resend it? I'm
afraid that the HapMap format parser is fairly specific about what it
expects, so we may need to try another approach. What software or
group generated these files and is there documentation anywhere on the
specifics of the format?

~Kevin

Dan Eisenberg

Nov 3, 2008, 10:50:11 PM
to glu-...@googlegroups.com, M. Geoffrey Hayes
Wow, fast reply, thanks Kevin.

I've attached the Y chromosome data file (chrY.geno). The data are HGDP-CEPH data from http://www.cephb.fr/en/hgdp/ (Supplement 1).
According to the attached README file, the "file contains in the first row the field names of HGDP-CEPH samples (total 1043). Fields are separated by tab. The missing genotype characters are "--"."

Alternatively, supplementary file 2 in STRUCTURE format (READMEsupp2.TXT and chrY-umich.geno) would probably serve our purpose, as would the tab-delimited matrix version of the HGDP data from Stanford (http://www-shgc.stanford.edu/hgdp/files.html).

Thanks,
Dan
chrY.geno
READMEsupp1.txt
READMEsupp2.TXT
chrY-umich.geno

Kevin Jacobs <jacobs@bioinformed.com>

Nov 3, 2008, 11:34:53 PM
to glu-...@googlegroups.com, M. Geoffrey Hayes
On Mon, Nov 3, 2008 at 11:50 PM, Dan Eisenberg <dana...@gmail.com> wrote:
> Wow, fast reply, thanks Kevin.
>
> I've attached the Y chromosome datafile (chrY.geno). The data is HGDP-CEPH data from http://www.cephb.fr/en/hgdp/ (Supplement 1)
> The "file contains in the first row the field names of HGDP-CEPH samples (total 1043).   Fields are separated by tab. The missing genotype characters are "--" ." (from attached README file).
>
> Alternatively supplementary files 2 in STRUCTURE format would probably serve our purpose (READMEsupp2.TXT and chrY-umich.geno) or the tab delimited matrix format version of the HGDP data from Stanford (http://www-shgc.stanford.edu/hgdp/files.html).


Hi Dan,

I'll add a parser for these data, since it is likely others will need to read them as well.  It is a very minor variation on what is already possible, so it will be very easy.  Stay tuned!

~Kevin

Kevin Jacobs <jacobs@bioinformed.com>

Nov 4, 2008, 9:01:01 AM
to glu-...@googlegroups.com, M. Geoffrey Hayes
On Tue, Nov 4, 2008 at 12:34 AM, Kevin Jacobs <jac...@bioinformed.com> <bioin...@gmail.com> wrote:
> I'll add a parser for these data, since it is likely others will need to read them as well.  It is a very minor variation on what is already possible, so it will be very easy.  Stay tuned!


I've added an option to use Illumina's missing value code to the next version of GLU.  Once released, you'll be able to run:

glu transform -f imat chrY.geno -o chrY.sdat

In the meantime, until the next release, here is a workaround for Unix/Linux/OS X:

cat chrY.geno | tr -d '-' | glu transform -f ldat - -o chrY.sdat
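
One caveat with this workaround: tr -d '-' deletes every hyphen in the file, including any that happen to occur inside sample or marker names. A slightly more conservative sketch in Python blanks only the cells that are exactly "--", which is what the tr command leaves behind for a missing genotype. The function name and the sample data are made up for illustration, and it assumes the first column holds the marker name.

```python
import csv
import io

def blank_missing(src, dst, missing="--"):
    """Copy a tab-delimited genotype matrix, emptying cells equal to `missing`."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerow(next(reader))        # header row of sample names, untouched
    for row in reader:
        if not row:                      # skip completely empty lines
            continue
        # row[0] is assumed to be the marker name; only genotype cells are checked
        writer.writerow([row[0]] + ["" if g == missing else g for g in row[1:]])

# Tiny made-up example (not real HGDP data); note the hyphens in the sample names.
sample = "\tHGDP-1\tHGDP-2\nrs1\tAA\t--\nrs2\t--\tGG\n"
out = io.StringIO()
blank_missing(io.StringIO(sample), out)
print(out.getvalue())
```

Passing sys.stdin and sys.stdout in place of the StringIO objects should let the same function sit in a pipeline where the tr command is now.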

I'm also going to update the intro.formats text, since it is massively out of date.  GLU can also read and write binary versions of sdat/ldat files by using the extensions sbat/lbat.  In addition, GLU can read and write PLINK text and binary formats (normal and transposed) and Merlin/LINKAGE.  Read support (import) is available for HapMap, PrettyBase, and WTCCC's raw genotype format.  Write support (export) is available for STRUCTURE, EIGENSOFT, and WTCCC's genotype imputation format.

Thanks for helping to improve GLU!

-Kevin

Dan Eisenberg

Nov 8, 2008, 2:47:15 AM
to glu-...@googlegroups.com, M. Geoffrey Hayes
Kevin,

Thanks, that works well. Can you give me the syntax for converting to
PLINK and LINKAGE?

I like the (new?) manual you put up on the website. It looks like
there is a minor error on
http://cgf.nci.nih.gov/glu/docs/modules/transform.html#module-transform
At the end in the examples the first section reads:
"Convert an LDAT file to an SDAT file, including only those samples
listed in the "controls" file:
> glu transform samples.ldat --includesamples=controls -o controls.ldat"

If I am not mistaken, you mean the extension on controls to be .sdat

Best,
Dan


Meredith Yeager (NIH/NCI)

Nov 9, 2008, 6:34:40 PM
to glu-...@googlegroups.com, M. Geoffrey Hayes
Hi,

No, "controls" would be a text file (doesn't need an extension) with the sample IDs for the controls listed (one per line).  What this operation is doing is extracting just the "control" individuals from a larger set of samples (samples.ldat).

Meredith
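
Conceptually, an include list like this works as a filter on sample IDs. The sketch below is a rough Python illustration of "keep only the samples named in a one-ID-per-line file" for a locus-major table. The helper names, sample IDs, and table layout are all made up, and this is not how GLU implements --includesamples.

```python
def read_id_list(lines):
    """Return the set of non-empty, stripped IDs, one per line."""
    return {line.strip() for line in lines if line.strip()}

def keep_samples(header, rows, wanted):
    """Filter a locus-major table (first column = locus name) to the wanted samples."""
    # Always keep column 0 (the locus name), plus any wanted sample columns.
    keep = [0] + [i for i, name in enumerate(header) if i > 0 and name in wanted]
    yield [header[i] for i in keep]
    for row in rows:
        yield [row[i] for i in keep]

# Contents of a hypothetical "controls" file: one sample ID per line.
controls = read_id_list(["S2\n", "S3\n", "\n"])

header = ["locus", "S1", "S2", "S3"]
rows = [["rs1", "AA", "AG", "GG"],
        ["rs2", "CC", "CT", "TT"]]

filtered = list(keep_samples(header, rows, controls))
print(filtered)
```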

Kevin Jacobs <jacobs@bioinformed.com>

Nov 10, 2008, 9:26:13 AM
to glu-...@googlegroups.com, M. Geoffrey Hayes
On Sat, Nov 8, 2008 at 2:47 AM, Dan Eisenberg <dana...@gmail.com> wrote:
> Thanks that works well. Can you give me the syntax for converting to
> PLINK and LINKAGE?

Sure thing.  To convert a file in any format GLU reads (ldat, sdat, lbat, sbat, ped, bed, etc.) to PLINK binary:

  glu transform myfile.lbat -o myfile.bed

This creates three files:
  1. myfile.bed, containing your genotypes
  2. myfile.bim, with locus data
  3. myfile.fam, with pedigree data
Conversion to the PLINK text formats is similar, except that the extension is either 'ped' or 'tped' and only a single additional file is created (a '.map' for ped format or a '.tfam' for tped format).

  glu transform myfile.lbat -o myfile.ped
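
The output files described above can be summarized as a small lookup table. The sketch below only restates in Python the extensions listed in this message; the helper function is hypothetical and not part of GLU.

```python
import os

# Extension of the main output file -> extensions of its companion files,
# as listed in the message above.
COMPANIONS = {
    ".bed": [".bim", ".fam"],   # PLINK binary: locus data, pedigree data
    ".ped": [".map"],           # PLINK text
    ".tped": [".tfam"],         # PLINK transposed text
}

def expected_outputs(output_name):
    """List the files one would expect to see after writing `output_name`."""
    base, ext = os.path.splitext(output_name)
    return [output_name] + [base + e for e in COMPANIONS.get(ext, [])]

print(expected_outputs("myfile.bed"))
print(expected_outputs("myfile.ped"))
```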


LINKAGE files (in this case, the dialect used by Merlin and MACH) also use the '.ped' file extension, so a bit of magic is needed to load or save that format:

  glu transform myfile.lbat -o myfile.ped:format=merlin

The ':format=merlin' appended to the filename overrides the heuristic that guesses the format you intend and instructs GLU to write Merlin files.  This trick applies to both input and output files.  One can also specify a format using '-F' in this case:

  glu transform myfile.lbat -F merlin -o myfile.ped

This method is slightly less mysterious at first glance, but it doesn't solve the case of multiple input files in different formats, where each file needs its own format suffix:

  glu transform study1.ped:format=plink study2.ped:format=merlin study3.txt:format=eigensoft -o combined.lbat
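
For anyone curious how a suffix like this might be pulled apart, here is a hedged sketch in Python. The function name is made up, it handles only a single ':key=value' suffix, and GLU's real parsing may well differ.

```python
def split_spec(spec):
    """Split 'name:key=value' into (name, {key: value}); plain names pass through."""
    name, sep, option = spec.rpartition(":")
    if not sep or "=" not in option:
        return spec, {}              # no recognizable suffix: keep the name as-is
    key, value = option.split("=", 1)
    return name, {key: value}

for spec in ["study1.ped:format=plink",
             "study2.ped:format=merlin",
             "combined.lbat"]:
    print(split_spec(spec))
```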

 
> I like the (new?) manual you put up on the website.


Thanks -- documentation is currently the major focus and we've been working hard in our spare moments to get it up to snuff.

 
> It looks like
> there is a minor error on
> http://cgf.nci.nih.gov/glu/docs/modules/transform.html#module-transform
> At the end in the examples the first section reads:
> "Convert an LDAT file to an SDAT file, including only those samples
> listed in the "controls" file:
> > glu transform samples.ldat --includesamples=controls -o controls.ldat"
>
> If I am not mistaken, you mean the extension on controls to be .sdat


Good catch.  I've corrected this.  Please let me know if you notice any other glitches.  We are also happy to accept contributions to the documentation, if you'd like to lend a hand.

Best regards,
-Kevin

Dan Eisenberg

Nov 23, 2008, 2:34:55 PM
to glu-...@googlegroups.com
Hello,


> Good catch. I've corrected this. Please let me know if you notice any
> other glitches. We are also happy to accept contributions to the
> documentation, if you'd like to lend a hand.

As is probably apparent from some of my questions, I am a novice to
Unix and computational genetics, and I probably don't have much to
contribute. If it would be helpful, I would be glad to write up some
of the advice given to me on this listserv so that it fits the
documentation format a bit more formally. I also might be able to come
up with something like an "advice for novices" section. Let me know
how it would be best for me to go about this.

Thanks again for all the help,
Dan
chr22.map
Hap650Yv3_660918_SNPs-shortened.txt