polymorphism format and filter alignment


Matt Olson

Oct 28, 2010, 2:50:29 PM
to tas...@googlegroups.com
When I filter a diploid SNP file in Polymorphism format, the filter changes some genotypes (e.g. C:C) to N's. Has anyone else had this problem? I assume there must be an error in my input file, but it appears to load correctly.

thanks

matt olson
------------------------------------------------------------------
biology dept
texas tech u 

and 

inst. arctic biology
  dept. biology and wildlife
u. alaska fairbanks



Terry Casstevens

Oct 28, 2010, 3:43:57 PM
to tas...@googlegroups.com
Hi Matt,

Can you be more specific about the filtering you are using?
Do you have a sample file that shows the problem?
Are you using Tassel 3.0?

Terry


Matt Olson

Oct 28, 2010, 4:15:48 PM
to tas...@googlegroups.com, Terry Casstevens, Peter Bradbury
Yes. I'm using Tassel 3.0.46

With the attached file loaded, I choose Sites->Filter alignment.

Settings:
Min count = 355 (of 474),
min freq = 0.01,
select Remove minor SNP states (nothing else).

In the filtered data set, individual COT10 at site number 1:1 changes from
C:C to N. There are probably other places where changes are made, but
this is the only one I know of.


Note that when I select Extract Indels, no filtered file is created, 
even though there are indels in the file. This also is confusing to me.

thanks

matt olson
Attachment: PopREfSNP.poly

Peter Bradbury

Oct 29, 2010, 10:44:15 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
It looks like the data is being handled correctly according to the
internal rules TASSEL uses. The polymorphism format is basically a
text import; it does not make any interpretation of the symbols used
to represent alleles. For example, - (dash) is not a deletion. It is
just another allele symbol. For that reason the site filter does not
recognize it as an indel. Also, for polymorphisms the site filter
operates on the locus genotype, not on the individual alleles. So, for
example, when "C:C" has a lower frequency than "T:T" and "C:T", it
gets set to missing.
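To make the genotype-versus-allele distinction concrete, here is a minimal Python sketch. It is not TASSEL code. The helper names `remove_minor_states` and `allele_counts` are hypothetical, and the sketch assumes that "Remove minor SNP states" keeps only the two most frequent states at a site and sets every other call to N. That is consistent with the behaviour described above, but it is not taken from TASSEL's source. The site data are made up for illustration.

```python
from collections import Counter

def remove_minor_states(calls, keep=2, missing="N"):
    """Hypothetical helper: keep the `keep` most frequent states at one site
    and set every other call to `missing`. Each whole genotype string
    (e.g. "C:C") counts as one state, the way a plain text import sees it."""
    counts = Counter(c for c in calls if c != missing)
    kept = {state for state, _ in counts.most_common(keep)}
    return [c if c in kept else missing for c in calls]

def allele_counts(calls, missing="N"):
    """Hypothetical helper: count individual alleles instead of genotypes.
    A dash would simply be counted as one more symbol, not as an indel."""
    counts = Counter()
    for c in calls:
        if c != missing:
            counts.update(c.split(":"))
    return counts

# Made-up diploid site: C:C is the rarest of three genotype "states".
site = ["T:T"] * 5 + ["C:T"] * 3 + ["C:C"] * 2

print(Counter(remove_minor_states(site)))  # Counter({'T:T': 5, 'C:T': 3, 'N': 2})
print(allele_counts(site))                 # Counter({'T': 13, 'C': 7})
```

Counted as alleles, the same site has only two states (C and T), so nothing would be removed; counted as whole genotypes, C:C is the third and rarest state and is set to N.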

Since you actually have nucleotides, you should use one of the
nucleotide-specific formats (Hapmap, Plink, Flapjack, or Phylip) so
that your data will be handled as you expect.

Peter