HapSTR Input

Matt

May 11, 2012, 11:04:11 AM

to Isolation with Migration

Hello,

I have two questions regarding formatting the input file for HapSTR
data. Sorry if these were posted previously; I did a search and could
not find anything, and the instruction manual doesn't cover this in
enough detail (or maybe I am just misreading it, in which case I
apologize in advance!).

1) From the manual it looks like I cut the STR sequence out of the
full sequence in the input file, but if I have sequence on both sides
of the STR do I just concatenate these two flanking sequences to make
the sequence portion of the HapSTR? For example, if my HapSTR
sequence was AATCCACACACACAGTTC I would code the input file as 5
AATCGTTC?

2) If I have a HapSTR that actually has two linked STRs and 3 sets of
flanking sequence (i.e. upstream of the first STR, between them, and
downstream of the second STR), would I apply the same format but
concatenate the three sequence sections? For example, would I code
AATCCACACACACAGTTCTTTCTTCTTCTTCTTCTTCCTAG in the input file
as 5 5 AATCGTTCCTAG?

While I have your attention, I just thought of another question
regarding input for mtDNA. Since the analysis assumes that all loci
are unlinked, if we have multiple mtDNA loci should they all be
entered as a single locus? The reason I ask is that, even if they
are all linked and entering them as separate loci may violate
something in the model, we don't expect them to have identical
mutation rates. Also, if they have different sample sizes, then any
missing individuals in a concatenated dataset would effectively cause
the program to ignore the entire dataset. I think the solution is to
enter them all individually, but just wanted to be sure.

Thanks for your help!

Matt

Miao Liu

May 11, 2012, 5:35:22 PM

to isolation-wi...@googlegroups.com

1) From the manual it looks like I cut the STR sequence out of the
full sequence in the input file, but if I have sequence on both sides
of the STR do I just concatenate these two flanking sequences to make
the sequence portion of the HapSTR? For example, if my HapSTR
sequence was AATCCACACACACAGTTC I would code the input file as 5
AATCGTTC?

It makes sense to me. But I think the key thing is to make sure homologs are listed in the same column when you compile the haplotypes.
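
One quick sanity check on the column point is to confirm that every concatenated flanking sequence has the same length before it goes into the input file. The sketch below is only an illustration: the function name `find_length_mismatches` and the haplotype names are made up, and equal length is a necessary condition for homologous columns, not proof of a correct alignment.

```python
from collections import Counter


def find_length_mismatches(haplotypes):
    """Return the names of haplotypes whose length differs from the most common length.

    haplotypes: dict mapping a haplotype name to its concatenated flanking sequence.
    """
    if not haplotypes:
        return []
    # Treat the most frequent length as the expected alignment length.
    expected_length = Counter(len(seq) for seq in haplotypes.values()).most_common(1)[0][0]
    return sorted(name for name, seq in haplotypes.items() if len(seq) != expected_length)


# Hypothetical data: hap3 is one base short, so its columns cannot line up.
example = {"hap1": "AATCGTTC", "hap2": "AATCGTTC", "hap3": "AATCGTT"}
print(find_length_mismatches(example))
```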

While I have your attention, I just thought of another question
regarding input for mtDNA. Since the analysis assumes that all loci
are unlinked, if we have multiple mtDNA loci should they all be
entered as a single locus? The reason I ask is because even if they
are all linked and entering them as separate loci may violate
something in the model, we don't expect them to have identical
mutation rates and if they have different sample sizes then any
missing individuals in a concatenated dataset would effectively cause
the program to ignore the entire dataset. I think the solution is to
enter them all individually, but just wanted to be sure.

My humble opinion is to try both, if possible, and see what happens. I have found that the assumption is sometimes critical and sometimes not.

Not sure if this would help.

Miao

jhey

Jun 12, 2012, 12:22:30 AM

to Isolation with Migration

Yes, just concatenate the coding sequences.
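
For anyone scripting this step, here is a minimal Python sketch of the concatenation using the single-STR example from the question. The function name `encode_hapstr` is hypothetical, the flanks and repeat unit are supplied by hand, and the `5 AATCGTTC` layout simply mirrors the example in the question; it has not been checked against the manual.

```python
import re


def encode_hapstr(full_seq, left_flank, repeat_unit, right_flank):
    """Split a HapSTR sequence into a repeat count and its concatenated flanks.

    Assumes the sequence is exactly: left flank, one or more copies of the
    repeat unit, right flank. Raises ValueError otherwise.
    """
    pattern = (
        re.escape(left_flank)
        + "((?:" + re.escape(repeat_unit) + ")+)"
        + re.escape(right_flank)
    )
    match = re.fullmatch(pattern, full_seq)
    if match is None:
        raise ValueError("sequence does not match flank + repeats + flank")
    # Number of repeat units in the STR tract that was cut out.
    repeat_count = len(match.group(1)) // len(repeat_unit)
    return repeat_count, left_flank + right_flank


count, flanks = encode_hapstr("AATCCACACACACAGTTC", "AATC", "CA", "GTTC")
print(count, flanks)  # 5 AATCGTTC
```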

The same goes for all your mtDNA loci: treat them as a single locus. The
mutation rate would be the total for all the loci.

Can't help with missing data. You will be limited to using just those
individuals for which you have data for all the loci you want to
include.
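
As a rough sketch of that restriction, the Python below keeps only individuals sampled at every mtDNA locus and joins their sequences into one locus. The function name `concatenate_complete_individuals`, the locus names, and the sequences are all made up for illustration, and each locus is assumed to be aligned already.

```python
def concatenate_complete_individuals(loci):
    """Concatenate per-locus sequences for individuals present at every locus.

    loci: list of dicts, one per locus, mapping individual ID -> aligned sequence.
    Returns a dict mapping individual ID -> concatenated sequence.
    """
    if not loci:
        raise ValueError("at least one locus is required")
    # Individuals with data at every locus.
    shared = set(loci[0])
    for locus in loci[1:]:
        shared &= set(locus)
    # Join the loci in the order given so columns stay consistent across individuals.
    return {ind: "".join(locus[ind] for locus in loci) for ind in sorted(shared)}


# Hypothetical data: ind2 is missing from the second locus and is dropped.
cytb = {"ind1": "ACGT", "ind2": "ACGA", "ind3": "ACGT"}
nd2 = {"ind1": "TTG", "ind3": "TTA"}
print(concatenate_complete_individuals([cytb, nd2]))
```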

jhey
