--
You received this message because you are subscribed to the Google Groups "migrate-support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to migrate-suppo...@googlegroups.com.
To post to this group, send email to migrate...@googlegroups.com.
Visit this group at https://groups.google.com/group/migrate-support.
For more options, visit https://groups.google.com/d/optout.
<subset_data.xlsx>
Hi Peter,
I have attached a subset of the 2-row format (diploid data) and an explanation of the format. In summary: cloneID is the locus identifier.
File description: SNP 2-row format. Each allele is scored in a binary fashion ("1" = presence, "0" = absence), so heterozygotes are scored as 1/1 (presence in both alleles' rows).
One column is blank in the Reference row of the 2-row format and contains the base position and base-variant details in the SNP row. In the 1-row format, it contains the base position and base-variant details.
A separate column gives the position (zero-indexed) in the sequence tag at which the defined SNP variant base occurs.
Column headings after RepAvg (beginning with Y1_FM_2009) are the codes of the individuals (ID_location_sample.year).
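As a sketch of the first step, the binary 2-row scores for one individual at one clone can be decoded into a diploid genotype. This is a minimal Python example under stated assumptions: the function name `decode_two_row_call` and the allele labels "ref"/"snp" are hypothetical. Treating every other score combination (including 0/0 or a blank cell) as missing data is also an assumption, not documented DArT behaviour.

```python
from typing import Optional, Tuple

# Each individual has one score in the Reference row and one in the SNP row
# for the same clone ("1" = allele present, "0" = absent).
_GENOTYPES = {
    ("1", "0"): ("ref", "ref"),  # homozygous for the reference allele
    ("0", "1"): ("snp", "snp"),  # homozygous for the SNP allele
    ("1", "1"): ("ref", "snp"),  # heterozygote: present in both rows
}


def _norm(score: object) -> str:
    """Normalise a cell value; spreadsheet readers may return 1.0 instead of "1"."""
    text = str(score).strip()
    return text[:-2] if text.endswith(".0") else text


def decode_two_row_call(ref_score: object, snp_score: object) -> Optional[Tuple[str, str]]:
    """Return the diploid genotype as two allele labels, or None if missing.

    ASSUMPTION: any combination not listed above (e.g. 0/0, "-" or a blank
    cell) is treated as missing data.
    """
    return _GENOTYPES.get((_norm(ref_score), _norm(snp_score)))
```

The genotype pairs it returns are the per-individual input for the sequence reconstruction Brian describes.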
I believe the trick is to create a data file in which column 1 is the individual code (a and b for each individual) and column 2 is the sequence of each. However, DArT only supplies the reference sequence for each locus and the position of the base variant. We need a script that takes the reference sequence if an allele is identical to the reference (1), or substitutes the variant base at the given position for each non-reference allele (0).
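The reconstruction step described above could look roughly like this minimal Python sketch. It assumes each individual's genotype is already available as a pair of "ref"/"snp" labels (or None when missing) and that the position is the zero-indexed value from the file description. It also assumes, purely hypothetically, that the base-variant detail in the SNP row is written as `position:REF>ALT` (e.g. `2:G>T`); check that against the actual file. All function names are hypothetical, and "N" as the missing-data symbol is a placeholder to be checked against what Migrate-N expects.

```python
from typing import List, Optional, Tuple


def parse_snp_field(snp_field: str) -> Tuple[str, str]:
    """Return (reference_base, variant_base) from an SNP-row detail string.

    ASSUMPTION (hypothetical format): the detail looks like "2:G>T", i.e.
    position, colon, reference base, ">", variant base.
    """
    _, change = snp_field.split(":", 1)
    ref_base, alt_base = change.split(">", 1)
    return ref_base.strip(), alt_base.strip()


def substitute(reference: str, position: int, alt_base: str,
               expected_ref: Optional[str] = None) -> str:
    """Copy the reference tag, replacing the base at a zero-indexed position."""
    if not 0 <= position < len(reference):
        raise ValueError(f"position {position} outside tag of length {len(reference)}")
    # Optional sanity check that the zero-indexed position really points at
    # the reference base reported for this SNP.
    if expected_ref is not None and reference[position] != expected_ref:
        raise ValueError(f"expected {expected_ref!r} at {position}, found {reference[position]!r}")
    return reference[:position] + alt_base + reference[position + 1:]


def allele_rows(individual: str, reference: str, position: int, alt_base: str,
                genotype: Optional[Tuple[str, str]],
                missing: str = "N") -> List[Tuple[str, str]]:
    """Two (label, sequence) rows per individual, suffixed 'a' and 'b'.

    genotype is a pair of "ref"/"snp" labels, or None for missing data;
    "N" as the missing symbol is a placeholder assumption.
    """
    if genotype is None:
        seqs = [missing * len(reference)] * 2
    else:
        variant = substitute(reference, position, alt_base)
        seqs = [reference if allele == "ref" else variant for allele in genotype]
    return [(individual + "a", seqs[0]), (individual + "b", seqs[1])]
```

For example, with reference tag `ACGTAC`, variant `T` at position 2 and a heterozygous individual, `allele_rows("Y1_FM_2009", "ACGTAC", 2, "T", ("ref", "snp"))` gives `[("Y1_FM_2009a", "ACGTAC"), ("Y1_FM_2009b", "ACTTAC")]`. Passing the reference base as `expected_ref` is a cheap guard against off-by-one errors in the zero-indexed position.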
But I do not know where to begin. Do you know of anyone else using DArTseq data who might have addressed this issue?
Thanks,
Brian
On Fri, Aug 24, 2018 at 7:52 PM, Peter Beerli <beerli...@gmail.com> wrote:
Brian,
I guess it would be easy to write a converter (or a pipeline), but we would need more explanation of the format. It seems that one would need an alignment step. I would suggest using the sequence fragment and not just the SNP.
Peter
On Aug 24, 2018, at 1:31 AM, Brian Stockwell <bsto...@odu.edu> wrote:
I have recently moved to a new lab, and they use DArTseq. I was wondering if anyone knows how to create Migrate-N input files (sequence data) from the two-row report files supplied by DArTseq. They provide a CSV/Excel data sheet with the consensus sequence and the SNP position for each locus, but only the state (homozygous or heterozygous) for each individual. I am not sure if there are any scripts to convert these data to Migrate format. I have attached a subset of the data provided by DArTseq.
Thanks,
Brian