DArTSeq


Brian Stockwell

Aug 24, 2018, 3:39:35 AM
to migrate-support
I have recently moved to a new lab and they use DArTSeq. I was wondering if anyone knows how to create Migrate-N input files (sequence data) from the two-row report files supplied by DArTSeq. They provide a CSV/Excel data sheet with the consensus sequence and the SNP position for each locus, but only the state (homozygous or heterozygous) for each individual. Are there any scripts to convert these data to Migrate format? I have attached a subset of the data provided by DArTSeq.

Thanks,
Brian
subset_data.xlsx

Peter Beerli

Aug 24, 2018, 3:52:36 AM
to migrate...@googlegroups.com
Brian
I guess it would be easy to write a converter (or a pipeline), but we would need more explanation of the format. It seems that one would need an alignment step.
I would suggest using the sequence fragment and not just the SNP.

Peter


--
You received this message because you are subscribed to the Google Groups "migrate-support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to migrate-suppo...@googlegroups.com.
To post to this group, send email to migrate...@googlegroups.com.
Visit this group at https://groups.google.com/group/migrate-support.
For more options, visit https://groups.google.com/d/optout.

Brian Stockwell

Aug 27, 2018, 5:41:28 PM
to migrate...@googlegroups.com
Hi Peter,

I have attached a subset of the 2-row format (diploid data) and an explanation of the format. But in summary:

cloneID is the locus identifier.

File description: SNP 2 Rows Format. Each allele is scored in a binary fashion ("1" = presence, "0" = absence). Heterozygotes are therefore scored as 1/1 (presence for both alleles, i.e. in both rows).

In 2 Rows Format, this column is blank in the Reference row and contains the base position and base variant details in the SNP row. In 1 Row Format, it contains the base position and base variant details.
The position (zero-indexed) in the sequence tag at which the defined SNP variant base occurs.
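
As a rough illustration of the two fields described above, the sketch below parses a SNP descriptor and applies it to a sequence tag. It assumes the descriptor looks like `2:C>T` (zero-indexed position, reference base, variant base). That layout and the function names `parse_snp` and `apply_snp` are assumptions for illustration; they are not confirmed anywhere in this thread, so check them against the actual report.

```python
import re

# Assumed descriptor layout: "<zero-indexed position>:<ref base>><variant base>",
# e.g. "2:C>T". Adjust the pattern if the real report differs.
SNP_PATTERN = re.compile(r"^(\d+):([ACGT])>([ACGT])$")


def parse_snp(descriptor):
    """Return (position, ref_base, alt_base) from a descriptor such as '2:C>T'."""
    match = SNP_PATTERN.match(descriptor.strip())
    if match is None:
        raise ValueError(f"unrecognised SNP descriptor: {descriptor!r}")
    pos, ref, alt = match.groups()
    return int(pos), ref, alt


def apply_snp(sequence, descriptor):
    """Return the sequence tag with the variant base substituted at the SNP position."""
    pos, ref, alt = parse_snp(descriptor)
    # Sanity check: the tag should carry the reference base at that position.
    if sequence[pos].upper() != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {sequence[pos]}")
    return sequence[:pos] + alt + sequence[pos + 1:]
```

The reference-base check is there to catch an off-by-one early, since the position is zero-indexed and many tools count from one.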

Column headings after RepAvg (beginning with Y1_FM_2009) are the codes for individuals (ID_location_sample.year).

I believe the trick is to create a data file in which column 1 is the individual code (a and b for each individual), followed by the sequence of each. However, DArT only supplies the reference sequence for each locus and the position of the base variant. We need a script that takes the reference sequence as is where an allele is identical to the reference (1), or substitutes the variant base at the given position for each non-reference allele (0).

But I do not know where to begin. Do you know of any other people using DArTSeq data who might have addressed this issue?

Thanks,
Brian  




dartseq_summary.docx
dart_data_raw_subset.xlsx

Felipe Martins

Oct 8, 2018, 12:38:33 PM
to migrate-support
Hi Brian

People outside of Australia probably have little to no familiarity with DArT data.
Here are a few resources:



You can extract the data from the Excel spreadsheet into other formats and build a Migrate input file from that.
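
A minimal sketch of that extraction, assuming the spreadsheet has been saved as CSV with the header on the first line. It follows the reconstruction Brian describes: pair the Reference and SNP rows of each locus, then give every individual two copies (a and b) of the sequence tag. The column names `Sequence` and `SNP`, the descriptor layout `2:C>T`, the use of `N` for missing calls, and the toy data are all assumptions made up for illustration; only `CloneID`, `RepAvg` and the 1/0 scoring come from the description in this thread. Writing the result out as a Migrate infile is not shown.

```python
import csv
import io


def parse_snp(descriptor):
    # Assumed layout "<zero-indexed position>:<ref>><alt>", e.g. "2:C>T".
    pos, change = descriptor.split(":")
    ref, alt = change.split(">")
    return int(pos), ref, alt


def rebuild_alleles(csv_text, seq_col="Sequence", snp_col="SNP",
                    id_col="CloneID", last_meta_col="RepAvg"):
    """Return {sample: {"a": [seq per locus], "b": [seq per locus]}}."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    col = {name: i for i, name in enumerate(header)}
    # Sample columns are everything after the RepAvg column.
    first_sample = col[last_meta_col] + 1
    samples = header[first_sample:]

    # Group the two rows of each locus; the Reference row has a blank SNP cell.
    loci = {}
    for row in body:
        kind = "snp" if row[col[snp_col]].strip() else "ref"
        loci.setdefault(row[col[id_col]], {})[kind] = row

    result = {s: {"a": [], "b": []} for s in samples}
    for pair in loci.values():
        ref_row, snp_row = pair["ref"], pair["snp"]
        ref_seq = ref_row[col[seq_col]]
        pos, _ref_base, alt_base = parse_snp(snp_row[col[snp_col]])
        alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
        missing = "N" * len(ref_seq)  # assumed placeholder for missing calls
        for offset, sample in enumerate(samples):
            i = first_sample + offset
            scores = (ref_row[i].strip(), snp_row[i].strip())
            if scores == ("1", "0"):      # homozygous reference
                a, b = ref_seq, ref_seq
            elif scores == ("0", "1"):    # homozygous variant
                a, b = alt_seq, alt_seq
            elif scores == ("1", "1"):    # heterozygote; which copy is "a" is arbitrary
                a, b = ref_seq, alt_seq
            else:                         # anything else is treated as missing
                a, b = missing, missing
            result[sample]["a"].append(a)
            result[sample]["b"].append(b)
    return result


# Toy data, invented for this example only.
toy = """CloneID,Sequence,SNP,RepAvg,ind1,ind2,ind3
100,AACGT,,1,1,0,1
100,AACGT,2:C>T,1,0,1,1
"""
for sample, copies in rebuild_alleles(toy).items():
    print(sample, copies["a"][0], copies["b"][0])
```

With the toy data, ind1 comes out homozygous for the reference tag, ind2 homozygous for the variant, and ind3 heterozygous. Phase is unknown, so the assignment of the variant to copy a or b in heterozygotes carries no meaning.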

Cheers, Felipe

