DArTSeq

Brian Stockwell

Aug 24, 2018, 3:39:35 AM

to migrate-support

I have recently moved to a new lab and they use DArTSeq. I was wondering if anyone knows how to create Migrate-N input files (sequence data) from the two-row report files supplied by DArTSeq. They provide a CSV/Excel data sheet with the consensus sequence and the SNP position for each locus, but only the state (homozygous or het) for each individual. Are there any scripts to convert these data to Migrate format? I have attached a subset of the data provided by DArTSeq.

Thanks,

Brian

subset_data.xlsx

Peter Beerli

Aug 24, 2018, 3:52:36 AM

to migrate...@googlegroups.com

Brian

I guess it would be easy to write a converter (or a pipeline), but we would need more explanation of the format; it seems that one would need an alignment step.

I would suggest using the sequence fragment and not just the SNP.

Peter

--
You received this message because you are subscribed to the Google Groups "migrate-support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to migrate-suppo...@googlegroups.com.
To post to this group, send email to migrate...@googlegroups.com.
Visit this group at https://groups.google.com/group/migrate-support.
For more options, visit https://groups.google.com/d/optout.

Brian Stockwell

Aug 27, 2018, 5:41:28 PM

to migrate...@googlegroups.com

Hi Peter,

I have attached a subset of the 2-row format (diploid data) and an explanation of the format. But in summary:

cloneID is the locus identifier.

File description: SNP 2 Rows Format: Each allele is scored in a binary fashion ("1" = Presence and "0" = Absence). Heterozygotes are therefore scored as 1/1 (presence for both alleles/both rows).
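As a rough illustration of the scoring described above, the pair of cells for one individual at one locus (Reference row, SNP row) can be decoded into a genotype. This is only a sketch: the function name `decode_two_row` is made up here, and treating anything other than 0/1 (for example "-") as missing data is an assumption about the export, not something the format description states.

```python
def decode_two_row(ref_cell, snp_cell):
    """Decode one individual's pair of cells (Reference row, SNP row) at one locus.

    Cells are expected as strings or integers ("1"/"0" or 1/0).
    Returns 'hom_ref', 'hom_snp', 'het', or None for missing data.
    """
    cells = (str(ref_cell).strip(), str(snp_cell).strip())
    # Assumption: anything other than "0"/"1" (e.g. "-") means missing data.
    if any(c not in ("0", "1") for c in cells):
        return None
    ref_present, snp_present = (c == "1" for c in cells)
    if ref_present and snp_present:
        return "het"        # 1/1: both alleles present
    if ref_present:
        return "hom_ref"    # 1/0: only the reference allele present
    if snp_present:
        return "hom_snp"    # 0/1: only the SNP allele present
    return None             # 0/0: neither allele scored, treated as missing


print(decode_two_row("1", "1"))
print(decode_two_row(1, 0))
```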

In 2 rows format: this column is blank in the Reference row, and contains the base position and base variant details in the SNP row. In 1 row format: contains the base position and base variant details.

The position (zero-indexed) in the sequence tag at which the defined SNP variant base occurs.

Column headings after RepAvg (beginning with Y1_FM_2009) are the codes for individuals (ID_location_sample.year).

I believe the trick is to create a data file in which column 1 is the individual code (a and b for each individual), followed by the sequence of each. However, DArT only supplies the reference sequence for each locus and the position of the base variant. We need a script that takes the reference sequence as is when an allele is identical to the reference (1), or substitutes the variant base at the given position for each non-reference allele (0).
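A minimal sketch of such a script for a single locus, in Python. It assumes the SNP details are written like `25:C>T` (zero-indexed position, reference base, variant base); that string format and the function names `parse_snp` and `haplotypes` are assumptions for illustration, so check them against the actual report before relying on this.

```python
import re


def parse_snp(snp):
    """Parse a SNP description such as '25:C>T' (assumed format) into
    (zero-indexed position, reference base, variant base)."""
    m = re.fullmatch(r"(\d+):([ACGT])>([ACGT])", snp.strip())
    if m is None:
        raise ValueError(f"unrecognised SNP description: {snp!r}")
    return int(m.group(1)), m.group(2), m.group(3)


def haplotypes(ref_seq, snp, ref_present, snp_present):
    """Return the two sequences (copy a, copy b) for one individual at one locus.

    ref_present / snp_present are the 1/0 scores from the Reference and SNP rows.
    Returns None when neither allele was scored (treated as missing data).
    """
    pos, ref_base, alt_base = parse_snp(snp)
    if pos >= len(ref_seq) or ref_seq[pos].upper() != ref_base:
        raise ValueError("SNP description does not match the reference sequence")
    # Build the variant sequence by swapping the single base at `pos`.
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    if ref_present and snp_present:
        return ref_seq, alt_seq      # heterozygote: one copy of each
    if ref_present:
        return ref_seq, ref_seq      # homozygous reference
    if snp_present:
        return alt_seq, alt_seq      # homozygous variant
    return None                      # no allele scored


print(haplotypes("ACGTACGT", "2:G>A", 1, 1))
```

The sketch handles one SNP per sequence tag. If the same tag carries several SNPs, the rows would have to be combined, and the phase between those SNPs is not given by the 1/0 scores.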

But I do not know where to begin. Do you know of any other people using DArTSeq data who might have addressed this issue?

Thanks,
Brian

dartseq_summary.docx

dart_data_raw_subset.xlsx

Felipe Martins

Oct 8, 2018, 12:38:33 PM

to migrate-support

Hi Brian

People outside of Australia probably have little to no familiarity with DArT data.

Here are a few resources:

https://github.com/smyers13/DArT_scripts

https://cran.r-project.org/web/packages/dartR/vignettes/IntroTutorial_dartR.pdf

You can extract the data from the Excel spreadsheet into other formats and get a Migrate input file from that.

Cheers, Felipe
