DArT data as CSV with one row and SNPs as AA, AB and BB


Juliana Souza

Jul 16, 2023, 6:14:05 AM
to dartR

Hello dartR team,

I need some help with my input data, which I need to convert to a GenAlEx file.

I have DArT data in CSV format. The file contains the columns: CloneID, AlleleSequence, Chrom_species_contigs, ChromPos_species_contigs, SNP, SnpPosition, CallRate, Individuals SNP data, count AA, count AB, count BB, Freq allele A, Freq allele B and MAF. I also created an ind.metafile with population data.

The data are in single rows, so I have only one row of information for each allele. I do not have the data in double rows, as in the example file.

In the example file from dartR:

Imagem1.png

My dataset:

Imagem2.png

Is anyone familiar with this type of data? Is it possible to run dartR using this data as it is? Also, my SNPs are coded as AA, AB, BB. Can I just convert it to 0, 1 and 2?

This is the first time I have seen DArT data laid out this way, and I have not found it anywhere else online.

I appreciate any assistance you could give.

Best,

Juliana

Peter Unmack

Jul 16, 2023, 8:01:48 AM
to da...@googlegroups.com
dartR automatically detects whether the data are in two-row or single-row
format and then reads them in. No need to change anything, just read it.

Cheers
Peter Unmack


Juliana Souza

Jul 17, 2023, 2:50:45 PM
to dartR
Thank you for the quick reply, Dr. Unmack.

If I may, I would like to ask a few more questions about running dartR with this type of data.

I kept getting the message that it was not possible to determine the number of columns to skip based on RepAvg (unsurprisingly, since my dataset does not include that column).
Long story short, I had to make a lot of adjustments just to get the file to read properly. I manually added an AlleleID column for the tag (I could not get it to read properly with CloneID alone) and created three files: one containing just the AlleleID and SNP data, one for individual metrics, and one for locus metrics containing the Chrom_species_contigs, ChromPos_species_contigs, SNP, SnpPosition, CallRate, count AA, count AB, count BB, Freq allele A, Freq allele B, and MAF columns.
I also recoded the SNPs from AA, AB, and BB to 0, 1, and 2 (I could not get the file to read properly using AA, AB, and BB), much like the example on page 18 of the Workbook.
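For anyone who needs to do the same recoding, here is a minimal sketch of the AA/AB/BB to 0/1/2 step in plain Python, done outside dartR. It is not part of dartR. The helper name `recode_genotypes`, the example column names, the `NA` placeholder for missing calls, and the assumption that allele A is the reference allele (so the code counts copies of allele B) are all illustrative and should be checked against your own data.

```python
import csv
import io

# Assumed mapping: allele A is treated as the reference allele, so each
# code is the number of copies of allele B. Verify this for your data.
GENOTYPE_CODES = {"AA": "0", "AB": "1", "BB": "2"}


def recode_genotypes(rows, id_column="AlleleID", metadata_columns=(), missing="NA"):
    """Yield one dict per locus with AA/AB/BB calls replaced by 0/1/2.

    Columns listed in id_column/metadata_columns are copied unchanged.
    Any other value that is not AA, AB or BB becomes `missing`
    (a hypothetical placeholder; use whatever your reader expects).
    """
    skip = {id_column, *metadata_columns}
    for row in rows:
        recoded = {}
        for column, value in row.items():
            if column in skip:
                recoded[column] = value
            else:
                call = (value or "").strip().upper()
                recoded[column] = GENOTYPE_CODES.get(call, missing)
        yield recoded


# Hypothetical input: three loci, two individuals, one metadata column.
example_csv = """AlleleID,CallRate,ind1,ind2
locus1,1.0,AA,AB
locus2,0.5,BB,-
locus3,1.0,ab,BB
"""

reader = csv.DictReader(io.StringIO(example_csv))
recoded_rows = list(recode_genotypes(reader, metadata_columns=("CallRate",)))

# Write the recoded table back out as CSV text.
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(recoded_rows)
recoded_csv = output.getvalue()
```

Because anything unrecognised is silently turned into the missing placeholder, it is worth counting the missing calls per locus afterwards and comparing them with the CallRate column as a sanity check.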
Once I had a genlight object, I ran a compliance check and everything seemed to be working fine.
I created the GenAlEx file and all the information was correct for the number of populations (3), individuals (509), and SNPs (1292). However, the SNPs were coded only as 0, 1, or 2. I am used to seeing GenAlEx files with codes going up to at least 4 (data originating from sequencing and stored as base pairs). Is that correct? Am I missing something because of the limitations of my dataset? Again, this is my first time working with DArT data, and I have not seen examples of converting AA, AB, and BB DArT data to a GenAlEx file.
If anyone could enlighten me on this, I would appreciate it.

tl;dr: do GenAlEx files converted from DArT datasets (with SNPs coded as AA, AB, and BB) only code SNPs as 0, 1, or 2?

Best,
Juliana

Peter Unmack

Jul 17, 2023, 8:40:06 PM
to da...@googlegroups.com
Just to clarify one point, a DArT file is one that comes from the
sequencing company Diversity Arrays Technology (DArT). dartR was
initially designed to work primarily with data files from DArT, but it
can work with any SNP data if formatted correctly. I have no experience
in this area though, so hopefully someone else will chime in. Most of
the key folks are away at a meeting this week though.

Cheers
Peter

Juliana Souza

Jul 18, 2023, 9:02:06 AM
to dartR
Thank you, Dr. Unmack.
To supplement my inquiry: according to the wet lab, the dataset was obtained through DArT, stored as a CSV, and filtered by MAF.
If anyone has any information that could shed more light on this, I would appreciate it.

Best,
Juliana

Juliana Souza

Jul 19, 2023, 10:37:24 AM
to dartR
Dear Dr. Unmack,
I'm sending this message to update and close this thread. Thank you for your support.
I obtained the original files. I still had to make a few adjustments to get them to work in dartR (the file was structurally very similar to the example). In the end, the converted GenAlEx file gave the same results as before, so I'll continue with my analysis.

Thank you,
Juliana