So if I have my VCF file as such:
##fileformat=VCFv4.2
[... removed the extra info fields...]
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 100 18 4 55 63 64 91 L
contig1 53 . A G . PASS NS=6;AF=0.083 GT:DP:AD:GQ:GL 0/0:8:8,0:33:-0.00,-2.70,-26.45 0/0:13:13,0:40:-0.00,-4.20,-42.47 ./. 0/0:12:12,0:40:-0.00,-3.90,-39.27 0/1:8:6,2:40:-3.71,-0.00,-17.35 0/0:14:14,0:40:-0.00,-4.50,-45.67 ./. 0/0:6:6,0:27:-0.00,-2.10,-20.05
contig3 19 . C A . PASS NS=6;AF=0.333 GT:DP:AD:GQ:GL 0/1:10:6,4:40:-9.75,-0.00,-16.50 ./. 0/1:20:12,8:40:-19.19,0.00,-32.17 0/1:9:5,4:40:-10.05,-0.00,-13.69 ./. 0/0:7:7,0:24:-0.01,-1.81,-22.33 0/0:4:4,0:13:-0.05,-0.95,-13.03 0/1:6:3,3:40:-7.84,-0.00,-8.36
contig3 22 . T G . PASS NS=6;AF=0.083 GT:DP:AD:GQ:GL 0/0:10:10,0:39:-0.00,-3.26,-21.44 ./. 0/1:20:11,8:40:-10.56,-0.00,-17.57 0/0:9:9,0:36:-0.00,-2.96,-19.38 ./. 0/0:7:7,0:30:-0.00,-2.37,-15.26 0/0:4:4,0:20:-0.01,-1.49,-9.08 0/0:6:6,0:26:-0.00,-2.07,-13.19
contig7 129 . T C . PASS NS=6;AF=0.417 GT:DP:AD:GQ:GL 0/1:9:4,5:40:-12.94,-0.00,-10.02 0/1:4:1,3:31:-8.42,-0.00,-2.48 ./. 0/1:8:1,7:18:-19.30,-0.02,-1.29 ./. 0/1:9:2,7:40:-18.98,-0.00,-3.99 0/1:15:9,6:40:-14.16,-0.00,-23.30 0/0:7:7,0:21:-0.01,-1.55,-21.22
contig8 101 . T A . PASS NS=6;AF=0.250 GT:DP:AD:GQ:GL 0/1:15:13,2:25:-1.97,-0.00,-37.81 ./. 0/0:18:18,0:40:-0.00,-5.34,-58.23 0/0:8:8,0:29:-0.00,-2.33,-26.25 0/0:7:7,0:26:-0.00,-2.03,-23.05
Are these columns (CHROM, POS, ID, REF, ALT) what you call "map information"? That would then be my first table (the map information).
My genotypes for sample 100 would then be 0/0, 0/1, etc., right?
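To make the split concrete, here is a minimal base-R sketch that takes the header and the first record from the excerpt above and separates the positional columns from the per-sample genotypes. The variable names (`map_info`, `gt`, and so on) are only illustrative, and the lines are split on any whitespace because the excerpt was pasted with spaces; a real VCF is tab-separated.

```r
# Header and first data line, copied from the VCF excerpt above.
header <- "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 100 18 4 55 63 64 91 L"
record <- paste(
  "contig1 53 . A G . PASS NS=6;AF=0.083 GT:DP:AD:GQ:GL",
  "0/0:8:8,0:33:-0.00,-2.70,-26.45 0/0:13:13,0:40:-0.00,-4.20,-42.47 ./.",
  "0/0:12:12,0:40:-0.00,-3.90,-39.27 0/1:8:6,2:40:-3.71,-0.00,-17.35",
  "0/0:14:14,0:40:-0.00,-4.50,-45.67 ./. 0/0:6:6,0:27:-0.00,-2.10,-20.05"
)

# Split both lines on whitespace and label each field with its column name.
cols   <- sub("^#", "", strsplit(header, "[ \t]+")[[1]])
fields <- strsplit(record, "[ \t]+")[[1]]
names(fields) <- cols

# Positional part: where the variant sits and which alleles it has.
map_info <- fields[c("CHROM", "POS", "ID", "REF", "ALT")]

# Genotype part: every column after FORMAT is one sample.
sample_fields <- fields[(which(cols == "FORMAT") + 1):length(cols)]

# Locate GT inside the FORMAT string, then pull that subfield per sample.
gt_index <- match("GT", strsplit(fields[["FORMAT"]], ":", fixed = TRUE)[[1]])
gt <- vapply(strsplit(sample_fields, ":", fixed = TRUE),
             `[`, character(1), gt_index)

print(map_info)
print(gt)
```

Here `gt` is a character vector named by sample, so `gt[["100"]]` is the genotype of sample 100 at this one site. Repeating this over all records would give one genotype per site for each sample.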
For now, I find it easier to use the VariantAnnotation package (Bioconductor):
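library(VariantAnnotation)
# Read the VCF into a VCF object
vcf <- readVcf("variants.vcf")
# Convert the GT field to a SnpMatrix, ignoring uncertain (probability-based) calls
snp.matrix <- genotypeToSnpMatrix(vcf, uncertain = FALSE)
# Recode as A/A, A/B, B/B and transpose so that rows are variants and columns are samples
snp.matrix.transposed <- t(as(snp.matrix$genotypes, "character"))
# Note: the output is tab-separated despite the .csv extension
write.table(snp.matrix.transposed, "SNP_matrix_test.csv", sep="\t")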
to obtain the following SNP matrix:
| contig | 100 | 18 | 4 | 55 | 63 | 64 | 91 | L |
|---|---|---|---|---|---|---|---|---|
| contig1:53_A/G | A/A | A/A | NA | A/A | A/B | A/A | NA | A/A |
| contig3:19_C/A | A/B | NA | A/B | A/B | NA | A/A | A/A | A/B |
| contig3:22_T/G | A/A | NA | A/B | A/A | NA | A/A | A/A | A/A |
| contig7:129_T/C | A/B | A/B | NA | A/B | NA | A/B | A/B | A/A |
| contig8:101_T/A | A/B | NA | A/A | A/A | A/A | A/B | A/B | NA |
| contig10:52_A/T | A/A | NA | A/A | A/A | A/A | A/A | A/A | A/B |
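For reference, the relationship between the GT strings in the VCF and the A/A, A/B, NA labels in this matrix can be sketched in base R as below. `gt_to_ab` is a hypothetical helper written for illustration, not a VariantAnnotation function, and it assumes diploid, biallelic sites where A stands for the REF allele and B for the ALT allele; anything else (missing calls, other allele indices) is returned as NA.

```r
# Hypothetical helper (not part of VariantAnnotation): recode diploid,
# biallelic GT strings into the A/A, A/B, B/B labels seen in the matrix.
gt_to_ab <- function(gt) {
  # Split on "/" (unphased) or "|" (phased).
  alleles <- strsplit(gt, "[/|]")
  vapply(alleles, function(a) {
    # Missing calls ("./.") or unexpected allele codes become NA.
    if (length(a) != 2 || !all(a %in% c("0", "1"))) return(NA_character_)
    # Count ALT alleles: 0 -> A/A, 1 -> A/B, 2 -> B/B.
    c("A/A", "A/B", "B/B")[sum(a == "1") + 1]
  }, character(1))
}

# GT values of the contig1:53 record, in the sample order of the VCF header.
gt_contig1_53 <- c("100" = "0/0", "18" = "0/0", "4" = "./.", "55" = "0/0",
                   "63" = "0/1", "64" = "0/0", "91" = "./.", "L" = "0/0")

ab <- gt_to_ab(gt_contig1_53)
print(ab)
```

Applied to the contig1:53 record, this gives the same labels as the first row of the matrix, which is a quick way to check that the rows are variants and the columns are samples after the transpose.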