Hello,
I'm not able to create an accurate mapping file for you. Neither the barcodes nor the primers seem to be in the reads.
Here is what I was asking you to do earlier: take a read and examine it to see where the primer fits. E.g. (with just the first part of the first read from the BongT9.zip file):
>ICYH0ZI01C2CP5 length=436 xy=1139_3419 region=1 run=R_2013_07_18_16_13_37_
TGCATCGAATTAAACCACATGCTCCACCGCTTGTGCGGGCCCCCGTCAATTCCTTTGAGT
For the primers, you have these options:
CAGCAGCCGCGGTAA
and
GTAAGGTTCYTCGCGT
Also, just to be sure, here are the reverse complements of these two primers:
TTACCGCGGCTGCTG
and
ACGCGARGAACCTTAC
None of these match bases in the read. This indicates that either the primers are wrong, or you're dealing with reads that already have the barcodes and primers removed from them. It sure looks like you got one fasta file per sample from your boss, such as BongaT9, BongaT10, and BongaT11. Will your boss not tell you if he/she downloaded them from SRA, got them from an author of a paper, or what the source was? I strongly suspect that the barcodes were already removed and the data were already split into a fasta file per sample, because blasting some of the reads against NCBI returns hits (mostly chloroplasts) that align all the way to the ends of the reads. If barcodes were still present, the beginning and/or ending of the reads wouldn't match the reference genes in GenBank.
There is no way that I can say with certainty that these reads are already split per sample, but they may be. I would *strongly* suggest that the source of these reads be tracked down, so that the nature of the processing is known. You wouldn't be able to publish on these data without knowing this in any case.
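If you want to repeat the primer check yourself on more reads, here is a minimal Python sketch of it. It is only an illustration: the function names (`reverse_complement`, `find_primer`) are my own, not part of QIIME, and it only looks for exact matches (allowing the degenerate IUPAC bases such as Y and R), so it would miss a primer with a sequencing error in it.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes,
# so that degenerate bases such as Y (C or T) and R (A or G) are handled.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_primer(primer, read):
    """Return the 0-based position of the primer in the read, or -1 if absent."""
    pattern = "".join(IUPAC[base] for base in primer.upper())
    match = re.search(pattern, read.upper())
    return match.start() if match else -1


# The start of the first read and the two primers quoted in this post.
read = "TGCATCGAATTAAACCACATGCTCCACCGCTTGTGCGGGCCCCCGTCAATTCCTTTGAGT"
primers = ["CAGCAGCCGCGGTAA", "GTAAGGTTCYTCGCGT"]

for primer in primers:
    for label, seq in (("as given", primer),
                       ("reverse complement", reverse_complement(primer))):
        print(primer, label, seq, "position:", find_primer(seq, read))
```

A position of -1 means the primer was not found in that orientation. To check a whole file you would loop over every read instead of the single one hard-coded here.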
If it turns out that the data are already split up by sample, the way to get the separate fasta files into a single fasta file that QIIME can use for OTU picking is the add_qiime_labels.py script (see
http://qiime.org/scripts/add_qiime_labels.html). You'll have to take your existing mapping file and add a column with the file names of the fasta files (tab-separated, like the rest of the mapping file), e.g.
#SampleID BarcodeSequence LinkerPrimerSequence InputFileName Description
BongaT9 CCCCCCCC CAGCAGCCGCGGTAA BongaT9.fasta BongaT9
BongaT11 AAAAAAAAA CAGCAGCCGCGGTAA BongaT11.fasta BongaT11
and so on for all of the separate fasta files, then call add_qiime_labels.py as shown in the example on the scripts page.
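To give a feel for what that step does, here is a rough Python sketch of the idea: look up the SampleID for each fasta file in the mapping file, then rewrite every sequence label as SampleID_number so QIIME can tell which sample each read came from. This is not the add_qiime_labels.py code itself. The function names are made up for illustration, the numbering scheme is an assumption on my part, and the second read ID in the example is invented.

```python
def filename_to_sample(mapping_text, filename_column="InputFileName"):
    """Map each fasta file name to its SampleID, from tab-separated mapping text."""
    lines = [line for line in mapping_text.splitlines() if line.strip()]
    header = lines[0].lstrip("#").split("\t")
    column = header.index(filename_column)
    lookup = {}
    for line in lines[1:]:
        if line.startswith("#"):
            continue  # skip any further comment lines
        fields = line.split("\t")
        lookup[fields[column]] = fields[0]  # SampleID is the first column
    return lookup


def relabel(fasta_lines, sample_id, start=0):
    """Yield fasta lines with each header rewritten as '>SampleID_n old_header'."""
    count = start  # assumed numbering scheme: one counter per file
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield ">%s_%d %s" % (sample_id, count, line[1:])
            count += 1
        elif line:
            yield line


mapping = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tInputFileName\tDescription\n"
    "BongaT9\tCCCCCCCC\tCAGCAGCCGCGGTAA\tBongaT9.fasta\tBongaT9\n"
)
example_fasta = [
    ">ICYH0ZI01C2CP5 length=436\n",
    "TGCATCGAATTAAACCACATGCTCCACCGC\n",
    ">HYPOTHETICAL_READ_2 length=30\n",  # invented ID, for illustration only
    "TTGTGCGGGCCCCCGTCAATTCCTTTGAGT\n",
]

samples = filename_to_sample(mapping)
combined = list(relabel(example_fasta, samples["BongaT9.fasta"]))
print("\n".join(combined))
```

In practice you would just run the real script rather than anything like this; the sketch is only meant to show why the InputFileName column is needed.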
-Tony