Re: [bedtools-discuss] getFasta forced strandedness problem with BEDtools v2.17.0?

420 views
Skip to first unread message

Aaron Quinlan

unread,
May 13, 2013, 8:22:22 PM5/13/13
to bedtools...@googlegroups.com
Hi Mark,

I did a test with a small fasta file and cannot replicate this.  How are you testing whether or not the files are identical?


cat x.fa
>chr14
actttgggg

cat x.bed
chr14 0 2 chr14 1 +
chr14 2 5 chr14 1 -
chr14 5 8 chr14 1 +

bedtools getfasta -fi x.fa -bed x.bed -fo stdout -name -s | awk '!x[$0] {print} {if ($0 ~ /^>/) x[$0] += 1}'
>chr14
ac
aaa
ggg



On May 13, 2013, at 7:52 PM, rareaqua...@gmail.com wrote:

Hi there,

I've been trying to figure out an issue I've been having using bedtools getfasta.  I'm trying to alter a reference genome (to correct a misorientation in the reference build) by feeding in BED coordinates using the -s feature. 

I have a bed file

$ cat bedfile.bed
chr14    1    3000000    chr14    1    +
chr14    3000000    19419705    chr14    1    -
chr14    19419706    124902244    chr14    1    +

That I'm running through bedtools:

$ bedtools getfasta -fi mm10.fasta -bed bedfile.bed -s -name -fo chr14.fasta

And collapsing all the fragments by name (so instead of getting two records for chr14, I combine it to one) to create an edited chromosome:

$ awk '!x[$0] {print} {if ($0 ~ /^>/) x[$0] += 1}' chr14.fasta > new_chr14.fasta

For some reason the -s option is not reverse complementing though; my data is identical irrespective of if I use the "-s" flag.  Have I done something wrong, or does this feature not work on large genomic regions like this?

Any help would be greatly appreciated.

Kind regards,

Mark

--
You received this message because you are subscribed to the Google Groups "bedtools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bedtools-discu...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Aaron Quinlan

unread,
May 14, 2013, 4:38:20 AM5/14/13
to bedtools...@googlegroups.com
Hi Mark,

I successfully tried the exact same commands and BED file on hg19, as I have that file on my laptop.  I will try mm10 later today, but at least for human, the stranded requests are correctly handled.


On May 13, 2013, at 9:02 PM, rareaqua...@gmail.com wrote:

Hi Aaron,

If I use your small test data set it works perfectly for me too.  I seem to be unable to force reverse complementing on larger data sets.  I'm assessing by doing the following;

$ cat bed_file.bed

chr14    3000000    19419705    chr14    1    -

$ bedtools getfasta -fi mm10.fasta -bed bed_file.bed -fo fwd.fasta -name
$ bedtools getfasta -fi mm10.fasta -bed bed_file.bed -fo rev.fasta -s -name

I'm assuming here that fwd.fasta should be in the same orientation as the reference file (because I've not invoked -s) and rev.fasta should be in the opposite orientation (because I have invoked -s).

What I get though is:

$ more fwd.fasta
GAtcaacacccttccctcaacaagtctcctgtatccctcccaacttcatgttgtttaataaaatcacccactaagtccaactggtgctgcctgtattcgt...

$ more rev.fasta
GAtcaacacccttccctcaacaagtctcctgtatccctcccaacttcatgttgtttaataaaatcacccactaagtccaactggtgctgcctgtattcgt...

mm10.fasta is the latest mouse reference genome (mm10/GRCm38) and consists of 2.78 Gbases of DNA.

Do you have any experience using larger data sets? Is there a size limit to the amount of nucleotides you can reverse complement?

Thanks for your quick response!

Mark

Aaron Quinlan

unread,
May 14, 2013, 7:01:24 AM5/14/13
to bedtools...@googlegroups.com
What does diff return?

diff fwd.fasta rev.fasta 

One other idea is that perhaps the strand column is malformed or not being read.  From your example, I can't see how that would be, but it is worth checking.

- Aaron




Aaron Quinlan

unread,
May 14, 2013, 2:56:48 PM5/14/13
to bedtools...@googlegroups.com
Hi Mark,

I am glad the problem is solved.  On to the fun stuff!
On May 14, 2013, at 1:41 PM, rareaqua...@gmail.com wrote:

Hi Aaron,

After troubleshooting multiple things today I think I've figured out it was a stupid error on my part; I'd downloaded and saved the bed file from a windows machine and hadn't realized that while the character encoding was fine, it had a windows line ending rather than a Unix, which seemed to screw up the direction column.  As soon as I copied and pasted the same coordinates into a new bed file and saved it on my Linux station, it worked fine.

Sorry for all the confusion, and thanks for your help!
Reply all
Reply to author
Forward
0 new messages