Re: [bedtools-discuss] fastaFromBed - intervals only partially available

300 views
Skip to first unread message

Aaron Quinlan

unread,
Jul 20, 2012, 9:46:08 PM7/20/12
to bedtools...@googlegroups.com
Hi Ralf,

I guess my initial reaction would be that you should create your intervals such that they are within the boundaries of the chromosome. How did you create them? The "flank" command in bedtools can do this for you while respecting telomere boundaries.

That said, I do see your point that it might be nice to have a "-clip" option that would truncate an interval to the chrom boundary. I'll give that some thought.

Best,
Aaron



On Jul 20, 2012, at 10:17 AM, Ralf wrote:

> Hi Aaron,
>
> I'm using BedTools, especially fastaFromBed. Nice program!!!
>
> I have the following problem, which I hope you are able to solve:
>
> I want to get the sequences (in .fasta format) 100 bases downstream (!) of features, that have their mapping-intervals near the 3' end of transcripts (RefSeq human.rna).
> As you can imagine, these sequences do not necessarily exist, if the feature mapping-interval is too close to the 3' end.
> So fastaFromBed comes with the message:
> Feature (gi|221316752|ref|NM_005567.3|:2256-2357) beyond the length of gi|221316752|ref|NM_005567.3| size (2277 bp). Skipping.
>
> Using a former version of BedTools, we do not get this error. Instead BedTools printed the beginning of the next (!) entry in the .fasta file (>header, sequence... etc.)
> By appropriate parsing of the outcoming .bed file, the task was accomplished perfectly.
>
> Unfortunately I'm now restricted to use the current (!) BedTools version!
> So is there some way to force BedTools to report the existing substretches of the requested 100 bases stretch instead of skipping the case?
>
> The alternative for me would be to test for availability of these downstream stretches and adjust the interval sizes in my query to fastaFromBed. But then, fastaFromBed would get completely useless to me, as I could also fetch the sequence, when checking for interval availability.
>
> Can you help me?
>
> Best regards,
>
> Ralf

Message has been deleted

Aaron Quinlan

unread,
Jul 24, 2012, 3:53:14 PM7/24/12
to bedtools...@googlegroups.com
Hi Ralf,

Here's an example that I hope illustrates the "clipping" functionality in the flank tool.  In this example, we have a toy genome with one 100bp chromosome.  In the BED file I have 3 intervals; the first is snug in the middle of the chrom, whereas the 2nd and 3rd are near the beginning and end, respectively.  I will run flank on each interval, asking for the flanking 10bp. As you can see, flank recognizes that this is not possible for both sides of the second two features in toy.bed and "clips" them accordingly.  I think this will help you with your analysis. I have inserted comments into the results for clarity.


You'll want the latest version in Github for this analysis, as I recently fixed a small corner case: https://github.com/arq5x/bedtools

cat toy.bed 
chr1 40 50
chr1 0 5
chr1 90 95

$ cat toy.genome 
chr1 100


$ bedtools flank -i toy.bed -g toy.genome -b 10
# we can safely grab 10 bp on each side of chr1:40-50
chr1 30 40
chr1 50 60
# we can only safely grab 10 bp on the "right" side of chr1:0-5
chr1 5 15
# we can safely grab 10 bp on the "left" side of chr1:90-95, but only 5bp on the "right"
chr1 80 90
chr1 95 100

Aaron

On Jul 22, 2012, at 2:03 PM, Ralf wrote:

Hi Aaron,

my intervals are created by mapping reads to the genome/transcriptome. Interval = (Start of read, end of read).
As I want to analyze downstream regions of 100 bp, I derive NEW intervals of the type (End of read, End of read + 100).
The clip option would be nice!

Could you write a short example of the "flank" command in the context of my problem, showing usage and outcome?

Best,
Ralf

Aaron Quinlan

unread,
Jul 25, 2012, 6:05:12 AM7/25/12
to bedtools...@googlegroups.com
Hi Ralf,

Have a look at the help by typing "bedtools flank".  The -r command is what you want.  For example:

bedtools flank i ralf.bed -g ralfs.genome -l 0 -r 10

Usage:   bedtools flank [OPTIONS] -i <bed/gff/vcf> -g <genome> [-b <int> or (-l and -r)]

Options: 
-b Create flanking interval(s) using -b base pairs in each direction.
- (Integer) or (Float, e.g. 0.1) if used with -pct.

-l The number of base pairs that a flank should start from
orig. start coordinate.
- (Integer) or (Float, e.g. 0.1) if used with -pct.

-r The number of base pairs that a flank should end from
orig. end coordinate.
- (Integer) or (Float, e.g. 0.1) if used with -pct.


Best,
Aaron


On Jul 25, 2012, at 4:17 AM, Ralf wrote:

Amazing! This is, what I need.

Is it somehow possible to take just the 10 bp DOWNSTREAM interval, instead of both sides? 
 

Best,
Ralf

Reply all
Reply to author
Forward
0 new messages