strand-aware makewindows?


Jay Hesselberth

Jul 17, 2014, 11:13:38 AM
to bedtools...@googlegroups.com
I have a use case for makewindows that I would really like, and I thought I would see if there is enough interest to incorporate this into bedtools.

I often use the makewindows tool to generate windows around features to make e.g. meta-feature plots. Features include polyA sites, transcription starts, transcription stops, start/stop codons etc. These are usually single base features in a BED file.

The data that I am aggregating is typically stranded, such that I have data for which pos / neg signals are meaningful; these are usually libraries prepared from RNA samples (e.g. RNA-seq, polyA-seq etc).

I typically run a common pipeline for aggregation that is similar to this recipe:


However, the problem is that the window numbers from the current makewindows have no sense of strand. For positive strand features, window number 1 has upstream coordinates, whereas for negative strand features, window number 1 has downstream coordinates. When I ultimately groupby these window numbers to report an aggregated signal, the result doesn't make much sense (i.e. the window numbers are the same, but the upstream and downstream signals are mixed).

It would be much nicer if there were a flag in the makewindows tool such that it would pay attention to the strand of the features it is making windows around, and report window numbers that are consistent with respect to "upstream" and "downstream" of each feature, i.e. window 1 for a positive strand feature would have the lowest genomic coordinates, and window 1 for a negative strand feature would have the highest genomic coordinates, so that window 1 is upstream on either strand. That way, if you aggregate by a fixed window number, the windows are all either "upstream" or "downstream".
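
The numbering asked for here can be sketched in plain awk, without bedtools. This is only an illustration of the idea, not a bedtools feature: `strand_windows` is a hypothetical helper name, and the sketch assumes tab-separated BED6 input (strand in column 6) and a fixed window width in bp.

```shell
# Hypothetical helper (not part of bedtools): split each BED6 interval into
# fixed-width windows and number them relative to the strand in column 6.
# Usage: strand_windows WIDTH < features.bed
strand_windows() {
    awk -v w="$1" 'BEGIN { FS = OFS = "\t" }
    {
        n = int(($3 - $2 + w - 1) / w)          # window count; the last one may be short
        for (i = 1; i <= n; i++) {
            start = $2 + (i - 1) * w
            end = start + w
            if (end > $3) end = $3
            num = ($6 == "-") ? n - i + 1 : i   # flip numbering on the minus strand
            print $1, start, end, $4 "_" num, $5, $6
        }
    }'
}
```

With this numbering, window 1 is always the most upstream window of a feature, so grouping by window number no longer mixes upstream and downstream signal.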

Jay Hesselberth

Jul 17, 2014, 11:20:15 AM
to bedtools...@googlegroups.com
OK so after thinking about this for 10 seconds, it would be just as useful (and likely easier to implement) to have a flag in the makewindows tool that would make the window numbers print out in reverse order:

$ bedtools makewindows -b features.bed -w 100 -i winnum
chr1 100 200 1
chr1 200 300 2
chr1 300 400 3
chr1 400 500 4

$ bedtools makewindows -b features.bed -w 100 -i winnum -reverse
chr1 100 200 4
chr1 200 300 3
chr1 300 400 2
chr1 400 500 1

The tool wouldn't necessarily have to pay attention to strand in this case, and could leave that decision up to the user.
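
Where no such flag is available, the reversed numbering can be approximated by post-processing the normal output. A minimal sketch, assuming four tab-separated columns (chrom, start, end, window number) in which the number restarts at 1 for every source feature; `reverse_winnum` is a hypothetical helper name, not a bedtools command.

```shell
# Hypothetical helper: reverse the window numbers within each feature.
# Input: chrom, start, end, window number (tab-separated), with the number
# restarting at 1 whenever a new source feature begins.
reverse_winnum() {
    awk 'BEGIN { FS = OFS = "\t" }
    function emit(   i) {
        for (i = 1; i <= n; i++) print line[i], n - i + 1
        n = 0
    }
    $4 == 1 && n > 0 { emit() }          # numbering restarted: previous feature is complete
    { line[++n] = $1 OFS $2 OFS $3 }     # buffer the coordinates of the current feature
    END { emit() }'
}
```

The sketch buffers one feature at a time, so memory use depends on the number of windows per feature, not on the size of the file.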

yupu...@gmail.com

Nov 3, 2015, 10:55:21 PM
to bedtools-discuss
Dear Jay:

Hi, I wonder if you have a patch for strand-specific makewindows that you can share?

I am currently separating the features into two files based on their strand information and then using my own script to fix the window numbers on the negative strand features. This is not great......

Thanks,
Yupu

florian...@gmail.com

Feb 21, 2018, 11:21:22 AM
to bedtools-discuss
Hello Jay, I am writing my Master's thesis and encountered the same problem after solving all other issues. I found a solution that works well for me, but given my limited experience I am not sure whether it is a good/clean approach concerning memory efficiency etc. Maybe it helps someone; any feedback is welcome, and thank you for raising this issue!

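# Both awk filters must see every feature, so the intersect output goes to an intermediate file (hits.bed) first
bedtools intersect -u -a "$featureBED" -b "$signalBED" > hits.bed
cat <(awk '($6 == "+")' hits.bed | bedtools makewindows -b - -w "$windowBin" -i srcwinnum) \
    <(awk '($6 == "-")' hits.bed | bedtools makewindows -b - -w "$windowBin" -i srcwinnum -reverse) > out.bed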

Florian Deckert

Feb 22, 2018, 12:42:15 PM
to bedtools...@googlegroups.com
Just realizing "tee" and/or <() is a mess. I found another way which works smoothly in pipes without storing files. 

Define two functions and make them accessible to the environment: 

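# Window one plus-strand feature (fields passed as $1..$6); windowBIN must be exported as well
function plusWIN {
    printf '%s\t%s\t%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4" "$5" "$6" | bedtools makewindows -b - -w "$windowBIN" -i srcwinnum
}
export -f plusWIN

# Same for one minus-strand feature, with the window numbers reversed
function minusWIN {
    printf '%s\t%s\t%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4" "$5" "$6" | bedtools makewindows -b - -w "$windowBIN" -i srcwinnum -reverse
}
export -f minusWIN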

Call the following within your pipe. The positional parameters ($1 to $6) have to be adjusted accordingly if your input has more or different columns.

... pipe bed in ... | awk 'BEGIN{OFS=FS="\t"} { if ($6 == "+") {system("plusWIN "$0)} else if ($6 == "-") {system("minusWIN "$0)} }' | ... pipe bed out ... 

It is a bit tricky to work with system() inside awk: it runs each command through sh, so the exported functions are only visible if sh is bash. But I think it might be the best option without storing +/- files on disk.
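
An alternative that avoids system() (and one bedtools process per feature) is to carry the strand through makewindows inside the name column and flip the numbering afterwards. The following is a minimal sketch under stated assumptions: tab-separated BED6 input, `-i srcwinnum` output named `name_N`, and all windows of one feature emitted consecutively. `tag_strand` and `flip_minus` are hypothetical helper names, not part of bedtools.

```shell
# Hypothetical helpers. Intended use (bedtools step shown only as a comment):
#   ... | tag_strand | bedtools makewindows -b - -w "$windowBIN" -i srcwinnum | flip_minus | ...

# Append the strand to the name column, e.g. geneA -> geneA|-
tag_strand() {
    awk 'BEGIN { FS = OFS = "\t" } { $4 = $4 "|" $6; print }'
}

# Read windows named name|strand_N, restore the name, and reverse the
# numbering for minus-strand features. Output: chrom, start, end, name_N, strand.
flip_minus() {
    awk 'BEGIN { FS = OFS = "\t" }
    function emit(   i, num) {
        for (i = 1; i <= n; i++) {
            num = (strand == "-") ? n - i + 1 : i
            print coords[i], name "_" num, strand
        }
        n = 0
    }
    {
        src = $4; sub(/_[0-9]+$/, "", src)     # source id without the window number
        w = $4;   sub(/^.*_/, "", w)           # window number only
        if (n > 0 && (src != prev || w + 0 == 1)) emit()   # a new feature starts here
        prev = src
        strand = substr(src, length(src))      # last character is the strand tag
        name = substr(src, 1, length(src) - 2) # drop the separator and the strand
        coords[++n] = $1 OFS $2 OFS $3
    }
    END { if (n > 0) emit() }'
}
```

Only one feature is buffered at a time, and nothing is written to disk.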

Best
Florian 


