Re: Sort order of bedGraph file before bedGraphToBigWig

Lim, Hee-Woong

Jan 27, 2017, 11:16:03 AM
to gen...@soe.ucsc.edu
Dear UCSC Genome Informatics Group:


I am writing to ask a question about the bedGraphToBigWig command in the UCSC utilities.

I have truly enjoyed using these tools in my research.
bedGraphToBigWig in particular has been extremely useful, in combination with other tools
such as Homer or bedtools, for creating bigWig files.

However, after I recently upgraded the UCSC utilities to the most recent version,
I got an error saying that my bedGraph file is not case-sensitive sorted.
Previously, it was OK to use a bedGraph file that was version-sorted (chr1, chr2, ..., chr10, chr11, ...),
and lexicographically sorted files were accepted as well.
The new version seems to accept only lexicographically sorted files, in which chr10 follows chr1.

Is this change permanent, and will only lexicographical ordering be allowed from now on?


Thank you.


- Lim




--
=========================================
Hee-Woong Lim, Ph.D.
Postdoctoral Researcher
Institute for Diabetes, Obesity and Metabolism
Perelman School of Medicine at the University of Pennsylvania
12-111 Smilow Center for Translational Research
3400 Civic Center Boulevard
Philadelphia, PA 19104-5160

Chris Villarreal

Feb 1, 2017, 11:22:40 AM
to Lim, Hee-Woong, UCSC

Dear Lim,

Thank you for your question about the UCSC Genome Browser. The error may be due to changes in your shell environment. The 'sort' command depends on the LANG variable in the shell, and we require LANG=C for this type of sort. To see all variables related to this situation, use the 'locale' command. Here is an example:

LANG=en_US.UTF-8
LC_CTYPE="C" 
LC_NUMERIC="C" 
LC_TIME="C" 
LC_COLLATE="C" 
LC_MONETARY="C" 
LC_MESSAGES="C" 
LC_PAPER="C" 
LC_NAME="C" 
LC_ADDRESS="C" 
LC_TELEPHONE="C" 
LC_MEASUREMENT="C" 
LC_IDENTIFICATION="C" 
LC_ALL=C

Could you please share your settings with us so that we have a better understanding of your issue? Including the file you're using may also help, although it is not required. You may send this information directly to me.

I hope this is helpful.

-Chris V
UCSC Genome Browser



Lim, Hee-Woong

Feb 1, 2017, 12:03:36 PM
to Chris Villarreal, UCSC
Dear Chris:


Thanks a lot for the kind reply.
I think it makes sense, although I don't know much about this type of setting.

For your information, here is the output of the "locale" command in my terminal.

LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=

I also attached two example bedGraph files: one that works fine and another that gives the error below:
 
test.bad.bedGraph is not case-sensitive sorted at line 4.  Please use "sort -k1,1 -k2,2n" with LC_COLLATE=C,  or bedSort and try again.

In the previous version (v287) of bedGraphToBigWig, both bedGraph files were accepted as input.


Thank you.
-Lim
test.bad.bedGraph
test.good.bedGraph

Chris Villarreal

Feb 2, 2017, 4:23:12 PM
to Lim, Hee-Woong, UCSC
Dear Lim,

Thank you for providing the files. If you have tried using bedSort and it worked, I would suggest that option. The bedSort utility can be found in the same directory where you downloaded bedGraphToBigWig. Your environment variable settings are what is causing the sort errors. You will need to set LANG=C or LC_COLLATE=C. In its current configuration, it will not work. Here is a useful explanation of the sort issue: https://www.madboa.com/geek/utf8/.
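
The sort named in the error message can be run with the locale overridden for that one command only, leaving the rest of the shell environment unchanged. The following is a minimal sketch, not an official recipe: the file names and values are made up for illustration, and LC_ALL=C is used instead of LC_COLLATE=C because LC_ALL, when set, takes precedence over LC_COLLATE.

```shell
# Hypothetical input, grouped by chromosome in "natural" order (chr1, chr2, chr10).
printf 'chr1\t0\t100\t1.0\nchr2\t0\t100\t2.0\nchr10\t0\t100\t3.0\n' > example.bedGraph

# Override the locale for this single sort invocation only.
# Key 1 (chromosome) is compared byte by byte, key 2 (start) numerically.
LC_ALL=C sort -k1,1 -k2,2n example.bedGraph > example.sorted.bedGraph

# Chromosome order after sorting: chr1, chr10, chr2
cut -f1 example.sorted.bedGraph

# Possible next step (chrom.sizes and example.bw are hypothetical file names):
# bedGraphToBigWig example.sorted.bedGraph chrom.sizes example.bw
```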

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

-Chris V
UCSC Genome Browser

Lim, Hee-Woong

Feb 6, 2017, 12:55:56 PM
to Chris Villarreal, UCSC
Dear Chris:


Thank you for checking this out.

But my examples do not seem to be related to the locale issue, because chr10 comes before chr2 in any locale setting.
Instead, it seems to be about the sorting requirement.
The previous bedGraphToBigWig somehow also accepted a version-sorted bedGraph as input (grouped by chromosome and sorted by position),
but the new version strictly requires a bedGraph sorted by "sort -k1,1 -k2,2n" with LANG/LC_COLLATE=C.
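
The difference between the two orderings discussed here can be seen with a short sketch. It assumes GNU coreutils, whose sort has a -V (version sort) option; the chromosome names and file names are made up for illustration.

```shell
# Hypothetical list of chromosome names in arbitrary order.
printf 'chr2\nchr10\nchr1\n' > chroms.txt

# Version ("natural") order, GNU sort only: chr1, chr2, chr10
sort -V chroms.txt > version_order.txt

# Byte order, as produced under the C locale: chr1, chr10, chr2
LC_ALL=C sort chroms.txt > byte_order.txt

# Show the two orderings side by side.
paste version_order.txt byte_order.txt
```

The first column is the chr1/chr2/.../chr10 order; the second is the order in which chr10 follows chr1.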

Thanks again for your help.


-Lim



Gert Hulselmans

Feb 9, 2017, 11:29:26 AM
to Lim, Hee-Woong, Chris Villarreal, UCSC
Dear Chris,

It would be nice if bedGraphToBigWig would allow converting BedGraph files
where the chromosomes are in the same order as specified in chrom.sizes file.

The bedtools sort command has an option to sort the chromosomes in a BED file based on their order in a text file:

  bedtools sort -faidx chrom.sizes -i test.bed
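
For readers without bedtools, the same idea (ordering chromosomes as they appear in chrom.sizes) can be approximated with awk and sort. This is only a sketch under assumptions: the file contents below are made up, the output file name is hypothetical, and lines whose chromosome is missing from chrom.sizes are silently dropped.

```shell
# Hypothetical chrom.sizes (name, length) listing chromosomes in the desired order.
printf 'chr1\t1000\nchr2\t900\nchr10\t800\n' > chrom.sizes
# Hypothetical BED input in arbitrary order.
printf 'chr10\t5\t10\nchr2\t50\t60\nchr1\t20\t30\nchr2\t0\t10\n' > test.bed

# Prefix each BED line with its chromosome's line number in chrom.sizes,
# sort by that rank and then by start position, and drop the rank again.
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { rank[$1] = FNR; next }   # first file: record rank per chromosome
     ($1 in rank) { print rank[$1], $0 }  # second file: emit rank plus original line
    ' chrom.sizes test.bed |
  sort -t "$(printf '\t')" -k1,1n -k3,3n |
  cut -f2- > test.chromsizes_order.bed

# Resulting order: chr1, chr2 (start 0), chr2 (start 50), chr10
cat test.chromsizes_order.bed
```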

It would be nice if bedSort would have a similar option.

Thanks

Matthew Speir

Feb 9, 2017, 1:35:46 PM
to Lim, Hee-Woong, Chris Villarreal, UCSC
Hi Lim,

It appears that this sorting check was added in May 2015 (v287 looks to be from Aug 2013), because otherwise the indexes for bigWig and other "big*" file types did not work correctly. I would recommend updating your bedGraphToBigWig binaries to the latest version. You can download them from our downloads server here: http://hgdownload.soe.ucsc.edu/admin/exe/. If you can't change your LC_COLLATE settings, then it may be best to use bedSort to sort your bedGraph files before feeding them into bedGraphToBigWig.
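
One way to apply this advice is to check the order first and re-sort only when needed. The sketch below is an approximation, not the exact check bedGraphToBigWig performs: the input data and file names are made up, bedSort is used only if it is found on the PATH, and otherwise the sort command from the error message is used.

```shell
# Hypothetical input, grouped by chromosome in natural order (chr1, chr2, chr10).
printf 'chr1\t0\t100\t1.0\nchr2\t0\t100\t2.0\nchr10\t0\t100\t3.0\n' > input.bedGraph

# "sort -c" only checks the order: exit status 0 if sorted, non-zero otherwise.
if LC_ALL=C sort -c -k1,1 -k2,2n input.bedGraph 2>/dev/null; then
    cp input.bedGraph ready.bedGraph                  # already in the required order
elif command -v bedSort >/dev/null 2>&1; then
    bedSort input.bedGraph ready.bedGraph             # UCSC utility: bedSort in out
else
    LC_ALL=C sort -k1,1 -k2,2n input.bedGraph > ready.bedGraph
fi

# Chromosome order in ready.bedGraph: chr1, chr10, chr2
cut -f1 ready.bedGraph
```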

I hope this is helpful.

Matthew Speir
UCSC Genome Bioinformatics Group


Matthew Speir

Feb 9, 2017, 1:35:48 PM
to Gert Hulselmans, Lim, Hee-Woong, Chris Villarreal, UCSC
Hi Gert,

As I just mentioned to Lim in another response, the sorting check was added in May 2015 because otherwise the indexes for bigWig and other "big*" file types did not work correctly. Based on that, it's unlikely that we will be changing the allowed sort order for bedGraphToBigWig.

You are welcome to store your bedGraph files sorted in any manner you like; you would just need to sort them before feeding them into bedGraphToBigWig to be able to create bigWig files.

I hope this is helpful.

Matthew Speir
UCSC Genome Bioinformatics Group

