ERROR: number of input data lines over limit 50000000


Franzi G

Dec 12, 2017, 12:09:52 PM
to gen...@soe.ucsc.edu
Dear UCSC team,

Since one week ago I have had trouble uploading .bedGraph.gz files generated with HOMER. I get the following error message: "ERROR: number of input data lines over limit 50000000".
I also tried uploading files that are already in the session (successfully uploaded 1 month ago). These give the same error message.
I also tried creating a new session and uploading the files. Same error message again!

If I understand it correctly, the files seem to be too big. But why did it work before?

Did you change the size limits of bedGraphs uploaded recently?
Do you have any other advice?

I use the mm10 mouse genome assembly.

Thank you very much for your support.

Franziska Greulich

Brian Lee

Dec 12, 2017, 4:47:17 PM
to Franzi G, gen...@soe.ucsc.edu
Dear Franziska,

Thank you for using the UCSC Genome Browser to view your data and your question about the recent limit 50000000 error message.

You are correct that we have recently set a size limit on bedGraphs that can be uploaded. We do not want to deter you from continuing to use the browser to view your files, but we needed to limit file size: a small number of users uploading very large files were consuming most of the space allocated for custom tracks, and better solutions exist for data of that size.
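To see whether a particular file is affected, it can help to count its data lines before uploading. The sketch below is a hypothetical helper, not a UCSC tool; it assumes that every line other than "track" or "browser" header lines counts toward the limit, and that "over limit" means strictly greater than 50000000. The browser's exact counting rule is not stated here.

```shell
#!/bin/sh
# Hypothetical helpers (not part of any UCSC or HOMER toolkit).
# Assumption: every line that is not a "track"/"browser" header line
# counts toward the limit quoted in the error message.
LIMIT=50000000

# count_data_lines FILE: prints the number of data lines in a plain or .gz bedGraph
count_data_lines() {
  case "$1" in
    *.gz) gzip -dc -- "$1" ;;   # decompress gzipped input on the fly
    *)    cat -- "$1" ;;
  esac | grep -c -v -E '^(track|browser)([[:space:]]|$)' || true
  # "|| true": grep -c still prints 0 but exits 1 when nothing matches
}

# check_against_limit FILE: prints "over (N lines)" or "under (N lines)"
check_against_limit() {
  n=$(count_data_lines "$1")
  if [ "$n" -gt "$LIMIT" ]; then
    echo "over ($n lines)"
  else
    echo "under ($n lines)"
  fi
}
```

For example, `check_against_limit myData.bedGraph.gz` (a hypothetical filename) reports which side of the limit a HOMER output file falls on.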

There is a win-win solution, also outlined on the HOMER documentation page. For these sizeable files, it is better to use a tool called bedGraphToBigWig to turn them into binary indexed files called bigWigs and then to host them remotely. Then, when you go to visualize the file, only the section that you are planning to view is transferred over the Internet. In this way you retain a copy of your data, and the browser space used to serve custom tracks is reduced to only the regions you are viewing (rather than uploading data across an entire genome to only view select regions).

On the HOMER site you can find a section titled "Creating bigWig files with HOMER", which describes a script they created called makeBigWig.pl. The "Making bigWigs from scratch" section explains what the script is doing and lists the options they provide.

Here are the steps to create a bigWig from a bedGraph, which are also outlined in our documentation here:

Let's say I have a bedGraph (http://genome.ucsc.edu/goldenPath/help/bedgraph.html) for mm10 like the following, which I could copy and paste into the custom track page: http://genome.ucsc.edu/cgi-bin/hgCustom?db=mm10

track type=bedGraph name=bG description="example of bedGraph" 
chr19 49302000 49302300 -1.0
chr19 49302300 49302600 -0.75
chr19 49302600 49302900 -0.50

If this file were over the 50000000-line limit, I would need to convert it to a bigWig. To do this I would take only the data lines from the file (removing that first "track type=" line) and run them through the bedGraphToBigWig command:

bedGraphToBigWig dataFile.bedGraph http://hgdownload.cse.ucsc.edu/goldenPath/mm10/bigZips/mm10.chrom.sizes dataFile.bigWig

Note that bedGraphToBigWig DOES NOT accept gzipped bedGraph input files, so they must be decompressed first. The mm10.chrom.sizes file can either be downloaded locally or referenced by URL, and bedGraphToBigWig itself can be found in our utilities directory: http://hgdownload.soe.ucsc.edu/admin/exe/
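The preparation steps can be chained in a small script. This is a minimal sketch with hypothetical function names: it decompresses a .gz file, drops "track"/"browser" header lines, and sorts by chromosome and then numerically by start position, since bedGraphToBigWig expects sorted input. Using the C locale for sorting is an assumption made so that chromosome names order byte-wise regardless of local settings.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the steps described above.

# prepare_bedgraph IN OUT
#   - decompresses IN if it ends in .gz (bedGraphToBigWig reads plain text only)
#   - removes "track"/"browser" header lines, keeping only data lines
#   - sorts by chromosome, then numerically by start coordinate
prepare_bedgraph() {
  case "$1" in
    *.gz) gzip -dc -- "$1" ;;
    *)    cat -- "$1" ;;
  esac | grep -v -E '^(track|browser)([[:space:]]|$)' |
    LC_ALL=C sort -k1,1 -k2,2n > "$2"
}

# to_bigwig IN OUT CHROM_SIZES
#   CHROM_SIZES may be a local file or a URL, e.g. the mm10.chrom.sizes link above.
#   Requires bedGraphToBigWig on PATH; the intermediate file is removed on success.
to_bigwig() {
  sorted="$2.sorted.bedGraph"
  prepare_bedgraph "$1" "$sorted" &&
    bedGraphToBigWig "$sorted" "$3" "$2" &&
    rm -f -- "$sorted"
}
```

Usage might look like `to_bigwig dataFile.bedGraph.gz dataFile.bigWig http://hgdownload.cse.ucsc.edu/goldenPath/mm10/bigZips/mm10.chrom.sizes`. HOMER's makeBigWig.pl may already cover these steps for HOMER output.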

The resulting dataFile.bigWig then needs to be placed in a location that is on the Internet and accepts byte-range requests (your institution likely provides this service; another option is the NSF-funded CyVerse). See more about hosting data here: http://genome.ucsc.edu/goldenPath/help/hgTrackHubHelp.html#Hosting
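One way to check whether a hosting location honors byte-range requests is to ask for a single byte and look for an HTTP 206 Partial Content reply. This is a sketch, assuming curl is installed; the function name is hypothetical, and a server that ignores the range and returns 200 is treated as unsuitable.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) only if URL answers a range request with 206.
# Assumes curl is available; a missing curl or an unreachable host counts as failure.
supports_byte_range() {
  # -r 0-0 requests only the first byte; -w prints the HTTP status code.
  code=$(curl -s -o /dev/null -r 0-0 --max-time 10 -w '%{http_code}' "$1" || true)
  [ "$code" = "206" ]
}
```

For example, `supports_byte_range http://example.edu/~me/dataFile.bigWig && echo "range requests OK"` (a placeholder URL) could be run after uploading the file.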

With the file located online, then instead of uploading all the text, the file can be referenced with a "bigDataUrl=" setting and the type=bedGraph can be changed to type=bigWig. Here is a real example produced by the above command and above data:

track type=bigWig name=bGbw description="example of bedGraph now as a bigWig" bigDataUrl=http://hgwdev.cse.ucsc.edu/~brianlee/dataFile.bigWig
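When several files are hosted this way, the track lines can be generated rather than typed by hand. A minimal sketch with a hypothetical function name; it assumes the track name contains no spaces, since the name is not quoted in the output.

```shell
#!/bin/sh
# Hypothetical helper: print a custom-track line pointing at a hosted bigWig.
# bigwig_track_line NAME DESCRIPTION URL   (NAME is assumed to contain no spaces)
bigwig_track_line() {
  printf 'track type=bigWig name=%s description="%s" bigDataUrl=%s\n' "$1" "$2" "$3"
}
```

Calling `bigwig_track_line bGbw "example of bedGraph now as a bigWig" http://hgwdev.cse.ucsc.edu/~brianlee/dataFile.bigWig` reproduces the example track line above.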

Again, HOMER appears to have existing tools to help you with this conversion process. If your institution doesn't have Internet-accessible locations for you to place your data, I highly recommend hosting it on CyVerse, though note that it is down for scheduled maintenance today.

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UC Santa Cruz Genomics Institute

Training videos & resources: http://genome.ucsc.edu/training/index.html
Want to share the Browser with colleagues?

