Hello António,
Thank you for your question about converting wig files to bigWig. It is quite possibly that your problem is due to a memory issue - creating large bigWigs can consume a substantial amount of RAM, particular when you start with a large file like this one. That said, one of our engineers notes that the "kill" might instead happen due to resource limitations set in your environment. You can run "ulimit -a" for the bash shell or "limit" for cshell variants to see what limits, if any, are in place. Some limits are soft limits, and can be increased by the user.
If you cannot get enough memory available (it seems likely - my test run of your command used over 30GB of memory), you might have better luck converting the individual wig files into bigWigs and then using the new "bigWigCat" utility developed by Daniel Zerbino to combine them. Splitting the format conversion into multiple steps should make it easier for your computer to handle. Here is the usage message for bigWigCat:
bigWigCat v 4 - merge non-overlapping bigWig files directly into bigWig format usage: bigWigCat out.bw in1.bw in2.bw ... Where in*.bw is in big wig format and out.bw is the output indexed big wig file. options: -itemsPerSlot=N - Number of data points bundled at lowest level. Default 1024
#/bin/tcsh foreach f in (chr*.placental.wigFix.gz) zcat f | grep -v ^track > f:r.temp wigToBigWig -clip -fixedSummaries -keepAllChromosomes f:r.temp f:r.bw rm f:r.temp end # Note the f:r syntax just trims the extension from the filename. #To concatenate: bigWigCat all_chr.placental.bw hg19.genome chr*.placental.wigFix.bw
For the 100-way alignment, we only provide scores computed from the full alignment - not subclasses like primate or placental mammal. If you are interested in computing your own phastCons scores from a subset of the species, you can find the multiple alignments and more information about our methods at http://hgdownload.soe.ucsc.edu/goldenPath/hg19/multiz100way/.
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.
--
Jonathan Casper
UCSC Genome Bioinformatics Group
--
-- António Miguel de Jesus Domingues, PhD Postdoctoral researcher Deep Sequencing Group - SFB655 Biotechnology Center (Biotec) Technische Universität Dresden Fetscherstraße 105 01307 Dresden Phone: +49 (351) 458 82362 Email: antonio.domingues(at)biotec.tu-dresden.de -- The Unbearable Lightness of Molecular Biology
Group: http://groups.google.com/a/soe.ucsc.edu/group/genome/topics
- phastcons placental as bigwig [1 Update]
- Query regarding repeat element display match with sequence [1 Update]
- TNNI3 [1 Update]
- Retroposed Genes V5 Source Gene name! [1 Update]
- what happened to the DHGP website? (linked from the dm3 Browser Gateway) [1 Update]
- Reproducible bug with highlight feature [1 Update]
Jonathan Casper <jca...@soe.ucsc.edu> May 16 04:48PM -0700
For the 100-way alignment, we only provide scores computed from the full
alignment - not subclasses like primate or placental mammal. If you are
interested in computing your own phastCons scores from a subset of the
species, you can find the multiple alignments and more information about
our methods at http://hgdownload.soe.ucsc.edu/goldenPath/hg19/multiz100way/.
I hope this is helpful. If you have any further questions, please reply to
gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those
addresses will be archived in publicly-accessible forums for the benefit of
other users. If your question contains sensitive data, you may send it
instead to genom...@soe.ucsc.edu.
--
Jonathan Casper
UCSC Genome Bioinformatics Group
On Fri, May 16, 2014 at 3:53 AM, António Domingues <
Matthew Speir <msp...@soe.ucsc.edu> May 16 04:11PM -0700
Hello Roshan,
Thank you for your question about the RepeatMasker and Conservation
tracks in the UCSC Genome Browser. You may not be seeing the repeat or
the conservation information because the RepeatMasker and Conservation
tracks are not turned on. I highly recommend taking advantage of some
the introductory resources that are available for the browser. I would
start by watching these tutorials: http://www.openhelix.com/ucsc. You
can find information on further training and tutorials here:
http://genome.ucsc.edu/training.html.
Data sets in the Genome Browser are stored in tracks. To view a
particular data set, you will need to turn on the corresponding track.
Information on repeats, such as the AluY element that you mentioned, is
contained in the RepeatMasker track. The Conservation track contains
conservation information based alignments between the human genome and
99 different vertebrate genomes. You can use the following steps to turn
on the RepeatMasker and Conservation tracks:
1. Navigate to the gateway page for the Human/hg19 assembly,
http://genome.ucsc.edu/cgi-bin/hgGateway?db=hg19.
2. In the search term box type chr16:47,662,162-47,662,468 (note
this is the position you gave for the AluY element).
3. Click Submit.
4. Scroll down the page to the section labeled Comparative Genomics.
5. Click the label for the Conservation track.
6. Check the boxes next to the species that you want to be
displayed as part of this conservation track.
7. At the top of the page, set the maximum display mode for this
track to 'Full'.
8. Click Submit.
9. Scroll down the page to the section labeled Repeats.
10. Click the label for the RepeatMasker track.
11. At the top of the page, set the display mode to full.
12. Click submit.
If you have correctly turned on the RepeatMasker and Conservation
tracks, you should see some thing like this:
http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=mspeeeer&hgS_otherUserSessionName=hg19_mlqAluYcons.
In this session, the first track is the UCSC Genes track showing the
PHKB gene, the second track is the RepeatMasker track showing the AluY
element, and the last track is the Conservation track showing only the
primate alignments. If you would like more information on how the data
for a track was generated, you can look at the track description page.
You can get to the description for a track by clicking the track labels,
just as you did to turn on these tracks, and scrolling down past the
track settings.
I hope this is helpful. If you have any further questions, please reply
to gen...@soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Matthew Speir
UCSC Genome Bioinformatics Group
On 5/15/14, 5:22 AM, roshan fatima wrote:
Matthew Speir <msp...@soe.ucsc.edu> May 16 12:41PM -0700
Hello Christian,
Thank your for your follow-up questions about the TNNI3 gene. Currently,
the RefSeq track is generated through an automated pipeline. We download
the transcripts from RefSeq, align them to the genome, and then filter
the alignments according to the parameters described in the methods
section of the track description page,
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=refGene. Since this
track is not manually curated, it is unlikely that this issue with TNNI3
or other proteins will be fixed in the RefSeq Genes track. In the
future, we hope to add a track that contains RefSeq's alignments to help
resolve issues where our re-alignments are different from the RefSeq
alignments. In the mean time, you could use the GENCODE genes track,
which does not have this issue with the TNNI3 protein. The GENCODE
track,
http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeGencodeV19,
contains manual annotations merged with evidence-based automated
annotations.
I hope this is helpful. If you have any further questions, please reply
to gen...@soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Matthew Speir
UCSC Genome Bioinformatics Group
Roberto Munita <robert...@gmail.com> May 16 03:21PM -0400
Hi,
I was trying to use the data from Retroposed Genes V5 track. But my problem
is that I can´t find a table that have pseudogenes alignments information
with the gene name (I searched in the table browser).
For example NM_004643.3-18 is "retro-PABPN1" and if you search in the
genome browser: "NM_004643.3-18"and you click over the pseudogene the
information say:
Source Gene:
NM_004643.3-18 PABPN1 Homo sapiens poly(A) binding protein, nuclear 1
(PABPN1), mRNA.
Where I can found that information for all the Retroposed Genes V5 track?
I hope you could help me,
Thanks for your help!
Roberto
Enviado con MailTrack<https://mailtrack.io/install?source=signature&lang=es&referral=robert...@gmail.com&idSignature=23>
Matthew Speir <msp...@soe.ucsc.edu> May 16 11:56AM -0700
Hi Hervé,
Thank you for bringing this incorrect link on the dm3 gateway page to
our attention, I will change it soon. We've been informed that the DHGP
website has been retired. As an alternative, you can check out the
websites for the Berkeley Drosophila Genome Project (BDGP),
http://www.fruitfly.org/, and FlyBase, http://flybase.org/. The
information from the DHGP site often mirrored what was on these BDGP and
FlyBase sites. If you would like more information on the dm3 assembly
and the heterochromatin sequences used in this assembly, I recommend
looking at the following resources:
* BDGP Release notes,
http://www.fruitfly.org/data/sequence/README.RELEASE5
* DHGP publication 'Sequence finishing and mapping of Drosophila
melanogaster heterochromatin',
http://www.ncbi.nlm.nih.gov/pubmed/17569867
I hope this is helpful. If you have any further questions, please reply
to gen...@soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Matthew Speir
UCSC Genome Bioinformatics Group
On 5/15/14, 3:09 PM, Hervé Pagès wrote:
Jonathan Casper <jca...@soe.ucsc.edu> May 16 10:28AM -0700
Hello Kie,
Thank you for reporting this issue, and I'm sorry for the confusion it is
causing you! We're aware of the problem and are planning to address it, but
I don't have a timetable for when the issue will be fixed. In the meantime,
you can recover the proper highlighting by simply refreshing the browser
page.
I hope this is helpful. If you have any further questions, please reply to
gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those
addresses will be archived in publicly-accessible forums for the benefit of
other users. If your question contains sensitive data, you may send it
instead to genom...@soe.ucsc.edu.
--
Jonathan Casper
UCSC Genome Bioinformatics Group
--
To unsubscribe from this group and stop receiving emails from it, send an email to genome+un...@soe.ucsc.edu.
--