Liftover Question

Stephen Tsou

Sep 7, 2022, 9:33:49 PM

to UCSC Genome Browser Discussion List

Hello,

I am having some trouble running liftOver from hg18 to hg38 on a Mac laptop. I downloaded the executable from:

http://hgdownload.soe.ucsc.edu/admin/exe/macOSX.x86_64/

For some reason, when I run:

awk '{print $1"\t"$4-1"\t"$4"\t"$2"\t.\t."}' plink.map > plink_hg18.bed
liftOver plink_hg18.bed hg18ToHg38.over.chain.gz plink_hg38.bed plink_nomap

or ./liftOver plink_hg18.bed hg18ToHg38.over.chain.gz plink_hg38.bed plink_nomap

...I get the following error messages:

-bash: liftOver: command not found

OR

-bash: ./liftOver: Permission denied

Would you know what I may be doing wrong? I ran this successfully before (though the lifted-over files were empty).

Thank you.

best,

Stephen

Hiram Clawson

Sep 8, 2022, 12:32:55 AM

to Stephen Tsou, UCSC Genome Browser Discussion List

Good Evening Stephen:

Try adding the 'execute' permission to the file:

Before:
$ ls -og liftOver
-rw-rw-r-- 1 8516392 Sep 7 14:21 liftOver
$ ./liftOver
bash: ./liftOver: Permission denied

Add 'execute' permission:

$ chmod +x liftOver

After:
$ ls -og liftOver
-rwxrwxr-x 1 8516392 Sep 7 14:21 liftOver
$ ./liftOver 2>&1 | head
liftOver - Move annotations from one assembly to another
usage:
liftOver oldFile map.chain newFile unMapped
oldFile and newFile are in bed format by default, but can be in GFF and
maybe eventually others with the appropriate flags below.
The map.chain file has the old genome as the target and the new genome
as the query.
... etc ...

If you place downloaded binaries like this one in a bin directory under your home directory:
$HOME/bin/

And add that path to your PATH variable in your .bashrc

PATH=$HOME/bin:$PATH

Then the bash shell will be able to find the executable programs in
this directory no matter where you are working.
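The permission and PATH steps above can be sketched together as follows. This is only an illustration: it uses a temporary directory as a stand-in for $HOME and a placeholder script named liftOver in place of the real download, so that it can be run anywhere without touching your actual setup. On macOS the startup file that bash or zsh reads may be ~/.bash_profile or ~/.zshrc rather than ~/.bashrc; that is an assumption about the local configuration, not part of the advice above.

```shell
# Sketch: install a downloaded binary into a personal bin directory.
demo_home=$(mktemp -d)                     # hypothetical stand-in for $HOME
cd "$demo_home"
# Placeholder file, NOT the real tool; in practice this is the UCSC download.
printf '#!/bin/sh\necho "liftOver stand-in"\n' > liftOver

mkdir -p "$demo_home/bin"
mv liftOver "$demo_home/bin/"
chmod +x "$demo_home/bin/liftOver"         # add the execute permission
PATH="$demo_home/bin:$PATH"                # in a startup file: PATH=$HOME/bin:$PATH
export PATH

command -v liftOver                        # prints the full path if the shell can find it
liftOver                                   # now runs from any working directory
```

If `command -v liftOver` prints nothing, the directory is not on PATH; if running it reports "Permission denied", the execute bit is still missing.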

--Hiram

Hiram Clawson

Sep 8, 2022, 11:30:07 AM

to Stephen Tsou, UCSC Genome Browser Discussion List

Sorry Stephen, I do not understand what you are referring to.
I do not recognize the transformation you show with the
awk statement. What is the resulting 'map' file ?

I'm guessing you have some bed file in hg18 coordinates and you
want to lift it to hg38 coordinates ?

You would need the liftOver file:

-rw-rw-r-- 1 343575 Feb 19 2014 hg18ToHg38.over.chain.gz

As obtained from:
https://hgdownload.soe.ucsc.edu/goldenPath/hg18/liftOver/hg18ToHg38.over.chain.gz

And if elements from your hg18 coordinates fail the lift, the
'unMapped' file output from liftOver will explain why they
do not lift.

On 9/8/22 7:12 AM, Stephen Tsou wrote:
> Thank you so much Hiram. That worked! An additional problem has occurred
> though. The lifted over bed file plink_hg38.bed is empty.
>
> "head plink_hg18.bed" works fine but "head plink_hg38.bed" is empty. Would
> you know what I may be doing wrong? I am hoping to get a map file via:
>
> awk '{print $1"\t"$4"\t0\t"$3}' plink_hg38.bed > plink_hg38.map
>
>
> ...after a successful plink_hg38.bed file is generated.

Hiram Clawson

Sep 8, 2022, 12:08:17 PM

to Stephen Tsou, UCSC Genome Browser Discussion List

And what is in the plink_nomap file ?

That explains why elements do not lift.

What type of chromosome names do you have in your hg18 file ?

On 9/8/22 8:51 AM, Stephen Tsou wrote:
> Thank you so much Hiram for your courtesy. I believe I do have the chain
> file hg18ToHg38.over.chain.gz . The problem is that once I liftOver
> with the below command, I am getting an empty plink_hg38.bed. This was
> apparent because the command "head plink_hg38.bed" does not show anything.
> Would you by any chance know why?

Stephen Tsou

Sep 8, 2022, 12:26:54 PM

to Hiram Clawson, UCSC Genome Browser Discussion List

Thank you so much Hiram. That worked! An additional problem has occurred though. The lifted over bed file plink_hg38.bed is empty.

"head plink_hg18.bed" works fine but "head plink_hg38.bed" is empty. Would you know what I may be doing wrong? I am hoping to get a map file via:

awk '{print $1"\t"$4"\t0\t"$3}' plink_hg38.bed > plink_hg38.map

...after a successful plink_hg38.bed file is generated.


Stephen Tsou

Sep 8, 2022, 12:27:01 PM

to Hiram Clawson, UCSC Genome Browser Discussion List

Thank you so much Hiram for your courtesy. I believe I do have the chain file hg18ToHg38.over.chain.gz . The problem is that once I liftOver with the below command, I am getting an empty plink_hg38.bed. This was apparent because the command "head plink_hg38.bed" does not show anything. Would you by any chance know why?

liftOver plink_hg18.bed hg18ToHg38.over.chain.gz plink_hg38.bed plink_nomap


Stephen Tsou

Sep 8, 2022, 12:53:55 PM

to Hiram Clawson, UCSC Genome Browser Discussion List

Thank you so much Hiram. I'm not sure what plink_nomap is but was told that it was necessary. "head plink_nomap" gives me this:

#Deleted in new
1 696230 696231 rs12029736 . .
#Deleted in new
1 742428 742429 rs3094315 . .
#Deleted in new
1 743267 743268 rs3115860 . .
#Deleted in new
1 744196 744197 rs3131967 . .
#Deleted in new
1 751009 751010 rs3115850

"head plink_hg18.bed" gives me this:

1 696230 696231 rs12029736 . .
1 742428 742429 rs3094315 . .
1 743267 743268 rs3115860 . .
1 744196 744197 rs3131967 . .
1 751009 751010 rs3115850 . .
1 758310 758311 rs12562034 . .
1 766408 766409 rs12124819 . .
1 794402 794403 rs11240778 . .
1 820043 820044 rs28444699 . .
1 836670 836671 rs4475691 .

The endgame after all of this is to replace the rsids from hg18 files with updated rsids in hg38 based on chromosome position.

Hiram Clawson

Sep 8, 2022, 1:29:22 PM

to Stephen Tsou, UCSC Genome Browser Discussion List

Your chromosome names need to have the chr prefix.

UCSC chrom names are: chr1 chr2 ... chrM chrX chrY ... etc

Your file has names: 1 2 ...

On 9/8/22 9:34 AM, Stephen Tsou wrote:
> Thank you so much Hiram. I'm not sure what plink_nomap is but was told

Hiram Clawson

Sep 8, 2022, 2:23:25 PM

to Stephen Tsou, UCSC Genome Browser Discussion List

sed -e 's#^#chr#;' noChr.bed > withChr.bed

Depends upon how many different names you have in there.
If they are all just the primary chroms. that will work.

See full list of names at:
https://hgdownload.soe.ucsc.edu/goldenPath/hg18/bigZips/hg18.chrom.sizes
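A slightly more defensive variant of the sed one-liner is sketched below, for the case where a file mixes names with and without the prefix. It assumes a tab-separated BED file; the file names follow the sed example above and the two rows are toy data. PLINK files may also encode X, Y and MT as numbers, and this sketch does not attempt to rename those, so compare the resulting names against the hg18.chrom.sizes list.

```shell
workdir=$(mktemp -d) && cd "$workdir"
# Toy input: one row without the prefix, one row that already has it
printf '1\t696230\t696231\trs12029736\t.\t.\nchr1\t742428\t742429\trs3094315\t.\t.\n' > noChr.bed

awk 'BEGIN { FS = OFS = "\t" }
     $1 !~ /^chr/ { $1 = "chr" $1 }    # prefix only the names that lack it
     { print }' noChr.bed > withChr.bed

cut -f 1 withChr.bed | sort | uniq -c  # tally of chromosome names after the fix
```

The final tally is a quick way to spot any name that does not appear in the UCSC list before running liftOver.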

On 9/8/22 11:11 AM, Stephen Tsou wrote:
> Thank you so much Hiram. So I need to prepend all of the first column with
> "chr" for the liftover to work?

Christopher Lee

Sep 8, 2022, 2:34:18 PM

to Hiram Clawson, Stephen Tsou, UCSC Genome Browser Discussion List

Hi Stephen,

Have you already tried searching for your rsID's in a newer version of
dbsnp? For instance if you download the dbSnp 153 file from here:
http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb

and the bigBedNamedItems utility from here:
http://hgdownload.soe.ucsc.edu/admin/exe/macOSX.x86_64/bigBedNamedItems

you can get the hg38 position of a single rsid via this command:
bigBedNamedItems dbSnp153.bb rs6657048 stdout

or for a list of rsIDs via this command:
bigBedNamedItems -nameFile dbSnp153.bb myIds.txt dbSnp153.myIds.bed

Ignore this if you've done this already and you are now trying to
liftover dropped rsids.

Thanks,

> --
>
> ---
> You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to genome+un...@soe.ucsc.edu.
> To view this discussion on the web visit https://groups.google.com/a/soe.ucsc.edu/d/msgid/genome/97115a90-2e10-e9be-6a42-2c862f7905d3%40soe.ucsc.edu.

Stephen Tsou

Sep 8, 2022, 2:36:05 PM

to Hiram Clawson, UCSC Genome Browser Discussion List

Thank you so much Hiram. So I need to prepend all of the first column with "chr" for the liftover to work?

Stephen Tsou

Sep 8, 2022, 6:57:10 PM

to Christopher Lee, Hiram Clawson, UCSC Genome Browser Discussion List

Thank you both. It seems that the missing "chr" prefix was why the liftover didn't work previously.

Now that working plink_hg38.bed and plink_hg38.map files have been generated, what is the best method to update the ADNI hg18 .bim, .bed and .fam files? I believe I have a snp151common.txt.gz that I generated from UCSC as well. Before liftover, I used something to the effect of:

system("./plink --bim hg18Old.bim --update-name onlySNPs.uniqLocAndldhg38.txt 2 4 --make-just-bim -out newBim")

...to update based on chromosome end position but obviously it didn't work as the genome references didn't match.

Chris, I don't believe these are dropped ids. After this update, how would I go about updating dropped ids? I suppose this is necessary as well.

best,

Stephen

Gerardo Perez

Sep 15, 2022, 8:29:25 PM

to Stephen Tsou, Christopher Lee, Hiram Clawson, UCSC Genome Browser Discussion List

Hello, Stephen.

Unfortunately, we do not support the plink software, so we cannot give you advice on how to use the tool. You may want to contact the plink support team for any issues with running the software (https://zzz.bwh.harvard.edu/plink/contact.shtml#probs). You could also post your question to other bioinformatic forums, such as BioStars (https://www.biostars.org/), for advice from other bioinformaticians and scientists in your field.

We do not recommend using liftOver to convert SNPs, and you can read more about this in our FAQ:
https://genome.ucsc.edu/FAQ/FAQreleases.html#snpConversion

I hope this is helpful. Please include gen...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on our public forum. If your question includes sensitive information, you may send it instead to genom...@soe.ucsc.edu.

Gerardo Perez
UCSC Genomics Institute


Stephen Tsou

Sep 16, 2022, 12:52:22 PM

to Gerardo Perez, Christopher Lee, Hiram Clawson, UCSC Genome Browser Discussion List

Thank you Gerardo. So I have heard this suggestion previously as well but did not understand it. As we just checked out liftover results and the new rsids did not look any different, I am further convinced that the original suggestion of avoiding liftover altogether was probably wise.

How would I take my hg18 .bed or .map file and get a bulk conversion through the website?

Luis Nassar

Sep 19, 2022, 12:41:24 PM

to Stephen Tsou, Gerardo Perez, Christopher Lee, Hiram Clawson, UCSC Genome Browser Discussion List

Hi, Stephen.

In order to convert from hg18 to hg38 using the identifiers, you will want to create a stripped list of all of the rsIDs. You can use awk for this to extract the single field in the file.
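As a rough sketch of that extraction step, assuming the IDs sit in the second column of a PLINK .map file (as the awk command earlier in this thread implies) and that only names starting with "rs" should be kept. The four rows are toy data and the cnvi name is made up:

```shell
workdir=$(mktemp -d) && cd "$workdir"
# Toy .map rows: chromosome, ID, genetic distance, position
printf '1\trs12029736\t0\t696231\n1\tcnvi0000001\t0\t700000\n1\trs3094315\t0\t742429\n1\trs3094315\t0\t742429\n' > plink.map

# Column 2 holds the variant ID; keep rs# IDs only, one per line, without repeats
awk '{ print $2 }' plink.map | grep '^rs' | sort -u > rsIDs.txt

wc -l < rsIDs.txt    # number of distinct rs# IDs to paste or upload
```

The resulting rsIDs.txt (a hypothetical file name) is the list to paste or upload on the identifiers page.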

As we describe in the FAQ post (https://genome.ucsc.edu/FAQ/FAQreleases.html#snpConversion) you will then go to the Table Browser (https://genome.ucsc.edu/cgi-bin/hgTables) and make the following selections:

Make sure the region is set to genome, and select paste list on the identifiers option. On this page you can paste all of your rsIDs (or alternatively you can choose to upload a file of just these rsIDs).

From there you can get output or make a few additional modifications such as which specific fields you would like to extract (by default you get all the fields in dbSNP track).

As an example, if I follow these steps with the rsID rs12029736 I see the following result:

#chrom    chromStart    chromEnd    name    ref    altCount    alts    shiftBases    freqSourceCount    minorAlleleFreq    majorAllele    minorAllele    maxFuncImpact    class    ucscNotes    _dataOffset    _dataLen
chr1    770987    770988    rs12029736    A    1    G,    0    31    0.27516,0.415459,0.366421,0.279372,0.362963,0.401869,0.455,-inf,-inf,0.261822,-inf,-inf,0.374767,0.483968,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf,-inf,    A,A,A,A,G,A,G,,,A,,,A,G,,,,,,,,,,,,,,,,,,    G,G,G,G,A,G,A,,,G,,,G,A,,,,,,,,,,,,,,,,,,    1627    snv    refIsMinor,diffMajor,commonSome,commonAll,overlapDiffClass,    21209452816    480

If you have some non-matching rsIDs you can also change the table from Common dbSNP155 to All dbSNP155, although we find most variants of interest are present in the common set.


Lou Nassar
UCSC Genomics Institute


Stephen Tsou

Sep 20, 2022, 9:24:54 PM

to Luis Nassar, Gerardo Perez, Christopher Lee, Hiram Clawson, UCSC Genome Browser Discussion List

Thank you so much Luis.

This has been giving me headaches so I appreciate you taking the time.

The output file from the UCSC Table Browser, produced by following your directions, does not seem to be the same length as the input hg18 rsid list file. I have tried inputting an hg18 .txt file with names including ones that begin with substrings 'rs' and 'cnvi' (620901 long). I have also tried inputting an hg18 .txt file with names that begin only with 'rs' (598821 long). The returned .tsv file from UCSC is longer and the same with either input file (626956 long).

Do you know what is going wrong? They shouldn't be different lengths, right? I am also confused how the Table Browser automatically knows that the assembly of the input .txt file is hg18. The settings only ask for the target assembly. Any help would be appreciated.

best,

Stephen

Gerardo Perez

Sep 21, 2022, 8:40:31 PM

to Stephen Tsou, Luis Nassar, Christopher Lee, Hiram Clawson, UCSC Genome Browser Discussion List

Hello, Stephen.

We will address your questions below:

> I have tried inputting an hg18 .txt file with names including ones that begin with substrings 'rs' and 'cnvi' (620901 long).

The rs# means a dbSNP ID, which usually but not always will be found in the dbSNP track. A few rs# IDs have been removed or merged into other rs# IDs. 'cnvi' is not from dbSNP and definitely will not be found in the dbSNP track.

> The returned .tsv file from UCSC is longer and the same with either input file (626956 long).

The rs# IDs in the pseudoautosomal regions (PARs, https://en.wikipedia.org/wiki/Pseudoautosomal_region) may be mapped to both chrX and chrY, and some rs# IDs might be mapped to both a main chromosome (e.g. chr10) and an alternate haplotype sequence (e.g. chr10_GL383545v1_alt), which would explain why there would be more lines of output than number of distinct rs# IDs.

> I am also confused how the Table Browser automatically knows that the assembly of the input .txt file is hg18. The settings only ask for the target assembly.

If the input .txt file only has IDs (like rs#), then it is not tied to any particular assembly. Only genome coordinates are tied to a particular assembly.

Here is a quick way to find some example rs# IDs that are duplicated in the output (replace the imaginary file names with your real file names):

cut -f 4 TableBrowserOutput.bed | sort | uniq -d | head

That should show the first 10 alphabetically sorted rs# IDs that appear more than once in the name column of BED. Then you can pick one of them (replace "rs???" with an actual ID) to investigate:

grep -w rs??? TableBrowserOutput.bed

You will probably see one line of output for a regular chromosome and one or more lines of output for an alternate assembly sequence. If you want to discard all of the alternate assembly sequences, grep can help:

grep -vE '^chr[0-9XYUn]+_' TableBrowserOutput.bed > TableBrowserOutput.noAlts.bed

Then you can look for duplicates in the new file. There should be fewer duplicates, but there may still be some in chrX and chrY due to PARs.
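A toy run of the two commands above may make the effect easier to see. The file below is made up: rsExample1 is a hypothetical ID placed both on chr10 and on the alternate sequence named earlier, and its coordinates are invented.

```shell
workdir=$(mktemp -d) && cd "$workdir"
# Toy Table Browser output in 4-column BED form
printf 'chr1\t770987\t770988\trs12029736\nchr10\t100\t101\trsExample1\nchr10_GL383545v1_alt\t50\t51\trsExample1\n' > TableBrowserOutput.bed

# IDs that appear more than once before filtering
cut -f 4 TableBrowserOutput.bed | sort | uniq -d > dups.before.txt

# Drop rows on alternate sequences, then look for duplicates again
grep -vE '^chr[0-9XYUn]+_' TableBrowserOutput.bed > TableBrowserOutput.noAlts.bed
cut -f 4 TableBrowserOutput.noAlts.bed | sort | uniq -d > dups.after.txt

wc -l dups.before.txt dups.after.txt
```

In this toy case the only duplicate comes from the alternate sequence, so it disappears after filtering; with real data some chrX/chrY duplicates may remain, as noted above.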

> After this update, how would I go about updating dropped ids?

Here is a way to make a list of the IDs that were not found by the Table Browser query on dbSNP 155 -- those are the ones that will require liftOver. Assuming that hg18.txt contains one ID on each line:

sort -u hg18.txt > hg18.sorted.txt
cut -f 4 TableBrowserOutput.bed | sort -u > TableBrowserOutput.IDs.sorted.txt
comm -23 hg18.sorted.txt TableBrowserOutput.IDs.sorted.txt > IDs.notFound.txt

Then you can use grep with your file that includes hg18 coordinates (chrom, chromStart, chromEnd) to get only the lines that should be fed to liftOver:

grep -Fwf IDs.notFound.txt hg18.bed > hg18.forLiftOver.bed


Gerardo Perez
UCSC Genomics Institute

Stephen Tsou

Sep 22, 2022, 12:44:16 AM

to Gerardo Perez, Luis Nassar, Christopher Lee, Hiram Clawson, UCSC Genome Browser Discussion List

Thank you so much Gerardo. After two months of interaction with UCSC, various kind souls on Biostars, a professor at Harvard Medical School and now NIH, I am now starting to understand this better.

Could I ask for some clarification of what you nicely explained? Is the tablebrowseroutput.bed and hg18.bed that you mention just in the form of the chr, chromeStart and chromeEnd columns of tableBrowserOutput.tsv? One issue that has come up in the plink files is that there is no chromeStart field in the .bim file (I believe one is to assume a chromeStart at 1 in this instance?). Is this okay or does some change have to be made?

Also for "comm -23 hg18.sorted.txt TableBrowserOutput.IDs.sorted.txt > IDs.notFound.txt" shouldn't rsids in HG18 and rsids in the HG38 TableBrowserOutput.bed be associated with different positions, thus making unique rsids across the two files not necessarily "not found"? Perhaps I am not understanding this correctly.

Also downstream, should I worry about the alternate haplotype rsids? I suppose it is okay to drop them or should I just worry about them later? Thank you so much.

best,

Stephen

Gerardo Perez

Sep 22, 2022, 8:24:02 PM

to Stephen Tsou, Luis Nassar, Christopher Lee, Hiram Clawson, UCSC Genome Browser Discussion List

Hello, Stephen.

One of our engineers has provided the following information:

> Is the tablebrowseroutput.bed and hg18.bed that you mention just in the form of the chr, chromeStart and chromeEnd columns of tableBrowserOutput.tsv?

Yes: in particular the first 4 columns of BED format (https://genome.ucsc.edu/FAQ/FAQformat.html#format1): chrom, chromStart, chromEnd and name. "cut -f 4 TableBrowserOutput.bed" outputs only the 4th column, which is expected to contain rs# IDs found in dbSnp155 in this case.

hg18.bed is assumed to be 4-column BED (chrom, chromStart, chromEnd, name) containing hg18 genomic coordinates and IDs from your original data that you want to convert to hg38.

> One issue that has come up in the plink files is that there is no chromeStart field in the .bim file (I believe one is to assume a chromeStart at 1 in this instance?). Is this okay or does some change have to be made?

liftOver cannot read .bim files. In order to run liftOver, the .bim must first be converted to BED. If you can send the first few lines of your .bim file then we can suggest a command pipe to convert it to BED.

> Also for "comm -23 hg18.sorted.txt TableBrowserOutput.IDs.sorted.txt > IDs.notFound.txt" shouldn't rsids in HG18 and rsids in the HG38 TableBrowserOutput.bed be associated with different positions, thus making unique rsids across the two files not necessarily "not found"?

Yes, the same rsID will most likely have different genomic positions in hg18 vs. hg38. However, hg18.txt is assumed to contain one ID per line, not genomic coordinates. And due to the "cut -f 4" command, TableBrowserOutput.IDs.sorted.txt should also have one rs# ID per line with no genomic coordinates. Only the rs# IDs are being compared. "comm -23" means "tell me the values that are in the first file but not in the second file". Those are the IDs that were not found by the Table Browser, requiring the fallback approach, i.e. liftOver.
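A three-ID toy example of that comparison, using the same file names as the commands quoted above (the IDs and coordinates are made up):

```shell
workdir=$(mktemp -d) && cd "$workdir"
printf 'rs3\nrs1\nrs2\n' > hg18.txt                                       # toy input IDs
printf 'chr1\t10\t11\trs1\nchr1\t20\t21\trs3\n' > TableBrowserOutput.bed  # toy query result

sort -u hg18.txt > hg18.sorted.txt
cut -f 4 TableBrowserOutput.bed | sort -u > TableBrowserOutput.IDs.sorted.txt
# -2 hides lines found only in the second file, -3 hides lines common to both,
# leaving the lines found only in the first file
comm -23 hg18.sorted.txt TableBrowserOutput.IDs.sorted.txt > IDs.notFound.txt

cat IDs.notFound.txt
```

Only rs2 is left, because it is the one ID with no row in the query result. Note that comm compares whole lines, so a header line, extra columns, or stray carriage returns in either file would make IDs look "not found" even when they are present.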

> Also downstream, should I worry about the alternate haplotype rsids? I suppose it is okay to drop them or should I just worry about them later? Thank you so much.

That depends on what you will do with the hg38 genomic coordinates. For many purposes, it will be fine to drop the mappings to alternate haplotypes.


Gerardo Perez
UCSC Genomics Institute

Stephen Tsou

Sep 23, 2022, 2:01:31 PM

to Gerardo Perez, Luis Nassar, Christopher Lee, Hiram Clawson, UCSC Genome Browser Discussion List

Thank you so much Gerardo for taking the time and being patient while I ask these questions. Your explanations have been quite clear and EXTREMELY helpful.

This is the head of the .bim file in hg18. Please let me know what your thoughts are on how to obtain a correct chromeStart column in this instance.

> head(bim)
# A tibble: 6 × 6
  chr   id          posg    pos alt   ref
  <chr> <chr>      <dbl>  <int> <chr> <chr>
1 1     rs12354060     0  10004 0     G
2 1     rs2691310      0  46844 0     0
3 1     rs2531266      0  59415 0     0
4 1     rs4124251      0  97215 0     0
5 1     rs8179466      0 224176 0     0
6 1     rs6603779      0 227744 0     0

Thank you so much again.

best,
Stephen

Luis Nassar

Oct 3, 2022, 4:32:27 PM

to Stephen Tsou, Gerardo Perez, Christopher Lee, Hiram Clawson, UCSC Genome Browser Discussion List

Hello Stephen,

From that head() command (which we see is from an R session, so it is returning the data as a tibble) we see that your chromosome numbers do not have 'chr' prepended, as the Genome Browser requires. The file also needs a few column rearrangements to conform to BED.

With that in mind, you should be able to run the following command directly on the command line on your bim file to output the proper format:

awk '{print "chr"$1 "\t" ($4-1) "\t" $4 "\t" $2;}' hg18.bim > hg18.bed

That file can then be used with the previous grep command to pull out only the IDs that were not found in dbSNP:

grep -Fwf IDs.notFound.txt hg18.bed > hg18.forLiftOver.bed

Finally, you can use the resulting hg18.forLiftOver.bed file on the liftOver page or utility. Let us know if this works. The final step will be a command to merge the lifted file with the original rsIDs found in dbSNP.
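To illustrate what the awk conversion produces, here it is applied to the first two rows of the .bim shown earlier in the thread, written out as plain tab-separated text (an assumption about how the file looks on disk; the R display adds row numbers that are not in the file):

```shell
workdir=$(mktemp -d) && cd "$workdir"
# First two .bim rows: chromosome, ID, genetic distance, position, allele 1, allele 2
printf '1\trs12354060\t0\t10004\t0\tG\n1\trs2691310\t0\t46844\t0\t0\n' > hg18.bim

# .bim positions are 1-based; BED starts are 0-based, so start = pos - 1 and end = pos
awk '{ print "chr" $1 "\t" ($4 - 1) "\t" $4 "\t" $2 }' hg18.bim > hg18.bed

cat hg18.bed
```

Each SNP becomes a one-base BED interval, for example position 10004 becomes chr1 10003 10004, so there is no need for a chromStart column in the .bim itself.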


Lou Nassar
UCSC Genomics Institute

Stephen Tsou

Oct 4, 2022, 1:15:11 PM

to Luis Nassar, Gerardo Perez, Christopher Lee, Hiram Clawson, UCSC Genome Browser Discussion List

Thanks Luis. I think prepending the 'chr' was easy enough, but I wasn't sure exactly what format the .bed file should be in. Thank you.

Stephen Tsou

Oct 5, 2022, 12:01:48 PM

to Luis Nassar, Gerardo Perez, Christopher Lee, Hiram Clawson, UCSC Genome Browser Discussion List

Hi Luis. So the below command has been running for about 10 hours now so I just wanted to make sure I'm doing the right thing. Is this correct? Thank you.

grep -Fwf IDs.notFound.txt hg18.bed > hg18.forLiftOver.bed

Jairo Navarro Gonzalez

Oct 6, 2022, 7:46:18 PM

to Stephen Tsou, Luis Nassar, Gerardo Perez, Christopher Lee, Hiram Clawson, UCSC Genome Browser Discussion List

Hello,

Thank you for using the UCSC Genome Browser and sending your follow-up question.

Yes, the process may take a long time, depending on the hardware you use to run the command. You should be able to see some progress if, in another terminal window, you go to the same directory and look at the number of lines in hg18.forLiftOver.bed like this:

wc -l hg18.forLiftOver.bed

If you rerun the command in a few minutes (or hours), you should see the number of lines increase. When done, the number of lines should be similar to the number of lines in IDs.notFound.txt.

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Jairo Navarro 
UCSC Genome Browser


Stephen Tsou

Oct 7, 2022, 12:14:07 PM

to Jairo Navarro Gonzalez, Luis Nassar, Gerardo Perez, Christopher Lee, Hiram Clawson, UCSC Genome Browser Discussion List

Thank you Jairo. It has been two days and, using your command locally, I have a solitary line. Is there a faster way? Do I need a GPU for this?

Luis Nassar

Oct 7, 2022, 1:17:04 PM

to Stephen Tsou, Jairo Navarro Gonzalez, Gerardo Perez, Christopher Lee, Hiram Clawson, UCSC Genome Browser Discussion List

Hi Stephen,

A solitary line is a bit strange. We would expect 0 lines at first (while the computer is reading IDs.notFound.txt into memory) and then a steady increase in output lines. If the shortcoming is RAM, then a GPU would not help.

What are the outputs of the following commands?

wc -l IDs.notFound.txt hg18.bed hg18.forLiftOver.bed

head -2 IDs.notFound.txt hg18.bed hg18.forLiftOver.bed

top -S -l 1 -o cpu -o mem | head -20
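On the question of a faster way: one commonly used alternative to grep -Fwf is an exact lookup on the name column with awk. The sketch below is not something proposed elsewhere in this thread; it assumes IDs.notFound.txt has one ID per line and is not empty, and that hg18.bed keeps the ID in column 4. The rows are toy data in the same shape as the files discussed above.

```shell
workdir=$(mktemp -d) && cd "$workdir"
printf 'rs4124251\nrs12354060\n' > IDs.notFound.txt     # toy ID list
printf 'chr1\t10003\t10004\trs12354060\nchr1\t46843\t46844\trs2691310\nchr1\t97214\t97215\trs4124251\n' > hg18.bed

# While reading the first file, remember each ID.
# While reading the second file, print rows whose 4th column was remembered.
awk 'NR == FNR { wanted[$1]; next } ($4 in wanted)' IDs.notFound.txt hg18.bed > hg18.forLiftOver.bed

wc -l < hg18.forLiftOver.bed
```

Because each file is read once and the IDs are held in a hash table, the run time should grow roughly with the combined file size rather than with the number of patterns, and matches are exact on the ID column instead of anywhere on the line.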

Lou Nassar
UCSC Genomics Institute

Stephen Tsou

Oct 10, 2022, 2:20:22 PM

to Luis Nassar, Jairo Navarro Gonzalez, Gerardo Perez, Christopher Lee, Hiram Clawson, UCSC Genome Browser Discussion List

Hi Lou,

Thank you so much.

Interestingly, the line count differed between when I ran:

wc -l hg18.forLiftOver2.bed

two days ago vs. today.

The first reading a couple days ago said:

47414 hg18.forLiftOver2.bed

The second reading today read:

75556 hg18.forLiftOver2.bed

Below are the command lines you suggested. Please let me know what I might be doing wrong or if there are other methods. Thank you again.

For....

wc -l IDs.notFound.txt hg18.bed hg18.forLiftOver.bed

I get:

598821 IDs.notFound.txt
620763 todayhg182.bed
76022 hg18.forLiftOver2.bed
1295606 total

For....

head -2 IDs.notFound.txt hg18.bed hg18.forLiftOver.bed

I get:

==> IDs.notFound.txt <==
rs1000000
rs10000010

==> todayhg182.bed <==
chr1 10003 10004 rs12354060
chr1 46843 46844 rs2691310

==> hg18.forLiftOver2.bed <==
chr1 10003 10004 rs12354060
chr1 97214 97215 rs4124251

For.....

top -S -l 1 -o cpu -o mem | head -20

I get:

Processes: 417 total, 4 running, 413 sleeping, 3177 threads

2022/10/08 15:44:02

Load Avg: 4.50, 4.54, 6.62

CPU usage: 16.13% user, 33.45% sys, 50.41% idle

SharedLibs: 286M resident, 73M data, 25M linkedit.

MemRegions: 1884086 total, 4854M resident, 43M private, 738M shared.

PhysMem: 16G used (3163M wired), 76M unused.

VM: 113T vsize, 2318M framework vsize, 585194956(0) swapins, 588440259(0) swapouts.

Swap: 9203M + 2061M free.

Purgeable: 5500K 82525(0) pages purged.

Networks: packets: 9331650/8671M in, 3377893/1028M out.

Disks: 22680311/2385G read, 23990021/2337G written.

PID COMMAND %CPU TIME #TH #WQ #PORTS MEM PURG CMPRS PGRP PPID STATE BOOSTS %CPU_ME %CPU_OTHRS UID FAULTS COW MSGSENT MSGRECV SYSBSD SYSMACH CSW PAGEINS IDLEW POWER INSTRS CYCLES USER #MREGS RPRVT VPRVT VSIZE KPRVT KSHRD

3080 Google Chrome He 0.0 20:24.02 21 1 8227 1615M 0B 1571M 474 474 sleeping *0[8] 0.00000 0.00000 501 5524905 7714 1808421 1421419 4368288 4418308 3109358 8823 95 0.0 0 0 stephentsou N/A N/A N/A N/A N/A N/A

513 Google Chrome He 0.0 04:32:42 12 1 1465 1443M 76K 381M 474 474 sleeping *23436[5] 0.00000 0.00000 501 33516775 3235 111356385 44348284 48790244 183115788 91915423 35594 550 0.0 0 0 stephentsou N/A N/A N/A N/A N/A N/A

1025 Google Chrome He 0.0 44:46.36 22 2 1002 1189M 0B 1039M 474 474 sleeping *0[9] 0.00000 0.00000 501 15556433 165452 4062463 1888438 8173050 14199963 10116014 3362 101 0.0 0 0 stephentsou N/A N/A N/A N/A N/A N/A

1103 grep 0.0 30:48:26 1/1 0 11 1078M 0B 36M 1103 990 running *0[1] 0.00000 0.00000 501 10692633 91 1057 15 2414 453 119361194 0 1 0.0 0 0 stephentsou N/A N/A N/A N/A N/A N/A

1067 Google Chrome He 0.0 49:50.96 20 2 821 952M 0B 851M 474 474 sleeping *0[9] 0.00000 0.00000 501 19604945 170533 3403874 1727320 7679152 11966849 9487312 5482 85 0.0 0 0 stephentsou N/A N/A N/A N/A N/A N/A

482 AdobeReader 0.0 21:46.75 28 5 542 907M 0B 860M 482 1 sleeping 0[2608] 0.00000 0.00000 501 19969502 94558 3698840 488920 3872863 14728605 5691182 1673354 79 0.0 0 0 stephentsou N/A N/A N/A N/A N/A N/A

Daniel Schmelter

Oct 11, 2022, 6:19:39 PM

to Stephen Tsou, UCSC Genome Browser Discussion List

Hello Stephen,

Thanks for the additional info. We are still investigating this situation. It may be related to certain commands timing-out or running out of memory. This does not normally happen, so I will ask you for some more information that may help us fix it.

Could you please send us the file containing rs# IDs that you uploaded to the Table Browser? Could you also send us the files created by the Table Browser identifier matching query and a short description of each, including any subsequent commands? If you cannot send the files, a line count of any remaining files would be helpful.

All the best,
Daniel Schmelter
UCSC Genome Browser


Stephen Tsou

Oct 17, 2022, 4:03:54 PM

to Daniel Schmelter, UCSC Genome Browser Discussion List

Thanks Daniel,

As I am new to lifting over (and hopefully this is one of the few times I'll have to convert from hg18), it's very very possible I did something wrong.

The hg18 txt file uploaded to the Table Browser is "forUCSCfiltered.txt". I believe the hg38 file returned by the Table Browser is "letsSeeFiltered.tsv"; both are attached via Google Drive. Please let me know if you see anything wrong that I am doing or if there is anything else you need from me. Thank you so much.

letsSeeFiltered.tsv

forUCSCFiltered.txt

Jairo Navarro Gonzalez

Oct 26, 2022, 7:02:28 PM

to Stephen Tsou, Daniel Schmelter, UCSC Genome Browser Discussion List

Hello,

Thank you again for using the UCSC Genome Browser and sending your inquiry.

From looking at some of the rsIDs that you shared, we recommend the "All" subset of dbSNP instead of the default "Common" subset on the Table Browser.

Could you share the name of the file that has both rs# IDs and non-rs# IDs (the unfiltered version of forUCSCFiltered.txt), and the name of the file that has UCSC BED format for all 626,956 SNPs? We don't need the whole files, just the file names, but it would also be nice to see the first few lines of each file just to be sure.

With those filenames, we can suggest a sequence of commands for you to run to generate a 43k-line input file for liftOver.


Jairo Navarro 
UCSC Genome Browser

