UShER: Ultrafast Sample placement on Existing tRee - is there a limit to the number of IDs that can be processed via the web browser version?


Mike Elmore

Sep 5, 2022, 2:40:30 PM
to gen...@soe.ucsc.edu

Dear UCSC Genomics, thank you very much for making the web version of UShER available (https://genome.ucsc.edu/cgi-bin/hgPhyloPlace). I use it to assign SARS-CoV-2 lineages to sequences that other methods (e.g. Pangolin, Scorpio) leave “unassigned”. I use the sequence IDs as the input file.

I have one question: is there an upper limit to the number of IDs that can be submitted in one form? At present it seems to accept a few hundred, but not thousands. Presumably there is some cut-off on the number of IDs/sequences it can handle.

 

Best wishes

 

Mike Elmore

 

Dr Mike Elmore

Bioinformatician

UK Health Security Agency

Porton Down

Salisbury

Wiltshire

SP4 0JG

mike....@ukhsa.gov.uk

+44 1980 612220

www.gov.uk/UKHSA

Follow us on twitter @UKHSA


Luis Nassar

Sep 9, 2022, 8:06:10 PM
to Mike Elmore, gen...@soe.ucsc.edu
Hello, Mike.

Thank you for your interest in the Genome Browser and our hgPhyloPlace tool.

Our engineers report that no hard limit is enforced. However, as the tree grows larger, the search takes longer, so there is a practical limit that keeps shrinking over time.

Regarding unassigned lineages: apart from probable recombinants whose lineages have not yet been designated, pangolin should not report "Unassigned" if you are using the latest version of pangolin (v4.1.2) with the default UShER analysis mode. If you are using the latest version of pangolin but with the option --analysis-mode fast or --analysis-mode pangolearn, try re-running only the Unassigned sequences without that option (or with --analysis-mode accurate or --analysis-mode usher instead). Pangolin will then run usher on a minimized version of our tree, which is faster than running the web interface on the full tree, especially on a machine with multiple cores if you add the --threads N option, where N is the number of cores available to pangolin. For any questions about pangolin, https://github.com/cov-lineages/pangolin/issues is a good place to search or file a new issue.
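The re-run step described above (pick out only the Unassigned sequences, then call pangolin in UShER mode with several threads) might be sketched in Python as follows. This is a hypothetical helper, not part of pangolin or UShER. It assumes a pangolin-style CSV report with `taxon` and `lineage` columns, and a plain FASTA file whose header (or its first whitespace-delimited token) matches the `taxon` value. It only builds the pangolin command line from the `--analysis-mode usher` and `--threads` options mentioned above; it does not run it.

```python
import csv
import io
import shlex
from typing import Iterable, Iterator, List, Set, TextIO, Tuple


def unassigned_ids(report: TextIO) -> Set[str]:
    """Collect taxon IDs whose lineage call is "Unassigned".

    Assumption: a pangolin-style CSV report with "taxon" and "lineage" columns.
    """
    return {row["taxon"] for row in csv.DictReader(report)
            if row["lineage"] == "Unassigned"}


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text, joining wrapped lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_subset(records: Iterable[Tuple[str, str]], keep: Set[str],
                 out: TextIO) -> int:
    """Write records whose full header or first token is in `keep`; return the count."""
    written = 0
    for header, seq in records:
        first_token = (header.split() or [""])[0]
        if header in keep or first_token in keep:
            out.write(f">{header}\n{seq}\n")
            written += 1
    return written


def pangolin_command(fasta_path: str, threads: int) -> List[str]:
    """Build (but do not run) a pangolin call in UShER mode using N threads."""
    return ["pangolin", fasta_path, "--analysis-mode", "usher",
            "--threads", str(threads)]


if __name__ == "__main__":
    # Toy inputs standing in for a real lineage report and FASTA file.
    report = io.StringIO("taxon,lineage\nseqA,BA.5.2\nseqB,Unassigned\n")
    fasta = io.StringIO(">seqA\nACGT\n>seqB\nACGG\nTT\n")

    subset = io.StringIO()
    n = write_subset(read_fasta(fasta), unassigned_ids(report), subset)
    print(n)
    print(subset.getvalue(), end="")
    print(shlex.join(pangolin_command("unassigned.fasta", threads=8)))
```

Run as a script, the toy example prints `1`, then the single record `>seqB` with its wrapped sequence joined as `ACGGTT`, then `pangolin unassigned.fasta --analysis-mode usher --threads 8`. Matching on either the full header or its first token is a defensive choice, since different pipelines may write the `taxon` field differently.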

Our engineers also have some ideas for speeding it up, but cannot promise when that work will be complete. In the meantime, if running pangolin in the default UShER mode does not resolve all of the Unassigned cases, write back to us and we can assist you further.

I hope this is helpful. Please include gen...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on our public forum. If your question includes sensitive information, you may send it instead to genom...@soe.ucsc.edu.

Lou Nassar
UCSC Genomics Institute

--

---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
To view this discussion on the web visit https://groups.google.com/a/soe.ucsc.edu/d/msgid/genome/LO2P123MB51098404DB8C4C59BD42B49EC67F9%40LO2P123MB5109.GBRP123.PROD.OUTLOOK.COM.

Mike Elmore

Sep 12, 2022, 5:15:24 PM
to Luis Nassar, gen...@soe.ucsc.edu

Dear Luis, thank you for your reply and the explanation. I work one step removed from the program itself, in that I interrogate COG-UK Covid metadata files (https://www.cogconsortium.uk/priority-areas/data-linkage-analysis/public-data-analysis/), which have already had their lineages called. The limited lineage information states “PLEARN-v1.14”, which I assume corresponds to the “Pangolin-data” version listed on the Pangolin website, but it doesn’t state the Pango program version.

I’ll see if we can get the standalone version up and running at my establishment.

Thank you for your help and explanation.

 

Best Wishes

 

Mike Elmore

 

From: Luis Nassar <lrna...@ucsc.edu>
Sent: 10 September 2022 01:06
To: Mike Elmore <Mike....@ukhsa.gov.uk>
Cc: gen...@soe.ucsc.edu
Subject: Re: [genome] UShER: Ultrafast Sample placement on Existing tRee - is there a limit to the number of ID's that can be processed via the web browser version?

 

