errors in hg38.447way.commonNames.nh.txt and hg38.447way.nh.txt


Irwin Jungreis

Jul 14, 2025, 5:27:48 PM
to UCSC Genome Browser Public Support
There are errors in hg38.447way.commonNames.nh.txt at https://hgdownload.soe.ucsc.edu/goldenPath/hg38/cactus447way/hg38.447way.commonNames.nh.txt.

  1. On line 147: "PurAlouatta_puruensisuacute;s_red_howler_monkey" should be "Purus_red_howler_monkey"
  2. Lines 399 and 400 include parentheses in unquoted node names, which violates the Newick tree standard (see, for example, https://www.life.illinois.edu/gary/Newicks_845_Tree_Std.html), so I removed them from domestic_dog_(BS72/Village_Dog) and German_Shepherd_dog_(Mischka).
  3. Line 447: There is an extra unmatched ");" at the end. I removed it.
  4. Four names appear twice, namely pileated_gibbon, southern_white-cheeked_crested_gibbon, black-shanked_douc, and Central_American_spider_monkey. I renamed them by adding _b and _a, which is what was already done to address the duplication in the scientific names tree (Hylobates_pileatus_b/Hylobates_pileatus_a, and the same for Nomascus_siki, Pygathrix_nigripes, and Ateles_geoffroyi).
  5. Note also that there are 59 names in the file that include single quotes, which are disallowed by the specification. I recommend simply removing them.
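
Checks like items 2-5 above could be automated along the following lines. This is only a minimal sketch, not the method actually used to find the problems: the function name check_newick is hypothetical, and the regular expressions assume unquoted labels, so they do not implement the full Newick grammar.

```python
import re
from collections import Counter

# A leaf label starts right after "(" or "," and runs until the next
# structural character. Quoted labels are deliberately not handled.
LEAF = re.compile(r"[(,]\s*([^():,;\s]+)")
# In valid unquoted Newick, "(" never directly follows label characters,
# so a label fragment immediately before "(" marks a parenthesis in a name.
PAREN_IN_NAME = re.compile(r"([^():,;\s]+)\(")


def check_newick(text):
    """Return a dict of suspicious features found in a Newick string."""
    counts = Counter(LEAF.findall(text))
    return {
        # Leaf names that occur more than once.
        "duplicates": sorted(n for n, c in counts.items() if c > 1),
        # Leaf names containing a single quote.
        "single_quotes": sorted(n for n in counts if "'" in n),
        # Label fragments followed directly by "(".
        "paren_in_name": PAREN_IN_NAME.findall(text),
        # Negative means more ")" than "(", e.g. a stray trailing ");".
        "paren_balance": text.count("(") - text.count(")"),
    }
```

Calling check_newick on the full contents of a tree file returns the four reports at once. A name that contains parentheses is split into fragments by the leaf pattern, so the "paren_in_name" entries should be fixed first and the check rerun before trusting the duplicate list.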

There is also a problem in hg38.447way.nh.txt. The only differences between hg38.447way.nh.txt and hg38.447way.scientificNames.nh.txt are that the former has hg38 instead of Homo_sapiens, and Hylobates_lar and Hylobates_pileatus instead of Hylobates_pileatus_b and Hylobates_pileatus_a. At https://hgdownload.soe.ucsc.edu/goldenPath/hg38/cactus447way/ it says hg38.447way.nh.txt is the "phylogenetic tree used to guide the cactus alignment", which implies it should have the same names as the alignment files. In the case of hg38 versus Homo_sapiens that is correct. On the other hand, the MAF files in https://hgdownload.soe.ucsc.edu/goldenPath/hg38/cactus447way/maf/ (and in https://cgl.gi.ucsc.edu/data/cactus/) have the names Hylobates_pileatus_b and Hylobates_pileatus_a, not Hylobates_lar and Hylobates_pileatus, so if hg38.447way.nh.txt is supposed to have the names used in the alignment, then those two names are wrong. (I don't know whether the species sequenced to get the first assembly is actually Hylobates lar, as reported in hg38.447way.nh.txt, or Hylobates pileatus, as reported in hg38.447way.scientificNames.nh.txt, so I don't know which pair of names is actually correct -- I'm just trying to get the .nh and .maf files into agreement.)
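
The agreement between a tree and the MAF files could be checked with something like the sketch below. It rests on assumptions not stated in this thread: that each MAF "s" line names its source as species.sequence, and that species names themselves contain no dots. The function names are hypothetical, and the leaf pattern only handles unquoted labels.

```python
import re

# Leaf labels follow "(" or "," in an unquoted Newick string.
LEAF = re.compile(r"[(,]\s*([^():,;\s]+)")


def tree_leaves(newick):
    """Set of leaf names in an unquoted Newick string."""
    return set(LEAF.findall(newick))


def maf_species(lines):
    """Set of species names taken from the 's' lines of a MAF file."""
    species = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "s":
            # Assumption: the source field is "species.sequence".
            species.add(fields[1].split(".", 1)[0])
    return species


def compare(newick, maf_lines):
    """Return (names only in the tree, names only in the MAF), sorted."""
    leaves, species = tree_leaves(newick), maf_species(maf_lines)
    return sorted(leaves - species), sorted(species - leaves)
```

Both returned lists are empty when the tree and the alignment use the same set of names. A single MAF block may not mention every species, so the lines passed in should cover enough of the alignment to see all of them.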

I've attached two corrected versions of hg38.447way.commonNames.nh.txt and a corrected version of hg38.447way.nh.txt:
  • hg38.447way.commonNames.corrected.nh.txt fixes problems 1-4.
  • hg38.447way.commonNames.corrected.noQuotes.nh.txt also removes all single quotes from names.
  • hg38.447way.corrected.nh.txt is the same as hg38.447way.nh.txt except it uses Hylobates_pileatus_b and Hylobates_pileatus_a (as in hg38.447way.scientificNames.nh.txt) rather than Hylobates_lar and Hylobates_pileatus (as in the original hg38.447way.nh.txt) so it matches the names in the MAF files.


hg38.447way.commonNames.corrected.nh.txt
hg38.447way.commonNames.corrected.noQuotes.nh.txt
hg38.447way.corrected.nh.txt

Luis Nassar

Aug 8, 2025, 6:30:33 PM
to Irwin Jungreis, UCSC Genome Browser Public Support
Hi, Irwin.

Thank you for taking the time to inform us of these issues.

We've updated the files on our server based on your recommendations: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/cactus447way/

Let us know if you have any additional feedback.

--

---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome+un...@soe.ucsc.edu.
To view this discussion visit https://groups.google.com/a/soe.ucsc.edu/d/msgid/genome/D03F2EBB-33A1-4484-8894-23C8ABF99671%40csail.mit.edu.

--
I hope this is helpful. Please include gen...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on our public forum. If your question includes sensitive information, you may send it instead to genom...@soe.ucsc.edu.

Lou Nassar
UCSC Genomics Institute