errors in hg38.447way.commonNames.nh.txt and hg38.447way.nh.txt


Irwin Jungreis

Jul 14, 2025, 5:27:48 PM
to UCSC Genome Browser Public Support
There are errors in hg38.447way.commonNames.nh.txt at https://hgdownload.soe.ucsc.edu/goldenPath/hg38/cactus447way/hg38.447way.commonNames.nh.txt.

  1. On line 147: "PurAlouatta_puruensisuacute;s_red_howler_monkey" should be "Purus_red_howler_monkey"
  2. Lines 399 and 400 include parentheses in unquoted node names, which violates the Newick tree standard (see, for example, https://www.life.illinois.edu/gary/Newicks_845_Tree_Std.html), so I removed them from domestic_dog_(BS72/Village_Dog) and German_Shepherd_dog_(Mischka).
  3. Line 447: There is an extra unmatched ");" at the end. I removed it.
  4. Four names appear twice, namely pileated_gibbon, southern_white-cheeked_crested_gibbon, black-shanked_douc, and Central_American_spider_monkey. I renamed them by adding _b and _a, which is what was already done to address the duplication in the scientific names tree (Hylobates_pileatus_b/Hylobates_pileatus_a, and the same for Nomascus_siki, Pygathrix_nigripes, and Ateles_geoffroyi).
  5. Note also that there are 59 names in the file that include single quotes, which are disallowed by the specification. I recommend simply removing them.
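For anyone who wants to screen a tree file for the kinds of problems listed above, here is a minimal sketch in Python. It is not the procedure used to find these errors; `check_newick` is a hypothetical helper, and it assumes labels contain no commas, colons, or whitespace. It relies on the fact that in Newick an opening parenthesis can only start a subtree, so it may appear only at the start of the tree or right after "(" or ",".

```python
import re
from collections import Counter

def check_newick(text):
    """Report simple label and bracket problems in a Newick string (heuristic)."""
    text = "".join(text.split())  # drop whitespace and line breaks
    # Split on structural characters; keep the part of each token before any ':length'.
    labels = [tok.split(":")[0] for tok in re.split(r"[(),;]", text)]
    labels = [lab for lab in labels if lab]
    counts = Counter(labels)
    return {
        # Labels that occur more than once.
        "duplicates": sorted(name for name, n in counts.items() if n > 1),
        # Labels that contain a single quote anywhere.
        "quoted": sorted({name for name in labels if "'" in name}),
        # Offsets of '(' that do not follow '(' or ',' (and are not at the start).
        "stray_open_paren": [m.start() for m in re.finditer(r"(?<![(,])\(", text)
                             if m.start() > 0],
        # Zero when parentheses balance; negative when there are extra ')'.
        "paren_balance": text.count("(") - text.count(")"),
        # A single tree should end with exactly one ';'.
        "semicolons": text.count(";"),
    }
```

Running it on the downloaded file would look like `check_newick(open("hg38.447way.commonNames.nh.txt").read())`. A nonzero `paren_balance` or more than one semicolon points to problems like item 3, and `stray_open_paren` to problems like item 2.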

There is also a problem in hg38.447way.nh.txt. The only differences between hg38.447way.nh.txt and hg38.447way.scientificNames.nh.txt are that the former has hg38 instead of Homo_sapiens, and Hylobates_lar and Hylobates_pileatus instead of Hylobates_pileatus_b and Hylobates_pileatus_a. The listing at https://hgdownload.soe.ucsc.edu/goldenPath/hg38/cactus447way/ describes hg38.447way.nh.txt as the "phylogenetic tree used to guide the cactus alignment", which implies it should have the same names as the alignment files. In the case of hg38 versus Homo_sapiens that is correct. On the other hand, the MAF files in https://hgdownload.soe.ucsc.edu/goldenPath/hg38/cactus447way/maf/ (and in https://cgl.gi.ucsc.edu/data/cactus/) have the names Hylobates_pileatus_b and Hylobates_pileatus_a, not Hylobates_lar and Hylobates_pileatus, so if hg38.447way.nh.txt is supposed to have the names used in the alignment, then those two names are wrong. (I don't know whether the species sequenced to get the first assembly is actually Hylobates lar, as reported in hg38.447way.nh.txt, or Hylobates pileatus, as reported in hg38.447way.scientificNames.nh.txt, so I don't know which pair of names is actually correct. I'm just trying to get the .nh and .maf files into agreement.)
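A mismatch like this between tree names and MAF names can be checked mechanically. The following is a rough sketch, not the method used for this report. The helper names are hypothetical, and it assumes that each MAF "s" line has a source field of the form `species.sequence` with no "." inside the species name, and that tree labels are unquoted.

```python
import re

def tree_labels(newick):
    """Collect node labels from a Newick string (unquoted labels assumed)."""
    tokens = re.split(r"[(),;]", "".join(newick.split()))
    return {tok.split(":")[0] for tok in tokens} - {""}

def maf_species(lines):
    """Collect species prefixes from MAF 's' lines ('species.sequence' assumed)."""
    species = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "s":
            # Keep the text before the first '.' as the species name.
            species.add(fields[1].split(".", 1)[0])
    return species

def compare(newick, maf_lines):
    """Return (labels only in the tree, species only in the MAF), both sorted."""
    tree, maf = tree_labels(newick), maf_species(maf_lines)
    return sorted(tree - maf), sorted(maf - tree)
```

If the tree also labels internal (ancestral) nodes, those labels will show up in the first list unless the MAF contains them too, so the first list needs a manual look. The second list is the more direct signal: any species in the alignment that the tree does not name.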

I've attached two corrected versions of hg38.447way.commonNames.nh.txt and a corrected version of hg38.447way.nh.txt:
  • hg38.447way.commonNames.corrected.nh.txt fixes problems 1-4.
  • hg38.447way.commonNames.corrected.noQuotes.nh.txt also removes all single quotes from names.
  • hg38.447way.corrected.nh.txt is the same as hg38.447way.nh.txt except it uses Hylobates_pileatus_b and Hylobates_pileatus_a (as in hg38.447way.scientificNames.nh.txt) rather than Hylobates_lar and Hylobates_pileatus (as in the original hg38.447way.nh.txt) so it matches the names in the MAF files.


hg38.447way.commonNames.corrected.nh.txt
hg38.447way.commonNames.corrected.noQuotes.nh.txt
hg38.447way.corrected.nh.txt

Luis Nassar

Aug 8, 2025, 6:30:33 PM
to Irwin Jungreis, UCSC Genome Browser Public Support
Hi, Irwin.

Thank you for taking the time to inform us of these issues.

We've updated the files on our server based on your recommendations: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/cactus447way/

Let us know if you have any additional feedback.

--

---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome+un...@soe.ucsc.edu.
To view this discussion visit https://groups.google.com/a/soe.ucsc.edu/d/msgid/genome/D03F2EBB-33A1-4484-8894-23C8ABF99671%40csail.mit.edu.

--
I hope this is helpful. Please include gen...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on our public forum. If your question includes sensitive information, you may send it instead to genom...@soe.ucsc.edu.

Lou Nassar
UCSC Genomics Institute

Irwin Jungreis

Dec 29, 2025, 7:17:37 PM
to Luis Nassar, UCSC Genome Browser Public Support
Hello,

I found an additional problem in https://hgdownload.soe.ucsc.edu/goldenPath/hg38/cactus447way/hg38.447way.commonNames.nh.txt. The common name between "nine_banded_armadillo" and "screaming_hairy_armadillo" is "placentals". (This is the node that corresponds to "Tolypeutes_matacus" in the scientific names tree.) The name should be "Southern_three_banded_armadillo", not "placentals".

PastedGraphic-1.png

- Irwin

Irwin Jungreis

Dec 29, 2025, 7:18:15 PM
to Luis Nassar, UCSC Genome Browser Public Support
Also:

  • In the scientific names files (hg38.447way.scientificNames.nh.txt and hg38.447way.nh.txt), should Propithecus_coquerelli be Propithecus_coquereli (single "l" in coquereli rather than double)? It was single in the 241 mammal tree and in the first few references I found online.

  • Comparing the 447 mammal scientific and common names trees, the name corresponding to Ceratotherium_simum_cottoni is Southern_white_rhinoceros_cottoni. However, in the 241 mammals tree Ceratotherium_simum_cottoni corresponded to Northern_white_rhino, and the latter agrees with Wikipedia and other references I found online. Are you sure this one should be southern rather than northern? (Alternatively, should the scientific name be Ceratotherium_simum_simum rather than Ceratotherium_simum_cottoni?)
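One way to line up the scientific and common names trees, sketched below under stated assumptions, is to pair labels by their order of appearance. This only works if both files encode the same topology in the same order and all labels are unquoted and free of structural characters. The function names and the toy trees are illustrative only: the topology and branch lengths are made up, not taken from the 447-way files.

```python
import re

def ordered_labels(newick):
    """List node labels in order of appearance (unquoted labels assumed)."""
    tokens = re.split(r"[(),;]", "".join(newick.split()))
    return [lab for lab in (tok.split(":")[0] for tok in tokens) if lab]

def pair_names(scientific_newick, common_newick):
    """Pair scientific and common labels position by position."""
    sci = ordered_labels(scientific_newick)
    com = ordered_labels(common_newick)
    if len(sci) != len(com):
        # A length mismatch usually means one file has a malformed label.
        raise ValueError(f"label counts differ: {len(sci)} vs {len(com)}")
    return list(zip(sci, com))

# Toy example (made-up topology and branch lengths).
sci_tree = "((Dasypus_novemcinctus:1,Tolypeutes_matacus:1):1,Chaetophractus_vellerosus:1);"
com_tree = "((nine_banded_armadillo:1,placentals:1):1,screaming_hairy_armadillo:1);"
for sci_name, common_name in pair_names(sci_tree, com_tree):
    print(sci_name, common_name, sep="\t")
```

Scanning the resulting table for common names that look like clade names, or diffing it against the equivalent table from the 241 mammals trees, is a quick way to surface candidates like the two questions above.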

- Irwin

Irwin Jungreis

Dec 29, 2025, 7:18:40 PM
to Luis Nassar, UCSC Genome Browser Public Support
Regarding Tolypeutes_matacus, in my earlier message I suggested the common name should be "Southern_three_banded_armadillo" rather than "placentals". "Southern_three_banded_armadillo" is the name that appeared in the common name tree for 241 mammals, but I don't know if that's the exact name you want to use for 447 mammals, since I see that many of the common names changed between the 241 and 447 mammal trees. That's up to you, but whatever the right name is, it certainly isn't "placentals".

- Irwin

Maximilian Haeussler

Dec 30, 2025, 12:53:02 PM
to Irwin Jungreis, Luis Nassar, UCSC Genome Browser Public Support
Hi Irwin,
Thanks for these corrections. We can try to integrate them into the files.

As far as I know, we don't build the names ourselves; we either import the species names or translate them using NCBI Taxonomy. The problem with the three-banded armadillo is probably that it has "placentals" as the common name in NCBI: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?searchTerm=Tolypeutes+matacus&searchMode=complete+name&lock=1&unlock=1&command=search
We cannot manually fix hundreds of species names, so we usually rely on NCBI Taxonomy.

This was probably imported from the Zoonomia folks.

We'll get back to you with more details in the New Year. 

best
Max
