How to download OncoTree data (tumor types)


Joseph Gerrein

Jul 20, 2023, 9:02:41 PM
to OncoTree Users
Hi OncoTree group,

I hope you are doing well. I would like to download the OncoTree data (OncotreeCode, OncotreeSubtype, OncotreePrimaryDisease, OncotreeLineage) so that I can work with it more easily. Could you please tell me how to do this?

My first choice would be the latest release in a text file with a clear OncoTree version.  Could you please provide this?

Here is what I have found out so far:

I saw an old conversation in this group about downloading the tumor types text file but the website address has changed and so the link does not work.  

I saw on the API tab that downloading of tumor types is deprecated, so I do not know if the API would give me the correct data.

I read the README for the OncoTree R package and the R package facilitates downloading the tumor types, but since the API tab of the website says that downloading of tumor types is deprecated, I do not know if the R package would give me the correct data.

I hope this information is helpful.

Thank you for your time!

Joe

Robert Sheridan

Aug 9, 2023, 11:46:19 AM
to OncoTree Users
Hello Joe. Sorry for the long delay responding to your message.

I think I can say a couple of useful things. First, when the oncotree paper was published a couple of years back (https://pubmed.ncbi.nlm.nih.gov/33625877/) we standardized on a new domain name for oncotree: https://oncotree.info. For a while we kept the older domain name (oncotree.mskcc.org) active and running on the original server. There was a brief period recently when the old server (and domain name) stopped working, so your attempts to follow the old links from discussions here probably fell in that period. We have now remapped web requests to oncotree.mskcc.org so that they go to the web service running at oncotree.info, but as a general rule we recommend that all users standardize on oncotree.info and stop using oncotree.mskcc.org. You can substitute oncotree.info wherever you might have used the other domain name.
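For old links saved from earlier discussions, the host substitution described above can be automated. The following is only a minimal sketch: the function name `modernize_oncotree_url` is made up for illustration, and it assumes that nothing but the host name needs to change.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "oncotree.mskcc.org"  # retired domain named in the post
NEW_HOST = "oncotree.info"       # recommended domain named in the post


def modernize_oncotree_url(url: str) -> str:
    """Return url with the old host swapped for the new one (hypothetical helper)."""
    parts = urlsplit(url)
    # urlsplit lowercases .hostname, so the comparison is case-insensitive.
    if parts.hostname != OLD_HOST:
        return url  # unrelated URLs are returned unchanged
    # Replace only the host; scheme, port, path and query are kept as they were.
    netloc = parts.netloc.lower().replace(OLD_HOST, NEW_HOST)
    return urlunsplit(parts._replace(netloc=netloc))
```

The path and query string pass through untouched, so whether a given old link still resolves depends on the service itself.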

---

Next, I can tell you that the deprecated API which produces the table-formatted text file should function correctly and provide accurate information about the oncotree version which is requested. So a request to this web address:


will produce an accurate table-formatted list of all oncotree nodes for the version "oncotree_latest_stable", or for any other version which might be requested.

The reason why this API is deprecated is that we ran into trouble previously when the oncotree itself grew and some nodes were nested at a depth greater than had been previously encountered. In the output of the link above you can see that there are seven "level" columns, because the maximum depth of the oncotree is currently 7. Originally, when we developed the API, the depth was 5. So if the oncotree grows even deeper in the future, additional columns may be needed in the table. Because the set of columns in the output is not fixed, it became programmatically difficult for scripts or other automations to work with the output of this (deprecated) API. Many external programs would likely break as this table grows (unless the programmers anticipated the possibility of additional columns appearing and proactively programmed for it). To avoid these future problems we deprecated this API and intended to remove it when a good opportunity arose.
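A script that reads the table can anticipate extra "level" columns by discovering them from the header instead of hard-coding seven. The sketch below is written under assumptions that are not confirmed in this thread: that the file is tab-delimited, and that the depth columns have names starting with `level_`. The function name `read_level_table` is made up for illustration.

```python
import csv
import io


def read_level_table(text: str) -> list[dict]:
    """Parse a tab-delimited table with any number of level_N columns."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    # Assumption: depth columns are the ones whose name starts with "level_".
    level_idx = [i for i, col in enumerate(header) if col.startswith("level_")]
    other_cols = [(i, col) for i, col in enumerate(header) if i not in level_idx]
    records = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Pad short rows so every header column can be indexed safely.
        row = row + [""] * (len(header) - len(row))
        record = {col: row[i] for i, col in other_cols}
        # Keep only the filled-in levels; the list length is the node depth.
        record["lineage"] = [row[i] for i in level_idx if row[i]]
        records.append(record)
    return records
```

Because the depth columns are collected into one `lineage` list, downstream code sees the same record shape whether the table has five, seven, or more level columns.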

All of the information needed to construct this table should be obtainable from the API at this address:


What would be helpful would be a script which takes the output of the tumorTypes API and performs this logic:

- define a number of associative maps from oncotree_code to attributes needed in the table {name, mainType, color, nci, umls, history}
- define an associative map from oncotree_code of the current node to the oncotree_code of the parent node
- define an integer holding the greatest depth (level) of any node seen so far, initialized to zero
- loop through each oncotree node in the list. For each:
    - update the greatest depth seen integer if a greater depth is present in the node
    - store needed values in the various attribute maps for this node
    - store the oncotree code for the parent in the parentage map
- define an oncotree-code-to-parentage-list map, and construct the parentage list for each oncotree code by iteratively following links in the parentage map.
    - an example would be this entry for GBM -- { "GBM" : ["BRAIN", "DIFG", "GB"] }
- construct a sorted ordering of the oncotree codes, based on a comparison function which prioritizes:
    - alphabetical ordering of the name of any parent, from "highest" (level 1) to lowest
    - alphabetical ordering of the name of the node itself
- output the table header
- for each oncotree code in the sorted ordering:
    - look up the parentage-list for the oncotree code
    - for each code in the parentage list output a field for the parent in format "<parent-name> (<parent-code>)"
    - output a number of empty fields equal to greatest-seen-depth - size(parentage list)
    - output the other stored attributes for this oncotree code {name, mainType, color, nci, umls, history}
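The steps above can be sketched in Python as follows. This is not the OncoTree team's script, only one possible reading of the outline. It works on an already-parsed list of node dictionaries, and the JSON field names (`code`, `name`, `mainType`, `color`, `externalReferences`, `history`, `parent`, `level`), the root code `TISSUE`, and the header labels are all assumptions that should be checked against the real API output.

```python
ATTRS = ["name", "mainType", "color", "nci", "umls", "history"]


def build_table(nodes, root_code="TISSUE"):
    """Return the table as a list of rows (lists of strings), header first."""
    attrs, parent_of, max_depth = {}, {}, 0
    for node in nodes:
        code = node["code"]
        if code == root_code:
            continue  # assumption: the root is not a row in the table
        max_depth = max(max_depth, node.get("level") or 0)
        refs = node.get("externalReferences") or {}
        attrs[code] = {
            "name": node["name"],
            "mainType": node.get("mainType") or "",
            "color": node.get("color") or "",
            "nci": ",".join(refs.get("NCI") or []),
            "umls": ",".join(refs.get("UMLS") or []),
            "history": ",".join(node.get("history") or []),
        }
        parent_of[code] = node.get("parent")

    def parentage(code):
        # Follow parent links upward, then reverse to get highest level first.
        chain, parent = [], parent_of.get(code)
        while parent in attrs and parent not in chain:  # stops at root or a cycle
            chain.append(parent)
            parent = parent_of.get(parent)
        return chain[::-1]

    lineage = {code: parentage(code) for code in attrs}

    def sort_key(code):
        # Parent names from level 1 downward, then the node's own name.
        return [attrs[c]["name"] for c in lineage[code]] + [attrs[code]["name"]]

    rows = [[f"level_{i}" for i in range(1, max_depth + 1)] + ATTRS]
    for code in sorted(attrs, key=sort_key):
        parents = [f'{attrs[c]["name"]} ({c})' for c in lineage[code]]
        padding = [""] * (max_depth - len(parents))
        rows.append(parents + padding + [attrs[code][k] for k in ATTRS])
    return rows


# Illustrative input only; real nodes would come from the tumorTypes API.
SAMPLE_NODES = [
    {"code": "TISSUE", "name": "Tissue", "parent": None, "level": 0},
    {"code": "GBM", "name": "Glioblastoma Multiforme", "parent": "GB", "level": 4},
    {"code": "BRAIN", "name": "CNS/Brain", "parent": "TISSUE", "level": 1},
    {"code": "GB", "name": "Glioblastoma", "parent": "DIFG", "level": 3},
    {"code": "DIFG", "name": "Diffuse Glioma", "parent": "BRAIN", "level": 2},
]

for row in build_table(SAMPLE_NODES):
    print("\t".join(row))
```

The sketch follows the outline literally, so the level columns hold only the parents and the node's own name appears in the `name` column. Sorting on the list of names places each parent directly before its children, because a shorter list that is a prefix of a longer one compares as smaller.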

This is not too difficult a script to write. If we put together such a script, we will distribute it on our tools page


and maybe then we could actually remove the API which generates this table format of oncotree.

I hope this helps you. Some of the terminology in your message is a little different from the terminology we use, but I think the terms map to our concepts: OncotreeLineage would refer to either our History field or to the parental relationship between oncotree nodes, and OncotreePrimaryDisease to what we call "mainType". If I am mistaken, feel free to clarify these terms. I think you also suggest adding the oncotree version into the text file as content. That would require defining a way to include meta-information in the file content while still keeping the file machine-readable. Perhaps the easiest solution would be to store the oncotree version in the filename rather than in the file contents.
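Storing the version in the filename could look like the sketch below. The filename pattern `oncotree_tumor_types_<version>.tsv` and both function names are made up for illustration; they are not an OncoTree convention.

```python
import csv
from pathlib import Path

PREFIX = "oncotree_tumor_types_"  # hypothetical naming scheme


def write_versioned_table(rows, version, directory="."):
    """Write rows as a tab-delimited file whose name records the version."""
    path = Path(directory) / f"{PREFIX}{version}.tsv"
    with path.open("w", newline="", encoding="utf-8") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)
    return path


def version_from_filename(path):
    """Recover the version string from a file name, or None if it does not match."""
    stem = Path(path).stem
    return stem[len(PREFIX):] if stem.startswith(PREFIX) else None
```

The file contents stay a plain table that any tab-delimited reader can parse, and the version is recovered from the name alone. If a moving label such as "oncotree_latest_stable" is used as the version, the filename records only that label and not the concrete release it pointed to on the day of download.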

- Rob


Joseph Gerrein

Aug 14, 2023, 11:58:41 PM
to Robert Sheridan, OncoTree Users
Hi Rob,

Thank you very much for your generous help and the time you spent on your reply!

I clicked the link you gave to the API, and that downloaded the file oncotree.info_api_tumor_types.txt (which I am attaching here). This file arrives just in time for a current need.

The OncotreeLineage and OncotreePrimaryDisease terminology comes from the Cancer Dependency Map (DepMap, https://depmap.org/), which uses OncoTree in its Model.csv file in the 2022 fourth-quarter (22Q4) release.


Best wishes,

Joe



oncotree.info_api_tumor_types.txt_version=oncotree_latest_stable.txt