The Cancer Imaging Archive Updates: December 2025
Annotated CPTAC Datasets Available on TCIA
Clinical Proteomic Tumor Analysis Consortium (CPTAC)
The National Cancer Institute’s Clinical Proteomic Tumor Analysis Consortium (CPTAC) is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genome analysis, or proteogenomics. Data (genomics, proteomics, imaging), assays, and reagents are made available to the public as a Community Resource to accelerate cancer research and advance patient care. TCIA has partnered with CPTAC to host the radiology and pathology imaging data generated by the project.
The National Cancer Institute has continued to enhance these datasets by generating annotations (3D segmentation labels and seed points). Available on TCIA, the annotations can help jumpstart research on tumor detection and auto-segmentation methods, and can be used to generate radiomic imaging features that can be compared with the proteomic, genomic, and clinical data.
Currently, four CPTAC datasets have been annotated:
Each annotation set now has more consistent annotation labels, and new fields have been added to the metadata reports for the annotated images for all subjects. One additional segmentation and seed point file has been added, and all issues with select segmentations and seed points that were not rendering properly in Slicer have been resolved.
A Jupyter notebook for analyzing the radiology (DICOM) images and annotations from the CPTAC datasets can be found here. The notebook demonstrates how to access the radiology images and tumor annotations, generate radiomic features from the 3D tumor segmentations, and extract the corresponding clinical data that is correlated with the image features.
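As a rough illustration of what "radiomic features from a 3D tumor segmentation" means, here is a minimal sketch in plain Python. It is not taken from the notebook, which may well rely on a dedicated radiomics library. The function name `first_order_features`, the `MaskFeatures` container, and the nested-list volume layout are all assumptions made for this example. It assumes the image and the binary segmentation mask have already been loaded into arrays of the same shape, and that the voxel spacing in millimeters is known.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Hypothetical layout for this sketch: a volume indexed as [slice][row][column].
Volume = Sequence[Sequence[Sequence[float]]]


@dataclass
class MaskFeatures:
    voxel_count: int       # number of voxels labeled as tumor
    volume_mm3: float      # voxel count times the volume of one voxel
    mean_intensity: float  # mean image value inside the mask
    max_intensity: float   # maximum image value inside the mask


def first_order_features(
    image: Volume,
    mask: Volume,
    spacing_mm: Tuple[float, float, float],
) -> MaskFeatures:
    """Compute a few simple features over the voxels where mask is non-zero."""
    # Collect the image intensities at every voxel the segmentation marks.
    values = [
        image[z][y][x]
        for z, plane in enumerate(mask)
        for y, row in enumerate(plane)
        for x, label in enumerate(row)
        if label
    ]
    if not values:
        raise ValueError("segmentation mask is empty")

    # Physical volume of one voxel from its spacing along each axis.
    voxel_volume = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    return MaskFeatures(
        voxel_count=len(values),
        volume_mm3=len(values) * voxel_volume,
        mean_intensity=sum(values) / len(values),
        max_intensity=max(values),
    )


# Toy 2x2x2 volume standing in for a real scan and its segmentation.
toy_image = [[[10, 20], [30, 40]], [[50, 60], [70, 80]]]
toy_mask = [[[0, 1], [0, 1]], [[0, 0], [1, 0]]]
print(first_order_features(toy_image, toy_mask, (2.0, 0.5, 0.5)))
```

Features like these, computed per tumor, are the kind of values that can then be tabulated alongside the proteomic, genomic, and clinical data for each subject.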
More information about research and CPTAC publications, as well as a list of all available collections, can be found here.
TCIA Seeking 'Normals'
In an effort to support machine learning, TCIA is seeking “normal” imaging datasets. These scans play a critical role in AI development. AI models learn patterns by contrasting healthy anatomy with disease. Large, diverse collections of normal images help researchers reduce bias, improve accuracy, and strengthen generalizability across patient populations. Normal imaging also supports the creation of baseline reference models. This enables algorithms to better detect subtle abnormalities and reduce false positives in clinical applications.
Click here to learn about proposal deadlines and TCIA submission guidelines.
New Collections Available on TCIA
This dataset is a large, heterogeneous, annotated, open-access brain metastasis dataset with matched radiologic and histopathologic imaging.
Prospectively collected data from the Yale Central Nervous System (CNS) Metastasis Biorepository was queried for patients who had a diagnosis of brain metastasis consistent with primary lung cancer. The dataset provides 103 cases with matched histologic-radiologic imaging. Tumor boundaries were manually annotated and verified by a board-certified neuroradiologist. Core tumor, indicated by contrast enhancement, was segmented on T1CE sequences, and whole tumor with peritumoral edema, indicated by hyperintensity, was segmented on FLAIR sequences. Pathological confirmation of brain metastasis and the primary tumor of origin was obtained for all patients using data from pathology reports.
|
This imaging dataset is derived from a prospective observational study of the Natural History of Familial Carcinoid Tumor (NCT00646022), which included 174 asymptomatic subjects. Patients underwent screening that included history and physical examination; laboratory evaluation; upper and lower endoscopy; Small Bowel Capsule Endoscopy (SB-CE); and radiologic imaging.
Archived full-length SB-CE videos from 20 patients were retrieved and reviewed. Abnormalities consistent with Small Intestine Neuroendocrine Tumors (SI-NETs) were identified by expert readers. For each abnormality, a .jpg file and an .mpg file were de-identified and stored. The dataset is made available as a training or test set of curated images to improve automated detection of SI-NETs.
|
This study introduces the first publicly accessible multimodal dataset designed to advance distant recurrence prediction in breast cancer (BC). The dataset comprises 47 histopathological whole-slide images (WSIs), 677 hyperspectral (HS) images, and demographic and clinical data from 47 BC patients, of whom 22 (47%) experienced distant recurrence over a 12-year follow-up. Histopathological slides were digitized using a WSI scanner and annotated by expert pathologists, while HS images were acquired with a bright-field microscope and a HS camera. This dataset provides a promising resource for BC recurrence prediction and personalized treatment strategies by integrating histopathological WSIs, HS images, and demographic and clinical data.
|
This dataset can be used to help understand why breast cancers spread and to improve treatment options for metastatic breast cancer.
The AURORA US Metastatic Breast Cancer project is a multi-center effort started to understand the metastatic process through the study of both primary and metastatic tissue. Samples from 55 patients, comprising 31 primary tissues and 102 metastases, were profiled using whole genome DNA sequencing, whole exome DNA sequencing, DNA methylation arrays, and RNA sequencing. The related molecular data are hosted in dbGaP and GEO.
H&E slides are available for 184 specimens (17 samples have 2 images, 12 have 3 images). H&E staining was performed on 53 primary breast cancer tissues, 99 metastatic samples, and 32 adjacent normal tissues. HLA-A immunofluorescence was performed on 37 samples. Clinical data are also provided, including patient demographics (age, gender, race, ethnicity, family history of breast or ovarian cancer, known BRCA1/2 mutations), primary diagnosis and clinical staging information, surgery and pathologic staging, metastatic diagnosis and pathology, and treatment information.
NBIA v4 API Published for TCIA DICOM Data
A new v4 API has been published that consolidates all of our older endpoints, improves functionality, and fixes inconsistencies in both parameter names and field names in the returned data. Because we no longer directly host controlled-access datasets, this update also removes all login and token-related functionality. Full documentation for the latest version of our API is found here. The older versions will eventually be deprecated, so we encourage developers to update their apps. Guidance on migrating can be found in this guide. We will provide at least 6 months' notice before disabling the older APIs.