CuMiDa: An Extensively Curated Microarray Database
http://sbcb.inf.ufrgs.br/cumida
Here we present the Curated Microarray Database (CuMiDa), a repository
containing 78 handpicked cancer microarray datasets, extensively
curated from 30,000 studies from the Gene Expression Omnibus (GEO),
solely for machine learning. The aim of CuMiDa is to offer homogeneous
and state-of-the-art biological preprocessing of these datasets,
together with numerous 3-fold cross-validation benchmark results to
propel machine learning studies focused on cancer research. The
database makes various download options available for use by other
programs, as well as PCA and t-SNE results. CuMiDa differs from
existing databases by offering newer datasets that are manually and
carefully curated, from sample quality and unwanted probes to
background correction and normalization, to create a more reliable
source of data for computational research.
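
For readers unfamiliar with the 3-fold cross-validation protocol mentioned
above, the following is a minimal sketch in plain Python. It is not
CuMiDa's benchmark pipeline: the function names (stratified_k_fold,
nearest_centroid_predict, cross_validate), the nearest-centroid classifier,
and the tiny synthetic "expression" matrix are all assumptions made for
illustration only.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=3, seed=0):
    """Yield (train_idx, test_idx) pairs, spreading each class evenly over k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the samples of each class round-robin into the folds.
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def nearest_centroid_predict(X_train, y_train, x):
    """Predict the class whose mean training vector is closest to x (stand-in classifier)."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X_train, y_train):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    best_label, best_dist = None, float("inf")
    for label, total in sums.items():
        centroid = [v / counts[label] for v in total]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cross_validate(X, y, k=3, seed=0):
    """Return the per-fold accuracy of the stand-in classifier."""
    accuracies = []
    for train, test in stratified_k_fold(y, k, seed):
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        correct = sum(nearest_centroid_predict(X_tr, y_tr, X[i]) == y[i] for i in test)
        accuracies.append(correct / len(test))
    return accuracies


# Synthetic, well-separated data: 6 "tumor" and 6 "normal" samples, 2 features each.
X = [[5.0 + 0.1 * i, 5.0 - 0.1 * i] for i in range(6)] + \
    [[0.1 * i, 0.1 * i] for i in range(6)]
y = ["tumor"] * 6 + ["normal"] * 6

if __name__ == "__main__":
    print(cross_validate(X, y, k=3))
```

With a real dataset, X would hold one row of probe intensities per sample
and y the sample classes; stratifying the folds keeps class proportions
comparable across folds, which matters for the small sample counts typical
of microarray studies.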
FELTES, B. C.; CHANDELIER, E. B.; GRISCI, B. I.; DORN, M. CuMiDa:
An Extensively Curated Microarray Database for Benchmarking and
Testing of Machine Learning Approaches in Cancer Research. Journal of
Computational Biology, v. 26, p. 1, 2019. DOI:
http://dx.doi.org/10.1089/cmb.2018.0238
--
Prof. Dr. Márcio Dorn
Federal University of Rio Grande do Sul, Institute of Informatics
Structural Bioinformatics and Computational Biology Lab - SBCB
Av. Bento Gonçalves 9500,
91501-970 - Porto Alegre, RS - Brasil
Prédio 72 Sala 217
Tel: +55 51 3308-6824
Lattes CV: http://lattes.cnpq.br/6355224981962273