CuMiDa: An Extensively Curated Microarray Database for Machine Learning

Marcio Dorn

Jun 27, 2020, 6:26:07 PM
to ml-...@googlegroups.com, Marcio Dorn
CuMiDa: An Extensively Curated Microarray Database

http://sbcb.inf.ufrgs.br/cumida

Here we present the Curated Microarray Database (CuMiDa), a repository
containing 78 handpicked cancer microarray datasets, extensively
curated from 30,000 studies from the Gene Expression Omnibus (GEO),
solely for machine learning. The aim of CuMiDa is to offer homogeneous
and state-of-the-art biological preprocessing of these datasets,
together with numerous 3-fold cross-validation benchmark results to
propel machine learning studies focused on cancer research. The
database makes various download options available for use by other
programs, along with PCA and t-SNE results. CuMiDa differs from
existing databases by offering newer datasets that are manually and
carefully curated, covering sample quality, removal of unwanted probes,
background correction, and normalization, to create a more reliable
source of data for computational research.
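
For readers unfamiliar with the 3-fold cross-validation protocol mentioned above, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the expression matrix is synthetic, the "normal"/"tumor" labels are made up, and the nearest-centroid classifier is a simple stand-in, not one of the methods benchmarked in CuMiDa. The helper names (stratified_folds, nearest_centroid_predict, cross_validate) are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=3, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the class whose mean expression profile is closest to sample."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(train_x, train_y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    best_label, best_dist = None, float("inf")
    for label, total in sums.items():
        centroid = [s / counts[label] for s in total]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cross_validate(x, y, k=3, seed=0):
    """Return the accuracy obtained on each of the k held-out folds."""
    folds = stratified_folds(y, k, seed)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            nearest_centroid_predict(train_x, train_y, x[i]) == y[i]
            for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return accuracies


# Synthetic stand-in for an expression matrix: 12 samples x 4 genes.
# Real data would come from a downloaded dataset file instead.
rng = random.Random(42)
x = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
x += [[rng.gauss(5.0, 1.0) for _ in range(4)] for _ in range(6)]
y = ["normal"] * 6 + ["tumor"] * 6

scores = cross_validate(x, y, k=3)
print("per-fold accuracy:", scores)
```

Stratifying the folds keeps the class balance roughly equal in each held-out third, which matters for microarray datasets where some classes have only a handful of samples.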

FELTES, B. C.; CHANDELIER, E. B.; GRISCI, B. I.; DORN, M. CuMiDa:
An Extensively Curated Microarray Database for Benchmarking and
Testing of Machine Learning Approaches in Cancer Research. Journal of
Computational Biology, v. 26, p. 1, 2019. DOI:
http://dx.doi.org/10.1089/cmb.2018.0238


--
Prof. Dr. Márcio Dorn
Federal University of Rio Grande do Sul, Institute of Informatics
Structural Bioinformatics and Computational Biology Lab - SBCB
Av. Bento Gonçalves 9500,
91501-970 - Porto Alegre, RS - Brasil
Prédio 72 Sala 217
Tel: +55 51 3308-6824
Lattes CV: http://lattes.cnpq.br/6355224981962273