Data Management Text Book


Latanya Hariri

Aug 5, 2024, 4:48:10 AM
to waffcortiomint
The library helps campus research labs and centers manage and publish their research data. We've also worked with Information Management Systems & Services (IMSS) to put together a centralized Research Data FAQ.

The CaltechDATA repository offers standard data preservation and DOI (permanent identifier) services. We also offer services (at an additional cost) for preserving large volumes of data (> 500 GB). Contact us at da...@caltech.edu to discuss the options or read the CaltechDATA FAQ.
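To check whether a dataset falls into the large-volume (> 500 GB) case before contacting the library, you could total its size with a short Python sketch like the one below. It is not a CaltechDATA tool: the helper names `dataset_size_bytes` and `needs_large_data_service` are hypothetical, and reading "500 GB" as decimal gigabytes (10^9 bytes) is an assumption.

```python
import os

# Assumption: "500 GB" is read as decimal gigabytes (10**9 bytes); confirm the
# exact threshold with the repository staff before relying on it.
LARGE_DATA_THRESHOLD_BYTES = 500 * 10**9


def dataset_size_bytes(root: str) -> int:
    """Return the total size in bytes of all regular files under `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so that linked data is not counted twice.
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def needs_large_data_service(total_bytes: int,
                             threshold: int = LARGE_DATA_THRESHOLD_BYTES) -> bool:
    """Return True if the dataset exceeds the large-volume threshold."""
    return total_bytes > threshold
```

For example, `needs_large_data_service(dataset_size_bytes("/path/to/dataset"))` (the path is a placeholder) tells you whether to ask about the large-volume options.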


Funding agencies, such as the National Science Foundation or the National Institutes of Health, have specific requirements and templates for Data Management Plans (DMPs). We have many resources to help with the new NIH Data Management and Sharing Policy.


Caltech IMSS provides Box.com cloud storage, with 200 GB of storage for each community member and 1 TB per group; additional storage is available at extra cost. Box manages the storage, but you manage access to your files. However, Box may not be fast or efficient enough for large amounts of data, and it has a maximum single-file size of 50 GB. If your data are too large for Box, you may want to purchase local storage hardware such as a Network Attached Storage (NAS) device or a storage array. IMSS and the library can help you decide which option best fits your needs (go to help.caltech.edu and select IMSS/Data Storage & Backup, or email da...@caltech.edu).
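Before uploading a directory to Box, it can help to list any files that would exceed the 50 GB single-file limit. The Python sketch below does that; `oversized_files` is a hypothetical helper, not part of Box or IMSS tooling, and treating 50 GB as 50 × 10^9 bytes is an assumption (it is the stricter reading, so it errs on the side of flagging a file).

```python
import os
from typing import Iterator, Tuple

# Box's maximum single-file size as stated above (50 GB). Decimal GB is an
# assumption; it is smaller than 50 GiB, so borderline files get flagged.
BOX_MAX_FILE_BYTES = 50 * 10**9


def oversized_files(root: str,
                    limit: int = BOX_MAX_FILE_BYTES) -> Iterator[Tuple[str, int]]:
    """Yield (path, size_in_bytes) for each regular file under `root` larger than `limit`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Ignore symlinks; only real files are uploaded as-is.
            if os.path.isfile(path) and not os.path.islink(path):
                size = os.path.getsize(path)
                if size > limit:
                    yield path, size
```

If the generator yields anything, those files would need to be split or stored elsewhere, for example on local storage hardware as suggested above.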


The Resnick High Performance Computing Center (HPC) cluster is an excellent option for large computations: your calculations run on a state-of-the-art resource at Caltech with local support. Your research group leader must set up an account (hpc.caltech.edu/documentation/getting-started), and charges depend on how much computing time you use. Groups get up to 30 TB of free data storage; however, this storage is not backed up, so groups must keep their primary data elsewhere. National (off-campus) computing resources such as ACCESS ( -ci.org/) are also available by application and can provide additional computing resources at no charge.
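Because the free cluster storage is not backed up, primary data should be copied to another location and the copy verified. The Python sketch below copies a directory tree and compares SHA-256 checksums of each source file and its copy. It is an illustration under stated assumptions, not an HPC Center or IMSS tool: `copy_and_verify` and `sha256_of` are hypothetical helpers, the destination is any path you choose, and established tools such as rsync may suit large transfers better.

```python
import hashlib
import os
import shutil


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_verify(src_root: str, dest_root: str) -> list:
    """Copy every file under src_root to dest_root, keeping relative paths.

    Returns the relative paths whose copies do not match the source checksum
    (an empty list means every copy verified).
    """
    mismatches = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dest_root, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copies contents and timestamps
            if sha256_of(src) != sha256_of(dst):
                mismatches.append(rel)
    return mismatches
```

Checking a checksum after each copy catches silent corruption during transfer, which a plain copy would not report.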


The CaltechDATA repository (data.caltech.edu) can accept software and even has an integration with GitHub that automatically preserves software releases. You can set up the integration yourself by following the setup instructions, or email a link to your GitHub repository to da...@caltech.edu for a guided setup.


To share research data files, you can use the file-sharing options in Box.com, which also let you set a custom password on shared files. Box.com is a complete cloud file service, so you can add collaborators who access files with their Box.com credentials. Unlike with services such as Dropbox, collaborators can store files in a shared folder using your institutional Box storage allocation.


Talk to the Institutional Review Board (IRB) about all data collection and storage plans for your project (irb.caltech.edu/). Box.com, SharePoint, and OneDrive are certified by IMSS for personal data covered by HIPAA or FERPA regulations.


The Earth System Data Management and Operations Group supports researchers and scientists around the world with advanced earth, ecological, and environmental sciences data management and processing capabilities. The data management capabilities include state-of-the-art metadata workflows, processing, quality control, archival, and data discovery. Processing capabilities include world-class high performance computational resources (i.e., Summit, CADES and CUMULUS), provided by the Oak Ridge Leadership Computing Facility, supporting large-scale data analysis and visualization.


Currently, the group supports the Atmospheric Radiation Measurement (ARM) user facility and the U.S. Geological Survey (USGS) program. The measurements collected for the ARM Data Center (ADC) track various atmospheric phenomena, including radiation, cloud characteristics, and the concentration and movement of airborne particles. This center is a primary contributor to the ARM user facility, a DOE Office of Science user facility managed and operated through a collaborative effort led by nine DOE national laboratories.


This diverse group of software architects, engineers, and developers strives to advance the user facility and its data management capabilities through cutting-edge research and engineering that enable advanced environmental research and science. These advancements support routine data products, value-added products, field campaign data, and complementary external data products from collaborating programs and projects such as ARM and USGS. To help user communities with data quality and visualization, further capabilities include data quality reports, graphical displays of data availability and quality, and plots. The group also turns these into citable data products and makes them available through the ARM data discovery tool.


Sharing scientific data accelerates biomedical research discovery, enhances research rigor and reproducibility, provides accessibility to high-value datasets, and promotes data reuse for future research studies. Ultimately, the sharing of scientific data expedites the translation of research results into knowledge, products, and procedures to improve human health.


NIH expects that data be made as widely and freely available as possible while safeguarding the privacy of participants and protecting confidential and proprietary data. Sharing is particularly important for unique data that cannot be readily replicated.


In their data sharing plans, applicants should propose the most appropriate means for sharing data according to the specifications of their research project and area of science, in compliance with the policies and regulations governing research awards. Learn about methods for managing and sharing data.


The 2003 Data Sharing Policy applies to final research data generated from grants, cooperative agreements, intramural research, contracts, or other funding agreements of $500,000 or more per year. See Research Covered Under the Data Management & Sharing Policies for more details.
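The funding threshold above amounts to a simple comparison. The Python sketch below encodes one reading of it; `covered_by_2003_policy` is a hypothetical helper, reading "per year" as "in any single budget year" is an assumption, and which costs count toward the amount is not modelled. Always confirm coverage against the official NIH guidance.

```python
from typing import Sequence

# Threshold stated above for the 2003 NIH Data Sharing Policy.
THRESHOLD_USD = 500_000


def covered_by_2003_policy(annual_amounts_usd: Sequence[float]) -> bool:
    """Return True if any single budget year reaches the $500,000 threshold.

    Assumption: "per year" means "in any single year" of the funding
    agreement; what counts toward each yearly amount is not modelled here.
    """
    return any(amount >= THRESHOLD_USD for amount in annual_amounts_usd)
```

For example, a five-year award of $499,999 each year would not meet this threshold, while an award with one year at exactly $500,000 would.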


When a Principal Investigator and their authorized institutional official sign the face page of an NIH application, they are assuring compliance with policies and regulations governing research awards. NIH expects grantees to follow these rules and to conduct the work described in the application. Thus, if an application describes a data sharing plan, NIH expects that plan to be enacted.


If progress has been made with the data sharing plan, then the grantee should note this in the progress report. In the final progress report, if not sooner, the grantee should note what steps have been taken with respect to the data sharing plan.


NIH has issued the Data Management and Sharing (DMS) policy (effective January 25, 2023) to promote the sharing of scientific data. Sharing scientific data accelerates biomedical research discovery, in part, by enabling validation of research results, providing accessibility to high-value datasets, and promoting data reuse for future research studies. Access the full text of the 2023 Final NIH Policy for Data Management & Sharing.


Prospectively planning for how scientific data will be managed and ultimately shared is a crucial first step in optimizing the reach of data generated from NIH-funded research. Investigators and institutions are encouraged to consider these crucial elements early in research planning.




These high-level FAIR Guiding Principles precede implementation choices and do not suggest any specific technology, standard, or implementation solution; moreover, the Principles are not, themselves, a standard or a specification. They act as a guide to data publishers and stewards, helping them evaluate whether their particular implementation choices are rendering their digital research artefacts Findable, Accessible, Interoperable, and Reusable. We anticipate that these high-level principles will enable a broad range of integrative and exploratory behaviours, based on a wide range of technology choices and implementations. Indeed, many repositories are already implementing various aspects of FAIR using a variety of technology choices, and several examples are detailed in the next section; examples include Scientific Data itself and how its narrative data articles are anchored to progressively FAIR structured metadata.
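Since the Principles prescribe no technology, any concrete check is only one possible implementation choice. As an illustration, the Python sketch below models a dataset's metadata and reports which FAIR-related fields are empty. The `DatasetRecord` fields and the `missing_fair_fields` helper are hypothetical and do not follow any standard schema; the mapping of fields to principles is a simplified assumption (a persistent identifier and description for Findable, an access URL for Accessible, format and vocabularies for Interoperable, license and provenance for Reusable).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative, not a standard."""
    identifier: str = ""   # Findable: globally unique, persistent ID (e.g. a DOI)
    title: str = ""        # Findable: searchable description
    access_url: str = ""   # Accessible: where the data can be retrieved
    data_format: str = ""  # Interoperable: documented file format
    vocabularies: List[str] = field(default_factory=list)  # Interoperable: shared vocabularies
    license: str = ""      # Reusable: clear usage license
    provenance: str = ""   # Reusable: how the data were produced


def missing_fair_fields(record: DatasetRecord) -> List[str]:
    """Return "Principle: field" labels for every empty field in the record."""
    checks = {
        "Findable": ["identifier", "title"],
        "Accessible": ["access_url"],
        "Interoperable": ["data_format", "vocabularies"],
        "Reusable": ["license", "provenance"],
    }
    missing = []
    for principle, names in checks.items():
        for name in names:
            if not getattr(record, name):
                missing.append(f"{principle}: {name}")
    return missing
```

A repository could run such a check at deposit time and ask for the missing fields, which is one way of making metadata "progressively FAIR".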


The end result, when implemented, will be more rigorous management and stewardship of these valuable digital resources, to the benefit of the entire academic community. As stated at the outset, good data management and stewardship is not a goal in itself, but rather a pre-condition supporting knowledge discovery and innovation. Contemporary e-Science requires data to be Findable, Accessible, Interoperable, and Reusable in the long term, and these objectives are rapidly becoming expectations of agencies and publishers. We demonstrate, therefore, that the FAIR Data Principles provide a set of mileposts for data producers and publishers. They guide the implementation of the most basic levels of good Data Management and Stewardship practice, thus helping researchers adhere to the expectations and requirements of their funding agencies. We call on all data producers and publishers to examine and implement these principles, and to participate actively in the FAIR initiative by joining the Force11 working group. By working together towards shared, common goals, the valuable data produced by our community will gradually achieve the critical goals of FAIRness.


M.W. was the primary author of the manuscript, and participated extensively in the drafting and editing of the FAIR Principles. M.D. was significantly involved in the drafting of the FAIR Principles. B.M. conceived of the FAIR Data Initiative, contributed extensively to the drafting of the principles, and to this manuscript text. All other authors are listed alphabetically, and contributed to the manuscript either by their participation in the initial workshop and/or by editing or commenting on the manuscript text.
