Clearinghouse for Training Modules to Enhance Data Reproducibility

In January 2014, NIH launched a series of initiatives to enhance rigor and reproducibility in research. As part of these initiatives, NIGMS, along with nine other NIH institutes and centers, issued a funding opportunity announcement (FOA), RFA-GM-15-006, to develop, pilot and disseminate training modules to enhance data reproducibility. This FOA was reissued in 2018 (RFA-GM-18-002).

For the benefit of the scientific community, we will post the products of grants funded by these FOAs on this website as they become available. We are also sharing other relevant training modules, including courses developed with administrative supplements to NIGMS predoctoral T32 grants.

NIH Rigor and Reproducibility Training Modules

These modules, developed by NIH, focus on integral aspects of rigor and reproducibility in the research endeavor, such as bias, blinding and exclusion criteria. The modules are not meant to be comprehensive, but rather are intended as a foundation to build on and a way to stimulate conversations, which may be facilitated by the accompanying discussion materials. Currently, the modules are being integrated into NIH training activities.

Let's Experiment: A Guide for Scientists Working at the Bench

iBiology, R25 GM116704

"Let's Experiment" is a FREE 6-week online course designed for students and practitioners of experimental biology. Scientists from a variety of backgrounds give concrete steps and advice to help you build a framework for how to design experiments. We use case studies to make the abstract more tangible. In science, there is often no simple right answer. However, you can develop a general approach to experimental design and understand what you are getting into before you begin.

By the end of this course, you will have:

  • A detailed plan for your experiment(s) that you can discuss with a mentor.
  • A flowchart for how to prioritize experiments.
  • A lab notebook template that is so impressively organized, it will make your colleagues envious.
  • A framework for doing rigorous research.

Access the course

NIH Office of Disease Prevention (ODP) Course on Pragmatic and Group-Randomized Trials in Public Health and Medicine


This 7-part online course aims to help researchers design and analyze group-randomized trials (GRTs). It includes video presentations, slide sets, suggested reading materials, and guided activities. The course is presented by ODP's Director, Dr. David M. Murray.

Access the course and related materials

Controls in Animal Studies for Rigor and Reproducibility

Christina N. Bennett and Marsha Lakes Matyas, American Physiological Society, R25 GM116166-02

This teaching module was designed to help biomedical researchers understand the changing standards of practice for studies that use animals as research models. The module comprises three sections that focus on developing strong skills in designing animal studies, analyzing results from those studies, and reporting findings that are reproducible. It is designed for use by higher education institutions, laboratory groups, individuals, and professional societies.

Access the course and related materials


Lesson 1: Experimental Design
The Experimental Design lesson focuses on the experimental details needed to support the design and implementation of a well-controlled animal study. Topics of discussion include: 1) Considerations for Designing an Animal Study, 2) Calculating the Number of Animals to Use in a Study, and 3) Benefits of Consulting with a Statistician before Starting the Study. Each lesson includes activities to develop skills on the topics presented and resources to use when planning future studies.
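The lesson's own calculation method is not described on this page. As a rough sketch of what calculating the number of animals involves, assuming two independent groups, a continuous outcome with equal variance in both groups, and a two-sided test, the Python below applies the common normal-approximation formula for animals per group. The function name `animals_per_group` and the defaults of alpha = 0.05 and 80% power are illustrative assumptions, not values taken from the course.

```python
import math
from statistics import NormalDist


def animals_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate animals needed per group for a two-sided, two-group comparison.

    Hypothetical helper. effect_size is the standardized difference (Cohen's d):
    the expected difference in group means divided by the common standard deviation.
    Uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    which slightly underestimates the exact t-test answer for small samples.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = z.inv_cdf(power)           # quantile corresponding to the desired power
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    return math.ceil(n)                 # round up: animals come in whole numbers


if __name__ == "__main__":
    # Hypothetical planning values: a 1-SD difference vs. a half-SD difference.
    for d in (1.0, 0.5):
        print(f"d = {d}: {animals_per_group(d)} animals per group")
```

With these defaults the sketch asks for 16 animals per group to detect a one-standard-deviation difference and 63 for a half-standard-deviation difference. An exact t-test calculation typically adds about one animal per group (17 and 64), which is one reason to confirm the final number with a statistician before starting the study.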

Lesson 2: Analyzing Results
The Analyzing Results lesson focuses on how to assess the datasets underlying article figures and how to judge whether the information in a dataset is sufficient to create an interpretable figure. Topics of discussion include: 1) Four Steps to Analyzing Results, 2) Dealing with Outliers, and 3) Data Sharing and Storage. Each lesson includes activities to develop skills on the topics presented and resources to use when planning future studies.

Lesson 3: Reporting Results
The Reporting Results lesson focuses on the details of an animal experiment that should be reported in a manuscript. Topics of discussion include: 1) Utilizing the Methods Section, 2) Reporting Results Using Text, Figures, Tables, and Legends, and 3) Addressing Challenges to Reporting Results. Each lesson includes activities to develop skills on the topics presented and resources to use when planning future studies.

Statistical Topics for Reproducible Animal Research

Andrew W. Brown and David B. Allison, Indiana University School of Public Health-Bloomington; Tapan Mehta and Stephen Watts, University of Alabama at Birmingham, R25 GM116167

Preclinical research involving animal models can be improved when appropriate experimental, analytical, and reporting practices are used. We produced a series of animated vignettes in which quantitative experts and laboratory scientists discuss aspects of study design, interpretation, and reporting. Each vignette introduces viewers to key concepts that can stimulate the conversations between quantitative experts and laboratory scientists needed to enhance rigor, reproducibility, and transparency in preclinical research.

Access the vignettes

The BD2K Guide to the Fundamentals of Data Science Series

Arthrobacter arilaitensis Re117 genome atlas. Credit: Wikimedia Commons.

The Big Data to Knowledge (BD2K) Initiative presents this virtual lecture series on the data science underlying modern biomedical research. Since its launch in September 2016, the webinar series has featured presentations from experts across the country covering the basics of data management, representation, computation, statistical inference, data modeling, and other topics relevant to "big data" in biomedicine. The series provides essential training at an introductory level. All presentations are streamed live, recorded, and posted online for future viewing and reference. The videos are also indexed in the Educational Resource Discovery Index (ERuDIte) of the BD2K Training Coordinating Center (TCC). The webinar series is a collaboration between the TCC, the NIH Office of the Associate Director for Data Science, and the BD2K Centers Coordination Center (BD2KCCC).

View archived seminars

Cell Line Authentication Training

Leonard Freedman, Global Biological Standards Institute® (GBSI), R25 GM116155
Multiphoton fluorescence image of cultured HeLa cells with a fluorescent protein targeted to the Golgi apparatus (orange), microtubules (green) and counterstained for DNA (cyan). Credit: National Center for Microscopy and Imaging Research.

GBSI and its partners have developed an exportable "active learning" training module to reduce cell line misidentification, mislabeling, and contamination. The module contains highly interactive training units, including "back to the lab" exercises that turn learning into practice by sending trainees back into the laboratory to apply their skills. Routine authentication of cultured cell lines will improve the credibility, reproducibility, and translation of preclinical research.

Access the course and related materials

Improving Reproducibility of Computational Microbiome Analyses

Patrick Schloss, University of Michigan School of Medicine, R25 GM116149

A series of 14 tutorials on improving the reproducibility of data analysis for researchers in microbial ecology. Although the materials focus on issues in microbiome research, the principles are broadly applicable to other areas of microbiology and science. The lessons emphasize command-line practices (e.g., bash), scripting languages (e.g., mothur, R), version control (e.g., git), automation (e.g., make), and literate programming (e.g., markdown). A growing number of microbiome researchers use these tools to improve the reproducibility of their research. Those who complete the activities in the tutorials are listed on the Reproducible Research Tutorial Honor Roll, which certifies their training.

Access the tutorials

Improving Reproducibility in Research

Aaron Carroll, Indiana University School of Medicine, R25 GM116146

To promote better training and ensure the reliability and reproducibility of research, we developed a series of webisodes (thematically related online videos) for graduate students, postdoctoral fellows, and beginning investigators that address critical features of experimental design and analysis/reporting.


Module 1: Experimental Design Learning Module
The Experimental Design Learning Module focuses on the intricacies of designing robust research, with an eye toward making it reproducible. It comprises four distinct learning units: 1) Replication, 2) Randomization, 3) Pitfalls with Experimental Design, and 4) Measurement. Each learning unit has one or more sub-topics, each of which is the subject of an individual webisode.

Module 2: Analysis/Reporting Learning Module
The Analysis/Reporting Learning Module covers the factors that are critical to writing about research with enough clarity to ensure its reproducibility. Very few researchers receive formal education on how to report findings in a way that supports reproducibility. The module comprises three distinct learning units: 1) Power and P-values, 2) Scientific Writing, and 3) The Review Process. Each unit has one or more sub-topics, each of which is the subject of an individual webisode.

Society for Neuroscience Rigor and Reproducibility Training Webinars

Manny DiCicco-Bloom, Rutgers University; Cheryl Sisk, Michigan State University, R25 DA041326

Webinar 1: Improving Experimental Rigor and Enhancing Data Reproducibility in Neuroscience

The topics of scientific rigor and data reproducibility have been increasingly covered in the scientific and mainstream media, and they are being addressed by publishers, professional organizations and funding agencies. This webinar addresses topics of scientific rigor as they pertain to preclinical neuroscience research.

Webinar 2: Minimizing Bias in Experimental Design and Execution

Investigations into the lack of reproducibility in preclinical research often identify unintended biases in experimental planning and execution. This webinar covers random sampling, blinding and balancing experiments to avoid sources of bias.
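The webinar's own procedures are not described on this page. As a minimal sketch of how random allocation, balanced group sizes, and blinding can fit together, assuming a simple experiment with equally sized groups, the Python below shuffles an equal number of group labels across subjects with a seeded generator and gives the experimenter only opaque codes. The function name `randomize_and_blind`, the code format, and the seed are illustrative choices, not taken from the webinar; designs that stratify by sex, litter, or batch would need more than this.

```python
import random


def randomize_and_blind(subject_ids, groups, seed):
    """Randomly allocate subjects to equally sized groups and generate blind codes.

    Hypothetical helper. Returns (key, coded_ids): `key` maps each blind code to
    (subject, group) and should be held by someone not involved in running or
    scoring the experiment; `coded_ids` is the list of codes the experimenter sees.
    """
    subject_ids = list(subject_ids)
    groups = list(groups)
    if not groups or len(subject_ids) % len(groups) != 0:
        raise ValueError("number of subjects must be a multiple of the number of groups")
    rng = random.Random(seed)  # seeded so the allocation can be reproduced and audited
    per_group = len(subject_ids) // len(groups)
    # Balanced allocation: an equal-sized list of group labels, shuffled across subjects.
    labels = [g for g in groups for _ in range(per_group)]
    rng.shuffle(labels)
    # Opaque codes, also shuffled, so a code reveals neither the group nor the subject.
    codes = [f"S{i:03d}" for i in range(1, len(subject_ids) + 1)]
    rng.shuffle(codes)
    key = {code: (subject, label) for code, subject, label in zip(codes, subject_ids, labels)}
    return key, sorted(key)


if __name__ == "__main__":
    mice = [f"mouse{i}" for i in range(1, 9)]
    key, coded = randomize_and_blind(mice, ["control", "treated"], seed=2024)
    print(coded)  # the only labels the person scoring outcomes works with
```

Keeping the seed and the key with someone outside the scoring process lets the allocation be checked later without unblinding the person who measures outcomes.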

Webinar 3: Best Practices in Post-Experimental Data Analysis

Proper data handling standards, including appropriate use of statistical tests, are integral to rigorous and reproducible neuroscience research. Training in quantitative neuroscience is a specific area of emphasis for the BRAIN Initiative, and rigorous statistical analysis methods are included in the recent Proposed Principles and Guidelines for Reporting Preclinical Research. This webinar covers best practices in post-experimental data analysis.

Webinar 4: Best Practices in Data Management and Reporting

Efforts to enhance scientific rigor, reproducibility and robustness critically depend on archiving and retrieving experimental records, protocols, primary data and subsequent analyses. In this webinar, presenters discuss best practices and challenges for data management and reporting, particularly when dealing with information security and sensitive material; archiving and disclosure of pre- and post-hoc data analytics; and data management on multidisciplinary teams that include collaborators around the globe.

Webinar 5: Statistical Applications in Neuroscience 

How can neuroscientists improve their "statistical thinking" and make full and effective use of their data? This webinar covers common applications of statistics in neuroscience, including the types of research questions statistics are best positioned to address, modeling paradigms and exploratory data analysis. The presenters also share examples and case studies from their research.

Webinar 6: Experimental Design to Minimize Systemic Biases: Lessons from Rodent Behavioral Assays and Electrophysiology Studies

Common sources of bias in animal behavior and electrophysiology experiments can be minimized or avoided by following best practices of unbiased experimental design and data analysis and interpretation. In this webinar, presenters discuss experimental design and hypothesis testing for mouse behavioral assays, as well as sampling, interpretational bias and referencing in in vitro and in vivo electrophysiology recording studies.

Webinar 7: Tackling Challenges in Scientific Rigor: The (Sometimes) Messy Reality of Science

This webinar draws on practical examples from neuroscientists at various career stages to explore the challenges of conducting rigorous science and possible solutions. It focuses on developing the interpersonal, scientific and technical skills needed to address issues in scientific rigor, such as what to do when you can't replicate a published result, how to get support from a mentor and how to cope with career pressures that might affect the quality of your science.

edX Course: Principles, Statistical and Computational Tools for Reproducible Science

Xihong Lin, Harvard School of Public Health, T32 GM074897-12S1

Learn skills and tools that support data science and reproducible research to ensure you can trust your research results, reproduce them yourself, and communicate them to others.

This free course covers the fundamentals of reproducible science, case studies, data provenance, statistical methods for reproducible science, computational tools for reproducible science, and reproducible reporting. These concepts are intended to translate to fields throughout the data sciences: physical and life sciences, applied mathematics and statistics, and computing.

Consider this course a survey of best practices that will help you create an environment in which you can easily carry out reproducible research and that integrates with the environments of your collaborators and colleagues.

Access the course