Report of the NIGMS Ad Hoc Bioinformatics Training Committee

March 22, 1999


Introduction

A November 1998 meeting to discuss the future of research training at the National Institute of General Medical Sciences (NIGMS) identified bioinformatics and computational biology as key areas requiring the Institute's attention. In response to this suggestion, a committee was formed with the objective of providing advice to NIGMS regarding the formulation of a program announcement for training in the area of bioinformatics and computational biology.

With the explosion of biological data from experimental sources and with the maturation of computational capabilities for large-scale analysis, it is necessary to consider training a cadre of scientists whose primary professional identification and disciplinary affiliation are in bioinformatics or computational biology. A precise definition of bioinformatics and computational biology is the subject of debate, but the first recommendation of the committee was to define these terms broadly to include the use of theory, computer implementation, and application to the full spectrum of basic research in the biomedical sciences. The terms thus include analysis of molecular sequence, structure, molecular function, cellular function, physiology, genomics, and genetics (extending to computational modeling of complex phenomena such as neural circuits and equilibrium phenomena, population genetics, theoretical and mathematical biology, and analysis of complex biological systems).

The goal of NIGMS should be to help train scientists in the background theory, computational implementation, and biological application of the information sciences (including computer science, statistics, mathematics, and others) to problems of relevance to the mission of NIGMS and the National Institutes of Health (NIH). The committee identified multi-scale problems (those spanning different levels of abstraction) and large-scale problems (those that are data-intensive) in biology as being of particular concern in light of the emerging sources of biological data. Since this mission is not covered explicitly by any of the existing NIGMS-supported training programs, the committee recommends the establishment of a program aimed at training a new class of scientist who has a primary identity as a computational biologist/bioinformatician and whose disciplinary core draws from an emerging set of principles of how to compute with biological data. A critical corollary to this goal is that NIGMS should seek to bring new students into biological research with its efforts, not simply provide additional funding mechanisms for students who are already drawn to training in biomedical sciences.


Mechanism of Support

The committee felt that a new program should provide primarily for institutional training grants targeting predoctoral students. The task of training these young scientists will typically involve a concerted effort to create new programs for didactic training, mentorship, and institutional support. Thus, a model of supplementing existing programs (while certainly feasible for institutions that have reached the appropriate level of maturity in their biocomputing efforts) would not, in general, be able to effectively create an environment to support the idiosyncratic needs of students entering this new interdiscipline.

The committee recognized that an effective predoctoral training program would also provide a critical infrastructure for the training of postdoctoral students making a transition into bioinformatics from either the biological side or the computer science side. Such students will require structured training in order to effect the most successful transition. This category of postdoctoral students was distinguished by the committee from students with sound and fundamental training in biocomputing who could compete successfully for individual postdoctoral fellowships.


What Departments Should Typically Participate

The committee recognized that successful training programs would involve faculty members from a wide array of departments, and that it would be crucial for new programs to mix traditional NIGMS-supported investigators in departments of biology, biochemistry, developmental biology, genetics, pharmacology, etc., with faculty from the contributing computational disciplines of computer science, statistics, biostatistics, mathematics, engineering, and others. The key common feature of faculty members should be an interest in developing theory, creating implementations, and/or applying new methods for storing, retrieving, and analyzing biological information.


Particular Challenges

The committee recognized that there are particular challenges in combining disciplines with very different academic cultures. NIGMS should support programs that show evidence of some infrastructure for the interaction of biologists and computational scientists. Evidence for this infrastructure can include co-authored publications, collaborative research projects, service together on dissertation committees, regular interactions in journal clubs or seminar series, and other modalities. The committee recognized that these efforts may have different levels of maturity at participating institutions and urged NIGMS to consider all levels of maturity, as well as ways to create measured responses to assist and catalyze the further maturation of such connections.

A particular challenge to these training programs would be the heterogeneous nature of incoming students. The committee felt it would be important for candidate programs to generate at least two scenarios for student success: one in which a well-trained biology undergraduate enters bioinformatics and computational biology, and one in which a well-trained mathematics or computer science undergraduate enters the bioinformatics field. The scenarios should show how these different students would move through a training program and acquire, in a timely manner, the skills necessary to be independent bioinformatics professionals or computational biologists. Other scenarios may also be useful in creating a training environment that can embrace a range of student skills upon entry.

A successful training program should also have a plan for addressing the problem of multiple additive requirements in bioinformatics training. It is not acceptable for a program to simply prescribe a full set of requirements in biological science as well as in computer science. Instead, it is crucial to identify the key contributing ideas and skills from these two areas and remove some of the less relevant requirements. This is a difficult task, and the committee urged NIGMS not to expect these problems to be solved initially in most programs. Instead, it urged NIGMS to make sure that there is a mechanism to address and refine the training requirements. The committee recognized that the mean length of time to degree might initially be somewhat longer for these students, although successful programs should be able to reduce this over time, as the key requirements are identified.

The committee felt that successful bioinformatics and computational biology training programs would be part of a larger institutional commitment to training and research in these areas. Thus, successful programs should have statements from the appropriate deans' offices outlining how the bioinformatics and computational biology training programs fit within the broader mission of the institution with respect to faculty and course development in this area; creating centers of excellence; and integrating undergraduate, graduate, professional, and postdoctoral training in the institution. Although undergraduate training should not be a primary mission of NIGMS training programs, the committee felt that the establishment of a bioinformatics and computational biology training program at NIH would have the salutary indirect effect of encouraging undergraduates to prepare themselves for entry into this field and would create an infrastructure of courses, seminars, and faculty research programs that would benefit undergraduate education in the institutions positioned to do this.


Necessary and Important Programmatic Features

The committee recognized that individual institutions would be positioned to respond in different ways to the opportunities presented by an NIGMS training program in bioinformatics and computational biology. However, in addition to the concerns presented above, the committee felt that it would be particularly important for successful training programs to address the following features in order to bridge these disciplines:

1. Research rotations in both computational and biological domains. A major goal of this new training effort is to have scientists who are conversant in experimental biology as well as in theoretical biology, computational implementation, and application of new methodologies. This may lead some institutions to have training programs in which students with a predominant biology background rotate in quantitative/computer science laboratories, and in which students with predominant computational training experience rotate in biology laboratories. The goal of these experiences is to expose students to the reality of daily life in these very different research environments, to ensure that students understand the special features of biological data that must be considered when applying informatics techniques to them, and to offer students the broadest choice of thesis laboratory options.

2. Emphasis on problem solving. It is clear that independent scientists in the area of bioinformatics and computational biology need to be able to bring together knowledge from disparate domains in order to solve important problems. The committee felt that problem-based learning may represent an important component of training.

3. Joint mentorship. Many institutions do not yet have faculty members who identify themselves as members of the field of bioinformatics or computational biology. Such institutions may have to make arrangements for joint mentorship of students in these fields. The committee felt that a variety of creative solutions could be proposed by institutions, taking advantage of the specific administrative and academic options available locally. One opportunity in this regard is to encourage bioinformatics and computational biology trainees to form thesis advisory committees much earlier in their graduate careers than is sometimes done in more traditional disciplines. This could be through formal dissertation committees or through separate processes established particularly to address this concern for bioinformatics and computational biology training grant administration.

4. Seminar series. It is clearly critical to have forums in which predoctoral and postdoctoral trainees in bioinformatics and computational biology can interact, be exposed to visiting scholars, and develop an internal sense of the identity of their field--including its primary open challenges and problems for which satisfactory solutions exist.

5. Courses. It seems desirable to create courses that expose student trainees to the basic concepts in computational biology and bioinformatics. The list of such concepts is subject to some debate, but proposals exist (at least de facto in the contents of current textbooks on bioinformatics and computational biology), and it is incumbent upon successful training programs to define a core set of concepts that graduating students will master, even if their research projects are highly specialized. Such courses might include basic concepts in molecular biology, genetics, and computer algorithms and databases, especially with respect to algorithms developed and widely used in computational biology and bioinformatics (e.g., dynamic programming for sequence alignment, methods for comparing 3-D structures, simulation of complex systems, etc.).

6. Internships. The committee recognized that one mechanism for exposing students to new fields is to allow "super rotations" or internships in which the students are immersed for a semester or summer in a new research environment. Various institutions may have different opportunities for such internships within an academic or industrial setting, and these should be considered.

7. Industrial involvement. The role of industry in creating a market for bioinformatics and computational biology professionals cannot be denied. Industrial affiliate programs offer an opportunity to further enhance the training of students through exposure to industrial R&D efforts and internships.

8. Academic and career advising for trainees. Advising will be a critical factor in the success of training programs, since these programs will often be creating the first generation of students formally trained in computational biology and bioinformatics. The special considerations involved in deciding between careers in industry and academia should be explicitly discussed. Skills for obtaining research funding should be imparted, as well as a sense of the institutions that affect research policy and funding for bioinformatics and computational biology research.

9. Training in methods of teaching and pedagogy. The committee felt that it was critical to specifically encourage the training of bioinformatics and computational biology students in methods of teaching and pedagogy. The current shortage of a well-trained work force in this area means that trainees are likely to have significant teaching responsibilities in both industry and academia as they get jobs after graduation.

The general sense of the committee was that successful programs for training in bioinformatics and computational biology would create an environment in which the key element is balance. The balance must exist at many levels, including the balance of interests within the program and the balance in the background of the trainees. The committee recognized that the most important balance to be achieved is in training individuals to have a balanced understanding of biological and computational disciplines and the relationship between them. The challenge to a successful training program, therefore, will be to take students with an imbalance in their prior training and produce more balanced scientists.


Postdoctoral Training

The committee recognized that there are two types of postdoctoral fellows in the context of biocomputation. The first are those with training in bioinformatics and computational biology who have the skills and knowledge necessary to be productive research colleagues. The second are those with solid training in one contributing discipline (biology or computer science) but with significant needs in the other for further training in order to become independent investigators with a primary professional affiliation in bioinformatics and computational biology.

The committee urged NIGMS to consider ways to use the mechanisms of predoctoral training to also allow postdoctoral students who need further training to obtain it. The provision of additional training grant slots from NIGMS to allow these students to spend time in courses and other training venues would be an appropriate leverage of a high-quality predoctoral training infrastructure. The committee urged NIGMS to create incentives to allow programs to identify and recruit postdoctoral fellows in need of further cross-training, but to allow these slots to be used for predoctoral students if such fellows were not available during any particular training year.

NIGMS already has individual postdoctoral programs that would be suitable for the class of postdoctoral fellows with adequate training in bioinformatics and computational biology. The committee encouraged NIGMS to use creative methods (perhaps borrowed from the Sloan Foundation) to publicize and encourage high-quality applications.


Professional Master's Programs

The committee considered the issue of professional master's programs and their possible relationship to an NIGMS training initiative. It is clear that a market exists for terminal master's degrees in bioinformatics and computational biology, in that there are students who are willing to pay for such well-defined training (typically 2 years) and companies anxious to hire them. There are numerous examples of professional master's degrees that represent a valued "license to practice" at high levels of productivity, including traditional engineering, computer science, and business master's degrees.

Accordingly, the committee felt the role of NIGMS was not to become directly involved in the creation or support of such efforts. The committee did recognize, however, that some of the infrastructure that would be created with a predoctoral training program initiative might also facilitate the creation of professional master's programs. Indeed, the dynamics of the market for students with skills in bioinformatics and computational biology need to be carefully considered by institutions proposing training programs. It may not be appropriate for a training program to be set up in a manner that allows students to easily transition from NIGMS-funded training slots to master's degrees and industry. It may be that different requirements and curricula need to be defined so that talented master's students could transition into NIGMS-funded spots and NIGMS-funded students could not easily transition to terminal master's degrees.


Roster

Russ B. Altman, M.D., Ph.D. (chair)
Stanford Informatics
251 Campus Drive, MSOB X-215
Stanford University
Stanford, CA 94305-5479
Tel: (650) 725-3394
Fax: (650) 725-7944
altman@camis.stanford.edu

Dan Gusfield, Ph.D.
Department of Computer Science
University of California, Davis
Davis, CA 95616
Tel: (530) 752-7131
Fax: (530) 752-4767
gusfield@cs.ucdavis.edu

Susan A. Henry, Ph.D.
Department of Biological Sciences
Carnegie Mellon University
4400 5th Avenue
Pittsburgh, PA 15213
Tel: (412) 268-5124
Fax: (412) 268-3268
sh4b@andrew.cmu.edu

Leslie M. Loew, Ph.D.
Department of Physiology
University of Connecticut Health Center
263 Farmington Avenue
Farmington, CT 06030
Tel: (860) 679-3568
Fax: (860) 679-1269
les@volt.uchc.edu

Lawrence Schramm, Ph.D.
Biomedical Engineering
Johns Hopkins University
School of Medicine
Room 606, Traylor Building
Baltimore, MD 21205
Tel: (410) 955-3026
Fax: (410) 955-9826
lschramm@bme.jhu.edu

Gary D. Stormo, Ph.D.
Molecular, Cellular, and Developmental Biology
University of Colorado
Campus Box 347
Boulder, CO 80309-0347
Tel: (303) 492-1476
Fax: (303) 492-7744
Gary.stormo@colorado.edu