A November 1998 meeting to discuss the future of research training at the National Institute of General Medical Sciences (NIGMS) identified bioinformatics and computational biology as key areas requiring the Institute's attention. In response to this suggestion, a committee was formed with the objective of providing advice to NIGMS regarding the formulation of a program announcement for training in the area of bioinformatics and computational biology.
With the explosion of biological data from experimental sources and with the maturation of computational capabilities for large-scale analysis, it is necessary to consider training a cadre of scientists whose primary professional identification and disciplinary affiliation is in bioinformatics or computational biology. A precise definition of bioinformatics and computational biology is the subject of debate, but the first recommendation of the committee was to define these terms broadly to include the use of theory, computer implementation, and application to the full spectrum of basic research in the biomedical sciences. The terms thus include analysis of molecular sequence, structure, molecular function, cellular function, physiology, genomics, and genetics (extending to computational modeling of complex phenomena such as neural circuits and equilibrium phenomena, population genetics, theoretical and mathematical biology, and analysis of complex biological systems).
The goal of NIGMS should be to help train scientists in the background theory, computational implementation, and biological application of the information sciences (including computer science, statistics, mathematics, and others) to problems of relevance to the mission of NIGMS and the National Institutes of Health (NIH). The committee identified multi-scale (different levels of abstraction) and large-scale (data-intensive) problems in biology as especially pressing in light of the emerging sources of biological data. Since this mission is not covered explicitly by any of the existing NIGMS-supported training programs, the committee recommends the establishment of a program aimed at training a new class of scientist who has a primary identity as a computational biologist/bioinformatician and whose disciplinary core draws from an emerging set of principles of how to compute with biological data. A critical corollary to this goal is that NIGMS should seek to bring new students into biological research with its efforts, not simply provide additional funding mechanisms for students who are already drawn to training in biomedical sciences.
The committee felt that a new program should provide primarily for institutional training grants targeting predoctoral students. The task of training these young scientists will typically involve a concerted effort to create new programs for didactic training, mentorship, and institutional support. Thus, a model of supplementing existing programs (while certainly feasible for institutions that have reached the appropriate level of maturity in their biocomputing efforts) would not, in general, be able to effectively create an environment to support the idiosyncratic needs of students entering this new interdiscipline.
The committee recognized that an effective predoctoral training program would also provide a critical infrastructure for the training of postdoctoral students making a transition into bioinformatics from either the biological side or the computer science side. Such students will require structured training in order to effect the most successful transition. This category of postdoctoral students was distinguished by the committee from students with sound and fundamental training in biocomputing who could compete successfully for individual postdoctoral fellowships.
The committee recognized that successful training programs would involve faculty members from a wide array of departments, and that it would be crucial for new programs to mix faculty from the traditional NIGMS-supported investigators in departments of biology, biochemistry, developmental biology, genetics, pharmacology, etc., with faculty from the contributing computational disciplines of computer science, statistics, biostatistics, mathematics, engineering, and others. The key common feature of faculty members should be an interest in developing theory, creating implementations, and/or applying new methods for storing, retrieving, and analyzing biological information.
The committee recognized that there are particular challenges in combining disciplines with very different academic cultures. NIGMS should support programs that show evidence of some infrastructure for the interaction of biologists and computational scientists. Evidence for this infrastructure can include co-authored publications, collaborative research projects, service together on dissertation committees, regular interactions in journal clubs or seminar series, and other modalities. The committee recognized that these efforts may have different levels of maturity at participating institutions and urged NIGMS to consider all levels of maturity, as well as ways to create measured responses to assist and catalyze the further maturation of such connections.
A particular challenge to these training programs would be the heterogeneous nature of incoming students. The committee felt it would be important for candidate programs to generate at least two scenarios for student success: one in which a well-trained biology undergraduate enters bioinformatics and computational biology, and one in which a well-trained mathematics or computer science undergraduate enters the bioinformatics field. The scenarios should show how these different students would move through a training program and acquire, in a timely manner, the skills necessary to be independent bioinformatics professionals or computational biologists. Programs may find it useful to generate additional scenarios in order to create a training environment that can embrace a range of student skills upon entry.
A successful training program should also have a plan for addressing the problem of multiple additive requirements in bioinformatics training. It is not acceptable for a program to simply prescribe a full set of requirements in biological science as well as in computer science. Instead, it is crucial to identify the key contributing ideas and skills from these two areas and remove some of the less relevant requirements. This is a difficult task, and the committee urged NIGMS not to expect these problems to be solved initially in most programs. Instead, it urged NIGMS to make sure that there is a mechanism to address and refine the training requirements. The committee recognized that the mean length of time to degree might initially be somewhat longer for these students, although successful programs should be able to reduce this over time, as the key requirements are identified.
The committee felt that successful bioinformatics and computational biology training programs would be part of a larger institutional commitment to training and research in these areas. Thus, successful programs should have statements from the appropriate deans' offices outlining how the bioinformatics and computational biology training programs fit within the broader mission of the institution with respect to faculty and course development in this area; creating centers of excellence; and integrating undergraduate, graduate, professional, and postdoctoral training in the institution. Although undergraduate training should not be a primary mission of NIGMS training programs, the committee felt that the establishment of a bioinformatics and computational biology training program at NIH would have the salutary indirect effect of encouraging undergraduates to prepare themselves for entry into this field and would create an infrastructure of courses, seminars, and faculty research programs that would benefit undergraduate education in the institutions positioned to do this.
The committee recognized that individual institutions would be positioned to respond in different ways to the opportunities presented by an NIGMS training program in bioinformatics and computational biology. However, in addition to the concerns presented above, the committee felt that it would be particularly important for successful training programs to address the following features in order to bridge these disciplines:
The general sense of the committee was that successful programs for training in bioinformatics and computational biology would create an environment in which the key element is balance. The balance must exist at many levels, including the balance of interests within the program and the balance in the background of the trainees. The committee recognized that the most important balance to be achieved is in the training of individuals to have a balanced understanding of biological and computational disciplines and the relationship between them. The challenge to a successful training program, therefore, will be to take students with an imbalance in their prior training and create a more balanced scientist.
The committee recognized that there are two types of postdoctoral fellows in the context of biocomputation. The first are those with training in bioinformatics and computational biology who have the skills and knowledge necessary to be productive research colleagues. The second are those with solid training in one contributing discipline (biology or computer science) who need significant further training in the other in order to become independent investigators with a primary professional affiliation in bioinformatics and computational biology.
The committee urged NIGMS to consider ways to use predoctoral training mechanisms to give postdoctoral fellows in this second group the further training they need. The provision of additional training grant slots from NIGMS to allow these fellows to spend time in courses and other training venues would appropriately leverage a high-quality predoctoral training infrastructure. The committee urged NIGMS to create incentives for programs to identify and recruit postdoctoral fellows in need of further cross-training, but to allow these slots to be used for predoctoral students if such fellows were not available during any particular training year.
NIGMS already has individual postdoctoral programs that would be suitable for the class of postdoctoral fellows with adequate training in bioinformatics and computational biology. The committee encouraged NIGMS to use creative methods (perhaps borrowed from the Sloan Foundation) to publicize and encourage high-quality applications.
The committee considered the issue of professional master's programs and their possible relationship to an NIGMS training initiative. It is clear that a market exists for terminal master's degrees in bioinformatics and computational biology, in that there are students who are willing to pay for such well-defined training (typically 2 years) and companies eager to hire them. There are numerous examples of professional master's degrees that represent a valued "license to practice" at high levels of productivity, including traditional engineering, computer science, and business master's degrees.
Because such a market already exists, the committee felt that NIGMS should not become directly involved in the creation or support of such efforts. The committee did recognize, however, that some of the infrastructure that would be created with a predoctoral training program initiative might also facilitate the creation of professional master's programs. Indeed, the dynamics of the market for students with skills in bioinformatics and computational biology need to be carefully considered by institutions proposing training programs. It may not be appropriate for a training program to be set up in a manner that allows students to easily transition from NIGMS-funded training slots to master's degrees and industry. It may be that different requirements and curricula need to be defined so that talented master's students could transition into NIGMS-funded spots and NIGMS-funded students could not easily transition to terminal master's degrees.
Russ B. Altman, M.D., Ph.D. (chair)
Stanford Informatics
251 Campus Drive, MSOB X-215
Stanford University
Stanford, CA 94305-5479
Tel: (650) 725-3394
Fax: (650) 725-7944
altman@camis.stanford.edu

Dan Gusfield, Ph.D.
Department of Computer Science
University of California, Davis
Davis, CA 95616
Tel: (530) 752-7131
Fax: (530) 752-4767
gusfield@cs.ucdavis.edu

Susan A. Henry, Ph.D.
Department of Biological Sciences
Carnegie Mellon University
4400 5th Avenue
Pittsburgh, PA 15213
Tel: (412) 268-5124
Fax: (412) 268-3268
sh4b@andrew.cmu.edu

Leslie M. Loew, Ph.D.
Department of Physiology
University of Connecticut Health Center
263 Farmington Avenue
Farmington, CT 06030
Tel: (860) 679-3568
Fax: (860) 679-1269
les@volt.uchc.edu

Lawrence Schramm, Ph.D.
Biomedical Engineering
Johns Hopkins University
School of Medicine
Room 606, Traylor Building
Baltimore, MD 21205
Tel: (410) 955-3026
Fax: (410) 955-9826
lschramm@bme.jhu.edu

Gary D. Stormo, Ph.D.
Molecular, Cellular, and Developmental Biology
University of Colorado
Campus Box 347
Boulder, CO 80309-0347
Tel: (303) 492-1476
Fax: (303) 492-7744
Gary.stormo@colorado.edu