Workshop on Systems Biology and Complex Phenotypes

September 7-8, 2006
Bethesda, Maryland

This workshop on systems biology and complex phenotypes was convened by NIGMS with the goal of providing a context for NIGMS to use in evaluating the current state of systems biology as an approach to enhance our understanding of how genetic mechanisms function to determine complex phenotypes. This information will help NIGMS decide how to help move this critical new field forward.

The twelve people who came together for the workshop (roster below) had expertise in molecular, human, and statistical genetics; computational biology; engineering; and clinical research. Although their professional roles and initial views on the definition of systems biology reflected different orientations, a consensus was quickly reached that systems biology is both a concept and an approach with potentially great utility for the understanding of biology at all levels, but that it is still not sharply defined. It was accepted that the utility of systems biology depends on the production of high-quality quantitative data, the establishment of testable, mechanistic models, and the ability to construct global theories that predict observable, real-world phenomena.

During the discussion, three levels of scientific inquiry were used as a general schema for defining applications of systems biology in the effort to understand how complex phenotypes are produced:

  • WHAT is the system composed of? This is the level of the ultimate parts list, including the chromosomal DNA sequence with allelic variants that specifies the control signals, RNAs, and proteins that function in the system.
  • HOW do the components interact? The collection of structured experimental data related to the mechanism of gene-gene and gene-environment interactions ("networks") should lead to the formulation of a mechanistic model specifying how the parts interact globally at the molecule, cell, and organism levels at different developmental times and in response to various stimuli. The prime criterion for a model is its ability to promote testable predictions.
  • WHY do the components interact in the particular manner that we observe? This will involve producing a theory to encompass complexity and empower general predictions on how the organism has evolved to be able to develop, reproduce, and cope with its environment. This also encompasses an understanding of the design principles.

The systems biology enterprise at this time takes place at the HOW level with modeling as its definitive activity. The ultimate goal of systems biology is to arrive at a theory that explains WHY mechanisms and processes take the form that they do.

It was made clear early on, and was strongly reiterated by all participants, that no phenotype at the organism level, even for conditions associated with what were once referred to as simple Mendelian disorders, is completely determined by the action of a particular allele at a single gene locus. Phenotypes are produced by the interaction of constellations of genes, perhaps at discrete chromosomal regions in some cases, or by the coordinated function of multiple genes throughout a genome. These genetic modules, not individual genes, are the determinants of phenotype.

At this time, however, no robust theories explain how genes interact. For the most part, there are only simple observations that the inactivation of one gene results in the compromised function of other genes in a network. Discussion of this problem brought out an intellectual tension between those in the room who depend on genetic approaches and those who manipulate experimental systems with engineering approaches, and between reductionists and those who take a more holistic view.

For the human geneticist, the approach to understanding this complexity involves elucidating the "genetic architecture" of a complex phenotypic trait. This involves identifying all the genes and their alleles with pleiotropic manifestations that can interact in an information-driven network affected by environmental and stochastic factors. With this approach it has become clear that the genetic background can be more important than the allele in the determination of the complex phenotype.

From an engineering viewpoint, a network is a physical object with structural architecture. Instead of deleting and modifying parts as practiced by geneticists, an engineering approach to understanding a complex system would involve building a new network from the identified set of parts and adding only those that are necessary to replicate the phenotype of the observed system.

It was evident that jargon and specialized terminology have interfered with communication among scientists about systems biology approaches. The discussion among the participants became easier when the language was explicitly shifted to "plain English" and away from loaded words such as "architecture," "network," and "hierarchy" that have varied meanings even within biology. This shift in language enabled geneticists to converse productively with the systems and engineering people and vice versa.

Model organisms and model systems became the key topics for much of the rest of the discussion. Model organisms can be genetically altered and their environments can be manipulated to enable the testing necessary to develop models and to validate theories. Already the limitations of single and double knockout approaches to systems interrogation are becoming obvious. Thus, as one example, expression profiling and phenotyping of model organisms have emerged as a promising approach for the analysis of genetic crosses and responses to environmental perturbations. The power of such experiments derives from the potential to test multiple perturbations and obtain global readouts from a large sample set. The realization of the full potential of this approach for systems biology and genetics will require extensive phenotyping with cooperation from the community to produce high-quality data, validation by replication, and continued development of more powerful statistical analyses. The desirability, if not the necessity, of creating an available, standardized model organism population, on the order of the CEPH cell line panel used for decades in human population genetic studies, was raised.

The human geneticists acknowledged how insights and concepts arising from systems biology approaches, developed using model organisms at the basic science level, could facilitate human studies, clinical diagnosis, and prognosis. Nevertheless, it will eventually be necessary to conduct research on humans if we are to understand how human phenotypes arise. Despite the inherent difficulties, there has already been some progress. In particular, expression profiling to assign phenotypes is now a widely accepted approach for clinical investigations; the results may be useful for classification and prediction of disease, identification of therapeutic targets, and understanding basic pathogenesis.

Demonstration of a digitalized molecular network reconstruction approach suggested how computationally based modeling might be applied to human genetics at the organ, if not organism, level in the relatively near term. The results that were shown employed a two-dimensional array listing all components encoded on a bacterial genome along with all of their interactions to produce an in silico model addressing all of the metabolic processes in the cell. It was reported that this type of analysis has been started with the goal of relating all the components and functions that have been described in a human red blood cell. There is also a nascent effort to model the functioning of human mitochondria. Both of these efforts have obvious implications for studies of human genetic diseases at the WHAT and the WHY levels.
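As a rough illustration of what a two-dimensional array of components and interactions can look like, the sketch below uses a stoichiometric matrix, one common layout for metabolic reconstructions. Whether the demonstration used exactly this layout is an assumption, and every metabolite name, reaction name, and flux value here is hypothetical. The sketch only checks whether a proposed set of reaction rates leaves each internal metabolite balanced.

```python
# Toy stoichiometric matrix: rows are metabolites, columns are reactions.
# All names and numbers are hypothetical and chosen only for illustration.
METABOLITES = ["A", "B", "C"]
REACTIONS = ["uptake_A", "A_to_B", "B_to_C", "export_C"]

# S[i][j] is the amount of metabolite i produced (+) or consumed (-)
# by one unit of flux through reaction j.
S = [
    [1, -1, 0, 0],   # A: made by uptake_A, used by A_to_B
    [0, 1, -1, 0],   # B: made by A_to_B, used by B_to_C
    [0, 0, 1, -1],   # C: made by B_to_C, removed by export_C
]


def net_production(matrix, fluxes):
    """Return the net rate of change of each metabolite (matrix times fluxes)."""
    return [sum(coef * v for coef, v in zip(row, fluxes)) for row in matrix]


def is_steady_state(matrix, fluxes, tol=1e-9):
    """True when no metabolite accumulates or is depleted."""
    return all(abs(rate) <= tol for rate in net_production(matrix, fluxes))


balanced = [2.0, 2.0, 2.0, 2.0]     # same flux through every step
unbalanced = [2.0, 2.0, 1.0, 1.0]   # B is made faster than it is used

if __name__ == "__main__":
    print(is_steady_state(S, balanced))
    print(dict(zip(METABOLITES, net_production(S, unbalanced))))
```

A genome-scale reconstruction would have thousands of rows and columns and would typically be analyzed with optimization methods rather than a simple balance check; the toy version only conveys the data structure.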

A presentation of an engineering approach, compared with a geneticist's approach, revealed not only technical differences, but more importantly, a distinction in the level of knowledge obtained at the end of the process. An engineer would note the specification and functions of interest for the functioning object (cell or process) and then put together a selection of parts from the list until he got the entity to produce the desired measurable output. For example, an engineer would encode a minimal functioning array of components in a synthetically manufactured megabase length of DNA. He would then insert this DNA sequence into a supportive environment and express the genes. The engineer would thus have determined one answer to the HOW part of the paradigm. This approach may be more efficient than the geneticist's approach of removing (mutating) one piece of the pathway at a time and then analyzing what no longer works. The geneticist would get to a HOW answer eventually but in the process might also obtain insight into WHY issues, such as the role of evolutionary mechanisms in the production of robust and adaptable systems.
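The contrast between the two strategies can be sketched on a toy system. In the example below, the part names and the rule deciding whether the system "works" are entirely hypothetical; the rule includes one redundant pair of parts and one irrelevant part. The geneticist-style function removes one part at a time from the full system, while the engineer-style function searches for the smallest sets of parts that reproduce the output.

```python
from itertools import combinations

# Hypothetical parts list; names carry no biological meaning.
PARTS = ["g1", "g2", "g3", "g4"]


def produces_output(present):
    """Hypothetical rule: g1 is required, g2 and g3 are redundant, g4 is irrelevant."""
    return "g1" in present and ("g2" in present or "g3" in present)


def single_knockouts(parts, works):
    """Geneticist-style: remove one part at a time and test the remaining system."""
    full = set(parts)
    return {p: works(full - {p}) for p in parts}


def minimal_builds(parts, works):
    """Engineer-style: find the smallest part sets that reproduce the output."""
    found = []
    for size in range(1, len(parts) + 1):
        for combo in combinations(parts, size):
            candidate = set(combo)
            # Skip any set that merely extends a smaller working set.
            if works(candidate) and not any(prev <= candidate for prev in found):
                found.append(candidate)
    return found


if __name__ == "__main__":
    print(single_knockouts(PARTS, produces_output))
    print(minimal_builds(PARTS, produces_output))
```

In this toy case the single knockouts flag only g1 as essential and cannot distinguish the redundant parts from the irrelevant one, whereas the build-up search recovers both working minimal sets. The exhaustive search is exponential in the number of parts, so it stands in for the idea rather than for a practical design method.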

In summary, this workshop educated NIGMS staff on the status of systems biology approaches for understanding the mechanisms by which linear information from DNA molecules, expressed in the context of ongoing biological and environmental perturbations, can create fully functional organisms with discrete, complex phenotypes. Workshop participants offered glimpses of how the knowledge and technologies that are being developed at the very basic science and engineering levels can eventually be adapted for use in studies of the human condition. This will make it possible for scientists to identify and understand disease gene alleles in a much faster, more efficient, and more organized manner.

Several general needs were identified:

  • Many of the immediate requirements for specialized infrastructure development and for the creation of new education paradigms are currently being addressed through the seven NIGMS Centers for Systems Biology. However, most of these centers do not focus specifically on issues of genetic variation and the effect variation has on producing complex phenotypes. Although these centers, and other large institutional programs, are training an outstanding cadre of systems biologists, it will be necessary to attract the interest of these individuals to apply systems biology to address genetic variation. Most useful to this end would be offering R01-type grants that take advantage of the existing infrastructure that is provided at a growing number of institutions.
  • Workshop participants also suggested that NIH must continue to foster the use of model organisms for systems biology experiments. The intense need for well-defined, highly detailed, reproducible data that can be replicated and retested and, most importantly, easily and seamlessly exchanged, raised the idea of creating a shared model organism resource to provide uniformly characterized animals to laboratories committed to the systems biology effort. Collection and reporting of data from these organisms under a tightly defined protocol and sharing of data would, of course, be essential.
  • We need improved and minimally invasive technologies to generate, collect, and standardize well-defined data from experiments with human subjects, and we need ways to store the data that assure the privacy of the subjects.


M. Anne Spence, Ph.D. (Chair), University of California, Irvine
David Botstein, Ph.D., Princeton University
Aravinda Chakravarti, Ph.D., Johns Hopkins University
Gary Churchill, Ph.D., Jackson Laboratory
J. Perrin Cobb, M.D., Washington University in St. Louis
Drew Endy, Ph.D., Massachusetts Institute of Technology
Stuart K. Kim, Ph.D., Stanford University
Trudy MacKay, Ph.D., North Carolina State University
Bernard Palsson, Ph.D., University of California, San Diego
Aviv Regev, Ph.D., Broad Institute, Massachusetts Institute of Technology
Veronica Vieland, Ph.D., Ohio State University
Daniel Weeks, Ph.D., University of Pittsburgh