NIGMS Statement on Coordinate Deposition for Structural Genomics
The international agreement discussed at the Airlie meeting called for releasing structure information on most proteins soon after completion but setting aside some structures for a limited period of time (less than 6 months) to allow for application for patents. However, the NIGMS has a more stringent policy. As stated in the Request for Applications, NIGMS Protein Structure Initiative (PSI) grantees are required to make timely deposition of structural coordinates and related data into a public database upon completion of the atomic structure.
The NIH research centers are just getting under way, and it is unclear how much time is needed to ensure that the results are accurate and to prepare the results for publication and deposition in the Protein Data Bank. The PSI program's current policy is to limit this time to 4 to 6 weeks. This should also be adequate time for the investigators to file patent applications for protein structures of commercial interest.
John C. Norvell
Director, Protein Structure Initiative
National Institute of General Medical Sciences
45 Center Drive, Room 2AS.13B
Bethesda, MD 20892-6200
As the human genome sequence is being completed, scientists have begun a large-scale project to determine the three-dimensional shapes of all proteins and other important biomolecules encoded by the genomes of key organisms. The project, called structural genomics, will provide a basis for modeling how life works in molecular detail, with important applications in medicine and biotechnology. The structural genomics community, with participants from four continents, passed an international collaborative agreement at the Second International Structural Genomics meeting held April 4-6, 2001, at Airlie Conference Center near Washington, D.C.
The 'Airlie Agreement' provides for open sharing of scientific data and technological expertise. The agreed conditions for the sharing of data reflect the balance between two different goals -- timely release of all structural genomics data to the public and consideration for intellectual property regulations that vary significantly in different countries. For projects with public funding, all data on biomolecular shapes are to be made available to the public in all countries soon after their determination. In addition, the agreement recognized the potential for collaboration between structural genomics researchers in academia and in industry. The Airlie Agreement extends and refines an earlier agreement reached in April 2000 at the First International Structural Genomics meeting on the Wellcome Trust Genome Campus near Cambridge, UK.
Specifically, the Airlie meeting reached general agreement on international collaboration in a number of areas, including standards for early data release, criteria for assessing the quality of structures, the sharing of target protein lists, and the archiving and curation of all data:
* General agreement on rapid release of data and wide availability to the international public
* For projects with public funding:
-- General agreement to deposit atomic coordinates of three-dimensional structures of biological macromolecules and associated experimental data into the Protein Data Bank (PDB) immediately after their determination, and in most cases to release these to the public soon thereafter.
-- In some cases, to allow a holding period for release of data of up to 6 months to facilitate patent filing where deemed appropriate.
-- In no case, to permit withholding of data from the public for more than 6 months.
* Obtaining high quality structures is of primary importance. Projects must not compromise quality for high throughput operation.
* It is the responsibility of the investigator to make certain that sufficient quality is reached.
* Structural genomics laboratories with public funding have agreed to adopt a policy of open exchange of target information.
* Because patent laws vary between different countries and are unclear regarding the products of structural genomics, the meeting participants encourage patent offices and courts to harmonize the laws. They also welcome strengthened utility conditions for inventions based on these structures.
* The group elected an executive committee to establish an international organization for structural genomics and to plan the next international meeting. This committee consists of Tom Terwilliger (U.S.), Udo Heinemann (Europe), and Shigeyuki Yokoyama (Japan).
The meeting participants also shared their experiences and problems setting up large-scale, high throughput structural genomics operations. Many of the groups gave brief presentations on the highlights and status of their projects.
What is structural genomics?
The structural genomics project is an international effort to determine the three-dimensional shapes of all important biological macromolecules, with a primary focus on proteins. The effort builds on the information of the genome sequencing projects. While gene sequencing determines the information content of entire genomes, structural genomics seeks to determine the molecular shapes of the complete set of cellular components that make life work. Knowledge of these three-dimensional shapes teaches us how cells and organisms function. Structural genomics has applications in the life sciences, biotechnology, and medicine, where it can serve as a basis for the development of medications, vaccines, and diagnostics.
The field of structural genomics has developed over the last 3 years and now is reaching significant dimensions. It builds on a long tradition of structural biology, in particular X-ray crystallography, nuclear magnetic resonance (NMR) and computational model building. It is now aiming at high throughput operations and complete coverage of the universe of protein structures.
Currently, there is significant funding for structural genomics projects in the U.S., Canada, the European Union, Israel, China, and Japan. The Airlie meeting was funded by the National Institute of General Medical Sciences, a component of the U.S. National Institutes of Health; the Wellcome Trust, a UK-based medical research charity; and RIKEN (a public corporation of the Japanese government)/MEXT (Ministry of Education, Culture, Sports, Science and Technology).
Agreed Principles and Procedures
Coordination of International Programs in Structural Genomics
This document reports the principles agreed at the April 4-6, 2001 meeting of representatives of the structural genomics community. Its purpose is to generate further co-operation in the structural biology and general scientific communities.
This Airlie Agreement builds on the agreement produced following the first international meeting in Hinxton, UK, in April 2000. The broad overall goals and principles are unaltered. Policy extensions and more detailed definitions are based on the initial reports of the five task forces that were established following the first meeting, and discussions at the second meeting. The amended reports of the task forces are included at the end of this document.
The field of structural genomics continues to evolve very rapidly, and it is expected that further policy revisions in many areas will be made at subsequent meetings of the community.
Success of the genome sequencing projects and major advances in methods of protein structure determination have led the structural biology community to propose the large scale mapping of protein structure space. This structural genomics initiative aims at the discovery, analysis and dissemination of three-dimensional structures of protein, RNA and other biological macromolecules representing the entire range of structural diversity found in nature. Such a complete knowledge will facilitate fundamental understanding and applications in biology, agriculture and medicine. The three-dimensional structures will be crucial for rational drug design, for advancing catalysis in chemistry and biotechnology, and for diagnosis and treatment of disease, as well as for advancing basic principles of biology. A broad collection of structures will provide valuable biological information beyond that which can be obtained from individual structures.
This opportunity is made possible by rapid progress in several related key technologies. These include the construction of synchrotrons and high-field NMR instruments, the MAD method of phase determination, high throughput cloning and recombinant expression, a flood of information from genome sequencing projects, and bioinformatic methods for fold assignment, model building, and prediction of function.
The following document outlines issues related to achieving this expansion of knowledge. The goal is to encourage harmonious cooperation among a broad range of public and private sector institutions in the international effort to characterize macromolecular structures in living organisms on a pan-genomic scale.
- Specific goals
- Large scale determination and analysis of three-dimensional structures.
- To determine by experimental methods a representative set of macromolecular structures, including medically important human proteins and proteins from important pathogens and model organisms.
- To provide models based on sequence similarity to significantly extend the coverage of structure space.
- To derive functional information from these structures by experimental and computational methods.
- Development of methods for Structural Genomics.
- Methods of selecting representatives of protein families based on enhancement of structure space coverage, or functional significance.
- High-throughput methods for production of target proteins suitable for structure determination.
- Methods for high throughput data collection.
- Methods for automated determination, validation, and analysis of 3D structures.
- Methods for homology-based modeling, related methods and validation of modeled structures.
- Informatics systems to optimize and support the process of structure determination.
- Bioinformatics methods for assessing biological function based on structure and other linked biological information sources.
- Methods for more challenging problems of production and structure determination such as those involving membrane proteins and multimolecular complexes.
- Programs needed.
- Financial and organizational support for structural genomics projects.
- International network to co-ordinate and promote efficient application of resources and rapid dissemination of methods and results; to coordinate policies, standards, and formats; and to promote access to unique resources such as synchrotron and high field NMR facilities. To this end, an international organization shall be formed to advance the interests of the structural genomics community. The best long term form for this is not yet clear. As the first step in this direction, one representative from each of the three principal constituencies has been selected by the community to form an executive committee. The committee is charged with organizing international affairs until the next meeting, including further work by the Task Forces. The committee may co-opt others, as it sees fit. Further evolution of the organization is expected to follow. The following have been elected to serve on the Executive Committee: Tom Terwilliger (USA), Shigeyuki Yokoyama (Japan) and Udo Heinemann (Europe).
- Support for the collection, archiving and dissemination of detailed structural information, including atomic co-ordinates, as well as experimental data, protocols, and materials.
- Public funding agencies can cooperate:
- By implementing the agreed policies for deposition, release, quality standards, and formats.
- By providing sustainable support for public programs in structural genomics.
- By encouraging and supporting appropriate international collaborative programs.
- Information and Material Release in the National Structural Genomics Programs
- The primary impetus for structural genomics is to obtain a base of freely available structural information and tools that will support advances in wide areas of biology and medicine. Free exchange of data and materials is essential to the success of this effort, including the timely deposition of coordinates, data, and protocols.
- The community agrees to work to maximize the pool of structures available to the public in all countries, as a basis for both academic research and commercial use.
- For the structural genomics programs with public funding, the following guidelines for release of structural data should be supported:
- The community agrees to work toward the timely release to the public of all basic structural data. The promptness of data release is expected to improve over time.
- Structural genomics laboratories with public funding are expected to deposit their structure co-ordinates and other agreed mandatory data in the PDB immediately on completion of structure determination. In most cases, data release to the public will follow in a short time. It is recognized that in some cases release can be delayed by up to six months after deposition. This should be sufficient for investigators, for example, to assess intellectual property prospects and to file a patent application if desired.
- All structural genomics laboratories with public funding will fully adopt these deposition and release policies no later than April 2002.
- Public information on progress of projects. A primary mechanism for encouraging compliance with the guideline of timely release will be openness of progress tracking for projects.
- Structural genomics laboratories with public funding shall adopt a policy of open exchange of target information, in order to facilitate target selection and to avoid unnecessary duplication of effort. It is recognized that publication of these data may have some disadvantages, but on balance these are outweighed by the advantages.
- As recommended by the task force on data tracking, each laboratory will maintain a public site, listing target sequences and the status of the work, using a simple, standardized format. The current standards are listed in the appendix. An ongoing working group is responsible for the implementation and operation. This system shall become operational by June 1, 2001.
- The future need for a central registry should be considered further. In particular, international laboratories should evaluate the registry being developed by the NIH.
- Short scientific papers.
- Ensuring high quality of released structures is a priority. In order to help achieve this, structures released by members of the public programs may be accompanied by a short, peer-reviewed paper. These papers could be similar in format and content to the publications of small molecule crystal structures in Acta Cryst. C. Detailed procedures are outlined in the Task Force report. Electronic publication is encouraged. Papers should maximize the inclusion of relevant structural and functional information. Criteria for acceptance of publications will include those recommended by the Task Force on Numerical Criteria.
- The key requirement is that the whole process of publication be completed rapidly.
- Full-length publication is of course also possible. Any publication prior to the end of the maximum six month delay period will trigger data release, following accepted practice in structural biology.
- Technology exchange between structural genomics laboratories. The community recognizes that there is much to be gained by an open exchange of the new technologies being developed in each structural genomics laboratory. Therefore it adopts a policy of open exchange of information on emerging technologies. In particular, we encourage exchange of protocols and software. This policy will greatly reduce duplication of effort, and greatly speed progress in the field. To these ends, a central clearing house for this information shall be established by the international organization.
- Assurance of Data Quality.
- The community recognizes that the production of high quality structures is an integral part of any high throughput structural genomics operation. That is, quality is not to be sacrificed in the interests of quantity.
- The community accepts the report of the task force on numerical criteria for assessing structure quality. For the time being, the numerical criteria recommended by the task force are adopted as a minimum set of measures to be associated with all released structures. Experience using these criteria, together with new developments in the field, will make periodic reassessment necessary.
- Structure depositions will be accompanied by experimental data in a defined format. For crystallographic studies, these will include structure factor amplitudes, and un-merged, un-scaled integrated intensities for all data sets used in the structure determination. For NMR studies, these will include time domain data sets. It is desirable to move as far as possible towards archiving the raw data, as the data management technologies permit.
- Curation and Data Archiving.
- The community endorses the key recommendation of the Task Force on curation and deposition, namely, that the overall objective is to capture the level of detail presented in the materials and methods section of a good journal paper. Among other benefits, these data will provide the basis for rapid publication. To this end, an appropriate comprehensive set of data items will be collected in a consistent format. Progress in establishing this set is described in the report of the Task Force.
- The Task Force is asked to continue its work, as outlined in the report, with the goal of completing all data item definitions and establishing recommended procedures by April 2002. It is anticipated that small workshops will be held to speed progress towards this goal. It is desirable that a template be provided for structural genomics laboratories to use in preparing their depositions.
- In the long run, it is also desirable to collect information on abandoned target studies, and methods for accomplishing this should be developed.
- Organized access to material such as clones, cell lines, and protein samples is also encouraged, provided that satisfactory procedures can be put in place for archiving, storage, and dissemination.
- Relationship to industrial activities.
- The public structural genomics community should explore productive relationships with industrial partners to further the goals of structural genomics.
- International efforts should be made to facilitate the eventual deposition of structures determined in the private sector, and to promote harmonious cooperation and exchange between the public and private sectors.
IV. Intellectual Property Rights
Raw fundamental data on the shape of natural protein molecules, including 3D positional coordinates, should be made freely available to researchers everywhere. However, intellectual property protection for inventions based on these can play an important role in stimulating the development of important new health care projects.
Public funding for structural genomics has varying degrees of support for fundamental science and for potential commercial exploitation. The data release policy described earlier has been designed to accommodate these differences, at the same time optimizing the speed of release of data as much as possible under all circumstances.
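As an illustration only, the release timing described earlier (deposition on completion, release no more than six months later, and earlier release if the structure is published within that window) can be sketched as a small date calculation. The function names, the reading of "six months" as calendar months, and the end-of-month clamping are assumptions made for this sketch; they are not part of the agreement or of any PDB procedure.

```python
import calendar
from datetime import date
from typing import Optional

# Maximum delay between deposition and public release stated in the agreement.
MAX_HOLD_MONTHS = 6


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    Assumption for this sketch: if the target month is shorter, the day is
    clamped to the last day of that month (31 Aug + 6 months -> 28/29 Feb).
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def latest_release_date(deposited: date, published: Optional[date] = None) -> date:
    """Latest date by which a deposited structure should be public.

    Hypothetical helper: the deadline is the deposition date plus six months,
    and a publication before that deadline triggers release on the
    publication date (never earlier than the deposition itself).
    """
    deadline = add_months(deposited, MAX_HOLD_MONTHS)
    if published is not None and published < deadline:
        return max(published, deposited)
    return deadline


if __name__ == "__main__":
    # Deposited on the last day of the Airlie meeting, no publication yet.
    print(latest_release_date(date(2001, 4, 6)))
    # Same deposition, but a paper appears on 1 July 2001.
    print(latest_release_date(date(2001, 4, 6), published=date(2001, 7, 1)))
```

Under these assumptions, a structure deposited on 6 April 2001 would need to be released by 6 October 2001, or on 1 July 2001 if a paper describing it appeared on that date. A programme with a shorter limit could use the same calculation with a different holding period.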
Fundamental research underpins all practical uses and applications. Policy makers are urged to preserve and promote the free access and exchange of scientific information among scientists engaged in basic research. This community welcomes efforts around the world to harmonize patent law.
We also encourage efforts to strengthen the utility requirement for patentability. This community is concerned about the implications of the granting of patents based solely on the submission of three-dimensional structural co-ordinates, without any identified non-trivial utility.
V. Future Meetings
Further meetings of representatives of the structural genomics community are anticipated for the continued reexamination of these issues and to further develop these principles and guidelines as the field expands and evolves. The next meeting will occur in Berlin, Germany in October 2002.
These principles were supported by the participants in the Second International Structural Genomics Meeting in the Airlie Center, Virginia, USA, April 4-6, 2001.
Day 1 - Wednesday, April 4:
2:00 pm Registration
3:00 pm Welcome John Norvell
3:05 pm Agency Perspectives NIH Marvin Cassman Wellcome Trust Barbara Skene MEXT Toichi Sakata
3:20 pm Introduction John Moult & Chris Sander
3:30 pm International Organization Tom Terwilliger
3:45 - 5:15 pm Report of the Task forces (1) Chair: Joel Janin Data Capture Helen Berman Target Tracking Steve Bryant Data Quality Assurance Randy Read
5:15 - 5:45 pm Tea/coffee
5:45 - 6:45 pm Project Bottlenecks (1) Chairs: Wayne Hendrickson Michal Linial William Studier Report from active groups: what are the biggest technical challenges?
6:45 - 8:45 pm Drinks and dinner
8:45 - 10:15 pm Relations between Industry and Academia Chair: Dino Moras The Structural Genomics Consortium Barbara Skene Views of a CEO of a structural genomics company Tim Harris
Day 2 - Thursday, April 5:
7:00 - 8:30 am Breakfast
8:30 - 9:15 am Intellectual Property Rights Chair: Joel Sussman Intellectual property issues for structural genomics Joseph Straus
9:15 - 10:00 am Report of the Task forces (2) (continued) Chair: Stephen Burley Intellectual Property Rights Marvin Cassman/John Norvell Publication Guy Dodson
10:00 - 10:30 am Coffee/tea
10:30 - 11:30 am Project Bottlenecks (2) (continued) Chairs: Wayne Hendrickson, Michal Linial
11:30 - 12:30 pm Review draft agreement/policy document
12:30 - 2:00 pm Lunch
2:00 - 3:30 pm Bottlenecks (1) Chair: Shigeyuki Yokoyama cDNA repository Josh Labaer Automation of NMR structure determination and refinement Michael Nilges
3:30 - 4:30 pm Policy Discussion Discuss revisions to policy document
4:30 - 5:00 pm Tea/coffee
5:00 - 7:00 pm Parallel sessions (1) Highlights from structural genomics groups Chair: Ivano Bertini (2) Work groups to revise draft agreement / policy document
7:00 - 8:30 pm Drinks and dinner
8:30 - 10:00 pm Pass draft agreement / policy document
Day 3 - Friday, April 6:
7:00 - 8:30 am Breakfast
8:30 - 10:30 am Bottlenecks (2) Chair: Udo Heinemann Miniaturization of protein production and crystallization Ian Wilson Automation of crystal structure solution and refinement Tom Terwilliger Comprehensive functional characterization of gene products Lee Makowski
10:30 - 11:00 am Coffee/tea
11:00 - 12:30 pm Pass final consensus agreement and dissemination plan
12:30 pm Lunch
Formal close of meeting
1:30 - 4:00 pm Working groups as needed to polish final policy document & press release
Task Force Reports from the Second International Structural Genomics Meeting Sponsored by NIGMS, the Wellcome Trust, and RIKEN/MEXT