Participants at the Challenges in Docking and Virtual Screening meeting at NIH in August 2005 concluded that progress toward computational tools for molecular docking and in silico screening would be significantly faster if research groups had access to common, high-quality data sets that could be used for development and benchmarking. The ultimate goal was defined as the quantitative prediction of binding affinity, given the independent structures of target protein and ligand. It was proposed that industrial partners might contribute valuable data sets for ligand-target interactions, provided the data would be curated, completed, extended as needed, and made available so that it provided substantial value to the docking and scoring research community.
As proposed in August, a smaller group of pharmaceutical industry, academic, and government scientists met on February 24, 2006 to develop a working plan for a public-private partnership that could accomplish this goal.
The February meeting was held by bicoastal videoconference from sites at GlaxoSmithKline and the University of California, San Francisco. Its goals were:
The industrial groups identified about 25 ligand-target sets potentially available for release. Each set contains tens to hundreds of ligands and their affinities for a given target, and typically tens of X-ray structures of the ligand-protein complexes. Perhaps half of the data sets were for kinases; the rest covered a wide range of target types. In addition, Mike Gilson mentioned the 13,000 Kd measurements in BindingDB, some of which might complement the new data sets. Expression conditions, assay protocols, and crystallization conditions would probably be released; materials (plasmids, compounds, crystals) probably would not.
The consensus was that meaningful improvement in prediction methods will require data sets containing both close-analog series within a chemotype and more than one chemotype per target. This already appears to be the case for a fair number of the data sets under consideration. Data sets on identical targets from different companies might be combined, provided the assay data can be rationalized (or re-measured). A few other academic and existing PDB data sets might also be suitable. Other organizations and companies may have data sets to contribute, and an active recruitment effort should be mounted once a plan for the effort is in place.
Database/user interface. To make the new information readily available to academic and private developers, it will be essential to develop and maintain a multidimensional database and user interface capable of integrating target protein crystallographic data sets (PDB format); small-molecule ligand crystallographic data; binding affinity data (IC50s, assay and SPR Kds, ITC); and expression, crystallization, ligand synthesis, and binding assay protocols. The Structural Genomics Knowledge Base of the NIGMS Protein Structure Initiative has many of the same requirements and might serve as a prototype, or even a service center; discussions with interested individuals are recommended. Ideally, the user interface would support both download of the data by docking/scoring developers and upload and testing of docking/scoring programs by resource staff in benchmarking exercises.
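To make the integration requirement concrete, the following is a minimal sketch of how one ligand-target set might be represented so that structures, affinities, and protocols stay linked by ligand. All class and field names are hypothetical, and storing ligands as SMILES strings is an assumption; no schema was agreed at the meeting.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical record layout for one ligand-target set.
# Field names and the SMILES representation are illustrative assumptions.

@dataclass
class AffinityMeasurement:
    ligand_id: str
    kind: str                           # e.g. "IC50", "Kd_assay", "Kd_SPR", "Kd_ITC"
    value_nM: float
    protocol_id: Optional[str] = None   # key into LigandTargetSet.protocols

@dataclass
class ComplexStructure:
    ligand_id: str
    pdb_file: str                       # path to the protein-ligand complex in PDB format

@dataclass
class LigandTargetSet:
    target_name: str
    contributor: str
    ligands: dict[str, str] = field(default_factory=dict)      # ligand_id -> SMILES (assumed)
    affinities: list[AffinityMeasurement] = field(default_factory=list)
    structures: list[ComplexStructure] = field(default_factory=list)
    protocols: dict[str, str] = field(default_factory=dict)    # protocol_id -> expression/crystallization/synthesis/assay text

    def benchmark_ready_ligands(self) -> set[str]:
        """Ligands that have both a complex structure and at least one affinity value."""
        with_structure = {s.ligand_id for s in self.structures}
        with_affinity = {a.ligand_id for a in self.affinities}
        return with_structure & with_affinity
```

For example, a set in which only ligand `L1` has both an ITC Kd and a solved complex would report `{"L1"}` from `benchmark_ready_ligands()`. Keying everything on a ligand identifier is one simple way to let a query join affinity and structural data without duplicating either.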
New experimental data. To enhance and complete the data sets submitted by industry participants, several types of additional data will be required. Some binding data will need to be reassessed under consistent conditions so that data sets from different sources can be combined. Affinity constants will be needed for data sets that have only IC50s, and for additional compounds as needed to test specific hypotheses about improving prediction programs. Ideally, these would be measured by isothermal titration calorimetry using microscale, high-throughput methods. Possible sites discussed for such assays were the NIH Molecular Libraries Screening Center and the Protein Structure Initiative centers.
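One reason IC50s alone are insufficient is that they depend on assay conditions. As an illustration only, assuming a competitive inhibitor in a Michaelis-Menten enzyme assay, the Cheng-Prusoff relation Ki = IC50 / (1 + [S]/Km) shows that the same compound yields different IC50s at different substrate concentrations. The sketch below also converts a dissociation constant to a standard binding free energy, ΔG° = RT ln(Kd / 1 M). This conversion is not a substitute for the directly measured affinity constants called for above; it holds only under the stated mechanism and requires [S] and Km to be known.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)

def cheng_prusoff_ki(ic50: float, substrate_conc: float, km: float) -> float:
    """Ki for a competitive inhibitor: Ki = IC50 / (1 + [S]/Km).

    Ki is returned in the units of ic50; substrate_conc and km must share units.
    """
    if ic50 <= 0 or km <= 0 or substrate_conc < 0:
        raise ValueError("IC50 and Km must be positive; [S] must be non-negative")
    return ic50 / (1.0 + substrate_conc / km)

def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol: dG = RT ln(Kd / 1 M)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)

# Same hypothetical inhibitor (Ki = 50 nM) in two assays with Km = 10 uM:
ki_a = cheng_prusoff_ki(100.0, 10.0, 10.0)  # [S] = Km    -> IC50 = 100 nM
ki_b = cheng_prusoff_ki(250.0, 40.0, 10.0)  # [S] = 4 Km  -> IC50 = 250 nM
print(ki_a, ki_b)  # both recover 50 nM
```

A 1 nM Kd corresponds to roughly -12.3 kcal/mol at 298 K, so a 2.5-fold discrepancy between the IC50s above would otherwise appear as about 0.5 kcal/mol of spurious affinity difference.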
Providing the needed crystallographic data will require two types of effort: refinement of submitted data sets and their curation to PDB standards and format (for use as input to current prediction programs); and determination of a limited number of new structures of target proteins with additional ligands, as needed to test hypotheses and for benchmarking exercises. Possible sites discussed included the PSI centers and the national synchrotron centers.
Properties of unbound ligand molecules, such as solvation and solubility, may need to be determined to analyze the energetics of bound versus free ligands. The National Institute of Standards and Technology (NIST) could play a very helpful role here. Expression of new proteins and synthesis of additional ligands will be necessary on a limited scale; contracting out this work may be the most efficient way to proceed.
Christopher P. Austin, M.D., Senior Advisor to the Director for Translational Research; Director, NIH Chemical Genomics Center; National Human Genome Research Institute, National Institutes of Health, Building 31, Room 4B09, 31 Center Drive, Bethesda, MD 20892. Phone: 301-594-6238; Fax: 301-402-0837; Email: austinc@mail.nih.gov
Jeff Blaney, Ph.D., Vice President, Lead Discovery, Structural Genomix, 10505 Roselle Street, San Diego, CA 92121. Phone: 858-228-1495; Fax: 858-558-0642; Email: jeff_blaney@stromix.com
Anne M. Chaka, Ph.D., Computational Chemist, Physical and Chemical Properties Division, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD 20899. Phone: 301-975-4525; Fax: 301-869-4020; Email: anne.chaka@nist.gov
Wendy Cornell, Ph.D., Director, Molecular Systems, Basic Chemistry, Merck Research Laboratories, 126 East Lincoln Avenue, Rahway, NJ 07065. Phone: 732-594-4954; Email: wendy_cornell@merck.com
Michael K. Gilson, M.D., Ph.D., Professor and CARB Fellow, Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, 9600 Gudelsky Drive, Gaithersburg, MD. Phone: 240-314-6217; Fax: 240-314-6255; Email: Gilson@umbi.umd.edu
Jayne Kapur, Ph.D., Computational Chemist, Physical and Chemical Properties Division, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD 20899. Phone: 301-975-2460; Fax: 301-975-3675; Email: jayne.kapur@nist.gov
Deborah A. Loughney, Director, Computer-Assisted Drug Design, Bristol-Myers Squibb Company, P.O. Box 4000, Princeton, NJ 08543-4000. Phone: 609-252-6054; Fax: 609-252-6012; Email: deborah.loughney@bms.com
Arthur J. Olson, Ph.D., Professor, Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA 92037. Phone: 858-784-9702; Fax: 858-784-2860; Email: olson@scripps.edu
Catherine E. Peishoff, Ph.D., Site Director, Computational Analytical & Structural Sciences, GlaxoSmithKline, 1250 S. Collegeville Road, UP12-210, PO Box 5089, Collegeville, PA 19426. Phone: 610-917-6585; Fax: 610-917-7393; Email: Catherine.e.peishoff@gsk.com
Emanuele Perola, Ph.D., Applications Modeling, Vertex Pharmaceuticals, 130 Waverley Street, Cambridge, MA 02139. Email: emaneule_perola@vrtx.com
Brian Shoichet, Ph.D., Professor, Department of Pharmaceutical Chemistry, University of California San Francisco, 1700 4th Street, QB3 Building, Room 508D, San Francisco, CA 94143-2550. Phone: 415-514-4126; Fax: 415-502-1411; Email: shoichet@cgl.ucsf.edu
Janna P. Wehrle, Ph.D., Program Director, Division of Cell Biology and Biophysics, National Institute of General Medical Sciences, National Institutes of Health, 45 Center Drive, Room 2AS.19K, Bethesda, MD 20892-6200. Phone: 301-594-0828; Fax: 301-480-2004; Email: wehrlej@mail.nih.gov