2005 Protein Structure Initiative Protein Production and Crystallization Workshop

Natcher Conference Center
NIH Campus, Bethesda, MD

February 2-3, 2005

A major objective of the Protein Structure Initiative (PSI) is to develop, apply and disseminate methods for faster, cheaper and more reliable determination of protein structure. During the initial pilot phase of the PSI, the research centers have made significant progress in establishing high throughput protein structure determination pipelines based on x-ray crystallography and NMR. As progress is made in a particular portion of these processes, technical challenges typically emerge elsewhere. To facilitate an exchange of developments and advancements among participants in the PSI, the National Institute of General Medical Sciences (NIGMS) has organized annual workshops highlighting the particular constituent tasks of structural genomics which represent the most significant and persistent challenges. The 2005 edition of this PSI fixture was held in the Natcher Conference Center on the NIH campus in Bethesda, Maryland, on February 2-3, 2005. Participating scientists gathered for one and one half days to focus on methods and techniques for the high throughput production, purification and crystallization of target proteins. This gathering included representatives from the nine pilot Centers and also investigators supported by investigator-initiated R01, R21, SBIR and STTR research project grants. As in past successful workshops, the emphasis was on the sharing of both failures and successes. The main goal of the workshop was to provide an effective platform for scientists to exchange ideas and data, to discuss progress and problems, and to encourage contacts and collaborations among participants.

Dr. Jeremy Berg, the director of NIGMS, introduced the workshop and welcomed the participants. The program on the first day began with an invited presentation from Dr. Bill Studier, Brookhaven National Laboratory. The remainder of the workshop consisted of presentations by the nine PSI Centers providing an overview of the methods and techniques most successfully employed during the pilot phase of the initiative. Interleaved with these talks were a total of fifteen presentations from investigators within the Centers or allied research projects, which were promoted to the platform from submitted poster abstracts by vote of the organizing committee. A total of seventy-five abstracts were submitted and presented as posters by meeting attendees during a poster session ending the first day of the workshop.

The notes which follow provide an account of the oral program for the 2005 Protein Production and Crystallization Workshop as prepared by Stephen Anderson, Rutgers University, and Rosalind Kim, Lawrence Berkeley National Laboratory, with contributions from Andrzej Joachimiak, Argonne National Laboratory, Scott Lesley, Genomics Institute of Novartis Research Foundation, Jeff Bonanno, New York Structural Biology Center, and Michael W. Adams, University of Georgia. Together these individuals constituted the organizing committee for the meeting.

Session I

Keynote Speaker:
Dr. F. William Studier, Brookhaven National Laboratory
Protein production by auto-induction in inducible T7 expression systems

A historical perspective on the development of T7 promoter-based systems for heterologous gene expression in E. coli was presented. Investigation of factors that affect stability, growth, and induction of T7 expression strains in shaking vessels led to the recognition that sporadic, unintended induction of expression in complex media, previously reported by others, is almost certainly caused by small amounts of contaminating lactose. Glucose prevents induction by lactose by well-studied mechanisms. Reliable non-inducing and auto-inducing media that include metabolic balancing of pH, in which batch cultures grow to high densities, were presented. Auto-inducing media have also been developed for labeling proteins with selenomethionine, for enriching proteins with 15N and 13C, and for production of target proteins by arabinose induction of T7 RNA polymerase from the pBAD promoter in BL21-AI.

Center Overview 1:
Dr. Jeff Bonanno (NYSGXRC)
The NYSGXRC: An Industrial/Academic Center for the NIH Protein Structure Initiative

Together, members of the NYSGXRC have established and operate a fully-integrated, high-throughput center for the following activities: (i) protein family classification and target selection, (ii) protein expression and solubility testing, (iii) E. coli fermentation, (iv) purification and biophysical characterization, (v) crystallization, (vi) synchrotron X-ray diffraction data collection, (vii) X-ray phase determination via molecular replacement, multi- and single-wavelength anomalous dispersion measurements, and isomorphous replacement, (viii) model building and refinement, (ix) PDB deposition of atomic coordinates and experimental structure factors, (x) comparative protein structure modeling, and (xi) functional annotation and dissemination of results.

Targets for this effort are derived from bioinformatic analyses of public genome databases and selected based on protein family sequence identity to known structures with the intention of coverage of sequence/structure space. A program of technology development has also been undertaken to both increase efficiency/throughput and accelerate structure determination for challenging, multi-domain eukaryotic proteins.

At peak capacity, the NYSGXRC will be able to determine 200 structures annually at an average cost approaching $50,000-60,000/structure for proteins that can be expressed readily in E. coli. The long-term goal of the NYSGXRC is to contribute substantially to the 5,000 - 10,000 structure objective of the Large Scale Protein Structure Initiative (PSI-2).

Center Overview 2:
Dr. Andrzej Joachimiak (MCSG)
The Midwest Center for Structural Genomics protein structure determination pipeline using x-ray crystallography and synchrotron radiation

The Midwest Center for Structural Genomics (MCSG) has established a protein structure determination pipeline using x-ray crystallography. The process of structure determination was analyzed for critical steps and major bottlenecks. A number of these steps required development and optimization of new methods and application of parallel approaches as well as automation and robotics. The current MCSG pipeline integrates all essential experimental and computational processes. Public databases of genomic sequences are being analyzed and targets are selected for structural studies. The MCSG pipeline generates well-characterized protein target expression strains, and produces milligram quantities of proteins and heavy-atom labeled crystals. Crystals are being tested for diffraction using synchrotron beamlines. The cryoprotected crystals of x-ray quality are used for data collection and structure determination using a semi-automated SAD or MAD approach. Structural models are auto-built and structures are refined, verified and analyzed using semi-automated computational tools. Functional analysis is being performed using a newly developed ProFunc server. 3D models of relevant members of the sequence family are generated and their quality is assessed. A number of target optimization approaches have also been tailored to high throughput to increase the efficiency of the pipeline. These include the improvement of protein constructs, the use of orthologues and application of alternative expression systems, protein refolding and chemical modification, and optimization of crystallization, cryoprotection and data collection. The majority of the steps in the MCSG pipeline are tracked in near real time by the database, including potential overlaps and progress. All the structures and their analysis are made available to the public using the MCSG database and web interface.
The MCSG structure determination pipeline, when combined with data collection facilities at third generation synchrotrons, advanced software and computing resources, has resulted in significant acceleration of protein structure determination and an overall reduction in cost.

Session I – promoted abstracts

Dr. Masayori Inouye, Dept. of Biochemistry, UMDNJ-Robert Wood Johnson Medical School
Single protein production in living cells using an mRNA interferase

Dr. Inouye described a cold shock vector-based expression system in which induction of an ACA-specific endoribonuclease, MazF, could be used to clear a living E. coli cell of virtually all endogenous mRNAs. A recombinant protein gene of interest, previously engineered to remove all ACA triplets (without affecting the encoded amino acid sequence) and thus create a MazF-resistant mRNA, could then be expressed in the cell in the absence of all other protein synthesis. Using this system, very high levels of expression with very low backgrounds were demonstrated for several proteins, including human eotaxin. Even days after MazF induction, protein synthesis remains robust: the rate of eotaxin synthesis was unchanged after 96 hrs. at 15 °C. This single protein production (SPP) system enables specific and exclusive labeling of a protein in a living cell, which may facilitate direct structural characterization of proteins in the intracellular environment by techniques such as NMR.
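The recoding step mentioned above (removing every ACA triplet without changing the encoded protein) can be illustrated with a minimal sketch. It is not the procedure used by Dr. Inouye's group. It assumes the standard genetic code and an in-frame coding sequence, and it ignores codon usage bias, which a real design would need to consider. The function names (`remove_aca`, `translate`, `count_aca`) are hypothetical and chosen only for this illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# organism-specific code variants are needed).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def count_aca(seq):
    """Count ACA triplets in any reading frame, including overlapping ones."""
    return sum(seq.startswith("ACA", i) for i in range(len(seq) - 2))


def remove_aca(dna):
    """Return a synonymous sequence containing no ACA triplet in any frame.

    Greedy sketch: find the first ACA, then try synonymous codons for the
    codon(s) it overlaps and keep the first swap that lowers the ACA count.
    """
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    while True:
        seq = "".join(codons)
        pos = seq.find("ACA")
        if pos < 0:
            return seq
        before = count_aca(seq)
        fixed = False
        # An ACA starting at pos overlaps the codon holding its first base
        # and the codon holding its last base (the same codon when in frame).
        for idx in sorted({pos // 3, (pos + 2) // 3}):
            original = codons[idx]
            for alt, aa in CODON_TABLE.items():
                if aa != CODON_TABLE[original] or alt == original:
                    continue
                codons[idx] = alt
                if count_aca("".join(codons)) < before:
                    fixed = True
                    break
            if fixed:
                break
            codons[idx] = original  # no synonym helped; restore and move on
        if not fixed:
            raise ValueError(f"could not remove ACA at position {pos}")
```

For example, `remove_aca("ATGACAAACATTCACATATAA")` returns a sequence of the same length with no ACA in any frame, and `translate` gives the same protein for the input and the output. Because each accepted swap strictly lowers the ACA count, the loop terminates.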

Dr. James Swartz, Departments of Chemical Engineering and Bioengineering, Stanford Univ.
New cell-free technologies provide a platform for synthesis of protein libraries

Dr. Swartz discussed several improvements to E. coli extracts for cell-free protein expression. By altering the batch reaction composition to provide ATP synthesis from activation of oxidative phosphorylation, energy levels can be maintained and NMPs used instead of triphosphates, thereby greatly reducing reaction costs. Host genetic mutations have been established that stabilize in the cell extracts the levels of arginine, tryptophan, serine and cysteine, which are the four amino acids typically lost during the in vitro reaction. Additionally, stabilizing the reaction redox potential and addition of a disulfide isomerase has enabled the folding of a number of secreted mammalian proteins. Replacing the recBCD operon with the lambda bacteriophage red genes yields good synthesis from PCR-produced linear DNA templates. These innovations are available for scaled-up expression as well as for efficient screening of expression libraries.

Dr. Christopher Mehlin, SGPP and Department of Biochemistry, University of Washington
Small changes make an enormous difference: lessons from altering the protein sequence

The SGPP is focusing on protozoan proteins from the causative pathogens of human diseases such as malaria, African sleeping sickness, and leishmaniasis. To overcome expression problems with any particular polypeptide from these pathogens, expression of homologous variants from related species is tried, and mutation and domain parsing are extensively employed to improve the expression, solubility, and/or crystallizability of particular proteins. GINSU, the domain parsing program available on the web from Dr. David Baker's lab, has proved particularly useful. The authors have observed that even relatively small changes in a protein's amino acid sequence can have profound effects on the protein's expression and solubility (in E. coli) as well as its crystallizability.

Dr. Mark Knuth, JCSG and Protein Sciences Dept., Genomics Inst. of the Novartis Foundation
A high-throughput baculovirus-mediated protein expression pipeline

For structural genomics applications, a relatively high-throughput insect cell (baculovirus)-based system for difficult-to-express mammalian proteins has been developed. This was achieved by adapting commercially available baculovirus expression vectors and hosts to off-the-shelf robotics and other automation systems. Especially notable were the development of techniques for rapid titer determination without the need for plaque assays and automated methods for bacmid production. Expression yields of up to 25 – 100 mg/L have been achieved. Also, in the course of this work, the need to check for soluble aggregates was flagged, since proteins expressed using this system are sometimes soluble (i.e., non-sedimenting) but still subject to micro-aggregation that can inhibit crystallizability.

Dr. Ming Luo, SECSG and University of Alabama at Birmingham
96-well expression of higher eukaryotic proteins in insect cells

An insect cell system for structural genomics studies of proteins from higher eukaryotes such as C. elegans and human, based on the Invitrogen Drosophila Expression System (DES), was described. It was found that this could be adapted to a 96-well format in the SECSG's standard robotic expression set-up with only minor modifications. More than 70% of the clones tested (many of which had no expression or low solubility when expressed in E. coli) yielded soluble material in Drosophila cells. Scale-up for production of proteins for structure determination is underway.

Session II

Center Overview 3:
Dr. Christopher Mehlin (SGPP)
From genome to protein to structure to ligands

The goal of SGPP is to express, purify, crystallize and solve three-dimensional structures of key proteins from several Leishmania species, Trypanosoma brucei, T. cruzi, Plasmodium falciparum and other Plasmodium species. Target selection is characterized by application of a combination of criteria that optimize, on the one hand, the probability of success in protein expression and crystallization (such as size, PFAM family size, protein flexibility, pI) and, on the other hand, the likelihood that the protein is functionally relevant and a possible target of new therapeutics. So far, a total of ~7500 soluble target proteins from these eukaryotic protozoa have been cloned and checked for soluble expression, ~600 purified at the 10 mg pure protein level, and ~30 structures solved. A statistical analysis of factors affecting soluble expression and successful crystal structure determination was presented. As with many groups, the SGPP is observing problematic expression of proteins with a predicted high pI.

Salvage pathways such as yeast two-hybrid screens of Plasmodium falciparum to discover protein pairs, computational domain prediction to determine compact domains for optimization of soluble expression and crystallization, and the use of antibodies and small molecule cocrystallants as aids in crystallization and structure determination are also being explored.

The presence of bound small molecules can improve crystal quality significantly. In several cases, E. coli appears to volunteer by contributing endogenous ligands that cocrystallize and provide insight into the function of several SGPP proteins. In other cases, ligands, cofactors, or inhibitors have been added to crystallization cocktails, frequently enabling or enhancing structure determination. In a number of cases, the active sites of structurally-characterized proteins revealed unexpected similarity to those of proteins with known function, increasing functional understanding of the SGPP target protein. Data was presented on a library of guanidinium-linked compounds for use as ligands for co-crystallization.

Center Overview 4:
Dr. Thomas Acton (NESG)
Structural proteomics of eukaryotic protein domain families

NESG focuses on proteins targeted from the proteomes of eukaryotic model organisms and human. The targets are selected as representatives from protein sequence families in order to provide broad "coverage" of fold space, as well as proteins that are particularly interesting from a functional genomics perspective. A primary goal of the project is to develop efficient and integrated technologies for high-throughput (htp) protein production and 3D structure determination. NESG combines parallel efforts in both X-ray crystallography and solution-state NMR spectroscopy. The primary "target genomes" are eukaryotic model organisms with target selection focused on clusters of protein domain families (NESG Clusters). Multiple homologues from these "NESG Clusters" are selected for study both from these eukaryotic "Target Proteomes" and from the proteomes of bacterial and archaeal "Reagent Proteomes".

As with other centers, the NESG protein structures exhibit quality scores similar to structures generated in traditional structural biology projects. The project has developed an integrated and standardized "high-throughput" process for structure and function analysis of novel gene products on a genomic scale, correlating the structural and biochemical function analysis of these targeted proteins with the extensive biological data emerging from large-scale functional genomics efforts. As part of this project, the NESG has also developed technologies for protein sample production, crystallization, and rapid protein structure determination by both NMR and X-ray crystallography. To date, the NESG project has generated some 160 3D protein structures, which provide the basis for homology modeling of over 25,000 proteins, including more than 4,000 proteins that could not have been modeled from structures in the PDB at the time the NESG structures were deposited. An additional observation is that 1/3 of the human proteins observed were intrinsically unfolded.

Center Overview 5:
Dr. Sung-Hou Kim (BSGC)
Overview of Berkeley Structural Genomics Center: Mission, progress, metrics and lessons learned during the pilot phase of PSI

Two general approaches have been taken by the PSI centers during the pilot phase: (1) dense coverage of all the protein structures in a minimal organism; and (2) sparse coverage of selected proteins in one or more medium-to-large organisms. The Berkeley Structural Genomics Center (BSGC) took the first approach, targeting Mycoplasmas, which contain only 500 – 700 genes. The objectives for the pilot phase can be summarized as (1) to develop high throughput methods and protocols to proceed from cloning to structure determination, (2) to determine the structures of all soluble proteins in Mycoplasma with no sequence homology to proteins of known structure, and (3) to obtain the metrics for assessing the magnitude and scale required to achieve the overall PSI objective of comprehensive coverage of the protein structure space. The rationale for the minimal organism approach, technologies and protocols developed, progress made, and the metrics and lessons learned at BSGC were presented. Particularly notable achievements reported were an overall success rate of 25% in going from gene to structure due to multiple approaches, and the finding that structure gave clues to function in at least 96% of the 80 structurally-characterized BSGC targets.

Center Overview 6:
Dr. Li-Wei Hung (TBSGC)
TBSGC overview

The TB Structural Genomics Consortium was formed to foster an international effort to determine structures of proteins from M. tuberculosis. The members of the TBSGC have determined structures of more than 90 M. tuberculosis proteins. The TBSGC has facilities for high-throughput cloning, expression testing, protein production, crystallization, and X-ray data collection for the benefit of the entire Consortium. Additionally, structure determinations are carried out in member laboratories around the world. Members of the TBSGC have developed technologies for structural genomics including: an interactive proteomics database, the engineering of proteins for optimal solubility, crystallization platforms, and automated structure solution methods for X-ray crystallography. These technologies will be applicable to structure determination in both large-scale and laboratory-scale efforts. Software developments for automated fitting of flexible ligands were also discussed.

Session II – promoted abstracts

Dr. Natalia Oganesyan, BSGC
Osmotic stress and heat shock as a way to increase solubility of recombinant proteins expressed in Escherichia coli for structural studies

The most important characteristics of a protein as a candidate for structural studies are its solubility and homogeneity. A disadvantage of E. coli as an expression system is the formation of insoluble inclusion bodies due to protein misfolding during over-expression. Several approaches are utilized to deal with expression problems, including: fusion of the target protein to a solubilization tag, protein refolding from inclusion bodies, optimization of expression conditions, and co-expression of heat-shock proteins with target proteins. Heat shock proteins, many of which are molecular chaperones, are a bacterial defense mechanism against heat stress. They prevent protein aggregation and assist protein refolding. An increase of recombinant protein solubility by induction of heat shock proteins has been reported. Adaptation of E. coli cells to salt leads to the accumulation of small organic compounds known as osmolytes. Osmolytes can act as "chemical chaperones" to help stabilize proper folding and decrease the tendency of native soluble proteins to form insoluble aggregates during heat shock. A method was described for enhancing recombinant protein solubility in E. coli based on the presence of glycine betaine, one of the osmolytes, in the growth media and induction of a heat shock response using 3% ethanol.

Dr. Chang Yub Kim, TBSGC and Bioscience Division, Los Alamos National Laboratory
Technique development for quality control and high throughput ligand analysis of purified proteins

A quality control technique was described that employs gel band analyses of target proteins (as well as possible contaminant proteins) using mass spectrometry and identification of peptides through a database. This technique was also applied to monitor the efficiency of seleno-methionine incorporation.

Another technique, to screen ligands of each purified protein in a high throughput manner, is being developed. In this method, a mixture of proteins is bound randomly to a dye-affinity column and groups of ligand-interacting proteins are eluted with a series of nucleotide ligands such as ATP, ADP, cAMP, SAM, GTP, NAD, and NADH. Using Pyrobaculum aerophilum and Mycobacterium tuberculosis cell extracts, the identification of proteins interacting with many nucleotide ligands (and potential TB target proteins for new drug discovery) was demonstrated. This method was also applied to the identification of ligands of the "stalled" proteins in the TBSGC crystallization pipeline with the potential to improve crystal quality.

Dr. Irina Kataeva, SECSG and Dept. of Biochemistry & Molecular Biology, Univ. of Georgia
Improving protein solubility: Our experience of using MBP fusion and expression at different temperatures

A common problem of HTP cloning and expression of recombinant proteins in Escherichia coli is their low solubility. In this work, cleavable fusions of "helper" proteins along with varying temperatures during expression were used to increase protein solubility. 106 targets were salvaged from the mesophile Shewanella oneidensis and 192 targets from an anaerobic thermophilic bacterium, Clostridium thermocellum JW20. (Previous results with more conventional methods had demonstrated that expression of the majority of these targets resulted in insoluble recombinant protein.) In the present study, the expressed proteins comprised an N-terminal maltose-binding protein, a 6His tag, an attB1 recombination site, a TEV protease recognition site and a target protein. The proteins were screened for total expression at 37 °C and for solubility at different temperatures. The results showed that of 86 S. oneidensis genes cloned, 80 expressed; however, only 5 of them were soluble at 37 °C. The solubility was dramatically improved at 28 °C (65 proteins) and 18 °C (75 proteins). In the case of C. thermocellum, 182 genes were cloned and 149 of them expressed well. The solubility of the expressed C. thermocellum proteins at 37 °C was much higher than that of the Shewanella proteins (100 examples) and increased by approximately 40% at both 28 and 18 °C.
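The S. oneidensis counts reported above can be turned into soluble fractions with a few lines of arithmetic. The sketch below uses only the numbers stated in the summary (80 expressed proteins, of which 5, 65 and 75 were soluble at 37, 28 and 18 °C). The function name `soluble_fraction` is hypothetical, and the C. thermocellum temperature shift is left out because the summary does not say whether the "approximately 40%" increase is relative or absolute.

```python
def soluble_fraction(soluble, expressed):
    """Fraction of expressed proteins that were recovered in soluble form."""
    if expressed <= 0:
        raise ValueError("expressed must be a positive count")
    return soluble / expressed


# Counts taken from the summary above (S. oneidensis, MBP fusions).
S_ONEIDENSIS_EXPRESSED = 80
S_ONEIDENSIS_SOLUBLE = {37: 5, 28: 65, 18: 75}  # temperature in deg C -> count

fractions = {
    temp_c: soluble_fraction(count, S_ONEIDENSIS_EXPRESSED)
    for temp_c, count in S_ONEIDENSIS_SOLUBLE.items()
}

for temp_c, frac in fractions.items():
    print(f"{temp_c} C: {frac:.2%} of expressed proteins soluble")

# Fold improvement in the number of soluble proteins relative to 37 C.
fold_vs_37 = {
    temp_c: count / S_ONEIDENSIS_SOLUBLE[37]
    for temp_c, count in S_ONEIDENSIS_SOLUBLE.items()
}
```

On these counts, the soluble fraction rises from 6.25% at 37 °C to 81.25% at 28 °C and 93.75% at 18 °C, which is a 13-fold and a 15-fold increase in the number of soluble proteins.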

Dr. Y. Kim, MCSG and Structural Biology Center, Argonne National Laboratory
Automated purification using ÄKTAexplorer and ÄKTAxpress in the Midwest Center for Structural Genomics

One of the critical components of the structural genomics pipeline is to produce "structural biology grade" proteins in a quantity, concentration and quality suitable for structure determination experiments. A high-throughput parallel protein purification system was described in this talk. In brief, proteins are expressed as fusions with cleavable affinity tags and purified in a few chromatographic steps: (i) immobilized metal affinity chromatography (IMAC) coupled with a buffer-exchange step and, after tag cleavage, (ii) a second round of IMAC and buffer exchange. These protocols have been implemented on the multi-dimensional chromatography workstations, ÄKTAexplorer and ÄKTAxpress. The automated protocols have been successfully applied to more than 1000 soluble proteins, native and seleno-methionine labeled, of several microbial origins. The current facility contains two ÄKTAexplorers and two ÄKTAxpresses producing 48 – 60 His6-tagged proteins per week. For well-expressed, soluble targets, the typical yield is 20 – 200 mg at 90 – 95% purity. One issue that was noted was the tendency of some proteins to fall out of solution when the fusion tag was cleaved off.

Dr. Heath E. Klock, JCSG
Expanding the collection of analytical data for soluble proteins

Highly parallel approaches to protein expression and purification developed at the Joint Center for Structural Genomics (JCSG) allow for large numbers of unique proteins to be processed quickly. A tiered strategy for target processing requires an ability to analyze protein behavior in the pipeline. Understanding protein behavior requires characterization of the biophysical properties of gene products in the pipeline. Obtaining such information is often a time-consuming and labor-intensive process. Parallel and high-throughput methods to collect such data were described. Analytical testing within the JCSG pipeline provides information such as protein toxicity, the multimeric state of the protein, evidence of bound cofactors, thermostability via differential scanning calorimetry (DSC), isoelectric points, and protein purity. Analytical data resulting from the expression of approximately 600 soluble T. maritima proteins that have entered crystallization trials were shown. Trends in success rates can now be correlated with biophysical properties. As an example, thermostability measured by DSC analysis did not show a correlation with the ultimate success rate in obtaining structures.

Session III

Center Overview 7:
Dr. Scott A. Lesley (JCSG)
Protein production and crystallization at the Joint Center for Structural Genomics

A major theme of the JCSG since its inception has been technology development. In this light, a historical summary was presented regarding how the original concept of the structural genomics pipeline has evolved, and emphasizing how the final result derives primarily from the quality of the proteins produced rather than the sheer numbers attempted. The JCSG has settled on a three-tiered approach to production as the optimal arrangement at present. Tier 1 is a highly automated, high throughput (HTP) protein production process, which assesses the general behavior of the target gene products. Tier 2 is structure determination of these products, primarily by X-ray crystallography. And Tier 3 involves a number of salvage pathways for the attempted rescue of targets that fail to yield acceptable samples when expressed via the standard Escherichia coli-based pipeline. Such approaches include screens for different buffers and additives for maintaining solubility, use of insect cell and other expression systems, and identification and co-expression of protein complex partners (note: this is a difficult issue – various partner identification approaches, including two-hybrid analyses, protein arrays, and bioinformatic tools show very little overlap in their predictions). Other routes that the JCSG has explored to increase crystallization and structure determination success are reductive methylation of proteins, partial proteolysis, targeting orthologs, and surface mutagenesis based on predicted regions of disorder. The JCSG is also utilizing a number of techniques to analyze protein products for aggregation and for the presence of bound ligands.

Center Overview 8:
Dr. George Phillips (CESG)
History and accomplishments of the Center for Eukaryotic Structural Genomics

In addition to the construction of a productive pipeline from target selection to structure deposition, the CESG has also developed and/or evaluated new technologies in a high throughput environment: these have included a comprehensive LIMS, novel vectors, optimized E. coli production methods, robotic cell-free protein production, mass spectrometry for quality assurance, an integrated crystal development environment, and procedures for semi-automated X-ray and NMR structure determinations. The CESG is now using its integrated SESAME database to mine the existing data to increase protein purification efficiency, and even to manage costs. For example, a lower predicted pI was found to correlate strongly with expression success, and is now utilized as a screening factor; other attempts are currently underway to correlate success at each step in the process with gene and/or gene product characteristics. In addition to targeting proteins representing novel folds (20% of the CESG-produced structures represent protein families found only in eukaryotes), the center is targeting proteins involved in human diseases and is accepting target recommendations from the scientific community. The CESG has implemented a unified cloning and small-scale screening procedure using both E. coli cells and wheat germ cell-free expression systems. It is also testing integrated microfluidic chip technology for screening of scaled-down crystallization conditions using 100x less protein than standard screens. A number of the CESG protein structures have generated hypotheses regarding biological function, which are being tested. The center has been able to reduce the turnaround time for both X-ray and NMR structures to as low as two months in some cases, and the most recent estimate of the cost per structure, including overhead, was $89K.

Center Overview 9:
Dr. B.-C. Wang (SECSG)
From adolescence to maturity at SECSG: What, why, and where we stand in technology development

The SECSG began with three groups targeting different genomes -- Pyrococcus furiosus, Caenorhabditis elegans, and human -- with different techniques. The project has evolved through its adolescent stage (infrastructure and HTP protocol development, analysis and quality control of protein products) in the first several years to its current level of maturity. In the past year the crystallomics group (a test-bed for a PSI-2-type production center) has produced 23 structures at an average estimated cost of $83K per structure. Technology development has included robotic small-scale expression screens, which are currently being expanded to include 96-well insect cell (Drosophila) expression screens. Further, optimized automated protein purification techniques using the ÄKTA 3D purifier, and novel expression vectors for simple, rapid co-expression of protein complex partners, are being incorporated. A protein structure determination pipeline is in place, which recently allowed on-site determination of five structures in 23 hours. Premounting and screening of crystals on the home source results in considerable savings on the beamline, and naturally present metals or sulfurs can be used for phasing. Moreover, some structures are easily solved using the home source.

Session III – promoted abstracts

Dr. Margaret Johnson, JCSG and The Scripps Research Institute
Structures and screening by NMR spectroscopy in the Joint Center for Structural Genomics

For 79 proteins, a comparison was made between solution foldedness as measured by NMR and crystallization success. 1D NMR spectra could be "graded" into four categories: well folded; well folded but with some line broadening; oligomeric or aggregated; or unfolded. The data indicated no correlation with crystallization "hits"; however, a good correlation between foldedness and crystal (diffraction) quality was observed. Introduction of a microcoil probe has allowed screening of protein foldedness with 1% (as little as 100 µg) of the protein previously required, thereby reducing costs by allowing folding prescreens to be performed on small scale protein preps.

Dr. John Hunt, NESG and Department of Biological Sciences, Columbia University
Informatics and experimental approaches to address the failure-to-crystallize conundrum

In contrast to the above presentation, this talk reported that the NESG had found little correlation between foldedness and crystallization success. However, strong correlations were found between crystallization success and (i) protease resistance (especially for trypsin) and (ii) lower oligomeric state of the protein. The talk offered a good example of how the next generation of Centers will need to be able to normalize their data for better comparisons. Dr. Hunt also speculated that we may be (at least temporarily) bumping up against the limits of current engineering and technology, in that only 11% of human proteins go on to crystallize and diffract X-rays.

Dr. Mark Pusey, Marshall Space Flight Center, NASA
Fluorescent approaches to high throughput crystallography

Work on doping protein crystals with fluorescent dyes was reported. Screening for crystals can be improved by addition of trace levels of fluorescently labeled protein to the bulk unlabeled protein so that only the protein crystals, not those due to impurities, will fluoresce. Even with as little as 0.5% labeling it is possible to pick out crystals from amorphous material and to visualize mounted crystals in cryoloops. Crystal quality was not adversely impacted by this deliberate impurity: in preliminary trials, addition of the tracer molecules seemed to affect nucleation rates but not resolution. This technique also allowed visualization of small crystals, even when they were buried in precipitates.

Dr. Craig Bingman, CESG and Biochemistry Department, University of Wisconsin-Madison
Comparative crystallomics at the Center for Eukaryotic Structural Genomics

This was an overview of the integrated robotic platform for crystallization and imaging, as well as the SESAME LIMS database, at the CESG. By incorporating salvage pathways, the CESG has been able to increase success rates from 5% in June of 2004 to 12% in January 2005. For example, reductive methylation of proteins can increase crystallization success by 10%. The time- and cost-saving potential of microfluidics chips was also emphasized: in principle, only 100 µg samples of protein might be sufficient for screening of crystallization conditions. However, this application of microfluidics technology is still in its early days.

Dr. Pearl Quartey, MCSG and Biosciences Division, Argonne National Laboratory
Use of reductive methylation of proteins to increase crystallization efficiency at the Midwest Center for Structural Genomics

This talk provided a detailed analysis of the effect of reductive methylation on the crystallizability of 90 target proteins chosen at random. While the technique is fast, quantitative, and relatively free of side reactions, it requires significant amounts of protein (~20 mg). The overall success rate with this technique was 6.7% and, where crystals were obtained from both methylated and non-methylated targets, the crystals from methylated proteins yielded better resolution than the crystals from the corresponding non-methylated forms. There is no apparent correlation between success and the number of lysines, and the MCSG is currently investigating other modification techniques, such as phosphorylation and carboxylation. Overall, the statistics for MCSG regarding crystallization success are: 31.4% of "purified" proteins go on to yield "crystals"; 38% of "crystals" go on to "structures".
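
The two stage-wise MCSG rates quoted above can be chained to estimate what fraction of purified proteins reach a structure. The following is a minimal Python sketch of that arithmetic. It assumes that the two percentages are success rates of consecutive pipeline stages and can simply be multiplied, which the talk summary does not state explicitly; the function and variable names are illustrative only and do not come from the MCSG.

```python
def overall_yield(stage_rates):
    """Multiply per-stage success fractions to get an end-to-end fraction."""
    result = 1.0
    for rate in stage_rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"rate out of range: {rate}")
        result *= rate
    return result


# Stage rates quoted in the summary above (MCSG).
purified_to_crystals = 0.314    # "purified" proteins that yield "crystals"
crystals_to_structures = 0.38   # "crystals" that go on to "structures"

# Assumption: the stages are consecutive, so the rates multiply.
purified_to_structures = overall_yield(
    [purified_to_crystals, crystals_to_structures]
)
print(f"purified -> structure: {purified_to_structures:.1%}")
```

Under these assumptions, roughly 12 of every 100 purified proteins would be expected to yield a structure.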