March 29-31, 2004
One of the major goals of the Protein Structure Initiative (PSI) is to determine structures of proteins from an assemblage of thousands of protein families found in living organisms for which there is no structural data. In order to accomplish this, the PSI pilot centers in the last four years have made a significant effort to establish high throughput protein structure determination pipelines using x-ray crystallography and NMR. The analysis of these early pipelines identified a number of important bottlenecks and technical challenges. Preparation of protein samples and crystals that are suitable for structural studies represents one of the most important stumbling blocks. In an effort to facilitate an effective exchange of developments and advancements between pilot centers, the National Institute of General Medical Sciences (NIGMS) organized several workshops on gene cloning, protein expression, and purification. The first workshop was held in March 2002. The scope of a second workshop in April 2003 was expanded to include crystallization. The 2004 workshop focused on several key topics that remain major challenges in structural genomics and structural biology. An important aspect of these workshops was to share both failures and successes. The 2004 Protein Purification and Crystallization Workshop (PPCW) topics included:
- Modification of proteins to optimize expression/solubility
- Alternative and novel prokaryotic expression systems
- Eukaryotic expression systems
- Methods to minimize sample heterogeneity and improve crystal diffraction
- Membrane-Associated Proteins
- Robotic platforms for all of the above
- Failures, problems, and bottlenecks
PSI is now in its fourth year, and the nine PSI pilot research centers, as well as R21 and R01 recipients, have already contributed significant technology and new methods in high throughput structural biology. Protein and crystal production for structural genomics comprises target selection, cloning, and expression of recombinant proteins; cell-culture fermentation; labeling with selenium for x-ray crystallography and/or with stable isotopes for NMR studies; purification; analytical characterization; and growth of x-ray quality crystals.
The 2004 NIGMS PPCW was organized with the main goal of providing an effective platform for scientists to share and exchange ideas and data, to discuss progress and problems, and to address the current challenging bottlenecks. The purpose of the PPCW was also to encourage contacts and collaborations between the participating groups. The PPCW brought together representatives from all nine P50 Center Grants, two R21 Program Project Grants, and recipients of NIH R01 grants in structural genomics. Federal staff from NIGMS and other NIH institutes and federal agencies also attended. Several changes were introduced for this workshop based on the suggestions of participants from last year. Five major areas of research were proposed for the workshop this year, with a broader scientific program that required expansion of the workshop from 2 to 2.5 days. The PSI pilot centers were invited to present their current progress in the form of posters. A total of 125 participants met at the NIH campus on March 29-31, 2004. Twelve invited speakers presented their research results in areas relevant to the meeting topic. In an expanded forum, speakers representing the research centers and research projects described the bottlenecks and results. In addition, experts from the international community were invited to provide initial discussion points in all critical areas pertinent to the main topics of the workshop. Discussions focused on progress in high-throughput methods for cloning, production, purification, and crystallization of proteins for X-ray crystallography and NMR studies.
Last year’s successful poster session was expanded and this year included 37 posters from pilot centers, R01 grant recipients, and R21 program projects. Several posters were selected for short oral communications, and the “open microphone” session was significantly expanded this year, giving participants the opportunity to present and discuss new results. Abundant time for discussion was planned, and the workshop concluded with a summary discussion session. As with last year’s workshop, the proceedings of the PPCW will be published in an upcoming issue of the Journal of Structural and Functional Genomics.
The workshop provided the opportunity to review the current state of the art in several important areas of the structural genomics pilot projects and to discuss areas that still remain major challenges. It was very clear that the PSI pilot centers have put together impressive and successful pipelines for protein structure determination. The workshop showed that significant progress has been made since last year in the production of more difficult proteins (membrane proteins and protein complexes). These pipelines are now being applied to membrane proteins with encouraging results, including application of NMR for structure determination of small membrane proteins. Several salvage pathways have been proposed and tested to improve success rates for protein expression, crystallization, and for improving crystal quality. The intriguing protein surface mutagenesis approach may provide a viable alternative to currently adopted methods. Moreover, numerous processes have been partly automated and made more efficient and cost effective. These results show that the PSI pilot centers are already making important technical advances in protein production and crystallization for structural biology. The resulting technologies will have broad value to the entire biological science community.
Dr. John Norvell, research director of the PSI, opened the workshop and discussed the current progress and the future of the PSI program. He outlined the time frame and basic requirements for establishing the next phase of the PSI network, including a call for proposals to create large-scale structural genomics centers as well as smaller, methodology-oriented specialized centers. Dr. Jeremy Berg, the director of NIGMS, emphasized the role of the PSI program in NIH long-term planning and its potential contribution to studies of disease-relevant proteins as part of an integrated NIH road map.
Session I Modification of Proteins to Optimize Expression/Solubility
Dr. Francois Baneyx discussed strategies and challenges of heterologous protein production in E. coli. He emphasized the role of molecular chaperones, particularly those expressed as part of the sigma factor 32 (σ32) regulon, in the in vivo folding of recombinant proteins in E. coli. He demonstrated that DnaK/DnaJ or GroEL/GroES co-expression, particularly when combined with lower growth temperatures, could improve the yield of soluble recombinant polypeptides (although this did not work in all cases). He also discussed the 3D structure and mechanism of action of “holding chaperones” like Hsp31 and Hsp33 and their role in keeping intracellular proteins in a soluble state under conditions of heat stress. Dr. Baneyx also presented results on the E. coli cold shock response and described a set of novel vectors based on the promoter and 5’-UTR of the cspA gene. He showed that such vectors, particularly when combined with an rbfA- host mutation to block post-induction repression of the cspA promoter, allowed for improved expression of some toxic, proteolytically sensitive, or poorly translated proteins in E. coli. Finally, he discussed the use of active site mutants of thioredoxin 1 (TrxA) expressed in trans in a trxB mutant host strain to promote stable folding and oxidation of some disulfide-containing proteins (e.g., PhoA and IL21) in the E. coli cytoplasm.
Dr. Barbara Morris discussed how various N- and C-terminal fusion tags—especially hexahistidine, NusA, glutathione-S-transferase, thioredoxin, and the “S-tag” (a fusion to the RNase S peptide – see below)—could be employed in the pET vector systems to increase expression, solubility, and/or purifiability of recombinant proteins in E. coli. Although she said that there is “no magic tag” that works for all proteins, she mentioned that Roger Harrison (Univ. of Oklahoma) had software available that had some value in predicting a priori which tags might work best with a particular protein. Thioredoxin reductase (trxB) and glutathione reductase (gor) mutant hosts have also been used to create a more oxidizing environment for the folding of disulfide-containing proteins in the E. coli cytoplasm. Regarding rare codons, Dr. Morris showed how in some cases these could cause premature translation termination, resulting in truncated gene products and mimicking protein degradation problems; this could be addressed by co-expression of cognate tRNAs encoded by a suitable vector such as pRARE. The S-tag system (see above) provides a rapid and sensitive reporter for expression, since S-protein can be added to reconstitute active RNase, and this can be assayed using a fluorescent substrate. Dr. Morris also discussed highly parallel robotic approaches for small scale purifications and the use of “pDUET” vectors for co-expression studies. Finally, she reported on recent results with high-throughput expression screening and scale-up using baculovirus vectors and insect cells.
Dr. Stephen Chambers discussed protein production for structure-based drug design. He emphasized that he and his colleagues are pursuing a family-based strategy of protein expression and structure determination for certain therapeutic target classes (e.g., protein kinases); he pointed out that this endeavor shares some similarities with structural genomics approaches. There are differences, too, however: Vertex scientists are almost exclusively focused on human proteins, which can be quite complex and difficult to express in active form, and a high premium is placed on timely production of structures of high-value targets rather than just plucking the “low hanging fruit.” To increase the chances of success, they use a parallel approach of expressing targets in both E. coli and insect cells. If a given protein does not immediately exhibit robust expression in E. coli, they do all further work in insect cells. Remarkably, he reported that they can extrapolate conditions directly from expression results in deep well plates to fermentors. Insect cell culture reagents were reported to be about five times more expensive than E. coli media, etc., but the pay-off in terms of structures was about five times higher (86% of proteins attempted!) in insect cells. They have explored limited proteolysis and site-directed mutagenesis to increase solubility and crystallizability of target proteins. With a philosophy of “no protein left behind,” they also try refolding in an attempt to resurrect expressed but insoluble proteins.
Several short talks were presented that complemented invited speakers in this session.
Dr. Christopher Mehlin discussed computational approaches to domain prediction to aid the expression of difficult proteins from pathogenic protozoa. The SGPP is focusing on the structural genomics of pathogenic protozoa, for example, Plasmodium sp., Leishmania sp., and Trypanosoma sp. These are very challenging targets, and so far only ca. 5% of the full-length gene products have been expressed in soluble form using E. coli. To overcome this problem, Dr. Mehlin reported that they have attempted to express orthologues from multiple homologous genomes and have also applied domain parsing using an approach that they have termed “chunking” (see below). A panel of forty high-value targets (representing classes of enzymes with known inhibitors) has been chosen, since success with one or more of these might feed directly into an anti-infective drug discovery program. The chunking strategy involves domain prediction and creation of constructs containing all possible combinations of contiguous domains (e.g., a three-domain protein, N = 3, would yield N(N+1)/2 = 6 basic constructs). Prediction of exact domain boundaries can be tricky, though, so another combinatorial level of complexity involves using pairs of PCR primers to vary the exact start and stop points. By a combination of these two approaches, at least some fragments of previously intractable targets can be expressed in soluble form and crystallized.
This discussion was extended by Dr. Ming Luo, who presented an approach to systematic identification of protein domains for structure determination. In their studies of ORFs from C. elegans, the SECSG has cloned more than 7000 genes and expressed more than 3000, yielding more than 500 soluble proteins, of which ca. 200 were purified and 65 crystallized. Dr. Luo reported on the use of computational methods (analyses of homologies to Pfam domains) and experimental methods (limited proteolysis with mass spectrometry) to parse approximately 50 of these ORFs into individual domains. Upon expressing these identified domains, they were ultimately able to produce X-ray structures for 10% of the target proteins.
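The construct count quoted for chunking can be checked with a short enumeration. The following is only an illustrative sketch in Python: the function name and the domain labels D1-D3 are hypothetical and are not taken from the SGPP pipeline, and the variation of start and stop points by primer pairs is not modeled.

```python
def contiguous_domain_constructs(domains):
    """Return every run of adjacent domains as a tuple (one tuple per construct)."""
    n = len(domains)
    return [
        tuple(domains[start:stop])
        for start in range(n)
        for stop in range(start + 1, n + 1)
    ]


# Hypothetical three-domain protein; labels are placeholders.
constructs = contiguous_domain_constructs(["D1", "D2", "D3"])
for construct in constructs:
    print("-".join(construct))

# The count should agree with the closed form N(N+1)/2.
n_domains = 3
expected_count = n_domains * (n_domains + 1) // 2
print(len(constructs), expected_count)
```

Non-adjacent pairs such as D1 with D3 are deliberately excluded, since only contiguous domains can be expressed as a single uninterrupted fragment of the native sequence.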
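The attrition along the SECSG pipeline can be made explicit by computing stage-to-stage yields from the counts quoted above. This is a rough sketch: several counts are reported only as "more than" or "ca.", so the resulting fractions are approximate, and the stage labels are informal names chosen here.

```python
# Counts as quoted in the talk summary; most are approximate or lower bounds.
funnel = [
    ("cloned", 7000),
    ("expressed", 3000),
    ("soluble", 500),
    ("purified", 200),
    ("crystallized", 65),
]


def stage_yields(stages):
    """Return (from_stage, to_stage, fraction) for each consecutive pair of stages."""
    return [
        (prev_name, next_name, next_count / prev_count)
        for (prev_name, prev_count), (next_name, next_count) in zip(stages, stages[1:])
    ]


yields = stage_yields(funnel)
for prev_name, next_name, fraction in yields:
    print(f"{prev_name} -> {next_name}: {fraction:.1%}")

# Overall fraction of cloned genes that reached crystals.
overall = funnel[-1][1] / funnel[0][1]
print(f"cloned -> crystallized: {overall:.2%}")
```

On these numbers the steepest loss is between expression and solubility, which is consistent with the workshop's emphasis on solubility as a bottleneck.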
Dr. Stephanie Cabantous discussed directed protein evolution approaches and new protein solubility reporters. She reported on further refinements of the TBSGC GFP reporter fusion method of Waldo and coworkers. One innovation discussed was the use of circularly permuted GFP fusions whereby an N-terminal fragment of GFP is fused upstream of the gene of interest and a C-terminal fragment is fused downstream. Reconstitution of an intact and active GFP thus requires a full-length fusion product, and this prevents truncations of the gene product of interest due to proteolysis, internal translation initiation, etc., which with earlier generation GFP fusions could artificially give rise to fluorescent colonies. She also reported on an improved directed evolution strategy. The bulk of the talk, however, described a new “microdomain” GFP reporter whereby a small peptide fragment (“S11”) of GFP is fused to the gene of interest. The resultant fusion polypeptide is only fluorescent when a large GFP fragment (“S1-10”) is added in trans, giving rise to a signal by intragenic complementation. S11 has been engineered to minimize its influence on the folding and solubility of the gene of interest. By co-expressing the S1-10 fragment one can assess from the colony phenotypes the expression levels and (if the S1-10 fragment is induced sequentially) the solubility of the gene products of interest, enabling both of these critical properties to be assayed in vivo in a facile manner without resorting to biochemical analyses.
Dr. Andrei Alexandrov and his colleagues at the SGPP have undertaken to create a plasmid that can be heterodimerized in a tandem orientation with another plasmid of the same basic form but carrying a different gene, thus enabling facile co-expression of both gene products. The key to this technology is the incorporation of a special 61 bp sequence (termed “FLIP”) in each plasmid which, when digested with appropriate restriction enzymes and the T4 DNA polymerase 3’→5’ exonuclease, allows tandem head-to-tail heterodimeric plasmids to be assembled without interfering side reactions such as self-ligation or head-to-head/tail-to-tail dimer formation. Using an expression clone library prepared in this vector, co-expression of all pairwise combinations of the library’s gene products can be investigated. This was tested on P. falciparum protein pairs that had previously exhibited positive signals in a yeast two-hybrid screen. In one such case (a pair consisting of a ubiquitin-conjugating enzyme E2 and a ubiquitin-protein ligase E3) it not only facilitated expression of the two polypeptides but also allowed them to be purified as a complex.
Session II Alternative and Novel Prokaryotic Expression Systems
Session II focused on alternative and novel expression systems using prokaryotic hosts. In the first talk, Dr. Mark Fisher from the University of Kansas Medical Center addressed the issue of refolding insoluble or misfolded recombinant proteins in vitro using a combination of bacterial GroEL chaperonins and osmolytes. Recombinant proteins are captured by the chaperonin in a stable, partially-folded intermediate state that can be washed, concentrated, and also immobilized. Many proteins can be released from the soluble or immobilized chaperonin with ATP or ADP alone. Alternatively, various osmolytes, which may also contain potential ligands, can cause release of the folded protein. Arginine, sucrose, glycerol, proline, betaine, TMAO, and sarcosine have all been used successfully to produce folded proteins using the chaperone system. An array system has also been produced to screen small molecules to identify those that can prevent protein aggregation. Such a strategy is being utilized to test ligands with potential therapeutic value as they prevent the misfolding of mutant proteins that cause disease, such as cystic fibrosis and certain types of diabetes. The release of folded proteins in high concentrations from immobilized chaperonins in the presence of potential ligands using various osmolytes clearly has a great deal of potential utility in structural genomics research.
In the second invited talk, Dr. Jay Keasling, from the University of California at Berkeley, discussed the genetic tools that are available for manipulating recombinant protein production in Escherichia coli. The model system concerns enzymes involved in the complex metabolic pathway that is used by certain plants for the synthesis of artemisinin, the starting material for terpenoid-type drugs that are used on a world-wide basis. It was shown how gene expression levels can be altered by the copy number and also the stability of the recombinant plasmids, while bacterial artificial chromosomes (BACs) are stable indefinitely in the absence of selection pressure. Depending on the promoter that is chosen, gene expression levels can also be controlled by varying the number of induced cells as well as by varying induction in individual cells. In the case of a recombinant metabolic pathway, alternative strategies include all-or-none pathway control or regulated control with individual control elements. The issue of mRNA stability was also addressed where a cassette system and synthetic hairpins can be utilized with synthetic operons to control overall protein production. Clearly, the novel techniques that are being utilized in pathway design and optimization of gene expression in reconstituting metabolic pathways also have enormous utility in the high throughput production of recombinant proteins in E. coli and other recombinant hosts.
In the first of a series of short talks, Dr. Rosie Kim from the BSGC discussed on-column chemical refolding of protein inclusion bodies (IBs). The general strategy was to bind urea-solubilized IBs on a nickel-NTA column and elute with buffers containing detergent, cyclodextrin, and imidazole. The most efficient conditions were selected by a purification screen that can be operated in a high-throughput fashion. Seven out of ten proteins tested were successfully folded by this method in amounts sufficient for crystallization trials.
Dr. Philip Bryan of the University of Maryland Biotechnology Institute described a one-step affinity purification scheme for recombinant proteins that have been engineered to contain the C-terminal pro-region of subtilisin (proR58). The fusion protein is bound to a SBT-S column (Kd < 1 nM) and the target is cleaved in a slow, single turnover reaction to release the target protein while the processed proR58 is retained (Kd < 0.1 nM). Eight different recombinant proteins from both prokaryotic and eukaryotic sources have been purified so far using this method.
The expression in E. coli of genes encoding periplasmic proteins and soluble domains of membrane proteins was the topic of the short talk by Dr. Frank Collart of the MCSG. The SignalP algorithm was used to identify 200 targets from the Bacillus subtilis genome, and these were cloned and expressed using standard high-throughput protocols. Approximately 40% were found in the soluble fraction using tag-detection methods. These data indicate that secretory and periplasmic proteins, together with low complexity helical membrane proteins, are viable targets for SG pipelines, although improved methods are needed to predict signal sequences and the boundaries of the soluble domains of helical membrane proteins.
Dr. Steve Anderson of the NESG discussed the production of recombinant proteins using target genes from E. coli, C. elegans, Drosophila, and human and a variety of different expression systems. These included the cold-induced (pColdI) promoter, maltose binding protein fusions and ubiquitin-like protein fusions (both coupled with protease cleavage), and expression in the cell-free wheat germ system. The expression of fifty different eukaryotic target genes in eight different systems showed that each offered some advantages for certain genes over the standard pET system.
The final talk of the session was given by Dr. Mark Sullivan of the SGPP. This presentation focused on the production of single chain antibodies and their use in stabilizing recombinant proteins to facilitate crystallization. Fab fragments of monoclonal antibodies are typically used for such studies but their isolation is time-consuming and expensive. The SGPP is using phage display and a large, human single-chain Fv (scFv) library to select for antibodies that bind to target proteins from Leishmania major. A total of 33 distinct His-tagged scFvs were identified for eight target proteins. These were evaluated for complex formation using immunoaffinity chromatography and gel filtration. This approach clearly has a great deal of potential utility, especially for less stable proteins.
Session III Eukaryotic Expression Systems
Session III focused on the development of eukaryotic expression systems.
Dr. Don Jarvis – Insect cell expression utilizing baculovirus vectors is in common use and has demonstrated great success in expressing mammalian proteins. The system has the advantage of high-level expression using the polyhedrin late promoter as well as eukaryotic processing for glycosylation. Glycosylation can influence protein function. Insect cell glycosylation is somewhat different from mammalian glycosylation, producing an abbreviated paucimannose side chain rather than a sialylated complex side chain. Mammalian glycosylation can be achieved by providing the appropriate human glycosylation pathway in recombinant insect cells. Engineered insect cell lines have been developed and are commercially available for producing proteins with humanized glycosylation.
Dr. Michael Meagher – Large-scale protein expression for biopharmaceuticals requires system engineering and quality controls development. Quality control requires significant protein characterization including: N-terminal sequencing, tryptic digest and peptide map (LC-MS/MS), mass spectrometry, post translational modifications, amino acid analysis, isoelectric focusing, and bioassays. Pichia pastoris is a preferred expression host. High density expression levels for cost-effective production can be obtained. High-level expression requires strain evaluation and recombinant gene copy number evaluation. Fermentation parameters should be optimized. The most critical “raw material” is the cell line.
Dr. Sabine Geisse – Milligram quantities of recombinant proteins can be expressed using a transient transfection system. This approach provides a parallel option for protein expression utilizing the same materials as for cell-based expression. HEK.EBNA cells are the preferred cell line for expression because they can grow in serum-free media, can be made stable expressors, express secreted, membrane, and intracellular proteins, and are commercially available. Commercial vectors can also be used. Protocols were presented showing >10 mg/liter expression levels in 96-well format, sufficient for screening protein properties. Calcium phosphate or PEI can be used as transfection reagents at large scale. Large-scale plasmid production is a bottleneck. Multiple success stories were presented. An 80% overall success rate in large-scale transient transfection was achieved.
Dr. John Markley – Cell-free expression was presented as a parallel pathway for protein production with particular emphasis on labeling methods for NMR. It is also used for screening of potential targets from eukaryotic genomes for suitability for structural studies. Production of labeled proteins has been demonstrated on the scale of several milligrams. Assessment of this approach for high-throughput structure determination is ongoing, as is improvement of the technology through its use in a production environment. Evaluation of GST and His tag expression shows about a 50% success rate in soluble expression with 80-90% overlap in soluble targets. Cell-free methods are now being supported by robotic systems.
Short talks focused on automation systems. Heath Klock presented a semi-automated approach to cloning for expression. The approach utilizes a conventional restriction enzyme method for cloning but employs commercial robotics to process clones in a high-throughput manner. Fluorescent dye-based screening of diagnostic PCR is combined with automated gel loading for initial clone analysis. Sequence analysis is semi-automated with a Perl script. The semi-automated cloning robotics developed by the JCSG can routinely attempt 384 clones per week with one operator. Of the PCR amplifications generated, 89% have been successfully cloned, as verified by sequencing (76% of total attempted targets). This throughput can adequately supply the expression and purification pipeline with expression vectors of novel targets and truncations. He described methods used to semi-automate the entire cloning process. Liquid and plate handling along with motorized-lid thermocyclers from MWG Biotech have been integrated to set up and run PCRs, restriction digests, ligations, and DNA cleanups, as well as to load agarose gels and assay reaction wells for successful PCR amplifications. He summarized results of JCSG projects that included cloning and characterization of several hundred gene targets from a variety of bacterial genomes and from mouse. These results demonstrate the capabilities of the robotic platform to provide an avenue to high-throughput cloning that utilizes little manpower and is rapid and cost-effective.
Dr. Thomas Acton – The protein production pipeline was presented with an emphasis on parallel evaluation of varying expression constructs. These constructs are evaluated for expression at small scale utilizing commercial robotics systems. Conventional FPLC purification is followed by HSQC screening of proteins. Buffer optimization is performed utilizing light scattering. Overall, 60% soluble expression was achieved. Full-length proteins and protein fragments from fully sequenced eukaryotic organisms have been clustered into domain families, and those families that do not yet have a representative three-dimensional structure are targeted for investigation. Multiple homologues from each domain family are selected for study both from the eukaryotic "Target Proteomes" and from bacterial and archaeal "Reagent Proteomes." High-throughput cloning has been implemented in 96-well format using a Qiagen BioRobot 8000 together with Primer Prim’er, a web-based automated primer design program, and a pET-based “Multiplex Vector” expression system. The robotic protocols developed for this classical restriction endonuclease/ligase dependent cloning system have enabled the team to clone and express more than 1800 of the NESG targets. The platform includes robotic 96-well protein expression screening technology together with 96-well affinity purification methods. Parallel large-scale fermentation and purification of tens-of-milligram samples of proteins suitable for crystallization, X-ray crystallography, and NMR analysis have been developed. Protein samples are screened by analytical gel filtration in combination with static light scattering to identify conditions favoring monodispersity, providing samples more suitable for crystallization and 15N-1H HSQC NMR screening. The scalable platform currently has a capacity for cloning and expression screening of ~5000 proteins per year, and for production at the tens-of-milligram scale of some 400-600 proteins per year.
Dr. Jeff Bonanno presented a high-throughput approach to defining protein domains using partial proteolysis and mass spectrometry. The NYSGXRC has built a cost-effective, high-throughput platform for x-ray crystallography. The platform involves technology development and integration in molecular biology applications as well as in protein purification and structure determination. The next step in upgrading this platform is to develop approaches that deal with difficult protein systems and poorly diffracting protein crystals. A biochemical method that combines limited proteolysis and matrix-assisted laser desorption/ionization mass spectrometry has been shown to efficiently determine protein domains that can have improved crystallization properties. The method is based on inferring structural information from the degree of protection against enzymatic proteolysis, as governed by solvent accessibility and protein flexibility. This approach can be supported by bioinformatics analysis. Trypsin, lysC, gluC, chymotrypsin, and detergents are evaluated with individual proteins. Digests are mixed with matrix and evaluated by MS. Data analysis utilizes PAWS to identify stable domains for crystallization; the analysis performs peptide identification and sequence alignments for optimal domain definition. The method has been applied to 270 targets with a minimum of personnel and 6 months of effort.
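The two success rates quoted for the JCSG cloning platform are stated against different denominators, so the PCR success rate they imply can be back-calculated. The sketch below assumes that every verified clone came from a successful PCR amplification; the variable names are illustrative only.

```python
# Figures quoted in the talk summary.
cloned_per_successful_pcr = 0.89   # verified clones / PCR amplifications generated
cloned_per_attempted_target = 0.76  # verified clones / total attempted targets
clones_attempted_per_week = 384

# If cloned/attempted = (PCR ok/attempted) * (cloned/PCR ok), then:
implied_pcr_success = cloned_per_attempted_target / cloned_per_successful_pcr

# Expected number of sequence-verified clones from one week of attempts.
verified_clones_per_week = clones_attempted_per_week * cloned_per_attempted_target

print(f"implied PCR success rate: {implied_pcr_success:.1%}")
print(f"verified clones per week: {verified_clones_per_week:.0f}")
```

Under that assumption roughly 85% of attempted targets amplify successfully, and a full week of 384 attempts would be expected to deliver on the order of 290 sequence-verified expression clones.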
Session IV Methods to Minimize Sample Heterogeneity and Improve Crystal Diffraction
Session IV focused on new methods to minimize sample heterogeneity and improve crystal quality. Dr. Rusten Ismagilov from the University of Chicago presented a microfluidic system for protein crystallization using nanoliter volumes. A nanodroplets microfluidic system that relies on chaotic advection to rapidly mix multiple reagents isolated in droplets (plugs) was described for protein crystallization. Due to rapid mixing, low sample consumption, and transport of reagents with no dispersion, the system is particularly appropriate for chemical kinetics and biochemical assays as well as for screening a wide range of crystallization conditions. The mixing occurs by chaotic advection and is rapid (sub-millisecond). Screening protein crystallization using microfluidics showed that the rate of crystal nucleation is low in nanoliter volumes. It appears that the nucleation rate is dependent on the surface area not to the volume. The system can be used to probe a wide range of crystallization conditions.
Dr. Thomas Record from the University of Wisconsin, Madison, discussed ASA-based thermodynamic analysis of solute effects on protein processes and its importance to protein stability, interaction and crystallization. Solute effects arise from preferential interactions as solute and water compete for the biopolymer surface. The vapor pressure osmometry (VPO) is an efficient and accurate method for characterizing preferential interactions. Biopolymer surfaces range from nonpolar and uncharged to highly charged, for example, native protein surface is 20-30% charged. Studies of preferential interaction with native BSA showed that urea (denaturant) is weakly accumulated near native BSA surface and betaine (osmolyte) is strongly excluded (from anionic carboxylate oxygens). However betaine has qualitatively different interactions with different surfaces. For the homologous series of surfaces exposed in unfolding globular proteins (with similar surface compositions and a wide range of ASA, values of preferential interaction coefficient G for preferential interactions of urea and GuHCl are proportional to m3 and to ASA, and Kp is the same for all proteins in the series. Analysis of the exclusion of glycine betaine (GB) from different biopolymer surfaces indicates that GB is completely excluded (Kp = 0) from anionic (carboxylate, phosphate) oxygen surface and that hydration of this anionic surface is 2 layers of water (0.23 H20/A2). GB therefore drives biopolymer processes in which anionic surface is dehydrated. Urea accumulates at polar amide surface of proteins and nucleic acid bases (Kp = 1.8 if hydration is a monolayer); urea appears to be weakly excluded from anionic oxygen surface. The analysis shows that it is possible to quantitatively predict effects of urea, GB on biopolymer processes from structure (∆ASA; composition). In absence of structure, one can interpret effects of urea, GB in terms of ∆ASA if one assumes a particular surface composition.
Short presentations focused on several issues associated with crystallization and crystal optimization. Dr. Jarmila Jancarik, from the University of California, Berkeley, presented optimum solubility (OS) screening as an efficient way to optimize buffer conditions for homogeneity and crystallization of proteins. Aggregation and polymerization are major concerns in obtaining good quality proteins that can lead to successful crystallization results. The problem is becoming more prominent with attempts to crystallize a variety of proteins and protein complexes that suffer from aggregation and misfolding. In developing techniques to automate protein purification, one tries as much as possible to use a set of generic buffers. Very little is known a priori about the properties of the proteins being studied, except for theoretical pI, molecular weight and amino acid composition. General practice is to use one or two favorite buffers in which pH and salt concentration are the only variables. However, a protein has complex properties, and its condition and behavior depend very much on its environment. A screen was developed in which a panel of buffers and additives is tested in order to find the most suitable condition for proteins that usually aggregate and cannot be concentrated for crystallization experiments. The panel of buffers was tested using the hanging-drop method and vapor diffusion equilibrium. After monitoring precipitation, the conditions leading to clear drops were selected for dynamic light scattering (DLS) characterization. Only 24 µl of protein are required for this part of the screen. If the DLS results are not optimal, a series of additives is tested in the presence of the best buffer, and DLS is again used to determine the best condition.
They have tested 12 poorly behaving proteins so far: after screening with the buffer panel, 10 of them had highly improved DLS results and could be concentrated well after exchanging the initial buffer, and 4 of those 10 have crystallized.
Jakub Bielnicki from the University of Virginia presented results on the enhancement of protein crystallization and crystal quality by rational surface mutagenesis of selected SG targets. Protein crystallization constitutes a limiting step in structure determination by X-ray diffraction. Flexible surface amino acid side chains may be one cause of difficulties in crystallization. Recent studies show that targeted mutagenesis of surface patches containing residues with large flexible side chains, replacing them with smaller amino acids, leads to effective preparation of X-ray quality crystals of proteins otherwise recalcitrant to crystallization. Furthermore, this technique can also be used to obtain crystals of superior quality compared with those grown from the wild-type protein, sometimes improving the effective resolution by as much as 1 Å or more. Secondary structure and fold predictions and non-conserved residues are used as a guide to design mutations of Lys/Glu patches to Ala. Using this approach, four MCSG targets that failed initial crystallization screening have been resurrected. The YdeN protein from B. subtilis was given as an example. Of the three double mutants prepared, i.e. E124A/K127A, E167A/E169A and K88A/Q89A, the last gave high-quality crystals, and the crystal structure was solved by SAD at 1.8 Å resolution. Several recent examples of this new methodology suggest that it has the potential to become a routine tool in protein crystallography.
Dr. Rebecca Page presented results of shotgun crystallization of the T. maritima proteome, including the protein properties and crystallization conditions that correlate with crystallization success. As new technologies are developed in the field of structural proteomics, novel strategies are needed to efficiently crystallize large numbers of protein targets. One strategy, developed for the high-throughput structure determination of the T. maritima proteome, is to determine which proteins have a propensity for crystal formation and then to follow up with focused crystallization attempts. This experimental effort has resulted in over 325,000 individual crystallization experiments, yielding 456 crystallized proteins from 1376 attempted clones. As such, it has provided one of the most extensive, systematic datasets to date of commonly used crystallization conditions against a wide range of proteins. Analysis of these data has enabled the JCSG to streamline its structure determination pipeline for targets with a natural propensity to crystallize. Specifically, they have reduced the number of initial screening trials from 480 conditions at two temperatures to either 288 or 96 at a single temperature, depending on the amount of protein available, since all of the T. maritima proteins that readily crystallized could be identified using just 23% of the original conditions. In addition, they screen tier 1 and tier 2 samples separately, as proteins processed in tier 2 (expressed in the presence of selenomethionine with extensive purification) crystallize in conditions distinct from their tier 1 counterparts (native/minimal purification). Finally, they find that the two-tiered strategy is extremely successful at predicting which proteins will readily crystallize: more than 99% of the proteins identified as having a propensity to crystallize under non-optimal conditions did so again as selenomethionine derivatives during the focused crystallization trials.
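The figures quoted above can be turned into rates with a few lines of arithmetic. The following is a minimal sketch that only restates numbers from this report; the function names are hypothetical, and rounding the 23% figure up to a whole number of conditions is an assumption made here, not something stated by the speaker.

```python
import math

# Figures quoted in the report for the T. maritima shotgun crystallization effort.
ATTEMPTED_CLONES = 1376
CRYSTALLIZED_PROTEINS = 456
ORIGINAL_CONDITIONS = 480      # initial screen size (run at two temperatures)
INFORMATIVE_FRACTION = 0.23    # share of conditions that identified all ready crystallizers


def success_rate(crystallized: int, attempted: int) -> float:
    """Fraction of attempted clones that yielded a crystallized protein."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    return crystallized / attempted


def informative_condition_count(total_conditions: int, fraction: float) -> int:
    """Number of conditions covered by the given fraction (rounded up: an assumption)."""
    return math.ceil(total_conditions * fraction)


if __name__ == "__main__":
    rate = success_rate(CRYSTALLIZED_PROTEINS, ATTEMPTED_CLONES)
    count = informative_condition_count(ORIGINAL_CONDITIONS, INFORMATIVE_FRACTION)
    print(f"Crystallization success rate: {rate:.1%}")
    print(f"Conditions in the informative subset: {count} of {ORIGINAL_CONDITIONS}")
```

On these numbers, roughly one attempted clone in three crystallized, and the informative subset is on the order of 110 conditions.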
Dr. Deirdre Meldrum, from the University of Washington, discussed a high-throughput, capillary-based system for protein crystallography. Crystal growth is an essential step in obtaining three-dimensional structural information about biomacromolecules such as proteins by X-ray diffraction. The long-term goal is to develop a fully automated Protein Crystallography Sample Processing System that provides high-throughput sample processing in a single unbroken pipeline, extending from incoming purified proteins to outgoing crystal samples ready for crystallographic analysis. Ultimately, this capillary-based system will include hardware and software for automated preparation of picoliter to nanoliter reaction volumes, efficient growth of samples in capillaries in a temperature- and humidity-controlled environment to traverse the phase space as desired, image processing and associated optics to detect and classify crystals, and retrieval of desired samples for X-ray diffraction analysis. The current project covers the proof-of-principle steps toward the long-term goal. They reported progress in piezoelectric reagent dispensing for various proteins and precipitants, imaging approaches for identifying and classifying crystals, and protein crystallography experiments on their automated system, ACAPELLA, in the SGPP. Protein crystals have been successfully grown in capillaries and subsequently cryo-cooled in situ. High-resolution data have been collected at the ALS synchrotron (Berkeley, CA).
Dr. Robert Thorne from Cornell University discussed how to grow large protein crystals for diffraction experiments. Fundamental studies by several groups over the last decade have established how macromolecular crystals grow and why they stop growing. Impurities play a key role in causing cracks and other macroscopic disorder, and in limiting crystal size. A growing crystal can relieve strain by cracking, by forming twin boundaries, and by forming small-angle grain boundaries. Strain relief by these mechanisms is more probable in larger crystals. Polycrystalline balls, excessively twinned crystals, or cracked crystals usually arise from inadequate growth solution purity combined with excessive initial growth rates. Microseeding into solutions of reduced supersaturation for initial growth can yield reduced-stress cores for subsequent growth. He also discussed the role of impurities in preventing growth below a minimum supersaturation. If some of the impurities bind irreversibly and the growth rate remains zero for an extended time, the crystal surface will eventually be covered by irreversibly bound impurities, and growth will cease. Raising the supersaturation by adding more protein will then not cause growth to resume. Protein degradation and impurity segregation cause the effective impurity concentration within a crystallization drop to increase with time, increasing the supersaturation required to sustain growth. Therefore, supersaturation should always be maintained above the threshold for surface impurity coverage. He recommended macroseeding, frequent changes or additions of growth solution, and the use of freshly purified protein at each step.
Dr. Irina Dementieva from Argonne National Laboratory presented parts of the high-throughput structure determination pipeline developed at the Midwest Center for Structural Genomics (MCSG). She placed special emphasis on the crystallization bottleneck, since considerable attrition occurs between producing purified proteins and obtaining diffraction-quality crystals. This process includes protein concentration and storage, initial crystallization screening, diffraction testing, crystal optimization, obtaining heavy atom derivatives, and finding cryo-conditions. An attempt was made to optimize these steps using the large amount of data obtained during implementation of the MCSG structure determination pipeline. In the MCSG crystallization pipeline, proteins are purified in a few chromatographic steps using a semi-automated AKTA-3D system, then concentrated and stored in liquid nitrogen. Initial crystallization screens are set up in 96-well format with a nanoliter crystallization Honeybee robot using commercial crystallization screens, and droplets are incubated at two different temperatures. The screen setup is completed in less than two hours. Crystals are cryo-protected using a set of standard solutions and tested for diffraction properties. Rapid feedback from diffraction experiments is a key element of crystal optimization. Crystals are optimized with homemade screens produced with Crystal Monitor software on a Matrix Maker. A basic set of 60 solutions was chosen to cover a majority of hits with polyethylene glycols, salts, additives and a wide range of pH, yielding a flexible sparse- or grid-type approach for screening and optimization. For a majority of proteins a seleno-methionine derivative can be prepared; other derivatives are tried only if Se-Met labeled protein cannot be obtained. In the past several months, this approach has been tested on nearly 600 soluble proteins purified from B. subtilis, B. stearothermophilus and Aquifex aeolicus.
Macroscopic crystals were obtained for 35% of the protein samples, more than half of which were of diffraction quality and did not require significant optimization. The MCSG database is used to further refine and optimize crystallization and crystal handling procedures.
Dr. Zhi-Jie (James) Liu from the Southeast Collaboratory for Structural Genomics (SECSG) discussed crystal salvaging efforts. Crystal salvaging is defined as work that goes beyond additional crystallization screening to produce high-quality diffraction crystals. If routine screening fails to crystallize the protein target, or if the crystal quality prevents X-ray analysis, the protein is diverted to the crystallomics group, a newly established unit within SECSG. A variety of salvaging pathways are employed, including reductive methylation of surface lysine residues and alternative expression/purification approaches. A procedure for effectively combining crystal screening and optimization was introduced.
Joseph Luft from SGPP discussed the use of high-throughput screening (HTS) to determine lead crystallization conditions. An HTS facility located at HWI in Buffalo screened 121 SGPP macromolecular samples for crystallization leads from July to December 2003. The 121 samples were used to set up 185,856 unique microbatch-under-oil crystallization experiments. Images of these experiments underwent manual inspection. Of these 121 samples, 38 produced ‘definite leads’, 36 ‘questionable leads’ and 47 ‘no leads’. Samples with ‘definite leads’ had an 80% reproducibility rate in Seattle during follow-up crystallization trials. A new Greiner plate with a round well has been successfully tested. The HTS facility was also used for challenging projects. A series of co-crystallants, freezers and glues, designed to promote crystallization through intra- and intermolecular interactions between macromolecules, is under development in Seattle. Nine of these co-crystallants were investigated at the Buffalo HTS laboratory in a study that included 72,192 experiments (9 co-crystallants, 7 macromolecules and 1536 crystallization cocktails). Outcomes from 43,008 of these experiments were manually scored and the results were tabulated. In the majority of cases, fewer crystal hits were observed in the presence of the co-crystallants. However, one macromolecule produced significantly more crystals in the presence of a co-crystallant. Growth of co-crystallant/macromolecule complex crystals suitable for X-ray diffraction analysis is underway. Structures determined from these crystals will be analyzed in Seattle to locate any specific molecular interactions that may be taking place.
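The screening numbers above lend themselves to a quick consistency check. The sketch below uses only figures quoted in this report; the function names are hypothetical, and treating every sample as having received the same number of experiments is an assumption made for illustration.

```python
# Figures quoted in the report for the Buffalo HTS screen (July to December 2003).
SAMPLES = 121
TOTAL_EXPERIMENTS = 185_856
LEADS = {"definite": 38, "questionable": 36, "none": 47}
DEFINITE_REPRODUCIBILITY = 0.80

# Figures quoted for the co-crystallant study.
COCRYSTALLANT_EXPERIMENTS = 72_192
COCRYSTALLANT_SCORED = 43_008


def experiments_per_sample(total: int, samples: int) -> int:
    """Experiments per sample, assuming an equal number per sample (an assumption)."""
    per_sample, remainder = divmod(total, samples)
    if remainder:
        raise ValueError("total is not an even multiple of the sample count")
    return per_sample


def lead_fractions(leads: dict[str, int]) -> dict[str, float]:
    """Share of samples falling in each lead category."""
    total = sum(leads.values())
    return {category: count / total for category, count in leads.items()}


if __name__ == "__main__":
    print("Experiments per sample:", experiments_per_sample(TOTAL_EXPERIMENTS, SAMPLES))
    for category, share in lead_fractions(LEADS).items():
        print(f"{category:>12}: {share:.1%}")
    expected = LEADS["definite"] * DEFINITE_REPRODUCIBILITY
    print(f"Definite leads expected to reproduce: about {expected:.0f}")
    scored = COCRYSTALLANT_SCORED / COCRYSTALLANT_EXPERIMENTS
    print(f"Co-crystallant experiments manually scored: {scored:.1%}")
```

The per-sample count comes out to a whole number that matches the 1536 crystallization cocktails quoted for the co-crystallant study, and the three lead categories account for all 121 samples.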
Lori Anderson from SGPP described protein crystal growth for eukaryotic pathogenic protozoa. The Protein Crystal Growth Unit of SGPP receives flash-frozen protein samples in 96-well trays, which are checked for quality by SDS-PAGE, native PAGE, dynamic light scattering and limited proteolysis. Top priority is given to “definite hits” discovered in the High Throughput Screening Center in Buffalo, which have a success rate on the order of 50% for well-purified SGPP protein samples. First-generation follow-up experiments employ different strategies, including the Oryx robot for vaporbatch under oil and vaporbatch sitting drop procedures with incomplete factorial matrices, as well as the Hydra-plus-One robot in sitting drop mode with systematic follow-up matrices. Crystal imaging is performed by the RoboDesign Microscope II robot and images are stored in the RoboDesign CrystalMation Database. Subsequent generations of optimization involve a variety of methods, including additive screens and micro-seeding. The success rate of optimizing “Definite Buffalo Hits” is on the order of 80%. So far, diffraction-size crystals have been obtained for 23 proteins. Correlations between SDS gels, DLS, limited proteolysis and crystallization success rate were presented. All diffracting crystals were stable in proteolysis tests and most have good DLS. The Acapella 5K crystallization robot is being tested for optimization of crystal growth in capillaries. Data files derived from live images taken by the Acapella cameras are being used to optimize the high-resolution imaging and image storage in the RoboDesign system.
Dr. Igor Jurisica from Ontario Cancer Institute discussed classification and data mining of high-throughput protein crystallization screens. One of the important issues is automated detection of crystals. Different approaches and algorithms have been tried. Data include false positive as well as false negative images. Data mining of results allows for comparisons and extracting trends that can be used in planning and evaluation of new experiments.
Session V Membrane-Associated Proteins
Presentations in Session V focused on production and characterization of membrane proteins for structural biology applications. Dr. Reinhard Grisshammer from the National Institute of Diabetes and Digestive and Kidney Diseases/NIH discussed large-scale expression and automated purification of G-protein-coupled receptors for structure determination. G-protein-coupled receptors (GPCRs) are integral membrane proteins involved in many important physiological processes, including cell-to-cell communication, mediation of hormonal activity and sensory transduction. This presentation addressed expression and purification of functional GPCRs. It covered (1) comparison of methods for over-expression of GPCRs in various host systems, (2) use of a maltose binding protein fusion approach in E. coli, (3) analysis of factors influencing the expression levels and stability of GPCRs, and (4) large-scale automated purification. One of the successful constructs consisted of a signal peptide-MBP-neurotensin receptor (NTR) gene-trxA-His10. This fusion protein was successfully purified using a Ni-NTA step followed by a neurotensin column. Use of 30% glycerol was necessary to stabilize the purified protein. Functional expression of the NTR fusion protein in E. coli was achieved. Crystallization trials of NTR are currently in progress.
Dr. William Cramer from Purdue University discussed purification and crystallization approaches for integral membrane proteins involved in transport across membranes. The structure of the energy-transducing cytochrome b6f complex of oxygenic photosynthesis from a cyanobacterium (8 subunits, molecular weight of 110 kDa) was presented. The talk emphasized that in this case, when the complex was purified to a high level, it became delipidated and thus unstable. These initial preparations degraded and lost activity over the course of a week. Once it was understood that delipidation had occurred, a pure synthetic lipid, DOPC, was added at a ratio of 10 molecules per monomer immediately after the last purification step, which allowed good quality crystals to appear overnight. These crystals resulted in a 3 Å structure of the dimeric complex. It was emphasized that membrane proteins should not be stored frozen.
The first short talk in this session was given by Dr. Robert Nakamoto from the University of Virginia. He discussed a genomics approach to expression of membrane proteins from Mycobacterium tuberculosis. Predicted membrane protein genes from Mycobacterium tuberculosis (228 ORFs) were cloned and 150 were expressed. Approximately 50 were expressed at greater than 5 mg/L. Good expression success was observed when targets carrying a C- or N-terminal His6 or His10 tag were expressed in the E. coli C43 strain. He emphasized that it is important to test tags at both the N- and C-terminus, and that it is also important to screen detergents. The formation of intracellular lipid bilayers (membrane tubes) in E. coli was observed by electron microscopy. Promising 15N-labeled proteins that had been refolded were obtained.
Dr. Linda Columbus from JCSG discussed expression and purification of α-helical membrane proteins from Thermotoga maritima for NMR structural studies. Fifty predicted α-helical transmembrane proteins from T. maritima were cloned using an N-terminal His tag fusion and expressed in E. coli. The membrane fractions were solubilized with n-dodecyl-β-D-maltoside and purified by Ni2+-affinity chromatography. A screen of twelve different detergents was tested to optimize solubility and protein yields. It was shown that 1D 1H NMR spectroscopy is suitable for characterizing and evaluating the overall fold of the proteins.
Dr. Mark Dumont from SGPP discussed expression and purification of membrane proteins from Leishmania major for structural genomics. Targets with two or more transmembrane regions from L. major were cloned and expressed in E. coli and Pichia pastoris. Using ligation-independent cloning, fusion proteins carrying a calmodulin-binding peptide and a His6 tag were expressed and later cleaved with rhinovirus 3C protease. He has had better experiences with E. coli strain BL21 than with C43. The membrane proteins were solubilized with 0.5% fos-choline-16, and the detergent was later exchanged for a more suitable one. Of 50 proteins expressed in E. coli, 20 could be expressed at levels sufficient for detection in lysates by Coomassie Blue staining.
Dr. Han-Seung Lee from SECSG presented the development of methods for heterologous membrane protein production and purification. Fifty ORFs containing one or more transmembrane domains were selected from Pyrococcus furiosus for expression. The study showed that 25 ORFs out of 50 that did not express in BL21(DE3) gave positive signals in C43(DE3). Those ORFs with fewer predicted transmembrane regions were more likely to express. The proteins were purified using octyl-β-D-glucopyranoside. Purification was accomplished by metal affinity chromatography and size exclusion chromatography.
The meeting summary was presented by Dr. Andrzej Joachimiak from Argonne National Laboratory. He emphasized that the workshop demonstrated major progress over the last year in technology development and in the quality and throughput of protein expression, purification, and crystallization. Presentations also demonstrated that the PSI pilot centers have developed and implemented protein structure determination pipelines capable of producing a significant number of novel protein structures. These pipelines are supported by semi-automated robotic platforms for gene cloning, protein purification and crystallization. More significantly, these high-throughput approaches are also being applied to “high hanging fruit” and “high-value” targets.
At present, a majority of protein production is done in E. coli expression systems. Efficient protein expression in E. coli still encounters some bottlenecks that are being addressed:
- Inefficient transcription – optimizing mRNA (synthetic genes)
- Inefficient translation – optimizing codon usage, alternative expression of rare tRNAs (pRARE, Magic, etc.), use of synthetic genes
- Inefficient folding – approaches to protein refolding (use of chaperones, cofactors in vivo and in vitro, designing chaperonin/osmolytes folding array system)
- Screening for “good” protein expression systems and scale up problems
- Screening for different vector/host combinations
Meeting participants expressed the need for a more flexible and broader cloning strategy that covers, in addition to E. coli, yeast (Pichia pastoris), insect cells/baculovirus, eukaryotic cells, and cell-free expression systems. For example, the baculovirus-insect cell system can be used effectively for production of recombinant glycoproteins with correct post-translational modifications. Eukaryotic expression technologies such as transient transfection into eukaryotic cells can potentially provide an alternative to bacterial and insect cell systems. Application of a wheat germ-based cell-free expression system for NMR and X-ray crystallography was discussed as an alternative to E. coli, and a new robotic system was presented. The challenge is to make these alternative approaches high-throughput and cost effective.
Several speakers discussed the use of affinity tags to aid protein purification. The workshop showed quite clearly that there is no “magic” tag and that alternative tags need to be evaluated and tested for each specific protein target (N- vs C-terminal tag, His-tag, Trx-tag, GST-tag, Nus-tag, S-tag). Expression of fusion proteins often improves expression levels and solubility, but cost needs to be considered when large-scale protein production is planned.
Protein solubility remains a major issue in protein expression and purification. Thus far it is very difficult to predict the solubility of proteins expressed in E. coli, and computational predictions are not very reliable. A number of potential solutions have been proposed and discussed. For example, co-expression of protein pairs may improve expression and solubility. Similarly, co-expression of the target protein with high-affinity single-chain antibodies may improve protein solubility and stability and aid crystallization. Protein super-chunking and better domain identification may help to express more stable domains. Proposed experimental approaches include protein evolution using a split GFP assay, limited proteolysis combined with mass spectrometry to identify stable domains, and the use of orthologues to identify proteins with better solubility properties. For proteins that are expressed as inclusion bodies, on-column chemical refolding or the use of chaperones combined with osmolytes may provide alternative solutions.
Strong progress has been observed in protein production. New approaches and hardware allow for rapid parallel purification of hundreds of proteins. One-step purification and processing of fusion proteins showed good promise. Examples included the use of engineered subtilisin and TEV protease for on-column cleavage.
Impressive progress has also been observed in membrane protein expression and purification. Successful expression of functional membrane proteins in E. coli and other systems was presented. High-throughput approaches to expression of membrane proteins are being applied to a relatively large number of targets from both bacterial and eukaryotic sources. Quite successful expression of periplasmic domains and soluble domains of membrane proteins in E. coli was shown. Automation of large-scale purification of membrane proteins is being developed in several centers, and the use of NMR for characterization of small membrane proteins was presented.
Crystallization shows the highest attrition rate in the structural genomics pipelines. High-throughput crystallization has been implemented in several pilot centers. The approach is based on crystallization screening aided by robotic systems. Crystal optimization protocols are being implemented. High-hanging fruit targets are being addressed using several different approaches. One of the most promising involves rational surface mutagenesis to lower protein surface entropy and enhance crystallization. Several crystals have been produced using this approach for proteins that failed in the traditional pipeline. The role of osmolytes in protein stability and crystallization also offers an interesting alternative.
Several presentations touched on the issue of databases and the integration of pipelines with LIMS and databases. Questions were raised about how to capture all relevant data and how to mine databases to improve the process.
At present, a significant fraction of proteins targeted by the pilot centers are “left behind.” New approaches and strategies need to be developed for higher output. The workshop showed that there is still room for improvement. A number of questions remain unanswered and need further research. However, examples of advanced metabolic engineering showed clearly that we can make protein expression better. Similar advances in protein purification and crystallization, some presented by colleagues from the biotech industry, gave participants of the workshop a good feeling of work well done.