Protein Structure Initiative "Bottlenecks" Workshop

April 14-16, 2008

Session 1: Homology Modeling in Structural Genomics
Session 2: Data, Databases and Knowledge
Session 3: An International Perspective on Structural Genomics
Session 4: Mini-Session - Targeting for Success
Session 5: Mini-Session - High-Throughput NMR Methods
Session 6: Traditional Bottlenecks
Session 7: The Really Hard Stuff

Improving methodology and technology to permit more rapid, less expensive and more predictable determination of the structures of protein targets remains a major goal of the NIGMS Protein Structure Initiative (PSI). In the present production phase of the PSI, substantial progress has been made, but the challenges of this objective remain high. The 2008 PSI "Bottlenecks" meeting was held at the Natcher Conference Center on the NIH campus in Bethesda, Maryland, on April 14-16 to stimulate communication and dissemination of recent progress in the constituent tasks of protein structure determination in a high-throughput setting. The nearly 150 registered meeting participants included scientists from the four Large-Scale Production Centers and the six Specialized Centers, as well as investigators supported by R01, R21, SBIR and STTR research project grants. As in the past, the topics of gene cloning, gene expression, protein purification and crystallization, and rapid NMR structure determination were highlighted in the meeting program. In addition, for this year's meeting the organizing committee adopted the field of homology modeling as a topical highlight. This inclusion brought investigators allied to the PSI through two P50 center awards, together with several new R01 investigators, into the community of "Bottleneckers". Following the model of previous successful workshops, the program of oral and poster presentations emphasized the sharing of both failures and successes. The main goal of the meeting was, as always, to provide an effective platform for scientists to exchange ideas and data, to discuss progress and problems, and to encourage contacts and collaborations among participating investigators.

Dr. Jeremy Berg, the director of NIGMS, opened the workshop and welcomed the participants. Dr. John Norvell, director of the PSI, introduced the workshop and discussed the current progress and the future of the PSI program, particularly in the context of the recent evaluation of the Protein Structure Initiative by representatives of the National Advisory General Medical Sciences Council. The program on the first day began with an invited keynote presentation by Dr. Michael Levitt of Stanford University, in which he discussed the emerging pictures of protein structure and sequence spaces.

The notes that follow provide an account of the oral program of the 2008 PSI "Bottlenecks" workshop as prepared by members of the organizing committee: Brian G. Fox, University of Wisconsin; Celia Goulding, University of California Irvine; Michael G. Malkowski, Hauptman-Woodward Medical Research Institute; Lance Stewart, deCODE biostructures; Ashley Deacon, Stanford Synchrotron Radiation Laboratory; and Ward Smith of the Cell Biology and Biophysics division of NIGMS.

Session 1: Homology Modeling in Structural Genomics

Growth of Novel Structure and Sequence and Their Relationship (Keynote Presentation)
Michael Levitt, Stanford University

One criticism from the PSI assessment panel was the perception that sequence space is unbounded and that determining the folds that map sequence space is therefore an unattainable goal. Although sequence space appears to be growing without bound, as represented for example by the Global Ocean Survey, the growth of novel folds is difficult to assess. One reason is the phenomenon of classifier fatigue in projects like SCOP. Another difficulty is deciding how to cluster sequences for analysis, and the definition of structural novelty is itself problematic. One useful approach to clustering is to examine clusters of non-redundant domains (from NCBI CDART). Clustered this way, the number of domain patterns appears to be saturating. The number of novel sequences is still growing, but much more slowly than the total number of sequences. Furthermore, Structural Genomics efforts have contributed 40% of known novel domains. Combinations of existing domains are growing much more quickly. This suggests that new structural information will come from understanding the structures of sequences built from combined domains.

Modeling Protein Sequence Space with All PDB Structures as Templates
Diana Murray, Columbia University

Two examples of the use of modeling in combination with experimental studies were presented. One example is phosphoinositide (PI) signaling. Modeling tools have been used to characterize domain families that bind PI. PI lipids are distributed differently within various organelles. Proteins that bind PI typically have two or more motifs, which allow for temporal and spatial separation of binding events. Modeling the electrostatic surface potential and hydrophobic effects of membrane-binding proteins provides useful insights. Electrostatic interactions drive binding initially; desolvation together with hydrophobic interactions then drives membrane binding. Another example is the C2 domain of phospholipase C-delta, in which Ca++ binding produces a basic charge that drives the domain to the membrane. For the 5-LO C2 domain, by contrast, Ca++ binding ends up making the protein more negative, so that it dissociates from the membrane. The goal is to characterize as many lipid binding domains as possible. Together with modeling, this will help guide and interpret membrane binding measurements. Protein function may then be annotated based on these data. A tool developed for this purpose is SkyLine, a high-throughput comparative modeling pipeline. The results are stored in Skybase and can be searched. The system allows outside users to submit sequences for modeling. The C2 domains of all isoforms of phospholipase C-delta have now been modeled. Predictions based on electrostatics and Ca++ binding to these domains generally correlate well with experimental data from vesicles. Furthermore, mutagenesis studies can be planned at the residue level from the models to test functional effects. The combination of modeling with experimentation can help prioritize proteins for structure determination so that the resulting structure has the best possible impact on understanding.

Development of SCWRL4 for Improved Prediction of Protein Side-Chain Conformations
Georgy Krivov, Fox Chase Cancer Center

The goal of this effort is to optimize side-chain conformations. One assumption is that side chains adopt a finite set of conformations called rotamers. The approach is to build a database of rotamers and neighboring residues. Tree decomposition provides a sparser tree that can be searched more rapidly for rotamers and allows SCWRL4 to handle larger proteins with a global optimum approach, making it possible for the method to perform well as an energy function plus rotamer search. The accuracy of this combinatorial algorithm depends on the quality of the rotamer library and of the energy potentials. The H-bond potential is important for understanding side chain interactions and the anisotropic bond potential. The suggestion is to use 5 parameters to predict H-bonds. This has been shown to be capable of predicting side chains of proteins in crystal contacts. Knowledge of crystal symmetry and surface accessibility can be used to better predict residues in crystal contacts. By tuning the parameters of the Flexible Rotamer Model, an iterative approach to modeling side chain conformations based on neighbor relations can be carried out. Side chains with better electron density (Phe, Tyr and Leu, for example) are easier to predict with this model. The model predicts stable interactions well. For other residues, such as Gln, the accuracy may not be as high. The new SCWRL4 program offers a Microsoft Windows-compatible DLL, giving users an improved interface to the program.

Distance Matrix-Based Approach to Protein Structure Prediction
Andrzej Kloczkowski, Iowa State University

A distance matrix contains information about the distances between residues. A contact matrix records which pairs of residues interact. Mathematical operations on these matrices can be used to understand protein structure, including a method to refine NMR structures using derived distance and mean-force potentials. Elastic network models of proteins, in which atoms are treated as nodes connected by springs, can predict the motion of protein regions. B-factors, R-square distances and fluctuations are all correlated according to the model. Using principal component analysis based on 680 non-redundant structures from the ASTRAL database, structures are predicted to within 7.3 Angstroms (DRMS). An analysis of the principal components of motion of HIV-1 protease was undertaken, relying on 164 X-ray and 28 NMR PDB structures and 10,000 molecular dynamics simulations. The models suggest that there are 3 different ways for the HIV protease flaps to open. The principal component analyses do not correlate well with motions deduced from the B-factors of X-ray structures, but there is a much better correlation with the NMR structures.
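The matrices described in this talk can be illustrated with a minimal sketch. It is not the speaker's code: the coordinates are hypothetical, the 8 Angstrom contact cutoff is an assumed illustrative value, and `drms` uses the usual definition of distance RMS as the root-mean-square difference between corresponding inter-residue distances.

```python
import math

def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def contact_matrix(dist, cutoff=8.0):
    """Mark residue pairs within `cutoff` Angstroms as contacts (1) or not (0).

    The 8 A default is an assumption for illustration, not a value from the talk.
    """
    n = len(dist)
    return [[1 if i != j and dist[i][j] <= cutoff else 0 for j in range(n)]
            for i in range(n)]

def drms(dist_a, dist_b):
    """Distance RMS between two structures given their distance matrices."""
    n = len(dist_a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    squared = sum((dist_a[i][j] - dist_b[i][j]) ** 2 for i, j in pairs)
    return math.sqrt(squared / len(pairs))

# Hypothetical C-alpha coordinates (Angstroms) for four residues.
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
dist = distance_matrix(ca_coords)
contacts = contact_matrix(dist)
```

Because DRMS compares internal distances only, it does not require superimposing the two structures first.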

Multiple Mapping Method with Multiple Templates for Homology Modeling
Andras Fiser, Albert Einstein College of Medicine

Sequence alignments are important drivers for modeling structure. But popular alignment methods often give the same data for a sequence alignment with ~20% sequence identity. There is no "right" alignment; there are different approaches and different scoring functions. The goal is therefore to combine sequence alignments in the context of structure, since an optimal combination of alignments should give a better alignment. The approach is to determine the optimal splicing of alternative alignment inputs and to evaluate how the different combinations fit into the template environment. This uses the known structural environment of a homologue to drive the ranking of the spliced alignments. There is a need to grasp what information from the structure is useful in this approach. One approach is to use an independent H3P22 scoring function. The results of tests with multiple mapping methods, e.g. EasyPred3D, consensus and MMM, suggest that sometimes an underperforming alignment technique can contribute strongly to particular problems. There is still a long way to go in improving alignments, but multiple templates for the same part of the model can improve quality. Using multiple templates for multiple domains can improve accuracy even further, especially for difficult alignments in homology modeling.

Comparative Protein Structure Modeling for Molecular Replacement in X-ray Crystallography
Ursula Pieper, UCSF

Molecular replacement (MR) is growing in its role for phasing in X-ray crystal structure determination, and it could reduce the cost of novel structure determination if modeling were improved to provide conformationally sampled search models for MR studies. The sampling of alignment space is carried out with a genetic algorithm to generate a large number of ???? alignments that are evaluated with a fitness algorithm. The best-fit alignments are then fed into MODELLER, which carries out the molecular replacement using conjugate gradients minimization, molecular dynamics minimization and simulated annealing to enhance the radius of convergence. A variety of other modeling methods are being implemented as well, such as an atomic distance-dependent statistical potential scoring method. This is all implemented in the MOULDER pipeline for MR. The first test case was PDB code 106m. Several PDB structures were chosen as search models based on sequence identity of 20-30%. The initial search models did not yield a structure solution. Next, search ensembles of MOULDER refinement models, selected by DOPE score, were generated from the PDB structures, and this approach resulted in a successful MR solution. For the second test, a solution for NYSGXRC 10069e1 was sought with three PDB models having 23-33% sequence identity. This approach did not result in an MR solution. Search coordinates from MOULDER models gave better MR solutions but suffered from overlap of molecules; examination of the solution in electron density maps showed that it was nevertheless a good starting model. Results to date include 7 structures with very low sequence homology to structures in the PDB, solved by MOULDER after 10,000 CPU hours. Currently we are building a web server that interacts with various other modeling software packages to generate models for input to MOULDER, either from models generated by MODPIPE or from other input models.

Beyond RMSD: Homology Model Accuracy from an Applications Perspective
Roberto Sanchez, Mount Sinai School of Medicine

Using the results from large-scale modeling efforts is a challenge. One can use structures to calculate many types of information for applications ranging from drug design to sequence alignments. Comparative modeling is needed to fill in gaps across sequence space based on experimentally determined structures. For example, predicting ligand binding to all the proteins in a family might otherwise require that every structure be determined experimentally, which would be hard. Modeling is therefore needed to dissect ligand binding specificity, and one needs to know how good the models are. Will the models be different enough to determine differences in the binding of different ligands? A study of bromodomains was undertaken. Clustering based on the electrostatic potential of the surface would not be predictive. Comparative modeling can be measured according to how close it comes to predicting the structure of the target; model accuracy is how close the model is to the target. The difference between template and target subtracted from model accuracy provides an assessment of how useful the models are. There are over 24,000 models in the data set for MODELLER. Are the models informative based on measures of accuracy (pocket composition, electrostatics, etc.)? For overall structure there is some difference between template-target and model-target, but the difference is not large, and therefore the models are not very informative for certain properties. However, when looking at pocket composition or electrostatics, the model can be quite important in obtaining accuracy versus the template. Thus, the added value of the model versus the template for predicting target structure is larger for specific, detailed questions. With respect to solvent accessible surface area, the surface ruggedness of the models matches the actual structure reasonably well. Thirty-two known protein-ligand complexes were modeled, and approximately 128 models were built. There was poor correlation between the models and experimental ligand binding data.

Geometric Potential for Protein Structure and Binding Prediction
Jie Liang, University of Illinois at Chicago

Protein structure prediction is often based on empirical statistical potentials. With a geometric model it is possible to calculate the extent of neighbor proximity. The key advantage is that contact interactions can be modeled more efficiently. With this approach, a very simple binding energy potential may be derived. This provides a model for exploring protein folding and protein-protein interaction energies. The geometric parameterization is useful over all resolutions and provides a method to model all kinds of interactions: disulphide bonds, salt bridges, water contacts, etc. There are various types of interactions; for example, hydrophobic interactions are cooperative on the surface of proteins. Tests of the approach have uncovered 30-34 possible protein-protein interactions, 40 of 45 CASP4 structures. The procedure is useful for both docking prediction and folding prediction. For predicting folding rates from sequence with the geometric model, there is no need to carry out protein folding prediction. Instead, two types of residues are defined to distinguish folded from unfolded in the geometric model. There is a nice correlation when a training set of known structures is available, and a decent correlation with actual measurements.

Session 2: Data, Databases and Knowledge

The PSI Structural Genomics Knowledgebase
Helen Berman, Rutgers University

Helen Berman described the current status of the PSI Knowledgebase (KB). The KB is designed to allow the broader scientific community to explore the structures and technology developed by the PSI. The KB has been rapidly assembled into a production website consisting of a set of modules that cover target selection, experimental tracking, materials, models, annotation, metrics and technology. A Structural Sleuth allows users to explore PSI structures that lack a complete annotation and to provide their own input. The Target Selection Module describes the PSI target selection strategies developed by the BIG4. Interfaces are provided to TargetDB and PepcDB. The Materials Repository (MR) is now linked directly to the KB, so it will be easy for users to find and order clones. The modeling portal allows the user to enter a sequence and obtain various models from the supported modeling servers. The Metrics Module is centered on a set of summary statistics that reflect the PSI-2 Goals and Milestones policy document. The Technology Module provides searchable content indexed by center and by research topic. Methodology publications have also been indexed and are used for keyword searching to find methods from the PSI. An Outreach Module includes media reports, publications and the latest news. Plans are underway for a PSI Nature Gateway (release September 2008), which will provide a research library of articles and news reports, along with Nature alerts and RSS feeds. Finally, an Annotation Module is under development. A workshop was held in March 2008 to solicit input on the features required to facilitate the annotation of PSI structures: protein sequence features, structure features, ligands, functional classification, mapping to biological systems, and literature.

The PSI Knowledgebase Protein Model Portal
Torsten Schwede, Swiss Institute of Bioinformatics

Torsten Schwede described the development of the PSI KB Modeling Portal. The goal is to provide the best structural information available for a given sequence through a single portal to various modeling sites. The KB Modeling Portal does not try to calculate models, nor does it store them; rather, it provides an interface to precomputed models at all the supported sites. This makes it much easier for users to access the models. In addition, the KB does not attempt to evaluate the quality of the models. One problem encountered was identifying which sequence had been used, given changing and variable identifiers. The sequence itself is therefore used as the identifier, calculated as an MD5 hash. In the future, model quality assessment tools will be integrated. A workshop on Applications of Protein Models in Biomedical Research will be held in July 2008.
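Using the sequence itself as the identifier, via an MD5 hash, can be sketched in a few lines. This is an illustration only: the function name `sequence_id` is hypothetical, and the normalization step (removing whitespace and upper-casing) is an assumption, since the report does not say how the portal prepares a sequence before hashing.

```python
import hashlib

def sequence_id(sequence: str) -> str:
    """Return the MD5 hex digest of a normalized amino-acid sequence.

    Normalization (strip whitespace, upper-case) is an assumption made here
    so that formatting differences do not change the identifier.
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()

# The same sequence yields the same identifier regardless of the database
# accession under which it was originally retrieved.
id_a = sequence_id("MKTAYIAKQR")
id_b = sequence_id("mktay iakqr\n")
```

Two records with different accession codes but identical sequences map to the same key, which sidesteps the changing-identifier problem described above.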

TOPSAN: A Community-Driven Resource for Enhanced Impact of Structural Genomics Data
Sri Krishna, University of California, San Diego

Sri Krishna described the development of TOPSAN, a community-driven, wiki-based website for protein function annotation and dissemination. The output from the PSI SG initiative is rapidly outpacing our ability to publish structures. Currently only 20% of structures are being published and annotated. Many of the structures are novel and are therefore not associated with extensive literature, and even for well-studied targets the centers that solve the structures do not have specific expertise in the system. Over recent years, Wikipedia, Proteotopia, Topsan, Scholarpedia, wikiproteins, citizendium, wikiomics, pdbwiki, openwetware etc. have been developed as interactive knowledge resources allowing user and community input. The Open Protein Structure Annotation Network (www.topsan.org) allows external users to annotate the structures being output by PSI efforts. Over 300 of the 640 structures from JCSG have now been annotated in TOPSAN, and currently all newly solved structures have an annotation page completed prior to deposition. All registered users are able to edit each page. Pages have also been added for MCSG and NYSGXRC targets. Several pages are provided for each target, including Protein Summary, Classification, Molecule, and Experimental Details. For each page, history and authorship are tracked.

PSI Data Management and Reporting: Expectations, Standards, and Utility
Michael Sauder, SGX Pharmaceuticals

Michael Sauder described a standardized PepcDB reporting scheme developed at NYSGXRC. NIGMS-funded PSI centers are required to report all target progress and tracking information, together with their experimental protocols, in TargetDB and PepcDB. The experimental information deposited in PepcDB is essential and highly complementary to the structures deposited in the PDB, and it will enhance their value to the broader scientific community. The data are also valuable for data mining. Centers are not yet fully capturing or efficiently reporting data into PepcDB. It is important, for example, to get the DNA sequence of the expression clone into the database, because many of the genes used are synthetic and have codon-optimized sequences that could not be reproduced unless this information is provided. A minimum set of data to be maintained in PepcDB was proposed, covering Molecular Biology, Protein Purification, Crystallization, NMR and Structure. A dictionary of mmCIF tags is available to fully describe all steps from gene to structure.

Integration of Mass Spectrometry with High-Throughput Protein Crystallography
Tarun Gheyi, New York SGX Research Center for Structural Genomics

Tarun Gheyi described the routine use of MALDI-MS and ESI-MS at SGX. MALDI-MS, with an accuracy of 2000 ppm, is coupled with SDS-PAGE in a first pass to evaluate samples prior to crystallization and to obtain an initial estimate of molecular weight and sample purity. If the protein appears to have a contaminant by this method, it is flagged. If crystals are obtained, the crystals are washed and the protein is subjected to MS to confirm its identity before structure solution. Electrospray ionization MS permits finer analysis, down to 200 ppm accuracy. A mass accuracy threshold of +/- 260 Da (~2 mutations) is applied, and any protein with a mass discrepancy greater than 260 Da is then characterized by tandem MS and DNA sequencing to confirm its identity and pinpoint any mutations. Of 1451 samples examined in 2007, ~120 had questionable identity. Of these, 56 had a clone mix-up (8 of which eventually led to structures), 25 were truncated by proteolysis (2 of which resulted in structures), and 10 others proved to be E. coli contaminants, which were abandoned.
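The arithmetic behind the ppm accuracies and the 260 Da threshold can be sketched as follows. The function names and the example masses are hypothetical and are not SGX data; only the 260 Da threshold comes from the talk.

```python
def mass_error_ppm(measured_da: float, expected_da: float) -> float:
    """Relative mass error in parts per million."""
    return (measured_da - expected_da) / expected_da * 1e6

def needs_followup(measured_da: float, expected_da: float,
                   threshold_da: float = 260.0) -> bool:
    """Flag a sample whose mass discrepancy exceeds the threshold.

    Flagged samples would go on to tandem MS and DNA sequencing.
    """
    return abs(measured_da - expected_da) > threshold_da

# Hypothetical 25 kDa protein: a 5 Da deviation corresponds to about 200 ppm.
error = mass_error_ppm(25005.0, 25000.0)
ok_sample = needs_followup(25005.0, 25000.0)       # within +/- 260 Da
suspect_sample = needs_followup(25300.0, 25000.0)  # 300 Da off, flagged
```

For the same hypothetical protein, an accuracy of 2000 ppm corresponds to about 50 Da, which is consistent with MALDI-MS being used as a coarse first pass and ESI-MS for the finer analysis.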

Crystallization Image Analysis on the World Community Grid
Christian A. Cumbaa, Ontario Cancer Institute

Christian Cumbaa presented details of his group's efforts to automate the analysis and classification of crystallization images, with the goal not only of improving throughput for this manual step but also of generating consistent and objective evaluations. Phase I of the project was to extract features by transforming the pixel image into numeric values. During this phase, 12,375 features were calculated on a high-priority set of images, which included 165,441 hand-scored images. The World Community Grid was used to provide compute resources for this intensive task. The analysis took 9,000 years of CPU time. Phase I is almost complete. For each feature, its effectiveness at distinguishing a set of crystallization outcomes was evaluated. The chosen outcomes were phase separation, precipitate, skin, crystals and clear drop. The goal is to reduce the number of features to the most useful ones. The next step, Phase II, is to take the most predictive features and build a model with appropriate weights on each feature for each image category. This reduced feature set will be more computationally tractable and will allow the complete backlog of crystallization images to be tackled.

Quality of Protein Crystal Structures in the PDB
S. Ramaswamy, University of Iowa

Subramanian Ramaswamy described an approach to analyzing the quality of structures deposited in the PDB. It was noted that, despite improved structure solution, refinement and validation methods, the average R-factor of structures deposited in the PDB has not improved with time. In addition, it is often difficult to convey to users of PDB structures the limitations of the experimental techniques used to generate them and the subjective interpretations made by the depositor. By performing a principal component analysis of 9 important quality metrics defined for X-ray structures, his group has developed a Q-value quality score, which captures both the global and local structural features that contribute to the R-factor. This analysis shows that structural genomics efforts are producing structures of better quality than the PDB average and that they are improving over time. Using the same Q-value analysis, one can also compare the output of different beam lines, best practices, protocols, countries, scientific journals, etc. In the future it should be possible to develop a similar analysis for NMR structures, based on a different set of metrics.

Session 3: An International Perspective on Structural Genomics

Structural Proteomics Projects in Japan
Shigeyuki Yokoyama, University of Tokyo

Shigeyuki Yokoyama gave an overview of the structural genomics efforts in Japan under the Protein 3000 project. He described the NMR facilities and the NMR structure analysis pipeline, which uniquely exploits the Escherichia coli cell-free system for sample preparation in a largely automated fashion. He described the successes of the project and then explained how the entire pipeline was opened up to general users, both academic and industrial. Finally, he described the latest project of the Ministry of Education, Culture, Sports, Science and Technology. Professor Yokoyama also gave insights into the latest efforts to develop a broad industrial and academic user community for the protein production and NMR structure determination facilities.

Session 4: Mini-Session - Targeting for Success

Addressing Protein Crystallization Bottlenecks by Screening Multiple Homologues
Lukasz Jaroszewski, University of California San Diego

PSI efforts have the benefit of being able to choose targets from multiple organisms so long as they meet the family definition criteria. Hence one way to improve crystallization is to select homologous targets (orthologs) from a variety of different organisms. The databases TargetDB and PepcDB contain information about which crystallizations worked for which targets and also about those that failed. This enables a comparative sequence analysis to look for features that are positive or negative for crystallization. One can calculate properties of the protein from its sequence and formulate learning sets according to different parameters (sequence length, GRAVY hydrophobicity index, isoelectric point, longest disordered region, etc., and combinations thereof). These calculated parameters can be used to derive probability distributions for crystallization success. In turn, this information is used to guide the prioritization of orthologs. It is possible to combine the individual probabilities into one estimate of crystallization probability, giving a single value for any given sequence. Jack-knife tests using information outside of the learning set suggest that the TargetDB data did provide information that was predictive for ortholog selection. This now allows sequenced genomes to be examined to see which reagent genomes may be the best sources of targets. The same can be done for individual Pfam families, to prioritize genomes for each family. This allows promising crystallization targets to be identified for various Pfam families that are currently considered "difficult." Some very hydrophilic proteins from certain Pfam families are very difficult to solve by crystallography, suggesting that there may be no hydrophobic core in such proteins. Generally, pathogens are more difficult than bacteria, and thermophiles may be simpler than bacteria. Scoring confirms the complementarity of NMR and X-ray crystallography methods. The XtalPred server can calculate information based on this analysis and can also suggest homologues as alternatives. The range runs from 50% success for optimal targets to 13% success for very difficult targets.
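One way to combine per-feature probabilities into a single crystallization estimate is a logarithmic opinion pool, sketched below. The report does not state which combination rule was used, so the choice of rule, the equal weighting and the example probabilities are all assumptions made for illustration.

```python
import math

def combine_probabilities(probs):
    """Combine per-feature success probabilities with an equal-weight
    logarithmic opinion pool (normalized geometric mean).

    Each probability must lie strictly between 0 and 1.
    """
    n = len(probs)
    log_yes = sum(math.log(p) for p in probs) / n
    log_no = sum(math.log(1.0 - p) for p in probs) / n
    yes, no = math.exp(log_yes), math.exp(log_no)
    return yes / (yes + no)

# Hypothetical per-feature probabilities for one sequence, e.g. derived from
# its length, GRAVY index, isoelectric point and longest disordered region.
feature_probs = [0.45, 0.30, 0.50, 0.20]
score = combine_probabilities(feature_probs)
```

With this rule, a single very unfavorable feature pulls the combined score down more strongly than a simple arithmetic average would, which suits a setting where one bad property can prevent crystallization.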

Data Mining Structure Solution Determinants in the NESG Pipeline
Nicholson Price, Northeast Structural Genomics Consortium

As a means of understanding the effects of biophysical conditions on the success of structure determination, we examined 679 proteins from the NESG pipeline (all expressed, soluble and mono-disperse), comparing the proteins that gave crystals good enough for structure determination with those that did not. The method employed was logistic regression. The results are binned along a continuous variable. The advantage is that this approach gives a sense of the size and direction of the distribution; a disadvantage is that it does not deal with bimodal distributions. Sequence characteristics were examined. Favorable factors were found to be Ala, Gly, Phe and hydrophobicity; unfavorable factors included Glu, Lys, entropy, disorder hydrophobicity, and exposed, disordered residues. Side chain entropy and hydrophobicity are strongly negatively correlated from Lys to Gly. These factors can be resolved by multiple logistic regression to examine redundancy; side chain entropy is the dominant factor. Glycine turned out to be an important factor in predicting crystallization. An examination of glycine residues in various secondary structures indicates that glycine in loops is a positive factor. This suggests that crystal packing might be mediated by loops and that glycines can help. The combination of all factor probabilities showed decent correlations. Prediction was tested with the entire proteome as a negative set and all crystal structures in the PDB as a positive set. The result turns out to be statistically significant. Furthermore, this approach was a good predictor of successful entry into the PDB. Side chain entropy is still a good predictor and is in line with Derewenda's thoughts on entropy. Human proteins are outliers in that side chain entropy turned out not to be such a big factor. Human proteins do not have very much side chain entropy. There is a lot of disorder in human protein sequences, but the disordered regions have low side chain entropy. This observation was used to set up a factor analysis for human proteins. Human proteins are not as good overall for crystallization. Other factors examined for correlation with the ability to crystallize included stability (Thermofluor and deltaG of folding), which did not seem to be a predictor of crystallization, and mono-dispersity; proteins that are mono-disperse are more readily crystallized.
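A single-variable logistic regression of the kind described can be sketched in plain Python. This is a toy illustration, not the NESG analysis: the data points are made up, the feature is a hypothetical side-chain entropy score, and the fit uses simple gradient ascent rather than whatever statistical package the authors used.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y=1 | x) = sigmoid(a + b*x) by gradient ascent on the
    mean log-likelihood."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [y - sigmoid(a + b * x) for x, y in zip(xs, ys)]
        a += lr * sum(residuals) / n
        b += lr * sum(r * x for r, x in zip(residuals, xs)) / n
    return a, b

# Made-up toy data: x is a hypothetical mean side-chain entropy score,
# y is 1 if a structure was obtained and 0 otherwise.
xs = [0.2, 0.4, 0.5, 0.7, 0.9, 1.1, 1.3, 1.5]
ys = [1, 1, 1, 0, 1, 0, 0, 0]
a, b = fit_logistic(xs, ys)

# The sign of b gives the direction of the effect and its magnitude gives
# the size of the effect per unit of x.
p_low_entropy = sigmoid(a + b * 0.2)
p_high_entropy = sigmoid(a + b * 1.5)
```

In this toy set the fitted slope is negative, mirroring the reported finding that higher side chain entropy lowers the chance of success. Extending the model to several features at once is what allows redundant factors to be separated.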

Thermophilic Eukaryotes as a Source for Structural Genomic Targets
Craig A. Bingman, University of Wisconsin

With inspiration from the Thermotoga results at JCSG and other successful Structural Genomics efforts, we decided to look at eukaryotic organisms that are thermophilic. There are no hyperthermophilic eukaryotic organisms. The estimate of the upper limit is about 60 C, based on temperature probes in thermal vents in the deep ocean. One concern was which thermophilic eukaryotic genomes are available. Candidates include Cyanidioschyzon merolae, which lives at 50 C, is found in Japan and has a completed genome sequence, and Tetrahymena thermophila, which grows at 43 C and whose sequence is also complete. We decided to work on Galdieria (Rhodophyta), a unicellular red alga. This organism possesses splicing, core conserved metabolism, etc., and has good overlap with humans. The Galdieria genomic sequence is nearing completion. Cloning, in two independent cDNA cloning efforts from Galdieria, was very difficult but ultimately successful. Success rates of expression were comparable to those of other entries in TargetDB. For Galdieria, the success rate in structure determination appears to be outperforming the success rates for other organisms in the PSI by a statistically significant margin. The future plan is to work in parallel on human proteins and their Galdieria homologues.

High-Throughput Structural Proteomics
Nathaniel Echols, University of California Berkeley

We are interested in purifying proteins from natural sources, with the goal of obtaining purified complexes. We prefer to produce kilogram quantities of starting material and purify small amounts of protein that can be put into nanovolume crystallization using microfluidics. The goal is to produce 100 µg of each protein; these samples are screened in a Fluidigm microfluidic crystallization chip, and the proteins are examined by mass spectrometry to ascertain their identity. We are investigating the use of diffraction-capable microfluidic chips, or a Mosquito robot, for crystallization optimization. The fact that several of the structures have already been solved will help in solving the structures. The purification procedure is to clarify the E. coli lysate, run it through gel filtration and then, because the proteins contain no affinity tags, combine orthogonal purification methods including ion exchange, hydrophobic and affinity columns, Superdex, etc. Interestingly, crystals seem to grow even from very impure samples. To date, promising crystals have been obtained and 25 unique data sets measured, resulting in the determination of 15 already-published structures and 3 novel E. coli structures. The successful structures are all symmetric homo-oligomers and are therefore easier to crystallize. Phasing of the structures has been a challenge. Successful structure determinations have relied on molecular replacement. Novel structures will require another approach, perhaps metal and halide soaks; so far that has not worked.

The Medical Structural Genomics of Pathogenic Protozoa (MSGPP) Protein Production Pipeline
Alberto Napuli, University of Washington

The challenge with the pathogenic protozoa is that many of the proteins are insoluble. So far there has been only a 93% hit rate with solubility. As a result, 15 reagent genomes are used to produce proteins that can be solved. Although natural gene sequences are used, many constructs are generated, including terminal truncations; only 50% of the targets taken to structure are solved with full-length proteins, and 50% are terminal deletions. Current progress indicates a success within MSGPP of 50 new structures from parasitic protozoa. The purification protocol uses two different His-tag vectors, one with a 3C protease site and the other without. Both vectors continue to be used even though they differ only slightly (+/- the 3C cleavage site), because many times the protein is soluble in one vector or the other, and hence the two are complementary. PCR is used to pull clones out, the fragments are gel purified, LIC cloning is used, and the resulting clones are analyzed by PCR to ensure successful insertion. Clones are grown overnight and then inoculated into autoinduction media overnight. A quick lysis is then performed using freeze-thaw and 0.5% CHAPS, with centrifugation to recover the supernatant. Purification on metal chelate resin follows. The current procedure uses 1 liter cultures in flasks, and the group is now shifting to air bubble systems for 2 liter cultures. The immediate goal is to ship proteins to collaborators, so there is stringent selection for high productivity to ensure delivery of 4 samples per week. A regimented approach is needed to get pure proteins.

Session 5: Mini-Session - High-Throughput NMR Methods

Integration of Fast Data Collection and Automated Probabilistic Assignments for Protein NMR Spectroscopy
Arash Bahrami, University of Wisconsin

Automated software for NMR structure determination is approaching the point where it can be used by non-experts to solve protein structures by NMR. The approach described is based on a Probabilistic Inference Network of Evidence (PINE). PINE-HIFI aims at total integration of fast data collection by automated tilted-plane reduced dimensionality (High-Resolution Iterative Frequency Identification for NMR, or HIFI) with automated data analysis to yield peak assignments and secondary structure determination (PINE). PINE-HIFI represents a significant advance toward the goal of building a fully automated framework for NMR structure determination. Dr. Bahrami and co-workers reported on the status of this new system and its ability to produce more reliable assignments in less time than the individual packages (HIFI and PINE) used separately and sequentially.

Impact of Protein-Detergent Interactions on NMR Structure Determination
Linda Columbus, University of Virginia

The project is examining various detergents for possible use with NMR for membrane protein structure determination in solution. Beta barrel membrane protein structures have been shown to be solvable by NMR; however, alpha helical integral membrane proteins have resisted NMR structure determination. There is a big difference in detergent-protein interactions, leading to differences in the NMR spectra. DDM and FC-10 dodecyl choline contribute to line broadening, hindering structure determination. Site-directed spin labeling has been used to probe the dynamics of the structure at the site of nitroxide introduction, within a helical turn. Helix-helix interactions can be lost with the detergents that cause line broadening. It is possible that mixed micelles could help balance solubilization against detergent disruption of the alpha helices. The data indicate that a single protein is present in each micelle. With DDM/FC-10 at a molar ratio of 4:7, the spectra are much better, and the effect is reversible. We are now exploring smaller and larger mixed micelles (mixed non-ionic and zwitterionic). Based on the detergent properties of the mixed micelles that gave good NMR spectra, we predict that a DHPC/LPPG mixture will prove useful.
The interaction energy between helices may be weak enough to allow a helix to solvate into the detergent. Hence a balance between the two may be required to get good NMR spectra. Mixed micelles may be enabling in that the size of the lipid can accommodate binding to the helix and help solubilize it.

Characterization of Protein Detergent Complexes
L. Maslennikov, The Salk Institute for Biological Studies

We are investigating detergent analysis by NMR, including determination of the concentration of extraction detergent, detergent exchange efficiency and the amount of detergent during concentration by filters. We use 1D NMR to determine the optimal concentration for extraction. Different detergents and membrane proteins can be specifically extracted, and this is done for each detergent. For example, with QseC, 90% of the detergent is bound to the protein when 100% of the protein is extracted. One question is whether a more hydrophobic detergent can be exchanged for a less hydrophobic one. Exchange is carried out on metal chelate chromatography. Exchange of FC14 with 0.5 mM DDM does not work; however, if FC14 is first hit with 5 mM DDM and then washed twice with 0.5 mM DDM, good exchange results. The amount of detergent binding during concentration by filters was also examined in full. Vivaspin 50 kDa filters worked best for protein-detergent complexes (PDC) < 100 kDa. Homogeneity, oligomeric state, detergent-protein ratio, etc. should be checked. Analytical centrifugation can be used to analyze buoyant density and sedimentation coefficient.

Linking Machine Vision with Crystal Harvesting Robotics
Robert Viola, Square One Systems Design, Inc

The goal is to automate the last manual operation, the harvesting of crystals from crystallization plates. We have demonstrated the feasibility of using robotics to harvest and mount crystals. We use a Stäubli robot, a 6-axis robot modified for precision, to which we added a camera imager. The system chooses the correct end effector (i.e., harvesting loop), and the user then takes over manual control to get the crystal out with automation help from the robot. Feasibility of the system was established with manual operation by the user. The system could capture crystals from a variety of setups, using both MiTeGen and nylon loops. Tape cutting and crystal dissection are enabled by the resolution of the robot. The system uses the tip of a needle to break a crystal and position the fragment within the drop. Cutting the opening has been automated, while crystal dissection is run manually by the user. We estimate that the time between opening the chamber and capture is <1 min, all steps included. Presently there is no good way to get the third dimension (depth of field). Cryocooling after crystal harvesting is handled by the robot using liquid nitrogen. The next steps are to incorporate machine vision to pick a crystal and track its location in real time in order to execute the capture sequence for crystal harvest. The system could eventually be used to mount crystals <10 µm.

Session 6: Traditional Bottlenecks

GFP-Based Membrane Protein Overexpression and Purification in E. coli and S. cerevisiae.
Hyun Kim, Stockholm University

The methodology is a GFP-based pipeline for the rapid identification of well-expressing membrane proteins in E. coli. C-terminal membrane protein-GFP fusions are found not to fluoresce when in inclusion bodies; fluorescence is seen only when GFP is cytoplasmic. In-gel fluorescence shows that GFP fluoresces when the membrane protein is folded properly but not when it is aggregated. Western blotting shows much more protein in the aggregated state, but it does not fluoresce. One of the many problems with eukaryotic membrane protein expression in E. coli is the difference in lipids and cell trafficking, which makes it difficult to express eukaryotic membrane proteins. Hence expression was carried out in a yeast strain in which the Pep4 protease gene is deleted to help protein stability. One problem with this technique is that free GFP also fluoresces, which confuses the measurement of total fluorescence due to intact fusion protein; in-gel fluorescence is therefore a good measure of total cell membrane fluorescence. In addition, S. cerevisiae has a tightly regulated ER check for proper folding, and only properly folded proteins make it through the ER to the membrane. This avoids free GFP contamination, which can lead to overestimation of the total amount of fusion protein. Free GFP is likely due to proteolysis during cell breakage; protein turnover may also degrade the protein from the N-terminus, leaving only GFP, which may be more stable to proteasome breakdown. Membrane protein fusions from yeast membranes are checked by in-gel fluorescence, and those that are optimally expressed are scaled up and used in detergent extractions in combination with SEC to assess how well each detergent works for extraction. GFP can be removed after purification by proteolysis. Both GFP-on and GFP-removed (by cleavage) protein are put into crystallization.

Heterologous Expression of L. major Proteins in S. cerevisiae.
Elizabeth J. Grayhack, University of Rochester Medical School

Yeast has well characterized genetics and homologous expression, as well as regulated control of expression from vectors. Using the Gateway system, constructs carried C-terminal tags for purification (His6, HA, a 3C site and a ZZ domain that can be cleaved off with 3C protease) and used the PGal promoter. Up to 0.5 mg/liter of membrane protein was expressed, as measured by total protein pull-down after lysis using IgG columns that bind the ZZ domain. Vectors were designed for co-expression studies with URA3 and LEU2 and N-terminal His6 or His10, ORF, and 3C, HA, His6 and ZZ sites. Co-expression of up to four proteins was observed. Co-purification of proteins in which one protein is tagged and the other is not allows confirmation of complex formation. The problem of SeMet incorporation was solved by blocking the conversion of SeMet to S-adenosylmethionine through deletion of the SAM1 and SAM2 genes; the yeast are grown on exogenous S-adenosylmethionine. The deletion strains grow well on SeMet, with reduced toxicity. L. major pathogen proteins that had expressed poorly in E. coli were tested in these yeast strains: out of 64 proteins that were insoluble in E. coli, 50% were found to be soluble in yeast.

Salvage Methods Applied to Failed Pfam Families
Anna Grzechnik, The Scripps Research Institute

Target selection at the large-scale centers was carried out as drafts, and the drafted families come in all groups and sizes. Each family has individual members, from which the best choices can be selected. XtalPred is a server for crystallization prediction based on factors calculable from primary sequence. Pfams that have many targets that look good by XtalPred are often the best producing Pfams for gene-to-structure work. From the initial 400-Pfam draft, 102 families were selected, representing targets that had failed for multiple orthologues. The best orthologue from any genome is then chosen and purchased as a codon-engineered synthetic gene. These targets are now being processed, starting with a microscreen: does it express, is it soluble, will it crystallize? Point mutations, ligand screens, etc. are planned to be plugged in. The first major drop-out point is soluble expression; if the expression temperature is dropped to 25 °C, 124 out of 188 targets are expressed. One of the most important predictors of crystallization is oligomeric state, so only targets with a good SEC profile are chosen. In many cases improved soluble protein expression leads to a poor aggregation state. The PIPE cloning method is applied to truncations. In addition, partial proteolysis is applied to the soluble protein that does come through low-temperature expression, followed by construct engineering. H/D exchange mass spectrometry is used to identify domains, again followed by construct engineering. Five targets per week can be processed. The salvage rate is ~20% when focusing on targets that have disorder at one end or the other and are amenable to terminal deletion construct engineering. Surface mutagenesis, reductive methylation and the use of ligands were also used in salvage pathways. 200 compounds from the KEGG database ("molecules of life") are used in a Thermofluor screen to see which ones stabilize the protein. The first draft gave 30% structure solution with the first target.
For the remaining 70%, a solution can be reached by looking into orthologues, which recovers half of the targets that fail in the first pass. The salvage pathways then combine to recover another large proportion of failed targets.
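
The arithmetic behind the two-stage recovery quoted above can be worked through in a few lines. This is a minimal sketch using only the two figures given in the summary (30% first-pass success, half of the failures recovered through orthologues); the function and parameter names are hypothetical, and the further contribution of the salvage pathways is left out because no figure was reported for it.

```python
def cumulative_success(first_pass, orthologue_recovery):
    """Fraction of families solved after the first target plus orthologue rescue.

    first_pass: fraction solved with the first target.
    orthologue_recovery: fraction of first-pass failures recovered via orthologues.
    """
    failed = 1.0 - first_pass
    return first_pass + orthologue_recovery * failed

total = cumulative_success(first_pass=0.30, orthologue_recovery=0.5)
print(f"Solved after orthologue rescue: {total:.0%}")
```

Under these assumptions roughly two thirds of the families are solved before any salvage pathway is applied, leaving about 35% for truncations, mutagenesis, methylation and ligand screening.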

Microfluidic Platforms for Membrane Protein Crystallization
Paul J. A. Kenis, University of Illinois at Urbana-Champaign

Identification of crystallization conditions involves screening pH, temperature and the path to supersaturation; the second step is optimization. The questions are how to minimize sample consumption while identifying the solubility profile and lead conditions, and how to maintain membrane proteins in a membrane environment. Methods include microbatch, free interface diffusion and vapor diffusion, each of which has a time-dependent effect across the supersaturation range. The drop is exposed to air, with the dry-down dependent on the length of the channel to open air; as the dry-down proceeds, many conditions are screened along the way. If diffusion is too fast a skin forms, if it is too slow nothing happens, and if it is just right crystals grow. Crystals can be grown and dissolved back and forth as vapor is added to or removed from the chamber. Another challenge is nucleation, which can produce showers of crystals; here again the vapor can be adjusted up and down to reduce the number of nuclei before crystal growth. The chip screens 144 crystallization conditions, with valves that are opened by vacuum and are otherwise closed: 3 ratios of protein to crystallant against 48 conditions. A serpentine channel leads up to the valve position and the crystallant. This has been done with photosystem II to improve crystals. The rate of evaporation can determine crystal habit and polymorph. For membrane proteins, the group is working with lipidic cubic phase (LCP); a monoolein-water mixture will self-assemble into LCP when mixed. Salt added to the LCP causes it to collapse with a phase transition. To mix LCP on the fly in microfluidics, the distance that the LCP has to be pushed through the microfluidic channel is miniaturized, and peristaltic pushing back and forth is used to achieve mixing. The result is 3 compartments containing mixed LCP. Bacteriorhodopsin (BR) crystallization has been carried out using salt to drive LCP-mediated crystallization. Getting crystals out of the PDMS has been a challenge due to the size of the crystals, so in situ data collection would be desirable, as would phase identification by X-ray analysis. The group is now looking at LCP phases and phase diagrams with the chip.
Examples of polyamide chips that have valves were shown.
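
The layout of the 144-condition chip described above (3 protein:crystallant ratios against 48 conditions) can be enumerated as a simple cross product. This is only an illustrative sketch: the specific ratio values and condition labels are hypothetical placeholders, since the summary does not list them.

```python
from itertools import product

def build_screen(ratios, conditions):
    """Pair every protein:crystallant ratio with every crystallization condition."""
    return [(ratio, condition) for ratio, condition in product(ratios, conditions)]

# Hypothetical values: the summary gives only the counts (3 ratios, 48 conditions).
ratios = ["1:2", "1:1", "2:1"]
conditions = [f"condition_{i:02d}" for i in range(1, 49)]

screen = build_screen(ratios, conditions)
print(len(screen), "chambers; first:", screen[0], "last:", screen[-1])
```

Each tuple corresponds to one chamber on the chip, so the list length gives the number of experiments set up per loading.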

A Semi-Automated System for Nanovolume Plug-Based Crystallization
Cory Gerdts, Emerald BioSystems

Plug-based crystallization relies on a fluorocarbon oil carrier, with aqueous channels merging at a mixer and nanovolume aqueous plugs being carried along into a microchannel. The first working version of the microfluidic protein crystallization chip is a 3+1 mixer system that allows on-chip formulation of gradients of crystallization solutions and proteins. Combining a sparse matrix screen with a gradient screen is enabled by preparing a cartridge containing nanovolumes of separate crystallization cocktails, each separated from the next by an air gap; the cocktails are fed into one channel of the 3+1 mixer while the pump system varies the concentration of each component. This results in a series of plugs with different crystallization conditions for each primary cocktail. The labcard requires thin, flat, plastic injection-molded parts that are clear and transparent, with low birefringence, low surface energy, chemical stability, low vapor loss, bondability and X-ray transmissivity. A number of possible labcard production methods were investigated. There are currently two labcard architectures, with different macro-micro interfaces. A typical channel is 200 µm, and the crystals are ready for use in diffraction. It was important to be able to get the crystals out: the labcard can be peeled apart, and the plastic half that contains the microfluidic circuitry retains the plugs, from which crystals can be harvested by cryo-loop extraction. Crystals can be extracted from CrystalCards and/or X-ray diffraction data can be collected in situ. If a beamline has in-line optics, it is easy to clip-mount a CrystalCard and orient it for data collection; on beamlines without in-line optics, cards are harder to orient. Gradient-based crystallization allows extensive exploration of phase space.
For example, crystallization conditions are directly optimized by using mother liquor from a standard vapor diffusion screen and the remaining protein to run a massive gradient across phase space with nanovolume drops that can produce diffraction-quality crystals. The system has no dead volume: all the protein is used in the experiment.

Integrating HT Empirical Data with HT Crystallization.
Joseph Luft, The Hauptman-Woodward Medical Research Institute

Joseph Luft of the Hauptman-Woodward Medical Research Institute (HWI) and the Center for High Throughput Structural Biology (CHTSB) presented this work. At HWI, crystallization screens are run against many different salts to examine the effects of salt concentration and type on crystallization. This has led to thinking about the use of salts and silver bullet compounds in the Thermofluor screen, and Thermofluor screening is being used to look at the effects of both. In one case, a Thermofluor temperature shift of 17 °C identified Zn++ as an ion needed to stabilize the protein, which led to crystallization conditions that produced crystals when Zn++ was added to the protein. Thermofluor is now used as a common method for protein formulation, so that a more stabilized protein goes back into crystallization screens. AutoSherlock software analyzes annotated results from incomplete factorial screens to identify the optimal crystallization condition. Crystallization phase diagram data are also being defined using a combination of varied Drop protein/crystallant Volume Ratio and incubation Temperature (DVR/T). The DVR/T method has been effectively applied to optimize crystallization conditions for a number of samples, and its power is fully realized when used in conjunction with automated liquid handling systems. HWI offers a crystallization service to which samples can be sent for screening and optimization. Software packages the images into an Excel spreadsheet that organizes the DVR/T information into a user-friendly format. High-pressure helium flash freezing in microcaps crystallization tips for in situ X-ray data collection was also described.

Fixing Misfit Side Chains: Molprobity & Real Space Refinement
Jeffrey Headd, Duke University

Jeff Headd spoke about the program MolProbity, which identifies and addresses misfit sidechains in crystallographic refinement. As electron density becomes less well defined at lower resolution, some incorrect or impossible sidechain conformers score as favorably as their correct counterparts in traditional refinement methods, allowing these sidechains to become trapped in a local minimum. MolProbity allows the identification and rigorous movement of the entire sidechain followed by refinement, which most refinement programs are not capable of. First, MolProbity identifies candidate misfits as those residues that are rotamer outliers and/or have serious clashes with neighboring residues. Each candidate is then run through real-space refinement in Coot, using favorable rotamers as starting positions. The best-scoring rotameric fit is then run through a second round of quality statistics, which is compared to the original score. A change is accepted if rotamericity goes up and clash score goes down, coupled with a flip in chi angle of 180° ± 30°, and is rejected otherwise. Real-space refinement is shown to be accurate at resolutions as low as 2.5 Å, and corrects these problems with a high degree of accuracy. It appears that Leu is predominantly misfit in this manner, along with Thr, Val and Arg residues. The group also demonstrated that the refit residues remain in their corrected conformations after multiple rounds of refinement using PHENIX.
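
The accept/reject rule described above can be written down as a small predicate. This is a sketch of the decision logic only, under stated assumptions: the `SidechainScore` fields, the function names and the choice of a single chi angle are hypothetical, and nothing here calls MolProbity, Coot or PHENIX.

```python
from dataclasses import dataclass

@dataclass
class SidechainScore:
    """Hypothetical container for the quality statistics of one sidechain fit."""
    rotamer_score: float  # higher means more rotameric
    clash_score: float    # lower means fewer or milder clashes
    chi_deg: float        # the chi angle being compared, in degrees

def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def accept_refit(before, after, tolerance=30.0):
    """Accept a refit only if rotamericity improves, the clash score drops,
    and the chi angle has flipped by 180 degrees within the given tolerance."""
    flipped = abs(angular_difference(before.chi_deg, after.chi_deg) - 180.0) <= tolerance
    improved_rotamer = after.rotamer_score > before.rotamer_score
    improved_clash = after.clash_score < before.clash_score
    return improved_rotamer and improved_clash and flipped

# Toy example values (hypothetical).
original = SidechainScore(rotamer_score=0.5, clash_score=1.2, chi_deg=60.0)
refit = SidechainScore(rotamer_score=35.0, clash_score=0.1, chi_deg=-125.0)
print(accept_refit(original, refit))
```

Requiring all three conditions together keeps the correction conservative: a refit that merely scores better without the characteristic flip is left for manual inspection rather than applied automatically.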

Exploring QPCR-Derived Thermal Melts for Ligand Screening
Amanda Meyer, Albert Einstein College of Medicine

Amanda Meyer spoke about exploring QPCR-derived thermal melts for ligand screening. It has been shown that the addition of stabilizing ligands and/or buffer conditions can not only assist in identifying possible ligands of the protein but also aid in structure determination. Thermofluor-based assays have been used to measure the melting temperature against a library of small molecules. This method uses a fluorescent dye (Sypro Orange) to obtain melting point data in numerous conditions simultaneously. If a temperature shift of greater than 2 degrees Celsius is observed, the condition is considered to have a significant stabilizing effect. Since various compounds can be screened against specific protein families, the group suggests that it could be possible to infer specific compound motifs that interact with target proteins, possibly providing functional information. They also suggest that increasing the thermal melting temperature of a protein can improve its purification yield, stability and crystallization conditions. Additionally, this method can test for proper folding and assist in narrowing down an appropriate range of protein concentrations for kinetic assays.
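
The 2 degree Celsius criterion above amounts to a simple filter over measured melting temperatures. The sketch below assumes the melting temperatures have already been extracted from the melt curves; the function name, condition labels and temperature values are hypothetical.

```python
def stabilizing_conditions(reference_tm, condition_tms, threshold=2.0):
    """Return {condition: delta_Tm} for conditions whose melting temperature
    exceeds the reference by more than the threshold (degrees Celsius)."""
    shifts = {name: tm - reference_tm for name, tm in condition_tms.items()}
    return {name: shift for name, shift in shifts.items() if shift > threshold}

# Hypothetical melting temperatures in degrees Celsius.
reference_tm = 48.0
condition_tms = {"ligand_A": 53.5, "ligand_B": 49.0, "buffer_C": 46.5}

hits = stabilizing_conditions(reference_tm, condition_tms)
print(hits)
```

Keeping the shift alongside the condition name makes it easy to rank hits, for example when comparing compounds across a protein family.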

Extracting and Expanding Information Derived from ~1/2 Million Classified Crystallization Images
Edward H. Snell, The Hauptman-Woodward Medical Research Institute

HWI have manually classified ~150,000 images representing crystallization experiments conducted for NESGC and SGPP groups. The combination of typical outcomes, enriched with representative crystal images, provides a training set that is in use for the development and testing of automated classification systems. In addition, this information is useful for chemical space mapping, to identify specific regions of chemical space that globally promote crystallization for a diverse set of proteins.

Session 7: The Really Hard Stuff

Discovery-Oriented Eukaryotic Integral Membrane Protein Production
Franklin A. Hays, UCSF

Franklin Hays discussed efforts within the Center for Structures of Membrane Proteins (CSMP) to develop generally applicable protocols for detergent extraction, stabilization and handling of integral membrane proteins. Detergent solubilization tests have been run on 272 integral membrane protein targets from yeast which were expressed at reasonable levels (starting from an original set of 354 candidate targets). Given the large number of membrane protein targets that required processing, it was necessary to establish standardized protocols that could generally be used as a first-pass attempt to obtain stable and pure membrane protein. These studies revealed that n-dodecyl-β-D-maltopyranoside (DDM) detergent extraction was successful 25% of the time for extraction of integral membrane proteins, while β-D-octylglucoside (OG) was also highly effective. A single buffer system was used together with these two detergents so as to limit the number of possible variables required to screen for extraction. Small HPLC-scale size exclusion chromatography on protein-detergent complexes was used to quickly determine which of the detergent extractions were producing stable, uniform (not aggregated) membrane protein-detergent complexes that could be entered into downstream stabilization, purification and crystallization studies. This general approach to membrane protein extraction testing establishes a narrow exit funnel for membrane proteins to enter downstream studies and, by definition, establishes the prioritization of membrane proteins for efficient target processing. These methods are now being applied to several human integral membrane proteins at the CSMP. A list of proteins purified was reported.

Expression, Purification, and Assembly of Functional Human Membrane-Stearoyl-CoA Desaturase Complex Using Cell-Based and Cell-Free Technologies
Brian Fox, University of Wisconsin

Brian Fox of the Center for Eukaryotic Structural Genomics (CESG) discussed the use of wheat germ cell-free translation technology in combination with tissue extract liposomal formulations to produce active membrane protein complexes, including the membrane-bound stearoyl-CoA desaturase complex. The inclusion of liposomes in the wheat germ extract for cell-free protein production led to a 30% improvement in protein production levels for both monotopic and polytopic membrane proteins. Moreover, both the monotopic and polytopic membrane protein components were found to incorporate into the liposomes, and the resulting proteo-liposomal complexes could be recovered with high purity by sucrose gradient density flotation. Co-translation of cytochrome b5 and human stearoyl-CoA desaturase isoform 1 in the wheat germ cell-free translation system supplemented with liposomes led to the production of a highly active, liposome-bound reconstituted enzyme complex which could be purified by sucrose gradient flotation in quantities sufficient for structural studies. This work demonstrates the great potential of cell-free protein translation to allow the production of functional multi-component membrane protein assemblies for enzymatic and structural studies.

Detergent Screening via Immobilized-Protein Stability Assay
James M. Vergis, University of Virginia Health Sciences Center

James Vergis at the University of Virginia Health Sciences Center has investigated the use of immobilized metal chelate affinity chromatography (IMAC) to rapidly explore the stability of His6-tagged membrane proteins after on-column detergent exchange. In this protocol, called Affinity-Immobilized Protein Stability Assay (AIPSA), detergent solubilized His6-tagged membrane proteins are loaded in small volume into individual wells of 96 well IMAC plates. Several different detergent/buffer combinations are then used to wash the membrane proteins bound to IMAC beads, in an attempt to successfully carry out detergent exchange. Membrane proteins that are successfully exchanged with the new detergent buffer mix will typically be efficiently eluted with the application of the same buffer containing imidazole at ~500 mM. Membrane proteins that fail to elute with this protocol are likely to have aggregated on column due to incompatible detergent / buffer exchange. Standard SDS-PAGE analysis of the material before and after elution provides a simple easy visualization score for elution. This method provides a rapid and simple way to assess the utility of detergent / buffer systems for membrane protein detergent exchange that is compatible with the same chromatography systems used to purify His6 tagged membrane proteins, and represents an important time saving tool for membrane protein research.

Automated High-Throughput Platform for Soluble Domain Screening
Pawel Listwan, Los Alamos National Laboratory

Pawel Listwan at the Los Alamos National Laboratory and the Integrated Center for Structure and Function Innovation (ISFI) presented the most recent ISFI implementation of their robotic high-throughput split-green fluorescent protein (split-GFP) assay to define stable folding domains within complex proteins. In essence, genes for proteins of study are fragmented and then cloned in frame with a short 15 amino acid GFP fragment (amino acids 216-228) which can be detected using a GFP "detector" protein (amino acids 1-215) which will bind to the 15 residue GFP fragment and the inter-molecular complementation of the two allows the assembled GFP to have fluorescence. Individual clones of His6 tag-"gene fragment"-15 residue GFP fragment fusion proteins are screened in high throughput manner wherein the E. coli cultures are grown in 96-well plates with autoinduction media, followed by lysis and small scale 96-well plate immobilized metal chelate affinity chromatography (IMAC) capture and elution of any soluble His6 tag-"gene fragment"-15 residue GFP fragment fusion protein. The eluates are then assayed with the addition of recombinant GFP-detector protein to the wells. Fluorescence is measured as the method to detect soluble folded domain constructs since any insoluble-15 residue GFP fragment proteins will not have been successfully captured by IMAC and/or would not efficiently complement the GFP detector protein. With this automation, ISFI has moved the split GFP domain hunting technology from bench to high-throughput processing which is now being applied to several difficult PSI proteins.

A Cholesterol Biosynthetic Yeast for Mammalian Membrane Protein Production
Niall J. Fraser, University of Glasgow

Niall Fraser from the University of Glasgow presented a progress report on his efforts to develop strains of yeast that make mammalian sterol and cholesterol, rather than the fungal ergosterol. Such yeast strains will be attractive for the expression of mammalian integral membrane proteins, many of which require cholesterol for stability and/or function, including for example the human beta-2 adrenergic receptor (a G-protein coupled receptor). The engineering of yeast to produce cholesterol and not ergosterol requires step-wise genetic manipulation in auxotrophic strains of yeast, wherein key ergosterol biosynthetic genes are knocked out and mammalian cholesterol biosynthetic genes are stably introduced into the yeast genome. After each genetic manipulation to reconstitute the cholesterol biosynthetic pathway, the sterol contents of the yeast strains are characterized by mass spectrometry to confirm whether the appropriate sterols are produced. Presently, the Glasgow group has successfully achieved all genetic manipulation steps except the introduction of the last mammalian gene needed to make cholesterol. So far all the yeast strains have been viable, which bodes well for the eventual successful production of cholesterol-producing yeast. Soon, such yeast strains will be compared to native yeast strains for overall production of red fluorescent protein (RFP)-membrane protein fusions. The successful engineering of cholesterol-producing yeast is expected to provide an important tool for recombinant expression of mammalian membrane proteins. This approach is also being applied to other yeast species, including Pichia.

The PSI Materials Repository
Joshua LaBaer, Harvard University

Joshua LaBaer, PI for the PSI Materials Repository (PSI-MR) at the Harvard Institute of Proteomics, presented the mission and mode of operation of the PSI-MR, which serves as a centralized storage, maintenance and distribution center for all plasmid clones produced by PSI researchers. The PSI-MR may eventually store more than 100,000 PSI clones, all managed by the Plasmid Information Database (PlasmID; http://plasmid.hms.harvard.edu) developed at Harvard, which facilitates the on-line search and request of plasmid clones within the PSI-MR. As of April 2008 the MR has received more than 2,000 PSI clones. Importantly, before clones are sent to the PSI-MR, the PSI Centers need to send information about the clones they intend to submit. The clone information is reviewed carefully for accuracy following Standard Operating Procedures (SOPs), and then the clones can be sent by the PSI Center to the PSI-MR for deposition. Clone samples are processed in a highly automated, standardized way and stored in a state-of-the-art automated freezer storage system with 2D barcode tracking of all samples. The PSI-MR sequence-validates every clone that it receives from PSI researchers and has established mechanisms for resolving any discrepancies in the information before a given clone is allowed into the PSI-MR. The plasmids are prepared as frozen DNA samples and are also transformed into bacteriophage-resistant E. coli hosts for long-term storage as frozen glycerol stocks. Sequence validation of clones is managed by ACE (Automated Clone Evaluation) software, which can design primers for sequencing and can identify any differences between the expected and the determined sequences. The PSI-MR is still working to establish blanket Depositor Agreements that serve as the uniform legal documents allowing transfer of materials from PSI-related institutes (both academic and commercial) to the Harvard Institute of Proteomics.
The Material Transfer Agreements are also designed to cover the legal aspects of transferring materials out of the PSI-MR to third-party institutions that request samples via the on-line request system. Significant progress has been made in establishing the MR Depositor Agreements, but more work is needed to complete these for all PSI-related institutions. The PSI-MR allows researchers to order plasmids through an on-line store interface that handles all clone requests. New PSI-specific data on clones can be searched on-line via the web portal to the PSI-MR or via the newly formed PSI Knowledgebase, which connects clone information to TargetDB and PepcDB. Clones can be purchased on-line by credit card for $45 per clone (PSI members get a discount and can buy individual clones for $30), or $12 for 96 clones.
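
The comparison of expected and determined sequences mentioned above can be illustrated with a toy example. This is a minimal sketch and not the ACE algorithm: it assumes the two sequences are already aligned and ungapped, whereas a real validation tool would align the reads first. The function name and both sequences are hypothetical.

```python
def find_discrepancies(expected, determined):
    """List (position, expected_base, observed_base) for each mismatch.

    Positions are 1-based. Assumes the sequences are already aligned and
    ungapped; a length difference is reported as a final "length" entry.
    """
    expected, determined = expected.upper(), determined.upper()
    diffs = [(i, e, d)
             for i, (e, d) in enumerate(zip(expected, determined), start=1)
             if e != d]
    if len(expected) != len(determined):
        diffs.append(("length", len(expected), len(determined)))
    return diffs

reference = "ATGGCTAGCAAAGGT"  # hypothetical expected insert sequence
read = "ATGGCTAGCGAAGGT"       # hypothetical sequencing result
report = find_discrepancies(reference, read)
print(report)
```

An empty report would mean the clone matches the submitted information at every compared position; any entry would be a discrepancy to resolve before the clone is accepted.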

Logistical Proteomics for Biomarker and Target Discovery
John G. Primm, University of Wisconsin

John Primm of the University of Wisconsin and the Center for Eukaryotic Structural Genomics (CESG) discussed the challenging logistics of disseminating PSI materials to the broader community. When materials are requested from the CESG, a long chain of hundreds of communications (e-mails, phone calls, etc.) is initiated. Initially this communication is between researchers, to establish the exact nature of and need for the materials. This is followed by several communications between the technology transfer offices of the institutions involved, ultimately resulting in a Material Transfer Agreement, which often carries language related to use, transfer, intellectual property, etc. Numerous parties at both organizations must be notified, and signature documents must be tracked along with documents relating to the request and eventual delivery of materials. Stakeholders in the process are CESG management, the Wisconsin Alumni Research Foundation (WARF), the requesting institution, and several sub-offices within each. This process is very inefficient and represents an unrecognized bottleneck in the dissemination of PSI materials and technologies. For this reason, CESG has taken a lead in working with the PSI Materials Repository (MR) to get its clones into the MR, which will allow external researchers to avoid all the paperwork and simply buy the clones of interest at a nominal price. To fully document the CESG clones, CESG has written several thousand lines of Python code to gather information from its Sesame database, including information on the clones from TargetDB and PepcDB, for delivery to the MR along with the physical clones. Importantly, the cost of depositing a clone into the MR is no more than that of handling a single request for material.
However, once the clones are available through the MR, numerous requests can be made at a nominal cost, thereby amortizing the single deposition cost over many individual requests. This represents a huge cost savings in the long run for the dissemination of PSI materials.
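
The amortization argument can be made concrete with a little arithmetic. The sketch below compares the total cost of handling every request directly with the cost of one deposition plus a nominal repository fee per request. The function name and the $500 handling and deposition costs are hypothetical placeholders, not figures from the talk; only the $45 list price per clone is taken from the repository summary above.

```python
def dissemination_costs(n_requests, direct_cost, deposit_cost, repo_fee):
    """Compare direct handling with repository distribution.

    direct_cost:  cost of handling one request directly (hypothetical).
    deposit_cost: one-time cost of depositing the clone (hypothetical).
    repo_fee:     nominal fee charged per request by the repository.
    Returns (direct_total, repository_total, savings).
    """
    direct_total = n_requests * direct_cost
    repository_total = deposit_cost + n_requests * repo_fee
    return direct_total, repository_total, direct_total - repository_total

# Placeholder costs in dollars; deposit_cost is set equal to direct_cost
# to reflect the statement that deposition costs no more than one request.
direct, repo, saved = dissemination_costs(
    n_requests=10, direct_cost=500, deposit_cost=500, repo_fee=45)
print(direct, repo, saved)
```

With these placeholder values the repository route is cheaper from the second request onward, and the gap widens with each further request.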

Getting Down and Dirty with Detergents: Quantitation, Synthesis, and Screening
Philip D. Laible, Argonne National Laboratory

Philip Laible from Argonne National Laboratory discussed the Rhodobacter expression system for production of recombinant integral membrane proteins. To date, over 400 membrane protein constructs have been expressed in Rhodobacter with a success rate of ~60%. As a result, there is a backlog in understanding how to purify and crystallize Rhodobacter-expressed integral membrane proteins. One bottleneck in detergent extraction and purification of integral membrane proteins is the routine qualitative and quantitative assessment of detergents and lipids in protein-detergent complexes. To facilitate analysis of detergents and lipids in protein samples, a simple thin-layer chromatography (TLC) method has been established. The TLC method requires relatively little sample (microgram quantities) and can be performed relatively rapidly (in less than an hour). With detergent standards and a simple visible-light scanner, lipids and detergents can be quantified efficiently by TLC, with a linear response from below the critical micelle concentration (cmc) to 10x above the cmc. The TLC method was used to monitor the rate of detergent exchange by dialysis, which led to the important finding that detergent exchange can sometimes take many days to reach completion. In contrast, on-column detergent exchange was often complete within 3-5 column volumes, although washes of 20 column volumes are typically needed to exchange detergents fully. With small-volume chromatography, on-column detergent exchange is relatively inexpensive compared with the large volumes and time involved in dialysis. To evaluate the utility of over 128 different detergents (Anatrace, Cognis, Sigma, Avanti) for extraction of membrane proteins from Rhodobacter, the Laible lab has extensively studied the stability of the photoreaction center (RC) after extraction with each of these detergents.
This work has led to the definition of seven detergent "quality" tiers that the Laible lab is now using to rank the utility of detergents for stabilization of individual membrane proteins. The Laible lab has also collaborated with Sam Gellman at the University of Wisconsin to develop Tripod amphiphiles as novel detergents with OG, DDM, LDAO head groups and various sugar (glucose, diglucose, triglucose, maltose, dimaltose, etc.) side chains. Several of these novel Tripod amphiphiles are similar or superior to DDM for stabilizing the photoreaction center complex. These new methods and detergents are now being applied to the preparation of stabilized membrane protein-detergent samples.
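
Quantifying a detergent against standards, as in the TLC method described above, amounts to fitting a calibration line and inverting it. The following is a minimal sketch under the assumption that scanned spot intensity is linear in the amount loaded over the working range; it is not the Laible lab's analysis procedure. The function names, units and all numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_amount(intensity, slope, intercept):
    """Invert the calibration line to estimate the amount in a sample spot."""
    return (intensity - intercept) / slope

# Hypothetical standards: detergent loaded (arbitrary units) versus
# scanned spot intensity. Illustrative values, not measured data.
standard_amounts = [0.5, 1.0, 2.0, 4.0]
standard_signals = [60.0, 110.0, 210.0, 410.0]

slope, intercept = fit_line(standard_amounts, standard_signals)
unknown = estimate_amount(160.0, slope, intercept)
print(round(slope, 3), round(intercept, 3), round(unknown, 3))
```

Estimates are only meaningful for intensities that fall within the range spanned by the standards, which is why the reported linear range matters.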

Using Detergent Phase Boundaries to Crystallize Membrane Proteins
Michael G. Malkowski, Hauptman-Woodward Medical Research Institute

Michael Malkowski of the Hauptman-Woodward Medical Research Institute (HWI) and the Center for High-Throughput Structural Biology (CHTSB) discussed the development of detergent-specific protein crystallization screens, which are tuned to drive the crystallization of protein-detergent complexes (PDCs) by maintaining chemical conditions at the detergent phase-separation boundary (between soluble and insoluble detergent micelle formation). The CHTSB has conducted large-scale chemical cocktail screening with a variety of commonly used detergents to identify formulations that span a broad range of conditions straddling the aqueous phase-separation boundary for each detergent. The screening involves mixing defined detergent concentrations with a small amount of a lipophilic dye tracer (e.g. porphyrin red) and quantifying the visible aggregation state of the detergent under various chemical conditions. This information has been used to design specific crystallization screens for each of the detergents commonly used in membrane protein crystallization. The utility of these new screens has been demonstrated for two integral membrane protein targets: the Bor1 transporter from yeast and an unnamed community-provided protein X. The screens, chemically tuned to the phase boundary of the detergents used to solubilize these proteins, have yielded novel crystal hits. These hits are currently being optimized with the drop-volume ratio / temperature (DVRT) screening approach, also developed at the HWI, and will soon be examined for X-ray diffraction quality.