From Genes to Proteins: NIGMS Catalogs the Shapes of Life

Alisa Zapp Machalek

Now that we have published versions of the human genome sequence, what's next? For many scientists, the answer is proteins.


NIGMS is leading an initiative that focuses on one important aspect of proteins: their three-dimensional structures. While gene sequencing projects identify and arrange all the "letters" in an organism's genetic material, the NIGMS Protein Structure Initiative will harness this genetic information to help identify all the natural shapes that proteins can form and group them into "families."

"The initiative will develop a catalog of all the protein structures that exist in nature," said Dr. Marvin Cassman, NIGMS director. "We expect that it will yield major biological findings that will improve our understanding of health and disease."

Why Proteins?

If genes are the recipes for life, then proteins are the culinary result: the very stuff of life. Proteins form our bodies and direct their systems. They digest our food, help us fight infections, control our body chemistry, and in general keep us, and every other living organism, running smoothly.

But proteins that twist into the wrong shape, have missing parts, or don't make it to their job site can cause diseases ranging from cystic fibrosis to cancer and Alzheimer's disease.

To examine a protein's role in health and disease, and to explore ways to control its action, researchers often seek to decipher the protein's shape, or structure. This structure reveals the physical, chemical and electrical properties of the protein and provides clues about its role in the body.

For the past 40 years, such structural biology studies have shed light on specific proteins. NIGMS invests heavily in the field, supporting about half of all NIH research grants in structural biology.

But now, through its Protein Structure Initiative, NIGMS has launched an additional, more organized effort in a related field called "structural genomics." As its name implies, structural genomics hinges on the relationship between protein structures and gene sequences.

The new project is designed to group proteins into structural families based on their gene sequences, then solve the structures of representative proteins from each family. While structural biology continues to illuminate details of individual proteins, the Protein Structure Initiative will cast broad light over all the protein shapes that exist in nature.
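
To make the "group by sequence, then pick a representative" idea concrete, here is a toy sketch in Python. It is not the Protein Structure Initiative's actual method: the similarity measure (the general-purpose string-matching ratio from Python's `difflib`), the 0.5 threshold, the function names, and the sample sequences are all hypothetical stand-ins for the dedicated sequence-comparison tools real projects use.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity between two amino acid sequences, from 0.0 to 1.0.

    Hypothetical stand-in: uses difflib's ratio, not a biological alignment.
    """
    return SequenceMatcher(None, a, b).ratio()


def group_into_families(seqs: dict[str, str], threshold: float = 0.5) -> dict[str, list[str]]:
    """Greedy clustering sketch.

    Each sequence joins the first family whose representative it resembles
    at or above `threshold`; otherwise it founds a new family and becomes
    that family's representative (the protein whose structure gets solved).
    """
    families: dict[str, list[str]] = {}
    for name, seq in seqs.items():
        for rep in families:
            if similarity(seq, seqs[rep]) >= threshold:
                families[rep].append(name)
                break
        else:  # no existing family was similar enough
            families[name] = [name]
    return families


# Toy sequences, invented for illustration only.
toy = {
    "protA": "MKTAYIAKQR",
    "protB": "MKTAYIAKQK",  # differs from protA at the last position
    "protC": "GSHMLEDPVV",  # unrelated toy sequence
}
families = group_into_families(toy)
print(families)        # {'protA': ['protA', 'protB'], 'protC': ['protC']}
print(list(families))  # representatives: ['protA', 'protC']
```

Only the representatives (here `protA` and `protC`) would be candidates for structure determination; the structures of other family members could then be inferred from them.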

In September, NIGMS announced its first large-scale awards in structural genomics to seven groups of scientists, including one group cofunded by the National Institute of Allergy and Infectious Diseases (NIAID). These groups include hundreds of researchers in several countries. Over the next 5 years, they will determine thousands of protein structures; study the relationship between genes, protein structure, and protein function; and develop new techniques to do all of this more efficiently.

Ten Thousand Structures in 10 Years

Scientists believe that the millions of proteins in nature fall into a relatively small number of structural families, perhaps a few thousand. Researchers participating in the project intend to pin down one or two members of each protein family by solving about 10,000 unique, carefully selected protein structures. They aim to accomplish this in 10 years: the current 5-year scale-up phase, followed by 5 more years at full speed.

Currently, of the 14,000 or so protein structures in the Protein Data Bank (a central repository for such data), only an estimated 3,000 to 4,000 are unique structures. All the others are minor, but often important, variations of these. By determining 10,000 protein structures, the Protein Structure Initiative would at least triple the number of unique structures available.
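
The "at least triple" claim follows from simple arithmetic, sketched below in Python. It assumes, as the 10-year goal states, that all 10,000 new structures are unique; the function name `fold_increase` is hypothetical, chosen for illustration.

```python
def fold_increase(unique_now: int, new_unique: int = 10_000) -> float:
    """How many times larger the pool of unique structures becomes when
    `new_unique` previously unseen structures are added to `unique_now`."""
    return (unique_now + new_unique) / unique_now


# The article's estimate of unique structures in the Protein Data Bank today.
for unique_now in (3_000, 4_000):
    total = unique_now + 10_000
    print(f"{unique_now:,} -> {total:,} unique structures "
          f"({fold_increase(unique_now):.2f}x)")
# 3,000 -> 13,000 unique structures (4.33x)
# 4,000 -> 14,000 unique structures (3.50x)
```

Even at the high end of the current estimate, the pool of unique structures grows 3.5-fold, so "at least triple" is a conservative way to state it.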

From these 10,000 new structures, the scientists will develop a library of nature's protein structure families. This library, which will be freely available to the scientific community, will integrate structural and genetic information and any available biochemical information for each protein entry.

Why Structures?

A solved protein structure, usually displayed as a computer-generated image, shows the relative locations of all of the protein's thousands of atoms. It reveals how these atoms are arranged to form grooves, ridges and pockets on the protein's surface and spirals and pleats in its inner architecture. These features indicate how a protein functions normally and how tiny changes in its shape or composition — its amino acid sequence — can cause disease.

Unfortunately, determining protein structures is often difficult and time-consuming. It's done using either of two techniques — X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. X-ray crystallography, which has yielded the vast majority of structures, allows scientists to study atomic details of protein structures. But it requires crystallization of the proteins — a task that is often difficult and, in some cases, borders on impossible. NMR, which relies on proteins in solution (a more physiological condition than crystallization, proponents point out), is usually slower than crystallography and is limited to solving the structures of small and medium-sized molecules. Recent advances in both techniques enable scientists to solve protein structures faster than ever before.

Moving a Boutique onto the Assembly Line

The seven centers are seeking to speed up not only these two structure determination methods, but also every other task in structural genomics. This includes choosing which protein structures to solve, cloning and isolating the proteins, determining the structures and depositing the data into a new central online resource that is being constructed for this purpose.

Currently, it takes weeks to years and an average of $100,000 to solve a single protein structure. NIGMS expects each center to ramp up its operations to solve 100 to 200 structures a year and significantly reduce the cost per structure.
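
A back-of-the-envelope calculation, sketched in Python, shows why cutting the cost per structure matters. It assumes the current $100,000 average stayed fixed for all 10,000 target structures and that all seven centers reach the expected 100 to 200 structures a year; both are simplifications for illustration, and the constant and function names are hypothetical.

```python
AVG_COST_NOW = 100_000     # dollars per structure today (the article's average)
TARGET_STRUCTURES = 10_000  # the initiative's 10-year goal
CENTERS = 7                 # number of funded groups


def total_cost(n_structures: int, cost_each: int = AVG_COST_NOW) -> int:
    """Total cost in dollars if every structure cost `cost_each`."""
    return n_structures * cost_each


def combined_rate(per_center: int, centers: int = CENTERS) -> int:
    """Structures per year if every center hits `per_center` a year."""
    return per_center * centers


print(f"${total_cost(TARGET_STRUCTURES):,}")  # $1,000,000,000
for per_center in (100, 200):
    print(per_center, "per center ->", combined_rate(per_center), "per year")
# 100 per center -> 700 per year
# 200 per center -> 1400 per year
```

At today's average price, the full 10,000-structure goal would cost about $1 billion, which helps explain why NIGMS expects the centers to bring the cost per structure down as they scale up.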

Although it is clearly too early to predict the eventual impact of the Protein Structure Initiative, it promises to open a whole new chapter in biomedical research.