The proteome is the collection of all the proteins inside a cell, organism, or biological sample. The proteome is to proteins what the genome is to genes. Because proteins drive the functions underlying most biological processes, studying the proteome means studying the inner workings of life itself.
Scientists typically study the proteome using techniques like mass spectrometry or traditional protein sequencing, but much of the proteome remains unexplored. Next-generation proteomics technologies will uncover far more of the proteome, vastly expanding what’s possible. We call this coming wave of proteome exploration and application the proteomics revolution.
For a quick video introduction to the proteome and proteomics, check out our “Intro to proteomics” episode of Translating Proteomics.

What is the proteome?
A proteome is all the proteins in a biological sample. A researcher might want to understand the proteome of a single cell, a tissue sample, an individual organism, or an entire species. So, how is the proteome defined? The exact definition of the proteome – and the size of it – depends on the scale of your investigation.
Nonetheless, proteomes are generally huge and complex, making them a challenge to study. A single cell can contain billions of protein molecules. Some are encoded by entirely different genes, while others are variants of the same gene product (called proteoforms). But understanding the proteome is worth it: by studying the proteome, you can determine what proteins are found in a cell, how much of each is present, and how they might relate to one another. In other words, you can dig into exactly how cells work.
History of proteomics and the proteome
It first became possible to study the proteome in 1975, when a technique known as two-dimensional gel electrophoresis was invented. This proteomics technique involves separating molecules in two dimensions, such as vertically and horizontally, with each dimension corresponding to a different property. This technique outputs so-called “2D gels,” which are basically sheets of material with dark spots containing proteins spread across them.
Early proteome studies included analyses of the proteomes of mice, guinea pigs, and Escherichia coli bacteria. These first studies were groundbreaking in that they allowed researchers to begin to understand what proteins existed inside these organisms, but they were fairly limited in capability because the spots found on 2D gels often contain multiple protein species. In many ways, proteomic technologies have evolved through advances in our ability to discern distinct protein species – 2D gels can segregate many groups of proteins, but their spots consist of mixtures of poorly separated proteins.
Around the turn of the century, mass spectrometry began to be used more heavily for proteomics research. In this technique, researchers break proteins into fragments called peptides and analyze the masses of the peptides to determine their compositions. They might also fragment the peptides to determine the masses of their constituent parts and get an even better understanding of their make-up. By identifying peptides and comparing them to those expected to be contained in their parent proteins, researchers can infer what proteins are present in a sample. This technique made it possible to increase the resolution of protein measurements and explore more of the proteome. Yet there are many proteins this technique cannot distinguish between, and it often cannot identify proteoforms because that requires the analysis of intact proteins.
In the present day, next-generation proteomics technologies like the Nautilus Proteome Analysis Platform are designed to analyze intact protein molecules with single-molecule resolution. Developers of these technologies aim to enable researchers to identify and quantify every protein in the proteome.
The term “proteomics” itself wasn’t coined until 1995, in a paper studying the proteome of a species of bacteria. Similarly, the word “proteoform” was first proposed in a commentary in Nature Methods in 2013 as a way to encompass all of the variations of a protein with a single term.
Tools for studying the proteome
There are many different approaches to studying the proteome, including broadscale proteomics techniques (often called discovery proteomics) that explore much or all of the proteome of a sample, as well as targeted proteomics approaches that selectively study just a subset of the proteome of a cell, sample, or organism. Soon, new next-generation proteomics tools may enable far more in-depth and high-resolution studies of the proteome, fully unlocking all of the proteins in a sample.
To this day, mass spectrometry is one of the most common ways to study the proteome. This proteomics technique involves breaking proteins apart into fragments called peptides that are then run through a mass spectrometer. The peptides are separated by mass-to-charge ratio, which lets scientists identify them and compare them to peptides predicted to be generated from known proteins. In this way, they can infer what proteins are present in a sample.
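The inference step described above can be sketched in a few lines of Python. This is a simplified illustration, not a real search engine: it assumes peptides have already been identified as sequences (real workflows match measured masses and fragment spectra), it uses a simplified trypsin rule (cleave after K or R unless the next residue is P), and the two protein sequences and their names are made up for the example.

```python
def tryptic_digest(sequence, min_length=5):
    """Predict peptides with a simplified trypsin rule:
    cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return {p for p in peptides if len(p) >= min_length}


def infer_proteins(observed, database, min_length=5):
    """Split observed peptides into unique evidence (one possible parent
    protein) and ambiguous evidence (several possible parents)."""
    peptide_to_proteins = {}
    for name, seq in database.items():
        for pep in tryptic_digest(seq, min_length):
            peptide_to_proteins.setdefault(pep, set()).add(name)

    unique_evidence, ambiguous = {}, {}
    for pep in observed:
        parents = peptide_to_proteins.get(pep, set())
        if len(parents) == 1:
            unique_evidence.setdefault(next(iter(parents)), set()).add(pep)
        elif len(parents) > 1:
            ambiguous[pep] = parents
    return unique_evidence, ambiguous


# Hypothetical toy database: both sequences are invented for illustration.
TOY_DATABASE = {
    "toy_protein_A": "MAGTLKVVNPERSSDLLK",
    "toy_protein_B": "MAGTLKAACDEFRGHILMNK",
}

unique, shared = infer_proteins({"VVNPER", "MAGTLK"}, TOY_DATABASE)
print("Unique evidence:", unique)
print("Ambiguous peptides:", shared)
```

In this toy case, one observed peptide maps to a single protein, while the other is predicted from both sequences and cannot tell them apart. That is a small-scale version of why peptide-based methods struggle to distinguish closely related proteins and proteoforms.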
Another common way to explore the proteome is with antibodies and other affinity reagents. These molecules bind to specific proteins, and, when combined with fluorescent labeling or other techniques, scientists can see which proteins are present by seeing which have been bound. Affinity reagents can also show how much of a specific protein is in a sample, and where that protein is.
Researchers also use a technique called Edman degradation to study the proteome. It’s a form of protein sequencing that strips off the amino acids at the end of a protein one by one, identifying each in turn. Eventually, this shows scientists what sequence of amino acids makes up a protein, which reveals its identity and can be used to model its structure.
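As a rough mental model of the cycle described above, the sketch below treats a protein as a string of one-letter amino acid codes and removes one residue per cycle from the N-terminal end, which is the end Edman chemistry works from. The function name and the example sequence are hypothetical, and the string operations only stand in for the actual chemistry.

```python
def edman_cycles(sequence, max_cycles=None):
    """Yield (cycle number, released residue, remaining chain),
    removing one residue per cycle from the N-terminus."""
    remaining = sequence
    cycle = 0
    while remaining and (max_cycles is None or cycle < max_cycles):
        cycle += 1
        released, remaining = remaining[0], remaining[1:]
        yield cycle, released, remaining


# Hypothetical five-residue peptide, read for three cycles.
calls = [residue for _, residue, _ in edman_cycles("GIVEQ", max_cycles=3)]
print("".join(calls))
```

Each cycle reveals exactly one residue, so the read grows one amino acid at a time.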
Watch our animations to learn how the Nautilus Platform is designed to enable analysis of the proteome and its proteoforms.
From the genome to the proteome
For years, scientists have been asking, “What makes a living cell tick?” If you peer inside a cell, you’ll find lipids, carbohydrates, nucleic acids, and proteins – the four macromolecules that form the building blocks of life. All four combine to form the various structures in cells, from their outermost membranes to their innermost machinery. But when it comes to the mechanisms that make cells function, it’s proteins that get work done.
The proteins a cell can produce are determined by its genome. This DNA contains the blueprints for life in the form of the genetic code. Inside a cell, DNA is transcribed into RNA, which is in turn translated into proteins. This process of moving from DNA to RNA to protein is so foundational to all life that it’s referred to as the central dogma of biology. Yet the central dogma is not the full story and is often misinterpreted to mean one can predict RNA levels from DNA, and predict protein levels from RNA. In reality, DNA, RNA, and proteins all interact with one another as well as their environments such that you cannot predict everything happening at one level of the central dogma by measuring a different level. For example, RNA levels often correlate poorly with protein levels because many processes beyond transcription, such as translation, modification, and degradation, shape how much of a protein is present. Learn more about the central dogma and misconceptions surrounding it on the Translating Proteomics Podcast.
Ever since the Human Genome Project completed its first sequence 20 years ago, advances from genetic and genomic research have made it easier to learn about every component of the central dogma, including proteins. But truly effective tools for studying all the proteins in the last component of that dogma, the proteome, are still lacking. While we can comprehensively sequence DNA and RNA, scientists are still unable to see the full scope of the proteome. The inherent inability to predict protein composition or activity from the genome or transcriptome means this inability to fully measure the proteome leaves us with a huge gap in our understanding of biology.
The Human Proteome Project
With increasing interest in proteomics, we may soon see the completion of a Human Proteome Project to complement the Human Genome Project. This extraordinary undertaking is organized by the Human Proteome Organization (HUPO). The goal of this international effort is to reveal the function of every protein. As of April 2023, researchers working on the project have reported finding 18,397 proteins, and estimate they have 7% of the human proteome to go.
Characterizing each human protein in this way will supply an important framework for proteome research, but it is just the first step to understanding the human proteome. It is like having an encyclopedia entry about an animal in hand before setting out to study its ecology. To build upon this baseline knowledge of the proteome, researchers will need new technologies that allow them to understand how the proteome varies between cells and organisms, between health and disease or stress, in response to drug treatments, and much more.
Size of the human proteome
The size of the proteomes of different organisms can differ widely. Much work to date has gone into studying the human proteome to discover proteins involved in various biological pathways, diseases, and conditions. Despite all that research, there is still some debate over the actual size of the human proteome.
The question isn’t very straightforward. There are around 20,000 protein-coding genes in the human genome. Yet there are far more than 20,000 protein variants in the human body. That’s because mRNA transcripts can be modified before they are translated into proteins, and because proteins themselves can be altered after they are translated, thus creating many proteoforms. Altogether, there are likely millions of human proteoforms.
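A little arithmetic shows how quickly variation multiplies. The sketch below computes an upper bound on proteoforms for a single gene, assuming each modification site varies independently. All of the inputs (3 isoforms, 10 sites with 2 states each, 50 proteoforms per gene) are illustrative assumptions, not measurements, and real cells produce only a fraction of the theoretical combinations.

```python
from math import prod


def max_proteoforms(n_isoforms, states_per_site):
    """Upper bound on proteoforms for one gene: number of isoforms times
    the product of possible states at each independently varying site."""
    return n_isoforms * prod(states_per_site)


# Hypothetical gene: 3 splice isoforms, 10 sites that are each
# either modified or unmodified (2 states per site).
per_gene = max_proteoforms(3, [2] * 10)
print(per_gene)  # 3 * 2**10 = 3072

# Even a far smaller average per gene reaches the millions genome-wide.
n_genes = 20_000
modest_average = 50  # assumed average proteoforms per gene
print(n_genes * modest_average)  # 1000000
```

The point is the scaling rather than the specific numbers: a handful of variable sites per protein is enough to push the total well beyond the gene count.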
Scientists are beginning to try and study all of those proteins, but it’s a difficult task. For example, the Human Proteome Project in 2020 released a blueprint of the human proteome covering a little more than 90% of the proteins that genes encode. As of April 2023, they’ve slightly increased that number to 93%, or 18,397 of the proteins predicted by genomic studies of protein-producing genes.
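The figures above also imply the size of the predicted protein set. A quick back-of-the-envelope calculation, using only the two numbers quoted in the text and treating 93% as a rounded value, looks like this:

```python
detected = 18_397  # proteins reported as found
coverage = 0.93    # quoted share of predicted proteins (rounded)

# Because the percentage is rounded, these are approximations.
implied_total = round(detected / coverage)
remaining = implied_total - detected

print(f"Implied predicted proteins: about {implied_total:,}")
print(f"Still to be found: about {remaining:,}")
```

That works out to roughly 19,800 predicted proteins, with on the order of 1,400 still to be confidently detected. This is consistent with the roughly 20,000 protein-coding genes mentioned earlier.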
Even with those efforts, there are still millions of proteoforms to account for, meaning that studies of the human proteome are far from finished. Importantly, many studies get only crude measurements of protein abundance and more quantitative proteomics techniques must be used in efforts to measure protein levels in different kinds of human tissue. Quantifying protein levels more accurately and precisely is particularly important for understanding human biology and combating disease. That kind of work is only getting started.
Next-generation proteomics tools could significantly enhance this type of research. These technologies aim to provide researchers with accessible means of quantifying every protein in a sample quickly. They have the incredible potential to improve our mechanistic understanding of all facets of biology. The Nautilus Proteome Analysis Platform is designed to be this kind of complete proteomics solution achieving both breadth – covering the whole proteome – and depth – accurate, sensitive, and reproducible quantification of proteins and proteoforms.
Applications of proteomics
Unlocking the proteome is as fundamental to biology as understanding DNA and RNA. This means proteomics applications span all sectors that deal with cells and life in general. Furthermore, by integrating data from genomics (DNA), transcriptomics (RNA), and proteomics (proteins), the burgeoning field of multiomics is bringing with it an unparalleled depth of biological understanding.
One of the most valuable things studying the proteome adds to the –omics space is mechanism of action. For instance, the chain of events from a mutation in a gene to a disease phenotype is complex, relying on interactions between the genome, other parts of the cell, and the environment. Proteins directly carry out the vast majority of these interactions and perform most cellular functions. Thus, studying proteins can show us the mechanisms through which mutations, drugs, toxins, and more cause changes in biological function. Genomics shows us what proteins a cell could possibly make, but proteomics shows us what proteins are active now, how many of them there are, where they are, and, with specialized techniques, what proteins and other molecules interact with one another. Adding insights from proteomics connects the dots between environment, DNA, RNA, proteins, and biological function.
Being able to identify the proteoforms present in a given sample will be particularly important to studies of mechanism of action. The combinations of isoform-specific sequences, post-translational modifications, and other alterations present on proteoforms help define their molecular functions, and by associating specific sets of proteoforms with specific biological outcomes, we’ll gain a molecular view of function. In our preprint, we show how the Nautilus Platform is already beginning to help researchers understand the roles of specific tau proteoforms in Alzheimer’s. We hope researchers can use our platform for similar studies across biological functions and diseases.
Even if DNA encodes the architecture for these interactions, the proteins are the last step in this chain and the most closely linked to how cells, tissues, and organisms behave. This means studying proteins gives us the clearest views into real-time biology, and manipulating proteins can have direct impacts on processes across all levels of biology.
Check out the Translating Proteomics podcast to learn how scientists are “Putting the Proteomics to Work.”
Lessons from the proteome
Some of the most immediate applications of proteomics are in health. Measuring the proteins in both healthy and diseased cells increases our understanding of disease mechanisms and provides potential protein biomarkers for diagnosis as well as targets for treatment.
Importantly, proteins are generally easier to target with drugs than DNA or RNA, so knowing what proteins are directly involved in a disease provides researchers with clear paths to therapeutic development.
For example, an understanding of the proteomes of cancer cells can enable the development of precision medicines that target the mechanisms controlling tumor growth. In addition, many think diseases like Alzheimer’s are caused by a buildup of proteins. A better understanding of the proteome could help researchers identify biomarker proteins that are indicative of the early stages of this buildup and could be targeted to either prevent or reverse this process.
Some research is already using discovery proteomics to find protein networks involved in Alzheimer’s, and linking genes and proteins in different kinds of neurological disease. This proteomics research and more may lead to novel treatments for Alzheimer’s and other neurological conditions.
Find additional applications of proteomics here and read our eBooks to learn more about applications of proteomics in neuroscience and cancer.
Unleash the proteome
We are excited to be at the forefront of efforts to unleash the proteome and bring proteomics applications like those described above to the researchers, doctors, and patients who need them. To learn how Nautilus is driving a revolution in proteomic analysis, subscribe to the Nautilus blog, YouTube channel, and Translating Proteomics podcast.