
Proteomics has the incredible potential to reveal the mechanisms underlying biology. In doing so, it may provide the raw materials needed to develop the next generation of more effective biomarkers, diagnostics, and drug targets. In short, proteomics can revolutionize biomedicine.
However, without the help of AI, achieving these aims will be slow-going. In this blog series, we discuss:
- Why proteomics needs AI, and why we need better proteomic data to train models of biology
- How Nautilus data is designed to be well-suited for AI integration
- How AI-ready data may improve multiomic analyses
Today’s post covers how Nautilus data is designed to be well-suited for AI integration.
Learn more about AI and biotech on the Translating Proteomics podcast.
High-quality, single-molecule protein counts
At Nautilus, we are developing a next-generation proteomics platform whose core Iterative Mapping method provides direct measurements of single protein molecules. In every cycle of Iterative Mapping, we reveal more information about every molecule in the proteome. We do this by recording whether probes designed to target protein features bind to each of billions of protein molecules immobilized on our massive protein arrays. While these binding interactions are themselves stochastic, repeated cycles build up lists of likely features contained within each protein molecule. Iterative Mapping thereby provides a uniquely information-rich means of defining every protein in the proteome at the single-molecule level. Single molecules that Iterative Mapping identifies as the same entity can then be summed, yielding a simple readout of single-molecule counts for each distinct entity. In this way, Iterative Mapping provides direct measurements of the protein molecules that control biology. This is exactly what we believe is needed to train accurate models of biology: high-quality, detailed, direct, and simple measurements of single protein molecules.
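To make that counting logic concrete, here is a minimal toy sketch in Python. Everything in it, including the probe set, binding probability, evidence threshold, and decoding rule, is a deliberately simplified illustration rather than the actual Nautilus decoding pipeline; it simply shows how repeated stochastic binding cycles can resolve individual molecules into per-protein counts.

```python
import random
from collections import Counter

# Toy model: each protein is a set of features; each probe targets one feature.
# A single binding cycle is unreliable, but repeated cycles build up a
# confident feature list (a "fingerprint") for each immobilized molecule.
PROTEOME = {
    "ProteinA": {"f1", "f3"},
    "ProteinB": {"f2", "f3"},
}
PROBES = ["f1", "f2", "f3"]   # one probe per feature, for simplicity
P_BIND = 0.7                  # chance a probe binds its target in one cycle
N_CYCLES = 20
MIN_HITS = 5                  # evidence threshold to call a feature present

def run_array(molecules):
    """Return single-molecule counts for each decoded protein identity."""
    counts = Counter()
    for true_id in molecules:
        hits = Counter()
        for _ in range(N_CYCLES):
            for probe in PROBES:
                # A probe binds (with probability P_BIND) only if the molecule
                # actually carries the targeted feature. Real data would also
                # include off-target binding; omitted here for clarity.
                if probe in PROTEOME[true_id] and random.random() < P_BIND:
                    hits[probe] += 1
        fingerprint = frozenset(f for f, n in hits.items() if n >= MIN_HITS)
        # Molecules whose fingerprints match a known feature set are summed.
        for name, features in PROTEOME.items():
            if fingerprint == features:
                counts[name] += 1
    return counts

# 1,000 immobilized molecules: 70% ProteinA, 30% ProteinB
array = ["ProteinA"] * 700 + ["ProteinB"] * 300
print(run_array(array))  # e.g. Counter({'ProteinA': 700, 'ProteinB': 300})
```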
Importantly, the quality of our data isn’t just theoretical. We have empirical evidence demonstrating the Nautilus Platform can produce highly nuanced data that captures the diversity of the proteome. For example, we’ve shown that we can detect molecular entities present at abundances as low as 1,000 parts per million in our proteoform studies, and our quantifications of these molecules are linear over more than three orders of magnitude. Furthermore, experiments quantifying proteins in simple mixtures show we can measure proteins present in yoctomole quantities.
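Some back-of-envelope arithmetic helps put these figures in molecular terms (the array copy number below is an illustrative assumption, not a platform specification):

```python
AVOGADRO = 6.022e23  # molecules per mole

# A yoctomole is 1e-24 mol, so "yoctomole quantities" means literal handfuls
# of molecules -- the regime where single-molecule counting matters.
yoctomoles = 10
molecules = yoctomoles * 1e-24 * AVOGADRO
print(f"{yoctomoles} ymol ≈ {molecules:.1f} molecules")  # ≈ 6.0 molecules

# 1,000 parts per million is 0.1% of a protein's molecules. On an array
# carrying 1e9 copies of that protein (an assumed number for illustration),
# that still amounts to ~1e6 individually counted molecules.
ppm = 1_000
copies_on_array = 1e9
detected = ppm / 1e6 * copies_on_array
print(f"{ppm} ppm of {copies_on_array:.0e} copies ≈ {detected:.0e} molecules")
```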
We also know that our data is precise, accurate, and reproducible. Even across users and instruments, our coefficients of variation fall below 6%, and our measurements have less than 9.97% error. As a result, our data should capture deep proteomic heterogeneity without being drowned out by noise.
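For reference, the coefficient of variation is simply the standard deviation of replicate measurements divided by their mean. A minimal sketch using hypothetical replicate counts (illustrative numbers, not Nautilus data):

```python
import statistics

def cv_percent(values):
    """Coefficient of variation: relative spread of replicate measurements."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical single-molecule counts for one protein, measured by
# different users on different instruments.
replicates = [10_120, 10_480, 9_910, 10_350, 10_060, 10_290]
print(f"CV = {cv_percent(replicates):.1f}%")  # ~2.1%, under the 6% cited above
```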
Single-molecule proteomics at scale
Our platform is also designed to produce the rich, high-quality measurements discussed above at scale. Our massive single-molecule arrays accommodate billions of proteins, meaning every experiment should generate large amounts of single-molecule proteomic data. In addition, we’ve designed our workflows to be as easy to use as possible, so that any lab can implement them and produce even more data. Indeed, we aim to enable researchers to produce not just the high-quality data needed, but the volume of data needed for AI and machine learning tools to find patterns and associate them with meaningful biological functions.
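To get a feel for the data volumes this implies, here is a rough back-of-envelope estimate; every parameter below is an illustrative assumption, not a platform specification:

```python
# Rough scale estimate: one binary binding outcome per molecule per probe cycle.
n_molecules = 1e10        # "billions" of single molecules on one array (assumed)
n_cycles = 300            # assumed number of Iterative Mapping cycles
bits_per_outcome = 1      # bound / not bound

raw_bits = n_molecules * n_cycles * bits_per_outcome
print(f"~{raw_bits / 8 / 1e12:.1f} TB of raw binding outcomes per experiment")
# ~0.4 TB -- before decoding collapses it into per-protein molecule counts
```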
We further predict that such patterns will only be visible to AI tools that analyze single-molecule data. Indeed, our data is designed for AI and machine learning workflows through its richness: it captures nuances that may underlie important biology and that cannot be seen on other platforms. Ultimately, it is our data’s quality, its simple and direct relationship to biology, and its scale that make it ideal for AI integration.
In the next post in this series, we’ll discuss how AI-ready, single-molecule proteomics may improve multiomic analyses.