Biocomputers

A mostly speculative post on the far-ish future of biology.

This essay’s a spiritual successor to my previous post on the subject. If you’re an investor, feel free to invest with that essay’s thesis in mind :) . I’d like to take a few steps forward into the future and try to reason backwards to where we are now. I began the other essay with a comparison to the mainframe era, and I’d still like to draw on the computing metaphor.

Most people identify Intel + the microprocessor as a key innovation in the whole computing revolution. The same could be said about the Apple II, which finally incorporated the microprocessor into a consumer-ready, integrated product. I won’t argue for or against either as marking a new age. Either way, those technologies were unequivocally tied together; they bookended the period when the microprocessor led the way to general purpose computing for everyone.

The integrated circuit was the culmination of billions of dollars in R&D, and today the heir to that technology is the iPhone 8, which holds transistors that would have cost some $150 trillion at 1957 prices. These devices let you do essentially anything and are the cornerstones of global communications and global money. A person could live their life with just a phone.

I wonder what set of innovations might allow for the equivalent exponential jump in biology, the microprocessor for biology. What’s the equivalent of a general purpose computing device in biology, and why would we even want one? 

First, let’s look at the definition of the microprocessor according to Wikipedia.

"The microprocessor is a multipurpose, clock driven, register based, digital-integrated circuit which accepts binary data as input, processes it according to instructions stored in its memory, and provides results as output.”

If we swap out binary data for DNA, that sounds a lot like what a nucleus does. The speed and accuracy with which we can create new strands of DNA is limited right now. Biology is, of course, general purpose: the same DNA that codes for humans can be used to code for algae. However, most DNA is assembled for a specific purpose. The software, the ACTGs of DNA, is still way too expensive to sequence. Additionally, de novo gene synthesis and assembly, making long DNA strands from scratch, is doubly expensive. While we herald a $1,000 human genome sequence, and soon a $100 one, that cost really needs to be close to zero. And while a single base pair costs $0.02 to synthesize, that also needs to be close to zero.

Why do I think $0.02 is way too high? Well, think about it this way: if every line of code cost $0.02, we would not have operating systems or any of the wonderful things we depend on today. To get to truly ubiquitous DNA manipulation, the cost has to be effectively $0, like manipulating electrons in a personal computer.
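To make the gap concrete, here’s a back-of-the-envelope sketch in Python. The genome size is the usual rough figure, and the per-base price is the $0.02 quoted above; nothing else is assumed.

```python
# Back-of-the-envelope: what would it cost to synthesize one human
# genome from scratch at today's quoted per-base price?
HUMAN_GENOME_BP = 3.2e9   # approximate haploid human genome size, in base pairs
COST_PER_BP = 0.02        # dollars per base pair, the figure quoted above

cost_today = HUMAN_GENOME_BP * COST_PER_BP
print(f"One genome at $0.02/bp: ${cost_today:,.0f}")  # -> $64,000,000
```

Sixty-plus million dollars per genome written is the distance between today and “effectively zero.”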

In short, a biological microprocessor, a bioprocessor for short, would be able to manipulate DNA and spit out the results, the biological and chemical components of whatever we wanted, at near-zero cost. An integrated biocomputer would take the inputs (single cells, small molecules, blood drawn from individuals, other enzymes) and return new cells with the right genes inserted. Attached to the main bioprocessor would be other chips such as microscopes, perturbation devices, electroporation devices, incubators, bioprinters, and fluid + solid handling devices (think needles and other things), as well as connections to traditional chips.

Fundamentally, having a digital bioprocessor or some personal computer equivalent could lower the cost of creation by several orders of magnitude. The tabletop sets we have today for home biology are the equivalent of ham radio kits, so it will be some time before we have anything really cool. Biology, though, holds the same property of being an information science. Like in the pre-personal computer + internet era, we still have to go to separate sources to gather all of our biological material. We travel to the grocery store, we go to the mall to buy creams that are synthesized by snails, we get surgery + pay $ to look different, we go to the pet store to get pets; even the clothes on our backs are made from organic materials. If we could download creams, seeds for foods to be grown, and drug treatments, we could enable biological creativity like we have in bits.

One use case that bioprocessors could dramatically influence is human drug/medical treatment. Martin Shkreli and the latest EpiPen snafus could be avoided by at-home production of molecules and treatments. But if the cost of treatments is zero, then how are drug/treatment development costs to be amortized? Creating a blockbuster drug today costs billions, so what happens when individuals are able to “download” medicine for free? Of course, this is a moral dilemma. Gene therapy treatments for orphan diseases cost consumers $500,000 for one treatment. That seems a bit outrageous.

Business Models for Biology

Bioprocessors should have two first-order effects on biology: decreasing the costs of production and distribution. We just have to look to software. As we’ve seen with the internet, a radical shift in the costs of distribution has reshaped, and will continue to reshape, industries. 10-1000x cost reductions lead to startups, and startups disrupt industries. With the internet, everything either became free, had a SaaS/API model attached, or birthed a marketplace. Each download or use will cost some amount, like hitting an API endpoint; a toy sketch of that model follows the list below.
  • Music -> piracy (zero-cost distribution) + lower production cost = free initially, but now a SaaS model; litigious for sure.
  • Movies -> high production costs, lower discovery/distribution cost = SaaS model (Netflix)
  • Banking -> high production/integration cost = now an API model (Stripe)
  • Housing -> high production cost, high discovery cost = marketplace model (Airbnb)
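Purely as illustration, here’s what “hitting an API endpoint” might look like for biology. This is a hypothetical sketch: the service, endpoint, treatment IDs, and response format are all invented, not a real API.

```python
# Hypothetical sketch of a "drug as API endpoint" business model.
# The service, endpoint, and response format are invented for illustration.
import json
import urllib.request

API_BASE = "https://api.example-biofoundry.com/v1"  # hypothetical endpoint

def download_treatment(treatment_id: str, api_key: str) -> dict:
    """Fetch a DNA construct design; each call is metered and billed,
    just like hitting any other paid API endpoint."""
    req = urllib.request.Request(
        f"{API_BASE}/treatments/{treatment_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        # e.g. {"sequence": "ATCG...", "price_usd": 0.10}
        return json.load(resp)
```

The open question is what the per-call price settles at; software suggests it trends toward zero, with the value migrating elsewhere.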
The same will happen with biology. The effect on food will be different from that on pharma, and that’s related to the market dynamics of production, distribution, and reputation. All these elements add to transaction costs, and as we know, transaction costs govern where fat businesses are made. Sit on top of a fat pipe of transaction costs and win money for a long time. One worry people have is drug piracy: if the cost of downloading a drug effectively drops to zero, then what happens to the dollars that need to go into research?

There are a few effects that a bioprocessor and its associated peripheral devices could have on drug development. The cost of research should be way lower, allowing more drugs to come onto the market. However, determining efficacy will still be hard, so brands or marketplaces should establish themselves.

However, free in biology isn’t necessarily bad. People don’t always need to be motivated by (direct) monetary ends to contribute: the Debian ecosystem has had ~$20 billion of work put into free software. And this isn’t just random stuff; it runs on almost every internet-connected server, and we depend on it for critical infrastructure. We could potentially have freely designed, pest-resistant seeds that farmers could use instead of ones controlled by the huge agribusiness companies.

We might have a SaaS-like business model for individuals to purchase treatments (Illumina, Gene Therapy Market??? -> have the right idea). However, we’ll have to deal with data security. Medical records are worth 20x your credit card information on the black market, and there is no way that I would want my health information to be hacked. A more fun SaaS business might be a custom-designed hair product and colorizer. First, input a strand of your hair, enter the desired hairstyle and texture, and out comes a specially designed set of creams that actually changes biological hair growth from the follicles. If we actually change the follicles, then we can change the color and texture of our hair at will for longer, cheaper, and safer than we do now.

If we go to space, we’ll certainly need and want different biological tools. Space radiation can kill, just as scurvy once killed sailors. Resistance to space radiation might be conferred by as few as 4 SNPs, and those SNPs could potentially be free. A digital biocomputer would be a necessary tool: we’re not going to have a lot of room on those spaceships, and we’re going to need to bring a lot of things. The best way of compressing things is as pure information.

All of these are possible arrangements for how the bioprocessor changes the production and distribution of organic materials. But we’re sadly still a ways away.

Today: Complexity

Computer scientists severely underestimate the complexity of even single cells. Cells are really, really complex to model and build, especially if you want to get to atomic-scale precision. Atomic-scale precision is often what you’ll need; after all, polymerase is atomically precise. It manipulates individual atoms into place, and we can thank evolution for that: we only get a handful of mutations per a few billion base pairs copied. To do that level of simulation for just one cell, we need to assume Moore’s Law continues 50 years into the future (so we’ll basically need quantum computers to continue that trend). For a whole-brain simulation, we’ll need 100 years. Another example of complexity is protein structure.
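To see where a horizon like “50 years of Moore’s Law” comes from, here’s a rough doubling-time calculation. The compute gaps below are illustrative placeholders chosen to match the quoted horizons, not measured numbers; the only real assumption is that compute doubles every two years.

```python
import math

def years_until_feasible(compute_gap: float, doubling_years: float = 2.0) -> float:
    """Years of Moore's Law-style doubling needed to close a compute gap."""
    return doubling_years * math.log2(compute_gap)

# Illustrative gaps only: a ~3e7x shortfall implies ~50 years of doubling,
# and a ~1e15x shortfall implies ~100 years, matching the horizons above.
print(years_until_feasible(3e7))   # ~49.7 years (single cell, assumed gap)
print(years_until_feasible(1e15))  # ~99.7 years (whole brain, assumed gap)
```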

We’ll either need to reduce the modeling accuracy of our systems (as we’ve done with deep learning) or use biological techniques in addition to computational models. We can use bioprocessors as a means of studying cells, of directing their evolution, of creating anything we want. On our way to a glorious biologically infused future, we have many roadblocks to creating components for a bioprocessor and/or personal biocomputer.

A future post will speculate in detail on 1) what a bioprocessor actually looks like, 2) who’s working on this stuff now, and 3) what else is holding us back.

Biology in the Coming Years

If I had to compare the development of the synthetic biology/biotech stack to that of the computer, I would say we’re still pretty early. In biology, we’re in the big mainframe era, before the development of the transistor and integrated circuit.


Here's my thinking:


| | Biology Today | Mainframe Era |
| --- | --- | --- |
| Long dev cycle times / shared resources | Waiting for western blots and gels to run… waiting for cultures to grow. A few hours to a few days. | Trying to get mainframe time to run programs. A few hours to a few days. |
| Poor debugging | No idea if an organism works until it’s actually produced (no in silico modeling) | Punch cards!!! And no compiler |
| Low reusability/reliability of parts | Genes often don’t work outside of their original organism | Vacuum tubes get moths stuck in them |
| Fragmented community | Limited hackers, mostly stuck within universities | Limited hackers, mostly stuck within universities |
| Low abstraction | Individual gene sequences | Punch cards / machine code |
| Low complexity of programs | Today: yeast that makes beer and a scent. Future: designer cows?? | Then: computing missile trajectories. Today: Google |

Moreover, right now, Ph.D. students and undergrads are oftentimes just manual labor.

These students, while credentialed as ever, don’t touch the interesting problems like experimental design, nor do they have much of a say in what projects they work on. I can personally attest to this. For the few short months that I worked in a cancer lab, I spent the first week excited to learn different protocols; the next few months were spent bored to tears. Day in and day out, all I did was move small amounts of liquid from point A to point B. The automation of this labor will be hugely disruptive.

It’s not all bad news. Just as the mainframe era evolved into the computer revolution, the bench-work era in biology will give way to a cloud-based, automated version of biology. This is great news for the general public and a great business opportunity. Here are the startups that are bringing a CS approach to biology.


  • The “App” Layer -> Machine learning applied to discovery: These companies are using large data sets and deep learning techniques to make biological products to sell.
    • Existing drugs: Mine drug databases to find new combinations that will work as treatments for different diseases. This is a huge growth area and makes a lot of sense for a deep learning firm to enter. Since drug combinations don’t have to go through Phase 3 clinical trials again, and only have to prove that the combination is safe, this can be a capital-efficient method of producing cures.
    • Molecular: Companies that are making small molecules to treat disease. Atomwise is the most successful company in this space. Small molecules also seem like a type of data that deep learning techniques can represent more easily than complex biological circuits. http://arxiv.org/pdf/1510.02855.pdf
    • Genomics/Biologics: These companies are using ML/DL techniques to create useful DNA Sequences and Antibodies. 
    • Organisms: These companies create functional microbes that do different things. End users buy products that these microbes produce--fragrances for perfumes, oil, and therapeutics. Although these companies might use machine learning, this process is more about trial and error and iterative design, compared to the more automated process of small-drug discovery.
  • The “Backend” -> Biological data analysis software: Companies here either sell analysis software or offer specific recommendations based on their proprietary algorithms to clinicians, end consumers, or researchers. I’m not sure who will win in this space, as I don’t think it’s clear that having large datasets is very defensible. I think this mostly because the cost of data acquisition is decaying exponentially (see the sketch after this list). This may be the reverse of the situation for consumer internet companies: here data is easy to get, but the algorithms are the important thing. See Craig Venter’s failed attempt at monetizing the very first full human genome sequence. Is the timing right now?
    • -Omics: Besides our genes, there are the RNAs, small molecules (like lipids), and proteins that make up our cells, each with its own “-omics”: transcriptomics, metabolomics, and proteomics, respectively (and don’t forget the microbiome). HLI and iCarbonX are the two largest companies trying to make sense of all this stuff.
    • Genomics: Genetic analysis software for researchers and clinicians that helps drive better decisions.
    • Consumer: Recommendations are given to end consumers. It’s interesting to see that a large consumer player is transitioning from making money on selling tests/data to developing drugs. Will other players follow?
    • Imaging and Misc: More biological data, such as image data, ultrasound, or public health data. There are a lot of interesting things that can happen here. Using MRI data to help doctors diagnose PTSD and other neurological conditions is one big thing that comes to mind.
  • Protocol Layer -> Distribution of existing datasets: These companies provide the data that exists, ways to share it, and ways to compute on it.
    • -Omics: Public organizations provide data sets. Platforms like Google Cloud allow you to store large data sets and analyze them to a certain extent.
    • Genetic Variation: Companies here are mapping out the variation within genes.
    • Circuits: These companies build off the popular iGEM competition and the synthetic bio movement to provide a reusable set of genes to build with. These are usually free to the public; however, organism discovery companies usually have proprietary genes and circuits that they use.
  • The Internet -> Collaboration Software for People: These are more traditional software products—content platforms, data sharing, and design tools.
    • Literature and the Research Network: There are many attempts at making journal articles easy to find and researchers more accessible.
    • Protocols: These are attempts to make biology more reproducible through the creation of standardized languages to describe experiments in discrete, repeatable steps.
    • Gene Design Tools: The IDEs for biology. Software here is trying to make genes and organisms easy to build with WYSIWYG and visual interfaces. A lot of these products are put out by DNA synthesis companies that want to make the designs scientists produce… for a profit.
  • Creating a Functioning Lab: Funding and bench work are broken. Moving towards a fully automated lab.
    • Funding/Equity Models: Everyone knows that basic research funding is broken. Both the number and average size of grants are decreasing. There are many crowdfunding competitors here. There’s an interesting attempt at creating “equity” with the blockchain.
    • Machine Automation in the Lab: Companies here are looking at the hardware in the lab. Different approaches include an Uber for Lab Experiments, an AWS for experiments, and creating remote access for your own lab.
    • Automating Assays: Taking care of the mixing and matching of assays/reactions within a lab.
    • Lab Management Software: Traditional software that is trying to get a lab functioning better.
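As a quick sanity check on the claim above that data acquisition costs are decaying exponentially, here’s the halving time implied by two approximate NHGRI price points for sequencing a human genome; the exact figures matter less than the order of magnitude.

```python
import math

def halving_time_years(cost_start: float, cost_end: float, years: float) -> float:
    """Halving time implied by exponential decay between two cost observations."""
    return years / math.log2(cost_start / cost_end)

# Approximate NHGRI figures: ~$95M per genome in 2001, ~$1,245 in 2015.
t = halving_time_years(95e6, 1245, 2015 - 2001)
print(f"Sequencing cost halved roughly every {t * 12:.0f} months")  # ~10 months
# Moore's Law halves compute cost roughly every 18-24 months, so genomic data
# has been getting cheaper faster than the computers that process it.
```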


My initial thoughts on investment themes:
  • The AWS for lab automation, as well as computation, will be huge. Automation frees up more than man-hours; the lower cost of science will allow scientists to conduct ever more research. Biology has historically been a pretty good adopter of computer techniques to model/simulate/discover organisms. Historically, though, the three things necessary for machine learning (data, computational capacity, and algorithms) haven’t been able to handle modeling of biological systems. All three areas are now changing. In the past, 1 petaflop would have cost a practically infinite amount of money; now it costs only about $400 on AWS. By 2020, we’ll be producing more genomic data than is uploaded to YouTube. All this data will need to be stored safely and computed on. Deep learning in discovery is only going to become more interesting as those algorithms continue to develop.
  • Continuing machine learning’s march into basic research/medicine. There are lots of attempts at making sure research is read, and that people can collaborate, but is that the right approach? Even now, there’s not enough time for a biologist to stay on top of the current literature. Although it’s early, there are attempts at extracting structured data from the literature and pushing it through Watson to synthesize findings. After synthesis, researchers or clinicians can use the data to create new experiments and make more informed decisions. This will only quicken as adoption spreads of high-level, machine-readable languages for describing experiments.
  • How to share data is an open problem: There haven’t been many businesses trying to build large-scale open sharing of genetic info/data sets. Although both HLI and iCarbonX endeavor to aggregate huge data sets to (in the long term) create medicines that extend the human lifespan, their short-term plan is to sell sequenced consumer data to drug companies through B2B licensing agreements. This places the valuable data outside the hands of smaller researchers and gives patient data to large companies. I’d be interested in seeing how bitcoin (and especially 21) plays into the development of open sharing in biology. Projects like https://github.com/joepickrell/genome-server-21 and https://github.com/joepickrell/phenopredict21 show bitcoin’s flexibility. Although these were proofs of concept, I think this kind of analysis has the potential to put personal health data sharing in the hands of the people rather than doctors and companies.
  • Developing direct relationships between patients and drug companies. Many companies are taking a very new approach to finding patients: developing relationships directly with the patients/users of their drugs. Instead of partnering with hospitals and large health care networks to find study candidates, they can do so at a lower cost of capital with the internet. 23andMe is a shining example.
  • Bio is becoming a lot cheaper. Look at the Perlstein Lab. They’re able to do drug and mouse studies on a software startup’s run rate.

The work being done by these companies to bring biology up to software speed is incredible. But what does it really mean for end consumers? What kind of products will we see? Here are my predictions for what we’ll see by the end of 2020: