Hacking the Genetic Code 101: Simplifying genomes to make them virus-resistant
Written by Lorenzo Mahoney '24
Edited by David Han '24
Maybe your high school biology textbook was wrong after all. Any introduction to the science of life covers the basic importance of DNA, the information-storing molecule of our cells. DNA’s genetic material is composed of just four nucleotides, distinguished by their bases: Adenine (A), Guanine (G), Cytosine (C), and Thymine (T). Their extensive patterning forms genes that are transcribed and translated into cellular proteins, converting information into products. But beneath this surface of simplicity, modern scientists are attempting to unlock massive potential by paring down the principal components of the genetic code. Compressing the genome’s complexity has yielded exciting but distant prospects of new cellular capabilities, including viral immunity.
Such revolutionary science builds on natural foundations that have lasted billions of years. Just four nucleotides are responsible for the difference between a skin cell and a blood cell, or between a jellyfish and an apple tree. This vast amount of genetic information is read in nucleotide triplets called codons (e.g., TCA, GGG). Nucleotide patterns are processed into proteins by converting each codon into an individual amino acid. A closer look at the mathematics of this linear information flow reveals some billion-year-old redundancies. With four possible nucleotides read in groups of three, there are 4 × 4 × 4 = 64 distinct codons. If each codon corresponded to its own amino acid, that variety would carry over to proteins. However, most organisms rely on just 20 canonical amino acids. This numerical imbalance leads to multiple codons encoding the same amino acid, known as codon degeneracy. For example, the amino acid leucine is represented by six unique codons. The redundancies of the genetic code can be explained evolutionarily as insurance against single-nucleotide mutations, as many redundant codons differ at only one base. A mutated codon could still result in the same amino acid, thereby ‘silencing’ the mutation. Still, with more than forty surplus codons (61 of the 64 encode the 20 amino acids, and the remaining three all signal ‘stop’), how many do you actually need?
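The codon arithmetic in the paragraph above can be checked with a few lines of Python. This is a minimal sketch that assumes the standard genetic code, written in the compact one-letter-per-codon layout (bases ordered T, C, A, G, with `*` marking a stop codon); the variable names are illustrative only.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, in TCAG order
# (first base varies slowest, third base fastest). "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map every triplet (TTT, TTC, ..., GGG) to its amino acid or stop signal.
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Degeneracy: how many codons share each meaning.
degeneracy = Counter(CODON_TABLE.values())

sense_codons = sum(n for aa, n in degeneracy.items() if aa != "*")
amino_acids = len(degeneracy) - 1          # exclude the stop signal
surplus = sense_codons - amino_acids       # codons beyond one per amino acid

print(len(CODON_TABLE))    # 64 codons in total
print(degeneracy["L"])     # 6 codons for leucine
print(degeneracy["*"])     # 3 stop codons
print(surplus)             # 41 surplus amino-acid codons
```

Counting the two spare stop codons as well brings the total redundancy to 43 of the 64 codons.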
Researchers have begun investigating this natural redundancy with artificial tinkering. Working across the entire genome of E. coli, a team from Cambridge scanned its four million nucleotides and recoded every instance of three codons into synonymous equivalents. The first two codons encode serine, which still has four other codons left untouched, while the final removal candidate is a ‘stop’ codon, responsible for ending protein translation. Stopping translation is a vital function, but two other codons have exactly the same effect. To achieve the massive task of codon replacement, the bacterial chromosome was sliced up into 37 smaller, more manageable segments before being edited at the nucleotide level using CRISPR technology. CRISPR has incited a modern revolution in the possibilities of genetic editing, using guide nucleotide sequences to direct DNA-cutting machinery to exact locations and enabling direct and precise nucleotide deletions, substitutions, and additions. The technology enabled the recoding of over 18,000 codons, removing every TCG, TCA, and TAG codon without altering the predicted protein products. Upon reassembly, the updated genome produced viable cells with minimal differences in gene expression and protein production. This new strain, dubbed Synthetic-61 (Syn61) for its reduced number of codons, showed that specific degenerate codons are non-essential and opened up a biological can of worms for scientists to further remodel the genetic basis of the cell.
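The recoding described above amounts to a synonymous find-and-replace applied codon by codon within each gene’s reading frame. Here is a toy sketch of that idea, assuming the replacement scheme TCG→AGC, TCA→AGT, and TAG→TAA; the function name and the example sequence are hypothetical, and real genomes add complications (such as overlapping genes) that this ignores.

```python
# Assumed synonymous replacements: two serine codons and one stop codon.
RECODE = {"TCG": "AGC", "TCA": "AGT", "TAG": "TAA"}

# Meaning of each codon involved ("S" = serine, "*" = stop), used only to
# confirm that every replacement preserves the encoded protein.
MEANING = {"TCG": "S", "AGC": "S", "TCA": "S", "AGT": "S", "TAG": "*", "TAA": "*"}


def recode_orf(orf: str) -> str:
    """Replace target codons in an open reading frame, reading in frame."""
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    return "".join(RECODE.get(codon, codon) for codon in codons)


# Every swap keeps the same meaning, so the protein product is unchanged.
assert all(MEANING[old] == MEANING[new] for old, new in RECODE.items())

# Hypothetical mini-gene: ATG TCG TCA AAA TAG
print(recode_orf("ATGTCGTCAAAATAG"))  # ATGAGCAGTAAATAA
```

Only in-frame triplets are touched: a TCA that straddles two codons is not a TCA codon and is left alone.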
Eliminating codons shrank the genome beyond the individual nucleotide reassignments. For each codon, a specific transfer RNA (tRNA) matches the triplet sequence to the appropriate amino acid. For a stop codon, this role is played by a release factor protein. Within Syn61, the machinery for the recoded triplets became obsolete. In further E. coli variants, removing each component separately from the Syn61 genome showed no functional consequence. With a condensed genome, the bacterial cell paradoxically gained expanded capabilities. The lack of standard machinery becomes a benefit when you consider the 64-codon needs of some of the cell’s biggest enemies: viruses. Reliant on their hosts to express their genomes, these infectious agents proliferate thanks to cellular tRNAs. Viral reproduction could be seriously hindered by the newfound tRNA dispensability in Syn61. To test the viability of this silver bullet against viral infection, the newly synthesized Syn61Δ3 strain (which lacks all three decoding factors: two serine tRNAs and one release factor) was incubated with a cocktail of bacteriophages, viruses that target bacteria. Previous research showed that removing a single stop codon and its release factor proved only mildly effective at resisting bacteriophage infection in E. coli. Resistance depended on whether the virus used that specific stop codon in its genes, and the one removed is the rarest. With restrictions now on more commonly used amino-acid codons, the Syn61Δ3 strain showed a ‘broad resistance to phage’. The viruses could neither replicate nor kill the bacteria without their usual arsenal of host machinery. Against a variety of infectious bacteriophages, the simplified genome undermined the basics of the viral life cycle.
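The logic of this resistance can be illustrated with a deliberately simplified model: a host that has lost the decoders for TCG, TCA, and TAG reads its own recoded genes normally, but stalls on a viral gene that still uses those codons. The sketch below assumes the standard genetic code and treats any unreadable codon as a hard stall; the function, the set name, and both example genes are hypothetical, and real ribosome behavior at unreadable codons is more complicated.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Codons whose decoders (two tRNAs and a release factor) are absent
# in the modeled Syn61Δ3-like host.
UNREADABLE_IN_SYN61D3 = frozenset({"TCG", "TCA", "TAG"})


def translate(orf, unreadable=frozenset()):
    """Return (peptide, stalled_codon); stalled_codon is None on success."""
    peptide = []
    for i in range(0, len(orf) - 2, 3):
        codon = orf[i:i + 3]
        if codon in unreadable:
            return "".join(peptide), codon   # no decoder: translation stalls
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break                            # normal termination
        peptide.append(amino_acid)
    return "".join(peptide), None


phage_gene = "ATGTCAGGTTAA"   # hypothetical viral gene using the TCA codon
host_gene = "ATGAGTGGTTAA"    # the same protein, recoded to AGT

print(translate(phage_gene))                         # ('MSG', None)
print(translate(phage_gene, UNREADABLE_IN_SYN61D3))  # ('M', 'TCA')
print(translate(host_gene, UNREADABLE_IN_SYN61D3))   # ('MSG', None)
```

In an ordinary cell the phage gene is read to completion, while in the recoded host it is cut short at the first removed codon and the host’s own recoded copy is unaffected.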
So, is getting rid of a couple of codons the end of all viral pandemics? Not quite. Firstly, the human genome, spread across 23 pairs of chromosomes, is orders of magnitude more complex than the circular E. coli equivalent, both in regulation and structure. For example, the TAG stop codon appears around 750 times in the standard E. coli genome. In the human genome, the number skyrockets to over 32,000. Still, the mountain of complexity hasn’t stopped scientists from chipping away at the gargantuan task posed by the scale of our own genetic code. Human cells in vitro have undergone codon reassignment, where introducing a single CRISPR guide can alter over 30 genes. At the same time, this base-editing technology also introduced off-target mutations in essential genes. For a perfectly target-specific editor, the accuracy of current technology must improve 60-fold. On top of the size and technological constraints, E. coli divides recognition of its three stop codons between two release factors, only one of which reads TAG, while human cells use a single factor for all three. In human cells, eliminating the factor that reads TAG (the deletion made in Syn61Δ3) would therefore disrupt termination at every stop codon and cripple protein production. To successfully recreate E. coli’s viral resistance in human cells, researchers are caught between competing constraints: targeting amino-acid codons requires editing at an unprecedented scale with new technology, while targeting the less numerous stop codons doesn’t adequately restrict viruses without halting all cellular translation. As the frontier of cutting-edge genetic editing expands, the link between codon degeneracy and viral immunity will be further clarified. The sheer number of codons available for reassignment holds promise, but interventions will have to be host-cell specific, and pared-down genomes must be tested for longer-term effects than just immediate viral infection.
With continued effort in the area, perhaps biology textbooks of the distant future will look back on 64-codon genomes as a thing of the past.
1. Fredens J, Wang K, de la Torre D, Funke LFH, Robertson WE, Christova Y, et al. Total synthesis of Escherichia coli with a recoded genome. Nature. 2019 May;569(7757):514–8.
2. Lajoie MJ, Rovner AJ, Goodman DB, Aerni HR, Haimovich AD, Kuznetsov G, et al. Genomically recoded organisms expand biological functions. Science. 2013 Oct 18;342(6156):357–60.
3. Robertson WE, Funke LFH, de la Torre D, Fredens J, Elliott TS, Spinck M, et al. Sense codon reassignment enables viral resistance and encoded polymer synthesis. Science. 2021 Jun 4;372(6546):1057–62.
4. Chen Y, Hysolli E, Chen A, Casper S, Liu S, Yang K, et al. Multiplex base editing to convert TAG into TAA codons in the human genome. Nat Commun. 2022 Aug 2;13(1):4482.
5. Tang H, Zhang P, Luo X. Recent Technologies for Genetic Code Expansion and their Implications on Synthetic Biology Applications. Journal of Molecular Biology. 2022 Apr 30;434(8):167382.
6. Wang F, Zhang W. Synthetic biology: Recent progress, biosafety and biosecurity concerns, and possible solutions. Journal of Biosafety and Biosecurity. 2019 Mar 1;1(1):22–30.
7. Venter JC, Glass JI, Hutchison CA, Vashee S. Synthetic chromosomes, genomes, viruses, and cells. Cell. 2022 Jul 21;185(15):2708–24.
8. DeBenedictis EA, Carver GD, Chung CZ, Söll D, Badran AH. Multiplex suppression of four quadruplet codons via tRNA directed evolution. Nat Commun. 2021 Sep 29;12(1):5706.