I've put some notes together for the next session and thought I'd share them here; it's basically just background for the current reading, but it should help to orient folks. As I was searching for data and details, I was struck by a few things that really rang a bell: the 3-5 Code Cs talked about a long time ago.
The book is
The Origins of Life by John Maynard Smith and Eors Szathmary
Authors say: Evolution by natural selection lacks foresight. A transition may open up new possibilities for future evolution, but that is not why it happened.
Before there were specific enzymes, the maximum size of the genome was about 20 bases. A sort of Catch-22. With a mere 20 bases, one cannot code for an enzyme, let alone the translating machinery needed to convert the base sequence into a specific protein.
But RNA molecules can themselves be enzymes.
The difference between RNA and DNA is chemically minor: the backbone is slightly different, and one of the bases of DNA, thymine, is replaced in RNA by uracil. Uracil can replace thymine as a pair for adenine. Plus, RNA usually exists as a single strand. The error rate in RNA replication is in the range of 1/1000 to 1/10,000. RNA can have a variety of secondary structures. The molecule can bend back on itself and form loops. The stems of the loops form base pairs. That pairing is always between strands in reverse orientation: DNA and RNA strands have a polarity, and pairing can occur only between strands pointing in opposite directions. RNA molecules, therefore, can have a diversity of three-dimensional structures, while all DNA molecules have much the same double-helix structure. This means that RNA molecules, like proteins, should be able to act as enzymes: ribozymes. RNA can perform both functions: carrier of information and enzyme.
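To make the "opposite directions" point concrete, here is a minimal sketch of how one might check whether two arms of an RNA strand could pair into a stem. The function names and the example hairpin are my own illustration, not from the book, and the check only considers standard A-U and G-C pairs (it ignores G-U wobble pairs and everything else about real folding).

```python
# Standard Watson-Crick pairs in RNA (assumption: wobble pairs are ignored).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Complement each base and reverse the order, both strings read 5'->3'."""
    return "".join(PAIR[base] for base in reversed(rna))

def can_form_stem(five_prime_arm, three_prime_arm):
    """True if the two arms pair base by base when aligned antiparallel."""
    return three_prime_arm == reverse_complement(five_prime_arm)

# Hypothetical hairpin: 5' arm + loop + 3' arm, all written 5'->3'.
hairpin = "GCAUC" + "GAAA" + "GAUGC"

print(can_form_stem("GCAUC", "GAUGC"))  # True: antiparallel complement
print(can_form_stem("GCAUC", "CGUAG"))  # False: complementary only if read in parallel
```

The second call shows why polarity matters: "CGUAG" is the base-by-base complement of "GCAUC", but read in the same direction, so it would not form a stem.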
Background on genetic code:
The genetic code is 'written' in a linear sequence in four letters corresponding to two purines, A and G (adenine and guanine), and two pyrimidines, C and T (cytosine and thymine); in mRNA, U (uracil) replaces T. The words of the alphabet comprise 3 letters, thus there are 4^3 = 64 permutations or words, which are called codons. 61 of the 64 codons code for 1 of 20 amino acids. Since more than one codon codes for a particular amino acid, the code is said to be redundant. Four of the 64 codons punctuate the message: one, AUG, is the start signal (it also codes for methionine, so it is counted among the 61), and three, UAG, UAA and UGA, are stop signals.
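The counting above is easy to check by brute force. This is just a small sketch that lists every three-letter word over the four mRNA letters and sets aside the three stop codons named in the text; the variable names are mine.

```python
from itertools import product

BASES = "UCAG"                          # the four mRNA letters
STOP_CODONS = {"UAG", "UAA", "UGA"}     # the three stop signals named above

# Every ordered triplet of the four bases: 4 * 4 * 4 words.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Codons that specify an amino acid (AUG, the start signal, is one of them).
sense_codons = [c for c in codons if c not in STOP_CODONS]

print(len(codons))        # 64
print(len(sense_codons))  # 61
```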
Adenine is a nucleobase (a purine derivative). Its derivatives have a variety of roles in biochemistry including cellular respiration, in the form of both the energy-rich adenosine triphosphate (ATP) and the cofactors nicotinamide adenine dinucleotide (NAD) and flavin adenine dinucleotide (FAD). It also has functions in protein synthesis and as a chemical component of DNA and RNA. The shape of adenine is complementary to either thymine in DNA or uracil in RNA.
Cytosine is one of the four main bases found in DNA and RNA, along with adenine, guanine, and thymine (uracil in RNA). It is a pyrimidine derivative, with a heterocyclic aromatic ring and two substituents attached (an amine group at position 4 and a keto group at position 2). The nucleoside of cytosine is cytidine. In Watson-Crick base pairing, it forms three hydrogen bonds with guanine.
Thymine is one of the four nucleobases in the nucleic acid of DNA. As its alternate name (5-methyluracil) suggests, thymine may be derived by methylation of uracil at the 5th carbon. In RNA, thymine is replaced with uracil in most cases. In DNA, thymine (T) binds to adenine (A) via two hydrogen bonds, thereby stabilizing the nucleic acid structures. Thymine combined with deoxyribose creates the nucleoside deoxythymidine, which is synonymous with the term thymidine.
In March 2015, NASA scientists reported that, for the first time, complex DNA and RNA organic compounds of life, including uracil, cytosine and thymine, have been formed in the laboratory under outer space conditions, using starting chemicals, such as pyrimidine, found in meteorites. Pyrimidine, like polycyclic aromatic hydrocarbons (PAHs), the most carbon-rich chemical found in the Universe, may have been formed in red giants or in interstellar dust and gas clouds, according to the scientists.[3] Thymine has not been found in meteorites, which suggests the first strands of DNA had to look elsewhere to obtain this building block. Thymine likely formed within some meteorite parent bodies, but may not have persisted within these bodies due to an oxidation reaction with hydrogen peroxide.
Guanine is one of the four main nucleobases found in the nucleic acids DNA and RNA, the others being adenine, cytosine, and thymine (uracil in RNA). In DNA, guanine is paired with cytosine. The guanine nucleoside is called guanosine.
Guanosine is a purine nucleoside comprising guanine attached to a ribose (ribofuranose) ring via a β-N9-glycosidic bond. Guanosine can be phosphorylated to become guanosine monophosphate (GMP), cyclic guanosine monophosphate (cGMP), guanosine diphosphate (GDP), and guanosine triphosphate (GTP). These forms play important roles in various biochemical processes such as synthesis of nucleic acids and proteins, photosynthesis, muscle contraction, and intracellular signal transduction (cGMP). When guanine is attached by its N9 nitrogen to the C1 carbon of a deoxyribose ring it is known as deoxyguanosine. The antiviral drug aciclovir, often used in herpes treatment, and the anti-HIV drug abacavir, are structurally similar to guanosine.
Uracil is one of the four nucleobases in the nucleic acid of RNA that are represented by the letters A, G, C and U. The others are adenine (A), cytosine (C), and guanine (G). In RNA, uracil binds to adenine via two hydrogen bonds. In DNA, the uracil nucleobase is replaced by thymine. Uracil is a demethylated form of thymine. … Uracil is a common and naturally occurring pyrimidine derivative. It is a planar, unsaturated compound that has the ability to absorb light. Based on 12C/13C isotopic ratios of organic compounds found in the Murchison meteorite, it is believed that uracil, xanthine and related molecules can also be formed extraterrestrially. In RNA, uracil base-pairs with adenine and replaces thymine during DNA transcription. Methylation of uracil produces thymine.
Uracil also recycles itself to form nucleotides by undergoing a series of phosphoribosyltransferase reactions.[2] Degradation of uracil produces the substrates aspartate, carbon dioxide, and ammonia.
Oxidative degradation of uracil produces urea and maleic acid in the presence of H2O2 and Fe2+ or in the presence of diatomic oxygen and Fe2+.
Uracil is rarely found in DNA, and this may have been an evolutionary change to increase genetic stability. This is because cytosine can deaminate spontaneously to produce uracil through hydrolytic deamination. Therefore, if there were an organism that used uracil in its DNA, the deamination of cytosine (which undergoes base pairing with guanine) would lead to formation of uracil (which would base pair with adenine) during DNA synthesis. Uracil-DNA glycosylase excises uracil bases from double-stranded DNA. This enzyme would therefore recognize and cut out both types of uracil – the one incorporated naturally, and the one formed due to cytosine deamination, which would trigger unnecessary and inappropriate repair processes.
In a scholarly article published in October 2009, NASA scientists reported having reproduced uracil from pyrimidine by exposing it to ultraviolet light under space-like conditions. This suggests that one possible natural original source for uracil in the RNA world could have been panspermia.
The association between amino acids and codons – for example, between UAC and tyrosine – is called the genetic code. Three of the possible 64 codons do not code for an amino acid, but are ‘stop’ codons, terminating translation. When such a stop is reached, no further amino acids are added to the chain, which is released from the ribosome as a complete protein. The meaning of the code – that UAC specifies tyrosine and not some other amino acid – depends on the fact that the tRNA with the anti-codon AUG (written 3'→5', so that it lines up base by base with the codon) also has attached to it the amino acid tyrosine. … The attachment enzyme assigns a particular codon to a particular amino acid. The code is therefore chemically arbitrary. By altering the sequence of a tRNA, or the specificity of an assignment enzyme, the code would be altered. Mutations, usually lethal, that alter the code in these ways are known. It is still an open question whether the code was always arbitrary or whether there was once a good chemical reason why UAC specifies tyrosine.
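For anyone puzzled by how the codon UAC and the anti-codon AUG relate, here is a rough sketch. It assumes plain Watson-Crick pairing only (no wobble pairing, no modified bases, both of which matter in real tRNAs), and the function names are made up for illustration. The same anti-codon can be written two ways depending on which end you start from.

```python
# Watson-Crick pairs in RNA (assumption: wobble and modified bases are ignored).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon_3to5(codon):
    """Anti-codon written 3'->5', i.e. lined up base by base under the 5'->3' codon."""
    return "".join(PAIR[base] for base in codon)

def anticodon_5to3(codon):
    """The same anti-codon written in the conventional 5'->3' direction."""
    return anticodon_3to5(codon)[::-1]

print(anticodon_3to5("UAC"))  # AUG
print(anticodon_5to3("UAC"))  # GUA
```

Nothing in this lookup says why the tRNA carrying that anti-codon should be loaded with tyrosine; that assignment is made by the attachment enzyme, which is the sense in which the code is chemically arbitrary.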
All existing organisms have this system, essentially. There is nothing in between.
The genetic code is highly similar among all organisms and can be expressed in a simple table with 64 entries.
The code defines how sequences of nucleotide triplets, called codons, specify which amino acid will be added next during protein synthesis. With some exceptions, a three-nucleotide codon in a nucleic acid sequence specifies a single amino acid.
Codons consist of three bases (written here as RNA).
The codon UUU specifies the amino acid phenylalanine.
The codon AAA specifies the amino acid lysine.
The codon CCC specifies the amino acid proline.
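Since the code is just a lookup table, translation can be sketched as a dictionary lookup. The table below is deliberately partial: it holds only the three codons listed above plus UAC for tyrosine from earlier in these notes. The names PARTIAL_CODE and translate are mine, and the full table would have 64 entries.

```python
# Partial codon table: only assignments mentioned in these notes.
PARTIAL_CODE = {
    "UUU": "Phe",  # phenylalanine
    "AAA": "Lys",  # lysine
    "CCC": "Pro",  # proline
    "UAC": "Tyr",  # tyrosine
}

def translate(mrna, table=PARTIAL_CODE):
    """Read mrna three bases at a time from position 0 and look each codon up."""
    residues = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        # Codons missing from the partial table are marked rather than guessed.
        residues.append(table.get(codon, "???"))
    return "-".join(residues)

print(translate("UUUAAACCCUAC"))  # Phe-Lys-Pro-Tyr
```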
While the "genetic code" determines a protein's amino acid sequence, other genomic regions determine when and where these proteins are produced according to various "gene regulatory codes".
Among a broad academic audience, the concept of the evolution of the genetic code from an original, ambiguous code to a well-defined ("frozen") code with the repertoire of 20 (+2) canonical amino acids is widely accepted.
Since 2001, 40 non-natural amino acids have been added into proteins by creating a unique codon (recoding) and a corresponding transfer-RNA:aminoacyl-tRNA-synthetase pair to encode each one. Their diverse physicochemical and biological properties make them useful as tools for exploring protein structure and function, or for creating novel or enhanced proteins.
H. Murakami and M. Sisido extended some codons to have four and five bases. Steven A. Benner constructed a functional 65th (in vivo) codon.
In 2015 N. Budisa, D. Söll and co-workers reported the full substitution of all 20,899 tryptophan residues (UGG codons) with unnatural thienopyrrole-alanine in the genetic code of the bacterium Escherichia coli.
In 2016 the first stable semisynthetic organism was created. It was a single-celled bacterium with two synthetic bases (called X and Y). The bases survived cell division.
In 2017, researchers in South Korea reported that they had engineered a mouse with an extended genetic code that can produce proteins with unnatural amino acids.
A reading frame is defined by the initial triplet of nucleotides from which translation starts. It sets the frame for a run of successive, non-overlapping codons, which is known as an "open reading frame" (ORF). For example, the string 5'-AAATGAACG-3', if read from the first position, contains the codons AAA, TGA, and ACG; if read from the second position, it contains the codons AAT and GAA; and if read from the third position, it contains the codons ATG and AAC. Every sequence can, thus, be read in its 5' → 3' direction in three reading frames, each producing a possibly distinct amino acid sequence: in the given example, Lys (K)-Trp (W)-Thr (T), Asn (N)-Glu (E), or Met (M)-Asn (N), respectively (when translating with the vertebrate mitochondrial code). When DNA is double-stranded, six possible reading frames are defined, three in the forward orientation on one strand and three reverse on the opposite strand.
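The reading-frame example can be reproduced with a few lines of code. This is only a sketch: the codon table is a partial one containing just the seven codons that occur in the example, with TGA read as Trp as in the vertebrate mitochondrial code, and all the names are my own.

```python
# Partial table: only the codons in the example (TGA = Trp, vertebrate mitochondrial code).
VERT_MITO_SUBSET = {
    "AAA": "K", "TGA": "W", "ACG": "T",
    "AAT": "N", "GAA": "E",
    "ATG": "M", "AAC": "N",
}
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def codons_in_frame(seq, offset):
    """Complete, non-overlapping triplets of seq starting at offset 0, 1 or 2."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]

def reverse_complement(seq):
    """The opposite strand, also written 5'->3'."""
    return "".join(DNA_PAIR[base] for base in reversed(seq))

seq = "AAATGAACG"

# Three forward frames, translated with the partial table.
for offset in range(3):
    frame = codons_in_frame(seq, offset)
    print(frame, "-".join(VERT_MITO_SUBSET[c] for c in frame))
# ['AAA', 'TGA', 'ACG'] K-W-T
# ['AAT', 'GAA'] N-E
# ['ATG', 'AAC'] M-N

# Three more frames on the opposite strand give six in total (codons only).
opposite = reverse_complement(seq)
reverse_frames = [codons_in_frame(opposite, offset) for offset in range(3)]
print(opposite)  # CGTTCATTT
```

With the standard code, TGA in the first frame would be a stop rather than Trp, which is why the example specifies which code is being used.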
The most common start codon is AUG, which is read as methionine or, in bacteria, as formylmethionine. Alternative start codons depending on the organism include "GUG" or "UUG"; these codons normally represent valine and leucine, respectively, but as start codons they are translated as methionine or formylmethionine.
The three stop codons have names: UAG is amber, UGA is opal (sometimes also called umber), and UAA is ochre. Stop codons are also called "termination" or "nonsense" codons. They signal release of the nascent polypeptide from the ribosome because no cognate tRNA has anticodons complementary to these stop signals, allowing a release factor to bind to the ribosome instead.
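Putting the start and stop signals together, here is a simplified sketch of scanning an mRNA for its first open reading frame. It assumes translation simply begins at the first start codon found, which ignores the sequence context that real ribosomes depend on. The function name first_orf and the example sequence are made up for illustration.

```python
# Names of the three stop codons, as given above.
STOP_NAMES = {"UAG": "amber", "UGA": "opal", "UAA": "ochre"}

def first_orf(mrna, start_codons=("AUG",)):
    """Return (codons, stop_name) for the frame set by the first start codon.

    stop_name is None if the sequence ends before an in-frame stop codon.
    Returns None if no start codon is present.
    """
    for i in range(len(mrna) - 2):
        if mrna[i:i + 3] in start_codons:
            codons = []
            for j in range(i, len(mrna) - 2, 3):
                codon = mrna[j:j + 3]
                if codon in STOP_NAMES:
                    return codons, STOP_NAMES[codon]
                codons.append(codon)
            return codons, None
    return None

# Hypothetical message: 2 leading bases, AUG, three codons, UAG, 2 trailing bases.
print(first_orf("GGAUGUUUAAACCCUAGGG"))
# (['AUG', 'UUU', 'AAA', 'CCC'], 'amber')
```

Passing start_codons=("AUG", "GUG", "UUG") would allow the alternative start codons mentioned above.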