It is one of the more striking generalizations of biochemistry - which, surprisingly, is hardly ever mentioned in the biochemical textbooks - that the twenty amino acids and the four bases are, with minor reservations, the same throughout Nature.
If, for example, all the codons are triplets, then in addition to the correct reading of the message, there are two incorrect readings which we shall obtain if we do not start the grouping into sets of three at the right place.
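The three possible readings can be made concrete with a short Python sketch. The message string and the name `reading_frames` are illustrative assumptions only; the sketch simply groups a sequence of letters into sets of three, starting at each of the three possible places.

```python
def reading_frames(message, codon_size=3):
    """Group `message` into codons once for each possible starting offset.

    Illustrative helper (hypothetical name): with triplets there are three
    offsets, one giving the intended reading and two giving shifted readings.
    """
    frames = []
    for offset in range(codon_size):
        # Only complete codons are kept; leftover bases at the end are dropped.
        last_start = len(message) - codon_size + 1
        codons = [message[i:i + codon_size]
                  for i in range(offset, last_start, codon_size)]
        frames.append(codons)
    return frames


# A made-up repeating message, chosen only so the shift is easy to see.
frames = reading_frames("CATCATCATCAT")
for offset, codons in enumerate(frames):
    print(offset, " ".join(codons))
# 0 CAT CAT CAT CAT
# 1 ATC ATC ATC
# 2 TCA TCA TCA
```

Only the grouping that starts at offset 0 reproduces the intended message; the other two are the incorrect readings referred to above.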
How is the base sequence divided into codons? There is nothing in the backbone of the nucleic acid, which is perfectly regular, to show us how to group the bases into codons.
For simplicity one can think of the + class as having one extra base at some point or other in the genetic message and the - class as having one too few.
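This simplified picture of the two classes can be sketched in Python. Everything here is an assumption made for illustration: the message, the inserted base "G", the chosen positions, and the helper names `plus`, `minus` and `read_codons`. The message is read in triplets from a fixed starting point, so a single extra or missing base shifts every codon that follows it.

```python
def read_codons(message, codon_size=3):
    """Read `message` in non-overlapping codons from its first base."""
    last_start = len(message) - codon_size + 1
    return [message[i:i + codon_size] for i in range(0, last_start, codon_size)]


def plus(message, position, base="G"):
    """Model a + mutant: one extra base inserted at `position` (assumed model)."""
    return message[:position] + base + message[position:]


def minus(message, position):
    """Model a - mutant: the base at `position` removed (assumed model)."""
    return message[:position] + message[position + 1:]


wild = "CATCATCATCATCAT"            # made-up message
plus_mutant = plus(wild, 4)         # one extra base
double = minus(plus_mutant, 10)     # a - placed downstream of the +

print(" ".join(read_codons(wild)))         # CAT CAT CAT CAT CAT
print(" ".join(read_codons(plus_mutant)))  # CAT CGA TCA TCA TCA
print(" ".join(read_codons(double)))       # CAT CGA TCA TAT CAT
```

In this toy case the + alone alters every codon after the insertion, while the + and - together alter only the codons lying between the two changes; the reading returns to the original grouping beyond the second one.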
Do codons overlap? In other words, as we read along the genetic message do we find a base which is a member of two or more codons? It now seems fairly certain that codons do not overlap.
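The question can be restated as a count: for a given base, how many codons include it? The Python sketch below is illustrative only; the function name `codons_containing` and the `step` parameter (3 for a non-overlapping code, 1 for a fully overlapping one) are assumptions introduced to show the difference.

```python
def codons_containing(base_index, message_length, codon_size=3, step=3):
    """Return the start positions of all codons that include the given base.

    step == codon_size models a non-overlapping code;
    step == 1 models a fully overlapping code (hypothetical alternative).
    """
    starts = range(0, message_length - codon_size + 1, step)
    return [s for s in starts if s <= base_index < s + codon_size]


non_overlapping = codons_containing(4, 12, step=3)
overlapping = codons_containing(4, 12, step=1)
print(non_overlapping)  # [3]
print(overlapping)      # [2, 3, 4]
```

With a non-overlapping code each base belongs to exactly one codon; with a fully overlapping triplet code a base in the interior of the message belongs to three.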
Attempts have been made, from a study of the changes produced by mutation, to obtain the relative order of the bases within various triplets, but my own view is that these are premature until there are more extensive and more reliable data on the composition of the triplets.
A comparison of the triplets tentatively deduced by these methods with the changes in amino acid sequence produced by mutation shows a fair measure of agreement.
We are sometimes asked what the result would be if we put four +'s in one gene. To answer this my colleagues have recently put together not merely four but six +'s.
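If the message is read in triplets, the arithmetic behind such combinations is simple, and a small Python sketch can state it. This is a sketch of the counting argument only, under the assumption that each + adds one base and each - removes one; the names `frame_shift` and `in_phase` are hypothetical, and nothing here stands in for the experimental result.

```python
def frame_shift(n_plus, n_minus=0, codon_size=3):
    """Net displacement of the reading frame, assuming one base per mutation."""
    return (n_plus - n_minus) % codon_size


def in_phase(n_plus, n_minus=0, codon_size=3):
    """True when the reading beyond the last mutation returns to the original frame."""
    return frame_shift(n_plus, n_minus, codon_size) == 0


for n in (1, 2, 3, 4, 6):
    print(n, frame_shift(n), in_phase(n))
# 1 1 False
# 2 2 False
# 3 0 True
# 4 1 False
# 6 0 True
```

On this counting, four +'s leave the reading displaced by one base, just as a single + does, whereas six +'s, like three, bring it back into phase.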
A final proof of our ideas can only be obtained by detailed studies on the alterations produced in the amino acid sequence of a protein by mutations of the type discussed here.
The meaning of this observation is unclear, but it raises the unfortunate possibility of ambiguous triplets; that is, triplets which may code more than one amino acid. However, one would certainly expect such triplets to be in a minority.
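The definition of an ambiguous triplet can be expressed as a simple check on a table of assignments. The Python sketch below uses purely hypothetical placeholder names ("t1", "aa1" and so on) and is not experimental data; the function name `ambiguous_triplets` is likewise an assumption made for illustration.

```python
from collections import defaultdict


def ambiguous_triplets(assignments):
    """Return the triplets that are assigned to more than one amino acid.

    `assignments` is a list of (triplet, amino_acid) pairs.
    """
    by_triplet = defaultdict(set)
    for triplet, amino_acid in assignments:
        by_triplet[triplet].add(amino_acid)
    return {t: sorted(acids) for t, acids in by_triplet.items() if len(acids) > 1}


# Hypothetical placeholder assignments, NOT real coding data.
toy = [("t1", "aa1"), ("t2", "aa1"), ("t3", "aa2"), ("t3", "aa3")]
result = ambiguous_triplets(toy)
print(result)  # {'t3': ['aa2', 'aa3']}
```

In this toy table "t1" and "t2" both code the same amino acid, which is not ambiguity in the sense used here; only "t3", which codes two different amino acids, is reported.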
It now seems certain that the amino acid sequence of any protein is determined by the sequence of bases in some region of a particular nucleic acid molecule.
The balance of evidence, both from the cell-free system and from the study of mutation, suggests that this does not occur at random, and that triplets coding the same amino acid may well be rather similar.
It would appear that the number of nonsense triplets is rather low, since we only occasionally come across them. However, this conclusion is less secure than our other deductions about the general nature of the genetic code.
It seems likely that most if not all the genetic information in any organism is carried by nucleic acid - usually by DNA, although certain small viruses use RNA as their genetic material.