Model Building Hypothesis Generation Science

Data collection will always be one of the foundations of science and science education. In the epistemological foundations of science furthered by John Dewey, science is to be regarded as more of a process than a body of knowledge, and the laboratory is central not only to science but also to science education. This centrality of data collection is still reflected in the publishing policy of most scientific journals. An example is contained in the directions for authors of the online journal Public Library of Science One: “We will not consider: Case studies, Hypothesis papers, Reviews, Commentaries or essays, Opinion pieces, Any other type of secondary literature”.

So then, the rest of this paper is dedicated to those authors who have made major contributions to science in the “secondary literature” market.

First and foremost in any discussion of the importance of interpreting scientific data is Albert Einstein. The major scientific question in the physics of his day was the nature of light. Two earlier scientists, Michelson and Morley, were looking for the proposed “ether” of the universe that transported rays of light. They reasoned that since the earth moves through space at a significant speed, the speed of light might vary depending upon the direction in which it is traveling relative to the earth. Their results showed that light travels at the same speed in any direction. This seemed to present an intractable problem based upon many other observations of the properties of light. It was Einstein who proposed the “special theory of relativity”, in which time was variable, not constant. He was subsequently quoted as saying “If Michelson-Morley is correct, relativity is correct”.

Initially, of course, Einstein’s concepts were met with derision and confusion. As time and generations passed, they became the new foundations of physics.

Reinterpretation of published and public data is not limited to astronomy and physical science. Watson and Crick built their famous double helix model, the modern representation of DNA, from published chemical and X-ray crystallography data. Well, published sort of. Watson and Crick were working from published as well as unpublished data of Rosalind Franklin. What is known is that Watson and Crick had capable competitors. Among them was Linus Pauling, who had already constructed a helical model, in his case a triple helix. Watson and Crick believed the Pauling model was in error, but time was critical to complete theirs. Franklin, for whatever reason, was not a fan of the helical model, or at least that was the message she seemed to present to her colleagues. Her joke card may have been interpreted to mean that “model builders” were not going anywhere without her data. How wrong she was. Her data was published in the same issue of Nature as Watson and Crick’s paper, but even after publication she seemed to remain skeptical of the helical model.

Many times, analysis of pathological and epidemiological data can lead scientists to predict the cause of a disease before it is discovered. That is the case in the work of Alfred Knudson[1]. Knudson studied the incidence of a form of cancer called retinoblastoma in patients admitted to his hospital. In brief, retinoblasts are developmental, or undifferentiated, cells of the retina, and retinoblastoma is a form of cancer that can occur if they fail to fully differentiate into retinal neurons and thereby exit the cell cycle. In 1971, after studying retinoblastoma incidence from 1944 to 1969 in his patients, and using additional published studies, Knudson published a model predicting that retinoblastoma resulted from loss of heterozygosity of a single gene. That means that both copies of the gene had to be inactivated, so a patient had to take “two hits” before a tumor would become active. That began the search for the gene, and in 1986 the RB, or RB1, gene was isolated. RB remains a tumor suppressor gene of great interest, and although much has been learned about RB since its isolation over 25 years ago, it still has mysteries to reveal[2].

When Harold White was a post-doctoral fellow working in the laboratory of Nobel Prize winner Konrad Bloch, he developed a passionate, yet speculative, hypothesis that many of the nucleotide-based co-enzymes, or vitamins, that he was studying were in fact “fossils” left over from very primitive forms of life. In fact, he was very tempted to use the word fossils, a very unorthodox word in a biochemistry laboratory. He presented his ideas to his advisor, Dr. Bloch, and was predictably given a dismissive wave. But passion ruled for the young Harold White, and he eventually published his hypothesis using exactly the language he felt most comfortable with[3].

The groundbreaking observations in “metabolic fossils”, in conjunction with work done by Tom Cech regarding a sequence of RNA discovered in the pond organism Tetrahymena, led Walter Gilbert to pose and describe the “RNA world hypothesis”.

A great problem in speculating how life began is which came first, DNA or proteins. In modern life forms, the “central dogma of molecular biology” is that DNA is an information molecule and proteins do molecular work. The discovery, or recognition, that one could make a life form completely of enzymes made from RNA led to a new hypothesis of the beginning of life. Once this new hypothesis was in place, it led many scientists into the race for the first self-replicating molecule. This molecule was proposed to be an RNA polymerase composed completely of RNA. Current nucleotide polymerases, or chain builders, are made of proteins. There is one near exception, though. The amino acids that make up proteins are ligated by a molecular machine called the ribosome, whose catalytic core is made of RNA. In an earlier metabolic state, all of the enzymatic work would have been done by ribosome-like RNA machines. An example of a protein/RNA composite molecule is telomerase, which maintains the ends, or telomeres, of chromosomes.

Once it was realized that an RNA polymerase could be “designed” by modifying an existing RNA ligase, Tom Cech, who had previously played a pivotal role in the elucidation of the function of ribozymes, wrote a “hypothesis paper” about what could be done with an engineered RNA[5]. So, in 1986, this “model” paper set off the race to produce the first self-replicating molecule in the laboratory. It was to be a “not quite living” model of the proposed first biomolecule.

Construction of scientific models is a critical and yet often overlooked or even dismissed aspect of inquiry. While data collection is important, there are times when the most important problem will not give up data to the laboratory easily. And of course there is the other end of the data spectrum. In many cases modern automated equipment can produce data so rapidly that it goes straight to an online database without much immediate scrutiny. Such is the case with genomics, where data relating to DNA sequence is produced by automatic sequencers and published online. The field of bioinformatics is expanding rapidly, and now encompasses not only sequence data but also expression data and epigenetic data. In the future, more discoveries will be made by scientists who work from public data.


[1] Chial H. Tumor suppressor (TS) genes and the two-hit hypothesis. Nature Education. 2008;1(1).

[2] Takahashi C, Sasaki N, Kitajima S. Twists in views on RB functions in cellular signaling, metabolism and stem cells. Cancer Sci. 2012 Mar 26. doi: 10.1111/j.1349-7006.2012.02284.x.

[3] White HB 3rd. Coenzymes as fossils of an earlier metabolic state. J Mol Evol. 1976 Mar 29;7(2):101-4.

[4] White HB 3rd. Konrad Bloch, evolution, and the RNA world. Biochem Biophys Res Commun. 2002 Apr 19;292(5):1267-71.

[5] Cech TR. A model for the RNA-catalyzed replication of RNA. Proc Natl Acad Sci U S A. 1986 Jun;83(12):4360-3.