
In light of the current outbreak, I thought that I would share some of what I’ve learned about the molecular biology of the coronaviruses in general and the Covid-19 virus in particular. My goal is to describe how the virus acts in terms mostly devoid of the technical jargon that makes molecular biology so difficult for most people to understand.
First of all, what’s a virus? Some people have suggested that viruses aren’t alive. To my mind, that isn’t the case. Viruses don’t move about like tigers or grow like poison ivy, but they are organisms that can reproduce, undergo evolutionary changes, and are subject to Darwinian selection. These characteristics make them as alive as any other creature on earth. To be sure, they are obligate parasites, meaning that they can’t do much of anything without the help of a host. In the case of coronaviruses, the hosts are animal cells (other viruses, called bacteriophages, use bacteria as hosts). In order to survive, viruses must somehow insert their genetic material into living cells and utilize the machinery that they find there for their own reproduction.
Coronaviruses are unlike most other organisms (and some other viruses) in that they don’t have genes made of DNA. Their genome is made of a single strand of RNA. It’s a big piece of RNA, about 30,000 bases long, among the largest of any RNA virus. Recall that RNA looks a lot like DNA. It is a polymer of four bases (ribonucleotides), called A, U (instead of the T in DNA), G, and C. Like DNA, it can be replicated to form a complementary strand. Virologists arbitrarily call the RNA strand that is present inside coronaviruses the “plus” strand. In order to reproduce, the virus must somehow get this RNA into a host’s cell, where its ultimate goal is to make new viral offspring. I’ll have more to say about this process below.
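For readers who like to see things concretely, here is a minimal Python sketch of what “complementary” means for RNA. The sequence is made up for illustration (it is not taken from any viral genome), and the function name complement_strand is my own. By convention, a complementary strand is read in the opposite direction to its template, so the sketch reverses the sequence as it pairs each base.

```python
# Base-pairing rules for RNA: A pairs with U, and G pairs with C.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complement_strand(rna: str) -> str:
    """Return the complementary strand, read in the opposite direction."""
    return "".join(PAIR[base] for base in reversed(rna))


plus = "AUGGCUUAC"  # made-up "plus" strand, for illustration only
minus = complement_strand(plus)

print(minus)                             # prints GUAAGCCAU
print(complement_strand(minus) == plus)  # prints True
```

Copying the copy gives back the original sequence, which is the property the virus relies on when it replicates its RNA.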
An electron microscope image of some coronaviruses is shown above. Coronaviruses are quite small, with a diameter of somewhere around 100-120 nanometers. By contrast, an average human cell is some 50-100 micrometers in diameter, several hundred to about 1,000 times bigger than these viruses. Accordingly, these viruses pass through most filters and can only be imaged using an electron microscope.
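The size comparison is just unit arithmetic, and a short Python sketch makes it explicit. It uses only the figures quoted above (100-120 nanometers for the virus, 50-100 micrometers for the cell); the function name size_ratio is my own.

```python
NM_PER_UM = 1000  # nanometers in one micrometer


def size_ratio(cell_um: float, virus_nm: float) -> float:
    """How many times wider a cell is than a virus, after converting units."""
    return cell_um * NM_PER_UM / virus_nm


# Smallest cell against largest virus, then largest cell against smallest virus.
smallest = size_ratio(cell_um=50, virus_nm=120)
largest = size_ratio(cell_um=100, virus_nm=100)

print(f"A cell is roughly {smallest:.0f} to {largest:.0f} times wider than the virus")
```

With these figures the ratio runs from a little over 400 up to 1,000, depending on which ends of the two ranges are compared.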
How does this tiny organism manage to infect animal cells? How does it know when it’s encountered its target? Coronaviruses are enveloped viruses, meaning that during a previous infection they stole some of their host cell’s membrane and incorporated it into themselves. They’re surrounded by this membrane and have embedded some of their own proteins in it. Most prominent among these embedded viral proteins is the spike (S) protein, which forms the crown that gives these viruses their name. In addition, there are a so-called membrane (M) protein and an envelope (E) protein. The lipid nature of the envelope makes the virus susceptible to soap and detergents, which burst it, kill it, and release its contents.
The spike protein embedded in the membrane is the device that the virus uses to identify and attach to its target. Acting somewhat like the sensors on a naval mine, these proteins detect a complementary protein on their prey, and the virus then attaches. In the case of Covid-19, the human protein that the virus attaches to is a membrane-bound enzyme called “angiotensin-converting enzyme 2” or ACE2. It is the spike protein’s specificity that determines the host that the virus attacks. Sometime in the past, a mutation in the viral genome changed the spike protein of the Covid-19 virus so that it was no longer restricted to binding some protein on the surface of a bat cell and could recognize the human ACE2 target. Because humans seldom encounter bats, there is speculation that the virus may have undergone an intermediary mutation after infecting another mammal like a civet or a pangolin, but ultimately it must have switched its quarry to humans, all because of a change in sequence in the gene specifying the spike protein. Two other similar viruses, SARS-CoV and MERS-CoV, also seem to have originated in bats, with civets and camels as suspected intermediate hosts.
Once the spike protein and its target bind together, the virus envelope fuses with the cell membrane and the virus enters the cell. What does it do in this environment?

These two proteins are called “polyproteins” for good reason. They are enzymatically broken apart into some 16 smaller so-called nonstructural proteins (nsps) that make up the replication complex. Some of these nsps are proteolytic enzymes involved in the cleavage of the polyprotein. But the main nsp is the one responsible for duplication of the virus: an RNA-dependent RNA polymerase (it uses RNA as a template to make complementary strands of RNA). It is aided by another protein that unwinds the RNA and still others that promote fidelity of replication and fulfill other replicative functions.
As mentioned, the first two thirds of the viral RNA is responsible for the synthesis of this polyprotein. The remainder codes for the aforementioned spike, membrane, and envelope proteins as well as several other minor ones. How are they synthesized?
Following its translation into the replicase polyprotein, the viral RNA (the genome) next acts as a template for the synthesis of a half dozen or so additional RNAs. These begin at different locations along the viral RNA but all terminate at the same place, its end; they’re said to be nested. It’s important to realize that the copies produced by this process can’t act as mRNAs: they have the wrong polarity (the complementary strand of either DNA or RNA runs in the opposite direction to its template). But the virus goes ahead and makes complementary copies of these negative RNAs. In other words, the virus starts with a positive strand (the viral genome itself), makes nested negative copies of some sequences, and then makes complementary copies of the negatives. The results are RNAs that are in the correct orientation and can act as messengers.
These nested RNAs (remember that they all start at different points along the viral RNA) code for the aforementioned spike, membrane, and envelope proteins as well as several smaller ones. They also code for the nucleocapsid protein that binds to the viral genome.
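The positive-to-negative-to-positive copying described above can be sketched in a few lines of Python. This is a toy illustration only: the genome string and the start positions are made up, the function names are my own, and the sketch ignores all of the enzymatic detail of the real process. It shows just two things: every messenger ends at the same place (the end of the genome), and copying a negative copy restores the positive orientation.

```python
# Base-pairing rules for RNA: A pairs with U, and G pairs with C.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complement_strand(rna: str) -> str:
    """Return the complementary strand, read in the opposite direction."""
    return "".join(PAIR[base] for base in reversed(rna))


def nested_messengers(genome: str, starts: list[int]) -> list[str]:
    """For each start point, copy the genome from there to its end twice over."""
    messengers = []
    for start in starts:
        negative = complement_strand(genome[start:])  # wrong polarity for a messenger
        positive = complement_strand(negative)        # copy of the copy: usable messenger
        messengers.append(positive)
    return messengers


genome = "AUGGCUUACGGAUCCGUAA"  # made-up 19-base "genome", for illustration only
starts = [6, 10, 14]            # hypothetical start points along the genome

for messenger in nested_messengers(genome, starts):
    # Right-align so the shared end of every messenger lines up visually.
    print(messenger.rjust(len(genome)))
```

Printed right-aligned, the messengers stack up like steps that all finish at the same right-hand edge, which is what “nested” means here.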
Once the membrane, envelope, and a series of accessory proteins are manufactured, the virus is assembled within the internal membranes of the cell. The viral M and E proteins are implicated in this process along with several host factors, only some of which have been characterized. The newly assembled viruses depart when the membrane-bound viruses fuse with the cell’s plasma membrane.
Of course, the molecular biology of coronaviruses is very much more complicated than I have laid out. Many more details are described in the review articles that I list below and from which I have appropriated much of the material in this blog.
Comments and corrections are welcomed.
“The Molecular Biology of Coronaviruses” Paul S. Masters, Advances in Virus Research 66: 193-291 (2006).
“Human Coronavirus: Host-Pathogen Interaction” To Sing Fung and Ding Xiang Liu, Annual Review of Microbiology 73: 529-557 (2019).
“Coronaviruses: An Overview of Their Replication and Pathogenesis” Anthony R. Fehr and Stanley Perlman, Methods in Molecular Biology 1282: 1-23 (2015).