Anatomy of a Retrovirus

Human immunodeficiency virus (HIV) is a retrovirus of the lentivirus genus. Its genome is made of RNA, unlike the DNA genomes of humans, other animals, and many other viruses. Inside each virus particle (virion) are two identical strands of RNA and three enzymes necessary for viral replication. These enzymes were encoded by the viral DNA in the previous host cell that released that virion. The genome of HIV creates a domino effect of viral production, resulting in its survival and host death.

HIV genes and genome structure

HIV-1, the most common type of the virus, has only nine genes. The entire genome is approximately 10 kilobases. The genome is flanked by long terminal repeats, and the sequence contains regulatory elements bound by the Tat and Rev proteins, which are encoded by the tat and rev genes. The best-known HIV genes are gag, pol, and env, which encode the elements making up the virion. In addition, a number of accessory factors that vary among HIV types are encoded by the genes vpr, vif, vpu, and nef. HIV-2 lacks vpu and instead carries vpx, a homolog of vpr. For a visual representation of the relative gene structure, see the HIV Sequence Database.
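The gene inventory described above can be written down as a small lookup table. The following is only an illustrative sketch in Python: the gene names come from the text, the grouping mirrors this article's headings, and the names HIV1_GENES, category_of, and all_genes are hypothetical helpers invented for the example.

```python
# Gene names are taken from the article; the three categories mirror its
# headings. The variable and function names are illustrative assumptions.
HIV1_GENES = {
    "structural": ["gag", "pol", "env"],
    "regulatory": ["tat", "rev"],
    "accessory": ["vif", "vpr", "vpu", "nef"],
}


def category_of(gene: str) -> str:
    """Return the category a gene belongs to (case-insensitive lookup)."""
    name = gene.lower()
    for category, genes in HIV1_GENES.items():
        if name in genes:
            return category
    raise KeyError(f"{gene!r} is not in the HIV-1 gene inventory")


def all_genes() -> list[str]:
    """Flatten the inventory into a single list of gene names."""
    return [gene for genes in HIV1_GENES.values() for gene in genes]


if __name__ == "__main__":
    print(len(all_genes()), "genes:", ", ".join(all_genes()))
    print("tat is", category_of("tat"))
```

Keeping the categories as dictionary keys makes it straightforward to check that the three groups together account for all nine HIV-1 genes.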

Structural genes

Gag – the structure of HIV is made up of proteins denoted p# (such as p24, the capsid protein, and p7, the nucleocapsid protein), which are encoded by the gag gene. This gene is the driving force behind HIV assembly. For more information on viral assembly and the components of the HIV envelope, see a review by Gottlinger.

Pol – the three enzymes carried by HIV particles (protease, reverse transcriptase, and integrase) are encoded in one long protein from the pol gene (in fact a Gag-Pol polyprotein that includes all of the proteins from both genes). Protease then cleaves the polyprotein into the separate proteins. This step is targeted by some anti-HIV medications because, without these enzymes, the next generation of HIV particles cannot replicate. When the virus infects a cell, reverse transcriptase carries out reverse transcription, a process in which RNA is copied into DNA. Integrase helps that DNA insert into the DNA of the host cell, where the HIV genes are then transcribed and translated into proteins by the machinery of the host cell.

Env – the outer envelope of the virus contains glycoproteins (denoted gp#, such as gp120), which are encoded by the env gene. These bind receptors on human cells, such as CD4, allowing infection.
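The reverse transcription step described under Pol, in which an RNA template is copied into DNA, can be illustrated with a toy base-pairing model. This is a minimal sketch in Python and not a model of the real enzyme: it ignores priming, copying errors, and second-strand synthesis, and the function name reverse_transcribe is a hypothetical helper chosen for the example.

```python
# Watson-Crick pairing between an RNA template base and the DNA base
# placed opposite it during first-strand synthesis.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return the complementary DNA strand for an RNA template.

    Both the input and the result are written 5'-to-3', so the DNA is
    the reverse complement of the RNA.
    """
    rna = rna.upper()
    unknown = set(rna) - set(RNA_TO_DNA_COMPLEMENT)
    if unknown:
        raise ValueError(f"not RNA bases: {sorted(unknown)}")
    # Walk the template from its 3' end so the new strand reads 5'-to-3'.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna))


if __name__ == "__main__":
    print(reverse_transcribe("AUGC"))  # prints GCAT
```

The sketch only captures the base-pairing rule; the point is that the information in the RNA genome is preserved in the DNA copy that integrase later inserts into the host genome.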

Regulatory genes

Tat (transactivator) – occurs in a shorter minor form and a longer major form. It binds the TAR element, an RNA structure at the 5’ end of HIV transcripts, to promote transcription of the viral genome. It is the first eukaryotic transcription factor known to interact with RNA. For more information on the protein’s role in latency, read a review by Jonathan Karn.

Rev – binds the Rev response element (RRE) on viral RNAs and exports them from the nucleus to the cytoplasm. The importance of nuclear transport to the life cycle of the virus is indicated by the conservation of the gene across lentiviruses.

Accessory genes

Vif (viral infectivity factor) – its expression depends on Rev, and it is essential for infectivity but not for replication.

Vpr and Vpu (viral protein R and viral protein U) – appear to aid in virion formation and release. HIV-2 virions that carry the homolog Vpx also carry Vpr. All three accessory proteins interact with Gag and Env proteins.

Nef – this protein has a number of functions and is necessary for disease progression. It is one of the first proteins produced from the HIV genome during replication. For more information on this virulence factor, see the review by Piguet and Trono.

Genomic structure varies among the growing number of identified HIV strains. Mutation is what allows the virus to survive, and most of it is seen in the envelope proteins. But these ten basic genes allow that small particle to wreak havoc on the human body in a similar manner, regardless of clade, group, or type.