eZ-101: DNA Transcription to RNA

Before describing DNA transcription, recall that life has evolved so that the pattern of gene expression in a given cell results from the integration of many incoming signals, which trigger the appropriate cellular response to changes in its environment. No wonder that DNA transcription regulatory networks have ended up so complex and sophisticated.

Of course, not every life form has evolved in the same way, and consequently transcription mechanisms differ somewhat between taxonomic groups. Several bibliographic resources, most of them available online, detail the transcription mechanisms, and our goal here is not to duplicate that knowledge. Instead, we summarize below some generic and specific elements to take into consideration when designing gene expression vectors.

Generally speaking, transcription is a multistep process comprising initiation, elongation and termination steps:


DNA transcription initiation consists of recruiting a multi-protein complex known as RNA polymerase (RNAP) to bind the DNA molecule upstream (in a 5’->3’ orientation) of the transcription start site (TSS or +1). RNAPs are recruited by general transcription factors, which are DNA-binding proteins that recognize specific DNA sequences; these sequences can be considered the actual transcription promoter. Additional proteins can regulate RNAP recruitment positively (activators) or negatively (repressors).

These regulators are often DNA-binding proteins themselves that recognize specific sequences near the promoter site. These sequences are known as transcription factor binding sites. DNA regions containing activator sites are described as enhancers. Finally, other proteins can bind to activators or repressors to potentiate their effect; they are described as co-activators and co-repressors. Co-activators and co-repressors can also be biochemical compounds.

In prokaryotes, general transcription factors are called sigma factors. Several sigma factors co-exist in a given cell, and each specifically initiates transcription of a different set of genes according to environmental signals.

Specific domains of sigma factors bind DNA sequences at defined positions within the promoter, notably the promoter −10 element (called the "Pribnow box") and the promoter −35 element (−10 and −35 referring to the distance from the +1 start site) [1].

When designing an expression vector to be amplified in bacteria, consider that −10 and −35 elements may occur by chance in exogenous sequences (e.g. the transgene of interest), triggering unexpected transcription in the bacterial cell, with possibly adverse outcomes for vector amplification. For example, unwanted expression of a human protein in bacteria may cause toxicity and loss of the amplification culture.

The DNA sequence of any vector can be analyzed with predictive algorithms to detect putative bacterial promoters, such as the BPROM freeware from Softberry (www.softberry.com). For instance, the well-known and widely used viral CMV promoter does contain such a predicted promoter sequence which, together with an unwanted ribosome binding site (RBS) downstream of the +1 start site, could result in unexpected protein expression. This is sometimes described as promoter leakiness.
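
To make the idea of chance −35/−10 elements concrete, here is a deliberately naive Python sketch. It is not the BPROM algorithm. It assumes the commonly cited E. coli consensus motifs (TTGACA for −35, TATAAT for −10) and a 15 to 19 bp spacer, none of which are stated in this article; the function names and the demo sequence are hypothetical. Real predictors use weighted scoring and should be preferred for actual vector design.

```python
# Naive scan for bacterial promoter-like elements (illustration only).
# Assumed consensus motifs and spacer range; not taken from the article.
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def mismatches(a, b):
    """Count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_putative_promoters(seq, max_mismatches=1, spacers=range(15, 20)):
    """Return (pos_35, spacer, pos_10) tuples (0-based) on the given strand."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(MINUS_35) + 1):
        if mismatches(seq[i:i + 6], MINUS_35) > max_mismatches:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            box10 = seq[j:j + 6]
            if len(box10) == 6 and mismatches(box10, MINUS_10) <= max_mismatches:
                hits.append((i, spacer, j))
    return hits


# Hypothetical test sequence: perfect -35 box, 17 bp spacer, perfect -10 box.
demo = "GGC" + "TTGACA" + "GCTAGCTCAGTCCTAGG" + "TATAAT" + "GCTAGC"
print(find_putative_promoters(demo))  # [(3, 17, 26)]
```

Only the given strand is scanned; to cover both orientations of an insert, the reverse complement would need to be scanned as well.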

In eukaryotes, there are three distinct RNAPs, namely RNA Pol I, RNA Pol II and RNA Pol III. Each of them transcribes a specific pool of genes. RNA Pol I is dedicated to the transcription of ribosomal RNAs (rRNAs). RNA Pol II transcribes messenger RNAs (mRNAs), small nuclear RNAs (snRNAs), small interfering RNAs (siRNAs), micro RNAs (miRNAs) and long non-coding RNAs (lncRNAs). RNA Pol III transcribes short RNAs such as transfer RNAs (tRNAs), the 5S rRNA and the U6 spliceosomal snRNA.

The initiation of transcription is historically best described for the RNA Pol II system, but recent studies have described the similarities and differences in initiation by the RNA Pol I [2] and RNA Pol III [3] systems.

Broadly, all three systems resemble what happens in bacteria, in that pre-initiation complexes form when general transcription factors bring RNAPs to the vicinity of the TSS by recognizing and binding core promoter elements. The nature of the transcription factors, the recognized sequences and the modes of interaction between the molecular partners vary, reflecting the evolutionary specialization of each complex.

One specific sequence described in core promoters is the so-called TATA-box (consensus 5′-TATA(A/T)A(A/T)-3′), which serves the same purpose as the Pribnow box. The TATA-box binds the first transcription factor, here the TBP protein (a subunit of the general transcription factor TFIID), bringing it close to the TSS. However, the majority of genes in humans and other eukaryotes are TATA-less. TATA-less transcription relies on the presence of other consensus sequences, such as the CAAT-box (5’-GGCCAATCT-3’).
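
The two consensus sequences quoted above translate directly into regular expressions. The following Python sketch locates them in a sequence; the function name and the demo sequence are hypothetical, and an exact-match search like this is only a first-pass check, since functional elements often deviate from the consensus.

```python
import re

# Consensus patterns quoted in the text; lookahead lets overlapping hits be found.
TATA_BOX = re.compile(r"(?=TATA[AT]A[AT])")
CAAT_BOX = re.compile(r"(?=GGCCAATCT)")


def find_core_elements(seq):
    """Return {motif name: [0-based start positions]} on the given strand."""
    seq = seq.upper()
    return {
        "TATA-box": [m.start() for m in TATA_BOX.finditer(seq)],
        "CAAT-box": [m.start() for m in CAAT_BOX.finditer(seq)],
    }


# Hypothetical sequence containing one copy of each motif.
demo = "CCGGCCAATCTGGCGCGCTATAAAAGGCGC"
print(find_core_elements(demo))  # {'TATA-box': [18], 'CAAT-box': [2]}
```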

Regarding DNA vector construction, the strength and specificity of any chosen promoter will depend on its ability to compete with endogenous promoters to attract RNAP. Artificial promoters can be built from minimal promoters (which contain the binding sites of general transcription factors) combined with chosen enhancers. Promoters can also be engineered to be ‘inducible’ by introducing specific binding sites for transcription factors that require co-activators or co-repressors.

DNA transcription elongation is a somewhat simpler process. Using the antisense DNA strand as a template, the RNAP complex catalyzes the synthesis of a new nucleotide polymer made of ribonucleotides instead of deoxyribonucleotides. The resulting RNA molecule, complementary to the antisense DNA strand, is therefore a copy of the sequence of the DNA sense strand (or coding DNA), except that thymines (T) are replaced with uracils (U).
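
The relationship between the two DNA strands and the transcript can be checked with a few lines of Python. This is a minimal sketch with hypothetical function names: one function builds the RNA by base pairing against the antisense (template) strand, the other simply copies the sense strand with T replaced by U, and both give the same result.

```python
# Base pairing used by RNAP when reading the template (antisense) strand.
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template(template_3to5):
    """Build the RNA (5'->3') from the template strand written 3'->5'."""
    return "".join(PAIRING[base] for base in template_3to5.upper())


def sense_to_rna(sense_5to3):
    """Shortcut: the RNA equals the sense strand with T replaced by U."""
    return sense_5to3.upper().replace("T", "U")


sense = "ATGGCCTTA"      # sense (coding) strand, 5'->3'
template = "TACCGGAAT"   # antisense (template) strand, 3'->5'

print(transcribe_template(template))  # AUGGCCUUA
print(sense_to_rna(sense))            # AUGGCCUUA
```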

Multiple RNAPs can transcribe the same gene simultaneously. The number of polymerases transcribing at the same time is linked to the rate of RNAP recruitment at the initiation complex, which depends on the strength of the promoter/enhancer regions.
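
As a rough illustration of this link, a simple steady-state estimate divides the time one RNAP needs to traverse the gene by the average interval between two initiation events. The model and all numbers below are assumptions chosen for illustration, not measured values from this article, and the sketch ignores pausing and polymerase crowding.

```python
def polymerases_on_gene(gene_length_nt, elongation_nt_per_s, seconds_between_initiations):
    """Steady-state estimate of RNAPs simultaneously transcribing one gene."""
    transit_time_s = gene_length_nt / elongation_nt_per_s
    return transit_time_s / seconds_between_initiations


# Hypothetical values: 3000 nt gene, 50 nt/s elongation rate.
strong_promoter = polymerases_on_gene(3000, 50, 5)   # one initiation every 5 s
weak_promoter = polymerases_on_gene(3000, 50, 60)    # one initiation every 60 s

print(strong_promoter)  # 12.0
print(weak_promoter)    # 1.0
```

Under these assumptions, a promoter that recruits RNAP twelve times faster carries twelve times more polymerases on the gene at any moment.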


DNA transcription termination occurs when the transcribed RNA is released from the RNA-DNA-RNAP complex. As for the initiation step, distinct mechanisms have been described for prokaryotic and eukaryotic organisms.

In prokaryotes, two mechanisms have been described, known as Rho-dependent and Rho-independent termination. Both terminate transcription by destabilizing the RNAP complex so that it detaches from the elongating RNA. They both rely on the presence of specific sequences in the transcribed RNA that recruit proteins ‘upstream’ of the progressing RNAP. So, in a way, the RNAP complex is stopped from ‘behind’ by signals it has just transcribed.

In Rho-dependent terminators, Rho proteins are recruited to the RNA by recognition of the Rho utilization site (rut). The Rho protein actively translocates along the RNA molecule until it reaches the progressing RNAP and inhibits its activity.

In Rho-independent terminators, the formation of a self-annealing hairpin structure on the elongating transcript is responsible for the RNAP destabilization. Additionally, the NusA protein, recognizing the hairpin, can potentiate the termination.
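
A self-annealing hairpin is a stretch of RNA followed, after a short loop, by its own reverse complement. The Python sketch below searches a transcript for such perfect stems. The stem length, the loop sizes and the check for a U-rich tract immediately downstream (a feature commonly associated with Rho-independent terminators but not mentioned in this article) are all assumptions, and the function names are hypothetical. Dedicated tools also score imperfect stems and folding energy.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string; unknown bases never match."""
    return "".join(COMPLEMENT.get(base, "?") for base in reversed(rna))


def find_hairpins(rna, stem=6, loop_sizes=range(3, 9), u_window=8, min_u=6):
    """Return (start, loop_length, u_rich_downstream) for perfect stems."""
    rna = rna.upper().replace("T", "U")
    hits = []
    for i in range(len(rna) - 2 * stem - min(loop_sizes) + 1):
        left_arm = rna[i:i + stem]
        for loop in loop_sizes:
            j = i + stem + loop
            right_arm = rna[j:j + stem]
            if len(right_arm) == stem and right_arm == reverse_complement(left_arm):
                downstream = rna[j + stem:j + stem + u_window]
                hits.append((i, loop, downstream.count("U") >= min_u))
    return hits


# Hypothetical transcript: GC-rich stem, 4 nt loop, then a run of U.
demo = "CC" + "GCCGCC" + "UUCG" + "GGCGGC" + "UUUUUUUU" + "AG"
print(find_hairpins(demo))  # [(2, 4, True)]
```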

In eukaryotes, the termination process relies on the cleavage of the elongating transcript. RNA cleavage occurs once the poly-A signal has been transcribed into the mRNA and is mediated by two protein factors, the cleavage and polyadenylation specificity factor (CPSF) and the cleavage stimulation factor (CstF). Contrary to the prokaryotic mechanisms, the RNA-DNA-RNAP complex remains active.

As a consequence, the transcriptional complex continues to progress along the DNA molecule and to transcribe RNA for several hundred to a few thousand base pairs. The resulting RNA molecule, lacking the protective 5’ cap, is degraded by exonucleases. Eventually the RNAP complex detaches from the DNA molecule by a mechanism that is still poorly understood.
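
When checking an expression cassette, it can help to locate candidate poly-A signals in the sense strand. The sketch below assumes the canonical hexamer AATAAA (AAUAAA in the RNA) and the common variant ATTAAA; these motifs are not given in this article, and the function name and demo sequence are hypothetical. A hexamer match alone does not prove that a site is used, since efficient cleavage also depends on surrounding sequence elements.

```python
import re

# Assumed poly-A signal hexamers, written as DNA on the sense strand.
POLY_A_SIGNAL = re.compile(r"(?=(AATAAA|ATTAAA))")


def find_poly_a_signals(sense_dna):
    """Return (0-based position, hexamer) for each candidate poly-A signal."""
    seq = sense_dna.upper()
    return [(m.start(), m.group(1)) for m in POLY_A_SIGNAL.finditer(seq)]


# Hypothetical 3' end of an expression cassette.
demo = "GCTGCC" + "AATAAA" + "GCATCGCATTGTCTGAGTAGG"
print(find_poly_a_signals(demo))  # [(6, 'AATAAA')]
```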

When designing multicistronic vectors for eukaryotic cells, consider organizing the various expression cassettes so that residual transcriptional activity of RNAP does not create interference between them. The order, spacing and orientation of the cassettes are parameters that can be tested and compared to identify the optimal expression conditions for each multicistronic vector.