Mechanisms of small RNA biogenesis & action

To survive and flourish, cells synthesize mRNAs from DNA genomes and organize their translation into proteins. In addition to this protein-coding capacity, genomes also store the sequence information for diverse non-coding RNAs (ncRNAs). Our group studies the biochemical machineries and regulatory mechanisms that govern ncRNA production. In particular, we use molecular genetics to analyze RNA polymerase enzymes required for genome surveillance in plants.

Genome surveillance in terrestrial plants

Genome surveillance recognizes and targets transposable elements (TEs), distinguishing these mutagenic sequences from genes for key cellular functions. While host genes are evolutionarily conserved over millions of years, TEs can differ in sequence, copy number and position even within a single species. Specialized non-coding RNA polymerases have evolved to selectively transcribe TEs and other silent chromatin regions (Figure 1A). Resulting small interfering RNAs (siRNAs) guide TE silencing and the regulation of repeat-associated genes.

RNA polymerase IV (Pol IV) is a non-coding RNA polymerase that evolved in plants through an ancestral duplication of RNA polymerase II (Pol II) subunit genes. Pol IV transcribes TEs and channels short transcripts to an RNA-dependent RNA polymerase (RDR2) to produce double-stranded RNAs, which are cut into ~24 nt siRNAs by the Dicer-like 3 (DCL3) enzyme (Figure 1A). These siRNAs bind to AGO4 and convey the sequence information needed to selectively methylate TEs. Together, these factors scan plant chromosomes in a molecular dance that began over 450 million years ago.
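As a rough illustration of how the ~24 nt siRNA class shows up in sequencing data, the sketch below tallies small RNA reads by length and reports the share that are 24 nt long. It is a minimal sketch, not our analysis pipeline: the function names and the toy reads are hypothetical, and real reads would first need adapter trimming and genome mapping.

```python
from collections import Counter


def length_distribution(reads):
    """Count small RNA reads by their length in nucleotides."""
    return Counter(len(read) for read in reads)


def fraction_of_length(reads, length=24):
    """Return the fraction of reads with the given length (0.0 if no reads)."""
    counts = length_distribution(reads)
    total = sum(counts.values())
    return counts[length] / total if total else 0.0


# Toy input (hypothetical): two 24 nt reads, one 21 nt read, one 22 nt read.
toy_reads = ["A" * 24, "C" * 24, "G" * 21, "U" * 22]

if __name__ == "__main__":
    print(dict(length_distribution(toy_reads)))
    print(fraction_of_length(toy_reads, length=24))
```

A strong 24 nt peak in such a tally is the usual signature of DCL3 products, whereas other size classes point to other Dicer-like enzymes.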

To explore how genome surveillance is regulated, we are studying Pol IV-RDR2 complexes harboring point mutations in distinct catalytic and non-catalytic domains. This genetic analysis revealed an amino acid motif conserved in Pol IV’s largest subunit, NRPD1, but missing in all Pol II enzymes; we detected this signature N-terminus in diverse seed plants (Figure 1B). Point mutations in the motif cripple Pol IV’s ability to transcribe and silence TEs via RNA-directed DNA methylation. We hypothesize that the NRPD1 N-terminus evolved to license Pol IV transcription of TEs and repeat-associated genes (Figure 1C).

How does Pol IV perceive novel TE insertions in the genome?

Despite host genome defenses, TEs sometimes escape surveillance and proliferate exponentially. This temporary escape is influenced by genetic factors, chromatin context and the environment. Certain long-terminal repeat (LTR) retrotransposons, for example, are known to circumvent silencing at high ambient temperatures because their LTRs contain heat response elements. What processes or factors allow Pol IV to perceive active TEs and re-initiate genome surveillance (Figure 2A)?

To study Pol IV perception of TEs, we are isolating active TE loci in the Brassicaceae species Arabidopsis thaliana and in the Poaceae species Brachypodium distachyon. Oxford Nanopore “long-read” sequencing facilitates the discovery of novel TE integration sites in these host genomes (Figure 2B). Direct methylcytosine detection from the raw Nanopore signals allows DNA methylation to be profiled at individual TEs while accounting for genomic structural variation (Figure 2C).
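Profiling methylation at individual TEs can be pictured as pooling per-cytosine calls over each TE interval. The sketch below assumes the calls have already been reduced to simple tuples of chromosome, position, methylated read count and total read count; that layout, the function name and the toy coordinates are hypothetical and do not correspond to the output format of any particular Nanopore tool.

```python
def te_methylation(calls, tes):
    """Return the read-weighted methylation fraction for each TE.

    calls: iterable of (chrom, pos, methylated_reads, total_reads)
    tes:   iterable of (name, chrom, start, end), half-open intervals
    A TE with no covered cytosines maps to None.
    """
    result = {}
    for name, chrom, start, end in tes:
        methylated = total = 0
        for call_chrom, pos, m, n in calls:
            if call_chrom == chrom and start <= pos < end:
                methylated += m
                total += n
        result[name] = methylated / total if total else None
    return result


# Toy data (hypothetical coordinates and counts).
toy_calls = [
    ("Chr1", 100, 8, 10),
    ("Chr1", 150, 9, 10),
    ("Chr1", 900, 1, 10),
    ("Chr2", 120, 5, 10),
]
toy_tes = [
    ("TE_a", "Chr1", 50, 200),    # well methylated in the toy data
    ("TE_b", "Chr1", 800, 1000),  # weakly methylated
    ("TE_c", "Chr3", 0, 100),     # no calls overlap this interval
]

if __name__ == "__main__":
    print(te_methylation(toy_calls, toy_tes))
```

The nested loop keeps the sketch short; for genome-scale data an interval index or sorted sweep would be the more sensible choice. Because long reads span whole insertions, the TE coordinates themselves can come from the sequenced individual rather than from a reference assembly.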

How is ncRNA transcription regulated in plants?

Fundamental questions remain about how plant ncRNA polymerases are regulated to achieve adaptive levels of TE repression while avoiding ectopic silencing of host genes. Tapping into the mechanisms that regulate Pol IV and related ncRNA polymerases will allow researchers to adjust the “intensity” of genome surveillance to unlock genomic and phenotypic variation for crop breeding.