Here, we purify and characterize the Retron-Eco2 system from Escherichia coli Cl-1 and determine its cryo-electron microscopy (cryo-EM) structure at 2.6 Å resolution. We find that Eco2 forms a triangular trimer comprising three msDNA-RT-Toprim units. The RNA part of msDNA interconnects RT domains across adjacent protomers, creating a self-inhibitory complex. Toprim domains, arranged as three legs extending from the msDNA-RT core, exhibit RNase activity upon activation, degrading RNA to arrest phage replication. Genetic analyses confirm the importance of Toprim enzymatic activity and implicate the phage-encoded DenB endonuclease IV-like protein in activating Eco2-mediated defense. Although Eco2 activation leads to translational inhibition, the underlying mechanism remains to be explored.
We first analyzed 3395 retron/retron-like RT sequences and constructed a comprehensive phylogenetic tree (Supplementary Fig. S1). Notably, ~32.8% of these RT proteins are fused with effector domains, such as Toprim, protease, and Toll/interleukin-1 receptor (TIR) domains. Among these, the Toprim domain is the most prevalent, accounting for 37% of the effector associations, as seen in Retron-Eco2. Based on this observation, we focused our structural and biochemical studies on Eco2 to elucidate its function.
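As a rough sanity check, the percentages quoted above can be converted into approximate sequence counts. The sketch below assumes that the 37% figure is taken over the effector-fused RTs, as the text states, and rounds to whole sequences. Only the total and the two percentages come from the text; the derived counts are estimates, not reported values.

```python
# Approximate counts implied by the reported percentages.
# TOTAL_RTS and the two fractions are from the text; everything derived
# from them is an estimate, because the exact counts are not reported.
TOTAL_RTS = 3395
FRACTION_WITH_EFFECTOR = 0.328  # "~32.8%" of RTs are fused to an effector
FRACTION_TOPRIM = 0.37          # Toprim share of the effector-fused RTs


def implied_counts(total, frac_effector, frac_toprim):
    """Return (effector-fused count, Toprim-fused count), rounded."""
    fused = round(total * frac_effector)
    toprim = round(fused * frac_toprim)
    return fused, toprim


fused, toprim = implied_counts(TOTAL_RTS, FRACTION_WITH_EFFECTOR, FRACTION_TOPRIM)
print(f"effector-fused RTs: ~{fused}")
print(f"Toprim-fused RTs:   ~{toprim}")
print(f"Toprim share of all RTs: ~{toprim / TOTAL_RTS:.1%}")
```

Under these assumptions, roughly 1,100 of the analyzed RTs carry a fused effector and roughly 400 of those carry a Toprim domain, i.e. about one in eight of all sequences in the tree.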
To elucidate how the Retron-Eco2 system defends against phage infection, we first cloned and purified the RT-Toprim fusion protein from E. coli S10 (Fig. 1a). In contrast to the challenging purification of the Eco1 RT protein, the Eco2 RT-Toprim fusion protein was readily purified and eluted at the volume corresponding to the monomeric form during size-exclusion chromatography (SEC) (Supplementary Fig. S2a). We then co-expressed and purified RT-Toprim together with its ncRNA and observed that the Retron-Eco2 complex assembled into an oligomer (Supplementary Fig. S2a), consistent with previous reports. The presence of msDNA in the Eco2 complex was confirmed by UV absorbance at 260 nm and by urea-polyacrylamide gel electrophoresis (urea-PAGE) analysis.
To further investigate the assembly of the Retron-Eco2 complex, we determined its cryo-EM structure at a resolution of 2.6 Å (Supplementary Fig. S2b‒f and Table S1). Overall, Retron-Eco2 forms a trimeric, cyclic assembly composed of three msDNA molecules and three RT-Toprim proteins (three msDNA-RT-Toprim units). In the Retron-Eco2 trimer, the msDNA-RT portions of the three protomers together adopt a triangular architecture, with the Toprim domains protruding as three perpendicular "legs" from this triangular base (Fig. 1b). The three RT-Toprim proteins are held together by the msDNA molecules. The msrRNA part of each msDNA molecule spans from one RT-Toprim to the next in a head-to-tail fashion, occupying the active site of the RT domain (Fig. 1c, d). Collectively, these findings reveal the structural basis of Eco2 complex assembly.
Although different retrons produce msDNAs with varying primary sequences, they share structural and functional features. Similar to other reported msDNAs, the IRa1/a2 regions at the 5'- and 3'- ends of the Eco2 ncRNA are complementary and can hybridize. The Eco2 msr region, which is not reverse transcribed, forms a single short RNA stem-loop (RSL), while the msd region, composed of 67 bases, folds into a single long DNA hairpin (DNA stem-loop, DSL) (Fig. 2).
Within the Eco2 complex, the msrRNA segments of the three msDNA molecules form a three-bladed propeller-like structure (Fig. 2b). Each msrRNA comprises a 23-nt RSL (rG20‒rU42) flanked by two single-stranded tails, designated the 5'-end tail and the 3'-end tail (Fig. 2c, f). The 5'-end tail includes IRa2, rG15, and the single-stranded RNA a (ssRa) segment (rA16‒rU19), while the 3'-end tail consists of the ssRb segment (rG43‒rU51) and the RNA portion of the DNA‒RNA duplex. These tails and the RSL region interact tightly with the positively charged surface of the RT domain, stabilizing the trimeric assembly (Fig. 2d). Each msrRNA tail in one protomer associates with an RT-Toprim from another via electropositive grooves, forming hydrogen bonds between the protein and the RNA bases (Fig. 2d).
In the solved structure, a 9-nt IRa2 is clearly visible, while only two complementary bases (rC8 and rA9) of IRa1 are resolved (Fig. 2c, f). The 2',5'-phosphodiester bond between the branched guanosine (rG15) and dT1 (the 5'-end of the msdDNA) is observed in the density map (Fig. 2c, e, f). The final 11 nucleotides (rG52‒rA62) at the 3'-end of msrRNA and nucleotides dT57‒dC67 of msdDNA form the DNA‒RNA duplex, which is located within the RT active site (Fig. 2c, f). This configuration shows more paired bases between the msrRNA template and the msdDNA product than in previous reports. Except for dT1 and the DNA strand of the DNA‒RNA duplex, the cryo-EM density of the msdDNA is weak, indicating less stable binding and increased flexibility. Such flexibility, particularly in the DSL region, may facilitate its role as a sensor of phage infection.
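The segment boundaries quoted above can be cross-checked with simple inclusive-range arithmetic. The following sketch uses only the nucleotide numbers given in the text; the dictionary keys and the helper name `span_length` are illustrative and not part of the original analysis.

```python
# Inclusive lengths of the numbered msrRNA/msdDNA segments named in the text.
# Boundaries are copied from the text; labels and helper names are illustrative.
SEGMENTS = {
    "ssRa (rA16-rU19)": (16, 19),
    "RSL (rG20-rU42)": (20, 42),
    "ssRb (rG43-rU51)": (43, 51),
    "duplex RNA strand (rG52-rA62)": (52, 62),
    "duplex DNA strand (dT57-dC67)": (57, 67),
}


def span_length(start, end):
    """Number of positions in an inclusive numbered range."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


lengths = {name: span_length(*bounds) for name, bounds in SEGMENTS.items()}
for name, n in lengths.items():
    print(f"{name}: {n} nt")
```

The two strands of the duplex come out at the same length (11 nt each), and the ssRa, RSL, ssRb, and duplex RNA segments tile the msrRNA from rA16 to rA62 without gaps or overlaps.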
The full-length Eco2 RT-Toprim protein consists of an N-terminal RT domain (residues 1‒368) and a C-terminal Toprim domain (residues 369‒586). The N-terminal RT domain adopts the characteristic right-hand-like fold found in all other known RT structures, consisting of 16 α-helices and 10 β-strands (Fig. 3a). Structural alignment indicates that the Eco2 RT domain is related to non-LTR RTs, including those from group II introns, human long interspersed element-1 (LINE-1) elements, insect R2 elements, and other eukaryotic non-LTR retrotransposons. DALI analysis further reveals structural similarities to RNA-directed RNA polymerases identified in various RNA viruses (Supplementary Fig. S3a).
We compared several representative RTs from different organisms based on DALI analysis. While the core RT fold is conserved, each homologous structure shows variations. For example, group IIC introns, DRT2, and LINE-1 possess an additional N-terminal extension (NTE) domain, while Abi-P2 replaces the thumb domain with a helical domain (Supplementary Fig. S3b‒i). Among non-retron family RTs, the Eco2 RT domain shows the highest structural similarity to the RT of the prokaryotic DRT2 defense system, which reverse-transcribes a neighboring noncoding RNA into tandem repeats that facilitate the expression of a toxic repetitive protein (Supplementary Fig. S3a, d).
The trimer interface of the Eco2 RT domain is mediated by a loop region (residues 328‒342), primarily through hydrogen bonds involving amino acids D331, R332, and Y333 (Fig. 3b, c). Alanine substitutions of these residues revealed that R332A abolishes Eco2's antiphage activity, while D331A and Y333A have no obvious effect (Fig. 3d). To test whether R332A affects trimer formation, we purified the R332A variant and compared it to the wild-type Eco2 complex using SEC. The R332A mutation only partially impaired trimer assembly, indicating that this loop region contributes to, but is not solely responsible for, trimer formation (Supplementary Fig. S4). Collectively, these results suggest that msDNA mediates the trimer formation of the Eco2 complex, while the RT-Toprim protein alone exists as a monomer.
As mentioned above, the base of the Eco2 trimer complex is a triangular assembly of three RT domains and three msDNA molecules. The msrRNA portion of msDNA adopts an RSL flanked by 5' and 3' single-stranded tails. The RSL is sandwiched between the RT domains of two adjacent protomers, with its 5'-end binding one RT and its 3'-end binding the other. This arrangement creates five extensive interfaces for RT‒msDNA interaction (Supplementary Fig. S5a). The 5'-end single-stranded tail primarily contacts the RT palm subdomain through hydrogen bonding and electrostatic interactions (Supplementary Fig. S5b). Notably, rG15, which is crucial for reverse transcription initiation, forms hydrogen bonds with R184. The RSL region makes extensive contacts with the RT palm subdomain, as well as with Q329/R264 in the thumb subdomain of one RT and with the thumb subdomain of the adjacent RT (Supplementary Fig. S5c, d). Although the Eco2 RSL loop sequence (rA30-rA31-rU32) differs from the loop in Eco1 (rU52-rU53-rU54), the two loops share a similar conformation. In Eco2, rA30 and rU32 (Eco1 counterparts rU52 and rU54) flip outwards, while rA31 (Eco1 counterpart rU53) base-stacks with rG29 (Eco1 counterpart rG51) and the aromatic residue H281 (Eco1 counterpart H268). These observations suggest that an aromatic residue (a conserved histidine) in the thumb subdomain helps stabilize the msrRNA loop. The ssRb region in the 3'-end tail is stabilized by interactions with residues F46 and R56 in the fingers subdomain, K245/R247/T249 in the palm subdomain, and S253 in the thumb subdomain of RT (Supplementary Fig. S5e). The DNA‒RNA duplex, closely aligned with the catalytic YADD motif, engages in extensive interactions with the palm and thumb subdomains, stabilizing the catalytic pocket (Supplementary Fig. S5f).
Analysis of the msdDNA reveals that the last five nucleotides (dC63-dT64-dG65-dC66-dC67) match previous reports, reflecting a post-catalytic state (Fig. 2c, d). Y199 in the YADD motif forms a hydrogen bond network with the bases of dG65 and dC66 (which pairs with rG53), while K66 and R70 interact with the phosphate backbone of msrRNA (Supplementary Fig. S5f). The last nucleotide, dC67, is positioned near the RT catalytic residues D201 and D202 (Fig. 4a, b). Interestingly, rU51, poised as the next nucleotide to be reverse-transcribed, is flipped out and stabilized by π‒π stacking with the aromatic residue F46 in the fingers subdomain (Fig. 4b). Additionally, rA49 and rA50 stack with dC67 and rG52 (the template of dC67), respectively. Replacing F46 with alanine compromised antiphage activity (Fig. 4c) and reduced msDNA production in vivo (Fig. 4d, e), indicating that F46 is crucial for msDNA synthesis and defense.
Sequence alignment of diverse retron RTs shows that this aromatic residue is highly conserved, often as a tyrosine (Fig. 4f). Analysis of the Eco1 and Eco2 RT structures reveals conserved YADD, NAXXH, and VTG motifs and a similar arrangement of key residues (Supplementary Fig. S6a, b). In Eco1, the base of rU72 (Eco2 counterpart rU51) flips out and forms π‒π stacking interactions with dA76 and dG77 (Supplementary Fig. S6c). The Eco1 aromatic amino acid Y51 (Eco2 counterpart F46) forms a base-stacking interaction network with dA76, rU72, and dG77. Further antiphage assays showed that Y51A abolishes Eco1's defense activity (Supplementary Fig. S6d). Together, these findings underscore the conserved role of this aromatic residue in msDNA synthesis and antiphage activity across retron RTs.
The components of Eco2 differ significantly from those of Eco1, which belongs to the type II retron family and has a standalone nucleoside 2-deoxyribosyltransferase-like (NDT) effector. In contrast, Eco2 integrates the RT and Toprim domains into a single protein. The overall structure of the Eco2 RT domain is similar to that of Eco1, with an RMSD of 3.3 Å over 288 residues. Interestingly, while Eco2 msrRNA forms a 23-nt RSL flanked by two single-stranded tails that bridge two RT domains, Eco1 msDNA adopts a roughly five-pointed star configuration wrapping around a standalone RT (Supplementary Fig. S7a, b). Several structural differences in the RT domains may explain their distinct oligomeric states and msDNA binding modes. For instance, the Eco2 α1 and α2 helices are antiparallel, whereas they are parallel in Eco1. In Eco1, the α1 helix mediates the dimer interface, tilting outward by ~13° compared to the Eco2 α1 helix (Supplementary Fig. S7c). Another difference is an inserted β-sheet (β-sheet1) in the Eco2 palm domain that could hinder msrRNA binding if the msrRNA adopted an Eco1-like conformation (Supplementary Fig. S7d, e). Additionally, Eco1 msrRNA has an extra RSL region absent in Eco2, potentially influencing whether the msrRNA encircles one RT or bridges two (Supplementary Fig. S7b, e). Finally, Eco2's thumb domain features an inserted β-sheet (β-sheet2) and a long loop, which engage in mutual interactions among RT monomers in the trimer but do not solely dictate trimer formation (Supplementary Fig. S7f, g).
Within the RT-Toprim fusion protein of Eco2, the C-terminal Toprim-like effector domain features a central four-stranded parallel β-sheet surrounded by α-helices (Fig. 5a, b). DALI analysis shows that the effector domain is structurally similar to class 1/2 OLD nuclease Toprim domains, as well as to effectors from the bacterial Gabija and PARIS antiphage defense systems (Supplementary Fig. S8). The OLD nuclease Toprim domain exhibits processive nuclease activity via a two-metal catalysis mechanism. Consistent with this model, the Eco2 Toprim domain possesses a complete active site comprising an invariant glutamate, an invariant glycine, and a conserved DxD motif (Fig. 5b). Structural and sequence comparison of the Eco2 Toprim domain with the class 2 OLD nuclease (PDB: 6NK8) reveals conservation of E374 and the DxD motif (D460/D462) coordinating metal A, D378/S513/E515 coordinating metal B, and the catalytic lysine K562 (Supplementary Fig. S9). Alanine substitutions of these residues abolished the antiphage activity of Eco2, indicating that Toprim enzymatic activity is crucial for its defense function (Fig. 5c).
To investigate the enzymatic activity of the Eco2 Toprim domain, we first assessed its potential host toxicity using RT-Toprim fusion protein and truncated Toprim domain constructs. Expression of the Toprim domain truncations (residues N267‒S586, S300‒S586, and N337‒S586) in E. coli resulted in pronounced growth inhibition, while the full-length RT-Toprim fusion protein exhibited no toxicity in spot growth assays (Fig. 5d). These results suggest that Toprim enzymatic activity is detrimental to the host and may contribute to Eco2-mediated antiphage defense.
To directly evaluate its catalytic function, we purified the Toprim domain truncation (residues N267‒S586) and assessed its activity using the RNase Alert assay. The wild-type Toprim domain displayed robust RNase activity, as evidenced by a time-dependent increase in fluorescence (Fig. 5e). In contrast, alanine substitutions in the predicted catalytic residues (E374 and D378) dramatically reduced RNase activity (Fig. 5e), indicating that the observed cleavage is specific and dependent on an intact active site.
We next performed RNA-seq analysis to determine the functional consequences of Eco2 activation during phage infection. Compared to the empty-vector control, Eco2-expressing cells displayed a marked downregulation of late phage genes and increased reads for early and middle genes at 15 min and 30 min post-infection (Fig. 5f). As successful progression to late gene expression depends on effective translation of earlier gene products, these data support a model in which Eco2 activation disrupts phage protein synthesis, likely via RNA degradation mediated by the Toprim domain.
Collectively, these findings reveal that the Retron-Eco2 Toprim domain functions as an RNase, and that this activity underlies its toxicity and Retron-Eco2's antiphage function.
The Toprim domain extends from the RT thumb subdomain and helps stabilize the DNA‒RNA duplex (Supplementary Fig. S10a). We also used AlphaFold2 to predict the structure of the isolated RT-Toprim protein. The model revealed a "closed" conformation compared to the more open arrangement observed by cryo-EM (Supplementary Fig. S10b, c). Within the resolved region of the protein, basic residues K383, H387, K418, R555, and K557 in the Toprim domain interact with the phosphate backbone of the RNA strand of the duplex (Supplementary Fig. S10d). To elucidate the functional state of Eco2, we superimposed its Toprim domain on that of DNA-bound GajA, a known active configuration of the Gabija defense system. The key catalytic residues of Eco2 are aligned, yet we detect no Ca²⁺ density at the active site (Fig. 5b; Supplementary Fig. S10e). Moreover, the density for much of the msdDNA is absent, which could prevent substrate engagement and thus impede Toprim catalysis (Supplementary Fig. S10f, g). Together with the observed RNase activity and toxicity of the isolated Toprim domain (Fig. 5d, e), these findings support a model in which Eco2 is maintained in an autoinhibited conformation and becomes activated only upon phage infection, likely through interactions with phage-derived factors.
To identify potential phage-encoded triggers of the Retron-Eco2 system, we isolated and sequenced phage mutants that evade Eco2-mediated defense (escaper phages). Genome sequencing of four escapers revealed mutations in the denB gene, all featuring a thymine (dT) insertion between denB dT192 and dG193. This insertion results in a frameshift and introduces a premature stop codon (Fig. 6a). AlphaFold2-based predictions indicate that denB encodes an endonuclease IV-like protein that cleaves single-stranded DNA in a dC-specific manner (Fig. 6b). Sequence and structural alignments revealed that the DenB protein possesses a conserved PD-(D/E)xK catalytic motif (Fig. 6c, d). The toxicity of the DenB protein prevented direct assays to confirm its activating role. Nonetheless, the recurring denB mutations in escapers strongly suggest that the denB gene is integral to triggering Eco2 activation and that mutations disrupting its expression enable phages to circumvent this defense mechanism.
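The effect of a single-base insertion on a reading frame can be illustrated with a toy example. The sketch below uses a short hypothetical open reading frame, not the denB sequence (which is not given here), and inserts one dT between a dT and a dG, analogous to the escaper mutation. The sequence, the insertion position, and the helper names are assumptions made purely for illustration; translation uses the standard genetic code.

```python
# Toy illustration of a +1 frameshift creating a premature stop codon.
# WILD_TYPE is a HYPOTHETICAL 24-nt ORF, not the real denB sequence.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate(dna):
    """Translate from the first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def insert_base(dna, after_position, base):
    """Insert `base` after the 1-based position `after_position`."""
    return dna[:after_position] + base + dna[after_position:]


WILD_TYPE = "ATGGCTTGGAATAGCCGTCTGTAA"
# Position 7 is T and position 8 is G; insert an extra T between them.
MUTANT = insert_base(WILD_TYPE, 7, "T")

print("wild type:", translate(WILD_TYPE))
print("mutant:   ", translate(MUTANT))
```

In this toy case the insertion shifts every downstream codon by one base, so the mutant reading frame runs into an in-frame TAG after only four codons and the translated product is truncated relative to the seven-residue wild-type peptide.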