Journal of General Virology (2004), 85, 1113–1124
DOI 10.1099/vir.0.19462-0
Conserved RNA secondary structures in Flaviviridae genomes
Caroline Thurner,1 Christina Witwer,1 Ivo L. Hofacker1 and Peter F. Stadler1,2,3
1 Institut für Theoretische Chemie und Molekulare Strukturbiologie, Universität Wien, Währingerstraße 17, A-1090 Wien, Austria
2 Bioinformatik, Institut für Informatik, Universität Leipzig, Kreuzstraße 7b, D-04103 Leipzig, Germany
3 The Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
Correspondence: Ivo L. Hofacker, [email protected]
Received 26 June 2003 Accepted 10 December 2003
Presented here is a comprehensive computational survey of evolutionarily conserved secondary structure motifs in the genomic RNAs of the family Flaviviridae. This virus family consists of the three genera Flavivirus, Pestivirus and Hepacivirus and the group of GB virus C/hepatitis G virus with a currently uncertain taxonomic classification. Based on the control of replication and translation, two subgroups were considered separately: the genus Flavivirus, with its type I cap structure at the 5′ untranslated region (UTR) and a highly structured 3′ UTR, and the remaining three groups, which exhibit translation control by means of an internal ribosomal entry site (IRES) in the 5′ UTR and a much shorter less-structured 3′ UTR. The main findings of this survey are strong hints for the possibility of genome cyclization in hepatitis C virus and GB virus C/hepatitis G virus in addition to the flaviviruses; a surprisingly large number of conserved RNA motifs in the coding regions; and a lower level of detailed structural conservation in the IRES and 3′ UTR motifs than reported in the literature. An electronic atlas organizes the information on the more than 150 conserved, and therefore putatively functional, RNA secondary structure elements.
Supplementary figures are supplied in JGV Online.
0001-9462 © 2004 SGM
Printed in Great Britain
INTRODUCTION
Viral RNA genomes not only code for proteins but in many instances also carry RNA motifs that play a crucial role in the viral life-cycle. Well-known examples are the internal ribosomal entry site (IRES), the RRE motif in human immunodeficiency virus, or the CRE hairpin in Picornaviridae. The detection of such functional motifs in the viral genome is a difficult task because almost all RNA molecules form secondary structures and the functional structures are not significantly different from the structures formed by random sequences (Fontana et al., 1993; Rivas & Eddy, 2000).
RNA secondary structures have been shown to be very sensitive to mutations (Fontana et al., 1993; Schuster et al., 1994): mutations in about 10 % of the sequence positions already lead almost surely to unrelated structures if the mutated positions are chosen randomly. Secondary structure elements that are consistently present in a group of sequences with less than, say, 95 % mean pairwise identity are therefore most likely the result of stabilizing selection, not a consequence of the high degree of sequence homology. This fact can be exploited to design algorithms that reliably detect conserved RNA secondary structure elements in a small sample of related RNA sequences (Hofacker et al., 1998; Hofacker & Stadler, 1999). This method was recently applied quite successfully to a survey of the genomes of Picornaviridae (Witwer et al., 2001) and the RNA pre-genome of Hepadnaviridae (Stocsits et al., 1999).
Here we report a comprehensive survey of members of the family Flaviviridae, which possess a single-stranded positive-sense RNA (ss+RNA) genome. The family is subdivided into the three genera Flavivirus, Pestivirus and Hepacivirus and the group of GB virus C/hepatitis G viruses (GBV-C) with a currently uncertain taxonomic classification (van Regenmortel et al., 2000). The RNA genome, which has a size of 9.6–12.3 kb, is characterized by a similar organization (Fig. 1) in all genera and acts as the only mRNA found in infected cells. It contains one single long open reading frame flanked by 5′ and 3′ untranslated regions (UTR). These are known to form specific secondary structures required for genome replication and translation. Viral proteins are synthesized as one single polyprotein, which is co- and post-translationally cleaved by viral and cellular proteinases.
Based on the control of replication and translation it is useful to consider two subgroups of Flaviviridae. The first group is formed by the genus Flavivirus and is characterized by a type I cap structure at the 5′ UTR (Brinton & Dispoto, 1988) and a highly structured 3′ UTR. In this group there is evidence that the 5′ and 3′ ends stack together to cause a cyclization of the genome (sometimes referred to as a ‘panhandle structure’) that might be an important feature for RNA replication (Hahn et al., 1987; Khromykh et al., 2001). The second group, consisting of Hepacivirus (hepatitis C virus; HCV), Pestivirus (PESTI) and GBV-C, controls translation by means of an IRES in the 5′ UTR and has a short, less-structured 3′ UTR. Pestivirus and Hepacivirus have very similar IRES regions (Pestova et al., 1998); the IRES of GBV-C is 50 % longer and structurally quite different (Simons et al., 1996). Therefore we treat these two groups separately.
While the 5′ and 3′ UTRs of Flaviviridae have been the object of several studies, very little is known about the secondary structures of the coding regions despite some evidence that the coding region might also contain functional RNA motifs (Simmonds & Smith, 1999; Tuplin et al., 2002).
Fig. 1. Genome map for the Flavivirus species dengue virus, Japanese encephalitis virus, yellow fever virus and tick-borne encephalitis virus, and for hepatitis C virus, GB virus C/hepatitis G virus and non-cytopathic Pestivirus species. Putative conserved secondary structures are indicated by the boxes above the RNA sequence.
METHODS
Sequence data were obtained from the NCBI genome database. The phylogenetic distribution and the pairwise sequence similarity of the available data were such that the following groups of Flaviviridae could be investigated in detail: the genera Pestivirus and Hepacivirus, the unclassified group GBV-C and some species of the genus Flavivirus. These are dengue virus (DEN), Japanese encephalitis virus (JEV), yellow fever virus (YFV) and tick-borne encephalitis virus (TBE). Statistical information on these sequence data is compiled in Table 1.
The genus Pestivirus can be subdivided into two groups, the cytopathic pestiviruses, which cause cell shrinkage, membrane blebbing and cell death, and the non-cytopathic ones. Cytopathic viruses develop from non-cytopathic viruses by RNA recombination, resulting in genome duplicates, rearrangements, deletions and insertions (Myers & Thiel, 1996). Characteristically, they have at least one additional copy of the NS3 protein, isolated from NS2/NS3 through ubiquitin (Ub) or cIns insertions (Myers & Thiel, 1996; Tautz et al., 1999). In this study we only use full-length genomes of non-cytopathic pestiviruses, because insertions of Ub and cIns cause extended gaps in the multiple alignments that interfere with the analysis. On the other hand, a detailed study of cytopathic viruses and particularly the effects of the extended insertions on the secondary structure of pestiviruses was not possible because there were too few sequences available in the public databases.
Multiple sequence alignments were calculated using CLUSTAL W (Thompson et al., 1994). All sequence positions reported here refer to the multiple sequence alignments that are available as part of the supplemental material. RNA genomes were folded in their entirety using McCaskill’s partition function algorithm (McCaskill, 1990) as implemented in the VIENNA RNA package (Hofacker et al., 1994), based on the energy parameters published in Mathews et al. (1999). The result of this computation is a matrix of base pairing probabilities for each potential base pair (i, j) of the genomic RNA.
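To make the notion of a base pairing probability matrix concrete, the following sketch enumerates every secondary structure of a short toy sequence and derives pair probabilities from Boltzmann weights. It is an illustration only: the energy model (a fixed bonus `PAIR_ENERGY` per base pair), the minimum hairpin size and the example sequence are assumptions made here, not the Mathews et al. (1999) parameters and not the VIENNA RNA implementation, which obtains the same kind of matrix by dynamic programming over the full energy model.

```python
import math

# Toy assumptions (not the parameters used in the paper):
PAIR_ENERGY = -1.0   # kcal/mol credited to every base pair
RT = 0.6163          # kcal/mol, gas constant times 310.15 K
MIN_LOOP = 3         # minimum number of unpaired bases enclosed by a pair
ALLOWED = {"AU", "UA", "GC", "CG", "GU", "UG"}


def can_pair(seq, i, j):
    return j - i > MIN_LOOP and seq[i] + seq[j] in ALLOWED


def structures(seq, i, j):
    """Yield each secondary structure on seq[i..j] once, as a frozenset of pairs."""
    if i >= j:
        yield frozenset()
        return
    yield from structures(seq, i + 1, j)          # position i stays unpaired
    for k in range(i + MIN_LOOP + 1, j + 1):      # position i pairs with k
        if can_pair(seq, i, k):
            for inner in structures(seq, i + 1, k - 1):
                for outer in structures(seq, k + 1, j):
                    yield inner | outer | {(i, k)}


def pair_probabilities(seq):
    """Return (Z, {(i, j): probability}) by explicit enumeration."""
    z = 0.0
    weight_of_pair = {}
    for s in structures(seq, 0, len(seq) - 1):
        w = math.exp(-PAIR_ENERGY * len(s) / RT)  # Boltzmann weight of structure s
        z += w
        for pair in s:
            weight_of_pair[pair] = weight_of_pair.get(pair, 0.0) + w
    return z, {pair: w / z for pair, w in weight_of_pair.items()}


def partition_function(seq):
    """The same Z from a McCaskill-style recursion over subsequences."""
    n = len(seq)
    boltz = math.exp(-PAIR_ENERGY / RT)
    # z[i][j] covers seq[i..j]; entries with j < i (empty segments) stay 1.0
    z = [[1.0] * n for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(i, n):
            total = z[i + 1][j]
            for k in range(i + MIN_LOOP + 1, j + 1):
                if can_pair(seq, i, k):
                    total += boltz * z[i + 1][k - 1] * z[k + 1][j]
            z[i][j] = total
    return z[0][n - 1]


if __name__ == "__main__":
    z, probs = pair_probabilities("GGGAAAUCCC")  # hypothetical toy sequence
    for pair, p in sorted(probs.items(), key=lambda item: -item[1])[:5]:
        print(pair, round(p, 3))
```

Enumeration is only feasible for very short sequences; it is used here because it makes the definition of a pair probability explicit, and the recursion in `partition_function` shows how the same sum can be obtained without listing structures.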
The ALIDOT algorithm (Hofacker et al., 1998; Hofacker & Stadler, 1999) was used to search the base pairing probability matrix for conserved secondary structure patterns. This method requires an independent prediction of the secondary structure for each of the sequences and a multiple sequence alignment that is obtained without any reference to the predicted secondary structures. The algorithm ranks base pairs using both the thermodynamic information contained in the base pairing probability matrix and the information on compensatory, consistent (e.g. GC→GU) and inconsistent mutations contained in the multiple sequence alignment. The approach is different from efforts to simultaneously compute alignment and secondary structures (Gorodkin et al., 1997; Sankoff, 1985) and from programs such as CONSTRUCT (Lück et al., 1996, 1999) and ALIFOLD (Hofacker et al., 2002) because it does not assume that the sequences have a single common structure. An implementation of this algorithm is available from http://www.tbi.univie.ac.at/RNA/. The ALIFOLD algorithm (Hofacker et al., 2002) is used to obtain consensus structures of regions with significant structural conservation. Computational results are shown as Hogeweg-style mountain plots (Hogeweg & Hesper, 1984) with colour codes indicating sequence covariations.
Table 1. Number of analysed sequences n, length of our alignments, length of 5′ UTR, IRES (if present), coding region and 3′ UTR with the respective mean pairwise sequence identities s (%)
Group    n    Length   s      5′ UTR    IRES      s      Coding reg.    3′ UTR          s
Group 1: Flavivirus
DEN      16   10775    80.4   1..98     -         87.4   99..10288      10289..10775    85.4
JEV      17   10979    95.5   1..95     -         98.9   96..10394      10395..10979    95.6
YFV      7    10863    96.4   1..121    -         99.8   122..10351     10352..10863    91.7
TBE      6    11143    86.5   1..132    -         91.2   133..10374     10375..11143    69.8
Group 2: Pestivirus, Hepacivirus and GB virus C/hepatitis G virus
GBV-C    10   9397     89.8   1..556    45..556   94.2   557..9085      9086..9397      96.7
HCV      9    9679     87.1   1..342    43..354   98.3   343..9399      9400..9679      85.1
PESTI    11   12393    74.9   1..388    65..388   80.7   389..12114     12115..12393    62.0
RESULTS
In the survey reported here we found many putative structural elements, as indicated in Fig. 1. A complete description of each one of them cannot be displayed in print because of space constraints, but see Fig. 6 for selected examples. The complete material including positional information, sequences, accession numbers, multiple sequence alignments, structure predictions, structure drawings and information on the sequence covariation is available as supplemental material in electronic form in our Viral RNA Structure Database at http://rna.tbi.univie.ac.at/virus/. This web site can also be used to retrieve the computational results for regions that we have not identified as structurally conserved. Unless noted otherwise, the names used to denote individual conserved helices follow the scheme used on the web site.
Genus Flavivirus
The genus Flavivirus is a widespread genus containing very diverse species. Since at least six sufficiently diverged genomic sequences of each species are necessary for our analysis method, we focused on the species DEN, JEV, YFV and TBE. In Fig. 2 we show an overview of the conserved secondary structure elements of the 5′ and 3′ ends and the cyclization domains called P19, P1, P2 and CS or CS “A”, respectively, which we found by applying our algorithms.
Genome cyclization. Hahn et al. (1987) found complementary sequences [cyclization sequences (CS)] close to the 5′ and the 3′ end of the genome and concluded that the two ends of the genome of Flavivirus stick together in a panhandle-like structure. Recently, it has been shown that RNA synthesis in vitro requires both 5′ and 3′ ends present, either connected in the same RNA sequence, or added in trans (You & Padmanabhan, 1999). Another piece of evidence for the cyclization of the genomic RNA is the finding that the first stem in the 5′ UTR and the last stem in the 3′ UTR together with the CS are necessary and sufficient for virus translation and replication (Khromykh et al., 2001).
The mean pairwise sequence identity of all four species of the genus Flavivirus (less than 50 %) was too low to yield good alignments. The species TBE differs most from the other species in both sequence and structure. From an alignment of the remaining species (DEN, JEV and YFV) we obtained a common structure for the CS (Fig. 2), which supports the prediction of Hahn et al. (1987). We then compared only DEN and JEV. In our data the CS contained no sequence variation but was predicted with pair probabilities close to one. Adjacent to the CS we found a further stem which participates in genome cyclization and which contains several sites of sequence variation (P20 in supplemental material A). Between CS and P20 there is a well-conserved hairpin structure supported by numerous compensatory mutations (DV2/JE2 in supplemental material A).
TBE. The conserved cyclization motifs first reported by Hahn et al. (1987) for mosquito-borne viruses are absent in the TBE group. Putative CSs were proposed for Powassan virus RNA (Mandl et al., 1993), for TBE and cell fusing agent (Khromykh et al., 2001). In all proposed motifs for genome cyclization we did not find any mutations in the sequences; thus we could not use our method to confirm the predicted structure by means of sequence covariation. Thermodynamic folding, however, provided strong evidence for the CS “A”-motif (Fig. 2) (Khromykh et al., 2001; Mandl et al., 1993) because these base pairs appeared with probabilities close to one in the folds of the complete genome. Khromykh’s region CS “B” was folded only by one single sequence (TEU27491) and thus could not be considered as a common motif for all members of TBE.
Fig. 2. The minimum free energy structure of one sequence of the respective virus species is represented. Coloured backgrounds mark regions that our folding algorithm and selection criteria allowed for all sequences of the respective species. The same colour is used for equivalent structures in different species; grey motifs are conserved only within a single species. The nomenclature of the structures corresponds to the web site atlas of structures, see http://rna.tbi.univie.ac.at. Conserved secondary structures where 5′ and 3′ UTRs are involved in genome cyclization are called P19, P1, and P2; CS and CS “A” are taken from Hahn et al. (1987) and Khromykh et al. (2001), respectively. Distances along the x-axis are not to scale; the exact positions of the structure elements are given in supplemental material B.
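The mean pairwise sequence identity s, quoted above for the genus Flavivirus and listed per region in Table 1, can be computed directly from a multiple sequence alignment. The sketch below shows one way to do this; the treatment of gaps (columns where both rows carry a gap are skipped, a gap against a nucleotide counts as a mismatch) is an assumption, since the paper does not state its convention, and the example rows are hypothetical.

```python
from itertools import combinations


def pairwise_identity(a, b, gap="-"):
    """Percentage of identical columns between two aligned rows.

    Columns in which both rows have a gap are ignored (assumed convention).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def mean_pairwise_identity(alignment):
    """Average of pairwise_identity over all unordered pairs of rows."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two aligned sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    toy_alignment = ["GGAUC", "GGAUC", "GGCUC"]  # hypothetical rows
    print(round(mean_pairwise_identity(toy_alignment), 1))
```

Restricting the rows to the alignment columns of a single region (for example the 3′ UTR) gives a per-region value of the kind reported in Table 1.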
5′ UTR. The 5′ UTRs of DEN, JEV and YFV form a very similar secondary structure, while the structure for TBE is significantly different (Fig. 3); see DV1, JE1, YF1 and TB1, respectively. For DEN we found structural conservation, while the sequences of JEV and YFV were highly conserved. A manually improved alignment of this region for JEV and YFV to DEN showed that there was significant structure conservation among all three species. Furthermore, all structures contained an interior loop of three Us (one on the 5′- and two on the 3′-strand; DV1, JE1 and YF1 in Fig. 3).
For DEN, a structure similar to DV1 was proposed by Leitmeyer et al. (1999) and Khromykh et al. (2001). A stem–loop structure from positions 80–105 reported in Leitmeyer et al. (1999) is predicted thermodynamically, but has a conserved sequence and thus is not supported by sequence covariation.
A stem carrying the initiator AUG proposed by Hahn et al. (1987) for YFV was found to fold in all sequences but is not supported by sequence covariation. The 5′ UTR structure proposed by Khromykh et al. (2001) for TBE was inconsistent with the available sequence data. We found a different structure that was confirmed by several mutations, both consistent and compensatory (TB1 in Fig. 3).
Coding region. Several conserved secondary structures were found in the coding regions of DEN, JEV, YFV and TBE. These structures are available on the web site. So far, no functions have been proposed for these regions. Hahn et al. (1987) already proposed the stem–loop DV2 for Den-2 virus.
3′ UTR. Conserved structures in the 3′ UTR are shown in Fig. 3: the structures show strong similarity between species. Sequence variation in the stem DV6a was high in DEN and present in JEV. For YFV, we found a stem corresponding to DV6a and to JE7a.
Structures similar to DV6, JE7, YF27 or TB19 were also proposed by Hahn et al. (1987) for DEN 2 and YFV, by Khromykh et al. (2001) for DEN, YFV, JEV and TBE, by Rauscher et al. (1997) (B for DEN, YFV, and JEV and I, II and III for TBE), by Proutski et al. (1999) (TL1/RCS2 or TL2/CS2 for DEN and JEV, and “stem–loop 1 in subregion I” for YFV), and by Leitmeyer et al. (1999) for DEN.
DEN. For the DEN 3′ UTR, we found the same structures as Rauscher et al. (1997) where the analysis was restricted to the isolated 3′ UTR. None of the long-range interactions interfered with any of these structural motifs. Leitmeyer et al. (1999) propose additional base pairings that we could not find because they conflicted with the cyclization domains.
Fig. 3. Conserved secondary structures of the Flavivirus species DEN, JEV, YFV and TBE in the 5′ UTR (first column) and 3′ UTR (second and third column). Mountain plots (Hogeweg & Hesper, 1984) faithfully represent secondary structures: each base pair (i, j) is represented by a slab ranging from position i to j; its height is proportional to the base pairing probability in thermodynamic equilibrium, computed with McCaskill’s algorithm. Colours indicate the number of different types of base pairs (red 1, ochre 2, green 3, turquoise 4, blue 5, violet 6). Saturated colour indicates that all sequences can form the base pair, while two levels of pale colour mean that 1 or 2 input sequences have non-pairing bases at positions i and j. If there are more than 2 non-compatible sequences, the pair is not displayed. In the conventional drawings consistent and compensatory mutations are indicated by circles around bases that have mutations. Grey letters indicate inconsistent mutations.
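The colour code described in the caption of Fig. 3 rests on a simple per-base-pair tally over the alignment. The following sketch shows how such a tally could be computed for one pair of alignment columns; it is an illustration under stated assumptions, not the ALIDOT code. The function name, the returned dictionary and the example rows are hypothetical, gaps are treated as non-pairing, and "consistent" is taken to mean that two observed pair types differ at one position (e.g. GC→GU) while "compensatory" means that they differ at both (e.g. GC→AU).

```python
CANONICAL = {"GC", "CG", "AU", "UA", "GU", "UG"}
# Colour names as listed in the caption of Fig. 3.
COLOURS = {1: "red", 2: "ochre", 3: "green", 4: "turquoise", 5: "blue", 6: "violet"}


def classify_column_pair(alignment, i, j, max_incompatible=2):
    """Summarise the sequence covariation of alignment columns i and j (0-based)."""
    pair_types = set()
    incompatible = 0
    for row in alignment:
        pair = row[i] + row[j]
        if pair in CANONICAL:
            pair_types.add(pair)
        else:
            incompatible += 1  # includes gaps and non-canonical combinations
    types = sorted(pair_types)
    # Two pair types differing at exactly one position: consistent mutation.
    consistent = any((a[0] != b[0]) != (a[1] != b[1]) for a in types for b in types)
    # Two pair types differing at both positions: compensatory mutation.
    compensatory = any(a[0] != b[0] and a[1] != b[1] for a in types for b in types)
    return {
        "n_pair_types": len(types),
        "colour": COLOURS.get(len(types)),
        "incompatible": incompatible,
        "displayed": bool(types) and incompatible <= max_incompatible,
        "consistent": consistent,
        "compensatory": compensatory,
    }


if __name__ == "__main__":
    toy_alignment = ["GAAAC", "GAAAU", "AAAAU", "CAAAA"]  # hypothetical rows
    print(classify_column_pair(toy_alignment, 0, 4))
```

In the toy alignment the outer columns form GC, GU and AU in three rows and fail to pair in the fourth, so the pair would be drawn in green with the first level of pale colour.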
We only found parts of the secondary structures proposed by Proutski et al. (1997) for DEN2 as conserved for all DEN species. In particular, we did not find structures I2 and I3, II1 and III except region 3′ LSH (our DV7). DV6 and DV7 are also discussed by Proutski et al. (1999) for DEN4. All other structures that are reported in that study are disrupted by the cyclization of the viral genome.
Assuming that cyclization of the genome is vital, we can reinterpret the deletion studies reported by Men et al. (1996) in the following way: the deletion of DV6a (TL2) yields a delayed and reduced growth in simian and mosquito cells. When the deletions were extended more, to the 3′ end of the sequence, the CS region was destroyed [mutant 3′ 172–83 of Men et al. (1996)] and hence no viable viruses were found. A non-viable mutant 3′d 172–107 may be explained by the importance of the sequence motif CAAAAA for virus propagation (Men et al., 1996). Our data indicate that, in this case, the sequence motif is important rather than any structure associated with it. For the mutants 3′d 333–183 and 3′d 384–183, Men et al. (1996) measured a greatly delayed and reduced growth in living cells. We would argue that these deletions destroy a possible prolongation of the cyclization region that we found for dengue viruses (data not shown). Our data indicate that each single sequence allows additional stems for cyclization in this region even though their exact positions vary slightly among the different sequences. It is plausible that such an extended cyclization region adds to the efficiency of virus replication but is not necessarily essential for its viability.
YFV and JEV. The sequences in our dataset had about s=91.7 % pairwise identity in the 3′ UTR. We observed only a small number of compensatory mutations to verify structural features predicted based on our thermodynamic algorithm. We essentially found the same structures as Rauscher et al. (1997); again none of the structures reported by Rauscher et al. (1997) conflicted with CS regions. YF28 was shorter by 9 bp than reported by Hahn et al. (1987). YF28 and YF27 corresponded to 3′ LSH and I1, respectively, JE7 and JE8 to 3′ LSH and II2, respectively, as proposed by Proutski et al. (1997). More structures could not be found for similar reasons as explained for DEN above.
TBE. We recovered structures very similar to those reported by Mandl et al. (1998) and Rauscher et al. (1997). In particular, TB17 and TB18 correspond to IV and VI of Mandl et al. (1998) and Rauscher et al. (1997), respectively, TB19 contains stem III of Mandl et al. (1998) and Rauscher et al. (1997) and TB16 corresponds to VII, VIII and IX of Mandl et al. (1998) and Rauscher et al. (1997). Structure A1 reported by Mandl et al. (1998) was shorter because of conflicts with cyclization sequences P19 and CS “A”. Structure A2 (Mandl et al., 1998) did not seem to be conserved. For structures MS and V of Mandl et al. (1998) we had evidence from thermodynamic folding. However, these two structures conflict with P2. TB16 to TB21 conform with structures proposed by Proutski et al. (1997).
Pestivirus, Hepacivirus and GBV-C
5′ UTR. The 5′ UTRs of these virus groups contain an IRES. For parts of the HCV IRES even tertiary structure studies are available (Kieft et al., 2002; Lukavsky et al., 2001). The sequences of 5′ UTRs of GBV-C and HCV are significantly more conserved than the rest of their respective genomes (Table 1). For these two virus groups we found that the secondary structure of the 5′ UTR is less conserved than we expected (due to the few sequence covariations); an overview is given in Fig. 4. This was consistent with the data reported by Witwer et al. (2001) for Picornaviridae. In contrast, the IRES of PESTI turned out to be highly conserved.
The IRES structures of HCV and PESTI shared a common overall structure despite the fact that they were not comparable at the sequence level. Nevertheless, they shared a few significant details: the IIIa stem carried a completely conserved loop sequence and stem IIIc was conserved in its sequence.
GBV-C. The 5′ UTR sequences of GBV-C were more highly conserved than the rest of the genome (Table 1). Most of the sequence variation occurred around nucleotide (nt) positions 410–437, which comprised the structural element HG6 (IVb) (Fig. 5a). This motif was also predicted in previous studies (Simons et al., 1996; Smith et al., 1997). We found a stem, HG2 (Fig. 4a), that was shorter and more shifted to the 5′ end of the IRES than stem–loop II reported by Simons et al. (1996). Our prediction was supported by compensatory mutations (data not shown). The reason for the discrepancy was the formation of a panhandle-like structure by means of a base pairing interaction from nt 163–175 with nt 9213–9201 (discussed later).
The sequences were too conserved in the remainder of the 5′ UTR to support predicted structures by means of sequence variation. The thermodynamic prediction, however, found structures similar to those previously proposed (Katayama et al., 1998; Simons et al., 1996; Smith et al., 1997).
HCV. The 5′ UTR of HCV comprises 341–342 nt. The RNA fold algorithm recovered structures similar to those reported in previous studies (Collier et al., 2002; Honda et al., 1996b; Kalliampakou et al., 2002; Kieft et al., 2001; Kolupaeva et al., 2000a; Odreman-Macchioli et al., 2000; Psaridi et al., 1999; Spahn et al., 2001; Tang et al., 1999; Fig. 4b). Our algorithm was not designed to predict pseudoknots. However, we made sure that nucleotides that are known to be involved in pseudoknots (Pestova et al., 1998) do not pair to other parts of the sequence.
Due to high sequence conservation (Table 1) we found only two sites with compensatory mutations in HC3 (called IIIa, b and c by Honda et al., 1996b) in our dataset of nine complete genomic sequences. When additional sequences of the IRES region were included in the analysis, the structure was well supported by compensatory mutations (data not shown). This structure, HC3, has received considerable attention since it appears to act as a binding site for the eIF3–40S complex. It has an internal loop, which is twisted in itself (Collier et al., 2002). Even though we found a mean identity of 98.3 % in this region, there were two compensatory mutations just before and after this highly structured part of the HCV IRES. This confirms Collier’s interpretation that the shape of the backbone rather than the sequence composition is important for translation initiation.
We found a stem, HC2, which corresponds to IIa proposed by Honda et al. (1996a). For the nucleotides following stem IIa, the prediction favoured long-range interactions with nt 8571–8552 (NS5B); see HCVCS2 (discussed later). When the isolated IRES region (i.e. nt 44–357) was folded separately, stems IIa and IIb were recovered as proposed by Honda et al. (1996a).
Pestivirus. As with HCV and GBV-C the sequence of the 5′ UTR region was more conserved than the rest of the genome (Table 1) but we still found a considerable amount of consistent and compensatory mutations. Stem PV1 was proposed as Ia by Brown et al. (1992) and as domain A by Deng & Brock (1993) (Fig. 5b). Fletcher & Jackson (2002) observed that a deletion of nucleotides comprising stem PV2 (II in Brown et al., 1992; domain C in Deng & Brock, 1993) decreased the activity of IRES to 19 %. Though the pair probabilities in stem PV2 were small (Fig. 5b), we found no inconsistencies and a considerable amount of compensatory mutations. This might point out the importance of the structure rather than the sequence to IRES function in this region. As in previous studies (Deng & Brock, 1993; Fletcher & Jackson, 2002; Kolupaeva et al., 2000b; Moser et al., 2001) our method detected stem PV3 as an important feature of Pestivirus IRES structure. Even though our algorithm does not allow pseudoknots, both stems of the pseudoknot reported by Pestova et al. (1998) show up in the base pairing probabilities.
Fig. 4. Schematic illustration of 5′ UTRs of GBV-C, HCV and PESTI. Panels: (a) GBV-C, nt 1–556; (b) HCV, nt 1–342; (c) PESTI, nt 1–388. Conserved structures are discussed in the text. Notations in parentheses correspond in (a) to Simons et al. (1996), (b) to Honda et al. (1996a) and (c) to Brown et al. (1992).
Fig. 5. (a) GBV-C: 5′ UTR nt 410–437, IRES conserved element HG6 (IVb). (b) Pestivirus 5′ UTR nt 1–420; the IRES is supposed to begin with stem PV2 (II).
Coding region
GBV-C. We found two significantly conserved stems (HG9 and HG10) in the E1 region, which were previously proposed by Simmonds & Smith (1999) based on a different algorithm (data presented in the supplemental material). Conserved secondary structures seemed to be concentrated in the NS5A and NS5B region of the GBV-C genome (Fig. 6). Some of these had already been proposed by Cuceanu et al. (2001) (Fig. 6c, e). Furthermore, HG38 corresponds to SLV and HG39 to SLIV. In our data the SLI motif is completely conserved in the sequence. In SLVI we found more inconsistent mutations than compensatory mutations (data not shown) and the proposed SLVII structure could not be found with our method.
HCV. Again, we found most of the conserved structures in the NS5A and NS5B regions. Some of these have been previously reported as important for the efficiency of the IRES function (Tuplin et al., 2002; Zhao & Wimmer, 2001). One of the motifs detected by Tuplin et al. (2002) is HC4, shown in Fig. 6(d). Tuplin et al. (2002) further found HC6 as SL443, HC27 as SL8828 and HC28 as SL9011. According to our data there was no evidence for the existence of SL7730 and SL9118. SL8926 showed too many inconsistencies in our data and SL8376 was not folded because of interactions of this region with the 3′ UTR (discussed later). Ray et al. (1999) argued that HCV persistence is associated with sequence variability in putative envelope genes E1 and E2. We found a conserved RNA structure, HC7, in the E1 region (Fig. 6f).
Pestivirus. All putative conserved secondary structural elements in the coding region of PESTI were very short. A stem–loop downstream of the initiator AUG appears in our data to have too many inconsistencies and thus cannot be considered as a conserved feature of PESTI, in agreement with the analysis of Myers et al. (2001). The most prominent stems found in the coding region are shown in Fig. 6(g) and (h).
Fig. 6. Examples of conserved secondary structures in the coding region of GBV-C, HCV and PESTI. (a)–(c) and (e), Conserved structures in GBV-C coding region; (c) and (e) were already proposed by Cuceanu et al. (2001) (SLII and SLIII, respectively). (d) and (f), Examples from HCV; (d) was first proposed by Tuplin et al. (2002). (g) and (h), Proposed conserved structures in PESTI coding region.
3′ UTR
GBV-C. The 3′ UTR sequences of GBV-C are highly conserved (s=96.7 %). Not surprisingly, we predicted structures similar to those previously reported (Katayama et al., 1998; Okamoto et al., 1997; Xiang et al., 2000) but not all of them were supported by sequence covariation (data not shown). Some of the previously proposed structures conflict with long-range interactions to the 5′ UTR predicted by our method (discussed later). One example well supported by sequence covariation is the structure HG43 that was also proposed by Cuceanu et al. (2001) and Xiang et al. (2000).
HCV. The 3′ UTR consists of a short sequence of variable length and composition (variable region), a U-rich stretch (poly-U-UC region) variable in its length and a highly conserved sequence of approximately 100 nt at the 3′ end (conserved region, X-tail) (Kolykhalov et al., 1996; Tanaka et al., 1996; Yamada et al., 1996). Within this X-tail we found only a single mutation, which is compatible with the predicted structure. Our stem HC29 corresponds to SL1 as previously reported (Blight & Rice, 1997; Ito & Lai, 1997; Yamada et al., 1996). Stems SL2 and SL3, as proposed by Blight & Rice (1997) and Ito & Lai (1997), compete in our data with the formation of two long-range interactions, LR1 and LR2. The probability of base pairs in LR1 was around P=0.54, significantly higher than HC29 (SL1). The elements SL2 and SL3 were thermodynamically unfavourable in the genomic context and could only be detected when a sequence window was used that was too small to contain the long-range interactions. More recently, Yi & Lemon (2003) introduced several point mutations in the X-tail of the 3′ UTR of HCV. Their results could not provide proof for the existence of SL2 or SL3 but indicated that there are stringent requirements for the sequence in this region.
Pestivirus. Pestiviruses are very heterogeneous in their 3′ UTR region, due to extended AU-rich insertions in some strains. The only RNA feature that was shared among all available sequences is the terminal stem PV15 that was originally described by Deng & Brock (1993) and also by Becher et al. (1998) and Yu et al. (1999).
Genome cyclization. Surprisingly, we discovered strong evidence for genome cyclization not only in the genus Flavivirus, where this effect has already been described in the literature, but also within HCV and GBV-C. The most prominent of them are shown in Fig. 7.
In the GBV-C genome, cyclization is localized to base pairings between nt 33–48 with nt 9367–9353 (HGCS1: pair probabilities < 0.6), nt 128–140 with nt 9224–9214 (HGVCS2) and nt 163–175 with nt 9213–9201 (HGVCS3) (both with pair probabilities < 0.7) (Fig. 7). These domains are very conserved in the sequence. We found only one consistent mutation at base pair (42,9357). On the other hand there was one sequence carrying an inconsistent mutation at base pair (130,9222).
In HCV, putative cyclization domains comprised base pairs of nt 1–3 with nt 8627–8625 (HCVCS1), 88–92 with 8602–8606 (HCVCS2) and 95–110 with 8556–8571 (HCVCS3). Within HCVCS3 we found two sites of compensatory mutations (Fig. 7). In HCV, nucleotides from the IRES region (nt 1–3, 88–92 and 95–110) are paired with nucleotides within the coding region for the protein NS5B. At the same time we observed two regions of the 3′ UTR to fold forward to the NS5B region as well: (i) LR1: nt 8628–8661 (NS5B) paired with nt 9599–9633 (3′ UTR) and (ii) LR2: nt 8978–8995 (NS5B) paired with nt 9583–9598 (3′ UTR). This brought the 5′ and 3′ regions into very close proximity, as illustrated in supplemental material C. Sequence position 8627 is involved in the interaction with the IRES; the adjacent nt 8628 pairs with the 3′ UTR.
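Coordinate pairs such as "nt 163–175 with nt 9213–9201" describe an antiparallel helix between a 5′ and a 3′ segment. The sketch below checks, position by position, whether two such segments can form canonical or GU pairs. It is a plain complementarity check, not the thermodynamic prediction used in this study; the function name and the toy genome are hypothetical, and only ungapped helices of equal length are handled, so interactions with bulges (for example segments of unequal length) are outside its scope.

```python
CANONICAL = {"GC", "CG", "AU", "UA", "GU", "UG"}


def pairing_pattern(genome, five_start, five_end, three_start, three_end):
    """Return one boolean per position of an antiparallel helix.

    Coordinates are 1-based and inclusive, written as in the text:
    five_start..five_end ascending, three_start..three_end descending.
    """
    upstream = genome[five_start - 1:five_end]
    downstream = genome[three_end - 1:three_start][::-1]  # read from three_start downwards
    if len(upstream) != len(downstream):
        raise ValueError("this sketch handles only ungapped helices of equal length")
    return [a + b in CANONICAL for a, b in zip(upstream, downstream)]


if __name__ == "__main__":
    # Hypothetical 16-nt "genome": 5' arm, unpaired spacer, complementary 3' arm.
    toy_genome = "GGCAU" + "AAAAAA" + "AUGCC"
    print(pairing_pattern(toy_genome, 1, 5, 16, 12))
```

Applied to every sequence of an alignment, the same check would show which positions of a proposed cyclization helix tolerate the observed mutations, which is the kind of evidence summarised above as consistent, compensatory or inconsistent.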
Fig. 7. Putative cyclization regions HCVCS3 and HGVCS3 in the genomes of HCV and GBV-C, respectively. The boxed areas point out sequences that might be read as palindrome sequences and may play a functional role in replication processes.
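The three classes of base-pair mutation referred to in this section can be made concrete with a short sketch. The Python below is an illustration only, not the software used in this study. The function name `classify_pair_mutation` and the working definitions are assumptions that follow common usage in comparative structure prediction: a consistent mutation changes one partner while the two positions can still pair, a compensatory mutation changes both partners while pairing is preserved, and an inconsistent mutation leaves the two positions unable to pair.

```python
from typing import Tuple

# Watson-Crick and G-U wobble pairs, written as (5' partner, 3' partner).
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def classify_pair_mutation(reference: Tuple[str, str],
                           variant: Tuple[str, str]) -> str:
    """Classify a variant base pair relative to a reference base pair.

    Working definitions (assumed for this sketch):
      unchanged    - both partners identical to the reference
      consistent   - one partner changed, pairing still possible
      compensatory - both partners changed, pairing still possible
      inconsistent - the variant nucleotides cannot form a pair
    """
    ref = (reference[0].upper(), reference[1].upper())
    var = (variant[0].upper(), variant[1].upper())
    if ref not in ALLOWED_PAIRS:
        raise ValueError("reference positions do not form a base pair")
    if var == ref:
        return "unchanged"
    if var not in ALLOWED_PAIRS:
        return "inconsistent"
    # Count how many of the two partners differ from the reference.
    changed = sum(1 for r, v in zip(ref, var) if r != v)
    return "consistent" if changed == 1 else "compensatory"
```

Under these definitions, a G-C pair that appears as G-U in another isolate counts as consistent, one that appears as A-U counts as compensatory, and one that appears as G-A counts as inconsistent. The nucleotides actually observed at the base pairs quoted in the text are not given here, so no real alignment columns are reproduced.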
All of the mutations (15 point mutations and 6 double mutations) studied by Yi & Lemon (2003) result in reduced or no replication activity. Most of them would disrupt base pairs in either LR1 or LR2, supporting our proposed interactions. Five of the point mutations, however, lie in predicted loop regions and would be expected to cause only minor changes in secondary structure. This could indicate that there are sequence constraints beyond conservation of secondary structure. Proving or disproving the existence of LR1 and LR2 would nevertheless require further mutation experiments.
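The HCV coordinates quoted in this section can be collected into a small lookup that reports which proposed interaction, if any, a genome position belongs to. This is a minimal sketch under stated assumptions: the names `INTERACTIONS` and `interactions_at` are hypothetical, the ranges are copied from the text (1-based, inclusive), and a range marks the extent of a strand rather than asserting that every nucleotide inside it is paired.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]

# Strand extents of the proposed HCV interactions, as quoted in the text.
# Each entry holds (upstream strand, downstream strand).
INTERACTIONS: Dict[str, Tuple[Range, Range]] = {
    "HCVCS1": ((1, 3), (8625, 8627)),
    "HCVCS2": ((88, 92), (8602, 8606)),
    "HCVCS3": ((95, 110), (8556, 8571)),
    "LR1": ((8628, 8661), (9599, 9633)),
    "LR2": ((8978, 8995), (9583, 9598)),
}


def interactions_at(position: int) -> List[str]:
    """Return the names of all interactions whose strands cover a position."""
    hits = []
    for name, strands in INTERACTIONS.items():
        if any(low <= position <= high for low, high in strands):
            hits.append(name)
    return hits
```

With these ranges, position 8627 falls in HCVCS1 and the next position, 8628, opens LR1, which is the adjacency that brings the IRES and the 3′ UTR close together. A mutated position that returns an empty list lies outside all five strands. Positions for the mutants of Yi & Lemon (2003) are not listed in this section, so none are encoded here.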
DISCUSSION
We have employed a combination of structure prediction based on thermodynamic rules and the evaluation of consistent and compensatory mutations to search Flaviviridae genomes for functional RNA structure motifs. While the UTRs of some of these viruses have been studied previously, this contribution reports a comprehensive survey of structural features across the full genomes of the whole family Flaviviridae. Furthermore, instead of using a ‘sliding window’ technique, all predictions were carried out for the complete genomic RNA sequences. This enables our algorithm to find long-range interactions; in particular, we found significant probability for cyclization in all genera except Pestivirus. In the genus Flavivirus, cyclization of the genome had already been described in the literature and localized to highly conserved cyclization sequences. Apart from recovering these known cyclization sequences, we detected further sequences that took part in cyclization for all species in this study (P19, P1 and P2). These sequences varied considerably in sequence, length and position. Men et al. (1996) showed that deleting these sequences led to greatly delayed and reduced growth in simian and mosquito cells. It is possible that these additional cyclization domains are not strictly necessary for virus viability, but only support and stabilize viral genome cyclization. Most surprisingly, we also found viral genome cyclization in GBV-C and HCV, which had not been reported before, although Yi & Lemon (2003) suggest cyclization of the HCV genome with the assistance of a cellular protein. Our algorithm yielded base pair probabilities both for the previously reported secondary structures in the 5′ and 3′ UTRs and for genome cyclization. In both cases, our data revealed no inconsistencies. Thus, known structures compete with genome cyclization.
Our evaluation conditions favoured genome cyclization based both on thermodynamic prediction, in the case of HCV, and on sequence covariation. This result can be interpreted either as a relic of a common ancestor of these genera and the genus Flavivirus or, more speculatively, as a switch providing different functions in different states of the viral life-cycle (e.g. a switch between replication and translation states of the virus). While in Flavivirus and GBV-C the 5′ and 3′ ends pair within the untranslated regions, in HCV we found both the 5′ and the 3′ ends base pairing with a region some 1000 nt upstream of the 3′ end (i.e. a region within the NS5B coding sequence). More interestingly, we observed that, in this way, the 5′ and 3′ ends were brought close together. This could be a reason for the particular importance of the NS5B region as assumed in the literature (Oh et al., 1999, 2000). It may also explain the results of Friebe et al. (2001) and Kim et al. (2002), who observed that domains HC1(I) and HC2(II) in the 5′ UTR are essential for replication, while domain HC3(III) helps to facilitate replication but is not absolutely required. Furthermore, in this report (and in the supplementary material available online), we present a large number of secondary structure elements that have not been described before, most importantly within the coding region. This information could be used to identify additional regions that might be important for virus viability and propagation, and thus to gain more insight into the life-cycle of the members of the family Flaviviridae.
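Why whole-genome folding matters for long-range interactions can be shown with simple arithmetic: a sliding window can only contain a base pair if the window is at least as long as the distance between the two partners. The sketch below is illustrative only. The names `OUTER_POSITIONS`, `required_window` and `detectable` are hypothetical, and the coordinates are the outermost positions of three interactions quoted in the Results (1-based).

```python
from typing import Dict, List, Tuple

# Outermost 5' and 3' positions of three long-range interactions from the text.
OUTER_POSITIONS: Dict[str, Tuple[int, int]] = {
    "HCVCS1 (HCV)": (1, 8627),
    "LR2 (HCV)": (8978, 9598),
    "HGVCS3 (GBV-C)": (163, 9213),
}


def required_window(start: int, end: int) -> int:
    """Smallest window length (nt) that contains both ends of an interaction."""
    return end - start + 1


def detectable(window: int) -> List[str]:
    """Names of the interactions that fit entirely inside one window."""
    return [
        name
        for name, (start, end) in OUTER_POSITIONS.items()
        if required_window(start, end) <= window
    ]
```

For these coordinates, a 1000 nt window could in principle contain LR2, which spans 621 nt, but not the two cyclization interactions, which span 8627 nt and 9051 nt and therefore only become visible when essentially the complete genome is folded at once.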
ACKNOWLEDGEMENTS
C. T. and C. W. are supported by the Austrian Fonds zur Förderung der Wissenschaftlichen Forschung, project no. P-13545-MAT. The work of P. F. S. is supported in part by the DFG Bioinformatics Initiative, BIZ-6/1-2. We would like to thank the anonymous reviewers for their helpful comments.
REFERENCES
Becher, P., Orlich, M. & Thiel, H. J. (1998). Complete genomic sequence of border disease virus, a pestivirus from sheep. J Virol 72, 5165–5173.
Blight, K. J. & Rice, C. M. (1997). Secondary structure determination of the conserved 98-base sequence at the 3′ terminus of hepatitis C virus genome RNA. J Virol 71, 7345–7352.
Brinton, M. A. & Dispoto, J. H. (1988). Sequence and secondary structure analysis of the 5′-terminal region of flavivirus genome RNA. Virology 162, 290–299.
Brown, E. A., Zhang, H., Ping, L. H. & Lemon, S. M. (1992). Secondary structure of the 5′ nontranslated regions of hepatitis C virus and pestivirus genomic RNAs. Nucleic Acids Res 20, 5041–5045.
Collier, A. J., Gallego, J., Klinck, R., Cole, P. T., Harris, S. J., Harrison, G. P., Aboul-Ela, F., Varani, G. & Walker, S. (2002). A conserved RNA structure within the HCV IRES eIF3-binding site. Nat Struct Biol 9, 375–380.
Cuceanu, N. M., Tuplin, A. & Simmonds, P. (2001). Evolutionarily conserved RNA secondary structures in coding and non-coding sequences at the 3′ end of the hepatitis G virus/GB-virus C genome. J Gen Virol 82, 713–722.
Deng, R. & Brock, K. V. (1993). 5′ and 3′ untranslated regions of pestivirus genome: primary and secondary structure analyses. Nucleic Acids Res 21, 1949–1957.
Fletcher, S. P. & Jackson, R. J. (2002). Pestivirus internal ribosome entry site (IRES) structure and function: elements in the 5′ untranslated region important for IRES function. J Virol 76, 5024–5033.
Fontana, W., Konings, D. A. M., Stadler, P. F. & Schuster, P. (1993). Statistics of RNA secondary structures. Biopolymers 33, 1389–1404.
Friebe, P., Lohmann, V., Krieger, N. & Bartenschlager, R. (2001). Sequences in the 5′ nontranslated region of hepatitis C virus required for RNA replication. J Virol 75, 12047–12057.
Gorodkin, J., Heyer, L. J. & Stormo, G. D. (1997). Finding common sequences and structure motifs in a set of RNA molecules. In Proceedings of the ISMB-97, pp. 120–123. Edited by T. Gaasterland, P. Karp, K. Karplus, C. Ouzounis, C. Sander & A. Valencia. Menlo Park, CA: AAAI Press.
Hahn, C. S., Hahn, Y. S., Rice, C. M., Lee, E., Dalgarno, L., Strauss, E. G. & Strauss, J. H. (1987). Conserved elements in the 3′ untranslated region of flavivirus RNAs and potential cyclization sequences. J Mol Biol 198, 33–41.
Hofacker, I. L. & Stadler, P. F. (1999). Automatic detection of conserved base pairing patterns in RNA virus genomes. Comput Chem 23, 401–414.
Hofacker, I. L., Fontana, W., Stadler, P. F., Bonhoeffer, S., Tacker, M. & Schuster, P. (1994). Fast folding and comparison of RNA secondary structures. Monatsh Chem 125, 167–188.
Hofacker, I. L., Fekete, M., Flamm, C., Huynen, M. A., Rauscher, S., Stolorz, P. E. & Stadler, P. F. (1998). Automatic detection of conserved RNA structure elements in complete RNA virus genomes. Nucleic Acids Res 26, 3825–3836.
Hofacker, I. L., Fekete, M. & Stadler, P. F. (2002). Secondary structure prediction for aligned RNA sequences. J Mol Biol 319, 1059–1066.
Hogeweg, P. & Hesper, B. (1984). Energy directed folding of RNA sequences. Nucleic Acids Res 12, 67–74.
Honda, M., Brown, E. A. & Lemon, S. M. (1996a). Stability of a stem–loop involving the initiator AUG controls the efficiency of internal initiation of translation on hepatitis C virus RNA. RNA 2, 955–968.
Honda, M., Ping, L. H., Rijnbrand, R. C., Amphlett, E., Clarke, B., Rowlands, D. & Lemon, S. M. (1996b). Structural requirements for initiation of translation by internal ribosome entry within genome-length hepatitis C virus RNA. Virology 222, 31–42.
Ito, T. & Lai, M. M. C. (1997). Determination of the secondary structure of and cellular protein binding to the 3′-untranslated region of the hepatitis C virus RNA genome. J Virol 71, 8698–8706.
Kalliampakou, K. I., Psaridi-Linardaki, L. & Mavromara, P. (2002). Mutational analysis of the apical region of domain II of the HCV IRES. FEBS Lett 511, 79–84.
Katayama, K., Kageyama, T., Fukushi, S., Hoshino, F. B., Kurihara, C., Ishiyama, N., Okamura, H. & Oya, A. (1998). Full-length GBV-C/HGV genomes from nine Japanese isolates: characterization by comparative analysis. Arch Virol 143, 1–13.
Khromykh, A. A., Meka, H., Guyatt, K. J. & Westaway, E. G. (2001). Essential role of cyclization sequences in flavivirus RNA replication. J Virol 75, 6719–6728.
Kieft, J. S., Zhou, K., Jubin, R. & Doudna, J. A. (2001). Mechanism of ribosome recruitment by hepatitis C IRES RNA. RNA 7, 194–206.
Kieft, J. S., Zhou, K., Grech, A., Jubin, R. & Doudna, A. (2002). Crystal structure of an RNA tertiary domain essential to HCV IRES-mediated translation initiation. Nat Struct Biol 9, 370–374.
Kim, Y. K., Kim, C. S., Lee, S. H. & Jang, S. K. (2002). Domains I and II in the 5′ nontranslated region of the HCV genome are required for RNA replication. Biochem Biophys Res Commun 290, 105–112.
Kolupaeva, V. G., Pestova, T. V. & Hellen, C. U. (2000a). An enzymatic footprinting analysis of the interaction of 40S ribosomal subunits with the internal ribosomal entry site of hepatitis C virus. J Virol 74, 6242–6250.
Kolupaeva, V. G., Pestova, T. V. & Hellen, C. U. (2000b). Ribosomal binding to the internal ribosomal entry site of classical swine fever virus. RNA 6, 1791–1807.
Kolykhalov, A. A., Feinstone, S. & Rice, C. M. (1996). Identification of a highly conserved sequence element at the 3′ terminus of hepatitis C virus genome RNA. J Virol 70, 3363–3371.
Leitmeyer, K. C., Vaughn, D. W., Watts, D. M., Salas, R., Villalobos, I., de Chacon, I. V., Ramos, C. & Rico-Hesse, R. (1999). Dengue virus structural differences that correlate with pathogenesis. J Virol 73, 4738–4747.
Lück, R., Steger, G. & Riesner, D. (1996). Thermodynamic prediction of conserved secondary structure: application to the RRE element of HIV, the tRNA-like element of CMV, and the mRNA of prion protein. J Mol Biol 258, 813–826.
Lück, R., Gräf, S. & Steger, G. (1999). ConStruct: a tool for thermodynamic controlled prediction of conserved secondary structure. Nucleic Acids Res 27, 4208–4217.
Lukavsky, P. J., Kim, I., Otto, G. A. & Puglisi, J. D. (2003). Structure of HCV IRES domain II determined by NMR. Nat Struct Biol 10, 1033–1038.
Mandl, C. W., Holzmann, H., Kunz, C. & Heinz, F. X. (1993). Complete genomic sequence of Powassan virus: evaluation of genetic elements in tick-borne versus mosquito-borne flaviviruses. Virology 194, 173–184.
Mandl, C. W., Holzmann, H., Meixner, T., Rauscher, S., Stadler, P. F., Allison, S. L. & Heinz, F. X. (1998). Spontaneous and engineered deletions in the 3′ noncoding region of tick-borne encephalitis virus: construction of highly attenuated mutants of a flavivirus. J Virol 72, 2132–2140.
Mathews, D. H., Sabina, J., Zuker, M. & Turner, H. (1999). Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol 288, 911–940.
McCaskill, J. S. (1990). The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 29, 1105–1119.
Men, R., Bray, M., Clark, D., Chanock, R. M. & Lai, C. J. (1996). Dengue type 4 virus mutants containing deletions in the 3′ noncoding region of the RNA genome: analysis of growth restriction in cell culture and altered viremia pattern and immunogenicity in rhesus monkeys. J Virol 70, 3930–3937.
Meyers, G. & Thiel, H. J. (1996). Molecular characterization of pestiviruses. Adv Virus Res 47, 53–118.
Moser, C., Bosshart, A., Tratschin, J. D. & Hofmann, M. A. (2001). A recombinant classical swine fever virus with a marker insertion in the internal ribosome entry site. Virus Genes 23, 63–68.
Myers, T. M., Kolupaeva, V. G., Mendez, E., Baginski, S. G., Frolov, I., Hellen, C. U. & Rice, C. M. (2001). Efficient translation initiation is required for replication of bovine viral diarrhea virus subgenomic replicons. J Virol 75, 4226–4238.
Odreman-Macchioli, F. E., Tisminetzky, S. G., Zotti, M., Baralle, F. E. & Buratti, E. (2000). Influence of correct secondary and tertiary RNA folding on the binding of cellular factors to the HCV IRES. Nucleic Acids Res 28, 875–885.
Oh, J. W., Ito, T. & Lai, M. M. (1999). A recombinant hepatitis C virus RNA-dependent RNA polymerase capable of copying the full-length viral RNA. J Virol 73, 7694–7702.
Oh, J. W., Sheu, G. T. & Lai, M. M. (2000). Template requirement and initiation site selection by hepatitis C virus polymerase on a minimal viral RNA template. J Biol Chem 275, 17710–17717.
Okamoto, H., Nakao, H., Inoue, T., Fukuda, M., Kishimoto, J., Iizuka, H., Tsuda, F., Miyakawa, Y. & Mayumi, M. (1997). The entire nucleotide sequences of two GB virus C/hepatitis G virus isolates of distinct genotypes from Japan. J Gen Virol 78, 737–745.
Pestova, T. V., Shatsky, I. N., Fletcher, S. P., Jackson, R. J. & Hellen, C. U. (1998). A prokaryotic-like mode of cytoplasmic eukaryotic ribosome binding to the initiation codon during internal translation initiation of hepatitis C and classical swine fever virus RNAs. Genes Dev 12, 67–83.
Proutski, V., Gould, E. A. & Holmes, E. C. (1997). Secondary structure of the 3′ untranslated region of flaviviruses: similarities and differences. Nucleic Acids Res 25, 1194–1202.
Proutski, V., Gritsun, T. S., Gould, E. A. & Holmes, E. C. (1999). Biological consequences of deletions within the 3′-untranslated region of flaviviruses may be due to rearrangements of RNA secondary structure. Virus Res 64, 107–123.
Psaridi, L., Georgopoulou, U., Varaklioti, A. & Mavromara, P. (1999). Mutational analysis of a conserved tetraloop in the 5′ untranslated region of hepatitis C virus identifies a novel RNA element essential for the internal ribosome entry site function. FEBS Lett 453, 49–53.
Rauscher, S., Flamm, C., Mandl, C. W., Heinz, F. X. & Stadler, P. F. (1997). Secondary structure of the 3′-noncoding region of flavivirus genomes: comparative analysis of base pairing probabilities. RNA 3, 779–791.
Ray, S. C., Wang, Y. M., Laeyendecker, O., Ticehurst, J. R., Villano, S. A. & Thomas, D. L. (1999). Acute hepatitis C virus structural gene sequences as predictors of persistent viremia: hypervariable region 1 as a decoy. J Virol 73, 2938–2946.
Rivas, E. & Eddy, S. R. (2000). Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs. Bioinformatics 16, 583–605.
Sankoff, D. (1985). Simultaneous solution of the RNA folding, alignment, and proto-sequence problems. SIAM J Appl Math 45, 810–825.
Schuster, P., Fontana, W., Stadler, P. F. & Hofacker, I. L. (1994). From sequences to shapes and back: a case study in RNA secondary structures. Proc R Soc Lond B Biol Sci 255, 279–284.
Simmonds, P. & Smith, D. B. (1999). Structural constraints on RNA virus evolution. J Virol 73, 5787–5794.
Simons, J. N., Desai, S. M., Schultz, D. E., Lemon, S. M. & Mushahwar, I. K. (1996). Translation initiation in GB viruses A and C: evidence for internal ribosome entry and implication for genome organization. J Virol 70, 6126–6135.
Smith, D. B., Cuceanu, N., Davidson, F., Jarvis, L. M., Mokili, J. L., Hamid, S., Ludlam, C. A. & Simmonds, P. (1997). Discrimination of hepatitis G virus/GBV-C geographical variants by analysis of the 5′ non-coding region. J Gen Virol 78, 1533–1542.
Spahn, C. M., Kieft, J. S., Grassucci, R. A., Penczek, P. A., Zhou, K., Doudna, J. A. & Frank, J. (2001). Hepatitis C virus IRES RNA-induced changes in the conformation of the 40S ribosomal subunit. Science 291, 1959–1962.
Stocsits, R., Hofacker, I. L. & Stadler, P. F. (1999). Conserved secondary structures in hepatitis B virus RNA. In Computer Science in Biology, pp. 73–79. Univ. Bielefeld, Bielefeld, Germany. Proceedings of the GCB’99, Hannover, Germany.
Tanaka, T., Kato, N., Cho, M. J., Sugiyama, K. & Shimotohno, K. (1996). Structure of the 3′ terminus of the hepatitis C virus genome. J Virol 70, 3307–3312.
Tang, S., Collier, A. J. & Elliott, R. M. (1999). Alterations to both the primary and predicted secondary structure of stem-loop IIIc of the hepatitis C virus 1b 5′ untranslated region (5′UTR) lead to mutants severely defective in translation which cannot be complemented in trans by the wild-type 5′UTR sequence. J Virol 73, 2359–2364.
Tautz, N., Harada, T., Kaiser, A., Rinck, G., Behrens, S. & Thiel, H. J. (1999). Establishment and characterization of cytopathogenic and noncytopathogenic pestivirus replicons. J Virol 73, 9422–9432.
Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673–4680.
Tuplin, A., Wood, J., Evans, D. J., Patel, A. H. & Simmonds, P. (2002). Thermodynamic and phylogenetic prediction of RNA secondary structures in the coding region of hepatitis C virus. RNA 8, 824–841.
van Regenmortel, M. H. V., Fauquet, C., Bishop, D. & 8 other authors (2000). Virus Taxonomy: The Classification and Nomenclature of Viruses. The Seventh Report of the International Committee on Taxonomy of Viruses. San Diego: Academic Press. http://www.ncbi.nlm.nih.gov/ICTVdb/
Witwer, C., Rauscher, S., Hofacker, I. L. & Stadler, P. F. (2001). Conserved RNA secondary structures in Picornaviridae genomes. Nucleic Acids Res 29, 5079–5089.
Xiang, J., Wunschmann, S., Schmidt, W., Shao, J. & Stapleton, J. T. (2000). Full-length GB virus C (Hepatitis G virus) RNA transcripts are infectious in primary CD4-positive T cells. J Virol 74, 9125–9133.
Yamada, N., Tanihara, K., Takada, A., Yorihuzi, T. T., Tsutsumi, M., Shimomura, H., Tsuji, T. & Date, T. (1996). Genetic organization and diversity of the 3′ noncoding region of the hepatitis C virus genome. Virology 223, 255–261.
Yi, M. K. & Lemon, S. M. (2003). 3′ nontranslated RNA signals required for replication of hepatitis C virus RNA. J Virol 77, 3557–3568.
You, S. & Padmanabhan, R. (1999). A novel in vitro replication system for dengue virus. Initiation of RNA synthesis at the 3′-end of exogenous viral RNA templates requires 5′- and 3′-terminal complementary sequence motifs of the viral RNA. J Biol Chem 274, 33714–33722.
Yu, H., Grassmann, C. W. & Behrens, S. E. (1999). Sequence and structural elements at the 3′ terminus of bovine viral diarrhea virus genomic RNA: functional role during RNA replication. J Virol 73, 3638–3648.
Zhao, W. D. & Wimmer, E. (2001). Genetic analysis of a poliovirus/hepatitis C virus chimera: new structure for domain II of the internal ribosomal entry site of hepatitis C virus. J Virol 75, 3719–3730.