J. gen. Virol. (1988), 69, 1777-1787.
Printed in Great Britain
Key words: CMV/nucleotide sequence/RNA 2
Nucleotide Sequence and Evolutionary Relationships of Cucumber Mosaic Virus (CMV) Strains: CMV RNA 2
By THOMAS M. RIZZO AND PETER PALUKAITIS*
Department of Plant Pathology, Cornell University, Ithaca, New York 14853, U.S.A.
(Accepted 25 April 1988)
SUMMARY
The nucleotide sequence of RNA 2 of the Fny strain (Subgroup I) of cucumber mosaic virus (CMV) was determined and compared at both the nucleic acid and protein level with the previously determined corresponding sequence of RNA 2 of the Q strain (Subgroup II) of CMV. Fny-CMV RNA 2 consisted of 3050 nucleotides and contained a single open reading frame (ORF) of 2571 nucleotides, whereas Q-CMV RNA 2 consists of 3035 nucleotides and contains a single ORF of 2517 nucleotides. At the nucleotide level, there was 71% sequence homology between the two RNAs, while at the protein level sequence homology was 73%. Protein homology was greater (89%) in the central third than in either the N-terminal (64%) or the C-terminal (56%) thirds. The secondary structures of the 3' end of the RNAs were very similar, even though the nucleotide sequence homology between the 3'-terminal 180 nucleotides was only 62%. By contrast, there was 80% sequence homology between the 5'-terminal 86 residue, non-translated regions of the two RNAs. The evolutionary relationships and the divergence and retention of specific sequences among the two CMV strains and other plant viruses are discussed.
INTRODUCTION
Cucumber mosaic virus (CMV) is a positive-sense RNA plant virus with a broad host range of over 775 species in 365 genera and 85 families, including both monocotyledonous and dicotyledonous plants (Douine et al., 1979; Kaper & Waterworth, 1981). There are many isolates or strains of this virus and it has a world-wide distribution (Kaper & Waterworth, 1981); strains differ in either pathology or host range. The genome of CMV consists of three single-stranded RNA species, designated RNAs 1, 2 and 3 in decreasing order of Mr (Peden & Symons, 1973); virions also contain a subgenomic RNA (RNA 4) derived from RNA 3, which is the mRNA used for coat protein synthesis (Schwinghamer & Symons, 1975). On the basis of serology (Devergne & Cardin, 1975) and nucleic acid hybridization (Gonda & Symons, 1978; Piazzolla et al., 1979), the strains of CMV appear to fall into two subgroups. Of 39 strains examined in this and other laboratories (Gonda & Symons, 1978; Piazzolla et al., 1979; F. Garcia-Arenal & P. Palukaitis, unpublished results) by nucleic acid hybridization analysis, 30 belong to one subgroup (Subgroup I) and nine belong to a second (Subgroup II). Complementary DNA-RNA hybridization between the two subgroups has indicated varying degrees (10 to 50%) of nucleotide sequence homology; however, RNAs belonging to the two subgroups can be reassorted to construct viable pseudorecombinants (Rao & Francki, 1982; Edwards et al., 1983). Thus, the two subgroups contain genetically compatible RNAs. The nucleotide sequences of the genomic RNAs of one strain from Subgroup II, Q-CMV, have been determined (Gould & Symons, 1982; Rezaian et al., 1984, 1985). In this paper, we describe the nucleotide sequence of RNA 2 of a Subgroup I strain, Fny-CMV, and compare it with that of Q-CMV RNA 2.
0000-8270 © 1988 SGM
METHODS
Bacterial strains and plasmids. Escherichia coli strain JM101 (Messing, 1979) and pUC18 (Norrander et al., 1983) were used as recipient and vector, respectively, in the construction of cDNA clones of Fny-CMV RNA 2. Recipient E. coli strains JM109 (Yanisch-Perron et al., 1985) or DH5αF' (Liss, 1987) and vectors M13mp18 or M13mp19 (Norrander et al., 1983) were used in the construction of a Bal 31-generated, ordered set of deletions (Poncz et al., 1982) used for sequencing Fny-CMV RNA 2.
Complementary DNA cloning. Fny-CMV was propagated and isolated, and the viral RNA was extracted and purified as previously described for other CMV strains (Palukaitis & Zaitlin, 1984). Complementary DNA was prepared to total CMV RNA by the procedure of Gubler & Hoffman (1983) using a decamer primer (5'-TGGTCTCCTT-3') complementary to the 3'-terminal 10 nucleotides of all four CMV RNAs. The sequence of the 3' end was determined on B-CMV RNA 4; B-CMV and Fny-CMV belong to the same subgroup of CMV and have extensive sequence homology (P. Palukaitis, unpublished results). Subsequently, this cDNA was blunt end-ligated to SmaI-linearized pUC18; the ligation mixture was used to transform competent E. coli strain JM101. The resultant white colonies were blotted onto nitrocellulose and probed with randomly primed 32P-labelled cDNA made to total Fny-CMV RNA, and transformants carrying CMV-specific cDNA clones were detected.
DNA purification and manipulation. Alkaline lysis (Maniatis et al., 1982) was used in the large scale isolation of plasmid DNA. DNA fragments were separated by electrophoresis and extracted from low melting temperature agarose (Maniatis et al., 1982). Escherichia coli was transformed as described by Messing (1983). A modification of the procedure of Birnboim & Doly (1979) was used to screen recombinant clones rapidly.
Nucleic acid hybridizations. Agarose gel electrophoresis of RNA after denaturation with formamide/formaldehyde, transfer of RNA to nitrocellulose and hybridization of immobilized RNA to 32P-labelled DNA were as previously described (Palukaitis, 1984). Agarose gel electrophoresis of restriction enzyme-cleaved plasmid DNA, Southern transfer of DNA to nitrocellulose and hybridization were as described by Maniatis et al. (1982). Northern blots were probed with 32P-labelled plasmid DNA prepared by the 'oligolabelling' method of Feinberg & Vogelstein (1983). Southern blots were probed with randomly primed 32P-labelled cDNA made to total Fny-CMV RNA as described by Palukaitis (1986).
Nucleic acid sequencing. M13 DNAs containing CMV RNA 2 sequences were prepared and sequenced by the dideoxynucleotide chain termination method (Sanger et al., 1977, 1980), except that 7-deaza-dGTP was used instead of dGTP in some reactions to alleviate band compressions (Mizusawa et al., 1986). Enzymic decapping of Fny-CMV RNA 2, 5' end-labelling with [γ-32P]ATP by polynucleotide kinase, 3' end-labelling with [5'-32P]pCp-3' by T4 RNA ligase and direct, enzymic RNA sequencing were as described by Garcia-Arenal et al. (1987) and Haseloff & Symons (1981). End-labelled RNA was electrophoresed in low melting temperature agarose, located by ethidium bromide staining, eluted by melting the agarose gel in 5 gel volumes of water, extracted with phenol and recovered by ethanol precipitation.
Computer analysis. An IBM Personal Computer AT equipped with the Microgenie Sequence Analysis Program (Beckman) was used in the analysis of the Fny-CMV RNA 2 sequence. This program was used to generate amino acid sequences from nucleotide sequences, to align nucleotide sequences and amino acid sequences, to determine the Mr of the RNA 2-encoded protein and to calculate percentage homologies.
Enzymes and chemicals. Restriction endonucleases were purchased from Bethesda Research Laboratories, Boehringer Mannheim, New England Biolabs and United States Biochemicals. T4 DNA ligase, Klenow fragment and polynucleotide kinase were obtained from United States Biochemicals. Bal 31 was obtained from Boehringer Mannheim, and T4 RNA ligase was purchased from Bethesda Research Laboratories. All enzymes were used as recommended by their manufacturers. Amersham and New England Nuclear supplied [α-32P]dATP (400 to 800 Ci/mmol) and [γ-32P]ATP (3000 Ci/mmol). All other chemicals were of analytical grade.
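The decamer primer described under Complementary DNA cloning is stated to be complementary to the 3'-terminal 10 nucleotides of all four CMV RNAs. The RNA sequence this implies can be read off by taking the reverse complement of the primer. The short Python sketch below does this; the function names are illustrative and are not part of the software used in this study, and only the primer sequence is taken from the text.

```python
# Minimal sketch: derive the RNA 3' terminus implied by the cDNA primer.
# Function names are hypothetical; only PRIMER comes from the Methods.

DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def primer_target_rna(primer_dna: str) -> str:
    """Return the RNA sequence (5'->3') to which a DNA primer anneals."""
    return reverse_complement(primer_dna).replace("T", "U")


PRIMER = "TGGTCTCCTT"  # decamer primer, written 5'->3'

if __name__ == "__main__":
    # The annealing site is the last 10 nucleotides of each viral RNA.
    print(primer_target_rna(PRIMER))  # AAGGAGACCA
```

Under this reading, each CMV RNA primed in this way is expected to end in ...AAGGAGACCA-3'.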
RESULTS AND DISCUSSION
Identification of an RNA 2-specific cDNA clone
Virtually all of the Fny-CMV RNA 2 sequence was obtained using a near full-length RNA 2-specific cDNA clone. After initial screening of the cDNA library for RNA 3- and RNA 4-specific clones by colony hybridization with randomly primed 32P-labelled cDNA prepared to gel-purified RNA 4, the remaining clones were screened by analysing insert size. The largest plasmid (pFny200) contained an insert of length 3.0 kb. A Northern blot of total CMV RNA was probed with 32P-labelled pFny200, which specifically hybridized to RNA 2 (results not shown). The length of the RNA 2-specific cDNA insert in pFny200 was that expected for a full-length clone.
Strategy for sequencing Fny-CMV RNA 2
Initially, pFny200 was cleaved with KpnI and one resultant cDNA fragment was subcloned into pUC18 to form pFny202, while the other was circularized to form pFny201 (Fig. 1). Both pFny202 and pFny201 could be linearized by cleavage at a unique restriction enzyme site at either vector/cDNA insert junction. These sites are within the pUC18 polylinker regions which flank both ends of each cDNA insert. Such linearization could not be done with pFny200. The linearized plasmids were treated with nuclease Bal 31 for various lengths of time, and the resultant truncated fragments were cleaved at the remaining vector/cDNA insert junction. This mixture of vector and cDNA fragments was subcloned into either M13mp18 or M13mp19 such that the ends of the deleted regions were adjacent to the M13 primer site. Subclones containing cDNA were identified by restriction enzyme analysis of M13 replicative forms and/or hybridization of M13 replicative forms to randomly primed 32P-labelled cDNA made to total Fny-CMV RNA (results not shown). The DNA sequence of the original full-length cDNA insert of pFny200 was determined by sequencing into insert cDNA in subclones of progressively smaller size. Both strands of the insert were sequenced, since pFny202 and pFny201 had been linearized at both vector/cDNA insert junctions prior to digestion with nuclease Bal 31. The ordered set of deletions is illustrated in Fig. 1.
Since the construction of pFny202 and pFny201 may have resulted in the loss of a small undetected KpnI fragment because of a second KpnI site being very close to the one shown in Fig. 1, a 0.27 kb NruI fragment containing the KpnI restriction site(s) was subcloned in both orientations into M13mp18 to form pFny200.1J and pFny200.2J. Subsequent sequencing of this NruI fragment revealed only one KpnI restriction site.
The 5'-terminal 33 nucleotides of Fny-CMV RNA 2 were determined by direct RNA sequencing. This revealed that the cDNA insert of pFny200 lacked the 5'-terminal 19 nucleotides of RNA 2. Also, four 3'-terminal nucleotides of the primer sequence were lost from pFny200 during the cloning procedure. The complete nucleotide sequence of Fny-CMV RNA 2 is presented in Fig. 2.
Fny-CMV RNA 2 encodes one long open reading frame
The only long open reading frame (ORF) of Fny-CMV RNA 2 begins at the first AUG codon at nucleotides 87 to 89 and contains 2571 nucleotides, encoding a 96720 Mr protein (857 amino acids; Fig. 2). By contrast, the translation product of Q-CMV RNA 2 has a predicted Mr of 94333 (Rezaian et al., 1984). In Fny-CMV RNA 2, the next largest ORF on the positive strand starts at residue 2419 and is 330 nucleotides long; Q-CMV RNA 2 has a corresponding 300 nucleotide ORF beginning at residue 2410 (Rezaian et al., 1984). The longest ORF on the negative strand of Fny-CMV RNA 2 is only 174 nucleotides long; no counterpart ORF is present in Q-CMV RNA 2.
Comparison of Fny-CMV RNA 2 and Q-CMV RNA 2 nucleotide sequences and translation products
The alignment of the RNA 2 sequences of Fny-CMV and Q-CMV is shown in Fig. 3. Translation products of the two RNAs are aligned in Fig. 4. The nucleotide sequences and corresponding translation products have overall homologies of 71% and 73%, respectively. Unmatched nucleotides within the coding regions are localized in six specific RNA segments (Fig. 3). These segments are frameshifted with respect to one another, resulting in corresponding protein segments with little or no amino acid sequence homology (Fig. 4). Examination of the alignment of the translation products reveals that their central regions are more similar than are either their amino termini or carboxy termini. This is due in part to the absence of frameshifted segments in the central regions. Translation product homologies and corresponding coding region homologies between the amino-terminal, central and carboxy-terminal regions are listed in Table 1.
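The coordinates quoted for the large ORF of Fny-CMV RNA 2 (total length 3050 nucleotides, first AUG at nucleotides 87 to 89, 2571 coding nucleotides, 857 amino acids) can be cross-checked with simple arithmetic. The Python sketch below is illustrative only; the class and attribute names are hypothetical, and it assumes that the 2571 nucleotides exclude the stop codon, which is what 857 codons would imply.

```python
# Minimal sketch: consistency check of the ORF coordinates given in the text.
# Assumption: coding_length excludes the stop codon (857 x 3 = 2571).
from dataclasses import dataclass


@dataclass(frozen=True)
class OrfLayout:
    rna_length: int     # total length of the RNA in nucleotides
    start: int          # 1-based position of the A of the initiating AUG
    coding_length: int  # nucleotides in the ORF, stop codon excluded

    @property
    def codons(self) -> int:
        """Number of sense codons, i.e. amino acids encoded."""
        if self.coding_length % 3 != 0:
            raise ValueError("coding length is not a multiple of three")
        return self.coding_length // 3

    @property
    def last_coding_nt(self) -> int:
        """1-based position of the last nucleotide of the last sense codon."""
        return self.start + self.coding_length - 1

    @property
    def five_prime_noncoding(self) -> int:
        """Length of the leader preceding the initiating AUG."""
        return self.start - 1

    @property
    def three_prime_noncoding_incl_stop(self) -> int:
        """Length of the 3' region from the stop codon to the 3' end."""
        return self.rna_length - self.last_coding_nt


fny = OrfLayout(rna_length=3050, start=87, coding_length=2571)

if __name__ == "__main__":
    print(fny.codons)                           # 857
    print(fny.last_coding_nt)                   # 2657
    print(fny.five_prime_noncoding)             # 86
    print(fny.three_prime_noncoding_incl_stop)  # 393
```

On these assumptions the figures agree with the 86 residue 5' non-translated region given in the Summary and with the 393 nucleotide 3' non-coding region (including the stop codon) reported in the text.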
[Figure 1 not reproduced: scale map (0 to 3 kb) of pFny200 and the deletion subclones derived from pFny201 and pFny202.]
Fig. 1. Ordered set of overlapping deletions used for sequencing pFny200, a cDNA clone of Fny-CMV RNA 2. Abbreviations: B, BamHI; K, KpnI; N, NruI; R, EcoRI. The ends of the cDNA insert corresponding to the 5' and 3' termini of RNA 2 are indicated.
In the amino termini and carboxy termini, the degree of homology in the coding regions of the nucleotide sequences is higher than that of the corresponding translation products, while the converse is true for the central region homologies. This is due largely to the following reasons. (i) The two terminal regions contain six frameshifted segments which encode amino acid sequences with relatively little or no (0 to 38%) homology. However, each frameshifted segment exhibits a higher degree of homology at the nucleotide sequence level [Fig. 5a shows that part of frameshifted segment D which contained the highest level (86%) of nucleotide sequence homology and no protein sequence homology; other segments contained lower levels (22 to 79%) of nucleotide sequence homology]. (ii) In the central regions, the degenerate genetic code results in amino acid sequences having a higher degree of homology than their respective coding sequences (an example of this is shown in Fig. 5b), such that 201 out of the 360 aligned codons have one or two base changes; however, 163 of these 201 heterologous codons still encode the same amino acid.
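The codon counts given in point (ii) can be turned into an approximate amino acid identity for the central region. The Python sketch below does this and also shows how a pair of aligned codons may be classified as identical, synonymous or amino acid-changing using the standard genetic code. It is a sketch under stated assumptions: the function names are hypothetical, the two codon pairs are illustrative examples rather than codons taken from Fig. 5, and the tally assumes that the 360 aligned codons contain no gaps and that every heterologous codon that is not synonymous changes the amino acid.

```python
# Minimal sketch: classify aligned codon pairs and tally the counts quoted
# for the central region. Names and example codons are hypothetical.
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify(codon_a: str, codon_b: str) -> str:
    """Classify two aligned RNA codons."""
    if codon_a == codon_b:
        return "identical"
    if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
        return "synonymous"
    return "replacement"


# Counts taken from the text for the central region.
ALIGNED_CODONS = 360
HETEROLOGOUS_CODONS = 201  # one or two base changes
SYNONYMOUS_CODONS = 163    # heterologous but same amino acid

identical_codons = ALIGNED_CODONS - HETEROLOGOUS_CODONS
conserved_residues = identical_codons + SYNONYMOUS_CODONS
percent_identity = 100 * conserved_residues / ALIGNED_CODONS

if __name__ == "__main__":
    print(classify("GGU", "GGA"))      # synonymous (both glycine)
    print(classify("GAU", "GAA"))      # replacement (Asp to Glu)
    print(identical_codons)            # 159
    print(conserved_residues)          # 322
    print(round(percent_identity, 1))  # 89.4
```

On these assumptions, 322 of 360 residues are conserved, which is close to the 89% protein homology quoted for the central third even though only 159 of the 360 codons are identical at the nucleotide level.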
Non-coding 3' terminus of Fny-CMV RNA 2 The non-coding T-terminal region of Fny-CMV R N A 2 is 393 nucleotides long (including the stop codon). This compares with 426 nucleotides in Q-CMV R N A 2. Ahlquist et al. (1981) have shown that RNAs 1, 2 and 3 of brome mosaic bromovirus (BMV) and related viruses, such as Q-CMV, have strong sequence similarities within and between viruses at their 3' ends. These authors proposed that these 3' termini can adopt two specific secondary structures on the basis of results of mapping studies using S1 nuclease, the positions of band compressions in sequencing
C M V R N A 2 sequence
1781
1 m7GpppGUUUAUUUACAAGAGCGLIACGGUUCAACCCCUGCCUCCC~U~UAA/~CUCCCUAGACUU/U~,AUCIJUUUCUUUCUAGUAUCULIUUCU 87
i~¢ Al.a Phe Pro ASa Pro Al.a pl~ Ssr L~u Ala Aen Leu Leu A~n GS~I S~r TJir GS~t VaS Asp Tar Pro aZu alrp Val GSu AI,~ Lau AC~ AU__G . GGCU UUC COO ~C OCC GCA UU¢ UCA CUA ~ C ~ U CUU u ~ ~ C GGC A ~ UAC GOU GUC GAG ACU CCC GAG GAU GUG GAA CGU UUG
177
Set CS= OZn Az~8 GSU GS= AZa AZa AZa ASa C~e Ar~ aen ~ r Az~ Pro L#u Pro AZa VaZ Amp YeS ~ , GZU star gas ~ r ~A~ A~Ae~CAZa UCLIGAG CAA CGC GAA GAG GCU GCU GCG GCC UGU CGU AAU UAC AGG CCC CUA CCC GCU GUG GAU GUC AGC GAG AGU GUC ACA GCG Hie Se, Leu Am@ T ~ P~o Asp GS¥ A ~ Pro AZ~ GSU AZn VaS 8#, ASp GSu P~e ~aS Ta, T~r GS~ ASa GSu Asp
267 CAu ucc cuc cGA ACU CCU ~C GGA GCU CCC GCU G~ GCG GUG UCU GAU ~G UUU GUA Acu uAu GGU GCU ~ GAG c.c c~o Guc ~cu uuu ~.G ~cG ~uG GUC ~
CGC .UG CGV ~UC ~
C,~ ~GA ~
Leu ~
GAU ~
~
COO
E~
UCU
357
~u ~.
447
ISe Set Be, IZe AT.~ l ~ t h~z Az~ A ~ LeU LeU L~m AZ~ Pro Az~ Tl~ Set Hie Az~ The H~t Lye C¥~ P~e Clu Asp Leu V~l AZ~ AZ~ AUU U¢C AGC AUU GCU AUG GCC AGA GCU UUG UUG UUG GCA CCU AGA ACA UCC CAC CGA ACC AUG AAG UGU UUU GAA GAG CUG GUC GCG GCU
UGC CCU GCG t ~ . ,~U
537
G~2J ~ * " I~e ~ , Tar Lye Set Asp Phe Tyr Tyr Set GSu GS~ Cys G~u A~z Aap Asp A ~ GS~ IZe Asp ISe ~,r ~s~ Ar~' .4~p YaZ ~ AUU UAC ACU AAA UCU GAU UUC UAC UAC AGU GAA GAG UGU GAA GCC GAC GAG GCU GAG AUA GAU AUC UCG UCU CGC GAU GUA CCC GGU UAU
627
S~, P ~ alu P~o TPp Se~ Ar~ T ~ Set a ~ Phe GSu Pro Pro Pro Ils C~e Glu A~a UCO UUC ~ CCG UGG UCC CGA ACG UCU GGA UUU GAA CCG CCG CCC AUU UGU GAA GCG UGC
717
Asp P~e Am~ A ~ Leu Lye L~a Ssr C~s A ~ Glu A ~ T ~ P~e A ~ As~ Asp 2¥r gaZ ISe GSu GS~ beu Aap GS~ VaS VaZ Asp A ~ A ~ GAU UUU AAU GCU UUA AAG AAA UCG UGC GCU GAG AUG ACC UUC GCU GAU GAU UAU GUU AUU GAA GGU UUA GAU G~GUGUll GUU GAU AAU 6CG
807
~ r Le~ Leu Set AS. L~u G~¥ Pro Phe Leu ~a~ Pro YaS L~S Ojs GSn 2~r ~Su L~o Ct/8 Pro Tar Pro ~ r I~e AZa IZe Pmo Pmo Aa~ ACU CUG UUG UCG AAU UUG GGU CCA UUU UUG GUA CCC GUG AAG UGU CAA UAU GAA AAA UGU CCA ACG CCA ACC AUG GCG AUU CCU CCG GAll
897
~ u Ao. Ar~ A ~ Ta, Asp Arg Val Asp ISe Aen Leu Val Gln Set IZe Cys ASp Ser UUA AAC CGU GCU ACU GAU CGU GUU GAU AUC AAU UUA GUU CAA UCC AUU UGU GAG UCG ACU CUG CCC ACU CAB AGU AAU UAC GAC GAC UCU
987
P~e His GSn VaZ Phe VaS GSU Set ASa Asp T ~ Set I~e Asp Le~ Asp Hie VaS Ar~ L ~ Arg GI~ Se~ A~p Leu IZe AZa L ~ IZ~ Pmo AUU CCA UUO CAU CAA GUG UUC GUC GAA AGU GCA ~AC UAU UCIJ AUA GAU CUG GAU CAU GUU AGA CUU CGA CAG UCU GAU CUU AUU GCA ~
1077
Asp Set GS~ Rie Met ISe Pro VaZ Leu Ash Tar CS~ Set GZ~ H~B L~/8 Arg Vat G ~ Tar Tar Lye GSu VaS Lew Tar A ~ IL~ LVm LV~ GAU UCA GGG CAU AUG AUA CCG GUU CUG AAC ACC GGG AGC GGU CAC AAG AGA GUA GGU ACA ACG AAG GAG GUC CUU ACA GCA AUU MG AAA
1167
IZ~ A ~ Ash AZa Asp VaZ Pro GZu Leu GS~ Asp 8e~ Va~ A~n Leu SeP Arg Lau Set Lye A ~ VaS Ala Glu A~8 P~e P~e I ~ 8e~ ~ CGU AAU GCU GAG GUU CCA GAG CUA GGU GAU UCC GUU AAU UUG UCU AGA UUG AGU AAA GCU GUG GCU GAG AGA IIUC UUC AUU UCA IIAC AUG
1257
~ A~n Gl~ A ~ ~e, Le~ Ala Set Set Aen P~e YaS Ash V~S YaS Set Aen P~e H~e Asp ~ r Met aZu Ly8 Tr~ L~le S~Z* S#r a~¥ ~ ~ U GGU AAC UCU CUA GCA UCC AGU AAC UUU GUC AAU GUC ~UU AGU AAC UUC CAC GAU UAC AUG G/~ AAA UGG AAG UCC UCA GGU CUU UCU
1347
~ A ~ Asp I ~ Pro Aap L ~ Hie Ala GS~ Ash Leu G~n Phe ~Jr Asp His Met Ile ~ A Ser Aap VaZ Lye Pro VaZ VaZ E ~ A ~ UCC GAU GUG AAA CCU GUG G(J~ AGC GAG ACA UAU GAU GAU CUU CCG GAU CUU CAU GCU GAG AAU UUG CAG UUU UAU GAG CAC AUG AUA
1437
Le~ Ae~ IZe ASp At@ Pro WaZ PrO AL~ Thr ISe Thr Tyr His LyS Lye Ser IZe Tar Set GSn Phe Set Pro Le~ F~e ~ AZa L ~ ~I~ CUe MU AUC GAC AGA CCG GUU CCA GCU ACU AUA ACS UAU CAU ~G ~G AGU AUA ACC UCC CAG UUC UCA CCG UUA UUC ACA GCG CGA UUC
1527
GS~ Arg P~e Gin Arg C~e Leu Az~ Glu Arg Ile IZa Le~ Pro Val Gl9 Lye ISe Set 8er Leu GZu Mee A ~ GZy ~ Ae~ VaS Lye ABet GAG CGC UUC CAG AGA UGC CUU CGA GAA CGU AUU AUU CULl CCU GUU GGU AAG AUU uCA UCC CUU GAG AUG GCA GGA UUU GAU 6UC AAG AAC
1617
Lye B£o Cys Leu Glu ISe Asp Le~ Ser Lye Phe Asp Lye Set GSn UZ~ GSu Phe Rie Leu Leu ISe GSn GI~ Hi8 Ile ~ Aem Gl¥ AAG CAC UGC CUC GAG AUU GAG CUG UCU AAG UUU GAU AAG UCU CAA GGU GAA ~UU CAC UUG CUA AUG CA6 G/~ CAC AUU UUG )L~U GGU CUA
AUG AUC AUG UAC
UC.
UGC CCG UGU UtlU
GS~ Cge P~o Ala Pro Ile Thr Lye Trp Trp C~s Asp Phe His Arg Phe Set T~r Ile Arg Ae~ A ~ Arg AZ~ Gl¥ Va~ G~¥ Met Pro I ~
1707 GGA UGU CCA ~CU CCG AUA AGO AAG UG~ UGG UGU ~U UUC CAU CGA UUC UCU UAC AUU AGA GAC CGU AGA ~CU ~
Gee ~ U AU~ CCU AmJ
1797
Set P~e Gln A~3 AZ~ T~P Gl¥ Aop AZ~ Leu Thr T~r P~e G ~ Aen Tar IZe V~S T ~ He~ AZ~ GZU P~e A~a T~p C ~ ~ Amp 2~e A~p A~J UUC GAG AGA CGA ACU GGC GAU GCA CUC ACU UAU UUU GGC AAu ACC AUG GUC ACC AUG GCU GAG UUU GC£ U~G UGU UAU GAG ACC GAG
1887
GZn P~e GSu L~o L~u Leu P~e Set GS~ Asp ~ep S~r Leu GS~ P~e Set L~u L~u Pro Pro VnZ G~¥ Asp Pro Se~ L~e P~e ~ ~k~ Z4w CAA UUC GAA AAG CUU UUA UUC UCA GGC GAU GAU (/CU CUA GGA UUU UCA CUG CUU CCC CCU GUU GGU GAG CCG AGU AAA U~C ACA ACU CUU
]977
P~e Am~ Met Gl~ AL~ LR8 V~l Met Gl~ P~o AZ~ Val Pro T~r IZe C~e Set L~e P~e De~ L~w See ASp GlU P ~ G ~ A ~ ~u~ P ~ E~e uuc AAC AUG GAA GCU AAG GUG AUG GAA CCU GCC GUA CCA UAU AUU UGU UCG AAG UUC UUA CUC UCU GAG GAG UUC GGU AAC ACA UUU tICC
2067
~ Z P~o Asp Pro Le~ A ~ Gl~ VaS GSn Ar~ Lm~ Gl~ ~ L¥~ Lye IS~ Pro T ~ S~r Ae~ Am. Amp GSu P~e E ~ PI~ AZ~ ~ P ~ ~4~f; Guu ccA GAu ccA uuG CGC GAG GUU GAG CGG UUA GGA ACA AAG AAA AUU CCC UAU UCU GAG AAU GAU GAA UUC UUG UUU GCU CAC UUC AUG
2157
S , , PI~ $'aZ Asp Arg Leu Lye Fhe Leu Ae~ AX,8 He~; 8sz, GSn Sex" Oje ISe Asp GSn L~u S,~x" I7.¢ P ~ Phe GZu L~u Lye ~I/~C Lye .C~lm AGC UUU GUU GAU CGA UUG AAG t~])l) UUG GAG CGA AUG PCI) GAG UCG UGU AUG GAU CAA CUU UCG AUU UUC UUC C,~U~UUG AkA AAG AAG
2247
S#P GS¥ ~Su G~u A ~ AZ~ Lew Net Leu G ~ AL~ Phe Lye Lye ~ r T ~ A ~ Ae~ P ~ GS~ S~e ~ r Lyre G~u L ~ TVe ~Vr E ~ Asp A ~ ~U GGG GAA GAG GCU GCU UUA AUG UUA GGC GCC UUU AAG AAG UAU ACC GCU AAU UUC GAG UCC UAC AAA GAA CUC UAU UGU UCA GAU CGU
2337
A ~ GSn C ~ GSu Leu IS~ As. S~r P~e Cye S#r ~ ~ u P ~ 4~# YaS GSU Ar~ Ya~ A~. S#~ As. &~e GS. ArR ~¥m A~n 9 ~ ~ CGU GAG UGC GAA UUG AUG AAU UCG UUU UGU AGU ACA GAG UUC AGG GUU GAG CGO GUA AAU UCC GAG AAA GAG CGA AAG AAU UAU
2427
Glu Ax~ Az~ C)je Am. hmp Ltle Az~ Az~ ~ Pro 2~zr Gl¥ Set T~lr G~¥ G~/j G~¥ G ~ G~u AZa GZw ~ LV~ V~Z $~P GZ. ~ e GZu $~z~ G/~ CGU ~ UGC AAU GAG AAA CGU CGA ACU CCA ACU GGC UCG UAU GGU GGA GGC GAA GAA GCA GAG ACG AAG GUC UCA CAA ACA GAA UCG
2517 AC. 2607 2708
AC. ACU
UCAC,'a ~G VCCCA~CGA GAG AGe GCG uuc GUC AUG CCG CCA ~
~
~'~ aze c~u G ~ ~
C~'~U~CGGUMC~U~G~C~'.~GUGCUUUCU~.~CCUCCCC
VaS ^cc Z~. Guu
UCU ~G ACU AUU CCG CUU CCu ACC GW CGA U~ ~SU
I~ AIAJ
U~:
VaZ __uGA~ c c u c u c ~ u c ~ , , G A c c ~ u u u , , ~ c ~ Guc
UUCC~AUCUCCCUCCG~UUUGCU~GCUGA~m~C~
2827 AGUC~CU~,'OC~U,WGGU~'~CG".~G~CA UC~GCU'~C~CU,U,,'~UGGUCAGUCGmGAGG,'~UCmC~GCAGACUmC.~r~CUCU~CC~ 2946 ccu~G~uuucu~c~u~`~cuuc~`.ucc~nuAc~c~v~cm~cGA~uuuuwm~c~mr~ccccccAcuu~u~uGm~ccuc~GAccA
3050
Fig. 2. Nucleotide sequence of Fny-CMV R N A 2 and the encoded amino acid sequence of the large ORF. The initiation and stop codons are underlined.
1782
T. M. R I Z Z O A N D P. P A L U K A I T I S 1 ~uL~cuEAA~6cGuA~c`GuUcAAcc~cuGCcUCCUCuGUGAAA~uACCCUAG-UuUuAUuGAUCUACUuCuAGUCUcUCuuCUGUuAcUAuGAUAAGucCUcCACccAcUuUcuCA IIIIIII l l r l l l l l l l li[[[[[[[iiirrlrlr irll ill i IIIFII FI ' I I ] l i l l l l I11 [fEf iT[ ill I [[I [ I[III| 1 GUUUAUU~ACAAGAGCGUA~GGUUCAA~CC~GC~U~CCUG~JAAAA~U~CCCUAGACU~AAAU~U~UUCUUUCUAGUA~CUU~UCU . . . . . . AUGGCUUUCCCUGCCCCCGCAUUCUCA
120 ~UCGCC#A'di~UGUk~GAAuG~CUCCUAU~G~GUUGACA~CCC~GGAAGUGGAACGcGUUAGACGUGAACAACGCGAAGAUGCUGAGGCGGCUUUACGU AAUUAUAAGCCUUUACCCGCU [lilllrlf fllll III 11 llIll I I I I I I I I I I I I I I llli[Frl I II[ Illllllllll Illl IIIII I l l l l l l I i l I l l lllII[ll
114 240 234 360
CUAGCC~AVCUUUU(1AACGGCAGUUACGGUGUCGACACUCCCGAGGAUGUGGAACGUUUGCGAUCUGAGCAACGCGAAGAG~CUGCUGCGGcCUGUCGUAAUUACAGGcCCCUACCCGCU 'A GUGGAUGUCAGUGAGAGUGU~C~UAGAGAc~GAACCUAUUGUCU~GCAAACCGUCACU~CA~CUC~UGUUACAUCAGU~UGA~GCGUUUGUUUCUUU~GGUGcUGAGGACUACCUUGAA
llIllllII[l
[I1111111 1
I I I I I I Ill
Ill
II
I I I I I I llllll
I I I ili
ilJ I rIIIIi
Ill
IIllllll
II IllliIli{
GUGGAUGUCAGcGAGAGUGUCACAGAGGA~c~°CGCAUUc~CUCcGAA~UCCUGAC~~ 'GAGCUCCCGCUGAAGCGGUGUCUGAUGAGUUUGUAAcVUAUGGUGCUGAAGA~UUAcCUUGAA~
AVGUCCCCAUCUGAGCUGCUUUCCGCUUUUGAGUUGAUGGVCAAACCCUUGCGUGUCGGUGAAGUGUUGUGCUCGAGUUUUGAUCGUUCGCUAUUCAUCUCCAGCGUCGCCAUGGCUAGG
I II [IIlll Ill lllII[llll [II{llIIIIIIl III fill I I rlrl i Ill [i r Ir iI IrII[l r I IIII[ I[ . A~GUUcUUUUAUUUCCAGcAUUGCUAUGGCcACA '.~ 351 AAAVCUGAUGAU~A~CUCCUUGUCGCUUUU~AGACGAUGGUCAAACcCAUGCGUAUCGGAcAACUAUGGUGcCCUGcGUUUAAUAA A C G U U G U U G U U G G C A C C A ~ U C A C A U C C A C C C G A A C G U V G A A G C G U U U U G A A G A C C U U G U G G C C G C G A U c U A U C U A A A A A C U G A U U U ~ U U . . . . U U A . . . . . . . ~ A , A G A C G A U GGGCCC 480 I II~1111111}II} IIII)I i]l]l] 11])) i)))llI)iIlll II il II II II llr Illlilllll r I I I I I I I iii I I ; UAUUUAcAcUAAAU~UGAUUUqUACUACAGUGAAC~AGUGU~GAAGCCGAcGACGCU 471 GcUUUGUUGUUGGcACCUA~GAACAUCccACcGAAcCAUGAAGUGUUUUC~AGAcCUGGUCGCG(C 588 cAGACUGA•GUcUCCCAAAG•GAUGUGCCcGGUUAUAUCUUcGAAcCAGGGCAACACUCAUCcGGUUUUGAAcCCCCCCCUAUUUGUGCUAAAUGUGACUUGAUUUUGUAUCAAUGUCCG fill I l i i I i l Illiili lililrill IIiilIil [I r I i ~I r i i i i i l i l l !I !li~Irl i i i r r r l r l I r ll i~c~~ . AACCGCCGCCCAUUUGUGAA~GCGUGCGACAUGAUCAUGUACCAGU . 591 CAGAUAGAUAUCUCGUCUCGCGAUGUACCCGGUUAUUCUUUCGAACCGUGGUCCCGAACGUCUGGAUUUG 708 UGUUUc&AUUUUAACGcACUUCGUGAGUCUUGCGCAGAGAAAACAUUCUCUCAC•ACUAUGUUAUCGA•GUCUCGAUGGUGUCAUU•AUAACGCUA•GCU•UUAUCAAAUUUGGGACCA [1[11 I/lliJ|l I1 I {ll Illll I/ll II III 11141 IJllllil I/ rlp[r[r[rlrJ Ir[llll II Ii IIJII II Illllll[ Ill 711 828 UUUUUG~A~CUG~cCAUUG~AU~CA~GA~c~GGA~CGA-~CG~G~UGAU~UAGU~UUG~GCGA~CUA~UGAUAGGGUUGAUGUA~AUGUAG~UCAAG~CG~G~c~ [[ilii iIl[ li i iii lil lllil FI ill |Ill I[ ttt It II Illl[lllltlll(l( (I IIllllll II IIII II 831 UuuNGGUACCCGU~GUGUC~JA-UG~ut~VGUC.c~CGC¢AA£~UCGCGAUUC?¢CGGAUUUA~C¢GUGCUACUGAU¢GUGUUc~UAU¢~UUNGUUC~UCCAVVVGVGA
UGUUUVGAUUUVAAUGCUUUAAAG~AUCGUGCGCUGAGAGGACCUUCGCUGAV~UUAUGUUAUUGAAGGUUVAGAUGGUGUUGVUGAUAAUGCGACUCVGVUGV
947 UAccAc~c~Gcc~A~CAuGGU~cUA~c~u~uU~uUU~AuCAAGuCUUUGuGGA~U~UGCUc~UUAuuc~A~uGA~AUGGAU~ACGUU~GGuUA~Gu~ccc~uUUAGuAGC~A i[lliililili iii Jill ilill[ Ir1111111] I1 II t] Ill tl lillIT Ir i111111 rill lit Ii ii rill i lllii 950 CUC~CU~G~C~ACUCA~AGUAAUUACGACc~C~CUUUUCA~C~GUGUUCGUCGAAAGUGCAc~CUAUU~UAUAGAUCUGGAU~AUGU~AGACU~C~CAGU~Uf~UCUUA~NC~ 1067 ~UU~CG~GGU~G~A~AUG~UA~CGGU~UUAAA~A~CGGGAGUG~UCAC~AGAGAGUAGGUACUACG~.GGAGGU~UUAA~AG~UA~G~GAAA~GCU~GU~CC~GAG~ IIIill I1[ II[li[ll[ Ill[ill Jill l i[[l[I llill[rlrr[rr ii]ll[lllllll[rrrllll iiiirll IllillI II II iii1[ 1070 AAUUccA~UUcA~GGcAUAUc~UACCGGUucUG~cAC~GG~GCGGUcAc~AGAc~GuAGGUA¢AAcG~GGAGGUCCuuAcAGC~UU~G~A¢GUAAUGCUc~CGUUcCAc~G¢U . UUUUCA 1187 UGGUGACUCUGUCAAUCUGUCACGCCUGAGCAAAGCAGUAGCCGAACGGUUUCGUCUCUCGUAUAUGAAUGUUGACGCUUUGGCUAAAAGUAACUUUGuUAAUGUUGUCAGUAA Illll II II I f ) i I I l I Jill II I{ I II I l l II (I fill II II I I I I I I I I I I I I ( ( IIill II llIil ~c~l A 1190 AGGUGAUUcCGUUAAUUUGUCUAGAUUGAGUAAA~CUGUGGCUGAGA~AUUCUUCAUUUCAUACAUCAAUGGUAACUCUCUAGCAUCCAGUAACUUUGUCAAUGU£GUUAGUAA~
1307 CGCUuACAUGCAAAAAUGGCCAUCUU•UGGA•UUUCUUAUGAUGAUCUUCCUGAUCUUCACGCGGAGAAUUUACAGUUUUAUGAUCACAUGAUUAAGUCCGAUGUUAAGCCAGUUGUCAC I[ Illllll IIIFIIII [r ( I I I II l]JI l]lill lJriIlr ri Jill iilliii llr[ilii II llllllli il rl Ii tl I 1310 CGAUUAcAUGGAAAAAUGG~AAGUCcUCAGGUCUUU~UUAUGAUGAUCUUCCGGAUCUUCAUGCUGAGAAUUUGCAGUUUUAUGACCACAUGAUAAAAUCCGAUGUGAAACCUGUGGUGAG 1427 UGACACGUUGAACGUCGACAGACCUGUCCCAGCUACUAUUACAUUUCACAAAAAGACCAUAACAUCCCAGUUCUCACCGUUGUUUAUAUCUCUGUUUGAGAGAUUCCAGAGAUGCCUUCG II{II ( II Illlllllli II illl{ II I{I li III ilili II iiil II { I ii i! iii i llllilllllllllllI 1430 CGACACA~UCAAUAUCGACAGAcCGGUUCCAGCUACUAUAACGUAUCAUAAGAAGAGUAUAACCUCCCAGUUCUCACCGUUAUUCACAGCGCUAUUC~GCGCUUCCAGA~GAUGCCUU£G 1547 GGAACGUGUUGUUCUGCCCGUUGGUAAGAUUUCCUCCCUUGAGAUGACUGGUUUUUCAGUCUUGAACAAACAUUGUCUUGAAAUUGAUUUAUCUAAAUUCGAcAAAUCUCAAGGUGAGUU lilIII I[ IIII II llIiFrII[rIIll Illl i r Ir [[ JlJ il I I i l III I ilIiJ II li II IlIIiillIll II 1550 AGAACGUAUUAUUCUUCCUGUUGGUAAGAUUUCAUCCCUUGAGAUGGCAGGAUUUGAUGUCAAGA~CAAGCACUGCCUCGA~AUUGACCUGUCUAAGUUUGAUAAGUCUCAAGGUGAAUU 1667 CCACcUUAUGAUUCAAGAGCACAUUCUCAACGAUCUUGGUUGCCCAGCACCCAUCACCAAAUGGUGGUGUGACUUUCAUAGGUUUUCCUAUAUUAAAGACAAACGUGCUGGGGUUGGAAU Ill I I {{ II II IIIlll { {{ {ll& {I {II II L II ii I{ Ill { {{ IIIl Illl { If!If {l{II { 1670 UCACUUGCUAA~CCAGGAACACAUUUUGAAUGGUCUAGGAUGUCCAGCUCCGAUAACUAAGUGGUGGUGUGAUUUCCAUCGAUUCUCUUACAUUAGAGACCGUAGAGCUGGUGUUGGUAU 1787 GCc•AUCAGCUUUCAACGCCGUACUGGUGACG•UUUCACCUAUUUUGGCAAUACUAUUGUCACGAUGGCUGAGUUUGCUUGGUGUUAUGACACUGAUCAAUUUGACCGAUUGCUCUUUUC III II II II TI ill rllll il TI IIil II[lillli[i I [I!i li I I [ ii ll]llr i[ lili[ li I I II II 1790 GCCUAUUAGUUUCCAGAGACGAACUGGCGAUGCACUCACUUAUUUUGGCAAUACCAUCGUCACCAUGGCUGAGUUUGCCUGGUGUUAUGACACCGACCAA~UUCGAAAAG?UUUAUUCUC 1907 UGGUGAUGAUUCUCUGGCcUUUUCUAAGCUCCCACCUGUUGGAGAUCCCAGUAAGUUUACAACUUUAUUCAACAUGGAAGCUAAGGUGAUGGAACCUG•AGUACCUUAUAUCUGUUCGAA Ii Illllllilli I rIIll 111 ir [l{ilili ii 11 [I i ii[i II [iii II iiIiIIliliii lllli lilIl IiiIllll I! 
CCCCCUGUUGGUGACCCGAGUAAAUUCACAAC~CUUUUCAACAUGGAAGCUAAGGUGAUGGAACCUGCCGUACCAUAUAUUVG~UCGAA 1910 AGGC~AVGA?UCUCUAGGAUUUUCACUGCU G~. U ACUC~C~U~GUUUGGUAACACG~UU~AGUCCCCC~CCA-U~cGUgAAAUCCAGCGGUUAGGUACCAAGAAG~UACCGUAUUCGG ACAACAAC&AU~UCUU&~VCgCUCA 2027 llllilllt[llIl[ ltllr Irml]lll iJil[ II rJ ,/ ill IllJ II i j[illilllri ~l lilll [r it [iJ/I !1111 Ill ~trl~r~ ~ilr GUu~uACUcucuGAc~GUUcGG~. ~ CAcAUUUUC¢GUUC¢AGAUcCAuU~G¢GAGGUUCA~cGGUUAGG~CAAAGAAAAUUC~¢UAUU¢UGAcAAUGAuG~uU~UUGUUUgcucA 2030 2144 u~uAuc~G~uu~G~u~A~AGG~G~uA~GGAcA~c~G~cG~AG~c~GuA~GA~AA~G~c~A~U~U~GAG~AAuA~AAGAAGu~GGAAcGAAG~Gccc~AGu ii liilr iiilllll I rllliiil iiiii r iIrii i1[11 [r [jill! I it iliJ] illl ii [l[!llliirll [][ i ir li li fl i 2150 ~uu~A~c~G~uuuG~GA~cGAu~G~G~uGGA~GAA~G~AG~UG~A~GA~c~cuu~cGAu~uc~GA~GAAA~AcAAG~GU~uGGG~AAGAGGc~G~uuuAA~ 2264 ~uuGGGcgcG~G~GUAcAccGc~AA~AAcGcc~AU~GAG~A~A~ucuGAc~AA~AA~GAc~a~AcG~G~A~A~cuaAG~u~GGGuGA~ il illil illililiiri irilllillrl r llll II ii ii r! ii [l[ll II !1 !1 li I[[[I II!!lilll i rli[rr Jill il 2270 G~AGGcGCCuUuAA~^GUAUACcGcUAAUUUCcAGu¢CuAc~AG~cUcUAuUAUucA~UCGUCGU¢AGUGC~AUUcaU¢~UUCGUUUuGUAauA¢AGAGUUCAGGGUU~AGcG u~G~A~uAccG~AAGAA~AAGAAGAA~G~GuG~GA~AG~c~Ac~G~GA~u~cA~uAG~G~AG~AGG~GAAA~G~GAAGAcG~GGuc~CA~G~cA 2384 I (I II I( (( iiiiii i illl iill iii ill I i ii Iii Iiiii I II iiI }1}11 iii i illllllllllllll 2390 UG~UUcC~C~¢AGcGA~GAAUUAuGG~uUG~.cGu~GGUGC~A¢~ACGUCG~¢UCCAAcUGGcU¢GUAuGGUGGAG~¢GAAc~A~cAGA~cGAAGGUcu¢AC~Ac ..... 2501 ~caGc~G~GGGGUUA~A~G~AG~GAaA~cG~caUCUAUAsCaAGA~CA~UGUUA~cAuU~A~GaAa~AGAU~G~G~AuuG---GU~c~G i[II ill i lil illl[iillllrriii[[lll//ll tll illll !i]] I l/it lll l lie ill ii[l[I i liF 2510 Ac`~uCc~cGG.~c~GGUcAcAA~GucccAG~GAGAGAGcGcauu~AAAucucAGA?AuuccG~uUccuAccGuuCuAuc~GuGGAuGauu~a.~UGAc~GGGuc~Gec~cc . . . . . GA--AUC .... UUUUGAUGAUACUGAUUGGUUUGCUGGU~CG~UGGGCC~GGGUCGUUUU 2612 AUGU---CGU-UC---GCUCUCCGUCCGOUACCAGCCUUGUUU---CUUAU TIll I[[ i I i lii i I il lili Illi I irlIti i[tii iliillii ri lliIlll[t[llli ilill I ....... 
u 2630 AuGuG~uGG~G.~GuuA~ccGAG~c~GAGGccucucGu~uAGAG~AU~GGcG~c~ccA~GA~u~GA~c~ACAGAUuGGUU~ccGGUAAcG~UGGGCGGAAGG 2711 ~UUu~c~¢¢CUucGucGUcC~GAcGUUAAAcUACGCUCUCUUUAuuGCc~GUGcUGAGUUGGuAAGuUuGcU~uA~CuAuCuG~GuCGCUA~u¢CAuUAcuG~U~CG~CGG I i)l[ II I ii I Illl i I ii ii I ( { I I III I llllllll I IIIII Illlll lilllliii lil{I Illl liil 11111( J~CACAU~-GUGG~---r-~C~' 2743 GCUUUCUG~.CCUCCCCUUCCG----CAVCUCCCUCCr~U-UUUCUGUc~CGGc~GCUGA~UUGGCAGUAUUGCUAUAAACUGUCUG~GVCACUA UVVCVUCGG~C~GCVUCGVG CUAGA 2831 GUUGUCCAUCCAGCUUACGGCUA~AUGGUCAGUAUGCCCCA.~GGCAGUGCCG-ACACCUACAGGGUUGUCc~GCUACCCUUc~AAUCAUCUC tillllllllllllllllllllllllllllllll il I ! t ili ]lli I J I Ill I1[ I[rl!l ilrrrlllll itr[llilrlll ll~L U a~. G~cV 2854 gUU~Uc~UccAG¢VUACGGcU~VGGUcAGUCGVAGA~GAAU¢UACGC¢AG¢AGA¢uUA¢AAGUCUcU-GAGG¢A¢cUUVG~AC¢AUCUc¢UAGGUUU¢VVcGG~. 2950 AG~GUG~GGuAAUACA~UGAUAU~AC~A~GUGCGGGUAUCGC~G~GGUUCU~CACAGGUUcUc~AUAAGGAGACCA 3035 I ~ I I I I II II ili Iil II lilli I II II i il iilll lililllili 2973 CCGUGUACUUCVA~GCACAAC~GUGCUAGUUUC~AGGGUACGGGUGCCCCCCACUUUUGU~GGGGCCU~CAAAAGGAGACCA3050
[Fig. 4 alignment, panels (a) to (c): amino acid sequences of the Q-CMV RNA 2 (839 residues) and Fny-CMV RNA 2 (857 residues) translation products. The aligned sequence text is not recoverable from the extracted page.]
Fig. 4. Alignment of the translation products of Q-CMV RNA 2 (top line) and Fny-CMV RNA 2 (bottom line). The translation products are divided into three regions: (a) amino-terminal, (b) central and (c) carboxy-terminal (see Table 1). Identical amino acids are indicated by vertical lines; chemically similar amino acids [as described by Dayhoff et al. (1972)] are indicated by circles between lines; gaps are denoted by dashes. Amino acid sequences encoded by frameshifted segments A to F (see Fig. 3) are boxed.
gels and base pairing according to the rules of Tinoco et al. (1971). These Q-CMV RNA 2 configurations and the corresponding Fny-CMV RNA 2 secondary structures are illustrated in Fig. 6. Although the sequences of Q-CMV RNA 2 and Fny-CMV RNA 2 comprising these configurations have only 62% homology, the similarities between the viruses in both secondary structures are very striking. Rietveld et al. (1983) and Joshi et al. (1983) have structurally mapped the 3' ends of BMV RNA 2 by chemical modifications and digestions with specific nucleases. This has demonstrated that, in the absence of Mg2+, structures very similar to those in Fig. 6 (a) and (c) predominate in solution (Rietveld et al., 1983). In the presence of Mg2+, hairpin A is melted out and the 3' termini take on an L shape (Fig. 6b, d), reminiscent of the classical secondary folding of tRNA (Kim, 1976). The major difference between the 3' end secondary structures of Fny-CMV RNA 2 and Q-CMV RNA 2 lies in the length of stem D (Fig. 6b, d). However, in RNA from the related
Fig. 3. Alignment of the nucleotide sequences of Q-CMV RNA 2 (upper line of each pair) and Fny-CMV RNA 2 (lower line of each pair). Identical nucleotides are indicated by vertical lines; gaps are denoted as dashes; the initiation and stop codons are underlined; frameshifted segments within coding regions are boxed and labelled A to F.
[Fig. 5 panels: (a) Q-CMV RNA 2, nucleotide numbering from 2031, amino acids from position 647 (Y S L M S L V T R), aligned with Fny-CMV RNA 2, nucleotide numbering from 2034, amino acids from position 650 (L L S D E F G N T); (b) Q-CMV RNA 2, nucleotide numbering from 1787, amino acids from position 566 (P I S F Q R R), aligned with Fny-CMV RNA 2, nucleotide numbering from 1791, amino acids from position 569 (P I S F Q R R). The aligned nucleotide text is not recoverable from the extracted page.]
Fig. 5. The nature of variation between protein and nucleotide percentage sequence homologies. The top and bottom lines in both (a) and (b) are Q-CMV RNA 2 and Fny-CMV RNA 2 sequences, respectively. Numbers refer to either nucleotide (Fig. 3) or amino acid (Fig. 4) positions. Identical nucleotides are indicated by vertical lines; the dash denotes a nucleotide gap; codons are indicated by horizontal lines with the corresponding amino acid shown above or below these lines. (a) Part of frameshifted segment D (Figs 3 and 4) with 86% nucleotide sequence homology and 0% amino acid sequence homology. (b) Part of the central region (Fig. 4b) with 62% nucleotide sequence homology and 100% amino acid sequence homology.
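The contrast drawn in Fig. 5 can be reproduced with a small sketch. The sequences below are hypothetical toy examples chosen for illustration, not the sequences shown in the figure, and the codon table is deliberately partial (it covers only the codons that occur in these toy sequences). Pair (a) differs by a single-nucleotide gap that shifts the reading frame; pair (b) differs only by synonymous substitutions.

```python
# Partial codon table: only the codons used in the toy sequences below.
CODON_TABLE = {
    "UCU": "S", "CUG": "L", "AUG": "M", "AGU": "S", "UUG": "L", "GUA": "V",
    "CUC": "L", "GAU": "D", "GAG": "E", "UUU": "F", "GGU": "G",
    "CCU": "P", "AUU": "I", "CAG": "Q", "AGA": "R", "CGU": "R",
    "CCA": "P", "AUC": "I", "UCA": "S", "UUC": "F", "CAA": "Q",
    "AGG": "R", "CGC": "R",
}


def translate(aligned_rna: str) -> str:
    """Remove alignment gaps and translate complete codons from the first base."""
    rna = aligned_rna.replace("-", "")
    usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))


def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned columns that hold the same residue (gaps never match)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# (a) Hypothetical frameshift: one extra nucleotide in the bottom sequence.
FRAME_TOP = "-UCUCUGAUGAGUUUGGUA"
FRAME_BOTTOM = "CUCUCUGAUGAGUUUGGUA"

# (b) Hypothetical synonymous substitutions: same protein, different codons.
SYN_TOP = "CCUAUUAGUUUUCAGAGACGU"
SYN_BOTTOM = "CCAAUCUCAUUCCAAAGGCGC"

for label, top, bottom in (("frameshift", FRAME_TOP, FRAME_BOTTOM),
                           ("synonymous", SYN_TOP, SYN_BOTTOM)):
    nt = percent_identity(top, bottom)
    aa = percent_identity(translate(top), translate(bottom))
    print(f"{label}: nucleotide {nt:.1f}%, amino acid {aa:.1f}%")
```

In this toy case the frameshifted pair shares about 95% of its nucleotides but none of its amino acids, whereas the synonymous pair shares about 57% of its nucleotides and all of its amino acids, which is the same qualitative pattern as panels (a) and (b).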
Table 1. Sequence homologies between nucleotide coding regions and encoded proteins of Q-CMV RNA 2 and Fny-CMV RNA 2*

                   Nucleotide position             Amino acid position             Homology (%)
Region             Q-CMV RNA 2   Fny-CMV RNA 2     Q-CMV RNA 2   Fny-CMV RNA 2     Nucleotide coding region   Protein sequence
Amino terminus     93-950        87-953            1-286         1-289             68                         64
Central            951-2030      954-2033          287-646       290-649           77                         89
Carboxy terminus   2031-2609     2034-2657         647-839       650-857           66                         56
*Homologies were calculated from three specific segments of the nucleotide coding regions and the encoded proteins shown in Figs 3 and 4.
tomato aspermy cucumovirus this stem is even longer, and in broad bean mottle bromovirus RNA it is absent (Joshi et al., 1983). Thus, while variation in stem D is not unexpected, it was necessary to rule out the possibility of a cloning artefact in the 3' end structure of Fny-CMV RNA 2, since reverse transcriptase has been reported to make errors when synthesizing cDNA from an RNA template (Battula & Loeb, 1974). Thus, two additional Fny-CMV RNA 2-specific cDNA clones (pFny250 and pFny251) from the Fny-CMV cDNA clone bank were identified by restriction enzyme analysis (results not shown). Since the inserts of pFny200, pFny250 and pFny251 were of different lengths, they were not sibling cDNA clones. The nucleotide sequence of a StuI-EcoRI cDNA fragment from pFny250 and pFny251 representing 0.4 kb of the 3' terminus of RNA 2 was determined, and revealed that the 3'-terminal sequences of all three cDNA clones were identical except in the primer region, i.e. in pFny250 and pFny251 two and seven 3'-terminal nucleotides, respectively, of the primer sequence were missing. Therefore, it is highly unlikely that the 3'-terminal sequence of Fny-CMV RNA 2 reported here is artefactual.
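The region boundaries in Table 1 can be cross-checked against the ORF lengths given in the Summary (2517 nucleotides for Q-CMV RNA 2 and 2571 for Fny-CMV RNA 2). The sketch below is only a consistency check, not part of the original analysis: it takes the first Q-CMV nucleotide range as 93-950 and the central Fny-CMV amino acid range as 290-649, which is the reading the arithmetic supports, and the names `TABLE_1`, `span` and `check_strain` are introduced here for illustration.

```python
# Region coordinates as read from Table 1: (nucleotide range, amino acid range).
# The readings 93-950 and 290-649 are assumptions of this sketch.
TABLE_1 = {
    "Q-CMV": {
        "Amino terminus": ((93, 950), (1, 286)),
        "Central": ((951, 2030), (287, 646)),
        "Carboxy terminus": ((2031, 2609), (647, 839)),
    },
    "Fny-CMV": {
        "Amino terminus": ((87, 953), (1, 289)),
        "Central": ((954, 2033), (290, 649)),
        "Carboxy terminus": ((2034, 2657), (650, 857)),
    },
}

# ORF lengths in nucleotides, as stated in the Summary.
ORF_LENGTH = {"Q-CMV": 2517, "Fny-CMV": 2571}


def span(start: int, end: int) -> int:
    """Length of an inclusive coordinate range."""
    return end - start + 1


def check_strain(strain: str) -> int:
    """Verify three nucleotides per amino acid in every region; return total length."""
    total = 0
    for region, (nt_range, aa_range) in TABLE_1[strain].items():
        nt_len, aa_len = span(*nt_range), span(*aa_range)
        if nt_len != 3 * aa_len:
            raise ValueError(f"{strain} {region}: {nt_len} nt vs {aa_len} codons")
        total += nt_len
    return total


for strain, expected in ORF_LENGTH.items():
    total = check_strain(strain)
    status = "matches" if total == expected else "does not match"
    print(f"{strain}: regions sum to {total} nt, which {status} the stated ORF length")
```

With these readings, every region contains exactly three nucleotides per amino acid and the three regions of each strain sum to the ORF length quoted in the Summary.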
Non-coding 5' terminus of Fny-CMV RNA 2
The non-coding 5'-terminal region of Fny-CMV RNA 2 contains 86 nucleotides, 69 of which are homologous to the 92 nucleotide, 5' non-coding region of Q-CMV RNA 2 (Fig. 3). Nucleotides at positions 8, 9, 20, 44, 48, 64, 65, 70 and 82 of Fny-CMV RNA 2 differ from the aligned sequences in Q-CMV RNA 2, but are identical to the corresponding nucleotides in Q-CMV RNA 1 (compare with Q-CMV RNA 2; Rezaian et al., 1985). This suggests that certain sequences are interchangeable between the 5' non-coding regions of CMV RNAs 1 and 2.
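As a quick arithmetic check on these figures, the sketch below computes the percentage homology for both possible denominators. It suggests that the 80% value quoted in the Summary is expressed relative to the 86-nucleotide Fny-CMV region; that interpretation, and the variable names, are assumptions made here rather than statements from the text.

```python
# Values stated in the text for the 5' non-coding regions of RNA 2.
FNY_5P_LENGTH = 86   # Fny-CMV RNA 2
Q_5P_LENGTH = 92     # Q-CMV RNA 2
HOMOLOGOUS = 69      # Fny-CMV nucleotides homologous to Q-CMV RNA 2

# Fny-CMV positions that differ from Q-CMV RNA 2 but match Q-CMV RNA 1.
MATCH_RNA1 = [8, 9, 20, 44, 48, 64, 65, 70, 82]


def percent(part: int, whole: int) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


relative_to_fny = percent(HOMOLOGOUS, FNY_5P_LENGTH)
relative_to_q = percent(HOMOLOGOUS, Q_5P_LENGTH)
non_homologous = FNY_5P_LENGTH - HOMOLOGOUS

print(f"Relative to the Fny-CMV region: {relative_to_fny:.1f}%")
print(f"Relative to the Q-CMV region:   {relative_to_q:.1f}%")
print(f"{len(MATCH_RNA1)} of the {non_homologous} non-homologous Fny-CMV "
      f"positions match Q-CMV RNA 1")
```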
[Fig. 6 drawings: secondary structure diagrams (a) to (d), with stem and loop structures labelled A to H. The diagram text is not recoverable from the extracted page.]
Fig. 6. Secondary structures of the 3' termini of Q-CMV RNA 2 and Fny-CMV RNA 2. Comparable stem and loop structures are labelled A to H. Configurational representations of the 3' ends of Q-CMV RNA 2 (a and b) and Fny-CMV RNA 2 (c and d) are shown; (a) and (c) predominate in the absence of magnesium ions, while in the presence of magnesium ions, (b) and (d) are formed by the base pairing of the lined sequences (Rietveld et al., 1983). Nucleotides comprising stem and loop structures F, G and H are not shown in (b) and (d). Structure (a) is adapted from Ahlquist et al. (1981) with the permission of the authors and the publishers (copyright held by Cell Press), and structure (b) is taken from Joshi et al. (1983) with the permission of the authors and the publishers (copyright held by IRL Press).
The 5'-terminal nucleotide sequences conserved between Q-CMV RNAs 1 and 2 that are complementary to sequences of the satellite RNA of CMV (Rezaian et al., 1985) are also conserved in Fny-CMV RNA 2. However, whereas the satellite RNA was unable to anneal to and form a complex in vitro with Q-CMV RNA 2 (Rezaian & Symons, 1986), various satellite RNAs of CMV can form such complexes in vitro with RNA 2 of Fny-CMV (Garcia-Arenal & Palukaitis, 1987). The implications of such interactions in terms of changes in either the level of replication of CMV RNAs, the gene expression of these RNAs, or the pathology associated with the presence of satellite RNAs remain obscure.
Limited variation in the central region
Table 1 shows that the central regions of the Q-CMV and Fny-CMV RNA 2 translation products have approximately 89% sequence homology. However, of the 38 mismatched amino acids in this region, 20 reflect conservative changes (Fig. 4b), and 12 of the remaining 18 amino
acid replacements also occur at the same position of the analogous proteins encoded by alfalfa mosaic virus, BMV and/or tobacco mosaic virus. Moreover, three of the remaining six amino acid replacements represent conservative changes with regard to one or more of the other plant viruses. Therefore, amino acid changes within the highly conserved central region do not occur at random, suggesting that a highly conserved function is associated with this domain and, thus, only certain amino acid substitutions are tolerated. The conserved amino acid sequence Gly-Asp-Asp associated with viral replicase proteins (Kamer & Argos, 1984) is located at positions 609 to 611 of the central region of the Fny-CMV RNA 2 translation product. These data suggest that within a virus group a large number of amino acid sequences must remain conserved, presumably to foster interactions with other virus-encoded and host proteins involved in the replication process. It will be of interest to determine whether such conserved central regions also exist in the Q-CMV and Fny-CMV RNA 1 translation products.
The authors thank Dr J. Owen for providing the Fny-CMV cDNA clone bank described here. This work was supported by grant no. 86-CRCR-1-1983 from the USDA CGO.
REFERENCES
AHLQUIST, P., DASGUPTA, R. & KAESBERG, P. (1981). Near identity of 3' RNA secondary structure in bromoviruses and cucumber mosaic virus. Cell 23, 183-189.
BATTULA, N. & LOEB, L. A. (1974). The infidelity of avian myeloblastosis virus deoxyribonucleic acid polymerase in polynucleotide replication. Journal of Biological Chemistry 249, 4086-4093.
BIRNBOIM, H. C. & DOLY, J. (1979). Rapid alkaline extraction procedure for screening recombinant plasmid DNA. Nucleic Acids Research 7, 1513-1523.
DAYHOFF, M. O., ECK, R. V. & PARK, C. M. (1972). A model of evolutionary change in proteins. In Atlas of Protein Sequence and Structure, vol. 5, pp. 89-99. Edited by M. O. Dayhoff. Washington, D.C.: National Biomedical Research Foundation.
DEVERGNE, J. C. & CARDIN, L. (1975). Relations sérologiques entre cucumoviruses (CMV, TAV, PSV). Annales de Phytopathologie 7, 255-276.
DOUINE, L., QUIOT, J. B., MARCHOUX, G. & ARCHANGE, P. (1979). Recensement d'espèces végétales sensibles au virus de la mosaïque du concombre (CMV). Étude bibliographique. Annales de Phytopathologie 11, 439-475.
EDWARDS, M. C., GONSALVES, D. & PROVVIDENTI, R. (1983). Genetic analysis of cucumber mosaic virus in relation to host resistance: location of determinants for pathogenicity to certain legumes and Lactuca saligna. Phytopathology 73, 269-273.
FEINBERG, A. P. & VOGELSTEIN, B. (1983). A technique for radiolabeling DNA restriction endonuclease fragments to high specific activity. Analytical Biochemistry 132, 6-13.
GARCIA-ARENAL, F. & PALUKAITIS, P. (1987). Interaction of CMV-satellite RNAs with RNAs of helper and nonhelper viruses. In Abstracts of the VII International Congress of Virology, p. 299. Ottawa: National Research Council.
GARCIA-ARENAL, F., ZAITLIN, M. & PALUKAITIS, P. (1987). Nucleotide sequence analysis of six satellite RNAs of cucumber mosaic virus: primary sequence and secondary structure alterations do not correlate with differences in pathogenicity. Virology 158, 339-347.
GONDA, T. J. & SYMONS, R. H. (1978). The use of hybridization analysis with complementary DNA to determine the RNA sequence homology between strains of plant viruses: its application to several strains of cucumoviruses. Virology 88, 361-370.
GOULD, A. R. & SYMONS, R. H. (1982). Cucumber mosaic virus RNA 3. Determination of the nucleotide sequence provides the amino acid sequences of protein 3A and viral coat protein. European Journal of Biochemistry 126, 217-226.
GUBLER, U. & HOFFMAN, B. J. (1983). A simple and very efficient method for generating cDNA libraries. Gene 25, 263-269.
HASELOFF, J. & SYMONS, R. H. (1981). Chrysanthemum stunt viroid: primary sequence and secondary structure. Nucleic Acids Research 9, 2741-2752.
JOSHI, R. L., JOSHI, S., CHAPEVILLE, F. & HAENNI, A. L. (1983). tRNA-like structures of plant viral RNAs: conformational requirements for adenylation and aminoacylation. EMBO Journal 2, 1123-1127.
KAMER, G. & ARGOS, P. (1984). Primary structural comparison of RNA-dependent RNA polymerases from plant, animal and bacterial viruses. Nucleic Acids Research 12, 7269-7282.
KAPER, J. M. & WATERWORTH, H. E. (1981). Cucumoviruses. In Handbook of Plant Virus Infections and Comparative Diagnosis, pp. 257-332. Edited by E. Kurstak. New York: Elsevier/North-Holland Biomedical Press.
KIM, S.-H. (1976). Three-dimensional structure of transfer RNA. Progress in Nucleic Acid Research and Molecular Biology 17, 181-216.
LISS, L. R. (1987). New M13 host: DH5αF' competent cells. Focus 9(3), 13. Gaithersburg: Bethesda Research Laboratories.
MANIATIS, T., FRITSCH, E. F. & SAMBROOK, J. (1982). Molecular Cloning: A Laboratory Manual. New York: Cold Spring Harbor Laboratory.
MESSING, J. (1979). A multipurpose cloning system based on the single-stranded DNA bacteriophage M13. Recombinant DNA Technical Bulletin. Bethesda: NIH Publication No. 79-99, 2(2), 43-48.
MESSING, J. (1983). New M13 vectors for cloning. Methods in Enzymology 101, 20-78.
MIZUSAWA, S., NISHIMURA, S. & SEELA, F. (1986). Improvement of the dideoxy chain termination method of DNA sequencing by use of deoxy-7-deazaguanosine triphosphate in place of dGTP. Nucleic Acids Research 14, 1319-1324.
NORRANDER, J., KEMPE, T. & MESSING, J. (1983). Construction of improved M13 vectors using oligonucleotide-directed mutagenesis. Gene 26, 101-106.
PALUKAITIS, P. (1984). Detection and characterization of subgenomic RNA in plant viruses. Methods in Virology 7, 259-317.
PALUKAITIS, P. (1986). Preparation and use of cDNA probes for detection of viral genomes. Methods in Enzymology 118, 723-742.
PALUKAITIS, P. & ZAITLIN, M. (1984). Satellite RNAs of cucumber mosaic virus: characterization of two new satellites. Virology 132, 426-435.
PEDEN, K. W. C. & SYMONS, R. H. (1973). Cucumber mosaic virus contains a functionally divided genome. Virology 53, 487-492.
PIAZZOLLA, P., DIAZ-RUIZ, J. R. & KAPER, J. M. (1979). Nucleic acid homologies of eighteen cucumber mosaic virus isolates determined by competition hybridization. Journal of General Virology 45, 361-369.
PONCZ, M., SOLOWIEJCZYK, D., BALLANTINE, M., SCHWARTZ, E. & SURREY, S. (1982). "Nonrandom" DNA sequence analysis in bacteriophage M13 by the dideoxy chain-termination method. Proceedings of the National Academy of Sciences, U.S.A. 79, 4298-4302.
RAO, A. L. N. & FRANCKI, R. I. B. (1982). Distribution of determinants for symptom production and host range on the three RNA components of cucumber mosaic virus. Journal of General Virology 61, 197-205.
REZAIAN, M. A. & SYMONS, R. H. (1986). Anti-sense regions in satellite RNA of cucumber mosaic virus form stable complexes with the viral coat protein gene. Nucleic Acids Research 14, 3229-3239.
REZAIAN, M. A., WILLIAMS, R. H. V., GOULD, A. R. & SYMONS, R. H. (1984). Nucleotide sequence of cucumber-mosaic-virus RNA 2 reveals a translation product significantly homologous to corresponding proteins of other viruses. European Journal of Biochemistry 143, 277-284.
REZAIAN, M. A., WILLIAMS, R. H. V. & SYMONS, R. H. (1985). Nucleotide sequence of cucumber mosaic virus RNA 1. Presence of a sequence complementary to part of the viral satellite RNA and homologies with other viral RNAs. European Journal of Biochemistry 150, 331-339.
RIETVELD, K., PLEIJ, C. W. A. & BOSCH, L. (1983). Three-dimensional models of the tRNA-like 3' termini of some plant viral RNAs. EMBO Journal 2, 1079-1085.
SANGER, F., NICKLEN, S. & COULSON, A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences, U.S.A. 74, 5463-5467.
SANGER, F., COULSON, A. R., BARRELL, B. G., SMITH, A. J. H. & ROE, B. A. (1980). Cloning in single-stranded bacteriophage as an aid to rapid DNA sequencing. Journal of Molecular Biology 143, 161-178.
SCHWINGHAMER, M. W. & SYMONS, R. H. (1975). Fractionation of cucumber mosaic virus RNA and its translation in a wheat embryo cell-free system. Virology 63, 252-262.
TINOCO, I., UHLENBECK, O. C. & LEVINE, M. D. (1971). Estimation of secondary structure in ribonucleic acids. Nature, London 230, 362-367.
YANISCH-PERRON, C., VIEIRA, J. & MESSING, J. (1985). Improved M13 phage cloning vectors and host strains: nucleotide sequences of the M13 mp18 and pUC19 vectors. Gene 33, 103-119.
(Received 27 January 1988)