Journal of General Virology (1999), 80, 245–254. Printed in Great Britain
Comparative sequence analysis and predictions for the envelope glycoproteins of foamy viruses
George Wang² and Mark J. Mulligan¹,²
Departments of Medicine¹ and Microbiology², University of Alabama at Birmingham, 845 19th Street South, BBRB 220, Birmingham, AL 35294-2170, USA
The foamy viruses (FVs) are a genus of complex retroviruses that has recently been found to possess several novel molecular features. There is increasing interest in the development of FVs as novel vectors for gene delivery. As there are remarkably few published studies of FV proteins, these recent findings prompted us to predict the structural features of FV glycoproteins with the aid of computer programs. We analysed all seven available FV Env sequences, a greater number of sequences than in previously published analyses. The relative rates of change for FV structural proteins were Pol < Env < Gag in increasing order, which differs from the order in all other retroviruses. We determined that this difference is primarily caused by a higher relative rate of change for FV Gag proteins. We analysed the functional domains of FV glycoproteins and found that their structural organization was generally similar to that of other retroviruses. Putative structures were identified for the signal peptide, cleavage site, fusion peptide, membrane-spanning domain and the unique endoplasmic reticulum retrieval signal. Based on the predicted secondary structure of the transmembrane glycoprotein (TM) subunit, gp47, we also identified a unique prolonged central ‘sheets and loops’ region as the dominant feature of an unusually lengthy TM ectodomain. This lengthy central domain was flanked at each end by α-helices. The predictions reported here will stimulate and facilitate experimental approaches to better understand the structure and function of FV glycoproteins, and should assist in the planning and development of FV vectors.
Author for correspondence: Mark J. Mulligan. Fax +1 205 975 6027. e-mail mulligan@uab.edu
0001-5789 © 1999 SGM
Introduction
Foamy viruses (FVs) are ubiquitous in bovines, felines and non-human primates, but are not associated with any disease (Loh, 1993; Schweizer et al., 1995). Zoonotic infections of humans exposed to primates occur without any recognized clinical sequelae (Schweizer et al., 1995, 1997; Goepfert et al., 1996; Neumann-Haefelin & Schweizer, 1997; Weiss, 1998; Heneine et al., 1998). While they are the least studied genus of retroviruses, FVs have recently become the subject of increasing interest owing to several unique biological features and for their potential use as vectors for gene replacement therapy. For example, FV Pol proteins are translated from a spliced mRNA instead of as Gag–Pol precursors (Enssle et al., 1996; Löchelt & Flügel, 1996; Yu et al., 1996); nascent Gag proteins are transiently transported into the nucleus (Schliephake & Rethwilm, 1994); and the functional nucleic acid of FV extracellular particles is linear double-stranded DNA (Moebes et al., 1997). Relatively little has been published on the structure or functions of FV glycoproteins. We recently reported that FV glycoproteins possess an endoplasmic reticulum (ER) retrieval signal (Goepfert et al., 1995) that localized the human FV (HFV) glycoprotein to the ER (Goepfert et al., 1997), a novel finding relative to all other retroviral glycoproteins. In the present study, we used computer-based methods (Livingstone & Barton, 1996; Rost & Sander, 1993; Russell & Sternberg, 1995; Gatot et al., 1998) to perform updated sequence analyses and predictions based on the seven available FV glycoprotein sequences, a greater number than has previously been analysed, in comparison to glycoproteins from other virus families. The results obtained in this work will stimulate and guide future molecular studies of FV glycoproteins.
Methods
The protein sequences were obtained directly from SWISS-PROT or translated from the GenBank database. For the Env proteins of HFV (Flügel et al., 1987) and simian FV type 1 (SFV-1) (Mergia et al., 1990), we used the first translational start codon (Kupiec et al., 1991) instead of the second for two reasons: (1) the first ATG had a better Kozak context than the second (Kozak, 1989), and (2) the translated protein sequences were highly conserved. Sequence alignments and comparisons were made with programs in the Genetics Computer Group (GCG) package (Genetics Computer Group, 1997). Numbering of the sequences referred to the aligned amino acid sequence of FV Env including the signal peptide. Helical net (Lim, 1978) and wheel (Schiffer & Edmundson, 1967) analyses were made by using the Helnet and Wheel programs (Jones et al., 1992) run on VAX computers. Hydrophobicity indices (H.I.) were calculated with the normalized consensus scale of Eisenberg (1984). The signal peptide cleavage site prediction was made using SignalP, available on the Internet at http://www.cbs.dtu.dk/services/SignalP/ (Nielsen et al., 1997). The membrane-spanning domain, protein secondary structure and solvent accessibility predictions were made by using PredictProtein (PHDsec, PHDacc and PHDhtm), available on the Internet at http://www.embl-heidelberg.de/predictprotein/predictprotein.html (Rost & Sander, 1994a, b; Rost et al., 1995; Rost, 1996). Hydrophobic cluster analysis (HCA) is available on the Internet at http://www.lmcp.jussieu.fr/~soyer/www-hca/hca-form.html (Gaboriaud et al., 1987; Lemesle-Varloot et al., 1990; Callebaut et al., 1997). The sequence similarity search was performed using BLAST 2, available on the Internet at http://www.ncbi.nlm.nih.gov/BLAST (Altschul et al., 1997).
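As an illustration of the hydrophobicity index and helical-wheel quantities named above, the following is a minimal sketch and not the Helnet or Wheel programs themselves. It assumes the normalized consensus scale values as they are commonly tabulated for Eisenberg (1984), which should be checked against that reference, and an ideal α-helix with 100° of rotation per residue. The function names are hypothetical.

```python
import math

# Normalized consensus hydrophobicity scale, as commonly tabulated for
# Eisenberg (1984). Assumption: verify the values against the reference.
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def hydrophobicity_index(peptide):
    """Average hydrophobicity per residue (H.I.) of a peptide."""
    return sum(EISENBERG[aa] for aa in peptide) / len(peptide)


def hydrophobic_moment(peptide, angle_deg=100.0):
    """Mean hydrophobic moment per residue for an ideal helix.

    Each residue contributes a vector whose length is its hydrophobicity
    and whose direction advances by angle_deg around the helix axis.
    """
    delta = math.radians(angle_deg)
    sin_sum = sum(EISENBERG[aa] * math.sin(i * delta)
                  for i, aa in enumerate(peptide))
    cos_sum = sum(EISENBERG[aa] * math.cos(i * delta)
                  for i, aa in enumerate(peptide))
    return math.hypot(sin_sum, cos_sum) / len(peptide)
```

A peptide with a low overall H.I. can still have a large hydrophobic moment when its apolar residues cluster on one face of the helix, which is the situation the helical wheel is designed to reveal.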
Results
FV Gag proteins are more variable than FV Env glycoproteins
We began our analysis by searching the NCBI nr database (Non-redundant GenBank CDS translations+PDB+SWISS-PROT+PIR) with BLAST 2 (Altschul et al., 1997) and the SWISS-PROT database with FastA (Pearson & Lipman, 1990) for proteins homologous to the HFV glycoprotein (Flügel et al., 1987; Maurer et al., 1988). As expected, all the hits with statistically significant scores were FV glycoprotein sequences [the highest E value (least significant) was 2×10⁻²⁶, which was for a partial SFV-1 Env protein, with identity at 59 of 93 residues]. The FV glycoproteins identified were from SFV of chimpanzees (SFVcpz) (Herchenröder et al., 1994), SFV-1 (Mergia et al., 1990), SFV type 3 (SFV-3) (Renne et al., 1992), feline FV (FeFV) (Winkler et al., 1997), feline syncytial virus (FeSV) (C. R. Helps & D. A. Harbour, accession no. U85043) and bovine syncytial virus (BSV) (Renshaw & Casey, 1994; Holzschu et al., 1998). Among these seven protein sequences, the highest percentage identity was 90·1 (between HFV and SFVcpz), the lowest was 40·4 (between SFV-3 and FeFV), and the average was 52·9 ± 16·0. We created a phylogenetic tree (Fig. 1), which prompted us to also perform searches for proteins homologous to either FeFV or BSV glycoproteins, which had greater evolutionary distance from HFV. Similar results were obtained (not shown). This suggested that FVs are highly divergent from other virus families. Previous analyses of Pol protein sequences had suggested relatedness between FV and murine leukaemia virus (MuLV) (Lewe & Flügel, 1990; Renne et al., 1992; Dias et al., 1996), but this was not observed with the glycoprotein sequences (Pearson, 1996).
Fig. 1. Evolutionary relationship of FV Env TM. The phylogenetic tree was generated by Growtree based on the evolutionary distances of the TM domain of the FV Env glycoproteins. The distances were generated by Distances, following alignments by Pileup. All three programs were part of the GCG package running on a UNIX operating system. This tree was identical to the one based on FV Pol protein sequences, but slightly different from those based on the Gag or entire Env proteins (data not shown). The sum of the horizontal distances from one strain to another represents their evolutionary distance. The vertical bars are for clarity only. The highest and lowest percentage identities of the FV Env TM were 98·6 and 49·0, respectively. The average ± standard deviation was 60·8 ± 15·7. Sources of the sequences are indicated in the text.
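The pairwise identity statistics quoted above (highest, lowest, average ± standard deviation) can be sketched as follows. This is an illustration only and not the GCG procedure: it assumes pre-aligned sequences of equal length with '-' as the gap character, counts only columns where both sequences have a residue, and uses the sample standard deviation. The function names and the toy input in the usage note are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def percent_identity(seq_a, seq_b, gap="-"):
    """Percentage of identical residues over columns without a gap."""
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def identity_summary(aligned):
    """Return (highest, lowest, mean, sample SD) over all sequence pairs.

    `aligned` maps a sequence name to its aligned sequence string.
    At least three sequences are needed for the standard deviation.
    """
    values = [
        percent_identity(aligned[p], aligned[q])
        for p, q in combinations(sorted(aligned), 2)
    ]
    return max(values), min(values), mean(values), stdev(values)
```

With seven sequences there are 21 pairwise comparisons, so a call such as `identity_summary({"a": "ACDE", "b": "ACDF", "c": "ACGF"})` summarizes three pairs in the same way.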
Table 1. Relative rates of change of Gag and Env proteins
Relative rate of change is defined as the ratio of the evolutionary distance (calculated amino acid substitutions per 100 amino acids) for the indicated protein to that of the Pol protein. Each result was an average of 8 to 22 determinations, presented with standard deviation. Evolutionary distances were generated by the Distances program of the GCG package, following alignment by Pileup and arbitrary trimming of gapped regions. The Distances program used the formula of Kimura (1983): distance = −ln(1 − D − 0·2D²), where S = exact matches/positions scored, and D = 1 − S.

        Pol    Gag          MA           CA           NC           Env
FV*     1·0    2·3 ± 0·2    2·5 ± 0·5    1·9 ± 0·2    3·0 ± 0·7    1·8 ± 0·1
MuLV    1·0    1·6 ± 0·1    2·6 ± 0·2    0·6 ± 0·1    0·7 ± 0·2    2·0 ± 0·1
PIV     1·0    1·2 ± 0·3    1·3 ± 0·4    0·8 ± 0·2    1·3 ± 0·3    2·0 ± 0·7

* Delimitation of the functional domains of FV Gag followed Löchelt & Flügel (1995).
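The distance formula and the normalization to Pol given in the table legend can be sketched as follows. This is a minimal illustration of the stated formula, not the GCG Distances program. The function names are hypothetical, and the distance is returned per site rather than per 100 residues, which does not change the ratio.

```python
import math


def kimura_distance(exact_matches, positions_scored):
    """Kimura (1983) protein distance: -ln(1 - D - 0.2 * D**2).

    S = exact matches / positions scored, and D = 1 - S.
    """
    s = exact_matches / positions_scored
    d = 1.0 - s
    argument = 1.0 - d - 0.2 * d * d
    if argument <= 0.0:
        # The correction is undefined once D exceeds roughly 0.85.
        raise ValueError("sequences too divergent for the Kimura correction")
    return -math.log(argument)


def relative_rate(distance_protein, distance_pol):
    """Relative rate of change: distance of a protein divided by that of Pol."""
    return distance_protein / distance_pol
```

For example, two aligned sequences matching at 90 of 100 scored positions give D = 0·1 and a distance of −ln(0·898), slightly more than the uncorrected 0·1 because the formula allows for multiple substitutions at one site.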
A general rule for retroviruses is that their Pol, Gag and Env proteins have increasing rates of change in that order (McClure et al., 1988). The results we obtained for MuLV and primate immunodeficiency viruses (PIVs) agreed with this rank order (Pol < Gag < Env) (Table 1). However, the relative rates of change for FVs were Pol < Env < Gag. The rate of change for Env proteins of FVs was 1·8, which was compatible with 2·0
Fig. 2. FV glycoproteins possess the general features of a retroviral glycoprotein. (A) The glycoprotein of the prototypic HFV was used for the hydrophobicity plot, which was generated by Pepplot of the GCG package. Both Kyte–Doolittle (window size = 9 residues) and Goldman (window size = 20 residues) methods were used. (B) A similarity plot of the seven aligned primate FV Env protein sequences was generated by the Plotsimilarity program of the GCG package. (C) Schematic diagram of the overall structure of the FV glycoprotein precursor (gp130). Residues in the hydrophobic domain of the signal peptide, the RXK/RR cleavage signal, the conserved hydrophobic residues in the potential fusion peptide, the charged residues in the membrane-spanning domain, and the unique, conserved ER retrieval signal at the glycoprotein C terminus are in bold. Arrows indicate the predicted cleavage sites for the signal peptide and the SU/TM junction. The fusion peptide shows a typical i, i+3/4, i+7 pattern which is characteristic of amphipathic α-helices (Livingstone & Barton, 1996). The conserved residues and predicted helical content of the membrane-spanning domain are shown. Abbreviations: PHDhtm, helix transmembrane domain prediction; H, predicted helical region; Relhtm, estimated reliability of the helical prediction on a 0 to 9 scale.
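A sliding-window hydropathy profile of the kind plotted in panel (A) can be sketched as follows. This is an illustration of the general technique and not the Pepplot program. It assumes the Kyte–Doolittle values as commonly tabulated and reports the mean over each full window at the window centre. The function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values, as commonly tabulated (assumption:
# verify against the original scale before quantitative use).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return (centre_position, mean_hydropathy) for every full window.

    Positions are 1-based. For an odd window the centre is exact; for an
    even window it is the residue just right of the midpoint.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    half = window // 2
    profile = []
    for start in range(len(scores) - window + 1):
        window_mean = sum(scores[start:start + window]) / window
        profile.append((start + half + 1, window_mean))
    return profile
```

Sustained runs of high window means, such as those expected over a signal peptide core or a membrane-spanning domain, appear as peaks in the resulting profile. A segment that is hydrophobic only on one helical face averages out and can be missed by this kind of plot.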
Fig. 3. (A) Helical net (Lim, 1978) and (B) helical wheel (Schiffer & Edmundson, 1967) analyses of the potential fusion peptide of HFV. Bulky apolar amino acids are shaded. (C) Analyses of the fusion peptides of selected enveloped viruses. Hydrophobicity indices (H.I.) were calculated with the normalized consensus scale of Eisenberg (1984). These data were generated by the Helnet and Wheel programs (Jones et al., 1992). Fusion peptide length was based on the region yielding the maximal hydrophobic moment for each sequence.
for MuLV and PIVs. However, the rate of change for Gag proteins of FVs was 2·3, much higher than the 1·6 for MuLV and 1·2 for PIV. If one assumes that the rates of change for all the retroviral Pol proteins were constant, then one can infer that it was the high variability of the Gag of FVs that made the rank order of change in FV structural proteins different from that of other retroviruses. Further analysis of the relative rates of change within the functional domains of the Gag proteins revealed that the MA of FV was comparable to that of MuLV, with both being higher than the MA of PIVs (Table 1). However, the rates of change in both CA and NC of FVs were much higher than those of either MuLV or PIV. The relative rates of change when normalized to the corresponding reverse transcriptases (RT) (data not shown) were similar to the results obtained here with normalization to the entire Pol, but had larger standard deviations in general.
FV Env glycoproteins possess the general features of retroviral glycoproteins
As seen in the hydrophobicity plot, two prominent hydrophobic regions were identified (Fig. 2a). Analogous to other retroviral Env glycoproteins, the first of these was the hydrophobic domain of the signal peptide and the second the membrane-spanning domain (Figs 2c and 4a). A conserved RXK/RR subtilisin-like protease cleavage site was identified, in a position appropriate to divide the glycoprotein into its mature surface (SU; gp80) and transmembrane (TM; gp47) subunits (Figs 2c and 4a). We recently performed site-directed mutagenesis of this site within the HFV env which confirmed that this sequence is the FV glycoprotein cleavage site (K. L. Shaw, A. Bansal & M. J. Mulligan, unpublished results). The signal peptide was predicted to be cleaved between cysteine and phenylalanine by the SignalP program (Fig. 2c) (Nielsen et al., 1997), which conformed to the −1, −3 (small and neutral) rule (Nothwehr & Gordon, 1990; Perlman & Halvorson, 1983; von Heijne, 1986).
Potential fusion peptide. A prominent feature of the TM subunits of retroviral Env glycoproteins is the fusion peptide (Gallaher, 1987). We identified a highly conserved segment downstream from the RXK/RR cleavage signal as a putative fusion peptide based on the criteria of White (1990, 1992) (Fig. 2b). This putative fusion peptide was close to the TM protein N terminus, although not immediately N-terminal as for primate lentiviruses (Fig. 2c). This region possessed a typical i, i+3/4, i+7 pattern (Fig. 2c) which indicated it had a high potential to form an amphipathic α-helix structure (Livingstone & Barton, 1996) (Fig. 3a, b). Interestingly, unlike other retroviruses, the FV fusion peptide did not appear as a prominent hydrophobic region in the hydrophobicity plot (Fig. 2a). This was due to its low overall average hydrophobicity per residue, or hydrophobicity index (H.I.), which was the lowest among the viral fusion peptides we analysed (Fig. 3c). However, in α-helix modelling, the fusion peptides of FVs have maintained a high H.I. at the hydrophobic face of the helix, with a score comparable to other viral fusion peptides (Fig. 3c).
Membrane-spanning domain (MSD). The second prominent hydrophobic domain near the C terminus was predicted to be the MSD by PHDhtm (Rost & Sander, 1995) (Fig. 2c). This region is similar to the MSDs of HIV and SIV, which are punctuated by positively charged residues both within and immediately C-terminal to their proposed MSDs. Using site-directed mutagenesis of HFV env, we recently confirmed experimentally that this putative MSD is necessary and sufficient for anchorage of the FV glycoprotein in cellular lipid bilayers (G. Wang & M. J. Mulligan, unpublished results). Taken together, these results indicated the topology of the FV glycoprotein is that of a type 1 membrane protein.
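The search for the RXK/RR cleavage signal discussed earlier in this section can be sketched as a simple pattern scan. This is an illustration only: it reads the motif as Arg, any residue, Lys or Arg, Arg, and the function name is hypothetical. A match marks a candidate site; whether cleavage occurs there is an experimental question.

```python
import re

# R-X-(K or R)-R, wrapped in a lookahead so that overlapping matches
# are also reported.
CLEAVAGE_MOTIF = re.compile(r"(?=(R.[KR]R))")


def find_cleavage_motifs(sequence):
    """Return (1-based start position, matched tetrapeptide) for each hit."""
    return [
        (match.start() + 1, match.group(1))
        for match in CLEAVAGE_MOTIF.finditer(sequence)
    ]
```

Running the scan on each sequence of a multiple alignment and comparing the aligned positions of the hits is one way to judge whether a candidate site is conserved.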
The unique central ‘sheets and loops’ region of FV TM proteins
The predictions for secondary structure and solvent accessibility for the TM (gp47) proteins of FVs were made with the Profile Neural Network System (Rost, 1996; Rost & Sander, 1994a, b), and were based on multiple alignment of the seven FV glycoprotein sequences made by the Pileup program of GCG. These predictions revealed a division into three structural regions within TM between the fusion peptide and the MSD: a long N-terminal α-helix region immediately downstream from the fusion peptide, a second C-terminal α-helix region just upstream from the MSD, and a lengthy central region which consisted of β-sheets and loops (Fig. 4a, b). A similar N-terminal α-helix domain was previously proposed to form the structural backbone of viral fusion glycoproteins (Bullough et al., 1994; Chambers et al., 1990; Gallaher et al., 1989; Wilson et al., 1981) whereas peptide homologues to the second α-helix domain of HIV showed a potent inhibitory effect on membrane fusion induced by the HIV-1 glycoprotein (Matthews et al., 1994). These two predicted α-helix domains in FVs appeared similar in position and length to the two α-helices observed in the solved structures of influenza virus HA2 and HIV-1 gp41 (Fig. 4b). However, the long central ‘sheets and loops’ domain was unique and distinguished the TM proteins of FVs from those of all other retroviruses. This unusually long central region increased the length of the extracellular domain of the FV TM to more than twice the length of that of any other retrovirus (Fig. 5).
Discussion
Recent reports have described distinctive biological features of FVs which distinguish this retroviral genus from all others (Enssle et al., 1996; Goepfert et al., 1995, 1997; Löchelt & Flügel, 1996; Schliephake & Rethwilm, 1994; Yu et al., 1996; Moebes et al., 1997). We found that the relative rates of change for FV structural proteins were Pol < Env < Gag. That result stands in contrast to Pol < Gag < Env observed with other retroviruses, e.g. for MuLV and PIV. Our results were based on analyses of seven FV sequences, and agreed with an earlier observation based on three FV sequences (Renne et al., 1992). This inversion of retroviral Env and Gag conservation could be caused either by a high conservation of FV Env, a low conservation of FV Gag, or both. Our results suggested that it was primarily caused by the low conservation of FV Gag. This conclusion was based on the rate of change calculations which assumed that retroviral Pol proteins were relatively constant compared to Gag or Env, and allowed a comparison of Env and Gag proteins with their rate of change normalized to their Pol proteins. The retroviral enzymes encoded by Pol play a vital role throughout the replication cycle of the retroviruses, which imposes the most stringent functional and structural constraints on Pol. Indeed, they were found to be the most conserved structural proteins of the retroviruses, and no obvious recombinational events were observed (McClure et al., 1988). The cause of the low conservation (or the high variability) of the FV Gag proteins is unknown. We speculated that this could be related to the separation of Gag and Pol translation in FVs, as opposed to translation of a Gag–Pol polyprotein in all the other retroviruses. In the latter case, co-translation of the structurally stringent Pol protein with Gag may impose constraints on Gag variation within the Gag–Pol polyprotein.
Mutations within Gag might affect Pol and/or Gag–Pol functions in the Gag–Pol polyprotein as was reported for HIV-1 (Huang et al., 1997). Despite their sequence divergence from other retroviruses, the FV glycoproteins were found to possess the general structural features observed in all retroviral glycoproteins. Another research group recently aligned five FV glycoproteins and made a similar observation (Holzschu et al., 1998). The putative FV fusion peptide is slightly internal relative to the glycoprotein cleavage site, similar to that of Rous sarcoma virus but different from most retroviruses. The FV fusion peptide had a high propensity to form α-helical structures. The amphipathicity of the fusion peptide α-helix was shown by the high hydrophobic moment as determined by the helical wheel analysis. Despite its low overall H.I. compared to other viral fusion peptides, the fusion peptide of FVs has maintained a high H.I. at the hydrophobic face in α-helix modelling. Among the fusion peptide parameters we analysed for the five viruses,
Fig. 4. (A) Prediction for secondary structure and solvent accessibility of the seven available FV TM glycoproteins. The predicted RXK/RR cleavage site between SU and TM is shown for clarity (bold). Abbreviations: AAF, amino acids numbered according to the aligned FV glycoprotein sequences; PHDsec, secondary structure predictions (H = helix, E = extended sheet, blank = loop); P 3 acc, predicted relative solvent accessibility (e = exposed, b = buried, blank = intermediate). (B) Schematic diagrams of the HA2 glycoprotein of influenza A virus and the TM glycoproteins of HIV-1 and FV. Empty boxes represent fusion peptides; hatched boxes, α-helices; solid boxes, membrane-spanning domains. Horizontal lines between hatched boxes represent ‘sheets and loops’ regions. Conserved cysteines (c) and conserved potential N-linked glycosylation sites ( ) are shown. For simplicity, cytoplasmic domains are not shown. Predictions were made with the Profile Neural Network System (Rost & Sander, 1993, 1994a, b). Twenty protein sequences were used to generate the diagram for influenza virus HA2; five were used for HIV-1; and seven were used for FV. The secondary structure predictions for influenza virus HA2 and for the HIV TM were consistent with their structures solved by crystallography (Bullough et al., 1994; Chan et al., 1997; Weissenhorn et al., 1997). Similar secondary structure predictions for FV TM were obtained with a second technique, hydrophobic cluster analysis (HCA) (data not shown) (Gatot et al., 1998). It is worth noting that potential disulfide bonds exist as a result of the seven conserved cysteines in the central ‘sheets and loops’ region of the FV TM.
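A per-residue prediction string using the codes in this legend (H, E, blank) can be collapsed into contiguous segments with a short routine such as the one below. This is a hypothetical helper for reading such output and not part of PredictProtein. It assumes one character per residue and treats any character other than H or E as loop.

```python
from itertools import groupby


def secondary_structure_segments(prediction):
    """Collapse a per-residue string into (state, start, end) segments.

    Positions are 1-based and inclusive. 'H' is reported as helix,
    'E' as sheet, and anything else (including a blank) as loop.
    """
    labels = {"H": "helix", "E": "sheet"}
    segments = []
    position = 1
    for state, run in groupby(prediction):
        length = len(list(run))
        label = labels.get(state, "loop")
        segments.append((label, position, position + length - 1))
        position += length
    return segments
```

Applied to a TM prediction, a long helix segment followed by many alternating sheet and loop segments and then a second helix would correspond to the three-region organization drawn in panel (B).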
Fig. 5. Comparison of the glycoproteins of several retroviruses and influenza A virus. Horizontal lines and boxes represent mature polypeptide chains (signal peptides not shown) with their lengths drawn to scale. SU proteins are shown to the left of the cleavage site (arrow), TM proteins to the right. Empty boxes represent fusion peptides; solid boxes represent membrane-spanning domains (MSD). The TM cytoplasmic domains are not drawn, but their lengths are indicated after the ‘+’ to the right of the solid boxes (MSD). The length of the TM protein extracellular domain is indicated in the column on the right. The HFV TM extracellular domain (376 aa) is more than twice the length of the others.
the high H.I. at the hydrophobic face was the only consensus we observed. It was reported that the hydrophobicity gradient and the oblique orientation of the fusion peptides were critical for their fusogenic activity (Brasseur et al., 1988, 1990). Mutagenesis that altered the theoretical angle of the fusion peptides decreased the fusion ability (Vonèche et al., 1992). Interestingly, fusion peptides had an unusual content of small amino acids, which may contribute to their conformational mobility to allow them to adopt multiple secondary structures (Callebaut et al., 1997; Durell et al., 1997). Mutagenesis studies of the FV putative fusion peptide are under way in our laboratory. It is worth noting that the ER retrieval signal (lysine at −3, lysine or arginine at −4 and/or −5 relative to the C terminus) is conserved for all seven FV glycoproteins (Jackson et al., 1990; Shin et al., 1991; Goepfert et al., 1995). Studies of the significance of the ER retrieval signal for the FVs are under way in our laboratory (Goepfert et al., 1997). Based on the predicted secondary structure of the FV TM protein (gp47), we identified an unusually prolonged central ‘sheets and loops’ region that distinguished the FV TM from that of all other retroviruses. The resulting lengthy TM protein extracellular domain must be responsible for the distinctive appearance of FVs under the electron microscope where the glycoproteins appear as very prominent, regularly spaced, long spikes. The biological function provided to FVs by this distinctive glycoprotein structure is not known. All retroviral RTs are highly conserved and therefore were assumed to share a common evolutionary ancestry. Sequence divergence among the less conserved Env glycoproteins of divergent retrovirus families has occurred presumably due to antigenic escape driven by their hosts’ immune systems or due to adaptation to the host cellular receptor.
Despite this sequence divergence, all retroviral TM glycoproteins share certain conserved features suggesting a common structural and functional organization (Coffin, 1986; Hunter & Swanstrom, 1990). The recently solved structures of the TM glycoproteins of MuLV (Fass & Kim, 1996) and HIV (Chan et al., 1997; Weissenhorn et al., 1997) revealed striking similarity to the previously solved influenza HA2 subunit (Bullough et al., 1994). It now seems likely that the TM proteins of all retroviruses share a similar three-dimensional core structure. This TM core structure is a trimer comprising N-terminal and C-terminal α-helices, similar to the α-helices we predicted for the TMs of FVs. Mutational analyses of specific domains within the unique TM proteins of FVs, particularly the lengthy, central ‘sheets and loops’ region and the flanking N- and C-terminal α-helices, will provide new insights into the structural features of retroviral glycoproteins.
We wish to thank Drs William R. Pearson (University of Virginia) and Elliot Lefkowitz for their advice. The GCG programs package was made available by the UAB AIDS Center which is supported by U.S. Public Health Service grant P30 AI27767. We thank Drs Ewan Tytler and Jere Segrest for the Helnet and Wheel programs, and Alesia L. Hatten and Catherine Hardin for typing the manuscript. Sources of support: U.S. Public Health Service grants AI33784 and AI28147; and Cystic Fibrosis Foundation Research Development Grant R464.
References Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. & Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST : a
new generation of protein database search programs. Nucleic Acids Research 25, 3389–3402. Brasseur, R., Cornet, B., Burny, A., Vandenbranden, M. & Ruysschert, J. M. (1988). Mode of insertion into a lipid membrane of the N-terminal
HIV gp41 peptide segment. AIDS Research and Human Retroviruses 4, 83–90. Brasseur, R., Vandenbranden, M., Cornet, B., Burny, A. & Ruysschert, J.-M. (1990). Orientation into the lipid bilayer of an asymmetric
amphipathic helical peptide located at the N-terminus of viral fusion proteins. Biochimica et Biophysica Acta 1029, 267–273. Bullough, P., Hughson, F., Skehel, J. & Wiley, D. (1994). Structure of influenza haemagglutinin at the pH of membrane fusion. Nature 371, 37–43. Callebaut, I., Labesse, G., Durand, P., Poupon, A., Canard, L., Chomilier, J., Henrissat, B. & Mornon, J. P. (1997). Deciphering protein
sequence information through hydrophobic cluster analysis (HCA) : current status and perspectives. Cellular and Molecular Life Sciences 53, 621–645. Chambers, P., Pringle, C. R. & Easton, A. J. (1990). Heptad repeat sequences are located adjacent to hydrophobic regions in several types of virus fusion glycoproteins. Journal of General Virology 71, 3075–3080. Chan, D. C., Fass, D., Berger, J. M. & Kim, P. S. (1997). Core structure of gp41 from the HIV envelope glycoprotein. Cell 89, 263–273. Coffin, J. (1986). Genetic variation in AIDS viruses. Cell 46, 1–4. Dias, H. W., Aboud, M. & Flu$ gel, R. M. (1996). Analysis of the phylogenetic placement of different spumaretroviral genes reveals complex pattern of foamy virus evolution. Virus Genes 11, 183–190. Durell, S. R., Martin, S., Ruysschaert, J.-M., Shai, Y. & Blumenthal, R. (1997). What studies of fusion peptides tell us about viral envelope
glycoprotein-mediated membrane fusion (Review). Molecular Membrane Biology 14, 97–112. Eisenberg, D. (1984). Three-dimensional structure of membrane and surface proteins. Annual Review of Biochemistry 53, 595–623. Enssle, J., Jordan, I., Maurer, B. & Rethwilm, A. (1996). Foamy virus reverse transcriptase is expressed independently from the Gag protein. Proceedings of the National Academy of Sciences, USA 93, 4137–4141. Fass, D. & Kim, P. S. (1996). Structure of Moloney murine virus envelope domain at 1n7 AH resolution. Nature Structural Biology 3, 365–369. Flu$ gel, R. M., Rethwilm, A., Maurer, B. & Darai, G. (1987). Nucleotide sequence analysis of the env gene and its flanking regions of the human spumaretrovirus reveals two novel genes. EMBO Journal 6, 2077–2084. Gaboriaud, C., Bissery, V., Benchetrit, T. & Mornon, J. P. (1987).
Hydrophobic cluster analysis : an efficient new way to compare and analyse amino acid sequences. FEBS Letters 224, 149–155. Gallaher, W. R. (1987). Detection of a fusion peptide sequence in the transmembrane protein of human immunodeficiency virus. Cell 50, 327–328. Gallaher, W., Ball, J., Garry, R. F., Griffin, M. C. & Montelaro, R. C. (1989). A general model for the transmemebrane protein of HIV and
other retroviruses. AIDS Research and Human Retroviruses 5, 431–440.
Foamy virus glycoproteins Gatot, J.-S., Callebaut, I., Gaboriaud, C., Mornon, J. P., Portetelle, D., Burny, A., Kerkhofs, P., Kettmann, R. & Willems, L. (1998). Con-
servative mutations in the immunosuppressive region of the bovine leukemia virus transmembrane protein affect fusion but not infectivity in vivo. Journal of Biological Chemistry 273, 12870–12880. Genetics Computer Group (1997). Program Manual for the Wisconsin Package, Version 9.1. 575 Science Drive, Madison, Wisconsin. Goepfert, P. A., Wang, G. & Mulligan, M. J. (1995). Identification of an ER retrieval signal in a retroviral glycoprotein. Cell 82, 543–544. Goepfert, P. A., Ritter, G. D., Gbkaima, A., Zhang, Y., Hahn, B. H. & Mulligan, M. J. (1996). Analysis of West African hunters for foamy virus
infections. AIDS Research and Human Retroviruses 12, 1725–1730. Goepfert, P. A., Shaw, K. L. & Mulligan, M. J. (1997). A sorting motif localizes the foamy virus glycoprotein to the endoplasmic reticulum. Journal of Virology 71, 778–784. Heneine, W., Switzer, W. M., Sandstrom, P., Brown, J., Vedapuri, S., Schable, C. A., Khan, A. S., Lerch, N. W., Schweizer, M., NeumannHaefelin, D., Chapman, L. E. & Folks, T. M. (1998). Identification of a
human population infected with simian foamy viruses. Nature Medicine 4, 403–407. Herchenro$ der, O., Renne, R., Loncar, D., Cobb, E. K., Murthy, K. K., Schneider, J., Mergia, A. & Luciw, P. A. (1994). Isolation, cloning, and
sequencing of simian foamy viruses from chimpanzee (SFVcpz) : high homology to human foamy virus (HFV). Virology 201, 187–199. Holzschu, D. L., Delaney, M. A., Renshaw, R. W. & Casey, J. W. (1998).
The nucleotide sequence and spliced pol mRNA levels of the non-primate spumavirus bovine foamy virus. Journal of Virology 72, 2177–2182. Huang, Y., Khorchid, A., Wang, J., Parniak, M. A., Darlix, J. L., Wainberg, M. A. & Kleiman, L. (1997). Effect of mutations in the
nucleocapsid protein (NCp7) upon Pr160(gag–pol) and tRNA(Lys) incorporation into human immunodeficiency virus type 1. Journal of Virology 71, 4378–4384. Hunter, E. & Swanstrom, R. (1990). Retrovirus envelope glycoproteins. Current Topics in Microbiology and Immunology 157, 187–253. Jackson, M. R., Nilsson, T. & Peterson, P. A. (1990). Identification of a consensus motif for retention of transmembrane proteins in the endoplasmic reticulum. EMBO Journal 9, 3153–3162. Jones, M. K., Anantharamaiah, G. M. & Segrest, J. P. (1992). Computer programs to identify and classify amphipathic α-helical domains. Journal of Lipid Research 33, 287–296. Kimura, M. (1983). The Neutral Theory of Molecular Evolution. Cambridge : Cambridge University Press. Kozak, M. (1989). The scanning model for translation : an update. Journal of Cell Biology 108, 229–241. Kupiec, J.-J., Kay, A., Hayat, M., Ravier, R., Pe! rie' s, J. & Galibert, F. (1991). Sequence analysis of the simian foamy virus type 1 genome.
Gene 101, 185–194. Lemesle-Varloot, L., Henrissat, B., Gaboriaud, C., Bissery, V., Morgat, A. & Mornon, J. P. (1990). Hydrophobic cluster analysis: procedures to
derive structural and functional information from 2-D-representation of protein sequences. Biochimie 72, 555–574. Lewe, G. & Flügel, R. M. (1990). Comparative analysis of the retroviral Pol and Env protein sequences reveal different evolutionary trees. Virus Genes 3, 195–204. Lim, V. I. (1978). Polypeptide chain folding through a highly helical intermediate as a general principle of globular protein structure formation. FEBS Letters 89, 10–14. Livingstone, C. D. & Barton, G. J. (1996). Identification of functional
residues and secondary structure from protein multiple sequence alignment. Methods in Enzymology 266, 497–512. Löchelt, M. & Flügel, R. M. (1995). The molecular biology of human and primate spuma retroviruses. In The Retroviridae, vol. 4, pp. 239–292. Edited by J. A. Levy. New York: Plenum Press. Löchelt, M. & Flügel, R. M. (1996). The human foamy virus pol gene is expressed as a pro-Pol polyprotein and not as a Gag–Pol fusion protein. Journal of Virology 70, 1033–1040. Loh, P. C. (1993). Spumaviruses. In The Retroviridae, 2nd edn, pp. 361–397. Edited by J. A. Levy. New York: Plenum Press. McClure, M. A., Johnson, M. S., Feng, D. F. & Doolittle, R. F. (1988).
Sequence comparisons of retroviral proteins: relative rates of change and general phylogeny. Proceedings of the National Academy of Sciences, USA 85, 2469–2473. Matthews, T. J., Wild, C., Chen, C., Bolognesi, D. P. & Greenberg, M. L. (1994). Structural rearrangements in the transmembrane glycoprotein
after receptor binding. Immunological Reviews 140, 93–104. Maurer, B., Bannert, H., Darai, G. & Flügel, R. M. (1988). Analysis of
the primary structure of the long terminal repeat and the gag and pol genes of the human spumaretrovirus. Journal of Virology 62, 1590–1597. Mergia, A., Shaw, K. E. S., Lackner, J. E. & Luciw, P. A. (1990).
Relationship of the env genes and the endonuclease domain of the pol genes of simian foamy virus type 1 and human foamy virus. Journal of Virology 64, 406–410. Moebes, A., Enssle, J., Bieniasz, P. D., Heinkelein, M., Lindemann, D., Bock, M., McClure, M. O. & Rethwilm, A. (1997). Human foamy virus
reverse transcription that occurs late in the viral replication cycle. Journal of Virology 71, 7305–7311. Neumann-Haefelin, D. & Schweizer, M. (1997). Nonhuman primate spumavirus infections among persons with occupational exposure. Morbidity and Mortality Weekly Report 46, 129–131. Nielsen, H., Engelbrecht, J., Brunak, S. & von Heijne, G. (1997).
Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Engineering 10, 1–6. Nothwehr, S. T. & Gordon, J. I. (1990). Targeting of proteins into the eukaryotic secretory pathway: signal peptide structure/function relationships. Bioessays 12, 479–484. Pearson, W. R. (1996). Effective protein sequence comparison. Methods in Enzymology 266, 227–258. Pearson, W. R. & Lipman, D. J. (1990). Improved tools for biological sequence analysis. Proceedings of the National Academy of Sciences, USA 85, 2444–2448. Perlman, D. & Halvorson, H. P. (1983). A putative signal peptidase recognition site and sequence in eukaryotic and prokaryotic signal peptides. Journal of Molecular Biology 167, 391–409. Renne, R., Friedl, E., Schweizer, M., Fleps, U., Turek, R. & Neumann-Haefelin, D. (1992). Genomic organization and expression of simian
foamy virus type 3 (SFV-3). Virology 186, 597–608. Renshaw, R. W. & Casey, J. W. (1994). Transcriptional mapping of the 3′ end of the bovine syncytial virus genome. Journal of Virology 68, 1021–1028. Rost, B. (1996). PHD: predicting one-dimensional protein structure by profile-based neural network. Methods in Enzymology 266, 525–539. Rost, B. & Sander, C. (1993). Prediction of protein structure at better than 70% accuracy. Journal of Molecular Biology 232, 584–599. Rost, B. & Sander, C. (1994a). Combining evolutionary information and neural networks to predict protein secondary structure. Proteins 19, 55–72.
Rost, B. & Sander, C. (1994b). Conservation and prediction of solvent
accessibility in protein families. Proteins 20, 216–226. Rost, B., Casadio, R., Fariselli, P. & Sander, C. (1995). Transmembrane
helices predicted at 95% accuracy. Protein Science 4, 521–533. Russell, R. B. & Sternberg, M. J. E. (1995). How good are we? Current Biology 5, 488–490. Schiffer, M. & Edmundson, A. B. (1967). Use of helical wheels to represent the structures of proteins and to identify segments with helical potential. Biophysical Journal 7, 121–135. Schliephake, A. W. & Rethwilm, A. (1994). Nuclear localization of foamy virus Gag precursor protein. Journal of Virology 68, 4946–4954. Schweizer, M., Turek, R., Hahn, H., Schliephake, A., Netzer, K.-O., Eder, G., Reinhardt, M., Rethwilm, A. & Neumann-Haefelin, D. (1995).
Markers of foamy virus infections in monkeys, apes, and accidentally infected humans: appropriate testing fails to confirm suspected foamy virus seroprevalence in humans. AIDS Research and Human Retroviruses 11, 161–170. Schweizer, M., Falcone, V., Gänge, J., Turek, R. & Neumann-Haefelin, D. (1997). Simian foamy virus isolated from an accidentally infected
human individual. Journal of Virology 71, 4821–4824.
Shin, J., Dunbrack, R. L., Jr, Lee, S. & Strominger, J. L. (1991). Signals for retention of transmembrane proteins in the endoplasmic reticulum studied with CD4 truncation mutants. Proceedings of the National Academy of Sciences, USA 88, 1918–1922.
Vonèche, V., Portetelle, D., Kettmann, R., Willems, L., Limbach, K., Paoletti, E., Ruysschaert, J.-M., Burny, A. & Brasseur, R. (1992). Fusogenic segments of bovine leukemia virus and simian immunodeficiency virus are interchangeable and mediate fusion by means of oblique insertion in the lipid bilayer of their target cells. Proceedings of the National Academy of Sciences, USA 89, 3810–3814.
von Heijne, G. (1986). A new method for predicting signal sequence cleavage sites. Nucleic Acids Research 14, 4683–4690.
Weiss, R. A. (1998). Retroviral zoonoses. Nature Medicine 4, 391–392.
Weissenhorn, W., Dessen, A., Harrison, S. C., Skehel, J. J. & Wiley, D. C. (1997). Atomic structure of the ectodomain from HIV-1 gp41. Nature 387, 426–430.
White, J. M. (1990). Viral and cellular membrane fusion proteins. Annual Review of Physiology 52, 675–697.
White, J. M. (1992). Membrane fusion. Science 258, 917–924.
Wilson, I. A., Skehel, J. J. & Wiley, D. C. (1981). Structure of the haemagglutinin membrane glycoprotein of influenza virus at 3 Å resolution. Nature 289, 366–373.
Winkler, I., Bodem, J., Haas, L., Zemba, M., Delius, H., Flower, R., Flügel, R. & Löchelt, M. (1997). Characterization of the genome of feline foamy virus and its proteins shows distinct features different from those of primate spumaviruses. Journal of Virology 71, 6727–6741.
Yu, S. F., Baldwin, D. N., Gwynn, S. R., Yendapalli, S. & Linial, M. L. (1996). Human foamy virus replication: a pathway distinct from that of retroviruses and hepadnaviruses. Science 271, 1579–1582.
Received 29 June 1998; Accepted 31 August 1998