Lecture 2: Principles of Protein Structure: Amino Acids

Why study proteins?
Proteins underpin every aspect of biological activity. They are therefore targets for drug design and medicinal therapy, and for agriculture (novel insecticides, stress proteins). Understanding function, both in vitro (assays, interactions) and in a cellular sense (proteomics), is key to controlling activity. Understanding structure is key to designing new ligands (drugs) that control activity. Understanding protein structure and function has led to new drugs, and numerous proteins (and peptides) are used as therapies, e.g. insulin, relaxin.

1. Introduction: Genes Code for Proteins
The human genome consists of 3.2 × 10^9 bases but only about 25,000 genes, and only 1.5% of the genome codes for proteins. Genes mostly code for proteins; however, much of the noncoding part of the genome still has a functional role. The first protein structure to be solved was that of myoglobin (by X-ray crystallography in 1958, at atomic resolution in 1960).

The relationship between genes and proteins:
DNA (a nucleic acid) contains genes, which are made up of introns and exons.
=> Transcription into RNA (still a nucleic acid)
=> Translation into a protein (amino acids); the amino acid sequence defines the 3D structure of the protein.

Proteomics, the post-genome era: proteins are expressed at different times in different places.
● One gene can give rise to differently spliced mRNAs, and hence to different proteins.
● Proteins can be post-translationally modified: phosphorylated (signalling); glycosylated (extracellular protection, signalling); proteolytically cleaved (removal of trafficking sequences, e.g. to mitochondria); acylated (fatty acids; localisation, e.g. to membranes; regulation).
● Some proteins are produced only in one cell type or cell compartment (the brain has 15,000 expressed proteins, the gut 2,000).
Proteomics is the analysis of the complete complement of expressed proteins. The proteome is estimated to be an order of magnitude more complex than the genome.
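To make the DNA => RNA => protein flow above concrete, here is a minimal Python sketch. It assumes an already spliced coding sequence (no introns) given as the DNA coding strand, and it uses only a small subset of the standard codon table; the names `CODON_TABLE`, `transcribe` and `translate` are illustrative, not from the lecture.

```python
# Small subset of the standard genetic code (mRNA codon -> one-letter code).
# "*" marks a stop codon. Codons missing from this partial table raise KeyError.
CODON_TABLE = {
    "AUG": "M",                                      # methionine (start codon)
    "UUU": "F", "UUC": "F",                          # phenylalanine
    "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G",  # glycine
    "AAA": "K", "AAG": "K",                          # lysine
    "UGG": "W",                                      # tryptophan
    "GAU": "D", "GAC": "D",                          # aspartate
    "UAA": "*", "UAG": "*", "UGA": "*",              # stop codons
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding (sense) strand: T is replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon; return the protein."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon not in CODON_TABLE:
            raise KeyError(f"codon {codon} is not in this partial table")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGTTTGGCAAGTGGGATTAA"
mrna = transcribe(dna)
print(mrna)             # AUGUUUGGCAAGUGGGAUUAA
print(translate(mrna))  # MFGKWD (written N-terminus to C-terminus)
```

Real translation also involves splicing, start-site selection and the full 64-codon table; the sketch only shows the codon-by-codon mapping.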
Proteins have motion, which is functionally important. In a given protein sample, each molecule has approximately the same shape/structure. This means the molecules are ordered and can be crystallised (but not always). Proteins are thought of as soft matter and are flexible.

Important points:
● One gene does not always equal one protein.
● All cell types from an organism have the same DNA, but not necessarily the same protein content (expression varies by cell type and compartment).
● Proteins are examples of soft matter that are both flexible and yet can be highly structured.
● Proteins can be post-translationally modified in a variety of ways, which can change their function.

Protein structure can be considered at a number of levels:
Primary: the amino acid sequence, in order from N-terminus to C-terminus
Secondary: local regions of regular, ordered structure
Tertiary: the three-dimensional fold of a protein subunit
Quaternary: the organisation of subunits
2. The Amino Acids
You should know:
● the full name, one-letter and three-letter codes;
● the molecular structure of the amino acids (generic and side chain);
● the functional properties of the side chains.

The general structure:
● L-configuration (rather than D)
● Zwitterionic: the carboxylate and amine can be ionised; neutral (no net charge) at neutral pH
● There are 20 common amino acids, which are defined by the R group
● An amino acid residue (within a peptide chain) is -NH-CHR-CO-
● The L and D convention has nothing to do with the optical activity of the amino acids (whether they rotate plane-polarised light to the left or right). L-amino acids are so named because the configuration at their alpha carbon corresponds to that of L-glyceraldehyde.
Zwitterion:
Note: the fully uncharged form, NH2-CHR-COOH, essentially does not exist in aqueous solution; at any pH at least one of the two groups is charged.

Revision of pKa: the Henderson-Hasselbalch equation
$$\mathrm{pH} = \mathrm{p}K_a + \log_{10}\frac{[\mathrm{A}^-]}{[\mathrm{HA}]}$$
e.g. HAc ⇌ Ac⁻ + H⁺ (HAc = acetic acid, CH3COOH), with
$$K_a = \frac{[\mathrm{H}^+][\mathrm{Ac}^-]}{[\mathrm{HAc}]}$$
At 50% of each species, [A⁻] = [HA], so
$$\mathrm{pH} = \mathrm{p}K_a + \log_{10}(1) = \mathrm{p}K_a + 0 = \mathrm{p}K_a$$
For a carboxyl group:
● if pH = pKa, [COOH] = [COO⁻]
● if pH > pKa, [COO⁻] > [COOH]
● if pH < pKa, [COOH] > [COO⁻]

Amino acid naming conventions: alpha refers to the basic amino acid frame excluding the R group, i.e. the alpha carbon with its amine, carboxylate and hydrogen, (COOH)(H)(NH2)C-R. Moving out along the side chain, assign each group a letter of the Greek alphabet: beta, gamma, delta, epsilon, zeta, eta. At branch sites, begin subclassifying (e.g. gamma1, gamma2) and assign the heavier group the lower number.
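Rearranging the Henderson-Hasselbalch equation gives [A⁻]/[HA] = 10^(pH − pKa), so the deprotonated fraction of a group is 10^(pH − pKa) / (1 + 10^(pH − pKa)). A minimal Python sketch of that calculation; the function name is illustrative, and the acetic acid pKa of 4.76 is a commonly quoted value assumed here, not one given in this lecture.

```python
def fraction_deprotonated(ph: float, pka: float) -> float:
    """Fraction of a group present as A- (i.e. [A-] / ([HA] + [A-])).

    From pH = pKa + log10([A-]/[HA]) it follows that [A-]/[HA] = 10**(pH - pKa).
    """
    ratio = 10 ** (ph - pka)
    return ratio / (1 + ratio)


ACETIC_ACID_PKA = 4.76  # assumption: typical literature value for acetic acid

# One pH unit below, at, and one unit above the pKa:
# about 9%, exactly 50%, and about 91% deprotonated.
for ph in (3.76, 4.76, 5.76):
    print(ph, round(fraction_deprotonated(ph, ACETIC_ACID_PKA), 3))
```

Each pH unit away from the pKa shifts the [A⁻]/[HA] ratio tenfold, which is why a group is almost fully protonated or deprotonated about two units either side of its pKa.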
R groups of hydrophilic (charged, polar) amino acids (1): commonly found on the surface of proteins

| Amino acid | Side chain | Side-chain pKa | Freq |
|---|---|---|---|
| Aspartate; Asp; D | acidic (carboxylate) | 3.5 | 5.2% |
| Glutamate; Glu; E | acidic (carboxylate) | 4.5 | 6.2% |
| Lysine; Lys; K | basic | 10.5 | 5.9% |
| Arginine; Arg; R | basic | 12.5 | 5.1% |
| Histidine; His; H | basic | 6.0 | 2.3% |
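Combining the side-chain pKa values listed in this lecture (including Tyr, pKa 10, and Cys, pKa 8.5, given further below) with the Henderson-Hasselbalch equation gives a rough estimate of a peptide's net charge at a given pH. This is a sketch under stated assumptions: every group is treated as independent, the terminal pKa values (`N_TERM_PKA`, `C_TERM_PKA`) are typical textbook approximations not given in this lecture, and in a folded protein the local environment can shift pKa values considerably. All names are illustrative.

```python
# Side-chain pKa values as listed in this lecture.
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}           # +1 when protonated
NEGATIVE_PKA = {"D": 3.5, "E": 4.5, "C": 8.5, "Y": 10.0}  # -1 when deprotonated

# ASSUMPTION: not given in the lecture; approximate values for the free
# alpha-amine (N-terminus) and alpha-carboxyl (C-terminus) of a peptide.
N_TERM_PKA = 9.0
C_TERM_PKA = 2.0


def positive_charge(ph: float, pka: float) -> float:
    """Average charge of a basic group: +1 times its protonated fraction."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


def negative_charge(ph: float, pka: float) -> float:
    """Average charge of an acidic group: -1 times its deprotonated fraction."""
    return -1.0 / (1.0 + 10 ** (pka - ph))


def net_charge(sequence: str, ph: float) -> float:
    """Approximate net charge of a peptide (one-letter codes) at a given pH."""
    charge = positive_charge(ph, N_TERM_PKA) + negative_charge(ph, C_TERM_PKA)
    for residue in sequence.upper():
        if residue in POSITIVE_PKA:
            charge += positive_charge(ph, POSITIVE_PKA[residue])
        elif residue in NEGATIVE_PKA:
            charge += negative_charge(ph, NEGATIVE_PKA[residue])
    return charge


for ph in (3.0, 7.0, 11.0):
    print(ph, round(net_charge("DEKRH", ph), 2))
```

At pH 7 the result for "DEKRH" is close to zero: the two acidic and two strongly basic side chains roughly cancel, the termini cancel, and His (pKa 6.0) is mostly uncharged.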
R groups of hydrophilic (neutral, polar) amino acids (2): not typically ionisable in the pH range 2-14, but still hydrophilic

| Amino acid | Freq |
|---|---|
| Asparagine; Asn; N | 4.3% |
| Glutamine; Gln; Q | 4.1% |
| Serine; Ser; S | 6.9% |
| Threonine; Thr; T | 5.9% |
● Asn and Gln are related to Asp and Glu, but have an -NH2 in place of a carboxylate oxygen, so they are referred to as carboxamides.
● Ser, Thr (and Tyr) are sometimes phosphorylated; Ser and Thr differ by an added methyl group.
● Asn, Ser and Thr are sometimes glycosylated.
Post-translational modification (e.g. phosphorylation, glycosylation) involves the enzymatic addition of a chemical group:
● glycosylation: adding a carbohydrate;
● phosphorylation: adding a phosphate group, typically performed by enzymes called kinases. The gamma (terminal) phosphate group of ATP is transferred to the hydroxyl oxygen of serine or threonine (the hydroxyl hydrogen is lost), forming a covalent bond. This changes the chemical nature of the residue from polar/neutral to polar/negatively charged: a neutral polar residue becomes strongly negatively charged.
Phosphorylation is important for the regulation and amplification of many biological processes.
Hydrophobic (aliphatic) amino acids (3): all contain methyl groups; their sp3-hybridised carbons make the side chains bulky (they take up a lot of space)

| Amino acid | Notes | Freq |
|---|---|---|
| Alanine; Ala; A | | 7.7% |
| Valine; Val; V | | 6.6% |
| Leucine; Leu; L | | 8.5% |
| Isoleucine; Ile; I | | 5.3% |
| Methionine; Met; M | thioether | 2.4% |
V, L and I have branched side chains (referred to as the branched-chain amino acids) and are very important in metabolism (as a source of energy). Isoleucine is an isomer of leucine (same molecular formula and molecular weight). Methionine is a thioether (C-S-C).

Hydrophobic (aromatic) amino acids (3):
| Amino acid | Notes | Freq |
|---|---|---|
| Phenylalanine; Phe; F | | 4.0% |
| Tyrosine; Tyr; Y | ionisable, pKa 10 | 3.2% |
| Tryptophan; Trp; W | | 1.4% (rarest) |
Trp and Tyr absorb UV light at ~280 nm, which is used to quantitate proteins. Trp is also a useful fluorescence probe (ligand binding, protein folding, stability).
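Because absorbance at 280 nm comes mainly from Trp and Tyr (with a small contribution from disulphide-bonded cystine), a protein's molar extinction coefficient can be estimated from its sequence and then used with the Beer-Lambert law, A = εcl, to quantitate it. A minimal sketch, assuming the commonly quoted per-residue coefficients of Pace et al. (1995), which are not given in this lecture; the function names and the example sequence are illustrative.

```python
# ASSUMPTION: molar extinction coefficients at 280 nm (M^-1 cm^-1), commonly
# quoted values (Pace et al., 1995); not part of the lecture itself.
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125  # per disulphide bridge (one pair of oxidised cysteines)


def extinction_coefficient(sequence: str, n_disulphides: int = 0) -> int:
    """Estimate the molar extinction coefficient at 280 nm from the sequence."""
    seq = sequence.upper()
    return (seq.count("W") * EPS_TRP
            + seq.count("Y") * EPS_TYR
            + n_disulphides * EPS_CYSTINE)


def concentration_molar(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: A = epsilon * c * l, so c = A / (epsilon * l)."""
    return a280 / (epsilon * path_cm)


epsilon = extinction_coefficient("MWYYK")   # 1 Trp + 2 Tyr = 8480 M^-1 cm^-1
c = concentration_molar(0.424, epsilon)     # measured A280 in a 1 cm cuvette
print(epsilon)
print(f"{c * 1e6:.1f} uM")                  # 50.0 uM
```

A protein with no Trp or Tyr has almost no absorbance at 280 nm, so this method cannot be used for it.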
Amino acids (other) (4):

| Amino acid | Notes | Freq |
|---|---|---|
| Glycine; Gly; G | no R group (two alpha hydrogens), so not chiral | 7.4% |
| Proline; Pro; P | R group cyclises with the backbone (peptide) nitrogen; cyclic but not aromatic | 5.1% |
| Cysteine; Cys; C | ionisable, pKa 8.5 | 2.0% |
Two cysteines can be oxidised to form cystine, in which the two sulphur atoms are covalently linked by a disulphide bridge. The cytosol is a reducing environment, so cysteine in the cytosol is usually in its reduced (thiol) form.

| Name | Residue M.W. (Da) | Freq (%) | R-group function | Location | Special property |
|---|---|---|---|---|---|
| Alanine, Ala, A | 71 | 7.7 | hydrophobic | mixed | |
| Arginine, Arg, R | 157 | 5.1 | basic | surface | |
| Asparagine, Asn, N | 114 | 4.3 | polar | surface | glycosylation |
| Aspartate, Asp, D | 114 | 5.2 | acidic | surface | |
| Cysteine, Cys, C | 103 | 2.0 | thiol | disulphides | reduce or oxidise |
| Glutamate, Glu, E | 128 | 6.2 | acidic | surface | |
| Glutamine, Gln, Q | 128 | 4.1 | polar | surface | |
| Glycine, Gly, G | 57 | 7.4 | no R | mixed, loops | |
| Histidine, His, H | 137 | 2.3 | basic | active sites | pKa ~6.5, enzymes |
| Isoleucine, Ile, I | 113 | 5.3 | hydrophobic | interior | |
| Leucine, Leu, L | 113 | 8.5 | hydrophobic | interior | |
| Lysine, Lys, K | 129 | 5.9 | basic | surface | acetylation |
| Methionine, Met, M | 131 | 2.4 | hydrophobic | interior | |
| Phenylalanine, Phe, F | 147 | 4.0 | aromatic | interior | |
| Proline, Pro, P | 97 | 5.1 | cyclic | surface, loops | |
| Serine, Ser, S | 87 | 6.9 | polar | surface | phosphorylation |
| Threonine, Thr, T | 101 | 5.9 | polar | surface | phosphorylation |
| Tyrosine, Tyr, Y | 163 | 3.2 | aromatic | interior | absorb UV |
| Tryptophan, Trp, W | 186 | 1.4 | aromatic | interior | absorb UV, fluorescence |
| Valine, Val, V | 99 | 6.6 | hydrophobic | interior | |

Note: the mean residue weight is approximated as 110 Da.
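The ~110 Da figure can be checked against the table: weighting each residue M.W. by its frequency gives a mean of about 111 Da. The sketch below also shows the usual use of the mean residue weight, estimating a protein's mass from its length, and computes a peptide mass from the table's rounded residue weights plus one water (18 Da) for the free N- and C-termini. The data are copied from the table; the function names are illustrative.

```python
# (residue M.W. in Da, frequency in %) copied from the table above.
RESIDUES = {
    "A": (71, 7.7), "R": (157, 5.1), "N": (114, 4.3), "D": (114, 5.2),
    "C": (103, 2.0), "E": (128, 6.2), "Q": (128, 4.1), "G": (57, 7.4),
    "H": (137, 2.3), "I": (113, 5.3), "L": (113, 8.5), "K": (129, 5.9),
    "M": (131, 2.4), "F": (147, 4.0), "P": (97, 5.1), "S": (87, 6.9),
    "T": (101, 5.9), "Y": (163, 3.2), "W": (186, 1.4), "V": (99, 6.6),
}
WATER = 18  # H on the N-terminus plus OH on the C-terminus


def mean_residue_mass() -> float:
    """Frequency-weighted mean of the residue weights in the table."""
    total_freq = sum(freq for _, freq in RESIDUES.values())
    return sum(mass * freq for mass, freq in RESIDUES.values()) / total_freq


def peptide_mass(sequence: str) -> int:
    """Sum of residue weights (each -NH-CHR-CO-) plus one water for the termini."""
    return sum(RESIDUES[aa][0] for aa in sequence.upper()) + WATER


def estimated_mass(n_residues: int, mean_mass: float = 110) -> float:
    """Rule of thumb: protein mass is roughly length x mean residue weight."""
    return n_residues * mean_mass


print(round(mean_residue_mass(), 1))  # about 111 Da
print(peptide_mass("GG"))             # 132 (glycylglycine)
print(estimated_mass(300))            # 33000 Da, i.e. ~33 kDa
```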