Pairwise Alignment of Protein Interaction Networks by Mehmet Koyuturk, Yohan Kim, Umut Topkara, Shankar Subramaniam, Wojciech Szpankowski and Ananth Grama
Presented by: Anastacia Sulkin, 16/12/2015
Outline: Introduction, The Method, Duplication/Divergence Model, The Problem, Experimental Results, Conclusion
The Goal
• Discovery of conserved patterns in protein-protein interaction (PPI) networks.
Why?
• These networks provide the experimental basis for understanding the modular organization of cells, as well as useful information for predicting the biological function of individual proteins.
The Main Challenge
• It is hard to define a graph-theoretic measure of similarity between graph structures that accurately captures the underlying biological phenomena.
So How Will We Do That?
• By presenting a framework for comprehensive alignment of PPI networks.
• The framework is a mathematical model that extends the concepts of match, mismatch, and gap in sequence alignment to match, mismatch, and duplication in network alignment.
• It evaluates similarity between graph structures through a scoring function that accounts for evolutionary events.
Sequence Alignment
• A way of arranging sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.
• Given two (or more) sequences, we want to know to what extent they are similar.
How Do Sequences Change?
• Three types of changes
  • Substitution (point mutation)
  • Insertion (indel, replication slippage)
  • Deletion (indel, replication slippage)
[Figure: TCAGT changes to TCCGT (substitution), TCGAGT (insertion), or TCGT (deletion)]
Sequence Alignment
• How do we quantify sequence similarity?
• A score is given for each match, mismatch, and gap (indel).
• Many alignments are possible; we take the one with the best score.
TT-CGTCGTAGTCG-GC-TCGACC-TG
GTACGTC-TAG-CGAGCGT-GATCCT
• 17 matches: +34
• 2 mismatches: -2
• 8 indels: -8
• Total score: +24 (a strong match)
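The scoring scheme above can be sketched in a few lines of Python. The weights (+2 per match, -1 per mismatch, -1 per indel) are read off the slide's totals; the function name `alignment_score` and the use of `-` as the gap character are assumptions made for illustration, not part of the talk.

```python
def alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Score two already-aligned, equal-length strings column by column."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            score += gap        # indel column
        elif x == y:
            score += match      # identical residues
        else:
            score += mismatch   # substitution
    return score


# Tally reported on the slide: 17 matches, 2 mismatches, 8 indels.
slide_total = 17 * 2 + 2 * (-1) + 8 * (-1)

# Two small hand-checkable cases.
substitution_case = alignment_score("TCAGT", "TCCGT")  # 4 matches, 1 mismatch
deletion_case = alignment_score("TC-GT", "TCAGT")      # 4 matches, 1 indel
```

The function only scores a given alignment; finding the best-scoring alignment among all possible ones requires a search such as dynamic programming, which is outside this sketch.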
From Sequences to Graphs
• As in the case of sequences, key problems on graphs derived from biomolecular interactions include:
  • aligning multiple graphs
  • finding frequently occurring subgraphs in a collection of graphs
  • discovering highly conserved subgraphs in a pair of graphs
  • finding good matches for a subgraph in a database of graphs
Theoretical Models
• Several theoretical models have been developed based on the understanding of the structure of PPI networks.
• These models focus on understanding the evolution of protein interactions.
• One promising model is the duplication/divergence model.
The Method Proposed
• The method proposed for alignment of PPI networks is based on these evolutionary models.
• We will construct product graphs by matching pairs of orthologous nodes.
  • Orthologs: genes in different species that evolved from a common ancestral gene by speciation; they typically retain the same function.
• The edges will be weighted in order to reward or penalize evolutionary events.
• We reduce the resulting alignment problem to a graph-theoretic optimization problem.
• We propose efficient heuristics to solve the problem.
Some Insights
• Studies show:
  • PPI networks expand continuously through the addition of new nodes.
  • These nodes prefer to attach to well-connected nodes when joining the network (preferential attachment).
What is the Duplication/Divergence Model?
• A common model of evolution that explains preferential attachment.
• Based on gene duplication.
• According to this model, when a gene is duplicated in the genome, the node corresponding to the product of this gene is also duplicated together with its interactions.
Protein Duplication
• A protein loses many aspects of its functions rapidly after being duplicated.
• This translates to divergence of duplicated (paralogous) proteins through elimination and emergence of interactions.
• Elimination of an interaction in a PPI network implies the loss of an interaction between two proteins due to structural/functional changes.
• Emergence of an interaction in a PPI network implies the introduction of a new interaction between two non-interacting proteins, caused by mutations that change protein surfaces.
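The duplication and elimination steps described above can be illustrated with a toy simulation. This is a minimal sketch, assuming a network stored as a dict of neighbour sets; the function names, the loss probability, and the toy proteins are hypothetical, and emergence of new interactions is not modelled.

```python
import random


def duplicate(network, protein, copy_name):
    """Duplicate a node together with all of its interactions."""
    network[copy_name] = set(network[protein])
    for partner in network[copy_name]:
        network[partner].add(copy_name)


def diverge(network, protein, loss_prob, rng):
    """Eliminate each interaction of `protein` with probability loss_prob."""
    for partner in list(network[protein]):
        if rng.random() < loss_prob:
            network[protein].discard(partner)
            network[partner].discard(protein)


# Toy undirected network: protein -> set of interaction partners.
ppi = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
duplicate(ppi, "a", "a2")
after_duplication = {p: set(n) for p, n in ppi.items()}

# Divergence: the copy loses some of its inherited interactions.
diverge(ppi, "a2", loss_prob=0.5, rng=random.Random(0))
```

A well-connected protein is a neighbour of many nodes, so it gains a link whenever any of them is duplicated; this is how the model gives rise to preferential attachment.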
Duplication/Divergence Model
• In order to accurately identify and interpret conservation of interactions, complexes, and modules across species, we base our framework for the local alignment of PPI networks on duplication/divergence models.
• We evaluate mismatched interactions and paralogous proteins according to the model.
• By introducing the concepts of match (conservation), mismatch (emergence or elimination), and duplication, we are able to discover alignments that also allow speculation about the structure of the network in the common ancestor.
The PPI Network Alignment Problem
• A PPI network is modeled by an undirected graph G(U, E).
  • U denotes the set of proteins.
  • uu’ ∈ E denotes an interaction between proteins u ∈ U and u’ ∈ U.
• For pairwise alignment we are given two PPI networks:
  • G(U, E)
  • H(V, F)
The PPI Network Alignment Problem
• The homology between a pair of proteins is quantified by a function S.
  • For any u, v ∈ U ∪ V, S(u, v) measures the degree of confidence in u and v being orthologous.
  • If u and v belong to the same species, then S quantifies the likelihood that they are in-paralogs.
  • In-paralogs and out-paralogs are proteins that were duplicated after and before speciation, respectively.
• A protein subset pair P = {Ũ, Ṽ} is defined as a pair of protein subsets Ũ ⊆ U and Ṽ ⊆ V.
• Any protein subset pair P induces a local alignment A(G, H, S, P) = {M, N, D} of G and H with respect to S, characterized by a set of duplications D, a set of matches M, and a set of mismatches N.
Matches, Mismatches and Duplications
• Match
  • Corresponds to a conserved interaction between two orthologous protein pairs.
  • Rewarded by a match score that reflects our confidence in both protein pairs being orthologous.
• Mismatch
  • The lack of an interaction in the PPI network of one organism between a pair of proteins whose orthologs interact in the other organism.
  • May correspond to the emergence of a new interaction or the elimination of a previously existing interaction.
  • Penalized to account for the divergence from the common ancestor.
Matches, Mismatches and Duplications
• Duplication
  • The biological analog is the duplication of a gene in the course of evolution.
  • Associated with a score that reflects the divergence of function between the two proteins.
Local Alignment of PPI Networks: Formal Definition
• Given protein interaction networks G(U, E) and H(V, F), let the functions Δ_G(u, u’) and Δ_H(v, v’) denote the distance between two proteins in the interaction graphs G and H, respectively. Given a pairwise similarity function S defined over the union of their protein sets U ∪ V, and a distance cutoff Δ̄, any protein subset pair P = {Ũ, Ṽ} induces a local alignment A(G, H, S, P) = {M, N, D}, where:
[Formulas defining M, N, and D: not recovered]
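A possible reconstruction of the three sets, assuming the slide follows the definitions in the paper listed in the bibliography and writing $\bar{\Delta}$ for the distance cutoff. Treat it as a reading aid rather than the slide's exact content.

```latex
\begin{align*}
M &= \{\, u,u' \in \tilde{U},\ v,v' \in \tilde{V} :\ S(u,v) > 0,\ S(u',v') > 0,\\
  &\qquad (uu' \in E \wedge \Delta_H(v,v') \le \bar{\Delta}) \ \vee\ (vv' \in F \wedge \Delta_G(u,u') \le \bar{\Delta}) \,\} \\
N &= \{\, u,u' \in \tilde{U},\ v,v' \in \tilde{V} :\ S(u,v) > 0,\ S(u',v') > 0,\\
  &\qquad (uu' \in E \wedge \Delta_H(v,v') > \bar{\Delta}) \ \vee\ (vv' \in F \wedge \Delta_G(u,u') > \bar{\Delta}) \,\} \\
D &= \{\, u,u' \in \tilde{U} :\ S(u,u') > 0 \,\} \ \cup\ \{\, v,v' \in \tilde{V} :\ S(v,v') > 0 \,\}
\end{align*}
```

Under this reading, an interaction in one network is a match when the orthologous proteins in the other network are within the cutoff distance of each other, and a mismatch when they are farther apart.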
Scoring Match, Mismatch and Duplication
• For scoring matches and mismatches, we define the similarity between two protein pairs as follows:
  • S(uu’, vv’) = S(u, v) S(u’, v’)
  • It quantifies the likelihood that the interactions between u and u’, and between v and v’, are orthologous.
• Match score:
  • μ(uu’, vv’) = μ̄ S(uu’, vv’)
  • μ̄ is the match coefficient.
• Mismatch score:
  • ϑ(uu’, vv’) = −ϑ̄ S(uu’, vv’)
  • ϑ̄ is the mismatch coefficient.
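The match and mismatch scores can be written directly as functions of the pairwise similarity. A minimal sketch: the dictionary representation of S, the function and parameter names, and the example similarity values are assumptions for illustration; the default coefficients of 1.0 are the values reported later in the talk.

```python
def pair_similarity(S, u, u2, v, v2):
    """S(uu', vv') = S(u, v) * S(u', v'); missing pairs count as 0."""
    return S.get((u, v), 0.0) * S.get((u2, v2), 0.0)


def match_score(S, u, u2, v, v2, match_coeff=1.0):
    """Reward for a conserved interaction uu' / vv'."""
    return match_coeff * pair_similarity(S, u, u2, v, v2)


def mismatch_score(S, u, u2, v, v2, mismatch_coeff=1.0):
    """Penalty (a negative number) for an interaction present in one network only."""
    return -mismatch_coeff * pair_similarity(S, u, u2, v, v2)


# Hypothetical similarities between proteins of G (u*) and H (v*).
S = {("u1", "v1"): 0.9, ("u2", "v2"): 0.5}
reward = match_score(S, "u1", "u2", "v1", "v2")
penalty = mismatch_score(S, "u1", "u2", "v1", "v2")
```

Because both scores scale with S(uu’, vv’), a conserved or missing interaction matters more when both protein pairs are confidently orthologous.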
Scoring Match, Mismatch and Duplication
• Duplication score:
  • δ(u, u’) = δ̄ (S(u, u’) − d)
  • δ̄ is the duplication coefficient and d is the cut-off for being considered in-paralogs.
  • If S(u, u’) > d, suggesting that u and u’ are likely to be in-paralogs, the duplication is rewarded by a positive score.
  • If S(u, u’) < d, the proteins are considered out-paralogs and the duplication is penalized.
• Duplicated proteins rapidly lose their interactions; in-paralogs, which were duplicated more recently, are therefore likely to share more interacting partners than out-paralogs.
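The duplication score is a one-line function of the similarity between two proteins of the same species. In this sketch the default coefficient 0.1 is the value reported later in the talk, while the cut-off value 0.5 used in the examples is purely hypothetical.

```python
def duplication_score(similarity, cutoff, dup_coeff=0.1):
    """delta(u, u') = dup_coeff * (S(u, u') - d)."""
    return dup_coeff * (similarity - cutoff)


# With a hypothetical cut-off d = 0.5:
likely_in_paralogs = duplication_score(0.8, cutoff=0.5)   # above the cut-off
likely_out_paralogs = duplication_score(0.2, cutoff=0.5)  # below the cut-off
```

The sign of the result tells whether the duplication is rewarded (similarity above the cut-off) or penalized (similarity below it).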
Alignment Score and the Optimization Problem
• Given PPI networks G and H, the score of alignment A(G, H, S, P) = {M, N, D} is defined as
[Formula: alignment score σ(A(G, H, S, P)), not recovered]
• The goal is to find all maximal protein subset pairs P such that σ(A(G, H, S, P)) is locally maximal.
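The score formula itself is missing from the extracted slide. A natural form, assuming the score simply adds up the match, mismatch, and duplication scores defined on the previous slides, would be:

```latex
\sigma\bigl(A(G,H,S,P)\bigr)
  = \sum_{(uu',\,vv') \in M} \mu(uu',vv')
  + \sum_{(uu',\,vv') \in N} \vartheta(uu',vv')
  + \sum_{(u,\,u') \in D} \delta(u,u')
```

Since the mismatch score is already defined as a negative quantity, it enters the sum with a plus sign. The duplication terms can be positive or negative depending on whether the pair lies above or below the in-paralog cut-off.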
Example
[Figure]
Estimation of Similarity Scores
• Reminder: the similarity score S(u, v) quantifies the likelihood that proteins u and v are orthologous.
• O is the set of all orthologous protein pairs.
• E(u, v) is the BLAST E-value for proteins u and v.
• O_uv represents the event that u and v are orthologous.
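The estimator itself is not present in the extracted text. One reading that is consistent with the notation listed above, assuming the similarity is the posterior probability of orthology given the observed BLAST E-value, is Bayes' rule:

```latex
S(u,v) \;=\; P\bigl(O_{uv} \mid E(u,v)\bigr)
       \;=\; \frac{P\bigl(E(u,v) \mid O_{uv}\bigr)\, P(O_{uv})}{P\bigl(E(u,v)\bigr)}
```

How each factor is estimated from the set O of known orthologous pairs is not recoverable from the slide and is left open here.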
Alignment Graph
• The information regarding two PPI networks can be represented using a single alignment graph.
• By assigning appropriate weights to the edges, the local alignment problem can be reduced to an optimization problem.
• All evolutionary information is encoded into edge weights through the concepts of matches, mismatches and duplications.
Alignment Graph – Formal Definition
• For a pair of PPI networks G(U, E), H(V, F), and a protein similarity function S, the corresponding weighted alignment graph 𝐆(𝐕, 𝐄) is computed as follows:
• There is a node 𝐯 = {u, v} ∈ 𝐕 for each pair of orthologous proteins u ∈ U, v ∈ V.
• The weight of each edge 𝐯𝐯’ ∈ 𝐄, where 𝐯 = {u, v} and 𝐯’ = {u’, v’}, is:
[Formula: edge weight w(𝐯𝐯’), not recovered]
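Since the edge-weight formula is missing, the following is only a simplified sketch of how such a graph could be built, not the construction from the talk. Its assumptions: only direct interactions are considered (no distance cutoff); a node is created for every cross-species pair with positive similarity; an edge weight is the sum of whichever match, mismatch, and duplication scores apply; and two nodes that share a protein receive only the duplication term. All names and the cut-off `d=0.5` are hypothetical.

```python
from itertools import combinations


def build_alignment_graph(U, V, E, F, S, match_coeff=1.0,
                          mismatch_coeff=1.0, dup_coeff=0.1, d=0.5):
    """Return (nodes, weights) of a simplified alignment graph.

    E and F hold interactions as frozenset({a, b}); S maps protein pairs
    to similarities, with missing pairs treated as 0.
    """
    def sim(a, b):
        return S.get((a, b), S.get((b, a), 0.0))

    def dup(a, b):
        # Duplication term for two distinct, similar proteins of one species.
        s = sim(a, b)
        return dup_coeff * (s - d) if a != b and s > 0 else 0.0

    nodes = sorted((u, v) for u in U for v in V if sim(u, v) > 0)
    weights = {}
    for (u, v), (u2, v2) in combinations(nodes, 2):
        w = 0.0
        if u != u2 and v != v2:
            s = sim(u, v) * sim(u2, v2)
            in_G = frozenset((u, u2)) in E
            in_H = frozenset((v, v2)) in F
            if in_G and in_H:
                w += match_coeff * s      # interaction conserved in both
            elif in_G or in_H:
                w -= mismatch_coeff * s   # interaction present in one only
        w += dup(u, u2) + dup(v, v2)
        if w != 0.0:
            weights[((u, v), (u2, v2))] = w
    return nodes, weights


# Toy input: u2 and u3 are paralogs in G, both similar to v2 in H.
U, V = {"u1", "u2", "u3"}, {"v1", "v2"}
E = {frozenset(("u1", "u2"))}
F = {frozenset(("v1", "v2"))}
S = {("u1", "v1"): 1.0, ("u2", "v2"): 0.5, ("u3", "v2"): 0.5, ("u2", "u3"): 0.9}
nodes, weights = build_alignment_graph(U, V, E, F, S)
```

In the toy input, the edge between (u1, v1) and (u2, v2) is a match, the edge between (u1, v1) and (u3, v2) is a mismatch, and the edge between (u2, v2) and (u3, v2) carries only a duplication term.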
Alignment Graph – Example
[Figure]
Maximum Weight Induced Subgraph Problem (MaWISh)
• Given a graph G(V, E) and a constant ε, find a subset of nodes Ṽ ⊆ V such that the sum of the weights of the edges in the subgraph induced by Ṽ is at least ε, i.e., $W(\tilde{V}) = \sum_{v,v' \in \tilde{V}} w(vv') \geq \epsilon$.
• This problem is equivalent to the decision version of the local alignment problem defined previously!
• Or formally:
[Formula: not recovered]
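The quantity W(Ṽ) in the problem statement is straightforward to compute once edge weights are stored per node pair. A small sketch, assuming weights are kept in a dict keyed by node pairs; the function names and the toy weights are illustrative.

```python
def induced_weight(weights, subset):
    """W(subset): total weight of edges with both endpoints in `subset`."""
    subset = set(subset)
    return sum(w for (a, b), w in weights.items()
               if a in subset and b in subset)


def meets_threshold(weights, subset, eps):
    """Decision version: does the induced subgraph weigh at least eps?"""
    return induced_weight(weights, subset) >= eps


# Toy weighted graph with one negative (mismatch-like) edge.
toy_weights = {("x", "y"): 0.5, ("x", "z"): -0.5, ("y", "z"): 0.25}
pair_weight = induced_weight(toy_weights, {"x", "y"})
full_weight = induced_weight(toy_weights, {"x", "y", "z"})
```

Checking a given subset is cheap; the hard part is searching over all subsets, which is where the NP-completeness discussed next comes from.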
Maximum Weight Induced Subgraph Problem (MaWISh)
• Problem: MaWISh is NP-complete!
• Solution: Locally optimal solutions of MaWISh are sufficient for our needs.
• We will use fast heuristics to identify locally maximal heavy subgraphs in the alignment graph.
Some Insights
• In terms of protein-protein interactions, functional modules are likely to be densely connected while being separable from other modules.
• Reminder: a functional module is a set of proteins that take part in the same biological process.
• Analysis of conserved motifs reveals that proteins in highly connected motifs are more likely to be conserved.
• Proteins that belong to a conserved module will induce heavy subgraphs in the alignment graph, while being loosely connected to other parts of the graph.
The Algorithm
• We will use an iterative-improvement-based algorithm for finding a single conserved subgraph in the alignment graph.
• We will start from a subgraph seeded at heavy nodes and grow it greedily.
• We repeatedly swap or move the nodes with maximum gain. A move is performed even if its gain is negative, in order to climb out of poor local optima.
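The greedy growth phase can be sketched as follows. This is a simplified illustration, not the heuristic from the talk: it seeds at the node with the heaviest incident edges and only adds nodes while the gain is positive, so it omits node removal and the negative-gain moves used to escape poor local optima. The function name and the toy graph are hypothetical.

```python
def grow_heavy_subgraph(nodes, weights):
    """Greedy growth from the heaviest node; returns (subset, total weight)."""
    adj = {n: {} for n in nodes}
    for (a, b), w in weights.items():
        adj[a][b] = w
        adj[b][a] = w

    # Seed: the node whose incident edges carry the most weight.
    seed = max(nodes, key=lambda n: sum(adj[n].values()))
    subset, total = {seed}, 0.0

    while True:
        # Gain of adding n = weight of its edges into the current subset.
        gains = {n: sum(w for m, w in adj[n].items() if m in subset)
                 for n in nodes if n not in subset}
        if not gains:
            break
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break  # purely greedy stop: no remaining node improves the total
        subset.add(best)
        total += gains[best]
    return subset, total


# Toy alignment graph: a, b, c form a heavy triangle; d is penalized.
toy_nodes = ["a", "b", "c", "d"]
toy_weights = {("a", "b"): 1.0, ("a", "c"): 0.5,
               ("b", "c"): 0.5, ("c", "d"): -1.0}
found, found_weight = grow_heavy_subgraph(toy_nodes, toy_weights)
```

On the toy graph the procedure collects the triangle a, b, c and stops before d, whose only edge into the subgraph is negative.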
To Sum Up
• We formally defined a computational problem that captures the underlying biological phenomena using matches, mismatches and duplications.
• We then formulated PPI network alignment as a graph optimization problem.
• We proposed efficient heuristics to effectively solve the problem.
• We rank all subgraphs based on their significance and report the corresponding results.
Data And Implementation
• Implementation in C.
• Three commonly studied eukaryotic organisms:
  • S. cerevisiae (yeast)
  • C. elegans (nematode)
  • D. melanogaster (fruit fly)
• Fixed set of parameters: match coefficient μ̄ = 1.0, duplication coefficient δ̄ = 0.1, mismatch coefficient ϑ̄ = 1.0
Results
[Figures: two slides of experimental results]
Conclusion
• The implementation of the proposed framework is successful in uncovering conserved substructures in protein interaction data.
• Based on the results, pairwise alignment of PPI networks is established as a tool not only for identifying conserved modules, but also for assessing functional differences and similarities of homologous proteins based on shared and missing interactions.
• Alignment results provide a means for discovering new functional modules in relatively less-studied organisms, by mapping functions at a modular level rather than at the level of single-protein homologies.
Bibliography
• http://msb.embopress.org/content/9/1/652
• http://www.biochemj.org/content/409/1/27
• http://www.news.cornell.edu/stories/2013/01/scientists-find-holy-grail-evolving-modular-networks
• http://webcourse.cs.technion.ac.il/236523/Winter20152016/en/ho.html
• Pairwise Alignment of Protein Interaction Networks: http://compbio.case.edu/koyuturk/publications/ppi_alignment_jcb.pdf