Pairwise Alignment of Protein Interaction Networks
by Mehmet Koyuturk, Yohan Kim, Umut Topkara, Shankar Subramaniam, Wojciech Szpankowski, and Ananth Grama

Presented by: Anastacia Sulkin, 16/12/2015

Outline: Introduction, The Method, Duplication/Divergence Model, The Problem, Experimental Results, Conclusion

The Goal

• Discovery of conserved patterns in protein-protein interaction (PPI) networks.

Why?

• These networks provide the experimental basis for understanding the modular organization of cells, as well as useful information for predicting the biological function of individual proteins.

The Main Challenge

• It is hard to define a graph-theoretic measure of similarity between graph structures that accurately captures the underlying biological phenomena.

So How Will We Do That?

• By presenting a framework for comprehensive alignment of PPI networks.
• The framework rests on a mathematical model that extends the concepts of match, mismatch, and gap in sequence alignment to match, mismatch, and duplication in network alignment.
• The model evaluates similarity between graph structures through a scoring function that accounts for evolutionary events.

Sequence Alignment

• A way of arranging DNA, RNA, or protein sequences to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.
• Given two (or more) sequences, we want to know to what extent they are similar.

How Do Sequences Change?

• Three types of changes:
  • Substitution (point mutation)
  • Insertion
  • Deletion
• Insertions and deletions are jointly called indels (arising, for example, from replication slippage).
• Example: compared with TCAGT, the sequence TCCGT differs by a substitution, TCGAGT by an insertion, and TCGT by a deletion.

Sequence Alignment

• How do we quantify sequence similarity?
• A score is assigned to each match, mismatch, and gap (indel).
• Many alignments are possible; we take the one with the best score.

TT-CGTCGTAGTCG-GC-TCGACC-TG
GTACGTC-TAG-CGAGCGT-GATCCT-

17 matches     +34
2 mismatches    -2
8 indels        -8
Total score    +24 (a strong match)
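The tally for this alignment can be checked with a short script. This is a minimal sketch, assuming the per-column scores implied by the slide's totals (+2 per match, -1 per mismatch, -1 per indel). The function name `score_alignment` is illustrative, and the second row is written with a trailing gap so that both rows have the 27 columns that the stated counts require.

```python
def score_alignment(row_a, row_b, match=2, mismatch=-1, indel=-1):
    """Score a gapped pairwise alignment column by column."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    counts = {"match": 0, "mismatch": 0, "indel": 0}
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            counts["indel"] += 1      # gap in either row
        elif a == b:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    total = (counts["match"] * match
             + counts["mismatch"] * mismatch
             + counts["indel"] * indel)
    return total, counts


ROW_A = "TT-CGTCGTAGTCG-GC-TCGACC-TG"
ROW_B = "GTACGTC-TAG-CGAGCGT-GATCCT-"

total, counts = score_alignment(ROW_A, ROW_B)
print(total, counts)
```

Choosing the best alignment among all possible ones is a separate step (typically dynamic programming); the sketch only scores one given alignment.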

From Sequences to Graphs

• As with sequences, key problems on graphs derived from biomolecular interactions include:
  • aligning multiple graphs
  • finding frequently occurring subgraphs in a collection of graphs
  • discovering highly conserved subgraphs in a pair of graphs
  • finding good matches for a subgraph in a database of graphs

Theoretical Models

• Several theoretical models have been developed based on the understanding of the structure of PPI networks.
• These models focus on understanding the evolution of protein interactions.
• One promising model is the duplication/divergence model.

The Method Proposed

• The method proposed for alignment of PPI networks is based on these evolutionary models.
• We construct product graphs by matching pairs of orthologous nodes.
  • Orthologous: genes in different species that evolved from a common ancestral gene by speciation; they usually retain the same function.
• The edges are weighted to reward or penalize evolutionary events.
• We reduce the resulting alignment problem to a graph-theoretic optimization problem.
• We propose efficient heuristics to solve the problem.

Some Insights

• Studies show:
  • PPI networks expand continuously through the addition of new nodes.
  • New nodes prefer to attach to well-connected nodes when joining the network (preferential attachment).

What Is the Duplication/Divergence Model?

• A common model of evolution that explains preferential attachment.
• Based on gene duplication.
• According to this model, when a gene is duplicated in the genome, the node corresponding to the product of this gene is also duplicated, together with its interactions.

Protein Duplication

• A protein rapidly loses many aspects of its function after being duplicated.
• This translates to divergence of duplicated (paralogous) proteins through elimination and emergence of interactions.
• Elimination of an interaction in a PPI network implies the loss of an interaction between two proteins due to structural or functional changes.
• Emergence of an interaction in a PPI network implies the introduction of a new interaction between two non-interacting proteins, caused by mutations that change protein surfaces.
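One duplication step followed by divergence can be sketched on an adjacency-set graph. This is only an illustration under stated assumptions: the function name `duplicate_and_diverge` and the probabilities `p_eliminate` and `p_emerge` are hypothetical parameters introduced here, not values or names from the presentation.

```python
import random


def duplicate_and_diverge(adj, node, copy, p_eliminate, p_emerge, rng):
    """Duplicate `node` as `copy`, then let the copy's interactions diverge.

    `adj` maps each protein to the set of its interaction partners
    (undirected graph). The probabilities are illustrative assumptions.
    """
    adj[copy] = set()
    # Duplication: the copy inherits every interaction of the original.
    for partner in list(adj[node]):
        adj[copy].add(partner)
        adj[partner].add(copy)
    # Elimination: each inherited interaction may be lost.
    for partner in list(adj[copy]):
        if rng.random() < p_eliminate:
            adj[copy].discard(partner)
            adj[partner].discard(copy)
    # Emergence: the copy may gain interactions with non-partners.
    for other in list(adj):
        if other != copy and other not in adj[copy] and rng.random() < p_emerge:
            adj[copy].add(other)
            adj[other].add(copy)
    return adj


network = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
duplicate_and_diverge(network, "A", "A2", p_eliminate=0.3, p_emerge=0.05,
                      rng=random.Random(0))
print(sorted(network["A2"]))
```

Because the copy starts with all partners of the original, proteins that already have many interactions gain new neighbors more often, which is how the model gives rise to preferential attachment.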

Duplication/Divergence Model

• To accurately identify and interpret conservation of interactions, complexes, and modules across species, we base our framework for the local alignment of PPI networks on duplication/divergence models.
• We evaluate mismatched interactions and paralogous proteins according to the model.
• By introducing the concepts of match (conservation), mismatch (emergence or elimination), and duplication, we are able to discover alignments that also allow speculation about the structure of the network in the common ancestor.

The PPI Network Alignment Problem

• A PPI network is modeled by an undirected graph G(U, E).
  • U denotes the set of proteins.
  • uu’ ∈ E denotes an interaction between proteins u ∈ U and u’ ∈ U.
• For pairwise alignment we are given two PPI networks:
  • G(U, E)
  • H(V, F)

The PPI Network Alignment Problem

• The homology between a pair of proteins is quantified by a function S.
  • For any u, v ∈ U ∪ V, S(u, v) measures the degree of confidence in u and v being orthologous.
  • If u and v belong to the same species, then S quantifies the likelihood that they are in-paralogs.
  • In-paralogs and out-paralogs are proteins that were duplicated after and before speciation, respectively.
• A protein subset pair P = {Ũ, Ṽ} is defined as a pair of protein subsets Ũ ⊆ U and Ṽ ⊆ V.
• Any protein subset pair P induces a local alignment A(G, H, S, P) = {M, N, D} of G and H with respect to S, characterized by a set of matches M, a set of mismatches N, and a set of duplications D.

Matches, Mismatches and Duplications

• Match
  • Corresponds to a conserved interaction between two orthologous protein pairs.
  • Rewarded by a match score that reflects our confidence in both protein pairs being orthologous.
• Mismatch
  • The lack of an interaction in the PPI network of one organism between a pair of proteins whose orthologs interact in the other organism.
  • May correspond to the emergence of a new interaction or the elimination of a previously existing interaction.
  • Penalized to account for the divergence from the common ancestor.

Matches, Mismatches and Duplications

• Duplication
  • The biological analog is the duplication of a gene in the course of evolution.
  • Associated with a score that reflects the divergence of function between the two proteins.

Local Alignment of PPI Networks: Formal Definition

• Given protein interaction networks G(U, E) and H(V, F), let the functions ∆_G(u, u’) and ∆_H(v, v’) denote the distance between two proteins in the interaction graphs G and H, respectively.
• Given a pairwise similarity function S defined over the union of their protein sets U ∪ V and a distance cutoff ∆, any protein subset pair P = (Ũ, Ṽ) induces a local alignment A(G, H, S, P) = {M, N, D} (the set definitions that followed on the slide are not preserved).

Scoring Match, Mismatch, and Duplication

• For scoring matches and mismatches, we define the similarity between two protein pairs as follows:
  • S(uu’, vv’) = S(u, v) S(u’, v’)
  • It quantifies the likelihood that the interactions uu’ and vv’ are orthologous.
• Match score:
  • μ(uu’, vv’) = 𝜇 S(uu’, vv’)
  • 𝜇 is the match coefficient.
• Mismatch score:
  • 𝜗(uu’, vv’) = −𝜗 S(uu’, vv’)
  • On the right-hand side, 𝜗 is the mismatch coefficient.

Scoring Match, Mismatch, and Duplication

• Duplication score:
  • δ(u, u’) = 𝛿 (S(u, u’) − 𝑑)
  • 𝑑 is the cut-off for being considered in-paralogs.
  • If S(u, u’) > 𝑑, suggesting that u and u’ are likely to be in-paralogs, the duplication is rewarded by a positive score.
  • If S(u, u’) < 𝑑, on the other hand, the proteins are considered out-paralogs, and the duplication is therefore penalized.
• Duplicated proteins rapidly lose their interactions, so in-paralogs are likely to share more interacting partners than out-paralogs.
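The three scores can be written as small functions of the similarity function S. This is a minimal sketch: the function and parameter names are illustrative, the default coefficients (1.0, 1.0, 0.1) are the fixed values listed later under Data And Implementation, and the cut-off `d` and the toy similarity table are made-up values for demonstration only.

```python
def pair_similarity(S, u, u2, v, v2):
    """S(uu', vv') = S(u, v) * S(u', v')."""
    return S(u, v) * S(u2, v2)


def match_score(S, u, u2, v, v2, mu=1.0):
    """Reward for a conserved interaction between uu' and vv'."""
    return mu * pair_similarity(S, u, u2, v, v2)


def mismatch_score(S, u, u2, v, v2, theta=1.0):
    """Penalty (negative) for an interaction present in only one network."""
    return -theta * pair_similarity(S, u, u2, v, v2)


def duplication_score(S, u, u2, d, delta=0.1):
    """Positive if S(u, u') exceeds the in-paralog cut-off d, else negative."""
    return delta * (S(u, u2) - d)


# Toy similarity table (hypothetical values, symmetric lookup).
_SIM = {("u", "v"): 0.8, ("u2", "v2"): 0.5, ("u", "u2"): 0.9}


def toy_S(a, b):
    return _SIM.get((a, b), _SIM.get((b, a), 0.0))


print(match_score(toy_S, "u", "u2", "v", "v2"))
print(mismatch_score(toy_S, "u", "u2", "v", "v2"))
print(duplication_score(toy_S, "u", "u2", d=0.7))
```

Passing S as a callable keeps the scoring independent of how similarity is estimated.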

Alignment Score and the Optimization Problem

• Given PPI networks G and H, the score σ(A(G, H, S, P)) of alignment A(G, H, S, P) = {M, N, D} is defined by a formula that is not preserved here.
• The goal is to find all maximal protein subset pairs P such that σ(A(G, H, S, P)) is locally maximal.
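The defining formula is missing from this slide. One natural reading, offered here only as an assumption, is that the alignment score sums the per-element scores defined on the scoring slides (the mismatch score already carries its negative sign):

```latex
\sigma\bigl(A(G,H,S,P)\bigr)
  \;=\; \sum_{(uu',\,vv') \in M} \mu(uu', vv')
  \;+\; \sum_{(uu',\,vv') \in N} \vartheta(uu', vv')
  \;+\; \sum_{(u,\,u') \in D} \delta(u, u')
```

Under this reading, matches raise the score, mismatches lower it, and each duplication raises or lowers it depending on whether the pair looks like in-paralogs or out-paralogs.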

Example

(Figure not preserved.)

Estimation of Similarity Scores

• Reminder: the similarity score S(u, v) quantifies the likelihood that proteins u and v are orthologous.
• The estimate (formula not preserved) uses the following terms:
  • O is the set of all orthologous protein pairs.
  • E(u, v) is the BLAST E-value for proteins u and v.
  • O_uv represents the event that u and v are orthologous.

Alignment Graph

• The information regarding two PPI networks can be represented using a single alignment graph.
• By assigning appropriate weights to the edges, the local alignment problem can be reduced to an optimization problem.
• All evolutionary information is encoded into edge weights through the concepts of matches, mismatches, and duplications.

Alignment Graph – Formal Definition

• For a pair of PPI networks G(U, E) and H(V, F) and a protein similarity function S, the corresponding weighted alignment graph 𝐆(𝐕, 𝐄) is computed as follows:
  • There is a node for each pair of orthologous proteins.
  • Each edge 𝐯𝐯’ ∈ 𝐄, where 𝐯 = {u, v} and 𝐯’ = {u’, v’}, carries a weight (formula not preserved).

Alignment Graph – Example

(Figure not preserved.)

Maximum Weight Induced Subgraph Problem

• Given a graph G(V, E) and a constant ε, find a subset of nodes Ṽ ⊆ V such that the sum of the weights of the edges in the subgraph induced by Ṽ is at least ε, i.e., W(Ṽ) = Σ_{v,v′ ∈ Ṽ} w(vv′) ≥ ε.
• This problem is equivalent to the decision version of the local alignment problem defined previously.
• Or formally: (statement not preserved).
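The quantity being thresholded, the total edge weight of the subgraph induced by a node subset, is simple to compute. The sketch below assumes edge weights are stored in a dictionary keyed by `frozenset` node pairs; that representation and the names `induced_weight` and `meets_threshold` are choices made for illustration.

```python
from itertools import combinations


def induced_weight(weights, subset):
    """Sum the weights of all edges with both endpoints in `subset`.

    `weights` maps frozenset({a, b}) to the weight of edge ab;
    node pairs without an entry are treated as non-edges (weight 0).
    """
    total = 0.0
    for a, b in combinations(sorted(subset), 2):
        total += weights.get(frozenset((a, b)), 0.0)
    return total


def meets_threshold(weights, subset, eps):
    """Decision version: is the induced weight at least eps?"""
    return induced_weight(weights, subset) >= eps


edge_weights = {
    frozenset(("a", "b")): 2.0,
    frozenset(("b", "c")): -1.0,
    frozenset(("a", "c")): 0.5,
}
print(induced_weight(edge_weights, {"a", "b", "c"}))
print(meets_threshold(edge_weights, {"a", "b"}, eps=1.5))
```

Checking a given subset is cheap; the difficulty lies in searching over all subsets.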

Maximum Weight Induced Subgraph Problem

• Problem: the maximum weight induced subgraph problem (MaWISh) is NP-complete!
• Solution: locally optimal solutions of MaWISh are sufficient for our needs.
• We will use fast heuristics to identify locally maximal heavy subgraphs in the alignment graph.

Some Insights

• In terms of protein-protein interactions, functional modules are likely to be densely connected while being separable from other modules.
  • Reminder: a functional module is a set of proteins that take part in the same biological process.
• Analysis of conserved motifs reveals that proteins in highly connected motifs are more likely to be conserved.
• Proteins that belong to a conserved module will induce heavy subgraphs in the alignment graph, while being loosely connected to other parts of the graph.

The Algorithm

• We use an iterative-improvement-based algorithm for finding a single conserved subgraph in the alignment graph.
• We start from a subgraph seeded at heavy nodes and grow it greedily.
• We repeatedly swap or move the nodes with maximum gain. A move is performed even if its gain is negative, in order to climb out of poor local optima.
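The seed-and-grow idea can be sketched as follows. This is a deliberately simplified illustration, not the presented algorithm: it only adds nodes, stops as soon as no addition has positive gain, and omits the swaps and negative-gain moves described above. The function names and the `frozenset`-keyed weight dictionary are assumptions made for the example.

```python
def node_gain(weights, subset, node):
    """Weight added to the induced subgraph if `node` joins `subset`."""
    return sum(weights.get(frozenset((node, other)), 0.0) for other in subset)


def grow_heavy_subgraph(nodes, weights):
    """Greedy growth from the heaviest seed; a simplified sketch."""
    def incident_weight(n):
        return sum(w for pair, w in weights.items() if n in pair)

    # Seed: the node with the largest total incident edge weight.
    seed = max(sorted(nodes), key=incident_weight)
    subset = {seed}
    while True:
        candidates = sorted(set(nodes) - subset)
        if not candidates:
            break
        best = max(candidates, key=lambda n: node_gain(weights, subset, n))
        if node_gain(weights, subset, best) <= 0:
            break  # no addition improves the total weight
        subset.add(best)
    return subset


toy_nodes = ["a", "b", "c", "d"]
toy_weights = {
    frozenset(("a", "b")): 3.0,
    frozenset(("b", "c")): 2.0,
    frozenset(("a", "c")): 1.0,
    frozenset(("c", "d")): -4.0,
}
print(sorted(grow_heavy_subgraph(toy_nodes, toy_weights)))
```

A purely additive greedy pass like this one can stall in a poor local optimum, which is the situation the negative-gain moves are meant to escape.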

To Sum Up

• We formally defined a computational problem that captures the underlying biological phenomena using matches, mismatches, and duplications.
• We then formulated PPI network alignment as a graph optimization problem.
• We proposed efficient heuristics to solve the problem.
• We rank all subgraphs based on their significance and report the corresponding results.

Data And Implementation

• Implemented in C.
• Three commonly studied eukaryotic organisms:
  • S. cerevisiae (yeast)
  • C. elegans (nematode)
  • D. melanogaster (fruit fly)
• Fixed set of parameters:
  • 𝜇 = 1.0, 𝛿 = 0.1, 𝜗 = 1.0

Results

(Figures not preserved.)

Conclusion

• The implementation of the proposed framework is successful in uncovering conserved substructures in protein interaction data.
• Based on the results, pairwise alignment of PPI networks is established as a tool not only for identifying conserved modules, but also for assessing functional differences and similarities of homologous proteins based on shared and missing interactions.
• Alignment results provide a means of discovering new functional modules in relatively less studied organisms by mapping functions at a modular level rather than at the level of single protein homologies.

Bibliography

• http://msb.embopress.org/content/9/1/652
• http://www.biochemj.org/content/409/1/27
• http://www.news.cornell.edu/stories/2013/01/scientists-find-holy-grail-evolving-modular-networks
• http://webcourse.cs.technion.ac.il/236523/Winter20152016/en/ho.html
• Pairwise Alignment of Protein Interaction Networks: http://compbio.case.edu/koyuturk/publications/ppi_alignment_jcb.pdf