Exploiting the linear organisation of omics network embedding spaces
Authors
Malod-Dognin, Noël
Xenos, Alexandros
Doria Belenguer, Sergio
Pržulj, Nataša
Other contributors
Morić, Ivana
Đorđević, Valentina
Conference contribution (Published version)
© 2023 Institute of Molecular Genetics and Genetic Engineering, University of Belgrade
Abstract
We are increasingly accumulating large-scale biological omics data that describe different
aspects of cellular functioning. These datasets are typically modelled and analyzed as
networks. To ease the downstream analyses, recent approaches embed the nodes of a
network into a low-dimensional space by using a skip-gram neural network (e.g. DeepWalk,
LINE and node2vec). These methods are implicitly factorizing a positive pointwise mutual
information (PPMI) matrix, which could be explicitly factorized with Non-negative Matrix
Tri-Factorization (NMTF). Importantly, in Natural Language Processing (NLP), word
embeddings obtained by using similar approaches showed linear algebraic structures,
which allow analogy questions to be answered with simple linear vector operations.
Thus, we investigate whether we can obtain and exploit similar linear embedding spaces for
biological omics networks.
We introduce the use of PPMI matrices to capture either the neighborhood relationships or the
structural (topological) similarities of nodes in the network. By embedding the human
Protein-Protein Interaction (PPI) network by factorizing its PPMI matrix representations
with NMTF, we demonstrate that the embedding vectors of genes having different Gene
Ontology (GO) annotations are linearly separated in the PPI embedding space.
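The embedding step described above can be illustrated with a minimal sketch. The abstract does not specify how the PPMI matrix is built or which NMTF variant is used, so everything here is an assumption: the co-occurrence counts come from degree-weighted random walks of up to `window` steps (a DeepWalk-like choice), and the factorization is a plain, non-orthogonal NMTF (X ≈ F S Gᵀ) solved with standard multiplicative updates. The function names `ppmi_matrix` and `nmtf` are hypothetical.

```python
import numpy as np

def ppmi_matrix(adj, window=2):
    """PPMI of random-walk co-occurrences.

    Assumes an undirected graph with no isolated nodes (illustrative choice,
    not necessarily the construction used by the authors).
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    trans = adj / deg[:, None]            # row-stochastic transition matrix
    walk = np.eye(adj.shape[0])
    cooc = np.zeros_like(adj)
    for _ in range(window):
        walk = walk @ trans               # r-step transition probabilities
        cooc += deg[:, None] * walk       # degree-weighted co-occurrences
    total = cooc.sum()
    row = cooc.sum(axis=1, keepdims=True)
    col = cooc.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore"):    # log(0) -> -inf, zeroed below
        pmi = np.log(cooc * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0
    return np.maximum(pmi, 0.0)           # keep only positive PMI values

def nmtf(x, k, n_iter=200, seed=0, eps=1e-9):
    """Factorize non-negative x as F @ S @ G.T with multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, m = x.shape
    f = rng.random((n, k))
    s = rng.random((k, k))
    g = rng.random((m, k))
    for _ in range(n_iter):
        f *= (x @ g @ s.T) / (f @ s @ g.T @ g @ s.T + eps)
        s *= (f.T @ x @ g) / (f.T @ f @ s @ g.T @ g + eps)
        g *= (x.T @ f @ s) / (g @ s.T @ f.T @ f @ s + eps)
    return f, s, g
```

In this sketch the rows of `f` (or of `f @ s`) would play the role of the node embedding vectors; which factor the authors actually use as the embedding is not stated in the abstract.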
Then, in analogy to the embedding vector of a sentence being obtained as the sum
(average) of the embedding vectors of its constituent words in NLP, we show that the
embedding vectors of biological functions and of protein complexes can be obtained by
averaging the embedding vectors of the genes that participate in them, and that these
embeddings can be used to predict protein complex memberships and cancer genes.
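The averaging idea in this paragraph can be sketched as follows. The abstract states only that function and complex embeddings are obtained by averaging gene embeddings; ranking candidate genes by cosine similarity to that average is an assumption made here for illustration, and the names `group_embedding` and `rank_candidates` are hypothetical.

```python
import numpy as np

def group_embedding(emb, members):
    """Embedding of a function or complex: mean of its member genes' vectors."""
    return emb[list(members)].mean(axis=0)

def rank_candidates(emb, group_vec, exclude=()):
    """Rank genes by cosine similarity to a group embedding (assumed scoring)."""
    norms = np.linalg.norm(emb, axis=1) * np.linalg.norm(group_vec)
    cos = emb @ group_vec / np.where(norms == 0, 1.0, norms)
    skip = set(exclude)
    return [int(i) for i in np.argsort(-cos) if int(i) not in skip]
```

For example, given a gene-by-dimension matrix `emb` and the row indices of the known members of a complex, `rank_candidates(emb, group_embedding(emb, members), exclude=members)` would list the remaining genes from most to least similar to the complex.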
Finally, we investigate the embeddings of cancer and control tissue-specific PPI networks
and show that simple subtractions allow for identifying cancer-altered biological functions
and cancer genes.
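A minimal sketch of the subtraction step, under strong assumptions: the cancer and control embeddings are taken to share the same gene order and a comparable coordinate system (the abstract does not say how this is ensured), each function is embedded as the mean of its annotated genes, and the size of the change is measured by the Euclidean norm of the difference. The name `function_shift` is hypothetical.

```python
import numpy as np

def function_shift(emb_cancer, emb_control, annotations):
    """Rank functions by how far their mean embedding moves between conditions."""
    shifts = {}
    for name, genes in annotations.items():
        idx = list(genes)
        # Subtract the control function embedding from the cancer one.
        diff = emb_cancer[idx].mean(axis=0) - emb_control[idx].mean(axis=0)
        shifts[name] = float(np.linalg.norm(diff))
    return sorted(shifts.items(), key=lambda kv: kv[1], reverse=True)
```

Functions at the top of the returned list are those whose embedding changes most between the two networks in this toy setting; the same subtraction applied row by row to the gene embeddings would give a per-gene score.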
Keywords:
bioinformatics / molecular omics networks / network data mining / network embedding
Source:
4th Belgrade Bioinformatics Conference, 2023, 4, 12-12
Publisher:
- Belgrade : Institute of Molecular Genetics and Genetic Engineering
Funding / projects:
- This project has received funding from the European Research Council (ERC) Consolidator Grant 770827 and the Spanish State Research Agency AEI 10.13039/501100011033 grant number PID2019-105500GB-I00.
Note:
- Book of abstracts: 4th Belgrade Bioinformatics Conference, June 19-23, 2023
Collections
Institution/group
Institute of Molecular Genetics and Genetic Engineering
URL:
- https://belbi.bg.ac.rs/
- https://imagine.imgge.bg.ac.rs/handle/123456789/1947
Malod-Dognin, N., Xenos, A., Doria Belenguer, S., & Pržulj, N. (2023). Exploiting the linear organisation of omics network embedding spaces. In 4th Belgrade Bioinformatics Conference, 4, 12-12. Belgrade: Institute of Molecular Genetics and Genetic Engineering. https://hdl.handle.net/21.15107/rcub_imagine_1947