TY - GEN
T1 - Scalable multipartite subgraph enumeration for integrative analysis of heterogeneous experimental functional genomics data
AU - Phillips, Charles A.
AU - Wang, Kai
AU - Bubier, Jason
AU - Baker, Erich J.
AU - Chesler, Elissa J.
AU - Langston, Michael A.
N1 - Publisher Copyright:
Copyright is held by the author/owner(s).
PY - 2015/9/9
Y1 - 2015/9/9
N2 - Functional genomics, the effort to understand the role of genomic elements in biological processes, has led to an avalanche of diverse experimental and semantic information defining associations between genes and various biological concepts across species and experimental paradigms. Integrating this rapidly expanding wealth of heterogeneous data, and finding consensus among so many diverse sources for specific research questions, require highly sophisticated big data structures and algorithms for harmonization and scalable analysis. In this context, multipartite graphs can often serve as useful structures for representing questions about the role of genes in multiple, frequently-occurring disease processes. The main focus of this paper is on finding and analyzing efficient algorithms for dense subgraph enumeration in such graphs. An O (3 n/3 )-time procedure was devised to enumerate all maximal k -partite cliques in a k -partite graph, where k ≥ 3. The maximum number of such cliques is also shown to obey this bound, and thus this procedure obtains the best possible asymptotic performance. Empirical testing on both real and synthetic data is conducted. Concrete applications to biological data are described, as are scalability issues in the context of big data analysis.
AB - Functional genomics, the effort to understand the role of genomic elements in biological processes, has led to an avalanche of diverse experimental and semantic information defining associations between genes and various biological concepts across species and experimental paradigms. Integrating this rapidly expanding wealth of heterogeneous data, and finding consensus among so many diverse sources for specific research questions, require highly sophisticated big data structures and algorithms for harmonization and scalable analysis. In this context, multipartite graphs can often serve as useful structures for representing questions about the role of genes in multiple, frequently-occurring disease processes. The main focus of this paper is on finding and analyzing efficient algorithms for dense subgraph enumeration in such graphs. An O (3 n/3 )-time procedure was devised to enumerate all maximal k -partite cliques in a k -partite graph, where k ≥ 3. The maximum number of such cliques is also shown to obey this bound, and thus this procedure obtains the best possible asymptotic performance. Empirical testing on both real and synthetic data is conducted. Concrete applications to biological data are described, as are scalability issues in the context of big data analysis.
KW - Big data analytics
KW - Dense subgraph enumeration
KW - Life science applications
KW - Multipartite graphs
UR - https://www.scopus.com/pages/publications/84963542433
U2 - 10.1145/2808719.2812595
DO - 10.1145/2808719.2812595
M3 - Conference article
T3 - BCB 2015 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
SP - 626
EP - 633
BT - BCB 2015 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
PB - Association for Computing Machinery, Inc
T2 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2015
Y2 - 9 September 2015 through 12 September 2015
ER -