Our research focuses on machine learning on networks, social networks, network science, chemoinformatics, and bioinformatics.
Machine Learning on Graphs
Deep learning and representation learning have rapidly emerged as cornerstones of modern artificial intelligence, revolutionizing how machines interpret complex data. At the heart of this evolution is the ability of deep learning models to learn efficient representations from vast amounts of data, transforming them into a format where patterns become discernible and actionable. This capability has unlocked unprecedented achievements in fields ranging from computer vision to natural language processing. As these models delve deeper into learning intricate structures and relationships within unstructured data, the necessity for advanced metrics that can accurately assess and preserve the integrity of these learned representations has become increasingly evident.
Our research group's focus is on graphs, which are increasingly used for modeling information and interaction in various domains including social networks, drug discovery, and modeling of physical/virtual systems. Graph data sets are difficult to analyze, often not reducible, and thus require novel analysis approaches. Despite the successful application of deep learning approaches to graph data, there is much we do not understand: what are the right representations for graph data, how is the geometry of a representation related to success on a downstream task, what are the appropriate distance measures for analyzing representations, how do representations relate to robustness, and what aspects of representations make them amenable to interpretation? Within the scope of graph data, we are examining three related concepts: representation, robustness, and explainability. Representations of graphs not only determine the success of downstream machine learning tasks, they also capture the robustness of an architecture to perturbations and distribution shifts. Symmetries and invariances of representations make them faithful to specific domains and help us understand the impact of perturbations. Finally, representations govern the interpretability and explainability of architectures and methods.
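As a small illustration of what invariance of a graph representation means, the sketch below builds a toy graph-level embedding from neighbor-sum aggregation and a sum readout, then checks that relabeling the nodes does not change it. The function names `graph_embedding` and `permute_graph` and the example graph are assumptions made for this illustration; this is not one of the group's architectures.

```python
from itertools import permutations


def graph_embedding(adj, features, rounds=2):
    """Toy graph-level representation (illustrative only).

    Each round replaces a node's value with its own value plus the sum of
    its neighbors' values; the final readout sums over all nodes.
    """
    n = len(adj)
    h = list(features)
    for _ in range(rounds):
        # The comprehension reads the old h, so all nodes update at once.
        h = [h[i] + sum(h[j] for j in range(n) if adj[i][j]) for i in range(n)]
    return sum(h)


def permute_graph(adj, features, perm):
    """Relabel nodes: new node i is old node perm[i]."""
    n = len(adj)
    new_adj = [[adj[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
    new_features = [features[perm[i]] for i in range(n)]
    return new_adj, new_features


# Hypothetical example: a path graph 0 - 1 - 2 with scalar node features.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [1, 2, 3]

reference = graph_embedding(adj, features)
for perm in permutations(range(len(adj))):
    p_adj, p_features = permute_graph(adj, features, perm)
    # Sum aggregation and sum readout do not depend on node order.
    assert graph_embedding(p_adj, p_features) == reference
```

Sum aggregation is used here because it is order-independent by construction; a readout that depended on node order (for example, concatenating node values) would fail the same check.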
Social Networks
Our research on social networks examines the relationship between information content, network structure, and user behavior. We focus on the additional information and attributes available beyond the network structure alone. We also consider network-wide metrics, based on models of opinion dynamics, that can compare network states with polar sentiments. We are investigating distance metrics that preserve quality and at the same time can be computed efficiently.
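To make the idea of comparing network states under an opinion dynamics model concrete, here is a minimal sketch. It assumes the classical DeGroot averaging model and a hypothetical distance, `propagated_distance`, that accumulates the disagreement between two states as both are propagated through the network. Both choices are stand-ins for illustration and are not the specific metrics studied by the group.

```python
def degroot_step(weights, opinions):
    """One DeGroot update: each node takes a weighted average of opinions.

    `weights` is assumed to be a row-stochastic matrix (rows sum to 1).
    """
    return [sum(w * x for w, x in zip(row, opinions)) for row in weights]


def propagated_distance(weights, state_a, state_b, steps=3):
    """Hypothetical state distance: L1 disagreement summed over time.

    The two opinion states are compared now and after each of `steps`
    DeGroot updates, so where in the network the states differ matters,
    not only how many nodes differ.
    """
    a, b = list(state_a), list(state_b)
    total = 0.0
    for _ in range(steps + 1):
        total += sum(abs(x - y) for x, y in zip(a, b))
        a = degroot_step(weights, a)
        b = degroot_step(weights, b)
    return total


# Hypothetical three-node network: node 0 listens to everyone equally,
# nodes 1 and 2 mostly keep their own opinion.
weights = [[1 / 3, 1 / 3, 1 / 3],
           [0.2, 0.8, 0.0],
           [0.2, 0.0, 0.8]]
positive = [1.0, 1.0, 1.0]
flip_hub = [-1.0, 1.0, 1.0]   # only the well-connected node disagrees
flip_leaf = [1.0, -1.0, 1.0]  # only a stubborn node disagrees

d_hub = propagated_distance(weights, positive, flip_hub)
d_leaf = propagated_distance(weights, positive, flip_leaf)
```

Both perturbed states differ from `positive` at exactly one node, yet the two distances generally differ, which is the kind of structure-aware behavior a plain count of differing nodes cannot capture.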
The recent convergence of research in the social and psychological sciences, dynamic and quantitative modeling, and network science has led to a re-examination of collective team behavior from a quantitative and systems-oriented viewpoint. Teams cannot be understood fully by studying their components (members) in isolation: team performance is not simply a sum of individual performances, and a diversity of opinions among members leads to better group outcomes. However, it is not yet understood how patterns of interactions and relationships among team members impact performance. Specific questions we are exploring include: How can we model the dynamics of team decision making on intellective and non-intellective tasks? How can we model and design mixed teams of humans and AI agents? How do trust and influence evolve in such teams? How can team behavior be attacked (and defended)? Our focus is on theoretical models as well as empirical studies with humans and AI agents.
Network Science
Network science is an emerging scientific discipline that examines the interconnections among diverse physical or engineered networks, information networks, biological networks, cognitive and semantic networks, and social networks. This field of science seeks to discover common principles, algorithms and tools that govern network behavior. The National Research Council defines Network Science as "the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena." We are developing methodologies, algorithms, and implementations needed for scalable, dynamic, and resilient networks. Specific problems include querying composite networks, modeling dynamic networks, sentiment analysis, analysis of content and user behavior, discovering unusual patterns, and sampling in composite networks. Compared to high dimensional data, analysis of network data is more challenging due to interdependencies between entities, the presence of attributes, and the natural evolution of networks over time.
Applications in Biology, Brain Sciences, and Drug Discovery
Intensive investigations over several decades have revealed the functions of many individual genes, proteins, and pathways. There has been an explosion of data of widely diverse types, arising from genome-wide characterization of transcriptional profiles, protein-protein interactions, genomic structure, genetic phenotype, gene interactions, gene expression, and proteomics. We are developing techniques that can integrate and analyze data from multiple sources efficiently. Research in systems biology has shown that clinical outcomes, such as susceptibility to cancer, depend not only on the expression level of a single protein, but on pathways or network modules. This can be modeled using a network with dynamic node labels and a global dynamic state; the node labels indicate the protein expression levels of an individual and the global state indicates the presence or absence of the disease. To predict the biological outcome or to categorize an individual, we need to find the sub-networks whose local states accurately predict the global network state. Learning discriminative subgraphs is key to understanding the complex relationship that exists between the local and the global states. Furthermore, they provide the platform for higher-order modeling tasks such as network regularization, classification, and regression.
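The search for sub-networks whose local states predict the global state can be sketched as follows. This is a deliberately simple greedy heuristic under stated assumptions: each individual is a dictionary of expression levels keyed by hypothetical protein names, the global state is a boolean disease label, and a sub-network is scored by the gap between the two classes' mean sub-network expression. The names `subnetwork_score` and `grow_subnetwork`, the scoring rule, and the data are illustrative and not the group's published method.

```python
def subnetwork_score(nodes, samples, states):
    """Gap between class means of the sub-network's average expression.

    Assumes both classes (state True and state False) occur in `states`.
    """
    pos, neg = [], []
    for expression, state in zip(samples, states):
        avg = sum(expression[n] for n in nodes) / len(nodes)
        (pos if state else neg).append(avg)
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def grow_subnetwork(edges, samples, states, seed, max_size=3):
    """Greedily grow a connected sub-network from `seed`.

    At each step, add the neighboring node that gives the highest score,
    and stop when no neighbor improves on the current score.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    chosen = {seed}
    best = subnetwork_score(chosen, samples, states)
    while len(chosen) < max_size:
        frontier = set().union(*(neighbors.get(n, set()) for n in chosen)) - chosen
        candidates = [(subnetwork_score(chosen | {n}, samples, states), n)
                      for n in sorted(frontier)]
        if not candidates:
            break
        score, node = max(candidates)
        if score <= best:
            break
        chosen.add(node)
        best = score
    return chosen, best


# Hypothetical interaction network and expression data for four individuals.
edges = [("A", "B"), ("B", "C"), ("C", "D")]
samples = [
    {"A": 1.0, "B": 5.0, "C": 5.0, "D": 1.0},
    {"A": 1.0, "B": 4.0, "C": 6.0, "D": 2.0},
    {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0},
    {"A": 2.0, "B": 1.0, "C": 2.0, "D": 2.0},
]
states = [True, True, False, False]  # disease present / absent

module, score = grow_subnetwork(edges, samples, states, seed="B")
```

Restricting growth to neighbors of the current node set keeps the result connected, which reflects the premise that outcomes depend on pathways or network modules rather than on unrelated proteins.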
Network-based modeling and characterization of brain architectures has provided a framework both for integrating multi-modal imaging data and for understanding the function and dynamics of the brain and its subunits. Brain networks are traditionally constructed from either structural or functional imaging data. Structural brain networks represent properties of physical neuronal bundles using methods such as diffusion MRI. Functional brain networks represent the functional associations between regions, estimated by statistical similarities between regional time series such as correlation or coherence. Global network analysis of both functional and structural connectivity has demonstrated that brain networks have characteristic topological properties, including dense modular structures and efficient long-distance paths. The coupling of structure and function, and understanding how the brain responds to an external stimulus, are exciting open problems. We are using geometrical and topological approaches to answer questions such as: Can we predict the structure-function coupling of an individual? Can we predict or generate the stimulus from the response? Can we transfer models from a group of individuals to a new individual? How can we fuse information from different imaging data?
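The construction of a functional brain network from regional time series can be sketched in a few lines. The example below assumes Pearson correlation as the similarity measure and a fixed absolute threshold for keeping edges; the region names, the signals, and the threshold value are made up for illustration, and real pipelines involve preprocessing steps that are omitted here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length series.

    Assumes neither series is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def functional_network(series, threshold=0.5):
    """Return weighted edges between regions with |correlation| >= threshold.

    `series` maps a region name to its time series.
    """
    regions = sorted(series)
    edges = {}
    for i, r in enumerate(regions):
        for s in regions[i + 1:]:
            c = pearson(series[r], series[s])
            if abs(c) >= threshold:
                edges[(r, s)] = c
    return edges


# Hypothetical regional signals (five time points each).
series = {
    "V1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "V2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "PFC": [3.0, 1.0, 4.0, 1.0, 5.0],
}
edges = functional_network(series, threshold=0.5)
```

In this toy data only the two perfectly co-varying regions end up connected. Coherence or another similarity measure could replace `pearson` without changing the rest of the construction.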
Increased availability of large repositories of chemical compounds and other biochemical data has created new challenges and opportunities for chemical informatics and drug discovery: identification of active substructures and compounds, prediction of physicochemical properties and structure-activity relationships, diversity analysis of compound collections, drug repurposing, and generation of drug-like compounds. We are developing graph-based and geometry-based methods for such tasks.
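As one concrete instance of diversity analysis of a compound collection, the sketch below represents each compound by a set of substructure keys and measures diversity as the mean pairwise Tanimoto distance. The fingerprint sets are hypothetical placeholders; in practice they would come from a cheminformatics toolkit, and the group's graph-based and geometry-based methods are not reproduced here.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprint sets."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty fingerprints are treated as identical by convention here.
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_diversity(fingerprints):
    """Mean Tanimoto distance (1 - similarity) over all compound pairs.

    Assumes the collection contains at least two compounds.
    """
    pairs = list(combinations(fingerprints, 2))
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical substructure-key fingerprints for three compounds.
collection = [
    {"ring6", "hydroxyl", "amine"},
    {"ring6", "hydroxyl", "carboxyl"},
    {"ring5", "halogen"},
]
diversity = mean_pairwise_diversity(collection)
```

A value near 0 indicates a collection of near-duplicates, while a value near 1 indicates compounds that share almost no substructures.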