Our research focuses on machine learning and data mining, social networks, bioinformatics, and brain networks.
Machine Learning and Data Mining on Networks
Current social and scientific endeavors are generating data that can be modeled as graphs: high-throughput biological experiments, screening of chemical compounds, social networks, ecological networks and food webs, database schemas, and ontologies. Most current research concentrates on problems where the graph structure is inherently static and does not change with time. Real-world networks, however, are dynamic and change on a wide range of time scales: the topology of networks such as social networks and transportation networks evolves gradually, while their content (information flow, annotations) changes more rapidly. Mining and analyzing these annotated, dynamic graphs is crucial for advancing scientific research, for accurately modeling and analyzing existing systems, and for engineering new systems. Our goal is to develop a set of machine learning, analysis, and modeling methods for such networks.
Our research on social networks examines the relationship between information content, network structure, and user behavior. We apply topic models to understand the relationship between the content of a message and its flow through a network. We focus on the additional information associated with many graphs beyond their structure alone, e.g., email, Twitter, and blog graphs. We also consider network-wide metrics, based on models of opinion dynamics, that can compare network states with polar sentiments. We are investigating distance metrics that preserve quality and at the same time can be computed efficiently. Finally, we consider the dynamics of groups. The recent convergence of research in the social and psychological sciences, dynamic and quantitative modeling, and network science has led to a re-examination of collective team behavior from a quantitative and systems-oriented viewpoint. Teams cannot be understood fully by studying their components (members) in isolation: team performance is not simply a sum of individual performances, and a diversity of opinions among members leads to better group outcomes. However, it is not yet understood how patterns of interactions and relationships among team members (i.e., team networks) impact performance.
Network-based modeling and characterization of brain architectures has provided a framework both for integrating multi-modal imaging data and for understanding the function and dynamics of the brain and its subunits. Brain networks are traditionally constructed from either structural or functional imaging data. Structural brain networks represent properties of physical neuronal bundles, measured with methods such as diffusion MRI. Functional brain networks represent the functional associations between regions, estimated by statistical similarities between regional time series such as correlation or coherence. Global network analysis of both functional and structural connectivity has demonstrated that brain networks have characteristic topological properties, including dense modular structures and efficient long-distance paths. We are using local geometrical and topological properties to identify subgraphs that discriminate between groups of individuals, and to understand how structural and functional networks are coupled.
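As an illustration of how a functional network can be built from regional time series, the following minimal sketch connects two regions when the absolute Pearson correlation of their signals exceeds a threshold. The function names, the threshold of 0.5, and the synthetic signals are assumptions made for this example only; they do not describe our actual processing pipeline.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def functional_network(series, threshold=0.5):
    """Return {(region_u, region_v): correlation} for strongly correlated pairs.

    `series` maps a region name to its time series. The threshold is an
    illustrative choice, not a recommended value.
    """
    regions = sorted(series)
    edges = {}
    for i, u in enumerate(regions):
        for v in regions[i + 1:]:
            r = pearson(series[u], series[v])
            if abs(r) >= threshold:
                edges[(u, v)] = r
    return edges


# Synthetic example: regions A and B share a common signal, C is pure noise.
random.seed(0)
shared = [math.sin(0.3 * t) for t in range(200)]
series = {
    "A": [s + random.gauss(0, 0.1) for s in shared],
    "B": [s + random.gauss(0, 0.1) for s in shared],
    "C": [random.gauss(0, 1.0) for _ in range(200)],
}
edges = functional_network(series)
print(edges)
```

In this toy setting only the pair (A, B) should survive the threshold. Coherence or another similarity measure could be substituted for `pearson` without changing the rest of the sketch.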
Intensive investigations over several decades have revealed the functions of many individual genes, proteins, and pathways. There has been an explosion of data of widely diverse types, arising from genome-wide characterization of transcriptional profiles, protein-protein interactions, genomic structure, genetic phenotype, gene interactions, gene expression, and proteomics. We are developing techniques that can efficiently integrate and analyze data from multiple sources and models in order to infer gene and network function. Research in systems biology has shown that clinical outcomes, such as susceptibility to cancer, depend not only on the expression level of a single protein but also on pathways or network modules. This can be modeled using a network with dynamic node labels and a global dynamic state: the node labels indicate the protein expression levels of an individual, and the global state indicates the presence or absence of the disease. To predict the biological outcome or to categorize an individual, we need to find the sub-networks whose local states accurately predict the global network state. Learning discriminative subgraphs is key to understanding the complex relationship between the local and the global states. Furthermore, these subgraphs provide a platform for higher-order modeling tasks such as network regularization, classification, and regression. These efforts on biological network analysis are being augmented by a unique distributed digital library of bio-molecular image data. Such searchable databases will make it easier to understand and interpret the data, leading to a more complete and integrated understanding of cellular structure, function, and regulation.
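The idea of a sub-network whose local state predicts the global state can be sketched with a simple greedy heuristic: start from the single most discriminative protein and add neighboring proteins while the averaged expression of the subgraph separates the two groups better. This is a toy illustration under stated assumptions, not the method we use; the scoring function, its smoothing constant of 1.0, the protein names, and the data are all hypothetical.

```python
from statistics import mean, pstdev


def subgraph_score(nodes, expression, labels):
    """Separation of class means of the subgraph's average expression.

    The difference of class means is divided by the within-class spread;
    the added 1.0 is an assumed smoothing constant that avoids division by zero.
    """
    activity = [mean([sample[n] for n in nodes]) for sample in expression]
    pos = [a for a, y in zip(activity, labels) if y == 1]
    neg = [a for a, y in zip(activity, labels) if y == 0]
    return abs(mean(pos) - mean(neg)) / (pstdev(pos) + pstdev(neg) + 1.0)


def greedy_discriminative_subgraph(adjacency, expression, labels, max_size=3):
    """Grow a connected subgraph greedily while the score keeps improving."""
    seed = max(sorted(adjacency), key=lambda n: subgraph_score([n], expression, labels))
    nodes = [seed]
    score = subgraph_score(nodes, expression, labels)
    while len(nodes) < max_size:
        # Candidates are neighbors of the current subgraph, so it stays connected.
        frontier = {m for n in nodes for m in adjacency[n]} - set(nodes)
        if not frontier:
            break
        best = max(sorted(frontier),
                   key=lambda m: subgraph_score(nodes + [m], expression, labels))
        best_score = subgraph_score(nodes + [best], expression, labels)
        if best_score <= score:
            break
        nodes.append(best)
        score = best_score
    return nodes, score


# Hypothetical interaction network: a path p1 - p2 - p3 - p4.
adjacency = {"p1": ["p2"], "p2": ["p1", "p3"], "p3": ["p2", "p4"], "p4": ["p3"]}
# One dict of expression levels per individual; label 1 = disease, 0 = healthy.
expression = [
    {"p1": 3, "p2": 1, "p3": 0, "p4": 1},
    {"p1": 1, "p2": 3, "p3": 1, "p4": 0},
    {"p1": 2, "p2": 2, "p3": 0, "p4": 1},
    {"p1": 1, "p2": -1, "p3": 1, "p4": 0},
    {"p1": -1, "p2": 1, "p3": 0, "p4": 1},
    {"p1": 0, "p2": 0, "p3": 1, "p4": 0},
]
labels = [1, 1, 1, 0, 0, 0]

nodes, score = greedy_discriminative_subgraph(adjacency, expression, labels)
print(sorted(nodes), score)
```

In this toy data neither p1 nor p2 separates the groups cleanly on its own, but their average does, which mirrors the point that the outcome depends on a network module rather than on a single protein.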