Research

The main research focuses of BDRL include:

1. Develop AI-empowered systems biology models to study biological processes, relations, and functions using multi-omics data. The classic systems biology model views biological processes as dynamic systems over biological networks. The three major types of biological networks are transcriptional regulatory, metabolic, and signaling networks, each with distinct biomechanical or molecular characteristics. We focus on developing novel representation forms of systems biology models to leverage identifiability, mathematical rationale, and biological interpretation in quantifying and studying biological processes with omics data. The key challenge is that omics data usually measure only a single time point from each biological sample, so additional assumptions must be made, using a large sample size or other information extraction approaches, to approximate the dynamic characteristics of each biological system. We focus on single-cell or spatially resolved transcriptomics data because the biological process in an individual cell is more purely determined by signals from that same cell.

(1) Model dynamic transcriptional regulatory relations in single cells. Because one gene can be regulated by multiple transcriptional regulatory inputs (TRIs) that vary across cells, genes regulated by the same TRI may share common expression trends and dependencies in the cells that have the TRI. To this end, we are developing AI-empowered representation learning approaches to identify global or local, linear or non-linear low-rank structures in matrix or tensor data.

(2) Model metabolic flux using multi-omics data. Mass-carrying flux satisfies the flux balance condition at steady state. We have recently developed a new graph neural network architecture by (i) assuming that changes in the flux of each metabolic step can be approximated by gene expression changes of the enzymes involved in the step and (ii) setting a quadratic loss for metabolic flux imbalance. Our method enables the estimation of cell-wise or sample-wise metabolic flux, metabolomic change, and the genes and metabolites that most affect each flux.
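
To make the flux balance condition concrete, the following is a minimal sketch of a quadratic flux-imbalance penalty on a toy network. It is not the graph neural network described above; the function name `flux_imbalance_loss`, the matrix `S`, and the three-reaction chain are assumptions made only for illustration.

```python
# Toy illustration of a quadratic flux-imbalance loss (hypothetical example,
# not the published model). Rows of S are metabolites, columns are reactions;
# S[i][j] is the stoichiometric coefficient of metabolite i in reaction j.

def flux_imbalance_loss(S, v):
    """Sum over metabolites of (net production rate) squared for fluxes v."""
    loss = 0.0
    for row in S:
        net = sum(coef * flux for coef, flux in zip(row, v))
        loss += net ** 2
    return loss

# Assumed linear chain: R1 produces A, R2 converts A to B, R3 consumes B.
S = [
    [1, -1, 0],   # metabolite A: made by R1, used by R2
    [0, 1, -1],   # metabolite B: made by R2, used by R3
]

balanced = [2.0, 2.0, 2.0]     # each intermediate is produced and consumed equally
imbalanced = [2.0, 1.0, 2.0]   # A accumulates (+1) and B is depleted (-1)

print(flux_imbalance_loss(S, balanced))    # 0.0
print(flux_imbalance_loss(S, imbalanced))  # 2.0
```

The penalty is zero only when every intermediate metabolite is produced and consumed at the same rate, which is why minimizing it pushes estimated fluxes toward a steady state.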

(3) Signaling pathways can be considered a generalized metabolic flux, in which signals, rather than real biomass, are transmitted through the pathway. A generalized flux model is under development to model signaling pathway activity using gene expression or proteomics data.

(4) Model general systems biological processes using omics data. A systems biology model can be represented as a set of differential equations that satisfy underlying constraints. For a given systems biology model, we generalize its representation into a network model that satisfies the same constraints, which we call a self-constrained model. We further use neural networks to approximate the non-linear dependency between the observed data and the derivatives of each system component. A valid systems biology model should follow two basic principles: (i) goodness of fit to the system assumption, meaning the self-constraints should hold, and (ii) parsimony, meaning the model should be simple enough to avoid overfitting. Based on this idea, we can test and quantify each systems biology model on a given data set.

(5) Model more complex systems, such as the interaction between two or more cell types or species and the involvement of spatially or temporally dependent micro-environmental gradients of molecules.

(6) In silico perturbation. One ultimate goal of the “modern systems biology” model is to enable efficient in silico perturbation analysis to identify the genes, metabolites, proteins, or other molecules, biological functions, and relations that could be targeted to revert the physiological state of a cell or tissue system.


2. Advance the understanding of metabolic variations in human diseases and other health-related conditions. Metabolism affects human health and quality of life. Almost all diseases have associated metabolic changes or risk factors. Dr. Zhang and Dr. Cao have long-term experience in studying biochemical and metabolic changes in human diseases. Currently, the BDRL is focusing on the following topics:

(1) Reconstruction and effective representation of metabolic networks to improve the study of complex diseases or environmental systems. We have recently developed a method and the relevant mathematical theory to partition a metabolic map into connected modules, where each module is a linear reaction chain with one input end and one output end (single-end). A similar definition of module is also used in KEGG. Compared with other metabolic models, such as EM and FBA, the impact of genes or metabolites in a module can be evaluated directly from their local topological properties, and the computational complexity of metabolic flux estimation is greatly reduced by working with modules rather than individual reactions. In our previous study, we reconstructed the major metabolic maps of the human and mouse metabolic networks. A more comprehensive reconstruction and representation of metabolic networks for 3000+ species is under development.
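
As a rough illustration of partitioning a network into linear, single-end chains, the sketch below treats metabolites as nodes and reactions as directed edges, and collects maximal chains whose interior nodes have exactly one incoming and one outgoing reaction. This is a simplified assumption made for illustration and is not the published method; the function name `linear_chain_modules` and the toy network with nodes A to F are hypothetical.

```python
from collections import defaultdict

def linear_chain_modules(edges):
    """Group directed edges (reactions) into maximal linear chains.

    A chain passes through a node only if that node has exactly one
    incoming and one outgoing edge. Isolated cycles made only of such
    nodes are not handled by this sketch.
    """
    out_edges = defaultdict(list)
    in_deg = defaultdict(int)
    for src, dst in edges:
        out_edges[src].append(dst)
        in_deg[dst] += 1

    def is_pass_through(node):
        return in_deg[node] == 1 and len(out_edges[node]) == 1

    modules = []
    for src, dst in edges:
        if is_pass_through(src):
            continue  # this edge belongs to a chain that starts upstream
        chain = [src, dst]
        while is_pass_through(chain[-1]):
            chain.append(out_edges[chain[-1]][0])
        modules.append(chain)
    return modules

# Toy map: A -> B -> C -> D, then D branches to E and F.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("D", "F")]
modules = linear_chain_modules(edges)
print(modules)  # [['A', 'B', 'C', 'D'], ['D', 'E'], ['D', 'F']]
```

In this toy case five reactions collapse into three modules, which is the kind of reduction that makes module-level flux estimation cheaper than reaction-level estimation.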

(2) Develop new systems biology models to estimate metabolic flux and metabolomic changes more accurately. A major gap in metabolic modeling is how to map diverse data types onto quantitative metabolic models to elucidate the metabolic fluxome more thoroughly, and hence to achieve functional characterization and accurate quantification of all levels of metabolic activity and their interactions. Although our recent progress and other studies provide a preliminary solution, no existing method can effectively handle the heterogeneity in the direction of highly reversible reactions or the imbalance of intermediate metabolites among cells within a disease microenvironment.

(3) Comprehensive characterization of metabolic changes in disease conditions. Metabolic variations occur at different levels, such as genes, enzymes, metabolites, network structure, or flux (kinetic models). How to design valid metrics and statistical models that quantify the true impact of such variations on context-specific metabolic activity remains unsolved. While large-scale transcriptomics and metabolomics profiles have been generated for various disease studies, only a few studies have collected paired transcriptomics and metabolomics data from the same sample source. It is critical to design new approaches that can integrate unpaired transcriptomics and metabolomics data for a comprehensive assessment of metabolic variations.

(4) Perturbation analysis. In our previous work, we evaluated the functional impact of each gene on the flux of the module that contains it, i.e., the impact of a gene on its local network, by computing partial derivatives on the local neural network. Because our method directly models stoichiometric and kinetic dependencies, the impact can reflect true causal relations. We will extend our analysis to systematically evaluate the functional impact of metabolic features on the whole metabolic map: (i) compute the impact of each gene, metabolite, or co-factor on the whole fluxome; (ii) study the impact of each gene or metabolite on the detailed flux distribution in its module; (iii) develop a method to study the impact of biochemical variations on the whole fluxome; and (iv) predict potentially missing dependencies in the flux estimation model to infer unknown functions of genes, metabolites, and co-factors in metabolism.
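
The idea of scoring a gene by the partial derivative of a module flux can be sketched with finite differences. The example below is only an illustration under stated assumptions: `toy_module_flux` is a hypothetical stand-in for a trained flux estimator, the gene names are placeholders, and central differences replace whatever differentiation the actual model uses.

```python
def toy_module_flux(expr):
    """Hypothetical stand-in for a trained flux estimator of one module.

    The less expressed of the two enzymes limits the flux.
    """
    a, b = expr["geneA"], expr["geneB"]
    return a * b / (a + b)

def gene_sensitivities(flux_fn, expr, h=1e-5):
    """Central-difference estimate of d(flux)/d(expression) for each gene."""
    grads = {}
    for gene in expr:
        up = dict(expr)
        up[gene] += h
        down = dict(expr)
        down[gene] -= h
        grads[gene] = (flux_fn(up) - flux_fn(down)) / (2 * h)
    return grads

expr = {"geneA": 1.0, "geneB": 4.0}
sens = gene_sensitivities(toy_module_flux, expr)

# Analytically, d(flux)/d(geneA) = 16/25 = 0.64 and d(flux)/d(geneB) = 1/25 = 0.04,
# so the weakly expressed geneA has the larger impact on this toy flux.
for gene, value in sens.items():
    print(gene, round(value, 4))
```

Ranking genes by the magnitude of these derivatives gives a simple per-sample measure of which gene most affects the flux of its own module.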

(5) Biomarker prediction to support the development of personal wearable sensors. Substantial effort has been devoted to identifying disease-associated protein biomarkers using multi-omics data, whereas disease-specific metabolic biomarkers have been studied much less with these large-scale, informative high-throughput resources. Direct evidence, such as targeted metabolomics profiling, cannot effectively solve this task because existing experimental platforms cover only a small number of metabolites, and the data are available only for a limited set of diseases over a small sample set. On the other hand, variations in gene expression, metabolites, and protein abundance over the metabolic networks determine disease-specific metabolic shifts. Hence, a proper systems biology model is needed to evaluate the intrinsic heterogeneity of metabolic variations. We are designing a set of advanced computational methods and new data resources to enable reliable estimation and optimization of disease-specific metabolic biomarkers using high-throughput multi-omics data. Our tasks include the development of (i) a reference map of tissue-, disease-, and cell type-specific metabolic networks, (ii) flux models and the relative abundance of metabolites, (iii) predictors of the presence of metabolites in different human body fluids, and (iv) the disease association of each metabolite and its identifiability by different personal wearable sensors.
Challenges in developing robust predictions of disease-specific metabolic biomarkers for personal wearable sensors arise from the following aspects: (i) specificity: the biomarkers should vary distinctly in one or a few disease conditions; (ii) sensitivity: variations in the biomarkers should reflect the disease characteristics; (iii) presence in biospecimens: the biomarkers should be highly present in body fluid-based biospecimens; and (iv) identifiability: the metabolic markers should be quantitatively measurable by the sensors.

(6) Nutrition recommendation. Food and nutrition play a crucial role in health promotion and chronic disease prevention. The scientific connection between food and health has been well documented for many decades, with substantial and increasingly robust evidence showing that a healthy lifestyle, including a healthy dietary pattern, can help people achieve and maintain good health and reduce the risk of chronic diseases throughout all stages of life. Although nutrition guidelines for healthy people are well developed, there is a lack of public awareness and knowledge of how to optimize nutrition and food plans for patients with chronic and aging-related diseases or for people with a specific health-improvement goal, such as muscle growth. We focus on developing and applying (i) our new systems biology models to omics data collected from patient samples and (ii) natural language processing approaches that mine biological research articles to infer and extract the relations between nutrients (supplements, herbals, and food) and disease progression or health-related conditions. Our hypothesis is that the biochemical mechanisms derived from patient sample-based molecular data, the relations identified from population studies, and the dependencies observed in experimental systems can serve as independent sources of knowledge for determining the potential impact of a nutrient or supplement on human diseases.

3. Predict drug targets to improve the efficacy of immunotherapy. Numerous variations in the tumor microenvironment (TME) can cause non-response to immunotherapy (PD-1 and other immune checkpoint inhibitors) in solid cancers. While substantial efforts have been made to quantify immuno-cell populations

(1) The antigen presentation and recycling pathway determines the quality of MHC class I antigen presentation in cancer cells and MHC class II antigen presentation in immune cells. To the best of our knowledge, there is no well-annotated antigen presentation and recycling process or corresponding systems biology model. We are extending our flux estimation method by treating antigen presentation as a mass-carrying flow (where the peptide is considered the mass) to estimate the antigen presentation capacity of each individual cell or tissue sample and to identify the genes that most affect antigen presentation in each tumor.

(2) Stromal cells and the matrisome affect the infiltration and function of immune cells. We focus on identifying changes in stromal cell subtypes and in matrisome biosynthesis and components that alter the physical or biochemical condition of the extracellular matrix and affect immune cell activities in the TME.

(3) Develop effective measures to quantify cell type-specific functions of immune cells in the TME. For example, we use relative cytotoxicity scores to quantify the cytotoxicity level of T cells, which involves (i) identifying context-specific markers of T cells and cytotoxicity and (ii) quantifying relative cytotoxicity as the ratio of the cytotoxicity signal to the total T cell signal.
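
The ratio in step (ii) can be sketched as follows. This is a minimal illustration, assuming that each signal is summarized as the mean expression of its marker genes; the function name `relative_cytotoxicity`, the marker lists, and the expression values are placeholders and are not the context-specific markers identified in step (i).

```python
def relative_cytotoxicity(expr, cytotoxic_markers, t_cell_markers, eps=1e-9):
    """Ratio of the mean cytotoxicity signal to the mean total T cell signal.

    expr maps gene name to expression level for one sample. eps avoids
    division by zero when no T cell signal is detected.
    """
    cyto = sum(expr[g] for g in cytotoxic_markers) / len(cytotoxic_markers)
    t_cell = sum(expr[g] for g in t_cell_markers) / len(t_cell_markers)
    return cyto / (t_cell + eps)

# Placeholder marker sets (assumptions for illustration only).
cytotoxic_markers = ["GZMB", "PRF1"]
t_cell_markers = ["CD3D", "CD3E"]

# Made-up expression values for two samples.
sample_a = {"GZMB": 8.0, "PRF1": 4.0, "CD3D": 3.0, "CD3E": 3.0}
sample_b = {"GZMB": 2.0, "PRF1": 2.0, "CD3D": 4.0, "CD3E": 4.0}

score_a = relative_cytotoxicity(sample_a, cytotoxic_markers, t_cell_markers)
score_b = relative_cytotoxicity(sample_b, cytotoxic_markers, t_cell_markers)
print(round(score_a, 3), round(score_b, 3))  # 2.0 0.5
```

Normalizing by the total T cell signal separates how cytotoxic the T cells are from how many T cells are present, so sample_b scores lower despite having the stronger T cell signal.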

(4) Quantification of biochemical and cellular stresses, such as variations in hypoxia, oxidative stress, pH, matrisome and extracellular components, and cytokine/chemokine signaling in the TME.

4. Develop new AI frameworks or statistical approaches to solve mathematical problems in biological or general data sciences. The majority of questions in the study of biological data are transfer learning problems, which seek to summarize biologically meaningful stories that can be used in further hypothesis generation and validation. Our focuses include:

(1) Representation learning to identify biologically meaningful data patterns in high-dimensional matrix or tensor data.

(2) Generalized and explainable graphical models and graph neural networks to model mass-carrying networks or flux-like information passing mechanisms.

(3) Spatial-temporal data modeling.

(4) Transfer learning with a specific focus on learning the consistency and variation of knowledge representation in different systems/data sets.

(5) Natural language processing-based approaches to extract context-specific biological relations from scientific literature data.

Ongoing funding supports:

• NSF DBI: CAREER: Mining biological functions from single-cell multi-omics data (Chi Zhang, PI). The major goal of this project is to develop AI-empowered systems biology models on single-cell multi-omics data and natural language processing-based approaches to construct real-world evidence-based biological networks.

• NSF CISE: CRII & CAREER: Disentangled learning of high dimensional biomedical data in the presence of inherent heterogeneity (Sha Cao, PI). The major goal of this previously funded CRII project and ongoing CAREER project is to develop new statistical models to enable identification, inference and quantification of subspace structures and inherent heterogeneity in high dimensional biomedical data.

• American Cancer Society Research Scholar Award: Computational modeling of stromal-immune cell interactions in triple-negative breast cancer (Chi Zhang, PI). The major goal of this project is to identify subtypes of stromal cells and study their functional roles in regulating the presence and cytotoxic function of T cells in triple-negative breast cancer.

• NIH NIGMS R01: Construction of cell type specific gene co-regulation signatures based on single cell transcriptomics data (Chi Zhang, site PI). The major goal of this project is to develop computational approaches to predict cell group-specific transcriptomics regulatory relations using scRNA-seq data.

• Selected collaborative projects: 

Developmental and Hyperactive Ras Tumor (DHART) SPORE (https://dhartspore.org/)

Indiana University Melvin and Bren Simon Comprehensive Cancer Center (https://cancer.iu.edu/)

Indiana Alzheimer's Disease Research Center (https://medicine.iu.edu/research-centers/alzheimers)

IUSM/Purdue AD Drug Discovery Center (https://medicine.iu.edu/expertise/alzheimers/research/preclinical/drug-discovery)

IUSM triple-negative breast cancer immunotherapy study group (https://medicine.iu.edu/research-centers/breast-cancer)

Center for Computational Biology and Bioinformatics (https://medicine.iu.edu/research-centers/computational-biology-bioinformatics)