Research vision
Our research focuses on understanding how molecular programs establish cell identity, control cell-fate decisions and become altered in disease. We combine statistical modelling, interpretable machine learning and single-cell, spatial and multimodal omics with stem-cell, organoid and disease biology. Across our projects, we aim to reconstruct regulatory programs, build robust computational tools, establish community benchmarks and translate cell-state models into therapeutic hypotheses.
Cell identity and cell-fate decisions
Cell identity and cell-fate decisions are controlled by coordinated molecular programs spanning signalling, transcriptional, epigenomic and post-transcriptional regulation. We study how these trans-regulatory programs change during pluripotency progression, lineage specification and stem-cell differentiation, and how they connect in vitro stem-cell systems with in vivo development. By integrating multi-omic profiling with statistical learning, we reconstruct regulatory networks that explain how cellular identities are established, maintained and redirected.
Related publications:
Lim, B.✢, Chen, C.✢, Fredericks, A., Nilli, E., Mora, S., Weatherstone, M., Loi, T., Newman, P., Aryamanesh, N., Zreiqat, H., Tam, P., Yang, P.† & Gonzalez-Cordero, A.† (2026) Retinoic acid drives cell fate specification, maturation and retinal regionality in human retinal organoids. Nature Communications, in press. [Full Text]
Kim, H., Wang, K., Chen, C., Lin, Y., Tam, PPL., Lin, D., Yang, J. & Yang, P.† (2021) Uncovering cell identity through differential stability with Cepo. Nature Computational Science, 1, 784-790. [Full Text] [Nature Content Sharing link] [BioC R package]
Yang, P.✢†, Humphrey, S.✢†, Cinghu, S.✢, Pathania, R., Oldfield, A., Kumar, D., Perera, D., Yang, J., James, D., Mann, M. & Jothi, R.† (2019) Multi-omic profiling reveals dynamics of the phased progression of pluripotency. Cell Systems, 8(5), 427-445. [Full Text] [Biorxiv version] [The Stem Cell Atlas]
Cinghu, S.✢, Yang, P.✢, Kosak, J., Conway, A., Kumar, D., Oldfield, A., Adelman, K. & Jothi, R. (2017) Intragenic enhancers attenuate host gene expression. Molecular Cell, 68(1), 104–117. [Full Text] [PDF]
Oldfield, A.✢, Yang, P.✢, Conway, A., Cinghu, S., Freudenberg, J., Yellaboina, S. & Jothi, R. (2014) Histone-fold domain protein NF-Y promotes chromatin accessibility for cell type-specific master transcription factors. Molecular Cell, 55(5), 708-722. [Full Text] [PDF]
Interpretable AI for single-cell systems biology
Interpretable AI provides a way to move beyond cell-state prediction towards mechanistic understanding. We develop machine-learning and statistical models that learn from single-cell, spatial and multimodal omics data while exposing the genes, pathways, regulatory programs and cell states that drive predictions. This includes multi-task and ensemble learning, interpretable deep generative models, feature attribution and feature-selection strategies for modelling cell-state transitions, regulatory programs and condition-specific cell identities.
Related publications:
Wagle, M., Liu, C., Liu, Z., Wang, Y., Kellis, M., Patrick, E. & Yang, P.† (2026) Interpretable deep generative ensemble learning for single-cell omics with Hydra. Molecular Systems Biology, in press. [Full Text] [PyPI] [Tutorial] [Repo]
Wagle, M.✢, Long, S.✢, Chen, C., Liu, C. & Yang, P.† (2024) Interpretable deep learning in single-cell omics. Bioinformatics, 40(6), btae374. [Full Text]
Liu, C., Huang, H. & Yang, P.† (2023) Multi-task learning from multimodal single-cell omics with Matilda. Nucleic Acids Research, 51(8), e45. [Full Text] [Repo and tutorial]
Huang, H., Liu, C., Wagle, M. & Yang, P.† (2023) Evaluation of deep learning-based feature selection for single-cell RNA sequencing data analysis. Genome Biology, 24, 259. [Full Text] [Repo]
Cao, Y., Geddes, T., Yang, J. & Yang, P.† (2020) Ensemble deep learning in bioinformatics. Nature Machine Intelligence, 2, 500-508. [Full Text] [Nature Content Sharing link]
Single-cell, spatial and multimodal omics tools
We build computational tools and benchmarks for single-cell, spatial and multimodal omics so that biological conclusions are reproducible, interpretable and task-aware. Our work spans data integration, batch correction, cell-type annotation, identity-gene detection, multimodal CITE-seq analysis, phosphoproteomics, spatial transcriptomics and benchmarking standards. A central goal is to move the field from visual or plausibility-based assessment towards quantitative, task-specific evaluation.
Related publications:
Liu, C., Ding, S., Kim, H., Long, S., Xiao, D., Ghazanfar, S. & Yang, P.† (2025) Multitask benchmarking of single-cell multimodal omics integration methods. Nature Methods, 22(11), 2449-2460. [Full Text] [Repo] [Shiny server]
• highlighted in Nature Genetics, doi:10.1038/s41588-025-02437-2, 2025. [Full Text]
Chen, C.✢, Kim, H.✢† & Yang, P.† (2024) Evaluating spatially variable gene detection methods for spatial transcriptomics data. Genome Biology, 25, 18. [Full Text] [Repo]
Kim, H.✢, Kim, T.✢, Hoffman, N., Xiao, D., James, D., Humphrey, S. & Yang, P.† (2021) PhosR enables processing and functional analysis of phosphoproteomic data. Cell Reports, 34(8), 108771. [Full Text] [BioC R package] [STAR Protocol]
Lin, Y., Cao, Y., Kim, H., Salim, A., Speed, T., Lin, D., Yang, P.† & Yang, J.† (2020) scClassify: sample size estimation and multiscale classification of cells using single and multiple reference. Molecular Systems Biology, 16, e9389. [Full Text] [BioC R package]
Kim, H.✢, Lin, Y.✢, Geddes, T., Yang, J. & Yang, P.† (2020) CiteFuse enables multi-modal analysis of CITE-seq data. Bioinformatics, 36(14), 4137-4143. [Full Text] [Biorxiv version] [BioC R package]
Lin, Y., Ghazanfar, S., Wang, K., Gagnon-Bartsch, J., Lo, K., Su, X., Han, Z., Ormerod, J., Speed, T., Yang, P.† & Yang, J.† (2019) scMerge leverages factor analysis, stable expression, and pseudoreplication to merge multiple single-cell RNA-seq datasets. Proceedings of the National Academy of Sciences of the United States of America, 116(20), 9775-9784. [Full Text] [BioC R package]
Stem-cell and organoid systems
Stem-cell and organoid systems allow us to study cell-fate decisions in experimentally tractable models of development, tissue formation and disease. We combine single-cell, spatial and multimodal profiling with computational modelling to assess organoid fidelity, identify molecular controls of differentiation and guide stem-cell engineering. Current applications include retinal organoids and retinal disease models, photoreceptor cell therapy, brain and dopaminergic organoid systems, cardiac differentiation and computationally guided organoid design.
Related publications:
Lim, B.✢, Chen, C.✢, Fredericks, A., Nilli, E., Mora, S., Weatherstone, M., Loi, T., Newman, P., Aryamanesh, N., Zreiqat, H., Tam, P., Yang, P.† & Gonzalez-Cordero, A.† (2026) Retinoic acid drives cell fate specification, maturation and retinal regionality in human retinal organoids. Nature Communications, in press. [Full Text]
Loi, T., Cheng, A., Kim, H., Fernando, M., Nash, B., Aryamanesh, N., Grigg, J., Yang, P., Gonzalez-Cordero, A. & Jamieson, R. (2025) Connecting cilium, stress response, and proteostasis abnormalities inform variant and therapy assessment in RPGRIP1 retinal organoids. Stem Cell Reports, 20(12), 102717. [Full Text]
Toh, H., Xu, L., Chen, C., Yang, P., Sun, A.† & Ouyang, J.† (2025) BrainSTEM: A single-cell multiresolution fetal brain atlas reveals transcriptomic fidelity of human midbrain cultures. Science Advances, 11(44), eadu7944. [Full Text] [Online resource]
Chen, C., Lee, S., Zyner, K., Fernando, M., Nemeruck, V., Wong, E., Marshall, L., Wark, J., Aryamanesh, N., Tam, P., Graham, M.†, Gonzalez-Cordero, A.† & Yang, P.† (2024) Trans-omic profiling uncovers molecular controls of the early human cerebral organoid formation. Cell Reports, 43(5), 114219. [Full Text] [Repo]
Kim, H., O'Hara-Wright, M., Kim, D., Loi, T., Lim, B., Jamieson, R., Gonzalez-Cordero, A.† & Yang, P.† (2023) Comprehensive characterization of fetal and mature retinal cell identity to assess the fidelity of retinal organoids. Stem Cell Reports, 18(1), 175-189. [Full Text] [Eikon Shiny Web Server]
Disease-state modelling and therapeutic prioritisation
We study disease as disrupted or maladaptive regulation of cell identity and cell state. Our goal is to develop condition-aware single-cell learning methods that separate disease programs from patient, batch and cohort effects, resolve those programs at the level of genes, cells and cell types, and connect them to druggable targets and candidate compounds. This disease-state work extends our core cell-identity and cell-fate framework into therapeutic prioritisation for retinal, immune-mediated and other disease systems.
Related publications:
Wagle, M., Liu, C., Liu, Z., Wang, Y., Kellis, M., Patrick, E. & Yang, P.† (2026) Interpretable deep generative ensemble learning for single-cell omics with Hydra. Molecular Systems Biology, in press. [Full Text] [PyPI] [Tutorial] [Repo]
Xiao, D. et al. (2025) Refate: regulatory-network-guided compound prioritisation for cell-state reprogramming. Preprint. [Repo]
Huang, H., Liu, C., Wagle, M. & Yang, P.† (2023) Evaluation of deep learning-based feature selection for single-cell RNA sequencing data analysis. Genome Biology, 24, 259. [Full Text] [Repo]
Cao, Y., Lin, Y., Patrick, E., Yang, P.† & Yang, J.† (2022) scFeatures: Multi-view representations of single-cell and spatial data for disease outcome prediction. Bioinformatics, 38(20), 4745-4753.
Kim, H.✢, Kim, T.✢, Hoffman, N., Xiao, D., James, D., Humphrey, S. & Yang, P.† (2021) PhosR enables processing and functional analysis of phosphoproteomic data. Cell Reports, 34(8), 108771. [Full Text] [BioC R package] [STAR Protocol]