{ "cells": [ { "cell_type": "markdown", "id": "laden-daisy", "metadata": {}, "source": [ "# Temporal Patterns" ] }, { "cell_type": "markdown", "id": "brief-browser", "metadata": {}, "source": [ "[](https://colab.research.google.com/github/earmingol/scCellFie/blob/main/docs/source/notebooks/temporal_patterns.ipynb)" ] }, { "cell_type": "markdown", "id": "structural-empire", "metadata": {}, "source": [ "In this tutorial, we will walk you through how to use General Additive Models (GAMs) to identify metabolic tasks following a trend across a trajectory defined by time points, pseudo-time, or ordered labels of cells.\n", "\n", "Here, we will use the results we previously generated for the Human Endometrial Cell Atlas (HECA) dataset [(Mareckova & Garcia-Alonso et al 2023)](https://doi.org/10.1038/s41588-024-01873-w) by running our [Quick Start Tutorial](https://sccellfie.readthedocs.io/en/latest/notebooks/quick_start_human.html)." ] }, { "cell_type": "markdown", "id": "exterior-haiti", "metadata": {}, "source": [ "## This tutorial includes following steps:\n", "* [Loading libraries](#loading-libraries)\n", "* [Loading endometrium results](#loading-endometrium-results)\n", "* [Defining cell trajectory](#defining-cell-trajectory)\n", "* [Running GAM](#running-gam)\n", "* [Filtering GAM results](#filtering-gam-results)\n", "* [Visualization of results](#visualization-of-results)" ] }, { "cell_type": "markdown", "id": "postal-luther", "metadata": {}, "source": [ "## Loading libraries " ] }, { "cell_type": "code", "execution_count": 1, "id": "extreme-harrison", "metadata": {}, "outputs": [], "source": [ "import sccellfie\n", "import scanpy as sc\n", "\n", "import pandas as pd\n", "import numpy as np\n", "\n", "## To avoid warnings\n", "import warnings\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "markdown", "id": "guilty-patrol", "metadata": {}, "source": [ "In addition, we set up a folder to save our figures. This folder is stored in the settings of Scanpy:" ] }, { "cell_type": "code", "execution_count": 2, "id": "indonesian-opportunity", "metadata": {}, "outputs": [], "source": [ "sc.settings.figdir = './results/GAM-Figures/'" ] }, { "cell_type": "markdown", "id": "primary-metallic", "metadata": {}, "source": [ "## Loading endometrium results " ] }, { "cell_type": "markdown", "id": "wired-audience", "metadata": {}, "source": [ "We start opening the results previously generated by running the scCellFie pipeline on the HECA dataset. If you haven't run the pipeline yet, please follow [this tutorial](https://sccellfie.readthedocs.io/en/latest/notebooks/quick_start_human.html).\n", "\n", "In this case, we will load the objects that were present in ``results['adata']`` in that tutorial. This object contains:\n", "- ``results['adata']``: contains gene expression in ``.X``.\n", "- ``results['adata'].layers['gene_scores']``: contains gene scores as in the original CellFie paper.\n", "- ``results['adata'].uns['Rxn-Max-Genes']``: contains determinant genes for each reaction per cell.\n", "- ``results['adata'].reactions``: contains reaction scores in ``.X`` so every scanpy function can be used on this object to visualize or compare values.\n", "- ``results['adata'].metabolic_tasks``: contains metabolic task scores in ``.X`` so every scanpy function can be used on this object to visualize or compare values.\n", "\n", "Here, we will name this object directly as ``adata``. Each of the previous elements should be under ``adata.``, as for example ``adata.metabolic_tasks``." ] }, { "cell_type": "code", "execution_count": 3, "id": "loving-prediction", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "./results//Human_HECA_scCellFie.h5ad was correctly loaded\n", "./results//Human_HECA_scCellFie_reactions.h5ad was correctly loaded\n", "./results//Human_HECA_scCellFie_metabolic_tasks.h5ad was correctly loaded\n" ] } ], "source": [ "adata = sccellfie.io.load_adata(folder='./results/',\n", " filename='Human_HECA_scCellFie'\n", " )" ] }, { "cell_type": "markdown", "id": "supreme-portuguese", "metadata": {}, "source": [ "In this case, we are interested in the metabolic task scores, which can be found in the ``adata.metabolic_tasks`` AnnData object." ] }, { "cell_type": "code", "execution_count": 4, "id": "crucial-channel", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 90001 × 215\n", " obs: 'n_genes', 'sample', 'percent_mito', 'n_counts', 'Endometriosis_stage', 'Endometriosis', 'Hormonal treatment', 'Binary Stage', 'Stage', 'phase', 'dataset', 'Age', 'lineage', 'celltype', 'label_long'\n", " uns: 'Binary Stage_colors', 'Biopsy_type_colors', 'Endometrial_pathology_colors', 'Endometriosis_stage_colors', 'GarciaAlonso_celltype_colors', 'Group_colors', 'Hormonal treatment_colors', 'Library_genotype_colors', 'Mareckova_celltype_colors', 'Mareckova_epi_celltype_colors', 'Mareckova_lineage_colors', 'Processing_colors', 'Symbol_colors', 'Tan_cellsubtypes_colors', 'Tan_celltype_colors', 'Treatment_colors', 'celltype_colors', 'dataset_colors', 'genotype_colors', 'hvg', 'label_long_colors', 'leiden', 'leiden_R_colors', 'leiden_colors', 'lineage_colors', 'neighbors', 'normalization', 'phase_colors', 'umap'\n", " obsm: 'X_scVI', 'X_umap'\n", " obsp: 'connectivities', 'distances'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata.metabolic_tasks" ] }, { "cell_type": "markdown", "id": "focused-orbit", "metadata": {}, "source": [ "## Defining cell trajectory \n", "\n", "We can model the behavior of metabolic tasks across a trajectory of cells using GAMs. This trajectory can be defined by time points, pseudo-time, or pre-defined by ordered labels.\n", "\n", "To illustrate its application, we will focus on glandular epithelial cells in the endometrium, that differentiate across distinct phases of the menstrual cycle as shown in the figure below. In this figure, we observed the layers of the endometrium, the main glandular cell types and the trajectory they follow across the menstrual phases.\n", "\n", "{ width=50% }" ] }, { "cell_type": "markdown", "id": "silver-banking", "metadata": {}, "source": [ "Thus, we can define our trajectory based on this prior knowledge from the endometrial biology." ] }, { "cell_type": "code", "execution_count": 5, "id": "complicated-wagner", "metadata": {}, "outputs": [], "source": [ "trajectory = ['SOX9_basalis', 'SOX9_functionalis_I', 'SOX9_functionalis_II', 'preGlandular', 'Glandular', 'Glandular_secretory']" ] }, { "cell_type": "markdown", "id": "modular-overall", "metadata": {}, "source": [ "Next, we need to make sure that we use cells labeled with these cell types, but also that they are present in the proper menstrual phases. Additionally, we only consider cells from Control samples." ] }, { "cell_type": "code", "execution_count": 6, "id": "fundamental-metro", "metadata": {}, "outputs": [], "source": [ "def filter_donors(adata, only_control = True):\n", " df = adata.obs\n", " # First, include only cells in the menstrual phase where they are expected to be\n", " cond_filter = (df.Stage.isin(['Proliferative', 'Proliferative Disordered', 'Proliferative Late']) & \\\n", " df.celltype.isin(['SOX9_functionalis_I', 'SOX9_functionalis_II']) \n", " ) | \\\n", " (df.Stage.isin(['Secretory Early', 'Secretory Early-Mid', 'Secretory Mid', 'Secretory Late',]) & \\\n", " df.celltype.isin(['preGlandular', 'Glandular', 'Glandular_secretory',]) \n", " ) | \\\n", " (df.Stage.isin(['Proliferative', 'Proliferative Disordered', 'Proliferative Late', \n", " 'Secretory Early', 'Secretory Early-Mid', 'Secretory Mid', 'Secretory Late',]) & \\\n", " df.celltype.isin(['SOX9_basalis', ]) \n", " )\n", " # Then, include cells that are only in Control Samples\n", " if only_control:\n", " cond_filter = cond_filter & (df.Endometriosis == 'Control')\n", " return adata[cond_filter]" ] }, { "cell_type": "markdown", "id": "hollywood-double", "metadata": {}, "source": [ "We create the ``mt_adata`` variable to represent the ``adata.metabolic_tasks`` object after filtering the cells" ] }, { "cell_type": "code", "execution_count": 7, "id": "genuine-nelson", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(7974, 215)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mt_adata = filter_donors(adata.metabolic_tasks)\n", "mt_adata.shape" ] }, { "cell_type": "markdown", "id": "crucial-tribune", "metadata": {}, "source": [ "## Running GAM \n", "\n", "GAMs extend linear models by allowing non-linear relationships between predictors and the response variable, while maintaining interpretability.\n", "\n", "A GAM models the expected value of a response variable $y$ as the sum of smooth functions of predictors:\n", "\n", "$$\n", "y = \\beta_0 + f_1(x_1) + f_2(x_2) + \\cdots + f_p(x_p)\n", "$$\n", "\n", "Where:\n", "- $y$ is the response variable (in our case, **the inferred activity of a metabolic task**),\n", "- $\\beta_0$ is the intercept,\n", "- $f_j(x_j)$ are smooth functions (often splines) applied to the predictors.\n", "\n", "In this analysis, we fit a GAM of the form:\n", "\n", "$$\n", "y_i = \\beta_0 + f(x_i) + \\varepsilon_i\n", "$$\n", "\n", "- $y_i$: inferred metabolic task activity in cell $i$,\n", "- $x_i$: an ordered index representing pseudotime, spatial progression, or a predefined trajectory of cell types,\n", "- $f(x)$: a smooth function capturing non-linear trends in metabolic activity along the trajectory,\n", "- $\\varepsilon_i$: residual noise.\n", "\n", "We use the **[pyGAM](https://pygam.readthedocs.io/)** library to fit this model with **penalized splines**, which help control overfitting by penalizing excessive wiggliness in the function $f(x)$.\n", "\n", "By fitting GAMs in this way, we can visualize and statistically assess whether a metabolic task shows structured variation across cell states, such as upregulation in specific cell types or gradual trends across a biological process (e.g. menstrual cycle)." ] }, { "cell_type": "code", "execution_count": 8, "id": "induced-disabled", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Fitting GAMs for each var in adata: 100%|██████████| 215/215 [00:25<00:00, 8.50it/s]\n" ] } ], "source": [ "gam_results = sccellfie.stats.fit_gam_model(\n", " mt_adata,\n", " cell_type_key='celltype',\n", " cell_type_order=trajectory,\n", ")" ] }, { "cell_type": "markdown", "id": "secret-aaron", "metadata": {}, "source": [ "Typically scCellFie runs GAMs on single cells. Additionally, it allows running\n", "GAMs on pseudo-bulks, by passing the following parameters:\n", "\n", "- ``use_pseudobulk=True`` : enables the use of pseudo-bulks instead of single cells.\n", "- ``pseudobulk_agg='trimean'`` : specifies the aggregation method to summarize single cells into pseudo-bulks.\n", "- ``n_pseudobulks=10`` : number of pseudo-bulks.\n", "- ``cells_per_bulk=100`` : number of single cells included in each pseudo-bulk.\n", "\n", "Pseudo-bulks will be built on the ``cell_type_key``.\n", "\n", "\n", "
| \n", " | n_samples | \n", "edof | \n", "scale | \n", "AIC | \n", "loglikelihood | \n", "deviance | \n", "p_value | \n", "explained_deviance | \n", "mcfadden_r2 | \n", "mcfadden_r2_adj | \n", "gene | \n", "significant | \n", "adj_p_value | \n", "significant_fdr | \n", "
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Glycine degradation | \n", "7974.0 | \n", "5.845167 | \n", "0.029838 | \n", "225703.731784 | \n", "-112845.020725 | \n", "7968.154833 | \n", "1.110223e-16 | \n", "0.708444 | \n", "0.258058 | \n", "0.741929 | \n", "Glycine degradation | \n", "True | \n", "1.147586e-16 | \n", "True | \n", "
| Serine degradation | \n", "7974.0 | \n", "5.845167 | \n", "0.024380 | \n", "282270.796524 | \n", "-141128.553095 | \n", "7968.154833 | \n", "1.110223e-16 | \n", "0.703896 | \n", "0.266481 | \n", "0.733508 | \n", "Serine degradation | \n", "True | \n", "1.147586e-16 | \n", "True | \n", "
| IMP salvage from hypoxanthine | \n", "7974.0 | \n", "5.845167 | \n", "0.328057 | \n", "21182.718189 | \n", "-10584.513927 | \n", "7968.154833 | \n", "1.110223e-16 | \n", "0.687279 | \n", "0.283959 | \n", "0.715884 | \n", "IMP salvage from hypoxanthine | \n", "True | \n", "1.147586e-16 | \n", "True | \n", "
| GMP salvage from guanine | \n", "7974.0 | \n", "5.845167 | \n", "0.328057 | \n", "21182.718189 | \n", "-10584.513927 | \n", "7968.154833 | \n", "1.110223e-16 | \n", "0.687279 | \n", "0.283959 | \n", "0.715884 | \n", "GMP salvage from guanine | \n", "True | \n", "1.147586e-16 | \n", "True | \n", "
| Methionine degradation | \n", "7974.0 | \n", "5.845167 | \n", "0.045556 | \n", "140316.366871 | \n", "-70151.338268 | \n", "7968.154833 | \n", "1.110223e-16 | \n", "0.674027 | \n", "0.279506 | \n", "0.720470 | \n", "Methionine degradation | \n", "True | \n", "1.147586e-16 | \n", "True | \n", "