Metabolic Task Visualizer

This tutorial walks through exporting scCellFie’s inferred metabolic activities so that they can be visualized in the scCellFie Metabolic Task Visualizer.

The visualizer enables users to drag and drop the .csv output to visualize results interactively through the portal.

Figure 1.

The dataset we are using is a downsampled version of the Human Endometrial Cell Atlas found in The Reproductive Cell Atlas (Mareckova & Garcia-Alonso et al. 2023).

Loading libraries

[1]:

import sccellfie
import scanpy as sc
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
import glasbey

## To avoid warnings
import warnings
warnings.filterwarnings("ignore")

First, let’s define the output folder for all results of our analysis:

[2]:

results_dir = './results/endometrium2023/'
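This tutorial does not state whether the saving functions used later create this folder automatically. As a cautious sketch, we can create it up front with the standard library:

```python
import os

# Same path as defined in the tutorial cell above.
results_dir = './results/endometrium2023/'

# Create the folder (and any missing parents); no error if it already exists.
os.makedirs(results_dir, exist_ok=True)
```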

Loading endometrium data

The downsampled Human Endometrial Cell Atlas contains ~90k cells (n_obs) with 17,736 genes (n_vars).

This is a processed dataset, with the raw count matrix stored in .X and 36 cell type annotations in .obs['celltype'].

[3]:

adata = sc.read(filename='./data/HECA-Subset.h5ad',
                backup_url='https://zenodo.org/records/15072628/files/HECA-Subset.h5ad')

[4]:

adata

[4]:

AnnData object with n_obs × n_vars = 90001 × 17736
    obs: 'n_genes', 'sample', 'percent_mito', 'n_counts', 'Endometriosis_stage', 'Endometriosis', 'Hormonal treatment', 'Binary Stage', 'Stage', 'phase', 'dataset', 'Age', 'lineage', 'celltype', 'label_long'
    var: 'gene_ids', 'feature_types'
    uns: 'Binary Stage_colors', 'Biopsy_type_colors', 'Endometrial_pathology_colors', 'Endometriosis_stage_colors', 'GarciaAlonso_celltype_colors', 'Group_colors', 'Hormonal treatment_colors', 'Library_genotype_colors', 'Mareckova_celltype_colors', 'Mareckova_epi_celltype_colors', 'Mareckova_lineage_colors', 'Processing_colors', 'Symbol_colors', 'Tan_cellsubtypes_colors', 'Tan_celltype_colors', 'Treatment_colors', 'celltype_colors', 'dataset_colors', 'genotype_colors', 'hvg', 'label_long_colors', 'leiden', 'leiden_R_colors', 'leiden_colors', 'lineage_colors', 'neighbors', 'phase_colors', 'umap'
    obsm: 'X_scVI', 'X_umap'
    obsp: 'connectivities', 'distances'
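Because the analysis relies on raw counts, a quick sanity check on the values can be useful. The helper below, looks_like_raw_counts, is a hypothetical function written for this illustration (it is not part of scCellFie or scanpy), and the matrices are toy stand-ins:

```python
import numpy as np

def looks_like_raw_counts(values):
    """Return True if all values are non-negative whole numbers."""
    values = np.asarray(values, dtype=float)
    return bool(np.all(values >= 0) and np.all(values == np.round(values)))

# Toy matrices standing in for expression values (cells x genes).
raw_like = np.array([[0, 3, 1], [2, 0, 7]])
normalized_like = np.array([[0.0, 1.39, 0.69], [1.10, 0.0, 2.08]])

print(looks_like_raw_counts(raw_like))         # True
print(looks_like_raw_counts(normalized_like))  # False
```

For the object loaded above, one could pass adata.X.data when .X is a scipy sparse matrix, or adata.X itself when it is a dense array.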

[5]:

adata.uns['neighbors']

[5]:

{'connectivities_key': 'connectivities',
 'distances_key': 'distances',
 'params': {'method': 'umap',
  'metric': 'euclidean',
  'n_neighbors': 15,
  'random_state': 0,
  'use_rep': 'X_scVI'}}

However, scCellFie needs the raw counts of all genes, so we load the raw version of the dataset and transfer the annotations from the processed one.

Run scCellFie

Now we run scCellFie on the raw data.

scCellFie provides an option to compute the neighbors itself through n_neighbors. Since this dataset already contains a pre-computed .uns['neighbors'], we will use neighbors_key='neighbors' instead.

Users may also want to tweak the parameter chunk_size for computing large datasets on local machines.
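Before passing neighbors_key='neighbors', it can help to confirm that the graph referenced in .uns is actually present. The function has_precomputed_neighbors below is a hypothetical helper written for this sketch, and it runs on plain-Python stand-ins that mirror the values printed earlier:

```python
def has_precomputed_neighbors(uns, obsp_keys, neighbors_key='neighbors'):
    """Check that `neighbors_key` exists in .uns and that the matrices it
    points to are present among the .obsp keys."""
    entry = uns.get(neighbors_key)
    if entry is None:
        return False
    needed = [entry.get('connectivities_key'), entry.get('distances_key')]
    return all(key in obsp_keys for key in needed)

# Plain-Python stand-ins mirroring the values printed above.
uns = {'neighbors': {'connectivities_key': 'connectivities',
                     'distances_key': 'distances',
                     'params': {'n_neighbors': 15, 'use_rep': 'X_scVI'}}}
obsp_keys = ['connectivities', 'distances']

print(has_precomputed_neighbors(uns, obsp_keys))         # True
print(has_precomputed_neighbors(uns, obsp_keys, 'knn'))  # False
```

With the real object, the call would look like has_precomputed_neighbors(adata.uns, adata.obsp.keys()).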

[6]:

results = sccellfie.run_sccellfie_pipeline(adata,
                                           organism='human',
                                           sccellfie_data_folder=None,
                                           n_counts_col='n_counts',
                                           process_by_group=False,
                                           groupby=None,
                                           neighbors_key='neighbors',
                                           batch_key=None,  # Specify batch_key or leave as None
                                           threshold_key='sccellfie_threshold',
                                           smooth_cells=True,
                                           alpha=0.33,
                                           chunk_size=5000,
                                           disable_pbar=False,
                                           save_folder=None,
                                           save_filename=None
                                          )


==== scCellFie Pipeline: Initializing ====
Loading scCellFie database for organism: human

==== scCellFie Pipeline: Processing entire dataset ====

---- scCellFie Step: Preprocessing data ----

---- scCellFie Step: Preparing inputs ----
Gene names corrected to match database: 22
Shape of new adata object: (90001, 839)
Number of GPRs: 748
Shape of tasks by genes: (215, 839)
Shape of reactions by genes: (748, 839)
Shape of tasks by reactions: (215, 748)

---- scCellFie Step: Smoothing gene expression ----

Smoothing Expression: 100%|██████████| 19/19 [00:44<00:00,  2.33s/it]


---- scCellFie Step: Computing gene scores ----

---- scCellFie Step: Computing reaction activity ----

Cell Rxn Activities: 100%|██████████| 90001/90001 [07:51<00:00, 190.95it/s]


---- scCellFie Step: Computing metabolic task activity ----
Removed 0 metabolic tasks with zeros across all cells.

==== scCellFie Pipeline: Processing completed successfully ====
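The 19 iterations shown in the smoothing progress bar are consistent with splitting the 90,001 cells into blocks of at most chunk_size cells. The arithmetic below assumes contiguous blocks, which is our reading of the output rather than a documented detail:

```python
import math

n_cells = 90001    # n_obs reported for this dataset
chunk_size = 5000  # value passed to run_sccellfie_pipeline above

# Number of contiguous blocks needed to cover all cells.
n_chunks = math.ceil(n_cells / chunk_size)
# Size of the final, partially filled block.
last_chunk = n_cells - (n_chunks - 1) * chunk_size

print(n_chunks, last_chunk)  # 19 1
```

A smaller chunk_size means more iterations, each handling fewer cells at a time, which is the usual trade-off when memory is limited.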

Export results

The scCellFie results are stored as a dictionary and can be retrieved by their keys.

[7]:

results.keys()

[7]:

dict_keys(['adata', 'gpr_rules', 'task_by_gene', 'rxn_by_gene', 'task_by_rxn', 'rxn_info', 'task_info', 'thresholds', 'organism'])

To access metabolic activities, we need to inspect results['adata']:

The processed single-cell data is located in the AnnData object results['adata'].
The reaction activities for each cell are located in the AnnData object results['adata'].reactions.
The metabolic task activities for each cell are located in the AnnData object results['adata'].metabolic_tasks.

In particular:

results['adata']: contains gene expression in .X.
results['adata'].layers['gene_scores']: contains gene scores as in the original CellFie paper.
results['adata'].uns['Rxn-Max-Genes']: contains determinant genes for each reaction per cell.
results['adata'].reactions: contains reaction scores in .X so every scanpy function can be used on this object to visualize or compare values.
results['adata'].metabolic_tasks: contains metabolic task scores in .X so every scanpy function can be used on this object to visualize or compare values.

Other keys in the results dictionary are associated with the scCellFie database and are already filtered for the elements present in the dataset ('gpr_rules', 'task_by_gene', 'rxn_by_gene', 'task_by_rxn', 'rxn_info', 'task_info', 'thresholds', 'organism').

Summarise results into a table to use in the scCellFie Metabolic Task Visualizer

We want metabolic scores at a cell type level. Here, the column summarizing these groups is 'celltype'.

[8]:

cell_group = 'celltype'

[9]:

report = sccellfie.reports.generate_report_from_adata(results['adata'].metabolic_tasks, cell_group, feature_name='metabolic_task')

Processing groups for tissue: 100%|██████████| 36/36 [00:04<00:00,  9.60it/s]
Processing tissues: 100%|██████████| 1/1 [00:04<00:00,  4.96s/it]

[10]:

report.keys()

[10]:

dict_keys(['agg_values', 'variance', 'std', 'threshold_cells', 'nonzero_cells', 'cell_counts', 'min_max', 'melted'])

Saving scCellFie reports

Once we have our report data to export, we can save it with the following function in a specific output folder:

[11]:

sccellfie.io.save_data.save_result_summary(results_dict=report, output_directory=results_dir)

Results saved to ./results/endometrium2023/

This function generates multiple CSV files. The one we need is Melted.csv, which contains the metabolic activities per task for each of the cell types. Another useful file is Min_max.csv, in case we would like to explore the range of scores for each metabolic task at either the single-cell or the cell-type level.
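To get a feel for how a long-format table such as Melted.csv maps onto a heatmap, we can pivot it into a cell type by task matrix. The sketch below uses a made-up toy table. The 'metabolic_task' and 'scaled_trimean' column names appear in this tutorial, while the 'cell_type' column name is an assumption:

```python
import pandas as pd

# Toy stand-in for Melted.csv; the 'cell_type' column name is an assumption.
melted_toy = pd.DataFrame({
    'cell_type': ['eStromal', 'eStromal', 'dStromal_early', 'dStromal_early'],
    'metabolic_task': ['Alanine synthesis', 'Vesicle secretion'] * 2,
    'scaled_trimean': [0.20, 0.55, 0.35, 0.10],
})

# One row per cell type, one column per metabolic task.
heatmap_matrix = melted_toy.pivot(index='cell_type',
                                  columns='metabolic_task',
                                  values='scaled_trimean')
print(heatmap_matrix)
```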

Computing scaled trimean using CELLxGENE reference

The Melted.csv file already contains a 'scaled_trimean' column in case we would like to visualize scaled values. However, these values are scaled using the min and max values within our own dataset. If we would like to compare our results with different organs and cell types at a cell atlas level, we can instead scale our metabolic activities using the ranges computed from the CZI CELLxGENE atlas.

We start by downloading the range of min and max values in the CZI’s atlas:

[12]:

minmax = pd.read_csv('https://raw.githubusercontent.com/ventolab/sccellfie-website/refs/heads/main/data/CELLxGENEMetabolicTasksMinMax.csv', index_col=0)

[13]:

minmax

[13]:

	(R)-3-Hydroxybutanoate synthesis	3'-Phospho-5'-adenylyl sulfate synthesis	AMP salvage from adenine	ATP generation from glucose (hypoxic conditions) - glycolysis	ATP regeneration from glucose (normoxic conditions) - glycolysis + krebs cycle	Acetoacetate synthesis	Alanine degradation	Alanine synthesis	Arachidonate degradation	Arachidonate synthesis	...	Valine to succinyl-coA	Vesicle secretion	beta-Alanine degradation	beta-Alanine synthesis	cis-vaccenic acid degradation	cis-vaccenic acid synthesis	gamma-Linolenate degradation	gamma-Linolenate synthesis	glyco-cholate synthesis	tauro-cholate synthesis
single_cell_min	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	...	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
single_cell_max	7.419829	11.795217	20.651447	13.613079	4.979552	8.038148	3.030361	7.951063	1.752682	4.296328	...	6.424888	2.780753	2.980583	6.203485	2.161476	4.050148	2.341566	7.578767	2.982760	2.982760
cell_type_min	0.041299	0.000000	0.000000	0.033310	0.024252	0.023852	0.028060	0.025760	0.000000	0.021624	...	0.000000	0.000000	0.025981	0.029364	0.000000	0.013848	0.000000	0.000000	0.000000	0.000000
cell_type_max	4.403184	4.105401	5.187688	7.840590	3.159272	4.670686	1.874831	4.480565	0.854841	2.594763	...	1.313472	0.798270	1.899791	3.463882	1.151658	2.225383	1.263210	1.716187	0.941299	0.941299

4 rows × 218 columns

We then generate a dictionary containing the cell-type level info for each of the metabolic tasks:

[14]:

min_mapper = minmax.T['cell_type_min'].to_dict()
max_mapper = minmax.T['cell_type_max'].to_dict()
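The transpose followed by a column selection is one way to pull a single row out as a task-to-value mapping. As a small illustration on a made-up table with the same orientation as minmax, selecting the row with .loc gives the same dictionary:

```python
import pandas as pd

# Toy reference table with the same orientation as `minmax`:
# statistics as rows, metabolic tasks as columns (values are made up).
minmax_toy = pd.DataFrame(
    {'Alanine synthesis': [0.0, 8.0, 0.02, 4.0],
     'Vesicle secretion': [0.0, 3.0, 0.00, 1.0]},
    index=['single_cell_min', 'single_cell_max', 'cell_type_min', 'cell_type_max'],
)

# Transposing and selecting a column ...
via_transpose = minmax_toy.T['cell_type_max'].to_dict()
# ... gives the same mapping as selecting the row directly.
via_loc = minmax_toy.loc['cell_type_max'].to_dict()

print(via_transpose)  # {'Alanine synthesis': 4.0, 'Vesicle secretion': 1.0}
```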

And use this information to scale the metabolic scores (‘trimean’ here) in our dataset.

[15]:

# Work on a copy so the original report stays untouched
melted = report['melted'].copy()
# Attach the atlas-level (CELLxGENE) min and max of each metabolic task
melted['min'] = melted.metabolic_task.map(min_mapper)
melted['max'] = melted.metabolic_task.map(max_mapper)

# Min-max scale the trimean against the atlas range, then clip to [0, 1]
melted['scaled_trimean'] = (melted['trimean'] - melted['min']) / (melted['max'] - melted['min'])
melted['scaled_trimean'] = melted['scaled_trimean'].clip(lower=0., upper=1.)
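To make the scaling concrete, here is a small worked example on made-up numbers (the task names and ranges are purely illustrative). It also flags tasks that are missing from the reference ranges, since mapping them yields NaN instead of a scaled value:

```python
import pandas as pd

# Made-up reference ranges and scores, for illustration only.
min_ref = {'Task A': 0.0, 'Task B': 1.0}
max_ref = {'Task A': 4.0, 'Task B': 3.0}
toy = pd.DataFrame({
    'metabolic_task': ['Task A', 'Task A', 'Task B', 'Task C'],
    'trimean': [1.0, 5.0, 0.5, 2.0],
})

lo = toy['metabolic_task'].map(min_ref)
hi = toy['metabolic_task'].map(max_ref)

# Min-max scaling against the reference, then clipping to [0, 1].
toy['scaled_trimean'] = ((toy['trimean'] - lo) / (hi - lo)).clip(0.0, 1.0)

# Tasks absent from the reference end up as NaN and are worth flagging.
unmapped = toy.loc[lo.isna(), 'metabolic_task'].unique().tolist()

print(toy['scaled_trimean'].tolist())  # [0.25, 1.0, 0.0, nan]
print(unmapped)                        # ['Task C']
```

Values above the reference maximum are capped at 1 and values below the reference minimum are raised to 0, so scores stay comparable with the atlas range.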

Finally, we can generate a new Melted.csv file that includes the values scaled with respect to the CZI CELLxGENE atlas, and use it for visualization.

[16]:

melted.to_csv(f'{results_dir}/Melted.csv', index=False)

Online visualizations

Next, visit the scCellFie Metabolic Task Visualizer portal, where you can drag and drop your saved Melted.csv to visualise the results in a heatmap.

Figure 2.

Figure 3.

Figure 4.

Additional local visualizations

These outputs can be further used to visualize metabolic task activities with a radial plot. In this case, all cell types within the tissue are included. The maximum activity per task, across all cell types, is shown.

[17]:

fig, ax = sccellfie.plotting.create_radial_plot(melted, results['task_info'], figsize=(6,6), sort_by_value=False, ylim=1.0)

../_images/notebooks_visualizer_45_0.png
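The "maximum activity per task, across all cell types" summary can be reproduced by hand with a groupby. This is a sketch on a made-up table. Which column the plotting function reads internally is not stated in this tutorial, so the use of 'scaled_trimean' here is an assumption, as is the 'cell_type' column name:

```python
import pandas as pd

# Toy long-format table; values and the 'cell_type' column name are illustrative.
toy = pd.DataFrame({
    'cell_type': ['eStromal', 'dStromal_mid', 'eStromal', 'dStromal_mid'],
    'metabolic_task': ['Task A', 'Task A', 'Task B', 'Task B'],
    'scaled_trimean': [0.2, 0.7, 0.9, 0.4],
})

# Highest scaled activity reached by any cell type, per task.
max_per_task = toy.groupby('metabolic_task')['scaled_trimean'].max()
print(max_per_task.to_dict())  # {'Task A': 0.7, 'Task B': 0.9}
```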

We can also plot the activities of individual cell types:

[18]:

# Create figure with subplots
fig = plt.figure(figsize=(16, 16))
ax1 = fig.add_subplot(221, projection='polar')
ax2 = fig.add_subplot(222, projection='polar')
ax3 = fig.add_subplot(223, projection='polar')
ax4 = fig.add_subplot(224, projection='polar')

for i, (cell, ax) in enumerate(zip(['eStromal', 'dStromal_early', 'dStromal_mid', 'dStromal_late'], [ax1, ax2, ax3, ax4])):
    sccellfie.plotting.create_radial_plot(melted,
                                          results['task_info'],
                                          cell_type=cell,
                                          ax=ax,
                                          show_legend=i == 1,
                                          ylim=1.0
                                          )

../_images/notebooks_visualizer_47_0.png