General Overview

This notebook provides a general overview of scCellFie analyses, including the inference of metabolic tasks, metabolic marker identification, differential analysis, and cell-cell communication.

To illustrate a different application, the dataset we are using here includes BALF samples from donors with varying severities of COVID-19 (Liao et al 2020).

This tutorial includes following steps:

Loading libraries
Loading COVID-19 data
Run scCellFie
Export results
Identification of metabolic task markers and visualization
Differential metabolic analysis
Cell-cell communication

Loading libraries

[ ]:

import sccellfie
import scanpy as sc
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
import glasbey

import textwrap

## To avoid warnings
import warnings
warnings.filterwarnings("ignore")

Loading COVID-19 data

We start loading our single-cell data. In this case, adata.X already contains raw counts, which are the main inputs of scCellFie.

[3]:

adata = sc.read(filename='BALF-COVID19.h5ad',
                backup_url='https://zenodo.org/record/7535867/files/BALF-COVID19-Liao_et_al-NatMed-2020.h5ad')

[4]:

adata

[4]:

AnnData object with n_obs × n_vars = 63103 × 33538
    obs: 'sample', 'sample_new', 'group', 'disease', 'hasnCoV', 'cluster', 'celltype', 'condition'

Run scCellFie

Now we run scCellFie on the raw data.

[5]:

batch_key = 'sample' # Specify batch_key or leave as None

[6]:

results = sccellfie.run_sccellfie_pipeline(adata, # Raw counts
                                           organism='human',
                                           sccellfie_data_folder=None,
                                           n_counts_col='n_counts', # Column where total counts per cells are stored in adata.obs
                                           process_by_group=False, # Whether to do the processing by cell groups
                                           groupby=None, # Column indicating cell groups if `process_by_group=True`
                                           neighbors_key='neighbors', # Neighbors information if precomputed. Otherwise, it will be computed here
                                           n_neighbors=10, # Number of neighbors to use
                                           batch_key=batch_key, # None if batches are not included
                                           threshold_key='sccellfie_threshold', # This is for using the default database. If personalized thresholds are used, specificy column name
                                           smooth_cells=True, # Whether to perform gene expression smoothing before running the tool
                                           alpha=0.33, # Importance of neighbors' expression for the smoothing (0 to 1)
                                           chunk_size=5000, # Number of chunks to run the processing steps (helps with the memory)
                                           disable_pbar=False,
                                           save_folder=None, # In case results will be saved. If so, results will not be returned and should be loaded from the folder (see sccellfie.io.load_data function
                                           save_filename=None # Name for saving the files, otherwise a default name will be used
                                          )


==== scCellFie Pipeline: Initializing ====
Loading scCellFie database for organism: human

==== scCellFie Pipeline: Processing entire dataset ====

---- scCellFie Step: Preprocessing data ----

---- scCellFie Step: Computing neighbors ----

2025-04-14 13:02:45.789400: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1744635765.810778   14108 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1744635765.817489   14108 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-04-14 13:02:45.844525: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


---- scCellFie Step: Preparing inputs ----
Gene names corrected to match database: 22
Shape of new adata object: (63103, 929)
Number of GPRs: 787
Shape of tasks by genes: (218, 929)
Shape of reactions by genes: (787, 929)
Shape of tasks by reactions: (218, 787)

---- scCellFie Step: Smoothing gene expression ----

Smoothing Expression: 100%|██████████| 13/13 [01:09<00:00,  5.36s/it]


---- scCellFie Step: Computing gene scores ----

---- scCellFie Step: Computing reaction activity ----

Cell Rxn Activities: 100%|██████████| 63103/63103 [05:49<00:00, 180.48it/s]


---- scCellFie Step: Computing metabolic task activity ----
Removed 1 metabolic tasks with zeros across all cells.

==== scCellFie Pipeline: Processing completed successfully ====

Export results

[7]:

results.keys()

[7]:

dict_keys(['adata', 'gpr_rules', 'task_by_gene', 'rxn_by_gene', 'task_by_rxn', 'rxn_info', 'task_info', 'thresholds', 'organism'])

To access metabolic activities, we need to inspect results['adata']:

The processed single-cell data is located in the AnnData object results['adata'].
The reaction activities for each cell are located in the AnnData object results['adata'].reactions.
The metabolic task activities for each cell are located in the AnnData object results['adata'].metabolic_tasks.

In particular:

results['adata']: contains gene expression in .X.
results['adata'].layers['gene_scores']: contains gene scores as in the original CellFie paper.
results['adata'].uns['Rxn-Max-Genes']: contains determinant genes for each reaction per cell.
results['adata'].reactions: contains reaction scores in .X so every scanpy function can be used on this object to visualize or compare values.
results['adata'].metabolic_tasks: contains metabolic task scores in .X so every scanpy function can be used on this object to visualize or compare values.

Other keys in the results dictionary are associated with the scCellFie database and are already filtered for the elements present in the dataset ('gpr_rules', 'task_by_gene', 'rxn_by_gene', 'task_by_rxn', 'rxn_info', 'task_info', 'thresholds', 'organism').

Save single-cell results

We can save our single-cell results contained in the AnnData objects (results['adata']) into a specific folder. Here we only need to specify the output folder, and the basename that our AnnData objects will have. This function below saves the expression object (results['adata']) and those containing scores for reactions and metabolic tasks (results['adata'].reactions and results['adata'].metabolic_tasks, respectively), as separate files.

[ ]:

sccellfie.io.save_adata(adata=results['adata'], output_directory='./results/', filename='Human_COVID19_scCellFie')

Identification of metabolic task markers and visualization

To identify markers, we can use different approaches. Here we show an approach based on TF-IDF, which comes from the Natural Language Processing field, and the logistic regression implemented in Scanpy.

[8]:

# Our column indicating the cell types in the adata.obs dataframe
cell_group = 'celltype'

[9]:

# We use glasbey to expand the palette into a larger number of colors
# This is useful when we have many cell types
palette = glasbey.extend_palette('Set2',
                                 palette_size=max([10, results['adata'].metabolic_tasks.obs[cell_group].unique().shape[0]]))

Detection using TF-IDF

scCellFie runs a TF-IDF approach, as implemented in the tool SoupX in R. We define a express_cut=5*np.log(2) to define cells expressing each metabolic task. Here, a value of 5*np.log(2) indicates that all reactions in the task are over the threshold value of the determinant gene (or “active”). This condition is often rare in single-cell data due to data sparsity (rarely all reactions in a task are over their determinant gene’s threshold), but it is a good approach for identifying markers.

[10]:

mrks = sccellfie.external.quick_markers(results['adata'].metabolic_tasks,
                                        cluster_key=cell_group,
                                        n_markers=20,
                                        express_cut=5*np.log(2))

[11]:

mrks.head()

[11]:

	gene	cluster	tf	idf	tf_idf	gene_frequency_outside_cluster	gene_frequency_global	second_best_tf	second_best_cluster	pval	qval
0	AMP salvage from adenine	B	0.212121	2.324907	0.493162	0.097433	0.097792	0.242395	NK	0.000001	0.000000e+00
1	Fructose degradation (to glucose-3-phosphate)	B	0.474747	0.915665	0.434710	0.400016	0.400250	0.542776	NK	0.019813	2.608937e-66
2	ATP generation from glucose (hypoxic condition...	B	0.282828	1.489275	0.421209	0.225356	0.225536	0.458583	T	0.034830	0.000000e+00
3	Glucose to lactate conversion	B	0.186869	1.984669	0.370872	0.137270	0.137426	0.361966	T	0.031173	0.000000e+00
4	Conversion of 1-phosphatidyl-1D-myo-inositol 4...	B	0.116162	3.081438	0.357945	0.045672	0.045893	0.135940	Epithelial	0.000045	2.539957e-168

By looking at the distributions given each task TF-IDF scores, we can do some filtering to ensure more specific tasks.

[12]:

mrks['tf_idf'].hist()

[12]:

<Axes: >

../_images/notebooks_extended_quick_start_27_1.png

[13]:

scatter = plt.scatter(mrks['tf'], mrks['idf'], c=mrks['tf_idf'], cmap='YlGnBu')
plt.colorbar(scatter)
plt.xlabel('TF')
plt.ylabel('IDF')

[13]:

Text(0, 0.5, 'IDF')

../_images/notebooks_extended_quick_start_28_1.png

scCellFie includes a filtering function that fits a hyperbolic curve, while the user can define some thresholds based on the tf-idf scores, and the ratio between the top and second best cell clusters (tf_ratio).

[14]:

x_col = 'tf'
y_col = 'idf'
df = mrks
tfidf_threshold = 0.3
tf_ratio = 1.2

# Visualization
plt.figure(figsize=(6, 6))

# Plot all points
plt.scatter(df[x_col], df[y_col], alpha=0.5, c='lightgray', label='All points')

# Plot selected points
filtered_mrks, curve = sccellfie.external.filter_tfidf_markers(df, tf_col=x_col, idf_col=y_col, tfidf_threshold=tfidf_threshold, tf_ratio=tf_ratio)

plt.scatter(filtered_mrks[x_col], filtered_mrks[y_col], c='red', label='Selected points')
plt.plot(*curve, label='Fit curve')

plt.xlabel('TF')
plt.ylabel('IDF')
plt.legend()

[14]:

<matplotlib.legend.Legend at 0x7fe01ea8f2e0>

../_images/notebooks_extended_quick_start_30_1.png

[15]:

tf_idf_mrks = filtered_mrks['gene'].unique().tolist()
len(tf_idf_mrks)

[15]:

Here we identified a number of metabolic markers, which we can visualize using regular UMAP visualizations.

[16]:

plt.rcParams['figure.figsize'] = (3,3)
plt.rcParams['font.size'] = 10

sc.pl.embedding(results['adata'].metabolic_tasks,
                color=[cell_group] + tf_idf_mrks,
                ncols=2,
                palette=palette,
                frameon=False,
                basis='X_umap',
                wspace=0.7,
                title=["\n".join(textwrap.wrap(t, width=60)) for t in [cell_group] + tf_idf_mrks],
                cmap='OrRd'
               )

../_images/notebooks_extended_quick_start_33_0.png

We can further inspect the metabolic score distribution across single cells per cell type for these markers.

[17]:

fig, axes = sccellfie.plotting.create_multi_violin_plots(results['adata'].metabolic_tasks,
                                                         features=tf_idf_mrks,
                                                         groupby=cell_group,
                                                         stripplot=False,
                                                         n_cols=3,
                                                         ylabel='Metabolic Score'
                                                        )

../_images/notebooks_extended_quick_start_35_0.png

Detection using Logistic Regression in Scanpy

Similarly, we can use another approach for identifying metabolic markers. In this case, a logistic regression implemented in Scanpy.

[18]:

method = 'logreg'
sc.tl.rank_genes_groups(results['adata'].metabolic_tasks, cell_group, method=method,
                        use_raw=False, key_added = method)

sc.pl.rank_genes_groups(results['adata'].metabolic_tasks, n_genes=20, sharey=True, key=method, fontsize=4)

../_images/notebooks_extended_quick_start_38_0.png

[19]:

scanpy_df = sc.get.rank_genes_groups_df(results['adata'].metabolic_tasks,
                                          key=method,
                                          group=None)

In this case we can also inspect them using dot plots.

[20]:

sc.pl.rank_genes_groups_dotplot(results['adata'].metabolic_tasks, n_genes=5, groupby=cell_group,
                                key=method, use_raw=False, standard_scale='var',
                               )

WARNING: dendrogram data not found (using key=dendrogram_celltype). Running `sc.tl.dendrogram` with default parameters. For fine tuning it is recommended to run `sc.tl.dendrogram` independently.

../_images/notebooks_extended_quick_start_41_1.png

We can further filter the metabolic tasks given the scores assigned by the logistic regression model.

[21]:

scanpy_df['scores'].hist()

[21]:

<Axes: >

../_images/notebooks_extended_quick_start_43_1.png

[22]:

scores_threshold = 1.0

sc_markers_df = scanpy_df
scanpy_markers = sc_markers_df.loc[sc_markers_df['scores'] > scores_threshold]['names'].unique().tolist()
len(scanpy_markers)

[22]:

And perform the same visualizations.

[23]:

plt.rcParams['figure.figsize'] = (3,3)
plt.rcParams['font.size'] = 10

sc.pl.embedding(results['adata'].metabolic_tasks,
                color=[cell_group] + scanpy_markers,
                ncols=2,
                palette=palette,
                frameon=False,
                basis='X_umap',
                wspace=0.7,
                cmap='OrRd',
                title=["\n".join(textwrap.wrap(t, width=60)) for t in [cell_group] + scanpy_markers],
               )

../_images/notebooks_extended_quick_start_46_0.png

[24]:

fig, axes = sccellfie.plotting.create_multi_violin_plots(results['adata'].metabolic_tasks,
                                                         features=scanpy_markers,
                                                         groupby=cell_group,
                                                         stripplot=False,
                                                         n_cols=3,
                                                         ylabel='Metabolic Score'
                                                        )

../_images/notebooks_extended_quick_start_47_0.png

Visualize Markers from both methods

We can combine the metabolic tasks from different approaches too:

[25]:

both_markers = sorted(set(tf_idf_mrks + scanpy_markers))
len(both_markers)

[25]:

Select only highly active markers

And if we’d like to, we can filter them given their activity level. Here, we first aggregate the metabolic activity of the markers into a cell-type level.

[26]:

agg = sccellfie.expression.aggregation.agg_expression_cells(results['adata'].metabolic_tasks[:, both_markers],
                                                            groupby=cell_group,
                                                            agg_func='trimean')

Then, we filter markers given ther average activity (trimean). Our threshold here was an aggregated score of 1 in at least one cell type.

[27]:

both_markers = agg.T.loc[(agg.T > 1.).any(axis=1)].index.tolist()
len(both_markers)

[27]:

And finally, visualize both markers that are “more active”.

[28]:

plt.rcParams['figure.figsize'] = (3,3)
plt.rcParams['font.size'] = 10

sc.pl.embedding(results['adata'].metabolic_tasks,
                color=[cell_group] + both_markers,
                ncols=2,
                palette=palette,
                frameon=False,
                basis='X_umap',
                wspace=0.7,
                cmap='OrRd',
                title=["\n".join(textwrap.wrap(t, width=60)) for t in [cell_group] + both_markers],
               )

../_images/notebooks_extended_quick_start_57_0.png

[29]:

fig, axes = sccellfie.plotting.create_multi_violin_plots(results['adata'].metabolic_tasks,
                                                         features=both_markers,
                                                         groupby=cell_group,
                                                         stripplot=False,
                                                         n_cols=4,
                                                         ylabel='Metabolic Score'
                                                        )

../_images/notebooks_extended_quick_start_58_0.png

Differential metabolic analysis

A differential analysis can be done between conditions (per cell type) to identify metabolic changes.

Select conditions to compare

[30]:

adata.obs.condition.unique()

[30]:

['Control', 'Moderate COVID-19', 'Severe COVID-19']
Categories (3, object): ['Control', 'Moderate COVID-19', 'Severe COVID-19']

[31]:

contrasts = [('Control', 'Severe COVID-19')]
condition_key = 'condition'

Perform differential analysis

[32]:

dma = sccellfie.stats.scanpy_differential_analysis(results['adata'].metabolic_tasks,
                                                   cell_type=None,
                                                   cell_type_key=cell_group,
                                                   condition_key=condition_key,
                                                   min_cells=20,
                                                   condition_pairs=contrasts)

Processing DE analysis: 100%|██████████| 10/10 [00:14<00:00,  1.42s/it]


Excluded comparisons due to insufficient cells:

pDC:
  - Control vs Severe COVID-19 (n1=12, n2=76)

Plasma:
  - Control vs Severe COVID-19 (n1=3, n2=1024)

Mast:
  - Control vs Severe COVID-19 (n1=6, n2=39)

Neutrophil:
  - Control vs Severe COVID-19 (n1=0, n2=1603)

Identify significant changes per cell type

After we performed our differential analysis, we need to filter them. We can filter tasks by the adjusted P value (FDR), and by the Cohen’s D, which is a variance-normalized effect size and helps against outliers. If we want to we can further add a threshold for the logFC value, which here we are not doing (logfc_threshold = 0.0).

Typically, a Cohen’s D greater to 0.5 should be a decent effect size. A value of 0.75 should be a good effect size, and if we want to be more strict, we can use a value of 1.0, as done here.

[33]:

cohen_threshold = 1.0
pval_threshold = 0.05
logfc_threshold = 0.0

fig, ax, sig = sccellfie.plotting.create_beeswarm_plot(dma,
                                                       x='cohens_d',
                                                       cohen_threshold=cohen_threshold,
                                                       pval_threshold=pval_threshold,
                                                       logfc_threshold=logfc_threshold,
                                                       show_n_significant=False,
                                                       strip_jitter=True,
                                                       lgd_bbox_to_anchor=(1.01, 1),
                                                       ticks_fontsize=16,
                                                       labels_fontsize=18,
                                                       figsize=(12, 10),
                                                       #save=scCellFie-Diff-Analysis.pdf
                                                      )

../_images/notebooks_extended_quick_start_67_0.png

[34]:

sig_tasks = sig.reset_index().feature.unique().tolist()
len(sig_tasks)

[34]:

Plot number of major task groups up/down regulated

We use our outputs from scCellFie to calculate the number of significant tasks given their major categories located in results['task_info'].

[35]:

sccellfie_db = results

[36]:

def plot_single(condition, ax, pathway, sig_data):
    pathway_counts = {'System': pd.DataFrame(), 'Subsystem': pd.DataFrame()}

    for c, df in sig_data.reset_index().groupby('cell_type'):
        for k, v in pathway_counts.items():
            df2 = df.set_index('feature').join(sccellfie_db['task_info'].set_index('Task'), how='inner')
            df2 = df2.loc[df2.log2FC > 0] if condition == 'up' else df2.loc[df2.log2FC < 0]
            df2 = df2.value_counts(k).to_frame().reset_index()
            df2['cell_type'] = c
            df2 = df2[['cell_type', k, 'count']]
            df2[k] = df2[k].apply(lambda x: x.upper())
            pathway_counts[k] = pd.concat([pathway_counts[k], df2])

    df = pathway_counts[pathway].sort_values('count')
    totals = df.groupby('cell_type')['count'].sum()

    df_normalized = df.copy()
    df_normalized['fraction'] = df.groupby('cell_type')['count'].transform(lambda x: x / x.sum())
    pivoted_data = df_normalized.pivot(index='cell_type', columns=pathway, values='fraction')
    pivoted_data = pivoted_data.sort_index().fillna(0.)

    n_systems = len(pivoted_data.columns)
    colors = glasbey.extend_palette('Set2', palette_size=max([n_systems, 10]))

    pivoted_data.plot(kind='barh', stacked=True, ax=ax, color=colors, legend=False)
    ax.set_xlim((0, 1))
    ax.invert_yaxis()

    ylabels = ax.get_yticklabels()
    ax.tick_params(axis='both', which='major', labelsize=12)

    for idx, cell_type in enumerate(pivoted_data.index):
        ax.text(1.02, idx, f'n={totals[cell_type]:,}', va='center', fontsize=12)

    ax.set_xlabel('Fraction', fontsize=14)
    ax.set_ylabel('')

    return pivoted_data.columns, colors

Then we plot these numbers

[37]:

from matplotlib.colors import to_rgb

pathway = 'System'
condition1_color='#8B0000'
condition2_color='#000080'
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(3, 6), height_ratios=[1, 1])

legend_elements, colors = plot_single('up', ax1, pathway, sig)
plot_single('down', ax2, pathway, sig)

ax1.set_title('Severe COVID-19', fontsize=16, pad=10, color=to_rgb(condition2_color))
ax2.set_title('Control', fontsize=16, pad=10, color=to_rgb(condition1_color))

plt.figlegend(labels=legend_elements,
              bbox_to_anchor=(0.5, -0.005),
              ncols=2 if pathway == 'System' else 2,
              frameon=False,
              loc='upper center')

plt.subplots_adjust(hspace=0.5)
# For saving this figure, uncomment the code below
# plt.savefig(f'./scCellFie-DA-{pathway}-Combined.pdf',
#             dpi=300,
#             bbox_inches='tight')

../_images/notebooks_extended_quick_start_74_0.png

Visualize changes in distributions across single cells

For the significant tasks, we can also compare their distribution of metabolic scores for each condition.

[38]:

cell_show = ['Epithelial', 'Macrophages', 'mDC']

for c in cell_show:
    sig_features = [idx for idx in sig.loc[c].sort_values('log2FC').index]
    N = len(sig_features)
    if N > 0:
        width = min([18, 1.5 + 0.5*N])
        fig, ax = sccellfie.plotting.create_comparative_violin(adata=results['adata'].metabolic_tasks,
                                                               significant_features=sig_features,
                                                               group1='Control',
                                                               group2='Severe COVID-19',
                                                               condition_key=condition_key,
                                                               cell_type_key=cell_group,
                                                               celltype=c,
                                                               xlabel='',
                                                               ylabel='Metabolic Score',
                                                               palette=['coral','cornflowerblue'],
                                                               wrapped_title_length=100,
                                                               figsize=(width, 4),
                                                               fontsize=14,
                                                               title=c,
                                                               tight_layout=False,
                                                               #save=f'DE-Dist-{cname}'
                                                              )

../_images/notebooks_extended_quick_start_77_0.png

../_images/notebooks_extended_quick_start_77_1.png

../_images/notebooks_extended_quick_start_77_2.png

Cell-cell communication

Transfer metabolic tasks into gene expression (all variables in one AnnData object)

Since scCellFie generates separate AnnData objects containing either reaction scores (results['adata'].reactions) or task scores (results['adata'].metabolic_tasks), we need to put both gene expression and metabolic scores into the same AnnData object. Here, we will transfer metabolic scores into the AnnData object containing gene expression.

First, we generate our AnnData with gene expression values in log1p(CP10k).

[39]:

bdata = adata.copy()

Our original adata had raw counts, but after running scCellFie these were converted to CP10k, so we only need to run sc.pp.log1p().

[40]:

sc.pp.log1p(bdata)

We assign a ‘type’ column for the features in each AnnData, so we can easily recognize which are genes and tasks.

[41]:

results['adata'].metabolic_tasks.var['type'] = 'metabolic score'
bdata.var['type'] = 'gene expression'

Now we use a built-in function in scCellFie to transfer features from one source adata into another target adata. In this case we transfer all metabolic tasks (by using var_names=results['adata'].metabolic_tasks.var_names).

[42]:

adata_updated = sccellfie.preprocessing.adata_utils.transfer_variables(
    adata_target=bdata,
    adata_source=results['adata'].metabolic_tasks,
    var_names=results['adata'].metabolic_tasks.var_names
)

[43]:

adata_updated

[43]:

AnnData object with n_obs × n_vars = 63103 × 33755
    obs: 'sample', 'sample_new', 'group', 'disease', 'hasnCoV', 'cluster', 'celltype', 'condition', 'total_counts'
    var: 'type'
    uns: 'normalization', 'neighbors', 'umap', 'log1p'
    obsm: 'X_pca', 'X_umap'
    layers: 'counts'
    obsp: 'distances', 'connectivities'

Compute cell-cell communication based on synthesis and receptor pairs

Using our new AnnData object, we can then compute the communication scores for a list of tuples, where the first element in each tuple is the metabolic task synthesizing the ligand, and the second is its cognate receptor. In this case we use the geometric mean between the cell-type level aggregated values (using the trimean) of metabolic task (ligand) and the log1p(CP10k) gene expression (receptor), for each cell type.

[44]:

ccc_df = sccellfie.communication.compute_communication_scores(adata_updated,
    var_pairs=[('Synthesis of estradiol-17beta (E2) from androstenedione', 'ESR1'), ('Synthesis of L-kynurenine from tryptophan', 'AHR')], # List of LR pairs (synthesis task and receptor)
    groupby=cell_group,
    communication_score='gmean',
    agg_func='trimean',
    ligand_threshold=np.log(2), # Greater than this metabolic activity for consideringing it active
    receptor_threshold=0., # Greater than this expression for considering it active
)

Among the outputs, we get the 'score' columm, containing the geometric mean in this case. Additionlly, this returns the fraction of cells expressing the ligand, and the receptor. These fractions come from the thresholds passed above to say that a metabolic task and a receptor are “active” or “expressed”, respectively.

[45]:

ccc_df

[45]:

	sender_celltype	receiver_celltype	ligand	receptor	score	ligand_fraction	receptor_fraction
0	B	B	Synthesis of estradiol-17beta (E2) from andros...	ESR1	0.000000	0.050505	0.000000
1	B	B	Synthesis of L-kynurenine from tryptophan	AHR	0.000000	0.095960	0.090909
2	B	Epithelial	Synthesis of estradiol-17beta (E2) from andros...	ESR1	0.000000	0.050505	0.011173
3	B	Epithelial	Synthesis of L-kynurenine from tryptophan	AHR	0.124446	0.095960	0.414804
4	B	Macrophages	Synthesis of estradiol-17beta (E2) from andros...	ESR1	0.000000	0.050505	0.023146
...	...	...	...	...	...	...	...
195	pDC	T	Synthesis of L-kynurenine from tryptophan	AHR	0.000000	0.141892	0.147710
196	pDC	mDC	Synthesis of estradiol-17beta (E2) from andros...	ESR1	0.000000	0.108108	0.007439
197	pDC	mDC	Synthesis of L-kynurenine from tryptophan	AHR	0.095979	0.141892	0.318810
198	pDC	pDC	Synthesis of estradiol-17beta (E2) from andros...	ESR1	0.000000	0.108108	0.006757
199	pDC	pDC	Synthesis of L-kynurenine from tryptophan	AHR	0.000000	0.141892	0.094595

200 rows × 7 columns

Visualize kynurenine - AHR interactions

In this case we can visualize each of the ligand-receptor interactions we passed. Here, we will visualize the kynurenine synthesis and its AHR receptor interaction.

[46]:

kyn = ccc_df.loc[ccc_df.receptor == 'AHR'].sort_values('score', ascending=False)

For clear visualizations, we can filter important interactions. Below, we first filter by 'score', then by the 'ligand_fraction' and 'receptor_fraction'.

[47]:

### IMPORTANT! THRESHOLD FOR IMPORTANT COMMUNICATION HERE IS MEAN + 1 STD
kyn_thresh = kyn.score.mean() + kyn.score.std()
### AND CELL TYPES EXPRESS LIGAND AND RECEPTOR IN MORE THAN 10% OF CELLS
filtered_kyn = kyn.loc[(kyn.ligand_fraction > 0.1) & (kyn.receptor_fraction > 0.1) & (kyn.score >= kyn_thresh)]

Finally, we can visualize a network of cell types using this ligand-receptor pair.

[48]:

fig, ax = sccellfie.plotting.plot_communication_network(filtered_kyn,
                                                        'sender_celltype',
                                                        'receiver_celltype',
                                                        'score',
                                                        panel_size=(6,6),
                                                        network_layout='spring',
                                                        edge_color='magenta',
                                                        edge_width=8,
                                                        edge_arrow_size=12,
                                                        edge_alpha=0.25,
                                                        node_color='#210070',
                                                        node_size=500,
                                                        node_alpha=0.75,
                                                        node_label_size=10,
                                                        node_label_alpha=0.7,
                                                        node_label_offset=(0.05, -0.2),
                                                        title_fontsize=20,
                                                        tight_layout=True
                                                       )

../_images/notebooks_extended_quick_start_100_0.png