Summary of Methods for Joint Analysis of 10X Spatial Transcriptomics and 10X Single-Cell Data

As the title says.

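This approach relies on AddModuleScore, a function in the Seurat package. I have covered its usage before in the post "Seurat包的打分函数AddModuleScore"; please take a look. A paper that applies this approach to joint single-cell and spatial analysis was published in Cell: "Multimodal Analysis of Composition and Spatial Architecture in Human Squamous Cell Carcinoma". I have written a detailed walkthrough of that paper in "人鳞状细胞癌成分和空间结构的多峰分析(空间转录组与单细胞文章". Let us briefly summarize how the paper combines the two data types.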
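To make the scoring idea concrete, here is a minimal sketch in plain Python. It assumes the score of a spot is the mean expression of a marker gene set (taken from single-cell clusters) minus the mean expression of a control gene set. Seurat's AddModuleScore additionally chooses the control genes from expression-matched bins; this sketch skips that and takes the control set as given. The function name `module_score` and the toy values are hypothetical, chosen only for illustration.

```python
from statistics import mean

def module_score(expr, module_genes, control_genes):
    """Score each spot as mean(module genes) - mean(control genes).

    expr maps gene name -> list of normalized expression values, one per spot.
    Simplification: the control set is passed in, not sampled from bins.
    """
    n_spots = len(next(iter(expr.values())))
    scores = []
    for i in range(n_spots):
        module_mean = mean(expr[g][i] for g in module_genes)
        control_mean = mean(expr[g][i] for g in control_genes)
        scores.append(module_mean - control_mean)
    return scores

# Toy data for three spots (illustrative values only).
expr = {
    "KRT5": [5.0, 1.0, 0.0],
    "KRT14": [4.0, 2.0, 0.0],
    "ACTB": [3.0, 3.0, 3.0],
    "GAPDH": [2.0, 2.0, 2.0],
}
scores = module_score(expr, ["KRT5", "KRT14"], ["ACTB", "GAPDH"])
print(scores)
```

A spot with a high score expresses the cell type's marker set above background, which is how a single-cell signature gets projected onto spatial spots.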

Cluster the spatial transcriptomics data; spots with similar expression are grouped into the same cluster.

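This method was used in the paper "Spatiotemporal analysis of human intestinal development at single-cell resolution", published in Cell. The paper studies intestinal development and uses this joint analysis mainly to follow how cell types change as the intestine develops.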

This method was published in an article in Nature Biotechnology.

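This requires fairly strong background knowledge. For irregular samples in particular, the regions can only be delineated with solid biological knowledge as support, so even the first step is hard.

I will not go into the algorithm here; see my earlier posts. This method is used relatively rarely.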

I have covered this method before in the post "10X单细胞和空间联合分析的方法---cell2location". It resembles earlier deconvolution methods for bulk transcriptomes. The paper is "Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics". A brief look at the workflow:

Cell2location maps the spatial distribution of cell types by integrating single-cell RNAseq (scRNA-seq) and multi-cell spatial transcriptomic data from a given tissue.

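Judging from the schematic, the single-cell data serve as the reference for matching cell types to spatial positions; this direction does not change.
Step 1: use a model to estimate the expression signature of each cell type from the single-cell data. One example is the result obtained by identifying cell types and subpopulations with conventional clustering and then estimating the average gene expression profile of each cluster (see the figure below).

We need to go through this step by step. Cell2location performs this estimation step with negative binomial regression, which allows data to be combined robustly across technologies and batches. (More math.)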
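The simple variant mentioned in Step 1, averaging expression within each cluster, can be sketched as follows. This is only the plain averaging baseline, not the negative binomial regression that cell2location uses; the function name `cluster_signatures` and the toy counts are hypothetical.

```python
from collections import defaultdict

def cluster_signatures(counts, labels):
    """Return cluster label -> mean count per gene.

    counts: list of per-cell count vectors sharing one gene order.
    labels: cluster label of each cell, aligned with counts.
    """
    sums = {}
    sizes = defaultdict(int)
    for vec, lab in zip(counts, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        for j, value in enumerate(vec):
            sums[lab][j] += value
        sizes[lab] += 1
    return {lab: [s / sizes[lab] for s in total] for lab, total in sums.items()}

# Four cells, three genes, two clusters (illustrative values only).
counts = [[10, 0, 2], [14, 0, 4], [0, 8, 2], [0, 12, 6]]
labels = ["A", "A", "B", "B"]
signatures = cluster_signatures(counts, labels)
print(signatures)
```

The averages are taken on raw counts rather than log-transformed values, which matches the paper's statement that signatures live in linear mRNA count space.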
Step 2: cell2location decomposes mRNA counts in spatial transcriptomic data using these reference signatures, thereby estimating the relative and absolute abundance of each cell type at each spatial location. (This is the decomposition of the data.)
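To show what "decomposing" a spot means, here is a deliberately simplified sketch. It assumes the spot's counts are a non-negative weighted sum of the reference signatures and recovers the weights by non-negative least squares with multiplicative updates. This is a stand-in for illustration only: cell2location itself fits a hierarchical Bayesian model with a negative binomial likelihood. The name `decompose_spot` and the numbers are hypothetical.

```python
def decompose_spot(spot, signatures, n_iter=500):
    """Find non-negative weights w so that sum_f w[f] * signatures[f] ~ spot.

    Uses multiplicative updates for non-negative least squares
    (a simplification, not cell2location's Bayesian inference).
    """
    n_types = len(signatures)
    # Precompute G d and G G^T, where G holds one signature per row.
    gd = [sum(s * d for s, d in zip(sig, spot)) for sig in signatures]
    ggt = [[sum(a * b for a, b in zip(si, sj)) for sj in signatures]
           for si in signatures]
    w = [1.0] * n_types
    for _ in range(n_iter):
        denom = [sum(ggt[f][k] * w[k] for k in range(n_types))
                 for f in range(n_types)]
        w = [w[f] * gd[f] / denom[f] if denom[f] > 0 else 0.0
             for f in range(n_types)]
    return w

# Two cell-type signatures over three genes; the spot is 2 x type 0 + 1 x type 1.
signatures = [[12.0, 0.0, 3.0], [0.0, 10.0, 4.0]]
spot = [24.0, 10.0, 10.0]
abundance = decompose_spot(spot, signatures)   # absolute abundance per type
total = sum(abundance)
fractions = [a / total for a in abundance]     # relative abundance per type
print(abundance, fractions)
```

The weights play the role of absolute abundance and their normalized version the role of relative abundance, which are the two quantities the paper distinguishes.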
Cell2location is implemented as an interpretable hierarchical Bayesian model, thereby (1) providing principled means to account for model uncertainty, (2) accounting for linear dependencies in cell type abundances, (3) modelling differences in measurement sensitivity across technologies, and (4) accounting for unexplained/residual variation by employing a flexible count-based error model. Finally, (5) cell2location is computationally efficient, owing to variational approximate inference and GPU acceleration. (I will go through these points in the next post.)
To validate cell2location, we initially used simulated data that reflects diverse cell abundance and spatial patterns. (The authors simulated the spatial transcriptomics data.)

Note the Jensen–Shannon divergence (J-S divergence) here; the math is explained below.
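Briefly, we simulated a spatial transcriptomics dataset with 2,000 locations, based on reference cell-type annotations obtained from a mouse brain snRNA-seq reference dataset including 46 cell types. Multi-cell gene expression profiles at each location were derived by combining cells drawn from different reference cell types, using one of four cell abundance patterns with variable density and sparsity distribution that mimics the patterns observed in real data. Cell2location was then run on these data to obtain the results in the figure. The correlation is generally very high, but there is a catch: the simulated spatial data are assembled from the single-cell data. If real spatial transcriptomics data contain cell types that are absent from the single-cell data (for example because of technical limitations; 10X single-cell capture performs poorly on neutrophils), the predictions may well be wrong. Let us read on and see whether the authors address this.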
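The simulation idea (building each spot by summing cells drawn from reference cell types) can be sketched roughly as below. The paper uses four abundance patterns with different density and sparsity; this sketch assumes a single, much cruder pattern in which each cell type is present with probability `p_present` and contributes between 1 and `max_cells` cells. All names and parameters here are hypothetical.

```python
import random

def simulate_spots(ref_cells, n_spots, max_cells=5, p_present=0.5, seed=0):
    """Build synthetic spots by summing counts of randomly drawn reference cells.

    ref_cells maps cell type -> list of per-cell count vectors.
    Returns (spot profiles, ground-truth cell numbers per spot).
    """
    rng = random.Random(seed)
    n_genes = len(next(iter(ref_cells.values()))[0])
    spots, truth = [], []
    for _ in range(n_spots):
        profile = [0] * n_genes
        abundance = {}
        for cell_type, cells in ref_cells.items():
            # Sparsity: a cell type is absent from a spot with prob 1 - p_present.
            n_cells = rng.randint(1, max_cells) if rng.random() < p_present else 0
            abundance[cell_type] = n_cells
            for _draw in range(n_cells):
                cell = rng.choice(cells)
                profile = [a + b for a, b in zip(profile, cell)]
        spots.append(profile)
        truth.append(abundance)
    return spots, truth

# Two reference cell types with two cells each (illustrative counts only).
ref_cells = {"A": [[3, 0], [5, 1]], "B": [[0, 4], [1, 6]]}
spots, truth = simulate_spots(ref_cells, n_spots=4)
print(spots, truth)
```

Because every synthetic spot is made only of reference cell types, a simulation like this cannot expose the failure mode raised above, where the tissue contains a cell type missing from the single-cell reference.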
Next, we compared cell2location to recently proposed alternative methods for the inference of relative cell-type abundance from spatial transcriptomics. As in most papers, the authors' own software performs best. The model also yields more accurate estimates of relative cell-type abundance.

Note the PR curve here; these mathematical points are explained below.
cell2location not only provides estimates of relative cell type fractions but additionally estimates absolute cell type abundance, which can be interpreted as the number of cells that express a reference cell type signature at a given location, which again were highly concordant with the simulated ground truth. (Estimating the number of cells is also important.)

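In short, these results support that cell2location can accurately estimate cell abundance across diverse cell types.
The paper then presents two examples in which the software is applied to the joint analysis problem. I will not go through the case studies here; what we need more is the principle behind the algorithm.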

Let us first deal with the J-S divergence and the PR curve.

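KL divergence is also called relative entropy, information divergence, or information gain. It is an asymmetric measure of the difference between two probability distributions P and Q: it measures the average number of extra bits needed to encode samples from P using a code based on Q. Typically, P represents the true distribution of the data, and Q represents a theoretical distribution, a model distribution, or an approximation of P.
For discrete distributions it is defined as follows:

$$D_{KL}(P \,\|\, Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$$

Because the logarithm is concave (equivalently, $-\log$ is convex), Jensen's inequality implies that the KL divergence is non-negative.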
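The J-S divergence is built from the KL divergence: with the mixture M = (P + Q) / 2, it is the average of KL(P || M) and KL(Q || M), which makes it symmetric and, with base-2 logarithms, bounded between 0 and 1. A small sketch for discrete distributions follows; the function names are my own, and the inputs are assumed to be probability vectors of equal length that each sum to 1.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in bits; terms with p_i == 0 contribute 0.

    Assumes q_i > 0 wherever p_i > 0 (otherwise the divergence is infinite).
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of P and Q to their mixture M."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p = [0.5, 0.5]
q = [0.25, 0.75]
print(kl_divergence(p, q), kl_divergence(q, p))  # two different values: KL is asymmetric
print(js_divergence(p, q), js_divergence(q, p))  # identical values: J-S is symmetric
```

In the cell2location benchmark the J-S divergence compares estimated and simulated cell-type proportions at each location, so a smaller value means a better estimate.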

The ROC curve is better known than the PR curve; see my explanation of ROC curves in the post "深入理解R包AUcell对于分析单细胞的作用".
As for the PR curve:
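A precision-recall curve plots precision, TP / (TP + FP), against recall, TP / (TP + FN), as the decision threshold is lowered. The sketch below computes the points directly from scores and binary labels. It treats every item as its own threshold and does not merge tied scores; the function name and the toy data are hypothetical.

```python
def precision_recall_points(scores, labels):
    """Return (recall, precision) pairs as the threshold sweeps from high to low.

    scores: predicted abundance or probability per item.
    labels: 1 if the cell type is truly present, 0 otherwise.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_pos = sum(labels)
    tp = fp = 0
    points = []
    for _score, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points

scores = [0.9, 0.8, 0.6, 0.3]
labels = [1, 0, 1, 0]
points = precision_recall_points(scores, labels)
print(points)
```

Unlike the ROC curve, the PR curve ignores true negatives, which is useful when most cell types are absent from most locations.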

A brief introduction to the model
For a complete derivation of the cell2location model, please see supplementary computational methods. Briefly, cell2location is a Bayesian model, which estimates absolute cell density of cell types by decomposing mRNA counts $d_{s,g}$ of each gene $g = \{1, \dots, G\}$ at locations $s = \{1, \dots, S\}$ into a set of predefined reference signatures of cell types $g_{f,g}$. For 10X Visium data, this matrix can be directly obtained from the 10X SpaceRanger software and imported into data format used in a popular python package Scanpy. (Scanpy is used to read the 10X output; the analysis can also be combined with Seurat.) $d_{s,g}$ should be filtered to a set of genes expressed in the single cell reference $g_{f,g}$. The point of this step is that the single-cell and spatial data must share the same set of expressed genes when one is mapped onto the other. The graphical model of cell2location is shown in the figure below:

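Let $G = \{g_{f,g}\}$ denote an $F \times G$ matrix of reference cell type signatures, which consists of $f = \{1, \dots, F\}$ gene expression profiles $G_{f,:}$ for $g = \{1, \dots, G\}$ genes, representing average expression of each gene in each cell type in linear mRNA counts space (not log-space). This matrix needs to be provided to cell2location and can be estimated from scRNA-seq profiles. Here we can see that each cell type is represented by the average expression of each gene across its cells. Cell2location models the elements of D as Negative Binomial distributed, so a brief word on the negative binomial distribution.
The negative binomial distribution is a discrete probability distribution in statistics. It arises under the following conditions: the experiment consists of a series of independent trials; each trial has two outcomes, success or failure; the probability of success is constant; and the trials continue until r failures have occurred, where r is a positive integer. The number of successes observed then follows a negative binomial distribution. See the Baidu Baike entry "负二项分布". From this point on the material requires a deep mathematical background. I am no good at math, though I have never been proud of that, so I hope a math expert will share an explanation.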
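Under the definition above (count the successes that occur before the r-th failure, with success probability p), the probability of seeing exactly k successes is C(k + r - 1, k) · p^k · (1 - p)^r. Here is a small sketch of that formula; the function name is my own, and note that count models are usually written with a mean and an overdispersion parameter rather than with r and p, so this illustrates the distribution itself, not cell2location's exact parameterization.

```python
import math

def neg_binomial_pmf(k, r, p):
    """P(exactly k successes before the r-th failure), success probability p."""
    return math.comb(k + r - 1, k) * (p ** k) * ((1 - p) ** r)

r, p = 3, 0.4
probs = [neg_binomial_pmf(k, r, p) for k in range(200)]
total = sum(probs)                                  # close to 1
mean_k = sum(k * pk for k, pk in enumerate(probs))  # close to r * p / (1 - p)
print(total, mean_k)
```

The variance of this distribution, r · p / (1 - p)^2, exceeds its mean, which is why it suits overdispersed mRNA counts better than a Poisson distribution.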
Finally, here are the analysis results:

This method is currently at the preprint stage and still needs more validation.

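This method is also a deconvolution approach, based on non-negative matrix factorization, and is available as an R package. No high-impact paper has cited it yet, but the method is decent. For the SPOTlight algorithm, see "spotlight" and "spotlight_github"; I will not describe the algorithm further here. As shown in the figure: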

Other options, such as Scanpy's joint analysis methods, are not covered further here. I hope this helps.

New York is three hours ahead of California, but that does not make California slow.
Cameroon is six hours ahead of New York but it does not make New York slow.
Someone graduated from college at 22 but waited five years before securing a job.
Someone became a CEO at 25 but died at 50.
Someone became a CEO at 50 but lived to 90 years.
Someone is still single,
While another is married with children
Absolutely, everyone in this world works based on their own time zone.
People around you might seem to be ahead of you.
That's totally fine. Some are behind you.
Everyone is running their own race in their own time zone.
Don't envy or mock them.
They are in their own time zone and you are in yours.
Life is about waiting for the right moment to react.
So RELAX.
You're not late
You're not early
You're very much on time, and in your time zone.
Everyone has a different exam paper, meaning different questions.
Everyone has a different assignment, meaning a different purpose in life.
So focus on your own exam paper, your assignment and purpose.
Don't copy and paste or steal answers, or else you will fail big time.
Your dreams and visions are all valid. Just take your time and do the best you can.
Be like the hummingbird. Even when mighty lions and tigers underestimated him, he continued to do what he could, where he was, just as he was, with the little he had.
You're ok just the way you are. The little work you are doing today might seem insignificant but I bet someday you will see the big picture.
You're Not late! You're Not early.

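Answer 1  2023-12-05
Joint analysis of the spatial transcriptomics data and single-cell data provided by 10X Genomics mainly involves the following mainstream approaches:
1. Co-expression analysis:
Use weighted gene co-expression network analysis (WGCNA) or other correlation-based methods to identify genes that are co-expressed across cell types or tissue regions.
2. Spatial mapping and cell type annotation:
Use the single-cell data to annotate cell types in the spatial transcriptomics data. This can be done by comparing gene expression patterns in the spatial data with the expression patterns of known single-cell types.
3. Functional enrichment analysis:
Combine single-cell phenotypes with spatial position and use GO or KEGG for functional annotation, revealing the functional characteristics of cells in different tissue regions.
4. Pseudotime analysis:
Combine pseudotime trajectory analysis of the single-cell data with the spatial data to reveal the developmental trajectories of cells across tissue structures.
5. Cell-cell interaction analysis:
Use the spatial data to infer spatial interactions between cells, and combine them with the single-cell data to analyze cell-cell communication further.
6. Visualization:
Use t-SNE, UMAP, or spatial plots to visualize the data, combining cell type labels with spatial information to show cellular heterogeneity within the tissue structure.
Software for joint analysis usually includes R or Python packages such as Seurat (R), Scanpy (Python), and SpatialDE (Python), which can be used for advanced analysis and integration.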