cell2location: a method for joint analysis of 10X single-cell and spatial data

As stated in the title.

Answer 1  2022-07-12

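The spatial location of cell types within a tissue fundamentally shapes how cells interact and function, but the high-throughput spatial mapping of complex tissues remains a challenge. We present cell2location, a principled and versatile Bayesian model that integrates single-cell and spatial transcriptomics to map cell types in situ in a comprehensive manner. Cell2location performs very well in terms of both accuracy and comprehensiveness. In the mouse brain, we use a new paired single nucleus and spatial RNA-sequencing dataset to map dozens of cell types and identify tissue regions in an automated manner. We discover novel regional astrocyte subtypes including fine subpopulations in the thalamus and hypothalamus (a new finding). In the human lymph node, we resolve spatially interlaced immune cell states and identify co-located groups of cells underlying tissue organisation (cell co-localisation). We spatially map a rare pre-germinal centre B-cell population and predict putative cellular interactions relevant to the interferon response. In short, the method works well.
One thing to note here is the Bayesian model. It is very commonly used in modelling, so I won't go into detail. I recommend the book 《机器学习原理、算法与应用》 (Machine Learning: Principles, Algorithms and Applications), which covers many machine learning algorithms and fundamentals and helps deepen our understanding of the algorithms behind bioinformatics analyses.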

The cellular architecture of tissues, where distinct cell types are organized in space, underlies cell-cell communication, organ function and pathology (a tissue is a complex, unified whole). Emerging spatial genomics technologies hold considerable promise for characterising tissue architecture, providing key opportunities to map resident cell types and cell signalling in situ, thereby helping guide in vitro tissue engineering efforts (the main practical value of spatial transcriptomics). Spatial transcriptomics still faces challenges, however. One reason is the enormous variation in tissue architecture across organs, ranging from the brain with hundreds of cell types found across discrete anatomical regions to immune organs with continuous cellular gradients and dynamically modified microenvironments. To create and map comprehensive tissue atlases, experimental and computational methods need to be aligned to cope with this variation and, in particular, enable mapping numerous resident cell types across diverse and complex tissues in situ (the technical challenge).
Coupled single-cell and spatially resolved transcriptomics offer a scalable approach to address these challenges (single-cell and spatial transcriptomics are technically complementary). The first step is to identify the cell types present in dissociated tissue (single-cell transcriptomics), and then to map the spatial distribution of each cell type. The current challenges are as follows. First, spatial RNA-seq measurements (i.e. locations) combine multiple cell types, as array-based mRNA capture currently does not match cellular boundaries in tissues. Thus, each spatial position corresponds to either several cell types (Visium, Tomo-Seq) or fractions of multiple cell types (Slide-Seq, HDST). Second, spatial RNA-seq measurements are confounded by different sources of variation as 1) cell numbers vary across tissue positions, 2) different cells and cell types differ in total mRNA content, and 3) thin tissue sectioning captures variable fractions of each cell's volume. Computational approaches need to appropriately model and account for all of these factors.
Here, we present cell2location, a principled and versatile Bayesian model for comprehensive mapping of cell types in spatial transcriptomic data (the focus of this post). Cell2location uses reference gene expression signatures of cell types derived from scRNA-seq to decompose multi-cell spatial transcriptomic data into cell type abundance maps (the basic principle is the same as in other methods; the algorithm differs). The model accurately maps complex tissues, including rare cell types and fine subtypes, and it identifies tissue regions and co-located cell types downstream in an automated manner (it can identify co-located cell types, which is important). Below are two application examples showing that the method works well.

Cell2location maps the spatial distribution of cell types by integrating single-cell RNA-seq (scRNA-seq) and multi-cell spatial transcriptomic data from a given tissue.
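To make the idea of decomposition concrete, here is a minimal sketch that is not the cell2location model itself: it swaps the Bayesian model for plain non-negative least squares (NNLS), and every name (`signatures`, `true_abundance`, `estimate_abundance`) and every number is made up for illustration. It only shows how the counts at a multi-cell location can be explained as a non-negative mix of reference signatures.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical reference signatures: rows = cell types, columns = genes
signatures = np.array([
    [10.0, 0.0, 1.0, 5.0],   # cell type A
    [0.0, 8.0, 2.0, 1.0],    # cell type B
    [1.0, 1.0, 9.0, 0.0],    # cell type C
])

# Hypothetical true abundance: rows = locations, columns = cell types
true_abundance = np.array([
    [3.0, 0.0, 1.0],
    [0.0, 2.0, 2.0],
])

# Noise-free toy counts: rows = locations, columns = genes
counts = true_abundance @ signatures


def estimate_abundance(counts, signatures):
    """Solve counts[s] ≈ w @ signatures with w >= 0, one location at a time."""
    estimates = np.zeros((counts.shape[0], signatures.shape[0]))
    for s, spot in enumerate(counts):
        # nnls expects a (genes x cell types) matrix and a gene vector
        w, _residual_norm = nnls(signatures.T, spot)
        estimates[s] = w
    return estimates


estimated = estimate_abundance(counts, signatures)
```

On this noise-free toy input the estimate matches `true_abundance` up to numerical precision. Real data are noisy and confounded by the factors listed earlier (cell numbers, mRNA content, sectioning), which is what a full statistical model is meant to handle and what this sketch ignores.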

Let's first cover J-S divergence and PR curves.

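KL divergence is also known as relative entropy, information divergence, or information gain. It is an asymmetric measure of the difference between two probability distributions P and Q: it measures the average number of extra bits needed to encode samples from P using a code based on Q. Typically, P represents the true distribution of the data, and Q represents a theoretical distribution, a model distribution, or an approximation of P.
It is defined as follows: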
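For reference, the standard textbook definition for discrete distributions P and Q over the same outcomes x is written out here, together with the Jensen-Shannon (J-S) divergence, which symmetrises KL by comparing both distributions to their mixture M. These are the general definitions, not formulas taken from the cell2location paper.

```latex
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}

D_{\mathrm{JS}}(P \,\|\, Q) = \tfrac{1}{2} D_{\mathrm{KL}}(P \,\|\, M) + \tfrac{1}{2} D_{\mathrm{KL}}(Q \,\|\, M),
\qquad M = \tfrac{1}{2}(P + Q)
```

With a base-2 logarithm the result is in bits, which matches the "extra bits" reading; with the natural logarithm it is in nats.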

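Because the logarithm is concave (equivalently, -log is convex), Jensen's inequality implies that the KL divergence is always non-negative; it is zero only when P and Q are identical.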
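These properties (non-negativity, asymmetry of KL, symmetry of J-S) can be checked numerically. The sketch below uses only the standard library; the function names and the two toy distributions are made up for illustration, and it assumes Q is non-zero wherever P is non-zero.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for discrete distributions given as lists.

    Terms with p_i == 0 contribute nothing; assumes q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of P and Q to their mixture M."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Two made-up distributions over three outcomes
p = [0.5, 0.4, 0.1]
q = [0.2, 0.3, 0.5]

kl_pq = kl_divergence(p, q)   # non-negative
kl_qp = kl_divergence(q, p)   # non-negative, but differs from kl_pq
js_pq = js_divergence(p, q)   # same value in either direction
js_qp = js_divergence(q, p)
```

Swapping the arguments changes the KL value but not the J-S value, which is why J-S is often the more convenient choice when neither distribution is the obvious "reference".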

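Compared with the PR curve, the ROC curve is probably more familiar; you can refer to my explanation of ROC curves in "A deeper look at what the R package AUCell does for single-cell analysis".
As for the PR curve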
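A precision-recall (PR) curve plots precision (the fraction of predicted positives that are truly positive) against recall (the fraction of true positives that are recovered) as the score threshold is lowered. The following is a minimal standard-library sketch of that general definition; the function name and the toy labels and scores are hypothetical, and it assumes there are no tied scores.

```python
def precision_recall_points(labels, scores):
    """Return (recall, precision) pairs, one per threshold, highest score first.

    labels: 1 for a true positive item, 0 otherwise.
    scores: higher means more confidently predicted positive (no ties assumed).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(labels)
    tp = 0
    fp = 0
    points = []
    for i in order:
        # Lowering the threshold past item i turns it into a predicted positive
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


# Made-up example: 3 true positives among 6 scored items
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
points = precision_recall_points(labels, scores)
```

Unlike the ROC curve, precision depends on how many negatives there are, so PR curves tend to be more informative when positives are rare.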

A brief introduction to the model
For a complete derivation of the cell2location model, please see supplementary computational methods. Briefly, cell2location is a Bayesian model, which estimates absolute cell density of cell types by decomposing mRNA counts d_{s,g} of each gene g = {1, ..., G} at locations s = {1, ..., S} into a set of predefined reference signatures of cell types g_{f,g}. For 10X Visium data, this count matrix can be directly obtained from the 10X SpaceRanger software and imported into the data format used by the popular Python package Scanpy (Scanpy is used to read the 10X output; the analysis can also be combined with Seurat). d_{s,g} should be filtered to the set of genes expressed in the single-cell reference g_{f,g}. The point of this step is that, when mapping between the single-cell and spatial transcriptomic data, both must use the same set of expressed genes. The graphical model of cell2location is shown in the figure below:
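Separately from the figure, the gene-filtering step described above can be sketched as follows. This is an illustration with plain NumPy arrays, not the cell2location or Scanpy API; the gene names, matrices and variable names are all hypothetical. It restricts the spatial counts and the reference signatures to the genes they share, in the same column order.

```python
import numpy as np

# Hypothetical gene names; a real analysis would take these from the data
spatial_genes = ["Gfap", "Snap25", "Mbp", "Xist"]
reference_genes = ["Mbp", "Gfap", "Aqp4", "Snap25"]

# Hypothetical spatial counts d[s, g]: rows = locations, columns = spatial_genes
spatial_counts = np.array([
    [5, 0, 2, 1],
    [0, 7, 3, 0],
])

# Hypothetical reference signatures g[f, g]: rows = cell types,
# columns = reference_genes
reference_signatures = np.array([
    [0.1, 4.0, 3.0, 0.0],
    [6.0, 0.2, 0.0, 0.1],
])

# Shared genes, kept in the order they appear in the spatial data
reference_set = set(reference_genes)
shared_genes = [g for g in spatial_genes if g in reference_set]

# Column indices of the shared genes in each matrix
spatial_idx = [spatial_genes.index(g) for g in shared_genes]
reference_idx = [reference_genes.index(g) for g in shared_genes]

# Both matrices now have identical, identically ordered gene columns
spatial_filtered = spatial_counts[:, spatial_idx]
reference_filtered = reference_signatures[:, reference_idx]
```

Keeping one explicit `shared_genes` list and indexing both matrices from it avoids the easy mistake of filtering each matrix independently and ending up with columns in different orders.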
