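A Roundup of Doublet-Prediction Packages for Single-Cell Data | Are Transitional-State Cells Real?

Foreword

Doublets: a single droplet or microwell that contains two or more cells.

For high-throughput methods there is a trade-off between cell capture efficiency and the proportion of doublets; common practice is to target 1-5% doublets (Ziegenhain et al., 2017) (http://refhub.elsevier.com/S0098-2997(17)30049-3/sref115).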

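When I analyzed single-cell data in the past, I never really paid attention to this problem. Even though I knew doublets could occur, I naively assumed that my domain knowledge would let me avoid them. For example, when markers of two different cell types (types that cannot convert into one another) showed up in the same group of cells, I would decisively call it contamination. But the more literature I read, the blurrier cell-type-specific markers become: markers of cell types that cannot interconvert may genuinely coexist in some small subpopulation of cells.

Of course, these packages will not necessarily resolve this problem. Most doublet tools predict best when the doublets form between highly heterogeneous (transcriptionally distinct) cells. Still, I hope that summarizing this class of software reminds everyone to validate repeatedly when defining transitional-state cells, so that the data can be trusted.

Roundup of doublet packages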

1.DoubletFinder

DoubletFinder is an R package that predicts doublets in single-cell RNA sequencing data.

DoubletFinder is implemented to work with Seurat >= 2.0 (https://satijalab.org/seurat/).

DoubletFinder was published in Cell Systems in April 2019: https://www.cell.com/cell-systems/fulltext/S2405-4712(19)30073-0

Installation (in R/RStudio)

devtools::install_github('chris-mcginnis-ucsf/DoubletFinder')

Dependencies

Seurat (>= 2.0)

Matrix (1.2.14)

fields (9.6)

KernSmooth (2.23-15)

modes (0.7.0)

ROCR (1.0-7)

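DoubletFinder overview

DoubletFinder can be broken into 4 steps:
(1) generate artificial doublets from the existing scRNA-seq data;
(2) pre-process the merged real-artificial data;
(3) run PCA and use the PC distance matrix to find, for each cell, the proportion of artificial k nearest neighbors (pANN);
(4) rank the pANN values and threshold them according to the expected number of doublets.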
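
The four steps above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not DoubletFinder's actual implementation: the function names `pann_scores` and `call_doublets` are made up for this sketch, artificial doublets are formed by summing two randomly chosen cells, and the pre-processing is a plain depth normalization with a log transform. The parameter names `pN` and `pK` mirror those in the Example section.

```python
import numpy as np

def pann_scores(counts, pN=0.25, pK=0.09, n_pcs=10, seed=0):
    """Toy pANN: share of artificial doublets among each real cell's k nearest neighbors."""
    rng = np.random.default_rng(seed)
    n_real = counts.shape[0]

    # (1) Artificial doublets: combine (here: sum) randomly paired real cells.
    #     pN is taken as the fraction of the merged data that is artificial.
    n_art = int(round(n_real * pN / (1 - pN)))
    a = rng.integers(0, n_real, n_art)
    b = rng.integers(0, n_real, n_art)
    merged = np.vstack([counts, counts[a] + counts[b]]).astype(float)

    # (2) Pre-process the merged data: depth-normalize, log-transform, center.
    depth = merged.sum(axis=1, keepdims=True)
    x = np.log1p(merged / np.maximum(depth, 1.0) * 1e4)
    x -= x.mean(axis=0)

    # (3) PCA via SVD, then squared Euclidean distances in PC space.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    pcs = u[:, :n_pcs] * s[:n_pcs]
    sq = (pcs ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2 * pcs @ pcs.T
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor

    # Neighborhood size k is taken as a fraction (pK) of the merged data size.
    k = max(1, int(round(pK * merged.shape[0])))
    nn = np.argsort(dist[:n_real], axis=1)[:, :k]
    return (nn >= n_real).mean(axis=1)  # row indices >= n_real are artificial

def call_doublets(pann, n_exp):
    # (4) Rank cells by pANN and flag the top n_exp as doublets.
    calls = np.zeros(pann.shape[0], dtype=bool)
    calls[np.argsort(-pann)[:n_exp]] = True
    return calls

# Toy data: 200 cells x 50 genes of random counts (no real biology here).
toy = np.random.default_rng(1).poisson(2.0, size=(200, 50))
pann = pann_scores(toy)
calls = call_doublets(pann, n_exp=15)
```

On random counts like these the calls are meaningless; the sketch only shows how the score in step (3) and the cutoff in step (4) fit together.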

Limitation: DoubletFinder is not sensitive to doublets formed between cells of the same type, i.e. doublets derived from transcriptionally similar cell states.
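
Because such homotypic doublets tend to go undetected, the Example in this section lowers the expected doublet count by an estimated homotypic proportion (`nExp_poi.adj`). The arithmetic can be sketched as follows. One assumption is labeled here and in the code: the homotypic proportion is taken to be the sum of squared cluster frequencies, which is the chance that two randomly drawn cells carry the same annotation. The function names and cluster sizes are made up for illustration.

```python
from collections import Counter

def homotypic_proportion(annotations):
    # Assumption: chance that two randomly drawn cells share a label,
    # i.e. the sum of squared label frequencies.
    n = len(annotations)
    return sum((count / n) ** 2 for count in Counter(annotations).values())

def expected_doublets(n_cells, doublet_rate, annotations):
    # Expected doublets from the assumed formation rate ...
    n_exp = round(doublet_rate * n_cells)
    # ... reduced by the share expected to be homotypic.
    n_exp_adj = round(n_exp * (1 - homotypic_proportion(annotations)))
    return n_exp, n_exp_adj

# Made-up cluster sizes: 2000 cells in three clusters.
labels = ["A"] * 1000 + ["B"] * 600 + ["C"] * 400
n_exp, n_exp_adj = expected_doublets(len(labels), 0.075, labels)
```

With these made-up numbers the homotypic proportion is 0.25 + 0.09 + 0.04 = 0.38, so the 150 doublets expected at a 7.5% rate shrink to 93 after adjustment.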

Example

## Pre-process Seurat object -------------------------------------------------------------------------------------------------
seu_kidney <- CreateSeuratObject(kidney.data)
seu_kidney <- NormalizeData(seu_kidney)
seu_kidney <- ScaleData(seu_kidney, vars.to.regress = "nUMI")
seu_kidney <- FindVariableGenes(seu_kidney, x.low.cutoff = 0.0125, y.cutoff = 0.25, do.plot=FALSE)
seu_kidney <- RunPCA(seu_kidney, pc.genes = seu_kidney@var.genes, pcs.print = 0)
seu_kidney <- RunTSNE(seu_kidney, dims.use = 1:10, verbose=TRUE)

## pK Identification ---------------------------------------------------------------------------------------------------------
sweep.res.list_kidney <- paramSweep(seu_kidney, PCs = 1:10)
sweep.stats_kidney <- summarizeSweep(sweep.res.list_kidney, GT = FALSE)
bcmvn_kidney <- find.pK(sweep.stats_kidney)

## Homotypic Doublet Proportion Estimate -------------------------------------------------------------------------------------
homotypic.prop <- modelHomotypic(annotations)           ## ex: annotations <- seu_kidney@meta.data$ClusteringResults
nExp_poi <- round(0.075*length(seu_kidney@cell.names))  ## Assuming 7.5% doublet formation rate - tailor for your dataset
nExp_poi.adj <- round(nExp_poi*(1-homotypic.prop))

## Run DoubletFinder with varying classification stringencies ----------------------------------------------------------------
seu_kidney <- doubletFinder(seu_kidney, PCs = 1:10, pN = 0.25, pK = 0.09, nExp = nExp_poi, reuse.pANN = FALSE)
seu_kidney <- doubletFinder(seu_kidney, PCs = 1:10, pN = 0.25, pK = 0.09, nExp = nExp_poi.adj, reuse.pANN = "pANN_0.25_0.09_913")

## Plot results --------------------------------------------------------------------------------------------------------------
seu_kidney@meta.data[,"DF_hi.lo"] <- seu_kidney@meta.data$DF.classifications_0.25_0.09_913
seu_kidney@meta.data$DF_hi.lo[which(seu_kidney@meta.data$DF_hi.lo == "Doublet" & seu_kidney@meta.data$DF.classifications_0.25_0.09_473 == "Singlet")] <- "Doublet_lo"
seu_kidney@meta.data$DF_hi.lo[which(seu_kidney@meta.data$DF_hi.lo == "Doublet")] <- "Doublet_hi"
TSNEPlot(seu_kidney, group.by="DF_hi.lo", plot.order=c("Doublet_hi","Doublet_lo","Singlet"), colors.use=c("black","gold","red"))

For details, see DoubletFinder (https://github.com/ddiez/DoubletFinder) and try predicting doublets in your own data!

2.scrublet

Single-Cell Remover of Doublets

Python code for identifying doublets in single-cell RNA-seq data. See the article in Cell Systems (https://www.sciencedirect.com/science/article/pii/S2405471218304745) or the preprint on bioRxiv (https://www.biorxiv.org/content/early/2018/07/09/357368).

Quick start:

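Given a raw (non-normalized) UMI counts matrix counts_matrix with cells as rows and genes as columns, compute a doublet score for each cell:

import scrublet as scr
scrub = scr.Scrublet(counts_matrix)
doublet_scores, predicted_doublets = scrub.scrub_doublets()

scrub.scrub_doublets() simulates doublets from the observed data and uses a k-nearest-neighbor classifier to compute a continuous doublet_score (between 0 and 1) for each transcriptome. The scores are then thresholded automatically to produce predicted_doublets, a boolean array that is True for predicted doublets and False otherwise.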
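
Once the two arrays are available, downstream use is ordinary NumPy indexing. The sketch below does not call Scrublet. It uses small made-up placeholder arrays with the same shapes and dtypes as the outputs described above, and `manual_cutoff` is a hypothetical value chosen only for illustration.

```python
import numpy as np

# Placeholder stand-ins for the Scrublet inputs/outputs (made-up values, not real results).
counts_matrix = np.arange(20).reshape(5, 4)  # 5 cells x 4 genes
doublet_scores = np.array([0.05, 0.62, 0.10, 0.08, 0.71])
predicted_doublets = np.array([False, True, False, False, True])

# Keep only the rows (cells) that were not called doublets.
singlet_counts = counts_matrix[~predicted_doublets]

# Fraction of cells called doublets; worth comparing against the expected rate.
observed_rate = predicted_doublets.mean()

# The continuous scores also allow a manual cutoff instead of the automatic one.
manual_cutoff = 0.5  # hypothetical value for illustration
manual_calls = doublet_scores > manual_cutoff
```

If the observed rate is far from the rate expected for the experiment, that is a hint to inspect the score distribution before trusting the automatic threshold.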

Best practices:

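1. When working with data from multiple samples, run Scrublet on each sample separately. Scrublet is designed to detect doublets formed by the random co-encapsulation of two cells, so it may perform poorly on merged datasets;

2. Visualize the doublet predictions in a 2-D embedding (for example, UMAP or t-SNE);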
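
The first recommendation amounts to a split-apply-combine loop over samples. The helper below is a hypothetical sketch: `detect_per_sample` and `toy_detector` are names invented here, and the toy detector merely flags the cell with the highest total counts so the example runs without Scrublet installed.

```python
import numpy as np

def detect_per_sample(counts, sample_ids, detector):
    """Run `detector` on each sample's cells separately and stitch the calls together."""
    sample_ids = np.asarray(sample_ids)
    calls = np.zeros(counts.shape[0], dtype=bool)
    for sample in np.unique(sample_ids):
        rows = np.flatnonzero(sample_ids == sample)
        # The detector only ever sees cells from a single sample.
        calls[rows] = detector(counts[rows])
    return calls

def toy_detector(sub_counts):
    # Hypothetical stand-in for a real detector:
    # flag the cell(s) with the highest total counts in the sample.
    totals = sub_counts.sum(axis=1)
    return totals == totals.max()

# Made-up merged matrix: 5 cells x 2 genes from two samples.
counts = np.array([[1, 1], [5, 5], [2, 2], [9, 9], [3, 3]])
sample_ids = ["s1", "s1", "s2", "s2", "s2"]
calls = detect_per_sample(counts, sample_ids, toy_detector)
```

With Scrublet installed, the detector could instead be a small function that builds `scr.Scrublet(sub_counts)` and returns the second element of `scrub_doublets()`, following the Quick start.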

Installation:

To install with PyPI:

pip install scrublet

To install from source:

git clone https://github.com/AllonKleinLab/scrublet.git
cd scrublet
pip install -r requirements.txt
pip install --upgrade .

For details, see scrublet (https://github.com/AllonKleinLab/scrublet) and try predicting doublets in your own data!

3.DoubletDecon

A cell-state-aware tool for removing doublets from single-cell RNA-seq data.

For the detailed procedure, see the paper:

bioRxiv(https://www.biorxiv.org/content/early/2018/07/08/364810)

Installation

if(!require(devtools)){
  install.packages("devtools") # If not already installed
}
devtools::install_github('EDePasquale/DoubletDecon')

Dependencies

DeconRNASeq

gplots

dplyr

MCL

clusterProfiler

mygene

tidyr

R.utils

foreach

doParallel

stringr

# Install the dependencies
source("https://bioconductor.org/biocLite.R")
biocLite(c("DeconRNASeq", "clusterProfiler", "hopach", "mygene", "tidyr", "R.utils", "foreach", "doParallel", "stringr"))
install.packages("MCL")

Example

The data used in the example below all come from:

bioRxiv(https://www.biorxiv.org/content/early/2018/07/08/364810)

location="/Users/xxx/xxx/" #Update as needed
expressionFile=paste0(location, "counts.txt")
genesFile=paste0(location, "Top50Genes.txt")
clustersFile=paste0(location, "Cluster.txt")

newFiles=Seurat_Pre_Process(expressionFile, genesFile, clustersFile)

filename="PBMC_example"
write.table(newFiles$newExpressionFile, paste0(location, filename, "_expression"), sep="\t")
write.table(newFiles$newFullExpressionFile, paste0(location, filename, "_fullExpression"), sep="\t")
write.table(newFiles$newGroupsFile, paste0(location, filename , "_groups"), sep="\t", col.names = F)

results=Main_Doublet_Decon(rawDataFile=newFiles$newExpressionFile,
                           groupsFile=newFiles$newGroupsFile,
                           filename=filename,
                           location=location,
                           fullDataFile=NULL,
                           removeCC=FALSE,
                           species="hsa",
                           rhop=1.1,
                           write=TRUE,
                           PMF=TRUE,
                           useFull=FALSE,
                           heatmap=FALSE,
                           centroids=TRUE,
                           num_doubs=100,
                           only50=FALSE,
                           min_uniq=4)

For details, see DoubletDecon (https://github.com/EDePasquale/DoubletDecon) and try removing doublets from your own data!

4.DoubletDetection

DoubletDetection is a Python3 package for detecting doublets (technical errors) in single-cell RNA-seq count matrices.

Installation

git clone https://github.com/JonathanShor/DoubletDetection.git
cd DoubletDetection
pip3 install .

Run basic doublet classification:

import doubletdetection
clf = doubletdetection.BoostClassifier()
# raw_counts is a cells by genes count matrix
labels = clf.fit(raw_counts).predict()

raw_counts is an scRNA-seq count matrix (cells by genes) and is array-like.
labels is a 1-dimensional numpy ndarray in which 1 marks a detected doublet, 0 marks a singlet, and np.nan marks an ambiguous cell.
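
Because the label array mixes 0, 1 and np.nan, it has to be handled with some care: comparisons against np.nan are always False, so ambiguous cells need `np.isnan`. The sketch below uses a made-up label array with that encoding. Dropping ambiguous cells along with the doublets is one conservative choice for illustration, not a recommendation from the package.

```python
import numpy as np

# Made-up labels in the encoding described above: 1 = doublet, 0 = singlet, nan = ambiguous.
labels = np.array([0.0, 1.0, np.nan, 0.0, 1.0, 0.0])

is_doublet = labels == 1        # nan compares as False, so it is not counted here
is_singlet = labels == 0
is_ambiguous = np.isnan(labels)

# Conservative choice for this sketch: keep only confident singlets.
keep = is_singlet
summary = {
    "doublets": int(is_doublet.sum()),
    "singlets": int(is_singlet.sum()),
    "ambiguous": int(is_ambiguous.sum()),
}
```

The `keep` mask can then index the rows of the count matrix, for example `raw_counts[keep]` for a NumPy array.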

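The classifier works best when:

several cell types are present in the data;

it is run separately on each run in an aggregated count matrix.

For a worked example, see the jupyter notebook at:

https://nbviewer.jupyter.org/github/JonathanShor/DoubletDetection/blob/master/tests/notebooks/PBMC_8k_vignette.ipynb

For details, see DoubletDetection (https://github.com/JonathanShor/DoubletDetection) and try predicting doublets in your own data!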
