# 单细胞分析之harmony

## Harmony原理

Harmony需要输入低维空间的坐标值（embedding），一般使用PCA的降维结果。Harmony导入PCA的降维数据后，会采用soft k-means clustering算法将细胞聚类。常用的聚类算法仅考虑细胞在低维空间的距离，但是soft clustering算法会考虑我们提供的校正因素。这就好比我们的高考加分制度，小明高考成绩本来达不到A大学的录取分数线，但是他有一项省级竞赛一等奖加10分就够线了。同样的道理，细胞c2距离cluster1有点远，本来不能算作cluster1的一份子；但是c2和cluster1的细胞来自不同的数据集，因为我们期望不同的数据集融合，所以破例让它加入cluster1了。聚类之后先计算每个cluster内各个数据集的细胞的中心点，然后根据这些中心点计算各个cluster的中心点。最后通过算法让cluster内的细胞向中心聚集，实在收敛不了的离群细胞就过滤掉。调整之后的数据重复：聚类—计算cluster中心点—收敛细胞—聚类的过程，不断迭代直至聚类效果趋于稳定。

Overview of Harmony algorithm. PCA embeds cells into a space with reduced dimensionality. Harmony accepts the cell coordinates in this reduced space and runs an iterative algorithm to adjust for dataset specific effects.

• a, Harmony uses fuzzy clustering to assign each cell to multiple clusters, while a penalty term ensures that the diversity of datasets within each cluster is maximized.
• b, Harmony calculates a global centroid for each cluster, as well as dataset-specific centroids for each cluster.
• c, Within each cluster, Harmony calculates a correction factor for each dataset based on the centroids.
• d, Finally, Harmony corrects each cell with a cell-specific factor: a linear combination of dataset correction factors weighted by the cell’s soft cluster assignments made in step a.

Harmony repeats steps a to d until convergence. The dependence between cluster assignment and dataset diminishes with each round. Datasets are represented with colors, cell types with different shapes.

## 10X数据实测

### 安装harmony

```#此包安装简单，没有特殊的依赖包
library(devtools)
install_github("immunogenomics/harmony")```

### 测试数据准备

```##==准备seurat对象列表==##
library(Seurat)
library(tidyverse)
library(patchwork)
rm(list=ls())
setwd("~/project/harmony")
dir <- dir("GSE139324/")           #GSE139324是存放数据的目录
dir <- paste0("GSE139324/",dir)
sample_name <- c('HNC01PBMC', 'HNC01TIL', 'HNC10PBMC', 'HNC10TIL', 'HNC20PBMC',
'HNC20TIL',  'PBMC1', 'PBMC2', 'Tonsil1', 'Tonsil2')
scRNAlist <- list()
for(i in 1:length(dir)){
scRNAlist[[i]] <- CreateSeuratObject(counts, project=sample_name[i], min.cells=3, min.features = 200)
scRNAlist[[i]][["percent.mt"]] <- PercentageFeatureSet(scRNAlist[[i]], pattern = "^MT-")
scRNAlist[[i]] <- subset(scRNAlist[[i]], subset = percent.mt < 10)
}
saveRDS(scRNAlist, "scRNAlist.rds")```

### Seurat整合样本

```##==seurat整合多样本==##
for (i in 1:length(scRNAlist)) {
scRNAlist[[i]] <- NormalizeData(scRNAlist[[i]])
scRNAlist[[i]] <- FindVariableFeatures(scRNAlist[[i]]
}
##以VariableFeatures为基础寻找锚点
system.time({scRNA.anchors <- FindIntegrationAnchors(object.list = scRNAlist)})
#    用户    系统    流逝
# 605.661   4.949 610.328
##利用锚点整合数据，运行时间较长
system.time({scRNA_seurat <- IntegrateData(anchorset = scRNA.anchors)})
#    用户    系统    流逝
# 171.598  18.471 189.932
scRNA_seurat <- ScaleData(scRNA_seurat) %>% RunPCA(verbose=FALSE)
scRNA_seurat <- RunUMAP(scRNA_seurat, dims = 1:30)
scRNA_seurat <- FindNeighbors(scRNA_seurat, dims = 1:30) %>% FindClusters()
##作图
#group_by_cluster
plot1 = DimPlot(scRNA_seurat, reduction = "umap", label=T)
#group_by_sample
plot2 = DimPlot(scRNA_seurat, reduction = "umap", group.by='orig.ident')
#combinate
plotc <- plot1 plot2
ggsave("scRNA_seurat.png", plot = plotc, width = 10, height = 5)
saveRDS(scRNA_seurat, 'scRNA_seurat.rds')```

### Harmony整合样本

```##==harmony整合多样本==##
library(harmony)
##PCA降维
scRNA_harmony <- merge(scRNAlist[[1]], y=c(scRNAlist[[2]], scRNAlist[[3]], scRNAlist[[4]], scRNAlist[[5]],
scRNAlist[[6]], scRNAlist[[7]], scRNAlist[[8]], scRNAlist[[9]], scRNAlist[[10]]))
scRNA_harmony <- NormalizeData(scRNA_harmony) %>% FindVariableFeatures() %>% ScaleData() %>% RunPCA(verbose=FALSE)
##整合
system.time({scRNA_harmony <- RunHarmony(scRNA_harmony, group.by.vars = "orig.ident")})
#   用户   系统   流逝
# 34.308  0.024 34.324
#降维聚类
scRNA_harmony <- RunUMAP(scRNA_harmony, reduction = "harmony", dims = 1:30)
scRNA_harmony <- FindNeighbors(scRNA_harmony, reduction = "harmony", dims = 1:30) %>% FindClusters()
##作图
#group_by_cluster
plot1 = DimPlot(scRNA_harmony, reduction = "umap", label=T)
#group_by_sample
plot2 = DimPlot(scRNA_harmony, reduction = "umap", group.by='orig.ident')
#combinate
plotc <- plot1 plot2
ggsave("scRNA_harmony.png", plot = plotc, width = 10, height = 5)
saveRDS(scRNA_harmony, 'scRNA_harmony.rds')```