# R: Batch Correlation Analysis for a Single Gene

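1. Suppose you have already chosen a gene to study and want to explore its potential function. You can infer that function from the genes whose expression is most strongly correlated with it. This approach is known as guilt by association.

2. Our annotation approach relies on the large TCGA cohort. Since it can annotate genes, any tumor-related gene can be annotated this way, including long non-coding RNAs.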

1. Load the prepared cancer data

exprSet[1:3,1:3]

As before, rows are samples and columns are genes.
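To make the expected layout concrete, here is a minimal sketch that builds a toy table in the same orientation. The values are random, and `genes_by_samples` and `toy_expr` are hypothetical names standing in for the real `exprSet`; only the gene symbols are borrowed from this post. It assumes your data may arrive as genes by samples, in which case `t()` flips it.

```r
# Toy stand-in for exprSet. The values are random and purely illustrative.
set.seed(1)
genes_by_samples <- matrix(
  rnorm(15), nrow = 3,
  dimnames = list(c("PDCD1", "IL2RG", "MARK1"), paste0("sample", 1:5))
)

# Many expression matrices arrive as genes x samples; transpose so that
# rows are samples and columns are genes, as this workflow expects.
toy_expr <- as.data.frame(t(genes_by_samples))

dim(toy_expr)        # number of samples, then number of genes
toy_expr[1:3, 1:3]   # same peek as exprSet[1:3, 1:3]
```

Keeping genes in columns is what allows `exprSet[, "PDCD1"]` to return one expression value per sample.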

2. Batch correlation analysis

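#PDCD1 expression across samples is the reference vector for every test
y <- as.numeric(exprSet[, "PDCD1"])
gene_names <- colnames(exprSet)
cor_data_df <- data.frame(gene_names)
for (i in seq_along(gene_names)) {
  #Spearman correlation; cor.test() selects it through the method argument
  test <- cor.test(as.numeric(exprSet[, i]), y, method = "spearman")
  cor_data_df[i, 2] <- test$estimate
  cor_data_df[i, 3] <- test$p.value
}
names(cor_data_df) <- c("symbol", "correlation", "pvalue")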
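With tens of thousands of genes the loop can be slow. As a rough sketch of an alternative, not the method used in this post, `cor()` can compute every coefficient in one call, and a two-sided p-value can then be derived from the t approximation. As far as I can tell this agrees with `cor.test(..., method = "spearman", exact = FALSE)`, but it will differ slightly from the exact p-values `cor.test()` may report for small samples without ties. The data are random, and `toy_expr`, `fast_df` and the gene names `geneA` to `geneC` are made up for illustration.

```r
# Toy data: 30 samples (rows) x 4 genes (columns); values are random.
set.seed(42)
toy_expr <- as.data.frame(matrix(rnorm(120), nrow = 30))
colnames(toy_expr) <- c("PDCD1", "geneA", "geneB", "geneC")  # geneA-C are made up

y <- toy_expr[["PDCD1"]]
n <- length(y)

# One call to cor() gives Spearman's rho for every column against y.
rho <- cor(as.matrix(toy_expr), y, method = "spearman")[, 1]

# Two-sided p-value from the t approximation with n - 2 degrees of freedom.
t_stat <- rho * sqrt((n - 2) / (1 - rho^2))
pval <- 2 * pt(-abs(t_stat), df = n - 2)

fast_df <- data.frame(
  symbol      = names(rho),
  correlation = unname(rho),
  pvalue      = unname(pval)
)
fast_df
```

The first row is PDCD1 against itself, so its correlation is 1 by construction and carries no information.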

3. Select the most strongly correlated genes

library(dplyr)
library(tidyr)
cor_data_sig <- cor_data_df %>%
  filter(pvalue < 0.05) %>%
  arrange(desc(abs(correlation))) %>%
  dplyr::slice(1:500)
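Two refinements may be worth considering here, neither of which is part of the original workflow: dropping the target gene itself, and filtering on a p-value adjusted for multiple testing rather than the raw one. The sketch below uses base R and a hand-made table; `toy_cor`, `toy_sig`, `padj` and the genes `geneA` to `geneD` with their numbers are all invented for illustration.

```r
# Hypothetical results table in the same shape as cor_data_df.
toy_cor <- data.frame(
  symbol      = c("PDCD1", "geneA", "geneB", "geneC", "geneD"),
  correlation = c(1.00, 0.62, -0.48, 0.10, -0.05),
  pvalue      = c(0, 1e-6, 2e-4, 0.04, 0.60)
)

# Drop the target gene itself: its correlation with itself is trivially 1.
toy_cor <- toy_cor[toy_cor$symbol != "PDCD1", ]

# Benjamini-Hochberg adjustment across all genes tested.
toy_cor$padj <- p.adjust(toy_cor$pvalue, method = "BH")

# Keep significant genes and order them by absolute correlation.
toy_sig <- toy_cor[toy_cor$padj < 0.05, ]
toy_sig <- toy_sig[order(-abs(toy_sig$correlation)), ]
toy_sig
```

In this toy table `geneC` passes the raw 0.05 cutoff but not the adjusted one, which is the kind of borderline hit the adjustment is meant to catch.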

4. Pick one positively and one negatively correlated gene and plot them as a check

library(ggstatsplot)
#argument names follow the ggstatsplot release used for this post; newer releases may name them differently
ggscatterstats(data = exprSet,
  y = PDCD1,
  x = IL2RG,
  centrality.para = "mean",
  margins = "both",
  xfill = "#CC79A7",
  yfill = "#009E73",
  marginal.type = "histogram",
  title = "Relationship between PDCD1 and IL2RG")

For the negative correlation we pick MARK1.

library(ggstatsplot)
ggscatterstats(data = exprSet,
  y = PDCD1,
  x = MARK1,
  centrality.para = "mean",
  margins = "both",
  xfill = "#CC79A7",
  yfill = "#009E73",
  marginal.type = "histogram",
  title = "Relationship between PDCD1 and MARK1")
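Instead of choosing the two genes by eye, they could be pulled from the results table. A minimal sketch, assuming a table shaped like `cor_data_sig`; the table below is hand-made, `toy_sig`, `top_pos` and `top_neg` are hypothetical names, and only the symbols IL2RG and MARK1 come from this post.

```r
# Hypothetical slice of a table shaped like cor_data_sig; the numbers are
# invented, and geneA / geneB are made-up symbols.
toy_sig <- data.frame(
  symbol      = c("IL2RG", "geneA", "MARK1", "geneB"),
  correlation = c(0.81, 0.55, -0.42, -0.30)
)

# Strongest positive and strongest negative partner of the target gene.
# In a real table, exclude the target gene first: its self-correlation of 1
# would otherwise always win the positive slot.
top_pos <- as.character(toy_sig$symbol[which.max(toy_sig$correlation)])
top_neg <- as.character(toy_sig$symbol[which.min(toy_sig$correlation)])

c(positive = top_pos, negative = top_neg)
```

The two symbols can then be used as the `x` variable in the scatter plots.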

We can also combine the two plots with cowplot.

library(cowplot)
p1 <- ggscatterstats(data = exprSet,
  y = PDCD1,
  x = IL2RG,
  centrality.para = "mean",
  margins = "both",
  xfill = "#CC79A7",
  yfill = "#009E73",
  marginal.type = "histogram",
  title = "Relationship between PDCD1 and IL2RG")

p2 <- ggscatterstats(data = exprSet,
  y = PDCD1,
  x = MARK1,
  centrality.para = "mean",
  margins = "both",
  xfill = "#CC79A7",
  yfill = "#009E73",
  marginal.type = "histogram",
  title = "Relationship between PDCD1 and MARK1")

plot_grid(p1, p2, nrow = 1, labels = LETTERS[1:2])

5. Next, run the functional enrichment analysis

library(clusterProfiler)
library(org.Hs.eg.db)
library(ggplot2)
library(stringr)

#get the gene list
gene <- str_trim(cor_data_sig$symbol, "both")

#convert gene symbols to Entrez IDs; bitr() returns a data frame
gene <- bitr(gene, fromType = "SYMBOL", toType = "ENTREZID", OrgDb = "org.Hs.eg.db")

go <- enrichGO(gene = gene$ENTREZID, OrgDb = "org.Hs.eg.db", ont = "ALL")

barplot(go, split = "ONTOLOGY") + facet_grid(ONTOLOGY ~ ., scales = "free")

This is the dot plot:

dotplot(go, split = "ONTOLOGY") + facet_grid(ONTOLOGY ~ ., scales = "free")

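At this point we can infer that PDCD1 is mainly involved in functions such as T cell activation and regulation of cytokine receptor activity, which is broadly consistent with what is already known about the gene.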
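For intuition about where the enrichment p-values come from: GO over-representation analysis of this kind is based on a hypergeometric test, asking whether a term appears in the gene list more often than chance would predict. The sketch below computes one such p-value by hand in base R. All four counts are invented for illustration and do not come from the PDCD1 analysis.

```r
# All four counts are invented for illustration.
N <- 20000  # background genes that carry any GO annotation
M <- 400    # background genes annotated to the term of interest
n <- 500    # genes in our correlated-gene list
k <- 40     # genes in our list that are annotated to the term

# Number of list genes expected in the term by chance alone.
expected <- n * M / N

# P(X >= k) under the hypergeometric distribution.
p_enrich <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

c(expected = expected, observed = k, p_value = p_enrich)
```

A term is called enriched when the observed count is well above the expected one and the p-value survives multiple-testing adjustment across all terms tested.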

• Copyright notice: this article was compiled and published by 果子学生信.
• When reposting, please keep the link to this article: https://www.plob.org/article/26325.html