# Computing the odds ratio (OR)

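As the name suggests, the odds ratio (OR) is the ratio of two odds. The 2×2 table below, which cross-tabulates gene mutation status against cancer status, serves as the running example: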

| | Cancer | Normal | Total |
|---|---|---|---|
| Mutated gene | 23 | 117 | 140 |
| No mutated gene | 6 | 210 | 216 |
| Total | 29 | 327 | 356 |

OR = (23/117) / (6/210) = 6.88

The risk ratio (RR, also called relative risk) compares probabilities instead of odds:

RR = (23/140) / (6/216) = 5.91
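Both ratios can be checked directly in R. The following is a minimal sketch that enters the four cell counts from the table by hand; the variable names are illustrative and not from the original post.

```r
# Cell counts from the 2x2 table (illustrative variable names)
mut_cancer   <- 23   # mutated gene, cancer
mut_normal   <- 117  # mutated gene, normal
nomut_cancer <- 6    # no mutated gene, cancer
nomut_normal <- 210  # no mutated gene, normal

# Odds of cancer within each row, then their ratio
odds_mut   <- mut_cancer / mut_normal
odds_nomut <- nomut_cancer / nomut_normal
or_hat <- odds_mut / odds_nomut

# Risk (probability) of cancer within each row, then their ratio
risk_mut   <- mut_cancer / (mut_cancer + mut_normal)
risk_nomut <- nomut_cancer / (nomut_cancer + nomut_normal)
rr_hat <- risk_mut / risk_nomut

# Rounded to two decimals these are 6.88 and 5.91
round(c(OR = or_hat, RR = rr_hat), 2)
```

The odds use the "Normal" count as the denominator, while the risks use the row total; that is the only difference between the two calculations.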

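How to interpret the OR:

• OR > 1: exposure is associated with higher odds of disease (positive association)
• OR < 1: exposure is associated with lower odds of disease (negative association)
• OR = 1: exposure is not associated with the odds of disease (no association)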

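How to interpret the RR:

• RR > 1: the exposure is a risk factor for the disease (positive association)
• RR < 1: the exposure is a protective factor against the disease (negative association)
• RR = 1: the exposure is unrelated to the disease (no association)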

#### How to compute the odds ratio (OR)

The StatQuest tutorial StatQuest: Odds Ratios and Log(Odds Ratios) explains how to compute the OR and its P value (statistical significance). There are roughly three approaches:

• Fisher’s Exact Test
• Chi-Square Test
• The Wald Test (the test used by the usual logistic regression)

dat <- matrix(c(23, 6, 117, 210), nrow = 2, ncol = 2)
rownames(dat) <- c("Mutated gene", "No mutated gene")
colnames(dat) <- c("Cancer", "Normal")
##### Fisher’s Exact Test

> fisher.test(dat)

Fisher's Exact Test for Count Data

data:  dat
p-value = 1.099e-05
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
2.613152 21.139349
sample estimates:
odds ratio
6.842952
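The printed estimate (6.84) differs slightly from the sample OR of 6.88 computed from the table, because `fisher.test` reports a conditional maximum likelihood estimate rather than the plain cross-product ratio. Below is a minimal sketch for pulling the individual numbers out of the returned object; the matrix is re-created so the block runs on its own, and `ft` and `sample_or` are illustrative names.

```r
# Same 2x2 table as above, re-created so this block is self-contained
dat <- matrix(c(23, 6, 117, 210), nrow = 2, ncol = 2,
              dimnames = list(c("Mutated gene", "No mutated gene"),
                              c("Cancer", "Normal")))

ft <- fisher.test(dat)

ft$p.value            # two-sided exact P value
unname(ft$estimate)   # conditional MLE of the odds ratio
ft$conf.int           # 95% confidence interval for the odds ratio

# Plain cross-product (sample) odds ratio, for comparison
sample_or <- (dat[1, 1] * dat[2, 2]) / (dat[1, 2] * dat[2, 1])
sample_or
```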
##### Chi-Square Test

> chisq.test(dat, correct = FALSE)

Pearson's Chi-squared test

data:  dat
X-squared = 21.154, df = 1, p-value = 4.237e-06
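To see where the statistic comes from, the Pearson chi-squared value can be reproduced by hand from the observed and expected counts. This is a sketch on the same 2x2 table, without continuity correction so that it matches the call above; the names `expected`, `chi_sq` and `p_val` are illustrative.

```r
# Same 2x2 table as above, re-created so this block is self-contained
dat <- matrix(c(23, 6, 117, 210), nrow = 2, ncol = 2)

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(dat), colSums(dat)) / sum(dat)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected
chi_sq <- sum((dat - expected)^2 / expected)

# Upper-tail probability of a chi-squared distribution with 1 degree of freedom
p_val <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

c(statistic = chi_sq, p.value = p_val)
```

Note that the chi-squared test only gives a P value for association; the OR itself still has to be computed separately.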
##### epitools package

# Column 1 holds the cancer counts (6, 23), column 2 the normal counts (210, 117)
dat2 <- matrix(c(6, 23, 210, 117), nrow = 2, ncol = 2)
rownames(dat2) <- c("No mutated gene", "Mutated gene")
colnames(dat2) <- c("Cancer", "Normal")

library(epitools)
# rev = "c" reverses the column order so that the disease column (Cancer) comes last
> epitools::oddsratio(dat2, correction = FALSE, rev = "c")
$data
                Normal Cancer Total
No mutated gene    210      6   216
Mutated gene       117     23   140
Total              327     29   356

$measure
NA
odds ratio with 95% C.I. estimate    lower    upper
No mutated gene 1.000000       NA       NA
Mutated gene    6.717846 2.805078 18.87268

$p.value
NA
two-sided       midp.exact fisher.exact   chi.square
No mutated gene         NA           NA           NA
Mutated gene  6.572274e-06 1.098703e-05 4.237152e-06

$correction
[1] FALSE

attr(,"method")
[1] "median-unbiased estimate & mid-p exact CI"

# The risk ratio is obtained the same way
epitools::riskratio(dat2, correction = FALSE, rev = "c")
##### fmsb package

library(fmsb)
> fmsb::oddsratio(dat)
           Disease Nondisease Total
Exposed         23        117   140
Nonexposed       6        210   216
Total           29        327   356

Odds ratio estimate and its significance probability

data:  dat
p-value = 4.371e-06
95 percent confidence interval:
2.724202 17.377236
sample estimates:
[1] 6.880342
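The interval printed above can be reproduced with the usual Wald (normal approximation) interval on the log odds ratio scale, where the standard error is the square root of the summed reciprocal cell counts. The sketch below is a by-hand check and is not taken from the fmsb source; all variable names are illustrative.

```r
# Same 2x2 table as above, re-created so this block is self-contained
dat <- matrix(c(23, 6, 117, 210), nrow = 2, ncol = 2)

# Log of the sample (cross-product) odds ratio
log_or <- log((dat[1, 1] * dat[2, 2]) / (dat[1, 2] * dat[2, 1]))

# Standard error of log(OR): sqrt(1/a + 1/b + 1/c + 1/d)
se_log_or <- sqrt(sum(1 / dat))

# 95% Wald interval, built on the log scale and transformed back
z <- qnorm(0.975)
wald_ci <- exp(log_or + c(-1, 1) * z * se_log_or)

c(OR = exp(log_or), lower = wald_ci[1], upper = wald_ci[2])
```

Working on the log scale is what makes the interval asymmetric around 6.88 once it is exponentiated.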
##### logistic regression

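Logistic regression assumes that the response follows a binomial distribution and uses the logit as the link function. The fitted model gives logit(p), that is, the log odds, for each value of the predictor; the odds are exp(log odds), and the predicted probability p is odds/(1+odds).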
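The chain from log odds to odds to probability can be written out in a few lines. The sketch below uses an arbitrary log odds value chosen purely for illustration and checks the manual formula against base R's `plogis` (inverse logit) and `qlogis` (logit).

```r
log_odds <- -1.47            # arbitrary illustrative value, not a fitted result

odds <- exp(log_odds)        # log odds -> odds
p <- odds / (1 + odds)       # odds -> probability

# plogis() is the inverse logit, qlogis() is the logit
all.equal(p, plogis(log_odds))
all.equal(log_odds, qlogis(p))
```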

data <- read.csv("https://stats.idre.ucla.edu/wp-content/uploads/2016/02/sample.csv")
  female read write math hon femalexmath  predicted predicted2
1      0   57    52   41   0           0 -1.4708517 -3.3839875
2      1   68    59   53   0          53 -0.8780695 -1.5079033
3      0   44    33   54   0           0 -1.4708517 -1.3515629
4      0   63    44   47   0           0 -1.4708517 -2.4459454
5      0   47    52   57   0           0 -1.4708517 -0.8825418
6      0   44    52   51   0           0 -1.4708517 -1.8205840

f1<-glm(hon~female,data = data,family = binomial)
# summary(f1)$coeff
> summary(f1)

Call:
glm(formula = hon ~ female, family = binomial, data = data)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.8337  -0.8337  -0.6431  -0.6431   1.8317

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.4709     0.2690  -5.469 4.53e-08 ***
female        0.5928     0.3414   1.736   0.0825 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 222.71  on 199  degrees of freedom
Residual deviance: 219.61  on 198  degrees of freedom
AIC: 223.61

Number of Fisher Scoring iterations: 4

The output shows that for each one-unit change in female (here, going from 0 to 1), the log odds of hon increase by 0.5928. This value is the logistic regression coefficient.

To view the regression coefficient together with its P value (by default, summary() reports the Wald z-test):

coef(summary(f1))["female", c("Estimate", "Pr(>|z|)")]

The OR (1.8090145) and its confidence interval (0.9362394 to 3.5929859) follow from the regression coefficient:

> exp(cbind(OR = coef(f1), confint(f1)))
Waiting for profiling to be done...
                   OR     2.5 %    97.5 %
(Intercept) 0.2297297 0.1312460 0.3792884
female      1.8090145 0.9362394 3.5929859
# confint.default(f1)

Following the formula, the OR can also be computed by hand:

data$predicted <- predict(f1)
# Log odds for each group (predict() returns the linear predictor by default)
s1 <- data$predicted[data$female == 0][1]
s2 <- data$predicted[data$female == 1][1]
# OR of female = 1 versus female = 0; exp(s1 - s2) would give the reciprocal
odd_ratio <- exp(s2 - s1)
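The same approach applies to the gene mutation table from the start of the post. As a sketch, the four counts can be passed to `glm` in aggregated form (`cbind(successes, failures)`); the data frame `tab` and its column names are made up for illustration. Because the model has a single binary predictor, the exponentiated coefficient equals the sample OR of 6.88, and the P value in the coefficient table comes from the Wald z-test.

```r
# Aggregated counts from the 2x2 table (illustrative data frame)
tab <- data.frame(
  mutated = c(1, 0),      # 1 = mutated gene, 0 = no mutated gene
  cancer  = c(23, 6),     # "successes": cancer cases
  normal  = c(117, 210)   # "failures": normal samples
)

# Binomial GLM with the default logit link, fitted to grouped counts
fit <- glm(cbind(cancer, normal) ~ mutated, data = tab, family = binomial)

# Exponentiated coefficient = odds ratio for mutated vs. not mutated
or_glm <- exp(coef(fit)[["mutated"]])

# Coefficient (log OR), its standard error, and the Wald P value
wald <- coef(summary(fit))["mutated", c("Estimate", "Std. Error", "Pr(>|z|)")]

or_glm
wald
```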

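As the formula shows, the predicted probability is odds/(1+odds). The log odds for each level of female were obtained above; converting them to odds gives the probabilities, for example: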

exp(s2)/(1 + exp(s2))
# exp(s1)/(1 + exp(s1))

predict(f1, type = "response")

# f2<-glm(hon~math,data = data,family = binomial)
#
# library(dplyr)
# dt <-data %>%
#   group_by(math,hon) %>%
#   summarise(freq=n()) %>%
#   mutate(all=sum(freq),prob=freq/all,odds=prob/(1-prob),logodds=log(odds)) %>%
#   round(.,5)
#
# data$fit <- predict(f2, data, type = "response")
#
# dt <- left_join(dt, data[,c("math", "fit")])
# library(ggplot2)
# ggplot(dt, aes(x=math, y=prob)) +
#   geom_point() +
#   geom_line(aes(x=math, y=fit))

#### References

• Copyright notice: this article was compiled and published by Kai