# R Notes: Descriptive Statistics

library(foreign)

【1】summary

summary(ma["weight"])

summary(ma["height"])

vars <- c("weight","height")  # named vars rather than var, to avoid reusing the name of the base function var()

summary(ma[vars])

summary(ma[c("weight","height")])  # equivalent to summary(ma[3:4]) or summary(ma[-1:-2])

by(ma[c("weight","height")], ma$group, summary)  # split the weight and height columns of ma by group, run summary on each group, and print the descriptive statistics
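The commands above assume a data frame `ma` that these notes never show. Below is a minimal sketch of a stand-in, assuming four columns with `weight` and `height` in positions 3 and 4, and 16 rows split into groups A (rows 1-8) and B (rows 9-16). The column name `id`, the seed, and every value are hypothetical, chosen only so the calls can be run.

```r
# Hypothetical stand-in for the data frame `ma` used in these notes.
# Assumed layout: columns 1-2 identify the row, columns 3-4 hold the measurements.
set.seed(1)  # arbitrary seed so the made-up values are reproducible
ma <- data.frame(
  id     = 1:16,
  group  = rep(c("A", "B"), each = 8),               # rows 1-8 = A, rows 9-16 = B
  weight = round(rnorm(16, mean = 60, sd = 8), 1),   # made-up values
  height = round(rnorm(16, mean = 165, sd = 7), 1)   # made-up values
)

# Overall summary of the two measurement columns
summary(ma[c("weight", "height")])

# Per-group summary: by() applies summary() to the rows of each level of ma$group
res <- by(ma[c("weight", "height")], ma$group, summary)
print(res)
```

Selecting rows by group value, for example `ma[ma$group == "A", "weight"]`, is less fragile than hard-coded row ranges if the row order ever changes.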

【2】stat.desc{pastecs}

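stat.desc(x, basic=TRUE, desc=TRUE, norm=FALSE, p=0.95) is a powerful function that reports a large set of descriptive statistics. x is a data frame or a time series. With the defaults (basic=TRUE, desc=TRUE) it returns the number of values, null values and missing values in x, the minimum, the maximum, the range, the sum, the median, the mean, the standard error of the mean, the confidence interval of the mean at level p, the variance, the standard deviation and the coefficient of variation. With norm=TRUE (the default is FALSE) it also returns statistics related to the normal distribution: skewness and kurtosis (each with a significance criterion) and the result of a Shapiro-Wilk normality test. p=0.95 is the default confidence level used for the confidence interval of the mean.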
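To make the quantities in this list concrete, here is a sketch that recomputes a few of them in base R, without pastecs. The vector `x` is made up for illustration. The sketch assumes that the confidence-interval figure is the half-width of a t-based interval for the mean; that is an assumption worth checking against the pastecs documentation.

```r
# Hand computation of some of the statistics listed above, using base R only.
# x is a made-up numeric vector used purely for illustration.
x <- c(52.1, 60.3, 58.7, 63.4, 55.0, 61.8, 59.2, 57.5)
p <- 0.95                                # confidence level

n       <- sum(!is.na(x))                # number of non-missing values
se_mean <- sd(x) / sqrt(n)               # standard error of the mean
# Assumed definition: half-width of the t-based confidence interval at level p
ci_half <- qt((1 + p) / 2, df = n - 1) * se_mean
cv      <- sd(x) / mean(x)               # coefficient of variation (sd / mean)

c(median = median(x), mean = mean(x), SE.mean = se_mean,
  CI.half.width = ci_half, var = var(x), std.dev = sd(x), coef.var = cv)
```

Under that assumption, the interval itself is `mean(x) - ci_half` to `mean(x) + ci_half`, which matches what `t.test(x)$conf.int` reports.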

library(pastecs)  # load the pastecs package; it is not installed by default, so install it first with install.packages("pastecs")

stat.desc(ma[3:4],norm=TRUE,p=0.95)

stat.desc(ma[1:8,3],norm=TRUE)  #weightA

stat.desc(ma[9:16,"weight"],norm=TRUE)  #weightB

stat.desc(ma[1:8,"height"],norm=TRUE)  #heightA

stat.desc(ma[9:16,4],norm=TRUE)  #heightB

by(ma[3:4], ma$group, function(x) stat.desc(x, norm=TRUE))  # split columns 3 and 4 of ma by group, run stat.desc on each group, and print the basic descriptive statistics together with the normal-distribution statistics

stat.desc{pastecs}: Descriptive statistics on a data frame or time series. Compute a table giving various descriptive statistics about the series in a data frame or in a single/multiple time series.

Usage: stat.desc(x, basic=TRUE, desc=TRUE, norm=FALSE, p=0.95)

x: a data frame or a time series

basic: do we have to return basic statistics (by default, it is TRUE)? These are: the number of values (nbr.val), the number of null values (nbr.null), the number of missing values (nbr.na), the minimal value (min), the maximal value (max), the range (range, that is, max-min) and the sum of all non-missing values (sum)

desc: do we have to return various descriptive statistics (by default, it is TRUE)? These are: the median (median), the mean (mean), the standard error on the mean (SE.mean), the confidence interval of the mean (CI.mean) at the p level, the variance (var), the standard deviation (std.dev) and the variation coefficient (coef.var) defined as the standard deviation divided by the mean

norm: do we have to return normal distribution statistics (by default, it is FALSE)? These are: the skewness coefficient g1 (skewness), its significance criterion (skew.2SE, that is, g1/(2·SE(g1)); if skew.2SE > 1, then skewness is significantly different from zero), the kurtosis coefficient g2 (kurtosis), its significance criterion (kurt.2SE, same remark as for skew.2SE), the statistic of a Shapiro-Wilk test of normality (normtest.W) and its associated probability (normtest.p)

p: the probability level to use to calculate the confidence interval on the mean (CI.mean). By default, p=0.95
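The "2SE" criterion described for norm can be sketched in base R as follows. The formulas are one common textbook form of the skewness coefficient and its standard error; it is an assumption that they match what pastecs computes internally, and the data vector is made up. Comparing the absolute value against 1 is also an assumption, made so that negative skewness is covered.

```r
# Sketch of the "2SE" rule for skewness, using one common textbook formula.
# Assumption: these formulas may differ in detail from what pastecs computes.
x <- c(52.1, 60.3, 58.7, 63.4, 55.0, 61.8, 59.2, 57.5)  # made-up values
n <- length(x)

g1       <- sum((x - mean(x))^3) / (n * sd(x)^3)                   # skewness coefficient
se_g1    <- sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))  # its standard error
skew_2se <- g1 / (2 * se_g1)  # |value| > 1 suggests skewness differs from zero

# Shapiro-Wilk normality test from base R: W statistic and p-value
sw <- shapiro.test(x)

c(skewness = g1, skew.2SE = skew_2se,
  normtest.W = unname(sw$statistic), normtest.p = sw$p.value)
```

Dividing by twice the standard error works as a rough check because an estimate more than about two standard errors from zero lies outside an approximate 95% interval around zero.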

