R -- カイ二乗分布を用いる独立性の検定

カイ二乗分布を用いる独立性の検定　　　　　Last modified: Aug 19, 2009
目的

カイ二乗分布を用いる独立性の検定を行い，Ｈaberman による残差分析も行う。
残差分析の結果は summary メソッドで表示する。

R では，chisq.test で計算できる。 

使用法

my.chisq.test(x)

引数

x	分割表

ソース

インストールは，以下の 1 行をコピーし，R コンソールにペーストする
source("http://aoki2.si.gunma-u.ac.jp/R/src/my-chisq-test.R", encoding="euc-jp")

# カイ二乗分布を用いる独立性の検定
my.chisq.test <- function(x)                         # 分割表
{
        if (is.matrix(x)) {
                nr <- nrow(x)
                nc <- ncol(x)
                if (nr < 2 || nc < 2) {
                        stop("行数・列数は 2 以上でないといけません")
                }
        }
        else {
                stop("分割表を入力してください")
        }
        data.name <- deparse(substitute(x))
        method <- "カイ二乗分布を用いる独立性の検定（残差分析）"
        rt <- rowSums(x)
        ct <- colSums(x)
        n <- sum(x)
        expectation <- outer(rt, ct)/n
        if (any(expectation < 1)) {
                warning("expectation less than 1")
        }
        else if (sum(expectation <= 5)/(nr*nc) > 0.2) {
                warning("more than 20% cells have expectation less than 5")
        }
        e <- (x-expectation)/sqrt(expectation)
        d <- e/sqrt(outer(1-rt/n, 1-ct/n))
        chi2 <- sum(e^2)
        df <- (nr-1)*(nc-1)
        P <- pchisq(chi2, df, lower.tail=FALSE)
        names(chi2) <- "X-squared"
        names(df) <- "df"
        return(structure(list(statistic=chi2, parameter=df, p.value=P,
                method=method, data.name=data.name, observed=x, expected=expectation,
                standardized.residuals=e, adjusted.residuals=d),
                class=c("htest", "my.chisq.test")))
}
# summary メソッド
summary.my.chisq.test <- function(   obj,            # my.chisq.test が返すオブジェクト
                                        digits=5)       # 出力桁数
{
        cat("調整された残差\n")
        print(obj$adjusted.residuals, digits=digits)
        cat("\nP 値\n")
        print(pnorm(abs(obj$adjusted.residuals), lower.tail=FALSE)*2, digits=digits)
}


使用例

> (a <- my.chisq.test(matrix(c(4, 5, 2, 0, 0, 7, 6, 1, 1, 0, 3, 1), byrow=TRUE, nc=4)))

	カイ二乗分布を用いる独立性の検定（残差分析）

data:  matrix(c(4, 5, 2, 0, 0, 7, 6, 1, 1, 0, 3, 1), byrow = TRUE, nc = 4) 
X-squared = 11.3443, df = 6, p-value = 0.0783

Warning message:	# この警告は，期待値のうちに 1 より小さいものがあるため
In my.chisq.test(matrix(c(4, 5, 2, 0, 0, 7, 6, 1, 1, 0, 3, 1), byrow = TRUE,  :
  expectation less than 1

> summary(a)	# 残差分析の結果は summary メソッドを使って表示する

調整された残差	# 標準化された残差は，要素名 standardized.residuals にある
         [,1]     [,2]     [,3]      [,4]
[1,]  2.20265  0.46402 -1.59862 -1.113823
[2,] -2.29129  1.04583  0.65817  0.097808
[3,]  0.21909 -2.00000  1.18604  1.309307

P 値
         [,1]    [,2]    [,3]    [,4]
[1,] 0.027619 0.64264 0.10991 0.26536
[2,] 0.021947 0.29564 0.51043 0.92209
[3,] 0.826581 0.04550 0.23561 0.19043

chisq.test の結果から導く場合

> tab <- matrix(c(4, 5, 2, 0, 0, 7, 6, 1, 1, 0, 3, 1), byrow=TRUE, nc=4)
> res <- chisq.test(tab)	# 結果をとっておく
Warning message: 	# この警告は，期待値のうちに 1 より小さいものがあるため
Chi-squared approximation may be incorrect in: chisq.test(tab)  
> res$expected	# 期待値
          [,1] [,2]     [,3]      [,4]
[1,] 1.8333333  4.4 4.033333 0.7333333
[2,] 2.3333333  5.6 5.133333 0.9333333
[3,] 0.8333333  2.0 1.833333 0.3333333
> res$residuals	# 標準化残差
           [,1]       [,2]       [,3]        [,4]
[1,]  1.6001894  0.2860388 -1.0124568 -0.85634884
[2,] -1.5275252  0.5916080  0.3825184  0.06900656
[3,]  0.1825742 -1.4142136  0.8616404  1.15470054
> res$stdres	# 調整された残差
          [,1]       [,2]       [,3]       [,4]
[1,]  2.202652  0.4640162 -1.5986161 -1.1138229
[2,] -2.291288  1.0458250  0.6581681  0.0978076
[3,]  0.219089 -2.0000000  1.1860432  1.3093073
> pnorm(abs(res$stdres), lower.tail=FALSE)*2	# P 値
           [,1]       [,2]      [,3]      [,4]
[1,] 0.02761930 0.64263616 0.1099059 0.2653552
[2,] 0.02194677 0.29564182 0.5104301 0.9220851
[3,] 0.82658070 0.04550026 0.2356052 0.1904303
直前のページへ戻る　　
E-mail to Shigenobu AOKI