# Problem 1: simple linear regression of density on distance
load("prob1.Rdata")
model1 <- lm(density ~ distance, data = prob1)
model1
##
## Call:
## lm(formula = density ~ distance, data = prob1)
##
## Coefficients:
## (Intercept)     distance
##    1.211973     0.003761
model1s <- summary(model1)  # coefficient table, R-squared and overall F-test
model1s
##
## Call:
## lm(formula = density ~ distance, data = prob1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.212753 -0.047247 -0.009136 0.062975 0.169705
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.2119730  0.0324376  37.363  < 2e-16 ***
## distance    0.0037609  0.0007954   4.728 7.54e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09813 on 25 degrees of freedom
## Multiple R-squared: 0.4721, Adjusted R-squared: 0.4509
## F-statistic: 22.35 on 1 and 25 DF, p-value: 7.54e-05
The p-value of the F-test and of the t-test for the coefficient of distance is \(7.5\cdot 10^{-5}\) in both cases (the two tests coincide in simple linear regression), so the association between distance and density is highly statistically significant.
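The agreement of the two p-values can be checked by hand. The sketch below is not part of the original solution; it only re-uses the t value and residual degrees of freedom printed in the summary above, and the variable names are illustrative.

```r
# Values copied from the summary(model1) output
t_value  <- 4.728   # t statistic for the distance coefficient
df_resid <- 25      # residual degrees of freedom

# With a single predictor, the overall F statistic is the squared t statistic
f_value <- t_value^2
print(f_value)

# Two-sided p-value of the t-test and upper-tail p-value of the F-test
p_t <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)
p_f <- pf(f_value, df1 = 1, df2 = df_resid, lower.tail = FALSE)
print(c(t_test = p_t, f_test = p_f))
```

The squared t value reproduces the reported F statistic of 22.35 up to rounding, and the two p-values agree.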
# New observation at distance = 18; predict() only needs the predictor column
p1new <- data.frame(distance = 18)
prediction <- predict(model1, newdata = p1new, interval = "prediction")
prediction
##        fit      lwr      upr
## 1 1.279669 1.072366 1.486971
rjack <- rstudent(model1)  # jackknife (externally studentized) residuals
summary(rjack)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.45033 -0.51048 -0.09590 -0.01419 0.64705 1.84606
lev <- hatvalues(model1)  # leverage values (diagonal of the hat matrix)
summary(lev)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.03787 0.05459 0.05775 0.07407 0.09482 0.14935
cook <- cooks.distance(model1)  # Cook's distance for each observation
summary(cook)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0001985 0.0033584 0.0119726 0.0346568 0.0436789 0.1607892
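The critical values:
for the jackknife residuals, with k=1, n=27, α=0.05: critical value = 3.50
for the leverage: critical value = 0.35 (approximately)
for Cook's distance: the critical value of \(d\,(n-k-1)\) is 17, so \(d = 17/25 = 0.68\)
The largest observed values (|jackknife residual| = 2.45, leverage = 0.149, Cook's distance = 0.161) are all below these thresholds, so no observation is flagged as an outlier or as influential.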
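The comparison against these critical values can be sketched as follows. The observed maxima are copied from the `summary()` outputs above. The Bonferroni-adjusted quantile at the end is only an assumption about how a tabulated value such as 3.50 could arise; the source does not say where its critical values come from.

```r
# Sample size and number of predictors, as stated in the text
n <- 27
k <- 1

# Largest observed diagnostics, copied from the summary() outputs
observed <- c(jackknife = 2.45033,    # max |rstudent|
              leverage  = 0.14935,    # max hat value
              cook      = 0.1607892)  # max Cook's distance

# Critical values quoted in the text; Cook's value is 17 / (n - k - 1)
critical <- c(jackknife = 3.50,
              leverage  = 0.35,
              cook      = 17 / (n - k - 1))

# TRUE would flag a potential outlier or influential observation
flagged <- observed > critical
print(flagged)

# Assumption: Bonferroni-adjusted two-sided t quantile with n - k - 2 df
bonferroni_t <- qt(1 - 0.05 / (2 * n), df = n - k - 2)
print(bonferroni_t)
```

None of the three diagnostics exceeds its threshold, which is consistent with the Shapiro-Wilk test that follows being run on the full set of residuals.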
shapiro.test(rjack)
##
## Shapiro-Wilk normality test
##
## data: rjack
## W = 0.95466, p-value = 0.2775
# Problem 2: full model with all six candidate predictors
load("prob2.Rdata")
model21 <- lm(undcount ~ perc_min + crimrate + poverty + diffeng + hsgrad + housing,
              data = prob2)
summary(model21)
##
## Call:
## lm(formula = undcount ~ perc_min + crimrate + poverty + diffeng +
## hsgrad + housing, data = prob2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1511 -1.0921 0.0798 0.9336 4.3403
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.299641   1.334062   0.225 0.823061
## perc_min     0.084726   0.023189   3.654 0.000551 ***
## crimrate     0.021489   0.013692   1.570 0.121876
## poverty     -0.021048   0.084728  -0.248 0.804675
## diffeng      0.180053   0.101960   1.766 0.082583 .
## hsgrad      -0.040023   0.041607  -0.962 0.340018
## housing     -0.006199   0.025658  -0.242 0.809936
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.604 on 59 degrees of freedom
## Multiple R-squared: 0.6176, Adjusted R-squared: 0.5787
## F-statistic: 15.88 on 6 and 59 DF, p-value: 9.211e-11
model22 <- step(model21, direction = "backward")  # backward elimination by AIC
## Start: AIC=68.94
## undcount ~ perc_min + crimrate + poverty + diffeng + hsgrad +
## housing
##
## Df Sum of Sq RSS AIC
## - housing 1 0.150 151.87 67.003
## - poverty 1 0.159 151.88 67.007
## - hsgrad 1 2.379 154.10 67.965
## <none> 151.72 68.938
## - crimrate 1 6.335 158.06 69.638
## - diffeng 1 8.019 159.74 70.337
## - perc_min 1 34.331 186.05 80.401
##
## Step: AIC=67
## undcount ~ perc_min + crimrate + poverty + diffeng + hsgrad
##
## Df Sum of Sq RSS AIC
## - poverty 1 0.181 152.05 65.082
## - hsgrad 1 2.874 154.75 66.240
## <none> 151.87 67.003
## - crimrate 1 6.968 158.84 67.964
## - diffeng 1 7.907 159.78 68.353
## - perc_min 1 37.794 189.66 79.670
##
## Step: AIC=65.08
## undcount ~ perc_min + crimrate + diffeng + hsgrad
##
## Df Sum of Sq RSS AIC
## <none> 152.05 65.082
## - hsgrad 1 5.775 157.83 65.542
## - crimrate 1 6.851 158.90 65.990
## - diffeng 1 7.829 159.88 66.396
## - perc_min 1 42.390 194.44 79.312
model22
##
## Call:
## lm(formula = undcount ~ perc_min + crimrate + diffeng + hsgrad,
## data = prob2)
##
## Coefficients:
## (Intercept) perc_min crimrate diffeng hsgrad
## 0.35633 0.08376 0.01974 0.17405 -0.04883
# Partial F-test: reduced model (3 predictors) against the full model
model23 <- lm(undcount ~ perc_min + poverty + hsgrad, data = prob2)
anova(model23, model21)
## Analysis of Variance Table
##
## Model 1: undcount ~ perc_min + poverty + hsgrad
## Model 2: undcount ~ perc_min + crimrate + poverty + diffeng + hsgrad +
## housing
##   Res.Df    RSS Df Sum of Sq     F Pr(>F)
## 1     62 170.78
## 2     59 151.72  3    19.063 2.471 0.0706 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
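Mallows' \(C_p\): \(C_p=\frac{SSE(p)}{MSE(k)}- [n-2(p+1)]\), where \(SSE(p)\) is the residual sum of squares of the reduced model with \(p\) predictors and \(MSE(k)\) is the residual mean square of the full model with \(k\) predictors.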
a21 <- anova(model21)  # the Residuals row gives MSE(k) for the full model
a21
## Analysis of Variance Table
##
## Response: undcount
## Df Sum Sq Mean Sq F value Pr(>F)
## perc_min 1 195.816 195.816 76.1471 3.275e-12 ***
## crimrate 1 29.171 29.171 11.3438 0.001338 **
## poverty 1 5.428 5.428 2.1108 0.151559
## diffeng 1 11.619 11.619 4.5184 0.037729 *
## hsgrad 1 2.874 2.874 1.1177 0.294732
## housing 1 0.150 0.150 0.0584 0.809936
## Residuals 59 151.722 2.572
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
a23 <- anova(model23)  # the Residuals row gives SSE(p) for the reduced model
a23
## Analysis of Variance Table
##
## Response: undcount
## Df Sum Sq Mean Sq F value Pr(>F)
## perc_min 1 195.816 195.816 71.0873 7.085e-12 ***
## poverty 1 12.136 12.136 4.4056 0.03990 *
## hsgrad 1 18.044 18.044 6.5506 0.01294 *
## Residuals 62 170.784 2.755
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We obtain \(SSE(p)=170.8\), \(MSE(k)=2.572\), \(n=66\), \(k=6\), \(p=3\), so \(C_p = 170.8/2.572 - [66 - 2\cdot 4] \approx 8.41\). This is well above \(p+1=4\), so the small model is inferior to the full one.
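The arithmetic behind this value can be reproduced with a small helper. This is a minimal sketch: `mallows_cp` is a hypothetical function name introduced here for illustration, and the inputs are the rounded figures from the two ANOVA tables above.

```r
# Mallows' Cp for a reduced model with p predictors, judged against the
# residual mean square of the full model with k predictors
mallows_cp <- function(sse_p, mse_k, n, p) {
  sse_p / mse_k - (n - 2 * (p + 1))
}

# Values copied from the two ANOVA tables above
cp <- mallows_cp(sse_p = 170.784, mse_k = 2.572, n = 66, p = 3)
print(round(cp, 2))  # roughly 8.4

# Benchmark: a reduced model with little bias has Cp close to p + 1
print(cp - (3 + 1))
```

With the fitted objects at hand, `deviance(model23)` and `summary(model21)$sigma^2` would supply \(SSE(p)\) and \(MSE(k)\) directly, avoiding rounding in the copied values.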
load("prob3.Rdata")
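# Center elevation so the intercept and region effect refer to mean elevation
prob3$elevc <- prob3$elev - mean(prob3$elev)
# elevc * region already expands to both main effects plus their interaction
model3 <- lm(damage ~ elevc + region + elevc * region, data = prob3)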
model3
##
## Call:
## lm(formula = damage ~ elevc + region + elevc * region, data = prob3)
##
## Coefficients:
## (Intercept) elevc regionNorth elevc:regionNorth
## 37.87043 -0.01721 5.38905 0.10839
summary(model3)
##
## Call:
## lm(formula = damage ~ elevc + region + elevc * region, data = prob3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -36.781 -11.612 0.308 11.035 26.219
##
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)       37.87043    6.31507   5.997 1.24e-07 ***
## elevc             -0.01721    0.01928  -0.893    0.375
## regionNorth        5.38905    6.61930   0.814    0.419
## elevc:regionNorth  0.10839    0.02333   4.646 1.90e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.61 on 60 degrees of freedom
## Multiple R-squared: 0.4556, Adjusted R-squared: 0.4284
## F-statistic: 16.74 on 3 and 60 DF, p-value: 5.132e-08