info@statistiknachhilfe.ch

+41 (0)56 530 01 55

23 January 2023 by Kevin General

How to check the condition of normal distribution of residuals for the linear regression model in R and SPSS?

 

Definition

A linear regression models a linear relationship between a dependent variable y and one or more independent variables x. An important assumption is that the residuals (the differences between the observed values of y and the values of y predicted by the model) are normally distributed.
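In formula notation, for the simple case with one predictor, the model and its residuals can be written as follows (the symbols are our own notation, not taken from the original):

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \qquad i = 1, \dots, n \\
\hat{y}_i &= \hat{\beta}_0 + \hat{\beta}_1 x_i \\
e_i &= y_i - \hat{y}_i
\end{aligned}
```

Strictly speaking, the normality assumption concerns the unobservable errors \varepsilon_i; the residuals e_i are their observable estimates, which is why the check is carried out on the residuals.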

Normally distributed residuals are important because they justify the usual statistical tests and confidence intervals for the model parameters. If the residuals are normally distributed, one can assume that the estimates of the regression parameters (such as the slope and the y-intercept) are also normally distributed.

If the residuals are not normally distributed, interpreting the statistical significance of the regression parameters can be problematic: the t-values used to assess significance only follow a t-distribution, and the resulting p-values are only exact, if the residuals are normally distributed.
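In matrix notation, with y the vector of observations, X the n x (p + 1) design matrix (a column of ones plus p predictors) and normally distributed errors with variance \sigma^2, the reasoning can be summarized as follows (our notation, shown as a sketch of the standard result):

```latex
\begin{aligned}
\hat{\boldsymbol{\beta}} &= (X^\top X)^{-1} X^\top \mathbf{y}
  \sim \mathcal{N}\!\left(\boldsymbol{\beta},\; \sigma^2 (X^\top X)^{-1}\right) \\[4pt]
t_j &= \frac{\hat{\beta}_j}{\widehat{\mathrm{SE}}(\hat{\beta}_j)}
  \sim t_{\,n-p-1} \quad \text{under } H_0\colon \beta_j = 0
\end{aligned}
```

For a simple regression (p = 1) the t-values therefore have n - 2 degrees of freedom. If the errors are not normal, these distributions, and with them the p-values, hold only approximately.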

Therefore, one should check whether the residuals are normally distributed before interpreting the statistical tests.

The term "prerequisite" can be somewhat misleading here, since most of these assumptions can only be checked after the model has been estimated. They are better understood as prerequisites for correctly interpreting the coefficients than as prerequisites for running the regression itself.

Methods

There are several ways to check whether the residuals are normally distributed: examining Q-Q plots, calculating skewness and kurtosis, or performing normality tests such as the Shapiro-Wilk test. The following sections demonstrate this procedure in R and SPSS.
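Skewness and kurtosis can be computed directly in base R. The sketch below defines two small helper functions, skewness() and excess_kurtosis(), which are hypothetical names introduced here (they are not part of base R) and use the simple moment estimators; add-on packages offer ready-made versions that may use slightly different definitions. For a normal distribution, both values are 0.

```r
# Hypothetical helpers (not part of base R): moment-based estimators
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3   # subtract 3 so that a normal distribution gives 0
}

# Same model as in the example below, using the built-in swiss data
fit <- lm(Fertility ~ Education, data = swiss)
res <- residuals(fit)

# Values clearly different from 0 indicate a departure from normality
skewness(res)
excess_kurtosis(res)
```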

Example in R

For this example, we again use the "swiss" dataset that ships with R and estimate a regression model by regressing the birth rate (Fertility) on education (Education). We store the result in the variable "fit".

fit <- lm(Fertility ~ Education, swiss)

We can display the result using the summary function:

summary(fit)

We observe a negative relationship (-0.8624) between education and birth rate.

We obtain the residuals of the model via the component $residuals:

residuen <- fit$residuals  # "residuen" is German for "residuals"
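As a quick sanity check, the residuals obtained via fit$residuals match those returned by R's extractor function residuals() and equal the observed minus the predicted values of Fertility, exactly as in the definition of residuals. The sketch is self-contained and re-estimates the model.

```r
fit <- lm(Fertility ~ Education, data = swiss)
residuen <- fit$residuals

# The extractor function residuals() returns the same values
all.equal(residuen, residuals(fit))

# Residuals = observed minus predicted (fitted) values of Fertility
all.equal(unname(residuen), swiss$Fertility - unname(fitted(fit)))

# With an intercept in the model, the OLS residuals average to (numerically) zero
mean(residuen)
```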

To check whether the residuals are normally distributed, we can first plot the residuals in a histogram.

hist(residuen)

At first glance, the residuals do not look normally distributed. However, since the sample is relatively small (47 provinces), the histogram alone is not conclusive. We therefore also perform a statistical test, in this case the Shapiro-Wilk test.

In R this is easily done with:

shapiro.test(residuen)

The null hypothesis of the Shapiro-Wilk test is that the residuals are normally distributed. Our test yields a p-value of 0.0592, which is just above the usual significance level of 5%. We therefore cannot reject the null hypothesis that the residuals are normally distributed, and the condition is considered fulfilled.
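Because this p-value lies close to the 5% threshold, it can be useful to complement the test with a normal Q-Q plot, one of the graphical methods listed above. If the residuals are normally distributed, the points lie approximately on the reference line. A minimal base-R sketch (the plot title is our own choice):

```r
fit <- lm(Fertility ~ Education, data = swiss)
residuen <- residuals(fit)

# Normal Q-Q plot of the raw residuals; qqline() draws a reference line
# through the first and third quartiles
qqnorm(residuen, main = "Normal Q-Q plot of the residuals")
qqline(residuen)

# Built-in diagnostic plot for lm objects: Q-Q plot of the standardized residuals
plot(fit, which = 2)
```

Systematic curvature or strongly deviating points in the tails point to skewness or heavy tails, which a single p-value does not reveal.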

Example in SPSS

The procedure in SPSS is analogous to that in R.

Estimate a regression by clicking on "Analyze" -> "Regression" -> "Linear...".

Before you click "OK" or "Paste", click the "Plots..." button and, under "Standardized Residual Plots", select "Histogram". Click "Continue", then "OK", and view the results.

To run the Shapiro-Wilk test in SPSS, we first have to save the residuals of the regression. The regression dialog also has a button called "Save...". Click it and, under "Residuals", select "Unstandardized". Then click "Continue" and then "OK".

A new variable has now been created in your data set with the name "RES_1". We can now perform a Shapiro-Wilk test for this variable.

  1. In the menu, click Analyze -> Descriptive Statistics -> Explore.
  2. Move the newly created variable "Unstandardized Residual [RES_1]" into the "Dependent List".
  3. Click the "Plots..." button and check "Normality plots with tests".
  4. Click "Continue" and then "OK".

The output now contains a table "Tests of Normality" with the results of the Kolmogorov-Smirnov and Shapiro-Wilk tests.

As in R, the Shapiro-Wilk test is just not significant, with a p-value of 0.059. We cannot reject the null hypothesis and therefore assume that the residuals are normally distributed. The condition is fulfilled.

What to do if the residuals are not normally distributed?

In practice, however, the requirement of normally distributed residuals cannot always be met. In such cases, one should first check whether there is a linear relationship between the two variables at all. If not, the relationship may be better modeled with a different, non-linear function. If a linear relationship is plausible, bootstrapping can be used with smaller samples: the standard errors are then simulated by resampling the data instead of being estimated with the usual formula. With larger samples, a violation of the normality assumption is less problematic, because by the central limit theorem the estimated coefficients are approximately normally distributed anyway. Different papers have proposed different thresholds, such as N > 15 or N > 50. The more symmetric the distribution of the residuals, the fewer observations are needed for the central limit theorem to take effect. In case of doubt, bootstrapping can also be used with somewhat larger samples to pre-empt objections to the results.
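A case (pairs) bootstrap of the slope can be written in a few lines of base R. This is a sketch under assumptions of our own: 2,000 resamples, a fixed seed for reproducibility, and resampling whole rows of the swiss data; the boot package offers a more complete implementation, including several types of bootstrap confidence intervals.

```r
set.seed(123)                        # fixed seed so the results are reproducible
fit <- lm(Fertility ~ Education, data = swiss)

B <- 2000                            # number of bootstrap resamples (assumed)
n <- nrow(swiss)

boot_slopes <- replicate(B, {
  idx <- sample.int(n, n, replace = TRUE)   # draw rows (cases) with replacement
  coef(lm(Fertility ~ Education, data = swiss[idx, ]))[["Education"]]
})

# Bootstrap standard error vs. the classical one reported by summary()
se_boot <- sd(boot_slopes)
se_ols  <- summary(fit)$coefficients["Education", "Std. Error"]
c(bootstrap = se_boot, classical = se_ols)

# Percentile 95% confidence interval for the slope
quantile(boot_slopes, c(0.025, 0.975))
```

If the bootstrap and the classical standard errors are similar, the violation of the normality assumption is unlikely to change the conclusions much.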


Do you have questions about statistics? Do you need help with your master or bachelor thesis?

We'll help you!

Address:
Höhenweg 270, 5046 Walde
Opening hours:
9:00 am - 6:00 pm
Phone:
056 530 01 55
E-Mail:
info@statistiknachhilfe.ch
