Sample sizes when using multiple linear regression for prediction

Gregory T. Knofczynski, Daniel Mundfrom

Research output: Contribution to journal › Article › peer-review

228 Scopus citations

Abstract

When using multiple regression for prediction purposes, the issue of minimum required sample size often needs to be addressed. Using a Monte Carlo simulation, models with varying numbers of independent variables were examined, and minimum sample sizes were determined for multiple scenarios at each number of independent variables. The scenarios arise from varying the levels of correlation between the criterion variable and the predictor variables, as well as among the predictor variables. Two minimum sample sizes were determined for each scenario: one for a good and one for an excellent prediction level. The relationship between the squared multiple correlation coefficient and the minimum necessary sample size was examined. A definite relationship, similar to a negative exponential relationship, was found between the two: as the squared multiple correlation coefficient decreased, the required sample size increased at an increasing rate. This study provides guidelines for the sample size needed for accurate predictions.
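The kind of simulation the abstract describes can be illustrated with a minimal sketch. It is not the authors' procedure. It assumes equal correlations among predictors (`rho_xx`), equal predictor-criterion correlations (`rho_xy`), multivariate normal data, and an agreement measure chosen here for illustration: the correlation between predictions from the sample regression equation and predictions from the population equation. The function names, the `threshold` argument, and the candidate sample sizes are all hypothetical.

```python
import numpy as np


def population_model(p, rho_xy, rho_xx):
    """Return (correlation matrix of predictors+criterion, betas, population R^2).

    Assumption: all predictor intercorrelations equal rho_xx and all
    predictor-criterion correlations equal rho_xy.
    """
    r_xx = np.full((p, p), rho_xx)
    np.fill_diagonal(r_xx, 1.0)
    r_xy = np.full(p, rho_xy)
    beta = np.linalg.solve(r_xx, r_xy)   # standardized population weights
    r2 = float(r_xy @ beta)              # squared multiple correlation
    if not 0.0 < r2 < 1.0:
        raise ValueError("correlations do not define a valid population")
    full = np.block([[r_xx, r_xy[:, None]],
                     [r_xy[None, :], np.ones((1, 1))]])
    return full, beta, r2


def mean_agreement(n, p, rho_xy, rho_xx, reps=200, holdout=2000, seed=0):
    """Average correlation between sample-based and population-based predictions."""
    rng = np.random.default_rng(seed)
    full, beta, _ = population_model(p, rho_xy, rho_xx)
    # Fresh cases on which both prediction equations are compared.
    x_test = rng.multivariate_normal(np.zeros(p + 1), full, size=holdout)[:, :p]
    pop_pred = x_test @ beta
    scores = []
    for _ in range(reps):
        sample = rng.multivariate_normal(np.zeros(p + 1), full, size=n)
        design = np.column_stack([np.ones(n), sample[:, :p]])
        coef, *_ = np.linalg.lstsq(design, sample[:, p], rcond=None)
        sample_pred = coef[0] + x_test @ coef[1:]
        scores.append(np.corrcoef(sample_pred, pop_pred)[0, 1])
    return float(np.mean(scores))


def minimum_n(p, rho_xy, rho_xx, threshold, candidates):
    """Smallest candidate n (in ascending order) whose agreement meets threshold."""
    for n in candidates:
        if mean_agreement(n, p, rho_xy, rho_xx) >= threshold:
            return n
    return None
```

Calling `minimum_n(3, 0.4, 0.3, 0.9, [20, 50, 100, 200, 400])` searches the listed sizes for a three-predictor scenario. Lowering `rho_xy` lowers the population squared multiple correlation, so the search should need larger samples to reach the same threshold, which is the direction of the relationship the abstract reports.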

Original language: English
Pages (from-to): 431-442
Number of pages: 12
Journal: Educational and Psychological Measurement
Volume: 68
Issue number: 3
DOIs
State: Published - Jun 2008

Scopus Subject Areas

  • Education
  • Developmental and Educational Psychology
  • Applied Psychology
  • Applied Mathematics

Keywords

  • Monte Carlo
  • Multiple linear regression
  • Sample size
  • Simulation
  • Squared multiple correlation coefficient
  • Subject predictor ratio
