Are Regression Results Affected By A Smaller Sample Size?

Research article
Open Access
Published: 12 March 2019

Sample size calculations for model validation in linear regression analysis

BMC Medical Research Methodology volume 19, Article number: 54 (2019)

Abstract

Background

Linear regression analysis is a widely used statistical technique in practical applications. For planning and appraising validation studies of simple linear regression, an approximate sample size formula has been proposed for the joint test of intercept and slope coefficients.

Methods

The purpose of this article is to reveal the potential drawback of the existing approximation and to provide an alternative and exact solution of power and sample size calculations for model validation in linear regression analysis.

Results

A fetal weight example is included to illustrate the underlying discrepancy between the exact and approximate methods. Moreover, extensive numerical assessments were conducted to examine the relative performance of the two distinct procedures.

Conclusions

The results show that the exact approach has a distinct advantage over the current method with greater accuracy and higher robustness.

Background

Regression analysis is the most commonly applied statistical method across all scientific fields. Its extensive utility has prompted continuous investigations that give various interpretations, extensions, and computing algorithms for the development and formulation of empirical models. General guidelines and fundamental principles of regression analysis are well documented in the standard texts of Cohen et al. [1], Kutner et al. [2], and Montgomery, Peck, and Vining [3], among others. Among the methodological issues and statistical implications of regression analysis, model adequacy and validity represent two vital aspects for justifying the usefulness of the underlying regression model. In the process of model selection, residual analysis and diagnostic checking are employed to identify influential observations, leverage, outliers, multicollinearity, and other lack-of-fit problems. In contrast, model validation refers to the plausibility and generalizability of the regression function in terms of the stability and suitability of the regression coefficients.

In particular, it is emphasized in Kutner et al. ([2], Section 9.6), Montgomery, Peck, and Vining ([3], Section 11.2), and Snee [4] that there are three approaches to assessing the validity of regression models: (1) comparison of model predictions and coefficients with physical theory, prior experience, theoretical models, and other simulation results; (2) collection of new data to check model predictions; and (3) data splitting, in which a reserved portion of the available data is used to obtain an independent measure of the model prediction accuracy. Essentially, the distinct roles of model selection and model validation should be properly recognized and distinguished, because a refined model that fits the data does not necessarily guarantee prediction accuracy. Further details and related issues can be found in the important texts of Kutner et al. [2] and Montgomery, Peck, and Vining [3] and the references therein.

The present article focuses on the validation process of linear regression analysis for comparison with postulated or acclaimed models. In linear regression, the focus is often on the existence and magnitude of the slope coefficients. However, the quality of estimation and prediction in associating the response variable with the predictor variables is determined by the closely intertwined intercept and slope coefficients. It is of practical importance to conduct a joint test of intercept and slope coefficients in order to verify the compatibility with established or theoretical formulations. For example, Maddahi et al. [5] compared left ventricular myocardial weights of dogs obtained by nuclear magnetic resonance imaging with actual measurements for different methods using simple linear regression analysis. The results were tested, both individually and simultaneously, for whether the intercept was different from zero and the slope was different from unity. In addition, Rose and McCallum [6] proposed a simple regression formula for estimating the logarithm of fetal weight with the sum of the ultrasound measurements of biparietal diameter, mean abdominal diameter, and femur length. Note that birth weights differ among ethnic groups, cohort characteristics, and time periods. Thus, it is of considerable interest for related research to validate or compare the magnitudes of intercept and slope coefficients in their formulation.

The importance and implications of statistical power analysis in research studies are well addressed in Cohen [7], Kramer and Blasey [8], Murphy, Myros, and Wolach [9], and Ryan [10], among others. In the context of multiple regression and correlation, the distinct notions of fixed and random regression settings were emphasized and explicated in power and sample size calculations by Gatsonis and Sampson [11], Mendoza and Stafford [12], Sampson [13], and Shieh [14,15,16]. On the other hand, Kelley [17], Krishnamoorthy and Xia [18], and Shieh [19] discussed sample size determinations for constructing precise confidence intervals of strength of association. It is noteworthy that analysis of covariance (ANCOVA) models involving both categorical and continuous predictors incur different hypothesis testing procedures. Accordingly, they require unique power procedures as discussed in Shieh [20] and Tang [21], among others.

For the purposes of planning research designs and validating model formulation, a sample size procedure was presented in Colosimo et al. [22]. The presented formula has a computationally appealing expression and maintains reasonable accuracy in their simulation study. However, the particular method involves a convenient substitution of a fixed mean parameter for random predictor variables. Their illustrations were not detailed enough to address the extent and impact of such simplification in sample size computations. Consequently, the adequacy of the sample size procedure described in Colosimo et al. [22] requires further clarification, and no research to date has examined its properties under different situations.

The statistical inferences for the regression coefficients are based on the conditional distribution of the continuous predictors. Yet, unlike the fixed factor configurations and treatment levels in analysis of variance (ANOVA) and other experimental designs, the continuous measurements of the predictor variables in regression studies are typically available only after the data have been collected. For advance planning of a research design, the distribution and power functions of the test procedure need to be appraised over possible values of the predictors. Thus, it is important to recognize the stochastic nature of the predictor variables. The fundamental differences between fixed and random models have been explicated in Binkley and Abbot [23], Cramer and Appelbaum [24], Sampson [13], and Shaffer [25]. Despite the complexity associated with the unconditional properties of the test procedure, the inferential procedures are the same under both fixed and random formulations. Hence, the usual rejection rule and critical value remain unchanged. The distinction between the two modeling approaches becomes critical for power analysis and sample size planning.

The joint test of intercept and slope coefficients in linear regression is more involved than the individual tests of intercept or slope parameters. A general linear hypothesis setting is required to perform the simultaneous test of both intercept and slope coefficients, as shown in Rencher and Schaalje ([26], Section 8.4.2). However, it is essential to emphasize that they did not address the corresponding power and sample size problems. In view of the limited results in the current literature, this article aims to present power and sample size procedures for the joint test of intercept and slope coefficients with specific recognition of the stochastic features of predictor variables. First, the exact power function and sample size procedure for detecting intercept and slope differences of simple linear regression are derived under a random modeling framework, assuming the predictor variables have independent and identical normal distributions. Then, the technical presentation is extended to the general context of multiple linear regression. Next, a numerical example of model validation is employed to demonstrate the essential discrepancy between the exact and approximate methods. Finally, the accuracy and robustness of the contending methods are appraised through simulation studies under a wide range of model configurations with normal and non-normal predictors.

Methods

Simple linear regression

Consider the simple linear regression model for associating the response variable Y with the predictor variable X:

$$ {Y}_i={\upbeta}_I+{X}_i{\upbeta}_S+{\upvarepsilon}_i, $$

(1)

where Y_i is the observed value of the response variable Y; X_i is the recorded value of the continuous predictor X; β_I and β_S are unknown intercept and slope parameters; and ε_i are iid N(0, σ²) random errors for i = 1, …, N. To examine the existence and magnitude of the intercept and slope coefficients {β_I, β_S}, the statistical inferences are based on the least squares estimators $ {\widehat{\upbeta}}_I $ and $ {\widehat{\upbeta}}_S $, where $ {\widehat{\upbeta}}_I $ = $ \overline{Y} $ – $ \overline{X}{\widehat{\upbeta}}_S $, $ {\widehat{\upbeta}}_S $ = SSXY/SSX, $ \overline{Y} $ = $ \sum \limits_{i=1}^N $ Y_i/N, $ \overline{X} $ = $ \sum \limits_{i=1}^N $ X_i/N, SSXY = $ \sum \limits_{i=1}^N $(X_i – $ \overline{X} $)(Y_i – $ \overline{Y} $), and SSX = $ \sum \limits_{i=1}^N $(X_i – $ \overline{X} $)². It follows from the standard results in Rencher and Schaalje ([26], Section 7.6.3) that the estimators {$ {\widehat{\upbeta}}_I $, $ {\widehat{\upbeta}}_S $} have the bivariate normal distribution

$$ \widehat{\boldsymbol{\upbeta}}\sim {N}_2\left(\boldsymbol{\upbeta}, {\upsigma}^2{\mathbf{W}}_X\right), $$

(2)

where

$$ \widehat{\boldsymbol{\upbeta}}=\left[\begin{array}{c}{\widehat{\upbeta}}_I\\ {}{\widehat{\upbeta}}_S\end{array}\right],\kern0.5em \boldsymbol{\upbeta} =\left[\begin{array}{c}{\upbeta}_I\\ {}{\upbeta}_S\end{array}\right],\kern0.5em {\mathbf{W}}_X=\left[\begin{array}{cc}{W}_{X11}& {W}_{X12}\\ {}{W}_{X21}& {W}_{X22}\end{array}\right], $$

W_X11 = 1/N + $ {\overline{X}}^2 $/SSX, W_X12 = W_X21 = −$ \overline{X} $/SSX, and W_X22 = 1/SSX. The subscript X of W_X emphasizes that the elements {W_X11, W_X12, W_X21, W_X22} of the variance and covariance matrix are functions of the predictor variables. Also, $ {\widehat{\upsigma}}^2 $ = SSE/ν is the usual unbiased estimator of σ², where SSE = SSY – SSXY²/SSX is the error sum of squares, SSY = $ \sum \limits_{i=1}^N $(Y_i – $ \overline{Y} $)², and ν = N – 2. Note that the least squares estimators $ {\widehat{\upbeta}}_I $ and $ {\widehat{\upbeta}}_S $ are independent of $ {\widehat{\upsigma}}^2 $.
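The estimators and the matrix W_X defined above depend on the data only through a few sums of squares. The following Python sketch is an illustration and is not taken from the article or its SAS/IML supplements; the function name `simple_ols` and the toy data are assumptions made for the example.

```python
def simple_ols(x, y):
    """Least squares quantities for Y_i = b_I + X_i * b_S + e_i (Eqs. 1-2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    ssy = sum((yi - y_bar) ** 2 for yi in y)
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b_s = ssxy / ssx                # slope estimate, SSXY/SSX
    b_i = y_bar - x_bar * b_s       # intercept estimate
    nu = n - 2                      # error degrees of freedom
    sigma2_hat = (ssy - ssxy ** 2 / ssx) / nu   # SSE / nu

    # Elements of W_X; sigma^2 * W_X is the covariance matrix of (b_i, b_s).
    w_x = [[1 / n + x_bar ** 2 / ssx, -x_bar / ssx],
           [-x_bar / ssx, 1 / ssx]]
    return {"b_i": b_i, "b_s": b_s, "sigma2_hat": sigma2_hat,
            "nu": nu, "w_x": w_x}


# Hypothetical toy data, used only to exercise the function.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = simple_ols(x, y)
print(fit["b_i"], fit["b_s"], fit["sigma2_hat"])
```

Because W_X is built from $ \overline{X} $ and SSX alone, any two samples sharing these two summaries yield the same covariance structure for the estimators.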

A joint test of the intercept and slope coefficients can be conducted with the hypothesis

$$ {\mathrm{H}}_0:\kern1em \left[\begin{array}{c}{\upbeta}_I\\ {}{\upbeta}_S\end{array}\right]=\left[\begin{array}{c}{\upbeta}_{I0}\\ {}{\upbeta}_{S0}\end{array}\right]\;\mathrm{versus}\ {\mathrm{H}}_1:\kern0.5em \left[\begin{array}{c}{\upbeta}_I\\ {}{\upbeta}_S\end{array}\right]\ne \left[\begin{array}{c}{\upbeta}_{I0}\\ {}{\upbeta}_{S0}\end{array}\right]. $$

(3)

Following the model assumption in Eq. 1, the likelihood ratio statistic for the joint test of intercept and slope is

$$ {F}_J=\frac{\left({\widehat{\boldsymbol{\upbeta}}}_D^{\mathrm{T}}{\mathbf{W}}_X^{-1}{\widehat{\boldsymbol{\upbeta}}}_D\right)/2}{{\widehat{\upsigma}}^2}, $$

(4)

where $ {\widehat{\boldsymbol{\upbeta}}}_D $ = [$ {\widehat{\upbeta}}_{ID} $, $ {\widehat{\upbeta}}_{SD} $]^T, $ {\widehat{\upbeta}}_{ID} $ = $ {\widehat{\upbeta}}_I $ – β_I0, and $ {\widehat{\upbeta}}_{SD} $ = $ {\widehat{\upbeta}}_S $ – β_S0. Under the null hypothesis, it can be shown that

$$ {F}_J\sim F\left(2,\upnu \right), $$

(5)

where F(2, ν) is an F distribution with 2 and ν degrees of freedom. Hence, H₀ is rejected at the significance level α if

$$ {F}_J>{F}_{2,\upnu, \upalpha}, $$

(6)

where F_{2, ν, α} is the upper (100·α)th percentile of the F(2, ν) distribution. In general, the joint test statistic F_J has the nonnull distribution for the given values of $ \overline{X} $ and SSX:

$$ {F}_J\mid \left[\overline{X}, SSX\right]\sim F\left(2,\upnu, {\Delta}_J\right), $$

(7)

where

$$ {\Delta}_J=\kern0.5em \frac{\left\{N{\left({\upbeta}_{ID}+\overline{X}{\upbeta}_{SD}\right)}^2+{\upbeta}_{SD}^2 SSX\right\}}{\upsigma^2}. $$

(8)

Here β_ID = β_I – β_I0 and β_SD = β_S – β_S0 denote the differences between the true and null coefficient values. Hence, the noncentral F distribution F(2, ν, Δ_J) is a function of the predictor values {X_i, i = 1, …, N} only through the summary statistics $ \overline{X} $ and SSX.
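The test defined by Eqs. 4 to 6 can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' SAS/IML code; the function name `joint_f_test` and the toy data are assumptions. Two shortcuts are used that do not appear in the article: the quadratic form in Eq. 4 is evaluated as N(b_ID + X̄·b_SD)² + b_SD²·SSX, which is the sample analogue of Eq. 8, and the critical value and p-value use the closed-form tail of the F(2, ν) distribution, P{F(2, ν) > x} = (1 + 2x/ν)^(−ν/2), which holds only for two numerator degrees of freedom.

```python
def joint_f_test(x, y, beta_i0, beta_s0, alpha=0.05):
    """Joint F test of H0: (beta_I, beta_S) = (beta_i0, beta_s0)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    ssy = sum((yi - y_bar) ** 2 for yi in y)
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b_s = ssxy / ssx
    b_i = y_bar - x_bar * b_s
    nu = n - 2
    sigma2_hat = (ssy - ssxy ** 2 / ssx) / nu

    # Differences between the estimates and the null values.
    d_i = b_i - beta_i0
    d_s = b_s - beta_s0

    # Quadratic form of Eq. 4, written out using the inverse of W_X.
    quad = n * (d_i + x_bar * d_s) ** 2 + ssx * d_s ** 2
    f_j = (quad / 2) / sigma2_hat

    # Closed forms valid for 2 numerator degrees of freedom only.
    f_crit = (nu / 2) * (alpha ** (-2 / nu) - 1)
    p_value = (1 + 2 * f_j / nu) ** (-nu / 2)
    return {"f_j": f_j, "f_crit": f_crit, "p_value": p_value,
            "reject": f_j > f_crit}


# Hypothetical toy data, used only to exercise the function.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(joint_f_test(x, y, beta_i0=0.0, beta_s0=2.0))
print(joint_f_test(x, y, beta_i0=0.0, beta_s0=1.0))
```

For more than two numerator degrees of freedom, as in the multiple regression case, the critical value would have to come from a statistical library instead of the closed form used here.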

The joint test of the intercept and slope coefficients given in Eq. 3 can be viewed as a special case of the general linear hypothesis considered in Rencher and Schaalje ([26], Section 8.4.2). However, two important aspects of this study should be pointed out. First, unlike the current consideration, the associated F test and related statistical properties in Rencher and Schaalje [26] are presented under the standard settings with fixed predictor values. Second, they did not address the power and sample size problems under random modeling formulations. Accordingly, their fundamental results are extended here to accommodate the predictor features in power and sample size calculations for the validation of simple linear regression models.

The statistical inferences about the regression coefficients are based on the conditional distribution of the continuous variables {X_i, i = 1, …, N}. Therefore, the resulting analysis would be specific to the observed values of the predictors. It is clear that, before conducting a research study, the actual values of the predictors are not available, just as the primary responses are not. In view of the stochastic nature of the summary statistics $ \overline{X} $ and SSX, it is essential to recognize and assess the distribution of the test statistic over possible values of the predictors. To demonstrate the impact of the predictor features on power and sample size calculations, the normality setting is commonly employed to provide a convenient basis for analytical derivation and empirical examination of random predictors, as in Gatsonis and Sampson [11], Sampson [13], and Shieh [14]. However, it is important to note that the power and sample size calculations of Gatsonis and Sampson [11], Sampson [13], and Shieh [14, 15] for detecting slope coefficients in multiple regression analysis are not applicable for assessing differences in intercept and slope coefficients considered here.

Specifically, the continuous predictor variables {X_i, i = 1, ..., N} are assumed to have independent and identical normal distribution N(μ_X, $ {\upsigma}_X^2 $). With the normal assumption, it can be readily established that $ \overline{X} $ ~ N(μ_X, $ {\upsigma}_X^2 $/N) and K = SSX/$ {\upsigma}_X^2 $ ~ χ²(κ), where κ = N – 1. Thus, the noncentrality Δ_J in Eq. 8 can be expressed as

$$ {\Delta}_J=\kern0.5em \frac{\left\{N{\left(a+ bZ\right)}^2+ dK\right\}}{\upsigma^2}, $$

(9)

where a = β_ID + μ_Xβ_SD, b = (d/N)^{1/2}, d = $ {\upbeta}_{SD}^2{\upsigma}_X^2 $, and Z = ($ \overline{X} $ – μ_X)/($ {\upsigma}_X^2 $/N)^{1/2} ~ N(0, 1). As a consequence, the F_J statistic has the two-stage distribution

$$ {F}_J\mid \left[K,\kern0.5em Z\right]\kern1em \sim \kern0.5em F\left(2,\kern0.5em \upnu, \kern0.5em {\Delta}_J\right),\kern0.5em K\sim {\upchi}^2\left(\upkappa \right),\kern0.5em \mathrm{and}\kern0.5em Z\sim N\left(0,1\right). $$

(10)

Note that the two random variables K and Z are independent. Moreover, the corresponding power function for the simultaneous test can be formulated as

$$ {\Psi}_J={E}_K{E}_Z\left[P\left\{F\left(2,\upnu, {\Delta}_J\right)>{F}_{2,\upnu, \upalpha}\right\}\right], $$

(11)

where the expectations E_K and E_Z are taken with respect to the distributions of K and Z, respectively.
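One way to evaluate the double expectation in Eq. 11 is to average the conditional power over random draws of K and Z. The Python sketch below does this with the standard library only. It is an assumption-laden illustration and not the authors' algorithm (their SAS/IML programs are in the Additional files and are not reproduced here): the function names `ncf2_sf` and `exact_power`, the Monte Carlo averaging, the number of draws, and the seed are all choices made for this example. The noncentral F tail probability is computed from the Poisson mixture representation of the noncentral chi-square, which has a closed form only because the numerator has two degrees of freedom.

```python
import math
import random


def ncf2_sf(c, nu, delta, tol=1e-12, max_terms=10_000):
    """P{F(2, nu, delta) > c} for a noncentral F with 2 numerator df.

    Poisson(delta/2) mixture over central F(2 + 2j, nu) tails, each of which
    is a finite sum for even numerator df. Intended for moderate delta.
    """
    q = 2.0 * c / (nu + 2.0 * c)
    half_nu = nu / 2.0
    lam = delta / 2.0                      # Poisson mean
    if lam <= 0.0:
        return (1.0 - q) ** half_nu        # central F(2, nu) tail
    log_lam = math.log(lam)
    total = pois_cum = inner = nb = 0.0
    for j in range(max_terms):
        # nb is the k = j term of the finite sum; inner is the running total
        # and equals P{F(2 + 2j, nu) > c / (1 + j)}.
        nb = (1.0 - q) ** half_nu if j == 0 else nb * (half_nu + j - 1.0) / j * q
        inner += nb
        weight = math.exp(-lam + j * log_lam - math.lgamma(j + 1.0))
        total += weight * inner
        pois_cum += weight
        if j > lam and 1.0 - pois_cum < tol:
            break
    return min(1.0, total)


def exact_power(n, beta_id, beta_sd, sigma2, mu_x, sigma2_x,
                alpha=0.05, draws=4000, seed=1):
    """Monte Carlo estimate of Psi_J in Eq. 11 under normal predictors."""
    rng = random.Random(seed)
    nu, kappa = n - 2, n - 1
    f_crit = (nu / 2) * (alpha ** (-2 / nu) - 1)   # upper-alpha point of F(2, nu)
    a = beta_id + mu_x * beta_sd
    d = beta_sd ** 2 * sigma2_x
    b = math.sqrt(d / n)
    acc = 0.0
    for _ in range(draws):
        z = rng.gauss(0.0, 1.0)                    # Z ~ N(0, 1)
        k = rng.gammavariate(kappa / 2.0, 2.0)     # K ~ chi-square(kappa)
        delta_j = (n * (a + b * z) ** 2 + d * k) / sigma2   # Eq. 9
        acc += ncf2_sf(f_crit, nu, delta_j)
    return acc / draws


if __name__ == "__main__":
    # Fetal weight configuration from the illustration later in the article.
    print(exact_power(173, 4.1 - 4.198, 0.15 - 0.143, 0.095, 24.2, 6.0))
```

The estimate carries Monte Carlo error that shrinks as `draws` grows; a deterministic quadrature over K and Z would remove that error at the cost of more code.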

Alternatively, Colosimo et al. ([22], Section 3.2) described a simple and naive method to obtain an unconditional distribution of F_J. They substituted the sample values of the predictor variables in the noncentrality Δ_J with the corresponding expected value E[X_i] = μ_X for i = 1, ..., N. Thus, the distribution of F_J is approximated by a noncentral F distribution:

$$ {F}_J\sim F\left(2,\ \upnu,\ {\Delta}_C\right), $$

(12)

where Δ_C = (Na²)/σ², which follows from Eq. 8 because the substitution yields $ \overline{X} $ = μ_X and SSX = 0. The suggested power function of Colosimo et al. [22] for the joint test of intercept and slope coefficients is

$$ {\Psi}_C=P\left\{F\left(2,\upnu, {\Delta}_C\right)>{F}_{2,\upnu, \upalpha}\right\}. $$

(13)

It is vital to note that the approximate power function Ψ_C only involves a noncentral F distribution, whereas the normal predictor distributions lead to the exact and more complex power formula Ψ_J that consists of a joint chi-square and normal mixture of noncentral F distributions. Obviously, the power function Ψ_C is simpler to compute than the exact formula Ψ_J. However, the approximation Ψ_C does not account for all of the predictor features in power computations.

It follows from large sample theory that bZ and K/N converge in probability to 0 and 1, respectively. Hence, the sample-size-adjusted noncentrality quantity Δ_J/N approaches $ {\Delta}_J^{\ast } $ as the sample size N increases to infinity, where

$$ {\Delta}_J^{\ast }=\kern0.5em \frac{{\left({\upbeta}_{ID}+{\upmu}_X{\upbeta}_{SD}\right)}^2+{\upbeta}_{SD}^2{\upsigma}_X^2}{\upsigma^2}. $$

(14)

Hence, $ {\Delta}_J^{\ast } $ provides a convenient measure of effect size for the joint appraisal of intercept and slope coefficients. It can be immediately seen from the noncentrality term of the approximate power function Ψ_C that $ {\Delta}_C^{\ast } $ = Δ_C/N = (β_ID + μ_Xβ_SD)²/σ² < $ {\Delta}_J^{\ast } $, except when β_SD = 0 and/or $ {\upsigma}_X^2 $ = 0. Consequently, the estimated power Ψ_C is generally less than Ψ_J even for large sample sizes when all other configurations remain constant. It is shown later that while the computation is more involved for the complex power function Ψ_J, the exact approach has a clear advantage over the approximate procedure in accurate power calculations. For advance planning of a research design, the presented power formulas can be employed to calculate the sample size N needed to attain the specified power 1 – β for the chosen significance level α, null values {β_I0, β_S0}, coefficient parameters {β_I, β_S}, variance component σ², and predictor mean and variance {μ_X, $ {\upsigma}_X^2 $}. It usually involves an incremental search from a small initial value to find the smallest sample size that achieves the desired power.
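The incremental search described above can be sketched for the approximate power function Ψ_C of Eq. 13, which is deterministic and therefore easy to check. The Python code below is an illustration under stated assumptions and is not the authors' program: the names `ncf2_sf`, `approx_power`, and `smallest_n`, the starting value of the search, and the upper search limit are all hypothetical choices. The noncentral F tail uses a closed-form Poisson mixture that is valid only for two numerator degrees of freedom.

```python
import math


def ncf2_sf(c, nu, delta, tol=1e-12, max_terms=10_000):
    """P{F(2, nu, delta) > c}; closed-form mixture for 2 numerator df."""
    q = 2.0 * c / (nu + 2.0 * c)
    half_nu = nu / 2.0
    lam = delta / 2.0
    if lam <= 0.0:
        return (1.0 - q) ** half_nu
    log_lam = math.log(lam)
    total = pois_cum = inner = nb = 0.0
    for j in range(max_terms):
        nb = (1.0 - q) ** half_nu if j == 0 else nb * (half_nu + j - 1.0) / j * q
        inner += nb                                  # P{F(2 + 2j, nu) > c / (1 + j)}
        weight = math.exp(-lam + j * log_lam - math.lgamma(j + 1.0))
        total += weight * inner
        pois_cum += weight
        if j > lam and 1.0 - pois_cum < tol:
            break
    return min(1.0, total)


def approx_power(n, beta_id, beta_sd, sigma2, mu_x, alpha=0.05):
    """Psi_C of Eq. 13: predictors replaced by their mean mu_x."""
    nu = n - 2
    f_crit = (nu / 2) * (alpha ** (-2 / nu) - 1)     # upper-alpha point of F(2, nu)
    a = beta_id + mu_x * beta_sd
    delta_c = n * a * a / sigma2
    return ncf2_sf(f_crit, nu, delta_c)


def smallest_n(power_fn, target, n_start=4, n_max=100_000):
    """Incremental search for the smallest N with power_fn(N) >= target."""
    for n in range(n_start, n_max + 1):
        if power_fn(n) >= target:
            return n
    raise ValueError("target power not reached within n_max")


if __name__ == "__main__":
    # Fetal weight configuration from the illustration later in the article.
    def psi_c(n):
        return approx_power(n, 4.1 - 4.198, 0.15 - 0.143, 0.095, 24.2)

    for target in (0.80, 0.90):
        n = smallest_n(psi_c, target)
        print(target, n, psi_c(n))
```

For the fetal weight configuration the article reports N_C = 183 and 239 with Ψ_C = 0.8010 and 0.9002, which gives a reference point for checking the sketch. The same `smallest_n` helper could be applied to any other power function of N, including an evaluation of Ψ_J.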

Multiple linear regression

The power and sample size calculations for the general scenario of multiple linear regression with more than one predictor are discussed next. Consider the multiple linear regression model with response variable Y_i and p predictor variables (X_i1, ..., X_ip) for i = 1, ..., N:

$$ \mathbf{Y}=\mathbf{X}\boldsymbol{\upbeta } +\boldsymbol{\upvarepsilon}, $$

(15)

where Y = (Y₁, ..., Y_N)^T is an N × 1 vector with Y_i being the observed measurement of the ith subject; X = (1_N, X_S), where 1_N is the N × 1 vector of all 1's, X_S = (X_S1, ..., X_SN)^T is an N × p matrix, X_Si = (X_i1, ..., X_ip)^T, and X_i1, ..., X_ip are the observed values of the p predictor variables of the ith subject; β = (β_I, $ {\boldsymbol{\upbeta}}_S^{\mathrm{T}} $)^T is a (p + 1) × 1 vector with β_S = (β₁, ..., β_p)^T, where β_I, β₁, ..., β_p are unknown coefficient parameters; and ε = (ε₁, ..., ε_N)^T is an N × 1 vector whose elements ε_i are iid N(0, σ²) random variables.

For the joint test of intercept and slope coefficients in terms of

$$ {\mathrm{H}}_0:\boldsymbol{\upbeta} =\boldsymbol{\uptheta}\ \mathrm{versus}\ {\mathrm{H}}_1:\boldsymbol{\upbeta} \ne \boldsymbol{\uptheta}, $$

(16)

it can be shown from Rencher and Schaalje ([26], Section 8.4.2) that the test statistic is

$$ {F}_{MJ}=\frac{\left\{{\left(\widehat{\boldsymbol{\upbeta}}-\boldsymbol{\uptheta} \right)}^{\mathrm{T}}\left({\mathbf{X}}^{\mathrm{T}}\mathbf{X}\right)\left(\widehat{\boldsymbol{\upbeta}}-\boldsymbol{\uptheta} \right)\right\}/\left(p+1\right)}{{\widehat{\upsigma}}^2}, $$

(17)

where $ {\widehat{\upsigma}}^2 $ = SSE/ν is the usual unbiased estimator of σ² with ν = N – p – 1. Under the null hypothesis, F_MJ has an F distribution with p + 1 and ν degrees of freedom

$$ {F}_{MJ}\sim F\left(p+1,\upnu \right). $$

(18)

The joint test can be conducted by rejecting H₀ at the significance level α if F_MJ > F_{(p + 1), ν, α}. In general, F_MJ has the nonnull distribution for the given values of X_S:

$$ {F}_{MJ}\sim F\left(p+1,\ \upnu,\ {\Delta}_{MJ}\right), $$

(19)

where F(p + 1, ν, Δ_MJ) is a noncentral F distribution with p + 1 and ν degrees of freedom and noncentrality parameter Δ_MJ with

$$ {\Delta}_{MJ}=\frac{\left\{{\left(\boldsymbol{\upbeta} -\boldsymbol{\uptheta} \right)}^{\mathrm{T}}\left({\mathbf{X}}^{\mathrm{T}}\mathbf{X}\right)\left(\boldsymbol{\upbeta} -\boldsymbol{\uptheta} \right)\right\}}{\upsigma^2}. $$

(20)

It is essential to emphasize that the inferences in Rencher and Schaalje [26] are concerned mainly with the slope coefficients β_S. As noted in the context of simple linear regression, the fundamental results concerning fixed predictor values are extended here to power and sample size calculations for the validation of linear regression models under random predictor settings.

In view of the random nature of the predictor variables, the continuous predictor variables {X_Si, i = 1, ..., N} are assumed to have independent multinormal distributions N_p(μ_X, Σ_X). With the multinormal assumptions, it can be readily established that $ {\overline{\mathbf{X}}}_S $ = $ \sum \limits_{i=1}^N $ X_Si/N ~ N_p(μ_X, Σ_X/N) and A = $ \sum \limits_{i=1}^N $(X_Si – $ {\overline{\mathbf{X}}}_S $)(X_Si – $ {\overline{\mathbf{X}}}_S $)^T ~ W_p(κ, Σ_X), where W_p(κ, Σ_X) is a Wishart distribution with κ degrees of freedom and covariance matrix Σ_X, and κ = N – 1. Thus, the noncentrality Δ_MJ can be rewritten as

$$ {\Delta}_{MJ}=\kern0.5em \frac{\left\{N{\left({\upbeta}_{ID}+{\boldsymbol{\upbeta}}_{SD}^{\mathrm{T}}{\overline{\mathbf{X}}}_S\right)}^2+{\boldsymbol{\upbeta}}_{SD}^{\mathrm{T}}{\mathbf{A}\boldsymbol{\upbeta}}_{SD}\right\}}{\upsigma^2}, $$

(21)

where β_ID = β_I – θ_I and β_SD = β_S – θ_S. Using the prescribed distributions of $ {\overline{\mathbf{X}}}_S $ and A, it can be shown that β_ID + $ {\boldsymbol{\upbeta}}_{SD}^{\mathrm{T}}{\overline{\mathbf{X}}}_S $ = a + bZ ~ N(a, b²), Z ~ N(0, 1), and K = $ {\boldsymbol{\upbeta}}_{SD}^{\mathrm{T}}{\mathbf{A}\boldsymbol{\upbeta}}_{SD} $/d ~ χ²(κ), where a = β_ID + $ {\boldsymbol{\upbeta}}_{SD}^{\mathrm{T}} $μ_X, b = (d/N)^{1/2}, and d = $ {\boldsymbol{\upbeta}}_{SD}^{\mathrm{T}}{\boldsymbol{\Sigma}}_X{\boldsymbol{\upbeta}}_{SD} $. Note that the two random variables K and Z are independent. It is conceptually simple and computationally convenient to subsume the stochastic features of $ {\overline{\mathbf{X}}}_S $ and A in terms of Z and K. Accordingly, the noncentrality quantity Δ_MJ is formulated as

$$ {\Delta}_{MJ}=\frac{\left\{N{\left(a+ bZ\right)}^2+ dK\right\}}{\upsigma^2}. $$

(22)

Thus, under the multinormal predictor assumptions, the F_MJ statistic has the two-stage distribution

$$ {F}_{MJ}\mid \left[K,Z\right]\sim F\left(p+1,\kern0.5em \upnu, \kern0.5em {\Delta}_{MJ}\right),\kern0.5em K\sim {\upchi}^2\left(\upkappa \right),\kern0.5em \mathrm{and}\kern0.5em Z\sim N\left(0,1\right). $$

(23)

The corresponding power function for the simultaneous test can be expressed as

$$ {\Psi}_{MJ}={E}_K{E}_Z\left[P\left\{F\left(p+1,\kern0.5em \upnu, \kern0.5em {\Delta}_{MJ}\right)>{F}_{\left(p+1\right),\upnu, \upalpha}\right\}\right], $$

(24)

where the expectations E_K and E_Z are taken with respect to the distributions of K and Z, respectively. Evidently, when p = 1, the test statistic F_MJ and power function Ψ_MJ reduce to the simplified formulas of F_J and Ψ_J given in Eqs. 4 and 11, respectively.
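The step from the quadratic form in Eq. 20 to the expression in Eq. 21 is stated without proof. A short derivation, offered here as a sketch and not taken from the article, partitions X^T X according to X = (1_N, X_S), using X_S^T 1_N = N$ {\overline{\mathbf{X}}}_S $ and X_S^T X_S = A + N$ {\overline{\mathbf{X}}}_S{\overline{\mathbf{X}}}_S^{\mathrm{T}} $:

$$ {\mathbf{X}}^{\mathrm{T}}\mathbf{X}=\left[\begin{array}{cc}N& N{\overline{\mathbf{X}}}_S^{\mathrm{T}}\\ {}N{\overline{\mathbf{X}}}_S& \mathbf{A}+N{\overline{\mathbf{X}}}_S{\overline{\mathbf{X}}}_S^{\mathrm{T}}\end{array}\right]. $$

Writing β – θ = (β_ID, $ {\boldsymbol{\upbeta}}_{SD}^{\mathrm{T}} $)^T and expanding the quadratic form gives

$$ {\left(\boldsymbol{\upbeta} -\boldsymbol{\uptheta} \right)}^{\mathrm{T}}\left({\mathbf{X}}^{\mathrm{T}}\mathbf{X}\right)\left(\boldsymbol{\upbeta} -\boldsymbol{\uptheta} \right)=N{\upbeta}_{ID}^2+2N{\upbeta}_{ID}{\boldsymbol{\upbeta}}_{SD}^{\mathrm{T}}{\overline{\mathbf{X}}}_S+N{\left({\boldsymbol{\upbeta}}_{SD}^{\mathrm{T}}{\overline{\mathbf{X}}}_S\right)}^2+{\boldsymbol{\upbeta}}_{SD}^{\mathrm{T}}{\mathbf{A}\boldsymbol{\upbeta}}_{SD}=N{\left({\upbeta}_{ID}+{\boldsymbol{\upbeta}}_{SD}^{\mathrm{T}}{\overline{\mathbf{X}}}_S\right)}^2+{\boldsymbol{\upbeta}}_{SD}^{\mathrm{T}}{\mathbf{A}\boldsymbol{\upbeta}}_{SD}. $$

Dividing by σ² yields Eq. 21, and setting p = 1 recovers Eq. 8 with A = SSX.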

Results

An illustration

To demonstrate the prescribed power and sample size procedures, the simplified formula for estimating fetal weight in Rose and McCallum [6] is used as a benchmark for validation. Although there are several different methods for estimating fetal weight, it was demonstrated in Anderson et al. [27] that the simple linear regression formula of Rose and McCallum [6] compares favorably with other techniques. Based on the ultrasound examinations conducted in the Stanford University Hospital labor and delivery suite between January 1981 and March 1984, they presented a useful formula for predicting the natural logarithm of birth weight with the sum of head, abdomen, and limb ultrasound measurements as given by the equation: ln(BW) = 4.198 + 0.143·X, where X = biparietal diameter + mean abdominal diameter + femur length (in centimeters). The average birth weight of their study population was 2275 g with a range of 490–5300 g. The detailed comparisons and related discussions of viable equations for estimating fetal weight can be found in Anderson et al. [27] and the references therein.

Plausibly, there are underlying differences in fetal weight between different ethnic origins, cohort groups, and time periods. Validating the simple formula for a target population requires a detailed scheme to determine the necessary sample size so that the conducted study has a decent assurance of detecting the potential discrepancy. For illustration, the intercept and slope coefficients are set as β_I = 4.1 and β_S = 0.15, respectively. The error component is selected to be σ² = 0.095. The characteristics of the ultrasound measurements are represented by the mean μ_X = 24.2 and variance $ {\upsigma}_X^2 $ = 6. Note that these configurations assure that the expected fetal weight of the designated population E[BW] = E[exp(4.1 + 0.15·X + ε)] = 2275.52 coincides with the average birth weight reported in Rose and McCallum [6]. To test the hypothesis of H₀: (β_I, β_S) = (4.198, 0.143) versus H₁: (β_I, β_S) ≠ (4.198, 0.143) with the significance level α = 0.05, numerical computations showed that the sample sizes of N_E = 173 and 227 are required for the exact approach to reach the target power of 0.8 and 0.9, respectively. Because sample sizes need to be integer values in practice, the attained power is slightly greater than the nominal power level. In these two cases, the achieved powers of the two sample sizes are Ψ_J = 0.8001 and 0.9010, respectively. These results were computed with the supplementary algorithms presented in Additional files 1 and 2. For ease of application, the prescribed configurations are incorporated in the user specification sections of the SAS/IML programs.

On the other hand, the matching sample sizes computed with the approximate method of Colosimo et al. [22] are N_C = 183 and 239 with the attained powers of Ψ_C = 0.8010 and 0.9002, respectively. Therefore, the simple method of Colosimo et al. [22] requires 183 – 173 = 10 and 239 – 227 = 12 more babies than the exact formula to satisfy the nominal power performance. In fact, the exact power function gives the values Ψ_J = 0.8236 and 0.9161 with the sample sizes 183 and 239, respectively. Hence, the resulting power differences between the two magnitudes of sample size are 0.8236 – 0.8001 = 0.0235 and 0.9161 – 0.9010 = 0.0151. To enhance the illustration, the computed sample size, estimated power, and difference for the exact and approximate procedures are summarized in Table 1. The sample size and power calculations show that the approximate power function Ψ_C tends to underestimate powers because of the simplification of the noncentrality parameter in the noncentral F distribution. Correspondingly, the approximate method of Colosimo et al. [22] often overestimates the required sample sizes for validation analysis. It is essential to note that adopting a small sample size will cause a study to have insufficient power to demonstrate model difference. In the case of Colosimo et al. [22], their method may lead to an over-sized study that wastes time, money, and other resources. More importantly, the hypothesis tests of validation studies are over-rejected and yield erroneous conclusions. It is of both practical usefulness and theoretical concern to further appraise the intrinsic implications of the two distinct procedures for other settings. Detailed empirical studies are described next to evaluate and compare their accuracy under a wide variety of model configurations.
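The gap between the two sample sizes can be anticipated from the large-sample effect sizes $ {\Delta}_J^{\ast } $ and $ {\Delta}_C^{\ast } $ defined around Eq. 14. The arithmetic below is a back-of-envelope check added for illustration, not a computation reported by the authors, and it ignores the randomness of Z and K. With β_ID = 4.1 – 4.198 = –0.098 and β_SD = 0.15 – 0.143 = 0.007,

$$ a={\upbeta}_{ID}+{\upmu}_X{\upbeta}_{SD}=-0.098+24.2\times 0.007=0.0714, $$

$$ {\Delta}_C^{\ast }=\frac{a^2}{\upsigma^2}=\frac{0.005098}{0.095}\approx 0.05366,\kern1em {\Delta}_J^{\ast }=\frac{a^2+{\upbeta}_{SD}^2{\upsigma}_X^2}{\upsigma^2}=\frac{0.005098+0.000294}{0.095}\approx 0.05676. $$

The two planned sample sizes for power 0.8 then correspond to nearly the same total noncentrality, 173 × 0.05676 ≈ 9.82 for the exact approach and 183 × 0.05366 ≈ 9.82 for the approximation, and the ratio $ {\Delta}_C^{\ast } $/$ {\Delta}_J^{\ast } $ ≈ 0.945 is close to 173/183 ≈ 0.945. The extra subjects demanded by the approximate method thus compensate for the omitted term $ {\upbeta}_{SD}^2{\upsigma}_X^2 $/σ².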

Table 1 Computed sample size, estimated power, and difference for the exact and approximate procedures with {β_I, β_S} = {4.1, 0.15}, {β_I0, β_S0} = {4.198, 0.143}, σ² = 0.095, μ_X = 24.2, $ {\upsigma}_X^2 $ = 6, and Type I error α = 0.05

Numerical comparisons

In view of the potential discrepancy between the verbal and estimate procedures, numerical investigations of ability and sample size calculations were conducted nether a broad range of model configurations in two studies. The start assessment focuses on the situations with normal predictor variables, while the second study concerns the robustness of the two methods under several prominent situations of not-normal predictors.

Normal predictors

For ease of comparison, the model settings in Colosimo et al. [22] are considered and expanded to reveal the singled-out behavior of the contending procedures. Specifically, the null and culling hypotheses are

$$ {\mathrm{H}}_0:\kern1em \left[\begin{array}{c}{\upbeta}_I\\ {}{\upbeta}_S\end{array}\right]=\left[\brainstorm{array}{c}0\\ {}ane\end{array}\right]\kern0.5em \mathrm{versus}\ {\mathrm{H}}_1:\kern0.5em \left[\begin{array}{c}{\upbeta}_I\\ {}{\upbeta}_S\finish{assortment}\right]\ne \left[\brainstorm{assortment}{c}0\\ {}one\end{array}\right], $$

where {β_I, β_Southward} = {d, one +d} and {β_ID, β_SD} = {d, d} with d = 0.3, 0.4, and 0.v. Note that these coefficient settings are equivalent to those with {β_I, β_S} = {β_I0 +d, β_South0 +d} considering they atomic number 82 to the same differences {β_ID, β_SD} = {d, d} and the resulting ability functions remain identical. The error component is stock-still every bit σⁱⁱ = one and the predictors X are assumed to have normal distributions with mean μ_Ten = {0, 0.5, ane} and variance $ {\upsigma}_X^2 $ = {0.5, 1, 2}. Overall these considerations result in a total of 27 different combined settings. These combinations of model configurations were chosen to correspond the possible characteristics that are likely to be encountered in actual applications and also to maintain a reasonable range for the magnitudes of sample size without making unrealistic assessments.

Throughout this empirical investigation, the significance level and nominal power are fixed as α = 0.05 and 1 – β = 0.90, respectively. With the prescribed specifications, the required sample sizes are computed for the exact procedure with the power function Ψ_J. The computed sample sizes of the nine combined predictor mean and variance patterns are summarized in Table 2, Table S1 and Table S2 for the coefficient difference d = 0.3, 0.4, and 0.5, respectively. As suggested by a referee, Tables S1 and S2 are presented in Additional files 3 and 4, respectively. In order to evaluate the accuracy of power calculations, the estimated powers of the exact and approximate procedures are also presented. Note that the attained values of the exact approach are marginally larger than the nominal level 0.90. In contrast, the estimated powers of the approximation of Colosimo et al. [22] are all less than 0.90 and the difference is quite substantial in some cases.

Then, Monte Carlo simulation studies of 10,000 iterations were performed to compute the simulated power for the designated sample sizes and parameter configurations. For each replicate, N predictor values were generated from the designated normal distribution N(μ_X, $ {\upsigma}_X^2 $). The resulting values of the normal predictor, the intercept and slope coefficients {β_I, β_S}, and the error variance σ², in turn, determine the configurations for producing N normal outcomes of the simple linear regression model defined in Eq. 1. Next, the test statistic F_J was computed and the simulated power was the proportion of the 10,000 replicates whose test statistics F_J exceed the corresponding critical value F_{2, ν, 0.05}. The adequacy of the two sample size procedures is determined by the error between the estimated power and the simulated power. The simulated power and error are also summarized in Table 2, Table S1 and Table S2 for all twenty-seven design schemes.
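The simulation steps just described can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' SAS/IML code: it assumes that F_J has the standard general-linear-hypothesis form (β̂ − β₀)ᵀXᵀX(β̂ − β₀)/(2σ̂²) with ν = N − 2, and it uses the closed form P(F > x) = (1 + 2x/ν)^(−ν/2), valid for an F distribution with 2 numerator degrees of freedom, to obtain the critical value. All function names are hypothetical.

```python
import math
import random


def f_critical_2df(nu, alpha=0.05):
    """Upper-alpha critical value of F(2, nu), from P(F > x) = (1 + 2x/nu)**(-nu/2)."""
    return (nu / 2.0) * (alpha ** (-2.0 / nu) - 1.0)


def joint_f_statistic(x, y, beta_null):
    """Joint F statistic for H0: (intercept, slope) = beta_null (assumed form of F_J)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)  # unbiased estimate of the error variance
    d_i = intercept - beta_null[0]
    d_s = slope - beta_null[1]
    sum_x = sum(x)
    sum_x2 = sum(xi * xi for xi in x)
    # Quadratic form (b - b0)' X'X (b - b0) with X = [1, x]
    quad = n * d_i ** 2 + 2.0 * sum_x * d_i * d_s + sum_x2 * d_s ** 2
    return quad / (2.0 * s2)


def simulate_power(n, beta, beta_null, sigma2, mu_x, var_x,
                   alpha=0.05, reps=10000, seed=1):
    """Proportion of replicates whose joint F statistic exceeds the critical value."""
    rng = random.Random(seed)
    crit = f_critical_2df(n - 2, alpha)
    sd_x = math.sqrt(var_x)
    sd_e = math.sqrt(sigma2)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(mu_x, sd_x) for _ in range(n)]            # random predictors
        y = [beta[0] + beta[1] * xi + rng.gauss(0.0, sd_e) for xi in x]
        if joint_f_statistic(x, y, beta_null) > crit:
            rejections += 1
    return rejections / reps
```

For example, `simulate_power(n=80, beta=(0.3, 1.3), beta_null=(0.0, 1.0), sigma2=1.0, mu_x=0.0, var_x=1.0)` returns a simulated power for one design; the sample size 80 here is an arbitrary illustration, not a value taken from Table 2.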

Table 2 Computed sample size, estimated power, and simulated power for Normal predictors with {β_I, β_S} = {0.3, 1.3}, {β_I0, β_S0} = {0, 1}, σ² = 1, Type I error α = 0.05, and nominal power 1 – β = 0.90


It can be seen from these results that the discrepancy between the estimated power and the simulated power is quite small for the proposed exact technique under all model configurations considered here. Specifically, the resulting errors of the 27 designs are all within the narrow range of −0.0087 to 0.0056. On the other hand, the estimated powers of the approximate method are consistently smaller than the simulated powers. The outcomes show a clear pattern: the absolute error decreases with the coefficient difference d and the predictor mean μ_X, and increases with the predictor variance $ {\upsigma}_X^2 $, when all other configurations are held constant. Notably, the associated absolute errors can be as large as 0.4456, 0.4295, and 0.4183 when μ_X = 0 and $ {\upsigma}_X^2 $ = 2 for d = 0.3, 0.4, and 0.5 in Table 2, Table S1, and Table S2, respectively. It should be noted that most of the sample sizes reported in the empirical examination of Colosimo et al. [22] (Table 1) are rather large and impractical. This may explain why the performance of the approximate formula was acceptable in their study. In fact, some of their cases with smaller sample sizes also showed the same phenomenon: the simple method underestimates the power level and therefore overestimates the sample size required to achieve the nominal power. Essentially, the simplicity of the approximate formula comes at a high price in terms of inaccurate power and sample size calculations.
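As a rough yardstick for reading these errors, the binomial standard error of a simulated power based on 10,000 replicates can be computed directly. The sketch below is an added illustration, not part of the original analysis, and the function name is hypothetical.

```python
import math


def mc_standard_error(power, reps):
    """Binomial standard error of a simulated rejection proportion."""
    return math.sqrt(power * (1.0 - power) / reps)


se = mc_standard_error(0.90, 10000)
print(round(se, 4))  # 0.003
```

With a true power near 0.90, the simulated power carries a standard error of about 0.003, so the errors of −0.0087 to 0.0056 reported for the exact procedure are within roughly three standard errors of zero, whereas errors above 0.4 for the approximate method are far outside any plausible simulation noise.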

Non-normal predictors

To address the sensitivity issues of the two techniques, power and sample size calculations were also conducted for the regression models with non-normal predictors. For illustration, the model settings in Table 2 with {β_ID, β_SD} = {0.3, 0.3} are modified by assuming the predictors have four different sets of distributions: Exponential(1), Gamma(2, 1), Laplace(1), and Uniform(0, 1). For ease of comparison, the designated distributions were linearly transformed to have mean μ_X and variance $ {\upsigma}_X^2 $ as reported in the previous study. Hence, the computed sample sizes associated with the exact procedure and the estimated powers of the two methods remain identical for the four non-normal distributions. The simulated powers were obtained with Monte Carlo simulation studies of 10,000 iterations under the selected model configurations and non-normal predictor distributions. Similar to the numerical assessments in the preceding study, the computed sample sizes, simulated powers, estimated powers, and associated errors of the two competing procedures are presented in Tables S3-S6 of Additional files 5, 6, 7, and 8 for the four types of non-normal predictors, respectively.
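The linear transformation to a target mean and variance can be sketched as follows. The parameterizations are assumptions, since the text does not spell them out: Exponential(1) is taken as rate 1, Gamma(2, 1) as shape 2 and scale 1, and Laplace(1) as location 0 and scale 1. The names `BASE_DISTRIBUTIONS` and `transformed_predictors` are hypothetical.

```python
import math
import random


def _laplace(rng):
    # Standard Laplace (location 0, scale 1) as the difference of two unit exponentials.
    return rng.expovariate(1.0) - rng.expovariate(1.0)


# name -> (sampler, theoretical mean, theoretical variance); parameterizations assumed
BASE_DISTRIBUTIONS = {
    "exponential": (lambda rng: rng.expovariate(1.0), 1.0, 1.0),
    "gamma": (lambda rng: rng.gammavariate(2.0, 1.0), 2.0, 2.0),
    "laplace": (_laplace, 0.0, 2.0),
    "uniform": (lambda rng: rng.random(), 0.5, 1.0 / 12.0),
}


def transformed_predictors(name, n, mu_x, var_x, rng):
    """Draw n values from the named distribution, shifted and scaled so that the
    result has mean mu_x and variance var_x while keeping its shape."""
    draw, base_mean, base_var = BASE_DISTRIBUTIONS[name]
    scale = math.sqrt(var_x / base_var)
    return [mu_x + scale * (draw(rng) - base_mean) for _ in range(n)]


rng = random.Random(2019)
x = transformed_predictors("exponential", 5, mu_x=0.5, var_x=2.0, rng=rng)
```

Because only location and scale change, skewness and kurtosis are preserved, which is why the first two moments match the normal settings while the distributional shape still differs.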

Regarding the robustness properties of the two procedures, the results in Tables S3-S6 suggest that the performance of the exact approach is slightly affected by the non-normal covariate settings. The high skewness and kurtosis of the Exponential distribution apparently have a more prominent impact on the normal-based power function than the other three cases of Gamma, Laplace, and Uniform distributions. Note that the approximate method depends only on the mean values of the predictors and is presumably less sensitive to the variation of predictor distributions. Nevertheless, its accuracy improves only marginally in some cases and generally remains about the same as in the normal setting presented in Table 2. In short, the sensitivity and robustness of the suggested exact technique depend on how badly the predictor distributions depart from normality. On the other hand, the performance assessments show that the exact procedure still gives acceptable results even in the situations with non-normal predictors considered here. More importantly, this empirical evidence reveals that the exact approach is more reliable and accurate than the approximate method and can be recommended as a trustworthy technique for power and sample size calculations.

Discussion

In practice, a research study requires adequate statistical power and sufficient sample size to detect scientifically credible effects. Although multiple linear regression is a well-recognized statistical tool, the corresponding power and sample size problem for model validation has not been adequately examined in the literature. To enhance the usefulness of the joint test of intercept and slope coefficients in linear regression analysis, this article presents theoretical discussions and computational algorithms for power and sample size calculations under the random modeling framework. The stochastic nature of predictor variables is taken into account by assuming that they have independent and identical normal distributions. In contrast, the existing method of Colosimo et al. [22] adopted a direct replacement of mean values for the predictor variables. Consequently, the proposed exact approach has the prominent advantage of accommodating the complete distributional features of normal predictors whereas the simple approximation of Colosimo et al. [22] only includes the mean parameters of the predictor variables.

Conclusions

The presented analytic derivations and empirical results indicate that the approximate formula of Colosimo et al. [22] generally does not give accurate power and sample size calculations. In terms of overall accuracy and robustness, the exact approach clearly outperforms the approximate method as a useful tool in planning validation studies. Although the numerical illustration involves only one predictor variable, it embodies the underlying principle and critical feature of linear regression that can be useful in conducting similar evaluations for the more general framework of multiple linear regression.

Abbreviations

ANCOVA: Analysis of covariance
ANOVA: Analysis of variance

References

1. Cohen J, Cohen P, West SG, Aiken LS. Applied multiple regression/correlation analysis for the behavioral sciences. 3rd ed. Mahwah: Erlbaum; 2003.
2. Kutner MH, Nachtsheim CJ, Neter J, Li W. Applied linear statistical models. 5th ed. New York: McGraw Hill; 2005.
3. Montgomery DC, Peck EA, Vining GG. Introduction to linear regression analysis. 5th ed. Hoboken: Wiley; 2012.
4. Snee RD. Validation of regression models: methods and examples. Technometrics. 1977;19:415–28.
5. Maddahi J, Crues J, Berman DS, et al. Noninvasive quantification of left ventricular myocardial mass by gated proton nuclear magnetic resonance imaging. J Am Coll Cardiol. 1987;10:682–92.
6. Rose BI, McCallum WD. A simplified method for estimating fetal weight using ultrasound measurements. Obstet Gynecol. 1987;69:671–4.
7. Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale: Erlbaum; 1988.
8. Kraemer HC, Blasey C. How many subjects?: Statistical power analysis in research. 2nd ed. Los Angeles: Sage; 2015.
9. Murphy KR, Myors B, Wolach A. Statistical power analysis: a simple and general model for traditional and modern hypothesis tests. 4th ed. New York: Routledge; 2014.
10. Ryan TP. Sample size determination and power. Hoboken: Wiley; 2013.
11. Gatsonis C, Sampson AR. Multiple correlation: exact power and sample size calculations. Psychol Bull. 1989;106:516–24.
12. Mendoza JL, Stafford KL. Confidence interval, power calculation, and sample size estimation for the squared multiple correlation coefficient under the fixed and random regression models: a computer program and useful standard tables. Educ Psychol Meas. 2001;61:650–67.
13. Sampson AR. A tale of two regressions. J Am Stat Assoc. 1974;69:682–9.
14. Shieh G. Exact interval estimation, power calculation and sample size determination in normal correlation analysis. Psychometrika. 2006;71:529–40.
15. Shieh G. A unified approach to power calculation and sample size determination for random regression models. Psychometrika. 2007;72:347–60.
16. Shieh G. Exact analysis of squared cross-validity coefficient in predictive regression models. Multivar Behav Res. 2009;44:82–105.
17. Kelley K. Sample size planning for the squared multiple correlation coefficient: accuracy in parameter estimation via narrow confidence intervals. Multivar Behav Res. 2008;43:524–55.
18. Krishnamoorthy K, Xia Y. Sample size calculation for estimating or testing a nonzero squared multiple correlation coefficient. Multivar Behav Res. 2008;43:382–410.
19. Shieh G. Sample size requirements for interval estimation of the strength of association effect sizes in multiple regression analysis. Psicothema. 2013;25:402–7.
20. Shieh G. Power and sample size calculations for contrast analysis in ANCOVA. Multivar Behav Res. 2017;52:1–11.
21. Tang Y. Exact and approximate power and sample size calculations for analysis of covariance in randomized clinical trials with or without stratification. Stat Biopharm Res. 2018;10:274–86.
22. Colosimo EA, Cruz FR, Miranda JLO, et al. Sample size calculation for method validation using linear regression. J Stat Comput Simul. 2007;77:505–16.
23. Binkley JK, Abbot PC. The fixed X assumption in econometrics: can the textbooks be trusted? Am Stat. 1987;41:206–14.
24. Cramer EM, Appelbaum MI. The validity of polynomial regression in the random regression model. Rev Educ Res. 1978;48:511–5.
25. Shaffer JP. The Gauss-Markov theorem and random regressors. Am Stat. 1991;45:269–73.
26. Rencher AC, Schaalje GB. Linear models in statistics. 2nd ed. Hoboken: Wiley; 2007.
27. Anderson NG, Jolley IJ, Wells JE. Sonographic estimation of fetal weight: comparison of bias, precision and consistency using 12 different formulae. Ultrasound Obstet Gynecol. 2007;30:173–9.

Acknowledgements

The authors would like to thank the editor and two reviewers for their constructive comments that led to an improved article.

Funding

No funding.

Availability of data and materials

The summary statistics are available from the following article: [6].

Author information

Affiliations

Department of Applied Mathematics, Chung Yuan Christian University, Taoyuan, Taiwan, 32023, Republic of China

Show-Li Jan

Department of Management Science, National Chiao Tung University, Hsinchu, Taiwan, 30010, Republic of China

Gwowen Shieh

Contributions

SLJ conceived of the study, participated in the development of theory, and helped to draft the manuscript. GS carried out the numerical computations, participated in the empirical analysis, and drafted the manuscript. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Gwowen Shieh.

Ethics declarations

Authors' information

SLJ is a professor of Applied Mathematics, Chung Yuan Christian University, Taoyuan, Taiwan 32023. GS is a professor of Management Science, National Chiao Tung University, Hsinchu, Taiwan 30010.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

SAS/IML program for computing the power for the joint test of intercept and slope coefficients. (PDF 60 kb)

Additional file 2:

SAS/IML program for calculating the sample size for the joint test of intercept and slope coefficients. (PDF 60 kb)

Additional file 3:

Table S1. Computed sample size, estimated power, and simulated power for Normal predictors with {β_I, β_S} = {0.4, 1.4}, {β_I0, β_S0} = {0, 1}, σ² = 1, Type I error α = 0.05, and nominal power 1 – β = 0.90. (PDF 95 kb)

Additional file 4:

Table S2. Computed sample size, estimated power, and simulated power for Normal predictors with {β_I, β_S} = {0.5, 1.5}, {β_I0, β_S0} = {0, 1}, σ² = 1, Type I error α = 0.05, and nominal power 1 – β = 0.90. (PDF 96 kb)

Additional file 5:

Table S3. Computed sample size, estimated power, and simulated power for transformed Exponential predictors with {β_I, β_S} = {0.3, 1.3}, {β_I0, β_S0} = {0, 1}, σ² = 1, Type I error α = 0.05, and nominal power 1 – β = 0.90. (PDF 95 kb)

Additional file 6:

Table S4. Computed sample size, estimated power, and simulated power for transformed Gamma predictors with {β_I, β_S} = {0.3, 1.3}, {β_I0, β_S0} = {0, 1}, σ² = 1, Type I error α = 0.05, and nominal power 1 – β = 0.90. (PDF 95 kb)

Additional file 7:

Table S5. Computed sample size, estimated power, and simulated power for transformed Laplace predictors with {β_I, β_S} = {0.3, 1.3}, {β_I0, β_S0} = {0, 1}, σ² = 1, Type I error α = 0.05, and nominal power 1 – β = 0.90. (PDF 95 kb)

Additional file 8:

Table S6. Computed sample size, estimated power, and simulated power for transformed Uniform predictors with {β_I, β_S} = {0.3, 1.3}, {β_I0, β_S0} = {0, 1}, σ² = 1, Type I error α = 0.05, and nominal power 1 – β = 0.90. (PDF 96 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.


About this article


Cite this article

Jan, SL., Shieh, G. Sample size calculations for model validation in linear regression analysis. BMC Med Res Methodol 19, 54 (2019). https://doi.org/10.1186/s12874-019-0697-9


Received: 31 August 2018
Accepted: 26 February 2019
Published: 12 March 2019
DOI: https://doi.org/10.1186/s12874-019-0697-9

Keywords

Linear regression
Model validation
Power
Sample size
Stochastic predictor