
Simple Linear Regression

Stat 326 – Introduction to Business Statistics II
Review of Stat 226 (Spring 2013)

Review: Inference for Regression

Example: Real Estate, Tampa Palms, Florida
Goal: predict the sale price of a residential property based on the appraised value of the property.
Data: sale price and total appraised value of 92 residential properties in Tampa Palms, Florida.
[Scatterplot: Appraised Value (in thousands of dollars) vs. Sale Price (in thousands of dollars)]

We can describe the relationship between x and y using a simple linear regression model of the form

    μ_y = β0 + β1 x

response variable y: sale price
explanatory variable x: appraised value
relationship between x and y: linear, strong, positive

We can estimate the simple linear regression model using Least Squares (LS), yielding the LS regression line

    ŷ = 20.94 + 1.069 x

Interpretation of the estimated intercept b0: b0 is the predicted value of y (i.e., ŷ) when x = 0. The interpretation of b0 is not always meaningful, namely when x cannot take values close to or equal to zero. Here b0 = 20.94: when a property is appraised at zero value, the predicted sale price is $20,940. Meaningful?!

Interpretation of the estimated slope b1: b1 is the change in ŷ for a unit increase in x; when x increases by 1 unit, ŷ changes by the value of b1.
b1 < 0: y decreases as x increases (negative association)
b1 > 0: y increases as x increases (positive association)
Here b1 = 1.069: when the appraised value of a property increases by 1 unit, i.e., by $1,000, the predicted sale price increases by $1,069.

Measuring the strength and adequacy of a linear relationship:
correlation coefficient r: measures the strength of the linear relationship; −1 ≤ r ≤ 1; here r = 0.9723.
coefficient of determination r²: the proportion of the variation in y explained by the fitted linear model; 0 ≤ r² ≤ 1; here r² = (0.9723)² = 0.9453, so 94.53% of the variation in sale price can be explained through the linear relationship between the appraised value (x) and the sale price (y).

Population regression line
Recall from Stat 226: the regression model that we assume to hold for the entire population is the so-called population regression line

    μ_y = β0 + β1 x,

where
    μ_y is the average (mean) value of y in the population for a fixed value of x,
    β0 is the population intercept, and
    β1 is the population slope.
The population regression line could only be obtained if we had information on all individuals in the population.

Based on the population regression line we can fully describe the relationship between x and y up to a random error term ε:

    y = β0 + β1 x + ε,   where ε ~ N(0, σ)

In summary, the important notation for SLR:

    Description              Parameter   Estimate
    intercept                β0          b0
    slope                    β1          b1
    mean/predicted response  μ_y         ŷ
    error/residual           ε           e

Why fit an LS regression model? A "good" model allows us to make predictions about the behavior of the response variable y for different values of x. For example, estimate the average sale price (μ_y) for a property appraised at $223,000: x = 223 gives ŷ = 20.94 + 1.069 × 223 = 259.327, so the average sale price for a property appraised at $223,000 is estimated to be about $259,327.

Validity of predictions: assuming we have a "good" model, predictions are only valid within the range of x-values used to fit the LS regression model! Predicting outside the range of x is called extrapolation and should be avoided at all costs, as such predictions can become unreliable.

What is a "good" model? The answer to this question is not straightforward. We can visually check the validity of the fitted linear model (through residual plots) as well as make use of numerical measures such as r².
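The LS fit, r, and r² above can be reproduced in any statistics package. A minimal sketch in Python with scipy follows; the raw Tampa Palms data are not reproduced in these notes, so the simulated data below are an assumption that merely mimics the fitted line:

```python
import numpy as np
from scipy import stats

# Simulated stand-in for the 92 Tampa Palms properties (the raw data are not
# in the notes): appraised value x and sale price y, both in $1,000s.
rng = np.random.default_rng(0)
x = rng.uniform(50, 900, size=92)
y = 20.94 + 1.069 * x + rng.normal(0, 50, size=92)  # true line plus noise

fit = stats.linregress(x, y)  # least-squares estimates b0, b1, and r
print(f"LS line: y-hat = {fit.intercept:.2f} + {fit.slope:.3f} x")
print(f"r = {fit.rvalue:.4f}, r^2 = {fit.rvalue**2:.4f}")
```

With this much signal relative to the noise, the simulated fit recovers a slope near 1.069 and a similarly strong r², illustrating the quantities reported for the real data.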
More on assessing the validity of the regression model will follow.

Review: Inference for Regression – Regression Assumptions

In order to do inference (confidence intervals and hypothesis tests), we need the following four assumptions to hold:
1. SRS (independence of the y-values)
2. a linear relationship between x and μ_y
3. for each value of x, the population of y-values is normally distributed (ε ~ N)
4. for each value of x, the standard deviation of the y-values (and of ε) is σ, i.e., constant

The "SRS assumption" is the hardest to check. The "linearity assumption" and the "constant SD assumption" are typically checked visually through a residual plot. Recall: residual e = y − ŷ = y − (b0 + b1 x). Plot x versus the residuals; any pattern indicates a violation. The "normality assumption" is checked by assessing whether the residuals are approximately normally distributed (use a normal quantile plot).

Returning to the Tampa Palms, Florida example:
[Residual plot: Appraised Value (in thousands of dollars) vs. residuals]
Note: non-constant variance can often be stabilized by transforming x, or y, or both:
[Residual plot on the log scale: log Appraised vs. residuals]
Going one step further, excluding the outlier yields:
[Residual plot on the log scale, outlier excluded]

Outliers/influential points in general should only be excluded from an analysis if they can be explained and their exclusion can be justified, e.g., a typo or invalid measurements. Excluding outliers always means a loss of information, so handle outliers with caution; you may want to compare the analyses with and without the outliers.

[Normal quantile plots, Tampa Palms example: residuals of sale price, residuals of log sale, and residuals of log sale without the outlier]

Regression inference: confidence intervals and hypothesis tests
We need to assess whether the linear relationship between x and y holds true for the entire population. This can be accomplished by testing

    H0: β1 = 0   vs.   Ha: β1 ≠ 0

based on the estimated slope b1. For simplicity we will work with the untransformed Tampa Palms data.

Confidence intervals
We can construct confidence intervals (CIs) for β1 and β0. The general form of a confidence interval is

    estimate ± t* × SE_estimate,

where t* is the critical value corresponding to the chosen level of confidence C; t* is based on the t-distribution with n − 2 degrees of freedom (df).

Example: find a 95% CI for β1 for the Tampa Palms data set. [worked on the slide, with interpretation]

Testing for a linear relationship between x and y
If we wish to test whether there is a significant linear relationship between x and y, we test H0: β1 = 0 versus Ha: β1 ≠ 0. Why? If we fail to reject the null hypothesis (i.e., we stick with β1 = 0), the LS regression model reduces to

    μ_y = β0 + β1 x = β0 + 0 · x = β0   (constant),

implying that μ_y (and hence y) does not depend linearly on x.

Example (Tampa Palms data set): test at the α = 0.05 level of significance for a linear relationship between the appraised value of a property and the sale price. [worked on the slide]

Inference about Prediction

Why fit an LS regression model? The purpose of an LS regression model is to
1. estimate μ_y, the average/mean value of y for a given value of x, say x* (e.g., estimate the average sale price μ_y for all residential properties in Tampa Palms appraised at x* = $223,000), and
2. predict y, an individual/single future value of the response variable for a given value of x, say x* (e.g., predict the future sale price of an individual residential property appraised at x* = $223,000).

Keep in mind that we consider predictions for only one value of x at a time. Note that these two tasks are VERY different. Carefully think about the difference!
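The CI for β1 and the test of H0: β1 = 0 can be sketched numerically. The example below again uses simulated data standing in for the Tampa Palms sample (an assumption), and relies on scipy's linregress, which reports the standard error of b1 and the two-sided p-value directly:

```python
import numpy as np
from scipy import stats

# Simulated data in place of the real Tampa Palms sample (n = 92).
rng = np.random.default_rng(1)
x = rng.uniform(50, 900, size=92)
y = 20.94 + 1.069 * x + rng.normal(0, 50, size=92)

fit = stats.linregress(x, y)            # fit.stderr is SE(b1); fit.pvalue is
n = len(x)                              # the two-sided p-value for beta1 = 0
t_star = stats.t.ppf(0.975, df=n - 2)   # t* with n - 2 df for a 95% CI

ci = (fit.slope - t_star * fit.stderr, fit.slope + t_star * fit.stderr)
print(f"95% CI for beta1: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"test of H0: beta1 = 0 -> t = {fit.slope / fit.stderr:.1f}, p = {fit.pvalue:.2g}")
```

Because the association is so strong, the p-value is essentially zero and the CI excludes 0, mirroring the conclusion the slides reach for the real data.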
Inference about Prediction

To estimate μ_y and to predict a single future y-value for a given level x = x*, we can use the LS regression line ŷ = b0 + b1 x. Simply substitute the desired value x* for x:

    ŷ = b0 + b1 x*

In addition we need to know how much variability is associated with this point estimator. Taking the variability into account tells us how good and reliable the point estimator really is; that is, which range potentially captures the true (but unknown) parameter value. Recall from Stat 226: the construction of confidence intervals.

Much more variability is associated with estimating a single observation than with estimating an average; individual observations always vary more than averages! Therefore we distinguish a confidence interval for the average/mean response μ_y from a prediction interval for a single future observation y. Both intervals use a t* critical value from a t-distribution with df = n − 2, but the standard error differs between the two intervals. While the point estimator for the average μ_y and for a future individual value y is the same (namely ŷ = b0 + b1 x*), the widths of the two intervals are not!

Confidence interval for the average/mean response μ_y
The width of the confidence interval is determined by the standard error SE_μ (from estimating the mean response); SE_μ can be obtained in JMP. Keep in mind that every confidence interval is always constructed for one specific given value x*.
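Substituting x* into the fitted line is all the point estimate requires. A one-line sketch using the fitted Tampa Palms coefficients from the notes:

```python
# Fitted LS coefficients from the notes (x in $1,000s).
b0, b1 = 20.94, 1.069

def y_hat(x_star):
    """Point estimate of the mean response (and prediction of a single
    future response) at x = x_star."""
    return b0 + b1 * x_star

print(y_hat(223))  # property appraised at $223,000 -> about 259.327
```

The same number serves as both the estimate of μ_y and the prediction of a single y; only the attached interval (below) differs.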
A level C confidence interval for the average/mean response μ_y, when x takes the value x*, is given by

    ŷ ± t* × SE_μ,

where SE_μ is the standard error for estimating a mean response.

Prediction interval for a single (future) value y
Again, the width of the interval is determined by a standard error, here SE_ŷ (for predicting an individual response); SE_ŷ can be obtained in JMP. Keep in mind that every prediction interval is always constructed for one specific given value x*. A level C prediction interval for a single observation y, when x takes the value x*, is given by

    ŷ ± t* × SE_ŷ,

where SE_ŷ is the standard error for predicting a single response.

[Two slides: "The larger picture" and "The larger picture cont'd"; figures omitted]

Example: An appliance store runs a 5-month experiment to determine the effect of advertising on sales revenue, so there are only 5 observations.
[Scatterplot: Advertising expenditure (in dollars) vs. Sales Revenues (in dollars)]
JMP can draw the confidence intervals for the mean responses as well as for the predicted values of future observations (prediction intervals); these are called confidence bands.
Linear fit: Sales Revenues (in dollars) = −100 + 7 × Advertising expenditure (in dollars)

Estimation and prediction (for the appliance store data)
We wish to estimate the mean/average revenue of the subpopulation of stores that spent x* = 200 on advertising. Suppose that we also wish to predict the revenue in a future month in which our store spends x* = 200 on advertising. The point estimate is the same in both situations:

    ŷ = −100 + 7 × 200 = 1300.

The corresponding standard errors of the mean and of the prediction, however, are different: SE_μ ≈ 331.663 and SE_ŷ ≈ 690.411.

Estimation and prediction using JMP
For each observation in a data set, JMP can report ŷ, SE_ŷ, and SE_μ. In JMP:
1. Choose Fit Model.
2. From the response icon, choose Save Columns, then Predicted Values, Std Error of Predicted, and Std Error of Individual.

Note that in the appliance store example SE_ŷ > SE_μ (690.411 versus 331.663). This is always true: we can estimate a mean value of y for a given x* much more precisely than we can predict the value of a single y at x = x*. In estimating the mean μ_y at x = x*, the only uncertainty arises because we do not know the true regression line. In predicting a single y at x = x*, we face two uncertainties: the true regression line plus the expected variability of the y-values around the true line. It always holds that SE_μ < SE_ŷ; therefore a prediction interval for a single future observation y will always be wider than a confidence interval for the mean response μ_y, as there is simply more uncertainty in predicting a single value.

Example cont'd: JMP also calculates confidence intervals for the mean response μ_y as well as prediction intervals for single future observations y. (For instructions, follow the handout on JMP commands related to regression CIs and PIs.) To construct a confidence and/or prediction interval by hand, we need to obtain SE_μ and SE_ŷ in JMP for the value x* that we are interested in.

Let's construct one 95% CI and one 95% PI by hand and see if we can come up with the same results as JMP. In the second month the appliance store spent x = $200 on advertising and observed $1,000 in sales revenue, so x = 200 and y = 1000. Using the estimated LS regression line, we predict ŷ = −100 + 7 × 200 = 1300. We first need to find t* (here with df = n − 2 = 3).

A 95% CI for the mean response μ_y, when x* = 200: [worked on the slide]
A 95% PI for a single future observation y, when x* = 200: [worked on the slide]
[JMP output table: Advertising expenditure, Sales Revenue, Lower/Upper 95% Mean, Lower/Upper 95% Indiv, by Month]
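The by-hand 95% CI and PI can be checked numerically. The five observations below are reconstructed so that they reproduce the quantities quoted in the notes (ŷ = −100 + 7x, SE_μ ≈ 331.663 and SE_ŷ ≈ 690.411 at x* = 200); treat the individual data points as an assumption, since the notes only show them in a plot:

```python
import numpy as np
from scipy import stats

# Reconstructed appliance-store data (an assumption; chosen to match the
# fitted line y-hat = -100 + 7x and the standard errors quoted in the notes).
x = np.array([100.0, 200.0, 300.0, 400.0, 500.0])       # advertising ($)
y = np.array([1000.0, 1000.0, 2000.0, 2000.0, 4000.0])  # sales revenue ($)

n = len(x)
sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx      # LS slope
b0 = y.mean() - b1 * x.mean()                           # LS intercept
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)) # residual SD

x_star = 200.0
y_hat = b0 + b1 * x_star                                # point estimate
se_mean = s * np.sqrt(1 / n + (x_star - x.mean()) ** 2 / sxx)       # SE_mu
se_indiv = s * np.sqrt(1 + 1 / n + (x_star - x.mean()) ** 2 / sxx)  # SE_yhat

t_star = stats.t.ppf(0.975, df=n - 2)                   # df = 3
print(f"y-hat = {y_hat:.0f}, SE_mu = {se_mean:.3f}, SE_y = {se_indiv:.3f}")
print(f"95% CI for mean response: ({y_hat - t_star * se_mean:.1f}, {y_hat + t_star * se_mean:.1f})")
print(f"95% PI for single obs:    ({y_hat - t_star * se_indiv:.1f}, {y_hat + t_star * se_indiv:.1f})")
```

As expected, both intervals are centered at the same point estimate of 1300, but the PI (roughly ±2197) is much wider than the CI (roughly ±1056), reflecting the extra uncertainty in predicting a single month's revenue.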
