LINEAR REGRESSION | CONCEPT OVERVIEW
The TOPIC of LINEAR REGRESSION AND GOODNESS OF FIT can be referenced on page 40 of the NCEES Supplied Reference Handbook, Version 9.4 for Computer Based Testing.
CONCEPT INTRO:
LINEAR REGRESSION is a STATISTICAL METHOD used to SUMMARIZE the relationship between two variables and to PREDICT the value of a DEPENDENT VARIABLE from an INDEPENDENT VARIABLE. A LINEAR REGRESSION MODEL interprets DATA under the ASSUMPTION that the RELATIONSHIP between the TWO VARIABLES is LINEAR, so the data can be PLOTTED as points $(x_i, y_i)$ in CARTESIAN COORDINATES.

The INDEPENDENT VARIABLE is REPRESENTED on the X-AXIS and DEFINES the REGRESSOR or PREDICTOR value "x". The DEPENDENT VARIABLE is REPRESENTED on the Y-AXIS and DEFINES the RESPONSE or OUTCOME value "y".

The FORMULA for the STANDARD FORM OF A REGRESSION LINE can be referenced under the SUBJECT of ENGINEERING PROBABILITY AND STATISTICS on page 40 of the NCEES Supplied Reference Handbook, Version 9.4 for Computer Based Testing.
Made with
by Prepineer | Prepineer.com
The LEAST SQUARES REGRESSION LINE (LSRL) represents the LINEAR RELATIONSHIP between the DEPENDENT VARIABLE "y" and the INDEPENDENT VARIABLE "x" using the SLOPE-INTERCEPT FORM of a STRAIGHT LINE, as shown by the following expression:

$$y = a + bx$$

Where:
• $y$ is the DEPENDENT VARIABLE, representing the PREDICTED VALUE of y
• $a$ is the Y-INTERCEPT of the LEAST SQUARES REGRESSION LINE
• $b$ is the SLOPE of the linear regression equation, REPRESENTING the CHANGE in the DEPENDENT VARIABLE for each UNIT CHANGE in the INDEPENDENT VARIABLE
• $x$ is the OBSERVED VALUE of the INDEPENDENT VARIABLE
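As a minimal sketch of how the slope-intercept form is used for prediction, the snippet below evaluates $y = a + bx$ for a single x-value. The function name `predict` and the coefficient values are hypothetical, chosen only for illustration.

```python
def predict(a: float, b: float, x: float) -> float:
    """Return the predicted y on the line y = a + b*x."""
    return a + b * x


# Hypothetical coefficients (not from the handbook): intercept 0.5, slope 1.4.
y_hat = predict(a=0.5, b=1.4, x=3.0)
print(round(y_hat, 2))
```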
The GOAL of a LINEAR REGRESSION ANALYSIS is to IDENTIFY the REGRESSION COEFFICIENTS, which are the SLOPE ($b$) and Y-INTERCEPT ($a$) that give the BEST FIT of the LEAST SQUARES REGRESSION LINE to the DATA POINTS. Just like any STRAIGHT LINE, we can use a simple 3 STEP PROCESS to CALCULATE the VALUES we need to WRITE the LEAST SQUARES REGRESSION LINE for a given REGRESSION MODEL.

Step 1: CALCULATE the SUMMATION VALUES of the DATA COORDINATES that REPRESENT the DISPERSION of the DATA relative to the X and Y axes

The FORMULAS for the DISPERSION OF OBSERVATIONS are not provided in the NCEES Supplied Reference Handbook, Version 9.4 for Computer Based Testing. We must memorize these formulas and understand their application independent of the NCEES Supplied Reference Handbook.

The FIRST STEP is to calculate the following SUMS over the INDEPENDENT and DEPENDENT VARIABLES of every DATA POINT:

$$\sum x_i, \quad \sum y_i, \quad \sum x_i y_i, \quad \sum x_i^2, \quad \sum y_i^2, \quad \left(\sum x_i\right)^2, \quad \left(\sum y_i\right)^2$$

along with the MEANS of the x-values and y-values:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
Where:
• $n$ is the SAMPLE SIZE
• $(x_i, y_i)$ are the OBSERVED VALUES for the $i$th DATA POINT

Step 2: CALCULATE the SLOPE of the REGRESSION LINE ($b$)

In this STEP, we CALCULATE the SLOPE of the REGRESSION LINE by USING the RELATIONSHIP between the SUM of SQUARES of "x" and the SUM of the "x-y" PRODUCTS.

The FORMULA for the SUM OF X-Y PRODUCTS can be referenced under the SUBJECT of ENGINEERING PROBABILITY AND STATISTICS on page 40 of the NCEES Supplied Reference Handbook, Version 9.4 for Computer Based Testing. The SUM of the "x-y" PRODUCTS ($S_{xy}$) plays the role of the RISE, or VERTICAL CHANGE, of the REGRESSION LINE and is calculated as:

$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)$$
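The Step 1 sums and the SUM of the "x-y" PRODUCTS can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `summation_values` and `s_xy` and the four-point data set are hypothetical and do not come from the Handbook.

```python
def summation_values(xs, ys):
    """Return the Step 1 sums and means for paired observations."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and the same length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    return {
        "n": n,
        "sum_x": sum_x,
        "sum_y": sum_y,
        "sum_xy": sum(x * y for x, y in zip(xs, ys)),
        "sum_x2": sum(x * x for x in xs),   # sum of x_i squared
        "sum_y2": sum(y * y for y in ys),   # sum of y_i squared
        "sum_x_sq": sum_x ** 2,             # (sum of x_i) squared
        "sum_y_sq": sum_y ** 2,             # (sum of y_i) squared
        "x_bar": sum_x / n,
        "y_bar": sum_y / n,
    }


def s_xy(xs, ys):
    """Sum of x-y products: sum(x*y) - (1/n) * sum(x) * sum(y)."""
    v = summation_values(xs, ys)
    return v["sum_xy"] - v["sum_x"] * v["sum_y"] / v["n"]


# Hypothetical data set used only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
values = summation_values(xs, ys)
print(values["sum_x"], values["sum_xy"], s_xy(xs, ys))
```

Keeping $\sum x_i^2$ and $\left(\sum x_i\right)^2$ under separate names is deliberate, since confusing the two is an easy mistake in hand calculations.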
The FORMULA for the SUM OF SQUARES OF X can be referenced under the SUBJECT of ENGINEERING PROBABILITY AND STATISTICS on page 40 of the NCEES Supplied Reference Handbook, Version 9.4 for Computer Based Testing. The SUM of SQUARES of "x" ($S_{xx}$) plays the role of the RUN, or HORIZONTAL CHANGE, of the REGRESSION LINE and is calculated as:

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2$$
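It may help to see why this quantity is called a SUM OF SQUARES. The short derivation below is a supplementary sketch, not taken from the Handbook. It shows that the formula above equals the sum of squared deviations of the x-values from their mean $\bar{x} = \frac{1}{n}\sum x_i$:

```latex
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2 = S_{xx}
\end{aligned}
```

The same steps applied to the products $(x_i - \bar{x})(y_i - \bar{y})$ give the SUM of the "x-y" PRODUCTS.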
The FORMULA for the SLOPE OF THE LINEAR REGRESSION EQUATION can be referenced under the SUBJECT of ENGINEERING PROBABILITY AND STATISTICS on page 40 of the NCEES Supplied Reference Handbook, Version 9.4 for Computer Based Testing. Just like ANY other SLOPE of a LINE, we can CALCULATE the SLOPE of the REGRESSION LINE as the QUOTIENT of the RISE ($S_{xy}$) divided by the RUN ($S_{xx}$):

$$b = \frac{S_{xy}}{S_{xx}}$$
Where:
• $(x_i, y_i)$ are the OBSERVED VALUES for the $i$th DATA POINT
• $n$ is the SAMPLE SIZE
• $S_{xx}$ is the SUM of SQUARES of $x$
• $S_{xy}$ is the SUM of $x$-$y$ PRODUCTS

Step 3: CALCULATE the Y-INTERCEPT ($a$) of the REGRESSION LINE

The FORMULA for the Y-INTERCEPT OF A REGRESSION LINE can be referenced under the SUBJECT of ENGINEERING PROBABILITY AND STATISTICS on page 40 of the NCEES Supplied Reference Handbook, Version 9.4 for Computer Based Testing. RELATING back to the STANDARD FORM of a LEAST SQUARES REGRESSION LINE in SLOPE-INTERCEPT FORM, we can RE-WRITE the expression to ISOLATE the Y-INTERCEPT VARIABLE "a":

$$a = y - bx$$

Where:
• $y$ is the DEPENDENT VARIABLE, representing the PREDICTED VALUE of y
• $a$ is the Y-INTERCEPT of the LEAST SQUARES REGRESSION LINE
• $b$ is the SLOPE of the linear regression equation, REPRESENTING the CHANGE in the DEPENDENT VARIABLE for each UNIT CHANGE in the INDEPENDENT VARIABLE
• $x$ is the OBSERVED VALUE of the INDEPENDENT VARIABLE
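Steps 2 and 3 reduce to two short calculations once the sums are known. The sketch below assumes the intercept is evaluated at the means of the data, a point the least squares line always passes through. The function names and the input values are hypothetical, chosen only for illustration.

```python
def slope(s_xy: float, s_xx: float) -> float:
    """Return b = S_xy / S_xx, the slope of the regression line."""
    if s_xx == 0:
        # All x-values are identical, so the "run" is zero.
        raise ValueError("S_xx is zero: the slope is undefined")
    return s_xy / s_xx


def intercept(x_bar: float, y_bar: float, b: float) -> float:
    """Return a = y_bar - b * x_bar, the y-intercept of the regression line."""
    return y_bar - b * x_bar


# Hypothetical inputs: S_xy = 7, S_xx = 5, x_bar = 2.5, y_bar = 4.
b = slope(s_xy=7.0, s_xx=5.0)
a = intercept(x_bar=2.5, y_bar=4.0, b=b)
print(f"y = {a:.3f} + {b:.3f}x")
```

The guard on `s_xx` reflects the fact that a vertical column of data points has no run, so no slope can be computed.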
As LINEAR REGRESSION MODELS typically have NUMEROUS DATA POINTS, we USE the MEAN of the x-values ($\bar{x}$) and the MEAN of the y-values ($\bar{y}$) in lieu of the x- and y-values of an INDIVIDUAL DATA POINT. This works because the LEAST SQUARES REGRESSION LINE always passes through the point $(\bar{x}, \bar{y})$. PLUGGING in the MEAN VALUES for the x- and y-COORDINATES, we can RE-WRITE the EQUATION for the Y-INTERCEPT of the LEAST SQUARES REGRESSION LINE as:

$$a = \bar{y} - b\bar{x}$$

GOODNESS OF FIT:
The TOPIC of GOODNESS OF FIT can be referenced under the SUBJECT of ENGINEERING PROBABILITY AND STATISTICS on page 40 of the NCEES Supplied Reference Handbook, Version 9.4 for Computer Based Testing.

Once the SLOPE of the LEAST SQUARES REGRESSION LINE is calculated using the LEAST SQUARES METHOD, the GOODNESS OF FIT can be determined by calculating the SAMPLE CORRELATION COEFFICIENT "R". The GOODNESS OF FIT describes how WELL the REGRESSION VALUES, plotted as a LINE, match the ACTUAL OBSERVED VALUES, plotted as DATA POINTS.

The FORMULA for the SAMPLE CORRELATION COEFFICIENT (R) can be referenced under the SUBJECT of ENGINEERING PROBABILITY AND STATISTICS on page 40 of the NCEES Supplied Reference Handbook, Version 9.4 for Computer Based Testing.
The SAMPLE CORRELATION COEFFICIENT "R" is calculated using the following expression:

$$R = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$

As a rule of thumb, a LINEAR REGRESSION ANALYSIS is considered a GOOD FIT if the magnitude of the CORRELATION VALUE "R" exceeds 0.95. A magnitude LESS than 0.95 INDICATES that the FIT of the LINEAR REGRESSION ANALYSIS is POOR. A CORRELATION VALUE of R = 1 indicates that the DATA fall on a PERFECT STRAIGHT LINE, indicating a 1 to 1 relationship between the INDEPENDENT and DEPENDENT VARIABLES in the DATA SET.
If the SLOPE of the REGRESSION LINE ($b$) is POSITIVE, there is a POSITIVE CORRELATION COEFFICIENT VALUE ($R > 0$). A POSITIVE CORRELATION ($R > 0$) indicates that the INDEPENDENT VARIABLE and DEPENDENT VARIABLE tend to MOVE in the SAME DIRECTION, such that as ONE VALUE INCREASES, the OTHER VALUE INCREASES as well.
If the SLOPE of the REGRESSION LINE ($b$) is NEGATIVE, there is a NEGATIVE CORRELATION COEFFICIENT VALUE ($R < 0$). A NEGATIVE CORRELATION ($R < 0$) indicates that the INDEPENDENT VARIABLE and DEPENDENT VARIABLE tend to MOVE in OPPOSITE DIRECTIONS, such that as ONE VALUE INCREASES, the OTHER VALUE DECREASES.
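The sign behaviour of the SAMPLE CORRELATION COEFFICIENT can be checked numerically. The sketch below is an illustration only: the function name `correlation_coefficient` and both data sets are hypothetical, with the second set being the first with its y-values reversed so the trend slopes downward.

```python
import math


def correlation_coefficient(xs, ys):
    """Return R = S_xy / sqrt(S_xx * S_yy) for paired observations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: an upward trend and its mirror image.
xs = [1.0, 2.0, 3.0, 4.0]
r_up = correlation_coefficient(xs, [2.0, 3.0, 5.0, 6.0])
r_down = correlation_coefficient(xs, [6.0, 5.0, 3.0, 2.0])
print(r_up > 0, r_down < 0)
```

Because $S_{xx}$ and $S_{yy}$ are never negative, the sign of R always matches the sign of $S_{xy}$, and therefore the sign of the slope.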
The FORMULA for the COEFFICIENT OF DETERMINATION ($R^2$) can be referenced under the SUBJECT of ENGINEERING PROBABILITY AND STATISTICS on page 40 of the NCEES Supplied Reference Handbook, Version 9.4 for Computer Based Testing. The COEFFICIENT OF DETERMINATION ($R^2$) is used to QUANTIFY the ACCEPTABILITY of a REGRESSION MODEL, and REPRESENTS the PROPORTION of the VARIANCE in the DEPENDENT VARIABLE that is PREDICTABLE from the INDEPENDENT VARIABLE.

$$R^2 = \frac{S_{xy}^2}{S_{xx} S_{yy}}$$

Where:
• $R$ is the SAMPLE CORRELATION COEFFICIENT
• $S_{xx}$ is the SUM of SQUARES of $x$
• $S_{yy}$ is the SUM of SQUARES of $y$
• $S_{xy}$ is the SUM of $x$-$y$ PRODUCTS

A COEFFICIENT OF DETERMINATION ($R^2$) value of 0 INDICATES that the DEPENDENT VARIABLE cannot be PREDICTED from the INDEPENDENT VARIABLE. A COEFFICIENT OF DETERMINATION ($R^2$) value of 1 INDICATES that the DEPENDENT VARIABLE can be PREDICTED without ERROR from the INDEPENDENT VARIABLE. A COEFFICIENT OF DETERMINATION ($R^2$) value BETWEEN 0 and 1 INDICATES the EXTENT to which the DEPENDENT VARIABLE is PREDICTABLE.
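A minimal sketch of the COEFFICIENT OF DETERMINATION calculation follows, assuming the three sums are already known. The function name and the input values ($S_{xy} = 7$, $S_{xx} = 5$, $S_{yy} = 10$) are hypothetical. Because squaring discards the sign, the sketch also recovers R by reattaching the sign of $S_{xy}$.

```python
import math


def coefficient_of_determination(s_xy: float, s_xx: float, s_yy: float) -> float:
    """Return R^2 = S_xy^2 / (S_xx * S_yy)."""
    if s_xx <= 0 or s_yy <= 0:
        raise ValueError("S_xx and S_yy must be positive")
    return s_xy ** 2 / (s_xx * s_yy)


# Hypothetical sums used only for illustration.
s_xy, s_xx, s_yy = 7.0, 5.0, 10.0
r2 = coefficient_of_determination(s_xy, s_xx, s_yy)

# R^2 loses the sign of the correlation; R takes the sign of S_xy.
r = math.copysign(math.sqrt(r2), s_xy)
print(f"R^2 = {r2:.2f}, so {r2:.0%} of the variance in y is predictable from x")
```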
For EXAMPLE, a COEFFICIENT OF DETERMINATION value of $R^2 = 0.13$ indicates that 13 PERCENT of the VARIANCE in the DEPENDENT VARIABLE is PREDICTABLE from the INDEPENDENT VARIABLE.

RESIDUAL:
The TOPIC of RESIDUAL can be referenced under the SUBJECT of ENGINEERING PROBABILITY AND STATISTICS on page 40 of the NCEES Supplied Reference Handbook, Version 9.4 for Computer Based Testing. The RESIDUAL of a LINEAR REGRESSION ANALYSIS represents the DIFFERENCE between the OBSERVED VALUE of the DEPENDENT VARIABLE and the PREDICTED VALUE.

$$\text{Residual} = \text{Observed value} - \text{Predicted value}$$

The RESIDUAL describes the ERROR in the fit of the REGRESSION MODEL at each DATA POINT and is used to CHECK the ASSUMPTION that the ERRORS in the DATA are NORMALLY DISTRIBUTED with CONSTANT VARIANCE.

The FORMULA for the RESIDUAL OF A REGRESSION MODEL can be referenced under the SUBJECT of ENGINEERING PROBABILITY AND STATISTICS on page 40 of the NCEES Supplied Reference Handbook, Version 9.4 for Computer Based Testing.

$$e_i = y_i - \hat{y}_i = y_i - (a + b x_i)$$
Where:
• $e_i$ is the RESIDUAL
• $y_i$ is an ACTUAL OBSERVATION
• $\hat{y}_i$ is the corresponding FITTED VALUE from the REGRESSION MODEL
• $a$ is the Y-INTERCEPT of the LINEAR REGRESSION equation
• $b$ is the SLOPE of the LINEAR REGRESSION equation

STANDARD ERROR OF ESTIMATE:
The MEAN SQUARE ERROR (MSE) ESTIMATES the VARIANCE of the ERRORS about the REGRESSION LINE by AVERAGING the SQUARES of the ERRORS, each of which is the DIFFERENCE between the OBSERVED VALUE and the ESTIMATED VALUE. A SMALLER MEAN SQUARE ERROR (MSE) value is PREFERRED as it INDICATES that the OBSERVED VALUES lie CLOSER to the REGRESSION LINE.

The FORMULA for the STANDARD ERROR OF ESTIMATE can be referenced under the SUBJECT of ENGINEERING PROBABILITY AND STATISTICS on page 40 of the NCEES Supplied Reference Handbook, Version 9.4 for Computer Based Testing.

$$S_e^2 = \frac{S_{xx} S_{yy} - S_{xy}^2}{S_{xx}(n - 2)} = MSE$$
The FORMULA for the SUM OF SQUARES OF Y can be referenced under the SUBJECT of ENGINEERING PROBABILITY AND STATISTICS on page 40 of the NCEES Supplied Reference Handbook, Version 9.4 for Computer Based Testing. The SUM OF SQUARES OF Y is given by the expression:

$$S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2$$

Where:
• $(x_i, y_i)$ are the OBSERVED VALUES for the $i$th DATA POINT
• $n$ is the SAMPLE SIZE
• $S_{xx}$ is the SUM of SQUARES of $x$
• $S_{yy}$ is the SUM of SQUARES of $y$
• $S_{xy}$ is the SUM of $x$-$y$ PRODUCTS
• $S_e^2$ is the SQUARE of the STANDARD ERROR OF ESTIMATE
• $MSE$ is the MEAN SQUARE ERROR
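The link between the RESIDUALS and the MEAN SQUARE ERROR can be sketched as follows. The function name `residuals_and_mse` and the data set are hypothetical. The sketch computes the MSE twice, once from the sums formula above and once directly as the sum of squared residuals divided by $n - 2$, so the two can be compared.

```python
def residuals_and_mse(xs, ys):
    """Return (residuals, MSE from the sums formula, MSE from the residuals)."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 data points so that n - 2 > 0")
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n

    b = s_xy / s_xx                      # slope
    a = sum_y / n - b * sum_x / n        # intercept, evaluated at the means

    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    mse_formula = (s_xx * s_yy - s_xy ** 2) / (s_xx * (n - 2))
    mse_direct = sum(e * e for e in residuals) / (n - 2)
    return residuals, mse_formula, mse_direct


# Hypothetical data set used only for illustration.
res, mse_formula, mse_direct = residuals_and_mse(
    [1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 6.0]
)
print(res, mse_formula, mse_direct)
```

The divisor is $n - 2$ rather than $n$ because two quantities, the slope and the intercept, were estimated from the same data.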
CONFIDENCE INTERVALS:
In ADDITION to the POINT ESTIMATES of the SLOPE and INTERCEPT, it is possible to obtain CONFIDENCE INTERVAL estimates of the REGRESSION PARAMETERS. The WIDTH of these CONFIDENCE INTERVALS is a MEASURE of the OVERALL QUALITY of the REGRESSION LINE.

The FORMULA for the CONFIDENCE INTERVAL OF THE Y-INTERCEPT ($a$) can be referenced under the SUBJECT of ENGINEERING PROBABILITY AND STATISTICS on page 40 of the NCEES Supplied Reference Handbook, Version 9.4 for Computer Based Testing. The CONFIDENCE INTERVAL for the Y-INTERCEPT of the REGRESSION LINE is calculated as:

$$a \pm t_{\alpha/2,\,n-2}\sqrt{\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right) MSE}$$

The FORMULA for the CONFIDENCE INTERVAL OF SLOPE ($b$) can be referenced under the SUBJECT of ENGINEERING PROBABILITY AND STATISTICS on page 40 of the NCEES Supplied Reference Handbook, Version 9.4 for Computer Based Testing.
The CONFIDENCE INTERVAL for the SLOPE of the REGRESSION LINE is calculated as:

$$b \pm t_{\alpha/2,\,n-2}\sqrt{\frac{MSE}{S_{xx}}}$$

Where:
• $n$ is the SAMPLE SIZE
• $S_{xx}$ is the SUM of SQUARES of $x$
• $MSE$ is the MEAN SQUARE ERROR
• $a$ is the Y-INTERCEPT of the LEAST SQUARES REGRESSION LINE
• $b$ is the SLOPE of the linear regression equation, REPRESENTING the CHANGE in the DEPENDENT VARIABLE for each UNIT CHANGE in the INDEPENDENT VARIABLE
• $t_{\alpha/2,\,n-2}$ is the CRITICAL VALUE of the T-DISTRIBUTION with $n - 2$ DEGREES OF FREEDOM that leaves an upper-tail PROBABILITY of $\alpha/2$
• $\alpha$ is the SIGNIFICANCE LEVEL
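Both CONFIDENCE INTERVALS can be sketched together, since they share the same MSE and t-value. This is an illustration under stated assumptions: the function name and data set are hypothetical, and the critical value is passed in by the caller rather than computed. The value 4.303 used here is the standard t-table entry for a 95 percent interval ($\alpha/2 = 0.025$) with $n - 2 = 2$ degrees of freedom.

```python
import math


def confidence_intervals(xs, ys, t_crit):
    """Return ((a_low, a_high), (b_low, b_high)) for the intercept and slope.

    t_crit is the t-table value t_{alpha/2, n-2}, supplied by the caller.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 data points so that n - 2 > 0")
    sum_x, sum_y = sum(xs), sum(ys)
    x_bar = sum_x / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n

    b = s_xy / s_xx
    a = sum_y / n - b * x_bar
    mse = (s_xx * s_yy - s_xy ** 2) / (s_xx * (n - 2))

    half_a = t_crit * math.sqrt((1 / n + x_bar ** 2 / s_xx) * mse)
    half_b = t_crit * math.sqrt(mse / s_xx)
    return (a - half_a, a + half_a), (b - half_b, b + half_b)


# Hypothetical data set; 4.303 is t_{0.025, 2} from a standard t-table.
ci_a, ci_b = confidence_intervals(
    [1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 6.0], t_crit=4.303
)
print(ci_a, ci_b)
```

With only four data points the t-value is large, so both intervals come out wide even though the fit is close. More observations shrink the t-value and narrow the intervals.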
CONCEPT EXAMPLE:
The amount of rainfall in Northern California is tracked and recorded over the span of 6 years. Given the observed rainfall values in the table below, the equation for the least squares regression line is closest to:

i     x_i    y_i
1     1.2    1.1
2     2.3    2.1
3     3.0    3.1
4     3.8    4.0
5     4.7    4.9
6     5.9    5.9

A. $y = 0.134 - 1.05x$
B. $y = -0.134 + 1.05x$
C. $y = -1.05x + 0.134$
D. None of the above
SOLUTION:
Just like any STRAIGHT LINE, we can use a simple 3 STEP PROCESS to CALCULATE the VALUES we need to WRITE the LEAST SQUARES REGRESSION LINE for a given REGRESSION MODEL.

Step 1: CALCULATE the SUMMATION VALUES of the DATA COORDINATES that REPRESENT the DISPERSION of the DATA relative to the X and Y axes

The FORMULAS for the DISPERSION OF OBSERVATIONS are not provided in the NCEES Supplied Reference Handbook, Version 9.4 for Computer Based Testing. We must memorize these formulas and understand their application independent of the NCEES Supplied Reference Handbook.

The FIRST STEP is to calculate the following SUMS over the INDEPENDENT and DEPENDENT VARIABLES of every DATA POINT:

$$\sum x_i, \quad \sum y_i, \quad \sum x_i y_i, \quad \sum x_i^2, \quad \sum y_i^2, \quad \left(\sum x_i\right)^2, \quad \left(\sum y_i\right)^2$$

along with the MEANS of the x-values and y-values:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
Where:
• $n$ is the SAMPLE SIZE
• $(x_i, y_i)$ are the OBSERVED VALUES for the $i$th DATA POINT

Given the VALUES in the PROBLEM STATEMENT, let's CALCULATE each of the VALUES representing the DISPERSION of the OBSERVATIONS:

$$\sum x_i = 1.2 + 2.3 + 3.0 + 3.8 + 4.7 + 5.9 = 20.90$$

$$\sum x_i^2 = 1.2^2 + 2.3^2 + 3.0^2 + 3.8^2 + 4.7^2 + 5.9^2 = 87.07$$

$$\sum y_i = 1.1 + 2.1 + 3.1 + 4.0 + 4.9 + 5.9 = 21.10$$

$$\sum y_i^2 = 1.1^2 + 2.1^2 + 3.1^2 + 4.0^2 + 4.9^2 + 5.9^2 = 90.05$$

$$\sum x_i y_i = (1.2)(1.1) + (2.3)(2.1) + (3.0)(3.1) + (3.8)(4.0) + (4.7)(4.9) + (5.9)(5.9) = 88.49$$

$$\left(\sum x_i\right)^2 = (20.90)^2 = 436.81$$

$$\left(\sum y_i\right)^2 = (21.10)^2 = 445.21$$

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{6}(20.90) = 3.48$$

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{6}(21.10) = 3.52$$
Step 2: CALCULATE the SLOPE of the REGRESSION LINE ($b$)

In this STEP, we CALCULATE the SLOPE of the REGRESSION LINE by USING the RELATIONSHIP between the SUM of SQUARES of "x" and the SUM of the "x-y" PRODUCTS.

The FORMULA for the SUM OF X-Y PRODUCTS can be referenced under the SUBJECT of ENGINEERING PROBABILITY AND STATISTICS on page 40 of the NCEES Supplied Reference Handbook, Version 9.4 for Computer Based Testing. The SUM of the "x-y" PRODUCTS ($S_{xy}$) plays the role of the RISE, or VERTICAL CHANGE, of the REGRESSION LINE and is calculated as:

$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)$$

PLUGGING in the CALCULATED values, we CALCULATE the SUM of the "x-y" PRODUCTS as:

$$S_{xy} = 88.49 - \frac{1}{6}(20.90)(21.10) = 14.99$$
The FORMULA for the SUM OF SQUARES OF X can be referenced under the SUBJECT of ENGINEERING PROBABILITY AND STATISTICS on page 40 of the NCEES Supplied Reference Handbook, Version 9.4 for Computer Based Testing. The SUM of SQUARES of "x" ($S_{xx}$) plays the role of the RUN, or HORIZONTAL CHANGE, of the REGRESSION LINE and is CALCULATED as:

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2$$

PLUGGING in the CALCULATED values, we CALCULATE the SUM of SQUARES of "x" as:

$$S_{xx} = 87.07 - \frac{1}{6}(436.81) = 14.27$$
The FORMULA for the SLOPE OF THE LINEAR REGRESSION EQUATION can be referenced under the SUBJECT of ENGINEERING PROBABILITY AND STATISTICS on page 40 of the NCEES Supplied Reference Handbook, Version 9.4 for Computer Based Testing.

Just like ANY other SLOPE of a LINE, we can CALCULATE the SLOPE of the REGRESSION LINE as the QUOTIENT of the RISE ($S_{xy}$) divided by the RUN ($S_{xx}$):

$$b = \frac{S_{xy}}{S_{xx}}$$

Where:
• $(x_i, y_i)$ are the OBSERVED VALUES for the $i$th DATA POINT
• $n$ is the SAMPLE SIZE
• $S_{xx}$ is the SUM of SQUARES of $x$
• $S_{xy}$ is the SUM of $x$-$y$ PRODUCTS

PLUGGING in the CALCULATED VALUES for the SUM of SQUARES of "x" and the SUM of "x-y" PRODUCTS, we CALCULATE the SLOPE of the REGRESSION LINE as:

$$b = \frac{S_{xy}}{S_{xx}} = \frac{14.99}{14.27} = 1.05$$
Step 3: CALCULATE the Y-INTERCEPT ($a$) of the REGRESSION LINE

The FORMULA for the Y-INTERCEPT OF A REGRESSION LINE can be referenced under the SUBJECT of ENGINEERING PROBABILITY AND STATISTICS on page 40 of the NCEES Supplied Reference Handbook, Version 9.4 for Computer Based Testing. RELATING back to the STANDARD FORM of a LEAST SQUARES REGRESSION LINE in SLOPE-INTERCEPT FORM, we can RE-WRITE the expression to ISOLATE the Y-INTERCEPT VARIABLE "a":

$$a = y - bx$$

Where:
• $y$ is the DEPENDENT VARIABLE, representing the PREDICTED VALUE of y
• $a$ is the Y-INTERCEPT of the LEAST SQUARES REGRESSION LINE
• $b$ is the SLOPE of the linear regression equation, REPRESENTING the CHANGE in the DEPENDENT VARIABLE for each UNIT CHANGE in the INDEPENDENT VARIABLE
• $x$ is the OBSERVED VALUE of the INDEPENDENT VARIABLE

PLUGGING in the MEAN VALUES for the x- and y-COORDINATES, we can RE-WRITE the EQUATION for the Y-INTERCEPT of the LEAST SQUARES REGRESSION LINE as:

$$a = \bar{y} - b\bar{x}$$
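We can now CALCULATE the Y-INTERCEPT of the REGRESSION LINE as:

$$a = 3.52 - (1.05)(3.48) = -0.134$$

This value uses the ROUNDED MEANS and SLOPE from the previous steps. Carrying the unrounded values through gives $a \approx -0.143$, which still matches the same answer choice most closely.

Now that we KNOW the Y-INTERCEPT and SLOPE of the REGRESSION LINE, we can write the LEAST SQUARES REGRESSION LINE in STANDARD FORM as:

$$y = -0.134 + 1.05x$$

Therefore, the correct answer choice is B. $y = -0.134 + 1.05x$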
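The hand calculation can be cross-checked with a short script. This sketch uses the six data points from the problem statement; the function name `least_squares_line` is hypothetical. It carries full floating-point precision through every step, so the intercept it prints may differ slightly from a hand calculation that rounds the means and slope along the way.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x fitted by least squares."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)

    # Step 2: sum of x-y products and sum of squares of x, then the slope.
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    b = s_xy / s_xx

    # Step 3: intercept evaluated at the means of the data.
    a = sum_y / n - b * sum_x / n
    return a, b


# Data from the problem statement.
xs = [1.2, 2.3, 3.0, 3.8, 4.7, 5.9]
ys = [1.1, 2.1, 3.1, 4.0, 4.9, 5.9]
a, b = least_squares_line(xs, ys)
print(f"y = {a:.3f} + {b:.3f}x")
```

A negative intercept paired with a positive slope of about 1.05 is consistent with answer choice B.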