## What Is the Sum of Squares?

Sum of squares is a statistical method used in regression analysis to assess data point dispersion. The purpose of regression analysis is to see how well a data series can be fitted to a function that can help explain how the data series was formed. Sum of squares is a mathematical method for determining the function that best fits (varies the least) from the data.

## What Does the Sum of Squares Mean?

The sum of squares is used to calculate deviation from the mean. The mean is the average of a group of values in statistics and is the most widely used measure of central tendency. The arithmetic mean is found simply by adding all of the values in the data set and dividing by the number of values.

Assume Microsoft’s (MSFT) closing prices in the past five days were 74.01, 74.77, 73.94, 73.61, and 73.40 in US dollars. The total price is $369.73, hence the mean or average price of the textbook is $369.73 / 5 = $73.95.

However, knowing the mean of a measurement set is not always sufficient. It is sometimes useful to know how much variance exists in a group of measurements. The distance between individual values and the mean may provide some information into how well the observations or values match the established regression model.

For example, if an analyst wants to know whether the share price of Microsoft (MSFT) moves in lockstep with the price of Apple (AAPL), they can list out the set of observations for the process of both stocks over a specific period, say 1, 2, or 10 years, and create a linear model with each of the observations or measurements recorded. If the connection between the two variables (the price of AAPL and the price of MSFT) is not a straight line, then there are differences in the data set that must be investigated.

If the line in the linear model produced does not cross through all of the measures of value, then part of the variability seen in share prices remains unexplained. The sum of squares is used to determine if two variables have a linear relationship, and any unexplained variability is referred to as the residual sum of squares.

The sum of squares equals the square of variation, where variation is defined as the difference between each individual result and the mean. The distance between each data point and the line of best fit is squared and then totaled to get the sum of squares. This value will be minimized using the line of greatest fit.

## How to Calculate the Sum of Squares

You can understand why the measurement is known as the sum of squared deviations, or simply the sum of squares. Using our MSFT example as an example, the total of squares may be computed as follows:

- SS = (74.01 – 73.95)2 + (74.77 – 73.95)2 + (73.94 – 73.95)2 + (73.61 – 73.95)2 + (73.40 – 73.95)2
- SS = (0.06) 2 + (0.82)2 + (-0.01)2 + (-0.34)2 + (-0.55)2
- SS = 1.0942

Adding the total of the deviations without squaring yields a value equal to or close to zero since the negative deviations almost fully balance the positive variances. The total of variances must be squared to get a more accurate value. Because the square of any integer, whether positive or negative, is always positive, the sum of squares will always be a positive number.

## Example of Using the Sum of Squares

According to the MSFT calculation findings, a high sum of squares implies that the majority of the values are farther away from the mean, indicating that the data is highly variable. A low sum of squares indicates a low level of variability in the collection of observations.

In the above example, 1.0942 indicates that the volatility of MSFT’s stock price in the previous five days is extremely low, and investors seeking to invest in stocks with price stability and low volatility may choose MSFT.

**IMPORTANT TAKEAWAYS: **

- The sum of squares calculates the standard deviation of data points from the mean value.
- A greater sum-of-squares result suggests that the data set has a high degree of variability, while a lower result shows that the data does not deviate much from the mean value.

## The Drawbacks of Using the Sum of Squares

Making an investment choice on which stock to buy requires many more considerations than those covered here. An analyst may need to work with years of data to determine how high or low an asset’s variability is with more confidence. The sum of squares becomes greater when additional data points are added to the collection because the values become more spread apart.

The standard deviation and variance are the two most often used measures of variation. To compute any of the two metrics, the sum of squares must first be computed. The variance is the sum of the squares averaged across time (i.e., the sum of squares divided by the number of observations). The square root of the variance yields the standard deviation.

The sum of squares is used in two different regression analysis methods: linear least squares and non-linear least squares. The least-squares method derives its name from the fact that the regression function minimizes the sum of the squares of the variance from the actual data points. In this manner, a function that statistically gives the greatest fit for the data may be created. It is important to understand that a regression function may be either linear (a straight line) or non-linear (a curving line).

If you enjoyed this post please check out our recent definitions: