An Algebraic Derivation of the Least Squares Method

$\hat y=a+bx$

$\begin{aligned} Q&=\displaystyle \sum^n_{i=1}{(y_i-\hat y_i)^2}\\\
&={(y_1-\hat y_1)^2}+{(y_2-\hat y_2)^2}+ \cdots +{(y_n-\hat y_n)^2} \end{aligned}$

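Here $\hat y=a+bx$ is the regression equation and $\hat y_i=a+bx_i$ is its predicted value at $x_i$. The goal is to find the coefficients $a$ and $b$ that minimize $Q$.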
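To make the objective concrete, here is a minimal Python sketch that evaluates $Q$ for a candidate pair $(a, b)$. The function name `sum_squared_residuals` and the sample data are illustrative assumptions, not part of the original derivation.

```python
def sum_squared_residuals(xs, ys, a, b):
    """Return Q, the sum of (y_i - (a + b*x_i))^2 over all data points."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical sample data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

print(sum_squared_residuals(xs, ys, 1.0, 2.0))  # exact fit, so Q is 0.0
print(sum_squared_residuals(xs, ys, 0.0, 2.0))  # every residual is 1, so Q is 4.0
```

A poorer choice of $(a, b)$ gives a larger $Q$, which is what the derivation sets out to minimize.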

The detailed derivation follows.

I. Preliminary identities

1>

$\displaystyle \sum^n_{i=1}{(x_i-{\bar x})^2}=\displaystyle \sum^n_{i=1}{x_i^2}-n{\bar x}^2$,

where $\bar x=\frac {x_1+x_2+\cdots +x_n}{n}$.

Proof:

$\begin{aligned} \text{LHS}&={(x_1-\bar x)^2}+{(x_2-\bar x)^2}+ \cdots +{(x_n-\bar x)^2}\\\
&=x_1^2+x_2^2+ \cdots +x_n^2-2\bar x\cdot(x_1+x_2+ \cdots +x_n)+n\bar x^2\\\
&=\displaystyle \sum^n_{i=1}{x_i^2}-2\bar x\cdot n\cdot\bar x + n\cdot\bar x^2\\\
&=\displaystyle\sum^n_{i=1}{x_i^2}-n\bar x^2=\text{RHS} \end{aligned}$
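As a quick sanity check, the identity can be verified numerically. The sketch below is only an illustration: the function names and the sample values are assumptions chosen so that the arithmetic is exact in floating point.

```python
import math

def centered_sum_of_squares(xs):
    """Left-hand side: sum of (x_i - x_bar)^2."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs)

def shortcut_sum_of_squares(xs):
    """Right-hand side: sum of x_i^2 minus n * x_bar^2."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum(x * x for x in xs) - n * x_bar ** 2

xs = [1.0, 2.0, 4.0, 7.0]  # hypothetical data, mean 3.5
print(centered_sum_of_squares(xs))   # 21.0
print(shortcut_sum_of_squares(xs))   # 21.0
assert math.isclose(centered_sum_of_squares(xs), shortcut_sum_of_squares(xs))
```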

2>

$\displaystyle\sum^n_{i=1}{(x_i-\bar x)}{(y_i-\bar y)}=\displaystyle \sum^n_{i=1}{x_i}{y_i}-n\bar x\bar y$,

where $\bar x=\frac {x_1+x_2+\cdots+x_n}{n},\ \bar y=\frac{y_1+y_2+\cdots + y_n}{n}$.

Proof:

$\begin{aligned} \text{LHS}&={(x_1-\bar x)}{(y_1-\bar y)}+{(x_2-\bar x)}{(y_2-\bar y)}+ \cdots +{(x_n-\bar x)}{(y_n-\bar y)}\\\
&=(x_1y_1+x_2y_2+\cdots+x_n\cdot y_n)+n\bar x\cdot\bar y-(y_1+y_2+\cdots+y_n)\cdot\bar x-(x_1+x_2+\cdots+x_n)\cdot\bar y\\\
&=\displaystyle\sum^n_{i=1}{x_i}{y_i}+n\bar x\cdot\bar y-n\bar y\cdot\bar x-n\bar x\bar y\\\
&=\displaystyle\sum^n_{i=1}{x_i}{y_i}-n\bar x\bar y=\text{RHS} \end{aligned}$
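The cross-product identity can be checked the same way. Again, this is a small sketch with assumed function names and hypothetical sample data, not part of the derivation itself.

```python
import math

def centered_cross_sum(xs, ys):
    """Left-hand side: sum of (x_i - x_bar) * (y_i - y_bar)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

def shortcut_cross_sum(xs, ys):
    """Right-hand side: sum of x_i * y_i minus n * x_bar * y_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar

xs = [1.0, 2.0, 4.0, 7.0]  # hypothetical data, mean 3.5
ys = [2.0, 3.0, 7.0, 8.0]  # hypothetical data, mean 5.0
print(centered_cross_sum(xs, ys))   # 22.0
print(shortcut_cross_sum(xs, ys))   # 22.0
assert math.isclose(centered_cross_sum(xs, ys), shortcut_cross_sum(xs, ys))
```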

II. Derivation

$\begin{aligned} Q &=\displaystyle \sum^n_{i=1}{(y_i-\hat y_i)^2}\\\
&=\displaystyle \sum^n_{i=1}{[y_i-(a+bx_i)]}^2\\\
&=(y_1^2+y_2^2+ \cdots +y_n^2)+na^2+b^2(x_1^2 + x_2^2 + \cdots +x_n^2)+2a\cdot b\cdot (x_1+x_2+\cdots+x_n)-2a\cdot (y_1+y_2+\cdots +y_n)-2b(x_1\cdot y_1+x_2\cdot y_2+\cdots +x_n\cdot y_n)\\\
&=\displaystyle \sum^n_{i=1}y_i^2+na^2+b^2\cdot \displaystyle \sum^n_{i=1}x_i^2+2ab\cdot n\cdot \bar x-2a\cdot n\cdot \bar y-2b\cdot \displaystyle \sum^n_{i=1}x_iy_i\\\
&=\displaystyle \sum^n_{i=1}y_i^2-2b\displaystyle \sum^n_{i=1}x_iy_i+b^2\cdot \displaystyle \sum^n_{i=1}x_i^2+na^2-n\cdot 2a\cdot (\bar y-b\bar x)\\\
&=\displaystyle \sum^n_{i=1}y_i^2-2b\displaystyle \sum^n_{i=1}x_iy_i+b^2\cdot \displaystyle \sum^n_{i=1}x_i^2+n[a-(\bar y-b\bar x)]^2-n(\bar y-b\bar x)^2\\\
&=n[a-(\bar y-b\bar x)]^2-n\bar y^2+2n\cdot b\cdot \bar x\cdot \bar y-n\cdot b^2\cdot \bar x^2+b^2\cdot \displaystyle \sum^n_{i=1}x_i^2-2b\cdot \displaystyle \sum^n_{i=1}x_iy_i+\displaystyle \sum^n_{i=1}y_i^2\\\
&=n[a-(\bar y-b\bar x)]^2+b^2\left(\displaystyle \sum^n_{i=1}x_i^2-n\cdot \bar x^2\right)+\displaystyle \sum^n_{i=1}y_i^2-n\cdot \bar y^2-2b\cdot \left(\displaystyle \sum^n_{i=1}x_iy_i-n\cdot \bar x\cdot \bar y\right)\\\
&=n[a-(\bar y-b\bar x)]^2+b^2{\displaystyle \sum^n_{i=1}(x_i-\bar x)^2}-2b\cdot \displaystyle \sum^n_{i=1}{(x_i-\bar x)(y_i-\bar y)}+\displaystyle \sum^n_{i=1}y_i^2-n\cdot \bar y^2\\\
&=n[a-(\bar y-b\bar x)]^2+{\displaystyle \sum^n_{i=1}(x_i-\bar x)^2}\cdot \left [b^2-2b\cdot \frac {\displaystyle \sum^n_{i=1}(x_i-\bar x)(y_i-\bar y)}{\displaystyle \sum^n_{i=1}{(x_i-\bar x)^2}}\right ] +\displaystyle \sum^n_{i=1}y_i^2-n\cdot \bar y^2\\\
&=n[a-(\bar y-b\bar x)]^2+{\displaystyle \sum^n_{i=1}(x_i-\bar x)^2}\cdot \left [b-\frac {\displaystyle \sum^n_{i=1}(x_i-\bar x)(y_i-\bar y)}{\displaystyle \sum^n_{i=1}{(x_i-\bar x)^2}}\right ]^2-\frac {\left [{\displaystyle \sum^n_{i=1}}(x_i-{\bar x})(y_i-\bar y)\right ]^2}{\displaystyle \sum^n_{i=1}{(x_i-\bar x)^2}}+ \displaystyle \sum^n_{i=1}y_i^2-n\cdot \bar y^2 \end{aligned}$

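Because the quantities

$\displaystyle \sum^n_{i=1}(x_i-\bar x)^2$

$\frac {[\displaystyle \sum^n_{i=1}(x_i-\bar x)(y_i-\bar y)]^2}{\displaystyle \sum^n_{i=1}(x_i-\bar x)^2}$

$\displaystyle \sum^n_{i=1}y_i^2-n\cdot \bar y^2$

are constants that do not depend on $a$ or $b$, and the two squared terms are nonnegative (assuming the $x_i$ are not all equal, so that $\sum^n_{i=1}(x_i-\bar x)^2>0$), $Q$ is smallest when both squared terms vanish.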
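The completed-square form of $Q$ can be compared against the direct definition numerically. The following is a minimal sketch, with the constant term read as $\sum y_i^2-n\bar y^2$; the names `s_xx`, `s_xy`, `s_yy` and the sample data are shorthand introduced here for illustration only.

```python
import math

def q_direct(xs, ys, a, b):
    """Q computed straight from its definition."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def q_completed_square(xs, ys, a, b):
    """Q computed from the completed-square form of the derivation."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_yy = sum(y * y for y in ys) - n * y_bar ** 2
    return (n * (a - (y_bar - b * x_bar)) ** 2   # vanishes when a = y_bar - b*x_bar
            + s_xx * (b - s_xy / s_xx) ** 2      # vanishes when b = s_xy / s_xx
            - s_xy ** 2 / s_xx                   # constant in a and b
            + s_yy)                              # constant in a and b

xs = [1.0, 2.0, 4.0, 7.0]  # hypothetical data; the x values must not all be equal
ys = [2.0, 3.0, 7.0, 8.0]
for a, b in [(0.0, 0.0), (1.0, 1.0), (-2.0, 3.5)]:
    assert math.isclose(q_direct(xs, ys, a, b), q_completed_square(xs, ys, a, b))
```

The two functions agree for every $(a, b)$ tried, which is consistent with the algebra: only the first two terms change as $a$ and $b$ vary.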

Therefore $Q$ attains its minimum $Q_{min}$ when:

$
\begin{cases}
a-(\bar y-b\bar x)=0\\\
b-\frac {\displaystyle \sum^n_{i=1}(x_i-\bar x)(y_i-\bar y)}{\displaystyle \sum^n_{i=1}(x_i-\bar x)^2}=0
\end{cases}
\Longrightarrow \begin{cases} a=\bar y-b\bar x\\\
b=\frac {\displaystyle \sum^n_{i=1}x_iy_i-n\cdot \bar x\bar y}{\displaystyle \sum^n_{i=1}x_i^2-n\cdot \bar x^2}
\end{cases}
$
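The closed-form result translates directly into code. The sketch below is one possible implementation under stated assumptions: the function name `fit_line`, its error handling, and the sample data are illustrative choices, not part of the original derivation.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing the sum of (y_i - (a + b*x_i))^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Denominator and numerator of b, in the shortcut forms from Section I.
    s_xx = sum(x * x for x in xs) - n * x_bar ** 2
    s_xy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    if s_xx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x.
print(fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))  # (1.0, 2.0)
```

For data with a large mean relative to its spread, the shortcut sums can lose precision to cancellation; computing $\sum(x_i-\bar x)^2$ and $\sum(x_i-\bar x)(y_i-\bar y)$ directly is algebraically equivalent and may be the safer choice in that case.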