1. Covariance
Definition: if the real-valued random variables X and Y have expected values E(X)=\mu and E(Y)=\nu, then the covariance between them is defined as:
\operatorname{cov}(X, Y)=\mathrm{E}[(X-\mu)(Y-\nu)]
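As a rough illustration of this definition, the expectation can be approximated by an average over paired observations. The sketch below is only one way to do it: the helper name sample_cov is hypothetical, the input values are sample data, and dividing by m - 1 (the unbiased estimate) is an assumption about the estimator rather than part of the definition.

```python
def sample_cov(xs, ys):
    """Estimate cov(X, Y) from paired observations, dividing by m - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    m = len(xs)
    if m < 2:
        raise ValueError("need at least two observations")
    mu = sum(xs) / m  # estimate of E(X)
    nu = sum(ys) / m  # estimate of E(Y)
    # Average of the products of deviations, with the m - 1 normalization
    return sum((x - mu) * (y - nu) for x, y in zip(xs, ys)) / (m - 1)


# Sample data chosen for illustration only
cov_xy = sample_cov([-2.1, -1, 4.3], [3.0, 1.1, 0.12])
```

A negative result means the two variables tend to move in opposite directions; calling sample_cov(xs, xs) gives the sample variance of xs.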
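2. Covariance matrix
Consider a set of random variables collected into a random vector (also called a multivariate random variable), written as \mathbf{X} = \left[ x_1, x_2, x_3, ..., x_n \right]^\top, where n is the number of random variables. Each x_i is represented by m observations, from which the covariances are estimated. The covariance matrix of this random vector can then be defined as: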
\mathbf{C}=\left[\begin{array}{cccc}
\operatorname{cov}\left(x_1, x_1\right)&\operatorname{cov}\left(x_1, x_2\right)&\ldots&\operatorname{cov}\left(x_1, x_n\right) \\
\operatorname{cov}\left(x_2, x_1\right)&\operatorname{cov}\left(x_2, x_2\right)&\ldots&\operatorname{cov}\left(x_2, x_n\right) \\
\vdots&\vdots&\ddots&\vdots \\
\operatorname{cov}\left(x_n, x_1\right)&\operatorname{cov}\left(x_n, x_2\right)&\ldots&\operatorname{cov}\left(x_n, x_n\right)
\end{array}\right]
The (i, j) entry of the covariance matrix is defined as:
c_{i j}=\operatorname{cov}\left(x_i, x_j\right)=\mathrm{E}\left[\left(x_i-\mu_i\right)\left(x_j-\mu_j\right)\right]
where \mu_i is the expected value of x_i, i.e., \mu_i=\mathrm{E}\left(x_i\right). In matrix form, the covariance matrix is:
\mathbf{C} =\mathrm{E}\left[(\mathbf{X}-\mathrm{E}[\mathbf{X}])(\mathbf{X}-\mathrm{E}[\mathbf{X}])^{\mathrm{T}}\right]
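The matrix form can be checked numerically. The following is a minimal sketch, assuming the data are stored with one random variable per row and one observation per column, and assuming the expectation is estimated with the m - 1 normalization (which is also what numpy.cov does by default). The sample values are for illustration only.

```python
import numpy as np

# Rows are random variables, columns are observations (sample data)
X = np.array([[-2.1, -1.0, 4.3],
              [3.0, 1.1, 0.12]])

m = X.shape[1]                                # observations per variable
centered = X - X.mean(axis=1, keepdims=True)  # X - E[X], estimated row by row
C = centered @ centered.T / (m - 1)           # sample version of E[(X-E[X])(X-E[X])^T]
```

Because C is built as a product of a matrix with its own transpose, it is symmetric, and its diagonal holds the variances of the individual variables.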
Nomenclatures differ. Some statisticians, following the probabilist William Feller in his two-volume book An Introduction to Probability Theory and Its Applications [2], call the matrix \mathrm{K}_{\mathbf{X X}} (written \mathbf{C} above) the variance of the random vector \mathbf{X}, because it is the natural generalization to higher dimensions of the 1-dimensional variance. Others call it the covariance matrix, because it is the matrix of covariances between the scalar components of the vector \mathbf{X}.
\operatorname{var}(\mathbf{X})=\operatorname{cov}(\mathbf{X}, \mathbf{X})=\mathrm{E}\left[(\mathbf{X}-\mathrm{E}[\mathbf{X}])(\mathbf{X}-\mathrm{E}[\mathbf{X}])^{\mathrm{T}}\right] .
Both forms are quite standard, and there is no ambiguity between them. The matrix \mathrm{K}_{\mathbf{X X}} is also often called the variance-covariance matrix, since the diagonal terms are in fact variances.
By comparison, the notation for the cross-covariance matrix between two vectors is:
\operatorname{cov}(\mathbf{X}, \mathbf{Y})=\mathrm{K}_{\mathbf{X Y}}=\mathrm{E}\left[(\mathbf{X}-\mathrm{E}[\mathbf{X}])(\mathbf{Y}-\mathrm{E}[\mathbf{Y}])^{\mathrm{T}}\right]
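To make the cross-covariance concrete, here is a small sketch that estimates it from paired observations. The function name cross_cov, the data layout (variables in rows, observations in columns), the m - 1 normalization, and the sample values are all assumptions made for illustration.

```python
import numpy as np


def cross_cov(X, Y):
    """Sample cross-covariance matrix; rows are variables, columns are paired observations."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    m = X.shape[1]
    if Y.shape[1] != m:
        raise ValueError("X and Y must have the same number of observations")
    Xc = X - X.mean(axis=1, keepdims=True)  # X - E[X]
    Yc = Y - Y.mean(axis=1, keepdims=True)  # Y - E[Y]
    return Xc @ Yc.T / (m - 1)


# Hypothetical data: 2 variables in X, 3 variables in Y, 4 paired observations
X = np.array([[1.0, 2.0, 4.0, 7.0],
              [0.5, 0.1, 0.9, 0.3]])
Y = np.array([[2.0, 1.0, 0.0, 3.0],
              [5.0, 5.5, 6.5, 6.0],
              [1.0, -1.0, 1.0, -1.0]])
K_xy = cross_cov(X, Y)  # shape (2, 3): one row per X variable, one column per Y variable
```

Unlike the covariance matrix, the cross-covariance matrix is in general neither square nor symmetric; cross_cov(X, X) reduces to the ordinary covariance matrix of X.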
Example: let x_1 and x_2 be two random variables with the following observations:
x_1 = [-2.1, -1, 4.3] \\
x_2 = [3.0, 1.1, 0.12]
These can be stacked into X:
X = np.stack((x1, x2), axis=0)
That is:
\left[\begin{array}{ccc}
-2.1&-1&4.3 \\
3.0&1.1&0.12
\end{array}\right]
Its covariance matrix can be computed with NumPy's covariance function numpy.cov():
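>>> import numpy as np
>>> x1 = [-2.1, -1, 4.3]
>>> x2 = [3, 1.1, 0.12]
>>> X = np.stack((x1, x2), axis=0)  # rows are variables, columns are observations
>>> np.cov(X)
array([[11.71 , -4.286 ], # may vary
[-4.286 , 2.144133]])
>>> np.cov(x1, x2)  # same result: x2 is treated as an additional variable
array([[11.71 , -4.286 ], # may vary
[-4.286 , 2.144133]])
>>> np.cov(x1, bias=False)  # default: normalize by m - 1
array(11.71)
>>> np.cov(x1, bias=True)  # normalize by m
array(7.80666667)
>>> np.cov(x1, ddof=0)  # equivalent to bias=True
array(7.80666667)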
numpy.cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None, *, dtype=None)
Note the default parameter values:
- With the default bias=False, the covariance is normalized by (m - 1), where m is the number of observations given for each random variable (unbiased estimate). If bias is set to True, the normalization is by m instead.
- If ddof is not None, the default value implied by bias is overridden. Note that ddof=1 will return the unbiased estimate, even if both fweights and aweights are specified, and ddof=0 will return the simple average (normalization by the actual number of observations m). See the notes in the NumPy documentation for the details. The default value is None.
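The effect of bias and ddof can be reproduced by hand. The sketch below reuses the x1 values from the example and compares the two normalizations against numpy.cov; the variable names are chosen here for illustration.

```python
import numpy as np

x1 = np.array([-2.1, -1.0, 4.3])
m = x1.size
sum_sq = np.sum((x1 - x1.mean()) ** 2)  # sum of squared deviations from the mean

unbiased = sum_sq / (m - 1)  # normalization used by np.cov(x1) and np.cov(x1, ddof=1)
biased = sum_sq / m          # normalization used by np.cov(x1, bias=True) and np.cov(x1, ddof=0)

# When ddof is given explicitly, it takes precedence over bias
overridden = np.cov(x1, bias=True, ddof=1)
```

With only three observations the two estimates differ noticeably; the gap shrinks as m grows, since the ratio between them is (m - 1) / m.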
3. Pearson correlation coefficient
Given the covariance matrix, the Pearson correlation coefficients can be computed with the following formula:
R_{i j}=\frac{c_{i j}}{\sqrt{c_{i i} c_{j j}}}
The values of R are between -1 and 1, inclusive.
In NumPy, they can be obtained directly with the numpy.corrcoef function.
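As a quick check of the formula, the sketch below builds the correlation matrix from the covariance matrix by hand and compares it with numpy.corrcoef. It reuses the example data; the names d, R_manual, and R_numpy are introduced here for illustration.

```python
import numpy as np

# Rows are random variables, columns are observations
X = np.array([[-2.1, -1.0, 4.3],
              [3.0, 1.1, 0.12]])

C = np.cov(X)
d = np.sqrt(np.diag(C))        # sqrt(c_ii) for each variable
R_manual = C / np.outer(d, d)  # R_ij = c_ij / sqrt(c_ii * c_jj)

R_numpy = np.corrcoef(X)       # should agree with R_manual up to rounding
```

The diagonal of R is always 1, since each variable is perfectly correlated with itself, and the normalization factor (m - 1 or m) cancels out of the ratio.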
References:
1. Covariance (协方差) - Wikipedia, the free encyclopedia
2. Covariance matrix (协方差矩阵) - Wikipedia, the free encyclopedia
3. 2023-09-04 numpy.cov — NumPy v1.25 Manual
4. numpy.corrcoef — NumPy v1.25 Manual