Tsinghua University - 432 Statistics - 2016

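I. (30 points) Let $X_{1}, X_{2}, \cdots, X_{n}$ be a simple random sample from the uniform population $U(0, \theta)$, where $\theta>0$ is an unknown parameter.
Define

$$U_{n}=\max \left\{X_{1}, X_{2}, \cdots, X_{n}\right\}, \quad V_{n}=\min \left\{X_{1}, X_{2}, \cdots, X_{n}\right\}.$$

(1) (10 points) Find the maximum likelihood estimator of $\theta$;
(2) (10 points) Determine whether $U_{n}$ and $V_{n}$ are each unbiased estimators of $\theta$, and justify your answer;
(3) (10 points) Construct a consistent estimator of $\theta$, and justify your answer.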

Solution:
(1) The population density is $f(x \mid \theta)=\frac{1}{\theta} I_{[0, \theta]}(x)$, where $I_{[0, \theta]}(x)=\left\{\begin{array}{ll}0, & x \notin[0, \theta] \\ 1, & x \in[0, \theta]\end{array}\right.$ is the indicator function. The likelihood function is therefore $L(\mathbf{X} \mid \theta)=\frac{1}{\theta^{n}} I_{\left\{U_{n} \leqslant \theta\right\}} I_{\left\{V_{n} \geqslant 0\right\}}$. It vanishes for $\theta<U_{n}$ and is strictly decreasing in $\theta$ on $\theta \geqslant U_{n}$, so it attains its maximum at $\theta=U_{n}$. Hence $\hat{\theta}_{MLE}=U_{n}$.

(2) Since $X_{i} \sim U(0, \theta)$, we have $Y_{i}=\frac{X_{i}}{\theta} \sim U(0,1)$, and therefore

$$U_{n}=\theta \cdot \max \left\{Y_{1}, Y_{2}, \cdots, Y_{n}\right\}, \quad V_{n}=\theta \cdot \min \left\{Y_{1}, Y_{2}, \cdots, Y_{n}\right\}.$$

It is well known that $\max \left\{Y_{1}, Y_{2}, \cdots, Y_{n}\right\} \sim \operatorname{Beta}(n, 1)$ and $\min \left\{Y_{1}, Y_{2}, \cdots, Y_{n}\right\} \sim \operatorname{Beta}(1, n)$. By the mean of the Beta distribution, $E U_{n}=\frac{n}{n+1} \theta$ and $E V_{n}=\frac{1}{n+1} \theta$. Neither expectation equals $\theta$, so neither $U_{n}$ nor $V_{n}$ is an unbiased estimator of $\theta$ (although $U_{n}$ is asymptotically unbiased, since $E U_{n} \rightarrow \theta$ as $n \rightarrow \infty$).
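The two Beta facts quoted in part (2) can also be checked directly from the distribution of the extremes. The following is a sketch of that standard calculation; the symbols $M_{n}$ and $m_{n}$ are shorthand introduced here (not in the original) for the maximum and minimum of $Y_{1}, \cdots, Y_{n}$, and $0 \leqslant y \leqslant 1$ throughout.

$$\begin{aligned} P\left\{M_{n} \leqslant y\right\} &=y^{n}, & f_{M_{n}}(y) &=n y^{n-1}, & E M_{n} &=\int_{0}^{1} n y^{n}\, d y=\frac{n}{n+1}, \\ P\left\{m_{n}>y\right\} &=(1-y)^{n}, & f_{m_{n}}(y) &=n(1-y)^{n-1}, & E m_{n} &=\int_{0}^{1}(1-y)^{n}\, d y=\frac{1}{n+1}. \end{aligned}$$

Multiplying by $\theta$ recovers $E U_{n}=\frac{n}{n+1} \theta$ and $E V_{n}=\frac{1}{n+1} \theta$.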

(3) By the variance of the Beta distribution, $\operatorname{Var}\left(U_{n}\right)=\theta^{2} \operatorname{Var}(\operatorname{Beta}(n, 1))=\frac{n}{(n+1)^{2}(n+2)} \theta^{2}$. For any $\varepsilon>0$,

$$\begin{aligned} P\left\{\left|U_{n}-\theta\right|>\varepsilon\right\} &=P\left\{\left|U_{n}-\frac{n}{n+1} \theta+\frac{n}{n+1} \theta-\theta\right|>\varepsilon\right\} \\ & \leqslant P\left\{\left|U_{n}-\frac{n}{n+1} \theta\right|>\frac{\varepsilon}{2}\right\}+P\left\{\left|\frac{n}{n+1} \theta-\theta\right|>\frac{\varepsilon}{2}\right\}. \end{aligned}$$

By Chebyshev's inequality, $P\left\{\left|U_{n}-\frac{n}{n+1} \theta\right|>\frac{\varepsilon}{2}\right\} \leqslant \frac{4 \operatorname{Var}\left(U_{n}\right)}{\varepsilon^{2}}=\frac{4 \theta^{2}}{\varepsilon^{2}} \frac{n}{(n+1)^{2}(n+2)}$.
Moreover, $\left|\frac{n}{n+1} \theta-\theta\right|=\frac{\theta}{n+1}$ is deterministic, so when $n \geqslant \frac{2 \theta}{\varepsilon}-1$ we have $P\left\{\left|\frac{n}{n+1} \theta-\theta\right|>\frac{\varepsilon}{2}\right\}=0$.
Therefore, writing $N=\left[\frac{2 \theta}{\varepsilon}\right]+1$, where $[\cdot]$ denotes the integer part,

$$\begin{aligned} \sum_{n=1}^{\infty} P\left\{\left|U_{n}-\theta\right|>\varepsilon\right\} &=A+\sum_{n=N}^{\infty} P\left\{\left|U_{n}-\theta\right|>\varepsilon\right\} \quad\left(\text{where } A=\sum_{n=1}^{N-1} P\left\{\left|U_{n}-\theta\right|>\varepsilon\right\}\right) \\ & \leqslant A+\sum_{n=N}^{\infty} \frac{4 \theta^{2}}{\varepsilon^{2}} \frac{n}{(n+1)^{2}(n+2)}<+\infty, \end{aligned}$$

since $A$ is a finite sum and the general term of the last series is of order $n^{-2}$.

By the Borel-Cantelli lemma, $U_{n} \rightarrow \theta$ almost surely, so $U_{n}$ is a strongly consistent (and hence consistent) estimator of $\theta$.
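The bias formulas from part (2) and the consistency claim from part (3) can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original solution; the function name `simulate_extremes`, the value $\theta=2$, the sample sizes and the seed are all illustrative assumptions.

```python
import random

def simulate_extremes(theta, n, reps, seed=0):
    """Return Monte Carlo averages of U_n = max and V_n = min over `reps` samples."""
    rng = random.Random(seed)
    sum_max = 0.0
    sum_min = 0.0
    for _ in range(reps):
        sample = [rng.uniform(0.0, theta) for _ in range(n)]
        sum_max += max(sample)
        sum_min += min(sample)
    return sum_max / reps, sum_min / reps

theta = 2.0  # hypothetical true parameter, chosen only for illustration
for n in (5, 50, 500):
    mean_max, mean_min = simulate_extremes(theta, n, reps=2000)
    # Columns: n, simulated E U_n, n/(n+1)*theta, simulated E V_n, theta/(n+1)
    print(n, round(mean_max, 4), round(n / (n + 1) * theta, 4),
          round(mean_min, 4), round(theta / (n + 1), 4))
```

As $n$ grows, the simulated mean of $U_{n}$ should approach $\theta$ while that of $V_{n}$ should approach 0, in line with the formulas above.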

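II. (20 points) Let $\lambda_{n}=\frac{1}{n}$ $(n=1,2, \cdots)$, and let the random variable $X_{n} \sim \mathcal{P}\left(\lambda_{n}\right)$ follow the Poisson distribution with parameter $\lambda_{n}$.

(1) (10 points) Prove that $X_{n}$ converges to 0 in probability;
(2) (10 points) Does $n X_{n}$ converge to 0 in probability? Justify your answer.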

Solution:

(1) For any $\varepsilon>0$, $P\left\{\left|X_{n}-0\right| \geqslant \varepsilon\right\} \leqslant P\left\{X_{n} \neq 0\right\}=1-P\left\{X_{n}=0\right\}$, and

$$P\left\{X_{n}=0\right\}=\frac{\lambda_{n}^{0}}{0 !} e^{-\lambda_{n}}=e^{-\frac{1}{n}} \rightarrow 1.$$

Hence $P\left\{\left|X_{n}-0\right| \geqslant \varepsilon\right\} \leqslant 1-P\left\{X_{n}=0\right\} \rightarrow 0$, so $X_{n}$ converges to 0 in probability.
(2) For any $\varepsilon>0$, $P\left\{\left|n X_{n}-0\right| \geqslant \varepsilon\right\}=P\left\{X_{n} \geqslant \frac{\varepsilon}{n}\right\} \leqslant P\left\{X_{n} \neq 0\right\}=1-P\left\{X_{n}=0\right\}$.
By the same argument as in (1), $n X_{n}$ converges to 0 in probability.
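Both parts rest on the single bound $P\left\{X_{n} \neq 0\right\}=1-e^{-1/n}$. A short sketch (the helper name `poisson_nonzero_prob` is an assumption made here) tabulates this bound to show how quickly it shrinks:

```python
import math

def poisson_nonzero_prob(n):
    """P(X_n != 0) = 1 - exp(-1/n) for X_n ~ Poisson(1/n)."""
    # expm1 avoids cancellation error when 1/n is small
    return -math.expm1(-1.0 / n)

for n in (1, 10, 100, 1000):
    print(n, poisson_nonzero_prob(n))
```

The bound behaves like $\frac{1}{n}$ for large $n$, and it does not depend on $\varepsilon$, which is why multiplying $X_{n}$ by $n$ does not affect the conclusion.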

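III. (30 points) Let the sample $Y_{1}, Y_{2}, \cdots, Y_{n}$ be independent with $Y_{i} \sim N\left(k x_{i}, \sigma^{2}\right)$, $i=1,2, \cdots, n$, where $x_{1}, x_{2}, \cdots, x_{n}$ are known nonzero constants and $k$ and $\sigma^{2}$ are unknown parameters.

(1) (15 points) Find the maximum likelihood estimators of $k$ and $\sigma^{2}$;
(2) (15 points) Determine whether the estimators obtained above are unbiased.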

Solution:
(1) The likelihood function is $L\left(k, \sigma^{2}\right)=\left(2 \pi \sigma^{2}\right)^{-\frac{n}{2}} \exp \left\{-\frac{1}{2 \sigma^{2}} \sum_{i=1}^{n}\left(y_{i}-k x_{i}\right)^{2}\right\}$,
and the log-likelihood function is $\ln L\left(k, \sigma^{2}\right)=-\frac{n}{2} \ln \left(2 \pi \sigma^{2}\right)-\frac{1}{2 \sigma^{2}} \sum_{i=1}^{n}\left(y_{i}-k x_{i}\right)^{2}$.

$$\text{Setting } \left\{\begin{array}{l} \frac{\partial \ln L}{\partial k}=\frac{\sum_{i=1}^{n}\left(y_{i}-k x_{i}\right) x_{i}}{\sigma^{2}}=0 \\ \frac{\partial \ln L}{\partial \sigma^{2}}=-\frac{n}{2 \sigma^{2}}+\frac{\sum_{i=1}^{n}\left(y_{i}-k x_{i}\right)^{2}}{2 \sigma^{4}}=0 \end{array}\right. \quad \text{yields} \quad \left\{\begin{array}{l} \hat{k}=\frac{\sum_{i=1}^{n} x_{i} y_{i}}{\sum_{i=1}^{n} x_{i}^{2}} \\ \hat{\sigma}^{2}=\frac{1}{n} \sum_{i=1}^{n}\left(y_{i}-\hat{k} x_{i}\right)^{2} \end{array}\right.$$

It is easy to verify that this stationary point is the maximum point, so these are the MLEs.

(2) $E \hat{k}=E\left(\frac{\sum_{i=1}^{n} x_{i} y_{i}}{\sum_{i=1}^{n} x_{i}^{2}}\right)=\frac{\sum_{i=1}^{n} x_{i} E y_{i}}{\sum_{i=1}^{n} x_{i}^{2}}=\frac{\sum_{i=1}^{n} k x_{i}^{2}}{\sum_{i=1}^{n} x_{i}^{2}}=k$. As a by-product, its variance is

$$\operatorname{Var}(\hat{k})=\operatorname{Var}\left(\frac{\sum_{i=1}^{n} x_{i} y_{i}}{\sum_{i=1}^{n} x_{i}^{2}}\right)=\frac{\sum_{i=1}^{n} x_{i}^{2} \operatorname{Var}\left(y_{i}\right)}{\left(\sum_{i=1}^{n} x_{i}^{2}\right)^{2}}=\frac{\sigma^{2}}{\sum_{i=1}^{n} x_{i}^{2}}.$$

Turning to $\hat{\sigma}^{2}$, we have

$$\begin{aligned} E\left(n \hat{\sigma}^{2}\right) &=\sum_{i=1}^{n} E\left(y_{i}-\hat{k} x_{i}\right)^{2}=\sum_{i=1}^{n} E\left(y_{i}-k x_{i}+k x_{i}-\hat{k} x_{i}\right)^{2} \\ &=\sum_{i=1}^{n} E\left[\left(y_{i}-k x_{i}\right)^{2}-2\left(y_{i}-k x_{i}\right)\left(\hat{k} x_{i}-k x_{i}\right)+\left(\hat{k} x_{i}-k x_{i}\right)^{2}\right] \\ &=\sum_{i=1}^{n} E\left(y_{i}-k x_{i}\right)^{2}-2 \sum_{i=1}^{n} \operatorname{Cov}\left(y_{i}, \hat{k} x_{i}\right)+\sum_{i=1}^{n} E\left(\hat{k} x_{i}-k x_{i}\right)^{2} \\ &=n \sigma^{2}-2 \sum_{i=1}^{n} x_{i} \frac{x_{i} \sigma^{2}}{\sum_{j=1}^{n} x_{j}^{2}}+\sum_{i=1}^{n} x_{i}^{2} \operatorname{Var}(\hat{k})=(n-1) \sigma^{2}. \end{aligned}$$

Hence $E \hat{\sigma}^{2}=\frac{n-1}{n} \sigma^{2} \neq \sigma^{2}$. In conclusion, $\hat{k}$ is an unbiased estimator of $k$, while $\hat{\sigma}^{2}$ is not an unbiased estimator of $\sigma^{2}$.
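The closed-form estimators and the bias factor $\frac{n-1}{n}$ can be illustrated by simulation. This is a sketch under assumed values: the design points `xs`, the true parameters $k=1.5$ and $\sigma=2$, and the function names are hypothetical choices made for this example only.

```python
import random

def mle(xs, ys):
    """Closed-form MLEs (k_hat, sigma2_hat) for the model Y_i ~ N(k * x_i, sigma^2)."""
    sxx = sum(x * x for x in xs)
    k_hat = sum(x * y for x, y in zip(xs, ys)) / sxx
    sigma2_hat = sum((y - k_hat * x) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return k_hat, sigma2_hat

def average_estimates(xs, k, sigma, reps, seed=0):
    """Average the two MLEs over `reps` simulated samples."""
    rng = random.Random(seed)
    k_sum = 0.0
    s2_sum = 0.0
    for _ in range(reps):
        ys = [rng.gauss(k * x, sigma) for x in xs]
        k_hat, s2_hat = mle(xs, ys)
        k_sum += k_hat
        s2_sum += s2_hat
    return k_sum / reps, s2_sum / reps

xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical known nonzero constants
k_avg, s2_avg = average_estimates(xs, k=1.5, sigma=2.0, reps=20000)
n = len(xs)
print("average k_hat:", k_avg, "target k:", 1.5)
print("average sigma2_hat:", s2_avg, "target (n-1)/n * sigma^2:", (n - 1) / n * 4.0)
```

With $n=5$ and $\sigma^{2}=4$, the average of $\hat{\sigma}^{2}$ should settle near $3.2$ rather than $4$, which is the bias derived above.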

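IV. (20 points) Let the discrete random variable $X_{n}$ have the distribution

$$P\left(X_{n}=0\right)=1-\frac{1}{n}, \quad P\left(X_{n}=n\right)=\frac{1}{n}, \quad n=1,2,3, \cdots$$

Determine:

(1) (10 points) whether the distribution function of $X_{n}$ converges;
(2) (10 points) whether $X_{n}$ converges in moments.

[Note]: Convergence in the $p$-th moment ($L^{p}$ convergence) means the following. Suppose $X$ is a candidate limit random variable for $X_{n}$. If

$$E\left|X_{n}-X\right|^{p} \rightarrow 0, \quad n \rightarrow \infty,$$

then we write $X_{n} \rightarrow_{L^{p}} X$ and say that $X_{n}$ converges to $X$ in $L^{p}$. Usually only positive integer $p$ is considered.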

Solution:
(1) The distribution function of $X_{n}$ is $F_{n}(x)=\begin{cases}0, & x<0 \\ 1-\frac{1}{n}, & 0 \leqslant x<n \\ 1, & x \geqslant n\end{cases}$. For every fixed $x$, $F(x)=\lim _{n \rightarrow \infty} F_{n}(x)=\left\{\begin{array}{ll}0, & x<0 \\ 1, & x \geqslant 0\end{array}\right.$, which is the distribution function of the constant 0. Hence $F_{n}(x) \stackrel{n \rightarrow \infty}{\longrightarrow} F(x)$, i.e. the distribution functions converge.

(2) From part (1), $X_{n}$ has the limit $X=0$ in distribution, so for any $k \in N^{+}$,

$$E\left|X_{n}-0\right|^{k}=0^{k} \cdot\left(1-\frac{1}{n}\right)+n^{k} \cdot \frac{1}{n}=n^{k-1} \rightarrow\left\{\begin{array}{ll}1, & k=1 \\ \infty, & k>1\end{array}\right.$$

Since this limit is never 0, $X_{n}$ does not converge in moments.
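The moment formula $E\left|X_{n}-0\right|^{k}=n^{k-1}$ can be confirmed with exact rational arithmetic. The sketch below uses a hypothetical helper `abs_moment` (a name introduced here) that sums over the two support points of $X_{n}$.

```python
from fractions import Fraction

def abs_moment(n, k):
    """Exact E|X_n - 0|^k when P(X_n = 0) = 1 - 1/n and P(X_n = n) = 1/n."""
    p = Fraction(1, n)
    return (1 - p) * 0 ** k + p * n ** k

for k in (1, 2, 3):
    # Each row should equal n^(k-1) for n = 1, 10, 100
    print(k, [abs_moment(n, k) for n in (1, 10, 100)])
```

For $k=1$ every entry equals 1, and for $k>1$ the entries grow without bound, so no moment of $X_{n}-0$ tends to 0.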

V. (30 points) Let the population $X$ have density $f(x)=a x^{a-1} I[0 \leq x \leq 1]$, where $a$ is an unknown parameter with parameter space $A=\{a \mid a=1 \text{ or } a=2\}$. Consider the hypotheses

$$H_{0}: a=1 \quad \text{vs} \quad H_{1}: a=2.$$

A simple random sample $x_{1}, x_{2}$ is drawn from the population $X$, and the rejection region of the test is

$$W=\left\{\left(x_{1}, x_{2}\right) \mid x_{1} x_{2}>\frac{1}{2}\right\}.$$

Compute:

(1) (15 points) the probability of a type I error;
(2) (15 points) the power of the test.

Solution:
(1) The probability of a type I error is

$$\begin{aligned} \alpha &=P\left\{x_{1} x_{2}>\frac{1}{2} \mid a=1\right\}=\iint_{x_{1} x_{2}>\frac{1}{2}} f\left(x_{1} \mid a=1\right) f\left(x_{2} \mid a=1\right) d x_{1} d x_{2} \\ &=\iint_{x_{1} x_{2}>\frac{1}{2}} I_{\left\{0 \leqslant x_{1}, x_{2} \leqslant 1\right\}} d x_{1} d x_{2}=\int_{\frac{1}{2}}^{1} d x_{2} \int_{\frac{1}{2 x_{2}}}^{1} d x_{1}=\int_{\frac{1}{2}}^{1}\left(1-\frac{1}{2 x_{2}}\right) d x_{2}=\frac{1}{2}-\frac{1}{2} \ln 2. \end{aligned}$$

(2) The power function is $g(k)=P\left\{\left(x_{1}, x_{2}\right) \in W \mid a=k\right\}$, with
$g(1)=\alpha=\frac{1}{2}-\frac{1}{2} \ln 2$. The power of the test is its value under $H_{1}$:

$$\begin{aligned} g(2) &=\iint_{x_{1} x_{2}>\frac{1}{2}} 4 x_{1} x_{2} I_{\left\{0 \leqslant x_{1}, x_{2} \leqslant 1\right\}} d x_{1} d x_{2}=4 \int_{\frac{1}{2}}^{1} x_{2}\, d x_{2} \int_{\frac{1}{2 x_{2}}}^{1} x_{1}\, d x_{1} \\ &=2 \int_{\frac{1}{2}}^{1} x_{2}\left(1-\frac{1}{4 x_{2}^{2}}\right) d x_{2}=\frac{3}{4}-\frac{1}{2} \ln 2. \end{aligned}$$
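The two integrals can be cross-checked by Monte Carlo. This is a rough sketch, not part of the original solution: it assumes inverse-CDF sampling, using the fact that the distribution function is $x^{a}$ on $[0,1]$ so that $U^{1/a}$ with $U \sim U(0,1)$ has the required density. The function name `rejection_rate`, the replication count and the seeds are illustrative.

```python
import math
import random

def rejection_rate(a, reps, seed=0):
    """Estimate P(x1 * x2 > 1/2) when x1, x2 are i.i.d. with density a * x^(a-1) on [0, 1]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Inverse-CDF sampling: F(x) = x^a, so X = U^(1/a)
        x1 = rng.random() ** (1.0 / a)
        x2 = rng.random() ** (1.0 / a)
        if x1 * x2 > 0.5:
            hits += 1
    return hits / reps

alpha_exact = 0.5 - 0.5 * math.log(2)   # type I error probability from part (1)
power_exact = 0.75 - 0.5 * math.log(2)  # power from part (2)

print("alpha:", alpha_exact, "simulated:", rejection_rate(1, 100000, seed=1))
print("power:", power_exact, "simulated:", rejection_rate(2, 100000, seed=2))
```

Numerically, $\alpha \approx 0.153$ and $g(2) \approx 0.403$, and the simulated rejection rates should land close to these values.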

VI. (20 points) The following sample observations come from a continuous population:

$$\begin{array}{rrrrr} 0.904 & -0.205 & 0.125 & -0.247 & 0.712 \\ 1.634 & -1.107 & -2.668 & -0.224 & 1.481 \\ -0.914 & -0.035 & -0.457 & 0.261 & 1.09 \\ -0.459 & -1.649 & 0.426 & -0.601 & 0.416 \\ 0.908 & & & & \end{array}$$

(1) (10 points) Compute the values of the following statistics: sample mean, sample standard deviation, sample range, sample skewness and kurtosis;
(2) (10 points) Draw a box plot and a histogram of these observations.

Solution:
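(1)

| Maximum | Minimum | Range | Mean | Standard deviation | Skewness | Kurtosis |
|---|---|---|---|---|---|---|
| 1.634 | -2.668 | 4.302 | -0.029 | 1.036 | -0.659 | 0.827 |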
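The summary statistics for part (1) can be recomputed as follows. Skewness and kurtosis have several competing definitions, and the original does not say which one it uses. This sketch computes both the plain moment-based versions and the bias-adjusted sample versions (the convention used by many spreadsheet and statistics packages); the adjusted versions appear to be the ones that match the tabulated values. The function name `describe` and the dictionary keys are choices made here.

```python
import math
import statistics

data = [
    0.904, -0.205, 0.125, -0.247, 0.712,
    1.634, -1.107, -2.668, -0.224, 1.481,
    -0.914, -0.035, -0.457, 0.261, 1.09,
    -0.459, -1.649, 0.426, -0.601, 0.416,
    0.908,
]

def describe(xs):
    """Return basic summary statistics, with two conventions for skewness and kurtosis."""
    n = len(xs)
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs)  # sample standard deviation, divisor n - 1
    # Central moments with divisor n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g1 = m3 / m2 ** 1.5        # moment-based skewness
    g2 = m4 / m2 ** 2 - 3.0    # moment-based excess kurtosis
    # Bias-adjusted sample versions
    adj_skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    adj_kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    return {
        "max": max(xs), "min": min(xs), "range": max(xs) - min(xs),
        "mean": mean, "sd": sd,
        "skew_moment": g1, "kurt_moment": g2,
        "skew_adjusted": adj_skew, "kurt_adjusted": adj_kurt,
    }

summary = describe(data)
for name, value in summary.items():
    print(f"{name:14s} {value: .3f}")
```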

(2) [Box plot and histogram not reproduced here.]
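The quantities that a box plot and a histogram display can be computed without plotting software. The following is a text-only sketch under stated assumptions: quartiles use the "inclusive" interpolation method of Python's `statistics.quantiles` (other conventions give slightly different quartiles), outliers are flagged with the common $1.5 \times \mathrm{IQR}$ rule, and the histogram uses unit-width bins on $[-3, 2)$. None of these choices comes from the original.

```python
import math
import statistics

data = [
    0.904, -0.205, 0.125, -0.247, 0.712,
    1.634, -1.107, -2.668, -0.224, 1.481,
    -0.914, -0.035, -0.457, 0.261, 1.09,
    -0.459, -1.649, 0.426, -0.601, 0.416,
    0.908,
]

def five_number_summary(xs):
    """Return (min, Q1, median, Q3, max) using inclusive quartile interpolation."""
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return min(xs), q1, med, q3, max(xs)

def histogram_counts(xs, left, width, bins):
    """Count observations in `bins` left-closed intervals of equal width starting at `left`."""
    counts = [0] * bins
    for x in xs:
        idx = math.floor((x - left) / width)
        if 0 <= idx < bins:
            counts[idx] += 1
    return counts

lo, q1, med, q3, hi = five_number_summary(data)
iqr = q3 - q1
# Points beyond 1.5 * IQR from the quartiles are drawn separately in a box plot
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
print("box plot:", lo, q1, med, q3, hi, "outliers:", outliers)

counts = histogram_counts(data, left=-3.0, width=1.0, bins=5)
for i, c in enumerate(counts):
    a = -3.0 + i
    print(f"[{a:4.1f}, {a + 1:4.1f}) {'*' * c}")
```

The box spans $Q_1$ to $Q_3$ with a line at the median, and the rows of asterisks give the bar heights of the histogram.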

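[Note] Since calculators are no longer allowed in the exam, questions of this type cannot be expected to appear again. It is still advisable to know the definitions and computational formulas for skewness and kurtosis.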