Fudan University - 432 Statistics - 2019

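1. (15 points) Bag A contains $n-1$ white balls and 1 black ball, and bag B contains $n$ white balls. At each step one ball is drawn from each bag and the two balls are exchanged. Find the probability that the black ball is in bag A after $N$ exchanges.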

Solution:
Let $p_{k}$ denote the probability that the black ball is in bag A after $k$ exchanges, so that $p_{0}=1$. By the law of total probability,

$$p_{k+1}=p_{k} \cdot \frac{n-1}{n}+\left(1-p_{k}\right) \cdot \frac{1}{n}=\left(1-\frac{2}{n}\right) p_{k}+\frac{1}{n},$$

which gives

$$p_{N}-\frac{1}{2}=\left(1-\frac{2}{n}\right)\left(p_{N-1}-\frac{1}{2}\right)=\cdots=\left(1-\frac{2}{n}\right)^{N}\left(p_{0}-\frac{1}{2}\right),$$

and therefore, since $p_{0}=1$,

$$p_{N}=\frac{1}{2}+\frac{1}{2}\left(1-\frac{2}{n}\right)^{N}.$$
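As a quick sanity check on the closed form, the following sketch iterates the recursion exactly with rational arithmetic and compares the result with $\frac{1}{2}+\frac{1}{2}\left(1-\frac{2}{n}\right)^{N}$. The function names `p_recursive` and `p_closed` are illustrative and not part of the original solution.

```python
from fractions import Fraction


def p_recursive(n: int, N: int) -> Fraction:
    """Iterate p_{k+1} = p_k*(n-1)/n + (1-p_k)/n starting from p_0 = 1."""
    p = Fraction(1)
    for _ in range(N):
        p = p * Fraction(n - 1, n) + (1 - p) * Fraction(1, n)
    return p


def p_closed(n: int, N: int) -> Fraction:
    """Closed form 1/2 + (1/2)*(1 - 2/n)^N."""
    return Fraction(1, 2) + Fraction(1, 2) * (1 - Fraction(2, n)) ** N


if __name__ == "__main__":
    # Both columns should agree exactly for every (n, N) pair.
    for n, N in [(2, 3), (5, 7), (10, 20)]:
        print(n, N, p_recursive(n, N), p_closed(n, N))
```

Exact fractions avoid any floating-point doubt about whether the two expressions really coincide.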

2. (15 points) Let $(X, Y)$ have joint density $f(x, y)=A e^{-(2 x+3 y)} I[x>0, y>0]$. Find:
(1) (3 points) $A$;
(2) (3 points) $P(X<2, Y<1)$;
(3) (3 points) the marginal density of $X$;
(4) (3 points) $P(X<3 \mid Y<1)$;
(5) (3 points) $f(x \mid y)$.

Solution:
(1) By the normalization of probability, $1=\int_{\mathbb{R}^{2}} f(x, y)\, d x\, d y=A \int_{0}^{+\infty} e^{-2 x}\, d x \int_{0}^{+\infty} e^{-3 y}\, d y=\frac{A}{6}$, so $A=6$.
(2) $P(X<2, Y<1)=6 \int_{0}^{2} e^{-2 x}\, d x \int_{0}^{1} e^{-3 y}\, d y=6 \cdot \frac{1-e^{-4}}{2} \cdot \frac{1-e^{-3}}{3}=\left(1-e^{-4}\right)\left(1-e^{-3}\right)$.
(3) $f_{X}(x)=\int_{0}^{+\infty} 6 e^{-2 x-3 y}\, d y=2 e^{-2 x}$, $x>0$.
(4) Since $f_{X}(x)=2 e^{-2 x}$ and $f_{Y}(y)=3 e^{-3 y}$ satisfy $f(x, y)=f_{X}(x) f_{Y}(y)$, $X$ and $Y$ are independent, and therefore

$$P(X<3 \mid Y<1)=P(X<3)=1-e^{-6}.$$

(5) $f(x \mid y)=\frac{f(x, y)}{f_{Y}(y)}=2 e^{-2 x}$ for $x>0$, $y>0$.
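The value in part (2) can be checked numerically. The sketch below is only an illustration: it applies a midpoint Riemann sum to the joint density with $A=6$ over $(0,2)\times(0,1)$ and compares the result with $\left(1-e^{-4}\right)\left(1-e^{-3}\right)$. The helper names and the grid size are assumptions of this example.

```python
import math


def joint_density(x: float, y: float, A: float = 6.0) -> float:
    """f(x, y) = A * exp(-(2x + 3y)) on the positive quadrant, 0 elsewhere."""
    return A * math.exp(-(2.0 * x + 3.0 * y)) if x > 0 and y > 0 else 0.0


def rect_probability(x_max: float, y_max: float, steps: int = 400) -> float:
    """Midpoint-rule approximation of P(X < x_max, Y < y_max)."""
    hx, hy = x_max / steps, y_max / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * hx
        for j in range(steps):
            y = (j + 0.5) * hy
            total += joint_density(x, y)
    return total * hx * hy


CLOSED_FORM = (1.0 - math.exp(-4.0)) * (1.0 - math.exp(-3.0))

if __name__ == "__main__":
    print(rect_probability(2.0, 1.0), CLOSED_FORM)
```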

3. (10 points) Let $X_{1}, X_{2}$ be i.i.d. $N\left(\mu, \sigma^{2}\right)$. Find $E \max \left\{X_{1}, X_{2}\right\}$.

Solution:
Let $Y_{i}=\frac{X_{i}-\mu}{\sigma}$, $i=1,2$, so that $Y_{1}, Y_{2}$ are i.i.d. $N(0,1)$ and $\max \left\{Y_{1}, Y_{2}\right\}=\frac{\max \left\{X_{1}, X_{2}\right\}-\mu}{\sigma}$. By symmetry,

$$E \max \left\{Y_{1}, Y_{2}\right\}=E Y_{1} I_{\left[Y_{1} \geq Y_{2}\right]}+E Y_{2} I_{\left[Y_{1}<Y_{2}\right]}=2 E Y_{1} I_{\left[Y_{1} \geq Y_{2}\right]}.$$

Write $y$ for the value of $Y_{1}$ and $x$ for the value of $Y_{2}$, and pass to polar coordinates $x=r \cos \theta$, $y=r \sin \theta$; the region $y \geq x$ corresponds to $\frac{\pi}{4} \leq \theta \leq \frac{5 \pi}{4}$. Then

$$E\left[Y_{1} I_{\left[Y_{1} \geq Y_{2}\right]}\right]=\frac{1}{2 \pi} \int_{-\infty}^{+\infty} \int_{x}^{+\infty} y e^{-\frac{x^{2}+y^{2}}{2}}\, d y\, d x=\frac{1}{2 \pi} \int_{\frac{\pi}{4}}^{\frac{5 \pi}{4}} \sin \theta\, d \theta \int_{0}^{+\infty} r^{2} e^{-\frac{r^{2}}{2}}\, d r,$$

where $\int_{\frac{\pi}{4}}^{\frac{5 \pi}{4}} \sin \theta\, d \theta=\sqrt{2}$ and $\int_{0}^{+\infty} r^{2} e^{-\frac{r^{2}}{2}}\, d r=\sqrt{2} \int_{0}^{+\infty} \sqrt{\frac{r^{2}}{2}} e^{-\frac{r^{2}}{2}}\, d\left(\frac{r^{2}}{2}\right)=\sqrt{2}\, \Gamma\left(\frac{3}{2}\right)=\sqrt{\frac{\pi}{2}}$. Hence $E\left[Y_{1} I_{\left[Y_{1} \geq Y_{2}\right]}\right]=\frac{1}{2 \sqrt{\pi}}$, so $E \max \left\{Y_{1}, Y_{2}\right\}=\frac{1}{\sqrt{\pi}}$ and $E \max \left\{X_{1}, X_{2}\right\}=\mu+\frac{\sigma}{\sqrt{\pi}}$.
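The constant $\frac{1}{\sqrt{\pi}}$ can be cross-checked by a different route from the polar-coordinate argument. The maximum of two independent standard normals has density $2 \varphi(z) \Phi(z)$, so its mean is a one-dimensional integral. The sketch below evaluates that integral with a midpoint rule; the truncation to $[-10,10]$ and the step count are assumptions of this example.

```python
import math


def std_normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_max_std_normal(lo: float = -10.0, hi: float = 10.0,
                            steps: int = 20000) -> float:
    """Midpoint rule for the integral of z * 2*phi(z)*Phi(z) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        z = lo + (k + 0.5) * h
        total += z * 2.0 * std_normal_pdf(z) * std_normal_cdf(z)
    return total * h


def expected_max_normal(mu: float, sigma: float) -> float:
    """E max{X1, X2} for i.i.d. N(mu, sigma^2), by location-scale."""
    return mu + sigma * expected_max_std_normal()


if __name__ == "__main__":
    print(expected_max_std_normal(), 1.0 / math.sqrt(math.pi))
```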

4. (20 points) Let $E X=0$ and $\operatorname{Var}(X)=\sigma^{2}$. Prove that for every $\varepsilon>0$:
(1) (10 points) $P(|X|>\varepsilon) \leq \frac{\sigma^{2}}{\varepsilon^{2}}$;
(2) (10 points) $P(X>\varepsilon) \leq \frac{\sigma^{2}}{\sigma^{2}+\varepsilon^{2}}$.

Solution:
(1) Let $I_{[|X|>\varepsilon]}=\left\{\begin{array}{ll}1, & |X|>\varepsilon, \\ 0, & |X| \leq \varepsilon\end{array}\right.$ be the indicator function of the set $\{\omega:|X(\omega)|>\varepsilon\}$. Clearly

$$I_{[|X|>\varepsilon]} \leq \frac{|X|^{2}}{\varepsilon^{2}},$$

and taking expectations gives

$$P(|X|>\varepsilon)=E I_{[|X|>\varepsilon]} \leq \frac{E|X|^{2}}{\varepsilon^{2}}=\frac{\sigma^{2}}{\varepsilon^{2}}.$$
(2) For any $a>0$ we have $\varepsilon+a>0$, so Markov's inequality applied to $(X+a)^{2}$ gives

$$P(X>\varepsilon)=P(X+a>\varepsilon+a) \leq P\left((X+a)^{2}>(\varepsilon+a)^{2}\right) \leq \frac{E(X+a)^{2}}{(\varepsilon+a)^{2}}=\frac{\sigma^{2}+a^{2}}{(\varepsilon+a)^{2}}.$$

Taking $a=\frac{\sigma^{2}}{\varepsilon}$, we get exactly

$$\frac{\sigma^{2}+a^{2}}{(\varepsilon+a)^{2}}=\frac{\sigma^{2}+\frac{\sigma^{4}}{\varepsilon^{2}}}{\left(\varepsilon+\frac{\sigma^{2}}{\varepsilon}\right)^{2}}=\frac{\sigma^{2}}{\sigma^{2}+\varepsilon^{2}},$$

and therefore $P(X>\varepsilon) \leq \frac{\sigma^{2}}{\sigma^{2}+\varepsilon^{2}}$.

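[Note]: Why take $a=\frac{\sigma^{2}}{\varepsilon}$? Let $g(a)=\frac{\sigma^{2}+a^{2}}{(\varepsilon+a)^{2}}$, viewed as a function of $a>0$. The bound holds for every such $a$, so the sharpest one comes from the minimizer of $g$. Setting $g^{\prime}(a)=0$ gives $a \varepsilon=\sigma^{2}$, so the minimum is attained exactly at $a=\frac{\sigma^{2}}{\varepsilon}$.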
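The choice of $a$ can also be checked numerically. The sketch below evaluates $g(a)$ on a grid and confirms that no grid point beats $a=\frac{\sigma^{2}}{\varepsilon}$, where $g$ equals $\frac{\sigma^{2}}{\sigma^{2}+\varepsilon^{2}}$. The function names and the test values of $\sigma^{2}$ and $\varepsilon$ are arbitrary choices for illustration.

```python
def shifted_markov_bound(a: float, sigma2: float, eps: float) -> float:
    """g(a) = (sigma^2 + a^2) / (eps + a)^2, the bound obtained with shift a."""
    return (sigma2 + a * a) / (eps + a) ** 2


def best_shift(sigma2: float, eps: float) -> float:
    """Stationary point of g, obtained from g'(a) = 0."""
    return sigma2 / eps


if __name__ == "__main__":
    sigma2, eps = 2.0, 0.5
    a_star = best_shift(sigma2, eps)
    grid = [0.01 * k for k in range(1, 2001)]  # a in (0, 20]
    smallest_on_grid = min(shifted_markov_bound(a, sigma2, eps) for a in grid)
    print(shifted_markov_bound(a_star, sigma2, eps), smallest_on_grid,
          sigma2 / (sigma2 + eps ** 2))
```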

5. (10 points) Let $X_{1}, X_{2}$ be i.i.d. $N(0,1)$. Find the distribution of $\frac{X_{1}}{X_{2}}$.

Solution: This question repeats one examined before; the solution is omitted.
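Although the written solution is omitted, the standard result is that the ratio of two independent standard normals follows the standard Cauchy distribution, with distribution function $\frac{1}{2}+\frac{1}{\pi} \arctan t$. The simulation below is a rough check of that fact and is not part of the original solution; the sample size and seed are arbitrary.

```python
import math
import random


def empirical_ratio_cdf(t: float, samples: int = 100_000, seed: int = 0) -> float:
    """Estimate P(X1/X2 <= t) for independent standard normals X1, X2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        # X1/X2 <= t, rewritten without division so x2 == 0 is harmless.
        if (x2 > 0 and x1 <= t * x2) or (x2 < 0 and x1 >= t * x2):
            hits += 1
    return hits / samples


def cauchy_cdf(t: float) -> float:
    """Standard Cauchy distribution function."""
    return 0.5 + math.atan(t) / math.pi


if __name__ == "__main__":
    for t in (-2.0, -0.5, 0.0, 1.0, 3.0):
        print(t, empirical_ratio_cdf(t), cauchy_cdf(t))
```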

6. (20 points) Let $X_{1}, X_{2}, \ldots, X_{n}$ be i.i.d. $N\left(\mu, \sigma^{2}\right)$ with $\mu$ known. Prove:
(1) (10 points) $\frac{1}{n} \sum_{i=1}^{n}\left(X_{i}-\mu\right)^{2}$ is an efficient estimator of $\sigma^{2}$;
(2) (10 points) $\frac{1}{n} \sqrt{\frac{\pi}{2}} \sum_{i=1}^{n}\left|X_{i}-\mu\right|$ is an unbiased estimator of $\sigma$, but it is not efficient.

Solution:
(1) First compute the Fisher information for $\sigma^{2}$. By definition,

$$I\left(\sigma^{2}\right)=E\left[\frac{\partial \ln f\left(X ; \sigma^{2}\right)}{\partial \sigma^{2}}\right]^{2}=\frac{1}{4 \sigma^{4}} E\left[\left(\frac{X-\mu}{\sigma}\right)^{2}-1\right]^{2}.$$

Here $E\left[\left(\frac{X-\mu}{\sigma}\right)^{2}-1\right]^{2}$ is exactly the variance of $\chi^{2}(1)$, which equals 2, so $I\left(\sigma^{2}\right)=\frac{1}{2 \sigma^{4}}$. The C-R lower bound for unbiased estimators of $\sigma^{2}$ is therefore $\frac{1}{n I\left(\sigma^{2}\right)}=\frac{2}{n} \sigma^{4}$. Next compute the mean and variance of $\frac{1}{n} \sum_{i=1}^{n}\left(X_{i}-\mu\right)^{2}$. Since $\frac{1}{\sigma^{2}} \sum_{i=1}^{n}\left(X_{i}-\mu\right)^{2} \sim \chi^{2}(n)$ has mean $n$ and variance $2 n$,

$$E\left[\frac{1}{n} \sum_{i=1}^{n}\left(X_{i}-\mu\right)^{2}\right]=\sigma^{2}, \quad \operatorname{Var}\left[\frac{1}{n} \sum_{i=1}^{n}\left(X_{i}-\mu\right)^{2}\right]=\frac{2}{n} \sigma^{4}.$$

The estimator is unbiased and its variance attains the C-R lower bound, so it is efficient.
(2) First, with $g(x)=\sqrt{x}$, the C-R lower bound for unbiased estimators of $\sigma$ is $\frac{\left[g^{\prime}\left(\sigma^{2}\right)\right]^{2}}{n I\left(\sigma^{2}\right)}=\frac{1}{2 n} \sigma^{2}$. Next compute the mean and variance of $\frac{1}{n} \sqrt{\frac{\pi}{2}} \sum_{i=1}^{n}\left|X_{i}-\mu\right|$:

$$\frac{1}{\sigma} E\left|X_{1}-\mu\right|=\int_{-\infty}^{+\infty}|x| \frac{1}{\sqrt{2 \pi}} e^{-\frac{x^{2}}{2}}\, d x=\sqrt{\frac{2}{\pi}} \int_{0}^{+\infty} x e^{-\frac{x^{2}}{2}}\, d x=\sqrt{\frac{2}{\pi}} \Rightarrow E\left|X_{1}-\mu\right|=\sqrt{\frac{2}{\pi}}\, \sigma,$$

$$E\left|X_{1}-\mu\right|^{2}=\sigma^{2}, \text{ so } \operatorname{Var}\left(\left|X_{1}-\mu\right|\right)=\sigma^{2}-\frac{2}{\pi} \sigma^{2}=\left(1-\frac{2}{\pi}\right) \sigma^{2},$$

$$E\left[\frac{1}{n} \sqrt{\frac{\pi}{2}} \sum_{i=1}^{n}\left|X_{i}-\mu\right|\right]=\sigma, \quad \operatorname{Var}\left[\frac{1}{n} \sqrt{\frac{\pi}{2}} \sum_{i=1}^{n}\left|X_{i}-\mu\right|\right]=\frac{\left(\frac{\pi}{2}-1\right)}{n} \sigma^{2}.$$

The estimator is unbiased, but since $\frac{\pi}{2}-1 \approx 0.571>\frac{1}{2}$ its variance is strictly larger than the C-R lower bound, so it is not efficient.
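To make the gap concrete, the short sketch below compares the variance of the absolute-deviation estimator with the C-R bound for $\sigma$. Their ratio does not depend on $n$ or $\sigma$ and equals $\frac{1}{\pi-2}$. The function names are illustrative only.

```python
import math


def cr_bound_sigma(sigma: float, n: int) -> float:
    """C-R lower bound sigma^2 / (2n) for unbiased estimators of sigma."""
    return sigma ** 2 / (2.0 * n)


def var_abs_estimator(sigma: float, n: int) -> float:
    """Variance (pi/2 - 1) * sigma^2 / n of the absolute-deviation estimator."""
    return (math.pi / 2.0 - 1.0) * sigma ** 2 / n


def efficiency(n: int, sigma: float = 1.0) -> float:
    """Ratio of the C-R bound to the estimator's variance (1 would be efficient)."""
    return cr_bound_sigma(sigma, n) / var_abs_estimator(sigma, n)


if __name__ == "__main__":
    print(efficiency(10), 1.0 / (math.pi - 2.0))
```

The ratio is below 1 for every sample size, which is exactly the statement that the bound is never attained.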

7. (20 points) Suppose the population distribution function $F(x)$ is continuous and strictly increasing, $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ are the order statistics of a random sample from this population, and $Y_{i}=F\left(X_{(i)}\right)$. Find:
(1) (10 points) $E Y_{i}$ and $\operatorname{Var}\left(Y_{i}\right)$;
(2) (10 points) the covariance matrix of $\left(Y_{1}, Y_{2}, \ldots, Y_{n}\right)^{T}$.

Solution:
(1) Since $U_{1}=F\left(X_{1}\right), U_{2}=F\left(X_{2}\right), \cdots, U_{n}=F\left(X_{n}\right)$ are i.i.d. uniform on $[0,1]$ and $F$ is increasing, $Y_{i}=U_{(i)}$ is exactly the $i$-th order statistic of a uniform sample. By the density formula for order statistics,

$$f_{i}(y)=\frac{n !}{(i-1) !(n-i) !} f_{U}(y) P^{i-1}\{U \leq y\} P^{n-i}\{U \geq y\},$$

and substituting the uniform density and distribution function gives

$$f_{i}(y)=\frac{n !}{(i-1) !(n-i) !} y^{i-1}(1-y)^{n-i}, \quad 0 \leq y \leq 1,$$

which is the $\operatorname{Beta}(i, n-i+1)$ density. By the properties of the beta distribution, $E Y_{i}=\frac{i}{n+1}$ and $\operatorname{Var}\left(Y_{i}\right)=\frac{i(n-i+1)}{(n+1)^{2}(n+2)}$.
(2) Consider the joint distribution of $\left(Y_{i}, Y_{j}\right)$ for $i<j$. By the joint density formula for order statistics,

$$f_{i, j}(x, y)=\frac{n !}{(i-1) !(j-i-1) !(n-j) !} f_{U}(x) f_{U}(y) P^{i-1}\{U \leq x\} P^{j-i-1}\{x \leq U \leq y\} P^{n-j}\{U \geq y\},$$

that is,

$$f_{i, j}(x, y)=\frac{n !}{(i-1) !(j-i-1) !(n-j) !} x^{i-1}(y-x)^{j-i-1}(1-y)^{n-j}, \quad 0 \leq x \leq y \leq 1.$$

The covariance therefore reduces to the integral $\int_{0}^{1} \int_{0}^{y} x y \cdot x^{i-1}(y-x)^{j-i-1}(1-y)^{n-j}\, d x\, d y$. Writing $y=x+(y-x)$ splits this integral into two parts:

$$\begin{gathered} \int_{0}^{1} \int_{0}^{y} x^{i+1}(y-x)^{j-i-1}(1-y)^{n-j}\, d x\, d y=\frac{(i+1) !(j-i-1) !(n-j) !}{(n+2) !}, \\ \int_{0}^{1} \int_{0}^{y} x^{i}(y-x)^{j-i}(1-y)^{n-j}\, d x\, d y=\frac{i !(j-i) !(n-j) !}{(n+2) !}. \end{gathered}$$

(To see this, compare with the expression for $f_{i, j}(x, y)$: the function

$$\frac{(n+2) !}{(i+1) !(j-i-1) !(n-j) !} x^{i+1}(y-x)^{j-i-1}(1-y)^{n-j}$$

is again a density, namely the joint density of the $(i+2)$-th and $(j+2)$-th order statistics of $n+2$ uniform variables, and the second integrand is handled in the same way. Both integrals then follow from the normalization of probability.) Hence $E\left[Y_{i} Y_{j}\right]=\frac{i(i+1)+i(j-i)}{(n+1)(n+2)}=\frac{i(j+1)}{(n+1)(n+2)}$, and $\operatorname{Cov}\left(Y_{i}, Y_{j}\right)=E\left[Y_{i} Y_{j}\right]-\frac{i}{n+1} \cdot \frac{j}{n+1}=\frac{i(n+1-j)}{(n+1)^{2}(n+2)}$. The covariance matrix is therefore the symmetric matrix $\Sigma=\left(a_{i j}\right)_{n \times n}$ with $a_{i j}=\frac{i(n+1-j)}{(n+1)^{2}(n+2)}$ for $i \leq j$; the case $i=j$ recovers $\operatorname{Var}\left(Y_{i}\right)$ from part (1).
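The factorial bookkeeping is easy to get wrong, so here is a small exact check. It rebuilds $E\left[Y_{i} Y_{j}\right]$ from the two integral values using rational arithmetic and compares the result with the closed forms. The helper names `mean_product` and `covariance_matrix` are introduced only for this illustration.

```python
from fractions import Fraction
from math import factorial


def mean_product(n: int, i: int, j: int) -> Fraction:
    """E[Y_i Y_j] for i < j, assembled from the two integral values."""
    const = Fraction(factorial(n),
                     factorial(i - 1) * factorial(j - i - 1) * factorial(n - j))
    part1 = Fraction(factorial(i + 1) * factorial(j - i - 1) * factorial(n - j),
                     factorial(n + 2))
    part2 = Fraction(factorial(i) * factorial(j - i) * factorial(n - j),
                     factorial(n + 2))
    return const * (part1 + part2)


def covariance_matrix(n: int) -> list:
    """Symmetric matrix with entries min(i,j)*(n+1-max(i,j)) / ((n+1)^2 (n+2))."""
    denom = (n + 1) ** 2 * (n + 2)
    return [[Fraction(min(i, j) * (n + 1 - max(i, j)), denom)
             for j in range(1, n + 1)]
            for i in range(1, n + 1)]


if __name__ == "__main__":
    for row in covariance_matrix(3):
        print([str(entry) for entry in row])
```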

8. (20 points) Let $X_1,\cdots,X_n$ be a random sample from $U(\theta,2\theta)$. Find the moment estimator and the MLE of $\theta$, and examine their unbiasedness and consistency.

Solution:
First the moment estimator. Since $EX_1=\frac{3\theta}{2}$, the substitution principle gives $\hat{\theta}_M=\frac{2}{3}\bar{X}$. It is unbiased because $E\hat{\theta}_M=\frac{2}{3}\cdot\frac{3\theta}{2}=\theta$, and by the strong law of large numbers it is strongly consistent.
Next the MLE. The likelihood function is

$$L\left( \theta \right) =\frac{1}{\theta ^n}I_{\left\{ X_{\left( n \right)}<2\theta \right\}}I_{\left\{ X_{\left( 1 \right)}>\theta \right\}}=\frac{I_{\left\{ \frac{X_{\left( n \right)}}{2}<\theta <X_{\left( 1 \right)} \right\}}}{\theta ^n}.$$

Since $\frac{1}{\theta^n}$ is decreasing in $\theta$, the likelihood is maximized by taking $\theta$ as small as the constraint allows, so $\hat{\theta}_L=\frac{X_{(n)}}{2}$ (with the convention that the support is closed, so that this boundary value is attained). Using the transformation $Y_i=\frac{X_i-\theta}{\theta}\sim U(0,1)$, we get $Y_{(n)}=\frac{X_{(n)}-\theta}{\theta}\sim \operatorname{Beta}(n,1)$, and hence

$$E\left( Y_{\left( n \right)} \right) =\frac{n}{n+1},\quad E\left( \hat{\theta}_L \right) =\frac{1}{2}E\left( X_{\left( n \right)} \right) =\frac{1}{2}\left[ \theta E\left( Y_{\left( n \right)} \right) +\theta \right] =\frac{2n+1}{2n+2}\theta .$$

So $\hat{\theta}_L$ is biased but asymptotically unbiased. For consistency, fix $\varepsilon_1>0$ and set $\varepsilon_2=2\varepsilon_1$ and $\varepsilon_3=\frac{\varepsilon_2}{\theta}$, with $\varepsilon_1$ small enough that $\varepsilon_3<1$. Then

$$\begin{aligned} P\left( \left| \hat{\theta}_L-\theta \right|>\varepsilon _1 \right) &=P\left( \left| X_{\left( n \right)}-2\theta \right|>\varepsilon _2 \right)\\ &=P\left( \left| Y_{\left( n \right)}-1 \right|>\varepsilon _3 \right)\\ &=P\left( Y_{\left( n \right)}<1-\varepsilon _3 \right)\\ &=\left( 1-\varepsilon _3 \right) ^n. \end{aligned}$$

The series $\sum_{n}\left(1-\varepsilon_3\right)^n$ converges, so by the Borel-Cantelli lemma $\hat{\theta}_L$ is strongly consistent.
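A small Monte Carlo experiment illustrates the two means: the moment estimator should average close to $\theta$, and the MLE close to $\frac{2n+1}{2n+2}\theta$. This is only a sketch; the parameter values, seed, and replication count are arbitrary choices and not part of the original solution.

```python
import random


def simulate_estimators(theta: float, n: int, reps: int = 20_000, seed: int = 0):
    """Return Monte Carlo means of (moment estimator, MLE) for U(theta, 2*theta)."""
    rng = random.Random(seed)
    sum_moment = 0.0
    sum_mle = 0.0
    for _ in range(reps):
        xs = [rng.uniform(theta, 2.0 * theta) for _ in range(n)]
        sum_moment += (2.0 / 3.0) * sum(xs) / n  # (2/3) * sample mean
        sum_mle += max(xs) / 2.0                 # X_(n) / 2
    return sum_moment / reps, sum_mle / reps


if __name__ == "__main__":
    theta, n = 2.0, 20
    mean_moment, mean_mle = simulate_estimators(theta, n)
    print(mean_moment, theta)
    print(mean_mle, (2 * n + 1) / (2 * n + 2) * theta)
```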

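9. (20 points) Let $X_1,\cdots,X_n$ be a random sample from $N(\mu,1)$, and consider the hypothesis testing problem

$$H_0:\mu = 1 \quad \mathrm{vs} \quad H_1:\mu=2.$$

Given the rejection region $W=\{\bar{X}>1.6\}$, answer the following:
(1) (10 points) For $n=10$, find the probabilities $\alpha$ and $\beta$ of the two types of error;
(2) (10 points) If the type II error probability must satisfy $\beta\le0.01$, find the range of admissible sample sizes.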

Solution:
(1) First the type I error. When the null hypothesis is true, $\bar{X}\sim N(1,\frac{1}{n})$, so

$$\begin{aligned} \alpha &=P_{\mu =1}\left( \bar{X}>1.6 \right)\\ &=P_{\mu =1}\left( \sqrt{n}\left( \bar{X}-1 \right) >0.6\sqrt{n} \right)\\ &=1-\Phi \left( 0.6\sqrt{n} \right)\\ &=1-\Phi \left( 0.6\sqrt{10} \right) . \end{aligned}$$

Next the type II error. When the alternative hypothesis is true, $\bar{X}\sim N(2,\frac{1}{n})$, so

$$\begin{aligned} \beta &=P_{\mu =2}\left( \bar{X}\le 1.6 \right)\\ &=P_{\mu =2}\left( \sqrt{n}\left( \bar{X}-2 \right) \le -0.4\sqrt{n} \right)\\ &=\Phi \left( -0.4\sqrt{n} \right)\\ &=\Phi \left( -0.4\sqrt{10} \right) . \end{aligned}$$

(2) From the computation in part (1), we require $\Phi \left( -0.4\sqrt{n} \right) \le 0.01$. Since $\Phi(-2.33)\approx 0.01$, this gives

$$-0.4\sqrt{n}\le -2.33\Longrightarrow n\ge \frac{2.33^2}{0.4^2}\approx 33.9\Longrightarrow n\ge 34.$$
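The error probabilities and the sample-size threshold can be evaluated numerically with the standard library's `statistics.NormalDist`. The sketch below uses the exact normal distribution function rather than the rounded table value 2.33; the function names and default arguments are illustrative.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def error_probabilities(n: int, cutoff: float = 1.6,
                        mu0: float = 1.0, mu1: float = 2.0):
    """Return (alpha, beta) for the rejection region {sample mean > cutoff}."""
    alpha = 1.0 - STD_NORMAL.cdf(sqrt(n) * (cutoff - mu0))
    beta = STD_NORMAL.cdf(sqrt(n) * (cutoff - mu1))
    return alpha, beta


def min_sample_size(beta_max: float = 0.01) -> int:
    """Smallest n whose type II error probability is at most beta_max."""
    n = 1
    while error_probabilities(n)[1] > beta_max:
        n += 1
    return n


if __name__ == "__main__":
    print(error_probabilities(10))
    print(min_sample_size())
```

Because $\beta$ decreases as $n$ grows, every sample size at or above the returned value also satisfies the requirement.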