東京大学 新領域創成科学研究科 メディカル情報生命専攻 2018年8月実施 問題11
Author
Description
Suppose that a sequence \(\mathbf{x} = x_1 x_2 \cdots x_n\) is generated from a first-order stationary Markov model that has initial probabilities \(\{\pi_k\}\) and transition probabilities \(\{a_{ij}\}\) as follows.
Initial probabilities:
Transition probabilities:
Note that \(P(X)\) is the probability that \(X\) is true, and \(P(X \mid Y)\) is the conditional probability that \(X\) is true when \(Y\) is true.
Answer the following questions:
(1) Assume \(n = 4\). Show the probability that \(101\) is included as a continuous substring in \(\mathbf{x}\). Show also the probability that \(111\) is included as a continuous substring in \(\mathbf{x}\).
(2) Assume \(n = 4\). Show the expected number of 1s in \(\mathbf{x}\) when \(101\) is included as a continuous substring in \(\mathbf{x}\).
In the following questions, use the following transition probabilities.
(3) Calculate the expected proportion of 1s in \(\mathbf{x}\) when \(n \to \infty\).
(4) Suppose that \(n = 2m\) (\(m\) is a positive integer). When every two letters of \(\mathbf{x} = x_1 x_2 \cdots x_{2m}\) is converted to a letter of \(\mathbf{y} = y_1 \cdots y_m\) using the following rule, calculate the expected proportions of \(a, c, g, t\) in \(\mathbf{y}\) when \(m \to \infty\).
假设序列 \(\mathbf{x} = x_1 x_2 \cdots x_n\) 是从一个一阶平稳马尔可夫模型生成的,该模型具有如下初始概率 \(\{\pi_k\}\) 和转移概率 \(\{a_{ij}\}\)。
初始概率:
转移概率:
注意 \(P(X)\) 是 \(X\) 为真的概率,\(P(X \mid Y)\) 是 \(X\) 在 \(Y\) 为真时的条件概率。
回答以下问题:
(1) 假设 \(n = 4\)。证明在 \(\mathbf{x}\) 中包含 \(101\) 作为连续子串的概率。也证明在 \(\mathbf{x}\) 中包含 \(111\) 作为连续子串的概率。
(2) 假设 \(n = 4\)。证明在 \(\mathbf{x}\) 中包含 \(101\) 作为连续子串时 \(\mathbf{x}\) 中 1 的期望数量。
在以下问题中,使用以下转移概率。
(3) 计算当 \(n \to \infty\) 时 \(\mathbf{x}\) 中 1 的期望比例。
(4) 假设 \(n = 2m\) (\(m\) 是一个正整数)。当 \(\mathbf{x} = x_1 x_2 \cdots x_{2m}\) 的每两个字母转换为 \(\mathbf{y} = y_1 \cdots y_m\) 的一个字母时,使用以下规则,计算 \(\mathbf{y}\) 中 \(a, c, g, t\) 的期望比例,当 \(m \to \infty\) 时。
Kai
(1)
To determine the probability of specific substrings occurring within a Markov sequence, we need to calculate the joint probabilities of these substrings appearing in the sequence.
Probability of \(101\) being included
We consider all possible sequences of length 4 that include \(101\) as a continuous substring:
- \(0101\)
- \(1010\)
- \(1011\)
- \(1101\)
Calculate the probabilities of these sequences:
- For \(0101\):
- For \(1010\):
- For \(1011\):
- For \(1101\):
The probability of \(101\) being included is the sum of these probabilities:
Probability of \(111\) being included
We consider the sequences of length 4 that include \(111\) as a continuous substring:
- \(0111\)
- \(1110\)
- \(1111\)
Calculate the probabilities of these sequences:
- For \(0111\):
- For \(1110\):
- For \(1111\):
The probability of \(111\) being included is the sum of these probabilities:
(2)
Given that we need to find the expected number of 1s in the sequence \(\mathbf{x} = x_1 x_2 x_3 x_4\) when the substring \(101\) is included, we need to consider the sequences that include \(101\).
The sequences of length 4 that include \(101\) as a continuous substring are:
- \(0101\)
- \(1010\)
- \(1011\)
- \(1101\)
Let \(P(0101)\), \(P(1010)\), \(P(1011)\), and \(P(1101)\) be the probabilities of these sequences, respectively.
We will calculate the expected number of 1s by averaging the number of 1s in these sequences, weighted by their probabilities.
- For \(0101\):
- For \(1010\):
- For \(1011\):
- For \(1101\):
The expected number of 1s, \(E[\#1s]\), is given by:
Where \(P(101)\) is the total probability of having the substring \(101\):
Thus, the expected number of 1s is:
Using the given probabilities:
Substitute these probabilities into the expression for the expected number of 1s:
(3)
To find the steady-state probabilities \(\pi_0\) and \(\pi_1\), we solve the following system:
Substitute the given values:
Solving the system:
The expected proportion of 1s is:
(4)
Given the rules for \(y_i\):
- \(a\): \(x_{2i-1} = 0\) and \(x_{2i} = 0\)
- \(c\): \(x_{2i-1} = 0\) and \(x_{2i} = 1\)
- \(g\): \(x_{2i-1} = 1\) and \(x_{2i} = 0\)
- \(t\): \(x_{2i-1} = 1\) and \(x_{2i} = 1\)
Calculate the steady-state probabilities for pairs:
The expected proportions are:
- \(a\): \(P(00) = 0.48\)
- \(c\): \(P(01) = 0.12\)
- \(g\): \(P(10) = 0.12\)
- \(t\): \(P(11) = 0.28\)
Knowledge
概率计算 马尔可夫链 转移概率 随机过程
重点词汇
- Markov Model 马尔可夫模型
- Transition Probability 转移概率
- Steady-State 稳态
参考资料
- "Markov Chains: From Theory to Implementation and Experimentation" by Paul A. Gagniuc
- "Introduction to Probability Models" by Sheldon M. Ross