東京大学 新領域創成科学研究科 メディカル情報生命専攻 2014年8月実施 問題12
Author
Description
Suppose that the maximum score of the global alignment of the two sequences,
Here we assume that
We also assume that the same score of
Answer the following problems.
(1) Show the general form of
(2) In order to get the maximum score of the alignment as
for for
You may suppose that
(3) For the alignment of biological sequences, usually we set
(4) In order to treat the score of long gaps more accurately, let us consider alignments using the following iterations. Show the general form of the gap score of length
假设通过使用以下迭代 (A) 计算两个序列
这里我们假设
我们还假设对于任意长度为
回答以下问题。
(1) 展示
(2) 为了在使用 (A) 迭代后获得最大比对得分
对于 对于
你可以假设
(3) 对于生物序列的比对,通常我们设置
(4) 为了更准确地处理长间隙的得分,让我们考虑使用以下迭代进行比对。展示当
Kai
(1)
To find the general form of
For a gap of length
Explanation:
- The first gap opening costs
- Each subsequent gap extension costs
- For a gap of length
, we have 1 opening and extensions
Therefore, the general form of
(2)
Given the initial conditions:
We need to determine
For
Therefore,
For
Therefore,
(3)
In biological sequence alignment, we usually set
-
Biological relevance: In evolutionary processes, it's more likely for a single mutation event to cause a contiguous gap (insertion or deletion) of multiple residues than for multiple independent single-residue indel events to occur adjacently. Setting
reflects this biological reality. -
Alignment quality: This setting encourages fewer, longer gaps rather than many short gaps in the alignment. This often leads to more biologically meaningful alignments, as it better represents the evolutionary events that may have occurred.
-
Computational efficiency: By discouraging the introduction of many small gaps, this setting can reduce the search space for optimal alignments, potentially improving the efficiency of alignment algorithms.
(4)
For the more complex model with two sets of gap penalties
where
To find
Solving for
Note that
This more complex model allows for a more nuanced treatment of gap penalties, potentially leading to more accurate alignments, especially for sequences with varying gap length distributions.
Knowledge
动态规划 序列比对 全局比对 生物信息学
难点思路
- 理解动态规划在序列比对中的应用
- 分析递推公式以推导空位罚分函数
- 理解不同空位罚分模型的生物学意义
- 处理更复杂的空位罚分模型,并推导其一般形式
解题技巧和信息
- 分析递推公式时,关注 gap opening 和 gap extension 的区别
- 初始值的设置对于动态规划算法的正确性至关重要
- 在生物序列比对中,参数的设置常常基于生物学知识和经验
- 处理分段函数时,寻找分界点(break-even point)是关键
重点词汇
- Global alignment: 全局比对
- Dynamic programming: 动态规划
- Gap penalty: 空位罚分
- Gap opening penalty: 空位开放罚分
- Gap extension penalty: 空位延伸罚分
- Sequence alignment: 序列比对
- Affine gap penalty: 仿射空位罚分
参考资料
- Gusfield, D. (1997). Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, Chap. 3.