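The University of Tokyo, Graduate School of Frontier Sciences, Department of Computational Biology and Medical Sciences: August 2015 Examination, Problem 12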

Author

Description

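Assume that a global alignment of two sequences, $x = x_1 x_2 \cdots x_m$ and $y = y_1 y_2 \cdots y_n$, is calculated by dynamic programming using the iterative formula (A).

$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) - d \\ F(i,j-1) - d \end{cases} \qquad \text{(A)}$$

where $s(x_i, y_j)$ is the score of aligning $x_i$ and $y_j$, and $F(i,j)$ is the maximum score of the alignments of $x_1 \cdots x_i$ and $y_1 \cdots y_j$. Assume that $d > 0$ and that score $\gamma(g)$ is given to a gap of length $g$. Answer the following questions (1) – (5).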
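
As a concrete reading of formula (A), the following minimal Python sketch computes one cell $F(i,j)$ from its three neighbors. It assumes the notation $F$, $s$, $d$ of this problem; the function `update_cell`, the +1/-1 scoring `simple_score` and the gap penalty $d = 2$ are hypothetical choices for illustration. The boundary cells are filled with $F(i,0) = -id$ and $F(0,j) = -jd$.

```python
from typing import Callable, List


def update_cell(
    F: List[List[float]],
    x: str,
    y: str,
    i: int,
    j: int,
    s: Callable[[str, str], float],
    d: float,
) -> float:
    """Return F(i, j) by formula (A), for 1 <= i <= len(x) and 1 <= j <= len(y).

    Python strings are 0-indexed, so x_i (1-indexed in the text) is x[i - 1].
    """
    diagonal = F[i - 1][j - 1] + s(x[i - 1], y[j - 1])  # align x_i with y_j
    up = F[i - 1][j] - d                                # align x_i with a gap
    left = F[i][j - 1] - d                              # align y_j with a gap
    return max(diagonal, up, left)


# Hypothetical scoring for illustration only: +1 for a match, -1 for a mismatch.
def simple_score(a: str, b: str) -> float:
    return 1 if a == b else -1


# 2 x 2 table for x = "A", y = "A" with d = 2: boundary values 0, -2 and -2.
F = [[0, -2],
     [-2, 0]]
F[1][1] = update_cell(F, "A", "A", 1, 1, simple_score, 2)
print(F[1][1])  # 1
```

Here the diagonal term wins ($0 + 1 = 1$ against $-2 - 2 = -4$ for both gap terms), so the single matching pair is aligned.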

(1) Show the general form of $\gamma(g)$.

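(2) Show the initial values $F(i,0)$ for $i = 1, \ldots, m$ and $F(0,j)$ for $j = 1, \ldots, n$, so that the maximum score of the alignments of the two sequences $x$ and $y$ is obtained as $F(m,n)$ after updating the iterative formula (A) for $i = 1, \ldots, m$ and $j = 1, \ldots, n$. Notice that $F(0,0) = 0$.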

(3) Evaluate the computational time of calculating the maximum score of the alignments of the two sequences $x$ and $y$, and describe it using $m$ and $n$.

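(4) When updating formula (A) for $i$ and $j$, $T(i,j)$ is defined as follows:

Among the values of $F(i-1,j-1) + s(x_i, y_j)$, $F(i-1,j) - d$ and $F(i,j-1) - d$, when $F(i-1,j-1) + s(x_i, y_j)$ is the maximum, $T(i,j)$ takes the value identifying the first term; otherwise, when $F(i-1,j) - d$ is the maximum, it takes the value identifying the second term; otherwise, it takes the value identifying the third term. Briefly explain, within five lines, the role of $T(i,j)$ in the alignment algorithm.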


Kai

(1)

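The general form of $\gamma(g)$, the score given to a gap of length $g$, is the linear gap score $\gamma(g) = -gd$, where $d > 0$ is the penalty for each gap position. Formula (A) subtracts $d$ every time a character is aligned with a gap, so a gap of length $g$ accumulates $g$ such penalties: the cost grows linearly with the gap length, and opening a gap costs no more than extending one.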
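
The linear form follows from a one-line derivation: in formula (A), each character aligned with a gap subtracts $d$ once, and the score of an alignment is the sum of the contributions of its steps, so

```latex
\[
\gamma(g) = \sum_{k=1}^{g} (-d) = -gd,
\qquad
\gamma(g_1 + g_2) = -(g_1 + g_2)\,d = \gamma(g_1) + \gamma(g_2).
\]
```

The additivity means a gap of length $g_1 + g_2$ is scored exactly like two adjacent gaps of lengths $g_1$ and $g_2$, which is why formula (A) needs no extra state to remember whether a gap is already open.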

(2)

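The initial values follow from the gap penalty: aligning the first characters of one sequence entirely with gaps costs $d$ per character. Specifically:

$$F(i,0) = -id \quad (i = 1, \ldots, m), \qquad F(0,j) = -jd \quad (j = 1, \ldots, n)$$

This initialization is the cumulative gap score $\gamma(i) = -id$ (or $\gamma(j) = -jd$) for aligning $x_1 \cdots x_i$ (or $y_1 \cdots y_j$) with a single gap, and it is consistent with $F(0,0) = 0$.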
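
A minimal Python sketch of this initialization, assuming a list-of-lists table indexed as `F[i][j]`; the function name `init_boundary` and the values $m = 3$, $n = 2$, $d = 2$ are hypothetical choices for illustration.

```python
from typing import List


def init_boundary(m: int, n: int, d: int) -> List[List[int]]:
    """Create an (m+1) x (n+1) table holding the boundary values of question (2).

    F[i][0] = -i*d and F[0][j] = -j*d; interior cells are placeholders (0)
    that formula (A) overwrites later.
    """
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = -i * d  # x_1..x_i aligned entirely with gaps
    for j in range(1, n + 1):
        F[0][j] = -j * d  # y_1..y_j aligned entirely with gaps
    return F  # F[0][0] stays 0


F = init_boundary(3, 2, 2)
print([row[0] for row in F])  # [0, -2, -4, -6]
print(F[0])                   # [0, -2, -4]
```

Starting with $F(0,0) = 0$ and letting each boundary step subtract $d$ gives the same values as applying only the gap terms of formula (A) along the first row and column.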

(3)

Calculating the maximum score requires filling in the $(m+1) \times (n+1)$ matrix $F$. Each cell is computed from three previously computed neighbors using formula (A), which takes constant time $O(1)$. Since $mn$ cells are updated by formula (A) (the boundary row and column take only $O(m+n)$ time to initialize), the overall time complexity is:

$$O(mn)$$
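
The $O(mn)$ bound can be illustrated with a minimal Python sketch that fills the whole table and counts how many times formula (A) is applied. The function name `global_alignment_score`, the +1/-1 scoring `simple_score` and $d = 2$ are hypothetical choices; only the recurrence and the boundary values come from the problem.

```python
from typing import Callable, Tuple


def global_alignment_score(
    x: str, y: str, s: Callable[[str, str], int], d: int
) -> Tuple[int, int]:
    """Return (F(m, n), number of formula-(A) updates) for a linear gap penalty d."""
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):  # boundary column: O(m)
        F[i][0] = -i * d
    for j in range(1, n + 1):  # boundary row: O(n)
        F[0][j] = -j * d
    updates = 0
    # Two nested loops over i = 1..m and j = 1..n, each iteration doing O(1) work,
    # so this part dominates and the total running time is O(mn).
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                F[i - 1][j] - d,
                F[i][j - 1] - d,
            )
            updates += 1
    return F[m][n], updates


def simple_score(a: str, b: str) -> int:
    return 1 if a == b else -1  # hypothetical +1/-1 scoring


score, updates = global_alignment_score("GATTACA", "GCATGCA", simple_score, 2)
print(updates)  # 49
```

With two sequences of length 7 the count is $7 \times 7 = 49$; doubling both lengths would quadruple it.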

(4)

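The role of $T(i,j)$ is to record which of the three terms in formula (A) gave the maximum value of $F(i,j)$, that is, the move into cell $(i,j)$ on an optimal path: from $(i-1,j-1)$ ($x_i$ aligned with $y_j$), from $(i-1,j)$ ($x_i$ aligned with a gap), or from $(i,j-1)$ ($y_j$ aligned with a gap). Following these pointers back from $(m,n)$ to $(0,0)$ reconstructs an optimal alignment itself, not just its score $F(m,n)$; once the traceback reaches row $0$ or column $0$, the remaining characters are aligned with gaps.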
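
The traceback described above can be sketched in Python as follows. This is a minimal sketch, not a reference answer to the exam: the codes `DIAG`, `UP`, `LEFT`, the function `align`, the +1/-1 scoring `simple_score` and $d = 2$ are hypothetical. Ties are broken first term, then second, then third, following the order in the definition of $T(i,j)$.

```python
from typing import Callable, Tuple

DIAG, UP, LEFT = 1, 2, 3  # hypothetical codes for the three terms of formula (A)


def align(x: str, y: str, s: Callable[[str, str], int], d: int) -> Tuple[int, str, str]:
    """Fill F and the pointer table T, then trace back from (m, n) to (0, 0)."""
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    T = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0], T[i][0] = -i * d, UP    # first column: x_1..x_i against gaps
    for j in range(1, n + 1):
        F[0][j], T[0][j] = -j * d, LEFT  # first row: y_1..y_j against gaps
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            candidates = [
                (F[i - 1][j - 1] + s(x[i - 1], y[j - 1]), DIAG),
                (F[i - 1][j] - d, UP),
                (F[i][j - 1] - d, LEFT),
            ]
            # max() returns the first maximal item, so ties favor DIAG, then UP.
            F[i][j], T[i][j] = max(candidates, key=lambda c: c[0])

    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if T[i][j] == DIAG:    # x_i aligned with y_j
            top.append(x[i - 1])
            bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif T[i][j] == UP:    # x_i aligned with a gap
            top.append(x[i - 1])
            bottom.append("-")
            i -= 1
        else:                  # y_j aligned with a gap
            top.append("-")
            bottom.append(y[j - 1])
            j -= 1
    # The traceback collects columns from the end, so reverse them.
    return F[m][n], "".join(reversed(top)), "".join(reversed(bottom))


def simple_score(a: str, b: str) -> int:
    return 1 if a == b else -1  # hypothetical +1/-1 scoring


print(align("ACGT", "AGT", simple_score, 2))  # (1, 'ACGT', 'A-GT')
```

In this example the traceback visits $(4,3) \to (3,2) \to (2,1) \to (1,1) \to (0,0)$; the single step from $(i-1,j)$ places `C` against a gap, and the total score $1 - 2 + 1 + 1 = 1$ equals $F(4,3)$.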

Knowledge

Dynamic programming, sequence alignment, global alignment, path traceback, complexity analysis

Problem-Solving Tips

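In sequence alignment, it is essential to understand which operation each of the three choices in the recurrence corresponds to (a match/mismatch, or a gap in one of the two sequences).
Setting up the boundary conditions helps you understand and build the complete dynamic programming table.
Traceback reconstructs the optimal alignment itself, not just the maximum score.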

Key Vocabulary

alignment 对齐
sequence 序列
dynamic programming 动态规划
gap penalty 间隙惩罚
traceback 回溯

