Assume that a global alignment of two sequences, and , is calculated by a dynamic programming using the iterative formula (A).
where is the score of aligning and , is the maximum score of the alignments of and . Assume that and that score is given to a gap of length . Answer the following questions (1) – (5).
(1) Show the general form of .
(2) Show the initial values for and for , so that the maximum score of the alignments of the two sequences and is obtained as after updating the iterative formula (A) for and . Notice that .
(3) Evaluate the computational time of calculating the maximum score of the alignments of the two sequences and , and describe it by using and .
(4) When updating formula (A) for and , is defined as follows:
Among the values of , and ,
when is the maximum, ,
otherwise, when is the maximum, ,
otherwise, .
Briefly explain, within five lines, about the role of in the alignment algorithm.
The general form of , which is the score given to a gap of length , is typically represented as where is a positive penalty for each gap. This linear form assumes a constant penalty for each gap, reflecting the simple gap penalty model.
The computational time of calculating the maximum score of the alignments involves filling in an matrix . For each cell in this matrix, we compute the value using the iterative formula, which takes constant time . Since there are cells, the overall time complexity is:
The role of is to record the source of the maximum value for , indicating the optimal move that leads to the current cell. This allows us to trace back from to to reconstruct the optimal alignment path between the two sequences.
Durbin, R., Eddy, S., Krogh, A., & Mitchison, G. (1998). Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press. Chap. 2