(1) The iterative equations below are for calculation of the score of global alignment of two sequences , , where is the match score of character and , and . The initial values are not shown here.
(1-1) Show the formula of the penalty for a gap of length .
(1-2) Suppose that some of the initial values are
, , ,
for : ,
for : .
Show the initial values and .
(1-3) Show the iterative equations to calculate the maximum score of local alignment using the same type of gap penalty.
(1-4) Explain a method to get a local alignment with the maximum score using the calculation of (1-3).
(2) There is a sequence consisting of . Define the complementary character of as
(2-1) Explain what is reported by the following algorithm.
(2-2) Let us define the 'reverse complementary alignment score' of two subsequences and of length and as the maximum score of global alignment of and . Note that is reverse ordered.
Also define the substitution matrix of the alignment as
and the gap penalty is the number of gaps (a gap of length has penalty ).
Show an algorithm to report a pair of (possibly empty) subsequences of with the maximum reverse complementary alignment score.
To compute the local alignment, the iterative equations are modified to allow for the possibility of starting a new alignment, indicated by a score of 0:
1-4: Obtaining a Local Alignment with Maximum Score
To obtain a local alignment with the maximum score:
Initialize all cells in the first row and column to 0.
Fill the dynamic programming matrix using the equations from (1-3).
Find the cell with the maximum score in the entire matrix.
Perform a traceback from until reaching a cell with score 0 or the matrix boundary.
The path of this traceback gives the optimal local alignment.
The algorithm scans a sequence for reverse complementary matches. It uses a matrix to record the length of matching substrings that are reverse complements. If the length of the match exceeds a threshold , the algorithm reports the corresponding subsequences.
Specifically:
stores the length of the reverse complementary match ending at and .
The algorithm compares with the complement of for from down to .
If a match is found, it extends the previous match () by 1.
If the length of the match () is at least , it reports the corresponding ranges.
The reported ranges and represent the start and end positions of reverse complementary subsequences of length at least .
2-2: Algorithm for Maximum Reverse Complementary Alignment Score
The algorithm to find the maximum reverse complementary alignment score is as follows:
Initialize a dynamic programming matrix dp where dp[i][j] represents the maximum reverse complementary alignment score ending at positions and .
Fill the matrix using a modified Smith-Waterman algorithm, considering reverse complementary matches.
Keep track of the maximum score and its position.
Perform a traceback from the position of the maximum score to reconstruct the aligned subsequences.
Return the pair of subsequences with the maximum reverse complementary alignment score.
The expected time complexity of this algorithm is , where is the length of the sequence . The expected space complexity is also due to the dynamic programming matrix.
Durbin, R., Eddy, S. R., Krogh, A., & Mitchison, G. (1998). Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge university press. Chapter 2-3.
Gusfield, D. (1997). Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge university press. Chapter 11-12.