東京大学 情報理工学系研究科 コンピュータ科学専攻 2019年8月実施 専門科目I 問題3
Author
Description
In this problem, the length of a string
We consider the following problem FIND: For given strings
In case there is no such
For strings
for (i = 0; i <= l(s) - l(p); i++)
if (eq(s + i, p) == 1)
return i;
return -1;
Answer the following questions.
(1) Express the order of the worst-case time complexity of algorithm
In the following, the hash value
where
(2) Assume that
(3) Give an algorithm
(4) Give an algorithm
Kai
(1)
Algorithm eq(s + i, p)
is called, which has a time complexity of
Thus, the total time complexity of Algorithm
(2)
To compute
Explanation:
- We remove the contribution of
from . - We multiply the result by
to shift all values left. - We add the contribution of the new character
. - We take the modulo
to keep the hash value in the correct range.
This computation can be done in
(3)
Algorithm
- Precompute
. - Precompute
and check if it matches . If it matches, return 0. - For
to : - Compute
from using the formula derived in Q2. - If
matches , return .
- Compute
int H_0(string s, string p) {
int lp = ell(p);
int ls = ell(s);
int hp = h(p, lp);
int hs = h(s, lp);
if (hp == hs) return 0;
for (int i = 1; i <= ls - lp; i++) {
hs = (d * (hs - numval(s[i - 1]) * d_m) + numval(s[i + lp - 1])) % q;
if (hp == hs) return i;
}
return -1;
}
The time complexity of
Condition when
(4)
Algorithm
- Precompute
. - For
to : - Compute
. - If
, then check eq(s + i, p)
. Ifeq(s + i, p) == 1
, return.
- Compute
int H(string s, string p) {
int lp = ell(p);
int ls = ell(s);
int hp = h(p, lp);
int hs = h(s, lp);
if (hp == hs && eq(s, p) == 1) return 0;
for (int i = 1; i <= ls - lp; i++) {
hs = (d * (hs - numval(s[i - 1]) * d_m) + numval(s[i + lp - 1])) % q;
if (hp == hs && eq(s + i, p) == 1) return i;
}
return -1;
}
Time Complexity:
- Best Case:
if the first occurrence matches. - Average Case: If the number of hash matches (that require further checking with
eq
) is, then the average case time complexity is . - Worst Case: The worst-case complexity can be
if there are many hash collisions, causing frequent calls to eq
.
Condition for
Worst-case Complexity: The worst-case time complexity of algorithm eq
.
Knowledge
字符串匹配 哈希算法 Rabin-Karp算法 复杂度分析
难点思路
此题的难点在于如何高效地计算字符串的哈希值,并利用哈希值进行匹配。在应对哈希碰撞时,我们需要进行字符串的实际比较,保证算法的正确性。
解题技巧和信息
- 哈希算法: 使用适当的哈希函数和模数
来降低哈希碰撞的概率。 - 滚动哈希: 是一种高效的技术,可以在 O(1) 时间内更新哈希值。
- 字符串比较: 当哈希值匹配时,需要使用实际的字符串比较来确认结果。
- 复杂度分析: 分析算法的平均情况和最坏情况的复杂度,以便选择合适的解决方案。
重点词汇
- Hash Function (哈希函数): A function that maps data of arbitrary size to fixed-size values.
- Collision (碰撞): When two different inputs produce the same hash output.
- Rabin-Karp Algorithm (Rabin-Karp 算法): A string matching algorithm that uses hash values for efficient searching.
- string matching 字符串匹配
- rolling hash 滚动哈希
- time complexity 时间复杂度
- worst-case scenario 最坏情况
- hash collision 哈希冲突
参考资料
- T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein, Introduction to Algorithms, 3rd Edition, Chapter 32: "String Matching".