東京大学 情報理工学系研究科 電子情報学専攻 2016年8月実施 専門 第3問
Author
Description
You want to find users from an access log that extremely frequently access to a Web service. Unique user IDs are assigned to individual users, and the access log records IDs of the users that have accessed to the service in chronological order. Answer the following questions.
(1) You want to verify whether there exists a user that occupies the majority of the accesses in the access log, without creating a kind of a frequency histogram of all the users. You have therefore conceived the following algorithm.
- i. Prepare a list
that is initialized to an empty list . - ii. Access each element in array
from the beginning, and perform either of the following operations for the th element depending on the value of at that time. - ii-(a). (If
is empty) add to . - ii-(b). (If
is not empty) if is included in , add to . Otherwise, remove one arbitrary element from .
- ii-(a). (If
- iii. output
.
Use this algorithm to process the following sequence of user IDs from the first element, and show the values of
(2) Prove that the number of varieties of user IDs in
(3) If there exists a majority user
(4) In the algorithm in (1), when the size of array read_log() that reads a user ID one by one from the access log, while improving its space efficiency by not explicitly expressing
Here, read_log() returns the first user ID in the access log at the first call and returns the following user IDs from the access log one by one for the following calls. read_log() returns
Kai
(1)
| 0 | 11 | {11} |
| 1 | 10 | { } |
| 2 | 11 | {11} |
| 3 | 11 | {11, 11} |
| 4 | 7 | {11} |
| 5 | 11 | {11, 11} |
| 6 | 11 | {11, 11, 11} |
| 7 | 3 | {11, 11} |
| 8 | 8 | {11} |
(2)
Initially,
Assume at step
- Case 1:
: Add to , variety is still 1. Condition satisfied. - Case 2:
: Remove an . becomes smaller or empty, variety is still 1 or 0. Condition satisfied. - Case 3:
is empty : Add to , the variety is 1. Condition satisfied.
Therefore, the number of varieties of user IDs in
(3)
This is the Boyer-Moore Voting Algorithm.
Let the majority element be
The algorithm effectively pairs two distinct elements and eliminates both.
In the remove step, effectively one
In the worst case for
Since
Therefore, the list
(4)
find_majority():
candidate_id = -1
count = 0
while True:
id = read_log()
if id == -1:
break
if count == 0:
candidate_id = id
count = 1
else if id == candidate_id:
count = count + 1
else:
count = count - 1
return candidate_id