Let us consider a dataset that consists of data where each datum is represented in the form which is a bit string of length . Each datum is assigned a unique data ID (identifier) which is a distinct integer. Let's build a system that searches for data close in distance to an arbitrary input datum (query datum). During a search, the system needs to enumerate the data IDs of all data that satisfy the condition. The distance between two data is defined by the Hamming distance between bit strings. The Hamming distance between two bit strings and is defined as follows.
Answer the following questions.
(1) The table below shows an example of the dataset in the case of and .
Data ID
1
0
1
1
1
2
1
0
0
1
3
0
0
1
0
Assume that a query datum is given. Find the Hamming distance between the query datum and each datum.
Next, we consider a search algorithm using a lookup table. Assume that is an even number and that the bit strings of data are uniformly distributed. In the following questions, it is not necessary to consider the time complexity of building a lookup table.
(2) We want to search for data whose bit strings are identical to a given query datum. Here, let us consider a lookup table that takes a bit string as an input and outputs a list containing the data IDs of all data that have the bit string. Answer the average time complexity and the space complexity of a search using this lookup table.
(3) We consider an algorithm as follows. We divide a bit string into two bit strings of length . We search for candidates by using the lookup table in the same manner as Question (2) for each divided bit string and then return the data IDs of all data matching the query datum. Answer the average time complexity and the space complexity of a search by this algorithm.
(4) By using the same data structure as Question (3), we consider an algorithm to search for data with a Hamming distance of 1 or less from a given query datum. Answer the average time complexity of a search by this algorithm.
Next, we consider the case where . In this case, it is sometimes effective to perform a linear search by actually calculating the Hamming distance between the query datum and each datum. Therefore, let us consider designing a specialized digital circuit to compute the Hamming distance.
(5) Let us consider a 2-input 1-output digital circuit whose inputs are two 1-bit bit strings , and output is which is the Hamming distance between and . Draw a table representing the relation of , and . Also, draw by using necessary components among AND, OR, and NOT gates.
(6) Draw a 4-input 2-output digital circuit that outputs a binary representation of the Hamming distance between two 2-bit bit strings by using necessary components among AND, OR, NOT gates, and .
(7) Draw an 8-input 3-output digital circuit that outputs a binary representation of the Hamming distance between two 4-bit bit strings by using necessary components among , half adder , and OR gate.
First we find a list from for finding a match for the first bits with expected length by executing (3), and verify by computing Hamming distance for every datum with time. We find the sequences with Hamming distance and the first- bits same as the query.
Then we find a list from and execute the same. We thus get all sequences with Hamming distance .
Finding 2 lists takes time and verifying takes . So the average time complexity is .
A half adder is a module taking input and output as the sum mod 2 and the carry.
A
B
S
C
0
0
0
0
0
1
1
0
1
0
1
0
1
1
0
1
and can be 00,01,10.
is the sum-mod-2 from a half adder adding .
is the S from a half adder wrapping the carry of HA(A2,B2) and the S of HA(A1,B1)
is the final carry of A1+B1+HA(A2,B2)[C]. When it is 1, the configuration is like 11,01 or 10,10. So the C of HA(A1,B1) must be considered in 10,10 case; and the carry when 11+01 i.e. HA(HA(A2,B2)[C],HA(A1,B1)[S]) is also considered (the sum of which is ).