Assume that matrix elements are non-negative integers and they are stored in main memory.
(1) When the algorithm below is used to multiply an \(m \times n\) matrix \(A\) and an \(n \times m\) matrix \(B\), how many times are these matrix elements in \(A\) and \(B\) read from the main memory?
Write the total number of read operations on your answer sheet.
Reading the same element twice is considered as two operations.
Do not count accesses to matrix \(C\) or other variables.
var i = 0
while i < m begin
var j = 0
while j < m begin
var d = 0
var k = 0
while k < n begin
d = d + a[i, k] * b[k, j]
k = k + 1
end
c[i, j] = d
j = j + 1
end
i = i + 1
end
(2) When an \(m \times n\) matrix is stored into a file, matrix elements are separated by a whitespace, rows are separated by a comma, and a period is written right after the last element in the last row.
For example, the following \(3 \times 4\) matrix:
Answer the numbers of rows and columns in the matrix stored in file mat1.txt in the USB flash drive.
Write the answer in your answer sheet. Ignore all letters following a period in the file.
(3) Compute the trace (the sum of the main diagonal elements) of the product of the matrices \(A\) and \(B\) stored in file mat1.txt and mat2.txt in the USB flash drive.
Write the answer in your answer sheet.
(4) Once an element of a matrix is read from the main memory, it is saved in cache memory, and when the same element is required, the element is not read from the main memory as long as it is saved in the cache memory.
The cache memory can hold at most \(s\) elements and it is managed in the LRU (Least Recently Used) scheme.
When the cache memory holds \(s\) elements and a new element not included in the cache memory is required, the least recently used element is discarded from the cache memory.
Then the new element is read from the main memory and saved in the cache memory.
Under this circumstance, how many times are the elements of \(m \times n\) matrix \(A\) and \(n \times m\) matrix \(B\) read from the main memory while multiplying them in the algorithm shown in (1)?
Write a program that computes the total number of read operations for given \(m,n\), and \(s\).
(5) Suppose that \(m\) and \(n\) share a common divider \(p\).
The algorithm for matrix multiplication is changed as below so that the number of read operations from the main memory will decrease when the cache memory mentioned in (4) is used.
Fill in each of the blanks \(1\) to \(6\) with a variable name. Write the answer in your answer sheet.
var u = 0
while u < m begin
var v = 0
while v < m begin
var w = 0
while w < n begin
var i = u
while i < [blank 1] + [blank 2] begin
var j = v
while j < [blank 3] + [blank 4] begin
var d = 0
var k = w
while k < [blank 5] + [blank 6] begin
d = d + a[i, k] * b[k, j]
k = k + 1
end
c[i, j] = c[i, j] + d
j = j + 1
end
i = i + 1
end
w = w + p
end
v = v + p
end
u = u + p
end
(6) How many times are matrix elements in \(A\) and \(B\) read from the main memory during matrix multiplication in the algorithm shown in (5) with the cache memory mentioned in (4)?
Write a program that computes the total number of read operations for given \(m,n,p\), and \(s\).
(7) When computing matrix multiplication as in (6), which common divider \(p\) of \(m\) and \(n\) minimizes the total number of read operations from the main memory on matrix elements in \(A\) and \(B\)?
Write a program that computes such \(p\) (if several, the maximum \(p\) among them) for given \(m,n\), and \(s\).
Moreover, write the result of the computation in your answer sheet for \(m=200\), \(n=150\), and \(s=600\).
# Here I assume that file can be weirdly formatted and matrix isn't defined in a single line.defmain():lines=[]withopen('2018-Summer/mat1.txt','r')asf:lines=f.readlines()rows=0cols=0cols_check=Truerows_check=Trueforlinlines:sep_cnt=l.count(',')end_cnt=l.count('.')ifsep_cnt>0:rows+=sep_cntifcols_check:ifl.find(',')>0:cols+=len(l[:l.find(',')].strip().split(' '))cols_check=Falseelse:ifcols_check:cols+=len(l.strip().split(' '))ifend_cnt>0:rows+=1breakprint(f"{rows} x {cols}")if__name__=="__main__":main()
# Here I assume matrix defined in a single line, so one line per matrixfromlocaleimportatoiimportnumpyasnpdefget_mat_from_file(filename):mats=[]lines=[]withopen(filename,'r')asf:lines=f.readlines()forlinlines:rows=l.strip()[:-1].split(',')_r=len(rows)_c=len(rows[0].split(' '))mats.append(np.full((_r,_c),0))i=0forroinrows:j=0foritminro.split(' '):mats[-1][i,j]=atoi(itm)j+=1i+=1returnmatsdefget_mul_mat_a_b(mat_a,mat_b):mat_c=np.matmul(mat_a,mat_b)returnmat_cdefget_trace_for_mat(mat):returnnp.trace(mat)defmain():# According to the instructions there is no issues with using numpymat_a=get_mat_from_file('2018-Summer/mat1.txt')[0]mat_b=get_mat_from_file('2018-Summer/mat2.txt')[0]mat_c=get_mul_mat_a_b(mat_a,mat_b)trc=get_trace_for_mat(mat_c)print(trc)if__name__=="__main__":main()
deflru_insert(elm,lru,s):iflen(lru)>=s:lru.pop()lru.insert(0,elm)deflru_is_in(elm,lru):returnelminlrudeflru_refresh(elm,lru):lru.remove(elm)lru.insert(0,elm)defmain():cache_lru=[]m=6n=4s=2rds=0# number of readingsi=0whilei<m:j=0whilej<m:k=0whilek<n:ifnotlru_is_in((i+1)*n+k,cache_lru):rds+=1lru_insert((i+1)*n+k,cache_lru,s)else:lru_refresh((i+1)*n+k,cache_lru)ifnotlru_is_in(-(k+1)*m-j,cache_lru):rds+=1lru_insert(-(k+1)*m-j,cache_lru,s)else:lru_refresh(-(k+1)*m-j,cache_lru)k+=1j+=1i+=1print(rds)if__name__=="__main__":main()
deflru_insert(elm,lru,s):iflen(lru)>=s:lru.pop()lru.insert(0,elm)deflru_is_in(elm,lru):returnelminlrudeflru_refresh(elm,lru):lru.remove(elm)lru.insert(0,elm)defmain():cache_lru=[]m=6n=4p=2s=8rds=0# number of readingsu=0whileu<m:v=0whilev<m:w=0whilew<n:i=uwhilei<u+p:j=vwhilej<v+p:k=wwhilek<w+p:ifnotlru_is_in((i+1)*n+k,cache_lru):rds+=1lru_insert((i+1)*n+k,cache_lru,s)else:lru_refresh((i+1)*n+k,cache_lru)ifnotlru_is_in(-(k+1)*m-j,cache_lru):rds+=1lru_insert(-(k+1)*m-j,cache_lru,s)else:lru_refresh(-(k+1)*m-j,cache_lru)k+=1j+=1i+=1w+=pv+=pu+=pprint(rds)if__name__=="__main__":main()
importmathdeflru_insert(elm,lru,s):iflen(lru)>=s:lru.pop()lru.insert(0,elm)deflru_is_in(elm,lru):returnelminlrudeflru_refresh(elm,lru):lru.remove(elm)lru.insert(0,elm)defget_p_options(m,n,s):return[xforxinrange(2,int(math.sqrt(s/2))+1)ifm%x==0andn%x==0]'''Technically, it is enough to return the number of reads for the largest p suggestion.Since if the entire sub-matrix cannot fit into s then there there will be much more reads.But here I am checking all posiibilities.'''defmain():cache_lru=[]m=200n=150s=600best_p=1min_rds=m*n*m*2all_p=get_p_options(m,n,s)all_p.reverse()forpinall_p:rds=0# number of readingsgo_on=Trueu=0whileu<mandgo_on:v=0whilev<mandgo_on:w=0whilew<nandgo_on:i=uwhilei<u+pandgo_on:j=vwhilej<v+pandgo_on:k=wwhilek<w+pandgo_on:ifnotlru_is_in((i+1)*n+k,cache_lru):rds+=1lru_insert((i+1)*n+k,cache_lru,s)else:lru_refresh((i+1)*n+k,cache_lru)ifnotlru_is_in(-(k+1)*m-j,cache_lru):rds+=1lru_insert(-(k+1)*m-j,cache_lru,s)else:lru_refresh(-(k+1)*m-j,cache_lru)ifrds>min_rds:go_on=Falsek+=1j+=1i+=1w+=pv+=pu+=pifrds<min_rdsandp>best_p:best_p=pmin_rds=rdsprint(f"best p={best_p} : {min_rds} readings")if__name__=="__main__":main()