Answer the following questions by writing programs if necessary. Store the programs in the USB flash drive before the examination ends.
(1) We store binary data in a text file. We split binary data to 6-bit chunks and store them in the file after replacing every 6-bit number, 000000 to 111111, with a character A, B, ..., Z, a, b, ..., z, 0, 1, ..., 9, @, or #, respectively, in ascending order.
For example, we replace the 6-bit number with A when it is 000000, B when it is 000001, C when it is 000010, H when it is 000111, @ when it is 111110, and # when it is 111111. When the binary data is a bit sequence:
we store BHC# in the text file. The bit length of the binary data is a multiple of \(6\).
The text file data1.txt stores binary data in the format shown above. Obtain the bit sequence (11 bits) from the 310th bit to the 320th bit of that binary data, and write the sequence on the answer sheet. The left-most bit of the binary data is the 0th bit.
(2) We store binary data in a file after compressing them.
The program restoring the compressed file reads every byte (8 bits) from the beginning of the compressed file, and appends data to the end of the restored file as follows:
Append the read byte as it is unless the byte is 0,
When the byte is 0, read the following two bytes as two 8-bit unsigned integers from the file. Let them \(p\) and \(d\). It always holds: \(256 > p \geq d \geq 0\).
When \(d = 0\), append 1-byte binary data 0 no matter what the value of \(p\) is.
Otherwise, append a copy of the sub-sequence of bytes from the \(p\)-th byte to the \((p-d+1)\)-th byte counting from the end of the file already restored so far. The byte last appended to the restored file is the first byte (\(p=1\)).
For example, when the bytes stored in the compressed binary file are:
Write the program that restores a compressed binary file by the method shown above, and prints the size (bytes) of the file after the restoration.
Restore the compressed binary files data2.bin, data2b.bin, and data2c.bin by that program, and write their sizes (bytes) after the restoration down on the answer sheet.
After the restoration, name the files data2a.txt, data2b.tif, and data2c.txt, respectively. Store them in the USB flash drive.
(3) Write the program that compresses the given binary file and prints the size (bytes) of the file after the compression.
The compressed file is restored by the program written for (2).
The program compresses the file to be as small as possible.
Compress the binary file data3a.txt, data3b.png, and data3c.txt by that program, and write their sizes after the compression down on the answer sheet.
After the compression, name the files data3a.bin, data3b.bin, data3c.bin, respectively. Store them in the USB flash drive.
(4) We encrypt English text by a simple substitution cipher and store the encrypted text in a file.
The text consists of lower-case letters a to z, periods ., and/or white space characters.
A sentence ends with a period.
A simple substitution cipher encrypts the text by replacing each letter with another fixed letter (lower-case alphabets, a period, or a white space character), which may be the same letter.
The text file data4.txt stores a cipher text encrypted by this method. Decrypt data4.txt by referring to data4dict.txt (white-space separated), which lists all the words included in the plaintext obtained by decrypting data4.txt, and write the first sentence of the obtained plaintext down on the answer sheet.
(5) We encrypt a binary file. We split the contents of the binary file to 4-byte chunks and encrypt each chunk as follows.
The size (bytes) of the binary file is a multiple of \(4\). First, read each byte of the four bytes as an \(8\)-bit unsigned integer and let them \(b_0, b_1, b_2, b_3\), respectively, from the beginning. Then, let:
\[
m = \sum_{k=0}^3 2^{8(3-k)} b_k
\]
Let \(e = 551263368336670859257571\), \(n = 3858843578360632069557337\), and
\[
c = m^e \mod n
\]
Here, \(n\) is the product of secret prime numbers \(p\) and \(q\). \(\mod n\) expresses modulo \(n\). Note that it holds:
\[
(x \times y) \mod n = ((x \mod n) \times (y \mod n)) \mod n
\]
The encrypted file is a text file storing the decimal numbers \(c\) computed from the 4-byte chunks in the same order. The character strings expressing \(c\) are separated by a white space character.
For example, when the original binary file stores the following bytes:
# I thought that the file is in bits, but it seems like it is actually test# ABC... etc, based on the definition in the question.# So this code is not relevant to the questionclassBitsRead(object):def__init__(self,f)->None:self._file=fdefread(self,n,l):k=lprint(f"Readinng {n//8+l//8+min(l%8,1)} bytes...")rd=self._file.read(n//8+l//8+min(l%8,1))print(len(rd))# from first byteprint(f"Starting with {8-n%8} bits from byte {n//8} shift {n%8}, using mask {bin(2**(8-n%8)-1)}")b=rd[n//8-1]&(2**(8-n%8)-1)k-=8-n%8ifk<0:print(f"Need to trim result by shifting {abs(k)} bits")b=b>>abs(k)returnb# middle partrs,re=n//8+1,n//8+k//8+1foriinrange(rs,re):print(f"Adding 8 bits using byte {i}")b=b<<8b=b|rd[i-1]k-=8# from end byteifk%8!=0:print(f"Adding additional final {k%8} bits from byte {re} by shifting {8-k%8} bits")b=b<<k%8b=b|(rd[re-1]>>(8-k%8))returnbdefmain():n=310# index (from)l=11# length (bits)bts=[]withopen('2020-Summer/data1.txt','rb')asf:ifFalse:br=BitsRead(f)rd=br.read(n,l)print(bin(rd))else:r=f.read(1)rb=0whiler!=b'':ifr>=b'A'andr<=b'z':rb=(int.from_bytes(r,'big')-ord('A')).to_bytes(1,'big')elifr>=b'0'andr<=b'9':rb=(int.from_bytes(r,'big')-ord('0')).to_bytes(1,'big')elifr==b'@':rb=(2**6-2).to_bytes(1,'big')else:rb=(2**6-1).to_bytes(1,'big')bts.append(rb)r=f.read(1)for_binbts[n//8:n//8+l//6+min(l%6,1)+1]:print(f"{int.from_bytes(_b,'big'):06b}",end='')print(f"\n{' '*(n%6)}^{'-'*(l-1)}")if__name__=="__main__":main()
#include<bits/stdc++.h>usingnamespacestd;inttrans(charch){// map char to intif('A'<=ch&&ch<='Z')returnch-'A';if('a'<=ch&&ch<='z')return26+ch-'a';if(isdigit(ch))return26+26+ch-'0';if(ch=='@')return26+26+10;return26+26+10+1;}voidsolve(){stringstr;cin>>str;vector<int>vec;for(autoch:str){intnum=trans(ch);stack<int>stk;for(inti=0;i<6;i++){// change int to binary stringstk.push(num%2);num/=2;}while(!stk.empty())vec.push_back(stk.top()),stk.pop();}for(inti=310;i<=320;i++)// output the needed bytescout<<vec[i];cout<<"\n";}signedmain(){if(freopen("data1.txt","r",stdin)==NULL)assert(0);intt=1;// cin >> t;while(t--)solve();return0;}
# Note: They have a mistake in the fefinitions of the question, they say repeat in decending order.# But based on the logic of the next question and the example they give, it's asceding order.# so instead of p -> ... -> p-d+1 it should be p-d -> ... -> pbZERO=(0).to_bytes(1,'big')defmain():w_buff=[]withopen('2020-Summer/data2.bin','rb')asf:EOF=FalsewhilenotEOF:r=f.read(1)ifr==b'':EOF=Truecontinueifr!=bZERO:print(f"Writing {r}")w_buff.append(r)else:p=int.from_bytes(f.read(1),'big')d=int.from_bytes(f.read(1),'big')print(f"Zero detected, p={p}, d={d}")ifd==0:w_buff.append(bZERO)else:foriinrange(p-d,p):w_buff.append(w_buff[i])print(f"Writing {w_buff[i]}")withopen('2020-Summer/data2_s.txt','wb')asf_out:forbinw_buff:f_out.write(b)if__name__=="__main__":main()
bZERO=(0).to_bytes(1,'big')defget_max_match(buff,i):max_p=0max_d=0j=0k=id=0whilej<i+1:ifbuff[j]==buff[k]:j+=1k+=1d+=1else:ifmax_d<d:max_d=dmax_p=jj=j-d+1k=id=0returnmax_p,max_ddefcompress_buffer(buff):comp_buff=[]d=p=0i=0whilei<len(buff):print(f"Checking for i={i} (from {len(buff)})")p,d=get_max_match(buff,i)ifd<4:ifbuff[i]==bZERO:comp_buff.append(bZERO)comp_buff.append(bZERO)comp_buff.append(bZERO)else:comp_buff.append(buff[i])i+=1else:comp_buff.append(int.to_bytes(0,1,'big'))comp_buff.append(int.to_bytes(p,1,'big'))comp_buff.append(int.to_bytes(d,1,'big'))i+=dreturncomp_buffdefmain():in_buff=[]withopen('2020-Summer/data2_s.txt','rb')asf:EOF=FalsewhilenotEOF:r=f.read(1)ifr==b'':EOF=Truecontinuein_buff.append(r)comp_buff=compress_buffer(in_buff)withopen('2020-Summer/data3.bin','wb')asf_out:forbincomp_buff:f_out.write(b)if__name__=="__main__":main()
#include<bits/stdc++.h>usingnamespacestd;/*这题楼上代码主要是对于匹配长度小等于3的时候选择不压缩,但是我感觉还得多考虑含0的情况,下面给出了hack数据data3b.txt,按楼上的py代码跑出来是下面那种,但是选择压缩得到的结果是上面更短的。*//*INPUT data2.txt:29 2a 2b 2c 2d 2e 2f 2a 2b 2c 2d 2e 00 2e 2f 2a 2b 30OUTPUT data2.bin:29 2a 2b 2c 2d 2e 2f 00 06 05 00 00 00 00 08 04 30HACK!INPUT data3b.txt:29 00 2a 29 00 2aOUPUT:WRIGHT: 29 00 00 00 2a 00 03 03 (shorter)WRONG: 29 00 00 00 2a 29 00 00 00 2a*/voidsolve(){ifstreamfin("data3b.txt",ios::in|ios::binary);ofstreamfout("data3b.bin",ios::out);vector<int>vec,res;charnum;while(fin.read((char*)&num,sizeof(num)))vec.push_back(num);for(inti=0;i<vec.size();i++){intl=0,r=-1;// max range that matchsfor(intj=0;j<i;j++){// enumerate the begginning of match partintlen=0;// max matched lenwhile(j+len<i&&i+len<vec.size()&&vec[j+len]==vec[i+len])len++;// match and notice do not exceed the limitif(len>(r-l+1)){l=j,r=j+len-1;}}intmax_len=r-l+1;boolcontain0=0;for(inti=l;i<=r;i++)contain0|=vec[i]==0;if(r==-1||(max_len<=3&&!contain0)){// choose not to compressif(vec[i]==0){res.push_back(0);res.push_back(0);res.push_back(0);}elseres.push_back(vec[i]);}else{intp=i-l;intd=max_len;res.push_back(0);res.push_back(p);res.push_back(d);i+=d-1;}}for(autoc:res){charch=c;fout.write((char*)&ch,sizeof(ch));}}signedmain(){intt=1;while(t--)solve();return0;}
defget_letters_by_usage_from_file(f):lls=dict()forlinf.readlines():forwinl.lower():forcinw:ifcinlls:lls[c]+=1else:lls[c]=1returnllsdefmain():en_w=de_w=dict()withopen('2020-Summer/data4.txt','r')asf:en_w=get_letters_by_usage_from_file(f)withopen('2020-Summer/data4dict.txt','r')asf:de_w=get_letters_by_usage_from_file(f)sll_e=[]forkinen_w:sll_e.append((k,en_w[k]))sll_d=[]forkinde_w:sll_d.append((k,de_w[k]))sll_e.sort(key=lambdax:x[1],reverse=True)sll_d.sort(key=lambdax:x[1],reverse=True)foriinrange(len(sll_e)):print(f"#{sll_e[i][1]} '{sll_e[i][0]}' , '{sll_d[i][0]}'")# use the printed information to understand the encryption by handif__name__=="__main__":main()
# Can also use 'factor' linux commandimportprimefacdefget_decomposition(n):returnlist(primefac.primefac(n))defmain():n=3858843578360632069557337print(type(n))pq=get_decomposition(n)print(pq)# Using p and q we can compute d, then decrypt the fileif__name__=="__main__":main()
# 这题基于楼上代码,把得到p, q后具体如何解密的代码补充完整,该小题介绍的算法为RSA思想的加密算法,由于题中数据过大,如使用C++需要用支持int128的Pollard Rho算法来质因数分解,实在是不如python,故没有写C++版本importprimefacdefget_decomposition(n):returnlist(primefac.primefac(n))defmain():n=3858843578360632069557337e=551263368336670859257571pq=get_decomposition(n)# print(pq)# Using p and q we can compute d, then decrypt the filep=pq[0]q=pq[1]d=((p-1)*(q-1)+1)//eres=[]withopen('data5.txt','r')asf:arr=f.read().split()forcinarr:c=int(c)m=pow(c,d,n)# m = 2 ** 24 * b0 + 2 ** 16 * b1 + 2 ** 8 * b2 + b3b=[0,0,0,0]foriinrange(0,4):b[3-i]=m%256m-=b[3-i]m=m//256foriinrange(0,4):res.append(b[i])withopen('data5ans.txt','wb')asf:fornuminres:f.write(bytes([num]))if__name__=="__main__":main()