Given an integer , we define a language over an alphabet by:
Here, is the set of integers and . That is, is the language that consists of words whose -th symbol from the last is .
Answer the following questions.
(1) Give a non-deterministic finite automaton that accepts .
(2) Describe using a regular expression. You may write the -time concatenation of a regular expression as .
(3) Is a regular language? If so, give a finite automaton that accepts . If not, prove that is not regular. You may use the pumping lemma for regular languages.
(4) Prove that any deterministic finite automaton that accepts has at least states.
To construct an NFA that accepts , we need to ensure that the third symbol from the end is 'a'. Here is the NFA:
States:
Alphabet:
Transitions:
From (start state):
On reading any symbol or , move to (this loop represents reading any number of symbols at the start).
On reading any symbol, move to (non-deterministically guess that we might be three symbols away from the end).
From :
On reading any symbol or , move to .
From :
On reading any symbol or , move to (final state).
Final State:
This NFA accepts a string if it non-deterministically guesses that it is three symbols away from the end, and then checks if the third-to-last symbol is 'a'.
Here, represents any string of length , followed by the symbol 'a', and then followed by any string of arbitrary length. This ensures that the -th symbol from the end is 'a'.
To prove that is not a regular language, we will use the pumping lemma. The pumping lemma states that if a language is regular, then any sufficiently long string in the language can be "pumped" — that is, a portion of the string can be repeated multiple times, and the resulting strings will still belong to the language.
Assume that is a regular language. Then by the pumping lemma, there exists a pumping length such that any string with length at least can be decomposed as , where:
,
, and
for all .
Given that , the substring is confined to the first characters, which consist entirely of 'b's followed by a single 'a' and another some 'b's. Thus, the substring consists of only 'b's (say for some ).
Claim: Any DFA that accepts must have at least states.
Proof:
Consider the DFA accepting . This DFA must remember the last symbols it has seen in order to determine whether the -th symbol from the end is 'a'.
There are possible sequences of symbols over the alphabet , and the DFA must distinguish between each of these sequences because each sequence can determine whether the current string belongs to . Thus, the DFA must have a unique state for each possible sequence of symbols.
Therefore, the DFA must have at least states to correctly accept all strings in .