Author |
Topic: RegEx: ab, ba (Read 7119 times) |
|
william wu
wu::riddles Administrator
    

Gender: 
Posts: 1291
|
 |
RegEx: ab, ba
« on: Feb 24th, 2003, 2:39pm » |
Quote Modify
|
Write a regular expression (or equivalently, construct a deterministic finite automaton) that accepts strings over the alphabet { a, b } in which the number of occurrences of 'ab' = number of occurrences of 'ba'. Note 1: What's a regular expression? Basically, it matches character strings that fit a certain pattern. You are restricted to the operators OR, CONCATENATE, and Kleene Star. And you can use all the parentheses you want. http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=regul ar+expressions Note 2: This problem often trips people up because regular expressions are not supposed to have unbounded counting ability. There's a one-to-one correspondence between a regular expression and finite state machines, and the ability to count arbitrarily high requires an unbounded amount of state space. It follows that you cannot make a regular expression that matches all 0-1 strings of the form 0n1n, where n is an integer, and the notation 0n refers to the string of n consecutive zeroes. However, in this case there's something slightly cunning you can observe which will allow you to realize a Regular Expression.
|
« Last Edit: Feb 24th, 2003, 2:41pm by william wu » |
IP Logged |
[ wu ] : http://wuriddles.com / http://forums.wuriddles.com
|
|
|
Icarus
wu::riddles Moderator Uberpuzzler
    
 Boldly going where even angels fear to tread.
Gender: 
Posts: 4863
|
 |
Re: RegEx: ab, ba
« Reply #1 on: Feb 24th, 2003, 7:26pm » |
Quote Modify
|
I am not particularly knowledgable about CS topics so I usually leave this forum alone, but this one touches on something I use all the time! A couple of questions though: Is the "Kleene star" * the star in most shells (matches any pattern), or the star of most UNIX tools, PERL, etc (matches the preceding pattern any number of times)? When you say that only the Kleene star, OR and CONCATENATE operators are allowed, are you also disallowing anchors? Or is the RE always required to match the entire pattern, not just some part of it? If I can force the RE to match the entire pattern, then I know the solution.
|
|
IP Logged |
"Pi goes on and on and on ... And e is just as cursed. I wonder: Which is larger When their digits are reversed? " - Anonymous
|
|
|
Jeremiah Smith
Full Member
  
 Beep!
Posts: 172
|
 |
Re: RegEx: ab, ba
« Reply #2 on: Feb 24th, 2003, 8:46pm » |
Quote Modify
|
on Feb 24th, 2003, 7:26pm, Icarus wrote:Is the "Kleene star" * the star in most shells (matches any pattern), or the star of most UNIX tools, PERL, etc (matches the preceding pattern any number of times)? |
| The Perl one (matches the preceding pattern any number of times).
|
|
IP Logged |
|
|
|
william wu
wu::riddles Administrator
    

Gender: 
Posts: 1291
|
 |
Re: RegEx: ab, ba
« Reply #3 on: Feb 24th, 2003, 10:21pm » |
Quote Modify
|
I should probably add that by definition, a string is a finite sequence of characters. on Feb 24th, 2003, 7:26pm, Icarus wrote: Is the "Kleene star" * the star in most shells (matches any pattern), or the star of most UNIX tools, PERL, etc (matches the preceding pattern any number of times)? When you say that only the Kleene star, OR and CONCATENATE operators are allowed, are you also disallowing anchors? Or is the RE always required to match the entire pattern, not just some part of it? |
| > Kleene star = Matches the preceding pattern any number of times. > No anchors are needed ... I don't see how they apply here. I presume by anchors you mean the operators ^ $ \b \B, for beginning and ends of strings, and word boundaries. Since the problem statement says the alphabet consists only of the letters a and b, there are no space characters to create word boundaries. So basically the universe of strings you have to worry about are strings of a's and b's. If the number of occurrences of ab in the whole string is equal to the number of occurrences of ba, your RegEx should accept that string.
|
« Last Edit: Feb 24th, 2003, 10:21pm by william wu » |
IP Logged |
[ wu ] : http://wuriddles.com / http://forums.wuriddles.com
|
|
|
william wu
wu::riddles Administrator
    

Gender: 
Posts: 1291
|
 |
Re: RegEx: ab, ba
« Reply #4 on: Feb 24th, 2003, 10:25pm » |
Quote Modify
|
Follow up Question: Prove that it is impossible to write a regular expression (or equivalently, construct a deterministic finite automaton) that accepts strings over the alphabet { a, b, c } in which the number of occurrences of 'ab' = number of occurrences of 'ba'. Note: This might be too hard to do if you haven't heard of something in CS theory called the pumping lemma, but it'd be interesting to see if an unacquainted mind ends up rediscovering it while thinking about this problem.
|
|
IP Logged |
[ wu ] : http://wuriddles.com / http://forums.wuriddles.com
|
|
|
James Fingas
Uberpuzzler
    


Gender: 
Posts: 949
|
 |
Re: RegEx: ab, ba
« Reply #5 on: Feb 25th, 2003, 9:01am » |
Quote Modify
|
Original Problem: The only reason you can solve the problem when the alphabet is {a,b} is that the occurrances of ab and the occurrances of ba must alternate. The regexp is simple: a*a or b*b However, in the case where the language is {a,b,c}, then your automaton must be able to count indefinitely high (and no finite automaton can do this). Proof: Consider the string abcabcabc ... cbacbacba, where there are m repetitions of "abc" and n repetitions of "cba". When the automaton has received all the "abc"s, it must be able to wait for m repetitions of "cba", accept the string if it ends there, or reject the string if it doesn't. As each of these m "cba"s passes, it must enter a different state (this should be obvious). However, there is no limit to how high m can be, so there is no limit to the number of states that such a successful automaton would need (ie. a successful automaton would not be finite).
|
|
IP Logged |
Doc, I'm addicted to advice! What should I do?
|
|
|
Icarus
wu::riddles Moderator Uberpuzzler
    
 Boldly going where even angels fear to tread.
Gender: 
Posts: 4863
|
 |
Re: RegEx: ab, ba
« Reply #6 on: Feb 25th, 2003, 3:50pm » |
Quote Modify
|
Given the definition of the Kleene Star, the solution would have to be (a(a|b)*a)|(b(a|b)*b) ( | = OR ). As for the need for anchors: most implementations I know of, this will still match "abab" since there is nothing to require it to match the whole string. It will match the "aba" portion and ignore the rest. I don't see how this can be solved without some means of forcing a full string match. Either by (1) having your REs always required to match the whole string, or (2) allowing anchors: ^(a(a|b)*a)|(b(a|b)*b)$
|
|
IP Logged |
"Pi goes on and on and on ... And e is just as cursed. I wonder: Which is larger When their digits are reversed? " - Anonymous
|
|
|
TimMann
Senior Riddler
   

Gender: 
Posts: 330
|
 |
Re: RegEx: ab, ba
« Reply #7 on: Feb 26th, 2003, 12:18am » |
Quote Modify
|
In the theory of regular expressions, the language defined by a regular expression is the set of strings that match the regular expression. That is, the whole string matches the regular expression, not a substring. William was assuming people knew that for this problem. Grep's regular expressions are an extension of the kind usually used in the theory. One way to look at the "anchoring" issue is that grep gives you an implicit ".*" at the beginning and end of the pattern unless you start it with ^ and/or end it with $. Here "." itself is a grep extension, meaning of course the disjunction of all characters in the alphabet; that is, (a|b|...).
|
|
IP Logged |
http://tim-mann.org/
|
|
|
towr
wu::riddles Moderator Uberpuzzler
    
 Some people are average, some are just mean.
Gender: 
Posts: 13730
|
 |
Re: RegEx: ab, ba
« Reply #8 on: Feb 26th, 2003, 12:36am » |
Quote Modify
|
on Feb 25th, 2003, 3:50pm, Icarus wrote:Given the definition of the Kleene Star, the solution would have to be (a(a|b)*a)|(b(a|b)*b) ( | = OR ). |
| This doesn't match "a", or "b" or "" for that matter.. so how about: (a (b+ a+)* | b (a+ b+)* |) here's an interesting link if you like regular expressions: http://odur.let.rug.nl/~vannoord/fsademo/ watch the syntax though, ab => [a,b] and a|b => {a,b} and the empty string is {} It also functions as tranducer, in case you want to translate..
|
|
IP Logged |
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
william wu
wu::riddles Administrator
    

Gender: 
Posts: 1291
|
 |
Re: RegEx: ab, ba
« Reply #9 on: Feb 26th, 2003, 12:40am » |
Quote Modify
|
Regarding Icarus's solution: Almost ... forgetting some special cases. For instance, the string "a" has zero occurrences of both 'ab' and 'ba', and thus should be in our language. Likewise, the empty string should be accepted. The empty string is usually referred to in RegEx theory as epsilon, and is another symbol available to us which I forgot to mention. The intuition is right though -- the key observation is every string in the language must start with the same letter it ends with.
|
« Last Edit: Feb 26th, 2003, 12:46am by william wu » |
IP Logged |
[ wu ] : http://wuriddles.com / http://forums.wuriddles.com
|
|
|
Icarus
wu::riddles Moderator Uberpuzzler
    
 Boldly going where even angels fear to tread.
Gender: 
Posts: 4863
|
 |
Re: RegEx: ab, ba
« Reply #11 on: Feb 26th, 2003, 8:03pm » |
Quote Modify
|
on Feb 26th, 2003, 12:18am, TimMann wrote:In the theory of regular expressions, the language defined by a regular expression is the set of strings that match the regular expression. That is, the whole string matches the regular expression, not a substring. William was assuming people knew that for this problem. |
| As I said, I don't know much CS theory. As far as actual RE implementations go, grep is hardly alone. About the only things I've seen that do require matching the whole string are shells.
|
|
IP Logged |
"Pi goes on and on and on ... And e is just as cursed. I wonder: Which is larger When their digits are reversed? " - Anonymous
|
|
|
rmsgrey
Uberpuzzler
    


Gender: 
Posts: 2874
|
 |
Re: RegEx: ab, ba
« Reply #12 on: May 15th, 2003, 9:36pm » |
Quote Modify
|
on Feb 26th, 2003, 12:36am, towr wrote: (a (b+ a+)* | b (a+ b+)* |) |
| doesn't work - rejects aaba quick fix: (a+(b*a+)*|b+(a*b+)*|e) using (e) to denote empty string and noting (a+) is shorthand for (aa*) (the original problem ruled out using anything but the three basic operations)
|
|
IP Logged |
|
|
|
S. Owen
Full Member
  

Gender: 
Posts: 221
|
 |
Re: RegEx: ab, ba
« Reply #13 on: May 26th, 2003, 7:41am » |
Quote Modify
|
I think that answer is right... I worked out the state machine for this and came up with this alternate regular expression, which may be a little simpler: ( e | a ( a | b+ a )* | b ( b | a+ b )* )
|
|
IP Logged |
|
|
|
rmsgrey
Uberpuzzler
    


Gender: 
Posts: 2874
|
 |
Re: RegEx: ab, ba
« Reply #14 on: Aug 14th, 2003, 3:32pm » |
Quote Modify
|
on May 26th, 2003, 7:41am, S. Owen wrote:I think that answer is right... I worked out the state machine for this and came up with this alternate regular expression, which may be a little simpler: *'hide'd portion removed* |
| :: (e|a(b*a)*|b(a*b)*) :: seems simpler still. In fact, I'm going to go out on a limb and say I reckon it's the simplest possible - you need to cover the three cases (e, a..., b...) and in the latter two, you need to ensure that each run of the wrong letter is followed by one of the right one.
|
|
IP Logged |
|
|
|
|