wu :: forums (http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi)
riddles >> cs >> Seperate Characters
(Message started by: shallow on Apr 28th, 2006, 8:26pm)

Title: Seperate Characters
Post by shallow on Apr 28th, 2006, 8:26pm
There is a sequence of charaters. each charater can be either one-byte charater or two-byte character. For the one-byte charater, its value fall in [0,127]; for a two-byte character, the value of the first byte must be larger than 127 and the second byte can take any value.

Now give you a position along the sequence. It can be a one-byte character, or any one byte of the two-byte charater. Now you are required to return the character right before the position.

Title: Re: Seperate Characters
Post by cwolves on Apr 29th, 2006, 8:30am
um....you need a bit more than that for it to even be a puzzle.  All your question says is:

"Here's a book.  What's before the letter n?"

even if the sequence is linear it's still impossible as any value can at some point have any other value before it.

Title: Re: Seperate Characters
Post by towr on Apr 29th, 2006, 9:25am
Seems like you have to search back to the last <=127 char (or the start) before you can determine where characters begin/end

Title: Re: Seperate Characters
Post by Neelesh on May 1st, 2006, 6:24am
I guess you will need to search for last two consecutive <=127 characters. This is because, supposing "L" represents such a character, then "LL" is the only sequence which gurantees that the previous character terminated on the first L whereas the new character started (and terminated) on the second L. Once such sequence is obtained, we could traverse in forward  direction.

Here is an untested code.

[hideb]

variables :
gp = given position
cp = current position
pp = previous position
p2pp = previous-to-previous position

I assume that these are the pointers-like entities. Dereferencing them simply gives the values "L" or "M" depending on whether the character pointed by them in >127 or <= 127.

cp = gp - 1
while ( (cp > 1) and ( (*cp == M) or (*cp -1 ) == M) )
   cp--;

pp = cp;

while(gp >= cp)
{
    p2pp = pp;
    pp = cp;
    cp += ( ( cp == M )? 2 : 1);
 // not tested for end of sequence, but that can be added easily)
}
return the character pointed by p2pp. This will be the necessary character.

[/hideb]

The code is untested, but I guess the underlying logic is flawless.




Title: Re: Seperate Characters
Post by SMQ on May 1st, 2006, 9:32am
Neelesh: I first thought along those lines too, but then I realized that towr was right -- any byte <= 127 is sufficient, because you know it must be the last byte of a char.  It could be a single-byte char or it could be the second byte of a two-byte char, but not the first byte of a two-byte char, so either way there is a char break after a byte <= 127.

--SMQ

Title: Re: Seperate Characters
Post by Neelesh on May 1st, 2006, 10:00am

on 05/01/06 at 09:32:58, SMQ wrote:
any byte <= 127 is sufficient, because you know it must be the last byte of a char.


Yeah, I see what you are saying, one byte <= 127 is sufficient. Of course, we need a byte<=127 before (gp-1)'th position, (and not before gp'th position) where gp represents the given pointer. This is obvious since we are interested in the previous character, not the character at the gp.


Title: Re: Seperate Characters
Post by cwolves on May 4th, 2006, 7:00am
If you're able to freely scan the entire array looking for a char <= 127 then why not just look at char n-1?

And if you can't then the problem is impossible.

Title: Re: Seperate Characters
Post by towr on May 4th, 2006, 7:43am
Well, a character isn't necessarily a char (i.e. 1 byte). You want the previous character, which may be either 1 or 2 bytes in size.

Title: Re: Seperate Characters
Post by Grimbal on May 8th, 2006, 9:27am
Here is my take.

This Java code returns the position of the start of the previous character.  It doesn't return the character itself because it is not specified how 2 bytes combine into one char.

The procedure returns -1 if there is no previous char.

[hideb]
// returns the char before the char at position curr
// return -1 if curr is on the 1st char.
static int getPrev(byte[]ch, int curr)
{
 // align current char
 int i = curr;
 while( i>0 && (ch[i-1]&0x80) != 0 ) i--;
 if( (curr-i)%2!=0 ) curr--;
 // find and align previous char
 int prev = curr-1;
 if( i>prev ){
   i = prev;
   while( i>0 && (ch[i-1]&0x80) != 0 ) i--;
 }
 if( (prev-i)%2!=0 ) prev--;
 // that's it
 return prev;
}
[/hideb]

Title: Re: Seperate Characters
Post by DewiMorgan on Jun 6th, 2006, 2:28pm
You can optimise slightly: if the preceding byte is high, then the cursor needs to move back 2, assuming the file is valid.

low high = n/a
high high = 2



Powered by YaBB 1 Gold - SP 1.4!
Forum software copyright © 2000-2004 Yet another Bulletin Board