wu :: forums
« wu :: forums - Re: substring search »

Welcome, Guest. Please Login or Register.
Nov 21st, 2024, 1:54am

RIDDLES SITE WRITE MATH! Home Home Help Help Search Search Members Members Login Login Register Register
   wu :: forums
   riddles
   cs
(Moderators: SMQ, Grimbal, william wu, ThudnBlunder, Icarus, Eigenray, towr)
   Re: substring search
« Previous topic | Next topic »
Pages: 1  Reply Reply Notify of replies Notify of replies Send Topic Send Topic Print Print
   Author  Topic: Re: substring search  (Read 9294 times)
sameer_r
Guest

Email

Re: substring search  
« on: Apr 25th, 2003, 2:31pm »
Quote Quote Modify Modify Remove Remove

Hi everyone,
 
I didn't see any message posted regarding the substring search problem.  Pardon me if a thread already exists for it.
 
I was thinking about a solution to the problem and came up with the following algorithm (only time efficient)
 
Say A is the string to be searched in the string B.
for e.g.
A = "xyxz"
B = "xyzxyxz"
 
Sort A and Sort B keeping track of the original indices of each character.
A = "xxyz" with original indices 1,3,2,4
B = "xxxyyzz" with original indices 1,4,6,2,5,3,7
 
Allocate an integer array (counter) equal to the size of B.
 
Now for every character in B, do a binary search to locate the corresponding character in A (there could be more than one occurence of the same character).
 
Say for the first x in B, we locate the first x in A with original index 1. (since its sorted the other x's would be easy to find by just increasing the index until we came to a character > x)
 
Now if this character in B were to be a part of the substring A, then the beginning of the string would have been (original index of this character in B - original index of this character in A + 1).
 
In the case of the first x in A: 1-1+1
or in the case of second x in A : 1-3+1<0 (hence cannot be)
 
This means that if the first x in B were a part of the substring A, then it could be the first character of A and not the 3rd.
 
Now if this x is the first character of substring A, then 1 should be the index from which the substring starts in B and hence in Count array we increment Count[1] by 1.
 
Finally when we are done looking at all the elements of B, then we scan through the Count array and all those offsets that have the value equal to the length of substring A, are in fact the beginning locations of the substring A in B.
 
Thanks,
Sameer.
IP Logged
Papa Homer
Guest

Email

Re: substring search  
« Reply #1 on: Apr 25th, 2003, 3:06pm »
Quote Quote Modify Modify Remove Remove

I am feeling a bit slow right now, so could you provide some more details for you solution?  Specifically, the runtime complexity for each step as well as an explaination of why it actually works.
 
Thanks.
IP Logged
sameer_r
Guest

Email

Re: substring search  
« Reply #2 on: Apr 25th, 2003, 3:13pm »
Quote Quote Modify Modify Remove Remove

The algorithm has the following steps:
 
sort A and B
  sorting A theta(m)
  sorting B theta(n)
  assuming that we know in advance the range of characters that the strings can contain.
 
go thru each element of B (n)
  search that element in A theta(lg(n))
  go thru the matching element in A (at worst m) (this part I am not sure of. looks like an average case analysis like in quick sort is needed)
 
finally go thru the count array and find the offsets where your substring starts in B. theta(n)
 
so i guess the complexity in worst case comes out to be theta(nm), though I think it should be better than that.
IP Logged
Papa Homer
Guest

Email

Re: substring search  
« Reply #3 on: Apr 25th, 2003, 3:34pm »
Quote Quote Modify Modify Remove Remove

Yeah, that's the complexity of a brute force method: for every character in B, see if the next m characters make up A.
IP Logged
sameer_r
Guest

Email

Re: substring search  
« Reply #4 on: Apr 25th, 2003, 4:06pm »
Quote Quote Modify Modify Remove Remove

For Brute force method, the run-time is always theta(nm), but for this algo, the worst case is theta(nm).  It could perform better.  Even for quick-sort the worst case run-time is theta(n^2), but average case is theta(nlgn), which is pretty good, in fact the theoritically the best possible for comparison sort.
 
I still hope (fingers crossed) that this algorithm is better than brute force.
IP Logged
Papa Homer
Guest

Email

Re: substring search  
« Reply #5 on: Apr 25th, 2003, 4:18pm »
Quote Quote Modify Modify Remove Remove

That's not entire correct.  For the brute force method the best case is m (if A appears in the beginning of B).  However, the worst and AVERAGE cases are still mn.  So, the question is, what is the average performance of your algorithm on randomk data.
IP Logged
towr
wu::riddles Moderator
Uberpuzzler
*****



Some people are average, some are just mean.

   


Gender: male
Posts: 13730
Re: substring search  
« Reply #6 on: Apr 26th, 2003, 5:25am »
Quote Quote Modify Modify

there is a sublinear search algorithm for this problem..
You need a little preprocessing (linear time) and a few simple tricks..
Of course without my book I can't really describe it adequatly..
IP Logged

Wikipedia, Google, Mathworld, Integer sequence DB
towr
wu::riddles Moderator
Uberpuzzler
*****



Some people are average, some are just mean.

   


Gender: male
Posts: 13730
Re: substring search  
« Reply #7 on: Apr 27th, 2003, 1:37pm »
Quote Quote Modify Modify

I'll just give an example for now.. As I said, there are a few tricks you need to do it efficiently, but basicly here's how it works (after preprocessing A in linear time/space)
 
Suppose we start with two strings
 
A = abbac
B = abcaccbcabcbabbaccbabcccabababbabbcc
start comparing from the back of A
 
abcaccbcabcbabbaccbabcccabababbabbcc
abbac
  xxx
since it doesn't match, and ac doesn't occur in A again, you can shift A past the first 5 characters
 
Compare at that point again
 
abcaccbcabcbabbaccbabcccabababbabbcc
     abbac
  xxx    x
it doesn't match, but b does occur in A, so shift A to match to the first b and compare from there again
 
abcaccbcabcbabbaccbabcccabababbabbcc
       abbac
  xxx    x x
same again as before
 
abcaccbcabcbabbaccbabcccabababbabbcc
         abbac
  !xx    x x x
and again, compare, you can skip the substring (one character in this case) you allready know is there
 
abcaccbcabcbabbaccbabcccabababbabbcc
            abbac
  xxx    x xXxXXX
done
 
best case when there is a match m; best case when there isn't n/m; on avarage  < n, worst case n+m (with preprocessing)
« Last Edit: Apr 27th, 2003, 1:46pm by towr » IP Logged

Wikipedia, Google, Mathworld, Integer sequence DB
akasina9
Newbie
*



x = 0x2B | ~ 0x2B. x is the question

   


Gender: male
Posts: 9
Re: substring search  
« Reply #8 on: Nov 17th, 2012, 6:30am »
Quote Quote Modify Modify

on Apr 27th, 2003, 1:37pm, towr wrote:
I'll just give an example for now.. As I said, there are a few tricks you need to do it efficiently, but basicly here's how it works (after preprocessing A in linear time/space)
 
Suppose we start with two strings
 
A = abbac
B = abcaccbcabcbabbaccbabcccabababbabbcc
start comparing from the back of A
 
abcaccbcabcbabbaccbabcccabababbabbcc
abbac
  xxx
since it doesn't match, and ac doesn't occur in A again, you can shift A past the first 5 characters
 
Compare at that point again
 
abcaccbcabcbabbaccbabcccabababbabbcc
     abbac
  xxx    x
it doesn't match, but b does occur in A, so shift A to match to the first b and compare from there again
 
abcaccbcabcbabbaccbabcccabababbabbcc
       abbac
  xxx    x x
same again as before
 
abcaccbcabcbabbaccbabcccabababbabbcc
         abbac
  !xx    x x x
and again, compare, you can skip the substring (one character in this case) you allready know is there
 
abcaccbcabcbabbaccbabcccabababbabbcc
            abbac
  xxx    x xXxXXX
done
 
best case when there is a match m; best case when there isn't n/m; on avarage  < n, worst case n+m (with preprocessing)

 
I think towr was referring to the Boyer Moore algorithm.http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm .
IP Logged
towr
wu::riddles Moderator
Uberpuzzler
*****



Some people are average, some are just mean.

   


Gender: male
Posts: 13730
Re: substring search  
« Reply #9 on: Nov 17th, 2012, 7:35am »
Quote Quote Modify Modify

Probably, yeah.  
God, that was long ago... Did we even have wikipedia back then?
IP Logged

Wikipedia, Google, Mathworld, Integer sequence DB
TenaliRaman
Uberpuzzler
*****



I am no special. I am only passionately curious.

   


Gender: male
Posts: 1001
Re: substring search  
« Reply #10 on: Nov 23rd, 2012, 11:13am »
Quote Quote Modify Modify

We did have wikipedia back then, but it wasn't popular as it is now nor was it as rich in information as it is today. The oldest wikipedia article on Boyer Moore that wayback machine has is around 2007. So, I guess its reasonable enough to say that Boyer Moore article wasn't around when you wrote your answer Cheesy
IP Logged

Self discovery comes when a man measures himself against an obstacle - Antoine de Saint Exupery
birbal
Full Member
***





   


Gender: male
Posts: 250
Re: substring search  
« Reply #11 on: Dec 13th, 2012, 10:39am »
Quote Quote Modify Modify

on Nov 17th, 2012, 7:35am, towr wrote:
Probably, yeah.  
God, that was long ago... Did we even have wikipedia back then?

Isn't this algo kmp ?
http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorith m
IP Logged

The only thing we have to fear is fear itself!
towr
wu::riddles Moderator
Uberpuzzler
*****



Some people are average, some are just mean.

   


Gender: male
Posts: 13730
Re: substring search  
« Reply #12 on: Dec 13th, 2012, 11:19am »
Quote Quote Modify Modify

No, I don't think it's that one.
I can't really be bothered to get into it, but Boyer Moore seems more like it.
IP Logged

Wikipedia, Google, Mathworld, Integer sequence DB
Pages: 1  Reply Reply Notify of replies Notify of replies Send Topic Send Topic Print Print

« Previous topic | Next topic »

Powered by YaBB 1 Gold - SP 1.4!
Forum software copyright © 2000-2004 Yet another Bulletin Board