wu :: forums
« wu :: forums - Finding Substring Position »

Welcome, Guest. Please Login or Register.
Nov 29th, 2024, 12:35am

RIDDLES SITE WRITE MATH! Home Home Help Help Search Search Members Members Login Login Register Register
   wu :: forums
   riddles
   cs
(Moderators: towr, Eigenray, Grimbal, Icarus, ThudnBlunder, SMQ, william wu)
   Finding Substring Position
« Previous topic | Next topic »
Pages: 1  Reply Reply Notify of replies Notify of replies Send Topic Send Topic Print Print
   Author  Topic: Finding Substring Position  (Read 817 times)
A
Full Member
***



Perder todas as esperanças é liberdade!

   


Gender: male
Posts: 236
Finding Substring Position  
« on: Nov 11th, 2007, 11:06pm »
Quote Quote Modify Modify

Given a string of length N , and a integer k , we need to find the first substring of length k which occurs more than once !!
 
 
 
eg  
 
Input: String = abcdebcxbcdebyz , k=4
output :Substring = "bcde” and index = 1
 
How we can efficiently modify to find the one with Maximum occurrences ?
IP Logged

What Doesn't Kill Me Will Only Make Me Stronger
gotit
Uberpuzzler
*****





   
Email

Gender: male
Posts: 804
Re: Finding Substring Position  
« Reply #1 on: Nov 12th, 2007, 2:43am »
Quote Quote Modify Modify

I was thinkink something like this:
Construct a suffix tree of the string. Scan all nodes at level k of the tree from left to right. Mark the first node that has a branching. The path from the root to this node will be the required substring.
 
As far as  finding the substring with the maximum occurance is concerned, just mark that node at level k which has the most branches. Again the path from the root to this node will be the required substring.
 
I haven't done enough experiments on the above though. So I am not sure whether this works for all cases.
IP Logged

All signatures are false.
towr
wu::riddles Moderator
Uberpuzzler
*****



Some people are average, some are just mean.

   


Gender: male
Posts: 13730
Re: Finding Substring Position  
« Reply #2 on: Nov 12th, 2007, 3:05am »
Quote Quote Modify Modify

on Nov 12th, 2007, 2:43am, gotit wrote:
I was thinkink something like this:
Construct a suffix tree of the string. Scan all nodes at level k of the tree from left to right. Mark the first node that has a branching. The path from the root to this node will be the required substring.
I don't think it would necessarily branch at depth k. It might branch at a later level.
 
Consider k=2 and banana
hidden:

Nowhere is there a branching after an N, while AN is clearly the first repeating substring of length 2.
 
You can probably fix it by noting how many leaf nodes there are in the subtree of a node (because you're right in that if there are multiple continuations, then there must be multiple occurrences; it's just that you can't determine that based on purely local information)
« Last Edit: Nov 12th, 2007, 3:10am by towr » IP Logged

Wikipedia, Google, Mathworld, Integer sequence DB
KWillets
Junior Member
**





   


Posts: 84
Re: Finding Substring Position  
« Reply #3 on: Nov 12th, 2007, 11:02am »
Quote Quote Modify Modify

If you use a left-right online algorithm, the first inserted node of depth >= k is a match.
 
There's also a lighter-weight (1n) way to do this with an insertion-based block sort, but it's tricky.
IP Logged
aditi
Newbie
*





   


Posts: 17
Re: Finding Substring Position  
« Reply #4 on: Nov 12th, 2007, 3:40pm »
Quote Quote Modify Modify

Pre process the string: for every character 'c' in the string, let next(index('c')) be the index when the character appears again in the string(-1 if it doesn't).
This can be done by using a hash and scanning the string from left to right. If the character 'c' is not there in the hash, add it to hash with its index as the value and set next('c') to -1. If 'c' is in the map, then next(index('c')) = hash_value('c') and hash_value('c') = index of 'c' in the string.
 
Let str[i,i+k-1] be the substring of length k in consideration. To see if it occurs again, check if str[next(i),next(i)+k-1] is same as str[i,i+k-1]. Keep doing this for 0 <= i < N, unless you find a match.  
 
Also avoid considering str[i,i+k-1] where next(i) = -1 because it can't have a match later on.
 
Space : O(N)
Time: O(Nk)
IP Logged
dont.hurt.again
Newbie
*





   


Posts: 6
Re: Finding Substring Position  
« Reply #5 on: Nov 13th, 2007, 9:49pm »
Quote Quote Modify Modify

can anybody give a good link to study suffix tree.
IP Logged
riddle_sea
Newbie
*






   


Gender: male
Posts: 13
Re: Finding Substring Position  
« Reply #6 on: Nov 13th, 2007, 10:55pm »
Quote Quote Modify Modify

on Nov 13th, 2007, 9:49pm, dont.hurt.again wrote:
can anybody give a good link to study suffix tree.

 
i found a good link...
http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/Suffix/
IP Logged
Pages: 1  Reply Reply Notify of replies Notify of replies Send Topic Send Topic Print Print

« Previous topic | Next topic »

Powered by YaBB 1 Gold - SP 1.4!
Forum software copyright © 2000-2004 Yet another Bulletin Board