Author |
Topic: Finding Substring Position (Read 817 times) |
|
A
Full Member
Perder todas as esperanças é liberdade!
Gender:
Posts: 236
|
|
Finding Substring Position
« on: Nov 11th, 2007, 11:06pm » |
Quote Modify
|
Given a string of length N , and a integer k , we need to find the first substring of length k which occurs more than once !! eg Input: String = abcdebcxbcdebyz , k=4 output :Substring = "bcde” and index = 1 How we can efficiently modify to find the one with Maximum occurrences ?
|
|
IP Logged |
What Doesn't Kill Me Will Only Make Me Stronger
|
|
|
gotit
Uberpuzzler
Gender:
Posts: 804
|
|
Re: Finding Substring Position
« Reply #1 on: Nov 12th, 2007, 2:43am » |
Quote Modify
|
I was thinkink something like this: Construct a suffix tree of the string. Scan all nodes at level k of the tree from left to right. Mark the first node that has a branching. The path from the root to this node will be the required substring. As far as finding the substring with the maximum occurance is concerned, just mark that node at level k which has the most branches. Again the path from the root to this node will be the required substring. I haven't done enough experiments on the above though. So I am not sure whether this works for all cases.
|
|
IP Logged |
All signatures are false.
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Gender:
Posts: 13730
|
|
Re: Finding Substring Position
« Reply #2 on: Nov 12th, 2007, 3:05am » |
Quote Modify
|
on Nov 12th, 2007, 2:43am, gotit wrote:I was thinkink something like this: Construct a suffix tree of the string. Scan all nodes at level k of the tree from left to right. Mark the first node that has a branching. The path from the root to this node will be the required substring. |
| I don't think it would necessarily branch at depth k. It might branch at a later level. Consider k=2 and banana hidden: | | Nowhere is there a branching after an N, while AN is clearly the first repeating substring of length 2. You can probably fix it by noting how many leaf nodes there are in the subtree of a node (because you're right in that if there are multiple continuations, then there must be multiple occurrences; it's just that you can't determine that based on purely local information)
|
« Last Edit: Nov 12th, 2007, 3:10am by towr » |
IP Logged |
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
KWillets
Junior Member
Posts: 84
|
|
Re: Finding Substring Position
« Reply #3 on: Nov 12th, 2007, 11:02am » |
Quote Modify
|
If you use a left-right online algorithm, the first inserted node of depth >= k is a match. There's also a lighter-weight (1n) way to do this with an insertion-based block sort, but it's tricky.
|
|
IP Logged |
|
|
|
aditi
Newbie
Posts: 17
|
|
Re: Finding Substring Position
« Reply #4 on: Nov 12th, 2007, 3:40pm » |
Quote Modify
|
Pre process the string: for every character 'c' in the string, let next(index('c')) be the index when the character appears again in the string(-1 if it doesn't). This can be done by using a hash and scanning the string from left to right. If the character 'c' is not there in the hash, add it to hash with its index as the value and set next('c') to -1. If 'c' is in the map, then next(index('c')) = hash_value('c') and hash_value('c') = index of 'c' in the string. Let str[i,i+k-1] be the substring of length k in consideration. To see if it occurs again, check if str[next(i),next(i)+k-1] is same as str[i,i+k-1]. Keep doing this for 0 <= i < N, unless you find a match. Also avoid considering str[i,i+k-1] where next(i) = -1 because it can't have a match later on. Space : O(N) Time: O(Nk)
|
|
IP Logged |
|
|
|
dont.hurt.again
Newbie
Posts: 6
|
|
Re: Finding Substring Position
« Reply #5 on: Nov 13th, 2007, 9:49pm » |
Quote Modify
|
can anybody give a good link to study suffix tree.
|
|
IP Logged |
|
|
|
|