Author |
Topic: string search: none/one/ two character different (Read 465 times) |
|
chuncl
Junior Member
Gender:
Posts: 87
|
|
string search: none/one/ two character different
« on: Oct 2nd, 2008, 3:23pm » |
Quote Modify
|
There are two files. Files A has many lines, each line has a string composed of 20 characters. For example: in file A we have: 1, abbbcdefggabcdeabcde 2, abbbcdefggabcdeabcxy 3, abbbcdefggabcdeabcxz ...... There is another file B, with many lines also, each line has a string composed of 20 characters.e.g., 1, abhhcdefggabcdeabcde 2, abbxcdefggabcdeabcxz 3, abbpcdefggabcdeabcxz 4, abhhcdefggabcdeabcde 5, abcxcdefggabcdeabcxy ...... We want to for each string in file b, search the strings in file A, which are the same, or one/two chateraters different. For example, there are two characters different between B1 and A1,A1 should be selected for B1. Also, there is one character difference between B2 and A2,so A2 should be selected also. what is the efficient way to do this search? Thanks
|
|
IP Logged |
It is fun
|
|
|
towr
wu::riddles Moderator Uberpuzzler
Some people are average, some are just mean.
Gender:
Posts: 13730
|
|
Re: string search: none/one/ two character differe
« Reply #1 on: Oct 3rd, 2008, 1:01am » |
Quote Modify
|
I'd probably preprocess A to make some kind of tree.
|
|
IP Logged |
Wikipedia, Google, Mathworld, Integer sequence DB
|
|
|
Hippo
Uberpuzzler
Gender:
Posts: 919
|
|
Re: string search: none/one/ two character differe
« Reply #2 on: Oct 3rd, 2008, 5:37am » |
Quote Modify
|
on Oct 3rd, 2008, 1:01am, towr wrote:I'd probably preprocess A to make some kind of tree. |
| Me too, I would start creating trie. And instead of checking exact match I would allow traversal with upto 2 differences. But there are additional approaches as "Metric trees" I am not sure how well will they be suited for this problem. It may even depend on the actual data set.
|
« Last Edit: Oct 3rd, 2008, 5:40am by Hippo » |
IP Logged |
|
|
|
serenity
Newbie
Posts: 25
|
|
Re: string search: none/one/ two character differe
« Reply #3 on: Oct 3rd, 2008, 8:35am » |
Quote Modify
|
It's not clear to me r u making trie of whole file or each line individually if whole file isn't it necessary that first alphabet of every line should be same( in this case a) to get correct result as root will be a then.
|
|
IP Logged |
|
|
|
Hippo
Uberpuzzler
Gender:
Posts: 919
|
|
Re: string search: none/one/ two character differe
« Reply #4 on: Oct 3rd, 2008, 10:30am » |
Quote Modify
|
Trie of one line is linked list of word letters ... no this is not what I was talking about. The lines can start arbitrary. http://en.wikipedia.org/wiki/Trie
|
« Last Edit: Oct 3rd, 2008, 10:32am by Hippo » |
IP Logged |
|
|
|
|