wu :: forums
« wu :: forums - Database sorting »

Welcome, Guest. Please Login or Register.
Jan 7th, 2025, 11:33pm

RIDDLES SITE WRITE MATH! Home Home Help Help Search Search Members Members Login Login Register Register
   wu :: forums
   riddles
   cs
(Moderators: SMQ, Icarus, Grimbal, Eigenray, william wu, ThudnBlunder, towr)
   Database sorting
« Previous topic | Next topic »
Pages: 1  Reply Reply Notify of replies Notify of replies Send Topic Send Topic Print Print
   Author  Topic: Database sorting  (Read 597 times)
gotit
Uberpuzzler
*****





   
Email

Gender: male
Posts: 804
Database sorting  
« on: Aug 19th, 2007, 10:32am »
Quote Quote Modify Modify

A database contains around 16000 records and needs to be sorted. At a given time, only 1000 records can be stored in memory. Can anyone please tell me how to sort the database efficientlyHuh
IP Logged

All signatures are false.
towr
wu::riddles Moderator
Uberpuzzler
*****



Some people are average, some are just mean.

   


Gender: male
Posts: 13730
Re: Database sorting  
« Reply #1 on: Aug 19th, 2007, 11:31am »
Quote Quote Modify Modify

There's a number of options.
You could do a form of merge sort: first sort batches of 1000. Then read from two batches, and write to a new one at the same time, merging the two into a new one. Repeat until you have one large sorted database. (You really only need to be able to have two records in memory at a time for this to work: read one record from each set, write away the smallest and read a new one from that set, etc)
Option two: hash all the records so they a) fit in memory, b) the hashes have the same order as the records. Sort the hashes, then lookup the records.
Option 3: something akin to radix sort. Divide the criteria you sort on into small pieces; e.g. if you were sorting numbers you would use bits; then stable sort from the least significant bit up to the most significant one. All the small pieces would fit in memory at the same time, so sorting them is easy, then just use that to realign the original.
IP Logged

Wikipedia, Google, Mathworld, Integer sequence DB
gotit
Uberpuzzler
*****





   
Email

Gender: male
Posts: 804
Re: Database sorting  
« Reply #2 on: Aug 19th, 2007, 12:55pm »
Quote Quote Modify Modify

The third option is not quite clear to me. What I don't understand is how to relate the bits back to the original numbers. Should I use some kind of hash function? If yes, what kind of hash function should be used?
IP Logged

All signatures are false.
towr
wu::riddles Moderator
Uberpuzzler
*****



Some people are average, some are just mean.

   


Gender: male
Posts: 13730
Re: Database sorting  
« Reply #3 on: Aug 19th, 2007, 1:38pm »
Quote Quote Modify Modify

on Aug 19th, 2007, 12:55pm, gotit wrote:
The third option is not quite clear to me. What I don't understand is how to relate the bits back to the original numbers. Should I use some kind of hash function? If yes, what kind of hash function should be used?
In the case of actual numbers (e.g. ints or long long ints), you would just sort the numbers on the last bit, then second to last bit etc.
With non-numerical data you wouldn't use actual bits, but (different) pieces of information. To relate them back to the record each piece has to keep an association with the record, and you could do that by attaching a unique id to each record (and piece of data extracted from it).
 
So you might first sort pairs of (FirstName,Id). Then  
retrieve the next piece of data for each id, e.g the last name. And then (stable) sort the update pairs (LastName, Id).
In the end, you can reorder the records according to the id's
IP Logged

Wikipedia, Google, Mathworld, Integer sequence DB
Grimbal
wu::riddles Moderator
Uberpuzzler
*****






   


Gender: male
Posts: 7527
Re: Database sorting  
« Reply #4 on: Aug 19th, 2007, 2:43pm »
Quote Quote Modify Modify

It also depends what underlying hardware you have.
 
If your database is a  file and you have enough space on disk, I'd say you should sort chunks of 1000 into as many files, then, in a second pass, merge all the results by keeping just one record from each file in memory.  This way, in the end, you read and write each record only twice.
 
If your database is more like an array (i.e. a direct access disk space with not much extra space to play with), you will have to shuffle records around.  You could maybe do a first reorganizing by estimating a median and splitting the database in 2.  There is a simple algorithm for that.  Split the parts again until they are <1000 items, sort them.
 
I thought of another possibility.  If the sort is only on part of the record, you could create an index in memory (i.e. keep just the key fields and the location on disk) and sort that.  And shuffle the records later.
« Last Edit: Aug 19th, 2007, 2:44pm by Grimbal » IP Logged
Pages: 1  Reply Reply Notify of replies Notify of replies Send Topic Send Topic Print Print

« Previous topic | Next topic »

Powered by YaBB 1 Gold - SP 1.4!
Forum software copyright © 2000-2004 Yet another Bulletin Board