wu :: forums - Database sorting

wu :: forums « wu :: forums - Database sorting » Welcome, Guest. Please Login or Register. Jul 21^st, 2025, 10:43pm
RIDDLES SITE WRITE MATH! Home Help Search Members Login Register

   wu :: forums
   riddles
   cs (Moderators: Icarus, Eigenray, ThudnBlunder, Grimbal, SMQ, william wu, towr)
   Database sorting

« Previous topic | Next topic »

Pages: 1

Notify of replies

Send Topic

Author

Topic: Database sorting (Read 601 times)

gotit
Uberpuzzler

Gender:

Posts: 804

Database sorting
« on: Aug 19^th, 2007, 10:32am »

Quote

Modify

A database contains around 16000 records and needs to be sorted. At a given time, only 1000 records can be stored in memory. Can anyone please tell me how to sort the database efficiently Huh

IP Logged

All signatures are false.

towr
wu::riddles Moderator
Uberpuzzler

Some people are average, some are just mean.

Gender: male

Posts: 13730

Re: Database sorting
« Reply #1 on: Aug 19^th, 2007, 11:31am »

Quote

Modify

There's a number of options.
You could do a form of merge sort: first sort batches of 1000. Then read from two batches, and write to a new one at the same time, merging the two into a new one. Repeat until you have one large sorted database. (You really only need to be able to have two records in memory at a time for this to work: read one record from each set, write away the smallest and read a new one from that set, etc)
Option two: hash all the records so they a) fit in memory, b) the hashes have the same order as the records. Sort the hashes, then lookup the records.
Option 3: something akin to radix sort. Divide the criteria you sort on into small pieces; e.g. if you were sorting numbers you would use bits; then stable sort from the least significant bit up to the most significant one. All the small pieces would fit in memory at the same time, so sorting them is easy, then just use that to realign the original.

IP Logged

Wikipedia, Google, Mathworld, Integer sequence DB

gotit
Uberpuzzler

Gender:

Posts: 804

Re: Database sorting
« Reply #2 on: Aug 19^th, 2007, 12:55pm »

Quote

Modify

The third option is not quite clear to me. What I don't understand is how to relate the bits back to the original numbers. Should I use some kind of hash function? If yes, what kind of hash function should be used?

IP Logged

All signatures are false.

towr
wu::riddles Moderator
Uberpuzzler

Some people are average, some are just mean.

Gender: male

Posts: 13730

Re: Database sorting
« Reply #3 on: Aug 19^th, 2007, 1:38pm »

Quote

Modify

on Aug 19^th, 2007, 12:55pm, gotit wrote:

In the case of actual numbers (e.g. ints or long long ints), you would just sort the numbers on the last bit, then second to last bit etc.
With non-numerical data you wouldn't use actual bits, but (different) pieces of information. To relate them back to the record each piece has to keep an association with the record, and you could do that by attaching a unique id to each record (and piece of data extracted from it).

So you might first sort pairs of (FirstName,Id). Then
retrieve the next piece of data for each id, e.g the last name. And then (stable) sort the update pairs (LastName, Id).
In the end, you can reorder the records according to the id's

IP Logged

Wikipedia, Google, Mathworld, Integer sequence DB

Grimbal
wu::riddles Moderator
Uberpuzzler

Gender:

Posts: 7528

Re: Database sorting
« Reply #4 on: Aug 19^th, 2007, 2:43pm »

Quote

Modify

It also depends what underlying hardware you have.

If your database is a file and you have enough space on disk, I'd say you should sort chunks of 1000 into as many files, then, in a second pass, merge all the results by keeping just one record from each file in memory. This way, in the end, you read and write each record only twice.

If your database is more like an array (i.e. a direct access disk space with not much extra space to play with), you will have to shuffle records around. You could maybe do a first reorganizing by estimating a median and splitting the database in 2. There is a simple algorithm for that. Split the parts again until they are <1000 items, sort them.

I thought of another possibility. If the sort is only on part of the record, you could create an index in memory (i.e. keep just the key fields and the location on disk) and sort that. And shuffle the records later.

« Last Edit: Aug 19^th, 2007, 2:44pm by Grimbal »

IP Logged

Pages: 1

Notify of replies

Send Topic


« Previous topic \| Next topic »