Most amateur programmers know a few sorting algorithms--bubble sort certainly, probably the maximum-minimum methods, and, on a slightly more advanced level, the shell sort. Some know the more efficient sorts, such as shuttle or tree sorts. The best of these sorting algorithms require time proportional to the number of elements to sort (n*logn). What is not so well known is a sorting algorithm--and not a terribly complex one, either--that finishes in a time proportional to n (the number of elements to be sorted). Therefore, for some values of n, this sort must be faster than any of the other types. It generally goes by the name of "address calculation."

To be fair, some good reasons account for its lack of popularity. First, this method takes more than the minimum necessary amount of memory space to sort any given list; it requires additional storage proportional to n. However, in most microcomputer BASIC operations, storage requirements are not excessive, and the time savings may out weigh storage considerations. The second and more fundamental objection is that an address-calculation sort depends on the nature of the sorting keys. Most sorts use the key values only for comparison, simply checking whether one key is greater than another. This sort uses the actual value of the key.

For example, the address-calculation sort operates by first reserving a large range of memory for storage. It goes through its input list in order and, for each element, uses the key value to calculate an address within the reserved range. This mapping of keys to addresses is crucial. The operation is most efficient when the mapping is one-to-one (one element to one address), but practically it will be many-to-one. The only absolute restriction on the mapping is that it be nondecreasing, but it is important to the sort's efficiency that the greatest possible dispersion of the list elements into the range be achieved, or at least that the fewest possible collisions (mappings of two list elements onto one address) occur. These considerations require knowledge of the range and distribution of the keys. Because commercial programmers must make sorts as general as possible, address calculation is neglected. If the key distribution differs substantially from the rectilinear (from an even distribution, such as might be obtained from random generation), then the function to map keys onto addresses must become much more complex. But for microcomputer programming, often the key distribution is close to random, making the address-calculation sort a good choice.

With the appropriate address calculated, that location is checked to determine its status. If it is empty, the current list element is placed there, and the algorithm continues. If it is already occupied, then the element must be inserted in such a manner as to maintain proper order. When all list elements have been placed in the range, the program simply reads them off in order, ignoring unused elements of the range, and places them in the output list.

Test Program

Listing 1 is a formatted listing of an Applesoft version of a test model address-calculation sort. The loop in lines 40 through 80 generates n random integer variables (-32767 to +32767) and prints them out. The variable I represents the number of locations allocated to the range (more about the 2.36 later). The address-mapping function is a simple linear one; keys are multiplied by a constant BP to linearly map them onto the range 0 to I. A sort of string variables would compute a numerical value from the first so many characters, weighting them by position. Significantly, the actual list elements are not placed in the array A%; rather, indexes representing their location in the input list N% are used. A considerable space saving for lists in which the key is not the whole record results from this approach. A% is dimensioned at I+N (see line 110) to ensure that no element, in the course of being inserted into A%, gets bumped off the upper end. While this wastes space, it could be avoided with extra programming; however, that would obscure the primary ideas in this example. The main loop goes through the list in order, computing the address V from the key. If the location is vacant, line 150 places the index there. Otherwise, lines 160 and, 170 insert the index in a higher location. The process produces a "ripple" up the line, smaller elements into place so that the highest element encountered gets placed in the next vacant location by line 150. Once all the indexes are in place, lines 200 through 230 print the results. You could just as easily place them in another array. The counter C saves time by halting the printout upon locating all the elements.

Efficiency vs. Speed

I still have not justified my grandiose claims for the sort's speed. While the full mathematical treatment is unnecessary, some discussion is in order. Note first that the time used by the printout loop remains proportional to the value of I (the number of locations assigned to the range). This provides a motive for keeping I as small as possible, and if I is made proportional to n, then the time taken by this loop will also be proportional to n. The time taken by the main loop would be proportional to n if there were no collisions (that is, if lines 160 and 170 went unused). The number of collisions decreases as I increases, providing a reason for wanting I to be as large as possible. Counterbalancing the two considerations shows that the optimum value for I will be proportional to n; the time taken in the main loop then also turns out to be proportional to n. The actual constants of proportionality depend on the implementation. These arguments are validated experimentally by figure 1, based on numerous timings of a stripped-down version of listing 1 run on an Apple II Plus. The diagram consists of a line plotted on top of points representing averages of several runs at near-optimum I. The optimum time turned out to be slightly greater than 9 seconds per 100 n. The optimum value for I was calculated to be about that used in listing 1; namely 2.36*N. Regardless of the implementation, the optimum ratio of I to n should be about 2.5 +/- .5, with little variation of time within that range.

The address-calculation sorting algorithm provides a fast, not terribly complicated sort for lists the nature of whose keys and distribution is generally known. For special cases it can provide the most effiecent sorting available.

Douglas Davidson (1505 Mintwood Dr., McLean, VA 22101) is a high-school senior. His hobbies include computers and astronomy.

Listing 1: The address-calculation sort program. Written for the Apple II computer, the program will generate a list of random numbers, sort the list, and print the sorted list.

Figure 1: The Address Calculation Response Chart. The amount of time required to sort a list is directly proportional to the number (n) of elements in the list.

Address Calculation:
The Forgotten Sort

Sorting speed is directly proportional to the number of elements

Test Program

Efficiency vs. Speed

Welcome to massmind.org!

Welcome to massmind.org!

Address Calculation: The Forgotten Sort

Sorting speed is directly proportional to the number of elements

Test Program

Efficiency vs. Speed

Welcome to massmind.org!

Welcome to massmind.org!

Address Calculation:
The Forgotten Sort