A hopefully interesting problem: Allocation algorithm

Donald Halloran

unread,

May 9, 2002, 11:15:32 AM5/9/02

to

Here's a problem which I have had to solve one way or another a couple of
times in the past. I thought it made for an interesting challenge. I
resorted to a compromise between cursor and set based operations the last
time.

Take 2 tables, representing groups of the same things of some sort, such
that each record has a "quantity":

tA(aOperationId int, aThingId int, aQuantity int)
tB(bOperationId int, bThingId int, bQuantity int)

OperationId represents a key into some kind of operation table that caused
the records to be inserted. One operation represents one record in one of
the tables.

First, tA is populated with some number of records. At some point in the
future, tB is populated with an unrelated number of records. Quantities are
essentially random.

The problem is to "allocate" the quantities in tB against the quantities in
tA. That is to say, all quantity in tB must be "spent" against quantity in
tA. A good real life example is stock, where tA represents stock that has
come in, and tB represents stock going out. Another representation might be
financial, or an allocation of groups of people to other predefined groups,
like school classes, or teams, or whatever.

The allocation follows a "many to many" relationship, such that quantities
in tB may be greater or less than quantities in tA, but the sum quantity of
tA must always at least equal the sum quantity in tB.

The allocation relationship is held in a third table, pointing at each of
the other two (keyed into aOperationId and bOperationId, and holding the
quantity allocated).

The first time I encountered this kind of problem Quantity was not random,
but fell within a likely range of values. My solution that time was to turn
to an interpolative solution, taking an "expected" number of records based
on an average quantity, and then adding or subtracting, again based on an
expected "average" quantity as necessary. This was done in a while loop and
usually managed to allocate within 3 or 4 iterations.

The second time, an average quantity was not useful as quantities could
swing from just one to hundreds. My solution in this instance went something
like this:
Cross join the two tables into a working table. Cursor through the cross
join. Each time we hit a thingid an allocate the lesser of the two
quantities to the joined record, update all other records containing the
thingId on both sides of the working table. Do this while there are still
records with quantity unallocated. We don't need to cursor through the
entire cross join.
This solution proved faster than the fully iterative solution, but a cursor
was still needed.

Can anyone write an efficient SQL Server solution to this general problem
without using a cursor?

--
Don Halloran

Michael MacGregor

unread,

May 9, 2002, 11:18:54 AM5/9/02

to

It really help to provide DDL (CREATE TABLES), example data (INSERT INTO)
and the result you hope to obtain. That is far better than just narrative as
it allows us to cut and paste into Query Analyzer for testing.

Michael MacGregor
Database Architect
SalesDriver

Isaac Blank

unread,

May 9, 2002, 11:54:20 AM5/9/02

to

Check these threads:
http://groups.google.com/groups?th=228cba63c764ea1
http://groups.google.com/groups?th=388ff26ae6367c93

Donald Halloran

unread,

May 9, 2002, 12:00:47 PM5/9/02

to

"Michael MacGregor" <michael...@NOsalesSPAMdriver.com> wrote in message
news:#Ny$X029BHA.2372@tkmsftngp07...

> It really help to provide DDL (CREATE TABLES), example data (INSERT INTO)
> and the result you hope to obtain. That is far better than just narrative
as
> it allows us to cut and paste into Query Analyzer for testing.

Understood, but the problem isn't really designed around any single
implementation. Nevertheless, here's the general idea. (for simplicity,
let's assume every "thing" is of the same type, and leave the operation out,
instead just having a unique identifier on each table) First the result
required, DDL at the end

tA
aId aQuantity
---------------
1 10
2 5

tB
bId bQuantity
----------------
1 6
2 5
3 3
4 1

Wanted result:

tAllocation
aId bId QuantityAllocated
------------------------------
1 1 6
1 2 4
2 2 1
2 3 3
2 4 1

------------ BEGIN HERE --------------
create table tA(aId int identity(1,1), aQuantity int)
go

create table tB(bId int identity(1,1), bQuantity int)
go

create table tAllocation(aId int, bId int, QuantityAllocated int)
go

insert into tA values(10)
insert into tA values(5)

insert into tB values(6)
insert into tB values(5)
insert into tB values(3)
insert into tB values(1)
----------------------------------------

Joe Celko

unread,

May 9, 2002, 12:19:05 PM5/9/02

to

You really need to buy a copy of SQL FOR SMARTIES, but then who doesn't
<g>? let me quote from my book:

This is Dr. Codd's T-Join, which was introduced in his book on the
seocnd version of the relational model, which were based on the idea of
a best fit or approximate equality. The algorithm for the operators is
easier to understand with an example modified from Dr. Codd.

The problem is to assign the classes to the available classrooms. We
want (class_size < room_size) to be true after the assignments are made.
This will allow us a few empty seats in each room for late students. We
can do this in one of two ways. The first way is to sort the tables in
ascending order by classroom size and the number of students in a class.
We start with the following tables:

CREATE TABLE Rooms
(room_nbr CHAR(2) PRIMARY KEY,
room_size INTEGER NOT NULL);

CREATE TABLE Classes
(class_nbr CHAR(2) PRIMARY KEY,
class_size INTEGER NOT NULL);

These tables have the following rows in them:

Classes
class_nbr class_size
=====================
'c1' 80
'c2' 70
'c3' 65
'c4' 55
'c5' 50
'c6' 40

Rooms
room_nbr room_size
==================
'r1' 70
'r2' 40
'r3' 50
'r4' 85
'r5' 30
'r6' 65
'r7' 55

The goal of the T-Join problem is to assign a class which is smaller
than the classroom given it (class_size < room_size). Dr. Codd gives
two approaches to the problem.

1) Ascending order algorithm (cursors)

Sort both tables into ascending order. Reading from the top of the
Rooms table, match each class with the first room that will fit.

Classes Rooms
class_nbr class_size room_nbr room_size
==================== ===================
'c6' 40 'r5' 30
'c5' 50 'r2' 40
'c4' 55 'r3' 50
'c3' 65 'r7' 55
'c2' 70 'r6' 65
'c1' 80 'r1' 70
'r4' 85
This gives us

Results
class_nbr class_size room_nbr room_size
========================================
'c2' 70 'r4' 85
'c3' 65 'r1' 70
'c4' 55 'r6' 65
'c5' 50 'r7' 55
'c6' 40 'r3' 50

2) Descending order algorithm (cursors)

Sort both tables into descending order. Reading from the top of the
Classes table, match each class with the first room that will fit.

Classes Rooms
class_nbr class_size room_nbr room_size
===================== ===================
'c1' 80 'r4' 85
'c2' 70 'r1' 70
'c3' 65 'r6' 65
'c4' 55 'r7' 55
'c5' 50 'r3' 50
'c6' 40 'r2' 40
'r5' 30

Results
class_nbr class_size room_nbr room_size
=========================================
'c1' 80 'r4' 85
'c3' 65 'r1' 70
'c4' 55 'r6' 65
'c5' 50 'r7' 55
'c6' 40 'r3' 50

Notice that the answers are different! Dr. Codd has never given a
definition in relational algebra of the T-Join, so I proposed that we
need one. Informally, for each class, we want the smallest room which
will hold it, while maintaining the T-Join condition. Or for each room,
we want the largest class that will fill it, while maintaining the
T-Join condition. These can be two different things, so you must decide
which table is the driver. But either way, I am advocating a "best fit"
over Codd's "first fit" approach.

In effect, the Swedish and Croatian solutions given later in this
section use my definition instead of Dr. Codd's; the Colombian solution
is true to the algorithmic approach.

Other theta conditions can be used in place of the "less than" shown
here. If "less than or equal" is used, all the classes are assigned to
a room in this case, but not in all cases. This is left to the reader
as an exercise.

The first attempts in standard SQL are versions grouped by queries.
They can, however, produce some rows that would be left out of the
answers Dr. Codd was expecting. The first JOIN can be written as

SELECT class_nbr, class_size, MIN(room_size)
FROM Rooms, Classes
WHERE Classes.class_size < Rooms.room_size
GROUP BY class_nbr, class_size;

This will give a result table with the desired room sizes, but not the
room numbers. You cannot put the other columns in the SELECT list,
since it would conflict with the GROUP BY clause. But also note that
the classroom with 85 seats (r4) is used twice, once by class 'c1' and
then by class 'c2':

Result
class_nbr class_size MIN(room_size)
====================================
c1 80 85 <== room r4
c2 70 85 <== room r4
c3 65 70
c4 55 65
c5 50 55
c6 40 50

Your best bet after this is to use the query in an EXISTS clause.

SELECT *
FROM Rooms, Classes
WHERE EXISTS (SELECT class_nbr, class_size, MIN(room_size)
FROM Rooms, Classes
WHERE Classes.class_size < Rooms.room_size
GROUP BY class_nbr, class_size);

Alternately, you can save the results to a temporary table, then JOIN it
back to the Cartesian product of Rooms and Classes. The second T-Join
can be done by putting the columns for Rooms into the SELECT list of the
same query schema:

SELECT room_nbr, room_size, MAX(class_size)
FROM Rooms, Classes
WHERE Classes.class_size < Rooms.room_size
GROUP BY room_nbr, room_size;

This time, the results are the same as those Dr. Codd got with his
procedural algorithm:

Result
room_nbr room_size MAX(class_size)
======================================
'r4' 85 80
'r1' 70 65
'r6' 65 55
'r7' 55 50
'r3' 50 40

If you do a little arithmetic on the data, you find that we have 360
students and 395 seats, 6 classes and 7 rooms. This solution uses the
fewest rooms, but note that the 70 students in class 'c2' are left out
completely. Room 'r2' is left over, but it has only 40 seats.

As it works out, the best fit of rooms to classes is given by changing
the matching rule to "less than or equal". This will leave the smallest
room empty and pack the other rooms to capacity, thus:

SELECT class_nbr, class_size, MIN(room_size)
FROM Rooms, Classes
WHERE Classes.class_size <= Rooms.room_size
GROUP BY class_nbr, class_size;

The Croatian Solution

I published this same problem in an article in DBMS magazine in 1992 and
got an answer in QUEL from Miljenko Martinis of Croatia in our Letters
column (Miljenko 1992). He then translated it from QUEL into SQL with
two views, thus:

CREATE VIEW Classrooms -- all possible legal pairs
AS SELECT *
FROM Classes, Rooms
WHERE class_size < room_size;

CREATE VIEW Classrooms1 -- smallest room for the class
AS SELECT *
FROM Classrooms AS CR1
WHERE room_size = (SELECT MIN(room_size)
FROM Classrooms
WHERE class_nbr = CR1.class_nbr);

We find the answer with the simple query

SELECT class_nbr, class_size, room_size, room_nbr
FROM Classrooms1 AS CR1
WHERE class_size = (SELECT MAX(class_size)
FROM Classrooms1
WHERE room_nbr = CR1.room_nbr);

This is a pure SQL-89 solution.

class_nbr class_size room_size room_nbr
==========================================
'c6' 40 50 'r3'
'c5' 50 55 'r7'
'c4' 55 65 'r6'
'c3' 65 70 'r1'
'c1' 80 85 'r4'

The Swedish Solution

I got another solution from Anders Karlsson of Mr. K Software AB in
Stockholm, Sweden. Here is a version of that query:

SELECT C1.class_nbr, C1.class_size, R1.room_size, R1.room_nbr
FROM Classes AS C1, Rooms AS R1
WHERE C1.class_size = (SELECT MAX(C2.class_size)
FROM Classes AS C2
WHERE R1.room_size > C2.class_size)
AND NOT EXISTS (SELECT *
FROM Rooms AS R2
WHERE R2.room_size > C1.class_size
AND R2.room_size < R1.room_size);

The first predicate says we have the largest class that will go into
this room. The second predicate says there is no other room which would
fit this class better (i.e. smaller than the candidate room and still
larger than the class at which we are looking).

The Colombian Solution

Francisco Moreno of the Department of Systems Engineering, at the
University of Antioquia in Colombia came up with another approach and
data to demonstrate the problems in the T-Join.

Clean out the existing tables and insert this data:

DELETE FROM Classes;
INSERT INTO Classes
VALUES ('c1', 106),
('c2', 105),
('c3', 104),
('c4', 100),
('c5', 99),
('c6', 90),
('c7', 89),
('c8', 88),
('c9', 83),
('c10', 82),
('c11', 81),
('c12', 65),
('c13', 50),
('c14', 49),
('c15', 30),
('c16', 29),
('c17', 28),
('c18', 20),
('c19', 19);

DELETE FROM Rooms;
INSERT INTO Rooms
VALUES ('r1', 102),
('r2', 101),
('r3', 95),
('r4', 94),
('r5', 85),
('r6', 70),
('r7', 55),
('r8', 54),
('r9', 35),
('r10', 34),
('r11', 25),
('r12', 18);

Using Codd's T-Join algorithm for descending lists, you would have this
mapping:

'c1' 106
'c2' 105
'c3' 104
'c4' 100 <---> 'r1' 102
'c5' 99 <---> 'r2' 101
'c6' 90 <---> 'r3' 95
'c7' 89 <---> 'r4' 94
'c8' 88
'c9' 83 <---> 'r5' 85
'c10' 82
'c11' 81
'c12' 65 <---> 'r6' 70
'c13' 50 <---> 'r7' 55
'c14' 49 <---> 'r8' 54
'c15' 30 <---> 'r9' 35
'c16' 29 <---> 'r10' 34
'c17' 28
'c18' 20 <---> 'r11' 25
'c19' 19
'r12' 18

There 1317 students in classes and 768 seats for them. You can see by
inspection that some classes are too large for any room we have. If you
started in ascending order, class 'c19' pairs with room 'r11' and you
get another result set.

This algorithm is not a best fit answer, but a first fit answer. This
is an important difference. To explain further, The first fit to class
'c4' is room 'r1', which has 102 seats; the best fit is room 'r2' which
has 101 seats. The algorithm would give us this result table:

Results
class_nbr class_size room_size room_nbr
==========================================
'c4' 100 102 'r1'
'c5' 99 101 'r2'
'c6' 90 95 'r3'
'c7' 89 94 'r4'
'c9' 83 85 'r5'
'c12' 65 70 'r6'
'c13' 50 55 'r7'
'c14' 49 54 'r8'
'c15' 30 35 'r9'
'c16' 29 34 'r10'
'c18' 20 25 'r11'

704 student served.

If you use Swedish or Croatian solution on this data, the answer is:

Swedish Result
class_nbr class_size room_size room_nbr
============================================
'c4' 100 101 'r2'
'c6' 90 94 'r4'
'c9' 83 85 'r5'
'c12' 65 70 'r6'
'c13' 50 54 'r8'
'c15' 30 34 'r10'
'c18' 20 25 'r11'

438 students served.

At this point you have a result which is not complete, but has the
tightest mapping of each class into a room. There is another problem
which was not mentioned. We have not had two classes or two rooms of
the same size in the data. This will cause some other problems.

Instead of trying to use a single static SQL query, we can use SQL to
generate SQL code, then execute it dynamically. This solution is right
but, of course, is horrible from a performance viewpoint.

-- build a table of possible T-Join pairings
DROP TABLE T-Join;
CREATE TABLE T-Join AS
SELECT *
FROM Classes, Rooms
WHERE room_size > class_size;

-- create a temporary working table
DROP TABLE Ins;
CREATE TABLE Ins
(class_nbr CHAR(3) NOT NULL,
class_size INTEGER NOT NULL,
room_nbr CHAR(3) NOT NULL,
room_size INTEGER NOT NULL);

-- create a table with the insertion code for each row
SELECT
'INSERT INTO Ins
SELECT class_nbr, class_size, room_nbr, room_size
FROM T-Join AS T1
WHERE room_size
= (SELECT MAX(room_size)
FROM T-Join
WHERE room_size NOT IN (SELECT room_size FROM Ins))
AND class_size
= (SELECT MAX(class_size)
FROM T-Join AS T2
WHERE class_size NOT IN (SELECT class_size FROM Ins)
AND T2.class_size < T1.room_size);'
FROM Rooms
WHERE room_size > (SELECT MIN(class_size) FROM 'c);
COMMIT;

Now use "SELECT * FROM Ins;" query in a host program with dynamic SQL
and execute each statement in the temporary table in order. This will
give us the first answer at the start of section 17.5.3, and it also
works for the original data.

Moreno's second solution, which handles duplicates, is more complex and
I will not give it here. It uses the keys of the tables to make rows
with duplicate values unique.

--CELKO--
===========================
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are.

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!