A Fast and Efficient Solution to the Backpack Problem

Written by Tels

Abstract
In a backpack problem, you have a set of different items and a limit. Each item has a certain cost (e.g. size) and a certain value (e.g. resale value) associated with it. The limit is a cost limit (e.g. a size limit). The goal is to find the combination of items that stays within the given cost limit but gives the best value.

In this article you will find some solutions which will settle this problem once and for all.

Preface
Someone suggested Dynamic Programming to solve this problem (Sedgewick). But this is not only slow; in its usual table-based form it also needs integer costs, so it can't handle floating point values. Someone suggested Branch & Bound to solve this (Domschke?). Well, it works, but it's surely overkill for such a simple problem. :-)

However, when you limit the count of each item that is available, the problem quickly becomes very hard to solve. For instance, limiting the item count to 1 for each item gives the 0/1 backpack problem, which is NP-hard, and it is believed that there is no efficient general solution to it.

Code examples are C code ripped from the example program. However, they are marked as pseudo code, since they don't stand alone.

Abbreviations used in the article are:
 * S: Size of the item (cost)
 * V: Value of the item
 * Q: Quality of the item (Q = V / S)
 * M: Size of backpack (cost limit)
 * Vc: Current value of items in backpack
 * Mc: Space left in backpack at the moment
 * Vb: Best value (solution) we ever found
 * Mb: How much space did the best solution take? (Mb <= M)

General Considerations
When we search for a solution, we should first try to write code that gives us every solution. We put our items in a single-linked list and pass a pointer to that list to our fill function. Here is a code snippet (Figure 1: Code snippet).

This code will enumerate all possible backpack combinations. It is, however, very slow since it does a tremendous amount of work.
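The enumeration described above could look roughly like the following sketch. It is not the article's original listing: the names Item, Search and fill_all are assumptions made for illustration, and it simply tries every possible count of the current item before recursing into the rest of the list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item record; the field names are assumptions. */
typedef struct Item {
    double size;        /* S: cost of one copy */
    double value;       /* V: value of one copy */
    struct Item *next;  /* next item in the single-linked list */
} Item;

/* Hypothetical bookkeeping for the search. */
typedef struct Search {
    double best_value;   /* Vb: best value found so far */
    double best_used;    /* Mb: space the best solution takes */
    long   combinations; /* complete backpacks seen so far */
} Search;

/* Try every count of the current item, then recurse into the rest. */
void fill_all(const Item *item, double space_left, double value,
              double capacity, Search *s)
{
    long count;

    if (item == NULL) {              /* end of list: one complete backpack */
        s->combinations++;
        if (value > s->best_value) {
            s->best_value = value;
            s->best_used = capacity - space_left;
        }
        return;
    }
    assert(item->size > 0.0);
    for (count = (long)(space_left / item->size); count >= 0; count--)
        fill_all(item->next, space_left - count * item->size,
                 value + count * item->value, capacity, s);
}
```

For two items of size 3 and 2 and a backpack of size 7, this visits 8 combinations. The count grows very quickly with more items and larger backpacks, which is why the following sections add abort conditions.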

The Quality
Every item has a certain size and a certain value. This means that every item gives us a certain value per unit of size it needs. We will call this the Quality Q, where: Q = V / S. It is pretty obvious that items with a high quality are the ones we want in our backpack for the best solution. (Which thief would take glass if he could get diamonds?) So we try to fill in the best items first.

To accomplish this, we have to sort our list of items before we start to fill the backpack. This can be done with any good sorting algorithm. For demonstration purposes, we just add the items to a single-linked list in sorted order.

This sorting changes the order in which the routine above produces the backpack solutions. The best backpacks are now emitted first.
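Adding items to a single-linked list in sorted order might be sketched as follows. The names Item, quality, comes_before and insert_sorted are assumptions for illustration. Quality is taken as value per unit of size, and items of equal quality are ordered with the smaller one first, which matches the note on sorting later in the article.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item record; the field names are assumptions. */
typedef struct Item {
    double size;        /* S */
    double value;       /* V */
    struct Item *next;
} Item;

/* Q: value gained per unit of size. */
double quality(const Item *item)
{
    return item->value / item->size;
}

/* Nonzero when a belongs in front of b: higher quality first,
   and for equal quality the smaller item first. */
int comes_before(const Item *a, const Item *b)
{
    double qa = quality(a), qb = quality(b);

    if (qa != qb)
        return qa > qb;
    return a->size < b->size;
}

/* Insert node into a list that is already sorted best-first.
   Returns the (possibly new) head of the list. */
Item *insert_sorted(Item *head, Item *node)
{
    Item **link = &head;

    assert(node != NULL && node->size > 0.0);
    while (*link != NULL && !comes_before(node, *link))
        link = &(*link)->next;
    node->next = *link;
    *link = node;
    return head;
}
```

Each insertion walks the list, so building the whole list this way takes O(N^2) steps in the worst case. That is fine for a demonstration; for large item counts a real sort is the better choice.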

Finding the optimal solution fast
We will find the optimal solution with this approach, since we find all solutions. But we don't need the other solutions, so we add some abort conditions to the recursive algorithm above. The first thing we want to change is the line: if (Mc == 0) to the line: if (Mc < dSmallestItemSize) so that our algorithm doesn't have to test dozens of combinations when not even a single item will fit in the backpack. The value of dSmallestItemSize can be calculated by a simple loop over all items in O(N) time.
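The loop that finds the smallest item, and the abort test that uses it, might be sketched like this. The function names are assumptions, and the test is written as "space left is smaller than the smallest item", on the assumption that this is what the abort condition is meant to check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item record; the field names are assumptions. */
typedef struct Item {
    double size;        /* S */
    double value;       /* V */
    struct Item *next;
} Item;

/* One pass over the list, O(N): the role of dSmallestItemSize. */
double smallest_item_size(const Item *head)
{
    const Item *it;
    double smallest;

    assert(head != NULL);
    smallest = head->size;
    for (it = head->next; it != NULL; it = it->next)
        if (it->size < smallest)
            smallest = it->size;
    return smallest;
}

/* Nonzero when not even the smallest item fits any more. */
int backpack_is_full(double space_left, double smallest_size)
{
    return space_left < smallest_size;
}
```

The smallest size only has to be computed once before the search starts, since the item list does not change while the backpack is being filled.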

The next, and crucial, condition is the maximum possible value of the backpack we can still reach at any time. Whenever we find a solution, we check whether it is better than the best solution so far, and if so, remember it. On entry to the fillBackPack routine we test what value our backpack would get if we filled the space left to the brim with the current item, that is Vc + Mc * Q. Since our items are sorted by quality, neither the current item nor any item after it can fill the space left better than that: the quality only drops down the list. This maximum possible value of the backpack is then compared to the best solution found so far (Vb). If it is lower or equal, we can stop; we won't get a better solution in this branch.

Since we find the best solutions at the very start, this cuts down the work very effectively.
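The two abort conditions together could be sketched as below. This is an illustration under assumptions, not the original fillBackPack: the names Item, Search and fill_best are made up, the list is assumed to be sorted by quality (best first), and the bound is computed as the current value plus the space left times the quality of the current item.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item record; the list must be sorted by quality, best first. */
typedef struct Item {
    double size;        /* S */
    double value;       /* V */
    struct Item *next;
} Item;

typedef struct Search {
    double best_value;  /* Vb */
    double smallest;    /* plays the role of dSmallestItemSize */
    long   calls;       /* number of calls, for benchmarking */
} Search;

void fill_best(const Item *item, double space_left, double value, Search *s)
{
    double bound;
    long count;

    s->calls++;
    if (value > s->best_value)       /* every partial backpack is a solution */
        s->best_value = value;
    if (item == NULL || space_left < s->smallest)
        return;                      /* nothing fits any more */
    assert(item->size > 0.0);
    /* Upper bound: fill the space left to the brim with the current item,
       ignoring that items cannot be split. */
    bound = value + space_left * (item->value / item->size);
    if (bound <= s->best_value)
        return;                      /* this branch cannot beat Vb */
    for (count = (long)(space_left / item->size); count >= 0; count--)
        fill_best(item->next, space_left - count * item->size,
                  value + count * item->value, s);
}
```

With an item of size 5 and value 10 followed by an item of size 4 and value 7, and a backpack of size 8, the sketch finds the value 14 (two of the second item) in 6 calls, even though the first item has the higher quality.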

Here are some benchmark results (Figure 2: Benchmark 1 of example 1, best only).

As you can see, with only 10 calls to fillBackPack we found the optimal solution. Here is the output if we try all solutions (Figure 3: Benchmark 2 of example 1 - all solutions).

Since both versions completed in a fraction of a second, let's try a little bit more. We add 1000 randomly generated items to our list and set the size of the backpack to 20000. The output will be (Figure 4: Benchmark 1 of example 2 - best solution with bad items).

As you notice, there is a big drop in quality from the first item to the second. When we use more elements, it gets even bigger. This is due to the fact that we evenly distributed the size and the value of the generated items, but not the quality. When the drop in quality is that large, the algorithm is very fast! We can fix this by distributing the size and the quality evenly, and calculating the value from these two characteristics. This will produce (Figure 5: Benchmark 2 of example 2 - best solution with better items).

The time taken was only 1 second in both cases, though. Although the second example shows only very few calls to fillBackPack, it had more work to do since it tested more backpacks. The first example had to struggle with the very small size of the best element compared to the backpack size.

When trying to enumerate all solutions, we fail: the algorithm will run for several hours (days?) on a fast Pentium.
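Generating the "better" test items, with size and quality evenly distributed and the value calculated from both, could be sketched like this. The ranges, the lower limit of 1.0 and the names uniform and generate_items are assumptions; the generator of the original example program may differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical item record for the generator. */
typedef struct Item {
    double size;   /* S */
    double value;  /* V */
} Item;

/* Evenly distributed number in [low, high]. rand() is a crude source,
   but good enough for generating test items. */
double uniform(double low, double high)
{
    return low + (high - low) * ((double)rand() / (double)RAND_MAX);
}

/* Draw size and quality evenly, then derive the value as V = S * Q. */
void generate_items(Item *items, size_t count,
                    double max_size, double max_quality)
{
    size_t i;

    assert(max_size >= 1.0 && max_quality >= 1.0);
    for (i = 0; i < count; i++) {
        double q = uniform(1.0, max_quality);

        items[i].size = uniform(1.0, max_size);
        items[i].value = items[i].size * q;
    }
}
```

Because the quality is drawn directly, the best items end up close together in quality, which gives the search more work than items whose size and value were drawn independently.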

Finding the solution even faster
One problem with this algorithm is what happens when we stop at a backpack which we can't possibly improve with items further down our list. In this case we stop, but the fillBackPack routine one step above (the one which called us) will reduce its lTry variable and then try again. But since the situation can only get worse (if the previous item is better than the current one) or at best remain unchanged (if the previous item is equal to the current item), there is not much sense in trying this. When we pack only one or two items in the backpack, this is not critical (even 2000000 calls to fillBackPack are very fast!). But when we already have a couple of different items in our backpack, we try millions of combinations and do a lot of unnecessary calls. We can fix this by returning 2 for that special abort condition and testing for 2 after the return from fillBackPack.
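A minimal sketch of this early out, under the same assumptions as before (hypothetical names, list sorted by quality, best first). The callee returns 2 only when it was stopped by the maximum-value test on entry. Taking one copy of the previous item away can only lower that maximum or leave it unchanged, so the caller can stop reducing its count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item record; the list must be sorted by quality, best first. */
typedef struct Item {
    double size;        /* S */
    double value;       /* V */
    struct Item *next;
} Item;

typedef struct Search {
    double best_value;  /* Vb */
    double smallest;    /* plays the role of dSmallestItemSize */
    long   calls;
} Search;

enum { SEARCHED = 0, CANNOT_IMPROVE = 2 };

int fill_early_out(const Item *item, double space_left, double value,
                   Search *s)
{
    long count;

    s->calls++;
    if (value > s->best_value)
        s->best_value = value;
    if (item == NULL || space_left < s->smallest)
        return SEARCHED;
    assert(item->size > 0.0);
    if (value + space_left * (item->value / item->size) <= s->best_value)
        return CANNOT_IMPROVE;       /* the special abort condition */
    for (count = (long)(space_left / item->size); count >= 0; count--) {
        /* Fewer copies of this item only lower the callee's maximum,
           so all smaller counts would be aborted the same way. */
        if (fill_early_out(item->next, space_left - count * item->size,
                           value + count * item->value, s) == CANNOT_IMPROVE)
            break;
    }
    return SEARCHED;
}
```

In the test case used for this sketch (items of size 5, 4 and 3 in a backpack of size 20) the search ends after 3 calls; without the break it would make three more calls that are aborted immediately.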

The next problem with the algorithm is the sorting. For backpacks that are filled with only one item, it is not necessary to sort the entire list of items. We can just pick the best item, fill the backpack and be done. But how will we know when to do this? When there are more than two different items in the best solution, picking the best remaining item out of an unsorted list takes about N/2 steps for every item, and this repeats at every level of the recursion! So we do a quick test whether we are done with only the best item, and only if it fails do we do the entire sorting and filling. If we have millions of items in an unsorted list, this will save us the sorting time. But for item counts < 10000 it will yield almost no performance difference on a standard PC.
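One way such a quick test could look is sketched below. It assumes that "done with only the best item" means the best item fills the backpack exactly: in that case the backpack reaches M times the best quality, which no other combination can beat. The names find_best_item and best_item_alone are hypothetical, and the original program may use a different test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item record; the list does not need to be sorted. */
typedef struct Item {
    double size;        /* S */
    double value;       /* V */
    struct Item *next;
} Item;

/* One pass, O(N): highest quality wins, smaller size breaks ties. */
const Item *find_best_item(const Item *head)
{
    const Item *best = head, *it;

    assert(head != NULL);
    for (it = head->next; it != NULL; it = it->next) {
        double q = it->value / it->size;
        double qb = best->value / best->size;

        if (q > qb || (q == qb && it->size < best->size))
            best = it;
    }
    return best;
}

/* Returns the number of copies if the best item alone fills the backpack
   exactly (and stores the value reached), or -1 if the full search is
   still needed. */
long best_item_alone(const Item *head, double capacity, double *value_out)
{
    const Item *best = find_best_item(head);
    long count = (long)(capacity / best->size);

    if (count * best->size != capacity)
        return -1;
    *value_out = count * best->value;
    return count;
}
```

The exact comparison of floating point numbers is only reliable for sizes that divide the capacity without rounding error; a real program would probably allow a small tolerance.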

Note: The first optimization works only if the items are sorted by quality and size. Items with an equal quality have to be sorted by size, too: the smaller items have to be preferred over the larger ones. It is easy to adapt the sorting algorithm to do that, but you have to change the findBestNextItem algorithm to do the same! This is because larger items can leave more unused space in the backpack than smaller ones, since we can't split up items.

Figure 6: Benchmark 3 of example 2 - with early out termination.

As you can see, the early out technique reduces the calls to fillBackPack dramatically!

Finding not THE best solution, but NEARLY the best
Of course it is not necessary in every case to find the exact best solution. We can, for instance, limit the work our program does by stopping it as soon as it has reached a certain backpack size. See Mmax in the source code for how to do this.
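Since the source code is not reproduced here, the following is only a guess at how such a stop could work. The field good_enough is a hypothetical stand-in for Mmax: as soon as a backpack uses at least that much space, the search is flagged as done and all loops unwind. All names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item record; the list must be sorted by quality, best first. */
typedef struct Item {
    double size;        /* S */
    double value;       /* V */
    struct Item *next;
} Item;

typedef struct Search {
    double capacity;     /* M */
    double good_enough;  /* stand-in for Mmax: used space at which we stop */
    double best_value;   /* Vb */
    double best_used;    /* Mb */
    int    done;         /* set once a good enough backpack was found */
    long   calls;
} Search;

void fill_near_best(const Item *item, double space_left, double value,
                    Search *s)
{
    double used = s->capacity - space_left;
    long count;

    s->calls++;
    if (value > s->best_value) {
        s->best_value = value;
        s->best_used = used;
    }
    if (used >= s->good_enough) {    /* full enough: stop the whole search */
        s->done = 1;
        return;
    }
    if (item == NULL)
        return;
    assert(item->size > 0.0);
    if (value + space_left * (item->value / item->size) <= s->best_value)
        return;
    for (count = (long)(space_left / item->size);
         count >= 0 && !s->done; count--)
        fill_near_best(item->next, space_left - count * item->size,
                       value + count * item->value, s);
}
```

With items of size 5 (value 10) and size 4 (value 7) in a backpack of size 8, a threshold of 5 accepts the value 10 after 2 calls, although 14 is reachable. A threshold of 8 finds the 14, at the cost of 5 calls.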

Alterations
One useful alteration is to have a limit on how often every item can be used. For instance, when you fill a transporter, you can only use items that are currently at hand or in your warehouse. It is easy to implement: just limit the lTry variable to how many items you have in reality. This will slow down the algorithm when there are fewer items than could be placed into the backpack.
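Limiting the count could be sketched as follows. The available field and the name fill_limited are assumptions. The maximum-value test stays valid with limited counts, since the items further down the list still cannot fill the space left better than the current item would.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item record; the list must be sorted by quality, best first. */
typedef struct Item {
    double size;        /* S */
    double value;       /* V */
    long   available;   /* how many copies we have in reality */
    struct Item *next;
} Item;

void fill_limited(const Item *item, double space_left, double value,
                  double *best_value)
{
    long count;

    if (value > *best_value)
        *best_value = value;
    if (item == NULL)
        return;
    assert(item->size > 0.0 && item->available >= 0);
    if (value + space_left * (item->value / item->size) <= *best_value)
        return;
    /* The count that would fit, limited to what is actually at hand. */
    count = (long)(space_left / item->size);
    if (count > item->available)
        count = item->available;
    for (; count >= 0; count--)
        fill_limited(item->next, space_left - count * item->size,
                     value + count * item->value, best_value);
}
```

With only one copy of the best item at hand, a backpack of size 14 reaches 24 in this sketch's test case instead of the 27 that two copies would allow.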

Another useful alteration could be the removal of the recursion. It is not a problem to compile an OS/2 program with a stack size of one MByte, but for large numbers of different items even this may not be enough. When we remove the recursion, we can emulate the stack on the heap, thus using dynamically allocated memory. It is possible to construct item combinations so that every different item is needed for the best solution!

To remove the recursion, we should use the used variable of every item to store lTry in it directly. When lTry is thus no longer needed, we can simply convert the function to a non-recursive one.
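A non-recursive version along these lines might look like the sketch below. It uses an array instead of a list to keep the example short, and the names are assumptions. The used field of every item takes over the role of lTry, and the variable depth replaces the call stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item record; the array must be sorted by quality, best first. */
typedef struct Item {
    double size;   /* S */
    double value;  /* V */
    long   used;   /* copies currently in the backpack (replaces lTry) */
} Item;

/* Returns the best value found (Vb). */
double fill_iterative(Item *items, size_t n, double capacity)
{
    double best = 0.0, space = capacity, value = 0.0;
    size_t depth = 0;   /* items[0..depth-1] have a chosen count */

    for (;;) {
        /* Descend: give each following item as many copies as still fit,
           as long as the maximum possible value can beat the best so far. */
        while (depth < n &&
               value + space * (items[depth].value / items[depth].size) > best) {
            Item *it = &items[depth++];

            assert(it->size > 0.0);
            it->used = (long)(space / it->size);
            space -= it->used * it->size;
            value += it->used * it->value;
        }
        if (value > best)
            best = value;
        /* Backtrack to the deepest item that can give up one copy. */
        while (depth > 0 && items[depth - 1].used == 0)
            depth--;
        if (depth == 0)
            return best;             /* every count has been tried */
        items[depth - 1].used--;
        space += items[depth - 1].size;
        value -= items[depth - 1].value;
    }
}
```

The only state besides the items themselves is three numbers, so the depth of the search is limited by the size of the item array and not by the stack.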

The following code will construct such a constellation: Here is the output of the program:

Conclusion
You can download a sample program developed with Visual Age C++ for OS/2 here. The program should compile with few problems on other operating systems or compilers. If not, let us know. Please note that it is difficult to get the algorithm to spend some time; it is fast as hell even for LARGE test item counts ;-)

Hope you had fun!