Affinity Analyzer

A tool for identifying SKUs that are frequently ordered together

If your customers generally order an oil filter gasket when they order an oil filter, then you may be able to reduce travel in your warehouse by storing the filter and the gasket close to each other. This program identifies such opportunities.

To search for such opportunities, prepare an order history that combines the shopping lists of recent customers and load it into this program. The program reads the orders, searches for patterns, and reports pairs of SKUs that are highly correlated. In addition, it identifies SKUs that tend to complete orders.

Download the program

Please read the license and disclaimers, then click here to download the program. It is distributed as a jar file, which most systems will run when you double-click it.

If the program does not run, make sure that you have the latest version of Java installed and that your security settings allow Java programs to run.

How to use the program

  1. Prepare an order history that lists all pick lines, sorted by customer order. This should be a text file in CSV format, with the order ID in the first field and the SKU in the second. (Here is an example.)
  2. Start the Affinity Analyzer program and open the order history file. The program will parse the file and analyze the patterns of customer orders.
  3. Examine the statistics to find SKUs that have been ordered together.
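The core idea behind steps 1 and 2 can be sketched in a few lines. This is a minimal illustration, not the Affinity Analyzer's actual code: it assumes a headerless CSV file sorted by order ID (as step 1 requires), and the function name count_pairs and the SKU names are hypothetical. It groups consecutive lines belonging to the same order and counts every unordered pair of distinct SKUs that appear together.

```python
import csv
import io
from collections import Counter
from itertools import combinations, groupby

def count_pairs(lines):
    """Count how often each unordered pair of SKUs appears in the same order.

    `lines` is any iterable of CSV text lines with the order ID in the
    first field and the SKU in the second (no header row assumed).
    The lines must already be sorted by order ID.
    """
    rows = (row for row in csv.reader(lines) if row)  # skip blank lines
    pair_counts = Counter()
    # groupby merges consecutive rows that share the same order ID.
    for _order_id, group in groupby(rows, key=lambda row: row[0].strip()):
        # A set ignores a SKU that appears on several lines of one order;
        # sorting makes each pair's key independent of line order.
        skus = sorted({row[1].strip() for row in group})
        for pair in combinations(skus, 2):
            pair_counts[pair] += 1
    return pair_counts

# Hypothetical sample history: three orders, hypothetical SKU names.
sample = """\
1001,FILTER-17
1001,GASKET-17
1002,FILTER-17
1002,GASKET-17
1002,WIPER-3
1003,WIPER-3
"""

counts = count_pairs(io.StringIO(sample))
for (a, b), n in counts.most_common(3):
    print(a, b, n)
```

Here the filter and gasket appear together in two orders, so that pair rises to the top. A real history file could be passed as an open file object instead of the StringIO sample.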


You can find more information and tools like this in our textbook and associated web pages.


Why does the program not look for groups of 3 or 4 or more SKUs that are frequently ordered together?

This is impractical and unnecessary. It is impractical because the time to process the sales history and the space to store the results both grow exponentially with the size of the affinity groups. It is unnecessary because, if a group of, say, 4 SKUs is frequently ordered together, the current pairs analysis will already recognize it by reporting all 6 pairs within the group as frequently ordered together.
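Both halves of this answer can be checked with a little arithmetic. The number of distinct groups of k SKUs that can be formed from n SKUs is the binomial coefficient C(n, k); for small k it grows roughly like n^k / k!, so each additional group member multiplies the work by about n / k. The sketch below uses n = 10 000 purely as an illustrative catalog size, and also confirms that a 4-SKU group contains exactly C(4, 2) = 6 pairs.

```python
from itertools import combinations
from math import comb

# A group of 4 SKUs that are always ordered together...
group = ["A", "B", "C", "D"]
# ...appears in a pairs analysis as every 2-element subset of the group.
pairs = list(combinations(group, 2))
print(len(pairs))

# Number of candidate groups of size k for an illustrative catalog of n SKUs.
n = 10_000
counts_by_size = {k: comb(n, k) for k in (2, 3, 4)}
for k, count in counts_by_size.items():
    print(f"groups of {k}: {count:,}")
```

Going from pairs to triples multiplies the number of candidates by (n - 2) / 3, more than three thousand for this catalog, which is why the program stops at pairs.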

Why do I get an out-of-memory error?

If your order history contains fifty thousand SKUs, then the program must tabulate statistics on about (50 000)(50 000) = 2 500 000 000 different pairs of SKUs, which can overflow memory. However, the program can use as much memory as is available, so quit other applications or move to a machine with more memory. Addendum: Some versions of MS Windows apparently allocate no more than 1 GB of memory to any Java process. This does not seem to be the case for Mac or Linux.
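To see why memory runs out, here is a rough estimate. It assumes one 4-byte integer counter per pair, which is an illustrative assumption rather than a description of how the Affinity Analyzer stores its statistics; real Java data structures add overhead, so actual usage may be higher. A full n-by-n table holds n·n counters, while storing each unordered pair once needs n(n − 1)/2. The function name pair_table_bytes is hypothetical.

```python
def pair_table_bytes(n_skus, bytes_per_count=4, triangular=False):
    """Rough memory needed for a table of pair counts.

    Assumes one fixed-size counter per pair (4 bytes = a 32-bit int).
    A full n-by-n table stores n*n counters; storing each unordered
    pair only once (triangular=True) needs n*(n-1)/2.
    """
    if triangular:
        cells = n_skus * (n_skus - 1) // 2
    else:
        cells = n_skus * n_skus
    return cells * bytes_per_count

GB = 10**9  # decimal gigabytes
for n in (10_000, 50_000):
    full = pair_table_bytes(n) / GB
    half = pair_table_bytes(n, triangular=True) / GB
    print(f"{n:>6} SKUs: full table {full:.1f} GB, triangular {half:.1f} GB")
```

Under these assumptions, 50 000 SKUs need about 10 GB for a full table (about 5 GB even if each pair is stored once), well beyond a 1 GB limit on a Java process.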

Why is the program taking so long to run?

Most likely your computer does not have enough memory (see the previous question) and is continually swapping the contents of memory to disk, which is also known as “thrashing”. To fix this, add more memory to your computer, use another computer that has more memory, or truncate the data file (for example, use only the first 10 000 lines). For comparison, a laptop with 3 GB of RAM takes less than 1 minute to process 200 000 lines of sales history describing 10 000 SKUs.
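If you truncate the file, cutting at exactly line 10 000 may split the last order in two, so that part of an order appears as if it were a complete order. The sketch below is one way to avoid that: it copies roughly the first max_lines lines and then continues to the end of the current order. It assumes the file is sorted by order ID (as step 1 requires) and that order IDs contain no commas; the function name truncate_history and the file names in the usage line are hypothetical.

```python
def truncate_history(src_path, dst_path, max_lines=10_000):
    """Copy about the first `max_lines` lines of an order history.

    The input must be sorted by order ID (the first CSV field). Copying
    stops at the first order boundary at or after `max_lines`, so no
    order is split. Lines are copied verbatim. Returns the number of
    lines written.
    """
    written = 0
    last_order = None
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            # The order ID is everything before the first comma.
            order_id = line.split(",", 1)[0].strip()
            # Stop only when a new order begins, never in the middle of one.
            if written >= max_lines and order_id != last_order:
                break
            dst.write(line)
            written += 1
            last_order = order_id
    return written
```

For example, truncate_history("orders.csv", "orders_small.csv") would write a smaller file that can be opened in the program in place of the full history.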