If your customers generally order an oil filter gasket when they order an oil filter, then you may be able to reduce travel in your warehouse by storing the filter and the gasket close to each other. This program identifies such opportunities.
To search for them, prepare an order history that combines the shopping lists of recent customers and load it into this program, which will read the orders, search for patterns, and report highly correlated SKUs. In addition, it will identify SKUs that tend to complete orders.
Please read the license and disclaimer, then click here to download the program. It will appear as a jar file, which most systems will run if you double-click on it. Alternatively, open a terminal and enter:
java -jar WhAffinity.jar
If the program does not run, make sure you have the latest version of Java installed and your security settings allow execution of Java programs.
The order history must be in csv format, with the order ID in the first field and the SKU in the second. (Here is an example.)
You can find more information and tools like this in our textbook and associated web pages.
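To make the two-column layout concrete (order ID first, SKU second), here is a minimal Python sketch that groups such rows into orders. It is only an illustration and not part of the program: the function name `read_orders` and the sample rows are hypothetical, and it assumes the file has no header row.

```python
import csv
import io
from collections import defaultdict

def read_orders(lines):
    """Group (order ID, SKU) rows into a dict mapping order ID -> set of SKUs."""
    orders = defaultdict(set)
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        order_id, sku = row[0].strip(), row[1].strip()
        orders[order_id].add(sku)
    return dict(orders)

# Hypothetical sample: order 1001 contains a filter and its gasket.
sample = io.StringIO("1001,OIL-FILTER\n1001,GASKET\n1002,OIL-FILTER\n")
orders = read_orders(sample)
print(orders["1001"] == {"OIL-FILTER", "GASKET"})  # True
```

For a real file, pass an open file object instead of the `StringIO` sample, for example `open("order-history.csv", newline="")`.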
This is impractical and unnecessary. It is impractical because the time to process the sales history and the space to store the results both increase exponentially in the size of the affinity groups. It is unnecessary because, if a group of, say, 4 SKUs is frequently ordered together, the current pairs analysis will recognize this by reporting the 6 pairs they form as frequently ordered together.
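The arithmetic behind both points can be checked with a short Python sketch. The SKU names and the catalog size of 1,000 are hypothetical, chosen only for illustration; `math.comb` requires Python 3.8 or later.

```python
from itertools import combinations
from math import comb

# A hypothetical group of 4 SKUs that are frequently ordered together.
group = ["A", "B", "C", "D"]
pairs = list(combinations(group, 2))
print(len(pairs))  # 6 pairs: AB, AC, AD, BC, BD, CD

# Number of candidate groups of each size in a hypothetical 1,000-SKU catalog.
for size in (2, 3, 4):
    print(size, comb(1000, size))
# 2 499500
# 3 166167000
# 4 41417124750
```

Each additional SKU in the group size multiplies the number of candidate groups by a few hundred here, which is why tabulating anything beyond pairs quickly becomes impractical.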
If your order history contains fifty thousand SKUs, then the program must tabulate statistics on the order of (50 000)(50 000) = 2 500 000 000 different pairs of SKUs (about half that many if the order within a pair is ignored), which can overflow memory. But the program can take advantage of as much memory as you have available, so quit other applications or move to a machine with more memory. Addendum: Some versions of MS Windows apparently allocate no more than 1GB of memory to any Java process. This does not seem to be the case for Mac or Linux.
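As a rough check on these numbers, the following Python sketch counts the possible pairs for a given number of SKUs and estimates the memory a dense table of counters would need. The 4 bytes per counter is an assumption made for illustration only; how the program actually stores its statistics is not stated here.

```python
def pair_counts(num_skus):
    """Return (ordered pairs, unordered pairs of distinct SKUs)."""
    ordered = num_skus * num_skus
    unordered = num_skus * (num_skus - 1) // 2
    return ordered, unordered

BYTES_PER_COUNTER = 4  # assumption: one 32-bit counter per pair

ordered, unordered = pair_counts(50_000)
print(ordered)    # 2500000000
print(unordered)  # 1249975000
print(f"{unordered * BYTES_PER_COUNTER / 1e9:.1f} GB")  # 5.0 GB
```

Even under this optimistic assumption the table is several gigabytes, well beyond the 1GB limit mentioned in the addendum.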
Your computer does not have enough memory (see the previous question) and is continually swapping the contents of memory to disk (also known as “thrashing”). You must add more memory to your computer, use another computer that has more memory, or truncate the data file (for example, use only the first 10 000 lines). For comparison, a laptop with 3GB RAM takes less than 1 minute to process 200 000 lines of sales history describing 10 000 SKUs.
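If you choose to truncate the data file, one way to do it is sketched below in Python. The function name `truncate_history` is hypothetical, and the sample data stands in for a real file. Note that cutting at a fixed line count may split the last order, so its SKU list may be incomplete.

```python
import io
from itertools import islice

def truncate_history(src, dst, max_lines=10_000):
    """Copy at most max_lines lines from src to dst; return the number copied."""
    copied = 0
    for line in islice(src, max_lines):
        dst.write(line)
        copied += 1
    return copied

# Hypothetical in-memory example: keep the first 2 of 3 lines.
src = io.StringIO("1001,OIL-FILTER\n1001,GASKET\n1002,OIL-FILTER\n")
dst = io.StringIO()
print(truncate_history(src, dst, max_lines=2))  # 2
```

With real files, open the source for reading and the destination for writing, for example `open("order-history.csv")` and `open("order-history-short.csv", "w")`, and pass both to the function.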
Here is a stripped-down, command-line version of the program. It reports only the popularity of SKU pairs. Furthermore, it makes a simplifying assumption that may not apply to your data: It assumes that the popularity of SKU pairs does not vary appreciably over the data, so that a SKU pair that is popular at the start of the data will also tend to be popular toward the end of the data; and similarly for unpopularity.
You must have Java 8 installed on your computer. Invoke the program by entering the following in a terminal window.
java -jar WhAffinityCmdLn.jar order-history.csv m n k
The program will then read the order history and tabulate appearances of every pair of SKUs. It will pause after every m orders and purge from memory all but the n SKU pairs most popular to this point. Larger values of m or n consume more memory but better protect against missing pairs whose popularity is seasonal.
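The count-and-purge idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program's actual implementation: the function name `top_pairs` is hypothetical, orders are assumed to arrive already grouped as lists of SKUs, and ties at the purge boundary are broken arbitrarily.

```python
from collections import Counter
from itertools import combinations

def top_pairs(orders, m, n, k):
    """Count SKU pairs; after every m orders keep only the n most popular
    pairs seen so far; return the k most popular pairs at the end."""
    counts = Counter()
    for i, skus in enumerate(orders, start=1):
        # Sort so that (A, B) and (B, A) are counted as the same pair.
        for pair in combinations(sorted(set(skus)), 2):
            counts[pair] += 1
        if i % m == 0:
            # Purge: discard all but the n most popular pairs so far.
            counts = Counter(dict(counts.most_common(n)))
    return counts.most_common(k)

# Hypothetical order history: F and G appear together in three orders.
orders = [["F", "G"], ["F", "G", "W"], ["F", "G"], ["W", "X"]]
print(top_pairs(orders, m=2, n=2, k=1))  # [(('F', 'G'), 3)]
```

The sketch also shows the risk of small m or n: a pair that is purged early restarts its count from zero if it reappears later, so a pair that becomes popular only late in the data can be undercounted or missed.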
At the end, the program will write to stdout (print to the screen) the k most popular pairs of SKUs. You will probably want to capture the output by redirecting it: just append
> output-file.csv
to the line invoking the program.
For comparison purposes, on a typical laptop the program processed 13 million lines of sales history in about five minutes, with m = 100,000 and n = 100,000.