This is material to supplement Warehouse & Distribution Science, a textbook and a graduate course taught at the Georgia Institute of Technology by John J. BARTHOLDI, III and Steven T. HACKMAN. Everyone is welcome to use the book and materials here for educational purposes, so long as the copyrights remain intact. You will find more information and technical details on these topics within the book.

Warehouse Activity Profiling

Datamining the distribution center

We will build a database using the following datafiles and query it to answer questions about the demands on the distribution center. Each of the following datafiles is a plain ASCII file with tab-delimited fields and with field names in the first row. To make this as easy as possible, the data has been purged of serious flaws. (Warning: Some flaws remain. For example, some skus are listed as having 0 units (pieces) per carton (case) and cartons are listed as having physical dimensions of 0. For such skus we may interpret this to mean that the unit (piece) is the only unit of measure that is handled by the warehouse.)

sales.txt
This file contains the concatenated shopping lists of all customers over a given time period. This has been taken from the financial system and may be expected to be accurate. Note: This file is unrealistically small, containing only about 10,000 pick lines. Here is a more realistically sized version (about 2.25 million pick lines); but it can be very time-consuming to manipulate.
skus.txt
This describes about 21,000 skus. It is taken from the Item Master, a database that may be populated by data from the vendors or from the purchasing department or from the warehouse operations staff. Data of this sort is likely to contain some inaccuracies, especially in dimensions of the product. Beware especially of physical dimensions with value 0.
locations.txt
This table lists the addresses of all 20,000+ storage locations in the warehouse together with which sku is believed to be stored there. It is taken from the database that manages storage space, which is part of the warehouse management system. Note that it may contain "dummy locations" that do not correspond to actual locations in the warehouse. You will need a map of the warehouse to translate these addresses to actual physical locations.

Set up the database

If you already have a preferred database program, import the text files above into your database and continue with the next section. Otherwise, here are instructions for setting up hsqldb, which is fast, small, sufficient to the purpose, open source, runs on the Java platform, and includes a (rudimentary) user interface. Other open-source databases, which are generally more than we need, but still quite good, include MySQL, Derby (formerly known as Cloudscape), and Firebird.

Using SQL to extract information

There are many excellent on-line SQL references available; here is one and another. In addition, see the hsql documentation, which describes how hsqldb realizes the SQL standard(s).

For our purposes the most important part of SQL is the SELECT statement.

There are many ways to transmit SQL commands to the database. Most databases include some type of query input facility. For example, MS Access has a QueryBuilder tool; and hsqldb comes with SqlTool. SQuirreL is a nicer and more powerful input facility that is also open source and can connect to most common database engines and so can be worth learning for future use. SQuirreL also comes with a SQL syntax verifier that can help you avoid SQL errors. You can also copy results directly from SQuirreL to paste into reporting tools such as spreadsheets.

Most front ends will also allow you to submit a file of SQL commands to be executed in batch, once you have debugged them.

Questions

Use the sales history, description of skus and their locations to answer the following. (SQL queries are provided for some questions to help you get started. Sometimes the query can be rewritten to run faster: You are urged to experiment.)

  1. What are the largest and the smallest dimensions among all skus?
    --
    -- This is a comment. Be sure to document each query
    --
    SELECT Max( UnitLength ) FROM Skus;

    Check your answer: 105.5

    Write similar queries for the remaining dimensions. Or you can combine multiple queries into a single one.

  2. How many times was each sku was requested? What sku was most popular? Least popular?
    --
    -- Sum all requests for each sku and list skus from most popular to least popular
    --
    SELECT Sales.SkuID, Count(Sales.SkuID) AS TotalPicks
    INTO NumberRequests
    FROM Sales
    GROUP BY Sales.SkuID
    ORDER BY TotalPicks DESC;

    Check your answer: skus PAP747-01 and SMD68670 were requested 27 times apiece.

  3. How many orders arrived per day? Per hour?
    --
    -- Number of orders per day, in calendar sequence
    --
    SELECT Date, COUNT(Sales.OrderNumber) AS Orders
    INTO OrdersPerDate
    FROM Sales
    GROUP BY Date
    ORDER BY Date;
  4. How many picks came out of each zone? Out of each aisle? Out of each bay? Which zone/aisle/bay has the most picks?
    --
    -- Number of picks per zone listed from most busy to least
    --
    SELECT Locations.Zone, COUNT(Sales.SkuID) AS Picks
    FROM Locations INNER JOIN Sales
    ON Locations.SkuID = Sales.SkuID
    GROUP BY Locations.Zone
    ORDER BY Picks DESC;

    Check your answer: 1,510 picks were from zone A.

  5. What were the total quantities, measured in selling units, requested of each sku?
    --
    -- Sum quantities ordered of each sku
    --
    SELECT SkuID, SUM(OrderQuantity) AS TotalQuantity
    INTO Quantities
    FROM Sales
    GROUP BY SkuID;
  6. What volume, measured in cubic feet of product, was ordered from each vendor? What volume was shipped to each customer? What volume of product was picked from each zone? From each aisle? From each bay? Which zone/aisle/bay produces the most volume?

    First compute the volume of each sku and then look this up to avoid recomputing for each query. (Since dimensions were given in linear inches, we must divide by 12³ to convert to cubic feet.)

    --
    -- Compute volume ordered of each sku and save into a new table
    -- for future reference.
    --
    SELECT Skus.SkuID, Skus.VendorID, (Skus.UnitWidth * Skus.UnitHeight * Skus.UnitLength * OrderQuantity / 1728.0) AS CubicFeet
    INTO Volumes
    FROM Skus INNER
    JOIN Sales ON Skus.SkuID = Sales.SkuID;

    Now submit the query:

    --
    -- Volumes shipped by vendor (Note that we must JOIN the two
    -- tables Volumes and Skus to put the information together.)
    --
    SELECT Skus.VendorID, SUM(Volumes.CubicFeet) AS VendorVolume
    FROM Volumes INNER JOIN Skus ON Skus.SkuID = Volumes.SkuID
    GROUP BY Skus.VendorID
    ORDER BY VendorVolume DESC;

    Check your answer: Customer 552321 ordered the greatest volume.

    --
    -- To get the volume from each zone we must JOIN
    -- three tables.
    --
    SELECT Locations.Zone, SUM(Volumes.CubicFeet) AS ZoneVolume
    FROM (Locations INNER JOIN Volumes ON Locations.SkuID = Volumes.SkuID
    GROUP BY Locations.Zone
    ORDER by ZoneVolume;

    Check your answer: The total volume picked from zone B was 1,022 cubic feet.

The following questions are harder and may require direct processing of the data files via some general programming language (Java recommended).

  1. Construct the lines-per-order distribution: How many orders consist of a single line? How many of 2 lines? etc.
  2. Construct the picks-per-order distribution: How many orders consist of a single pick? How many of 2 picks? etc.
  3. Construct the zones-per-order distribution: How many orders are included within a single zone? How many within 2 zones? etc.
  4. Compute sku “affinities”; that is, the number of times that each pair of skus appeared in the same order.

Thanks to Aloysius CHEN Yan Kai for help in preparing the queries on this page.