We will build a database using the following datafiles and query it to answer questions about the demands on the distribution center. Each of the following datafiles is a plain ASCII file with comma-delimited fields and with field names in the first row.
This file contains the concatenated shopping lists of all customers over a given time period. This has been taken from the financial system and may be expected to be accurate. Note: This file is unrealistically small, containing only about 10,000 pick lines. Here is a more realistically sized version (about 2.25 million pick lines); but it can be very time-consuming to manipulate.
This describes about 21,000 skus. It is taken from the Item Master, a database that may be populated by data from the vendors or from the purchasing department or from the warehouse operations staff. Data of this sort is likely to contain some inaccuracies, especially in dimensions of the product. Beware especially of physical dimensions with value 0.
This table lists the addresses of all 20,000+ storage locations in the warehouse together with which sku is believed to be stored there. It is taken from the database that manages storage space, which is part of the warehouse management system. Note that it may contain “dummy locations” that do not correspond to actual locations in the warehouse. You will need a map of the warehouse to translate these addresses to actual physical locations.
If you already have a preferred database program, import the text files above into your database and continue with the next section. Otherwise, here are some open source databases that are sufficient to the purpose: MySQL,PostgreSQL, and H2. Each includes a rudimentary user interface. There are several free GUIs that connect to these databases and also paid versions. Another possibility is to install MAMP (Mac) or XAMPP (Mac/Windows/Linux) and use phpMyAdmin to interface with the MySQL database.
If you have never done this before it can be tedious and even stressful. But there is lots of on-line documentation, or you can ask a more-experienced classmate for help, or ask the TA.
There are many excellent on-line SQL references available; here is one.
For our purposes the most important part of SQL is the SELECT
statement.
To view all the data in the Skus table, type the following into the command panel:
SELECT * FROM Skus;
. Then, click “Execute” to execute the query. (Note that a
SQL command is terminated by a semicolon.)
To view only a few columns from the Skus table, say SkuID
and UnitWeight
, type the
following into the command panel: SELECT SkuID, UnitWeight FROM Skus;
If you would like the database to save its work for future
reference, use the CREATE TABLE...AS...WITH DATA
construct, as in the following.
CREATE TABLE Weights AS (SELECT SkuID, UnitWeight FROM Skus) WITH DATA;
This will save the results of the query as a table named
Weights, which will become part of the database (until you
choose to DROP
it). To examine this table, execute the query SELECT * FROM Weights;
WHERE
specifies the search
conditions. For example, to select all the rows corresponding
to heavy skus, execute the query SELECT * FROM Skus WHERE UnitWeight > 25;
.If you want only the fields corresponding to sku ID and
unit weight then execute the query
SELECT SkuID, UnitWeight FROM Skus WHERE UnitWeight > 25;
.
SQL also allows allows use of =
(equal), <>
(not equal), <
(less
than), <=
(less than or equal to), and >=
(greater than or equal to).
LIKE
command can be used. For instance, to
find all skus with ID beginning with “AAGD”, execute the query
SELECT * FROM Skus WHERE SkuID LIKE 'AAGD%';
.The %
character is used as a wildcard. Note
that strings of text must be surrounded by single quotes,
whereas no quotes are used with numbers.
There are many ways to transmit SQL commands to the database. Most databases include some type of query input facility. For example, MS Access has a QueryBuilder tool. SQuirrel is a useful input facility that is also open source and can connect to most common database engines and so can be worth learning for future use. SQuirreL also comes with a SQL syntax verifier that can help you avoid SQL errors. You can also copy results directly from SQuirreL to paste into reporting tools such as spreadsheets.
Most front ends will also allow you to submit a file of SQL commands to be executed in batch, once you have debugged them.
Use the sales history, description of skus and their locations to answer the following. (SQL queries are provided for some questions to help you get started. Sometimes the query can be rewritten to run faster: You are urged to experiment.)
Question 1. What are the largest and the smallest dimensions among all skus?
-- -- This is a comment. Be sure to document each query -- SELECT Max( UnitLength ) FROM Skus;
Write similar queries for the remaining dimensions. Or you can combine multiple queries into a single one.
Question 2. How many times was each sku was requested?
-- -- Sum all requests for each sku and list skus from most popular to least popular -- CREATE TABLE Popularity AS (SELECT Sales.SkuID, Count(Sales.SkuID) AS TotalPicks FROM Sales GROUP BY Sales.SkuID ORDER BY TotalPicks DESC) WITH DATA;
Question 3. How many orders arrived per day?
-- -- Number of orders per day, in calendar sequence -- CREATE TABLE OrdersPerDate AS (SELECT Date, COUNT(Sales.OrderNumber) AS Orders FROM Sales GROUP BY Date ORDER BY Date) WITH DATA;
Question 4. How many picks came out of each zone?
-- -- Number of picks per zone listed from most busy to least -- SELECT Locations.Zone, COUNT(Sales.SkuID) AS Picks FROM Locations INNER JOIN Sales ON Locations.SkuID = Sales.SkuID GROUP BY Locations.Zone ORDER BY Picks DESC;
Question 5. What were the total quantities, measured in selling units,
requested of each sku?
-- -- Sum quantities ordered of each sku -- CREATE TABLE Quantities AS (SELECT SkuID, SUM(OrderQuantity) AS TotalQuantity FROM Sales GROUP BY SkuID) WITH DATA;
Question 6. Compute the “flow” — that is, the physical volume sold, measured in
cubic feet — for each product.
-- -- Compute volume ordered of each sku and save into a new table named "Flow" -- CREATE TABLE Flow AS (SELECT Skus.SkuID, (Skus.UnitWidth * Skus.UnitHeight * Skus.UnitLength * OrderQuantity / 1728.0) AS CubicFeet FROM Skus INNER JOIN Sales ON Skus.SkuID = Sales.SkuID) WITH DATA;
The following question(s) are harder and may require direct processing of the data files via some general programming language (Python, Ruby, Java, etc.).
Question 7. Construct the pick-lines-per-order distribution: How many orders consist of a single line and what percentage of all orders do they represent? How many of 2 lines? etc.
Thanks to Aloysius CHEN Yan Kai for help in preparing the queries on this page and to Andrew Kovacs for catching some typos, and Pisit Jarumaneeroj for fixing my SQL errors.