The Naked PC Newsletter

Your good neighbor
who's also a computer
consultant!


TNPCers Say:
I have been subscribed to your newsletter for several months now and love it... Thanks for your excellent work, I learn a lot. -- David B.
117,977+ current readers

Type your email address and click Subscribe!
Subscribe to Our Newsletter
Name: 
E-mail:

Swiss-Tech Key Ring Products

You'll find a jillion uses for these super-cool portable tool kits that fit right on your key chain. Whether it's fixing your eyeglasses, pulling splinters, tightening up the loose screws you run into everyday... Open computer cases with ease, snip wires, all the jobs a small set of pliers would make easy work of, you've got to check out Swiss-Tech tools!


Get Jim and Lee's Book!
T.J. Lee and Lee Hudspeth's Absolute Beginner's Guide to PC Upgrades
Now available at Amazon!



Contact TNPC



Home What is TNPC?
Meet the crew... The TNPC Store TNPC Articles
Send comments Members Only Prior Issues

From The Naked PC issue #5.09...Lee Hudspeth

Mining for Data Gold Using Microsoft Excel

by Lee Hudspeth
April 25, 2002

When some folks think of "data mining" they imagine a process involving massive databases and very expensive special-purpose software running on costly mainframes or even supercomputers. That's one part of the story, from the point of view of a company like an airline with literally hundreds of millions--if not billions--of reservation and passenger records and their field values stored over time that may reveal some profitable patterns if sifted properly. In this article I'll share with you some Excel data analysis basics that you can apply today to your own data, on your own PC. I guarantee you'll learn something new and beneficial about your business. The features I cover include list management, AutoFilter, Advanced Filter, statistics tray, subtotals, PivotTables, and more.

In a recent consulting assignment I performed data mining on a scale of 10,000 records in a flat-file database spanning a decade, and my toolkit was comprised of one powerful and relatively ubiquitous tool: Microsoft Excel.

The project's goal was to examine the data for patterns that might yield more information when studied in greater detail.

In Excel, the term "list" and "database" are synonymous. Excel's help file defines a list like so, "A series of worksheet rows that contain related data, such as an invoice database or a set of client names and phone numbers. The first row of the list has labels for the columns." (Note: To fit this article within the designated space, I'll refer to some external resources, one of which is an Excel book Jim and I co-wrote. If you want to take a test-drive with these features you need a robust database; I provide the complete steps for doing this on a supplemental Web page; links appear at the article's end.) To read what Microsoft has to say about creating a list, study the help topic "Guidelines for creating a list on a worksheet". Assuming you've already got a list set up, click any cell inside the list then turn on the AutoFilter feature: Data, Filter, AutoFilter.

Paraphrasing from our book, AutoFilter converts each column's header into a searchable drop-down control. The control contains several pre-determined choices: All, Top 10, Custom, Blanks and NonBlanks (the latter two appear only if there's at least one blank cell in the column), followed by a sorted list of all the unique values in the column. This is an amazingly powerful feature, yet so easy to use; I highly recommend you play around with it. It only takes a few mouse clicks to discover something new about your data; once you starting using AutoFilter you'll rarely turn it off. I only have room in this article to describe the overall capabilities of AutoFilter; as you dig deeper into this feature you'll discover the ability to perform simple queries from a "build your own query" dialog box or complex queries involving multiple fields that you build using formulas (this is Excel's Advanced Filter feature).

Sometimes you just need to do a quick, fundamental statistical analysis on a range of cells, like a count or a sum or an average. This is literally a one-click operation using Excel's cool but little-known statistics tray. Select a range of cells (at least two cells), say, the values 2 and 3 in A1:A2. Now right-click anywhere on Excel's status bar (the horizontal bar at the bottom of the parent window that displays "Ready" at its left edge), and notice the pop-up menu: None, Average, Count, Count Nums, Max, Min, Sum. Select Average and notice that the status bar now displays the text "Average=2.5" (about two-thirds of the way across the status bar). Play around with changing the statistic of interest. From now on, whenever you select *any* range of cells, the statistic is updated dynamically.

With one mouse click, Excel's subtotals feature can automatically generate single-level and multi-level subtotals. You can choose which fields to subtotal at each level change, you can have subtotals appear above or below the associated data, you can page break between groups, and you can choose between a variety of subtotal functions: Sum, Count, Average, Max, Min, Product, and more. The most important thing about subtotals is to first sort your data by the field(s) you are interested in; each field of interest is a "level." In the Orders table example (see supplemental page), to see Freight subtotals by customer you first sort the database by customer (CustomerID), then you select Data, Subtotals. Excel uses its powerful outlining feature to display these automatic subtotals, and you can put this outline view to good use. Continuing with the freight example, say you want to see only the subtotals and not the underlying detail data, click the outlining level 2 button and only the customer- level subtotals are visible. Click the outlining level 1 button and you'll see just the database-wide grand total. To see the details again click the outlining level 3 button. Removing subtotals is easy: Data, Subtotals, Remove All.

PivotTables are the cream of the crop in Excel's data analysis feature set. In our book we describe them this way, "Think of a PivotTable as an interactive summary of your data. You can quickly change the functions you use for this summarization (from counts to sums to averages, and so on), change the fields across which the data is tabulated, flip through groupings and sub- groupings (pages), chart these groupings, refresh the source data, drill down or up to see more or less detail, change formatting, and more." If you've been following along with the Orders table example, what if you wanted to see a table of freight charges by destination country by employee? (Counting from the point where subtotals are removed, you need a mere 11 mouse clicks to create this table.) Here are the steps: remove all subtotals, click anywhere inside the list, select Data, PivotTable and PivotChart Report, click Next, click Next again, click Layout, drag the ShipCountry field into the Row area, drag the Employee field into the Column area, drag the Freight field into the Data area, click OK, and click Finish. And remember, a PivotTable can be updated to reflect changes in underlying data by clicking the Refresh button, or you can set it to automatically refresh each time the workbook is opened. Many more PivotTable settings are available: click the PivotTable button on the PivotTable toolbar, then select Table Options. PivotTables can give you valuable insights into your data. I highly recommend that you spend some time with them.

For those of you interested in learning more about Excel's list management, PivotTable, and related features, what Jim and I wrote about these topics regarding Excel in our book "Excel 97 Annoyances" still applies today. Although the book is out of print, you can easily acquire copies through Amazon Marketplace:
http://www.TheNakedPC.com/t/509/tr.cgi?lee1

I just tested the 12 steps we wrote that get you set up with a work-along database based on Access' Northwind sample database, and everything is the same in Excel XP and 2000 and 97. Here is my supplemental page for this article, including the complete steps for creating a robust practice list:
http://www.TheNakedPC.com/t/509/tr.cgi?lee2

(I'll return to my series on Excel XP's new features next time.)

You can reach Lee Hudspeth at:
mailto:LeeHudspeth@TheNakedPC.com

Why not subscribe to TNPC Newsletter Now?
You'll be glad you did.
Your Name: 
Your E-mail Address:
Copyright © 2002, PRIME Consulting Group, Inc. and Dan Butler.
All Rights Reserved.
The Naked PC is a trademark of PRIME Consulting Group, Inc.
ISSN: 1522-4422

You may reprint an article from TNPC as long as you show the
entire article and include the authors byline, excerpt and
subscription information as shown:

article_title
by author_name
(This article originally appeared in The Naked PC
newsletter; subscribe at http://www.TheNakedPC.com)

Return to Top


Advertise in TNPC Disclosure JOIN the Horde!
Letters to Editor Privacy policy Search TNPC
TNPC Library
TNPC Forum
Subscriber Services

Why not subscribe to TNPC Newsletter Now?
You'll be glad you did.
Your Name: 
Your E-mail Address:

TNPC Hot Tips:
  • Email out of control? Spam filling your inbox? People trying to steal your identity? Same here - until I applied these tips. You can too in a new multimedia e-book. Tame Your Email.

  • DO YOU MAKE THESE MONEY MISTAKES? Do you know that trying to pay off your high interest rate debts first and/or paying extra on more than one debt is the SLOWEST way to get out of debt? Don't make these same mistakes. Learn more at by clicking here...

Google

In The Current Issue

Read #5.09 here!

Fighting Spam: Part 4
Low Tech Solutions
   - System Journals

Acrobat Shows Agility
Mining for Data
   Using Microsoft Excel


Pop-Up Stopper
The product takes aim at the blight of pop-up and pop-under ads that have made Web surfing a nuisance of late. It comes in three flavors: a Free version, Pro ($19.95), and Companion ($39.95), an expanded package that provides cookie cleaning, Web navigation aids, and clean-up tools.

Norton Internet Security 2002
The standard version of Norton includes Web site and newsgroup blocking features intended to keep kids out of porn and violence sites, but the Pro version's features are billed as "productivity controls" designed to allow employers to keep workers from goofing off on the job.

Getting Things Done
by David Allen
There is a logic to David Allen's system that really works for me. The book gives a broad overview of the process then goes into a step-by-step explanation on how to integrate the ideas in the book right into your daily life.

Read The Naked PC back issues