My experiments with sparklines

September 23rd, 2009 No comments

For the uninitiated, Sparklines are “data-intense, design-simple, word-sized graphics” according to its inventor Edward Tufte. I always wanted to include them in my trend reports. The challenge: How and Which tool to use?

The data came from this report came from a complex query using SQL server and Access that had this format:

College Major 2005 2006 2007 2008 2009
A B 10 15 14 18 25

Some options I investigated and tried (be sure to read this page, you might find many more):

  • Google’s Chart API (http://code.google.com/apis/chart/types.html): you can embed these in Google spreadsheets, create HTML pages using Excel via VBA, or embed them in Excel sheets. Alas, none would work the way I would like them to work: In-cell graphics
  • Use Sparklines for Excel add-in: this add-in will create great in-cell charts (bar, bullet graphs, sparklines, etc), but copying them down is difficult and resource intensive, and any change you make in the column size will alter the shape of that chart. In addition, I had more than 400 rows to populate–these would be too many objects in a spreadsheet for Excel to handle. My machine froze when I tried to copy it for 10 rows.
  • Use R’s implementation by Jason Dieterle (search on this page for Jason Dieterle (email), January 28, 2008. Works very good. I modified the code to print max and mins only, but the function generates a graphic, which needed to be embeded in LaTex file. It did not work nicely. Charts were too big to fit in a cell of a table.
  • Create bar charts in Excel using REPT function. I tried different font sizes and styles, but it didn’t look pretty, for the range of data varied. (Hint: make the alignment of text 90, use pipe signs, create columns and graphs for each data value, remove gridlines, keep the columns very close. It almost worked.)
  • Use spark package for LaTex, doesn’t work in pdflatex, and you have to play a lot with the settings
  • Use sparklines package for LaTex. Problem was that data needed to be normalized (or scaled) from 0 to 1, and needed extra parameters for min and max points. Solution: create a normalize function in Excel, and write a big formula to produce the exact needed string for the sparklines to work i.e.:
    \begin{sparkline}{5}
    \sparkdot 1 1 blue
    \sparkdot 0.2 0 red
    \spark 0.2 0 0.4 0.0625 0.6 0.5625 0.8 0.75 1 1 /
    \end{sparkline}

Here’s the normalize function:

Public Function Normalize(cell2Normalize As Range, WholeRng As Range)
       Normalize = (cell2Normalize.Value - WorksheetFunction.Min(WholeRng)) / (WorksheetFunction.Max(WholeRng) - WorksheetFunction.Min(WholeRng))
End Function

Here’s the big formula to convert the range for data values from 2005 to 2009 (in the range E2:I2) to the sparkline LaTex environment.

="\begin{sparkline}{5} " & "\sparkdot " & CHOOSE(MATCH(MAX(E2:I2),E2:I2,0),0.2,0.4,0.6,0.8,1) & " " & 1 & " blue " & "\sparkdot " & CHOOSE(MATCH(MIN(E2:I2),E2:I2,0),0.2,0.4,0.6,0.8,1) & " " & 0 & " red " & " \spark 0.2 " & Normalize(E2,E2:I2) & " 0.4 " & Normalize(F2,E2:I2) & " 0.6 " & Normalize(G2,E2:I2) & " 0.8 " & Normalize(H2,E2:I2) & " 1 " & Normalize(I2,E2:I2) & " / \end{sparkline}"

This last option worked beautifully. I dragged the formula down. I selected the data, and clicked on “Convert Table to LaTex” button (using this add-in). Copied the LaTex code to clipboard and pasted it in my LaTex editor. Manually merged the rows for colleges (using \multirow), and generated a beautiful looking pdf.

I was very happy. Printed it in color. Got a request back very soon that there should be total rows.

I forgot about the sparklines and created a report in Access with plain old numbers in less than 15 mins. Gave it back.

I tried to repeat this – Sparkline in Cognos -in Access with no luck.

(I later tried it one more time: Got the data in Excel using External data> Access, created a pivottable with rows and all, did some formatting, copied and pasted values and formats, inserted sparkline code, converted it to LaTex, copied and pasted in LaTex editor, and here’s the beautiful looking sample pdf of 16)

I wish there were simple reporting solutions that included awesome data visualization tools. (BTW, Excel 2010 will have sparklines: link) For this report, I did try Sweave, R, and LaTex, but because of the time constraints I could not investigate it further.

Please comment if you know any other way which meet (or don’t, Tableau is certainly one) these conditions: inexpensive (read free), efficient, and repeatable.

The search key was not found in any record in Access

September 15th, 2009 No comments

Well, I got this error message today while I was trying to import a text file:

“The search key was not found in any record”

After some research, I found that either it was a bug caused by Microsoft Jet 4.0 Service Pack or it was the size of the database (should be not greater than 2GB). It turned out to be the later one. I fixed this problem by compacting and resizing the database. In Access 2007, you click on the Office Button (Top Left Corner), then Manage > Compact and Repair Database.

Categories: Access, Error messages Tags:

Tag Cloud of Data Mining Jobs

August 20th, 2009 No comments

Here’s what I did to get a cool looking tag cloud of data mining jobs:

  1. Used Yahoo Pipes (I created mine, but this one has more feeds)– this pipe aggregates feeds from different job web-sites, and gives the user unique job listing that you can subscribe via RSS: Job Feed Aggregator by Sean Dolan
  2. Subscribed to the RSS feed for the keyword “data mining”
  3. Copied the job descriptions and requirements of many jobs, and saved the text file
  4. Got the python stemmer
  5. Applied the python stemmer to the text file. Stemmer truncates words to their roots, so that we can combine variants of a word into a single word. (First or second step in text mining)
  6. Created a tag cloud using the services of http://www.wordle.net/ . They use “stop words,” so I didn’t have to apply those. Stop words are common words, which necessarily don’t add any value for categorization, of a language.
Data Mining Jobs Tag Cloud

Data Mining Jobs Tag Cloud


The most frequent word is: experience. Companies want people with experience in different data mining techniques. You’ll see that some other big words are: SAS (stemmed as sa), Excel, SQL, analytical skills, statistics, and quantitative skills.

And how do you master these skills, you ask?

  1. Get a graduate degree in statistics, economics, mathematics, computer science, financial engineering, or industrial engineering with emphasis on databases, data mining, and marketing.
  2. Successfully complete data mining projects using free, open-source data mining tools, such as Weka, R, Orange, Rapid-Miner.
  3. Participate in data mining competitions. SAS’s data mining conference has a data mining competition every year.

Have a look at a detailed study by Pejic Bach, M: Creating profile of data mining specialist

Auto Filter Cursor Movement

June 1st, 2009 No comments

Don’t you hate it when you are working with many fields and you want to filter a particular field, so you apply auto filter on the all fields, but you find out that the field you were working on is gone from your sight and you see A1 cell. I did. Not anymore.

Solution: Simply hit the left or the right arrow key, and it will take you to the left or the right cell of the field you were working on.

Categories: Uncategorized Tags: , ,

Things they didn’t teach in school

May 22nd, 2009 No comments

My gripe about graduate school is that the school focused on well-established software and never embraced nor encouraged open-source software. If they had taught, or at least introduced, these following open-source software, it would have helped us immensely to produce the best looking reports with great data analysis. We, however, had to struggle with SAS/Excel to get the graphs and analysis needed, and then spend hours to perform formatting in Word. Why they didn’t teach us:

  1. LaTex: a powerful typeset editor, where you focus on writing and not on formatting. It takes care of all the headings, page numbering, figure/table/equation numbering, TOC, bibliography/citation, and more. Although the learning curve is rather steep, once you get the hang of it, life becomes so easy. For windows: you need to get MikTex and any LaTex editor, such as LEd, LyX, or WinEd
  2. R: awesome statistical package with wonderful graphics components. Producing stunning graphics and statistics has never been easy. It had me at summary. Any software that can do produce the following, just by giving summary(iris) command has to be great:
Summary produced by R of the iris data set

Summary produced by R of the iris data set

In my thesis, I had plenty of equations, and every time I made some significant changes, MS Word happily would turn those equations into empty white boxes — and then I had to rewrite them. In retrospect, I find it ridiculous that I was entering citations manually. So every time I added a new reference, I would manually change the bibliography page and the page where I cited that reference. With LaTex, it is just a breeze to do all this.

Descriptive stats, Box-plots, normal curves, neural network, charts with LaTex equations, and a lot of more stuff, all could be easily done using R, and the best part is “repeatability.” With simple commands, you could export all the charts as images for various data sets or for various training algorithms. No sweat! Try that with Excel. (I did some years ago).