I found treasure! Publication and citation data with metadata (author names, addresses, affiliation): http://citeseer.ist.psu.edu/oai.html
I was reading an article about knowledge management here, which argues that knowledge management is nonsense. I agree to a certain degree, not because of the field, but because of its name. How do you manage knowledge? Isn’t knowledge derived? Wasn’t information “science” good enough? (I have a problem with “business intelligence” as well…) As the author of that article says, it is a new term coined to attract attention. He does provide some evidence, but I was left unsatisfied.
I thought of performing text mining on a publications database, and citeseer has this great resource. I downloaded the data (72 XML files), performed some clean-up, and ran a script to pull the citeseer ID, author addresses, and publication date of every record whose abstract contained the term “knowledge management”. I was interested in seeing the publication trend over time and where the papers were published.
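I no longer have the original script handy, so here is a minimal sketch of that kind of filter, not the exact code I ran. It assumes each record sits in a `record` element with `identifier`, `description` (the abstract), and `date` children. Those element names and the `SAMPLE` data are assumptions for illustration, so check them against the actual files before relying on this.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample data; the element names are assumed, not taken from the real dump.
SAMPLE = """<records>
  <record><identifier>id-1</identifier><description>A study of Knowledge Management systems.</description><date>2002-05-01</date></record>
  <record><identifier>id-2</identifier><description>Unrelated work on compilers.</description><date>2001-01-01</date></record>
  <record><identifier>id-3</identifier><description>Knowledge management in practice.</description><date>2002</date></record>
</records>"""


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def matching_records(xml_text, phrase="knowledge management"):
    """Yield (identifier, year) for records whose abstract mentions the phrase."""
    root = ET.fromstring(xml_text)
    for record in root.iter():
        if local_name(record.tag) != "record":
            continue
        fields = {}
        for child in record.iter():
            name = local_name(child.tag)
            # Keep only the first occurrence of each field of interest.
            if name in ("identifier", "description", "date") and name not in fields:
                fields[name] = (child.text or "").strip()
        if phrase in fields.get("description", "").lower():
            yield fields.get("identifier", ""), fields.get("date", "")[:4]


def publications_per_year(records):
    """Count matching records per publication year, skipping missing dates."""
    return Counter(year for _, year in records if year)


hits = list(matching_records(SAMPLE))
per_year = publications_per_year(hits)
```

Running the same two functions over each of the downloaded files and adding up the counters gives the per-year numbers needed for a trend chart.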
Have a look at this chart:

There is definite growth in this area, at least in research and publications. It is startling to see a paper published as early as 1970, and a peak in 2002. As the citeseer data ends in 2004, its publication history for 2004 is probably incomplete.
Geographically, the US and Europe lead the way in the number of publications:

Here’s what I did to get a cool-looking tag cloud of data mining jobs:
- Used Yahoo Pipes (I created my own, but this one has more feeds): Job Feed Aggregator by Sean Dolan. This pipe aggregates feeds from different job websites and gives you unique job listings that you can subscribe to via RSS
- Subscribed to the RSS feed for the keyword “data mining”
- Copied the descriptions and requirements of many jobs, and saved them to a text file
- Got the python stemmer
- Applied the python stemmer to the text file. A stemmer truncates words to their roots, so that variants of a word are combined into a single term. (This is usually the first or second step in text mining.)
- Created a tag cloud using the services of http://www.wordle.net/ . They filter out “stop words” themselves, so I didn’t have to remove those. Stop words are the common words of a language, which don’t add much value for categorization.

Data Mining Jobs Tag Cloud
The most frequent word is: experience. Companies want people with experience in different data mining techniques. You’ll see that some other big words are: SAS (stemmed as sa), Excel, SQL, analytical skills, statistics, and quantitative skills.
And how do you master these skills, you ask?
- Get a graduate degree in statistics, economics, mathematics, computer science, financial engineering, or industrial engineering with emphasis on databases, data mining, and marketing.
- Successfully complete data mining projects using free, open-source data mining tools, such as Weka, R, Orange, and RapidMiner.
- Participate in data mining competitions. SAS’s data mining conference has a data mining competition every year.
Have a look at a detailed study by Pejic Bach, M: Creating profile of data mining specialist