<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>nandeshwar.info &#187; text mining</title>
	<atom:link href="http://nandeshwar.info/tag/text-mining/feed/" rel="self" type="application/rss+xml" />
	<link>http://nandeshwar.info</link>
	<description></description>
	<lastBuildDate>Wed, 18 Aug 2010 19:24:47 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Mining publication data</title>
		<link>http://nandeshwar.info/2010/03/08/mining-publication-data/</link>
		<comments>http://nandeshwar.info/2010/03/08/mining-publication-data/#comments</comments>
		<pubDate>Mon, 08 Mar 2010 15:57:24 +0000</pubDate>
		<dc:creator>a7n9</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[scripting]]></category>
		<category><![CDATA[text mining]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://nandeshwar.info/?p=315</guid>
		<description><![CDATA[I found treasure! Publication and citation data with metadata (author names, addresses, affiliation): http://citeseer.ist.psu.edu/oai.html 
I was reading about knowledge management here, which says that knowledge management is nonsense. I agree to a certain degree, not because of the field, but because of its name. How do you manage knowledge? Isn&#8217;t knowledge derived? Wasn&#8217;t information &#8220;science&#8221; [...]]]></description>
			<content:encoded><![CDATA[<p>I found treasure! Publication and citation data with metadata (author names, addresses, affiliation): <a href="http://citeseer.ist.psu.edu/oai.html">http://citeseer.ist.psu.edu/oai.html </a></p>
<p>I was reading about knowledge management <a href="http://informationr.net/ir/8-1/paper144.html?referer=www.clickfind.com.au">here</a>, in an article that argues knowledge management is nonsense. I agree to a certain degree, not because of the field, but because of its name. How do you manage knowledge? Isn&#8217;t knowledge derived? Wasn&#8217;t information &#8220;science&#8221; good enough? (I have a problem with &#8220;business intelligence&#8221; as well&#8230;) As the author of that article says, it is a new term coined to attract attention. He does provide some evidence, but I was left unsatisfied.</p>
<p>I thought of performing text mining on a publications database, and CiteSeer has this great resource. I downloaded the data (72 XML files), performed some clean-up, and ran a script to pull the CiteSeer ID, author addresses, and publication dates of every record whose abstract contained the term &#8220;knowledge management&#8221;. I was interested in seeing the trend in publications over time and where they were published.</p>
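<p>The extraction step can be sketched roughly as follows. This is not the original script: the sample records are made up, and the Dublin Core element names (<code>dc:identifier</code>, <code>dc:date</code>, <code>dc:description</code>) and the <code>record</code> wrapper are assumptions about the file layout. Author addresses are left out because their field name is not given here.</p>

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample records in an assumed Dublin Core-like layout;
# the real CiteSeer files may use different element names.
SAMPLE = """<records xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record>
    <dc:identifier>example-1</dc:identifier>
    <dc:date>2001-05-01</dc:date>
    <dc:description>A survey of knowledge management systems.</dc:description>
  </record>
  <record>
    <dc:identifier>example-2</dc:identifier>
    <dc:date>2002-11-15</dc:date>
    <dc:description>Fast sorting on parallel machines.</dc:description>
  </record>
  <record>
    <dc:identifier>example-3</dc:identifier>
    <dc:date>2002-03-09</dc:date>
    <dc:description>Knowledge Management in small firms.</dc:description>
  </record>
</records>"""

DC = {"dc": "http://purl.org/dc/elements/1.1/"}


def matching_records(xml_text, term):
    """Yield (identifier, date) for records whose abstract contains term."""
    root = ET.fromstring(xml_text)
    term = term.lower()
    for record in root.iter("record"):
        abstract = record.findtext("dc:description", default="", namespaces=DC)
        if term in abstract.lower():  # case-insensitive substring match
            yield (
                record.findtext("dc:identifier", default="", namespaces=DC),
                record.findtext("dc:date", default="", namespaces=DC),
            )


hits = list(matching_records(SAMPLE, "knowledge management"))
# Tally matches per year, taking the first four characters of the date.
per_year = Counter(date[:4] for _, date in hits)
print(hits)
print(sorted(per_year.items()))
```

<p>For 72 files, the same function could be called once per file and the per-year counters added together before plotting.</p>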
<p style="text-align: center;">Have a look at this chart:<br />
<a href="http://nandeshwar.info/wp-content/uploads/2010/03/NoPubsbyYear.JPG"><img class="aligncenter size-full wp-image-317" title="Publications by year" src="http://nandeshwar.info/wp-content/uploads/2010/03/NoPubsbyYear.JPG" alt="Publications by year" width="573" height="587" /></a></p>
<p>There is definite growth in this area, at least in research and publications. It is startling to see a paper published as early as 1970, and a peak in 2002. As the CiteSeer data ends in 2004, it may not include the complete publication history for 2004.</p>
<p>Geographically, the US and Europe lead the way in number of publications:<br />
<a href="http://nandeshwar.info/wp-content/uploads/2010/03/WorldMapPubs.JPG"><img class="aligncenter size-medium wp-image-318" title="Worldwide Publications " src="http://nandeshwar.info/wp-content/uploads/2010/03/WorldMapPubs-300x168.jpg" alt="Worldwide Publications " width="300" height="168" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://nandeshwar.info/2010/03/08/mining-publication-data/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Tag Cloud of Data Mining Jobs</title>
		<link>http://nandeshwar.info/2009/08/20/tag-cloud-of-data-mining-jobs/</link>
		<comments>http://nandeshwar.info/2009/08/20/tag-cloud-of-data-mining-jobs/#comments</comments>
		<pubDate>Thu, 20 Aug 2009 14:13:48 +0000</pubDate>
		<dc:creator>a7n9</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[data mining]]></category>
		<category><![CDATA[pipes]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[stemming]]></category>
		<category><![CDATA[tag cloud]]></category>
		<category><![CDATA[text mining]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://nandeshwar.info/?p=260</guid>
		<description><![CDATA[Here&#8217;s what I did to get a cool-looking tag cloud of data mining jobs:

Used Yahoo Pipes (I created my own, but this one has more feeds). This pipe aggregates feeds from different job websites and gives you unique job listings that you can subscribe to via RSS:  Job Feed Aggregator by Sean Dolan 
Subscribed [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s what I did to get a cool-looking tag cloud of data mining jobs:</p>
<ol>
<li>Used Yahoo Pipes (I created my own, but this one has more feeds). This pipe aggregates feeds from different job websites and gives you unique job listings that you can subscribe to via RSS: <a href="http://pipes.yahoo.com/pipes/pipe.info?_id=50bf0b7cbcf40213deb98f1314dedf51">Job Feed Aggregator by Sean Dolan</a></li>
<li>Subscribed to the RSS feed for the keyword &#8220;data mining&#8221;</li>
<li>Copied the descriptions and requirements of many jobs, and saved them to a text file</li>
<li>Got the <a href="http://tartarus.org/~martin/PorterStemmer/index-old.html">Python stemmer</a></li>
<li>Applied the Python stemmer to the text file. A stemmer truncates words to their stems, so that variants of a word can be combined into a single term. (This is usually the first or second step in text mining.)</li>
<li>Created a tag cloud using <a href="http://www.wordle.net/">http://www.wordle.net/</a>. Wordle filters out &#8220;stop words&#8221; itself, so I didn&#8217;t have to remove them. Stop words are the common words of a language, which don&#8217;t necessarily add any value for categorization.</li>
</ol>
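<p>The stemming and counting steps above can be sketched as follows. This is only a toy: it implements just the plural rules from step 1a of the Porter algorithm rather than the full stemmer, the stop-word list is a tiny stand-in for the one Wordle applies, and the sample sentence is made up. It does show why a word such as SAS comes out of a stemmer as &#8220;sa&#8221;.</p>

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "in", "of", "or", "the", "with"}


def strip_plural(word):
    """Apply the plural rules from step 1a of the Porter algorithm."""
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress stays caress
    if word.endswith("s"):
        return word[:-1]  # skills -> skill, sas -> sa
    return word


def stem_counts(text):
    """Lowercase, tokenize, drop stop words, stem, and count the stems."""
    words = re.findall(r"[a-z]+", text.lower())
    stems = (strip_plural(w) for w in words if w not in STOP_WORDS)
    return Counter(stems)


# Made-up sample text standing in for the saved job descriptions.
sample = "Experience with SAS and SQL. Analytical skills and statistics skills."
counts = stem_counts(sample)
print(counts.most_common(3))
```

<p>The resulting counts are what a tag cloud visualizes: the larger the count, the larger the word.</p>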
<div id="attachment_261" class="wp-caption aligncenter" style="width: 591px"><a href="http://nandeshwar.info/wp-content/uploads/2009/08/dmjobstagcloud.jpg"><img class="size-full wp-image-261 " title="Data Mining Jobs Tag Cloud" src="http://nandeshwar.info/wp-content/uploads/2009/08/dmjobstagcloud.jpg" alt="Data Mining Jobs Tag Cloud" width="581" height="249" /></a><p class="wp-caption-text">Data Mining Jobs Tag Cloud</p></div>
<p>The most frequent word is &#8220;experience&#8221;. Companies want people with experience in different data mining techniques. You&#8217;ll see that some other big words are: SAS (stemmed as sa), Excel, SQL, analytical skills, statistics, and quantitative skills.</p>
<p>And how do you master these skills, you ask?</p>
<ol>
<li>Get a graduate degree in statistics, economics, mathematics, computer science, financial engineering, or industrial engineering with an emphasis on databases, data mining, and marketing.</li>
<li>Successfully complete data mining projects using free, open-source data mining tools, such as Weka, R, Orange, and RapidMiner.</li>
<li>Participate in data mining competitions. SAS&#8217;s data mining conference has a data mining competition every year.</li>
</ol>
<p>Have a look at a detailed study by Pejic Bach, M: Creating profile of data mining specialist</p>
]]></content:encoded>
			<wfw:commentRss>http://nandeshwar.info/2009/08/20/tag-cloud-of-data-mining-jobs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
