<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mike Conley&#039;s Blog &#187; word counting</title>
	<atom:link href="http://mikeconley.ca/blog/tag/word-counting/feed/" rel="self" type="application/rss+xml" />
	<link>http://mikeconley.ca/blog</link>
	<description>The personal blog of a Toronto based software developer, musician, sound designer, and theatre enthusiast.</description>
	<lastBuildDate>Fri, 11 May 2012 15:23:04 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>wordCount.xpi &#8211; Part 1</title>
		<link>http://mikeconley.ca/blog/2009/05/07/wordcountxpi-part-1/</link>
		<comments>http://mikeconley.ca/blog/2009/05/07/wordcountxpi-part-1/#comments</comments>
		<pubDate>Thu, 07 May 2009 22:04:21 +0000</pubDate>
		<dc:creator>Mike</dc:creator>
				<category><![CDATA[Computer Science]]></category>
		<category><![CDATA[Firefox Extensions]]></category>
		<category><![CDATA[Internet]]></category>
		<category><![CDATA[Javascript]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[firefox]]></category>
		<category><![CDATA[mozilla]]></category>
		<category><![CDATA[OpenOffice]]></category>
		<category><![CDATA[TreeWalker]]></category>
		<category><![CDATA[wc]]></category>
		<category><![CDATA[word count]]></category>
		<category><![CDATA[word counting]]></category>
		<category><![CDATA[wordCount]]></category>
		<category><![CDATA[writer]]></category>

		<guid isPermaLink="false">http://mikeconley.ca/blog/?p=407</guid>
		<description><![CDATA[So, if you recall, I was asked to write a Firefox extension that would do word counting on websites. Originally, when I started this project, I set a goal for myself:  I copied the text from Project Gutenberg&#8217;s First Folio version of Shakespeare&#8217;s Hamlet into OpenOffice Writer, recorded the word/line/character count statistics, and set that [...]]]></description>
			<content:encoded><![CDATA[<p>So, if you recall, I was asked to write a Firefox extension that would do word counting on websites.</p>
<p>Originally, when I started this project, I set a goal for myself:  I copied the text from <a href="http://www.gutenberg.org/dirs/etext00/0ws2610.txt">Project Gutenberg&#8217;s First Folio version of Shakespeare&#8217;s Hamlet</a> into OpenOffice Writer, recorded the word/line/character count statistics, and set that as the target for the first iteration of my extension.</p>
<p>But there&#8217;s a problem with this approach:  I&#8217;m supposed to be copying the behaviour of Unix&#8217;s wc, not OpenOffice Writer&#8217;s word count.  Normally, this wouldn&#8217;t be a problem &#8211; a word count is a word count, a line count is a line count, and Writer should pump out the same numbers as wc.</p>
<p>Not so.</p>
<p>In my last post, I wrote:</p>
<blockquote><p>According to OpenOffice Writer, this text has 32230 words, 173543 characters, and 4257 lines.</p></blockquote>
<p>However, upon passing the same text (saved in the text file &#8220;count.txt&#8221;) through wc, I got the following output:</p>
<blockquote><p>5302  32230 178845 count.txt</p></blockquote>
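<p>Writer and wc agree on the number of words, but disagree on the number of lines &#8211; 5302 (wc) vs 4257 (Writer).  It&#8217;s a disagreement of about a thousand lines.  The character counts differ too: 178845 (wc) vs 173543 (Writer), a gap of exactly 5302, which matches wc&#8217;s line count and suggests Writer isn&#8217;t counting newline characters.</p>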
<p>Brutal.</p>
<p>Anyhow, I&#8217;m going to focus on wc&#8217;s approach to line counting &#8211; simply returning the number of newline characters in the file.</p>
<p>And guess what&#8230;it works.  For Hamlet, my extension pumps out:</p>
<blockquote><p>Document statistics:</p>
<p>Word Count:  32230<br />
Line Count:  5302<br />
Character Count:  178845<br />
Character Count (no spaces):  142368</p></blockquote>
<p>Nice.</p>
<p>Hamlet&#8217;s just the simple case though.  There are plenty of other cases to consider, but this is a start.</p>
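<p>For reference, here is a minimal sketch of how the four statistics above could be computed wc-style.  This is not the extension&#8217;s actual code: the function name <code>countStats</code> is hypothetical, and I&#8217;m assuming that &#8220;no spaces&#8221; means every whitespace character is dropped.</p>
```javascript
// Hypothetical helper for illustration; not taken from wordCount.xpi.
function countStats(text) {
  // wc-style line count: the number of newline characters.
  const lines = text.split("\n").length - 1;

  // wc-style word count: maximal runs of non-whitespace characters.
  const words = (text.match(/\S+/g) || []).length;

  // text.length counts UTF-16 code units, which equals wc's byte
  // count only for plain ASCII text.
  const chars = text.length;

  // Assumption: "no spaces" excludes all whitespace, not just " ".
  const charsNoSpaces = text.replace(/\s/g, "").length;

  return { lines, words, chars, charsNoSpaces };
}

// Two visible lines, but only one newline character.
const sample = "To be, or not to be,\nthat is the question";
console.log(countStats(sample));
```
<p>The sample has no trailing newline, so the line count comes out as 1 even though two lines are visible.  That is the same behaviour you get from counting newline characters the way wc does.</p>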
<p>Anyhow, <a href="http://www.mikeconley.ca/Firefox_Extensions/wordCount/wordCount.xpi">download here</a>.</p>
<p>In this version, I&#8217;m using <a href="https://developer.mozilla.org/En/DOM/TreeWalker">Mozilla&#8217;s TreeWalker implementation</a> to stitch together the page text.  So far it seems to be working all right, but if it falls through, I might switch to something like <a href="http://www.friendpaste.com/29sHyVu1J3iljnGks30yY3">Andrew Trusty&#8217;s code</a> with the jQuery library to do the text stitching.</p>
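<p>Roughly, stitching page text with a TreeWalker might look like the sketch below.  It is a simplified illustration rather than the code that ships in the extension: <code>stitchPageText</code> is a hypothetical name, and the constant 4 is spelled out because it is the numeric value of <code>NodeFilter.SHOW_TEXT</code>.</p>
```javascript
// Numeric value of NodeFilter.SHOW_TEXT; written out so the sketch
// does not depend on the NodeFilter global being present.
const SHOW_TEXT = 4;

// Hypothetical helper: walks every text node under doc.body and
// concatenates the contents in document order.
function stitchPageText(doc) {
  const walker = doc.createTreeWalker(doc.body, SHOW_TEXT, null);
  const parts = [];
  let node = walker.nextNode();
  while (node !== null) {
    parts.push(node.nodeValue);
    node = walker.nextNode();
  }
  return parts.join("");
}

// In a browser you would call: stitchPageText(document)
```
<p>One caveat with a walk this simple: it also picks up text nodes inside script and style elements, so a real version would probably want to filter those out before counting.</p>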
<p>So there it is.  Maybe I&#8217;ll keep working on this, pretty it up a bit, etc.  However, work starts on Monday, and that&#8217;ll probably take up most of my technical attention.</p>
<p>We&#8217;ll see though.</p>
]]></content:encoded>
			<wfw:commentRss>http://mikeconley.ca/blog/2009/05/07/wordcountxpi-part-1/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

