{"id":404,"date":"2009-05-05T17:28:32","date_gmt":"2009-05-05T22:28:32","guid":{"rendered":"http:\/\/mikeconley.ca\/blog\/?p=404"},"modified":"2023-12-20T16:25:20","modified_gmt":"2023-12-20T21:25:20","slug":"for-my-next-trick","status":"publish","type":"post","link":"https:\/\/mikeconley.ca\/blog\/2009\/05\/05\/for-my-next-trick\/","title":{"rendered":"For my next trick&#8230;"},"content":{"rendered":"<p>It didn&#8217;t take long for another Firefox extension idea to come along.<\/p>\n<p>Prof. Greg Wilson recently sent me an email, saying the following:<\/p>\n<blockquote><p>I&#8217;d like a Firefox plugin that does &#8216;wc&#8217;, i.e., counts characters, words, and lines on the current web page, and displays the results in the status bar.<\/p><\/blockquote>\n<p>Cool, I thought.\u00a0 No problem.\u00a0 That doesn&#8217;t sound too hard.<\/p>\n<p>But I&#8217;ve been mulling this over, and it&#8217;s actually a harder problem than it first sounds.<\/p>\n<p>&#8220;<a href=\"http:\/\/en.wikipedia.org\/wiki\/Wc_(Unix)\">wc<\/a>&#8221;, short for word-count, is a small, simple, yet extraordinarily useful Unix utility that reads in a file and spits out the number of words, characters, and lines in that file.<\/p>\n<p>So what&#8217;s the problem?\u00a0 What&#8217;s so hard about coding something like this for web pages?<\/p>\n<p>Well, for starters, users of this proposed extension are probably only interested in the visible, readable text on a web page.\u00a0 That means filtering out all of the HTML tags, all of the JavaScript, etc.\u00a0 Also, many modern web pages make use of IFRAMEs, hidden DIVs, etc.\u00a0 Not to mention, most browsers do automatic word-wrapping, which could throw off the &#8220;line&#8221; counting.\u00a0 How should I treat these cases?<\/p>\n<p>I certainly don&#8217;t think this is an impossible task, just harder than it first sounded.<\/p>\n<p>So here&#8217;s what I&#8217;m going to 
do:<\/p>\n<p>First, I&#8217;m going to take care of the base case: users viewing a page that is all text, with almost zero HTML.<\/p>\n<p>My test page will be an &#8220;etext&#8221; copy of <a href=\"http:\/\/www.gutenberg.org\/dirs\/etext00\/0ws2610.txt\">Shakespeare&#8217;s Hamlet (first folio)<\/a>, hosted by Project Gutenberg.<\/p>\n<p>According to OpenOffice Writer, this text has 32230 words, 173543 characters, and 4257 lines.<\/p>\n<p>So that&#8217;s my target.\u00a0 I&#8217;m going to create an extension that sits as a button on the status bar.\u00a0 When the button is clicked, an alert will pop up with the statistics.\u00a0 If all goes well, the numbers will match OpenOffice&#8217;s.<\/p>\n<p>Sure, it&#8217;s not the most elegant interface, but it&#8217;ll do for now.<\/p>\n<p>I&#8217;ll post more as it comes.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It didn&#8217;t take long for another Firefox extension idea to come along. Prof. Greg Wilson recently sent me an email, saying the following: I&#8217;d like a Firefox plugin that does &#8216;wc&#8217;, i.e., counts characters, words, and lines on the current web page, and displays the results in the status bar. 
Cool, I thought.\u00a0 No problem.\u00a0 [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[5,220,44,68,79],"tags":[213,125,35,222],"class_list":["post-404","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-firefox-extensions","category-internet","category-javascript","category-technology","tag-extension","tag-firefox","tag-mozilla","tag-wc"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/prmTy-6w","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/mikeconley.ca\/blog\/wp-json\/wp\/v2\/posts\/404","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mikeconley.ca\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mikeconley.ca\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mikeconley.ca\/blog\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/mikeconley.ca\/blog\/wp-json\/wp\/v2\/comments?post=404"}],"version-history":[{"count":3,"href":"https:\/\/mikeconley.ca\/blog\/wp-json\/wp\/v2\/posts\/404\/revisions"}],"predecessor-version":[{"id":3248,"href":"https:\/\/mikeconley.ca\/blog\/wp-json\/wp\/v2\/posts\/404\/revisions\/3248"}],"wp:attachment":[{"href":"https:\/\/mikeconley.ca\/blog\/wp-json\/wp\/v2\/media?parent=4
04"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mikeconley.ca\/blog\/wp-json\/wp\/v2\/categories?post=404"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mikeconley.ca\/blog\/wp-json\/wp\/v2\/tags?post=404"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}