
Summer Work: Week 1

For this summer, the Computer Science Department at UofT has hired me to continue my work on the OLM project.  Click on that link, or check out my other post about OLM to see what it’s all about.

I just finished my first week of work, and it finished with a long weekend.  Not bad.

And I’ve got a great team – I’m working with Severin Gehwolf and Nelle Varoquaux, both excellent thinkers, programmers, and collaborators.  Severin is a UofT student like myself, and Nelle has flown in specially from France (!) to work with us.  They’re great, and we’re going to get a lot done.

So what have we to show for our first week of work?

Well, for starters, we’ve gutted the entire database schema of OLM.  We started right from the bottom, and worked our way through every component of the database, trying to figure out what we could cut, trim, expand, and refactor.

And there was plenty to do.  This version of OLM has been in the works for a while now, and there have been plenty of awesome people working on it – but there’s been a variety of Ruby/Rails/JavaScript experience, and the cracks show.

I, myself, came into this project with no Rails experience whatsoever, and while I think I now more or less get the drift, I’m still by no means an expert.  Anyhow,  I’m looking at my old code too, and kind of grimacing.

But the ideas are all there.  It’s like a big hunk of marble that a whole lot of people gnawed and chiseled at for a little bit, trying to make a sculpture.  After the big DB schema refactor, I think the whole team can sort of see the rough form of what this thing is trying to become, and now we just need to carve it out.  Luckily, instead of a few hours per week like the last few semesters, we get a full summer to focus on it.

So, with the DB refactor done, the first thing has been to redesign the models/controllers to play nice with our new database tables.  It was scary, because after the refactor, everything broke – but we’re working on it, and it’s slowly starting to come back.

We’ve also decided to switch the file storage back-end.  Up until now, we were using Ruby to organize a file system back-end to do simple versioning of submitted files.  One of our goals this summer is to build an abstraction layer that will allow us to choose different options for this versioned storage back-end.  In particular, we aim to support Subversion.  That’s right – a web-based Subversion front-end that supports commits, and catches (but doesn’t resolve) conflicts.  It’s a fun thought.
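
To make the idea a bit more concrete, here is a rough sketch of the kind of interface such an abstraction layer might expose.  None of this is OLM code: it is JavaScript rather than Ruby (only to match the other snippets on this blog), and MemoryRepository, latestRevision, and commit are made-up names.  The assumption is that every back-end, whether plain files or Subversion, would offer the same calls, and that a commit based on a stale revision is reported as a conflict rather than merged.

```javascript
// Hypothetical in-memory back-end. A Subversion-backed one would expose
// the same two methods, so the web front-end would not care which is used.
function MemoryRepository() {
  // path -> array of file contents, one entry per committed revision
  this.revisions = Object.create(null);
}

// Latest revision number for a path (0 means "never committed").
MemoryRepository.prototype.latestRevision = function (path) {
  return (this.revisions[path] || []).length;
};

// Store new content, but only if the caller's copy was based on the
// latest revision. Otherwise report the conflict and change nothing.
MemoryRepository.prototype.commit = function (path, content, baseRevision) {
  var latest = this.latestRevision(path);
  if (baseRevision !== latest) {
    // Catch the conflict; resolving it is left to the user.
    return { ok: false, conflict: true, latest: latest };
  }
  if (!this.revisions[path]) {
    this.revisions[path] = [];
  }
  this.revisions[path].push(content);
  return { ok: true, revision: latest + 1 };
};
```

With this sketch, two people who both start from revision 1 of the same file would see the first commit succeed and the second come back with conflict set to true, which is the “catches, but doesn’t resolve” behaviour described above.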

I have a feeling this is going to be a very interesting part of our project, and I’ll probably report on it more as it develops – but as it stands, it’s still being conceived on whiteboards and scrap paper.

Anyhow, I’ll try to keep this blog up to date with what we’re doing.  Or maybe I’ll keep this blog up to date.  I’m conflicted.

Who knows, maybe this will be my last blog post of the summer.  I won’t lie – after working 8 hours on a computer, the last thing I want to do is come home and write a blog post.  If anything, my posts will probably wait until the weekends.

But we’ll see.

wordCount.xpi – Part 1

So, if you recall, I was asked to write a Firefox extension that would do word counting on websites.

Originally, when I started this project, I set a goal for myself:  I copied the text from Project Gutenberg’s First Folio version of Shakespeare’s Hamlet into OpenOffice Writer, recorded the word/line/character count statistics, and set that as my projected goal for my first iteration of my extension.

But there’s a problem with this approach:  I’m supposed to be copying the behaviour of Unix’s wc, not OpenOffice Writer’s word count.  Normally, this wouldn’t be a problem – a word count is a word count, a line count is a line count, and Writer should pump out the same numbers as wc.

Not so.

In my last post, I wrote:

According to OpenOffice Writer, this text has 32230 words, 173543 characters, and 4257 lines.

However, upon passing the same text (saved in the textfile “count.txt”) through wc, I got the following output:

5302 32230 178845 count.txt

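Writer and wc agree on the number of words, but disagree on the number of lines – 5302 (wc) vs 4257 (Writer).  That’s a gap of 1045 lines.  The character counts differ too: 178845 (wc) vs 173543 (Writer).  That gap is exactly 5302, which suggests Writer isn’t counting the newline characters.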

Brutal.

Anyhow, I’m going to focus on wc’s approach to line counting – simply returning the number of newline characters in the file.
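
As a rough illustration of that approach (this is not the extension’s actual source, and wcCount is a made-up name), a wc-style count over a string could look like the sketch below.  It assumes a “word” is any run of non-whitespace characters, which is how wc defines it.  One caveat: wc reports bytes by default, while a JavaScript string’s length counts UTF-16 code units, so the character figure only matches wc exactly for plain ASCII text.

```javascript
// Count lines, words, and characters in a string, wc-style.
function wcCount(text) {
  // Lines: the number of newline characters, like wc. A final line
  // without a trailing newline is therefore not counted.
  var newlines = text.match(/\n/g);
  // Words: maximal runs of non-whitespace characters.
  var words = text.match(/\S+/g);
  return {
    lines: newlines ? newlines.length : 0,
    words: words ? words.length : 0,
    // Characters: UTF-16 code units, newlines included.
    chars: text.length
  };
}

var stats = wcCount("To be, or not\nto be\n");
// stats is { lines: 2, words: 6, chars: 20 }
```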

And guess what…it works.  For Hamlet, my extension pumps out:

Document statistics:

Word Count:  32230
Line Count:  5302
Character Count:  178845
Character Count (no spaces):  142368

Nice.

Hamlet’s just the simple case though.  There are plenty of other cases to consider, but this is a start.

Anyhow, download here.

In this version, I’m using Mozilla’s TreeWalker implementation to stitch together the page text.  So far it seems to be working alright, but if it somehow ends up falling through, I might end up using something like Andrew Trusty’s code with the jQuery library to do the text stitching.
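
For anyone who hasn’t used a TreeWalker before, the stitching idea can be sketched roughly like this.  It is an illustration rather than the code shipped in the .xpi: stitchPageText is a made-up name, the document is passed in as a parameter, and skipping text inside SCRIPT and STYLE elements is my assumption about what “page text” should mean.

```javascript
// Numeric value of NodeFilter.SHOW_TEXT, written out so the sketch does
// not depend on the browser-only NodeFilter global.
var SHOW_TEXT = 4;

// Walk every text node under doc.body and join the pieces together,
// skipping text that lives inside <script> and <style> elements.
function stitchPageText(doc) {
  // The last two arguments (no custom filter, no entity expansion) are
  // spelled out for older Firefox versions; newer browsers ignore the fourth.
  var walker = doc.createTreeWalker(doc.body, SHOW_TEXT, null, false);
  var parts = [];
  var node;
  while ((node = walker.nextNode())) {
    var parent = node.parentNode;
    var tag = parent ? String(parent.nodeName).toUpperCase() : "";
    if (tag === "SCRIPT" || tag === "STYLE") {
      continue; // code and style rules are not readable page text
    }
    parts.push(node.nodeValue);
  }
  return parts.join("");
}
```

In a page or extension context this would be called as stitchPageText(document), or with the content document of the current tab, and the resulting string handed to the counting code.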

So there it is.  Maybe I’ll keep working on this, pretty it up a bit, etc.  However, work starts on Monday, and that’ll probably take up most of my technical attention.

We’ll see though.

For my next trick…

It didn’t take long for another Firefox extension idea to come along.

Prof. Greg Wilson recently sent me an email, saying the following:

I’d like a Firefox plugin that does ‘wc’, i.e., counts characters, words, and lines on the current web page, and displays the results in the status bar.

Cool, I thought.  No problem.  That doesn’t sound too hard.

But I’ve been mulling and chewing this around in my head, and it’s actually a harder problem than it first sounds.

“wc”, short for word count, is a small, simple, yet extraordinarily useful Unix utility that reads in some file, and spits out the number of words, characters, and lines for that file.

So what’s the problem?  What’s so hard about coding something like this for web pages?

Well, for starters, users of this proposed extension are probably only interested in the visible, readable text on a web page.  That means filtering out all of the HTML tags, all of the JavaScript, etc.  Also, many modern web pages make use of IFRAMEs, hidden DIVs, etc.  Not to mention, most browsers do automatic word-wrapping, which could throw off the “line” counting.  How should I treat these cases?

I certainly don’t think this is an impossible task, just harder than it first sounded.

So here’s what I’m going to do:

First, I’m going to take care of the base case.  I’m going to take care of the case where users are viewing a page of all text, with almost zero HTML.

My test page will be an “etext” copy of Shakespeare’s Hamlet (first folio), hosted by Project Gutenberg.

According to OpenOffice Writer, this text has 32230 words, 173543 characters, and 4257 lines.

So that’s my target.  I’m going to create an extension that sits as a button on the status bar.  When the button is clicked, an alert will pop up with the statistics.  If all goes well, the numbers will match.

Sure, it’s not the most elegant interface, but it’ll do for now.

I’ll post more as it comes.

Overriding Firefox’s Window.Alert – Part 4

So, I think I’m more or less done the extension.

Someday, when I’ve got more extension development experience under my belt, I’ll probably come back to this and fix it up.  Until then, this will have to do.

Click here to download.

If you’re interested in looking at the source, just change the file extension from “.xpi” to “.zip”, and decompress.  It’s all there.

There’s no license on this thing, no GPL, MIT, nothing.  Use it however you want.  If you find it useful though, I’d love to hear from you – send me email, post a comment, Facebook, Twitter, whichever.

Whew.  I think I’m going to reward myself with some orange sherbet.  Om nom nom…

Here’s a really annoying website to test the extension with.  I really don’t recommend that you visit it without my extension installed.

The window hops around a bit, so just double click on the location bar, and type in something like “http://www.google.ca”.  This will start up the flood of alerts, and (hopefully) you’ll be able to suppress them after the first one hits.

Here’s the site.  Visit at your own risk.

UPDATE:

I’ve moved the extension to Mozilla Addons, and added Firefox 3.5 compatibility.

I’ve updated alertCheck.xpi so that it’ll play nice with Firefox 3.0b5, and hopefully Firefox 3.1.*.  Let me know if there are any behaviour foulups, and I’ll do my best to fix them.

Overriding Firefox’s Window.Alert – Part 3

Wow.  I think I got it.  I’ve got a Firefox extension that can suppress all alert() dialogs on a page if the user checks a “suppress” box on the second alert() dialog.

The trick was not to rely on the DOMContentLoaded event firing to do the override.  Instead, I used the DOMWillOpenModalDialog event to detect the first alert().  After detection, I overrode alert() with an alertCheck, which asks the user whether or not to “suppress more dialogs”.  If the user answers in the affirmative, alert() is simply overwritten with an empty function.

Piece of cake.
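
Stripped of the Firefox plumbing, the override logic described above can be sketched roughly as follows.  This is not the extension’s code.  The names installAlertCheck and suppressAfterFirstDialog are made up, win stands in for the page’s window object, target stands in for whatever object the DOMWillOpenModalDialog listener is attached to, and askUser is a hypothetical callback playing the part of the alertCheck dialog (it returns true when the “suppress” box is ticked).

```javascript
// Replace win.alert with a version that asks the user, via askUser,
// whether further dialogs should be suppressed.
function installAlertCheck(win, askUser) {
  win.alert = function (message) {
    var suppress = askUser(String(message));
    if (suppress) {
      // The user has had enough: swap in an empty function so that
      // every later alert() call silently does nothing.
      win.alert = function () {};
    }
  };
}

// Wait for the first modal dialog, then install the override.
function suppressAfterFirstDialog(target, win, askUser) {
  function onFirstDialog() {
    // Only the first dialog needs detecting, so unhook immediately.
    target.removeEventListener("DOMWillOpenModalDialog", onFirstDialog, true);
    installAlertCheck(win, askUser);
  }
  target.addEventListener("DOMWillOpenModalDialog", onFirstDialog, true);
}
```

The first alert() still appears as a normal dialog, since the override only goes in once that dialog has been detected.  From the second alert() onward the page is calling the replacement, which matches the “suppress box on the second dialog” behaviour.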

A couple of issues though…

Security

In order to override the alert() function, I have to write to document.getElementById('content').contentWindow.wrappedJSObject.alert.

Remember how I mentioned the distance between the Extension JavaScript, and the inline content JavaScript?  I said it felt like a security layer.

I was totally right.

Check this out. I’ll quote:

You should be aware of XPCNativeWrappers when working with untrusted content. With XPCNativeWrappers turned on (which is the default in Firefox 1.5+), your extension can safely access the DOM of the content document, but not the content JavaScript. Bypassing XPCNativeWrapper to work with content JavaScript directly can lead to security problems.

Hrmph.  So I seem to be violating some security rules here, and maybe my approach isn’t the greatest idea.  “Mook” from #extdev on irc.mozilla.org suggested looking into commonDialog.xul…but I can’t seem to wrap my head around that just yet.

Imperfections

Not sure why yet, but while I can suppress dialog floods like this:

for (i = 0; i < 10; ++i) {
  alert(i);
}

It seems to fail on this:

for (i = 0; i < 10; ++i) {
  alert(i);
  confirm(i);
}

For some reason, regardless of whether or not I choose to suppress the dialogs, they just keep coming.  It works fine when I swap out the confirm() for a second alert().  Not exactly sure why.  Yet.

Ok, so I’m going to clean the code up, and post it soon.  I’ll also post a link to a real, brutally annoying website where you can test the alertCheck extension.  Just give me a bit.