GSoC Update: My Review Board Statistics Extension

The Primary Goal

From the very beginning, my GSoC project has been focused on one primary goal:  I want to build an extension for Review Board that will allow me to collect information about how long reviewers actually spend reviewing code.

That’s easier said than done.  When I started, the Review Board extension framework wasn’t really in a state to allow such an extension to exist.

So I’ve been tooling around in the Review Board code for the past 2 months, preparing the framework, and getting it ready to handle my extension.

And last night, it started to work.  I can now give rough estimates on how long a reviewer has spent reviewing code.

How It Works

My extension adds a new table to the database which stores “reviewing sessions”.  Each reviewing session is associated with a particular review request and user, and also has a field to store the number of seconds that a user has spent in review.

I’ve created a TemplateHook that allows me to inject JavaScript into key areas of Review Board (in particular, the diff viewer and the screenshot viewer).  The JavaScript does the following:  every 10 seconds, we check to see if the mouse has moved on the body of the HTML document.  If it has, we send an “activity” notification to the server.

The server receives this activity notification through the Web API, and checks to see if the time elapsed since the last session update is greater than 10 seconds.  If it is, we increment the working session by 10 seconds and return a 200 HTTP code.  If it isn’t, we don’t change anything and return a 304 HTTP code.
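
Here is a minimal sketch of that server-side check in Python.  It is not the extension’s actual code:  the names ReviewingSession, record_activity and INTERVAL are made up for illustration, and a plain in-memory object stands in for the database row.

```python
from dataclasses import dataclass

INTERVAL = 10  # seconds between activity checks, as described in the post


@dataclass
class ReviewingSession:
    """Hypothetical stand-in for one row of the reviewing-session table."""
    user: str
    review_request_id: int
    seconds: int = 0
    last_update: float = float("-inf")  # no activity recorded yet


def record_activity(session: ReviewingSession, now: float) -> int:
    """Handle one activity notification and return the HTTP status code."""
    if now - session.last_update > INTERVAL:
        # Enough time has passed: credit the reviewer with one interval.
        session.seconds += INTERVAL
        session.last_update = now
        return 200
    # Too soon since the last update: leave the session unchanged.
    return 304
```

Because the comparison is strictly “greater than”, a notification arriving exactly 10 seconds after the previous update would not be counted in this sketch.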

Next, my extension waits for a user to publish a review.  When it notices that a review is being published, it finds the working session for that user and review request, and then attaches it to the published review.  If the user then starts looking at the diff or screenshots again, a new working session is created.
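
The publish step could be sketched roughly like this.  Again, this is an illustration under assumptions, not the real extension:  SessionStore and its method names are hypothetical, and dictionaries stand in for the database.

```python
class SessionStore:
    """Hypothetical in-memory model of working sessions and published reviews."""

    def __init__(self):
        # (user, review_request_id) -> seconds in the open working session
        self.working = {}
        # review_id -> seconds attached when that review was published
        self.attached = {}

    def add_time(self, user, review_request_id, seconds):
        """Add time to the working session, creating it if none is open."""
        key = (user, review_request_id)
        self.working[key] = self.working.get(key, 0) + seconds

    def on_review_published(self, user, review_request_id, review_id):
        """Attach the open working session (if any) to the published review."""
        seconds = self.working.pop((user, review_request_id), None)
        if seconds is not None:
            self.attached[review_id] = seconds
        return seconds
```

Once a session has been attached, the next call to add_time for the same user and review request starts a fresh working session from zero.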

The result?  A pretty decent estimate of how long a user has spent reviewing the code.  No time gets recorded if the user gets up and has a sandwich.  No time gets recorded if the user is on another tab reading Reddit.

[Image:  how reviewing time is displayed to the user]

Not bad.  For a first draft, anyhow.

I think I’m going to try to chart the data somehow, so that users can track their inspection rates.  I’ll let you know how that goes.

The Shoulders of Tall, Smart People

Recently, I came to the realization that I’ve been writing computer programs in one form or another since I was about 6 or 7 years old.

Along the way, I’ve had plenty of people to influence the way I think about code, and how I write it.  Sure, there have been plenty of textbooks along the way too, but I want to give some thanks to the people who have directly affected my abilities to do what I do.

And what better way of doing that than by listing them?

A Chronological List of People Who Have Influenced My Coding

  1. My parents, for bringing home our first family computer.  It was an 8088XT IBM Clone – no hard drive, 640K of RAM, dual 5 1/4 floppies…it was awesome.  This is the computer I started coding on – but I couldn’t have started without…
  2. My Uncle Mark and my Aunt Soo.  Both have degrees in Computer Science from the University of Waterloo (that’s where they met).  My recollection is pretty vague, but I’m pretty sure that a lot of the programming texts in my house (a big blue QuickBasic manual comes to mind) surely didn’t come from my parents – must have been those two.  With the book in one hand, and the 8088 in the other, I cranked out stupid little programs, little text adventure games, quizzes, etc.
  3. The online QB community from the late 1990s to the early 2000s.  When my family got online, I soon found myself hanging out at NeoZones, in the #quickbasic IRC channel on EFNet… actually, a lot of crazy stuff was being done with QuickBasic back then – I remember when DirectQB came out, and somebody was able to code a raytracer…in BASIC.  It was awesome.  I’d say these were my foundation years, when I learned all of my programming fundamentals.
  4. My friends Nick Braun, Joel Beck, and Doug McQuiggan – these three guys and I used to come up with crazy ideas for games, and I’d try to program them.  I’d come home from school, and pound out code for a computer game for a few hours in the basement.  More often than not, these projects would simply be abandoned, but still, a lot was learned here.
    Joel, Doug, our friend Julian and I were also members of a band in high school.  It was my job to build and maintain the band website, and this is when I learned to write HTML, basic Perl, and simple JavaScript.
  5. After high school, I went into Electrical and Computer Engineering at the University of Toronto.  I didn’t do too well at the Electrical bits, but I could handle myself at the Computer bits.  I learned OOP, Java, and basic design patterns from Prof. James McLean.
  6. I also learned a great deal from Prof. McLean’s course text – Introduction to Computer Science Using Java by Prof. John Carter.  I know I said I wasn’t going to mention textbooks, but I was also taught Discrete Mathematics by Prof. Carter, so I thought I’d toss him in too.
  7. My second (and last) semester in ECE had me taking Programming Fundamentals with Prof. Tarak Abdelrahman.  I learned basic C++ from Prof. Abdelrahman, and how to deal with large systems of code.
  8. After my move to the Arts & Science Faculty, I took my first Computer Science course with Dr. Jim Clarke. I learned about Unit Testing, and more design patterns.  I also eventually learned some basic Python from him, but I think it was in another course.
  9. I took CSC258 with Prof. Eric Hehner, and learned about the structure of computer processors.  Physically, this was as low-level as I’d ever gotten with computers.  I was familiar with writing Assembly from my QB days, but Prof. Hehner’s Opcode exercises were really quite challenging – in a pleasant way.  Also, check out his concept of Quote Notation.
  10. After that year, I spent the first of three summers working for the District School Board of Niagara.  Ken Pidgen was my manager, Mila Shostak was my supervisor.  Ken gave me incredible freedom to work, and soon I was developing web applications, as opposed to just fixing up department websites (as I originally thought I would be doing).  Mila gave me guidance, and showed me how to use CSS to style a website.  She also got me started using PHP and MySQL to create basic web applications.
  11. While working at the Board, I had the pleasure of sitting across from Jong Lee.  Jong and I would bounce ideas off of one another when we’d get stuck on a programming problem.  He was very experienced, and I learned lots of practical programming techniques from him.
  12. Michael Langlois and Ken Redekop acted as my clients at the Board, and always gave me interesting jobs and challenges to perform.  Everyone at the Board was always very positive with me, and I’ll always be grateful that they took a newbie undergrad under their wing!  I was given a ridiculous amount of freedom at the Board, and was allowed to experiment with various technologies to get the job done.  Through my three summers there, I learned bits about Rails, CakePHP, MVC, network security, how to deploy an application remotely, how to run a local server, how to develop locally and post to remote, ORM, Flash, web security…so many things.  The list is huge.
  13. Karen Reid and Greg Wilson have been the latest influences on me.  The MarkUs Project was the first project I’ve ever worked on with a team.  It was my first time seriously using version control, my first time using a project management portal (Dr. Project), my first time learning Ruby, and my first time working on an open source project.  I’ve also learned plenty about time management, people, the business of software, and how to get things done.  Again, I’ve been given lots of freedom to learn, experiment, and hone my craft.

Anyhow, these are the people who come to mind.  I might add to this list if I remember anyone else.

But in the meantime, for the people listed above:  thank you.

Summer Work: Week 1

For this summer, the Computer Science Department at UofT has hired me to continue my work on the OLM project.  Click on that link, or check out my other post about OLM to see what it’s all about.

I just finished my first week of work, and it finished with a long weekend.  Not bad.

And I’ve got a great team – I’m working with Severin Gehwolf and Nelle Varoquaux, both excellent thinkers, programmers, and collaborators.  Severin is a UofT student like myself, and Nelle has flown in specially from France (!) to work with us.  They’re great, and we’re going to get a lot done.

So what have we to show for our first week of work?

Well for starters, we’ve gutted the entire database schema of OLM.  We started right from the bottom, and worked our way through every component of the database, trying to figure out what we could cut, trim, expand, and refactor.

And there was plenty to do.  This version of OLM has been in the works for a while now, and there have been plenty of awesome people working on it – but there’s been a variety of Ruby/Rails/JavaScript experience, and the cracks show.

I, myself, came into this project with no Rails experience whatsoever, and while I think I now more or less get the drift, I’m still by no means an expert.  Anyhow,  I’m looking at my old code too, and kind of grimacing.

But the ideas are all there.  It’s like a big hunk of marble that a whole lot of people gnawed and chiseled at for a little bit, trying to make a sculpture.  After the big DB schema refactor, I think the whole team can sort of see the rough form of what this thing is trying to become, and now we just need to carve it out.  Luckily, instead of a few hours per week like the last few semesters, we get a full summer to focus on it.

So, with the DB refactor done, the first thing has been to redesign the models/controllers to play nice with our new database tables.  It was scary, because after the refactor, everything broke – but we’re working on it, and it’s slowly starting to come back.

We’ve also decided to switch the file storage back-end.  Up until now, we were using Ruby to organize a file system back-end to do simple versioning of submitted files.  One of our goals this summer is to build an abstraction layer that will allow us to choose different options for this versioned storage back-end.  In particular, we aim to support Subversion.  That’s right – a web-based Subversion front-end that supports commits, and catches (but doesn’t resolve) conflicts.  It’s a fun thought.
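
To make the idea a little more concrete, here is a rough sketch of what such an abstraction layer might look like.  It is written in Python purely for illustration (OLM itself is a Rails application), nothing here reflects the design we’ll actually end up with, and every name in it (VersionedStore, InMemoryStore, CommitConflict) is invented.

```python
from abc import ABC, abstractmethod


class CommitConflict(Exception):
    """Raised when a commit is based on an out-of-date revision."""


class VersionedStore(ABC):
    """Hypothetical interface that every storage back-end would implement."""

    @abstractmethod
    def commit(self, path, data, expected_revision):
        """Store data as a new revision and return the new revision number."""

    @abstractmethod
    def read(self, path):
        """Return (revision, data) for the latest revision of path."""


class InMemoryStore(VersionedStore):
    """Toy back-end; a Subversion back-end would expose the same methods."""

    def __init__(self):
        self._files = {}  # path -> list of file contents, one per revision

    def commit(self, path, data, expected_revision):
        history = self._files.get(path, [])
        if expected_revision != len(history):
            # Catch the conflict, but leave resolving it to the caller.
            raise CommitConflict(
                f"{path}: expected r{expected_revision}, found r{len(history)}"
            )
        self._files[path] = history + [data]
        return len(history) + 1

    def read(self, path):
        history = self._files[path]
        return len(history), history[-1]
```

The calling code only ever talks to VersionedStore, so swapping the toy back-end for a real one shouldn’t require changes anywhere else.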

I have a feeling this is going to be a very interesting part of our project, and I’ll probably report on it more as it develops – but as it stands, it’s still being conceived on wipe-boards and scrap paper.

Anyhow, I’ll try to keep this blog up to date with what we’re doing.  Or maybe I’ll keep this blog up to date.  I’m conflicted.

Who knows, maybe this will be my last blog post of the summer.  I won’t lie – after working 8 hours on a computer, the last thing I want to do is come home and write a blog post.  If anything, my posts will probably wait until the weekends.

But we’ll see.

wordCount.xpi – Part 1

So, if you recall, I was asked to write a Firefox extension that would do word counting on websites.

Originally, when I started this project, I set a goal for myself:  I copied the text from Project Gutenberg’s First Folio version of Shakespeare’s Hamlet into OpenOffice Writer, recorded the word/line/character count statistics, and set that as my projected goal for my first iteration of my extension.

But there’s a problem with this approach:  I’m supposed to be copying the behaviour of Unix’s wc, not OpenOffice Writer’s word count.  Normally, this wouldn’t be a problem – a word count is a word count, a line count is a line count, and Writer should pump out the same numbers as wc.

Not so.

In my last post, I wrote:

According to OpenOffice Writer, this text has 32230 words, 173543 characters, and 4257 lines.

However, upon passing the same text (saved in the textfile “count.txt”) through wc, I got the following output:

5302 32230 178845 count.txt

Writer and wc agree on the number of words, but disagree on the number of lines – 5302 (wc) vs 4257 (Writer).  It’s a disagreement of about a thousand lines.


Anyhow, I’m going to focus on wc’s approach to line counting – simply returning the number of newline characters in the file.

And guess what…it works.  For Hamlet, my extension pumps out:

Document statistics:

Word Count:  32230
Line Count:  5302
Character Count:  178845
Character Count (no spaces):  142368
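
For reference, the three numbers wc prints by default (lines, words, bytes) can be reproduced with a few lines of Python.  This is only a sketch of the counting rules, not the extension’s code, and wc_counts is a name made up for this example.

```python
def wc_counts(data: bytes) -> dict:
    """Count lines, words and bytes the way wc does by default."""
    return {
        # wc counts newline characters, so a final line with no
        # trailing newline is not counted.
        "lines": data.count(b"\n"),
        # Words are runs of non-whitespace separated by whitespace.
        "words": len(data.split()),
        "bytes": len(data),
    }


sample = b"To be, or not to be\nthat is the question\n"
print(wc_counts(sample))
```

One thing I noticed while comparing the numbers:  178845 − 173543 = 5302, which is exactly wc’s line count.  That would be consistent with Writer simply not counting line breaks as characters, though I haven’t verified that.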


Hamlet’s just the simple case though.  There are plenty of other cases to consider, but this is a start.

Anyhow, download here.

In this version, I’m using Mozilla’s TreeWalker implementation to stitch together the page text.  So far it seems to be working alright, but if it somehow ends up falling through, I might end up using something like Andrew Trusty’s code with the jQuery library to do the text stitching.
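
The extension itself does this with TreeWalker in the browser, which I won’t reproduce here.  As a rough analogue of the same idea (visit the text nodes in document order and join them), here is a sketch using Python’s standard html.parser module instead.  TextStitcher and stitch_text are names made up for the example, and skipping script and style content is my own assumption about what should not be counted.

```python
from html.parser import HTMLParser


class TextStitcher(HTMLParser):
    """Collect the text of a page in document order."""

    SKIPPED = {"script", "style"}  # assumed to be excluded from the count

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def stitch_text(html: str) -> str:
    """Return the concatenated text content of an HTML string."""
    parser = TextStitcher()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)
```

The stitched string is what would then be handed to the word, line and character counters.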

So there it is.  Maybe I’ll keep working on this, pretty it up a bit, etc.  However, work starts on Monday, and that’ll probably take up most of my technical attention.

We’ll see though.

