Category Archives: Code Reviews

Augmenting Code Review Tools: Screen Age and Retina Burn…

Two more ideas to augment the code review process:

Screen Age

Imagine you’re reviewing a piece of code.  There’s something like…500 lines to look at.  So you’re scrolling through the diff, checking things here, checking things there…

And in the background, the review software is recording which sections of the code you’re looking at.

Hm.

I wonder if it’d be a useful metric to know what portions of the code were being looked at the most during a review?  I believe it’s possible (at least within Firefox) to determine the scroll position of a window.  I believe it’s also possible to determine the dimensions of the entire diff page, and the offset coordinates of elements on that page.

So shouldn’t it be possible to determine which elements were on the screen, and for how long?

Imagine seeing a “heat map” over the diff, showing where the reviewer spent most of their time looking.
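To make that concrete, here’s a rough sketch (in Ruby) of how the back end might roll up viewport reports into per-hunk “screen age”. Everything here is invented for illustration: assume the review page periodically reports which vertical slice of the diff is visible and for how long, and that we know each hunk’s offsets on the page.

```ruby
# Sketch: accumulate "screen age" per diff hunk from viewport samples.
Sample = Struct.new(:top, :bottom, :seconds) # visible page region + dwell time
Hunk   = Struct.new(:id, :top, :bottom)      # a diff hunk's page offsets

def screen_age(samples, hunks)
  ages = Hash.new(0.0)
  samples.each do |s|
    hunks.each do |h|
      # Credit a hunk only when it overlaps the visible region.
      ages[h.id] += s.seconds if h.bottom > s.top && h.top < s.bottom
    end
  end
  ages
end

samples = [Sample.new(0, 800, 30), Sample.new(600, 1400, 90)]
hunks   = [Hunk.new("hunk-1", 100, 500), Hunk.new("hunk-2", 700, 1200)]
p screen_age(samples, hunks) # => {"hunk-1"=>30.0, "hunk-2"=>120.0}
```

Rendering the heat map would then just be a matter of mapping each hunk’s accumulated seconds to a colour.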

Jason Cohen wrote that code reviews should take at least 5 minutes, and at most 60-90 minutes.  I wonder if collecting this information would help determine whether or not a review was performed carefully enough.

Now, granted, there are plenty of ways to add noise to the data.  If I’m doing a review, and stand up to go get a sandwich, and then my neighbour visits, etc…my computer sits there, gazing at those elements, and the data is more or less worthless…

Which brings me to:

Retina Burn

Eye tracking is not a new technology.  One of my roommates works at a lab where they do experiments with eye tracking tools.  From what he tells me, the eye tracking gear in the lab is pretty expensive and heavy-weight.

But check out these demos:

[Embedded videos: webcam-based eye tracking demos]

So it looks like a webcam and some clever software can do the eye tracking trick too.  This stuff is probably far less accurate than what my roommate uses – but it’s lighter, and cheaper, and therefore more likely to find its way into homes.  So it looks like the day is coming where I’ll eventually be able to use my eyes to control my mouse cursor.

But, more interestingly, this more or less solves the problem with my Screen Age idea:  this technology can tell when you’re looking at the screen.  And it can give a pretty good guess about what area of the screen you’re looking at.

I wonder if collecting this information from code reviewers would be useful – what exact parts of the code have they looked at?  And for how long?  What have they missed?  What did they gloss over?

UPDATE: Karen Reid has also brought to my attention the possibility of using this technology to see how TAs grade assignments by tracking what they’re looking at.  Hm…

Code Reviews and Predictive Impact Analysis

A few posts ago, I mentioned what I think of as the Achilles’ Heel of light-weight code review: the lack of feedback about the dependencies that a posted change can or will impact. This missing feedback can give developers the false impression that proposed code is sound, and thus allow bugs to slip through the review process. It has happened more than once on the MarkUs project, where we are using ReviewBoard.

Wouldn’t it be nice…

Imagine that you’re working on a “Library” Rails project, and you’re about to update the Book model within the MVC framework: you’ve added a more efficient class method to the Book model that lets you check out large quantities of Books from the Library all at once, rather than one at a time. Cool. You update the BookController to use the new method, run your regression tests (which are admittedly incomplete, and pass with flying colours), and post your code for review.

Your code review tool takes the change you’re suggesting, and notices a trend: in the past, when the “checkout” methods in the Book model were updated, the BookController was usually changed, and a particular area of the en.yml locale file was usually updated too. The code review tool notices that in this latest change, nothing has been changed in en.yml.

The code review tool raises its eyebrow.  “I wonder if they forgot something…”, it ponders.

Now imagine that someone logs in to review the code.  Along with the proposed changes, the code review tool suggests that the reviewer also take a peek at en.yml just in case the submitter has missed something.  The reviewer notices that, yes, a translation string for an error message in en.yml no longer makes sense with the new method.  The reviewer writes a comment about it, and submits the review.

The reviewee looks at the review and goes, “Of course!  How could I forget that?”, and updates the en.yml before updating the diff under review.

Hm.  It’s like a recommendation engine for code reviews…”by the way, have you checked…?”
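Here’s a minimal sketch of how that recommendation could work – mine the commit history for files that usually change together, and flag the ones missing from a new changeset. The history and the 70% threshold are made up for illustration:

```ruby
# Count how often each file, and each pair of files, appears in past commits.
def co_change_counts(commits)
  pair_counts = Hash.new(0)
  file_counts = Hash.new(0)
  commits.each do |files|
    files.each { |f| file_counts[f] += 1 }
    files.combination(2).each { |a, b| pair_counts[[a, b].sort] += 1 }
  end
  [pair_counts, file_counts]
end

# Suggest files that historically changed alongside the files in `changed`
# in at least `threshold` of the commits that touched them.
def suggestions(commits, changed, threshold = 0.7)
  pair_counts, file_counts = co_change_counts(commits)
  hints = []
  pair_counts.each do |(a, b), n|
    [[a, b], [b, a]].each do |in_change, missing|
      next unless changed.include?(in_change) && !changed.include?(missing)
      hints << missing if n.fdiv(file_counts[in_change]) >= threshold
    end
  end
  hints.uniq
end

history = [
  %w[app/models/book.rb app/controllers/book_controller.rb config/locales/en.yml],
  %w[app/models/book.rb config/locales/en.yml],
  %w[app/models/book.rb app/controllers/book_controller.rb config/locales/en.yml],
]
p suggestions(history, %w[app/models/book.rb app/controllers/book_controller.rb])
# => ["config/locales/en.yml"]
```

A real implementation would mine this from the VCS log, of course, but even this toy version would have flagged the forgotten en.yml in the story above.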

I wonder if this would be useful…

Mining Repositories for Predicting Impact Analysis

This area of research is really new to me, so bear with me as I stumble through it.

It seems like it should be possible to predict which methods and files depend on which others, using static analysis as well as VCS repository mining. I believe this has been tried in various forms.

But I don’t think anything like this has been integrated into a code review tool.  Certainly not any of the ones that I listed earlier.

I wonder if such a tool would be accurate…  and, again, would it be useful?  Could it help catch more of the bugs that the standard light-weight code review process misses?

Thoughts?

Treasure Hunting, and Research Idea #4

Remember those research questions Greg had Zuzel and me come up with? He’s asked us to select one, and to find some tools that could help us run an experiment to answer it.

Originally, my favourite question was my second one:  in the courses where students work in teams, why aren’t the instructors providing or encouraging code review tools?

I’ve already received two answers to this question from instructors here at UofT.

Greg has warned me that this line of inquiry might be a bit too shallow. He warns that it’s possible that, when asked about code review, they’ll shrug their shoulders and say that they just don’t have time to teach it on top of everything else. Karen Reid’s response echoes this warning, somewhat.

And maybe Steve Easterbrook has a point – that code review is hard to assess as an assignment.  It seems he’s at least tried it. However, it appears that he was using fragments of Fagan Inspection reports as his measuring stick rather than the reviews themselves. I assert that this is where light-weight code review tools could be of some service: to actually see the conversation going on about the code.  To see the review.  I also assert that such a conversation is hard to fake, or at least, to fake well.

So, just go with me on this for a second:  let’s pretend that Steve is going to teach his course again.  Except this time, instead of collecting fragments of Fagan Inspection reports, he uses something like ReviewBoard, Crucible, or Code Collaborator to collect reviews and conversations.  Would it be worth his time?  Would it benefit the students?  Would they see the value of code review?

From this blog post, it seems that the Basie folk first got on the ReviewBoard bandwagon because Blake Winton set a good example as a code reviewer. I remember talking to Basie developer Bill Konrad this summer, and him telling me that he’d never go back after seeing the improvement in code quality.

Because that’s the clincher – getting the students to see the value.  You can’t make a horse drink, and you can’t get students to use a tool unless they find it useful.  And how do you show that to them?  How do you show them the value without having to call in Blake Winton?  How do you make them see?  And how do you make the process painless enough so that instructors don’t have to worry about teaching a new, confusing tool?

One of the comments on Greg’s post says the following:

My feeling is that the easier it is to review code, the more interest you’ll see from students.

Maybe that’s really all you need.

So, how about this for an experiment – take a class of students who are working on group assignments, and set up a copy of a light-weight code review tool for them. Make one of the first assignments in the course involve a review, so that they need to use the software at least once. Now track the usage for the rest of the semester…do they still use it? Do some use it, and not others? If so, do the ones who use it tend to receive higher grades? Or is there no difference? What is the quality of their reviews when the reviews are not being marked? What will the students think of the review tool once the course is completed?

I think it’s simple enough, and I can’t believe I didn’t think about it earlier.

Some of the software I could use: ReviewBoard, Crucible, Code Collaborator…

Quite a few choices for the review tool.  And I wasn’t even digging that deeply.  Perhaps I should do a quick survey across all of them to see which one would be the best fit for a CS course.  Perhaps.

The Achilles’ Heel of Light-Weight Code Review

So I had my weekly meeting with my supervisor, and fellow students Zuzel and Jon Pipitone.  Something interesting popped up, and I thought I’d share it.

If it wasn’t already clear, I dig code review.  I think it really helps get a team of developers more in tune with what’s going on in their repository, and is an easy way to weed out mistakes and bad design in small chunks of code.

But there’s a fly in the soup.

This semester, the MarkUs project has been using ReviewBoard to review all commits to the repository.  We’ve caught quite a few things, and we’ve been learning a lot.

Now for that fly:

A developer recently found a typo in our code base. An I18n locale key was misspelled in one of our Views as I18n.t(:no_students_wihtou_a_group_message). One of our developers saw this, fixed the typo, and submitted the diff for review.

I was one of the reviewers.  And I gave it the green light.  It made sense – I18n.t(:no_students_wihtou_a_group_message) is clearly wrong, and I18n.t(:no_students_without_a_group_message) is clearly what was meant here.

So the review got the “Ship It”, and the code was committed.  What we didn’t catch, however, was that the locale string was actually named “no_student_without_a_group_message” in the translation file, not “no_students_without_a_group_message”.  So the fix didn’t work.

This is important:  the diff looked good, but the bug remained because we didn’t have more information on the context of the bug.  We had no information about I18n.t(:no_students_without_a_group_message) besides the fact that I18n.t(:no_students_wihtou_a_group_message) looked wrong.

Which brings me back to the conversation we had yesterday:  while it seems plausible that code review helps catch defects in small code blocks, does the global defect count on the application actually decrease?  Since ReviewBoard doesn’t have any static analysis tools to check what our diffs are doing, isn’t it plausible that while our diffs look good, we’re not preventing ourselves from adding new bugs into the code base?

So, the question is:  does light-weight code review actually decrease the defect count across an application as a whole?

If not, can we augment these code review tools so that they’re more sensitive to the context of the diffs that are being reviewed?
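For this particular class of bug, even a crude context check would help. Here’s a minimal sketch in Ruby, assuming (for simplicity) a flat list of keys under the top-level “en” entry and the usual Rails paths:

```ruby
require "yaml"

# Sketch: report I18n keys referenced in the views that don't exist in the
# locale file. Real Rails locale files are usually nested; this assumes a
# flat list of keys under "en" to keep the idea clear.
def missing_i18n_keys(view_dir, locale_file)
  defined_keys = YAML.load_file(locale_file)["en"].keys
  referenced   = Dir.glob("#{view_dir}/**/*.erb").flat_map do |path|
    File.read(path).scan(/I18n\.t\(:(\w+)\)/).flatten
  end
  referenced.uniq - defined_keys
end

# For the bug above, this would report
# ["no_students_without_a_group_message"], because the locale file only
# defines "no_student_without_a_group_message".
p missing_i18n_keys("app/views", "config/locales/en.yml")
```

A check like this, run against the patched tree rather than just the diff, is exactly the kind of context ReviewBoard wasn’t giving us.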

Learning to Read Things, and Code as Art

A while back, I went on a little rant about how CS students don’t learn how to read and critique code, and that because of this, they’re missing out on a huge opportunity for learning.  They’re not exercising their abilities to comprehend other people’s code.

But let’s step back for a second.  Forget code for a minute, and let’s just think about reading in general.

Check out this article that Greg Wilson forwarded to Zuzel and me.

The take-home message: reading is not a skill that can be sharpened in a vacuum. One of the keys to increasing reading comprehension is to increase the variety of reading material you practice with. This helps build the foundation needed to understand more sophisticated material.

Actually, this really doesn’t come as a surprise to me.  In the Drama department, before diving into a new script, we are encouraged to briefly study the time period (slang, accents, and turns of phrase in particular) and the subject matter of the play.  This builds a foundation of context, so that the script makes some sense the first time it is read.  And the first time a script is read is very important, because for me,  it lays down many of the assumptions that I carry throughout my time working with the play.  Every subsequent reading of that play builds off of the first reading.

And have you ever read T.S. Eliot’s The Waste Land? Good luck making sense of it without a study guide, because that guy references a ton of stuff, and expects you to fill in the gaps.

So let’s direct this back to reading source code.

Taking that article (and this video) into consideration, it seems that just slapping some code up on a projector and getting people to talk about it might not be good enough.

Something like this might be a better idea:  check out this post by Jason Montojo.  If his CS art history course ever takes off, I’ll be first in line to sign up.

The idea of studying code as art is intriguing…maybe I’m not the only one who thinks of it as sculpture.