
The Achilles’ Heel of Light-Weight Code Review

So I had my weekly meeting with my supervisor, and fellow students Zuzel and Jon Pipitone.  Something interesting popped up, and I thought I’d share it.

If it wasn’t already clear, I dig code review.  I think it really helps get a team of developers more in tune with what’s going on in their repository, and is an easy way to weed out mistakes and bad design in small chunks of code.

But there’s a fly in the soup.

This semester, the MarkUs project has been using ReviewBoard to review all commits to the repository.  We’ve caught quite a few things, and we’ve been learning a lot.

Now for that fly:

A developer recently found a typo in our code base.  An I18n variable was misspelled in one of our Views as I18n.t(:no_students_wihtou_a_group_message).  One of our developers saw this, fixed the typo, and submitted the diff for review.

I was one of the reviewers.  And I gave it the green light.  It made sense – I18n.t(:no_students_wihtou_a_group_message) is clearly wrong, and I18n.t(:no_students_without_a_group_message) is clearly what was meant here.

So the review got the “Ship It”, and the code was committed.  What we didn’t catch, however, was that the locale string was actually named “no_student_without_a_group_message” in the translation file, not “no_students_without_a_group_message”.  So the fix didn’t work.

This is important:  the diff looked good, but the bug remained because we didn’t have more information on the context of the bug.  We had no information about I18n.t(:no_students_without_a_group_message) besides the fact that I18n.t(:no_students_wihtou_a_group_message) looked wrong.
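
To make the mismatch concrete, here’s a tiny sketch using the i18n gem.  The locale layout and message text are my own guesses, not MarkUs’s actual files – the point is just that the key in the view has to match the key in the translation data exactly.

    require "i18n"

    # Hypothetical reconstruction: the translation data defines the
    # singular key...
    I18n.backend.store_translations(:en,
      no_student_without_a_group_message: "There are no students without a group.")

    # ...so the "fixed" call from the reviewed diff still misses:
    puts I18n.t(:no_students_without_a_group_message)
    # => roughly "translation missing: en.no_students_without_a_group_message"

    # Only the key that actually exists in the locale data resolves:
    puts I18n.t(:no_student_without_a_group_message)
    # => "There are no students without a group."

No amount of staring at the two-line diff would have revealed that, because the translation file wasn’t part of the diff.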

Which brings me back to the conversation we had yesterday:  while it seems plausible that code review helps catch defects in small blocks of code, does the defect count across the whole application actually decrease?  Since ReviewBoard doesn’t run any static analysis to check what our diffs are actually doing, isn’t it plausible that while our diffs look good, we’re still letting new bugs into the code base?

So, the question is:  does light-weight code review actually decrease the defect count across an application as a whole?

If not, can we augment these code review tools so that they’re more sensitive to the context of the diffs that are being reviewed?
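
For what it’s worth, the I18n case above wouldn’t even need full-blown static analysis – something as dumb as the following sketch would have flagged it.  This is purely illustrative: the paths, the flat locale layout, and the regex are assumptions about a typical Rails app, not anything ReviewBoard or MarkUs actually provides.

    require "yaml"

    # Collect the top-level keys defined in the English locale file...
    locale_keys = YAML.load_file("config/locales/en.yml").fetch("en", {}).keys

    # ...then flag any I18n key referenced in a view that isn't defined there.
    Dir.glob("app/views/**/*.erb").each do |view|
      File.read(view).scan(/I18n\.t\(:(\w+)\)/).flatten.each do |key|
        puts "#{view}: I18n key :#{key} not found in en.yml" unless locale_keys.include?(key)
      end
    end

Hooking even a check like that into the review tool, and running it against the state of the whole tree rather than the diff alone, is the kind of context-sensitivity I have in mind.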

Exploring Peer Review in the Computer Science Classroom: Part 1

Exploring Peer Review in the Computer Science Classroom

by Scott Turner and Manuel A. Pérez Quiñones

I’m new at reading papers, so I’ve gotten used to 5- or 10-pagers.  This looks like the big one, though – 69 pages.

I assume they have something significant to say.  The title sure sounds interesting, especially considering what I’m looking for.

Anyhow, it’s a big paper, and there’s a lot to go through.  Let’s get started.

Right off the bat, I can see that they’re interested in the same problem that I am:

Peer review, while it has many known benefits (Zeller 2000; Papalaskari 2003; Wolfe 2004; Hamer, Ma et al. 2005; Trytten 2005), and is used extensively in other fields (Falchikov and Goldfinch 2000; Topping, Smith et al. 2000; Liu and Hansen 2002; Dossin 2003; Carlson and Berry 2005) and in industry (Anderson and Shneiderman 1977; Anewalt 2005; Hundhausen, Agrawal et al. 2009), is not as widely used in the computer science curriculum.  This may be due, in part, to a lack of information about what, who and when to review in order to achieve specific goals in computer science.

This next part got my attention:

That is not to say that the literature is silent on these issues.  The studies make these choices but there are few reasons given for the decisions and even fewer comparisons performed to show relative value of those options.

Holy smokes.  A pretty bold critique of those previous papers (at least one of which I’ve already reviewed).  It sounds like Scott and Manuel were as disappointed in some of the peer code review literature as I was.  If this paper were being read aloud to an audience I was part of, I would “wooo!” at this point.

What is needed is a clearer understanding of the requirements that the discipline imposes on the peer review process so that it may be effectively used.

Cool – I’m looking forward to seeing what they dug up.

Reading onwards through the introduction, I’m seeing the same basic arguments for peer code review that I’ve seen elsewhere.  I’ll summarize:

  • PCR (peer code review)  involves active use of higher levels of Bloom’s Taxonomy:  synthesis, analysis and evaluation, both for reviewers and review-ees
  • PCR prepares students for industry, since code review is (or should be) a common part of professional software development

Soon after, a good point is brought up – PCR is potentially a beneficial learning activity, but it all depends on the goals of a particular assignment.  A particular review structure may be better for improving code quality, and another might be better for increasing student motivation.  These considerations need to be taken into account when choosing a PCR structure.

By PCR structure, I think the writers mean:

  • What is being reviewed – source code vs design diagrams, for example
  • Who is being reviewed – students could review one another, or they could all review a piece of code provided by the instructor
  • When the review occurs – what level of students should take part in PCR?  Is there a minimum grade level that must be achieved before PCR is effective?  And when, during a project, should PCR happen?  Early in the design process, or after-the-fact – like a “code autopsy”?

These are good questions.  No wonder this paper is so long – I think their experiment is going to try to answer them all.

And just when things are starting to get good…they go into a literature review.  Yech.  I know it’s important to lay the groundwork, but I think I just felt my eyes turn to oatmeal.

The literature review starts by briefly listing off those sources again – past papers that have tried to deal with this topic.  They categorize them based on what the papers try to deliver, and then give them a light slam for not having a scientific basis on which to form their PCR structures.  I heartily agree.

The first part of the literature review discusses the benefits of peer review in other fields.  Papers as far back as the 1950s are cited.  I think it goes without saying that having other people critique your work can be a great way of receiving constructive feedback (ask any playwright, for example).  I guess these fellas feel they have to be pretty rigorous though, and really ground their argument in some solid past work.  Power to them.

An interesting notion is brought up – peer reviews over an extended length of time within the same groups help cultivate “interaction between…peers and for the building of knowledge”.  One-time reviews, however, “[lend themselves] more to a cognitive approach…more attention can be paid to the changes in the students’ thought processes”.  Hm.

This fellow named Topping seems to be quite popular with these guys.  Apparently, he came up with something called “Topping’s Peer Review Typology”, and I get the feeling that he is one of their primary sources for figuring out different ways of constructing PCR structures.

Oh, and an important morsel just got snuck in:

We are interested in how the student is affected by the acts of reviewing and being reviewed rather than by the social interaction occurring during the process.

So that sense of community the other paper was talking about – it’s not being looked at here.

The paper then goes on to chisel down some of the jargon from Topping’s typology, and make it fit the field of Computer Science.  By doing this, the writers simply reiterate the variables they’re going to be playing with:  what, who, and when.

The paper then dives into a two page summary of some peer review papers, and the various results that they found.  They note a few instances where peer review seemed to improve student performance, and other cases where peer review resulted in semi-disaster.  There are just as many theories as there are conflicting accounts.  From reading this stuff, it seems that peer review is a vast and complicated topic, and nobody seems to really have a firm grasp on it just yet.

Likewise, rubric creation seems to be a bit of a contentious topic:

While there seems to be a general consensus that rubrics are important and that they improve the peer review activity, there is not as much agreement on how they should be implemented.

However, it is clear that rubrics are useful in peer review for novice reviewers:

Rubrics can supply the guidance students need to learn how to evaluate an assignment.  It provides the needed scaffolding until the students are comfortable with the process and the domain to make correct judgements.

Makes sense to me.

The paper then asks an important question: what makes a “successful” review in the education context?  Greater understanding?  Learning new concepts?  Improved grades?  Better designs?  Fewer code defects?

The answer:  it really depends on the instructor, and what the course wants from the PCR process.  For example, what is more important – having the reviewer learn to review?  Or having the review-ee receive good feedback?  Or both?  PCR is complex, and has lots of things to consider…

The next section highlights how technology is used to support peer review.  One particularly interesting example is a Moodle module that allows for peer reviews on assignments.  Apparently, the authors are fans.  I’ve never used Moodle before, and haven’t yet found the module they’re talking about, but it sounds worth investigating.

The very next section details their experiment – their method, their data, and their results.  However, this post is getting a bit long, so I’m going to stop here, and continue on in a second post.

Stay tuned for the exciting conclusion!