If it wasn’t already clear, I dig code review. I think it really helps get a team of developers more in tune with what’s going on in their repository, and is an easy way to weed out mistakes and bad design in small chunks of code.
But there’s a fly in the soup.
This semester, the MarkUs project has been using ReviewBoard to review all commits to the repository. We’ve caught quite a few things, and we’ve been learning a lot.
Now for that fly:
A developer recently found a typo in our code base. An I18n locale key was misspelled in one of our views as I18n.t(:no_students_wihtou_a_group_message). One of our developers saw this, fixed the typo, and submitted the diff for review.
I was one of the reviewers. And I gave it the green light. It made sense – I18n.t(:no_students_wihtou_a_group_message) is clearly wrong, and I18n.t(:no_students_without_a_group_message) is clearly what was meant here.
So the review got the “Ship It”, and the code was committed. What we didn’t catch, however, was that the locale string was actually named “no_student_without_a_group_message” in the translation file, not “no_students_without_a_group_message”. So the fix didn’t work.
This is important: the diff looked good, but the bug remained because we had no context beyond the diff itself. All we knew about I18n.t(:no_students_without_a_group_message) was that I18n.t(:no_students_wihtou_a_group_message) looked wrong.
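To make the failure mode concrete, here is a minimal sketch of what happened. The tiny `t` method below is a stand-in for the real I18n gem's lookup (which returns a "translation missing" placeholder for unknown keys); the key names come from the post, and the English text is an invented example:

```ruby
# Simulated locale data: the key the translation file actually defines.
LOCALE = { no_student_without_a_group_message: "No students without a group." }

# A toy stand-in for I18n.t: falls back to a placeholder for unknown keys,
# mimicking the real gem's missing-translation behaviour.
def t(key)
  LOCALE.fetch(key) { "translation missing: #{key}" }
end

puts t(:no_students_wihtou_a_group_message)   # the original typo: missing
puts t(:no_students_without_a_group_message)  # the "fixed" key: still missing!
puts t(:no_student_without_a_group_message)   # the key the locale file defines
```

The diff changed the first line into the second, and both reviewers and author only saw that pair; the third line, the one that actually works, lived in a file outside the review.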
Which brings me back to the conversation we had yesterday: while it seems plausible that code review helps catch defects in small code blocks, does the global defect count on the application actually decrease? Since ReviewBoard doesn’t have any static analysis tools to check what our diffs are doing, isn’t it plausible that while our diffs look good, we’re not preventing ourselves from adding new bugs into the code base?
So, the question is: does light-weight code review actually decrease the defect count across an application as a whole?
If not, can we augment these code review tools so that they’re more sensitive to the context of the diffs that are being reviewed?
Maybe it does. If a team uses a review tool, its members will come to know more about the different parts of the software (another research question in itself), and so they may notice possible external effects of a local change.
Not to split hairs, but I don’t completely agree with this part: “…the diff looked good, but the bug remained…”
*A* bug remained, but the bug that was detected by code review *was* fixed.
To me, there were *two* bugs, not one. The string in the locale file was “no_student_without_a_group_message”, while the code called I18n.t(:no_students_wihtou_a_group_message).
So there are two bugs in the code:
1. Using “students” instead of “student”
2. Using “wihtou” instead of “without”
The code review caught the second bug, but not the first. Like I said, I’m kind of splitting hairs here. 🙂
The more interesting point you raise is: “isn’t it plausible that while our diffs look good, we’re not preventing ourselves from adding new bugs into the code base?”
Yes, that is completely plausible. The key is this: what does it really mean for a diff to “look good”? In this case, if the review does not include the locale file, then it is easy for the diff to look good.
If you have a static analysis tool that can do the grunt work of verifying the locale files, then great! But if not, that’s a problem.
*Some* bugs can be caught by using static analysis tools, others cannot. For those that cannot be caught by a static analysis tool, you’re better off doing code review than not because at least you have an opportunity for someone to spot the problems. Assuming, of course, that all the relevant files are included in the review.
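This particular class of bug is very mechanical, so a checker for it doesn't need to be fancy. Here is a minimal sketch of the kind of static analysis being suggested, using only the Ruby standard library: extract the symbol-literal keys passed to `I18n.t` from source text, and diff them against the keys defined in the locale YAML. (The regex, helper names, and sample strings are all my own invention, and a real tool would also have to handle interpolated and namespaced keys.)

```ruby
require "yaml"

# Extract symbol-literal keys passed to I18n.t from Ruby/ERB source text.
def i18n_keys(source)
  source.scan(/I18n\.t\(\s*:(\w+)/).flatten.uniq
end

# Return keys used in the source that the locale YAML does not define.
def missing_keys(source, locale_yaml)
  defined_keys = YAML.safe_load(locale_yaml).fetch("en", {}).keys
  i18n_keys(source) - defined_keys
end

view   = '<%= I18n.t(:no_students_without_a_group_message) %>'
locale = "en:\n  no_student_without_a_group_message: No students without a group.\n"

puts missing_keys(view, locale).inspect
# → ["no_students_without_a_group_message"]
```

Run as part of the review (or a commit hook), a check like this would have flagged the “fixed” diff immediately, even though the locale file wasn’t in the review.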
Reviews catch bugs; reviews also miss bugs. At the end of the day, the code review is only as thorough as the reviewers choose for it to be.
The fact that the code review didn’t fix the issue doesn’t indicate a failure in the code review process, but rather a failure in the overall QA process. Code review is just one element of an effective QA effort. Ideally you’d couple it with automated testing (unit and functional), end user testing (either ad-hoc or based on test plans) and static analysis.
Another way to approach this problem is to ask “Why didn’t our automated testing catch this issue?” Granted, testing i18n’ed apps is problematic due to the sheer volume of validations you need to make, but technically, it’s not hard to automatically test that the correct string is being used at runtime.
Suppose the person writing the code hadn’t mis-spelled ‘without’. Do you think that code review should have caught the issue then? I would much rather rely on automated tests to do that sort of mind-numbing validation 😉
I think it’s fair to say that code review is only one slice of the QA pie. Testing would be another.
And perhaps it goes without saying that code review is good at catching particular *types* of bugs, static analysis catches another type…testing, another.
Clearly, code review is no silver bullet. Just because you had a ton of eyes swarming over your code, and you got a “ship it” from your reviewers, doesn’t mean you didn’t introduce a bunch of new show-stopper bugs.
So I guess my take home message is that “ship it” is not necessarily equal to “everything is dandy”.
Just a friendly warning to fellow beginners who are just starting the code review habit. 🙂