The Problem Space
Click here to read about my problem space
Related Work
During his study at Cisco Systems, Jason Cohen noticed that review requests with some form of author preparation consistently had fewer defects found in them.
Cohen explains what author preparation is:
The idea of “author preparation” is that authors should annotate their source code before the review begins. Annotations guide the reviewer through the changes, showing which files to look at first and defending the reason and methods behind each code modification. The theory is that because the author has to re-think all the changes during the annotation process, the author will himself uncover most of the defects before the review even begins, thus making the review itself more efficient. Reviewers will uncover problems the author truly would not have thought of otherwise.
(Best Kept Secrets of Peer Code Review, p80-81)
Cohen gives two theories to account for the drop in defects:
- By performing author preparation, authors were effectively self-reviewing, and removed defects that would normally be found by others.
- Since authors were actively explaining or defending their code, this sabotaged the reviewers’ ability to do their job objectively and effectively: there is a “blinding effect”.
In his study, Cohen subscribes to the first theory. He writes:
A survey of the reviews in question show the author is being conscientious, careful, and helpful, and not misleading the reviewer. Often the reviewer will respond to or ask a question or open a conversation on another line of code, demonstrating that he was not dulled by the author’s annotations.
While it’s certainly possible that Cohen is correct, the evidence supporting his claim is tenuous at best: it suffers from selection bias and was not drawn from a properly controlled experiment.
What do I want to do?
I want to design a proper, controlled experiment to figure out why exactly the number of defects found drops when authors prepare their review requests.
My experiment is still being designed, but at its simplest:
We devise a review request with several types of bugs intentionally inserted, and write “author preparation” commentary to go along with it. We then show the review request to a series of developers, giving some of them the author preparation and some of them not, and ask each developer to perform a review.
Finally, we measure the number, type, and density of the defects that they find.
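To make the design above a bit more concrete, here is a minimal sketch of how the two-group comparison might be analysed, assuming each reviewer is randomly assigned to a “prepared” group (sees the annotations) or a “control” group (sees the bare review request), and that every defect a reviewer reports can be matched to one of the seeded bugs. The defect names in `SEEDED` and the helper names (`assign_conditions`, `detection_rate`, `permutation_test`, `compare`) are hypothetical, and the permutation test is just one reasonable choice of statistical test, not a settled part of the experiment. The sketch only covers the count of seeded defects found; type and density breakdowns would follow the same pattern.

```python
import random
from statistics import mean

# Hypothetical seeded defects; the real review request would define its own.
SEEDED = {"off-by-one", "missing-null-check", "resource-leak", "wrong-operator"}


def assign_conditions(reviewers, seed=0):
    """Randomly split reviewers into a 'prepared' group (sees the author's
    annotations) and a 'control' group (sees the bare review request)."""
    rng = random.Random(seed)
    shuffled = list(reviewers)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {r: ("prepared" if i < half else "control") for i, r in enumerate(shuffled)}


def detection_rate(found, seeded=SEEDED):
    """Fraction of the intentionally inserted defects that one reviewer found.
    Reports that match no seeded defect are ignored in this sketch."""
    return len(set(found) & seeded) / len(seeded)


def permutation_test(a, b, n_iter=10_000, seed=0):
    """Two-sided permutation test on the difference in mean detection rate
    (a minus b). Returns (observed difference, approximate p-value)."""
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_iter):
        # Reshuffle group labels under the null hypothesis of no effect.
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_iter


def compare(found_by_reviewer, conditions):
    """Group per-reviewer detection rates by condition and test the difference."""
    rates = {"prepared": [], "control": []}
    for reviewer, found in found_by_reviewer.items():
        rates[conditions[reviewer]].append(detection_rate(found))
    return permutation_test(rates["prepared"], rates["control"])
```

Passing `compare` a dict that maps each reviewer to the set of seeded defects they reported yields the observed difference (prepared minus control) and an approximate p-value. A clearly negative difference with a small p-value would be consistent with the “blinding effect”; a difference near zero would fit Cohen’s self-review theory.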
Why do you care?
If it is shown that author preparation does not negatively affect the number of defects that the reviewers find, this is strong evidence in support of Cohen’s claim that author preparation is good. The practice can then be adopted, and argued for, as a way to increase the effectiveness of code reviews.
On the other hand, if it is shown that author preparation negatively affects the number of defects that the reviewers find, this has some interesting consequences.
The obvious one is the conclusion that authors should not prepare their review requests, so as to maximize the number of defects that their reviewers find.
The less obvious one takes the experimental result a step further. Why should this “blinding effect” stop at author preparation? Perhaps any review, by any participant, reduces the number of defects found in subsequent reviews. The experiment will be designed to investigate this possibility as well.
Either way, the benefits or drawbacks of author preparation will hopefully be revealed, to the betterment of the code review process.
I’m so glad you’re proposing this!
This is one of the more interesting results from my study, and you’re right, it was just an observation, not a controlled result. (We weren’t testing for this behavior; we just found it curious.)
This has huge ramifications if the results turn out to be positive in favor of self-review, because it means anyone, anywhere, without permission or process approval, has a new, free way of finding more bugs in their code. Awesome!
And if it turns out to be negative, that’s equally interesting because it demonstrates that “review” really does mean another pair of eyes, and that too is useful for anyone interested in quality software processes.
Good luck!