Category Archives: Uncategorized

Code Spelunking: Review Board Extensions

So this summer, I’m working on Review Board for the Google Summer of Code.

Until my GSoC acceptance, my romps into the code had been relatively shallow.  But with my proposal being given the green light, I’ve started doing more extensive explorations.

Review Board is built using the Django web framework.  I haven’t worked with Django before, but I have quite a bit of experience with Rails, so that should be an asset.  Using a web framework means having (relatively) predictable source code layout, and Review Board is no exception.

Djblets

At one point or another, the Review Board developers realized that a lot of their code wasn’t Review Board specific, and could be abstracted out into an external library.

That library is called Djblets.

Among other things, Djblets adds a DataGrid component for easy record sorting and pagination.  There are improvements to Django’s Authentication system.  Functions for easily displaying a user’s Gravatar.

And, low and behold, there is a branch of Djblets that provides classes and functions for giving a Django application an extension framework.  The classes are abstract enough so that, in your Django application, you can specify different types and behaviours for your Hooks.

Djblets -> Review Board

The Review Board extension branch takes these Djblets extension classes, and extends them into DashboardHooks, NavigationBarHooks, ReviewRequestDetailHooks…lots of different hooks.

So, Djblets creates the foundation abstractions.  Review Board makes these abstractions a little more specific.  And then an extension writer needs to instantiate and use these classes to design their extensions.  It sounds complicated, I know.

So Let’s Map It Out

When I start learning a new code base, I do a lot of drawing.  To me, getting to now a code base is like getting to know a city, and that means walking around it, and mapping it out.

So I’ve taken the liberty of mapping out the extension classes that I’ve found, and how they relate to one another.  Note that at the bottom of my map, a simple extension (RB Reports) is using some of those classes to hook itself into Review Board.  You can find this, and other extensions,here.

My map of the extension framework

Click here to check out my map of the current state of the extension framework

Now, before someone in the department starts complaining about my misuse of UML:  I’m not a UML guy.  I just wanted an easy piece of diagramming software, and the one that I found (Dia), did UML.  I just wanted something to draw boxes and lines. So please don’t freak out if you think I’m using the wrong symbols.

One symbol you might be wondering about is the blue quantum-flux-capacitor-implosion.

I’ll save that for a future post.

2 people like this post.

Process Improvement of Peer Code Review and Behavior Analysis of its Participants

Process Improvement of Peer Code Review and Behavior Analysis of its Participants

by WANG Yan-qing, LI Yi-jun, Michael Collins, LIU Pei-jie
SIGCSE ’08, March 12-15, 2008

If you’ve been following, I’ve been trying to figure out why code reviews aren’t a part of the basic undergraduate computer science curriculum.  The other papers and articles I’ve read so far have had less to do with the classroom, and more to do with code reviews in industry.

This paper got a little bit closer to the classroom, and, more importantly, closer to my particular question.

To begin, the paper introduces some terminology I’m not familiar with – the software crisis.  I’m familiar with the concept though:  writing good software for large systems is not a simple problem, and as computers become a bigger and more important part of our lives, this inability to easily write good code could quickly end up biting us in the collective rear.

Code review is one of several methods that the software industry has adopted to try to “tame” the software crisis.

I like this part:

Even though code reviews are time consuming, they are much more efficient than testing [19]. A typical engineer, for example,  will find approximately 2 to 4 defects in an hour of unit testing but will find 6 to 10 defects in each hour of review code [19].

What more argument do you need?  It’s just a matter of getting rid of that “time consuming” part, right?  Right…

And this is even juicier:

PCR [peer code review] is a technique which is generally considered to be effective on promoting students’ higher cognitive skills [9], since students use their own knowledge and skill to interpret, analyze and evaluate others’ work to clarify and correct it [2].

Wonderful!  I’m in my problem space!

Reading along, it seems that this paper is introducing a new, refined structure for PCR, and will detail results of a study on using that new structure in a programming course.  Cool.

The introduction ends by saying that the new structure seemed to enhance the quality of student’s work, as well as their ability to critique one another.  Great news!

It’s not all sunshine and puppies, though – they also mention that they ran into a few problems, and that they’ll be discussing those too.

So the first thing they’ve done, is tried to make the terminology clearer:

Roles

  • Author:  the student who writes the code that is being reviewed
  • Reviewer:  the person who is reviewing the code
  • Reviser:  the author, after receiving a Comments Form from a Reviewer
  • Instructor:  the teacher or qualified TA who is responsible for the class

Documents

  • Manuscript Code:  the unrevised code that is first submitted by an Author
  • Comments Form:  the comments given from the Reviewer to the Author
  • Revision Code:  the code that is revised by the Reviser after the Reviewer gives the Reviser the Comments Form (whew…follow that?)
  • Reference Solution:  the “answer” to the assignment, held by the Instructor

Now that we’ve got all the players and documents laid out, let’s take a look at the process:

Process

  • Phase 1:  The Author completes the Manuscript Code
  • Phase 2:  The Author emails the Manuscript Code to the Instructor.  Simultaneously, a blank Comments Form and a copy of the Manuscript Code is sent to a Reviewer
  • Phase 3:  The Reviewer reviews the code as soon as possible, filling in the Comments Form.
  • Phase 4:  The Reviewer sends the completed Comments Form back to the Author, and also sends a carbon copy to the Instructor
  • Phase 5:  After receiving the Comments Form, the Reviser (who was originally the Author…oh boy…almost went cross-eyed, there) makes the appropriate alterations to the original Manuscript Code, referencing the Comment Form where appropriate.  The completed Revision Code is emailed to the Instructor.
  • Phase 6:  The Instructor should now have a copy of the original Manuscript Code, the completed Comments Form, and the final Revision Code.  The Instructor should be able to check that the author and reviewer did their work properly.

Wow.  What a convoluted way of saying something simple.  They even included a diagram, with lots of arrows.  Somehow, I think this could be said simpler.  Oh well.

It also sounds like a lot of emailing.  You’re balancing your course on the reliability of the email protocol?  Errr….

Well, let’s see what problems they ran into…

  1. The assumption that all participants would carefully and responsibly carry out each phase of the process was faulty.  This may have been due to “careless authors, irresponsible reviewers and busy instructors in the review process”.
  2. Some students lack the coding ability to either:
    1. Produce code that is readable and reviewable in a constructive way
    2. Review code in a constructive, or informed way
  3. The process is difficult to control due to the reliance on email (no kidding!)
    1. Some students would not submit Manuscript Code or Comment Forms on time
    2. Some students would submit multiple copies of their Manuscript Code, due to an inherent mistrust of the reliability of email
  4. There was opportunity for students to “game” the process to their advantage. In this particular study, there was very little control of who was doing what.  Though a particular Author was supposed to write the Manuscript Code, this wasn’t enforced, and there was an occasion where another student wrote the code instead.  Same with review writing, and revision writing.  Yeah, cheating is always a problem.

The paper then goes into some discussion about the observed behaviour of Authors and Reviewers.  They noted that most students did not enjoy reviewing very poorly written code, and don’t give their best efforts on reviews for such code.  Mere encouragement from the instructor was not enough to compel them to give their best reviews either.  The paper suggests finding some way of making Reviewers review code more carefully; perhaps through awarding bonus marks.

Behaviour of Instructors was also analyzed.  The paper mentioned that Instructors with large class sizes might try to cut down on their workload by only viewing the Comment Forms that the Reviewers had provided.  But this strategy does not give the Instructor the entire story, and is open to manipulation from students.

The paper ends with a discussion about group formations, and how best to diffuse student cheating conspiracies.

At the last moment, they suggest some “web-based [application] with a built-in blind review mechanism” be developed.

Hm.

Be the first to like.

“Code reviews” by Arjen Markus (2009)

Code Reviews

by Arjen Markus
Deltares, The Netherlands
ACM Fortran Forum, August 2009, 28, 2

This is one of the first papers I found.  Consider it my “warm up” paper.

According to the header, Arjen Markus works for “Deltares”, and after a quick Google-hunt, I found out that Deltares is a “new independent Dutch institute for national and international delta issues”.

Upon closer inspection, it seems that Markus’ paper is concerned with what reviewers should be looking for during code reviews:

“What should you be looking for in the code?  It is not enough to check that the code adheres to the programming standard of the project it belongs to.  Such a standard may not exist, be incomplete or be focussed on layout, not on questionable constructs that are a liability.  With this article I would like to fill in this practical gap, at least partly.” (Page 4)

This isn’t exactly what I set out to look for, but I thought I’d give it a once-over anyways.

Markus’ paper is not what I would call a rigorous scientific publication.  There is no empirical data, no hypothesis, none of that good ol’ scientific method stuff.  Instead, it’s more akin to a “do” and “do not” set of advice and examples that one would find in a software engineering textbook.

A FORTRAN software engineering textbook, to be more precise.  Markus’ examples are all in FORTRAN.

Broken down simply, Markus has four principles, or bits of advice:

“The importance of being explicit”

Essentially, this means to be clear with what you’re doing in the code.  It’s common sense stuff:  don’t be overly clever, be readable, don’t use magic numbers or strings, document your code, group related routines into the same modules, use information hiding in your modules when appropriate, clear and precise error messages, etc.

“Don’t go your own way”

Markus advises developers to stick to an agreed coding standard / style guide.  Don’t reinvent the wheel – instead, use typical solutions to problems that arise.  “Don’t go against the grain” (Page 7).

“Be careful out there”

Markus advises developers to watch out for documented language quirks, common language pitfalls, etc.  This is followed by numerous examples in FORTRAN.

“Curiouser and curiouser”

Markus asks to keep an eye out for “a lack of attention to design, to readability and other aspects that are important for the program in the long run” (Page 11).    He also repeats a few things from “the importance of being explicit” – mainly, to make sure that the code is organized in a way that makes sense to the developers.

I don’t know.  I don’t think I’m the target audience here.  In hindsight, I found the information in this paper to be very general, and rather self-evident.  The only thing I seemed to learn was a bit about FORTRAN quirks.

I think I need to be less laissez faire in my paper selections.  This one didn’t help me find what I was looking for, and I should have seen that from the abstract.  Bah.

EDIT:  Why did I waste my time searching ACM when more interesting information was waiting right under my nose?  I think I have an idea what to review next.

Be the first to like.

Blog Posts from Google Docs

So it turns out that you can publish Google Docs documents to blogs that groove with the MoveableType, Blogger, and MetaWeblog publishing protocols.

Like WordPress, for example.

This document that you’re reading, was published to my blog via Google Docs.

Cool.

Be the first to like.

Hello Graduate School

Well, it’s official.  Today, I handed in my acceptance form for Graduate Studies here at the University of Toronto in the Computer Science Department!

Now I just need to keep my cGPA above 3.2…

Assuming that I get my B.Sc. without incident (because who knows, maybe the University will fight me for it…citing missing courses, insufficient credits, etc.  I’ve checked all of this with New College and the Drama/CS departments, but I’ve been here too long not to be ready for bureaucratic tom-foolery…), I think I’ve got an interesting year or so ahead of me.

This summer is already looking quite busy, but here’s what I’m looking forward to next year:

Interesting Courses

I’ve been leafing through the Graduate course calendar, looking for courses that sound good and fulfill my breadth requirement.  Here are the courses I’ve underlined as “interesting”.  Note that I haven’t checked the timetable at all to see if these conflict with one another.  They just sound interesting:

  • 2125H – Software Development Tools and Practices:
    This course is an introduction to software consulting practices. Students will be paired with clients whose problems require advanced knowledge of computer science to solve, and will then work under the direction of the course instructor to develop and deliver useful results. Topics will include requirements elicitation, scope negotiations, deployment concerns, and disaster recovery.
  • 2412H – Computer Algebra
    Algebraic theory that underlies symbolic and algebraic manipulation by computer. Chinese Remainder and interpolation theory, fast algorithms for computations with integers, polynomials and power series. Newton and Hensel iteration, polynomial and integer gcd algorithms, factorization of polynomials, the fast Fourier transform, solving systems of polynomial equations, Gröbner bases. The Maple computer algebra system.
  • 2426H – Fundamentals of Cryptography
    Rigorous definitions of security for pseudo-random generators, digital signature schemes, secure hash families, and public-key encryption.. Methods (including number-theoretic conjectures) for constructing these secure cryptographic primitives. Methods for using secure primitives to achieve secure session-key exchange and secure sessions.
  • 2511H – Natural Language Computing
    Introduction to techniques involving natural language and speech in applications such as information retrieval, extraction, and filtering; intelligent Web searching; spelling and grammar checking; speech recognition and synthesis; and multi-lingual systems including machine translation. N-grams, POS-tagging, semantic distance metrics, indexing, on-line lexicons and thesauri, markup languages, collections of on-line documents, corpus analysis. Python software.
  • 2529H – Computer Animation
    The primary focus of this course is on kinematic and dynamic techniques for character animation. Topics include physical modeling and simulation, motion planning, control and learning algorithms, locomotion, motion trajectory optimization, scripting languages, motion capture, and motion editing. Students will implement algorithms and interactive animation tools and then use these to produce motion for animations.
  • KMDI1001 – Fundamental Concepts in Knowledge Media Design
    Knowledge media are systems incorporating computer and communications technology that enhance human thinking, creativity, communication, collaboration, and learning. Examples include the Web, email, instant messaging, knowledge management systems, digital libraries, collaborative virtual environments, video conferencing environments, and webcasting systems.
    This course reviews the emerging field of knowledge media design, and the use of digital media for communications, collaboration, and learning.

I’m also looking into the possibility of hopping (back) over to the Computer Engineering Department to see if I can take ECE568H1 – Computer Security.  My general dislike for engineering courses notwithstanding, this still sounds like an interesting possibility.

(Note to self:  the word “notwithstanding” just felt right to put there, but is that correct usage?  I have no idea…)

Thesis

Well, it’s no surprise – a Master’s student is expected to produce a paper in order to graduate.  I have absolutely no idea what I’ll be doing my thesis on, but the number of possibilities is exciting.

It’d be nice to somehow merge Drama and Computer Science into a thesis – and I think it’d be an appropriate finale for my career here at UofT.  It’s something to mull over while I have time, anyhow.

Launching OLM

OLM is going up in the fall.  Whether or not I work on it this summer, as a TA, I’ll probably be using the software to mark and return student code.  “Eating one’s own dog-food” might be appropriate here – though I prefer, “eating the sandwich I just helped to make”.

Drama

A lot of my friends from the Drama department are either graduating in June, or staying on for one more year.  A bunch who are graduating are staying in the city, and the prospect of doing some work with them outside of school is exciting.

We’re all very spoiled here at the UCDP – modest budget, multiple rehearsal spaces, etc… working on our own stuff outside of school might be a very humbling experience.  Humbling as in, rehearsing in alley ways or rooftops, and using an audience holding flashlights instead of our own lighting grid.  Cool.

Operation: Party Mansion

This one is still in the works.  Some buddies of mine from highschool (who are also my roommates) are looking to buy some property in, or around downtown Toronto.

This may sound ambitious, foolhardy, and naive, but we’re serious, and a lot of legwork has already been done in order to get this moving.

Ideal scenario?  Next year, I’ll be living in a big house with my highschool buddies.  And isn’t that living the dream?

Anyhow, as I was saying, my Grad school papers are in, so my brain is going to put that on the backburner for a while.  Now I have to focus on my CSC301 midterm for this Wednesday, and an evidentiary analysis on CIA/JFK Assassination links for INI304.

Be the first to like.