Author Archives: Mike Conley

Code Spelunking: Review Board Extensions

So this summer, I’m working on Review Board for the Google Summer of Code.

Until my GSoC acceptance, my romps into the code had been relatively shallow.  But with my proposal being given the green light, I’ve started doing more extensive explorations.

Review Board is built using the Django web framework.  I haven’t worked with Django before, but I have quite a bit of experience with Rails, so that should be an asset.  Using a web framework means having (relatively) predictable source code layout, and Review Board is no exception.

Djblets

At one point or another, the Review Board developers realized that a lot of their code wasn’t Review Board specific, and could be abstracted out into an external library.

That library is called Djblets.

Among other things, Djblets adds a DataGrid component for easy record sorting and pagination.  There are improvements to Django’s Authentication system.  Functions for easily displaying a user’s Gravatar.

And, low and behold, there is a branch of Djblets that provides classes and functions for giving a Django application an extension framework.  The classes are abstract enough so that, in your Django application, you can specify different types and behaviours for your Hooks.

Djblets -> Review Board

The Review Board extension branch takes these Djblets extension classes, and extends them into DashboardHooks, NavigationBarHooks, ReviewRequestDetailHooks…lots of different hooks.

So, Djblets creates the foundation abstractions.  Review Board makes these abstractions a little more specific.  And then an extension writer needs to instantiate and use these classes to design their extensions.  It sounds complicated, I know.

So Let’s Map It Out

When I start learning a new code base, I do a lot of drawing.  To me, getting to now a code base is like getting to know a city, and that means walking around it, and mapping it out.

So I’ve taken the liberty of mapping out the extension classes that I’ve found, and how they relate to one another.  Note that at the bottom of my map, a simple extension (RB Reports) is using some of those classes to hook itself into Review Board.  You can find this, and other extensions,here.

My map of the extension framework

Click here to check out my map of the current state of the extension framework

Now, before someone in the department starts complaining about my misuse of UML:  I’m not a UML guy.  I just wanted an easy piece of diagramming software, and the one that I found (Dia), did UML.  I just wanted something to draw boxes and lines. So please don’t freak out if you think I’m using the wrong symbols.

One symbol you might be wondering about is the blue quantum-flux-capacitor-implosion.

I’ll save that for a future post.

Some Notes About My Experiment Design

This is mostly a reminder note to myself, but I thought I’d post it publicly.

So, remember the experiment I’m conducting?  I’ve been testing out various components of it while I wait for ethics board approval, and some interesting questions have come up.  Some of these questions have already been raised in other posts (and in comment threads – thanks for the discussion, all!), but I want to summarize them here.

Access to the Internet

When participants are working on the assignments, they will be given full access to the Internet.

I had a bunch of conversations with undergrad instructors here at UofT, and across the board, during programming exercises, students have full access to the Internet.  They’re prohibited from just copying and pasting code from somewhere else into their assignments, but they can certainly look at online documentation and examples to get some ideas.  So I’m going to allow this as well, in order to better model an actual undergraduate assignment.

I’m also considering writing a script that will take a screen capture every 30 seconds or so in the background.  This way,  I can quickly get a sense of the participants activity during the assignments.  This should hopefully give me some idea of how much time they’re spending in the browser, and how much time they’re spending writing Python.

Participant Programming Ability

The only prerequisite for my participants is that they must have 4+ months of Python programming experience.  I’m not filtering out the stronger or weaker programmers.  I’m taking them all.

Why?  Because I think this will give me a more accurate model of the composition of a first or second year programming course.  From my experience, students in first and second year programming courses have a wide spectrum of strengths and abilities.

But what if they don’t complete any of my assignments?  What if they stare blankly at the screen, or give up, or just surf Reddit because they can’t understand what I’m asking from them?  That’s OK – completion of the assignments is not necessary, and again – this relaxation helps to better model a real programming course (because there are invariably students who don’t start, or don’t finish a programming assignment).

And it’s OK if they don’t do well.  I’m taking a reading of their programming abilities with the first assignment, getting them to do peer grading, and then taking a reading with a second assignment.  I don’t care what absolute grades they get, I care about what happens after they’ve done the grading.

I might also see some interesting trends – for example, participants who perform poorly on the first assignment might benefit greatly from the peer grading, whereas participants who perform strongly on the first assignment might see little benefit.  Or vice-versa.  Or neither.  Granted, I probably won’t be able to collect enough participants to make such statements with statistical strength, but if a signal appears to be there, it’d be an interesting direction for future work.

30 Minute Time Limit

Students will have 30 minutes maximum to complete each assignment.  And that’s absolute – it includes reading the assignment, surfing the Internet, and coding it up.

This is because a hard time limit like this, again, better models the state of classrooms as they are.  In an open-book exam, you’re not timed on how long you’re spent only writing, with the clock paused as you glance at your textbook.  You have a firm time limit, and that’s it.

In an ideal world, students would be given a more personal level of teaching, as opposed to this mechanical, factory-floor approach.  But if I (or anybody) is ever going to implement these ideas into a real classroom setting, I’ll want to make sure it can be easily adapted into the teaching environment as it already is.

My GSoC Project: Review Board Extensions

If you didn’t already know, Review Board is an open-source web-based code review tool.  The MarkUs Team has been using Review Board for pre-commit code review for about a year now.  This has given the team a number of advantages:

  1. For a team that usually has a 4 month turnover, this allows us to quickly get new team members up to speed with how to contribute to MarkUs.  We review every change that they propose, and give them tips/guidance on how to make it fit in well with the application.  They learn, and the applications code stays healthy.
  2. We catch defects before they enter the code base.  Simple as that.
  3. We get a good sense of what other people are working on, and what is going on in the code.  Review Board has become a central conversation and learning hub for the developers on the MarkUs team.

So, the long and the short of it:  I like Review Board.  Review Board helps us write better code.  I want to make Review Board better.

So what am I proposing?

How to Avoid A Bloated Software Monster

You can never make some people happy.

No matter how decent your software is, someone will eventually come up to you and say:

Wow!  Your software would be perfect if only it had feature XYZ!  Sadly, because you don’t have feature XYZ, I can’t use it.  Please implement, k thx!

And so you either have to politely say “no”, and lose that user, or say “yes”, and add feature XYZ to the application.  And for users out there who don’t need, or don’t care about feature XYZ, that new feature just becomes a distraction and adds no value.  Make this happen a bunch of times, and you’ve got yourself a bloated mutha for a piece of software.

And we don’t want a bloated piece of software.  But we do want to make our users happy, and provide feature XYZ for them if they want it.

So what’s the solution?  We provide an extension framework (which is also sometimes called a plug-in architecture).

An extension framework allows developers to easily expand a piece of software to do new things.  So, if a user wants feature XYZ, we (or someone else) just creates and make available an extension that implements the feature.  The user installs the extension, activates it, and bam – our user is happy as a clam with their new feature.

And if we make it super-easy to develop them, third-party developers can write new, wonderful, interesting extensions to do things that…well, we wouldn’t have considered in the first place. It’s a new place for innovation.  What’s that old cliché?

If you build it [the plug-in framework], they will come [the third-party developers who write awesome things]

And the developers do come.  Just look at Firefox add-ons or WordPress plugins.  Entire ecosystems of extensions, doing things that the original developers would probably have never dreamed of doing on their own.  Hell, I’ve even written a Firefox add-on. And users love customizing their Firefox / WordPress with those extensions.  It adds value.

So we get wins all over the place:

  • Our user gets their feature
  • The software gets more attractive because it’s flexible and customizable
  • The original software developers get to focus on the core piece of software, and let the third-party developers focus on the fringe features

And this is where I think I can help Review Board.

(Before I go on, if you’re interested, here’s another article on the how and the why of plug-in architectures)

Review Board Extensions

So if you look at the Review Board Wiki, or glance at the mailing lists you see numerous requests from users for new features, for example:

It would be nice if the review board had a “next comment” button that is always available to click, or had a collapse/expand button. This would make it easier to see other people’s comments in cases like this.

It will be nice to have post-commit support. Instead of every post-commit review being a separate URL, if we could setup default rules for post-commit reviews to update an existing review providing the diff-between-diff features, it would be very useful.

The Review Board developers could smell the threat of bloated feature-creep from a mile away.  So, in a separate branch, they began working on integrating an extension framework into Review Board.

The extension branch, however, has been gathering dust, while the developers focus on more critical patches and releases.

My GSoC proposal is to finish off a draft of the extension framework, document it, and build a very simple extension for it.  My simple extension will allow me to record basic statistics about Review Board reviewers – for example, how long they spend on a particular review, their inspection rate, etc.

Having been a project lead MarkUs for so long, it’s going to be a good experience to be back on “the bottom” – to be the new developer who doesn’t entirely have a sense of the application code yet.  It’s going to be good to go code spelunking again.  I’ve done some preliminary explorations, and it’s reminding me of my first experiences with MarkUs.  Like a submarine using its sonar, I’m slowly getting a sense of the code terrain.

I’ll let you know what my first few sweeps find.

Ping!

I’ve done it again:  I’ve let dust gather on my blog.

Quick update:

  • I’ve finished my courses for this semester, and have gone into full-blown research mode.
  • My research proposal is going through ethics review, in order to make sure that I’m not going to blow things up (or hurt anybody if I do)
  • While my paperwork is reviewed, I’m refining my procedure and apparatus.  Better and better.
  • I’ve been accepted into Google Summer of Code this year – I’ll be working on Review Board.  Details about my project will be the subject of an upcoming post, which I will toss up shortly.
  • I may or may not be co-directing a radio play.  I’ll let you know.
  • The MarkUs team is about to release version 0.7, and a fresh batch of Summer students will soon be here at UofT to work on it!
  • I have not forgotten about the UCDP trip to Poland.  I still have to tell you what we saw and did at Auschwitz.  Cripes – it’s almost a year since I returned, and I’m only half-way through the whole story.  And there’s a ton more to tell.  Coming soon.

Stay tuned.

Does Peer Grading Make Students Better Programmers?

The Question

My past few blog posts have been concerned with the usefulness of peer grading.  Steve Joordens showed that peer grading was pedagogically useful for first-year psych students…but what about computer science students?  Would they learn from it?  Would they become better programmers?

We don’t know.

Maybe it’s time to find out.

The Experiment

It’s pretty simple, actually.

I have two groups of students.  Let’s call them groups A and B.

For each student in A, have them complete a simple programming assignment (call it P1).  Once they’re finished, have them complete a second simple programming assignment (call it P2).

For each student in B, have them complete P1.  Once that’s done, have them view 5 or 6 different mocked up submissions, also for P1.  For each submission, have the students fill out a rubric and assign a grade. Once finished, the students then complete P2.

Then, I get some fellow graduate students to mark my mocked up submissions, the group A P1/P2 submissions, and the group B P1/P2 submissions.

If grading made the students better programmers, we should see an increase in the number of marks given to the students in group B for P2.

Bonuses, and Other Concerns

This experiment is nice and simple. And, besides showing if peer grading makes students better programmers, it gives us a couple of bonuses:

  • It tells us if graduate students tend to agree on what marks to give to submissions.  If they don’t agree, and the marks wildly differ…we might have a problem
  • It tells us if some number of students can, on average, approximate the grade a TA would give on a submission
  • It can tell us the average inspection rate for both students and TAs

I’ll have to do randomization here and there to eliminate ordering effects – for example, randomizing the criteria on the rubric, randomizing which assignments go first and second, randomizing the order in which the mock-up submissions are shown, etc.

One thing to consider though:  what effect does simply seeing the rubric have on students?

I’ve been in courses where I’ve not been allowed to see the marking rubric for some assignment.  It’s frustrating.  Seeing the rubric helps me focus on the areas that I’ll be marked on.

So what if just seeing the rubric makes the students “better programmers”?  One way to counteract this would be to have the rubric for the second assignment be quite a bit different than the one for the first assignment.

Statistics

Oh yeah.  Stats.  Not my strongest subject.  I’m going to have to brush up on this (and probably enlist some help within the department) if I’m going to do this properly.  I’m probably not going to get as many participants as I think I will…so I have to accommodate small N.  Hrm.

Anyhow, this is where my summer experiment seems to heading.  What do you think?  I’m all ears.