I have successfully run my experiment on 30 participants.
Both of my graders have finished marking.
I’ve begun data analysis. Details soon.
This is mostly a reminder note to myself, but I thought I’d post it publicly.
So, remember the experiment I’m conducting? I’ve been testing out various components of it while I wait for ethics board approval, and some interesting questions have come up. Some of these questions have already been raised in other posts (and in comment threads – thanks for the discussion, all!), but I want to summarize them here.
When participants are working on the assignments, they will be given full access to the Internet.
I had a bunch of conversations with undergrad instructors here at UofT, and across the board, during programming exercises, students have full access to the Internet. They’re prohibited from just copying and pasting code from somewhere else into their assignments, but they can certainly look at online documentation and examples to get some ideas. So I’m going to allow this as well, in order to better model an actual undergraduate assignment.
I’m also considering writing a script that will take a screen capture every 30 seconds or so in the background. This way, I can quickly get a sense of the participants’ activity during the assignments. This should hopefully give me some idea of how much time they’re spending in the browser, and how much time they’re spending writing Python.
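Here is a rough sketch of what that script might look like. Everything in it is an assumption on my part: the function names (capture_loop, pillow_grab), the file naming, and the use of Pillow's ImageGrab for the actual capture, which needs Pillow installed and does not behave the same on every platform. The capture function is passed in as a parameter so the timing loop can be tried out without touching the screen at all.

```python
import time
from datetime import datetime
from pathlib import Path
from typing import Callable, List


def capture_loop(
    grab: Callable[[Path], None],
    out_dir: Path,
    interval_s: float = 30.0,
    duration_s: float = 30 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Path]:
    """Call grab(path) roughly once per interval for the whole session."""
    out_dir.mkdir(parents=True, exist_ok=True)
    n_shots = int(duration_s // interval_s)
    saved = []
    for i in range(n_shots):
        stamp = datetime.now().strftime("%H%M%S")
        path = out_dir / f"shot_{i:03d}_{stamp}.png"
        grab(path)
        saved.append(path)
        # The time spent grabbing is not subtracted, so the interval drifts a little.
        sleep(interval_s)
    return saved


def pillow_grab(path: Path) -> None:
    """Hypothetical capture function; assumes Pillow is installed."""
    from PIL import ImageGrab  # imported here so the loop works without Pillow

    ImageGrab.grab().save(path)
```

Something like capture_loop(pillow_grab, Path("participant_01")) would then cover one 30-minute assignment with about 60 screenshots.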
The only prerequisite for my participants is that they must have 4+ months of Python programming experience. I’m not filtering out the stronger or weaker programmers. I’m taking them all.
Why? Because I think this will give me a more accurate model of the composition of a first or second year programming course. From my experience, students in first and second year programming courses have a wide spectrum of strengths and abilities.
But what if they don’t complete any of my assignments? What if they stare blankly at the screen, or give up, or just surf Reddit because they can’t understand what I’m asking of them? That’s OK – completion of the assignments is not necessary, and again – this relaxation helps to better model a real programming course (because there are invariably students who don’t start, or don’t finish, a programming assignment).
And it’s OK if they don’t do well. I’m taking a reading of their programming abilities with the first assignment, getting them to do peer grading, and then taking a reading with a second assignment. I don’t care what absolute grades they get; I care about what happens after they’ve done the grading.
I might also see some interesting trends – for example, participants who perform poorly on the first assignment might benefit greatly from the peer grading, whereas participants who perform strongly on the first assignment might see little benefit. Or vice-versa. Or neither. Granted, I probably won’t be able to collect enough participants to make such statements with statistical strength, but if a signal appears to be there, it’d be an interesting direction for future work.
Students will have 30 minutes maximum to complete each assignment. And that’s absolute – it includes reading the assignment, surfing the Internet, and coding it up.
This is because a hard time limit like this, again, better models the state of classrooms as they are. In an open-book exam, you’re not timed only on how long you’ve spent writing, with the clock paused as you glance at your textbook. You have a firm time limit, and that’s it.
In an ideal world, students would be given a more personal level of teaching, as opposed to this mechanical, factory-floor approach. But if I (or anybody else) am ever going to implement these ideas in a real classroom setting, I’ll want to make sure they can be easily adapted to the teaching environment as it already is.
I’ve done it again: I’ve let dust gather on my blog.
Quick update:
Stay tuned.
My past few blog posts have been concerned with the usefulness of peer grading. Steve Joordens showed that peer grading was pedagogically useful for first-year psych students…but what about computer science students? Would they learn from it? Would they become better programmers?
We don’t know.
Maybe it’s time to find out.
It’s pretty simple, actually.
I have two groups of students. Let’s call them groups A and B.
For each student in A, have them complete a simple programming assignment (call it P1). Once they’re finished, have them complete a second simple programming assignment (call it P2).
For each student in B, have them complete P1. Once that’s done, have them view 5 or 6 different mocked up submissions, also for P1. For each submission, have the students fill out a rubric and assign a grade. Once finished, the students then complete P2.
Then, I get some fellow graduate students to mark my mocked up submissions, the group A P1/P2 submissions, and the group B P1/P2 submissions.
If grading made the students better programmers, the students in group B should improve more from P1 to P2 than the students in group A do.
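To make that comparison concrete, here is a small sketch of how the marks might be summarized once the graders are done. The function name mean_gain and the marks below are invented purely for illustration; they are not results from the experiment.

```python
from statistics import mean


def mean_gain(scores):
    """Average improvement from P1 to P2 for a list of (p1_mark, p2_mark) pairs."""
    return mean([p2 - p1 for p1, p2 in scores])


# Made-up marks out of 10, one (P1, P2) pair per participant.
group_a = [(6, 7), (5, 5), (8, 8)]  # no peer grading between assignments
group_b = [(6, 8), (4, 7), (7, 8)]  # graded mock submissions between assignments

# A positive difference would be consistent with peer grading helping.
difference = mean_gain(group_b) - mean_gain(group_a)
print(f"Group B gained {difference:.2f} more marks on average")
```

Comparing gains rather than raw P2 marks means a group that simply happened to start out stronger doesn't look like a peer grading effect.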
This experiment is nice and simple. And, besides showing if peer grading makes students better programmers, it gives us a couple of bonuses:
I’ll have to do randomization here and there to eliminate ordering effects – for example, randomizing the criteria on the rubric, randomizing which assignments go first and second, randomizing the order in which the mock-up submissions are shown, etc.
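A minimal sketch of that randomization, assuming each participant gets their own reproducible ordering. The function name make_session_plan, the task labels, and the seeding scheme are all hypothetical choices of mine, not something the experiment design fixes.

```python
import random


def make_session_plan(participant_id, criteria, mockups,
                      tasks=("task_x", "task_y"), seed=0):
    """Shuffle task order, rubric criteria, and mock-up order for one participant."""
    # Seeding with the participant id keeps each plan reproducible for later analysis.
    rng = random.Random(f"{seed}:{participant_id}")
    return {
        "task_order": rng.sample(list(tasks), k=len(tasks)),        # which task is P1, which is P2
        "rubric_order": rng.sample(list(criteria), k=len(criteria)),
        "mockup_order": rng.sample(list(mockups), k=len(mockups)),
    }


plan = make_session_plan(
    "participant_01",
    criteria=["correctness", "style", "documentation"],
    mockups=["mock_1", "mock_2", "mock_3", "mock_4", "mock_5"],
)
print(plan)
```

Storing the plan alongside each participant's submissions would make it possible to check afterwards whether ordering had any effect.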
One thing to consider though: what effect does simply seeing the rubric have on students?
I’ve been in courses where I’ve not been allowed to see the marking rubric for some assignment. It’s frustrating. Seeing the rubric helps me focus on the areas that I’ll be marked on.
So what if just seeing the rubric makes the students “better programmers”? One way to counteract this would be to have the rubric for the second assignment be quite a bit different than the one for the first assignment.
Oh yeah. Stats. Not my strongest subject. I’m going to have to brush up on this (and probably enlist some help within the department) if I’m going to do this properly. I’m probably not going to get as many participants as I think I will…so I have to accommodate small N. Hrm.
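One option I might look at for small N is a permutation test, which doesn't assume the marks are normally distributed. This is only a sketch of the idea, not a decision about the analysis: the function name and the gains below are made up, and it enumerates every possible relabeling, which is only practical for very small groups.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(gains_a, gains_b):
    """One-sided exact permutation test on mean(gains_b) - mean(gains_a)."""
    pooled = list(gains_a) + list(gains_b)
    n_b = len(gains_b)
    n_a = len(pooled) - n_b
    total = sum(pooled)
    observed = mean(gains_b) - mean(gains_a)

    at_least_as_large = 0
    n_splits = 0
    # Try every way of relabeling n_b of the pooled gains as "group B".
    for idx in combinations(range(len(pooled)), n_b):
        b_sum = sum(pooled[i] for i in idx)
        diff = b_sum / n_b - (total - b_sum) / n_a
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_large += 1
        n_splits += 1
    return at_least_as_large / n_splits


# Invented P2 - P1 gains, three participants per group.
p = permutation_p_value(gains_a=[0, 0, 1], gains_b=[2, 3, 4])
print(p)
```

With 15 participants per group there are far too many relabelings to enumerate, so in practice I'd sample a few thousand random relabelings instead of trying them all.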
Anyhow, this is where my summer experiment seems to be heading. What do you think? I’m all ears.