Code Spelunking for Students

Last Friday, I went to the FSOSS conference at Seneca@York Campus with Zuzel and Greg.

One of the talks I attended was an open discussion about getting students involved in open source software.

I’m going off of memory there, but I believe one of the speakers at that talk said something like:

Students generally don’t have to deal with large code-bases in their school assignments…1000 lines of code is really nothing.  When students work on an open source project, they get dropped into a massive code-base with only a fork, a spoon, and a compass.  They have to find their way around, and that’s where the real challenge and learning is.  This is a skill that most students just don’t get with normal school assignments.

Again, I’m paraphrasing.

So is this true?  Hm.

During my undergraduate career, I’ve certainly had to explore strange code that someone else has written.  But nothing even close to the size of, say, the Mozilla Firefox code-baseOr the Chromium code-base.  I mean, these are massive wads of code.  This is not a criticism of my teachers or the UofT CS program by any means – it’s just an observation.

But some students explore these large code-bases on their own in their free time.  During my (admittedly brief) break before summer work began, I started poking around the Firefox code.  I made two discoveries:

  1. The code that I saw was, in my opinion, very well written
  2. I was completely lost, and didn’t know where to start

I still haven’t worked on any software that is nearly as large as Firefox.  Not even close.  MarkUs is a nice chunk of code, but minuscule in comparison.

So just go with me on this for a second.  Let’s assume that a large code base is intimidating and difficult for students to wrap their heads around, and this is one of the main challenges in getting those students to contribute to open source software.

Again, I only have my own experience to back up that claim.  Looking at Firefox, I didn’t know where to start.  I didn’t know where to go.  I didn’t know which way was up.  I was lost.

So how can students get a better grasp on a mountain of code?  A few ideas:

  1. Write tests for the code, starting small and going big.  This is a relatively easy way to play with the code without having to change it. This assumes, of course, that the software has been designed to be easy to test…
  2. Ask someone else.  Go into the appropriate IRC channel and ask around.  This, of course, has it’s own problems.
  3. Read up on the developer documentation.  Let’s just hope it’s up to date and relevant…
  4. Read up on someone else’s experiences exploring the same code base.  Good luck finding those.

Not Quite Blueprints

I’ve always thought of computer software as being like an invisible machine inside my computer.

And to me, the source code is a bit like the description of the blueprints for that invisible machine.  It’s not the top-down crystal-clear cutaways that a blueprint provides…it’s a flat, textual interpretation of those blueprints.  And it takes quite a bit of reading before those descriptions sink in, and the “personality” of the machine becomes clearer.

In his article “Code Spelunking Redux“, George V. Neville-Neil says:

Working in this way is a bit like trying to understand the United States by staring at a street sign in New York City. The ability to look at a high-level representation of the underlying system without the fine details would be perhaps the best tool for the code spelunker. Being able to think of software as a map that can be navigated in different ways—for example, by class relations and call graphs—would make code spelunkers far more productive.

I was thinking a lot about that on my ride home from FSOSS.  When I got home to my computer, I found out that there are some really cool alternative ways of viewing software.  Here are three that I found quite interesting:

Codecity

Imagine that you’re curious about developing on Firefox.  You can wade through the swaths of source code…

or you can stroll through a city that represents the software:

CodeCity

CodeCity

CodeCity is an integrated environment for software analysis, in which software systems are visualized as interactive, navigable 3D cities. The classes are represented as buildings in the city, while the packages are depicted as the districts in which the buildings reside. The visible properties of the city artifacts depict a set of chosen software metrics, as in the polymetric views of CodeCrawler.

Imagine virtually driving around that city, hearing a guided tour through your headphones…you can walk into buildings, check out the different floors…check the plumbing.  Interesting idea.

Cocoviz

Or how about a neighbourhood…

CocoViz

CocoViz

CocoViz address software comprehension by a combination of visualization and audio. It uses common place metaphors (like houses) for an intuitive understanding of software structures and evolution.

For each source code entity, evolution and structural aspects are mapped to such metaphors and annotated with different audio, to represent concepts such as design erosion, code smells or evolution metrics.

The tool is used in the software evolution analysis domain but offers DB-, XML-importer and a plugin architecture to extend its use into other domains.

EvoSpaces

Another attempt at using the architecture/neighbourhood metaphor.  This one does a neat job of displaying execution traces though – check out the video demo.

These are cool ideas.

But are they useful? Are they usable?  Do they work? Could they help students get a firm grasp on a large code-base?  Can they help visualize the evolution of software?

Has anyone actually used any of these?

One thought on “Code Spelunking for Students

  1. George

    In the operating systems class at my undergraduate alma mater, students were forced to work with the linux kernel source. I didn’t take the class, but from what I understand, that could be a nightmare. You can see an example of the sort of assignments and labs students had to do here: http://www.cs.swarthmore.edu/~newhall/cs45/f08/
    A fair number of them involve modifying the linux kernel.

    There are some real gems in the kernel code, especially the comments. I love the long explanations of why if you think the code is wrong YOU are wrong.

Comments are closed.