Tag Archives: code spelunking

Alice in DocShell Land

I’ve been reading a book called The Annotated Alice. In this book, the late and great Martin Gardner shows us the stories of Alice’s Adventures in Wonderland and Through the Looking-Glass but supplies copious footnotes to illustrate the puns, wordplay, allusions, logic problems and satire going on beneath the text. Some of these footnotes delve into pure conjecture (there are still people to this day who theorize about various aspects of the stories), and other footnotes show quite clearly that Carroll wrote these stories with a sophisticated wit and whimsy that isn’t obvious at first glance.

And it’s clear that Gardner (and others like him) have spent hours upon hours thinking and theorizing about these stories. A purposeful misspelling gets awarded a two-page footnote here, and a mention of a mirror sends us off talking about matter and anti-matter and other matters (ha) of quantum physics.

So much thinking and effort to interpret these stories, and what you get out of it is a fascinating tapestry of ideas and history.

Needless to say, I’ve been finding the whole thing fascinating. It’s a hell of a read.

While reading it, I’ve wondered what it’d be like to apply the same practice to source code. Take some relatively mysterious piece of source code that only a few people feel comfortable with, and explode it out. Go through the source control history, and all of the old bugs, and see where this code came from. What was its purpose to begin with? What is its purpose now? What are the battle scars?

After much thinking, I’ve decided to try this, and I’m going to try it on a piece of Gecko called “DocShell”.

I think I just heard Ms2ger laughing somewhere.

It’s become pretty clear, having talked to a few seasoned Mozilla hackers, that DocShell is not well understood. The wiki page on it makes that even more clear – it starts:

The goal of this page is to serve as a dumping/organization ground for docshell docs. When someone finds out something, it should be added here in a reasonable way. By the time this gets unwieldy, hopefully we will have enough material for several actual docs on what docshell does and why.

So, I’m going to attempt to figure out what DocShell was supposed to do, and figure out what it currently does. I’m going to dig through source code, old bugs, and old CVS commits, back to the point where Netscape first open-sourced the Mozilla code-base.

It’s not going to be easy. It’s definitely going to be a multi-month, multi-post effort. I’m likely to get things wrong, or only partially correct. I’ll need help in those cases, so please comment.

And I might not succeed in figuring out what DocShell was supposed to do. But I’m pretty confident I can get a grasp on what it currently does.

So in the end, if I’m lucky, we’ll end up with a few things:

  1. A greater shared understanding of DocShell
  2. Materials that can be used to flesh out the DocShell wiki
  3. Better inline documentation for DocShell maybe?

I’ve also asked bz to forward me feedback requests for DocShell patches, so that way I get another angle of attack on understanding the code.

So, deep breath. Here goes. Watch this space.

Code Spelunking: Review Board Extensions

So this summer, I’m working on Review Board for the Google Summer of Code.

Until my GSoC acceptance, my romps into the code had been relatively shallow.  But with my proposal being given the green light, I’ve started doing more extensive explorations.

Review Board is built using the Django web framework.  I haven’t worked with Django before, but I have quite a bit of experience with Rails, so that should be an asset.  Using a web framework means having (relatively) predictable source code layout, and Review Board is no exception.


At one point or another, the Review Board developers realized that a lot of their code wasn’t Review Board specific, and could be abstracted out into an external library.

That library is called Djblets.

Among other things, Djblets adds a DataGrid component for easy record sorting and pagination.  There are improvements to Django’s Authentication system, and functions for easily displaying a user’s Gravatar.

And, lo and behold, there is a branch of Djblets that provides classes and functions for giving a Django application an extension framework.  The classes are abstract enough so that, in your Django application, you can specify different types and behaviours for your Hooks.

Djblets -> Review Board

The Review Board extension branch takes these Djblets extension classes, and extends them into DashboardHooks, NavigationBarHooks, ReviewRequestDetailHooks…lots of different hooks.

So, Djblets creates the foundation abstractions.  Review Board makes these abstractions a little more specific.  And then an extension writer needs to instantiate and use these classes to design their extensions.  It sounds complicated, I know.
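To make those three layers a little more concrete, here is a minimal sketch in plain Python.  Be warned: this is not the actual Djblets or Review Board code.  The class names, constructor arguments, and the registry are all hypothetical, and just illustrate the layering of foundation, application-specific hooks, and an extension that instantiates them.

```python
from collections import defaultdict

# Hypothetical registry: maps each hook class to the hook instances created so far.
_registry = defaultdict(list)


class ExtensionHook:
    """Foundation layer (the role Djblets plays): a generic, app-agnostic hook."""

    def __init__(self, extension):
        self.extension = extension
        _registry[type(self)].append(self)

    @classmethod
    def registered(cls):
        """Return the hooks of exactly this type that extensions have created."""
        return list(_registry[cls])


class DashboardHook(ExtensionHook):
    """Application layer (the role Review Board plays): a hook with a specific job."""

    def __init__(self, extension, entries):
        super().__init__(extension)
        self.entries = entries  # assumed: labels to add to a dashboard


class NavigationBarHook(ExtensionHook):
    """Another application-specific hook, this time for navigation links."""

    def __init__(self, extension, links):
        super().__init__(extension)
        self.links = links  # assumed: (label, url) pairs


class ReportsExtension:
    """Extension layer: the extension writer instantiates the hooks they need."""

    def __init__(self):
        self.hooks = [
            DashboardHook(self, entries=["Reports"]),
            NavigationBarHook(self, links=[("Reports", "/reports/")]),
        ]


extension = ReportsExtension()

# The host application can now ask each hook type what has been plugged in.
for hook in DashboardHook.registered():
    print(hook.entries)
```

The point of the sketch is the direction of dependency: the foundation class knows nothing about dashboards, the application-specific hooks know nothing about any particular extension, and the extension only has to create hook objects.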

So Let’s Map It Out

When I start learning a new code base, I do a lot of drawing.  To me, getting to know a code base is like getting to know a city, and that means walking around it, and mapping it out.

So I’ve taken the liberty of mapping out the extension classes that I’ve found, and how they relate to one another.  Note that at the bottom of my map, a simple extension (RB Reports) is using some of those classes to hook itself into Review Board.  You can find this, and other extensions, here.

My map of the extension framework

Click here to check out my map of the current state of the extension framework

Now, before someone in the department starts complaining about my misuse of UML:  I’m not a UML guy.  I just wanted an easy piece of diagramming software, and the one that I found (Dia), did UML.  I just wanted something to draw boxes and lines. So please don’t freak out if you think I’m using the wrong symbols.

One symbol you might be wondering about is the blue quantum-flux-capacitor-implosion.

I’ll save that for a future post.

Code Spelunking for Students

Last Friday, I went to the FSOSS conference at Seneca@York Campus with Zuzel and Greg.

One of the talks I attended was an open discussion about getting students involved in open source software.

I’m going off of memory here, but I believe one of the speakers at that talk said something like:

Students generally don’t have to deal with large code-bases in their school assignments…1000 lines of code is really nothing.  When students work on an open source project, they get dropped into a massive code-base with only a fork, a spoon, and a compass.  They have to find their way around, and that’s where the real challenge and learning is.  This is a skill that most students just don’t get with normal school assignments.

Again, I’m paraphrasing.

So is this true?  Hm.

During my undergraduate career, I’ve certainly had to explore strange code that someone else has written.  But nothing even close to the size of, say, the Mozilla Firefox code-base.  Or the Chromium code-base.  I mean, these are massive wads of code.  This is not a criticism of my teachers or the UofT CS program by any means – it’s just an observation.

But some students explore these large code-bases on their own in their free time.  During my (admittedly brief) break before summer work began, I started poking around the Firefox code.  I made two discoveries:

  1. The code that I saw was, in my opinion, very well written
  2. I was completely lost, and didn’t know where to start

I still haven’t worked on any software that is nearly as large as Firefox.  Not even close.  MarkUs is a nice chunk of code, but minuscule in comparison.

So just go with me on this for a second.  Let’s assume that a large code base is intimidating and difficult for students to wrap their heads around, and this is one of the main challenges in getting those students to contribute to open source software.

Again, I only have my own experience to back up that claim.  Looking at Firefox, I didn’t know where to start.  I didn’t know where to go.  I didn’t know which way was up.  I was lost.

So how can students get a better grasp on a mountain of code?  A few ideas:

  1. Write tests for the code, starting small and going big.  This is a relatively easy way to play with the code without having to change it. This assumes, of course, that the software has been designed to be easy to test…
  2. Ask someone else.  Go into the appropriate IRC channel and ask around.  This, of course, has its own problems.
  3. Read up on the developer documentation.  Let’s just hope it’s up to date and relevant…
  4. Read up on someone else’s experiences exploring the same code base.  Good luck finding those.
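
As a rough illustration of the first idea, here is what a couple of small "pin down what it does today" tests might look like, using Python’s built-in unittest module.  The slugify function is a made-up stand-in for some function you’ve stumbled across in an unfamiliar code base; it isn’t from Firefox or any real project.

```python
import unittest


def slugify(title):
    """Hypothetical stand-in for a function found in an unfamiliar code base."""
    return "-".join(title.lower().split())


class SlugifyCharacterizationTest(unittest.TestCase):
    """Start small: record what the code does today, without changing it."""

    def test_single_word_is_lowercased(self):
        self.assertEqual(slugify("DocShell"), "docshell")

    def test_extra_whitespace_is_collapsed(self):
        self.assertEqual(slugify("  Code   Spelunking "), "code-spelunking")
```

Saved in a file, these can be run with python -m unittest.  Each test that passes is one small fact about the code that you no longer have to hold in your head.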

Not Quite Blueprints

I’ve always thought of computer software as being like an invisible machine inside my computer.

And to me, the source code is a bit like the description of the blueprints for that invisible machine.  It’s not the top-down crystal-clear cutaways that a blueprint provides…it’s a flat, textual interpretation of those blueprints.  And it takes quite a bit of reading before those descriptions sink in, and the “personality” of the machine becomes clearer.

In his article “Code Spelunking Redux“, George V. Neville-Neil says:

Working in this way is a bit like trying to understand the United States by staring at a street sign in New York City. The ability to look at a high-level representation of the underlying system without the fine details would be perhaps the best tool for the code spelunker. Being able to think of software as a map that can be navigated in different ways—for example, by class relations and call graphs—would make code spelunkers far more productive.

I was thinking a lot about that on my ride home from FSOSS.  When I got home to my computer, I found out that there are some really cool alternative ways of viewing software.  Here are three that I found quite interesting:


Imagine that you’re curious about developing on Firefox.  You can wade through the swaths of source code…

or you can stroll through a city that represents the software:



CodeCity is an integrated environment for software analysis, in which software systems are visualized as interactive, navigable 3D cities. The classes are represented as buildings in the city, while the packages are depicted as the districts in which the buildings reside. The visible properties of the city artifacts depict a set of chosen software metrics, as in the polymetric views of CodeCrawler.

Imagine virtually driving around that city, hearing a guided tour through your headphones…you can walk into buildings, check out the different floors…check the plumbing.  Interesting idea.
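
To get a feel for the underlying idea (classes as buildings, packages as districts, metrics as visible properties), here is a toy sketch in Python.  It has nothing to do with how CodeCity is actually implemented.  The choice of metric (number of methods as building height), the sample source, and the "district" label are all my own assumptions for illustration.

```python
import ast

# A tiny made-up module to visualize; any Python source string would do.
SAMPLE_SOURCE = '''
class Parser:
    def parse(self): pass
    def tokens(self): pass
    def reset(self): pass

class Token:
    def text(self): pass
'''


def buildings(source, district="sample"):
    """Return one 'building' per top-level class, with its method count as height."""
    tree = ast.parse(source)
    city = []
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            methods = [n for n in node.body if isinstance(n, ast.FunctionDef)]
            city.append({
                "district": district,
                "building": node.name,
                "height": len(methods),
            })
    return city


# Draw a very flat "skyline": one row per class, one '#' per method.
for b in buildings(SAMPLE_SOURCE):
    print(f'{b["district"]}/{b["building"]}: ' + "#" * b["height"])
```

Even this crude text skyline shows which classes tower over the others, which is roughly the kind of at-a-glance overview the city metaphor is after.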


Or how about a neighbourhood…



CocoViz address software comprehension by a combination of visualization and audio. It uses common place metaphors (like houses) for an intuitive understanding of software structures and evolution.

For each source code entity, evolution and structural aspects are mapped to such metaphors and annotated with different audio, to represent concepts such as design erosion, code smells or evolution metrics.

The tool is used in the software evolution analysis domain but offers DB-, XML-importer and a plugin architecture to extend its use into other domains.


Another attempt at using the architecture/neighbourhood metaphor.  This one does a neat job of displaying execution traces though – check out the video demo.

These are cool ideas.

But are they useful? Are they usable?  Do they work? Could they help students get a firm grasp on a large code-base?  Can they help visualize the evolution of software?

Has anyone actually used any of these?