Category Archives: Computer Science

Electrolysis and the Big Tab Spinner of Doom

Have you been using Firefox Nightly and seen this big annoying spinner?

Big Tab Spinner of Doom in an e10s tab

Aw, crap. You again.

I hate that thing. I hate it.

Me, internally, when I see the spinner.

And while we’re working on making the spinner itself less ugly, I’d like to eliminate, or at least reduce its presence to the absolute minimum.

How do I do that? Well, first, know your enemy.

What does it even mean?

That big spinner means that the graphics part of Gecko hasn’t given us a frame yet to paint for this browser tab. That means we have nothing yet to show for the tab you’ve selected.

In the single-process Firefox that we ship today, this graphics operation of preparing a frame is something that Firefox will block on, so the tab will just not switch until the frame is ready. In fact, I’m pretty sure the whole browser will become unresponsive until the frame is ready.

With Electrolysis / multi-process Firefox, things are a bit different. The main browser process tells the content process, “Hey, I want to show the content associated with the tab that the user just selected”, and the content process computes what should be shown, and when the frame is ready, the parent process hears about it and the switch is complete. During that waiting time, the rest of the browser is still responsive – we do not block on it.

So there’s this window of time where the tab switch has been requested, and when the frame is ready.

During that window of time, we keep showing the currently selected tab. If, however, 300ms passes, and we still haven’t gotten a frame to paint, that’s when we show the big spinner.

So that’s what the big spinner means – we waited 300ms, and we still have no frame to draw to the screen.

How bad is it?

I suspect it varies. I see the spinner a lot less on my Windows machine than on my MacBook, so I suspect that performance is somehow worse on OS X than on Windows. But that’s purely subjective. We’ve recently landed some Telemetry probes to try to get a better sense of how often the spinner is showing up, and how laggy our tab switching really is. Hopefully we’ll get some useful data out of that, and as we work to improve tab switch times, we’ll see improvement in our Telemetry numbers as well.

Where is the badness coming from?

This is still unclear. And I don’t think it’s a single thing – many things might be causing this problem. Anything that blocks up the main thread of the content process, like slow JavaScript running on a web-site, can cause the spinner.

I also seem to see the spinner when I have “many” tabs open (~30), and have a build going on in the background (so my machine is under heavy load).

Maybe we’re just doing things inefficiently in the multi-process case. I recently landed profile markers for the Gecko Profiler for async tab switching, to help figure out what’s going on when I experience slow tab switch. Maybe there are optimizations we can make there.

One thing I’ve noticed is that there’s this function in the graphics layer, “ClientTiledLayerBuffer::ValidateTile”, that takes much, much longer in the content process than in the single-process case. I’ve filed a bug on that, and I’ll ask folks from the Graphics Team this week.

How you can help

UPDATE (May 12, 2015): Getting profiles from Windows is currently broken because the symbol server appears to be busted. Any profiles from Windows machines will be useless until this bug is fixed. Similarly, this bug recently changed the format of profiles1, so a different Gecko Profiler add-on will need to be installed until these patches land in the main repository for the add-on. You will also need to set profiler.url to https://people.mozilla.org/~sguo/cleopatra in about:config until those patches land.

If you’d like to help me find more potential causes, Profiles are very useful! NOTE – I don’t mean “user profiles”, as in, your bookmarks / customizations / history, etc, in the profile folder. I don’t mean this thing. I mean a performance profile.

A performance profile is a read-out of everything that Firefox / Gecko is doing over a particular span of time. When the profiler is running, Firefox / Gecko will record where the process is in the stack every 1ms or so. It’ll also record information about how long since it’s serviced the event loop, which helps us find jank.

To help, grab the Gecko Profiler add-on, make sure it’s enabled, and then dump a profile when you see the big spinner of doom. The interesting part will be between two markers, “AsyncTabSwitch:Start” and “AsyncTabSwitch:Finish”. There are also markers for when the parent process displays the spinner – “AsyncTabSwitch:SpinnerShown” and “AsyncTabSwitch:SpinnerHidden”. The interesting stuff, I believe, will be in the “Content” section of the profile between those markers. Here are more comprehensive instructions on using the Gecko Profiler add-on.

And here’s a video of me demonstrating how to use the profiler, and how to attach a profile to the bug where we’re working on improving tab switch times:

And here’s the link I refer you to in the video for getting the add-on.

So hopefully we’ll get some useful data, and we can drive instances of this spinner into the ground.

I’d really like that.

10 people like this post.

  1. See this mailing list post for details. 

The Joy of Coding (Ep. 12): Making “Save Page As” Work

After giving some updates on the last bug we were working on together, I started a new bug: Bug 1128050 – [e10s] Save page as… doesn’t always load from cache. The problem here is that if the user were to reach a page via a POST request, attempting to save that page from the Save Page item in the menu would result in silent failure1.

Luckily, the last bug we were working on was related to this – we had a lot of context about cache keys swapped in already.

The other important thing to realize is that fixing this bug is a bandage fix, or a wallpaper fix. I don’t think those are official terms, but it’s what I use. Basically, we’re fixing a thing with the minimum required effort because something else is going to fix it properly down the line. So we just need to do what we can to get the feature to limp along until such time as the proper fix lands.

My proposed solution was to serialize an nsISHEntry on the content process side, deserialize it on the parent side, and pass it off to nsIWebBrowserPersist.

So did it work? Watch the episode and find out!

I also want to briefly apologize for some construction noise during the video – I think it occurs somewhere halfway through minute 20 of the video. It doesn’t last long, I promise!

Episode Agenda

References

Bug 1128050 – [e10s] Save page as… doesn’t always load from cache – Notes

1 person likes this post.

  1. Well, it’d show something in the Browser Console, but for a typical user, I think that’s still a silent failure. 

The Joy of Coding (Ep. 11): Cleaning up the View Source Patch

For this episode, Richard Milewski and I figured out the syncing issue I’d been having in Episode 9, so I had my head floating in the bottom right corner while I hacked. Now you can see what I do with my face while hacking, if that’s a thing you had been interested in.

I’ve also started mirroring the episodes to YouTube, if YouTube is your choice platform for video consumption.

So, like last week, I was under a bit of time pressure because of a meeting scheduled for 2:30PM (actually the meeting I was supposed to have the week before – it just got postponed), so that gave me 1.5 hours to move forward with the View Source work we’d started back in Episode 8.

I started the episode by explaining that the cache key stuff we’d figured out in Episode 9 was really important, and that a bug had been filed by the Necko team to get the issue fixed. At the time of the video, there was a patch up for review in that bug, and when we applied it, we were able to retrieve source code out of the network cache after POST requests! Success!

Now that we had verified that our technique was going to work, I spent the rest of the episode cleaning up the patches we’d written. I started by doing a brief self-code-review to smoke out any glaring problems, and then started to fix those problems.

We got a good chunk of the way before I had to cut off the camera.

I know back when I started working on this particular bug, I had said that I wanted to take you through right to the end on camera – but the truth of the matter is, the priority of the bug went up, and I was moving too slowly on it, since I was restricting myself to a few hours on Wednesdays. So unfortunately, after my meeting, I went back to hacking on the bug off-camera, and yesterday I put up a patch for review. Here’s the review request, if you’re interested in seeing where I got to!

I felt good about the continuity experiment, and I think I’ll try it again for the next few episodes – but I think I’ll choose a lower-priority bug; that way, I think it’s more likely that I can keep the work contained within the episodes.

How did you feel about the continuity between episodes? Did it help to engage you, or did it not matter? I’d love to hear your comments!

Episode Agenda

References

Bug 1025146 – [e10s] Never load the source off of the network when viewing sourceNotes

Be the first to like.

The Joy of Coding (Ep. 10): The Mystery of the Cache Key

In this episode, I kept my camera off, since I was having some audio-sync issues1.

I was also under some time-pressure, because I had a meeting scheduled for 2:30 ET2, giving me exactly 1.5 hours to do what I needed to do.

And what did I need to do?

I needed to figure out why an nsISHEntry, when passed to nsIWebPageDescriptor’s loadPage, was not enough to get the document out from the HTTP cache in some cases. 1.5 hours to figure it out – the pressure was on!

I don’t recall writing a single line of code. Instead, I spent most of my time inside XCode, walking through various scenarios in the debugger, trying to figure out what was going on. And I eventually figured it out! Read this footnote for the TL;DR:3

Episode Agenda

References

Bug 1025146 – [e10s] Never load the source off of the network when viewing sourceNotes

Be the first to like.

  1. I should have those resolved for Episode 11! 

  2. And when the stream finished, I found out the meeting had been postponed to next week, meaning that next week will also be a short episode. :( 

  3. Basically, the nsIChannel used to retrieve data over the network is implemented by HttpChannelChild in the content process. HttpChannelChild is really just a proxy to a proper nsIChannel on the parent-side. On the child side, HttpChannelChild does not implement nsICachingChannel, which means we cannot get a cache key from it when creating a session history entry. With no cache key, comes no ability to retrieve the document from the network cache via nsIWebDescriptor’s loadPage. 

The Joy of Coding (Ep. 9): More View Source Hacking!

In this episode1, I continued the work we had started in Episode 8, by trying to make it so that we don’t hit the network when viewing the source of a page in multi-process Firefox.

It was a little bit of a slog – after some thinking, I decided to undo some of the work we had done in the previous episode, and then I set up the messaging infrastructure for talking to the remote browser in the view source window.

I also rebased and landed a patch that we had written in the previous episode, after fixing up some nits2.

Then, I (re)-learned that flipping the “remote” attribute of a browser is not enough in order for it to run out-of-process; I have to remove it from the DOM, and then re-add it. And once it’s been re-added, I have to reload any frame scripts that I had loaded in the previous incarnation of the browser.

Anyhow, by the end of the episode, we were able to view the source from a remote browser inside a remote view source browser!3 That’s a pretty big deal!

Episode Agenda

References

Bug 1025146 – [e10s] Never load the source off of the network when viewing sourceNotes

1 person likes this post.

  1. A note that I also tried an experiment where I keep my camera running during the entire session, and place the feed into the bottom right-hand corner of the recording. It looks like there were some synchronization issues between audio and video, which are a bit irritating. Sorry about that! I’ll see what I can do about that. 

  2. and dropping a nit having conversed with :gabor about it 

  3. We were still loading it off the network though, so I need to figure out what’s going on there in the next episode.