Category Archives: Technology

Firefox Performance Update #6

Hi there folks, just another Firefox Performance update coming at you here.

These updates are going to shift format slightly. I’m going to start by highlighting the status of some of the projects the Firefox Performance Team (the front-end team working to make Firefox snappy AF), and then go into the grab-bag list of improvements that we’ve seen landing in the tree.

But first a word from our sponsor: arewesmoothyet.com!

This performance update is brought to you by arewesmoothyet.com! On Nightly versions of Firefox, a component called BackgroundHangReporter (or “BHR”) notices anytime the main-threads hang too long, and then collect a stack to send via Telemetry. We’ve been doing this for years, but we’ve never really had a great way of visualizing or making use of the data¹. Enter arewesmoothyet.com by Doug Thayer! Initially a fork of perf.html, awsy.com lets us see graphs of hangs on Nightly broken down by category², and then also lets us explore the individual stacks that have come in using a perf.html-like interface! (You might need to be patient on that last link – it’s a lot of data to download).

Hot damn! Note the finer-grain categories showing up on April 1st.

Early first blank paint (lead by Florian Quèze)

This is a start-up perceived performance project where early in the executable life-cycle, long before we’ve figured out how to layout and paint the browser UI, we show a white “blank” area on screen that is overtaken with the UI once it’s ready. The idea here is to avoid having the user stare at nothing after clicking on the Firefox icon. We’ll also naturally be working to reduce the amount of time that the blank window appears for users, but our research shows users feel like the browser starts-up faster when we show something rather than nothing. Even if that nothing is… well, mostly nothing. Florian recently landed a Telemetry probe for this feature, made it so that we can avoid initting the GPU process for the blank window, and is in the midst of fixing an issue where the blank window appears for too long. We’re hoping to have this ready to ship enabled on some platforms (ideally Linux and Windows) in Firefox 61.

Faster content process start-up time (lead by Felipe Gomes)

Explorations are just beginning here. Felipe has been examining the scripts that are running for each tab on creation, and has a few ideas on how to both reduce their parsing overhead, as well as making them lazier to load. This project is mostly at the research stage. Expect concrete details on sub-projects and linked bugs soon!

Get ContentPrefService init off of the main thread (lead by Doug Thayer)

This is so, so close to being done. The patch is written and reviewed, but landing it is being stymied by a hard-to-reproduce locally but super-easy-to-reproduce-in-automation shutdown leak during test runs. Unfortunately, the last 10% sometimes takes 90% of the effort, and this looks like one of those cases.

Blocklist improvements (lead by Gijs Kruitbosch)

Gijs is continuing to make our blocklist asynchronous. Recently, he made the getAddonBlocklistEntry method of the API asynchronous, which is a big-deal for start-up, since it means we drop another place where the front-end has to wait for the blocklist to be ready! The getAddonBlocklistState method is next on the list.

As a fun exercise, you can follow the “true” value for the BLOCKLIST_SYNC_FILE_LOAD probe via this graph, and watch while Gijs buries it into the ground.

LRU cache for tab layers (lead by Doug Thayer)

Doug Thayer is following up on some research done a few years ago that suggests that we can make ~95% of our user’s tab switches feel instantaneous by implementing an LRU cache for the painted layers. This is a classic space-time trade-off, as the cache will necessarily consume memory in order to hold onto the layers. Research is currently underway here to see how we can continue to improve our tab switching performance without losing out on the memory advantage that we tend to have over other browsers.

Tab warming (lead by Mike Conley)

Tab warming has been enabled on Nightly for a few weeks, and besides one rather serious glitch that we’ve fixed, we’ve been pretty pleased with the result! There’s one issue on macOS that’s been mulled over, but at this point I’m starting to lean towards getting this shipped on at least Windows for the Firefox 61 release.

Firefox’s Most Wanted: Performance Wins (lead by YOU!)

Before we go into the grab-bag list of performance-related fixes – have you seen any patches landing that should positively impact Firefox’s performance? Let me know about it so I can include it in the list, and give appropriate shout-outs to all of the great work going on! That link again!

Grab-bag time

And now, without further ado, a list of performance work that took place in the tree:

(🌟 indicates a volunteer contributor)

Rob Wood and Greg Tatum made it so that Nightly profiled Talos jobs now link directly to perf.html from Treeherder!
Gijs Kruitbosch made it so that we don’t send a sync IPC message every time you keypress with the Findbar open!
Kris Maglione made it so that we recycle the panel used by WebExtensions, which should make them cheaper to open
Kartikaya Gupta made it so that painting a TicketMaster page showing the seating diagram of a venue much, much faster
Florian Quèze made it so that we spend far less resources painting the progress bars for downloads
Ludovic Hirlimann got rid of the Presentation API system add-on, since it was running some code during start-up on Nightly, but wasn’t actually being used anywhere.

Thanks to all of you! Keep it coming!

Pro-tip: if you’re collecting data, consider figuring out how you want to visualize it first, and then make sure that visualization work actually happens. ↩
since April 1st, these categories have gotten a lot finer-grain ↩

Firefox Performance Update #2

So I’ve had my eyes out, watching for bugfixes that are landing in the Firefox code base that will speed it up for our users.

But I can’t do it alone. So I have a request: Have you seen a bugfix land recently that’ll likely impact Firefox’s performance in a positive way for our users? If so, I want to hear about it, and you should fill in this form to let me know!

Here’s that link again, just in case you missed it.

Anyhow, without further ado – here’s a small selection of interesting bugs that have been fixed recently that should improve Firefox performance!

Rob Wood has made it so that Talos tests run on Nightly builds now produce Gecko Profiles for both Linux and MacOS! Windows is coming soon. This is great for engineers working on improving our performance on our internal benchmarks.
Xidorn Quan enabled Stylo for chrome-level CSS documents! This allows us to process style changes to the browser UI in parallel using the same techniques that we use for web content that shipped in Firefox Quantum.
Kris Maglione made it so that we don’t load the Onboarding script for every single tab.
Jan de Mooij landed a patch that made browser JavaScript run more quickly and consume less memory early on during main and content process start-up
Henri Sivonen took an old and slower chunk of C++ code that checks for RTL characters in a string, and replaced it with a newer, faster version written in Rust. This has dramatically reduced the processing time on text nodes for pages where we’re unsure if RTL characters exist. For example, for strings in Thai, we now spend 3% of the original time searching those strings for RTL characters! That’s a huge improvement!
- Substantial!

Making tabs close faster in multi-process Firefox

TL;DR: In bug 1336763, I have landed a series of patches that should hopefully make tab closing faster for the majority of cases for users that are using multi-process Firefox.

The rest of this blog post tries to explain why.

The beforeunload event handler

Perhaps you’ve seen this dialog before:

Are you sure?

This dialog shows up when a website that you’ve interacted with (or one of its subframes) has set an event handler for the beforeunload event, and you attempt to close the tab or browse away from the website.

Here’s the documentation for how beforeunload works, but long story short, you can do a thing in the event handler that will cause the browser UI to show that dialog, which means giving the user the opportunity to cancel their request to close or navigate away from the website¹.

In any event², if a page is going to be unloaded, and that page (or one of its subframes) has set one or more beforeunload event handlers, then it is necessary to run those event handlers to see if we’re going to show the dialog, or go ahead and unload the page straight away.

How multi-process Firefox used to handle beforeunload

When closing a tab in multi-process Firefox, what we’ve been doing is sending a message to the content process for that tab to check for (and run any) beforeunload event handlers. The parent sends that message, and then just kinda waits for the content process to respond with whether or not the close should occur. If the content process doesn’t respond within 5 seconds (I know), then we consider it a wash, and just close the tab.

The content process is sometimes doing stuff on the main thread, and sometimes it’s just waiting for messages from the parent. In the latter case, tab closes happen pretty smoothly – the message comes in, beforeunload events are fired (and hopefully those don’t take too long, but you never know), and then hopefully a result goes up to the parent, and it can move on.

That’s the best case scenario – but lots of things can prevent the best case scenario; for one thing, the main thread might be busy doing other stuff when the message is sent from the parent. Perhaps it’s doing a garbage collection, or a cycle collection, or it’s blocked on some busy JavaScript that some silly advertisement company is running in the background of one of your tabs. In that case, the message from the parent won’t be processed until the main thread is ready.

Once the message is received, we’re still not out of the woods – the beforeunload event handlers can run any kind of JavaScript inside them, more or less. For example, one anti-pattern I’ve seen in the wild is to use the beforeunload event as an opportunity to send a sync XMLHttpRequest in order to get some data to a server before the page goes away³. So the script on the page has an opportunity to delay you, even if it’s not going to cause the dialog to appear.

This problem seems to plague all browsers. beforeunload is a real pain, and our current implementation can cause slow tab closing even if the tab doesn’t have beforeunload event handlers set⁴. The patches in bug 1336763 offer what I think is a decent, simple solution for that common case in Firefox.

Don’t ask, just remember

In bug 1336763, I’ve made it so that for any given tab running in a content process, if a beforeunload event is ever added in that tab (or in any of its subframes), the content process tells the parent process so it can mark that tab as having listeners we need to fire. If the beforeunload events are removed, we unmark the tab. If no beforeunload events are ever added, there’s no mark at all.

The parent process remembers these markings, so that if the user decides to close the tab, the parent can know immediately whether or not it needs to message to tell the content process to run beforeunload event handlers. In the cases where no beforeunload event handlers have been set, we can close immediately without asking for permission from the content process at all.

Details, details

Using some Gecko terminology here, we start by storing a count on something called the TabChild. It might simplify things a bit if you try to imagine the TabChild as the representative of everything in a particular browser tab, and that underneath that TabChild are a bunch of nodes, forming a tree-like structure.

Let’s call these nodes “inner windows”.

The inner windows under the TabChild contain the documents that are loaded in a tab. For simple web pages, that might just be a single document. In that case, we have a TabChild with just a single inner window node under it.

More complicated pages might contain iframes (which themselves contain iframes, etc). In those cases, we have a TabChild with a single inner window node under it, and that node has any number of inner window children (and those children have any number of inner window children, etc)⁵.

When any of those subframes have a beforeunload event listener added to them via script, the inner window node tells the TabChild to increment its internal count. If a beforeunload event listener is removed via script, the TabChild is told to decrement its internal count.

If the TabChild count ever goes above 0, then we need to tell the parent “Hey, you have at least one beforeunload event listener here”. If that count continues to go up, the TabChild doesn’t need to tell the parent anything – it just needs to record the increase. If the count ever drops back to 0, then the TabChild needs to tell the parent again, “All beforeunload event listeners are clear”.

Pretty straight-forward so far, but there are a few other cases we also have to consider.

Other cases

There are a couple of ways for a set of beforeunload event handlers to go away. We’ve already mentioned one – script on the page might remove them via removeEventListener.

One way is if the inner window gets navigated away from. If we’re on a page, and that page set a beforeunload event handler, and the user clicks on a link, the user might end up navigating away (assuming the dialog wasn’t shown and they didn’t cancel), which essentially replaces the inner window with one for a different page. In that case, script didn’t remove the beforeunload event handlers – the page went away, and so the beforeunload event handlers on the page we’ve unloaded are no longer relevant.

Another way is if an <iframe> which has set a beforeunload event handler is removed from the DOM. Instead of replacing the inner window, we’re snipping the inner window out of the tree structure entirely.

In both of these cases, if there are beforeunload event handlers in the subframe, it’s necessary to tell the TabChild so that the right number can be decremented from the TabChild count.

So what this means is that we need the inner windows to keep a track of how many beforeunload event handlers have been set as well. That way, when they start to tear themselves down, they can tell the TabChild, “Hey, I’m going away now – decrement X number of beforeunload event handlers”.

It might seem redundant to have these two counts – counts in the inner windows, and a total count in the TabChild. It would seem like one can be easily inferred from the other; just sum the beforeunload counts for the inner windows, and you should have your TabChild count.

Having the TabChild keep a count is an optimization that prevents us from having to walk the inner window tree to collect a sum every time the count changes. It’s a classic space / time tradeoff, and I think it’s worth the extra integer member on the TabChild.

Comparison to other browsers

Here’s one way to compare the behaviour across different browsers:

In a browser window with more than one tab open, open the developer tools, and make your way to the JavaScript console.

Drop this tasty little snippet in and press enter:

var then = Date.now(); while (Date.now() - then < 15000) {}

This is going to hang the main thread in that tab for 15 seconds, but is otherwise inert.

Now try to close the tab. In Firefox, the tab closes right way. In Safari and Chrome (the two other browsers I have on this machine), the tab hangs out for a while. In Chrome, it appears to wait the full 15 seconds. In Safari, it seems to hit some kind of shorter timeout⁶.

Wrapping up

This was a neat set of patches to work on, precisely because it had me tour the depths of Gecko (dealing with things like inner / outer windows, the stuff that manages events, etc), which end up resulting in a simple property that the front-end can ultimately access to optimize closing tabs. So it nicely spanned the gap between low-level Gecko and higher-level Firefox, all for an event that was added back in 2004, for better or worse.

If all goes well, this change should ship in Firefox 55, and apply to multi-process tabs.

It’s not always necessary for the beforeunload event handler to show the dialog. The event handler needs to set the returnValue property on the event to a string in order for the dialog to show, but plenty of other stuff can happen in that event handler. ↩
Ooooh pun intended ↩
A better way would be to use navigator.sendBeacon, which allows the browser to send one last XHR in the background for a page even after it’s gone away. ↩
since we have to check to see if any of those event handlers exist. ↩
For my fellow Gecko Hackers – yes, this is not quite right. I’m missing other key structures in my description (specifically, the outer window). Forgive me – this is a very low-resolution mental model to make this post easier to write. 🙂 ↩
Note that it appears that the script running in the console is treated differently from the script running on the page. If you set up a page with that script, I notice that the tabs close immediately on both Chrome and Safari. I might have goofed in my experiment though – it is rather late at night. ↩

A printing story and a PSA: outparams over the IPC layer might not behave like you’d expect

Here’s a story about a printing bug on OS X, and a lesson about how our IPC layer works.

Last week, :ehsan came up to my desk and said “Mike… printing is broken on OS X with e10s enabled. Did you know this?”

It was sad to hear. Nobody, as far as I knew, had been touching printing code (besides bobowen, but he hadn’t landed his patches yet), which meant that some mystery change had landed and it was my responsibility (as the one who had implemented printing on OS X for e10s) to fix it.

The first step was to confirm it. Yep, it looked like I couldn’t print on my Nightly. Crap¹. So we get the bug filed, and then I fired up my local build and lldb, and tried to trace out where the problem was occurring.

Drilling down to the regressing changeset

The problem was that my local build printed just fine. Eyebrow raised, I tried using my default profile with my local build to see if something about my profile was causing printing to break. Printing continued to work with my local build².

Scratching my head, I used mozregression to bisect the problem down to a single changeset.

Here’s the bug it spat out:

Bug 1209930 – Update Mac clang to match the version in use everywhere else

The hair stood up on the back of my neck. Printing got broken by a compiler upgrade? This had bad news written all over it.

Many try builds

So that, I guess, explained why I couldn’t reproduce the problem locally. I needed to build with a particular compiler in order to experience the bug.

The bad news was that I found out that the version of clang that we’d upgraded to didn’t work on my version of OS X (10.10.5). It was supposed to work on machines running OS X 10.7, but I didn’t have access to any³.

I was able to get debug builds off of try, but I had no luck convincing lldb to let me step through it, despite having the symbols and the right revision checked out⁴.

So this meant a lot of try builds. The good news is that I was able to add a bunch of logging to try, and just kinda walk away while it built. A few hours later, I’d have my build, and I’d get my results. I wasn’t blocked waiting on it to build, so I could work on other things.

What did the logging reveal?

Our printing code sometimes shows this progress dialog when printing starts. It’s put up so that while we’re laying out the page for printing, the user knows that stuff is happening. We show this progress dialog on Linux and Windows, but not on OS X (I guess OS X users don’t expect such a progress window… this was a decision made way back in the day).

The printing engine in the content process sends up a message to the parent saying, “I’m all set to print, let’s show the progress dialog!”. On OS X, the parent responds, “Uh, nope” (which is a return code of NS_ERROR_NOT_IMPLEMENTED), and the child is supposed to go, “Oh okay, I’ll just print right away”.

But here’s the kicker – along with message to show the progress dialog, the child sends an outparam⁵. That outparam is sent because for some platforms that show the progress dialog, we want to wait until the dialog closes before starting the actual print. On other platforms, we may just want to start printing right away when laying out the page finishes without waiting for the dialog to close.

That outparam is a bool initialized to a value of false in the print engine, and in the non-e10s case where we return “Sorry, no progress dialog” with NS_ERROR_NOT_IMPLEMENTED, that bool stays false, and when it stays false, it means that we start printing right away.

My logging showed that for some reason, that value was getting flipped to true despite not being touched by (seemingly) anything in the IPC sender / receiver nor the print progress dialog backend (which still just returns NS_ERROR_NOT_IMPLEMENTED when asked to open the printing dialog).

What the hell?

Here’s the PSA

Ehsan helped me dig into what was going on, and we figured it out. Here’s the big lesson / PSA here:

outparams over IPC, when untouched on the receiver side, get filled with values from uninitialized memory when returned to the sender.

I’ll give you an example. Here’s some sorta-pseudocode:

#include <stdio.h>

void someFunc(bool* aOutParam) {
}

int main(int argc, char** argv) {
  bool myBool = false;
  someFunc(&myBool);
  if (myBool) {
    printf("Wait, WHAT?");
    return 1;
  }
  return 0;
}

We should never enter that “Wait, WHAT?” conditional block because myBool remains false throughout. It’s initialized to false, and even though we pass a pointer to it to someFunc, someFunc never touches it, so it stays false, and so we skip the conditional and return 0 as expected.

If, however, you did the same thing, but over IPC… well, now you’re in for a fun treat.

On the receiver side of the IPC message, here’s a chunk of the generated code that calls into RecvShowPrintProgress on the parent side:

            bool notifyOnOpen;
            bool success;
            int32_t id__ = mId;
            if \⁶, (&(success)))))) {
                mozilla::ipc::ProtocolErrorBreakpoint("Handler for ShowProgress returned error code");
                return MsgProcessingError;
            }

            reply__ = new PPrinting::Reply_ShowProgress(id__);

            Write(notifyOnOpen, reply__);
            Write(success, reply__);
            (reply__)->set_sync();
            (reply__)->set_reply();

That notifyOnOpen bool is the one that is the outparam in the child. Notice that it’s not being initialized to any value! That means its current value could be, well, anything. When we call into RecvShowProgress, we pass a pointer to that anything value. If RecvShowProgress doesn’t do anything with that pointer (like in the OS X case), well… that random value is what gets written to the IPC message that gets sent back to the child.

And somehow a clang upgrade made it far more likely that this value would evaluate to TRUE as a boolean instead of FALSE.

And so the child assumed that it’d have to wait for a dialog to close before starting to print – a dialog that would never open, because we don’t show it on OS X.

So the patch I eventually wrote initializes the outparam to false before calling into the OS X widget code that just returns NS_ERROR_NOT_IMPLEMENTED, and that fixed the bug.

It’s a small patch, but in my mind, it’s a big lesson, at least for me. I had assumed that outparams would work the same over IPC as they do in the same process, and that’s simply not the case.

I’ve filed bug 1220168 to try to make this sort of thing easier to spot in the future.

Our automated testing story for printing is pretty bad. We test some of the built-in UI for printing, but as for actually sending stuff to printer drivers… zero automated tests, which explains why this had gone uncaught for so long. ↩
If you’re wondering, I wasn’t printing out reams of paper to test this. XCode comes with a Printer Simulator, which is like a PDF printer, except that I can get a better sense of the communications being sent to the printer ↩
Okay, there’s one in the office, but apparently trying to build on it is a slow nightmare ↩
Happy to hear and share how to do this if someone will show me ↩
Read this about passing multiple values back from a method call if you’re curious about what outparams are ↩
!(RecvShowProgress(mozilla::Move(browser), mozilla::Move(printProgressDialog), mozilla::Move(isForPrinting), (&(notifyOnOpen ↩

The Joy of Coding – Ep’s 23 – 29

Wow! I’ve been a way from this blog for too long. I also haven’t posted any new episodes for The Joy of Coding. I also haven’t been keeping up with my Things I’ve Learned posts.

Time to get back in the saddle. First thing’s first, here are 6 episodes of The Joy of Coding that have aired. Unfortunately, I haven’t put together summaries for any of them, but I’ve put their agendas near the videos so that might give some clues.

Here we go!

Episode 23

Agenda

Episode 24

Agenda

Episode 25

Agenda

Episode 26

Agenda

Episode 27

Agenda

Episode 28

Agenda

Episode 29

Agenda

A Blog by Mike Conley

The personal blog of a Toronto based software mechanic, musician, sound designer, and theatre enthusiast.

Category Archives: Technology

Firefox Performance Update #6

But first a word from our sponsor: arewesmoothyet.com!

Early first blank paint (lead by Florian Quèze)

Faster content process start-up time (lead by Felipe Gomes)

Get ContentPrefService init off of the main thread (lead by Doug Thayer)

Blocklist improvements (lead by Gijs Kruitbosch)

LRU cache for tab layers (lead by Doug Thayer)

Tab warming (lead by Mike Conley)

Firefox’s Most Wanted: Performance Wins (lead by YOU!)

Grab-bag time

Firefox Performance Update #2

Making tabs close faster in multi-process Firefox

The beforeunload event handler

How multi-process Firefox used to handle beforeunload

Don’t ask, just remember

Details, details

Other cases

Comparison to other browsers

Wrapping up

A printing story and a PSA: outparams over the IPC layer might not behave like you’d expect

Drilling down to the regressing changeset

Many try builds

What did the logging reveal?

Here’s the PSA

The Joy of Coding – Ep’s 23 – 29

Episode 23

Episode 24

Episode 25

Episode 26

Episode 27

Episode 28

Episode 29