Monthly Archives: April 2019

Firefox Front-End Performance Update #17

Hello, folks. I wanted to give a quick update on what the Firefox Front-end Performance team is up to, so let’s get into it.

The name of the game continues to be start-up performance. We made some really solid in-roads last quarter, and this year we want to continue to apply pressure. Specifically, we want to focus on reducing IO (specifically, main-thread IO) during browser start-up.

Reducing main thread IO during start-up

There are lots of ways to reduce IO – in the best case, we can avoid start-up IO altogether by not doing something (or deferring it until much later). In other cases, when the browser might be servicing events on the main thread, we can move IO onto another thread. We can also re-organize, pack or compress files differently so that they’re read off of the disk more efficiently.

If you want to change something, the first step is measuring it. Thankfully, my colleague Florian has written a rather brilliant test that lets us take accounting of how much IO is going on during start-up. The test is deterministic enough that he’s been able to write a whitelist for the various ways we touch the disk on the main thread during start-up, and that whitelist means we’ve made it much more difficult for new IO to be introduced on that thread.

That whitelist has been processed by the team, and have been turned into bugs, bucketed by the start-up phase where the IO is occurring. The next step is to estimate the effort and potential payoff of fixing those bugs, and then try to whittle down the whitelist.

And that’s effectively where we’re at. We’re at the point now where we’ve got a big list of work in front of us, and we have the fun task of burning that list down!

Being better at loading DLLs on Windows

While investigating the warm-up service for Windows, Doug Thayer noticed that we were loading DLLs during start-up oddly. Specifically, using a tool called RAMMap, he noticed that we were loading DLLs using “read ahead” (eagerly reading the entirety of the DLL into memory) into a region of memory marked as not-executable. This means that anytime we actually wanted to call a library function within that DLL, we needed to load it again into an executable region of memory.

Doug also noticed that we were unnecessarily doing ReadAhead for the same libraries in the content process. This wasn’t necessary, because by the time the content process wanted to load these libraries, the parent process would have already done it and it’d still be “warm” in the system file cache.

We’re not sure why we were doing this ReadAhead-into-unexecutable-memory work – it’s existence in the Firefox source code goes back many many years, and the information we’ve been able to gather about the change is pretty scant at best, even with version control. Our top hypothesis is that this was a performance optimization that made more sense in the Windows XP era, but has since stopped making sense as Windows has evolved.

UPDATE: Ehsan pointed us to this bug where the change likely first landed. It’s a long and wind-y bug, but it seems as if this was indeed a performance optimization, and efforts were put in to side-step effects from Prefetch. I suspect that later changes to how Prefetch and SuperFetch work ultimately negated this optimization.

Doug hacked together a quick prototype to try loading DLLs in a more sensible way, and the he was able to capture quite an improvement in start-up time on our reference hardware:

This graph measures various start-up metrics. The scatter of datapoints on the left show the “control” build, and they tighten up on the right with the “test” build. Lower is better.

At this point, we all got pretty excited. The next step was to confirm Doug’s findings, so I took his control and test builds, and tested them independently on the reference hardware using frame recording. There was a smaller ((My test machine has SuperFetch disabled to help reduce noise and inconsistency with start-up tests, and we suspect SuperFetch is able to optimize start-up better in the test build)), but still detectable improvement in the test build. At this point, we decided it was worth pursuing.

Doug put together a patch, got it reviewed and landed, and we immediately saw an impact in our internal benchmarks.

We’re also seeing the impact reflected in Telemetry. The first Nightly build with Doug Thayer’s patch went out on April 14th, and we’re starting to see a nice dip in some of our graphs here:

This graph measures the time at which the browser window reports that it has first painted. April 14th is the second last date on the X axis, and the Y axis is time. The top-most line is plotting the 95th percentile, and there’s a nice dip appearing around April 14th.

There are other graphs that I’d normally show for improvements like this, except that we started tracking an unrelated regression on April 16th which kind of muddies the visualization. Bad timing, I guess!

We expect this improvement to have the greatest impact on weaker hardware with slower disks, but we’ll be avoiding some unnecessary work for all Windows users, and that gets a thumbs-up in my books.

If all goes well, this fix should roll out in Firefox 68, which reaches our release audience on July 9th!

Firefox Front-End Performance Update #16

With Firefox 67 only a few short weeks away, I thought it might be interesting to take a step back and talk about some of the work that the Firefox Front-end Performance team is shipping to users in that particular release.

To be clear, this is not an exhaustive list of the great performance work that’s gone into Firefox 67 – but I picked a few things that the front-end team has been focused on to talk about.

Stop loading things we don’t need right away

The fastest code is the code that doesn’t run at all. Sometimes, as the browser evolves, we realize that there are components that don’t need to be loaded right away during start-up, and can instead of deferred until sometime after start-up. Sometimes, that means we can wait until the very last moment to initialize some component – that’s called loading something lazily.

Here’s a list of things that either got deferred until sometime after start-up, or made lazy:

FormAutofillContent and FormValidationChild

These are modules that support, you guessed it, Form Autofill – that part of the browser that helps you fill in web forms, and makes sure forms are passing validation. We were loading these modules too early, and now we load them only when there are forms to auto-fill or validate on a page.

The hidden window

The Hidden Window is a mysterious chunk of code that manages the state of the global menu bar on macOS when there are no open windows. The Hidden Window is also sometimes used as a singleton DOM window where various operations can take place. On Linux and Windows, it turns out we were creating this Hidden Window far early than needs be, and now it’s quite a bit lazier.

Page style

Page Style is a menu you can find under View in the main menu bar, and it’s used to switch between alternative style sheets on a page. It’s a pretty rarely used feature from what we can tell, but we were scanning pages for their alternative stylesheets far earlier than we needed to. We were also scanning pages that we know don’t have alternative stylesheets, like the about:home / about:newtab page. Now we only scan normal web pages, and we do so only after we service the idle event queue.

Cache invalidation

The Startup Cache is an important part of Firefox startup performance. It’s primary job is to cache computations that occur during each startup so that they only have to happen every once in a while. For example, the mark-up of the browser UI often doesn’t change from startup to startup, so we can cache a version of the mark-up that’s faster to read from disk, and only invalidate that during upgrades.

We were invalidating the whole startup cache every time a WebExtension was installed or uninstalled. This used to be necessary for old-style XUL add-ons (since those could cause changes to the UI that would need to go into the cache), but with those add-ons no longer available, we were able to remove the invalidation. This means faster startups more often.

Don’t touch the disk

The disk is almost always the slowest part of the system. Reading and writing to the disk can take a long time, especially on spinning magnetic drives. The less we can read and write, the better. And if we’re going to read, best to do it off of the main thread so that the UI remains responsive.

Old XUL icons code

We were reading from the disk on the main thread to search for window-specific icons to display in the window titlebar.

Firefox doesn’t use window-specific icons, so we made it so that we skip these checks. This means less disk activity, which is great for responsiveness and start-up!

Hitting every directory on the way down

We noticed that when we were checking that a directory exists on Windows (to write a file to it), we were using the CreateDirectoryW Windows API. This API checks each folder on the way down to the last one to see if they exist. That’s a lot of disk IO! We can avoid this if we assume that the parent directories exist, and only fall into the slow path if we fail to write our file. This means that we hit the faster path with less IO more often, which is good for responsiveness and start-up time.

Enjoy your Faster Fox!

Firefox 67 is slated to ship with these improvements on May 14th – just a little over a month away. Enjoy!