Firefox Front-End Performance Update #17

Hello, folks. I wanted to give a quick update on what the Firefox Front-end Performance team is up to, so let’s get into it.

The name of the game continues to be start-up performance. We made some really solid in-roads last quarter, and this year we want to continue to apply pressure. Specifically, we want to focus on reducing IO (specifically, main-thread IO) during browser start-up.

Reducing main thread IO during start-up

There are lots of ways to reduce IO – in the best case, we can avoid start-up IO altogether by not doing something (or deferring it until much later). In other cases, when the browser might be servicing events on the main thread, we can move IO onto another thread. We can also re-organize, pack or compress files differently so that they’re read off of the disk more efficiently.

If you want to change something, the first step is measuring it. Thankfully, my colleague Florian has written a rather brilliant test that lets us take accounting of how much IO is going on during start-up. The test is deterministic enough that he’s been able to write a whitelist for the various ways we touch the disk on the main thread during start-up, and that whitelist means we’ve made it much more difficult for new IO to be introduced on that thread.

That whitelist has been processed by the team, and have been turned into bugs, bucketed by the start-up phase where the IO is occurring. The next step is to estimate the effort and potential payoff of fixing those bugs, and then try to whittle down the whitelist.

And that’s effectively where we’re at. We’re at the point now where we’ve got a big list of work in front of us, and we have the fun task of burning that list down!

Being better at loading DLLs on Windows

While investigating the warm-up service for Windows, Doug Thayer noticed that we were loading DLLs during start-up oddly. Specifically, using a tool called RAMMap, he noticed that we were loading DLLs using “read ahead” (eagerly reading the entirety of the DLL into memory) into a region of memory marked as not-executable. This means that anytime we actually wanted to call a library function within that DLL, we needed to load it again into an executable region of memory.

Doug also noticed that we were unnecessarily doing ReadAhead for the same libraries in the content process. This wasn’t necessary, because by the time the content process wanted to load these libraries, the parent process would have already done it and it’d still be “warm” in the system file cache.

We’re not sure why we were doing this ReadAhead-into-unexecutable-memory work – it’s existence in the Firefox source code goes back many many years, and the information we’ve been able to gather about the change is pretty scant at best, even with version control. Our top hypothesis is that this was a performance optimization that made more sense in the Windows XP era, but has since stopped making sense as Windows has evolved.

UPDATE: Ehsan pointed us to this bug where the change likely first landed. It’s a long and wind-y bug, but it seems as if this was indeed a performance optimization, and efforts were put in to side-step effects from Prefetch. I suspect that later changes to how Prefetch and SuperFetch work ultimately negated this optimization.

Doug hacked together a quick prototype to try loading DLLs in a more sensible way, and the he was able to capture quite an improvement in start-up time on our reference hardware:

This graph measures various start-up metrics. The scatter of datapoints on the left show the “control” build, and they tighten up on the right with the “test” build. Lower is better.

At this point, we all got pretty excited. The next step was to confirm Doug’s findings, so I took his control and test builds, and tested them independently on the reference hardware using frame recording. There was a smaller ((My test machine has SuperFetch disabled to help reduce noise and inconsistency with start-up tests, and we suspect SuperFetch is able to optimize start-up better in the test build)), but still detectable improvement in the test build. At this point, we decided it was worth pursuing.

Doug put together a patch, got it reviewed and landed, and we immediately saw an impact in our internal benchmarks.

We’re also seeing the impact reflected in Telemetry. The first Nightly build with Doug Thayer’s patch went out on April 14th, and we’re starting to see a nice dip in some of our graphs here:

This graph measures the time at which the browser window reports that it has first painted. April 14th is the second last date on the X axis, and the Y axis is time. The top-most line is plotting the 95th percentile, and there’s a nice dip appearing around April 14th.

There are other graphs that I’d normally show for improvements like this, except that we started tracking an unrelated regression on April 16th which kind of muddies the visualization. Bad timing, I guess!

We expect this improvement to have the greatest impact on weaker hardware with slower disks, but we’ll be avoiding some unnecessary work for all Windows users, and that gets a thumbs-up in my books.

If all goes well, this fix should roll out in Firefox 68, which reaches our release audience on July 9th!