Category Archives: Firefox

Firefox Front-End Performance Update #14

We’re only a few weeks away from Firefox 67 merging from the Nightly channel to Beta, and since my last update, a number of things have landed.

It’s the end of a long week for me, so I apologize for the brevity here. Let’s check it out!

Document Splitting Foundations for WebRender (In-Progress by Doug Thayer)

dthayer is still trucking along here – he’s ironed out a number of glitches, and kats is giving feedback on some APZ-related changes. dthayer is also working on a WebRender API endpoint for generating frames for multiple documents in a single transaction, which should help reduce the window of opportunity for nasty synchronization bugs.

Warm-up Service (In-Progress by Doug Thayer)

dthayer is pressing ahead with this experiment to warm up a number of critical files for Firefox shortly after the OS boots. He is working on a prototype that can be controlled via a pref that we’ll be able to test on users in a lab-setting (and perhaps in the wild as a SHIELD experiment).

Startup Cache Telemetry (In-Progress by Doug Thayer)

dthayer landed this Telemetry early in the week, and data has started to trickle in. After a few more days, it should be easier for us to make inferences on how the startup caches are operating out in the wild for our Nightly users.

Smoother Tab Animations (In-Progress by Felipe Gomes)

UX, Product and Engineering are currently hashing out the remainder of the work here. Felipe is also aiming to have the non-responsive tab strip bug fixed soon.

Lazier Hidden Window (Completed by Felipe Gomes)

After a few rounds of landings and backouts, this appears to have stuck! The hidden window is now created after the main window has finished painting, and this has resulted in a nice ts_paint (startup paint) win on our Talos benchmark!

This is a graph of the ts_paint startup paint Talos benchmark. The highlighted node is the first mozilla-central build with the hidden window work. Lower is better, so this looks like a nice win!

There’s still potential for more improvements on the hidden window, but that’s been split out to a separate project / bug.

Browser Adjustment Project (In-Progress by Gijs Kruitbosch)

This project appears to be reaching its conclusion, but with rather unsatisfying results. Denis Palmeiro from Vicky Chin’s team has done a bunch of testing of both the original set of patches that Gijs landed to lower the global frame rate (painting and compositing) from 60fps to 30fps for low-end machines, as well as the new patches that decrease the frequency of main-thread painting (but not compositing) to 30fps. Unfortunately, this has not yielded the page load wins that we wanted¹. We’re still waiting to see if there’s a least a power-usage win here worth pursuing, but we’re almost ready the pull the plug on this one.

Better about:newtab Preloading (In-Progress by Gijs Kruitbosch)

Gijs has a set of patches that should make this possible, which will mean (in theory) that we’ll present a ready-to-roll about:newtab when users request one more often than not.

Unfortunately, there’s a small snag with a test failure in automation, but Gijs is on the case.

Experiments with the Process Priority Manager (In-Progress by Mike Conley)

The Process Priority Manager has been enabled in Nightly for a number of weeks now, and no new bugs have been filed against it. I filed a bug earlier this week to run a pref-flip experiment on Beta after the Process Priority Manager patches are uplifted later this month. Our hope is that this has a neutral or positive impact on both page load time and user retention!

Make the PageStyleChild load lazily (Completed by Mike Conley)

There’s an infrequently used feature in Firefox that allows users to switch between different CSS stylesheets that a page might offer. I’ve made the component that scans the document for alternative stylesheets much lazier, and also made it skip non web-pages, which means (at the very least) less code running when loading about:home and about:newtab

This was unexpected – we ran an experiment late in 2018 where we noticed that lowering the frame rate manually via the layout.frame_rate pref had a positive impact on page load time… unfortunately, this effect is no longer being observed. This might be due to other refresh driver work that has occurred in the meantime. ↩

Firefox Front-End Performance Update #13

It’s been just a little over two weeks since my last update, so let’s see where we are!

A number of our projects are centered around trying to improve start-up time. Start-up can mean a lot of things, so we’re focused specifically on cold start-up on the Windows 10 2018 reference device when the machine is at rest.

If you want to improve something, the first thing to do is measure it. There are lots of ways to measure start-up time, and one of the ways we’ve been starting to measure is by doing frame recording analysis. This is when we capture display output from a testing device, and then analyze the videos.

This animated GIF shows eight videos. The four on the left are Firefox Nightly, and the four on the right are Google Chrome (71.0.3578.98). The videos are aligned so that both browsers are started at the same time.

Some immediate observations:

Firefox Nightly is consistently faster to reach first paint (that’s the big white window)
Firefox Nightly is consistently faster to draw its toolbar and browser UI
Google Chrome is faster at painting its initial content

This last bullet is where the team will be focusing its efforts – we want to have the initial content painted and settled much sooner than we currently do.

Document Splitting Foundations (In-Progress by Doug Thayer)

After some pretty significant refactorings to work better with APZ, Doug posted a new stack of patches late last week which will sit upon the already large stack of patches that have already landed. There are still a number of reviews pending on the main stack, but this work appears to be getting pretty close to conclusion, as the patches are in the final review and polish stage.

After this, once retained display lists are enabled in the parent process, and an API is introduced to WebRender to generate frames for multiple documents in a single transaction, we can start thinking about enabling document splitting by default.

Warm-up Service (In-Progress by Doug Thayer)

A Heartbeat survey went out a week or so back to get some user feedback about a service that would speed up the launching of Firefox at the cost of adding some boot time to Windows. The responses we’ve gotten back have been quite varied, but can be generally bucketed into three (unsurprising) groups:

Users who say they do not want to make this trade
Users who say they would love to make this trade
Users who don’t care at all about this trade

Each group is sufficiently large to warrant further exploration. Our next step is to build a version of this service that we can turn on and off with a pref and test either in a lab and/or out in the wild with a SHIELD study.

Startup Cache Telemetry (In-Progress by Doug Thayer)

We do a number of things to try to improve real and perceived start-up time. One of those things is to cache things that we calculate at runtime during start-up to the disk, so that for subsequent start-ups, we don’t have to do those calculations again.

There are a number of mechanisms that use this technique, and Doug is currently adding some Telemetry to see how they’re behaving in the wild. We want to measure cache hits and misses, so that we know how healthy our cache system is out in the wild. If we get signals back that our start-up caches are missing more than we expect, this will highlight an important area for us to focus on.

Smoother Tab Animations (In-Progress by Felipe Gomes)

UX has gotten back to us with valuable feedback on the current implementation, and Felipe is going through it and trying to find the simplest way forward to address their concerns.

Having been available (though disabled by default) on Nightly, we’ve discovered one bug where the tab strip can become unresponsive to mouse events. Felipe is currently working on this.

Lazy Hidden Window (In-Progress by Felipe Gomes)

Under the hood, Firefox’s front-end has a notion of a “hidden window”. This mysterious hidden window was originally introduced long long ago¹ for MacOS, where it’s possible to close all windows yet keep the application running.

Since then, it’s been (ab)used for Linux and Windows as well, as a safe-ish place to do various operations that require a window (since that window will always be around, and not go away until shutdown).

That window opens pretty early during start-up, and Felipe found an old patch that was written, and then abandoned to make its construction lazier. Felipe thinks we can still make this idea work, and has noted that in our internal benchmarks, this shaves off a few percentage points on our start-up tests

Activity Stream seems to depend on the hidden window early enough that we think we’re going to have to find an alternative there, but once we do, we should get a bit of a win on start-up time.

Browser Adjustment Project (In-Progress by Gijs Kruitbosch)

Gijs updated the patch so that the adjustment causes the main thread to skip every other VSync rather than swithing us to 30fps globally².

We passed the patch off to Denis Palmeiro, who has a sophisticated set-up that allows him to measure a pageload benchmark using frame recording. Unfortunately, the results we got back suggested that the new approach regressed visual page load time significantly in the majority of cases.

We’re in the midst of using the same testing rig to test the original global 30fps patch to get a sense of the magnitude of any improvements we could get here. Denis is also graciously measuring the newer patch to see if it has any positive benefits towards power consumption.

Better about:newtab Preloading (In-Progress by Gijs Kruitbosch)

By default, users see about:newtab / a.k.a Activity Stream when they open new tabs. One of the perceived performance optimizations we’ve done for many years now is to preload the next about:newtab in the background so that the next time that the user opens a tab, the about:newtab is all ready to roll.

This is a perceived performance optimization where we’re moving work around rather than doing less work.

Right now, we preload a tab almost immediately after the first tab is opened in a window. That means that the first opened tab is never preloaded, but the second one is. This is for historical reasons, but we think we can do better.

Gijs is working on making it so that we choose a better time to preload the tab – namely, when we’ve found an idle pocket of time where the user doesn’t appear to be doing anything. This should also mean that the first new tab that gets opened might also be preloaded, assuming that enough idle time was made available to trigger the preload. And if there wasn’t any idle time, that’s also good news – we never got in the users way by preloading when it’s clear they were busy doing something else

Experiments with the Process Priority Manager (In-Progress by Mike Conley)

The Process Priority Manager has been enabled on Nightly for a few weeks now. Except for a (now fixed) issue where audio playing in background tabs would drop samples periodically, it’s been all quiet for regression reports.

The next step is to file a bug to run an experiment on Beta to see how this work impacts page load time.

Enable the separate Activity Stream content process by default (Stalled by Mike Conley)

This work is temporarily stalled while I work on other things, so there’s not too much to report here.

Grab bag of notable performance work

Makoto Kato got rid of some more sync IPC
Gijs Kruitbosch made it so that we don’t preload about:newtab’s for browser windows that have just been restored
Doug Thayer made it so that we paint less when restoring sessions with pinned tabs

Check out that commit date – 2003! ↩
The idea here being that we can then continue to composite scrolling and video at 60fps, but main thread paints will only be updated at 30fps ↩

Firefox Front-End Performance Update #12

Well, here I am again – apologizing about a late update. Lots of stuff has been going on performance-wise in the Firefox code-base, and I’ll just be covering a small section of it here.

You might also notice that I changed the title of the blog series from “Firefox Performance Update” to “Firefox Front-end Performance Update”, to reflect that the things the Firefox Front-end Performance team is doing to keep Firefox speedy (though I’ll still add a grab-bag of other performance related work at the end).

So what are we waiting for? What’s been going on?

Migrate consumers to the new Places Observer system (Paused by Doug Thayer)

Doug was working on this later in 2018, and successfully ported a good chunk of our bookmarks code to use the new batched Places Observer system. There’s still a long-tail of other call sites that need to be updated to the new system, but Doug has shifted focus from this to other things in the meantime.

Document Splitting (In-Progress by Doug Thayer)

With WebRender becoming an ever-closer reality to our general user population, Doug has been focusing on “Document Splitting”, which makes WebRender more efficient by splitting updates that occur in the browser UI from updates that occur in the content area.

This has been a pretty long-haul task, but Doug has been plugging away, and landed a significant chunk of the infrastructure for this. At this time, Doug is working with kats to make Document Splitting integrate nicely with Async-Pan-Zooming (APZ).

The current plan is for Document Splitting to land disabled by default, since it’s blocked by parent-process retained display lists (which still have a few bugs to shake out).

Warm-up Service (In-Progress by Doug Thayer)

Doug is investigating the practicalities of having a service run during Windows start-up to preload various files that Firefox will need when started.

Doug’s prototype shows that this can save us something like 1 second of net start-up time, at least on the reference hardware.

We’re still researching this at multiple levels, and haven’t yet determined if this is a thing that we’d eventually want to ship. Stay tuned.

Smoother Tab Animations (In-Progress by Felipe Gomes)

After much ado, simplification, and review back-and-forth, the initial set of new tab animations have landed in Nightly. You can enable them by setting browser.tabs.newanimations to true in about:config and then restarting the browser. These new animations run entirely on the compositor, instead of painting at each refresh driver tick, so they should be smoother than the current animations that we ship.

There are still some cases that need new animations, and Felipe is waiting on UX for those.

Overhauling about:performance (V1 Completed by Florian Quèze)

The new about:performance shipped late last year, and now shows both energy as well as memory usage of your tabs and add-ons.

The current iteration allows you to close the tabs that are hogging your resources. Current plans should allow users to pause JavaScript execution in busy background tabs as well.

Browser Adjustment Project (In-Progress by Gijs Kruitbosch)

Gijs has landed some patches in Nightly (which have recently uplifted to Beta, and are only enabled on early Betas), which lowers the default frame rate of Firefox from 60fps to 30fps on devices that are considered “low-end”¹.

This has been on Nightly for a while, but as our Nightly population tends to skew to more powerful hardware, we expect not a lot of users have experienced the impact there.

At least one user has noticed the lowered frame rate on Beta, and this has highlighted that our CPU sampling code doesn’t take dynamic changes to clock speed into account.

While the lowered frame rate seemed to have a positive impact on page load time in the lab on our “low-end” reference hardware, we’re having a much harder time measuring any appreciable improvement in CI. We have scheduled an experiment to see if improvements are detectable via our Telemetry system on Beta.

We need to be prepared that this particular adjustment will either not have the desired page load improvement, or will result in a poorer quality of experience that is not worth any page load improvement. If that’s the case, we still have a few ideas to try, including:

Lowering the refresh driver tick, rather than the global frame rate. This would mean things like scrolling and videos would still render at 60fps, but painting the UI and web content would occur at a lower frequency.
Use the hardware vsync again (switching to 30fps turns hardware vsync off), but just paint every other time. This is to test whether or not software vsync results in worse page load times than hardware vsync.

Avoiding spurious about:blank loads in the parent process (Completed by Gijs Kruitbosch)

Gijs short-circuited a bunch of places where we were needlessly creating about:blank documents that we were just going to throw away (see this bug and dependencies). There are still a long tail of cases where we still do this in some cases, but they’re not the common cases, and we’ve decided to apply effort for other initiatives in the meantime.

Experiments with the Process Priority Manager (In-Progress by Mike Conley)

This was originally Doug Thayer’s project, but I’ve taken it on while Doug focuses on the epic mountain that is WebRender Document Splitting.

If you recall, the goal of this project is to lower the process priority for tabs that are only sitting in the background. This means that if you have tabs in the background that are attempting to use system resources (running JavaScript for example), those tabs will have less priority at the operating system level than tabs that are in the foreground. This should make it harder for background tabs to cause foreground tabs to be starved of processing resources.

After clearing a few final blockers, we enabled the Process Priority Manager by default last week. We also filed a bug to keep background tabs at a higher priority if they’re playing audio and video, and the fix for that just landed in Nightly today.

So if you’re on Windows on Nightly, and you’re curious about this, you can observe the behaviour by opening up the Windows Task Manager, switching to the “Details” tab, and watching the “Base priority” reading on your firefox.exe processes as you switch tabs.

Cheaper tabs in titlebar (Completed by Mike Conley)

After an epic round of review (thanks, Dao!), the patches to move our tabs-in-titlebar logic out of JS and into CSS landed late last year.

Along with simplifying our code, and hammering out at least one pretty nasty layout bug, this also had the benefit of reducing the number of synchronous reflows caused when opening new windows to zero.

This project is done!

Enable the separate Activity Stream content process by default (In-Progress by Mike Conley

There’s one known bug remaining that’s preventing us from letting the privileged content process from being enabled by default.

Thankfully, the cause is understood, and a fix is being worked on. Unfortunately, this is one of those bugs where the proper solution involves refactoring a bit of old crufty stuff, so it’s taking longer than I’d like.

Still, if all goes well, this bug should be closed out soon, and we can see about letting the privileged content process ride the trains.

Grab bag of notable performance work

This is an informal list of things that I’ve seen land in the tree lately that I believe will have a positive performance impact for our users. Have you seen something that you’d like to nominate for a future list? Submit the bug here!

Also, keep in mind that some of these landed months ago and already shipped to release. That’s what I get for taking so long to write a blog post.

Back in October, Gabriel Luong landed a set of patches that greatly improved the time to open DevTools panels. Web developers rejoice! Check out these public dashboards for more information on DevTools performance.
Sotaro Ikeda made it so that, with WebRender enabled, we do less frame invalidation with slow animations. This is great for energy usage!
Joel Maher made it so that Talos jobs can be retriggered from Treeherder so that they produce profiles! This means no re-pushing if you need to diagnose a Talos regression. That’s great news!
Gabriele Svelto made it so that the minidump analyzer, which runs after a crash has occurred to produce stack dumps for crash pings, runs with much lower process priority. This means that the CPU is being hogged less after a crash, when you’re likely to want to get right back to what you were doing.
Paolo Amadini got rid of a whole layer of DOM between the primary UI and the <browser> elements for each tab, resulting in some nice perf wins! A pleasant side-effect from the ongoing de-XBL work!
Mathieu Leplatre moved a bunch of client-side off of the main-thread to a Worker. This should help keep the main thread free and clear to process user events and keep the browser responsive.
Dan Holbert made layout calculations faster in certain situations, which resulted in some nice wins!
Jed Davis made it so that we preallocate the next content process off of the main thread. This is great for responsiveness, and helps set the stage for more off-main-thread process launchings.
Makoto Kato made spellchecking in web content asynchronous! This means we get to drop another sync IPC message, which is good for everybody.
Olli Pettay made it so that we don’t tick the refresh driver for a page as often when we’re still loading it. That should have a nice impact on page load performance!
Randall Jesup made it so that setTimeout’s run with very low priority during page load. This also should have a nice impact on page load performance!

For now, “low-end” means a machine with 2 or fewer cores, and a clock speed of 1.8Ghz or slower ↩

Firefox Performance Update #10

Hey folks – another Performance Update coming at you! It’s been a few weeks since I posted one of these, mostly due to travel, holidays and the Mozilla SF All-Hands. However, we certainly haven’t been idle during that time. Much work has been done Performance-wise, and there’s a lot to tell. So strap in! But first…

This Performance Update is brought to you by: promiseDocumentFlushed

promiseDocumentFlushed is a utility that’s available for browser engineers in chrome documents on the window global. The goal of promiseDocumentFlushed is to help avoid synchronous layout flushes in our JavaScript code by scheduling work to only occur after the next “natural” layout flush occurs¹.

promiseDocumentFlushed takes a function and returns a Promise. The function it takes will run the next time a natural layout flush and paint has finished occurring. At this point, the DOM should not be “dirty”, and size and position queries should be very cheap to calculate. It is critically important for the callback to not modify the DOM. I’ve filed bugs to make modifying the DOM inside that callback enter some kind of failure state, but it hasn’t been resolved yet.

The return value of the callback is what promiseDocumentFlushed’s returned Promise resolves with. Once the Promise resolves, it is then safe to modify the DOM.

This mechanism means that if, for some reason, you need to gather information about the size or position of things in the DOM, you can do it without forcing a synchronous layout flush – however, a paint will occur before that information is given to you. So be on the look-out for flicker, since that’s the trade-off here.

And now, here’s a list of the projects that the team has been working on lately:

ClientStorage (In-Progress by Doug Thayer)

The ClientStorage project should allow Firefox to communicate with the GPU more efficiently on macOS, which should hopefully reduce jank on the compositor thread². This is right on the verge of landing³, and we’re very excited to see how this impacts our macOS users!

Init WindowsJumpLists off-main-thread (Completed by Doug Thayer)

The JumpList is a Windows-only feature – essentially an application-specific context menu that opens when you right-click on the application in the task bar. Adding entries to this context menu involves talking to Windows, and unfortunately, the way we were originally doing this involved writing to the disk on the main thread. Thankfully, the API is thread-safe, so Doug was able to move the operation onto a background thread. This is good, because arewesmoothyet was reporting the Windows JumpList code as one of the primary causes of main-thread hangs caused by our front-end code.

Reduce painting while scrolling panels on macOS (Completed by Doug Thayer)

Matt Woodrow noticed that the recently added All Tabs list was performing quite poorly when scrolling it on macOS. After turning on paint-flashing for our browser UI, he noticed that we were re-painting the entire menu every time it scrolled. After some investigation, Matt realized that this was because our Graphics code was skipping some optimizations due to the rounded corners of the panels on macOS. We briefly considered removing the rounded corners on macOS, but then Doug found a more general fix, and now we only re-paint the minimum necessary to scroll the menu, and it’s much smoother!

Make the RemotePageManager lazy (In-Progress by Felipe Gomes)

The RemotePageManager is the way that the parent process communicates with a whitelist of privileged about: pages running in the content process. The RemotePageManager hooks itself in pretty early in a content process’s lifetime, but it’s really only necessary if and when one of those whitelisted about: pages loads. Felipe is working on using some of our new lazy script machinery to load RemotePageManager at the very last moment.

Overhauling about:performance (In-Progress by Florian Quèze)

Florian is working on improving about:performance, with the hopes of making it more useful for browser engineers and users for diagnosing performance problems in Firefox. Here’s a screenshot of what he has so far:

A screenshot of the nascent about:performance showing how much CPU tabs are consuming.

Apparently, mining cryptocurrency takes a lot of CPU!

Thanks to the work of Tarek Ziade, we now have a reliable mechanism for getting information on which tabs are consuming CPU cycles. For example, in the above screenshot, we can see that the coinhive tab that Firefox has open is consuming a bunch of CPU in some workers (mining cryptocurrency). Florian has also been clearing out some of the older code that was supporting about:performance, including the subprocess memory table. This table was useful for our browser engineers when developing and tuning the multi-process project, but we think we can replace it now with something more actionable and relevant to our users. In the meantime, since gathering the memory data causes jank on the main thread, he’s removed the table and the supporting infrastructure. The about:performance work hasn’t landed in the tree yet, but Florian is aiming to get it reviewed and landed (preffed off) soon.

Browser Adjustment Project (In-Progress by Gijs Kruitbosch)

This is a research project to find ways that Firefox can classify the hardware it’s running on, which should make it easier for the browser to make informed decisions on how to deal with things like CPU scheduling, thread and process priority, graphics and UI optimizations, and memory reclamation strategies. This project is still in its early days, but Gijs has already identified prior art and research that we can build upon, and is looking at lightweight ways we can assign grades to a user’s CPU, disk, and graphics hardware. Then the plan is to try hooking that up to the toolkit.cosmeticAnimations pref, to test disabling those animations on weaker hardware. He’s also exploring ways in which the user can override these measurements in the event that they want to bypass the defaults that we choose for each environment.

Avoiding spurious about:blank loads in the parent process (In-Progress by Gijs Kruitbosch)

When we open new browser windows, the initial browser tab inside them runs in the parent process and loads about:blank. Soon after, we do a process flip to load a page in the content process. However, that initial about:blank still has cost, and we think we can avoid it. There’s a test failure that Gijs is grappling with, but after much thorough detective work deep in the complex ball of code that supports our window opening infrastructure, he’s figured out a path forward. We expect this project to be wrapped up soon, which should hopefully make window opening cheaper and also produce less flicker.

Load Activity Stream scripts from ScriptPreloader (Completed by Jay Lim)

Jay has recently made it possible for Activity Stream to load its start-up scripts from the ScriptPreloader. From his local measurements on his MBP, this saves a sizeable chunk of time (around 20-30ms if I recall) on the time to load and render Activity Stream! This optimization is not available, however, unless the separate Activity Stream content process is enabled.

Enable the separate Activity Stream content process by default (In-Progress by Jay Lim)

This project not only ensures that Activity Stream content activity doesn’t impact other tabs (and vice versa), but also allows Firefox to take advantage of the ScriptPreloader to load Activity Stream faster. This does, however, mean an extra process flip when moving from about:home, about:newtab or about:welcome to a new page and back again. Because of this, Jay is having to modify some of our tests to accommodate that, as well as part of our Session Restore code to avoid unnecessary loading indicators when moving between processes.

Defer calculating Activity Stream state until idle (In-Progress by Jay Lim)

When Firefox starts up, one of the first things it prepares to do is show you the Activity Stream page, since that’s the default home and new tab page. Jay thinks we might be able to save the state of Activity Stream at shutdown, and load it again quickly during startup within the content process, and then defer the calculations necessary to produce a more recent state until after the parent process has become idle. We’re unsure yet what this will buy us in terms of start-up speed, but Jay is hacking together a prototype to see. I’m eager to find out!

Grab bag of Notable Performance Work

Luca Greco landed all of the infrastructure to move the WebExtension storage.local backend from a file in the profile directory to indexedDB. This should particularly help the performance of the browser when WebExtensions write small changes to large storage structures, since historically this would cause the entire JSON object for the structure to be recomputed and flushed to disk. This should also help with memory consumption. The infrastructure is disabled by default, and once this bug is fixed, it will be switched on.
Doug Thayer made our layerization logic smarter for pages that historically created many, many layers. This resulted in a nice win on our MotionMark score, and one user reported that it improved power usage as well.
Mark Banner made it so that moving many bookmarks in bulk isn’t nearly as expensive to complete. This dropped the cost of dropping 300 bookmarks with async transactions from ~2s to ~400ms!
Kartikaya Gupta made it so that users of the Gecko Profiler can use <pid>:<thread filter> in the thread filter input to gather samples of particular subprocesses. This will be very handy as we scale up the number of content processes!
Hiroyuki Ikezoe made it so that we more often throttle computations for transform animations for out-of-view elements.
Gijs Kruitbosch made it so that our DevTools don’t cause synchronous layout flushes when resizing the Inspector pane.
Kris Maglione made it so that we more lazily load PluginContent.jsm, which should result in a content process start-up and memory win.
Anny Gakhokize made it so that instead of sending 8 synchronous IPC messages to retrieve supported clipboard data types, we only send 1 with all of the necessary information.
Marco Bonardo fixed a very important Places regression, where an entire table was being recalculated when deleting certain records.
Dave Townsend fixed an issue where we were requesting the favicon for new pages twice instead of once. This resulted in a 2%-3% win on our internal session restoration bench on 64-bit Linux!
PSPDFKit noted that Firefox is absolutely crushing it at WebAssembly performance.
Andrew Swan enabled the delayed background page start-up optimization for WebExtensions by default, and it should ride out in the Firefox 63 release!
Blake Kaplan got rid of the PBrowser::Msg_GetTabCount synchronous IPC message!
The Graphics team has enabled WebRender by default for a subset of our Nightly population to test it. If you’re in that group, please file bugs if you see them! Check about:studies to see if you’re in the testing group.

Thank you Jay Lim!

As I draw this update to a close, I want to give a shout-out to my intern and colleague Jay Lim, whose internship is ending in a few short days. Jay took to performance work like a duck in water, and his energy, ideas and work were greatly appreciated! Thank you so much, Jay!

By “natural”, I mean a layout flush triggered by the refresh driver, and not by some JavaScript requesting size or position information on a dirty DOM ↩
And when it comes to smoothness and responsiveness, jank on the compositor thread is deadly ↩
it landed and bounced once due to a crash test failure, but Doug has just gotten a fix for it approved ↩

Firefox Performance Update #9

Hello, Internet! Here we are with yet another Firefox Performance Update for your consumption. Hold onto your hats – we’re going in!

But first a word from our sponsor: ScriptPreloader!

A lot of the Firefox front-end is written using JavaScript. With the possible exception of system add-ons that update outside of the normal release cycle, these scripts tend to be the same until you update.

About a year ago, Mozilla developer Kris Maglione had an idea: let’s try to optimize browser start time by noticing which scripts are being loaded during start-up, and then converting those scripts into a binary representation¹ that we can cache on disk. That way, next time we start up, we can just grab the cached binaries off of the disk, skip the parsing step and start executing the JavaScript right away.

Long-time Mozillians might know that we already do some aggressive caching to improve start time for things like XUL, XBL, manifests and other things that are read at start-up. I think we actually were already caching JavaScript files too – but I don’t think we were storing them pre-parsed. And the old caching stuff was definitely not caching scripts that were loading in content processes (since content processes didn’t exist when the old caching stuff was designed).

At any rate, my understanding is that the ScriptPreloader pays attention to script loads between main process start and the point where the first browser window fires the “browser-delayed-startup-finished” observer notification (after the window paints and does post-painting script loading). At that point, the ScriptPreloader examines the list of scripts that the parent and content processes have loaded, and² writes their pre-parsed bytecode representation to disk.

After that cache is written, the next time the main process or content processes start up, the cache is checked for the binary data. If it exists, this means that we can skip the parsing step. The ScriptPreloader goes one step further and starts to “decode”³ that binary format off of the main thread, even before those scripts are requested. Then, when the scripts are finally requested, they’re very much ready to execute right away.

When the ScriptPreloader landed, we saw some really nice wins in our start-up performance!

I’m now working on a series of patches in this bug that will widen the window of time where we note scripts that we can cache. This will hopefully improve the speed of privileged scripts that run up until the idle point of the first browser window.

And now for some Performance Project updates!

Early first blank paint (lead by Florian Quèze)

User Research has hired a contractor to perform a study to validate our hypothesis that the early first blank paint perceived performance optimization will make Firefox seem like it’s starting faster. More data to come out of that soon!

Faster content process start-up time (lead by Felipe Gomes)

The patches that Felipe wrote a few weeks back have landed and have had a positive impact! The proof is in the pudding – let’s look at some graphs:

The cpstartup impact. Those two clusters are test runs “before” and “after” Felipe’s patches landed, respectively.

The above graph shows a nice drop in the cpstartup Talos test. The cpstartup test measures the time it takes to boot up the content process and have it be ready to show you web pages.

This is a screen capture of a Base Content JS improvement in the AreWeSlimYet test. This graph measures the amount of memory that content processes consume via JavaScript not long after starting up.

In the graph above, we can see that the patches also helped reduce the memory that content processes use by default, by making more scripts only load when they’re needed.

It’s always nice to see our work have an impact in our graphs. Great work, Felipe! Keep it up!

LRU cache for tab layers (lead by Doug Thayer)

The patch to introduce the LRU cache landed last week, and was enabled for a few days so we could collect some data on its performance impact.

The good news is that it appears that this has had a significant and positive impact on tab switch times – tab switch times went down, and the number of Nightly instances reporting tab switch spinners went down by about 10%. Great work, Doug!

A number of bugs were filed against the original bug due to some glitchy edge-cases that we don’t handle well just yet.

We also detected a ~8% resident memory regression in our automated testing suites. This was expected (keeping layers around isn’t free!) and gave us a sense of how much memory we might consume were we to enable this by default.

The experiment is concluded for now, and we’re going to disable the cache for a bit while we think about ways to improve the implementation.

ClientStorageTextureSource for macOS (lead by Doug Thayer)

This project should allow us to be more efficient when uploading layers to the compositor on macOS. Doug has solved the crashing issues he was getting in automation(yay!), and is now attempting to figure out some Talos regressions on the MotionMark test suite. Deeper profiling is likely required to untangle what’s happening there.

Swapping DataURLs for Blobs in Activity Stream (lead by Jay Lim)

Jay’s patch to swap out DataURLs for Blobs for Activity Stream images has passed a first round of review from Mardak! He’s now waiting for a second review from k88hudson, and then hopefully this can land and give us a bit of a memory win. Having done some analysis, we expect this buy back quite a bit of memory that was being contained within those long DataURL strings.

Caching Activity Stream JS in the JS Bytecode Cache (lead by Jay Lim)

After examining the JavaScript Bytecode Cache that’s used for Web Content, Jay has determined that it’s really not the right mechanism for caching the Activity Steam scripts.

However, that ScriptPreloader that I was talking about earlier sounds like a much more reasonable candidate. Jay is now doing a deep dive on the ScriptPreloader to see whether or not the Activity Stream scripts are already being cached – and if not, why not.

Tab warming (lead by Mike Conley)

No news is good news here. Tab warming continues to ride and no new bugs have been filed. The work to reduce the number of paints when warming tabs has stalled a bit while I dealt with a rather strange cpstartup Talos regression. Ultimately, I think I can get rid of the second paint when warming by keeping background tabs display port suppressed⁴, and then only triggering the display port unsuppression after a tab switch. This will happily take advantage of a painting mechanism that Doug Thayer put in as part of the LRU cache experiment.

Firefox’s Most Wanted: Performance Wins (lead by YOU!)

Before we go into the grab-bag list of performance-related fixes – have you seen any patches landing that should positively impact Firefox’s performance? Let me know about it so I can include it in the list, and give appropriate shout-outs to all of the great work going on! That link again!

Grab-bag time

And now, without further ado, a list of performance work that took place in the tree:

(🌟 indicates a volunteer contributor)

Kris Maglione added helpers for generating QueryInterface functions on JS objects in native code to cut down on Native Code -> JS border crossings. Specifically, this should speed up situations where native code is calling QueryInterface on JS-implemented XPCOM components. This also means hand-rolling QueryInterface is no longer necessary, and is actively discouraged.
Jon Coppeard fixed a particularly bad Cycle Collector performance regression that was occurring when certain add-ons were installed. This fix was deemed important enough to ride along to the 60.0.1 builds (both release channel and ESR).
Gabriel Luong made it so that resizing the Inspector pane with the DevTools Toolbox open doesn’t cause layout flushes for every resize event. Instead, it throttles them via an idleCallback. This bug tracks the work to remove the layout flush entirely.
Ryan Hunt made it so that paint threads (see Off Main Thread Painting) no longer block Display List and Frame Layer building. This means we can get more done before offloading instructions to the paint threads, and this gives paint threads more time to finish up their work, increasing the probability that they’ll be free by the time the transaction to the paint thread needs to take place.
Felipe Gomes removed a bunch of unnecessary code for the old about:home that was still registering event listeners and consuming memory despite never being used anymore. This also shrunk our installer size down a bit, since it got rid of around 200kB worth of images and ~1600 lines of JS!
Marco Bonardo got rid of a sync layout flush that would occur when starting the browser or opening new windows with the Bookmarks Toolbar enabled.
Xidorn Quan identified a regression in Firefox 61+ where DOMSubtreeModified events were being nested and could result in infinite recursion on some sites, making the site freeze. The regressing changeset has been backed out in Firefox 61, and in Firefox 62+, DOMSubtreeModified and DOMAttrModified events will no longer fire when style attributes change.

Thanks, folks!

XDR, I think? ↩
My understanding breaks down here a little ↩
I assume that’s a type of de-serialization ↩
This is an optimization that we do that shrinks the painted area to just the region that’s visible to the browser. We normally paint a bit outside the viewable area so that it’s ready when a user starts scrolling ↩