Firefox Performance Update #9

Hello, Internet! Here we are with yet another Firefox Performance Update for your consumption. Hold onto your hats – we’re going in!

But first a word from our sponsor: ScriptPreloader!

A lot of the Firefox front-end is written using JavaScript. With the possible exception of system add-ons that update outside of the normal release cycle, these scripts tend to be the same until you update.

About a year ago, Mozilla developer Kris Maglione had an idea: let’s try to optimize browser start time by noticing which scripts are being loaded during start-up, and then converting those scripts into a binary representation¹ that we can cache on disk. That way, next time we start up, we can just grab the cached binaries off of the disk, skip the parsing step and start executing the JavaScript right away.

Long-time Mozillians might know that we already do some aggressive caching to improve start time for things like XUL, XBL, manifests and other things that are read at start-up. I think we actually were already caching JavaScript files too – but I don’t think we were storing them pre-parsed. And the old caching stuff was definitely not caching scripts that were loading in content processes (since content processes didn’t exist when the old caching stuff was designed).

At any rate, my understanding is that the ScriptPreloader pays attention to script loads between main process start and the point where the first browser window fires the “browser-delayed-startup-finished” observer notification (after the window paints and does post-painting script loading). At that point, the ScriptPreloader examines the list of scripts that the parent and content processes have loaded, and² writes their pre-parsed bytecode representation to disk.

After that cache is written, the next time the main process or content processes start up, the cache is checked for the binary data. If it exists, this means that we can skip the parsing step. The ScriptPreloader goes one step further and starts to “decode”³ that binary format off of the main thread, even before those scripts are requested. Then, when the scripts are finally requested, they’re very much ready to execute right away.

When the ScriptPreloader landed, we saw some really nice wins in our start-up performance!

I’m now working on a series of patches in this bug that will widen the window of time where we note scripts that we can cache. This will hopefully improve the speed of privileged scripts that run up until the idle point of the first browser window.

And now for some Performance Project updates!

Early first blank paint (lead by Florian Quèze)

User Research has hired a contractor to perform a study to validate our hypothesis that the early first blank paint perceived performance optimization will make Firefox seem like it’s starting faster. More data to come out of that soon!

Faster content process start-up time (lead by Felipe Gomes)

The patches that Felipe wrote a few weeks back have landed and have had a positive impact! The proof is in the pudding – let’s look at some graphs:

The cpstartup impact. Those two clusters are test runs “before” and “after” Felipe’s patches landed, respectively.

The above graph shows a nice drop in the cpstartup Talos test. The cpstartup test measures the time it takes to boot up the content process and have it be ready to show you web pages.

This is a screen capture of a Base Content JS improvement in the AreWeSlimYet test. This graph measures the amount of memory that content processes consume via JavaScript not long after starting up.

In the graph above, we can see that the patches also helped reduce the memory that content processes use by default, by making more scripts only load when they’re needed.

It’s always nice to see our work have an impact in our graphs. Great work, Felipe! Keep it up!

LRU cache for tab layers (lead by Doug Thayer)

The patch to introduce the LRU cache landed last week, and was enabled for a few days so we could collect some data on its performance impact.

The good news is that it appears that this has had a significant and positive impact on tab switch times – tab switch times went down, and the number of Nightly instances reporting tab switch spinners went down by about 10%. Great work, Doug!

A number of bugs were filed against the original bug due to some glitchy edge-cases that we don’t handle well just yet.

We also detected a ~8% resident memory regression in our automated testing suites. This was expected (keeping layers around isn’t free!) and gave us a sense of how much memory we might consume were we to enable this by default.

The experiment is concluded for now, and we’re going to disable the cache for a bit while we think about ways to improve the implementation.

ClientStorageTextureSource for macOS (lead by Doug Thayer)

This project should allow us to be more efficient when uploading layers to the compositor on macOS. Doug has solved the crashing issues he was getting in automation(yay!), and is now attempting to figure out some Talos regressions on the MotionMark test suite. Deeper profiling is likely required to untangle what’s happening there.

Swapping DataURLs for Blobs in Activity Stream (lead by Jay Lim)

Jay’s patch to swap out DataURLs for Blobs for Activity Stream images has passed a first round of review from Mardak! He’s now waiting for a second review from k88hudson, and then hopefully this can land and give us a bit of a memory win. Having done some analysis, we expect this buy back quite a bit of memory that was being contained within those long DataURL strings.

Caching Activity Stream JS in the JS Bytecode Cache (lead by Jay Lim)

After examining the JavaScript Bytecode Cache that’s used for Web Content, Jay has determined that it’s really not the right mechanism for caching the Activity Steam scripts.

However, that ScriptPreloader that I was talking about earlier sounds like a much more reasonable candidate. Jay is now doing a deep dive on the ScriptPreloader to see whether or not the Activity Stream scripts are already being cached – and if not, why not.

Tab warming (lead by Mike Conley)

No news is good news here. Tab warming continues to ride and no new bugs have been filed. The work to reduce the number of paints when warming tabs has stalled a bit while I dealt with a rather strange cpstartup Talos regression. Ultimately, I think I can get rid of the second paint when warming by keeping background tabs display port suppressed⁴, and then only triggering the display port unsuppression after a tab switch. This will happily take advantage of a painting mechanism that Doug Thayer put in as part of the LRU cache experiment.

Firefox’s Most Wanted: Performance Wins (lead by YOU!)

Before we go into the grab-bag list of performance-related fixes – have you seen any patches landing that should positively impact Firefox’s performance? Let me know about it so I can include it in the list, and give appropriate shout-outs to all of the great work going on! That link again!

Grab-bag time

And now, without further ado, a list of performance work that took place in the tree:

(🌟 indicates a volunteer contributor)

Kris Maglione added helpers for generating QueryInterface functions on JS objects in native code to cut down on Native Code -> JS border crossings. Specifically, this should speed up situations where native code is calling QueryInterface on JS-implemented XPCOM components. This also means hand-rolling QueryInterface is no longer necessary, and is actively discouraged.
Jon Coppeard fixed a particularly bad Cycle Collector performance regression that was occurring when certain add-ons were installed. This fix was deemed important enough to ride along to the 60.0.1 builds (both release channel and ESR).
Gabriel Luong made it so that resizing the Inspector pane with the DevTools Toolbox open doesn’t cause layout flushes for every resize event. Instead, it throttles them via an idleCallback. This bug tracks the work to remove the layout flush entirely.
Ryan Hunt made it so that paint threads (see Off Main Thread Painting) no longer block Display List and Frame Layer building. This means we can get more done before offloading instructions to the paint threads, and this gives paint threads more time to finish up their work, increasing the probability that they’ll be free by the time the transaction to the paint thread needs to take place.
Felipe Gomes removed a bunch of unnecessary code for the old about:home that was still registering event listeners and consuming memory despite never being used anymore. This also shrunk our installer size down a bit, since it got rid of around 200kB worth of images and ~1600 lines of JS!
Marco Bonardo got rid of a sync layout flush that would occur when starting the browser or opening new windows with the Bookmarks Toolbar enabled.
Xidorn Quan identified a regression in Firefox 61+ where DOMSubtreeModified events were being nested and could result in infinite recursion on some sites, making the site freeze. The regressing changeset has been backed out in Firefox 61, and in Firefox 62+, DOMSubtreeModified and DOMAttrModified events will no longer fire when style attributes change.

Thanks, folks!

XDR, I think? ↩
My understanding breaks down here a little ↩
I assume that’s a type of de-serialization ↩
This is an optimization that we do that shrinks the painted area to just the region that’s visible to the browser. We normally paint a bit outside the viewable area so that it’s ready when a user starts scrolling ↩

Firefox Performance Update #8

Howdy folks! Another Firefox Performance Update coming at you. Buckle up.

But first a word from our sponsor: Talos!

Talos is a framework that we use to measure various aspects of Firefox performance as part of our continuous integration pipeline.

There are a number of Talos “suites”, where each suite contains some number of tests. These tests, in turn, report some set of numbers that are then stored and graphable via our graph viewer here.

Here’s a full list of the Talos tests, including their purpose, the sorts of measurements they take, and who’s currently a good person to ask about them if you have questions.

A lot of work has been done to reduce the amount of noise in our Talos tests, but they’re still quite sensitive and noisy. This is why it’s often necessary to do 5-10 retriggers of Talos test runs in order to do meaningful comparisons.

Sometimes Talos detects regressions that aren’t actually real regressions¹, and that can be a pain. However, for the times where real regressions are caught, Talos usually lets us know much faster than Telemetry or user reports.

Did you know that you can get profiles from Try for Talos runs? This makes it much simpler to diagnose Talos regressions. Also, we now have Talos profiles being generated on our Nightly builds for added convenience!

And now for some Performance Project updates!

Early first blank paint (lead by Florian Quèze)

No new bugs have been filed against the feature yet from our beta population, and we are seeing an unsurprising drop in the time-to-first-paint probe on that channel. User Research is in the process of getting a (very!) quick study launched to verify our assumption that users will perceive the first blank paint as the browser having started more quickly.

Faster content process start-up time (lead by Felipe Gomes)

Felipe has some patches up for review to make our frame scripts as lazy as possible. To support that, he’s added some neat infrastructure using Proxy and Reflect to make it possible to create an object that can be registered as an event handler or observer, and only load the associated script when the events / observer notifications actually fire.

We’re excited to see how this work impacts our memory and content process start-up graphs!

LRU cache for tab layers (lead by Doug Thayer)

The patch to introduce the LRU cache landed and bounced a few times. There appears to be an invalidation bug with the approach that needs to be ironed out first. dthayer has a plan to address this (forcing re-paints when switching to a tab that’s already rendered in the background), and is just waiting for review.

ClientStorageTextureSource for macOS (lead by Doug Thayer)

Doug is working on finishing a project that should allow us to be more efficient when uploading things to the compositor on macOS (by handing memory over to the GPU rather than copying it). He’s currently dealing with strange crashes that he can only reproduce on Try. Somehow, Doug seems to always run into the weird bugs that only appear in automation, and the whole team is crossing our fingers for him on this one.

Swapping DataURLs for Blobs in Activity Stream (lead by Jay Lim)

Our new intern Jay Lim is diving right into performance work, and already has his first patch up. This patch makes it so that Activity Stream no longer uses DataURLs to serialize images down to the content process, and instead uses Blobs and Blob URLs. This should allow the underlying infrastructure to make better use of memory, as well as avoiding the cost of converting images to and from DataURLs.

Caching Activity Stream JS in the JS Bytecode Cache (lead by Jay Lim)

This project is still in the research phase. Jay is trying to determine if it’s possible to stash the parsed Activity Stream JS code in the JS bytecode cache that we normally use for webpages. We’re still evaluating how much this would save us on page load, and we’re also still evaluating the cost of modifying the underlying infrastructure to allow this. Stay tuned for updates.

AwesomeBar improvements (led by Gijs Kruitbosch)

Gijs has started this work by making it much cheaper to display long URLs in the AwesomeBar. This is particularly useful for DataURLs that might happen to be in your browsing history for some reason!

This is a long-pull effort, so expect this work to be spread out over a bunch of bugs.

Tab warming (lead by Mike Conley)

I’ve been focusing on determining why warming tabs seems to result in two consecutive paints. My findings are here, and I suspect that in the warming case, the second paint is avoidable. I suspect that this, coupled with dthayer’s work on ClientStorageTextureSource will greatly improve tab warming’s performance on macOS, and allow us to ship on that OS.

Firefox’s Most Wanted: Performance Wins (lead by YOU!)

Grab-bag time

And now, without further ado, a list of performance work that took place in the tree:

(🌟 indicates a volunteer contributor)

Emilio Cobos Álvarez made it so that we do less work dealing with fonts in Stylo code. This appears to impact YouTube pages, so that’s a nice win!
Marco Bonardo made it so that our Windows Jump Lists code no longer uses synchronous Places APIs to determine if browser history is empty.
Andrew Swan made it so that background pages for WebExtensions can have their loading deferred. It’s quite a clever optimization and has resulted in nice improvements on a number of our start-up performance benchmarks! Great work!
Andrew McCreight made it so that we can do less work when cycle collecting sometimes! This appears to fix some multi-second hangs that some of our users have been seeing on popular sites like Google Inbox.
Matt Woodrow made it so that we schedule paints in a more intelligent way, especially when painting is recovering from falling behind vsync.

Thanks, folks!

Sometimes, for example, the test is just measuring the wrong thing. ↩

Firefox Performance Update #7

G’day folks, just another Firefox Performance Update coming down the pike¹ for you, so strap in.

But first a word from our sponsor: health.graphics!

This performance update is brought to you by The Quantum Dashboards at health.graphics! The first step to changing something is to measure it over time, and once you have those measurements, it’s usually a good idea to find some kind of visual representation for that measurement so that you can track your progress.

Contrary to the domain name, health.graphics measures much more than just the health of our graphics layer. The dashboards at health.graphics show visualizations for a bunch of measurements that we care about – from crash rates, to platform feature state, to raw performance numbers, these dashboards help us make sure that we’re not back-sliding on things that truly matter to us and our users.

And now for some Performance Project updates!

Early first blank paint (lead by Florian Quèze)

Florian sent out an Intent to Ship for this perceived performance optimization for Firefox 61. The beta channel will transition to 61 in a bit over a week, and we’ll use that cycle to ensure that the feature should ship out to release.

Faster content process start-up time (lead by Felipe Gomes)

After some research and examination of how our content processes initialize themselves, the first few bugs have started to get filed to get fixed. This bug, for example, is for introducing infrastructure to make the privileged JavaScript loaded in the content processes more lazy. Another bug was filed to shine some light on the “dark matter” that exists at the start of many content processes in our profiler tools.

Get ContentPrefService init off of the main thread (lead by Doug Thayer)

After much heroic effort, this has landed! If you’re curious about the shutdown leak that was preventing this from landing earlier, here’s the patch that fixed it. Spoiler alert: it was the spellchecker, of all things.

This project is done, and will be removed from the updates from here forward.

Blocklist async-ification (lead by Gijs Kruitbosch)

As of a few days ago, all public blocklist API calls are asynchronous! This was a monumental effort from Gijs, and should result in faster start-up times for some of our users (especially ones with slower magnetic disks).

There are still some very minor internal mechanisms that can still cause the blocklist to be loaded synchronously, but hitting these should be super rare. In the meantime, now that the async-ification is complete, we have an eye towards migrating the back-end to indexedDB.

As the async-ification is wrapped, we’ll be removing this section from the updates from here forward.

LRU cache for tab layers (lead by Doug Thayer)

The patch to introduce the LRU cache have been written and are just waiting until the 62 cycle to begin on Nightly in order to land. No doubt there’ll be some interesting edge-cases to hammer out, but we’re very excited to see how this improves tab switching times for our users!

Tab warming (lead by Mike Conley)

After sending out an Intent to Ship, the prefs to allow Tab warming to ride to release on Windows and Linux were flipped. If all goes well on Beta, Windows and Linux Desktop users should see some nice tab switching performance improvements in Firefox 61!

While investigating the behaviour that’s preventing us from shipping tab warming on macOS, a new bug was filed to try to reduce the number of paints that are occurring during tab switches.

Firefox’s Most Wanted: Performance Wins (lead by YOU!)

Grab-bag time

And now, without further ado, a list of performance work that took place in the tree:

(🌟 indicates a volunteer contributor)

Daniel Holbert made it so that we’re more efficient when calculating the sizes of flex containers
Mike de Boer made it so that when you restore a session with several windows, the most recently used window appears first and stays on top during the restore – this means users have to spend less time getting back to what they were doing.
Kris Maglione flipped the pref that loads WebExtension background scripts in their own separate process for macOS! This was previously only supported on Windows. This is great for stability, security and performance! Kris also optimized some Add-on Manager code that runs on every start-up, which should gain us a nice start-up performance win
Marco Bonardo made it so that we don’t send unnecessary Download API updates when adding session history entries
Matt Woodrow got rid of an unneeded hash table and all associated look-ups in our retained displaylist code
Bas Schouten added some SIMD optimizations to our 2D geometry code. Bas says that, depending on the circumstances, this should buy us a 2x-4x performance boost for some frequently used geometric calculations.
Hiroyuki Ikezoe made it so that we now skip running calculations for animations that should not have changed yet
Mike Conley made it so that we can avoid some main-thread IO when initializing the Gecko Media Plug-ins backend.
Gijs Kruitbosch made it so that we do less synchronous layout flushing when resizing browser windows
Andrew McCreight fixed a rather nasty ghost window leak, which potentially means less cycle and garbage collecting for some users.

Thanks, folks!

I used to think it was pipe. It’s pike. ↩

Firefox Performance Update #6

Hi there folks, just another Firefox Performance update coming at you here.

These updates are going to shift format slightly. I’m going to start by highlighting the status of some of the projects the Firefox Performance Team (the front-end team working to make Firefox snappy AF), and then go into the grab-bag list of improvements that we’ve seen landing in the tree.

But first a word from our sponsor: arewesmoothyet.com!

This performance update is brought to you by arewesmoothyet.com! On Nightly versions of Firefox, a component called BackgroundHangReporter (or “BHR”) notices anytime the main-threads hang too long, and then collect a stack to send via Telemetry. We’ve been doing this for years, but we’ve never really had a great way of visualizing or making use of the data¹. Enter arewesmoothyet.com by Doug Thayer! Initially a fork of perf.html, awsy.com lets us see graphs of hangs on Nightly broken down by category², and then also lets us explore the individual stacks that have come in using a perf.html-like interface! (You might need to be patient on that last link – it’s a lot of data to download).

Hot damn! Note the finer-grain categories showing up on April 1st.

Early first blank paint (lead by Florian Quèze)

This is a start-up perceived performance project where early in the executable life-cycle, long before we’ve figured out how to layout and paint the browser UI, we show a white “blank” area on screen that is overtaken with the UI once it’s ready. The idea here is to avoid having the user stare at nothing after clicking on the Firefox icon. We’ll also naturally be working to reduce the amount of time that the blank window appears for users, but our research shows users feel like the browser starts-up faster when we show something rather than nothing. Even if that nothing is… well, mostly nothing. Florian recently landed a Telemetry probe for this feature, made it so that we can avoid initting the GPU process for the blank window, and is in the midst of fixing an issue where the blank window appears for too long. We’re hoping to have this ready to ship enabled on some platforms (ideally Linux and Windows) in Firefox 61.

Faster content process start-up time (lead by Felipe Gomes)

Explorations are just beginning here. Felipe has been examining the scripts that are running for each tab on creation, and has a few ideas on how to both reduce their parsing overhead, as well as making them lazier to load. This project is mostly at the research stage. Expect concrete details on sub-projects and linked bugs soon!

Get ContentPrefService init off of the main thread (lead by Doug Thayer)

This is so, so close to being done. The patch is written and reviewed, but landing it is being stymied by a hard-to-reproduce locally but super-easy-to-reproduce-in-automation shutdown leak during test runs. Unfortunately, the last 10% sometimes takes 90% of the effort, and this looks like one of those cases.

Blocklist improvements (lead by Gijs Kruitbosch)

Gijs is continuing to make our blocklist asynchronous. Recently, he made the getAddonBlocklistEntry method of the API asynchronous, which is a big-deal for start-up, since it means we drop another place where the front-end has to wait for the blocklist to be ready! The getAddonBlocklistState method is next on the list.

As a fun exercise, you can follow the “true” value for the BLOCKLIST_SYNC_FILE_LOAD probe via this graph, and watch while Gijs buries it into the ground.

LRU cache for tab layers (lead by Doug Thayer)

Doug Thayer is following up on some research done a few years ago that suggests that we can make ~95% of our user’s tab switches feel instantaneous by implementing an LRU cache for the painted layers. This is a classic space-time trade-off, as the cache will necessarily consume memory in order to hold onto the layers. Research is currently underway here to see how we can continue to improve our tab switching performance without losing out on the memory advantage that we tend to have over other browsers.

Tab warming (lead by Mike Conley)

Tab warming has been enabled on Nightly for a few weeks, and besides one rather serious glitch that we’ve fixed, we’ve been pretty pleased with the result! There’s one issue on macOS that’s been mulled over, but at this point I’m starting to lean towards getting this shipped on at least Windows for the Firefox 61 release.

Firefox’s Most Wanted: Performance Wins (lead by YOU!)

Grab-bag time

And now, without further ado, a list of performance work that took place in the tree:

(🌟 indicates a volunteer contributor)

Rob Wood and Greg Tatum made it so that Nightly profiled Talos jobs now link directly to perf.html from Treeherder!
Gijs Kruitbosch made it so that we don’t send a sync IPC message every time you keypress with the Findbar open!
Kris Maglione made it so that we recycle the panel used by WebExtensions, which should make them cheaper to open
Kartikaya Gupta made it so that painting a TicketMaster page showing the seating diagram of a venue much, much faster
Florian Quèze made it so that we spend far less resources painting the progress bars for downloads
Ludovic Hirlimann got rid of the Presentation API system add-on, since it was running some code during start-up on Nightly, but wasn’t actually being used anywhere.

Thanks to all of you! Keep it coming!

Pro-tip: if you’re collecting data, consider figuring out how you want to visualize it first, and then make sure that visualization work actually happens. ↩
since April 1st, these categories have gotten a lot finer-grain ↩

Firefox Performance Update #5

And here we are with another Firefox Performance Update!

This performance update is brought to you by perf.html! perf.html is our web-based profile analysis tool. We use it to analyze profiles gathered with the Gecko Profiler Add-on which helps us figure out why Firefox is feeling slow or sluggish. It’s probably the most important performance tool in our toolbox.

Before we go into the list – have you seen any patches landing that should positively impact Firefox’s performance? Let me know about it so I can include it in the list, and give appropriate shout-outs to all of the great work going on! That link again!

And now, without further ado, our list of excellent performance work:

(🌟 indicates a volunteer contributor)

🌟 Oriol Brufau optimized the tabs WebExtension API for users that have many tabs open!
This happened a while back, but I thought it was worth mentioning: Andrea Marchesini made it so that profile deletion in about:profiles occurs off of the main thread
Steve Fink made our cycle-collector operations cheaper in some situations
Dão Gottwald got rid of a potential style flush when opening new windows
Kris Maglione made the PageAction WebExtension API much cheaper to use
Marco Bonardo got rid of some potential main-thread IO by getting rid of mimeTypes.rdf
Jonathan Kew made it take less CPU to scroll the newsfeed on Facebook on macOS
Mark Banner extended our asynchronous Places API, which should hopefully make it easier to move more Places operations off of the main thread
Hiroyuki Ikezoe made it so that we can throttle down more hidden animations, which should help reduce CPU usage on pages like Reddit (which, for some users, has a hidden animation running in a collapsed chat popup)
Florian Quèze got us some nice sessionrestore Talos benchmark wins on Linux after refactoring how the first browser window is opened
Rob Wood re-enabled Nightly Talos jobs on Windows. This should make it easier for us to hunt down the causes of Talos regressions.
Doug Thayer made our BHR data for tab switch hangs actually useful. He’s also making the BHR dashboard much easier and faster to use!
Ludovic Hirlimann combined some scripts to reduce the number of files that we have to load during start-up

Thanks to all of you! Keep it coming!