Author Archives: Mike

Firefox Performance Update #10

Hey folks – another Performance Update coming at you! It’s been a few weeks since I posted one of these, mostly due to travel, holidays and the Mozilla SF All-Hands. However, we certainly haven’t been idle during that time. Much work has been done Performance-wise, and there’s a lot to tell. So strap in! But first…

This Performance Update is brought to you by: promiseDocumentFlushed

promiseDocumentFlushed is a utility that’s available for browser engineers in chrome documents on the window global. The goal of promiseDocumentFlushed is to help avoid synchronous layout flushes in our JavaScript code by scheduling work to only occur after the next “natural” layout flush occurs[1].

promiseDocumentFlushed takes a function and returns a Promise. The function it takes will run the next time a natural layout flush and paint have finished occurring. At this point, the DOM should not be “dirty”, and size and position queries should be very cheap to calculate. It is critically important for the callback to not modify the DOM. I’ve filed bugs to make modifying the DOM inside that callback enter some kind of failure state, but they haven’t been resolved yet.

The return value of the callback is what promiseDocumentFlushed’s returned Promise resolves with. Once the Promise resolves, it is then safe to modify the DOM.

This mechanism means that if, for some reason, you need to gather information about the size or position of things in the DOM, you can do it without forcing a synchronous layout flush – however, a paint will occur before that information is given to you. So be on the look-out for flicker, since that’s the trade-off here.
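As a minimal sketch of the pattern described above, the helper below reads a size inside the promiseDocumentFlushed callback and only writes to the DOM once the returned Promise has resolved. The helper name `resizeToMatch` and its parameters are hypothetical; only promiseDocumentFlushed itself comes from the description above, and the window is passed in explicitly so the snippet makes no assumption about where it runs.

```javascript
// Hypothetical helper: make `target` as wide as `source` without forcing
// a synchronous layout flush. `win` is expected to be a chrome window
// that exposes promiseDocumentFlushed, as described above.
function resizeToMatch(win, source, target) {
  return win
    .promiseDocumentFlushed(() => {
      // Read-only work: layout should be clean here, so this query is
      // cheap. Do not modify the DOM inside this callback.
      return source.getBoundingClientRect().width;
    })
    .then(width => {
      // The Promise has resolved, so it is now safe to write to the DOM.
      target.style.width = `${width}px`;
      return width;
    });
}
```

The read and the write are kept in separate steps on purpose: the callback only gathers information, and the mutation waits until after the Promise resolves.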

And now, here’s a list of the projects that the team has been working on lately:

ClientStorage (In-Progress by Doug Thayer)

The ClientStorage project should allow Firefox to communicate with the GPU more efficiently on macOS, which should hopefully reduce jank on the compositor thread[2]. This is right on the verge of landing[3], and we’re very excited to see how this impacts our macOS users!

Init WindowsJumpLists off-main-thread (Completed by Doug Thayer)

The JumpList is a Windows-only feature – essentially an application-specific context menu that opens when you right-click on the application in the task bar. Adding entries to this context menu involves talking to Windows, and unfortunately, the way we were originally doing this involved writing to the disk on the main thread. Thankfully, the API is thread-safe, so Doug was able to move the operation onto a background thread. This is good, because arewesmoothyet was reporting the Windows JumpList code as one of the primary causes of main-thread hangs caused by our front-end code.

Reduce painting while scrolling panels on macOS (Completed by Doug Thayer)

Matt Woodrow noticed that the recently added All Tabs list was performing quite poorly when scrolling it on macOS. After turning on paint-flashing for our browser UI, he noticed that we were re-painting the entire menu every time it scrolled. After some investigation, Matt realized that this was because our Graphics code was skipping some optimizations due to the rounded corners of the panels on macOS. We briefly considered removing the rounded corners on macOS, but then Doug found a more general fix, and now we only re-paint the minimum necessary to scroll the menu, and it’s much smoother!

Make the RemotePageManager lazy (In-Progress by Felipe Gomes)

The RemotePageManager is the way that the parent process communicates with a whitelist of privileged about: pages running in the content process. The RemotePageManager hooks itself in pretty early in a content process’s lifetime, but it’s really only necessary if and when one of those whitelisted about: pages loads. Felipe is working on using some of our new lazy script machinery to load RemotePageManager at the very last moment.

Overhauling about:performance (In-Progress by Florian Quèze)

Florian is working on improving about:performance, with the hopes of making it more useful for browser engineers and users for diagnosing performance problems in Firefox. Here’s a screenshot of what he has so far:

A screenshot of the nascent about:performance showing how much CPU tabs are consuming.

Apparently, mining cryptocurrency takes a lot of CPU!

Thanks to the work of Tarek Ziade, we now have a reliable mechanism for getting information on which tabs are consuming CPU cycles. For example, in the above screenshot, we can see that the coinhive tab that Firefox has open is consuming a bunch of CPU in some workers (mining cryptocurrency). Florian has also been clearing out some of the older code that was supporting about:performance, including the subprocess memory table. This table was useful for our browser engineers when developing and tuning the multi-process project, but we think we can replace it now with something more actionable and relevant to our users. In the meantime, since gathering the memory data causes jank on the main thread, he’s removed the table and the supporting infrastructure. The about:performance work hasn’t landed in the tree yet, but Florian is aiming to get it reviewed and landed (preffed off) soon.

Browser Adjustment Project (In-Progress by Gijs Kruitbosch)

This is a research project to find ways that Firefox can classify the hardware it’s running on, which should make it easier for the browser to make informed decisions on how to deal with things like CPU scheduling, thread and process priority, graphics and UI optimizations, and memory reclamation strategies. This project is still in its early days, but Gijs has already identified prior art and research that we can build upon, and is looking at lightweight ways we can assign grades to a user’s CPU, disk, and graphics hardware. Then the plan is to try hooking that up to the toolkit.cosmeticAnimations pref, to test disabling those animations on weaker hardware. He’s also exploring ways in which the user can override these measurements in the event that they want to bypass the defaults that we choose for each environment.

Avoiding spurious about:blank loads in the parent process (In-Progress by Gijs Kruitbosch)

When we open new browser windows, the initial browser tab inside them runs in the parent process and loads about:blank. Soon after, we do a process flip to load a page in the content process. However, that initial about:blank still has cost, and we think we can avoid it. There’s a test failure that Gijs is grappling with, but after much thorough detective work deep in the complex ball of code that supports our window opening infrastructure, he’s figured out a path forward. We expect this project to be wrapped up soon, which should hopefully make window opening cheaper and also produce less flicker.

Load Activity Stream scripts from ScriptPreloader (Completed by Jay Lim)

Jay has recently made it possible for Activity Stream to load its start-up scripts from the ScriptPreloader. From his local measurements on his MBP, this saves a sizeable chunk of time (around 20-30ms if I recall) on the time to load and render Activity Stream! This optimization is not available, however, unless the separate Activity Stream content process is enabled.

Enable the separate Activity Stream content process by default (In-Progress by Jay Lim)

This project not only ensures that Activity Stream content activity doesn’t impact other tabs (and vice versa), but also allows Firefox to take advantage of the ScriptPreloader to load Activity Stream faster. This does, however, mean an extra process flip when moving from about:home, about:newtab or about:welcome to a new page and back again. Because of this, Jay is having to modify some of our tests to accommodate that, as well as part of our Session Restore code to avoid unnecessary loading indicators when moving between processes.

Defer calculating Activity Stream state until idle (In-Progress by Jay Lim)

When Firefox starts up, one of the first things it prepares to do is show you the Activity Stream page, since that’s the default home and new tab page. Jay thinks we might be able to save the state of Activity Stream at shutdown, and load it again quickly during startup within the content process, and then defer the calculations necessary to produce a more recent state until after the parent process has become idle. We’re unsure yet what this will buy us in terms of start-up speed, but Jay is hacking together a prototype to see. I’m eager to find out!

Grab bag of Notable Performance Work

Thank you Jay Lim!

As I draw this update to a close, I want to give a shout-out to my intern and colleague Jay Lim, whose internship is ending in a few short days. Jay took to performance work like a duck to water, and his energy, ideas and work were greatly appreciated! Thank you so much, Jay!

  1. By “natural”, I mean a layout flush triggered by the refresh driver, and not by some JavaScript requesting size or position information on a dirty DOM 

  2. And when it comes to smoothness and responsiveness, jank on the compositor thread is deadly 

  3. It landed and bounced once due to a crash test failure, but Doug has just gotten a fix for it approved 

Firefox Performance Update #9

Hello, Internet! Here we are with yet another Firefox Performance Update for your consumption. Hold onto your hats – we’re going in!

But first a word from our sponsor: ScriptPreloader!

A lot of the Firefox front-end is written using JavaScript. With the possible exception of system add-ons that update outside of the normal release cycle, these scripts tend to be the same until you update.

About a year ago, Mozilla developer Kris Maglione had an idea: let’s try to optimize browser start time by noticing which scripts are being loaded during start-up, and then converting those scripts into a binary representation[1] that we can cache on disk. That way, next time we start up, we can just grab the cached binaries off of the disk, skip the parsing step and start executing the JavaScript right away.

Long-time Mozillians might know that we already do some aggressive caching to improve start time for things like XUL, XBL, manifests and other things that are read at start-up. I think we actually were already caching JavaScript files too – but I don’t think we were storing them pre-parsed. And the old caching stuff was definitely not caching scripts that were loading in content processes (since content processes didn’t exist when the old caching stuff was designed).

At any rate, my understanding is that the ScriptPreloader pays attention to script loads between main process start and the point where the first browser window fires the “browser-delayed-startup-finished” observer notification (after the window paints and does post-painting script loading). At that point, the ScriptPreloader examines the list of scripts that the parent and content processes have loaded, and[2] writes their pre-parsed bytecode representation to disk.

After that cache is written, the next time the main process or content processes start up, the cache is checked for the binary data. If it exists, this means that we can skip the parsing step. The ScriptPreloader goes one step further and starts to “decode”[3] that binary format off of the main thread, even before those scripts are requested. Then, when the scripts are finally requested, they’re very much ready to execute right away.
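The flow described above can be modelled roughly as follows. This is a toy sketch and not the ScriptPreloader’s actual implementation: the class name, the method names and the injected `compile` function are all hypothetical, and an in-memory Map stands in for the on-disk cache.

```javascript
// Toy model of a start-up script cache. All names are hypothetical.
class StartupScriptCache {
  // `compile` stands in for "parse the source into a cacheable form".
  // `stored` stands in for whatever the previous session wrote to disk.
  constructor(compile, stored = new Map()) {
    this.compile = compile;
    this.stored = stored;
    this.recording = true;
    this.seen = new Map();
  }

  // Load a script: reuse the cached form if there is one, otherwise
  // parse the source. Loads that happen during start-up are noted.
  load(url, source) {
    let compiled = this.stored.get(url);
    if (compiled === undefined) {
      compiled = this.compile(source);
    }
    if (this.recording) {
      this.seen.set(url, compiled);
    }
    return compiled;
  }

  // Called once start-up is considered finished: stop recording and
  // return the entries that would be written out for the next session.
  finishStartup() {
    this.recording = false;
    return new Map(this.seen);
  }
}
```

In this model a second session would be constructed with the Map returned by `finishStartup()`, so scripts seen during the previous start-up skip the `compile` step entirely.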

When the ScriptPreloader landed, we saw some really nice wins in our start-up performance!

I’m now working on a series of patches in this bug that will widen the window of time where we note scripts that we can cache. This will hopefully improve the speed of privileged scripts that run up until the idle point of the first browser window.

And now for some Performance Project updates!

Early first blank paint (led by Florian Quèze)

User Research has hired a contractor to perform a study to validate our hypothesis that the early first blank paint perceived performance optimization will make Firefox seem like it’s starting faster. More data to come out of that soon!

Faster content process start-up time (led by Felipe Gomes)

The patches that Felipe wrote a few weeks back have landed and have had a positive impact! The proof is in the pudding – let’s look at some graphs:

The cpstartup impact. Those two clusters are test runs “before” and “after” Felipe’s patches landed, respectively.

The above graph shows a nice drop in the cpstartup Talos test. The cpstartup test measures the time it takes to boot up the content process and have it be ready to show you web pages.

This is a screen capture of a Base Content JS improvement in the AreWeSlimYet test. This graph measures the amount of memory that content processes consume via JavaScript not long after starting up.

In the graph above, we can see that the patches also helped reduce the memory that content processes use by default, by making more scripts only load when they’re needed.

It’s always nice to see our work have an impact in our graphs. Great work, Felipe! Keep it up!

LRU cache for tab layers (led by Doug Thayer)

The patch to introduce the LRU cache landed last week, and was enabled for a few days so we could collect some data on its performance impact.

The good news is that it appears that this has had a significant and positive impact on tab switch times – tab switch times went down, and the number of Nightly instances reporting tab switch spinners went down by about 10%. Great work, Doug!

A number of bugs were filed against the original bug due to some glitchy edge-cases that we don’t handle well just yet.

We also detected a ~8% resident memory regression in our automated testing suites. This was expected (keeping layers around isn’t free!) and gave us a sense of how much memory we might consume were we to enable this by default.

The experiment is concluded for now, and we’re going to disable the cache for a bit while we think about ways to improve the implementation.

ClientStorageTextureSource for macOS (led by Doug Thayer)

This project should allow us to be more efficient when uploading layers to the compositor on macOS. Doug has solved the crashing issues he was getting in automation (yay!), and is now attempting to figure out some Talos regressions on the MotionMark test suite. Deeper profiling is likely required to untangle what’s happening there.

Swapping DataURLs for Blobs in Activity Stream (led by Jay Lim)

Jay’s patch to swap out DataURLs for Blobs for Activity Stream images has passed a first round of review from Mardak! He’s now waiting for a second review from k88hudson, and then hopefully this can land and give us a bit of a memory win. Having done some analysis, we expect this to buy back quite a bit of memory that was being contained within those long DataURL strings.

Caching Activity Stream JS in the JS Bytecode Cache (led by Jay Lim)

After examining the JavaScript Bytecode Cache that’s used for Web Content, Jay has determined that it’s really not the right mechanism for caching the Activity Stream scripts.

However, that ScriptPreloader that I was talking about earlier sounds like a much more reasonable candidate. Jay is now doing a deep dive on the ScriptPreloader to see whether or not the Activity Stream scripts are already being cached – and if not, why not.

Tab warming (led by Mike Conley)

No news is good news here. Tab warming continues to ride and no new bugs have been filed. The work to reduce the number of paints when warming tabs has stalled a bit while I dealt with a rather strange cpstartup Talos regression. Ultimately, I think I can get rid of the second paint when warming by keeping background tabs’ display ports suppressed[4], and then only triggering the display port unsuppression after a tab switch. This will happily take advantage of a painting mechanism that Doug Thayer put in as part of the LRU cache experiment.

Firefox’s Most Wanted: Performance Wins (led by YOU!)

Before we go into the grab-bag list of performance-related fixes – have you seen any patches landing that should positively impact Firefox’s performance? Let me know about it so I can include it in the list, and give appropriate shout-outs to all of the great work going on! That link again!

Grab-bag time

And now, without further ado, a list of performance work that took place in the tree:

(🌟 indicates a volunteer contributor)

Thanks, folks!

  1. XDR, I think? 

  2. My understanding breaks down here a little 

  3. I assume that’s a type of de-serialization 

  4. This is an optimization that we do that shrinks the painted area to just the region that’s visible to the browser. We normally paint a bit outside the viewable area so that it’s ready when a user starts scrolling 

Firefox Performance Update #8

Howdy folks! Another Firefox Performance Update coming at you. Buckle up.

But first a word from our sponsor: Talos!

Talos is a framework that we use to measure various aspects of Firefox performance as part of our continuous integration pipeline.

There are a number of Talos “suites”, where each suite contains some number of tests. These tests, in turn, report some set of numbers that are then stored and graphable via our graph viewer here.

Here’s a full list of the Talos tests, including their purpose, the sorts of measurements they take, and who’s currently a good person to ask about them if you have questions.

A lot of work has been done to reduce the amount of noise in our Talos tests, but they’re still quite sensitive and noisy. This is why it’s often necessary to do 5-10 retriggers of Talos test runs in order to do meaningful comparisons.
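To illustrate why several retriggers help, here is a small sketch that compares two sets of noisy measurements by their medians rather than by single runs. It is only an illustration: the function names are hypothetical, and this is not a description of how Talos or the graph viewer actually compares results.

```javascript
// Median of a list of numbers (does not modify the input array).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Percent change of the median from the base runs to the new runs.
// A negative value means the new runs were faster (lower numbers).
function percentChange(baseRuns, newRuns) {
  const base = median(baseRuns);
  return ((median(newRuns) - base) / base) * 100;
}
```

With only one run on each side, a single noisy outlier can look like a regression or an improvement; with 5-10 runs per side, the median is far less sensitive to that kind of outlier.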

Sometimes Talos detects regressions that aren’t actually real regressions[1], and that can be a pain. However, for the times where real regressions are caught, Talos usually lets us know much faster than Telemetry or user reports.

Did you know that you can get profiles from Try for Talos runs? This makes it much simpler to diagnose Talos regressions. Also, we now have Talos profiles being generated on our Nightly builds for added convenience!

And now for some Performance Project updates!

Early first blank paint (led by Florian Quèze)

No new bugs have been filed against the feature yet from our beta population, and we are seeing an unsurprising drop in the time-to-first-paint probe on that channel. User Research is in the process of getting a (very!) quick study launched to verify our assumption that users will perceive the first blank paint as the browser having started more quickly.

Faster content process start-up time (led by Felipe Gomes)

Felipe has some patches up for review to make our frame scripts as lazy as possible. To support that, he’s added some neat infrastructure using Proxy and Reflect to make it possible to create an object that can be registered as an event handler or observer, and only load the associated script when the events / observer notifications actually fire.
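The general idea can be sketched with a Proxy whose `get` trap loads the real handler on first use. This is a hedged illustration of the technique rather than Felipe’s actual infrastructure: `makeLazyHandler` and `loadHandler` are hypothetical names, and `loadHandler` simply stands in for loading the associated script.

```javascript
// Hypothetical sketch: wrap an expensive-to-load handler in a Proxy so
// that `loadHandler` only runs the first time a property is looked up,
// for example when an event or observer notification is dispatched.
function makeLazyHandler(loadHandler) {
  let real = null;
  return new Proxy({}, {
    get(_target, prop) {
      if (real === null) {
        real = loadHandler(); // stands in for loading the real script
      }
      const value = Reflect.get(real, prop);
      // Bind methods so `this` inside them is the real handler object.
      return typeof value === "function" ? value.bind(real) : value;
    },
  });
}
```

The proxy object can be handed out immediately, while the cost of loading is deferred until something actually looks up `handleEvent`, `observe` or any other property on it.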

We’re excited to see how this work impacts our memory and content process start-up graphs!

LRU cache for tab layers (led by Doug Thayer)

The patch to introduce the LRU cache landed and bounced a few times. There appears to be an invalidation bug with the approach that needs to be ironed out first. dthayer has a plan to address this (forcing re-paints when switching to a tab that’s already rendered in the background), and is just waiting for review.

ClientStorageTextureSource for macOS (led by Doug Thayer)

Doug is working on finishing a project that should allow us to be more efficient when uploading things to the compositor on macOS (by handing memory over to the GPU rather than copying it). He’s currently dealing with strange crashes that he can only reproduce on Try. Somehow, Doug seems to always run into the weird bugs that only appear in automation, and the whole team is crossing our fingers for him on this one.

Swapping DataURLs for Blobs in Activity Stream (led by Jay Lim)

Our new intern Jay Lim is diving right into performance work, and already has his first patch up. This patch makes it so that Activity Stream no longer uses DataURLs to serialize images down to the content process, and instead uses Blobs and Blob URLs. This should allow the underlying infrastructure to make better use of memory, as well as avoiding the cost of converting images to and from DataURLs.
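For a rough feel of the string overhead involved, the sketch below estimates how many characters a data URL needs for a given number of image bytes. It assumes the images are base64-encoded, which is the usual encoding for binary data URLs but is not stated above; the function name and the default MIME type are hypothetical.

```javascript
// Estimate the number of characters in a base64 data URL that carries
// `byteLength` bytes of image data. Assumes base64 encoding.
function dataUrlLength(byteLength, mimeType = "image/png") {
  const prefix = `data:${mimeType};base64,`;
  // Base64 emits 4 characters for every 3 input bytes, rounded up
  // to a whole group of 4 because of padding.
  const encoded = 4 * Math.ceil(byteLength / 3);
  return prefix.length + encoded;
}
```

Under that assumption the encoded payload is about a third larger than the raw bytes, before counting any cost of encoding and decoding it.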

Caching Activity Stream JS in the JS Bytecode Cache (led by Jay Lim)

This project is still in the research phase. Jay is trying to determine if it’s possible to stash the parsed Activity Stream JS code in the JS bytecode cache that we normally use for webpages. We’re still evaluating how much this would save us on page load, and we’re also still evaluating the cost of modifying the underlying infrastructure to allow this. Stay tuned for updates.

AwesomeBar improvements (led by Gijs Kruitbosch)

Gijs has started this work by making it much cheaper to display long URLs in the AwesomeBar. This is particularly useful for DataURLs that might happen to be in your browsing history for some reason!

This is a long-pull effort, so expect this work to be spread out over a bunch of bugs.

Tab warming (led by Mike Conley)

I’ve been focusing on determining why warming tabs seems to result in two consecutive paints. My findings are here, and I suspect that in the warming case, the second paint is avoidable. I suspect that this, coupled with dthayer’s work on ClientStorageTextureSource, will greatly improve tab warming’s performance on macOS, and allow us to ship on that OS.

Firefox’s Most Wanted: Performance Wins (led by YOU!)

Before we go into the grab-bag list of performance-related fixes – have you seen any patches landing that should positively impact Firefox’s performance? Let me know about it so I can include it in the list, and give appropriate shout-outs to all of the great work going on! That link again!

Grab-bag time

And now, without further ado, a list of performance work that took place in the tree:

(🌟 indicates a volunteer contributor)

Thanks, folks!

  1. Sometimes, for example, the test is just measuring the wrong thing. 

Firefox Performance Update #7

G’day folks, just another Firefox Performance Update coming down the pike[1] for you, so strap in.

But first a word from our sponsor:!

This performance update is brought to you by The Quantum Dashboards at! The first step to changing something is to measure it over time, and once you have those measurements, it’s usually a good idea to find some kind of visual representation for that measurement so that you can track your progress.

Contrary to the domain name, measures much more than just the health of our graphics layer. The dashboards at show visualizations for a bunch of measurements that we care about – from crash rates, to platform feature state, to raw performance numbers, these dashboards help us make sure that we’re not back-sliding on things that truly matter to us and our users.

And now for some Performance Project updates!

Early first blank paint (led by Florian Quèze)

Florian sent out an Intent to Ship for this perceived performance optimization for Firefox 61. The beta channel will transition to 61 in a bit over a week, and we’ll use that cycle to confirm that the feature is ready to ship to release.

Faster content process start-up time (led by Felipe Gomes)

After some research and examination of how our content processes initialize themselves, the first few bugs have been filed. This bug, for example, is for introducing infrastructure to make the privileged JavaScript loaded in the content processes more lazy. Another bug was filed to shine some light on the “dark matter” that exists at the start of many content processes in our profiler tools.

Get ContentPrefService init off of the main thread (led by Doug Thayer)

After much heroic effort, this has landed! If you’re curious about the shutdown leak that was preventing this from landing earlier, here’s the patch that fixed it. Spoiler alert: it was the spellchecker, of all things.

This project is done, and will be removed from the updates from here forward.

Blocklist async-ification (led by Gijs Kruitbosch)

As of a few days ago, all public blocklist API calls are asynchronous! This was a monumental effort from Gijs, and should result in faster start-up times for some of our users (especially ones with slower magnetic disks).

There are still some very minor internal mechanisms that can cause the blocklist to be loaded synchronously, but hitting these should be super rare. In the meantime, now that the async-ification is complete, we have an eye towards migrating the back-end to indexedDB.

As the async-ification is wrapped, we’ll be removing this section from the updates from here forward.

LRU cache for tab layers (led by Doug Thayer)

The patches to introduce the LRU cache have been written and are just waiting for the 62 cycle to begin on Nightly in order to land. No doubt there’ll be some interesting edge-cases to hammer out, but we’re very excited to see how this improves tab switching times for our users!

Tab warming (led by Mike Conley)

After sending out an Intent to Ship, the prefs to allow Tab warming to ride to release on Windows and Linux were flipped. If all goes well on Beta, Windows and Linux Desktop users should see some nice tab switching performance improvements in Firefox 61!

While investigating the behaviour that’s preventing us from shipping tab warming on macOS, a new bug was filed to try to reduce the number of paints that are occurring during tab switches.

Firefox’s Most Wanted: Performance Wins (led by YOU!)

Before we go into the grab-bag list of performance-related fixes – have you seen any patches landing that should positively impact Firefox’s performance? Let me know about it so I can include it in the list, and give appropriate shout-outs to all of the great work going on! That link again!

Grab-bag time

And now, without further ado, a list of performance work that took place in the tree:

(🌟 indicates a volunteer contributor)

Thanks, folks!

  1. I used to think it was pipe. It’s pike. 

Firefox Performance Update #6

Hi there folks, just another Firefox Performance update coming at you here.

These updates are going to shift format slightly. I’m going to start by highlighting the status of some of the projects the Firefox Performance Team (the front-end team working to make Firefox snappy AF) is working on, and then go into the grab-bag list of improvements that we’ve seen landing in the tree.

But first a word from our sponsor:!

This performance update is brought to you by! On Nightly versions of Firefox, a component called BackgroundHangReporter (or “BHR”) notices any time the main thread hangs for too long, and then collects a stack to send via Telemetry. We’ve been doing this for years, but we’ve never really had a great way of visualizing or making use of the data[1]. Enter by Doug Thayer! Initially a fork of perf.html, it lets us see graphs of hangs on Nightly broken down by category[2], and then also lets us explore the individual stacks that have come in using a perf.html-like interface! (You might need to be patient on that last link – it’s a lot of data to download).

Hot damn! Note the finer-grain categories showing up on April 1st.

Early first blank paint (led by Florian Quèze)

This is a start-up perceived performance project where early in the executable life-cycle, long before we’ve figured out how to lay out and paint the browser UI, we show a white “blank” area on screen that is overtaken with the UI once it’s ready. The idea here is to avoid having the user stare at nothing after clicking on the Firefox icon. We’ll also naturally be working to reduce the amount of time that the blank window appears for users, but our research shows users feel like the browser starts up faster when we show something rather than nothing. Even if that nothing is… well, mostly nothing. Florian recently landed a Telemetry probe for this feature, made it so that we can avoid initting the GPU process for the blank window, and is in the midst of fixing an issue where the blank window appears for too long. We’re hoping to have this ready to ship enabled on some platforms (ideally Linux and Windows) in Firefox 61.

Faster content process start-up time (led by Felipe Gomes)

Explorations are just beginning here. Felipe has been examining the scripts that are running for each tab on creation, and has a few ideas on how to both reduce their parsing overhead, as well as making them lazier to load. This project is mostly at the research stage. Expect concrete details on sub-projects and linked bugs soon!

Get ContentPrefService init off of the main thread (led by Doug Thayer)

This is so, so close to being done. The patch is written and reviewed, but landing it is being stymied by a hard-to-reproduce locally but super-easy-to-reproduce-in-automation shutdown leak during test runs. Unfortunately, the last 10% sometimes takes 90% of the effort, and this looks like one of those cases.

Blocklist improvements (led by Gijs Kruitbosch)

Gijs is continuing to make our blocklist asynchronous. Recently, he made the getAddonBlocklistEntry method of the API asynchronous, which is a big deal for start-up, since it means we drop another place where the front-end has to wait for the blocklist to be ready! The getAddonBlocklistState method is next on the list.

As a fun exercise, you can follow the “true” value for the BLOCKLIST_SYNC_FILE_LOAD probe via this graph, and watch while Gijs buries it into the ground.

LRU cache for tab layers (led by Doug Thayer)

Doug Thayer is following up on some research done a few years ago that suggests that we can make ~95% of our users’ tab switches feel instantaneous by implementing an LRU cache for the painted layers. This is a classic space-time trade-off, as the cache will necessarily consume memory in order to hold onto the layers. Research is currently underway here to see how we can continue to improve our tab switching performance without losing out on the memory advantage that we tend to have over other browsers.
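For readers unfamiliar with the data structure, here is a minimal, generic LRU cache sketch. It is not the tab layer cache itself: the class and its methods are hypothetical, and the capacity is the knob that expresses the space-time trade-off mentioned above.

```javascript
// Minimal LRU cache sketch. A Map iterates in insertion order, so the
// first key is always the least recently used entry.
class LRUCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = new Map();
  }

  // Look up a value and mark it as most recently used.
  get(key) {
    if (!this.entries.has(key)) {
      return undefined;
    }
    const value = this.entries.get(key);
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  // Store a value. Returns the evicted key if the cache was over
  // capacity, otherwise undefined.
  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
      return oldest;
    }
    return undefined;
  }
}
```

A larger capacity means more switches hit the cache, at the cost of holding more entries in memory; a smaller one saves memory but evicts sooner.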

Tab warming (led by Mike Conley)

Tab warming has been enabled on Nightly for a few weeks, and besides one rather serious glitch that we’ve fixed, we’ve been pretty pleased with the result! There’s one issue on macOS that’s been mulled over, but at this point I’m starting to lean towards getting this shipped on at least Windows for the Firefox 61 release.

Firefox’s Most Wanted: Performance Wins (led by YOU!)

Before we go into the grab-bag list of performance-related fixes – have you seen any patches landing that should positively impact Firefox’s performance? Let me know about it so I can include it in the list, and give appropriate shout-outs to all of the great work going on! That link again!

Grab-bag time

And now, without further ado, a list of performance work that took place in the tree:

(🌟 indicates a volunteer contributor)

Thanks to all of you! Keep it coming!

  1. Pro-tip: if you’re collecting data, consider figuring out how you want to visualize it first, and then make sure that visualization work actually happens. 

  2. Since April 1st, these categories have gotten a lot finer-grain