Hey folks – another Performance Update coming at you! It’s been a few weeks since I posted one of these, mostly due to travel, holidays and the Mozilla SF All-Hands. However, we certainly haven’t been idle during that time. Much work has been done Performance-wise, and there’s a lot to tell. So strap in! But first…
This Performance Update is brought to you by: promiseDocumentFlushed
promiseDocumentFlushed takes a function and returns a Promise. The function it takes will run the next time a natural layout flush and paint have finished occurring. At this point, the DOM should not be “dirty”, and size and position queries should be very cheap to calculate. It is critically important for the callback to not modify the DOM. I’ve filed bugs to make modifying the DOM inside that callback enter some kind of failure state, but they haven’t been resolved yet.
The return value of the callback is what promiseDocumentFlushed’s returned Promise resolves with. Once the Promise resolves, it is then safe to modify the DOM.
This mechanism means that if, for some reason, you need to gather information about the size or position of things in the DOM, you can do it without forcing a synchronous layout flush – however, a paint will occur before that information is given to you. So be on the look-out for flicker, since that’s the trade-off here.
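To make the read-then-write pattern concrete, here’s a minimal sketch. The FlushableWindow and MeasurableElement interfaces and the matchWidthAfterFlush helper are names I’ve made up for illustration, and they only describe the small slice of the window and element that the sketch touches; in real front-end code you’d be handed the actual chrome window and DOM nodes.

```typescript
// Minimal shapes assumed for illustration only. In Firefox front-end code
// these would be the real chrome window and real DOM elements.
interface FlushableWindow {
  promiseDocumentFlushed<T>(callback: () => T): Promise<T>;
}

interface MeasurableElement {
  getBoundingClientRect(): { width: number; height: number };
  style: { width: string };
}

// Hypothetical helper: read one element's width after the next natural
// flush, then size another element to match.
async function matchWidthAfterFlush(
  win: FlushableWindow,
  source: MeasurableElement,
  target: MeasurableElement
): Promise<number> {
  // Read phase: only query layout inside the callback, never modify the DOM.
  const width = await win.promiseDocumentFlushed(
    () => source.getBoundingClientRect().width
  );

  // Write phase: the Promise has resolved, so DOM changes are safe again.
  target.style.width = `${width}px`;
  return width;
}
```

The point of the split is that the measurement happens inside the callback and the mutation happens only after the await, which mirrors the rule described above.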
And now, here’s a list of the projects that the team has been working on lately:
ClientStorage (In-Progress by Doug Thayer)
The ClientStorage project should allow Firefox to communicate with the GPU more efficiently on macOS, which should hopefully reduce jank on the compositor thread (and when it comes to smoothness and responsiveness, jank on the compositor thread is deadly). This is right on the verge of landing (it landed and bounced once due to a crash test failure, but Doug has just gotten a fix for it approved), and we’re very excited to see how this impacts our macOS users!
Init WindowsJumpLists off-main-thread (Completed by Doug Thayer)
The JumpList is a Windows-only feature – essentially an application-specific context menu that opens when you right-click on the application in the task bar. Adding entries to this context menu involves talking to Windows, and unfortunately, the way we were originally doing this involved writing to the disk on the main thread. Thankfully, the API is thread-safe, so Doug was able to move the operation onto a background thread. This is good, because arewesmoothyet was reporting the Windows JumpList code as one of the primary causes of main-thread hangs caused by our front-end code.
Reduce painting while scrolling panels on macOS (Completed by Doug Thayer)
Matt Woodrow noticed that the recently added All Tabs list was performing quite poorly when scrolling it on macOS. After turning on paint-flashing for our browser UI, he noticed that we were re-painting the entire menu every time it scrolled. After some investigation, Matt realized that this was because our Graphics code was skipping some optimizations due to the rounded corners of the panels on macOS. We briefly considered removing the rounded corners on macOS, but then Doug found a more general fix, and now we only re-paint the minimum necessary to scroll the menu, and it’s much smoother!
Make the RemotePageManager lazy (In-Progress by Felipe Gomes)
The RemotePageManager is the way that the parent process communicates with a whitelist of privileged about: pages running in the content process. The RemotePageManager hooks itself in pretty early in a content process’s lifetime, but it’s really only necessary if and when one of those whitelisted about: pages loads. Felipe is working on using some of our new lazy script machinery to load RemotePageManager at the very last moment.
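The general idea behind loading something “at the very last moment” can be sketched with a lazy getter. This is not the machinery Felipe is using; defineLazyGetter, the services object and the loadCount counter are hypothetical names, and the loader below just stands in for the cost of pulling in a real script.

```typescript
// Hypothetical helper: define `name` on `target` so that `load` runs only on
// first access, and its result is cached for every access after that.
function defineLazyGetter<T>(target: object, name: string, load: () => T): void {
  Object.defineProperty(target, name, {
    configurable: true,
    enumerable: true,
    get(): T {
      const value = load();
      // Replace the getter with the plain value so later reads are cheap.
      Object.defineProperty(target, name, {
        value,
        writable: false,
        configurable: true,
        enumerable: true,
      });
      return value;
    },
  });
}

let loadCount = 0;
const services: { pageManager?: { name: string } } = {};

defineLazyGetter(services, "pageManager", () => {
  loadCount += 1; // stands in for the cost of loading the real script
  return { name: "RemotePageManager stand-in" };
});
```

Nothing is loaded when the getter is defined; the loader only runs the first time something reads services.pageManager, and never again after that.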
Overhauling about:performance (In-Progress by Florian Quèze)
Florian is working on improving about:performance, with the hopes of making it more useful for browser engineers and users for diagnosing performance problems in Firefox. Here’s a screenshot of what he has so far:
Thanks to the work of Tarek Ziade, we now have a reliable mechanism for getting information on which tabs are consuming CPU cycles. For example, in the above screenshot, we can see that the coinhive tab that Firefox has open is consuming a bunch of CPU in some workers (mining cryptocurrency). Florian has also been clearing out some of the older code that was supporting about:performance, including the subprocess memory table. This table was useful for our browser engineers when developing and tuning the multi-process project, but we think we can replace it now with something more actionable and relevant to our users. In the meantime, since gathering the memory data causes jank on the main thread, he’s removed the table and the supporting infrastructure. The about:performance work hasn’t landed in the tree yet, but Florian is aiming to get it reviewed and landed (preffed off) soon.
Browser Adjustment Project (In-Progress by Gijs Kruitbosch)
This is a research project to find ways that Firefox can classify the hardware it’s running on, which should make it easier for the browser to make informed decisions on how to deal with things like CPU scheduling, thread and process priority, graphics and UI optimizations, and memory reclamation strategies. This project is still in its early days, but Gijs has already identified prior art and research that we can build upon, and is looking at lightweight ways we can assign grades to a user’s CPU, disk, and graphics hardware. Then the plan is to try hooking that up to the toolkit.cosmeticAnimations pref, to test disabling those animations on weaker hardware. He’s also exploring ways in which the user can override these measurements in the event that they want to bypass the defaults that we choose for each environment.
Avoiding spurious about:blank loads in the parent process (In-Progress by Gijs Kruitbosch)
When we open new browser windows, the initial browser tab inside them runs in the parent process and loads about:blank. Soon after, we do a process flip to load a page in the content process. However, that initial about:blank still has cost, and we think we can avoid it. There’s a test failure that Gijs is grappling with, but after much thorough detective work deep in the complex ball of code that supports our window opening infrastructure, he’s figured out a path forward. We expect this project to be wrapped up soon, which should hopefully make window opening cheaper and also produce less flicker.
Load Activity Stream scripts from ScriptPreloader (Completed by Jay Lim)
Jay has recently made it possible for Activity Stream to load its start-up scripts from the ScriptPreloader. From his local measurements on his MBP, this saves a sizeable chunk of time (around 20-30ms if I recall) on the time to load and render Activity Stream! This optimization is not available, however, unless the separate Activity Stream content process is enabled.
Enable the separate Activity Stream content process by default (In-Progress by Jay Lim)
This project not only ensures that Activity Stream content activity doesn’t impact other tabs (and vice versa), but also allows Firefox to take advantage of the ScriptPreloader to load Activity Stream faster. This does, however, mean an extra process flip when moving from about:home, about:newtab or about:welcome to a new page and back again. To accommodate that, Jay is having to modify some of our tests, as well as part of our Session Restore code to avoid unnecessary loading indicators when moving between processes.
Defer calculating Activity Stream state until idle (In-Progress by Jay Lim)
When Firefox starts up, one of the first things it prepares to do is show you the Activity Stream page, since that’s the default home and new tab page. Jay thinks we might be able to save the state of Activity Stream at shutdown, and load it again quickly during startup within the content process, and then defer the calculations necessary to produce a more recent state until after the parent process has become idle. We’re unsure yet what this will buy us in terms of start-up speed, but Jay is hacking together a prototype to see. I’m eager to find out!
Grab bag of Notable Performance Work
- Luca Greco landed all of the infrastructure to move the WebExtension storage.local backend from a file in the profile directory to indexedDB. This should particularly help the performance of the browser when WebExtensions write small changes to large storage structures, since historically this would cause the entire JSON object for the structure to be recomputed and flushed to disk. This should also help with memory consumption. The infrastructure is disabled by default, and once this bug is fixed, it will be switched on.
- Doug Thayer made our layerization logic smarter for pages that historically created many, many layers. This resulted in a nice win on our MotionMark score, and one user reported that it improved power usage as well.
- Mark Banner made it so that moving many bookmarks in bulk isn’t nearly as expensive to complete. This cut the cost of dropping 300 bookmarks with async transactions from ~2s to ~400ms!
- Kartikaya Gupta made it so that users of the Gecko Profiler can use <pid>:<thread filter> in the thread filter input to gather samples of particular subprocesses. This will be very handy as we scale up the number of content processes!
- Hiroyuki Ikezoe made it so that we more often throttle computations for transform animations for out-of-view elements.
- Gijs Kruitbosch made it so that our DevTools don’t cause synchronous layout flushes when resizing the Inspector pane.
- Kris Maglione made it so that we more lazily load PluginContent.jsm, which should result in a content process start-up and memory win.
- Anny Gakhokidze made it so that instead of sending 8 synchronous IPC messages to retrieve supported clipboard data types, we only send 1 with all of the necessary information.
- Marco Bonardo fixed a very important Places regression, where an entire table was being recalculated when deleting certain records.
- Dave Townsend fixed an issue where we were requesting the favicon for new pages twice instead of once. This resulted in a 2%-3% win on our internal session restoration bench on 64-bit Linux!
- PSPDFKit noted that Firefox is absolutely crushing it at WebAssembly performance.
- Andrew Swan enabled the delayed background page start-up optimization for WebExtensions by default, and it should ride out in the Firefox 63 release!
- Blake Kaplan got rid of the PBrowser::Msg_GetTabCount synchronous IPC message!
- The Graphics team has enabled WebRender by default for a subset of our Nightly population to test it. If you’re in that group, please file bugs if you see them! Check about:studies to see if you’re in the testing group.
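The clipboard item in the list above (one synchronous message instead of eight) is an instance of a general batching pattern, which can be sketched roughly like this. FakeClipboardChannel, its methods and the list of data types are all made up for illustration; the real IPC messages and flavors look different.

```typescript
// Hypothetical stand-in for a synchronous channel to the parent process.
// It counts how many round trips the caller makes.
class FakeClipboardChannel {
  roundTrips = 0;
  private readonly available: Set<string>;

  constructor(available: string[]) {
    this.available = new Set(available);
  }

  // One round trip per data type.
  hasFlavor(flavor: string): boolean {
    this.roundTrips += 1;
    return this.available.has(flavor);
  }

  // One round trip for the whole list of data types.
  supportedFlavors(flavors: string[]): string[] {
    this.roundTrips += 1;
    return flavors.filter((flavor) => this.available.has(flavor));
  }
}

// Illustrative list of eight data types; not the list Firefox actually uses.
const flavors = [
  "text/plain", "text/html", "text/rtf", "text/uri-list",
  "image/png", "image/jpeg", "image/gif", "application/x-file",
];

// Before: ask about each type separately.
const perFlavor = new FakeClipboardChannel(["text/plain", "image/png"]);
const oneByOne = flavors.filter((flavor) => perFlavor.hasFlavor(flavor));

// After: ask once with the whole list.
const batched = new FakeClipboardChannel(["text/plain", "image/png"]);
const allAtOnce = batched.supportedFlavors(flavors);
```

Both approaches produce the same answer, but the first blocks on the other process once per data type, while the second blocks only once.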
Thank you Jay Lim!
As I draw this update to a close, I want to give a shout-out to my intern and colleague Jay Lim, whose internship is ending in a few short days. Jay took to performance work like a duck to water, and his energy, ideas and work were greatly appreciated! Thank you so much, Jay!