Hello, Internet! Here we are with yet another Firefox Performance Update for your consumption. Hold onto your hats – we’re going in!
But first a word from our sponsor: ScriptPreloader!
A lot of the Firefox front-end is written using JavaScript. With the possible exception of system add-ons that update outside of the normal release cycle, these scripts tend to be the same until you update.
About a year ago, Mozilla developer Kris Maglione had an idea: let’s try to optimize browser start time by noticing which scripts are being loaded during start-up, and then converting those scripts into a binary representation[1] that we can cache on disk. That way, next time we start up, we can just grab the cached binaries off of the disk, skip the parsing step and start executing the JavaScript right away.
Long-time Mozillians might know that we already do some aggressive caching to improve start time for things like XUL, XBL, manifests and other things that are read at start-up. I think we actually were already caching JavaScript files too – but I don’t think we were storing them pre-parsed. And the old caching stuff was definitely not caching scripts that were loading in content processes (since content processes didn’t exist when the old caching stuff was designed).
At any rate, my understanding is that the ScriptPreloader pays attention to script loads between main process start and the point where the first browser window fires the “browser-delayed-startup-finished” observer notification (after the window paints and does post-painting script loading). At that point, the ScriptPreloader examines the list of scripts that the parent and content processes have loaded, and[2] writes their pre-parsed bytecode representation to disk.
After that cache is written, the next time the main process or content processes start up, the cache is checked for the binary data. If it exists, this means that we can skip the parsing step. The ScriptPreloader goes one step further and starts to “decode”[3] that binary format off of the main thread, even before those scripts are requested. Then, when the scripts are finally requested, they’re very much ready to execute right away.
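To make the flow above a little more concrete, here is a toy model of it in plain JavaScript. This is only a sketch: the `StartupScriptCache` class, its method names, and the use of `new Function` as a stand-in for "parse to bytecode" are all assumptions of mine for illustration, and none of it is the actual ScriptPreloader code. The `Map` called `disk` stands in for the on-disk cache.

```javascript
// Toy model of a startup script cache. Every name here is hypothetical;
// this is not Firefox's ScriptPreloader implementation.
class StartupScriptCache {
  constructor(disk = new Map()) {
    this.disk = disk;        // stand-in for the on-disk cache (url -> compiled form)
    this.ready = new Map();  // entries decoded ahead of time for this run
    this.noted = new Map();  // scripts seen during the startup window
    this.recording = true;   // true until startup is reported as finished
    this.compileCount = 0;   // how many times we paid the parsing cost
  }

  // Called early in startup: decode everything a previous run cached.
  preload() {
    for (const [url, compiled] of this.disk) {
      this.ready.set(url, compiled);
    }
  }

  // Called whenever a script is requested.
  load(url, source) {
    let compiled = this.ready.get(url);
    if (!compiled) {
      this.compileCount++;             // cache miss: parse now
      compiled = new Function(source); // stand-in for parsing to bytecode
    }
    if (this.recording) {
      this.noted.set(url, compiled);   // remember it for the next startup
    }
    return compiled;
  }

  // Called once the first window says delayed startup has finished.
  finishStartup() {
    this.recording = false;
    for (const [url, compiled] of this.noted) {
      this.disk.set(url, compiled);
    }
  }
}

// Simulate two startups that share the same "disk".
const disk = new Map();
const scripts = { "a.js": "return 1 + 1;", "b.js": "return 'hi';" };

function simulateStartup() {
  const cache = new StartupScriptCache(disk);
  cache.preload();
  const results = Object.entries(scripts).map(([url, src]) => cache.load(url, src)());
  cache.finishStartup();
  return { compiles: cache.compileCount, results };
}

const firstRun = simulateStartup();
const secondRun = simulateStartup();
console.log(firstRun.compiles, secondRun.compiles);
```

In this model the first startup pays the parsing cost for both scripts, and the second startup pays it for neither, because everything noted during the first startup window is already decoded by the time it is requested.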
When the ScriptPreloader landed, we saw some really nice wins in our start-up performance!
I’m now working on a series of patches in this bug that will widen the window of time where we note scripts that we can cache. This will hopefully improve the speed of privileged scripts that run up until the idle point of the first browser window.
And now for some Performance Project updates!
Early first blank paint (led by Florian Quèze)
User Research has hired a contractor to perform a study to validate our hypothesis that the early first blank paint perceived performance optimization will make Firefox seem like it’s starting faster. More data to come out of that soon!
Faster content process start-up time (led by Felipe Gomes)
The patches that Felipe wrote a few weeks back have landed and have had a positive impact! The proof is in the pudding – let’s look at some graphs:
The above graph shows a nice drop in the cpstartup Talos test. The cpstartup test measures the time it takes to boot up the content process and have it be ready to show you web pages.
In the graph above, we can see that the patches also helped reduce the memory that content processes use by default, by making more scripts only load when they’re needed.
It’s always nice to see our work have an impact in our graphs. Great work, Felipe! Keep it up!
LRU cache for tab layers (led by Doug Thayer)
The patch to introduce the LRU cache landed last week, and was enabled for a few days so we could collect some data on its performance impact.
The good news is that it appears that this has had a significant and positive impact on tab switch times – tab switch times went down, and the number of Nightly instances reporting tab switch spinners went down by about 10%. Great work, Doug!
A number of bugs were filed against the original bug due to some glitchy edge-cases that we don’t handle well just yet.
We also detected a ~8% resident memory regression in our automated testing suites. This was expected (keeping layers around isn’t free!) and gave us a sense of how much memory we might consume were we to enable this by default.
The experiment is concluded for now, and we’re going to disable the cache for a bit while we think about ways to improve the implementation.
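For anyone unfamiliar with the term, an LRU ("least recently used") cache keeps a fixed number of entries and throws away whichever one was touched longest ago. Here is a minimal sketch of the general idea in JavaScript. The `LRUCache` class, the capacity of 2, and the string values are made up for illustration, and this is not how the tab layer cache is actually implemented.

```javascript
// Minimal LRU cache sketch. The class, capacity and values are hypothetical
// and only illustrate the general technique, not Firefox's implementation.
class LRUCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = new Map(); // a Map iterates in insertion order: oldest first
  }

  get(key) {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key);
    this.entries.delete(key);      // re-insert so the key becomes the newest
    this.entries.set(key, value);
    return value;
  }

  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
      return oldestKey;            // report which entry was evicted
    }
    return undefined;
  }
}

const layers = new LRUCache(2);
layers.set("tab1", "layers for tab1");
layers.set("tab2", "layers for tab2");
layers.get("tab1");                // tab1 is now the most recently used
const evicted = layers.set("tab3", "layers for tab3");
console.log(evicted);              // "tab2"
```

The trade-off is the same one described above: a bigger capacity means more tab switches hit the cache, but every retained entry costs memory.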
ClientStorageTextureSource for macOS (led by Doug Thayer)
This project should allow us to be more efficient when uploading layers to the compositor on macOS. Doug has solved the crashing issues he was getting in automation (yay!), and is now attempting to figure out some Talos regressions on the MotionMark test suite. Deeper profiling is likely required to untangle what’s happening there.
Swapping DataURLs for Blobs in Activity Stream (led by Jay Lim)
Jay’s patch to swap out DataURLs for Blobs for Activity Stream images has passed a first round of review from Mardak! He’s now waiting for a second review from k88hudson, and then hopefully this can land and give us a bit of a memory win. Having done some analysis, we expect this to buy back quite a bit of memory that was being contained within those long DataURL strings.
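To get a feel for why those strings are long, here is a rough sketch that compares some raw bytes against the same bytes encoded as a base64 data URL. The 30,000-byte payload is an arbitrary stand-in rather than a real image, and this only measures string length; it says nothing about how Firefox actually stores Blobs or strings internally.

```javascript
// Compare raw bytes with the same bytes encoded as a base64 data URL.
// The payload is an arbitrary stand-in, not a real PNG.
const bytes = new Uint8Array(30000).map((_, i) => i % 256);

// Build the data URL the way a page might embed an image inline.
let binary = "";
for (const b of bytes) binary += String.fromCharCode(b);
const dataUrl = "data:image/png;base64," + btoa(binary);

const rawSize = bytes.byteLength;   // bytes held by the raw data
const dataUrlChars = dataUrl.length; // characters held by the string
console.log(rawSize, dataUrlChars);
```

Base64 spends four characters on every three bytes, so the string is about a third longer than the data it encodes before counting any per-character storage overhead.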
Caching Activity Stream JS in the JS Bytecode Cache (led by Jay Lim)
After examining the JavaScript Bytecode Cache that’s used for Web Content, Jay has determined that it’s really not the right mechanism for caching the Activity Stream scripts.
However, that ScriptPreloader that I was talking about earlier sounds like a much more reasonable candidate. Jay is now doing a deep dive on the ScriptPreloader to see whether or not the Activity Stream scripts are already being cached – and if not, why not.
Tab warming (led by Mike Conley)
No news is good news here. Tab warming continues to ride and no new bugs have been filed. The work to reduce the number of paints when warming tabs has stalled a bit while I dealt with a rather strange cpstartup Talos regression. Ultimately, I think I can get rid of the second paint when warming by keeping background tabs’ display port suppressed[4], and then only triggering the display port unsuppression after a tab switch. This will happily take advantage of a painting mechanism that Doug Thayer put in as part of the LRU cache experiment.
Firefox’s Most Wanted: Performance Wins (led by YOU!)
Before we go into the grab-bag list of performance-related fixes – have you seen any patches landing that should positively impact Firefox’s performance? Let me know about it so I can include it in the list, and give appropriate shout-outs to all of the great work going on! That link again!
Grab-bag time
And now, without further ado, a list of performance work that took place in the tree:
(🌟 indicates a volunteer contributor)
- Kris Maglione added helpers for generating QueryInterface functions on JS objects in native code to cut down on Native Code -> JS border crossings. Specifically, this should speed up situations where native code is calling QueryInterface on JS-implemented XPCOM components. This also means hand-rolling QueryInterface is no longer necessary, and is actively discouraged.
- Jon Coppeard fixed a particularly bad Cycle Collector performance regression that was occurring when certain add-ons were installed. This fix was deemed important enough to ride along to the 60.0.1 builds (both release channel and ESR).
- Gabriel Luong made it so that resizing the Inspector pane with the DevTools Toolbox open doesn’t cause layout flushes for every resize event. Instead, it throttles them via an idleCallback. This bug tracks the work to remove the layout flush entirely.
- Ryan Hunt made it so that paint threads (see Off Main Thread Painting) no longer block Display List and Frame Layer building. This means we can get more done before offloading instructions to the paint threads, and this gives paint threads more time to finish up their work, increasing the probability that they’ll be free by the time the transaction to the paint thread needs to take place.
- Felipe Gomes removed a bunch of unnecessary code for the old about:home that was still registering event listeners and consuming memory despite never being used anymore. This also shrank our installer size a bit, since it got rid of around 200kB worth of images and ~1600 lines of JS!
- Marco Bonardo got rid of a sync layout flush that would occur when starting the browser or opening new windows with the Bookmarks Toolbar enabled.
- Xidorn Quan identified a regression in Firefox 61+ where DOMSubtreeModified events were being nested and could result in infinite recursion on some sites, making the site freeze. The regressing changeset has been backed out in Firefox 61, and in Firefox 62+, DOMSubtreeModified and DOMAttrModified events will no longer fire when style attributes change.
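The idle-callback throttling mentioned in Gabriel Luong's item above is a handy general pattern, so here is a small sketch of it. The `throttleToIdle` helper and the `setTimeout` fallback are my own assumptions for illustration and are not the DevTools code. `requestIdleCallback` is the real web API that the default scheduler reaches for when it exists.

```javascript
// Coalesce a burst of events into one unit of work per idle period.
// throttleToIdle and defaultSchedule are hypothetical helpers.
function defaultSchedule(callback) {
  if (typeof requestIdleCallback === "function") {
    return requestIdleCallback(callback);
  }
  return setTimeout(callback, 0); // assumed fallback where the API is missing
}

function throttleToIdle(work, schedule = defaultSchedule) {
  let pending = false;
  let latestArgs;
  return (...args) => {
    latestArgs = args;     // remember only the most recent event
    if (pending) return;   // a callback is already queued
    pending = true;
    schedule(() => {
      pending = false;
      work(...latestArgs);
    });
  };
}

// Usage with a manual scheduler so the behaviour is easy to follow.
const queued = [];
let flushes = 0;
let lastWidth;
const onResize = throttleToIdle(
  (width) => { flushes++; lastWidth = width; }, // stand-in for expensive layout work
  (callback) => queued.push(callback)
);

for (let width = 300; width <= 400; width += 10) onResize(width); // 11 events
queued.forEach((callback) => callback()); // the "idle period" arrives
console.log(flushes, lastWidth);
```

Eleven resize events arrive, but the expensive work runs once, using the latest width. That is the whole trick: the handler stays cheap, and the costly part waits until the browser has time for it.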
Thanks, folks!
My understanding breaks down here a little ↩
I assume that’s a type of de-serialization ↩
This is an optimization that we do that shrinks the painted area to just the region that’s visible to the browser. We normally paint a bit outside the viewable area so that it’s ready when a user starts scrolling ↩