In this fifth episode, I didn’t work on any bugs. Instead, I did a bunch of code review. Ever wanted to know what a Firefox engineer does to review a patch? If so, then this episode is for you!
If you’ve opened the Browser Console lately while running Nightly with e10s enabled, you might have noticed a warning message – “unsafe CPOW usage” – showing up periodically.
I wanted to talk a little bit about what that means, and what’s being done about it. Brad Lassey already wrote a bit about this, but I wanted to expand upon it (especially since one of my goals this quarter is to get a handle on unsafe CPOW usage in core browser code).
I also wanted to talk about sluggishness that some of our brave Nightly testers with e10s enabled have been experiencing, and where that sluggishness is coming from, and what can be done about it.
What is a CPOW?
“CPOW” stands for “Cross-process Object Wrapper”1, and is part of the glue that has allowed e10s to be enabled on Nightly without requiring a full re-write of the front-end code. It’s also part of the magic that’s allowing a good number of our most popular add-ons to continue working (albeit slowly).
Let me give you an example.
In single-process Firefox, easy and synchronous access to the DOM of web content was more or less assumed. For example, in browser code, one could do this from the scope of a browser window:
let doc = gBrowser.selectedBrowser.contentDocument; let contentBody = doc.body;
Here contentBody corresponds to the <body> element of the document in the currently selected browser. In single-process Firefox, querying for and manipulating web content like this is quick and easy.
In multi-process Firefox, where content is processed and rendered in a completely separate process, how does something like this work? This is where CPOWs come in2.
With a CPOW, one can synchronously access and manipulate these items, just as if they were in the same process. We expose a CPOW for the content document in a remote browser with contentDocumentAsCPOW, so the above could be rewritten as:
let doc = gBrowser.selectedBrowser.contentDocumentAsCPOW; let contentBody = doc.body;
I should point out that contentDocumentAsCPOW and contentWindowAsCPOW are exposed on <xul:browser> objects, and that we don’t make every accessor of a CPOW have the “AsCPOW” suffix. This is just our way of making sure that consumers of the contentWindow and contentDocument on the main process side know that they’re probably working with CPOWs3. contentBody.firstChild would also be a CPOW, since CPOWs can only beget more CPOWs.
So for the most part, with CPOWs, we can continue to query and manipulate the <body> of the document loaded in the current browser just like we used to. It’s like an invisible compatibility layer that hops us right over that process barrier.
Well, not really.
CPOWs are really a crutch to help add-ons and browser code exist in this multi-process world, but they’ve got some drawbacks. Most noticeably, there are performance drawbacks.
Why is my Nightly so sluggish with e10s enabled?
Have you been noticing sluggish performance on Nightly with e10s? Chances are this is caused by an add-on making use of CPOWs (either knowingly or unknowingly). Because CPOWs are used for synchronous reading and manipulation of objects in other processes, they send messages to other processes to do that work, and block the main process while they wait for a response. We call this “CPOW traffic”, and if you’re experiencing a sluggish Nightly, this is probably where the sluggishness if coming from.
Instead of using CPOWs, add-ons and browser code should be updated to use frame scripts sent over the message manager. Frame scripts cannot block the main process, and can be optimized to send only the bare minimum of information required to perform an action in content and return a result.
Add-ons built with the Add-on SDK should already be using “content scripts” to manipulate content, and therefore should inherit a bunch of fixes from the SDK as e10s gets closer to shipping. These add-ons should not require too many changes. Old-style add-ons, however, will need to be updated to use frame scripts unless they want to be super-sluggish and bog the browser down with CPOW traffic.
And what constitutes “unsafe CPOW usage”?
“unsafe” might be too strong a word. “unexpected” might be a better term. Brad Lassey laid this out in his blog post already, but I’ll quickly rehash it.
There are two main cases to consider when working with CPOWs:
- The content process is already blocked sending up a synchronous message to the parent process
- The content process is not blocked
The first case is what we consider “the good case”. The content process is in a known good state, and its primed to receive IPC traffic (since it’s otherwise just idling). The only bad part about this is the IPC traffic.
The second case is what we consider the bad case. This is when the parent is sending down CPOW messages to the child (by reading or manipulating objects in the content process) when the child process might be off processing other things. This case is far more likely than the first case to cause noticeable performance problems, as the main thread of the content process might be bogged down doing other things before it can handle the CPOW traffic – and the parent will be blocked waiting for the messages to be responded to!
There’s also a more speculative fear that the parent might send down CPOW traffic at a time when it’s “unsafe” to communicate with the content process. There are potentially times when it’s not safe to run JS code in the content process, but CPOWs traffic requires both processes to execute JS. This is a concern that was expressed to me by someone over IRC, and I don’t exactly understand what the implications are – but if somebody wants to comment and let me know, I’ll happily update this post.
So, anyhow, to sum – unsafe CPOW usage is when CPOW traffic is initiated on the parent process side while the content process is not blocked. When this unsafe CPOW usage occurs, we log an “unsafe CPOW usage” message to the Browser Console, along with the script and line number where the CPOW traffic was initiated from.
We need to measure and understand CPOW usage in Firefox, as well as add-ons running in Firefox, and over time we need to reduce this CPOW usage. The priority should be on reducing the “unsafe CPOW usage” CPOWs in core browser code.
If there’s anything that working on the Australis project taught me, it’s that in order to change something, you need to know how to measure it first. That way, you can make sure your efforts are having an effect.
We now have a way of measuring the amount of time that Firefox code and add-ons spend processing CPOW messages. You can look at it yourself – just go to about:compartments.
It’s not the prettiest interface, but it’s a start. The second column is the time processing CPOW traffic, and the higher the number, the longer it’s been doing it. Naturally, we’ll be working to bring those numbers down over time.
A possibly quick-fix for a slow Nightly with e10s
As I mentioned, we also list add-ons in about:compartments, so if you’re experiencing a slow Nightly, check out about:compartments and see if there’s an add-on with a high number in the second column. Then, try disabling that add-on to see if your performance problem is reduced.
If so, great! Please file a bug on Bugzilla in this component for the add-on, mention the name of the add-on4, describe the performance problem, and mark it blocking e10s-addons if you can.
We’re hoping to automate this process by exposing some UI that informs the user when an add-on is causing too much CPOW traffic. This will be landing in Nightly near you very soon.
PKE Meter, a CPOW Geiger Counter
Logging “unsafe CPOW usage” is all fine and dandy if you’re constantly looking at the Browser Console… but who is constantly looking at the Browser Console? Certainly not me.
Instead, I whipped up a quick and dirty add-on that plays a click, like a Geiger Counter, anytime “unsafe CPOW usage” is put into the Browser Console. This has already highlighted some places where we can reduce unsafe CPOW usage in core Firefox code – particularly:
- The Page Info dialog. This is probably the worse offender I’ve found so far – humongous unsafe CPOW traffic just by opening the dialog, and it’s really sluggish.
- Closing tabs. SessionStore synchronously communicates with the content process in order to read the tab state before the tab is closed.
- Back / forward gestures, at least on my MacBook
- Typing into an editable HTML element after the Findbar has been opened.
If you’re interested in helping me find more, install this add-on5, and listen for clicks. At this point, I’m only interested in unsafe CPOW usage caused by core Firefox code, so you might want to disable any other add-ons that might try to synchronously communicate with content.
If you find an “unsafe CPOW usage” that’s not already blocking this bug, please file a new one! And cc me on it! I’m mconley at mozilla dot com.
I pronounce CPOW as “kah-POW”, although I’ve also heard people use “SEE-pow”. To each his or her own. ↩
I say probably, because in the single-process case, they’re not working with CPOWs – they’re accessing the objects directly as they used to. ↩
And say where to get it from, especially if it’s not on AMO. ↩