electrolysis | A Blog by Mike Conley

If you’ve opened the Browser Console lately while running Nightly with e10s enabled, you might have noticed a warning message – “unsafe CPOW usage” – showing up periodically.

I wanted to talk a little bit about what that means, and what’s being done about it. Brad Lassey already wrote a bit about this, but I wanted to expand upon it (especially since one of my goals this quarter is to get a handle on unsafe CPOW usage in core browser code).

I also wanted to talk about sluggishness that some of our brave Nightly testers with e10s enabled have been experiencing, and where that sluggishness is coming from, and what can be done about it.

What is a CPOW?

“CPOW” stands for “Cross-process Object Wrapper”¹, and is part of the glue that has allowed e10s to be enabled on Nightly without requiring a full re-write of the front-end code. It’s also part of the magic that’s allowing a good number of our most popular add-ons to continue working (albeit slowly).

In sum, a CPOW is a way for one process to synchronously access and manipulate something in another process, as if they were running in the same process. Anything that can be considered a JavaScript Object can be represented as a CPOW.

Let me give you an example.

In single-process Firefox, easy and synchronous access to the DOM of web content was more or less assumed. For example, in browser code, one could do this from the scope of a browser window:

let doc = gBrowser.selectedBrowser.contentDocument;
let contentBody = doc.body;

Here contentBody corresponds to the <body> element of the document in the currently selected browser. In single-process Firefox, querying for and manipulating web content like this is quick and easy.

In multi-process Firefox, where content is processed and rendered in a completely separate process, how does something like this work? This is where CPOWs come in².

With a CPOW, one can synchronously access and manipulate these items, just as if they were in the same process. We expose a CPOW for the content document in a remote browser with contentDocumentAsCPOW, so the above could be rewritten as:

let doc = gBrowser.selectedBrowser.contentDocumentAsCPOW;
let contentBody = doc.body;

I should point out that contentDocumentAsCPOW and contentWindowAsCPOW are exposed on <xul:browser> objects, and that we don’t make every accessor of a CPOW have the “AsCPOW” suffix. This is just our way of making sure that consumers of the contentWindow and contentDocument on the main process side know that they’re probably working with CPOWs³. contentBody.firstChild would also be a CPOW, since CPOWs can only beget more CPOWs.

So for the most part, with CPOWs, we can continue to query and manipulate the <body> of the document loaded in the current browser just like we used to. It’s like an invisible compatibility layer that hops us right over that process barrier.

Great, right?

Well, not really.

CPOWs are really a crutch to help add-ons and browser code exist in this multi-process world, but they’ve got some drawbacks. Most noticeably, there are performance drawbacks.

Why is my Nightly so sluggish with e10s enabled?

Have you been noticing sluggish performance on Nightly with e10s? Chances are this is caused by an add-on making use of CPOWs (either knowingly or unknowingly). Because CPOWs are used for synchronous reading and manipulation of objects in other processes, they send messages to other processes to do that work, and block the main process while they wait for a response. We call this “CPOW traffic”, and if you’re experiencing a sluggish Nightly, this is probably where the sluggishness if coming from.

Instead of using CPOWs, add-ons and browser code should be updated to use frame scripts sent over the message manager. Frame scripts cannot block the main process, and can be optimized to send only the bare minimum of information required to perform an action in content and return a result.

Add-ons built with the Add-on SDK should already be using “content scripts” to manipulate content, and therefore should inherit a bunch of fixes from the SDK as e10s gets closer to shipping. These add-ons should not require too many changes. Old-style add-ons, however, will need to be updated to use frame scripts unless they want to be super-sluggish and bog the browser down with CPOW traffic.

And what constitutes “unsafe CPOW usage”?

“unsafe” might be too strong a word. “unexpected” might be a better term. Brad Lassey laid this out in his blog post already, but I’ll quickly rehash it.

There are two main cases to consider when working with CPOWs:

The content process is already blocked sending up a synchronous message to the parent process
The content process is not blocked

The first case is what we consider “the good case”. The content process is in a known good state, and its primed to receive IPC traffic (since it’s otherwise just idling). The only bad part about this is the IPC traffic.

The second case is what we consider the bad case. This is when the parent is sending down CPOW messages to the child (by reading or manipulating objects in the content process) when the child process might be off processing other things. This case is far more likely than the first case to cause noticeable performance problems, as the main thread of the content process might be bogged down doing other things before it can handle the CPOW traffic – and the parent will be blocked waiting for the messages to be responded to!

There’s also a more speculative fear that the parent might send down CPOW traffic at a time when it’s “unsafe” to communicate with the content process. There are potentially times when it’s not safe to run JS code in the content process, but CPOWs traffic requires both processes to execute JS. This is a concern that was expressed to me by someone over IRC, and I don’t exactly understand what the implications are – but if somebody wants to comment and let me know, I’ll happily update this post.

So, anyhow, to sum – unsafe CPOW usage is when CPOW traffic is initiated on the parent process side while the content process is not blocked. When this unsafe CPOW usage occurs, we log an “unsafe CPOW usage” message to the Browser Console, along with the script and line number where the CPOW traffic was initiated from.

Measuring

We need to measure and understand CPOW usage in Firefox, as well as add-ons running in Firefox, and over time we need to reduce this CPOW usage. The priority should be on reducing the “unsafe CPOW usage” CPOWs in core browser code.

If there’s anything that working on the Australis project taught me, it’s that in order to change something, you need to know how to measure it first. That way, you can make sure your efforts are having an effect.

We now have a way of measuring the amount of time that Firefox code and add-ons spend processing CPOW messages. You can look at it yourself – just go to about:compartments.

It’s not the prettiest interface, but it’s a start. The second column is the time processing CPOW traffic, and the higher the number, the longer it’s been doing it. Naturally, we’ll be working to bring those numbers down over time.

A possibly quick-fix for a slow Nightly with e10s

As I mentioned, we also list add-ons in about:compartments, so if you’re experiencing a slow Nightly, check out about:compartments and see if there’s an add-on with a high number in the second column. Then, try disabling that add-on to see if your performance problem is reduced.

If so, great! Please file a bug on Bugzilla in this component for the add-on, mention the name of the add-on⁴, describe the performance problem, and mark it blocking e10s-addons if you can.

We’re hoping to automate this process by exposing some UI that informs the user when an add-on is causing too much CPOW traffic. This will be landing in Nightly near you very soon.

PKE Meter, a CPOW Geiger Counter

Logging “unsafe CPOW usage” is all fine and dandy if you’re constantly looking at the Browser Console… but who is constantly looking at the Browser Console? Certainly not me.

Instead, I whipped up a quick and dirty add-on that plays a click, like a Geiger Counter, anytime “unsafe CPOW usage” is put into the Browser Console. This has already highlighted some places where we can reduce unsafe CPOW usage in core Firefox code – particularly:

The Page Info dialog. This is probably the worse offender I’ve found so far – humongous unsafe CPOW traffic just by opening the dialog, and it’s really sluggish.
Closing tabs. SessionStore synchronously communicates with the content process in order to read the tab state before the tab is closed.
Back / forward gestures, at least on my MacBook
Typing into an editable HTML element after the Findbar has been opened.

If you’re interested in helping me find more, install this add-on⁵, and listen for clicks. At this point, I’m only interested in unsafe CPOW usage caused by core Firefox code, so you might want to disable any other add-ons that might try to synchronously communicate with content.

If you find an “unsafe CPOW usage” that’s not already blocking this bug, please file a new one! And cc me on it! I’m mconley at mozilla dot com.

I pronounce CPOW as “kah-POW”, although I’ve also heard people use “SEE-pow”. To each his or her own. ↩
For further reading, Bill McCloskey discusses CPOWs in greater detail in this blog post. There’s also this handy documentation. ↩
I say probably, because in the single-process case, they’re not working with CPOWs – they’re accessing the objects directly as they used to. ↩
And say where to get it from, especially if it’s not on AMO. ↩
Source code is here ↩

Maybe I’ll wear an afro wig while I stream ↩

Specifically, I’ll be working on Electrolysis bugs, since that’s what my focus is on these days. ↩

I’ve actually piloted this for the past few weeks, streaming on YouTube Live. Here’s a playlist of the pilot episodes. ↩

Here’s how I’m currently debugging Electrolysis stuff on OS X using gdb. It involves multiple terminal windows. I live with that.

# In Terminal Window 1, I execute my Firefox build with MOZ_DEBUG_CHILD_PROCESS=1.
# That environment variable makes it so that the parent process spits out the child
# process ID as soon as it forks out. I also use my e10s profile so as to not muck up
# my default profile.

MOZ_DEBUG_CHILD_PROCESS=1 ./mach run -P e10s

# So, now my Firefox is spawned up and ready to go. I have
# browser.tabs.remote.autostart set to "true" in my about:config, which means I'm
# using out-of-process tabs by default. That means that right away, I see the
# child process ID dumped into the console. Maybe you get the same thing if
# browser.tabs.remote.autostart is false. I haven't checked.

CHILDCHILDCHILDCHILD
  debug me @ 45326

# ^-- so, this is what comes out in Terminal Window 1.

So, the next step is to open another terminal window. This one will connect to the parent process.

# Maybe there are smarter ways to find the firefox process ID, but this is what I
# use in my new Terminal Window 2.
ps aux | grep firefox

# And this is what I get back:

mikeconley     45391  17.2  5.3  3985032 883932   ??  S     2:39pm   1:58.71 /Applications/FirefoxAurora.app/Contents/MacOS/firefox
mikeconley     45322   0.0  0.4  3135172  69748 s000  S+    2:36pm   0:06.48 /Users/mikeconley/Projects/mozilla-central/obj-x86_64-apple-darwin12.5.0/dist/Nightly.app/Contents/MacOS/firefox -no-remote -foreground -P e10s
mikeconley     45430   0.0  0.0  2432768    612 s002  R+    2:44pm   0:00.00 grep firefox
mikeconley     44878   0.0  0.0        0      0 s000  Z    11:46am   0:00.00 (firefox)

# That second one is what I want to attach to. I can tell, because the executable
# path lies within my local build's objdir. The first row is my main Firefox I just
# use for work browsing. I definitely don't want to attach to that. The third line
# is just me looking for the process with grep. Not sure what that last one is.

# I use sudo to attach to the parent because otherwise, OS X complains about permissions
# for process attachment. I attach to the parent like this:

sudo gdb firefox 45322

# And now I have a gdb for the parent process. Easy peasy.

And finally, to debug the child, I open yet another terminal window.

# That process ID that I got from Terminal Window 1 comes into play now.

sudo gdb firefox 45326

# Boom - attached to child process now.

Setting breakpoints for things like TabChild::foo or TabParent::bar can be done like this:

# In Terminal Window 3, attached to the child:

b mozilla::dom::TabChild::foo

# In Terminal Window 2, attached to the parent:

b mozilla::dom::TabParent::bar

And now we’re cookin’.

A Blog by Mike Conley

The personal blog of a Toronto based software mechanic, musician, sound designer, and theatre enthusiast.

Tag Archives: electrolysis

On unsafe CPOW usage, and “why is my Nightly so sluggish with e10s enabled?”