Author Archives: Mike Conley

Electrolysis Code Spelunking: How links open new windows in Firefox

Hey. I’ve started hacking on Electrolysis bugs. I’m normally a front-end engineer working on Firefox desktop, but I’ve been temporarily loaned out to help get Electrolysis ready to be enabled by default on Nightly.

I’m working on bug 989501. Basically, when you click on a link that targets “_blank” or uses window.open, we open a new tab instead. That’s no good – assuming the user’s profile is set to allow it, we should open the link in a new window.

In order to fix this, I need a clearer picture of what happens in the Firefox platform when we click on one of these links.

This isn’t really a tutorial – I’m not going to go out of my way to explain much here. Think of this more as a public posting of my notes during my exploration.

So, here goes.

(Note that the code in this post was current as of revision 400a31da59a9 of mozilla-central, so if you’re reading this in the future, some of it may have changed substantially.)

I know for a fact that once the link is clicked, we eventually call mozilla::dom::TabChild::ProvideWindow. I know this because of conversations I’ve had with smaug, billm and jdm in and out of Bugzilla, IRC, and meatspace.

Because I know this, I can hook up gdb to see how I get to that call. I have some notes here on how to hook up gdb to the content process of an e10s window.

Once that’s hooked up, I set a breakpoint on mozilla::dom::TabChild::ProvideWindow, and click on a link somewhere with target="_blank".

I hit my breakpoint, and I get a backtrace. Ready for it? Here we go:

#0  mozilla::dom::TabChild::ProvideWindow (this=0x109afb400, aParent=0x10b098820, aChromeFlags=4094, aCalledFromJS=false, aPositionSpecified=false, aSizeSpecified=false, aURI=0xffe, aName=@0x0, aFeatures=@0x0, aWindowIsNew=0x10b098820, aReturn=0x7fff5fbfb648) at TabChild.cpp:1201
#1  0x00000001018682e4 in nsWindowWatcher::OpenWindowInternal (this=0x10b05b540, aParent=0x10b098820, aUrl=<value temporarily unavailable, due to optimizations>, aName=<value temporarily unavailable, due to optimizations>, aFeatures=<value temporarily unavailable, due to optimizations>, aCalledFromJS=false, aDialog=<value temporarily unavailable, due to optimizations>, aNavigate=<value temporarily unavailable, due to optimizations>, _retval=<value temporarily unavailable, due to optimizations>) at nsWindowWatcher.cpp:601
#2  0x0000000101869544 in non-virtual thunk to nsWindowWatcher::OpenWindow2(nsIDOMWindow*, char const*, char const*, char const*, bool, bool, bool, nsISupports*, nsIDOMWindow**) () at nsWindowWatcher.cpp:417
#3  0x0000000100e5dc63 in nsGlobalWindow::OpenInternal (this=0x10b098800, aUrl=@0x7fff5fbfbf90, aName=@0x7fff5fbfc038, aOptions=@0x103d77320, aDialog=false, aContentModal=false, aCalleePrincipal=<value temporarily unavailable, due to optimizations>, aJSCallerContext=<value temporarily unavailable, due to optimizations>, aReturn=<value temporarily unavailable, due to optimizations>) at /Users/mikeconley/Projects/mozilla-central/dom/base/nsGlobalWindow.cpp:11498
#4  0x0000000100e5e3a4 in non-virtual thunk to nsGlobalWindow::OpenNoNavigate(nsAString_internal const&, nsAString_internal const&, nsAString_internal const&, nsIDOMWindow**) () at /Users/mikeconley/Projects/mozilla-central/dom/base/nsGlobalWindow.cpp:7463
#5  0x000000010184d99d in nsDocShell::InternalLoad (this=<value temporarily unavailable, due to optimizations>, aURI=0x113eed200, aReferrer=0x1134c0fe0, aOwner=0x114a69070, aFlags=0, aWindowTarget=0x10b098820, aLoadType=<value temporarily unavailable, due to optimizations>, aSHEntry=<value temporarily unavailable, due to optimizations>, aSourceDocShell=<value temporarily unavailable, due to optimizations>, aDocShell=<value temporarily unavailable, due to optimizations>, aRequest=<value temporarily unavailable, due to optimizations>) at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:9079
#6  0x0000000101855758 in nsDocShell::OnLinkClickSync (this=0x10b075000, aContent=0x112865eb0, aURI=0x113eed3c0, aTargetSpec=<value temporarily unavailable, due to optimizations>, aFileName=@0x106f27f10, aPostDataStream=0x0, aDocShell=<value temporarily unavailable, due to optimizations>, aRequest=<value temporarily unavailable, due to optimizations>) at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:12699
#7  0x0000000101857f85 in mozilla::Maybe<mozilla::AutoCxPusher>::~Maybe () at /Users/mikeconley/Projects/mozilla-central/obj-x86_64-apple-darwin12.5.0/dist/include/nsCxPusher.h:12499
#8  0x0000000101857f85 in nsCxPusher::~nsCxPusher () at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:41
#9  0x0000000101857f85 in nsCxPusher::~nsCxPusher () at /Users/mikeconley/Projects/mozilla-central/obj-x86_64-apple-darwin12.5.0/dist/include/nsCxPusher.h:66
#10 0x0000000101857f85 in OnLinkClickEvent::Run (this=<value temporarily unavailable, due to optimizations>) at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:12502
#11 0x0000000100084f60 in nsThread::ProcessNextEvent (this=0x106f245e0, mayWait=false, result=0x7fff5fbfc947) at nsThread.cpp:715
#12 0x0000000100023241 in NS_ProcessPendingEvents (thread=<value temporarily unavailable, due to optimizations>, timeout=20) at nsThreadUtils.cpp:210
#13 0x0000000100d41c47 in nsBaseAppShell::NativeEventCallback (this=0x1096e8660) at nsBaseAppShell.cpp:98
#14 0x0000000100cfdba1 in nsAppShell::ProcessGeckoEvents (aInfo=0x1096e8660) at nsAppShell.mm:388
#15 0x00007fff86adeb31 in __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ ()
#16 0x00007fff86ade455 in __CFRunLoopDoSources0 ()
#17 0x00007fff86b017f5 in __CFRunLoopRun ()
#18 0x00007fff86b010e2 in CFRunLoopRunSpecific ()
#19 0x00007fff8ad65eb4 in RunCurrentEventLoopInMode ()
#20 0x00007fff8ad65c52 in ReceiveNextEventCommon ()
#21 0x00007fff8ad65ae3 in BlockUntilNextEventMatchingListInMode ()
#22 0x00007fff8cce1533 in _DPSNextEvent ()
#23 0x00007fff8cce0df2 in -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] ()
#24 0x0000000100cfd266 in -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] (self=0x106f801a0, _cmd=<value temporarily unavailable, due to optimizations>, mask=18446744073709551615, expiration=0x422d63c37f00000d, mode=0x7fff7205e1c0, flag=1 '\001') at nsAppShell.mm:165
#25 0x00007fff8ccd81a3 in -[NSApplication run] ()
#26 0x0000000100cfe32b in nsAppShell::Run (this=<value temporarily unavailable, due to optimizations>) at nsAppShell.mm:746
#27 0x000000010199b3dc in XRE_RunAppShell () at /Users/mikeconley/Projects/mozilla-central/toolkit/xre/nsEmbedFunctions.cpp:679
#28 0x00000001002a0dae in MessageLoop::AutoRunState::~AutoRunState () at message_loop.cc:229
#29 0x00000001002a0dae in MessageLoop::AutoRunState::~AutoRunState () at /Users/mikeconley/Projects/mozilla-central/ipc/chromium/src/base/message_loop.h:197
#30 0x00000001002a0dae in MessageLoop::Run (this=0x0) at message_loop.cc:503
#31 0x000000010199b0cd in XRE_InitChildProcess (aArgc=<value temporarily unavailable, due to optimizations>, aArgv=<value temporarily unavailable, due to optimizations>, aProcess=<value temporarily unavailable, due to optimizations>) at /Users/mikeconley/Projects/mozilla-central/toolkit/xre/nsEmbedFunctions.cpp:516
#32 0x0000000100000f1d in main (argc=<value temporarily unavailable, due to optimizations>, argv=0x7fff5fbff4d8) at /Users/mikeconley/Projects/mozilla-central/ipc/app/MozillaRuntimeMain.cpp:149

Oh my. Well, the good news is, we can chop off a good chunk of the lower half because that’s all message / event loop stuff. That’s going to be in every single backtrace ever, pretty much, so I can just ignore it. Here’s the more important stuff:

#0  mozilla::dom::TabChild::ProvideWindow (this=0x109afb400, aParent=0x10b098820, aChromeFlags=4094, aCalledFromJS=false, aPositionSpecified=false, aSizeSpecified=false, aURI=0xffe, aName=@0x0, aFeatures=@0x0, aWindowIsNew=0x10b098820, aReturn=0x7fff5fbfb648) at TabChild.cpp:1201
#1  0x00000001018682e4 in nsWindowWatcher::OpenWindowInternal (this=0x10b05b540, aParent=0x10b098820, aUrl=<value temporarily unavailable, due to optimizations>, aName=<value temporarily unavailable, due to optimizations>, aFeatures=<value temporarily unavailable, due to optimizations>, aCalledFromJS=false, aDialog=<value temporarily unavailable, due to optimizations>, aNavigate=<value temporarily unavailable, due to optimizations>, _retval=<value temporarily unavailable, due to optimizations>) at nsWindowWatcher.cpp:601
#2  0x0000000101869544 in non-virtual thunk to nsWindowWatcher::OpenWindow2(nsIDOMWindow*, char const*, char const*, char const*, bool, bool, bool, nsISupports*, nsIDOMWindow**) () at nsWindowWatcher.cpp:417
#3  0x0000000100e5dc63 in nsGlobalWindow::OpenInternal (this=0x10b098800, aUrl=@0x7fff5fbfbf90, aName=@0x7fff5fbfc038, aOptions=@0x103d77320, aDialog=false, aContentModal=false, aCalleePrincipal=<value temporarily unavailable, due to optimizations>, aJSCallerContext=<value temporarily unavailable, due to optimizations>, aReturn=<value temporarily unavailable, due to optimizations>) at /Users/mikeconley/Projects/mozilla-central/dom/base/nsGlobalWindow.cpp:11498
#4  0x0000000100e5e3a4 in non-virtual thunk to nsGlobalWindow::OpenNoNavigate(nsAString_internal const&, nsAString_internal const&, nsAString_internal const&, nsIDOMWindow**) () at /Users/mikeconley/Projects/mozilla-central/dom/base/nsGlobalWindow.cpp:7463
#5  0x000000010184d99d in nsDocShell::InternalLoad (this=<value temporarily unavailable, due to optimizations>, aURI=0x113eed200, aReferrer=0x1134c0fe0, aOwner=0x114a69070, aFlags=0, aWindowTarget=0x10b098820, aLoadType=<value temporarily unavailable, due to optimizations>, aSHEntry=<value temporarily unavailable, due to optimizations>, aSourceDocShell=<value temporarily unavailable, due to optimizations>, aDocShell=<value temporarily unavailable, due to optimizations>, aRequest=<value temporarily unavailable, due to optimizations>) at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:9079
#6  0x0000000101855758 in nsDocShell::OnLinkClickSync (this=0x10b075000, aContent=0x112865eb0, aURI=0x113eed3c0, aTargetSpec=<value temporarily unavailable, due to optimizations>, aFileName=@0x106f27f10, aPostDataStream=0x0, aDocShell=<value temporarily unavailable, due to optimizations>, aRequest=<value temporarily unavailable, due to optimizations>) at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:12699
#7  0x0000000101857f85 in mozilla::Maybe<mozilla::AutoCxPusher>::~Maybe () at /Users/mikeconley/Projects/mozilla-central/obj-x86_64-apple-darwin12.5.0/dist/include/nsCxPusher.h:12499
#8  0x0000000101857f85 in nsCxPusher::~nsCxPusher () at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:41
#9  0x0000000101857f85 in nsCxPusher::~nsCxPusher () at /Users/mikeconley/Projects/mozilla-central/obj-x86_64-apple-darwin12.5.0/dist/include/nsCxPusher.h:66
#10 0x0000000101857f85 in OnLinkClickEvent::Run (this=<value temporarily unavailable, due to optimizations>) at /Users/mikeconley/Projects/mozilla-central/docshell/base/nsDocShell.cpp:12502

That’s a bit more manageable.
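The trimming above can also be done mechanically. Here is a rough sketch, assuming the frames are already available as strings and that nsThread::ProcessNextEvent is a reasonable cut-off marker. TrimBacktrace is a made-up helper for illustration; it is not part of gdb or mozilla-central.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from gdb or the tree): keep frames until the first
// one that mentions the event-loop marker, and drop everything below it.
std::vector<std::string> TrimBacktrace(
    const std::vector<std::string>& aFrames,
    const std::string& aMarker = "nsThread::ProcessNextEvent")
{
  std::vector<std::string> kept;
  for (const std::string& frame : aFrames) {
    if (frame.find(aMarker) != std::string::npos) {
      break;  // from here down it is message / event loop plumbing
    }
    kept.push_back(frame);
  }
  return kept;
}
```

Fed the full backtrace, this would stop just before frame #11, which matches the shortened list shown above.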

So we start inside something called a docshell. I’ve heard that term bandied about a lot, and I can’t say I’ve ever been too sure what it means, or what a docshell does, or why I should care.

I found some documents that make things a little bit clearer.

Basically, my understanding is that a docshell is the thing that connects incoming stuff from some URI (this could be web content, or it might be a XUL document that’s loading the browser UI…), and connects it to the things that make stuff show up on your screen.

So, pretty important.

It seems to be a place where some utility methods and functions go as well, so it’s kind of this abstract thing that seems to have multiple purposes.

But the most important thing for the purposes of this post is this: every time you load a document, you have a docshell taking care of it. All of these docshells are structured in a tree which is rooted with a docshell owner. This will come into play later.

So one thing that a docshell does is notice when a link is clicked inside of its content. That’s nsDocShell.cpp’s OnLinkClickEvent::Run, and that eventually makes its way over to nsDocShell::OnLinkClickSync.

After some initial checks and balances to ensure that this thing really is a link we want to travel to, we get sent off to nsDocShell::InternalLoad.

Inside there, there’s some more checking… there’s a policy check to make sure we’re allowed to open a link. Lots of security going on. Eventually I see this:

if (aWindowTarget && *aWindowTarget)

That’s good. aWindowTarget maps to the target="_blank" attribute in the anchor. So we’ll be entering this block.

    if (aWindowTarget && *aWindowTarget) {
        // Locate the target DocShell.
        nsCOMPtr<nsIDocShellTreeItem> targetItem;
        rv = FindItemWithName(aWindowTarget, nullptr, this,
                              getter_AddRefs(targetItem));

So now we’re looking for the right docshell to load this new document in. That makes sense – if you have a link where target="foo", subsequent links from the same origin targeted at "foo" will open in the same window or tab or what have you. So we’re checking to see if we’ve opened something with the name inside aWindowTarget already.

So now we’re in nsDocShell::FindItemWithName, and I see this:

        else if (name.LowerCaseEqualsLiteral("_blank"))
        {
            // Just return null.  Caller must handle creating a new window with
            // a blank name himself.
            return NS_OK;
        }

Ah hah, so target="_blank", as we already knew, is special-cased – and this is where it happens. There’s no existing docshell for _blank because we know we’re going to be opening a new window (or tab if the user has preffed it that way). So we don’t return a pre-existing docshell.
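To make the lookup-with-a-special-case idea concrete, here is a toy sketch. ToyDocShell and ToyTargetRegistry are invented for illustration and bear no relation to the real nsDocShell tree walk. The real check is also case-insensitive, which this sketch skips.

```cpp
#include <map>
#include <string>

// Toy stand-in for a docshell; the real nsDocShell is far more involved.
struct ToyDocShell {
  std::string name;
};

// Hypothetical registry of named targets. Returns nullptr both for "_blank"
// and for names nothing has been opened under yet; in either case the caller
// is expected to create a new window itself.
class ToyTargetRegistry {
public:
  void Add(ToyDocShell* aShell) { mByName[aShell->name] = aShell; }

  ToyDocShell* FindItemWithName(const std::string& aName) const {
    if (aName == "_blank") {
      return nullptr;  // special-cased: never reuse an existing target
    }
    auto it = mByName.find(aName);
    return it == mByName.end() ? nullptr : it->second;
  }

private:
  std::map<std::string, ToyDocShell*> mByName;
};
```

Even if something were registered under the literal name "_blank", the lookup would still come back empty, which is the behaviour the snippet above relies on.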

So we’re back in nsDocShell::InternalLoad.

        rv = FindItemWithName(aWindowTarget, nullptr, this,
                              getter_AddRefs(targetItem));
        NS_ENSURE_SUCCESS(rv, rv);

        targetDocShell = do_QueryInterface(targetItem);
        // If the targetDocShell doesn't exist, then this is a new docShell
        // and we should consider this a TYPE_DOCUMENT load
        isNewDocShell = !targetDocShell;

Ok, so now targetItem is nullptr, targetDocShell is also nullptr, and so isNewDocShell is true.

There seems to be more policy checking going on in InternalLoad after this… but eventually, I see this:

    if (aWindowTarget && *aWindowTarget) {
        // We've already done our owner-inheriting.  Mask out that bit, so we
        // don't try inheriting an owner from the target window if we came up
        // with a null owner above.
        aFlags = aFlags & ~INTERNAL_LOAD_FLAGS_INHERIT_OWNER;
        
        bool isNewWindow = false;
        if (!targetDocShell) {
            // If the docshell's document is sandboxed, only open a new window
            // if the document's SANDBOXED_AUXILIARY_NAVIGATION flag is not set.
            // (i.e. if allow-popups is specified)
            NS_ENSURE_TRUE(mContentViewer, NS_ERROR_FAILURE);
            nsIDocument* doc = mContentViewer->GetDocument();
            uint32_t sandboxFlags = 0;

            if (doc) {
                sandboxFlags = doc->GetSandboxFlags();
                if (sandboxFlags & SANDBOXED_AUXILIARY_NAVIGATION) {
                    return NS_ERROR_DOM_INVALID_ACCESS_ERR;
                }
            }

            nsCOMPtr<nsPIDOMWindow> win =
                do_GetInterface(GetAsSupports(this));
            NS_ENSURE_TRUE(win, NS_ERROR_NOT_AVAILABLE);

            nsDependentString name(aWindowTarget);
            nsCOMPtr<nsIDOMWindow> newWin;
            nsAutoCString spec;
            if (aURI)
                aURI->GetSpec(spec);
            rv = win->OpenNoNavigate(NS_ConvertUTF8toUTF16(spec),
                                     name,          // window name
                                     EmptyString(), // Features
                                     getter_AddRefs(newWin));

So we check again to see if we’re targeted at something, and check if we’ve found a target docshell for it. We hadn’t, so we do some security checks, and then … what the hell is nsPIDOMWindow? I’m used to things being called nsIBlahBlah, but now nsPIBlahBlah… what does the P mean?

It took some asking around, but I eventually found out that the P is supposed to be for Private – as in, this is a private XPIDL interface, and non-core embedders should stay away from it.

Ok, and we also see do_GetInterface. This is not the same as QueryInterface, believe it or not. The difference is subtle, but basically it’s this: QueryInterface says “you implement X, but I think you also implement Y. If you do, please return a pointer to yourself that makes you seem like a Y.” GetInterface is different – GetInterface says “I know you know about something that implements Y. It might be you, or more likely, it’s something you’re holding a reference to. Can I get a reference to that please?”. And if successful, it returns it. Here’s more documentation about GetInterface.

It’s a subtle but important difference.
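A toy illustration of that difference, using plain C++ rather than XPCOM. Every type here is invented for the sketch, and dynamic_cast merely stands in for what QueryInterface does conceptually.

```cpp
// Toy base type; none of these are real XPCOM interfaces.
struct ToySupports {
  virtual ~ToySupports() = default;
};

struct ToyWindow : ToySupports {};

struct ToyDocShell : ToySupports {
  ToyWindow mWindow;  // something the docshell holds on to

  // "GetInterface"-style: hand back a related object that implements the
  // requested thing, which need not be the docshell itself.
  ToyWindow* GetWindow() { return &mWindow; }
};

// "QueryInterface"-style: ask the object whether it is *itself* also a T.
template <typename T>
T* ToyQueryInterface(ToySupports* aObject) {
  return dynamic_cast<T*>(aObject);
}
```

Asking the toy docshell "are you a window?" through ToyQueryInterface yields nullptr, while GetWindow returns the window it knows about.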

So this docshell knows about a window, and we’ve now got a handle on that window using the private interface nsPIDOMWindow. Neat.

So eventually, we call OpenNoNavigate on that nsPIDOMWindow. That method is pretty much like nsIDOMWindow::Open, except that OpenNoNavigate doesn’t send the window anywhere – it just returns it so that the caller can send it to a URI.

Through the magic of do_GetInterface, nsDocShell::GetInterface, EnsureScriptEnvironment, and NS_NewScriptGlobalObject, I know that the nsPIDOMWindow is being implemented by nsGlobalWindow, and that’s where I should go to find the OpenNoNavigate implementation.

So off we go!

nsGlobalWindow::OpenNoNavigate just seems to forward the call, after some argument setting, to nsGlobalWindow::OpenInternal, like this:

  return OpenInternal(aUrl, aName, aOptions,
                      false,          // aDialog
                      false,          // aContentModal
                      true,           // aCalledNoScript
                      false,          // aDoJSFixups
                      false,          // aNavigate
                      nullptr, nullptr,  // No args
                      GetPrincipal(),    // aCalleePrincipal
                      nullptr,           // aJSCallerContext
                      _retval);

Having a glance around at the rest of the nsGlobalWindow::Open[foo] methods, it looks like they all call into OpenInternal. It’s the big-mamma opening method.

This method does a few things, including making sure that we’re not being abused by web content that’s trying to spam the user with popups.

Eventually, we get to this:

      rv = pwwatch->OpenWindow2(this, url.get(), name_ptr, options_ptr,
                                /* aCalledFromScript = */ false,
                                aDialog, aNavigate, aExtraArgument,
                                getter_AddRefs(domReturn));

and, after a few more checks, return the domReturn pointer to our caller. Remember that the caller is going to take this new window, and navigate it to some URI.

Ok, so, pwwatch. What is that? Well, that appears to be a private interface to nsWindowWatcher, which gives us access to the OpenWindow2 method.

After prepping some arguments, much like nsGlobalWindow::OpenNoNavigate did, we forward the call over to nsWindowWatcher::OpenWindowInternal.

And now we’re almost done – we’re almost at the point where we’re actually going to open a window!

Some key things need to happen though. First, we do this:

nsCOMPtr<nsIDocShellTreeOwner>  parentTreeOwner;  // from the parent window, if any
...
GetWindowTreeOwner(aParent, getter_AddRefs(parentTreeOwner));

So what that does is it tries to get the docshell owner of the docshell that’s attempting to open the window (and that’d be the docshell that we clicked the link in).

After a few more things, we check to see if there’s an existing window with that target name which we can re-use:

  // try to find an extant window with the given name
  nsCOMPtr<nsIDOMWindow> foundWindow = SafeGetWindowByName(name, aParent);
  GetWindowTreeItem(foundWindow, getter_AddRefs(newDocShellItem));

And if so, we store that window’s docshell tree item in newDocShellItem.

After some more security stuff, we check to see if newDocShellItem exists. Because name is nullptr (since we had target="_blank", and nsDocShell::FindItemWithName returned nullptr), newDocShellItem is null.

Because it doesn’t exist, we know we’re opening a brand new window!

More security things seem to happen, and then we get to the part that I’m starting to focus on:

      nsCOMPtr<nsIWindowProvider> provider = do_GetInterface(parentTreeOwner);
      if (provider) {
        NS_ASSERTION(aParent, "We've _got_ to have a parent here!");

        nsCOMPtr<nsIDOMWindow> newWindow;
        rv = provider->ProvideWindow(aParent, chromeFlags, aCalledFromJS,
                                     sizeSpec.PositionSpecified(),
                                     sizeSpec.SizeSpecified(),
                                     uriToLoad, name, features, &windowIsNew,
                                     getter_AddRefs(newWindow));

We ask the parentTreeOwner to get us something that it knows about that implements nsIWindowProvider. In the Electrolysis / content process case, that’d be TabChild. In the normal, non-Electrolysis case, that’s nsContentTreeOwner.

The nsIWindowProvider is the thing that we’ll use to get a new window from! So we call ProvideWindow on it, to give us a pointer to a new nsIDOMWindow, assigned to newWindow.
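The shape of that dispatch, one call site with a different provider depending on the process, can be sketched like this. The Toy types are stand-ins invented for illustration; the real nsIWindowProvider takes many more arguments and returns an nsIDOMWindow, not a string.

```cpp
#include <string>

// Toy stand-in for nsIWindowProvider.
struct ToyWindowProvider {
  virtual ~ToyWindowProvider() = default;
  virtual std::string ProvideWindow() = 0;
};

// Content-process flavour: the real one asks the parent process for a window.
struct ToyTabChild : ToyWindowProvider {
  std::string ProvideWindow() override { return "TabChild"; }
};

// Non-Electrolysis flavour: the provider lives in the same process.
struct ToyContentTreeOwner : ToyWindowProvider {
  std::string ProvideWindow() override { return "nsContentTreeOwner"; }
};

// The caller only sees the interface, so the same code serves both cases.
std::string OpenViaProvider(ToyWindowProvider* aProvider) {
  if (!aProvider) {
    return "no provider";  // caller would have to create the window another way
  }
  return aProvider->ProvideWindow();
}
```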

Here’s TabChild::ProvideWindow:

NS_IMETHODIMP
TabChild::ProvideWindow(nsIDOMWindow* aParent, uint32_t aChromeFlags,
                        bool aCalledFromJS,
                        bool aPositionSpecified, bool aSizeSpecified,
                        nsIURI* aURI, const nsAString& aName,
                        const nsACString& aFeatures, bool* aWindowIsNew,
                        nsIDOMWindow** aReturn)
{
    *aReturn = nullptr;

    // If aParent is inside an <iframe mozbrowser> or <iframe mozapp> and this
    // isn't a request to open a modal-type window, we're going to create a new
    // <iframe mozbrowser/mozapp> and return its window here.
    nsCOMPtr<nsIDocShell> docshell = do_GetInterface(aParent);
    if (docshell && docshell->GetIsInBrowserOrApp() &&
        !(aChromeFlags & (nsIWebBrowserChrome::CHROME_MODAL |
                          nsIWebBrowserChrome::CHROME_OPENAS_DIALOG |
                          nsIWebBrowserChrome::CHROME_OPENAS_CHROME))) {

      // Note that BrowserFrameProvideWindow may return NS_ERROR_ABORT if the
      // open window call was canceled.  It's important that we pass this error
      // code back to our caller.
      return BrowserFrameProvideWindow(aParent, aURI, aName, aFeatures,
                                       aWindowIsNew, aReturn);
    }

    // Otherwise, create a new top-level window.
    PBrowserChild* newChild;
    if (!CallCreateWindow(&newChild)) {
        return NS_ERROR_NOT_AVAILABLE;
    }

    *aWindowIsNew = true;
    nsCOMPtr<nsIDOMWindow> win =
        do_GetInterface(static_cast<TabChild*>(newChild)->WebNavigation());
    win.forget(aReturn);
    return NS_OK;
}

The docshell->GetIsInBrowserOrApp() is basically asking “are we b2g?”, to which the answer is “no”, so we skip that block, and go right for CallCreateWindow.

CallCreateWindow is using the IPC library to communicate with TabParent in the UI process, which has a corresponding function called AnswerCreateWindow. Here it is:

bool
TabParent::AnswerCreateWindow(PBrowserParent** retval)
{
    if (!mBrowserDOMWindow) {
        return false;
    }

    // Only non-app, non-browser processes may call CreateWindow.
    if (IsBrowserOrApp()) {
        return false;
    }

    // Get a new rendering area from the browserDOMWin.  We don't want
    // to be starting any loads here, so get it with a null URI.
    nsCOMPtr<nsIFrameLoaderOwner> frameLoaderOwner;
    mBrowserDOMWindow->OpenURIInFrame(nullptr, nullptr,
                                      nsIBrowserDOMWindow::OPEN_NEWTAB,
                                      nsIBrowserDOMWindow::OPEN_NEW,
                                      getter_AddRefs(frameLoaderOwner));
    if (!frameLoaderOwner) {
        return false;
    }

    nsRefPtr<nsFrameLoader> frameLoader = frameLoaderOwner->GetFrameLoader();
    if (!frameLoader) {
        return false;
    }

    *retval = frameLoader->GetRemoteBrowser();
    return true;
}

So after some checks, we call mBrowserDOMWindow’s OpenURIInFrame, with (among other things), nsIBrowserDOMWindow::OPEN_NEWTAB. So that’s why we’ve got a new tab opening instead of a new window.
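Stripped down to a toy, the problem looks like the first function below: the "where" is fixed no matter what the user wants. The second function is only one possible shape for the desired behaviour, not the actual fix. The enum and the aUserPrefersNewWindow flag are hypothetical stand-ins, not real Gecko constants or prefs.

```cpp
// Hypothetical stand-in for the "where to open" choice.
enum class ToyOpenWhere { NewTab, NewWindow };

// What the handler quoted above effectively does: always ask for a tab,
// whatever the user's settings say.
ToyOpenWhere CurrentBehaviour(bool /* aUserPrefersNewWindow */) {
  return ToyOpenWhere::NewTab;
}

// A sketch of the desired outcome: honour the user's choice.
ToyOpenWhere DesiredBehaviour(bool aUserPrefersNewWindow) {
  return aUserPrefersNewWindow ? ToyOpenWhere::NewWindow
                               : ToyOpenWhere::NewTab;
}
```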

mBrowserDOMWindow is a reference to this thing implemented in browser.js:

function nsBrowserAccess() { }

nsBrowserAccess.prototype = {
  QueryInterface: XPCOMUtils.generateQI([Ci.nsIBrowserDOMWindow, Ci.nsISupports]),

  _openURIInNewTab: function(aURI, aOpener, aIsExternal) {
    let win, needToFocusWin;

    // try the current window.  if we're in a popup, fall back on the most recent browser window
    if (window.toolbar.visible)
      win = window;
    else {
      let isPrivate = PrivateBrowsingUtils.isWindowPrivate(aOpener || window);
      win = RecentWindow.getMostRecentBrowserWindow({private: isPrivate});
      needToFocusWin = true;
    }

    if (!win) {
      // we couldn't find a suitable window, a new one needs to be opened.
      return null;
    }

    if (aIsExternal && (!aURI || aURI.spec == "about:blank")) {
      win.BrowserOpenTab(); // this also focuses the location bar
      win.focus();
      return win.gBrowser.selectedBrowser;
    }

    let loadInBackground = gPrefService.getBoolPref("browser.tabs.loadDivertedInBackground");
    let referrer = aOpener ? makeURI(aOpener.location.href) : null;

    let tab = win.gBrowser.loadOneTab(aURI ? aURI.spec : "about:blank", {
                                      referrerURI: referrer,
                                      fromExternal: aIsExternal,
                                      inBackground: loadInBackground});
    let browser = win.gBrowser.getBrowserForTab(tab);

    if (needToFocusWin || (!loadInBackground && aIsExternal))
      win.focus();

    return browser;
  },

  openURI: function (aURI, aOpener, aWhere, aContext) {
    ... (removed for brevity)
  },

  openURIInFrame: function browser_openURIInFrame(aURI, aOpener, aWhere, aContext) {
    if (aWhere != Ci.nsIBrowserDOMWindow.OPEN_NEWTAB) {
      dump("Error: openURIInFrame can only open in new tabs");
      return null;
    }

    var isExternal = (aContext == Ci.nsIBrowserDOMWindow.OPEN_EXTERNAL);
    let browser = this._openURIInNewTab(aURI, aOpener, isExternal);
    if (browser)
      return browser.QueryInterface(Ci.nsIFrameLoaderOwner);

    return null;
  },

  isTabContentWindow: function (aWindow) {
    return gBrowser.browsers.some(function (browser) browser.contentWindow == aWindow);
  },

  get contentWindow() {
    return gBrowser.contentWindow;
  }
}

So nsBrowserAccess’s openURIInFrame only supports opening things in new tabs, and then it just calls _openURIInNewTab on itself, which does the job of returning the tab’s remote browser after the tab is opened.

I might follow this up with a post about how nsContentTreeOwner opens a window in the non-Electrolysis case, and how we might abstract some of that out for re-use here. We’ll see.

And that’s about it. Hopefully this is useful to future spelunkers.

Electrolysis: Debugging Child Processes of Content for Make Benefit Glorious Browser of Firefox

Here’s how I’m currently debugging Electrolysis stuff on OS X using gdb. It involves multiple terminal windows. I live with that.

# In Terminal Window 1, I execute my Firefox build with MOZ_DEBUG_CHILD_PROCESS=1.
# That environment variable makes it so that the parent process spits out the child
# process ID as soon as it forks out. I also use my e10s profile so as to not muck up
# my default profile.

MOZ_DEBUG_CHILD_PROCESS=1 ./mach run -P e10s

# So, now my Firefox is spawned up and ready to go. I have
# browser.tabs.remote.autostart set to "true" in my about:config, which means I'm
# using out-of-process tabs by default. That means that right away, I see the
# child process ID dumped into the console. Maybe you get the same thing if
# browser.tabs.remote.autostart is false. I haven't checked.

CHILDCHILDCHILDCHILD
  debug me @ 45326

# ^-- so, this is what comes out in Terminal Window 1.

So, the next step is to open another terminal window. This one will connect to the parent process.

# Maybe there are smarter ways to find the firefox process ID, but this is what I
# use in my new Terminal Window 2.
ps aux | grep firefox

# And this is what I get back:

mikeconley     45391  17.2  5.3  3985032 883932   ??  S     2:39pm   1:58.71 /Applications/FirefoxAurora.app/Contents/MacOS/firefox
mikeconley     45322   0.0  0.4  3135172  69748 s000  S+    2:36pm   0:06.48 /Users/mikeconley/Projects/mozilla-central/obj-x86_64-apple-darwin12.5.0/dist/Nightly.app/Contents/MacOS/firefox -no-remote -foreground -P e10s
mikeconley     45430   0.0  0.0  2432768    612 s002  R+    2:44pm   0:00.00 grep firefox
mikeconley     44878   0.0  0.0        0      0 s000  Z    11:46am   0:00.00 (firefox)

# That second one is what I want to attach to. I can tell, because the executable
# path lies within my local build's objdir. The first row is my main Firefox I just
# use for work browsing. I definitely don't want to attach to that. The third line
# is just me looking for the process with grep. Not sure what that last one is.

# I use sudo to attach to the parent because otherwise, OS X complains about permissions
# for process attachment. I attach to the parent like this:

sudo gdb firefox 45322

# And now I have a gdb for the parent process. Easy peasy.

And finally, to debug the child, I open yet another terminal window.

# That process ID that I got from Terminal Window 1 comes into play now.

sudo gdb firefox 45326

# Boom - attached to child process now.

Setting breakpoints for things like TabChild::foo or TabParent::bar can be done like this:

# In Terminal Window 3, attached to the child:

b mozilla::dom::TabChild::foo

# In Terminal Window 2, attached to the parent:

b mozilla::dom::TabParent::bar

And now we’re cookin’.

Much Ado About Brendan (or As I’ve Seen It)

Since Brendan Eich’s resignation, I’ve been struggling to articulate what I think and feel about the matter. It’s been difficult. I haven’t been able to find what I wanted to say. Many other better, smarter, and more qualified Mozillians have written things about this, and I was about to let it go. I didn’t just want to say “me too”.

I felt I had nothing of substance to contribute. I feebly wrote something about Brendan Eich and the Kobayashi Maru, but it became a rambling mess, and the analogy fell apart quite quickly. I was about to call it quits on contributing my thoughts.

And then this post happened.

Don’t ask me where this came from. A muse woke me up in the night to write it (it’s just past 4AM for crying out loud – muse, let me sleep). Maybe through the lens of this nonsense, some real sense will prevail. I’m not hopeful, but this muse is nodding emphatically (and grinning like a lunatic).

Please believe that I’m not at all trying to trivialize, oversimplify, or make light of the events of the past few weeks by writing this. I’m just trying to understand it, and view it with a looking glass I have at least a little familiarity with.

And maybe it’s mostly catharsis.

I also apologize that it’s not really told like a story from the Bard. I think that’d be too long winded (no offense, Shakey). I’m pretty sure the narrator / stage directions have the most lines. It’s actually quite criminal.

I also want to point out that the only “real world names” in this little travesty are Brendan Eich’s, Jay Sullivan’s, and Mitchell Baker’s. The rest are from the world of Shakespeare.

And I also apologize that it’s not in iambic pentameter – that’d probably be more appropriate, but I have neither the wit nor the patience to pull this off with that much verisimilitude.

Oooh! Verisimilitude! Fancy words! Enough apologies, let’s get started.

Much Ado About Brendan (or As I’ve Seen It)

Prologue

Venice, Italy. Sometime during the Renaissance. This glorious city is composed of many families – the Montagues, the Capulets, the Macbeths, the MacDuffs, the Aguecheeks, the Fortinbras, the Whitmores, and many many more. Too many to name or count.

Many of these families argue and disagree about things. There’s almost always one thing that one family does or thinks that another family just cannot abide by.

It is in this turbulent city of families that we find The Merchant’s Building. The Merchants of Venice are selling their wares, lending or selling books, playing music, and much more – and people are constantly streaming in and out. It’s a marketplace of endless possibility.

In one section of The Merchant’s Building is the Mozilla booth. Mozilla does and makes many things – but it’s probably best known for its Firefox jewelry. Mozilla is one of a small number of merchants giving away jewelry – and jewelry, in this building, is special: the more people wear your jewelry, the more of a voice you have at the Merchant’s Weekly Meeting, where the rules of the building are written and refined.

So what is special about this Mozilla merchant? Why should we wear their jewelry? There are certainly other merchants giving away jewelry a few booths down. What does Mozilla bring to the table?

For one thing, the jewelry is beautiful. And it makes you walk faster. And it’s got the latest features. And it makes it harder for sketchy people to follow you. And it doesn’t have a built-in tracking device recording which merchants you’re visiting. And you can add cool charms to it, and make it look exactly how you want it.

And another thing that’s unique to the Mozilla booth is that they’re composed of members of every single family in Venice. Every single family has at least one member working in the Mozilla booth. And what’s more – a bunch of these workers are volunteering their time and efforts to make this stuff!

Why? Why do they volunteer? And why do these family members work side by side with people their families might balk at, or sneer at?

Well, in the very center of the Mozilla booth, overhanging the whole thing, is… The Mission. The Mission is the set of guiding principles upon which the Mozilla booth operates. This is what these family members bury their gauntlets for. They work, sweat and bleed side by side for this mission. This is their connective tissue. This is what guides them when they vote and argue for things at the Merchant’s Weekly Meeting.

The other truly unique thing about the Mozilla booth is that there are no walls to it! You can walk right in, and watch the craftspeople make jewelry! Heck, you can sit right down at a bench and somebody will show you how to make some yourself. They’ll guide you, and they’ll critique you, and soon, somebody will be wearing a piece of jewelry that you made.

The greatest debates also occur within the Mozilla booth. People stand on soap boxes and give their opinions about jewelry, or other merchandise – or merchandise practices. People say what they think out loud, and perhaps print it on a t-shirt and wear it. Sometimes, discussions get heated, but level thinking usually prevails because these Mozillians are an unusually bright bunch.

ACT I

There is a leadership selection underway. Someone needs to be the Chief of Business Affairs (or CBA) in the Mozilla booth. The current chief, Jay, has been holding the position as an interim chief, and the Board of Business Affairs is trying to select someone to take the position permanently.

Two members of this board already have their bags packed – for a while now, they’ve been neglecting other interests of theirs, and after this chief is selected, they feel they need to do other things.

Enter Brendan Eich. Brendan Eich is chief craftsperson of the makers of jewelry in the Mozilla booth. He’s a brilliant and widely respected craftsperson himself, having invented some of the amazing techniques that are used by all serious jewelry makers. He is also one of the founders of the Mozilla booth, having set it up with Mitchell Baker.

The Board of Business Affairs selects Brendan to be the next Chief of Business Affairs.

They announce this, and there is much applause! People clap Brendan on the back. Many craftspeople are pleased that one of their own will be in charge.

The two board members, as they’ve agreed to, take their bags, salute, and walk off out of the booth and on to other things.

A third board member leaves as well, but for reasons not related to what I describe below.

Suddenly, several Montagues and Montague supporters in the Mozilla booth grow concerned. They recall that several years ago, Brendan had donated $1000 to a law that supported Capulet values – a law which impacted their rights. The Montagues and Montague supporters grow concerned that someone who supports this Capulet law is not fit to be Chief of a booth that houses all of the families, Montagues included.

Several of these Montagues raise these concerns out loud. This is not unusual in the Mozilla booth, as most concerns are raised out loud – and, as usual, debate begins. Brendan states that he will 100% abide by the Mozilla participation guidelines, and what’s more, begins supporting a project that a Montague in the Mozilla booth has been working on – to bring more Montagues into the booth.

Vigorous debate continues, as is the Mozilla booth custom.

However, as the booth lets anybody in, and the debate can be heard outside of the booth, several Montagues and Montague supporters hear these concerns and start passing the message along to one another – a Capulet has been selected to be the CBA!

Many of these Montagues are reasonable, and say and write reasonable arguments about why they are concerned, and why Brendan may not be the right choice as CBA.

ACT II

A few meters away, the Cupid booth overhears all of this concern from the Montagues. Perhaps they really are Montague supporters (or, more likely, they just wanted to perk up business), but they suddenly decide to take a stand. People who try to come into their booth wearing Firefox jewelry have to read a big sign that tells them why the Cupid booth believes that restricting the rights of Montagues is terrible, and that the Mozilla booth is terrible for making a Capulet the CBA. They tell the people wearing Firefox jewelry that they should probably wear other things.

And so some people start to take off their Firefox jewelry. Some Montagues take it off angrily, and smash it into the ground – stomping it with their feet, creating a big dust cloud.

Enter Iago, and his team of writers. There are many writers and story-sellers in the Merchant’s Building, but Iago is one of those writers that just wants people to listen to him. He likes to twist words and make things up, or to insinuate things that are not true. He saw the board members leaving the Mozilla booth and concocts some headlines, insinuating that they left in protest of Brendan’s support of the Capulet laws. He also writes about how all of the Mozillians in the booth were not supporting Brendan’s appointment as CBA (which is not true – it’s true that some were concerned and questioned the wisdom of his appointment, but certainly not all). He writes and he writes, and his messengers pass copies and leaflets around. Montagues and Montague supporters read these leaflets, or hear people talking about them, and they grow very concerned. More Montagues start to take off their Firefox jewelry.

Some Montagues start to engage with Mozillians and try to figure out what is happening. As always, each family has calm and reasonable people to converse with – and that’s always welcome in the Mozilla booth.

However, every family also has their groundlings. The groundlings are the members of a family who are always looking for a fight. Always looking for blood. Always hoping an actor will forget their lines, and will shout distracting things at them to make it happen. They always have a bag of rotten fruit and vegetables with them to throw. Some of them just like to make trouble.

Every family has their groundlings. You’ve probably met some yourself.

The groundlings start to hear these rumors that Iago has been spreading around, copied and recopied, distorted and mutilated – and they see the signs at the Cupid booth.

And they rush the Mozilla booth! They start throwing rotten fruit and vegetables, and they tear off their Firefox jewelry, and swear to never wear it again! They gnash their teeth, and they rip out their own hair in a rage, and they scream and yell and make so much noise – it’s almost impossible for the craftspeople in the Mozilla booth to work!

A tempest of Montague rage was upon the Mozilla booth.

ACT III

After several hours of this, Brendan addresses the crowd outside, and speaks to some storytellers (Iago and his team are among them – he always is).

They ask him if he renounces Capulet ways, or if he will apologize for the Montague rights that were impacted by the Capulet law that he helped fund.

And Brendan says something along the lines of “I don’t think that’s helpful to discuss. I don’t think that’s relevant here. I’m not going to run this booth as if everybody in here were Capulets – I helped make this booth, I know that it’s composed of many families, and I know how it operates.”

But Iago and the groundlings were not satisfied. They put up signs and placards claiming that anybody wearing Firefox jewelry is supporting the Capulets!

The Mozillians look at all of the broken and stomped-on jewelry on the market ground. All their work, being trampled. If this continues, their ability to improve things for all families at the Merchant’s Weekly Meeting will fade. Their ability to enact their Mission will fade. They are agitated, discouraged, upset, angry, sad, anxious, confused – a cocktail of emotion playing pretty much the entire spectrum.

Brendan’s speech had not done anything to quell the groundlings. And Iago could smell blood, and was not going to stop writing about Brendan or Mozilla.

The other leaders look to Brendan. What will we do?

And Brendan said, “This noise is getting absurdly loud. How are we supposed to work under these conditions? There’s no way we can enact the mission like this.”

And Brendan steps onto the proscenium, and says:

To leave, or not to leave, that is the question—
Whether ’tis Nobler in the mind to suffer
The Slings and Arrows of outrageous Fortune,
Or to take Arms against a Sea of troubles,
And by opposing end them?

And so, after much thought, he takes arms. He sacrifices, and he chooses to leave the booth – the booth he helped plant into the ground over 15 years ago. The booth he helped build, the jewelry and techniques he helped craft.

“I think if I leave, you folks might have a chance to keep the mission going.”

And so he leaves, to the heartbreak of many Mozillians, and to the cheering of the Montague groundlings outside.

ACT IV

Several of the more sensible Montagues watch Brendan leave and wonder if perhaps the groundlings in their family have made them look petty and vindictive. Some of them are also sad that Brendan left the Mozilla booth – all they wanted from him was an apology, they say. That would have sufficed, they say. They didn’t expect or want him to leave the whole booth.

But the damage is done, and Brendan has left. There is no chief craftsperson, and there is no CBA. Holy shit.

The Mozillians in the booth start to get back to work, since the cheers of the Montagues outside are much easier to work against as a backdrop than the booing, hissing and food-throwing. A bunch of Montagues dust off their stomped Firefox jewelry (or grab new copies!) and put them back on proudly. Others are happy with the new jewelry they got, and don’t care about the Mission. Still others never took off the Firefox jewelry, but said they did. And now they wear it publicly again, proudly.

But suddenly, the Capulets and Capulet supporters in and around the Mozilla booth look at this gaping void where Brendan was and sense injustice. This was wrong, they cry! This man should not have been chased out of here!

Vigorous debate begins, as is the Mozilla booth custom.

And reasonable Capulets say and write reasonable things about why they think it was wrong for Brendan to have left.

And Iago, who never really left the area, hears all of this, and smells more blood in the air. He takes his poison pen, and writes stories about how Brendan was forcibly removed from the Mozilla booth by an angry mob of Montagues. He writes that, like Julius Caesar, Brendan was heard gasping “Et tu, Brute?” as he was stabbed by his fellow senators – or, like King Hamlet, poisoned and betrayed by the people closest to him.

But as usual, Iago gets this completely wrong. Not that he cares or bothers to check. What a douche. And LOUD too, holy smokes. And people listen to Iago, and read what he writes, and hear what he says, and the rumours abound!

And a second tempest starts to brew.

ACT V

Many reasonable Capulets, both inside and outside of the Mozilla booth, are concerned about what this means for them. Does this mean that Capulets aren’t allowed to become CBAs? That’s certainly against the inclusiveness guidelines, is it not? And much debate resonated, as is the Mozilla way.

But, as you recall, every family has their groundlings, and the Capulets are no exception. The Capulet groundlings heard the rumours that Iago and his ilk were slinging, and they gnashed their teeth, and they pulled out their hair.

“YOU KILLED BRENDAN”, the groundlings howled at the Mozilla booth.

“No, he left of his own accord to save us and the mission,” some Mozillians said with sadness.

“NO HE DIDN’T, HE WAS BETRAYED AND MURDERED BY HIS CLOSEST ALLIES!” the groundlings yelled back.

“No, that’s simply not true. He left of his own accord in an attempt to save the booth and the mission.”

And the reasonable Capulets understood this, and they understood the mindblowing complexities of this whole clusterfuck. And they spoke with reason and passion.

The Mozillian craftspeople got up from their work making jewelry to talk to these Capulets, and the supporters of the Capulets. And many were very reasonable and calm – but the groundlings among them were vicious and yelled and made so much noise. In some ways, their rage was indistinguishable from the Montague groundling rage, which I believe is some kind of irony.

And, as you’d expect, the Capulet groundlings, like all groundlings, love blood. They love a fight. And they tore off their Firefox jewelry, and they stamped it into the ground. Vegetables and rotten fruit started to be thrown at the Mozilla booth. Again.

And the Mozillians in the booth looked at each other. They looked at the gaping void where Brendan used to stand. They all hugged one another, and comforted one another, as the jeers and boos of the groundlings got louder and louder, and as rotten fruit and vegetables slammed into them and their works.

And this is where we currently are, I believe.

Epilogue

If these ramblings have offended,
Think but this, and all is mended,
That you have but slumber’d here
While these visions did appear.
And this weak and idle theme,
No more yielding but a dream,
Gentles, do not reprehend:
if you pardon, we will mend:
And, as I am an honest Mike,
I do yet miss this Brendan Eich.
Now to ‘scape the serpent’s tongue,
We will make amends ere long;
Else the Mike a liar call;
So, good night unto you all.
Give me your hands, if we be friends,
And Robin shall restore amends.

For a less silly and more sober analysis of what happened, I suggest reading this next.

Australis Performance Post-mortem Part 3: As Good As Our Tools

While working on the ts_paint and tpaint regressions, we didn’t just stab blindly at the source code. We had some excellent tools to help us along the way. We also MacGyver‘d a few of those tools to do things that they weren’t exactly designed to do out of the box. And in some cases, we built new tools from scratch when the existing ones couldn’t cut it.

I just thought I’d write about those.

MattN’s Spreadsheet

I already talked about this one in my earlier post, but I think it deserves a second mention. MattN has mad spreadsheet skills. Also, it turns out you can script spreadsheets on Google Docs to do some pretty magical things – like pull down a bunch of talos data, and graph it for you.

I think this spreadsheet was amazingly useful in getting a high-level view of all of the performance regressions. It also proved very, very useful in the next set of performance challenges that came along – but more on those later.

MattN’s got a blog post up about his spreadsheet that you should check out.

The Gecko Profiler

This is a must-have for Gecko hackers who are dealing with some kind of performance problem. The next time I hit something performance related, this is the first tool I’m going to reach for. We used a number of tools in this performance work, but I’m pretty sure this was the most powerful one in our arsenal.

Very simply, Gecko ships with a built-in sampling profiler, and there’s an add-on you can install to easily dump, view and share these profiles. That last bit is huge – you click a button, it uploads, and bam – you have a link you can send to someone over IRC to have them look at your profile. It’s sheer gold.

We also built some tools on top of this profiler, which I’ll go into in a few paragraphs.

You can read up on the Gecko Profiler here at the official documentation.

Homebrew Profiler

At one point, jaws built a very simple profiler for the CustomizableUI component, to give us a sense of how many times we were entering and exiting certain functions, and how much time we were spending in them.

Why did we build this? To be honest, it’s been too long and I can’t quite remember. We certainly knew about the Gecko Profiler at this point, so I imagine there was some deficiency with the profiler that we were dealing with.

My hypothesis is that this was when we were dealing strictly with the ts_paint / tpaint regression on Windows XP. Take a look at the graphs in my last post again. Notice how UX (red) and mozilla-central (green) converge at around July 1st on Ubuntu? And how OS X finally converges on tpaint around August 1st?

I haven’t included the Windows 7 and 8 platform graphs, but I’m reasonably certain that at this point, Windows XP was the last regressing platform on these tests.

And I know for a fact that we were having difficulty using the Gecko Profiler on Windows XP, due to this bug.

Basically, on Windows XP, the call tree wasn’t interleaving the JavaScript and native-code calls properly, so we couldn’t trust the order of the tree, making the profile really useless. This was a serious problem, and we weren’t sure how to work around it at the time.

And so I imagine that this is what prompted jaws to write the homebrew profiler. And it worked – we were able to find sections of CustomizableUI that were causing unnecessary reflow, or taking too long doing things that could be shortcutted.

I don’t know where jaws’ homebrew profiler is – I don’t have the patch on my machine, and somehow I doubt he does either. It was a tool of necessity, and I think we moved past it once we sorted out the Windows XP stack interleaving thing.

And how did we do that, exactly?

Using the Gecko Profiler on Windows XP

jaws’ profiler got us some good data, but it was limited in scope, since it only paid attention to CustomizableUI. Thankfully, at some point, Vladan from the Perf team figured out what was going wrong with the Gecko Profiler on Windows XP, and gave us a workaround that lets us get proper profiles again. I have since updated the Gecko Profiler MDN documentation to point to that workaround.

Reflow Profiles

This is where we start getting into some really neat stuff. So while we were hacking on ts_paint and tpaint, Markus Stange from the layout team wrote a patch for Gecko to take “reflow profiles”. This is a pretty big deal – instead of telling us what code is slow, a reflow profile tells us what things take a long time to lay out and paint. And, even better, it breaks it down by DOM id!

This was hugely powerful, and I really hope something like this can be built into the Gecko Profiler.

Markus’ patch can be found in this bug, but it’ll probably require de-bitrotting. If and when you apply it, you need to run Firefox with an environment variable MOZ_REFLOW_PROFILE_FILE pointing at the file you’d like the profile written out to.
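If you’d rather script that than type it into a shell, something like the following Python sketch should do. MOZ_REFLOW_PROFILE_FILE is the variable named above; the helper names, the output file name, and the path to your patched build are all assumptions you’d fill in yourself:

```python
import os
import subprocess

def reflow_profile_env(profile_path, base_env=None):
    """Return a copy of the environment with MOZ_REFLOW_PROFILE_FILE set."""
    env = dict(os.environ if base_env is None else base_env)
    # Use an absolute path so the result doesn't depend on Firefox's cwd.
    env["MOZ_REFLOW_PROFILE_FILE"] = os.path.abspath(profile_path)
    return env

def launch_firefox(firefox_binary, profile_path):
    """Start a patched build; firefox_binary is whatever path your build uses."""
    return subprocess.Popen([firefox_binary], env=reflow_profile_env(profile_path))

# Build the environment without launching anything.
example_env = reflow_profile_env("reflow.profile", base_env={"PATH": "/usr/bin"})
```

This only helps on a build that has the patch applied; a stock Firefox will not do anything with the variable.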

Once you have that profile, you can view it on Markus’ special fork of the Gecko Profiler viewer.

This is what a reflow profile looks like:

[Screenshot of a reflow profile]

I haven’t linked to one I’ve shared because reflow profiles tend to be very large – too large to upload. If you’d like to muck about with a real reflow profile, you can download one of the reflow profiles attached to this bug and upload it to Markus’ Gecko Profiler viewer.

These reflow profiles were priceless throughout all of the Australis performance work. I cannot stress that enough. They were a way for us to focus on just a facet of the work that Gecko does – layout and painting – and determine whether or not our regressions lay there. If they did, that meant that we had to find a more efficient way to paint or lay out. And if the regressions didn’t show up in the reflow profiles, that was useful too – it meant we could eliminate graphics and layout from our pool of suspects.

Comparison Profiles

Profiles are great, but you know what’s even better? Comparison profiles. This is some more Markus Stange wizardry.

Here’s the idea – we know that ts_paint and tpaint have regressed on the UX branch. We can take profiles of both the UX and mozilla-central. What if we can somehow use both profiles and find out what UX is doing that’s uniquely different and uniquely slow?

Sound valuable? You’re damn right it is.

The idea goes like this – we take the “before” profile (mozilla-central), and weight all of its samples by -1. Then, we add the samples from the “after” profile (UX).

The stuff that is positive in the resulting profile is an indicator that UX is slower in that code path. The stuff that is negative means that UX is faster.
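To make the weighting concrete, here’s a toy version in Python. It is not how Markus’ scripts are written, and real profiles carry a lot more than this; I’m assuming a profile is just a list of call stacks, one per sample, and the stacks themselves are made up:

```python
from collections import Counter

def comparison_profile(before_samples, after_samples):
    """Weight 'before' samples by -1, 'after' samples by +1, and sum per stack."""
    weights = Counter()
    for stack in before_samples:
        weights[tuple(stack)] -= 1
    for stack in after_samples:
        weights[tuple(stack)] += 1
    # Stacks that cancel out exactly tell us nothing, so drop them.
    return {stack: weight for stack, weight in weights.items() if weight != 0}

# Made-up stacks, root frame first.
before = [("main", "paint"), ("main", "paint"), ("main", "layout")]
after = [("main", "paint"), ("main", "layout"), ("main", "layout"), ("main", "layout")]

result = comparison_profile(before, after)
# ("main", "layout") ends up at +2: "after" spends more samples there.
# ("main", "paint") ends up at -1: "after" spends fewer samples there.
```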

How did we do this? Via these scripts. There’s a script in this repository called create_comparison_profile.py that does all of the work in generating the final comparison profile.

Here’s a comparison profile to look at, with mozilla-central as “before” and UX as “after”.

Now I know what you’re thinking – Mike – the root of that comparison profile is a negative number, so doesn’t that mean that UX is faster than mozilla-central?

That would seem logical based on what I’ve already told you, except that talos consistently returns the opposite opinion. And here’s where I expose some ignorance on my part – I’m simply not sure why that root node is negative when we know that UX is slower. I never got a satisfying answer to that question. I’ll update this post if I find out.

What I do know is that drilling into the high positive numbers of these comparison profiles yielded very valuable results. It allowed us to quickly determine what was uniquely slow about UX.

And in performance work, knowing is more than half the battle – knowing what’s slow is most of the battle. Fixing it is often the easy part – it’s the finding that’s hard.

Oh, and I should also point out that these scripts were able to generate comparison profiles for reflow profiles as well. Outstanding!

Profiles from Talos

Profiling locally is all well and good, but in the end, if we don’t clear the regressions on the talos hardware that runs the tests, we’re still not good enough. So that means gathering profiles on the talos hardware.

So how do we do that?

Talos is not currently baked into the mozilla-central tree. Instead, there’s a file called testing/talos/talos.json that knows about a talos repository and a revision in that repository. The talos machines then pull talos from that repository, check out that revision, and execute the talos suites on the build of Firefox they’ve been given.

We were able to use this configuration to our advantage. Markus cloned the talos repository, and modified the talos tests to be able to dump out both SPS and reflow profiles into the logs of the test runs. He then pushed those changes to his user repository for talos, and then simply modified the testing/talos/talos.json file to point to his repo and the right revision.

The upshot being that Try would happily clone Markus’ talos, and we’d get profiles in the test logs on talos hardware! Brilliant!

Extracting and symbolicating those profiles would be handled by more of Markus’ scripts – see get_profiles.py.

Now we were cooking with gas – reflow and SPS profiles from the test hardware. Could it get better?

Actually, yes.

Getting the Good Stuff

When the talos tests run, the stuff we really care about is the stuff being timed. We care about how long it takes to paint the window, but not how long it takes to tear down the window. Unfortunately, things like tearing down the window get recorded in the SPS and reflow profiles, and that adds noise.

Wouldn’t it be wonderful to get samples just from the stuff we’re interested in? Just to get samples only when the talos test has its stopwatch ticking?

It’s actually easier than it sounds. As I mentioned, Markus had cloned the talos tests, and he was able to modify tpaint and ts_paint to his liking. He made it so that just as these tests started their stopwatches (waiting for the window to paint), an SPS profile marker was added to the sample taken at that point. A profile marker simply allows us to decorate a sample with a string. When the stopwatch stopped (the window has finished painting), we added another marker to the profile.

With that done, the extraction scripts simply had to exclude all samples that didn’t occur between those two markers.
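The filtering step is roughly this, sketched in Python. The sample format and the marker strings are invented for the example; the real extraction scripts work on actual SPS profile data:

```python
def samples_between_markers(samples, start_marker, stop_marker):
    """Keep only the samples from the start marker through the stop marker."""
    kept = []
    recording = False
    for sample in samples:
        if sample.get("marker") == start_marker:
            recording = True
        if recording:
            kept.append(sample)
        if sample.get("marker") == stop_marker:
            recording = False
    return kept

# Hypothetical samples; "marker" is the string decorating a sample.
samples = [
    {"frame": "startup"},
    {"frame": "open window", "marker": "stopwatch-start"},
    {"frame": "layout"},
    {"frame": "paint", "marker": "stopwatch-stop"},
    {"frame": "teardown"},
]
timed = samples_between_markers(samples, "stopwatch-start", "stopwatch-stop")
```

Here timed holds only the three samples taken while the stopwatch was ticking; the startup and teardown samples are gone.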

The end result? Super concentrated profiles. It’s just the stuff we care about. Markus made it work for reflow profiles too – it was really quite brilliant.

And I think that pretty much covers it.

Lessons

  • If you don’t have the tools you need, go get them.
  • If the tools you need don’t exist, build them, or find someone who can. That someone might be Markus Stange.
  • If the tools you need are broken, fix them, or find someone who can.

So with these amazing tools we were eventually able to grind down our ts_paint and tpaint regressions into dust.

And we celebrated! We were very happy to clear those regressions. We were all clear to land!

Or so we thought. Stay tuned for Part 4.

Australis Performance Post-mortem Part 2: ts_paint and t_paint

Continued from Part 1.

So we’d just gotten Talos data in, and it looked like we were regressing on ts_paint and tpaint right across the board.

Speaking just for myself, up until this point, Talos had been a black box. I vaguely knew that Talos tests were run, and I vaguely understood that they measured certain performance things, but I didn’t know what those things were nor where to look at the results.

Luckily, I was working with some pretty seasoned veterans. MattN whipped up an amazing spreadsheet that dynamically pulled in the Talos test data for each platform so that we could get a high-level view of all of the regressions. This would turn out to be hugely useful.

Here’s a link to a read-only version of that spreadsheet in all of its majesty. Or, if that link is somehow broken in the future, here’s a screenshot:

Numbers!

So now we had a high-level view of the regressions. The next step was determining what to do about it.

I should also mention that these regressions, at this point, were the only big things blocking us from landing on mozilla-central. So naturally, a good chunk of us focused our attention on this performance stuff. We quickly organized a daily standup meeting time where we could all get together and give reports on what we were doing to grind down the performance issues, and what results we were getting from our efforts.

That chunk of the team, however, didn’t initially include me. I believe Gijs, Unfocused, mikedeboer and I kept hacking on customization and widget bugs while jaws and MattN dug at performance. As time went on though, a few more of us eventually joined MattN and jaws in their performance work.

The good news in all of this is that ts_paint and tpaint are related – both measure the time it takes from issuing the command to open a browser window to actually painting it on the screen. ts_paint is concerned with the very first Firefox window from a cold-start, and tpaint is concerned with new windows from an already-running Firefox. It was quite possible that there was some overlap in what was making us slow on these two tests, which was somewhat encouraging.

The following bugs are just a subset of the bugs we filed and landed to improve our ts_paint and tpaint performance. Looking back, I’m pretty sure these are the ones that made the most difference, but the full list can be found as dependencies of these bugs.

Bug 890105 – TabsInTitleBar._update should group measurements and style changes to avoid unnecessary reflows

After a bit of examination, MattN dealt the first blow when he filed Bug 890105. The cross-platform code that figures out how best to place the tabs in the titlebar (while taking into account things like the system font size) is run before the window first paints, and it was being inefficient.

By inefficient, I mean it was causing more reflows than necessary. Here’s some information on reflows. The MDN page states that the article is obsolete, but the page still does a pretty good job of explaining what a reflow is.

The code would take a measurement of something on the page (causing a reflow), update that thing’s size (causing a reflow), and then repeat the process. MattN found we could cluster the measurements into a single pass, and then do all of the changes one after another. This reduced the number of reflows, which helped speed up both ts_paint and tpaint.
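Here’s a toy model of why that helps, in Python rather than the JavaScript the real code is written in. ToyLayout is entirely made up: it pretends that measuring something after a style change forces a reflow, which is the behaviour that made the interleaved version expensive.

```python
class ToyLayout:
    """Toy layout engine: measuring after a style change forces a reflow."""

    def __init__(self, widths):
        self.widths = dict(widths)
        self.dirty = True  # Nothing has been laid out yet.
        self.reflows = 0

    def measure(self, node):
        if self.dirty:
            self.reflows += 1
            self.dirty = False
        return self.widths[node]

    def set_width(self, node, width):
        self.widths[node] = width
        self.dirty = True

def interleaved(layout, nodes):
    # Measure, change, measure, change: every measurement hits dirty layout.
    for node in nodes:
        layout.set_width(node, layout.measure(node) + 10)

def batched(layout, nodes):
    # All measurements first, then all changes.
    measured = {node: layout.measure(node) for node in nodes}
    for node, width in measured.items():
        layout.set_width(node, width + 10)

nodes = ["tabs", "titlebar", "menubar"]  # Illustrative names only.
slow = ToyLayout({node: 100 for node in nodes})
fast = ToyLayout({node: 100 for node in nodes})
interleaved(slow, nodes)
batched(fast, nodes)
```

Both versions end up with the same widths, but the interleaved one pays for three reflows while the batched one pays for a single reflow.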

And boom, we saw our first win for both ts_paint and tpaint!

Bug 892532 – Add an optional fast-path to CustomizableUI.isWidgetRemovable

jaws found the next big win using a home-brewed profiler. The home-brewed profiler simply counted the number of times we entered and exited various functions in the CustomizableUI code, and recorded the time it took from entering to exiting.

I can’t really recall why we didn’t use the SPS profiler at this point. We certainly knew about it, but something tells me that at this point, we were having a hard time getting useful data from it.

Anyhow, with the home-brew profiler, jaws determined that we had the opportunity to fast-path a section of our code. Basically, we had a function that takes the ID of a widget, looks for and retrieves the widget, and returns whether or not that widget can be removed from its current location. There were some places that called this function during window start-up, and those places already had the widget that was to be found. jaws figured we could fast-path the function by being able to pass the widget itself rather than the ID, and skip the look-up.
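The shape of that fast-path looks something like this Python sketch. The class and method names are stand-ins, not the actual CustomizableUI API, and the lookup counter is only there to show the saving:

```python
class Widget:
    def __init__(self, widget_id, removable=True):
        self.id = widget_id
        self.removable = removable

class WidgetRegistry:
    def __init__(self, widgets):
        self._by_id = {widget.id: widget for widget in widgets}
        self.lookups = 0  # Counts how often we paid for a look-up.

    def find(self, widget_id):
        self.lookups += 1
        return self._by_id[widget_id]

    def is_widget_removable(self, widget_or_id):
        """Accept a Widget (fast path, no look-up) or a widget ID (slow path)."""
        if isinstance(widget_or_id, Widget):
            widget = widget_or_id
        else:
            widget = self.find(widget_or_id)
        return widget.removable

home = Widget("home-button")
urlbar = Widget("urlbar-container", removable=False)
registry = WidgetRegistry([home, urlbar])

by_id = registry.is_widget_removable("home-button")  # Pays for a look-up.
by_widget = registry.is_widget_removable(urlbar)      # Skips the look-up.
```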

Bug 891104 – Skip calling onOverflow during startup if there wasn’t any overflowed content before the toolbar is fully initialized

It was MattN’s turn again – this time, he found that the overflow toolbar code for the nav-bar (this is the stuff that handles putting widgets into the overflow panel if the window gets too small) was running the overflow handler as soon as the nav-bar was initted, regardless of whether anything was overflowed. This was causing a reflow because a measurement was taken on the overflowable toolbar to see if items needed to be moved into the overflow panel.

Originally, the automatic call of the overflow handler was to account for the case where the nav-bar is overflowed from the very beginning – but jaws made it smarter by attaching an overflow handler before the CSS attribute that made the toolbar overflowable was applied. That meant that the nav-bar would only call the overflow handler if it really needed to, as opposed to every time.

Bug 898126 – Cache client hit test values

Around this time, a few more people started to get involved in Australis performance work. Gijs and mstange got a bug filed to investigate if there was a way to make start-up faster on Windows XP and 7. Here’s some context from mstange in that bug in comment 9:

It turns out that Windows XP sends about 200 WM_NCHITTEST events per second when we open a new window. All these events have the same position – possibly the current mouse position. And all the ClientMarginHitTestPoint optimizations we’ve been playing with only make a difference because that function is called so often during the test – one invocation is unnoticeably quick, but it starts to add up if we call it so many times.

This patch makes sure that we only send one hittest event per second if the position doesn’t change, and returns a cached value otherwise.

After some fiddling about with cache invalidation times, the patch landed, and we saw a nice win on Windows XP and 7!
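The real patch lives in Gecko's C++ widget code for Windows, which I'm not reproducing here. As a language-agnostic illustration of the caching idea from mstange's comment, here is a small JavaScript sketch. `createCachedHitTest` is a made-up name, the hit test itself is a stand-in, and the one-second lifetime is simply taken from the comment above.

```javascript
// Hypothetical cache around an expensive hit test. The cached answer is
// reused while the position is unchanged and the entry is under maxAgeMs old.
function createCachedHitTest(hitTest, now = () => Date.now(), maxAgeMs = 1000) {
  let cached = null; // { x, y, time, result }

  return function (x, y) {
    const time = now();
    const fresh =
      cached !== null &&
      cached.x === x &&
      cached.y === y &&
      time - cached.time < maxAgeMs;
    if (!fresh) {
      cached = { x, y, time, result: hitTest(x, y) };
    }
    return cached.result;
  };
}

// Usage with a fake clock and a stand-in hit test that counts real calls.
let clock = 0;
let realCalls = 0;
const cachedHitTest = createCachedHitTest(
  (x, y) => {
    realCalls++;
    return y < 30 ? "caption" : "client";
  },
  () => clock
);

// 200 events at the same position within a second: one real hit test.
for (let i = 0; i < 200; i++) {
  cachedHitTest(100, 10);
}
const callsAfterBurst = realCalls;

clock = 1500; // the cache entry has expired
cachedHitTest(100, 10);
const callsAfterExpiry = realCalls;

const moved = cachedHitTest(100, 50); // position changed
console.log(callsAfterBurst, callsAfterExpiry, realCalls, moved); // 1 2 3 client
```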

Bug 906075 – Only send toolbars through buildArea if they’re not in their default state

It was around now that I started to get involved with performance work. One of my first successful bugs was to only run a toolbar through CustomizableUI’s buildArea function if the toolbar was not starting in a default state. The buildArea function’s job is to populate a customizable area with only the things that the user has moved into the area, and remove the things that the user has taken out. That involves cycling through the nodes in the area to see if they belong, and that takes time. I wrote a patch that cached a “dirty” state on a toolbar to indicate that it’d been customized in the past, and if we didn’t see that value, we didn’t run the toolbar through the function. Easy as pie, and we saw a little win on both ts_paint and tpaint on all platforms.
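Here is roughly what that looks like, as a sketch under stated assumptions: `initToolbar`, the `dirty` and `placements` fields, and this stripped-down `buildArea` are all hypothetical stand-ins for the real CustomizableUI code.

```javascript
let nodesVisited = 0;

// Simplified stand-in for buildArea: keep the nodes the user still wants,
// drop the ones they removed, and append the ones they moved in.
function buildArea(area, placements) {
  const wanted = new Set(placements);
  const kept = area.nodes.filter((id) => {
    nodesVisited++; // this per-node cycling is the cost we want to avoid
    return wanted.has(id);
  });
  const missing = placements.filter((id) => !kept.includes(id));
  area.nodes = kept.concat(missing);
}

// Only run the toolbar through buildArea if it has ever been customized.
function initToolbar(area, saved) {
  if (!saved.dirty) {
    return; // default state: the nodes in the markup are already correct
  }
  buildArea(area, saved.placements);
}

// Default toolbar: never customized, so buildArea is skipped.
const defaultBar = { nodes: ["back", "urlbar", "home"] };
initToolbar(defaultBar, { dirty: false, placements: [] });
const visitedForDefault = nodesVisited;

// Customized toolbar: the user swapped "home" for "print".
const customBar = { nodes: ["back", "urlbar", "home"] };
initToolbar(customBar, { dirty: true, placements: ["back", "urlbar", "print"] });

console.log(visitedForDefault, nodesVisited, customBar.nodes.join(","));
// 0 3 back,urlbar,print
```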

Bug 905695 – Skip checking for tab overflows if there is only one tab open

This was another case where we had an unnecessary reflow during start-up. And, like bug 891104, it involved an overflow event handler running when it really didn’t need to. jaws writes:

If only one tab is opened and we show the left/right arrows, we are actually removing quite a bit of space that could have been used to show the tab. Scrolling the tabbox in this state is also quite useless, since all the user can do is scroll to see the other parts of the *only* tab.

If we make this change, we can skip a synchronous reflow for new windows that only have one tab.

Which means we could skip a reflow for all new windows. Are you starting to notice a pattern? Sections of our code had been designed to operate the same way, regardless of whether or not they were in the default, common case. We were finding ways of detecting the default case, and fast-pathing it.

Chalk up another win!

Bug 907787 – Australis: toolbar overflow button should be hidden by default

Yet another example where we could fast-path the default case. The overflow button in the nav-bar is only supposed to be displayed if there are too many items in the nav-bar, resulting in some getting put into the overflow panel, which anchors on the overflow button.

If nothing is being overflowed and the panel is empty, the button should not be displayed.

We were, however, displaying the button by default, and then hiding it when we determined that nothing was overflowed. Bug 907787 inverted that logic, and hid the button by default, and only showed it when things got overflowed (which was not the default case).

We were getting really close to performance parity with mozilla-central…

Bug 908326 – default the navbar to overflowable to avoid needless reflowing

Once again, an example of us not greasing the default-path. Our overflowable toolbar code applies an overflowable attribute to the nav-bar in order to apply some CSS styles to give the toolbar its overflowing properties. Adding that attribute dynamically means a reflow.

Instead, we just added the attribute to the node’s definition in browser.xul, and dropped that unnecessary reflow like a hot brick.

So how far had we come?

Let’s take a look at the graphs, shall we? Remember, in these graphs, the red points represent UX, and the green represent mozilla-central. Up is bad, and down is good. Our goal was to sink the red dots down into the noise of the green dots, which would give us performance parity.

ts_paint

Windows XP – ts_paint improvements

Ubuntu – ts_paint improvements

OSX 10.6 ts_paint improvements

You might be wondering what that big jump is for ts_paint for OSX 10.6 at the end of the graph. This thread explains.

tpaint

Windows XP – tpaint improvements

Ubuntu – tpaint improvements

OSX 10.6 tpaint improvements

Looking good.

The big lessons

I think the big lesson here is to identify the common, default case, and optimize it as best you can. By definition, this is the path that’s going to be hit the most, so you can special-case it, and build in fast paths for it. Your users will thank you.

Close the feedback loop as much as you can. To test our theories, we’d push our patches to try and use compare-talos to compare our tpaint and ts_paint numbers to baseline pushes to see if we were making improvements. This requires several hours for the try builds to complete. This is super slow. Release Engineering was awesome and lent us some Windows XP talos slaves for us to experiment on, and that helped us close the feedback loop a lot. Don’t be afraid to ask Release Engineering for talos slaves.

Also note that while it’s easy for me to rattle off bug numbers and explain where we were being slow, all of that investigation and progress occurred over several months. Performance work can be really slow. The bottleneck is not making the slow code faster – the bottleneck is identifying where the slow code is. Profiling is the key here. If you’re not using some kind of profiler while doing performance work, you’re seriously impeding yourself. If you don’t have a profiler, build a simple one. If you don’t know how to build a simple one, find someone who can.

I mentioned Gecko’s built-in SPS profiler a few paragraphs back. The SPS profiler was instrumental (pun intended) in getting our performance back up to snuff. We also built a number of tools alongside the SPS profiler to help us in our analyses.

Read up about those tools we built in Part 3…