Tag Archives: printing

A printing story and a PSA: outparams over the IPC layer might not behave like you’d expect

Here’s a story about a printing bug on OS X, and a lesson about how our IPC layer works.

Last week, :ehsan came up to my desk and said “Mike… printing is broken on OS X with e10s enabled. Did you know this?”

It was sad to hear. Nobody, as far as I knew, had been touching printing code (besides bobowen, but he hadn’t landed his patches yet), which meant that some mystery change had landed and it was my responsibility (as the one who had implemented printing on OS X for e10s) to fix it.

The first step was to confirm it. Yep, it looked like I couldn’t print on my Nightly. Crap1. So we get the bug filed, and then I fired up my local build and lldb, and tried to trace out where the problem was occurring.

Drilling down to the regressing changeset

The problem was that my local build printed just fine. Eyebrow raised, I tried using my default profile with my local build to see if something about my profile was causing printing to break. Printing continued to work with my local build2.

Scratching my head, I used mozregression to bisect the problem down to a single changeset.

Here’s the bug it spat out:

Bug 1209930 Update Mac clang to match the version in use everywhere else

The hair stood up on the back of my neck. Printing got broken by a compiler upgrade? This had bad news written all over it.

Many try builds

So that, I guess, explained why I couldn’t reproduce the problem locally. I needed to build with a particular compiler in order to experience the bug.

The bad news was that I found out that the version of clang that we’d upgraded to didn’t work on my version of OS X (10.10.5). It was supposed to work on machines running OS X 10.7, but I didn’t have access to any3.

I was able to get debug builds off of try, but I had no luck convincing lldb to let me step through it, despite having the symbols and the right revision checked out4.

So this meant a lot of try builds. The good news is that I was able to add a bunch of logging to try, and just kinda walk away while it built. A few hours later, I’d have my build, and I’d get my results. I wasn’t blocked waiting on it to build, so I could work on other things.

What did the logging reveal?

Our printing code sometimes shows this progress dialog when printing starts. It’s put up so that while we’re laying out the page for printing, the user knows that stuff is happening. We show this progress dialog on Linux and Windows, but not on OS X (I guess OS X users don’t expect such a progress window… this was a decision made way back in the day).

The printing engine in the content process sends up a message to the parent saying, “I’m all set to print, let’s show the progress dialog!”. On OS X, the parent responds, “Uh, nope” (which is a return code of NS_ERROR_NOT_IMPLEMENTED), and the child is supposed to go, “Oh okay, I’ll just print right away”.

But here’s the kicker – along with message to show the progress dialog, the child sends an outparam5. That outparam is sent because for some platforms that show the progress dialog, we want to wait until the dialog closes before starting the actual print. On other platforms, we may just want to start printing right away when laying out the page finishes without waiting for the dialog to close.

That outparam is a bool initialized to a value of false in the print engine, and in the non-e10s case where we return “Sorry, no progress dialog” with NS_ERROR_NOT_IMPLEMENTED, that bool stays false, and when it stays false, it means that we start printing right away.

My logging showed that for some reason, that value was getting flipped to true despite not being touched by (seemingly) anything in the IPC sender / receiver nor the print progress dialog backend (which still just returns NS_ERROR_NOT_IMPLEMENTED when asked to open the printing dialog).

What the hell?

Here’s the PSA

Ehsan helped me dig into what was going on, and we figured it out. Here’s the big lesson / PSA here:

outparams over IPC, when untouched on the receiver side, get filled with values from uninitialized memory when returned to the sender.

I’ll give you an example. Here’s some sorta-pseudocode:

#include <stdio.h>

void someFunc(bool* aOutParam) {
}

int main(int argc, char** argv) {
  bool myBool = false;
  someFunc(&myBool);
  if (myBool) {
    printf("Wait, WHAT?");
    return 1;
  }
  return 0;
}

We should never enter that “Wait, WHAT?” conditional block because myBool remains false throughout. It’s initialized to false, and even though we pass a pointer to it to someFunc, someFunc never touches it, so it stays false, and so we skip the conditional and return 0 as expected.

If, however, you did the same thing, but over IPC… well, now you’re in for a fun treat.

On the receiver side of the IPC message, here’s a chunk of the generated code that calls into RecvShowPrintProgress on the parent side:

            bool notifyOnOpen;
            bool success;
            int32_t id__ = mId;
            if \6, (&(success)))))) {
                mozilla::ipc::ProtocolErrorBreakpoint("Handler for ShowProgress returned error code");
                return MsgProcessingError;
            }

            reply__ = new PPrinting::Reply_ShowProgress(id__);

            Write(notifyOnOpen, reply__);
            Write(success, reply__);
            (reply__)->set_sync();
            (reply__)->set_reply();

That notifyOnOpen bool is the one that is the outparam in the child. Notice that it’s not being initialized to any value! That means its current value could be, well, anything. When we call into RecvShowProgress, we pass a pointer to that anything value. If RecvShowProgress doesn’t do anything with that pointer (like in the OS X case), well… that random value is what gets written to the IPC message that gets sent back to the child.

And somehow a clang upgrade made it far more likely that this value would evaluate to TRUE as a boolean instead of FALSE.

And so the child assumed that it’d have to wait for a dialog to close before starting to print – a dialog that would never open, because we don’t show it on OS X.

So the patch I eventually wrote initializes the outparam to false before calling into the OS X widget code that just returns NS_ERROR_NOT_IMPLEMENTED, and that fixed the bug.

It’s a small patch, but in my mind, it’s a big lesson, at least for me. I had assumed that outparams would work the same over IPC as they do in the same process, and that’s simply not the case.

I’ve filed bug 1220168 to try to make this sort of thing easier to spot in the future.


  1. Our automated testing story for printing is pretty bad. We test some of the built-in UI for printing, but as for actually sending stuff to printer drivers… zero automated tests, which explains why this had gone uncaught for so long. 

  2. If you’re wondering, I wasn’t printing out reams of paper to test this. XCode comes with a Printer Simulator, which is like a PDF printer, except that I can get a better sense of the communications being sent to the printer 

  3. Okay, there’s one in the office, but apparently trying to build on it is a slow nightmare 

  4. Happy to hear and share how to do this if someone will show me 

  5. Read this about passing multiple values back from a method call if you’re curious about what outparams are 

  6. !(RecvShowProgress(mozilla::Move(browser), mozilla::Move(printProgressDialog), mozilla::Move(isForPrinting), (&(notifyOnOpen 

The Joy of Coding (Ep. 15): OS X Printing Returns

In Episode 15, we kept working on the same bug as the last two episodes – proxying the printing dialog on OS X to the parent process from the content process. At the end of Episode 14, we’d finished the serialization bits, and put in the infrastructure for deserialization. In this episode, we did the rest of the deserialization work.

And then we attempted to print a test page. And it worked!

We did it!

Then, we cleaned up the patches and posted them up for review. I had a lot of questions about my Objective-C++ stuff, specifically with regards to memory management (it seems as if some things in Objective-C++ are memory managed, and it’s not immediately obvious what that applies to). So I’ve requested review, and I hope to hear back from someone more experienced soon!

I also plugged a new show that’s starting up! If you’re a designer, and want to see how a designer at Mozilla does their work, you’ll love The Design Hour, by Ricardo Vazquez. His design chops are formidable, and he shows you exactly how he operates. It’s great!

Finally, I failed to mention that I’m on holiday next week, so I can’t stream live. I have, however, pre-recorded a shorter Episode 16, which should air at the right time slot next week. The show must go on!

Episode Agenda

References

Bug 1091112 – Print dialog doesn’t get focus automatically, if e10s is enabled – Notes

The Joy of Coding (Ep. 14): More OS X Printing

In this episode, I kept working on the same bug as last week – proxying the print dialog from the content process on OS X. We actually finished the serialization bit, and started doing deserialization!

Hopefully, next episode we can polish off the deserialization and we’l be done. Fingers crossed!

Note that this episode was about 2 hours and 10 minutes, but the standard-definition recording up on Air Mozilla only plays for about 13 minutes and 5 seconds. Not too sure what’s going on there – we’ve filed a bug with the people who’ve encoded it. Hopefully, we’ll have the full episode up for standard-definition soon.

In the meantime, if you’d like to watch the whole episode, you can go to the Air Mozilla page and watch it in HD, or you can go to the YouTube mirror.

Episode Agenda

References

Bug 1091112 – Print dialog doesn’t get focus automatically, if e10s is enabled – Notes

The Joy of Coding (Ep. 13): Printing. Again!

Had to deal with some network issues during this video – sorry if people were getting dropped frames during the live show! I have personally checked this recording, and almost all frames are there.

The only frames that are missing are the ones where I scramble around to connect to the wired network, which was boring anyhow.

In this episode, I worked on proxying the print dialog from the content process on OS X. It was a wild ride, and I learned quite a bit about Cocoa stuff. It was also a throwback to my very first episode, where I essentially did the same thing for Linux!

We’ll probably polish this off in the next episode, or in the episode after.

Episode Agenda

References

Bug 1091112 – Print dialog doesn’t get focus automatically, if e10s is enabled – Notes

The Joy of Coding (Episode 4)

The fourth episode is up! Richard Milewski and I found the right settings to get OBS working properly on my machine, so this weeks episode is super-readable! If you’ve been annoyed with the poor resolution for past episodes, rejoice!

In this fourth episode, I solve a few things – I clean up a busted rebase, I figure out how I’d accidentally broken Linux printing, I think through a patch to make sure it does what I need it to do, and I review some code!

Episode Agenda

References:
Bug 1136855 – Print settings are not saved from print job to print job
Notes

Bug 1088070 – Instantiate print settings from the content process instead of the parent
Notes

Bug 1090448 – Make e10s printing work on Linux
Notes

Bug 1133577 – [e10s] “Open Link in New Tab” in remote browser causes unsafe CPOW usage warning
Notes

Bug 1133981 – [e10s] Stop sending unsafe CPOWs after the findbar has been closed in a remote browser
Notes