August 4, 2013
Cycle Collector on Worker Threads

Yesterday I landed Bug 845545 on mozilla-inbound.  This is the completion of the second step of our ongoing effort to make it possible to write a single implementation of DOM APIs that can be shared between the main thread and worker threads.  The first step was developing the WebIDL code generator, which eliminated the need to write manual JSAPI glue for every DOM object in workers.  The next step will be discarding the separate worker thread DOM event implementation in favor of the one used on the main thread.  Once complete, these changes will allow us to make DOM APIs such as WebGL, WebSockets, and many others work in web workers without writing a separate C++ implementation.


June 12, 2013
Cycle Collection

I gave a talk about the cycle collector at the Rendering Meetup in Taipei last month.  Chris Pearce had the foresight to record it, and now it is available on people.m.o for anyone who wants to watch it.

September 22, 2012
Refcounting thread-safety assertions are now fatal on mozilla-central

Gecko has long had assertions to verify that XPCOM objects are AddRefed/Released on the right thread.  Today I landed Bug 753659 which makes those assertions fatal (using MOZ_ASSERT).  This makes these assertions noticeable on test suites that do not check assertion counts (namely mochitest).  It also ensures that developers will notice these assertions when testing locally.  Remember that any time you see one of these assertions you are seeing a potential sg:crit (via a use-after-free on an object whose reference count is too low due to AddRef racing with another operation) and should file and fix it immediately.

July 19, 2012
Cycle Collection

We don’t really have a comprehensive and current overview of the cycle collector and how to use it anywhere, so I wrote this.  This is probably part 1 of a multipart series, as I’ve only covered the simple cases here.

What?

The cycle collector is sort of like a garbage collector for C++.  It solves the fundamental problem of reference counting: cycles.  In a naive reference counting system, if A owns B and B owns A, neither A nor B will ever be freed.  Some structures in Gecko are inherently cyclic (e.g. a node tree) or can very easily be made cyclic by code beyond our control (e.g. most DOM objects can form cycles with expando properties added by content script).

The cycle collector operates on C++ objects that “opt-in” to cycle collection and all JS objects.  It runs a heavily modified version of Bacon and Rajan’s synchronous cycle collection algorithm. C++ objects opt-in by notifying the cycle collector when they may be garbage.  When the cycle collector wakes up it inspects the C++ objects (with help from the objects themselves) and builds a graph of the heap that participates in cycle collection.  It then finds the garbage cycles in this graph and breaks them, allowing the memory to be reclaimed.

Why?

The cycle collector makes developing Gecko much simpler at the cost of some runtime overhead to collect cycles.  Without a cycle collector, we would have to either a) manually break cycles when appropriate or b) use weak pointers to avoid ownership cycles.  These add significant complexity to modifying code and make avoiding memory leaks and use-after-free errors much harder. 

When?

C++ objects need to participate in cycle collection whenever they can be part of a reference cycle that is not guaranteed to be broken through other means.  C++ objects also need to participate in cycle collection if they hold direct references to objects that are managed by the JavaScript garbage collector (a jsval, JS::Value, JSObject*, etc.).

In practice, this means most DOM objects need to be cycle collected.

  • Does the object inherit from nsWrapperCache (directly or indirectly)?  If so, it must be cycle collected.
  • Does the object have direct references to JavaScript values (jsval, JS::Value, JSObject*, etc)?  If so, it must be cycle collected.  Note that interface pointers to interfaces implemented by JavaScript (e.g. nsIDOMEventListener) do *not* count here.
  • Does the object hold no strong references?  That is, it has no member variables of type nsCOMPtr or nsRefPtr, no arrays of those (nsTArray<nsCOMPtr>, nsTArray<nsRefPtr>, or nsCOMArray), no hashtables of them (nsInterfaceHashtable, nsRefPtrHashtable), and does not directly own any object that has these (via new/delete or nsAutoPtr).  If so, it does not need to be cycle collected.
  • Is the object threadsafe (e.g. an nsRunnable, or something that uses the threadsafe AddRef/Release macros)?  Threadsafe objects cannot participate in cycle collection and must break ownership cycles manually.
  • Is the object a service or other long lived object?  Long lived objects should break ownership cycles manually.  Adding cycle collection may fix a shutdown leak, but it just replaces that with a leak that lasts until shutdown, which is just as bad but doesn’t show up in our tools.
  • Does the object hold strong references to other things that are cycle collected?  If so, and the object does not have a well-defined lifetime (e.g. it can be accessed from JavaScript), it must be cycle collected.
  • Does the object have strong references only to other things that are not cycle collected (e.g. interfaces from XPCOM, Necko, etc)?  If so, it probably does not need to be cycle collected.
  • Can the object be accessed from JavaScript?  Then it probably needs to be cycle collected.

The last two are kind of vague on purpose.  Determining exactly when a class needs to participate in cycle collection is a bit tricky and involves some engineering judgement.  If you’re not sure, ask your reviewer or relevant peers/module owners.

How?

C++ objects participate in cycle collection by:

  1. Modifying their reference counting to use the cycle collector.
  2. Implementing a “cycle collection participant”, a set of functions that tell the cycle collector how to inspect the object.
  3. Modifying their QueryInterface implementation to return the participant when asked.

Like many things in Gecko, this involves lots of macros.

The reference counting is modified by replacing existing macros:

  • NS_DECL_ISUPPORTS becomes NS_DECL_CYCLE_COLLECTING_ISUPPORTS.
  • NS_IMPL_ADDREF becomes NS_IMPL_CYCLE_COLLECTING_ADDREF.
  • NS_IMPL_RELEASE becomes NS_IMPL_CYCLE_COLLECTING_RELEASE.

The cycle collection participant is a helper class that provides up to three functions:

  • A ‘Trace’ function is provided by participants that represent objects that use direct JavaScript object references.  It reports those JavaScript references to the cycle collector.
  • A ‘Traverse’ function is provided by all participants.  It reports strong C++ references to the cycle collector.
  • An ‘Unlink’ function is provided by (virtually) all participants.  It clears out both JavaScript and C++ references, breaking the cycle.

The cycle collection participant is implemented by placing one of the following macros in the header:

  • NS_DECL_CYCLE_COLLECTION_CLASS is the normal choice.  It is used for classes that only have C++ references to report.  This participant has Traverse and Unlink functions.
  • NS_DECL_CYCLE_COLLECTION_CLASS_AMBIGUOUS is a version of the previous macro for classes that multiply inherit from nsISupports.
  • NS_DECL_CYCLE_COLLECTION_SCRIPT_HOLDER_CLASS is used for classes that have JS references or a mix of JS and C++ references to report.  This participant has Trace, Traverse, and Unlink methods.
  • NS_DECL_CYCLE_COLLECTION_SCRIPT_HOLDER_CLASS_AMBIGUOUS is the ambiguous version of the previous macro.

And by doing one of the following in the cpp file:

  • For very simple classes that don’t have JS references and only have nsCOMPtrs, you can use the NS_IMPL_CYCLE_COLLECTION_N macros, where N is the number of nsCOMPtrs the class has.
  • For classes that almost meet the above requirements, but inherit from nsWrapperCache, you can use the NS_IMPL_CYCLE_COLLECTION_WRAPPERCACHE_N macros, where N is the number of nsCOMPtrs the class has.
  • Otherwise, use the NS_IMPL_CYCLE_COLLECTION_CLASS macro, along with the NS_IMPL_CYCLE_COLLECTION_[TRAVERSE|UNLINK|TRACE]_* macros, to construct the Traverse, Unlink, and Trace (if appropriate) methods.
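For a concrete example of the common simple case, a class with two nsCOMPtr members might be wired up roughly like this.  This is a from-memory sketch rather than copy-paste material; the exact macro set varies by class, and the fragment only compiles inside the Gecko tree:

```cpp
// In the header:
class MyClass : public nsISupports
{
public:
  NS_DECL_CYCLE_COLLECTING_ISUPPORTS
  NS_DECL_CYCLE_COLLECTION_CLASS(MyClass)

private:
  ~MyClass();
  nsCOMPtr<nsISupports> mFoo;
  nsCOMPtr<nsISupports> mBar;
};

// In the cpp file:
NS_IMPL_CYCLE_COLLECTION_2(MyClass, mFoo, mBar)  // Traverse/Unlink for 2 members
NS_IMPL_CYCLE_COLLECTING_ADDREF(MyClass)
NS_IMPL_CYCLE_COLLECTING_RELEASE(MyClass)
NS_INTERFACE_MAP_BEGIN_CYCLE_COLLECTION(MyClass)
  NS_INTERFACE_MAP_ENTRY(nsISupports)
NS_INTERFACE_MAP_END
```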

April 26, 2012
Fixing the Memory Leak

The MemShrink effort that has been underway at Mozilla for the last several months has substantially decreased the memory usage of Firefox for most users.  There are still some remaining issues that lead to pathological memory use.  One of those issues is leaky addons, which Nick has identified as the single most important MemShrink issue.

In Firefox, the JavaScript heap is split into compartments.  Firefox’s UI code, which is written in JS, lives in the privileged “chrome” compartment.  Addon code also usually lives in the chrome compartment.  Websites live in different, unprivileged compartments.  Exactly how compartments are allocated to websites is beyond the scope of this article, but at the time of writing there is roughly one compartment per domain.  Code running in the chrome compartment can hold references to objects in the content compartments (much like how a page can hold references to objects in an iframe).

For an example of how this might look in practice, let’s imagine we have Firefox open to three tabs: GMail, Twitter, and Facebook, and we have some sort of social media addon installed.  Our compartments might look something like this:

Where the blue lines are the references the Firefox UI is holding and the red lines are the references the addon is holding.

The problems start to arise if these references aren’t cleaned up properly when the tab is navigated or closed.  If the Facebook tab is closed, but not all of those references are cleaned up, some or all of the memory the Facebook tab was using is not released.  The result is popularly known as a zombie compartment, and is a big source of leaks in Firefox.

Chrome (privileged UI or other JS) code that leaks is particularly problematic because the leak usually persists for the lifetime of the browser.  When chrome code leaks, say, facebook.com, it leads to dozens of megabytes of memory being lost.  It turns out that writing chrome code that doesn’t leak can actually be quite difficult.  Even the Firefox front end code, which is worked on by a number of full time engineers and has extensive code review, has a number of leaks.  We can find and fix those, but addons are a much harder problem, and we can’t expect addon authors to be as diligent as we try to be in finding and fixing leaks.  The only defense we have had is the AMO review team and our list of best practices.

That changed last night when I landed Bug 695480.  Firefox now attempts to clean up after leaky chrome code.  My approach takes advantage of the fact that chrome code lives in a separate compartment from web page code.  This means that every reference from chrome code to content code goes through a cross-compartment wrapper, which we maintain in a list.  When the page is navigated, or a tab is closed, we reach into the chrome compartment and grab this list.  We go through this list and “cut” all of the wrappers that point to objects in the page we’re getting rid of.  The garbage collector can then reclaim the memory used by the page that is now gone.

The result looks something like:

Code that accidentally (or intentionally!) holds references to objects in pages that are gone will no longer leak.  If the code tries to touch the object after the wrapper has been “cut”, it will get an exception.  This may break certain code patterns.  A few examples:

  • Creating a DOM node from a content document and storing it in a global variable for indefinite use.  Once the page you created the node from is closed your node will vanish.  Here’s an example of code in Firefox that used to do that.
  • Creating a closure over DOM objects can break if those objects can go away before the closure is invoked.  Here’s some code in Firefox that did that.  In one test in our test suite, the tab closed itself before the timeout ran, resulting in an exception being thrown.

Addon authors probably don’t need to bother changing anything unless they see breakage.  Breakage should be pretty rare, and the huge upside of avoided leaks will be worth it.  It’s a little early to be sure what effects this will have, but the number of leaks we see on our test suite dropped by 80%.  I expect that this change will also fix a majority of the addon leaks we see, without any effort on the part of the addon authors.

February 23, 2012
Address Space Layout Randomization now mandatory for binary components

This evening I landed Bug 728429 on mozilla-central.  Firefox will now refuse to load XPCOM component DLLs that do not implement ASLR.  ASLR is an important defense-in-depth mechanism that makes it more difficult to successfully exploit a security vulnerability.  Firefox has used ASLR on its core components for some time now, but many extensions that ship with binary components do not.

ASLR is on by default on modern versions of Visual Studio, so extension authors will only need to ensure that they haven’t flipped the switch to turn it off.  MSDN documentation on ASLR options is available here.  Further reading about the benefits of ASLR is available here.

If no unexpected problems arise, this change will ship in Firefox 13.

December 19, 2011
Pushing Compilers to the Limit (and Beyond)

At the end of the first week of December, Firefox exceeded the memory limits of the Microsoft linker we use to produce our highly optimized Windows builds.  After the problem was identified we took some emergency steps to ensure that people could continue to land changes to parts of Firefox not affected by this issue by disabling some new and experimental features.  Once that was complete we were able to make some other changes that reduced the memory used by the linker back below the limits.  We were then able to undo those emergency steps and turn those features back on.

This will have no lasting impact on what is or is not shipped in Firefox 11.  The issues described here only affected Firefox developers, and have nothing to do with the memory usage or other performance characteristics of the Firefox binaries shipped to users.

Technical Details

Recently we began seeing sporadic compilation failures in our optimized builds on Windows.  After some debugging we determined that the problem was that the linker was running out of virtual address space.  In essence, the linker couldn’t fit everything it needed into memory and crashed.

The build configuration that was failing is not our normal build configuration.  It uses Profile Guided Optimization, fancy words meaning that it runs some benchmarks that we give it and then uses that information to determine what to optimize for speed and which optimizations to use.  It also uses Link-Time Code Generation, which means that instead of the traditional compilation model, where the compiler generates code and the linker glues it all together, the linker does all of the code generation.  These two optimization techniques are quite powerful (they generally win 10-20% on various benchmarks that we have) but they require loading source code and profiling data for most of Firefox into RAM at the same time.

Once we identified the problem we took emergency steps by disabling SPDY support and the Graphite font subsystem, both new features that had been landed recently and were turned off by default (in other words, users had to use an about:config preference to turn them on).  This allowed us to reopen the tree for checkins that did not touch code that ends up in xul.dll (this allowed work to proceed on the Firefox UI, the Javascript engine, and a few other things).

We then disabled Skia (which is being used as an experimental <canvas> backend) and separated video codecs and parts of WebGL support into a separate shared library.  This work decreased the linker’s memory usage enough to resume normal development and turn SPDY back on.  The medium term solution is to start doing our 32 bit builds on 64 bit operating systems so that the linker can use 4 GB of memory instead of 3 GB of memory, and to separate pieces of code that aren’t on the critical startup path into other shared libraries.

Frequently Asked Questions:

  • Why don’t you just get machines with more RAM? - The problem is not that the linker was running out of physical memory, but that it was running out of virtual memory.  A 32 bit program can only address 2^32 bytes (4GB) of memory, regardless of how much memory is in the machine.  Additionally, on 32 bit Windows, the last 1 GB is reserved for the kernel, so a program is really limited to 3 GB of memory.
  • Ok, so why don’t you just use a 64 bit linker? - Unfortunately there is no 64->32 bit cross compiler provided with the Microsoft toolchain so you can’t generate binaries that run on 32 bit systems with a 64 bit compiler.
  • Sure you can, just use -MACHINE:X86 on the linker! - You can have the 64 bit linker link 32 bit binaries, but this is incompatible with Link-Time Code Generation.
  • Is Firefox bloated? - Firefox’s size and linker memory usage compares favorably with other browsers. These problems are not a reflection on which browsers are or are not bloated, but rather on how resource intensive it is to do whole program optimization across a large C++ codebase.

September 29, 2011
Using XHR.onload/etc in addons

I just landed https://bugzilla.mozilla.org/show_bug.cgi?id=687332 on mozilla-central which makes some changes to how .onfoo event listeners are handled on some DOM objects (including XHR).  These changes mean it is no longer possible to use .onfoo event listeners from JS scopes where the global object is not a Window, or from C++.  The correct way to listen for events from these scopes is to use .addEventListener.

This will likely affect a number of addons (particularly for XHR).  Addons that use XHR in XPCOM components should check to see if they are affected.  We may consider implementing some sort of a compatibility hack for XHR if that number is large.

August 10, 2011
xpidlc is dead. Long live pyxpidl.

Today I landed Bug 458936 which moves from using xpidlc to generate XPCOM typelibs to new Python code.  With that, and other work by people including Ted Mielczarek, Mike Hommey, and Benjamin Smedberg, Firefox is now built without ever invoking the binary xpidl.

The remaining pieces of work here are:

  • Migrate comm-central to the new Python tools (interfaces in comm-central are still compiled with xpidlc)
  • Package the Python xpidl into the Gecko SDK.
  • Stop building the binary xpidl entirely and remove it from the tree.
  • Remove our build time dependencies on libIDL, etc.

July 2, 2011
Mork is finally gone

It’s not even going to be worthy of a footnote compared to the other awesome things making Firefox 7, but I landed Bug 578268 this morning, which removes the last vestiges of Mork (specifically morkreader) from Firefox.
