May 9, 2014
DOM Object Reflection: How does it work?

I started writing a bug comment and it turned out to be generally useful, so I turned it into this blog post.

Let’s start by defining some vocabulary:

DOM object - any object (not just nodes!) exposed to JS code running in a web page.  This includes things that are actually part of the Document Object Model, such as the document and nodes, as well as many other things such as XHR, IndexedDB, and the CSSOM.  When I use this term I mean all of the pieces required to make it work (the C++ implementation, the JS wrapper, etc.).

wrapper - the JS representation of a DOM object

native object - the underlying C++ implementation of a DOM object

wrap - the process of taking a native object and retrieving or creating a wrapper for it to give to JS

IDL property - a property on a wrapper that is “built-in”. e.g. ‘nodeType’ on nodes, ‘responseXML’ on XHR, etc.  These properties are automatically defined on a wrapper by the browser.

expando property - a property on a wrapper that is not part of the set of “built-in” properties that are automatically reflected.  e.g. if I say “document.khueyIsAwesome = true” ‘khueyIsAwesome’ is now an expando property on ‘document’. (sadly khueyIsAwesome is not built into web browsers yet)

I’m going to ignore JS-implemented DOM objects here, but they work in much the same way: with an underlying C++ object that is automatically generated by the WebIDL code generator.

A DOM object consists of one or two pieces: the native object and potentially a wrapper that reflects it into JS.  Not all DOM objects have a wrapper.  Wrappers are created lazily in Gecko, so if a DOM object has not been accessed from JS it may not have a wrapper.  But the native object is always present: it is impossible to have a wrapper without a native object.

If the native object has a wrapper, the wrapper has a “strong” reference to the native.  That means that the wrapper exerts ownership over the native somehow.  If the native is reference counted then the wrapper holds a reference to it.  If the native is newed and deleted then the wrapper is responsible for deleting it.  This latter case corresponds to “nativeOwnership=’owned’” in Bindings.conf.  In both cases this means that as long as the wrapper is alive, the native will remain alive too.
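To make the two ownership cases concrete, here is a minimal sketch in standard C++.  The type names (Native, RefCountedWrapper, OwningWrapper) are made up for illustration and are not Gecko classes; std::shared_ptr stands in for reference counting and std::unique_ptr stands in for the “owned” case.

```cpp
#include <cassert>
#include <memory>

// Toy stand-in for a native object; NOT a Gecko type.
struct Native {
  explicit Native(bool* destroyedFlag) : destroyed(destroyedFlag) {}
  ~Native() { *destroyed = true; }
  bool* destroyed;  // set to true when the native is destroyed
};

// Case 1: reference-counted native. The wrapper holds one reference, so the
// native lives as long as the wrapper or any other owner does.
struct RefCountedWrapper {
  std::shared_ptr<Native> native;
};

// Case 2: "owned" native (new/delete). The wrapper is the sole owner, so
// destroying the wrapper deletes the native.
struct OwningWrapper {
  std::unique_ptr<Native> native;
};

// Returns true if the native is still alive after the wrapper goes away
// while another C++ reference remains.
bool refCountedNativeOutlivesWrapper() {
  bool destroyed = false;
  auto native = std::make_shared<Native>(&destroyed);
  {
    RefCountedWrapper wrapper{native};
    assert(!destroyed);  // alive while the wrapper exists
  }  // wrapper goes away here
  return !destroyed;  // `native` still holds a reference
}

// Returns true if destroying the wrapper deleted the owned native.
bool ownedNativeDiesWithWrapper() {
  bool destroyed = false;
  {
    OwningWrapper wrapper{std::make_unique<Native>(&destroyed)};
    assert(!destroyed);  // alive while the wrapper exists
  }  // wrapper goes away here, taking the native with it
  return destroyed;
}
```

In both cases the native cannot die before the wrapper does; the only difference is whether something else can keep the native alive afterwards.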

For some DOM objects, the lifetimes of the wrapper and of the native are inextricably linked.  This is certainly true for all “nativeOwnership=’owned’” objects, where the destruction of the wrapper causes the deletion of the native.  It is also true for certain reference counted objects such as NodeIterator.  What these objects have in common is that they have to be created by JS (as opposed to, say, the HTML parser) and that there is no way to “get” an existing instance of the object from JS.  Things such as NodeIterator and TextDecoder fall into this category.

But many objects do not.  An HTMLImageElement can be created from JS, but can also be created by the HTML parser, and it can be retrieved at some point later via getElementById.  XMLHttpRequest is only created from JS, but you can get an existing XHR via event.target of events fired on it.

In these cases Gecko needs a way to create a wrapper for a native object.  We can’t even rely on knowing the concrete type.  Unlike constructors, where the concrete type is obviously known, we can’t require functions like getElementById or getters like event.target to know the concrete type of the thing they return.

Gecko also needs to hand back the same wrapper every time JS asks for the same underlying native object, so that the results are indistinguishable from JS.  Calling getElementById twice with the same id should return two things that === each other.

We solve these problems with nsWrapperCache.  This is an interface that we can get to via QueryInterface that exposes the ability to create and retrieve wrappers even if the caller doesn’t know the concrete type of the DOM object.  Overriding the WrapObject function allows the derived class to create wrappers of the correct type.  Most implementations of WrapObject just call into a generated binding function that does all the real work.  The bindings layer calls WrapObject and/or GetWrapper when it receives a native object and needs to hand a wrapper back to a JS caller.
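The shape of that arrangement can be sketched roughly as follows.  This is a toy model, not the real nsWrapperCache: WrapperCacheSketch, ImageElementSketch, Wrapper, and WrapForJS are hypothetical names, there is no QueryInterface here, and a std::weak_ptr stands in for a cache slot that does not keep the wrapper alive.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy "JS wrapper": records which binding created it and keeps the native
// alive through a strong reference.
struct Wrapper {
  std::string bindingName;
  std::shared_ptr<void> native;
};

// Hypothetical stand-in for the role nsWrapperCache plays.
class WrapperCacheSketch
    : public std::enable_shared_from_this<WrapperCacheSketch> {
 public:
  virtual ~WrapperCacheSketch() = default;

  // Returns the cached wrapper, or nullptr if there is none.
  std::shared_ptr<Wrapper> GetWrapper() const { return mWrapper.lock(); }

  // Reuse the cached wrapper if there is one, otherwise create and cache it.
  std::shared_ptr<Wrapper> GetOrCreateWrapper() {
    if (auto existing = GetWrapper()) return existing;
    auto created = WrapObject();
    mWrapper = created;
    return created;
  }

 protected:
  // Each concrete class creates a wrapper of the right type.
  virtual std::shared_ptr<Wrapper> WrapObject() = 0;

 private:
  std::weak_ptr<Wrapper> mWrapper;  // cache only; does not own the wrapper
};

class ImageElementSketch : public WrapperCacheSketch {
 protected:
  std::shared_ptr<Wrapper> WrapObject() override {
    return std::make_shared<Wrapper>(
        Wrapper{"HTMLImageElement", shared_from_this()});
  }
};

// What a bindings layer might do: it only sees the base class, never the
// concrete type.
std::shared_ptr<Wrapper> WrapForJS(WrapperCacheSketch& native) {
  return native.GetOrCreateWrapper();
}

// Returns true if wrapping the same native twice yields the same wrapper,
// created by the correct concrete class.
bool sameNativeYieldsSameWrapper() {
  std::shared_ptr<WrapperCacheSketch> native =
      std::make_shared<ImageElementSketch>();
  auto first = WrapForJS(*native);
  auto second = WrapForJS(*native);
  return first == second && first->bindingName == "HTMLImageElement";
}
```

The virtual call is what lets something like getElementById return the right kind of wrapper without knowing what it is returning, and the cache slot is what makes the second lookup return the same object as the first.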

This solves the two problems mentioned above: the need to create wrappers for objects that we don’t know the concrete type of and the need to make object identity work for DOM objects.  Gecko actually takes the latter a step further though.  By default, nsWrapperCache merely caches the wrapper stored in it.  It still allows that wrapper to be GCd.  GCing wrappers can save large amounts of memory, so we want to do it when we can avoid breaking object identity.  If JS does not have a reference to the wrapper then recreating it later after a GC does not break a === comparison because there is nothing to compare it to.  The internal state of the object all lives in the C++ implementation, not in JS, so we don’t need to worry about the values of any IDL properties changing.

But we do need to be concerned about expando properties.  If a web page adds properties to a wrapper and we later GC that wrapper, we won’t be able to recreate it exactly as it was before, and the difference will be visible to that page.  For that reason, setting expando properties on a wrapper triggers “wrapper preservation”.  This establishes a strong edge from the native object to the wrapper, ensuring that the wrapper cannot be GCd until the native object is garbage.  Because there is always an edge from the wrapper to the native object the two now participate in a cycle that will ultimately be broken by the cycle collector.  Wrapper preservation is also handled in nsWrapperCache.
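Here is a rough sketch of the difference between a merely cached wrapper and a preserved one.  Everything in it is a toy assumption: NativeSketch, WrapperSketch, SetExpando, and Unlink are invented names, resetting a std::shared_ptr plays the part of the GC, and the explicit Unlink call plays the part of the cycle collector.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct NativeSketch;

// Toy wrapper: holds the native strongly and stores expando properties.
struct WrapperSketch {
  std::shared_ptr<NativeSketch> native;
  std::map<std::string, bool> expandos;
};

// Toy native with a cache slot and a preservation slot.
struct NativeSketch : std::enable_shared_from_this<NativeSketch> {
  std::weak_ptr<WrapperSketch> cached;       // default: cache only
  std::shared_ptr<WrapperSketch> preserved;  // strong edge once preserved

  std::shared_ptr<WrapperSketch> GetOrCreateWrapper() {
    if (auto existing = cached.lock()) return existing;
    auto created = std::make_shared<WrapperSketch>();
    created->native = shared_from_this();
    cached = created;
    return created;
  }

  // Stand-in for the cycle collector breaking the native <-> wrapper cycle.
  void Unlink() { preserved.reset(); }
};

// Setting an expando triggers "wrapper preservation": the native now holds
// the wrapper strongly.
void SetExpando(const std::shared_ptr<WrapperSketch>& wrapper,
                const std::string& name, bool value) {
  wrapper->expandos[name] = value;
  wrapper->native->preserved = wrapper;
}

// Without expandos, dropping the last JS reference frees the wrapper.
bool unpreservedWrapperIsCollected() {
  auto native = std::make_shared<NativeSketch>();
  auto wrapper = native->GetOrCreateWrapper();
  wrapper.reset();  // JS drops its reference; the "GC" frees the wrapper
  return native->cached.expired();
}

// With an expando, the wrapper (and the expando) survive until the native
// itself is unlinked.
bool expandoSurvivesAfterJSDropsWrapper() {
  auto native = std::make_shared<NativeSketch>();
  auto wrapper = native->GetOrCreateWrapper();
  SetExpando(wrapper, "khueyIsAwesome", true);
  wrapper.reset();  // JS drops its reference, but the native preserves it
  auto again = native->GetOrCreateWrapper();
  bool kept = again->expandos.count("khueyIsAwesome") == 1;
  again.reset();
  native->Unlink();  // break the cycle so the toy model does not leak
  return kept;
}
```

With plain reference counting the preserved pair would simply leak, which is why the real system needs the cycle collector to notice when the whole cycle has become garbage.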

tl;dr

DOM objects consist of two pieces, native objects and JS wrappers.  JS wrappers are lazily created and potentially garbage collected in certain situations.  nsWrapperCache provides an interface to handle the three aspects of working with wrappers:

  1. Creating (or recreating) wrappers for native objects whose concrete type may not be known.
  2. Retrieving an existing wrapper to preserve object identity.
  3. Preserving a wrapper to prevent it from being GCd when that would be observable by the web page.

Certain types of DOM objects do not need to be wrapper cached, because the wrapper cannot outlive the native object.  These are the objects whose native objects are not reference counted, and the objects that can only be constructed and are never accessed through a getter or a function’s return value.

  1. khuey posted this