The Limpid Container Classes

Introduction

The tree structure of the W3C DOM is supported by the NodeList class and the attributes of Elements by NamedNodeMap. They differ in two important respects:

A NodeList (see below) is the mechanism used by a parent node to hold information about its child nodes. Internal DOM mechanisms prevent it from holding more than one copy of a node. An NodeList, on the other hand, can hold multiple copies because it is simply a container for references to nodes that exist elsewhere. A NamedNodeMap can never hold multiple copies of a node; if an attempt is made to add a second copy (with certain qualifications, see below), the attempt either fails or the existing copy is overwritten. NamedNodeMap therefore enforces uniqueness by some criterion.

The data model of XPath requires another container class, NodeSet, which introduces a combination of both order and uniqueness. Specifically, it requires that, when is used to access nodes in a loop, it retrieves the nodes in document order, regardless of the order in which they were added to the NodeSet.

Implementation

Despite its name, NodeList is implemented as a wrapper for the standard library container class std::vector<Node*>, in order to support indexed access. Perhaps surprisingly, NamedNodeMap also uses a vector. This decision was reached slowly, after many prototypes using the more intuitive std::map<DOMString, Node*>. NodeSet, on the other hand, uses std::map<size_t, Node*>.

NodeList

There are two types of NodeList, depending on the constructor used. The type used automatically when you append a child node to a parent (Element, Attribute, Document or Entity) is and uses the NodeList(Node& parent) constructor. You will not be aware of this, because it is all automatic and under the wraps.

The visible constuctor for NodeList has no parameter, and is typically used to collect nodes that match some sort of criterion (node type, node name or somesuch). The difference is that no parameter means no owner node, otherwise the parameter (parent) node owns the list.

Now this is the basis of the automatic Limpid mechanism for adding pointers to copies of nodes, rather than to the nodes themselves, during the build of a document or subtree. Because the NodeList is owned, a copy is made.

If the NodeList is not owned, a copy is not made. This is the correct sort of behaviour when a NodeList is simply being used for a temporary container for selected nodes.

NamedNodeMap

The concept of ownership also applies to NamedNodeMap, in the same way as to NodeList. The main use for NamedNodeMap is as a container for attributes of an element, but it is also more generally as a container for entities and notations in DocumentType.

The simple concept of NamedNodeMap as a map keyed by a node name became complicated when support for namespaces was added in DOM Level 2. For example, it is now possible to set an attribute in two ways:

element.setAttribute("attName", "attrValue");
element.setAttributeNS("myURI", "attrName","attrValue");

Setting an attribute is perhaps more complicated than it appears. Before an attribute is set, a search is carried out to determine whether there is already one in the container that has either the same qualified name or a combination or the same qualified name and the same namespaceURI. If both setAttribute(...) and setAttributeNS(...) are used on the same NamedNodeMap, it is possible that there could be two different Attributes with the same qualified name and different namespaceURI. (This seems to me to be messy, and probably because the Level 2 DOM has simply bolted namespaces onto Level 1.)

Because these complications have made me feel rather insecure, I have not used an std::map as inner container, but an std::vector instead. This requires, then, a loop to do the comparisons. And this means getting a match in linear time, rather than logarithmic time with a map. But, with sensible numbers of attributes (less than about 10), it would be difficult to detect a performance difference. In any case, too, using a map with a DOMString key would bulk up the size of a NamedNodeMap instance.

DocumentOrder and NodeSet

In many cases, nodes are implicity added to a NodeList in document order. An example is the use of a Collector to return a NodeList of nodes that satisfy some test. A collector uses either a NodeStream or Traverser to move through a document or subtree, and it does this by a depth-first search. And depth-first order is document order, which means that the matching nodes are added to the NodeList in document order.

On the other hand, nodes may be assembled by repeated, or parallel, searches.

Context context;
  context.setNode(root);
  // here root is a reference to the root node from which the collection starts
  
  Expr expr = XPathFactory::getExpr("ACT | PROLOGUE | EPILOGUE");
  NodeSet resultSet = xpath.getNodeSet(context);

  // or more concisely:
  NodeSet resultSet = XPathFactory::getExpr("ACT | PROLOGUE | EPILOGUE").getNodeSet(context);

Here, we need control over the order of the collected nodes, otherwise we will collect all the ACT instances before the PROLOGUE instances before the EPILOGUE. NodeSet is able to do this because it knows about node position. Node position is a device used in Limpid to establish the relative positions in the document. It does this by setting an unsigned integer on each node.

If a node is created by parsing an XML document, this position is the line number on which the node is defined. This is useful in validation, as it allows a ValidationProcessor to report where an error is encountered. When a node is created by hand code, however, this option is not available and the node position is set to 0.

A common situation is one where a document is loaded from XML text, altered by hand code and then saved to a file. If the document is to be processed without saving and reloading, it is useful to set the positions of the new nodes in the context of the total document. This is done very easily:

document.reorder();

When a node is added to a NodeSet instance, it is added to an internal map that uses as a key the position of the node. Therefore, it is necessary to use reorder() if there have been any structural alterations of the document because hand-coded nodes have a position of 0. Reordering is extremely fast: reordering of the whole of "Hamlet", with over 11500 nodes, could not be timed accurately, as it was under 10 msec on my computer.

There is a slight deficiency in this arrangement. If a NodeSet has been used to collect nodes, some of which are attributes, the results may be puzzling. This is because, when an XMLParser constructs an element with attributes, it assigns the same position to the element and the attributes if they happen to be on the same line of the source file. Therefore, if more than one of them are to be added to a NodeSet, only the first will be successful. The solution to this is to reorder the document before collecting the NodeSet. Reordering assigns unique positions to all nodes, including attributes, and, as indicated, it has very little performance penalty.