Overview of Limpid Implementation

The Nodes

All Nodes of Limpid are based on the same pattern of a node body holding a single pointer to an instance of NodeContent, or one of its subclasses (DocumentContent, ElementContent, etc). Whatever the type of content, it is instantiated on the heap. All node bodies therefore have the same general structure, but that they provide the behavioural, or interface, polymorphism of the node. The NodeContent provides the structural polymorphism, together with an intrusive reference count. When a node is copied, the body is copied and and the reference count in the content is incremented. When the node is deleted, the reference count is decremented and, if it is zero, the content is deleted. This approach has much in common with automatic and shared pointer objects and removes from the programmer the responsibility of managing pointers.

Moreover, the process of copying or assigning nodes is not much more onerous than working with pointers. To support polymorphism, each node body holds a vptr as well as the pointer to the content. Two pointers instead of one; not too bad.

The Node Content

As a reaction to the profligate use of memory by Java, I decided that all NodeContent footprints should be kept to a reasonable minimum. This, in turn, meant that the inclusion of non-built-in classes, or pointers to them, in node content should be avoided as far as possible. Now, the handling of strings is very important to the DOM and XML more generally, so a string class would have to be involved in determining such node characteristics as NodeName and NamespaceURI. Despite this, I decided to include all string data in node content as null-terminated character arrays (although the user never sees this).

As a result of this design decision, as little as 20-25% of the size of a typical DOM corresponds to character data. The rest is taken up by pointers to parent, document, siblings, etc. The precise figure depends on the document being modelled by the DOM, whether it has a large population of attributes and (very importantly) on the character representation chosen by the user.

Class Comparisons

A very important consequence of the relationship between node body and node content is the way in which nodes are compared. Java coders will recall the old beginner's trap of comparing strings by:

String string1 = "value";
String string2 = "value";

if (string1 == string2) ... more code

Now, the result of the test is , perhaps contrary to expectations. It is false because it is a comparison of references (pointers) and the pointers for two different strings are different. The correct code is:

String string1 = "value";
String string2 = "value";

if (string1.equals(string2)) ... more code

Limpid avoids this arcane trap: for DOMStrings, the DOMStrings are compared on a character-by-character basis and the code is simply:

DOMString string1 = "value";
DOMString string2 = "value";

if (string1 == string2) ... more code

In Java, Nodes are compared by reference, as with Strings. In this case it is correct. In Limpid, however, we are working with values (copies) of Nodes, and not with pointers. But we get the same outcome because of the way node content is shared:

Element originalElement("name");
// make a copy using the copy constructor
Element copyElement = originalElement;
if (copyElement == originalElement) ... more code

Now, Nodes are compared by comparing their internal pointers to node content. Given that copies of the same node have the same node content, they return true when compared, which is the expected result.

The NodeList and NamedNodeMap

The user of Limpid should never be concerned with using pointers or, in particular, the new keyword. Instead, only local instances should be used. When a local instance of (say) an Element is appended to (say) a second local Element, and this second element is returned as a , where is the first element that was its child?

When a node is appended to a parent, or an attribute set on an element, a copy of it is made internally and a pointer to the copy stored in the node list or map. For this to work, a copy() method is defined for each Node class. (This is essential for polymorphic copying.) Now copy() uses (wait for it!) the new keyword. Because new is used, a new node body is created on the heap, and this contains a pointer to the same code content as the original node being appended.

The class that invoked copy() (the node list or map) owns the pointer and is responsible for its management (the pointer is deleted by the destructor of the list or map).

To make things more complicated, the NodeList and NameNodeMap classes are reference-counted to maintain reference semantics, so the pointers to nodes are not deleted until the reference count of the owner list or map goes to zero.

Finally, after copy() has been invoked on the node being appended, there are at least two copies of the node body (one of which is on the heap). After the current function is exited, and called on the original (and local) body, there is only a single body (on the heap), owned by the node list and pointing to node content with a reference count of 1 (which is also on the heap). The upshot of all this is that we have not appended a node to the parent: we have appended a copy of the node. But, because of reference semantics, thecopy behaves just like the original node.

Now, to address the question of what happens to a child node when its parent is returned from a function: a copy of the child is returned with the parent, and that is that!

This description has been rather lengthy, but the process of copying, deleting and adjustment to the reference count in the node content is very simple, fast and economical of memory. But, most importantly, it is automatic and frees the coder from the burden of pointer management.

But Wait a Minute 1

Let's complicate the scenario with some code:

Element returnElement() {
  Element topElement("top");
  Element middleElement("newName");
  Element bottomElement("bottom");
  topElement.appendChild(middleElement);
  middleElement.appendChild(bottomElement);
  
  middleElement.setAttribute("attrName", "attrValue");
  
  return topElement  ;
}

void getElement() {
  Element myElement = returnElement();
}

Given that we add copies, rather than the original nodes, to parents, we could ask, for instance, whether middleElement is the parent of bottomElement. The answer is , but it does not matter because of reference semantics. A of middleElement is the parent of bottomElement but anything we do to middleElement is applied to the copy.

Now, this matter of parenthood becomes important after returnElement() exits. From that point, getElement() has a copy of topElement, with a child and grandchild, each of which is a copy. Now, before we leave returnElement(), the parent of bottomElement appears to be middleElement, but it is really a copy of middleElement. This gets rather confusing to talk about, and it is very reassuring that the user really should never worry about it. In fact, you would never know about the sleight of hand unless you go to the trouble of getting pointers to the various nodes. And that is what Limpid encourages you not to do.

But Wait a Minute 2

Let's complicate the scenario with some more code:

Element returnElement() {
  Element topElement("top");
  Element middleElement("newName");
  Element bottomElement("bottom");
  topElement.appendChild(middleElement);
  middleElement.appendChild(bottomElement);
  
  middleElement.setAttribute("attrName", "attrValue");
  
  return topElement  ;
}

void changeElement(Element& rootElement) {
  Element newElement("replacement");
  myElement.replaceNode(rootElement.getFirstChild(), newElement);
}

Element getElement() {
  Element myElement = returnElement();
  Element keptElement = myElement.getFirstChild();
  
  changeElement(myElement);
  return keptElement;
}

In this example, we use returnElement() to construct a small element tree. We then make a copy of middleElement, using the copy constructor, and then use changeElement() to replace middleElement. What have we achieved?

In getElement() we have a local copy of middleElement. (A copy of this is returned when getElement() exits.)

In changeElement() we create newElement and use this to replace another copy of middleElement. A copy of newElement replaces the copy of middleElement in the tree. We started with a tree of 3 elements, but how many elements are in the tree returned by getElement. The answer is only 2.

When we replaced the middleElement copy in changeElement, we replaced the copy and all of its children and attributes. The copy of middleElement (keptElement) made in getElement() has the attributes and child of middleElement, and all of these are returned when getElement() exits. The final question is about the copy of bottomElement before and after getElement() exits. Now keptElement is constructed before changeElement() is invoked. When the copy of middleElement is replaced in changeElement() (or, equivalently, if it had been removed) a pointer (ownerNode) in its node content is cleared. Now, this pointer is used by any children (in this case, by bottomNode) to determine their parent (if any). So, immediately after replaceChild(...), the ownerNode pointer is NULL and the bottomElement copy is unaware of its parent.

This silly situation is not lethal, but is one that the user must be aware of. The solution is simply to assign or copy immediately after removal or replacement. The important details are:

So, in this particular case, the fact that we returned a of middleElement from getElement(), which means that we automatically invoked a copy constructor on it, means that the returned copy is automatically made the owner node. All complicated to talk about, but little problem in practice.

The following (but still useless) code illustrates this point, and introduces to Java coders the (intentionally) ugly C++ static cast.

Element returnElement() {
  Element topElement("top");
  Element middleElement("newName");
  Element bottomElement("bottom");
  topElement.appendChild(middleElement);
  middleElement.appendChild(bottomElement);
  
  middleElement.setAttribute("attrName", "attrValue");
  
  return topElement  ;
}

Element getElement() {
  Element myElement = returnElement();
  Node& node =
    myElement.replaceNode(rootElement.getFirstChild(), newElement);
  Element keptElement =
    static_cast<Element &>(node); // it had better be an element!  
  
  return keptElement;
}

But Wait a Minute 3

But we don't have to worry so much about such cases, as Limpid does it for us.

Element returnElement() {
  Element topElement("top");
  Element middleElement("newName");
  Element bottomElement("bottom");
  topElement.appendChild(middleElement);
  middleElement.appendChild(bottomElement);
  
  middleElement.setAttribute("attrName", "attrValue");
  
  return topElement  ;
}

Document getElement() {
  Element myElement = returnElement();
  Node& node =
    myElement.replaceNode(rootElement.getFirstChild(), newElement);
  
  Document doc;
  // this works for all nodes that can
  // be appended to docdoc.appendNode(node);
  return doc  ;
}

Here we take advantage of the fact that, when we append a node to a parent, a is actually added. This means using a copy constructor and this means that the owner node of the child node is updated.

But Wait a Minute 4

That's all very well for the updating of elements, but what about the updating of the ownerNode pointer for documents? This is certainly a special case, but a Document is still a parent node, just like Element, Attribute and Entity. The difficulty is that a document is never appended to another parent node, with automatic updating of the owner node. What happens when we have more than one copy of a document and we destroy the document that is the owner node?

This requires special handling, using a doubly linked list of pointers to the different copies. As necessary, control of the document content is passed on to another copy. The user should never worry about this.