Jade internals

General overview

Program structure

Most of SP and Jade is user-interface independent: it doesn't know whether it's being run from the command line or from a GUI. The code is organized in several layers.

  1. The lowest layer is a general purpose class library (mostly template based), which is independent of SGML/XML.

  2. The next layer is a general concept of an entity manager, which is basically an interface to a set of services that an SGML parser needs: everything that the SGML standard leaves undefined or makes system-dependent. This layer includes the message reporting API (MessageReporter, Message), the catalog API (EntityCatalog), the character set API (CharsetInfo), and the entity manager proper (EntityManager). Template instantiations for this layer are in entmgr_inst.m4.

  3. Dependent on the first two layers is the core SGML parser. This implements only the behaviour defined in the SGML standard. The main public classes are SgmlParser and Event. Template instantiations for this layer are in parser_inst.m4.

  4. The architectural forms engine. It depends on the SGML parser. Main class is ArcEngine. Template instantiations are in arc_inst.m4.

  5. An implementation of the entity manager interface. This doesn't depend on the SGML parser or the architectural forms engine. The main class is ExtendEntityManager. This layer determines, for example, what the syntax of a system identifier is. Template instantiations for this are in xentmgr_inst.m4.

  6. A generic interface to groves; in grove/Node.h. This doesn't depend on any of the previous layers.

  7. An implementation of the grove interface using SP; this is in the spgrove/ directory. Main class is GroveBuilder. This doesn't depend on the implementation of the entity manager interface.

  8. An implementation of the DSSSL style language (the tree construction part, not the formatting part). This is in the style/ directory. There are really two sub parts:

    1. Packaging of the DSSSL stylesheet as an SGML document using architectural forms. The main classes are DssslSpecEventHandler and StyleEngine. This doesn't depend on the implementation of the entity manager and grove interfaces.

    2. Processing of the contents of the elements in the DSSSL stylesheet; this depends only on the entity manager and grove interfaces. The main interface here is FOTBuilder, which is the interface between the tree construction process and the formatter.

  9. Multiple implementations of the FOTBuilder interface (the backends).
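The FOTBuilder boundary is what allows several backends to coexist: the tree construction code calls an abstract interface, and each backend implements it. The following is a minimal hypothetical sketch of that pattern only; `FOTBuilderSketch`, `startParagraph()`, `endParagraph()`, `characters()`, `TextBackend` and `TaggedBackend` are illustrative names, not Jade's actual FOTBuilder API, which has many more flow-object callbacks.

```cpp
#include <cassert>
#include <string>

// Hypothetical, much-reduced stand-in for an FOTBuilder-style interface:
// tree construction calls these hooks; each backend decides what to emit.
class FOTBuilderSketch {
public:
  virtual ~FOTBuilderSketch() = default;
  virtual void startParagraph() = 0;
  virtual void endParagraph() = 0;
  virtual void characters(const std::string &s) = 0;
};

// Hypothetical backend 1: plain text, one line per paragraph.
class TextBackend : public FOTBuilderSketch {
public:
  void startParagraph() override {}
  void endParagraph() override { out_ += '\n'; }
  void characters(const std::string &s) override { out_ += s; }
  const std::string &output() const { return out_; }

private:
  std::string out_;
};

// Hypothetical backend 2: wraps each paragraph in tags.
class TaggedBackend : public FOTBuilderSketch {
public:
  void startParagraph() override { out_ += "<p>"; }
  void endParagraph() override { out_ += "</p>"; }
  void characters(const std::string &s) override { out_ += s; }
  const std::string &output() const { return out_; }

private:
  std::string out_;
};

// The "tree construction" side only ever sees the abstract interface.
void emitParagraph(FOTBuilderSketch &fot, const std::string &text) {
  fot.startParagraph();
  fot.characters(text);
  fot.endParagraph();
}
```

Because `emitParagraph()` depends only on the abstract base class, adding a backend means adding a subclass, with no change on the tree construction side.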

Parallel to the hierarchy of layers is a hierarchy of convenience classes that collect together various pieces in a convenient way for command line apps.

  1. CmdLineApp is the lowest level and depends only on the general purpose class library.

  2. EntityApp additionally depends on the entity manager interface and implementation; it's a convenience class for accessing the functionality of the entity manager with a command line program.

  3. ParserApp additionally depends on the SGML parser; it packages the parser together with the entity manager for use in a command line program.

  4. GroveApp additionally depends on the grove interface and implementation; this packages the functionality of the grove builder in a convenient way for command line apps.

  5. DssslApp additionally depends on the DSSSL style language implementation, tying it to the grove implementation; it packages the functionality of DSSSL tree construction in a way suitable for command line apps.

  6. JadeApp additionally depends on the backends.
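The convenience classes form a single inheritance chain in which each level adds one more layer. The sketch below is hypothetical and only mirrors that dependency order: the `...Sketch` classes, the `run()` template method and the `init()` hook are invented for illustration and are not the real classes' members, which deal with option parsing, message reporting and so on.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Each class inherits from the one below and adds setup for one more
// layer. run() records which layers were initialized, lowest first.
class CmdLineAppSketch {
public:
  virtual ~CmdLineAppSketch() = default;
  std::vector<std::string> run() {
    std::vector<std::string> steps;
    init(steps);
    return steps;
  }

protected:
  // Lowest level: only the general purpose class library.
  virtual void init(std::vector<std::string> &steps) {
    steps.push_back("class library");
  }
};

class EntityAppSketch : public CmdLineAppSketch {
protected:
  void init(std::vector<std::string> &steps) override {
    CmdLineAppSketch::init(steps);
    steps.push_back("entity manager");
  }
};

class ParserAppSketch : public EntityAppSketch {
protected:
  void init(std::vector<std::string> &steps) override {
    EntityAppSketch::init(steps);
    steps.push_back("SGML parser");
  }
};

class GroveAppSketch : public ParserAppSketch {
protected:
  void init(std::vector<std::string> &steps) override {
    ParserAppSketch::init(steps);
    steps.push_back("grove builder");
  }
};

class DssslAppSketch : public GroveAppSketch {
protected:
  void init(std::vector<std::string> &steps) override {
    GroveAppSketch::init(steps);
    steps.push_back("DSSSL style engine");
  }
};

class JadeAppSketch : public DssslAppSketch {
protected:
  void init(std::vector<std::string> &steps) override {
    DssslAppSketch::init(steps);
    steps.push_back("backends");
  }
};
```

A command line program would pick the lowest class that provides what it needs; a parser-only tool stops at the ParserApp level and never pulls in groves or DSSSL.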

Other important classes

Short descriptions of several central classes in the DSSSL style language implementation, some of which have not yet been mentioned.

StyleEngine

main class of the style library. DssslApp uses an instance of this class to process the grove.

Interpreter

contains all the stylesheet-related state: there are no global variables. StyleEngine owns an instance of this class.

SchemeParser

parses a part of a DSSSL spec, creating expression language objects and binding variables using a given Interpreter. StyleEngine uses instances of this class to parse the parts of its spec.

ProcessContext

holds the current state of the processing of a grove.

VM

represents the state of the virtual machine that implements the expression language.

Expression

Expressions are the result of parsing expression language constructs. They are compiled to Insns.

Insn

an instruction for the virtual machine.

When an instruction is executed, it (usually) modifies the state of the virtual machine and then returns the next instruction to be executed. Returning a null Insn terminates execution. Thus the inner loop of the expression evaluator is in a member function of VM and looks like:

  
while (insn)
    insn = insn->execute(*this);
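To make the loop concrete, here is a minimal hypothetical stack machine in the same style: each instruction does its work and returns its successor, so a compiled expression is a chain of Insn objects. `ConstantInsn`, `AddInsn` and the `long` value stack are assumptions for illustration; the real VM works on ELObj pointers and has many more instruction types.

```cpp
#include <cassert>
#include <vector>

struct VM;

// Hypothetical instruction: update the VM, return the next instruction
// (nullptr stops execution).
struct Insn {
  virtual ~Insn() = default;
  virtual const Insn *execute(VM &vm) const = 0;
};

struct VM {
  std::vector<long> stack;
  // The inner loop described above.
  void eval(const Insn *insn) {
    while (insn)
      insn = insn->execute(*this);
  }
};

// Push a constant, then continue with `next`.
struct ConstantInsn : Insn {
  ConstantInsn(long v, const Insn *n) : value(v), next(n) {}
  const Insn *execute(VM &vm) const override {
    vm.stack.push_back(value);
    return next;
  }
  long value;
  const Insn *next;
};

// Pop two values, push their sum, then continue with `next`.
struct AddInsn : Insn {
  explicit AddInsn(const Insn *n) : next(n) {}
  const Insn *execute(VM &vm) const override {
    long b = vm.stack.back();
    vm.stack.pop_back();
    long a = vm.stack.back();
    vm.stack.pop_back();
    vm.stack.push_back(a + b);
    return next;
  }
  const Insn *next;
};
```

For example, `(+ 2 3)` can be built back to front as a constant 2, then a constant 3, then an add whose successor is null; evaluating the first instruction leaves 5 on the stack.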
ELObj

the abstract base class for all expression language types.

Garbage collection

For further information on the garbage collection technique used in Jade, see ftp://ftp.cs.utexas.edu/pub/garbage/bigsurv.ps and ftp://ftp.netcom.com/pub/hb/hbaker/NoMotionGC.html.

Basically it works like a copying collector, but the copying is logical rather than physical. There are two doubly-linked lists, one for each of the two spaces of a copying collector, and every object is on one of these two lists. There's also a bit (the "color") that says which space an object is in. To "copy" an object from one space to the other, it is unlinked from one list, linked into the other, and its color is flipped. A key point is that, unlike normal copying collectors, this collector never changes the address of a GC object.
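The logical copy can be sketched with intrusive circular lists. This is a hypothetical illustration of the technique just described (the names `GcObject`, `Space`, `unlink()` and `logicalCopy()` are invented here), not Jade's Collector code: moving an object between spaces touches only its link pointers and its color bit, so its address never changes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical GC object header: intrusive links plus the one-bit
// "color" recording which logical space the object is in.
struct GcObject {
  GcObject *prev = this;
  GcObject *next = this;
  bool color = false;
  int payload = 0;  // stands in for the object's real contents
};

// One logical space: a circular doubly-linked list with a sentinel head.
struct Space {
  GcObject head;  // sentinel, never a real object
  bool color;     // the color of every object on this list

  explicit Space(bool c) : color(c) {}
  Space(const Space &) = delete;  // the sentinel's address must not change
  Space &operator=(const Space &) = delete;

  // Append obj at the tail and give it this space's color.
  void link(GcObject *obj) {
    obj->prev = head.prev;
    obj->next = &head;
    head.prev->next = obj;
    head.prev = obj;
    obj->color = color;
  }

  std::size_t size() const {
    std::size_t n = 0;
    for (const GcObject *p = head.next; p != &head; p = p->next)
      ++n;
    return n;
  }
};

// Remove obj from whichever list it is on.
void unlink(GcObject *obj) {
  obj->prev->next = obj->next;
  obj->next->prev = obj->prev;
  obj->prev = obj->next = obj;
}

// "Copy" obj into the other space: only pointers and color change.
void logicalCopy(GcObject *obj, Space &to) {
  assert(obj->color != to.color);
  unlink(obj);
  to.link(obj);
}
```

Since no object ever moves in memory, raw C++ pointers to GC objects stay valid across a collection, which a physically copying collector could not offer without updating them.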

This is a simplification: in practice it is optimized so that there is one big circular list of all objects. A pointer into the list separates the allocated objects from the free ones, and allocating just moves the pointer along the list.

Garbage collection starts with a set of root objects (more on this later). It finds all objects reachable from this set of root objects. All objects not reachable are considered garbage and are put on the free list where they can be reused. If garbage collection doesn't free up enough objects, then more memory is allocated from the system.
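The tracing step can be sketched Cheney-style: the to-space acts as a queue, and a scan position walks it, "copying" each newly reached child onto the end. This hypothetical sketch uses a `std::vector` and a `reached` flag instead of linked lists and a color bit, purely to keep it short; `Obj` and `collect()` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical object: a reached flag plus outgoing references.
struct Obj {
  std::vector<Obj *> refs;
  bool reached = false;
};

// Trace from the roots; `toSpace` plays the role of the to-space list and
// `scan` walks it. Returns the objects never reached, i.e. the garbage.
std::vector<Obj *> collect(const std::vector<Obj *> &all,
                           const std::vector<Obj *> &roots) {
  for (Obj *o : all)
    o->reached = false;
  std::vector<Obj *> toSpace;
  for (Obj *r : roots)
    if (!r->reached) {
      r->reached = true;
      toSpace.push_back(r);
    }
  for (std::size_t scan = 0; scan < toSpace.size(); ++scan)
    for (Obj *child : toSpace[scan]->refs)
      if (!child->reached) {
        child->reached = true;
        toSpace.push_back(child);
      }
  std::vector<Obj *> garbage;
  for (Obj *o : all)
    if (!o->reached)
      garbage.push_back(o);
  return garbage;
}
```

Reachability, not reference counts, decides what is garbage, so a cycle of objects that no root can reach is reclaimed like any other unreachable object.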

There are several twists beyond what's described in the Wilson paper:

  1. It supports finalization (the ability to call a GC object's destructor when the object is garbage collected). All finalizable objects occur before non-finalizable objects in the allocated list. The collector arranges things so that, immediately after completing the copy part of a garbage collection, the objects needing finalization are at the head of the free list, which allows it to perform finalization efficiently.

  2. Objects that are created during the parsing of the stylesheet, and that can never become garbage during the processing of the source document, are kept in a separate area (these are called "permanent"). All objects reachable from a permanent object must themselves be permanent.

  3. It has the concept of an object being read-only: it can mark an object and all objects reachable from that object as being read-only (needed for Jade extensions which allow limited mutation of objects).

  4. It always allocates a fixed amount of space for a GC object, so the sizeof() of any object derived from ELObj must be ≤ this space. How big is it? On a 32-bit machine there is space for 16 bytes (e.g. 4 pointers, or a double+int+pointer) beyond what is used by the ELObj itself. On a 64-bit machine it will be about twice that; maxObjSize() in style/Interpreter.cxx figures it out at runtime, so to be safe, add any new types of ELObj to the table in maxObjSize(). But make sure you don't use more than 16 bytes on a 32-bit machine; otherwise you will significantly increase Jade's memory consumption. If you need more space than this, the ELObj should hold a pointer to dynamically allocated memory, and you must deallocate that memory in the destructor. In this case, and in any other case where an ELObj has a destructor that must be called, you must declare an operator new():
    	  
    void *operator new(size_t, Collector &c)
    {
      return c.allocateObject(1);
    }
    This tells the garbage collector that the object has a destructor that must be called when the object becomes garbage.
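The operator new() convention above can be illustrated with a small stand-in collector. In this hypothetical sketch, `CollectorSketch`, `kSlotSize`, `finalizeAll()`, `BigObj` and `SmallObj` are invented names; the real Collector keeps finalizable objects at the front of its lists rather than in a separate vector, and its slot size is derived from maxObjSize(). The sketch also assumes the base class sits at offset 0 of each derived object, which holds for single inheritance on mainstream compilers but is not guaranteed by the language.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Stand-in for ELObj: a virtual destructor lets the collector finalize
// an object without knowing its exact type.
struct ObjBase {
  virtual ~ObjBase() = default;
};

// Hands out fixed-size slots and remembers which ones need finalization.
class CollectorSketch {
public:
  static constexpr std::size_t kSlotSize = 64;  // assumed slot size

  CollectorSketch() = default;
  CollectorSketch(const CollectorSketch &) = delete;
  CollectorSketch &operator=(const CollectorSketch &) = delete;
  ~CollectorSketch() {
    for (void *p : slots_)
      ::operator delete(p);
  }

  void *allocateObject(bool hasFinalizer) {
    void *p = ::operator new(kSlotSize);
    slots_.push_back(p);
    if (hasFinalizer)
      finalizable_.push_back(p);
    return p;
  }

  // Treat every object as garbage: run destructors only where requested.
  // Assumes ObjBase is at offset 0 of each object (single inheritance).
  void finalizeAll() {
    for (void *p : finalizable_)
      static_cast<ObjBase *>(p)->~ObjBase();  // virtual dispatch
    finalizable_.clear();
  }

private:
  std::vector<void *> slots_;
  std::vector<void *> finalizable_;
};

// Owns heap memory, so its destructor must run: request finalization.
struct BigObj : ObjBase {
  explicit BigObj(int *finalized)
      : finalized_(finalized), data_(new double[8]) {}
  ~BigObj() override {
    delete[] data_;
    ++*finalized_;
  }
  void *operator new(std::size_t size, CollectorSketch &c) {
    assert(size <= CollectorSketch::kSlotSize);
    return c.allocateObject(true);
  }
  int *finalized_;
  double *data_;
};

// Nothing to clean up, so no finalization is requested.
struct SmallObj : ObjBase {
  int value = 0;
  void *operator new(std::size_t size, CollectorSketch &c) {
    assert(size <= CollectorSketch::kSlotSize);
    return c.allocateObject(false);
  }
};
```

Objects that did not ask for finalization are simply reused without their destructors running, which is why any ELObj that owns external resources has to opt in through its operator new().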

A key aspect of correct use of the garbage collector is to ensure that the collector always has a sufficient set of roots. Any time that C++ code does anything that may allocate a GC object, any GC object that is not reachable from a root object may get recycled by the system. The way to create a root is to use an auto variable of type ELObjDynamicRoot. An ELObjDynamicRoot adds a single ELObj as a root for the Collector for as long as the ELObjDynamicRoot is in scope. The first argument of the ELObjDynamicRoot constructor specifies the collector; the second specifies the ELObj that is to be made a root. The ELObj that the ELObjDynamicRoot makes a root can be changed by assigning an ELObj to the ELObjDynamicRoot. There's also a conversion from ELObjDynamicRoot to ELObj *.
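A hypothetical sketch of the RAII idea behind ELObjDynamicRoot: the constructor registers a slot with the collector, the destructor unregisters it, assignment retargets the slot, and a conversion operator yields the raw pointer. `RootCollector`, `rootSlots`, `currentRoots()` and `DynamicRoot` are invented for illustration; this says nothing about how Jade actually stores its roots.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Obj {};  // stands in for ELObj

// Collector reduced to its dynamic roots. It stores the addresses of
// root slots, so it always sees whatever object a slot currently holds.
class RootCollector {
public:
  std::vector<Obj *const *> rootSlots;

  std::vector<Obj *> currentRoots() const {
    std::vector<Obj *> out;
    for (Obj *const *slot : rootSlots)
      out.push_back(*slot);
    return out;
  }
};

class DynamicRoot {
public:
  DynamicRoot(RootCollector &c, Obj *obj) : c_(c), obj_(obj) {
    c_.rootSlots.push_back(&obj_);  // becomes a root on construction
  }
  ~DynamicRoot() {  // stops being a root when it goes out of scope
    auto it = std::find(c_.rootSlots.begin(), c_.rootSlots.end(), &obj_);
    if (it != c_.rootSlots.end())
      c_.rootSlots.erase(it);
  }
  DynamicRoot(const DynamicRoot &) = delete;
  DynamicRoot &operator=(const DynamicRoot &) = delete;

  // Assigning changes which object is protected.
  DynamicRoot &operator=(Obj *obj) {
    obj_ = obj;
    return *this;
  }
  // Lets the root be used wherever an Obj * is expected.
  operator Obj *() const { return obj_; }

private:
  RootCollector &c_;
  Obj *obj_;
};
```

Registering the address of the slot rather than the object itself is what makes reassignment cheap: the collector picks up the new value the next time it looks at its roots.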

Example 1. The reverse() function

DEFPRIMITIVE(Reverse, argc, argv, context, interp, loc)
{
  ELObjDynamicRoot protect(interp, interp.makeNil());
  ELObj *p = argv[0];
  while (!p->isNil()) {
    PairObj *tem = p->asPair();
    if (!tem)
      return argError(interp, loc,
		      InterpreterMessages::notAList, 0, argv[0]);
    protect = new (interp) PairObj(tem->car(), protect);
    p = tem->cdr();
  }
  return protect;
}

protect is a dynamic root holding the part of the reversed list built so far. Making it a root ensures that all the newly created PairObjs remain reachable from a root, so a collection triggered by the next allocation cannot reclaim them.

Example 2. The NodeListRef() function

The NodeListRef() function gives an example of the sort of bug that can creep in if you're not very careful. This used to end like this:
return new (interp) 
    NodePtrNodeListObj(nl->nodeListRef(k, context, interp));
	
The nodeListRef() function sometimes allocates (the fact that it takes an Interpreter argument is a good clue). So what could happen is:

  1. operator new() gets called to allocate a new object

  2. nodeListRef() gets called in a way that causes an allocation

  3. the free list happens to be empty, so the garbage collector gets run; the newly allocated object is not reachable from a root, so it gets GCed and recycled

  4. the constructor gets called with a GC object that the garbage collector thinks is free

The fix was to rewrite it as:
  
NodePtr nd(nl->nodeListRef(k, context, interp));
return new (interp) NodePtrNodeListObj(nd);
	

