graphein: A Very Simple Example

1. A Very Simple Example

1.1. The Purpose of this Example

This is a "Hello, World!" type of example: an extremely simple situation intended not to explore the features of the system to any depth but simply to get to the point where you can run the system at all.

1.2. What You'll Need To Be Familiar With

The graphein system assumes that you're comfortable using a real computer at a command line. If you don't know what a command line is, look online for Neal Stephenson's marvellous essay "In the Beginning Was The Command Line," read a bit in the history of software, and come back in a while. Or not.

Ordinary "user-level use" (vs. programmer-level use) of graphein assumes that you're familiar with the basic programming utilities: project coordination with GNU make, shell scripting with bash, and basic command line utilities. In practice, a graphein system is also managed under a revision control system (I use Subversion), but this example doesn't get into that. A writing project should be managed in exactly the same manner as a programming project. Programming and writing are simply two aspects of the same intellectual activity.

While it may be possible to use graphein with an "XML aware" word processor, I prefer simply editing files using vi. I'm sure that Emacs would work as well.

The graphein system assumes, constantly, a familiarity with the Text Encoding Initiative's markup language (the TEI), itself defined in XML. Some understanding of XML, therefore, might also be handy (the TEI's documentation has a good "Gentle Introduction" to XML). The TEI in turn assumes some notion of scholarly practice, or at least an acknowledgement of the importance of scholarship.

It is not necessary to know the general principles and uses of XSLT, XSL-FO, HTML, CSS, and the other languages/tools used to process graphein documents, but it doesn't hurt either.

1.3. The End Result

So that I can be quite certain of what you'll see when reading this chapter, I'll use more screen-grabbed images than I might otherwise. Below is a screen-grab of the document which is the final result of this example:

A Very Simple graphein Example

For comparison, here's a link to this same thing as HTML. It may look a bit different from the image above, as it's being processed and presented by your browser, but it ought to be recognizable as the same document:

example1.html

This document has several features which merit some comment.

First, it has conventional document-like things: a title and some section headings and some text. The system permits considerable flexibility in these elements, and of course supports many more types of text elements.

Below this appear several more or less standard elements. These give much of the basic "framework" of a document (document hierarchy). If this framework works for you, then graphein may (may!) work for you. If it does not, then it's probably too hard to fight the system and you'd be better served by some other tool. While the ideas behind graphein are very general, the system itself is not.

In an indented (or is it outdented? I can never remember) box appears a miscellany of "legal" and related information. The exact content of these is specified in the document. E.g., the copyright can be specified, of course, as can the licensing terms. It is also possible (and appropriate) to add here specific notes about the copyright status and licensing of incorporated elements. For example, if this document incorporated images (in its main text) which were in the public domain, I'd note those details here.

The link below this, "About the Images", is optional. If present, it's intended to link to a page which describes all of the images used in (or introduced in, for repeatedly used images) a particular graphein directory.

The links below this are also optional, and if present link "forward" and "backward" within a user-defined chain of pages.

Below the horizontal rule (supplied by the formatting process; a rule in printing is often a sign of dubious taste) appears a set of illustrated links to other sections ("Home", "Category", "Topic", and "Up"). Some of the pictures are the same because it happens that the links are to the same place (e.g., if one directory level Up is Home, then the Home and Up links will be the same). These links are automatically generated (though their illustrations are not). The overall structure of a graphein-managed document hierarchy is standardized (I describe it in more detail in the next example).

The final link, "Resolution", is intended to let the user select an appropriate size or scale for the icons which might better suit his or her screen.

Of course, the graphein system is designed to be able to output documents in non-HTML formats as well (such as PDF), and even non-software formats (final output on paper, for example, though building a system to write cuneiform on clay tablets would be fun!). Concepts such as linking and resolution may differ in these cases. (I haven't yet written any of the support for PDF or other non-HTML output.)

1.4. The Source Input

1.4.1. Viewing the Source File

Here's a link to the actual source TEI file for this example:

example1.tei

If you just try to load this up on your browser (that is, try to click on the link above), it's not at all clear what your browser will do with it. A very old browser might simply refuse to display it. The version of Mozilla that I happen to be running now recognizes it as XML but can't handle its particular markup (and indeed there is no reason it should) so it parses it and displays the resulting tree. It's possible that a "smarter" (?) browser in the future might detect it as TEI markup and try to do something more/other with it.

What I really want, though, is access to the raw source. So obtain the file referenced above, example1.tei, and edit it in a terminal window using vi, from the command line. It should look much like this:

A Very Simple graphein Example: Input File

While the first image on this page was a view of what the reader sees in the end, this present image is a view of what I, the writer, see while I work.

If this doesn't feel comfortable, then graphein isn't for you.

1.4.2. Source File: XML Header

The source file is marked up in the TEI, and as such defines a tree of TEI/XML statements and entities. In this section, I'll begin a depth-first traversal of this tree.

The first statement is the XML declaration:

Traversal: XML Declaration

This is stock XML, and never changes. It is supplied in each TEI source file, but since in general I copy each new TEI source file from an existing one which acts therefore as a template, I never have to retype (or remember) this line.
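
As a point of reference, an XML declaration generally looks like the line below. This is a generic sketch rather than a copy of the line in example1.tei; in particular, the encoding shown is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The encoding value above is an assumption; use whatever your files are actually saved in. -->
```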

1.4.3. Source File: DTD

The next statement is the XML "Document Type Declaration" (the "DOCTYPE" statement). It identifies the Document Type Definition (DTD), which defines the syntax of the document to follow. Here, the DOCTYPE statement simply references the TEI's actual DTD, including in it several optional components.

Traversal: Document Type Declaration

This is stock graphein use of the TEI, and, like the XML Declaration, never changes.

1.4.4. Source File: TEI Root Node

With the line:

Traversal: TEI Root

I come to the root node of the actual document tree. Everything else in the file is a child of this node. It just says that this is a TEI document which uses the TEI DTD declared earlier. Again, this is a stock, unchanging line. It is matched, down at the very bottom of the file, with a closing tag:

Traversal: TEI Root, Closing Tag

1.4.5. Source File: TEI Header

All TEI documents start out with a TEI Header. It contains information about the document, not the contents of the document itself (though of course nothing forbids a processing program from outputting TEI Header information in such a way that it looks as if it's a part of the document).

The TEI defines many types of things that may appear in the TEI Header, but a minimal TEI Header can be written with just one: a File Description ("fileDesc"). I find it useful, as well, to include a Revision Description ("revisionDesc") header element.

Traversal: TEI Header

The revisionDesc element contains fairly straightforward notes concerning, obviously, revisions of the text. I use a rather free-form "change" statement. Every time I make any significant change to the source file, whether or not I check that change in to Subversion, I add a new "change" statement here (in reverse chronological order). Nothing forces me to maintain good information in these "change" statements, but it's been my experience that it is a good idea to do so. I use these statements, which are tightly integrated into the document itself, in place of change statements in the revision control system.
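
A rough sketch of what such a free-form revision log might look like follows. The dates and wording are invented for illustration and are not taken from example1.tei. Whether plain text directly inside "change" validates depends on the TEI version and DTD in use, so treat the content model here as an assumption.

```xml
<!-- Sketch only: dates and wording are invented for illustration. -->
<revisionDesc>
  <!-- Newest entry first (reverse chronological order). -->
  <change>2000-01-02: Reworded the opening paragraph.</change>
  <change>2000-01-01: Initial version, copied from an existing template file.</change>
</revisionDesc>
```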

The fileDesc element contains more types of sub-elements. Of these, three are mandatory: the titleStmt, the publicationStmt, and the sourceDesc.
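
In outline, a fileDesc holding only its three mandatory children might be sketched as below. The "..." placeholders stand for real content, and the title text is an assumption based on the title of the example page.

```xml
<!-- Skeleton only; "..." marks content to be filled in. -->
<fileDesc>
  <titleStmt>
    <!-- Title text assumed for illustration. -->
    <title>A Very Simple graphein Example</title>
  </titleStmt>
  <publicationStmt>
    <p>...</p>
  </publicationStmt>
  <sourceDesc>
    <p>...</p>
  </sourceDesc>
</fileDesc>
```

The three children appear in the order the TEI expects: title, then publication, then source.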

A Title Statement pretty clearly ought to specify a title, as this one does. This specification doesn't do much, though, as the document's Front Matter (see below) will also specify a title. In fact, the graphein XSL uses the title from the Front Matter, not the one specified here. Still, it's probably a good idea to have it here. It might be handy to have if at some point you use software which processes TEI documents differently and which does look at the TEI Header.

A Source Description describes the source of the document. This sounds like a tautology, but isn't, really. While graphein uses the TEI to mark up documents which are most often new creations, the original purpose of the TEI was to define the markup for the transcription of existing paper documents into electronic form. The "source description" therefore describes the original non-electronic source document, if it exists. Here, at least, there is no such document and I note this fact in this statement.
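
For a document with no prior paper source, the Source Description can simply say so in prose. The wording below is illustrative, not the text used in example1.tei.

```xml
<sourceDesc>
  <!-- Wording is illustrative; the point is simply to state that no prior source exists. -->
  <p>No source: this document was created directly in electronic form.</p>
</sourceDesc>
```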

The TEI "Publication Statement" is more complex. It describes the circumstances of the publication of the present electronic text (vs. an original paper text transcribed, if that had existed). It can be just a prose paragraph, but can alternatively contain structured elements. I choose here to use a few of these structured elements.

Having made the choice for structure, the TEI says that I must then choose to identify one of the following: a publisher, a distributor, or an "authority." Here, I choose to represent myself neither as a publisher nor a distributor, but simply as the distribution "authority" (which seems like too strong a term for me, but that's the term the TEI uses).

I also choose to encode two additional, optional, elements: "pubPlace" and "availability".

The pubPlace element simply identifies the place of publication. I use the city where I receive my mail. This is a bit like the tradition of authors signing their prefaces with "London" or "The Old Manse," or whatever other location they happen to be in at the time of writing. I view it as no more than an affectation on my part, but I can see how, regrettably, it may sometimes be the case that a record of the jurisdiction in which a document was created might become important.

The "availability" element is a bit more directly important. The TEI says that this can include "restrictions on [the document's] use" and "copyright status." I use it, therefore, to collect together and state the legal items relating to the document. Each of these I put into a TEI "ab" block (which is like a paragraph, but without the semantic associations of a paragraph). Each "ab" block is further identified by a "type" attribute.

The various values of the "type" attribute are defined by graphein (not by the TEI). In other words, I just made them up. This isn't the place to list them all. Here, I use three: "copyright", "legal-license-gfdl", and "presented-by".

The type="copyright" element contains a plain-text statement of the copyright. The graphein software will prefix this literal statement with the text "All portions of this document not noted otherwise are Copyright [copyright symbol] ". The rest of the text as worded in this example flows well from this stock opening. Here of course there are no sections "noted otherwise," but in the general case there may be. Examples might include texts or images clearly noted as being in the public domain, or those noted as being in copyright but used here within "fair use," or with permission, or under the terms of some other open source license. If other material of this type existed, it would be best to note it before this type="copyright" statement.

The type="legal-license-gfdl" expands to "boilerplate" language which invokes the particular license noted (in this case the Free Software Foundation's GNU Free Documentation License). Other documents might specify other licenses by using different attribute values.

Finally, the type="presented-by" element generates a text which starts "Presented originally by " and which concludes with whatever text is supplied here. The idea behind this element is that since this is an open source document it might end up on websites anywhere. If, though, this element remains, then a reader could track the document back to its source. There's no guarantee that this element will remain, of course, but it might end up useful. Or not.
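
Putting the pieces above together, a structured Publication Statement along these lines might be sketched as follows. The three "type" values are the ones named above; the name, place, year, and the choice to leave the license block empty are all assumptions made for illustration.

```xml
<!-- Sketch only: the name, place, and year are placeholders. -->
<publicationStmt>
  <authority>A. N. Author</authority>
  <pubPlace>Anytown</pubPlace>
  <availability>
    <!-- graphein prefixes this with its stock "All portions of this document
         not noted otherwise are Copyright (c) " wording. -->
    <ab type="copyright">2000 by A. N. Author.</ab>
    <!-- Assumed to be empty, since graphein expands it to license boilerplate. -->
    <ab type="legal-license-gfdl"/>
    <!-- graphein prefixes this with "Presented originally by ". -->
    <ab type="presented-by">A. N. Author</ab>
  </availability>
</publicationStmt>
```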

Finally, note that if you use graphein you may of course encode TEI files using any of the TEI Header elements. Such files would be valid TEI. The graphein processing tools, however (e.g., the XSLT transformations to HTML), recognize only certain elements. If you use other elements, you'll need to modify graphein. (That's what open source is for.)

1.4.6. Source File: text

The actual document consists of three large-scale elements: "front", "body", and "back" (the Front Matter, Body, and Back Matter of the text, of course).

Traversal: text

Here, the Front Matter contains just the document's title. This is the title that the graphein XSL(T) actually uses.

The Body contains, well, the body of the document. Here, it's just a single DIVision into a chapter, which has a chapter HEADing and which contains a Paragraph.

The Back Matter contains three graphein-defined "ab" blocks. Each is optional. These expand into three of the links seen in the image of this page: an "About the Images" link, a "backward" link, and a "forward" link. The "About the Images" link assumes a predefined (by graphein, not the TEI) filename base: "about-the-images". The backward and forward links require the user to supply the filename bases for these files (but just the base, omitting any ".html" or other suffix).
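
The overall shape described here might be sketched as below. This is not a copy of example1.tei: the way the title is encoded in the Front Matter, the "type" values on the three Back Matter blocks, and the filename bases are all hypothetical, and the fragment has not been checked against any DTD.

```xml
<!-- Sketch only; see the assumptions noted in each comment. -->
<text>
  <front>
    <!-- Title encoding assumed; the source may mark up the title differently. -->
    <docTitle>
      <titlePart>A Very Simple graphein Example</titlePart>
    </docTitle>
  </front>
  <body>
    <!-- One division (a chapter) with a heading and a single paragraph. -->
    <div type="chapter">
      <head>Hello</head>
      <p>Hello, world!</p>
    </div>
  </body>
  <back>
    <!-- Hypothetical type values; graphein defines its own. -->
    <ab type="about-the-images"/>
    <!-- Filename bases only, with no ".html" suffix. -->
    <ab type="link-backward">previous-page</ab>
    <ab type="link-forward">next-page</ab>
  </back>
</text>
```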

All of the items which appear below the horizontal rule in the image of this page (the Home/Category/Topic/Up links, and the "Resolution" selection) are generated automatically by graphein. They are not encoded in the source file at all.

And that's it. Hello, world!

1.5. Running graphein to Process this Source

1.5.1. Installing graphein

To run graphein to process this input file into (say) an HTML result file, you'll need first to make sure that the basic tools are installed.

The most difficult of these will be the XSLT processor. The stock graphein system uses a version 6 (vs. 7) release of the Saxon XSL processor, and the Xerces XML parser. These are Java-based tools, so they in turn require Java on your machine. If you don't have these, then you're in for some major configuration issues.

These tools also require the XML Commons Resolver. This in turn requires a line in your ".bashrc" file which tells it where to find two things: the systemwide XML Catalog and a graphein-defined "xml-catalog" file. Without these, the parser will never find the TEI.

 
XML_CATALOG_FILES="/path/to/xml-catalog/file/xml-catalog file:///etc/xml/catalog"; 
export XML_CATALOG_FILES 
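
For orientation, a catalog file in the OASIS XML Catalogs format generally has the shape sketched below. This assumes the graphein "xml-catalog" file uses that format; the single entry shown is a placeholder, not one of graphein's real entries.

```xml
<?xml version="1.0"?>
<!-- Sketch of an OASIS XML Catalog; the entry below is a placeholder. -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Map a (hypothetical) remote DTD address to a (hypothetical) local copy. -->
  <system systemId="http://example.org/dtd/tei.dtd"
          uri="file:///path/to/local/tei/tei.dtd"/>
</catalog>
```

The idea is that the parser asks the resolver for the DTD by its public or system identifier, and the catalog answers with a local file instead of a network fetch.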

1.5.2. Running graphein

Having installed graphein, running it is simple.

 
cd graphein 
./run-make-weave.sh 

Then point a browser at "example1.html" and you're done.

Nothing could possibly go wrong :-)