Editing Principles, Encoding Practices, and Tools

Contents

  1. Editing Principles
  2. Markup, Structure, Appearance, and Viewing

Editing Principles

In the academic study of Literature, there is something called a "scholarly edition." This is more than just an edition used by scholars. The term describes an edition that has been edited according to certain well-established and quite rigorous principles, which has the backing of certain institutions devoted to textual reconstruction, and which therefore has a distinctive standing when compared to ordinary editions. No doubt the Holtzapffels' Turning and Mechanical Manipulation deserves this. The creation of a scholarly edition, however, requires significant resources and generally can be accomplished only with institutional support. Regrettably, this isn't likely for Holtzapffel - even if one could persuade a sufficient number of ornamental turners to abandon their lathes for textual studies.

Near the other end of the spectrum of editions may be found many of the online reprints of the present Editor. These are often simply scans put online with as little effort as possible. Reprints such as these serve the useful function of making obscure information more generally available to fellow enthusiasts, but they are in many other respects inadequate.

While this present Reprint cannot hope to be a Scholarly Edition, Holtzapffel deserves better than simply to be slapped online in low resolution scans.

Two high-level requirements seem, to the Editor at least, to be necessary in order to make this Reprint worthwhile.

Firstly, it must include an actual transcription into ASCII text. This allows the text to be searched (without which ability this Reprint would be no better than the generally available paper reprints).

Secondly, and perhaps less obviously, this Reprint must be marked up well. To understand why this is important requires a digression into the philosophy of text markup or encoding.

Marking up a text is the process of adding to it "meta"-information which describes it in such a way that someone (a human compositor setting type, say) or something (a computer program) can mechanically "understand" the marked-up aspects of the text. As a simple example, a symbol might be introduced before each paragraph to say "here starts a paragraph." This may seem too obvious ("everybody knows where paragraphs start") until you try to write a computer program which displays paragraphs in one way and poetic stanzas in another.
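As a minimal sketch of that last point, the following Python fragment displays paragraphs in one way and stanzas in another, which it can do only because the markup says which is which. The sample text is invented for illustration; the element names p, lg (line group), and l (line) follow TEI usage, and the function name render is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Invented sample text. The markup, not the wording, tells the program
# which block is a paragraph and which is a stanza.
SAMPLE = """<text>
<p>The lathe is among the oldest of machine tools.</p>
<lg>
<l>Turn, turn, my wheel!</l>
<l>Turn round and round.</l>
</lg>
</text>"""

def render(xml_source: str) -> str:
    """Show paragraphs flush left and stanza lines indented by four spaces."""
    root = ET.fromstring(xml_source)
    out = []
    for element in root:
        if element.tag == "p":
            out.append(element.text.strip())
        elif element.tag == "lg":
            out.extend("    " + line.text.strip() for line in element.findall("l"))
        out.append("")  # blank line after each block
    return "\n".join(out).rstrip()

print(render(SAMPLE))
```

Without the p and lg markers the program would have to guess at the difference from line lengths or spacing, which is exactly the kind of guesswork that markup is meant to remove.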

Markup is labor intensive. You only want to do it once, so it is a good idea to do it well. One aspect of this is purely intellectual - it's a good idea to figure out all of what needs to be marked up so that you don't have to go back and add more markup later. Another aspect, however, is technological - markup systems have changed, and certainly will change again. Marking up the five volumes of Holtzapffel is a large task that shouldn't have to be re-done every few years as computer systems change. How, though, can this be accomplished?

Clearly the proprietary markup systems implicitly built into commercial word-processing software won't do. Any markup scheme must be openly documented and freely available to all.

Mere openness isn't enough, however. For example, HTML is a freely available and widely implemented markup language. At the time of writing, most of the World Wide Web is built using it. Yet it would be shortsighted to believe that it will last forever. The first volume of Holtzapffel came out 170 years ago and I can still read it exactly as it was published then. At the very least, this Reprint should be readable 170 years from now, long after the "World Wide Web" becomes a page of history.

The solution, in the Editor's opinion, is in the concept of "formal languages" and the work with meta-markup languages begun in the 1960s by Dr. Charles Goldfarb. There isn't room here for the whole history or philosophy of this. Suffice it to say that Goldfarb and his colleagues defined not a markup language, but rather a meta-language for defining markup languages. This metalanguage, SGML (the Standard Generalized Markup Language), was later used by Dr. Tim Berners-Lee to define HTML (HyperText Markup Language, the language of the current WWW), and by others to define various other markup languages.

The advantage of using a markup language formally defined in this way is that it makes it possible to automate the process of translating from one markup language to another (within the limitations of each, of course). So, actually, HTML (as an SGML-defined markup language) might not be such a bad choice because it will be possible in the future to define an automatic translation of markup from HTML to whatever its successor might be.

The only problem with using HTML is that it wasn't really designed for the complex task of marking up a text such as Holtzapffel. Another SGML-defined (or now XML-defined, keeping up with trends) markup language was, however. This is the language defined by the Text Encoding Initiative. Basically, the "TEI Guidelines" are an SGML/XML-defined markup language designed to be able to encode any text from any time period for any scholarly purpose. It should be sufficient for this task.

Markup, Structure, Appearance, and Viewing

One of the generally accepted principles of good markup practice is that markup should define the underlying structure of the text but leave unspecified its exact appearance. As an author writing a traditional book, you'd indicate where the chapters and paragraphs started, but leave it up to the compositor to set the type appropriately. The markup of a digital version of an existing book can become more complex than this because often it is desirable to indicate features of the physical book (for example, the original division into pages so that readers of both the digital and the original print version can refer to the same page by number). Still, this distinction seems a good one.

The idea, therefore, is that you can take three things: the grammar of a markup language (such as the TEI, or HTML), a text marked up in this markup language, and a "style sheet" which specifies the appearance of things so marked up. If you feed all three into a computer program, you should be able to view the text. But the visual part - the style sheet - is independent of the marked-up text. If you wanted it to look different, you could supply a different style sheet. If you wanted to do something else entirely with it, such as analyze its sentences, you could do that by feeding the text and the markup language grammar into a different program. The key to this flexibility is separating out the markup of structure.
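The three-part arrangement can be illustrated with a toy sketch in Python. Everything here is an assumption made for illustration: the "grammar" is reduced to a set of permitted element names (a real SGML or XML grammar is far richer), each "style sheet" is a plain table of display rules, and the names GRAMMAR, PLAIN_STYLE, FANCY_STYLE, view, and count_sentences are invented.

```python
import re
import xml.etree.ElementTree as ET

# Stand-in for a grammar: the element names this toy markup language allows.
GRAMMAR = {"text", "head", "p"}

# The marked-up text. It says what each part is, not how it should look.
TEXT = """<text>
<head>On Lathes</head>
<p>The lathe is old. It is also useful.</p>
</text>"""

# Two hypothetical style sheets: each maps an element name to a display rule.
PLAIN_STYLE = {"head": lambda s: s.upper(), "p": lambda s: s}
FANCY_STYLE = {"head": lambda s: "== " + s + " ==", "p": lambda s: "    " + s}

def parse(source, grammar):
    """Parse the text and reject any element the grammar does not define."""
    root = ET.fromstring(source)
    for element in root.iter():
        if element.tag not in grammar:
            raise ValueError("element not in grammar: " + element.tag)
    return root

def view(source, grammar, style):
    """Grammar + marked-up text + style sheet = something viewable."""
    root = parse(source, grammar)
    return "\n".join(style[child.tag](child.text.strip()) for child in root)

def count_sentences(source, grammar):
    """A different program: same text and grammar, no style sheet at all."""
    root = parse(source, grammar)
    return sum(len(re.findall(r"[.!?]", p.text)) for p in root.iter("p"))

print(view(TEXT, GRAMMAR, PLAIN_STYLE))
print(view(TEXT, GRAMMAR, FANCY_STYLE))
print(count_sentences(TEXT, GRAMMAR))
```

Swapping PLAIN_STYLE for FANCY_STYLE changes the appearance without touching TEXT, and count_sentences uses the same marked-up text for a purpose that has nothing to do with appearance.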

Unfortunately, this works easily only in a few cases. HTML is one. If you have a text marked up in HTML (such as a web page), and you have the HTML grammar (which is built into web browsers), and a CSS stylesheet, all three work together to display a presumably nice looking page. At the present moment, this doesn't work so well with the TEI. The language for general style sheets for XML-defined markup languages such as the TEI, called XSL (Extensible Stylesheet Language), has been defined only recently and is not yet widely implemented. In the future, "TEI text + TEI grammar + XSL stylesheet for TEI = viewable page" will work, but right now it doesn't.

For the markup (encoding in TEI) process itself, this doesn't matter. For using the marked-up text of Holtzapffel today, it does.

Fortunately, there is also a general-purpose transformation language that allows TEI encoding to be transformed into pretty much anything - including, say, HTML encoding. This language is, somewhat confusingly, also a part of XSL and is called "XSLT" (XSL Transformations). So for now, the most portable way to view this Reprint is to transform the TEI-encoded text automatically into an HTML-encoded text. So "TEI text + XSLT translation specification = HTML text" and "HTML text + CSS stylesheet = viewable page."

Structure vs. appearance, in summary:

  TEI-encoded text + TEI grammar + stylesheet = directly viewable, but not yet supported in most browser environments

or

  TEI-encoded text + TEI grammar + translation specification = (say) HTML; add CSS and it becomes viewable in most browsers

The basic work is in marking up Holtzapffel using the TEI Guidelines. This is done once. The translations to make it viewable (stylesheets, or translation specifications) can be added later, changed later, or not done at all.

Misc: put the "pd-disclaimers.html" and "trademark-recognition.html" pages in the "Ornamental Turning" level of "A Library of Antiquarian Technology" because it is likely that the O.T. section will be mirrored by itself.
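The TEI-to-HTML route described in this section can be sketched in Python. This is a stand-in for an XSLT translation specification, not XSLT itself, and it makes no claim to be the method used for this Reprint. The TEI fragment is simplified (no namespace, no header) and its content is invented; the TAG_MAP table, the to_html function, and the "pagebreak" class name are assumptions of the sketch. The pb element with its n attribute is the TEI way of recording an original page break.

```python
import xml.etree.ElementTree as ET

# Simplified TEI-style fragment with invented content.
TEI_TEXT = """<div>
<head>Sample Chapter</head>
<p>First paragraph.<pb n="2"/>Continued on a new page.</p>
</div>"""

# Hypothetical translation table: TEI element name -> HTML element name.
TAG_MAP = {"div": "div", "head": "h2", "p": "p", "pb": "span"}

def to_html(tei_element):
    """Recursively rebuild a TEI element tree as an HTML element tree."""
    html = ET.Element(TAG_MAP[tei_element.tag])
    if tei_element.tag == "pb":
        # Keep the original page number so that readers of the print and
        # digital versions can refer to the same page.
        html.set("class", "pagebreak")
        html.set("id", "page-" + tei_element.get("n"))
    html.text = tei_element.text
    html.tail = tei_element.tail
    for child in tei_element:
        html.append(to_html(child))
    return html

html_root = to_html(ET.fromstring(TEI_TEXT))
html_text = ET.tostring(html_root, encoding="unicode", method="html")
print(html_text)
```

The TEI source is never altered; a different table, or a real XSLT specification, could be substituted later without re-doing the markup.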

A Reprint of the Holtzapffels' Turning and Mechanical Manipulation