Generating VARKON Fonts from the Hershey Glyphs

Here is the Awk script that I used to generate these VARKON fonts from the Hershey font data.

# vkf-hershey-generate.awk

# Create a VARKON font file (write to stdout)
# given a font mapping file and the data_glyphs_occidental.txt file

# No error checking; assume perfect input.

# To use:
#    awk -f vkf-hershey-generate.awk FONTMAPFILE data_glyphs_occidental.txt baseline=BASELINE
#    - replace FONTMAPFILE with the font map file, as appropriate
#    - data_glyphs_occidental.txt is the literal filename for this file
#    - the baseline is in the "distribution" coordinate system
#             e.g., cartographic = 4, indexical = 6, normal = 9
#                   possibly special for some symbols
#
# Note to self: If you specify the wrong baseline (e.g., 4 for cartographic
#               when you meant 9 for normal), strange things will happen
#               (e.g., lowercase Greek Normal characters grow tails).
#               This is minimalist code, not robust code!

#

BEGIN {
   input_file = "FONTMAPFILE"
   ascii_count = 0
   glyph_count = 0 # Hershey glyphs start at 1, 
                   # but data_glyphs_occidental.txt at 0
   scaleadjustx = 1
   scaleadjusty = 1

   debug = 1       # 0 = no debugging
                   # 1 = debug coordinate transformations only
                   # 9 = everything
                   # 10 = special
}
{
   # discard comment lines
   if (substr($0,1,1) == "#") {
      next
   }

   # If reading the FONTMAPFILE, create an array which associates
   # a glyph number (or 0) with each ASCII position 0 .. 127
   # Assume perfect input.
   if (input_file == "FONTMAPFILE") {
      ascii_position[ascii_count] = $2
      ascii_count++
      if (ascii_count == 128) {
         input_file = "FONTDATAFILE"
         nextfile
      }
   }

   # Read the whole FONTDATAFILE into an array
   if (input_file == "FONTDATAFILE") {
      glyph_data[glyph_count] = $0
      glyph_count++
   }

}
END {

   # if debugging, dump the two arrays (ascii-to-glyph mapping and glyph data)
   if (debug == 9) {
      for (i = 0; i <= 127; i++) {
         printf ("ascii_position[%d] = %d\n", i, ascii_position[i]) > "/dev/stderr"
      }
      for (i = 1; i <= 4195; i++) {
         printf ("glyph_data[%d] = %s\n", i, glyph_data[i]) > "/dev/stderr"
      }
   }

   # Iterate through the mapping array and for each position 
   # generate the glyph.
   for (i = 0; i <= 127; i++) {
      if (ascii_position[i] == 0) {
         if (debug == 9) {
            printf ("ascii: %d = 0; no glyph\n", i) > "/dev/stderr"
         }
      } else {
         if (debug == 9) {
            printf ("ascii: %d == glyph %d, data: %s\n",
                    i, ascii_position[i],
                    glyph_data[ascii_position[i]]) > "/dev/stderr"
         }
         # Transform the Hershey glyph data into VARKON character
         # data and write it to stdout.
         generate_vk_character(glyph_data[ascii_position[i]],i)
      }
   }

   # I never use non-ASCII VARKON font positions 128-255,
   # but VARKON expects them in the FNT file, so create them here.
   for (i = 128; i <= 255; i++) {
      varkon_glyph_vectorcount[i] = 0
   }

   if (debug == 10) {
      for (i = 0; i <= 255; i++) {
         printf ("varkon_glyph_vectorcount[%d] = %d\n", i, varkon_glyph_vectorcount[i]) > "/dev/stderr"
      }
   }

   # Iterate through the vector count array to determine how many
   # non-empty characters there are; output that number to stdout 
   # as the first line of the VARKON font file.
   nonempty_characters = 0
   for (i = 0; i <= 255; i++) {
      if (varkon_glyph_vectorcount[i] > 0) {
         nonempty_characters++
      } 
   }
   printf ("%d\n", nonempty_characters)

   # Iterate through the vector count array and determine the cumulative
   # sum of all of the vector counts.
   # Print to stdout as the second line of the VARKON font file.
   vector_count = 0
   for (i = 0; i <= 255; i++) {
      vector_count = vector_count + varkon_glyph_vectorcount[i]
   }
   printf ("%d\n", vector_count)

   # Iterate through the vector count and vector data arrays
   # and output each VARKON glyph to stdout
   for (i = 0; i <= 255; i++) {
      printf ("%d\n", varkon_glyph_vectorcount[i])
      data_position = 0
      if (varkon_glyph_vectorcount[i] > 0) {
         while (data_position < ((varkon_glyph_vectorcount[i] + 1) * 2)) {
            printf ("%s %s\n", varkon_glyph_data[i,data_position], varkon_glyph_data[i,data_position+1])
            data_position = data_position + 2
         }
      }
   }

}




# Transform Hershey glyph data into VARKON character
# data and accumulate it in two arrays:
#    varkon_glyph_vectorcount[0..255]   number of vectors in each glyph
#    varkon_glyph_data[0..255, n]       actual glyph data (X at even n, Y at odd n)
#
# After a while, all of the coordinate transformations become a blur;
# hence the many debugging print statements.
#
function generate_vk_character(one_glyphs_data,glyphs_ascii_position) {

   if (debug == 1) {
      printf ("generate_vk_character()\n")  > "/dev/stderr"
      printf ("baseline == %d\n", baseline) > "/dev/stderr"
   }

   # initialize the glyph vectors
   data_position = 0
   varkon_glyph_vectorcount[glyphs_ascii_position] = 0

   # Obtain the number of 16-bit words of coordinate data,
   # excluding the initial pair which encodes the left and right margins.
   datapairs = substr(one_glyphs_data,6,3) - 1
   if (debug == 1) {
      printf ("datapairs: %d: %s\n", datapairs, one_glyphs_data) > "/dev/stderr"
   }

   # Obtain the left margin (ignore the right margin)
   # See below for a discussion of the "encoded," "distribution," and
   # "renumbered" coordinate systems.
   # First, get it in the "encoded" coordinate system.
   leftmargin_encoded = substr(one_glyphs_data,9,1)
   if (debug == 1) {
      printf ("leftmargin_encoded: %s\n", leftmargin_encoded) > "/dev/stderr"
   }
   # Then transform it to the "distribution" coordinate system.
   leftmargin_distribution = hurt(leftmargin_encoded)
   if (debug == 1) {
      printf ("leftmargin_distribution: %d\n", leftmargin_distribution) > "/dev/stderr"
   }
   # Then transform it to the "renumbered" coordinate system.
   leftmargin_renumbered = leftmargin_distribution + 49
   if (debug == 1) {
      printf ("leftmargin_renumbered: %d\n", leftmargin_renumbered) > "/dev/stderr"
   }

   # If this is the start of a glyph, by definition it must be a new polyline
   newpolyline = 1

   # Extract and transform the coordinate data
   pointpos = 11               # position in the data string
   x_shifted_prev = 0
   y_shifted_prev = 0
   refnum = 1
   # note to self: variables that are not function parameters are global
   #               in Awk; if I use "i" here it clobbers the caller's loop
   #               counter and things get quite messed up :-)
   for (g = 1; g <= datapairs; g++) {

      # Is it a "real" coordinate or pseudo-coordinate pair which
      # indicates a new polyline/pen-up action?
      if (substr(one_glyphs_data,pointpos,2) == " R") {

         # set state so that when we come to the next coordinate pair
         # we know it starts a new polyline
         newpolyline = 1
         if (debug == 1) {
            printf ("pen up\n") > "/dev/stderr"
         }

         # Do not increment the varkon_glyph_vectorcount[]. 
         # While the Hershey glyph encoding specifies new polylines
         # using a separate data pair, the VARKON font encoding
         # folds this specification into the X coordinate of the next
         # vector.

         # Go on to the next coordinate pair
         pointpos = pointpos + 2

      } else {

         # Get the coordinate pair as Hurt encoded data
         x_encoded = substr(one_glyphs_data,pointpos,1)
         y_encoded = substr(one_glyphs_data,pointpos + 1,1)
         if (debug == 1) {
            printf ("datapair, encoded x,y: |%c%c|\n", x_encoded, y_encoded) > "/dev/stderr"
         }

         # Decode the coordinate pair
         # This is the Hurt "distribution" coordinate system:
         #    upper left of the glyph cell is (-49,-49)
         #    the special value (-50,0) represents "pen up" (new polyline)
         x_distribution = hurt(x_encoded)
         y_distribution = hurt(y_encoded)
         if (debug == 1) {
            printf ("datapair, distribution x,y: %d %d\n", x_distribution, y_distribution) > "/dev/stderr"
         }

         # Convert from the "distribution" coordinate system to the
         # "renumbered" coordinate system.  
         # (Convert cell from upper left at (-49,-49) to lower left at (0,0))
         # This normalizes the cell into the first quadrant.
         # X was     -49 to -1,  0,  1 to 49
         # X becomes   0 to 48, 49, 50 to 98
         x_renumbered = x_distribution + 49
         # Y was      49 to  1,  0, -1 to -49
         # Y becomes   0 to 48, 49, 50 to  98
         y_renumbered = (y_distribution * -1) + 49
         if (debug == 1) {
            printf ("datapair, renumbered X Y: %d %d\n", x_renumbered, y_renumbered) > "/dev/stderr"
         }

         # I'm not presently doing any "jogging" 
         # (that would be from an auxiliary annotation file).
         # Leave this in as a comment so I'll remember where it might go.
         # x_renumbered = x_renumbered - jogl
         # x_renumbered = x_renumbered + jogr
         # y_renumbered = y_renumbered + jogu
         # y_renumbered = y_renumbered - jogd
         # if (debug == 1) {
         #    printf ("datapair, renumbered & jogged X Y: %d %d\n",x_renumbered,y_renumbered) > "/dev/stderr"
         # }

         # The Hershey glyph is centered at (0,0) (distribution coordinates).
         # VARKON characters start at the left side of the VARKON 
         # character cell, and have a particular baseline.
         # So the Hershey glyph must be shifted to make it a VARKON glyph.
         # This is the "shifted" coordinate system.

         # Note: The Hurt "distribution" and the "renumbered" coordinate systems
         # are integer systems.  However, it will turn out that the
         # baseline for the shifted coordinate system is 10.5.
         # Rounding this creates rounding errors in the subsequently
         # scaled values (baseline and topline are off - by a small
         # but visible amount).
         # The solution is to treat the "shifted" coordinate system
         # as a real number system.  Values scaled from it can be
         # truncated to integers before use with VARKON; the error
         # after scaling should be small enough.

         # X: Left margin goes to left of cell
         x_shifted = x_renumbered - leftmargin_renumbered

         # Y: There are two baselines which must be brought together.
         #    The glyph has its own baseline, specified by the "baseline"
         #    parameter.  I'll call this the "glyph baseline."
         #    The shifted coordinate system has a baseline,
         #    calculatable from the height of a "normal" Hershey glyph.
         #    This baseline is therefore a constant.
         #    I'll call it the "cell baseline."
         #
         #    Height is an absolute value valid in the "distribution,"
         #    "renumbered," and "shifted" coordinate systems.
         #    The VARKON baseline (5000) is at a distance half the
         #    height (10000) up, so the baseline in the shifted coordinate
         #    system should also be half the height up.
         #    The height of Hershey glyph 501 (simplex normal size "A")
         #    is 21 (-12 to 9 in the distribution coordinate system).
         #    The cell baseline in the shifted coordinate system must 
         #    therefore be half this, or 10.5.
         #    This cell baseline is the same for all sizes
         #    (cartographic, indexical, and normal).

         #    Glyph Y coordinates must be shifted so that the 
         #    glyph baseline is made to coincide with the cell baseline.

         # To calculate all of this, first change the glyph baseline
         # from the "distribution" coordinate system to the
         # "renumbered" coordinate system.
         # Example: cartographic baseline = 4  in distribution coordinates
         #                                = 45 in renumbered coordinates
         glyph_baseline_renumbered = (baseline * -1) + 49
         # Then calculate the difference between the glyph baseline
         # and the cell baseline.  E.g., (45 - 10.5) = 34.5 for cartographic
         baseline_difference = glyph_baseline_renumbered - 10.5
         # Then shift the glyph's Y coordinates down by this amount.
         y_shifted = y_renumbered - baseline_difference

         if (debug == 1) {
            printf ("datapair, shifted X Y: %d %.1f\n", x_shifted, y_shifted) > "/dev/stderr"
         }


         # Since all Hershey glyphs have the same relative size,
         # they must all employ the same scaling factor. 
         # The problem is that both the Hershey and VARKON
         # coordinate systems are integer, and the Hershey system
         # is relatively coarse.
         # If the VARKON baseline (5000) is divided by the Hershey
         # baseline (10.5 in the shifted coordinate system),
         # the resulting scaling factor is (5000/10.5) = 476.190476...

         x_scaled = int(x_shifted * (5000/10.5) * scaleadjustx)
         y_scaled = int(y_shifted * (5000/10.5) * scaleadjusty)
         if (debug == 1) {
            printf ("scaled X Y: %d %d\n", x_scaled, y_scaled) > "/dev/stderr"
         }

         # If this is the start of a new polyline, add 32768
         # to X to encode this fact.

         if (newpolyline == 1) {
             x_scaled = x_scaled + 32768
         }

         varkon_glyph_data[glyphs_ascii_position,data_position]     = x_scaled
         varkon_glyph_data[glyphs_ascii_position,data_position + 1] = y_scaled

         data_position = data_position + 2
         varkon_glyph_vectorcount[glyphs_ascii_position]++

         # Go on to the next coordinate pair
         # Since this not a "pen up" pseudo-coordinate, we cannot
         # (yet at least) be at the start of a new polyline.
         newpolyline = 0
         x_shifted_prev = x_shifted
         y_shifted_prev = y_shifted
         pointpos = pointpos + 2
      } # end: if
   }  # end: for

   # In the VARKON font, the number of vectors is specified as the
   # number less 1 (that is, it is a count starting from 0)
   if (varkon_glyph_vectorcount[glyphs_ascii_position] > 0) {
      varkon_glyph_vectorcount[glyphs_ascii_position]--
   }
}  # end: function generate_vk_character




# The Hurt encoding
function hurt (c) {
   ascii[" "]  = -50
   ascii["!"]  = -49
   ascii["\""] = -48
   ascii["#"]  = -47
   ascii["$"]  = -46
   ascii["%"]  = -45
   ascii["&"]  = -44
   ascii["'"]  = -43
   ascii["("]  = -42
   ascii[")"]  = -41
   ascii["*"]  = -40
   ascii["+"]  = -39
   ascii[","]  = -38
   ascii["-"]  = -37
   ascii["."]  = -36
   ascii["/"]  = -35
   ascii["0"]  = -34
   ascii["1"]  = -33
   ascii["2"]  = -32
   ascii["3"]  = -31
   ascii["4"]  = -30
   ascii["5"]  = -29
   ascii["6"]  = -28
   ascii["7"]  = -27
   ascii["8"]  = -26
   ascii["9"]  = -25
   ascii[":"]  = -24
   ascii[";"]  = -23; ascii["["] =  9; ascii["{"] = 41
   ascii["<"]  = -22; ascii["\\"] = 10; ascii["|"] = 42
   ascii["="]  = -21; ascii["]"] = 11; ascii["}"] = 43
   ascii[">"]  = -20; ascii["^"]  = 12; ascii["~"] = 44
   ascii["?"]  = -19; ascii["_"]  = 13
   ascii["@"]  = -18; ascii["`"]  = 14
   ascii["A"]  = -17; ascii["a"]  = 15
   ascii["B"]  = -16; ascii["b"]  = 16
   ascii["C"]  = -15; ascii["c"]  = 17
   ascii["D"]  = -14; ascii["d"]  = 18
   ascii["E"]  = -13; ascii["e"]  = 19
   ascii["F"]  = -12; ascii["f"]  = 20
   ascii["G"]  = -11; ascii["g"]  = 21
   ascii["H"]  = -10; ascii["h"]  = 22
   ascii["I"]  =  -9; ascii["i"]  = 23
   ascii["J"]  =  -8; ascii["j"]  = 24
   ascii["K"]  =  -7; ascii["k"]  = 25
   ascii["L"]  =  -6; ascii["l"]  = 26
   ascii["M"]  =  -5; ascii["m"]  = 27
   ascii["N"]  =  -4; ascii["n"]  = 28
   ascii["O"]  =  -3; ascii["o"]  = 29
   ascii["P"]  =  -2; ascii["p"]  = 30
   ascii["Q"]  =  -1; ascii["q"]  = 31
   ascii["R"]  =   0; ascii["r"]  = 32 
   ascii["S"]  =   1; ascii["s"]  = 33
   ascii["T"]  =   2; ascii["t"]  = 34
   ascii["U"]  =   3; ascii["u"]  = 35
   ascii["V"]  =   4; ascii["v"]  = 36
   ascii["W"]  =   5; ascii["w"]  = 37
   ascii["X"]  =   6; ascii["x"]  = 38
   ascii["Y"]  =   7; ascii["y"]  = 39
   ascii["Z"]  =   8; ascii["z"]  = 40

   return ascii[substr(c,1,1)]
}
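
The table in hurt() follows a simple rule: each character decodes to its ASCII code minus 82, the code for "R". As a minimal sketch, and not part of the original script, the same decoding could be done arithmetically. The names hurt_init(), hurt_decode(), and hurt_ord are made up for this illustration.

```awk
# Sketch: decode a Hurt-encoded character arithmetically instead of
# using a hand-written table. hurt_init(), hurt_decode(), and hurt_ord
# are hypothetical names, not part of vkf-hershey-generate.awk.

# Awk has no built-in ord() function, so build a
# character -> ASCII code table once for the printable range.
function hurt_init(    n) {
   for (n = 32; n <= 126; n++) {
      hurt_ord[sprintf("%c", n)] = n
   }
}

# A coordinate is the character's ASCII code minus the code of "R" (82).
function hurt_decode(c) {
   return hurt_ord[substr(c, 1, 1)] - 82
}

BEGIN {
   hurt_init()
   # space, dollar sign, R, tilde
   printf ("%d %d %d %d\n", hurt_decode(" "), hurt_decode("$"),
                            hurt_decode("R"), hurt_decode("~"))
}
```

Run on its own with awk -f, this should print "-50 -46 0 44". Building the table in a loop avoids having to type all 95 entries by hand, at the cost of being a little less obvious than the explicit list.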

Here's a link to the file containing this script: vkf-hershey-generate.awk. If you use the script, use the version in the file; the version shown above has HTML entity references substituted for some literal characters, so if you cut and paste it, it won't run until you change them back (for example, make the less-than signs real ASCII less-than characters again).
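
The chain of coordinate transformations inside generate_vk_character() (distribution, renumbered, shifted, scaled) can be hard to follow in the middle of the loop. The following is only a sketch that applies the same arithmetic to a single point. The helper names shift_x(), shift_y(), and scale() are hypothetical, and the sample point, left margin, and baseline are illustrative values rather than data read from the glyph file.

```awk
# Sketch of the coordinate pipeline for one point.
# shift_x(), shift_y(), and scale() are hypothetical helper names,
# not part of vkf-hershey-generate.awk.

# X: renumber both the point and the left margin (+49), then measure
# the point's distance from the left margin.
function shift_x(x_distribution, leftmargin_distribution) {
   return (x_distribution + 49) - (leftmargin_distribution + 49)
}

# Y: flip the axis and renumber, then move the glyph baseline
# onto the cell baseline (10.5).
function shift_y(y_distribution, baseline) {
   return ((y_distribution * -1) + 49) - (((baseline * -1) + 49) - 10.5)
}

# Scale a shifted value so that 10.5 corresponds to the
# VARKON baseline of 5000, truncating to an integer.
function scale(shifted) {
   return int(shifted * (5000/10.5))
}

BEGIN {
   # Illustrative values: left margin -9, baseline 9 ("normal" size),
   # and a point at (0,-12) in distribution coordinates.
   x = shift_x(0, -9)
   y = shift_y(-12, 9)
   printf ("shifted: %.1f %.1f\n", x, y)
   printf ("scaled:  %d %d\n", scale(x), scale(y))
}
```

With these values the shifted point is (9, 31.5): nine units right of the left margin and 21 units above the cell baseline of 10.5. A point on the glyph baseline (Y equal to the baseline parameter) always shifts to exactly 10.5, whatever the size.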
