The mapping used in the "greek_utf-8" keymap supplied with vim perplexed me at first, but that was only because of my own ignorance. For example, I would have used the "w" character for lowercase omega, as it looks a bit like a lowercase omega; this keyboard mapping uses "v" (and uses "w" for final sigma). It turns out, of course, that the mapping used in this keymap is simply that of the traditional Greek keyboard layout (for modern monotonic and polytonic Greek?).
Here is an illustration of such a Greek keyboard layout, from the Wikipedia (EN) article on "Keyboard layout."
Traditional Greek Keyboard Layout
What is interesting in this keyboard mapping is simply the position of several alphabetic keys vis-a-vis a US ASCII keyboard:
A number of other comments may be made concerning this keyboard image relative to the mapping in the greek_utf-8 keymap, and of course I'll make them, but they may be irrelevant. I have no idea how representative the keyboard shown above is (keyboards differ, and even my own QWERTY keyboard differs from it in the placement of some non-Greek symbols). It might be best simply to skip to the next section, which concerns the greek_utf-8 mapping itself. In any event:
This is just a sort of summary by categories I've invented. For the real details, see the "greek_utf-8" keymap file itself (in /usr/share/vim/current/keymap/greek_utf-8.vim on SuSE 10.0).
QWERTY A maps to alpha, B to beta, etc. If unaccented, then these are ordinary, unaccented characters from the Unicode "Greek and Coptic" range (U+0370 to U+03FF). No problem.
These are, as noted earlier, simply conventions from Greek keyboards:
They, also, map to the Unicode "Greek and Coptic" range, U+0370 to U+03FF.
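As a quick check of the range claim, here is a minimal Python sketch. The four-entry SAMPLE table is only my own excerpt of mappings mentioned in the text (a, b, v, w); it is not the keymap file, and the helper name in_greek_and_coptic is made up for illustration.

```python
import unicodedata

# Unicode "Greek and Coptic" block: U+0370 to U+03FF inclusive.
GREEK_AND_COPTIC = range(0x0370, 0x0400)

# Hand-picked excerpt based on the text (v -> omega, w -> final sigma).
# This is an illustration, not the contents of greek_utf-8.vim.
SAMPLE = {
    "a": "\u03B1",  # alpha
    "b": "\u03B2",  # beta
    "v": "\u03C9",  # omega
    "w": "\u03C2",  # final sigma
}


def in_greek_and_coptic(ch):
    """Return True if the character lies in the Greek and Coptic block."""
    return ord(ch) in GREEK_AND_COPTIC


for key, ch in SAMPLE.items():
    print(key, f"U+{ord(ch):04X}", unicodedata.name(ch), in_greek_and_coptic(ch))
```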
To mark a character with an acute accent, type a (QWERTY) semicolon before the letter. It must be a letter capable of taking an acute accent.
The acute accent is also called the "oxia," and perhaps (in modern Greek only?) the "tonos."
To mark a character with a grave accent, type a (QWERTY) left-single-quote (the unshifted tilde key) before the letter. It must be a letter capable of taking a grave accent.
The grave accent is also called the "varia."
To mark a character with a circumflex accent, type a (QWERTY) tilde before the letter. It must be a letter capable of taking a circumflex accent.
Note that this is for data entry, not display. Whether the circumflex displays looking like a tilde or looking like a caret depends on the program displaying it. This displaying program won't necessarily be vi.
Note that the necessarily short vowels ε and ο cannot take a circumflex accent.
The circumflex accent is also called the "perispomeni." It is also represented, at times, by characters called "tilde" (though I don't believe it should be written as a tilde; that is another issue) and "inverted breve."
To mark a character with a rough breathing mark, type a (QWERTY) less-than before the letter. It must be a letter capable of taking a rough breathing mark.
The rough breathing mark is also called the "dasia" and sometimes (e.g., in Unicode) the "reversed comma above."
To mark a character with a smooth breathing mark, type a (QWERTY) greater-than before the letter. It must be a letter capable of taking a smooth breathing mark.
The smooth breathing mark is also called the "psili" and sometimes (e.g., in Unicode) the "comma above."
To mark a character with an iota subscript type a (QWERTY) vertical-bar after the letter. It must be a letter capable of taking an iota subscript.
The iota subscript is also called the "ypogegrammeni."
To mark a character with an Iota adscript type a (QWERTY) vertical-bar after the letter. It must be a letter capable of taking an Iota adscript.
This produces precomposed Unicode characters such as (in the first example above) U+1FBC "GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI." How this precomposed character is rendered depends on what is doing the rendering. The Unicode 4.1.0 chart, the glyphs for which are non-normative, shows it as an Α with a tiny iota to the right. Whatever font is being used with vim 6.3 as I type this shows it with a subscript iota. Whatever font is being used with Mozilla 1.7.11 as I view this (it's not clear which font this is) does the same.
To force the display of "ΑΙ" (an adscript Iota done as a full-sized capital on the lettering baseline) you'd have to type "ΑΙ". This would, of course, be encoded as U+0391 U+0399, not as a single precomposed character. This could affect such things as searching and typographic processing.
The iota adscript is also called the "prosgegrammeni."
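The searching point made above can be seen in a few lines of Python. This is only a sketch: the variable names are mine, and the one-character "line" is a made-up stand-in for real text.

```python
import unicodedata

precomposed = "\u1FBC"        # GREEK CAPITAL LETTER ALPHA WITH PROSGEGRAMMENI
two_letters = "\u0391\u0399"  # capital alpha followed by a full-sized capital iota

# Hypothetical text containing only the precomposed character.
line = precomposed

print(unicodedata.name(precomposed))
print(len(precomposed), len(two_letters))

# A plain substring search for the two-letter spelling does not find
# the precomposed character.
print(two_letters in line)

# Canonical decomposition does not help: the precomposed character
# decomposes to alpha plus a combining mark, not to alpha plus iota.
decomposed = unicodedata.normalize("NFD", precomposed)
print([f"U+{ord(c):04X}" for c in decomposed])
```

So a search for one encoding will not find the other unless the searching program does something more elaborate than compare code points.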
For multiple diacritical marks on the same character: Type the rough/smooth breathing marks first (if they exist), followed by the acute/grave/circumflex accent (if it exists), followed by the letter, followed by the iota adscript (if it exists).
The vim greek_utf-8 keymap maps specific sequences of typed keystrokes to particular Unicode characters. The order in which they're typed thus does make a difference. This is true both for simple diacritical marks (e.g., type the accent before the character to be accented, or the iota-subscript indicator after) and for stacked-up diacritical marks (e.g., greater-than + semicolon + a gives an acute accented alpha with a smooth breathing mark (ἄ, U+1F04), while semicolon + greater-than + a gives a right-double-pointy-quote followed by an unaccented alpha (U+00BB U+03B1)).
Typing the diacritical mark (e.g., the semicolon for acute accent) followed by a sufficient pause or followed by a character with which it cannot compose will result in the diacritical mark itself. (Exception: iota subscript (type the vertical bar twice: || gives ͺ); it is not possible to do an iota adscript by itself.)
The results here are in the Unicode "Greek Extended" range (U+1F00 to U+1FFF). (Exception: acute accent (U+0384).)
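The order sensitivity described above can be modelled with a toy sequence table. This Python sketch is not how vim itself processes keymaps (vim also involves pauses and timeouts, which are ignored here). The three-entry KEYMAP holds only the sequences named in the text, the ";>" entry is inferred from the U+00BB result mentioned there, and apply_keymap is a hypothetical helper.

```python
# Excerpt only: the sequences discussed in the text, not the real keymap file.
KEYMAP = {
    ">;a": "\u1F04",  # smooth breathing + acute + alpha
    ";>": "\u00BB",   # assumption inferred from the text: gives a right guillemet
    "a": "\u03B1",    # plain alpha
}


def apply_keymap(typed, keymap=KEYMAP):
    """Translate typed keys by greedy longest match; unmapped keys pass through."""
    longest = max(len(seq) for seq in keymap)
    out = []
    i = 0
    while i < len(typed):
        # Try the longest candidate sequence first, then shorter ones.
        for size in range(min(longest, len(typed) - i), 0, -1):
            chunk = typed[i:i + size]
            if chunk in keymap:
                out.append(keymap[chunk])
                i += size
                break
        else:
            # No mapped sequence starts here, so keep the key as typed.
            out.append(typed[i])
            i += 1
    return "".join(out)


for keys in (">;a", ";>a"):
    result = apply_keymap(keys)
    print(keys, [f"U+{ord(c):04X}" for c in result])
```

With this table, the breathing-first sequence yields one character and the accent-first sequence yields two, which matches the behaviour described in the text.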
Use an ordinary ASCII period. It maps to U+002E.
U+037E "GREEK QUESTION MARK" ("erotimatiko," visually like an ASCII/Latin-1 semicolon) is the unshifted upper-left key (where the "q" is on a QWERTY keyboard). The QWERTY semicolon, if pressed by itself with a pause afterward, or if pressed just before another key with which an acute accent cannot compose, produces an ASCII/Latin-1 semicolon (U+003B).
This one is easy to type (type either a capital W or the sequence ";." (semicolon period)). It's harder to explain, though.
The Unicode 4.1.0 standard calls U+0387 "GREEK ANO TELEIA," but prefers the "middle dot" character U+00B7 from the range U+0080 to U+00FF, "C1 Controls and Latin-1 Supplement" ("Latin-1 Punctuation"). The greek_utf-8 encoding calls it "GREEK ANO TELEIA" and allows two different ways to type it: ";." (following the KDE methods) or "W" (following Emacs, but also the Greek keyboard as illustrated above).
However, greek_utf-8 also calls U+003A (an ASCII colon) "GREEK EPEXIGIMATIKA OR ANO & KATO TELEIA - DEFINITION MISSING FROM UNICODE" (and indeed there is no "epexigimatika" or "kato teleia" in Unicode 4.1.0). It allows this to be typed as a capital Q, which corresponds with the Greek keyboard illustrated above.
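The relationship between the Greek punctuation code points and their Latin-1 look-alikes can be inspected with Python's unicodedata module. This is a sketch using whatever Unicode version your Python ships with (not 4.1.0), and it shows only what normalization does, not what the keymap does.

```python
import unicodedata

# The Greek-specific punctuation code points discussed above.
GREEK_PUNCTUATION = ("\u0387", "\u037E")

normalized = {}
for ch in GREEK_PUNCTUATION:
    # NFC replaces a character that has a singleton canonical
    # decomposition with the character it decomposes to.
    nfc = unicodedata.normalize("NFC", ch)
    normalized[ch] = nfc
    print(
        f"U+{ord(ch):04X}", unicodedata.name(ch),
        "->",
        f"U+{ord(nfc):04X}", unicodedata.name(nfc),
    )
```

One practical consequence: text containing U+0387 or U+037E that passes through a normalizing program may come out with U+00B7 or U+003B instead.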
To do this we need to enter Greek characters from the ordinary "Greek and Coptic" range (U+0370 to U+03FF) together with diacritical marks from the "Combining Diacritical Marks" range (U+0300 to U+036F). Unfortunately, these latter are not mapped in greek_utf-8. To use them in it means that we need first to enter the ordinary letter as discussed above, and then to enter the diacritical mark or marks using the "CTRL-V u XXXX" method.
For example, to insert an alpha with an acute accent, I'd do:
A pause of indefinite length may occur between the letter and the diacritical mark. Indeed, one can at this point exit insert mode, do something else, come back, and still have the successive characters appear combined. (They are, after all, simply successive characters underneath; the combining is merely a matter of presentation graphics.)
I have created a modified version of this keymap which includes these combining characters.
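That a base letter and its combining mark are "simply successive characters underneath" is easy to confirm outside vim. Here is a minimal Python sketch, using an alpha followed by U+0301 (COMBINING ACUTE ACCENT); the variable names are mine.

```python
import unicodedata

# Alpha followed by a combining acute accent: two code points.
sequence = "\u03B1\u0301"

print(len(sequence))
for ch in sequence:
    # combining() is 0 for a base character and nonzero for a combining mark.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))

# Normalizing to NFC folds the pair into a single precomposed character.
composed = unicodedata.normalize("NFC", sequence)
print(len(composed), f"U+{ord(composed):04X}", unicodedata.name(composed))
```

Note that the composed result comes from the "Greek and Coptic" range (alpha with tonos), not from "Greek Extended."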
The Unicode standard allows any sequence of characters but also indicates that when multiple combining diacritical marks can interact with each other typographically that order can be important. Conveniently, it gives as an example the Greek acute accent and smooth breathing mark ("COMMA ABOVE"). In this case, since the marks do not stack vertically but instead combine horizontally with the breathing mark first, the breathing mark must come first in the sequence. Thus the first example below is correct, while the second is not:
U+03B1 U+0314 U+0301 gives: ἅ
U+03B1 U+0301 U+0314 gives: ά̔
Note: My mozilla isn't rendering these properly, but vim/xterm is. Here's an image from a screen-grab from vim/xterm:
Order of Combining Diacritical Marks Example
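The two orderings above are different data, not just different renderings. The following Python sketch (variable names are mine) checks this using normalization. It assumes nothing about vim or any particular font.

```python
import unicodedata

breathing_first = "\u03B1\u0314\u0301"  # alpha, breathing mark, acute
accent_first = "\u03B1\u0301\u0314"     # alpha, acute, breathing mark

# Both marks have the same canonical combining class, so normalization
# is not allowed to reorder them: the typed order is significant.
classes = [unicodedata.combining(c) for c in "\u0314\u0301"]
print(classes)

nfc_breathing_first = unicodedata.normalize("NFC", breathing_first)
nfc_accent_first = unicodedata.normalize("NFC", accent_first)

for label, text in (
    ("breathing first", nfc_breathing_first),
    ("accent first", nfc_accent_first),
):
    print(label, [f"U+{ord(c):04X}" for c in text])
```

Only the breathing-first sequence composes to a single "Greek Extended" character; the other ordering stays as more than one code point.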
The diacritical marks of interest are, as noted in the Unicode section:
(See the Unicode section earlier, or the Unicode standard, for notes on U+0302, U+0303, U+0343, and U+0344.)
Final problem: vim supports only two combining diacritical marks. Sometimes we need three (e.g., eta, breathing, accent, and iota subscript). For example:
U+03B7 η
U+03B7 U+0314 ἡ
U+03B7 U+0314 U+0301 ἥ
U+03B7 U+0314 U+0345 ᾑ
U+03B7 U+0314 U+0301 U+0345 ᾕ
My vim presently shows:
Combining Three Diacritical Marks in vim
That is, the final version with three diacritical marks does not show the last one at all.
By way of contrast, my mozilla presently shows:
Combining Three Diacritical Marks in Mozilla
That is, it shows the iota subscript, but shows it spaced over to the next character position rather than as a subscript.
So the diacritical marks have been entered in this way, and are there in the file; they're just invisible.
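To confirm that all three marks really are in the data even when a program fails to display them, one can count them outside the editor. Here is a minimal Python sketch using the eta example; the names are mine.

```python
import unicodedata

# Eta, breathing mark, acute, iota subscript: one base plus three marks.
sequence = "\u03B7\u0314\u0301\u0345"

# Combining marks are the characters with a nonzero combining class.
marks = [ch for ch in sequence if unicodedata.combining(ch) != 0]
print(len(sequence), len(marks))
for ch in marks:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# NFC folds the whole sequence into one precomposed character.
composed = unicodedata.normalize("NFC", sequence)
print(f"U+{ord(composed):04X}", unicodedata.name(composed))
```

Whether substituting the precomposed form helps with any particular display problem is not something this sketch establishes.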
Type a hyphen (minus sign) before a letter which can take a macron.
The macron is also called (or is) the "long" mark. A phonetician would probably use a colon suffixed.
Type a caret (shift-6) before a letter which can take a vraxy.
The greek_utf-8 keymap calls this "braxy"; also "breve." It's the "short" mark.
The QWERTY colon produces a "diaeresis" (double-dot-above, "umlaut"). It does so either by itself (as U+00A8, "DIAERESIS" from the Unicode range U+0080 to U+00FF, "C1 Controls and Latin-1 Supplement" ("Latin-1 Punctuation")), or precombined in the "Greek and Coptic" (not "Greek Extended") range (with acute, grave, and modern Greek accents).
(Use the same methods as discussed above for common diacritical marks.)
Note: The character which looks like a large letter T with "hands" on the ends of its arms is "archaic sampi." It is not in Unicode.
The keymap does not include characters in the ranges:
None of the keymaps distributed with vim 6.3 include these. I haven't yet written keymaps for them.
All portions of this document not noted otherwise are Copyright © 2006 by David M. MacMillan and Rollande Krandall.
Circuitous Root is a Registered Trademark of David M. MacMillan and Rollande Krandall.
This work is licensed under the Creative Commons "Attribution - ShareAlike" license. See http://creativecommons.org/licenses/by-sa/3.0/ for its terms.
Linux is a registered trademark of Linus Torvalds.
SuSE is a registered trademark of Novell Corporation.
Unicode is a registered trademark of The Unicode Consortium.
Presented originally by Circuitous Root®