1. General

My text editor of choice is "vi", or more specifically "vim" (VI Improved). For further information, see the vim home page. If you're hazy on the distinction between a text editor and, say, a word processor, then you'd probably better avoid this chapter entirely.

As usual, nothing about the discussion here is complete. These are just random notes for my own use. Write it down so I won't forget it; post it in public so I always know where I put what I wrote down.

As I write this, I'm using vim 6.3 as distributed with SuSE Linux 10.0. Things noted here might or might not work on earlier versions or on versions compiled differently. All of the multibyte stuff seems to be enabled. I use the ordinary "in-terminal" vi, not the GUI-enabled gvim.

2. Unicode Entry in Hexadecimal

To enter arbitrary Unicode in vim as up to four hex digits (in Insert mode, where "xxxx" stands for the digits):

CTRL-V u xxxx

Terminate the entry of digits either by typing all four or by typing a non-hex character. If you do the latter, that character will be interpreted as itself (so if you type a space and don't want it, you'll have to backspace over it; if you type an Esc you'll leave Insert mode).

Four hex digits are enough for everything in the Basic Multilingual Plane (BMP), but the Supplementary Multilingual Plane (SMP; where for example the Cypriot Syllabary is) and the rest of Unicode require more digits. To enter arbitrary Unicode in vim as up to eight hex digits (note the capital U):

CTRL-V U xxxxxxxx

Terminate as above.
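As a rough illustration of the four-versus-eight digit distinction, here is a small Python sketch (Python is only a convenience for the example; it has nothing to do with vim itself). It reports which entry form a given character would need. The helper name `vim_hex_entry` is hypothetical, and the `CTRL-V u` / `CTRL-V U` forms are the ones described under `:help i_CTRL-V_digit`.

```python
def vim_hex_entry(ch: str) -> str:
    """Return the Insert-mode keystrokes that would enter `ch` by code point.

    Hypothetical helper for illustration only; it just formats a string.
    """
    cp = ord(ch)
    if cp <= 0xFFFF:
        # Basic Multilingual Plane: four hex digits are enough.
        return f"CTRL-V u {cp:04x}"
    # Beyond the BMP: the capital-U form takes up to eight hex digits.
    return f"CTRL-V U {cp:08x}"


print(vim_hex_entry("\u03b1"))      # GREEK SMALL LETTER ALPHA
print(vim_hex_entry("\U00010800"))  # CYPRIOT SYLLABLE A
```

The first call prints `CTRL-V u 03b1` and the second `CTRL-V U 00010800`, since the Cypriot Syllabary sits outside the BMP.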

3. Unicode Entry via digraphs

It is also possible to enter multibyte characters using "digraphs" (pairs of characters mapped to multibyte sequences). I don't do this, so I'll just note here how the documentation says it is to be done. To see what digraphs are defined:

:digraphs

To enter a digraph:

CTRL-K ab 

where "ab" is the digraph.

4. Identifying a Character

To identify what a character really is, move over it (in Normal mode) and press:

ga

Note: "g8" will show the literal multibyte hex value, which in UTF-8 encoding won't be the same as the UCS-2 or UCS-4 Unicode value. You don't want to (or I don't want to) do UTF-8 decoding in your (my) head.
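To make the difference between the stored UTF-8 bytes (what "g8" reports) and the Unicode value concrete, here is a minimal Python sketch. The sample character (Greek small alpha) is just an assumption for the example; any character would do.

```python
ch = "\u03b1"  # GREEK SMALL LETTER ALPHA, used here only as a sample

code_point = ord(ch)             # the Unicode value
utf8_bytes = ch.encode("utf-8")  # the bytes actually stored in a UTF-8 file

print(f"code point:  U+{code_point:04X}")
print("UTF-8 bytes:", " ".join(f"{b:02x}" for b in utf8_bytes))

# Going the other way: decode a UTF-8 byte sequence back to its code point,
# which is the decoding one would rather not do by hand.
decoded = bytes.fromhex("ceb1").decode("utf-8")
print(f"decoded:     U+{ord(decoded):04X}")
```

For this character the code point is U+03B1 while the UTF-8 bytes are ce b1, so the two displays have no digits in common.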

5. keymaps

On SuSE 10.0, the vim keymaps live in:


which, through the magic of symbolic links, is also:


The ancient Greek keymap I use is either the default:


or a modification of it that I've made:


I've also had to write keymaps for the Cypriot Syllabary.

I'll cover the use of these keymaps in another section, but the ultimate reference is always simply reading the keymap source.

6. vimrc and keymap Switching

Put this in $HOME/.vimrc to make F2 switch to the Greek keymap and F3 switch it off again, in both Normal and Insert modes:

:set encoding=utf-8 
:map <F2> :set keymap=greek_utf-8<CR> 
:map <F3> :set keymap=<CR> 
:imap <F2> <Esc><F2>a 
:imap <F3> <Esc><F3>a 

The ":set encoding" may be redundant nowadays.

The ":map" sets up a keyboard mapping; F2 and higher function keys have no default values in vim, and so are handy to use for this. The command mapped is the setting of the keymap. The "<F2>", "<CR>", etc. are typed literally (that is, type less-than-sign, F, 2, greater-than-sign; don't simply press the F2 key). The :map applies to Normal, Visual, and Operator Pending modes. The :imap applies to Insert mode. In Insert mode, the command escapes to Normal mode, executes the previously defined F2 command (to do the keymapping), and then with the "a" gets back into insert mode at the right place.

To search, enter the greek_utf-8 keymap mode first (<F2>) and then enter / to search.

To do the same with a user-written keymap, first put it in the system keymap directory (I'll assume I have root, as I do this on a home system). Then, e.g. for my Cypriot Syllabary keymap (/usr/share/vim/current/keymap/cypriot-syllabary_utf-8.vim):

:map <F6> :set keymap=cypriot-syllabary_utf-8<CR> 
:map <F7> :set keymap=<CR> 

Aside: The following non-Greek function key setup is handy for entering vocabulary lists which have italics in HTML:

:imap <F4> <i> 
:imap <F5> </i> 
:map <F4> i<F4><Esc> 
:map <F5> i<F5><Esc> 

7. In-vi(m) Help Pointers

Useful online help in vim:

:help map-overview
:help unicode
:help usr_45.txt
    45.3 encodings; using Unicode in the GUI
    45.5 entering language text (keymaps)
:echo globpath(&rtp, "keymap/*.vim")
:help usr_40.txt
    about making new commands
:help i_CTRL-V_digit
    about hexadecimal character entry

8. Using keymaps

9. Unicode Byte Order Marker (BOM)

I find that with the encoding set up correctly it is not necessary to mess with details such as the Unicode "Byte Order Marker" in my UTF-8 files. Should it be necessary to insert the Unicode Byte Order Marker in a UTF-8 file, though, use CTRL-V u FEFF. Here's an example of the results of doing this with a new (empty) file:

$ hexdump testfile 
0000000 bbef 0abf 

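which is the BOM (U+FEFF) in UTF-8 (0xEF 0xBB 0xBF) plus a linefeed (0x0A). By default hexdump prints 16-bit words in host byte order, so on a little-endian machine each pair of bytes appears swapped.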
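The hexdump output can be reproduced with a short Python sketch. It assumes, as in the example file, that the content is just U+FEFF followed by a linefeed, and that the dump was taken on a little-endian machine.

```python
import struct

data = "\ufeff\n".encode("utf-8")  # BOM followed by a linefeed, as in testfile

# The raw bytes, in file order.
print(" ".join(f"{b:02x}" for b in data))

# The same bytes read as two little-endian 16-bit words, which is how
# hexdump's default format displays them on a little-endian machine.
words = struct.unpack("<2H", data)
print(" ".join(f"{w:04x}" for w in words))
```

The first line printed is `ef bb bf 0a` and the second is `bbef 0abf`, matching the dump above. Running `hexdump -C` instead shows the bytes in file order, which is easier to read for this kind of check.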
