KOI8-R
KOI8-R (RFC 1489) is an 8-bit character encoding designed to cover Russian, which uses a Cyrillic alphabet. It also happens to cover Bulgarian, but has not been used for that purpose since CP1251 was accepted. A derivative encoding, KOI8-U, adds Ukrainian characters. The original KOI-8 encoding was designed by Soviet authorities in 1974. KOI8 remains much more commonly used than ISO 8859-5, which never really caught on. Another common Cyrillic character encoding is Windows-1251 (the same code page as CP1251). These older code pages are being replaced by Unicode, which is now the more common way to represent Cyrillic together with other languages.
Microsoft Windows assigns KOI8-R the code page number 20866. IBM assigns it code page 878.[1][2]
KOI8 stands for Kod Obmena Informatsiey, 8 bit (Russian: Код Обмена Информацией, 8 бит) which means "Code for Information Exchange, 8 bit".
The KOI8 character sets place the Russian Cyrillic letters in pseudo-Roman order rather than in the normal Cyrillic alphabetical order used by ISO 8859-5 and Unicode: most Cyrillic letters sit 128 positions above a similar-sounding Latin letter. Although this may seem unnatural, it has a useful property: if the 8th bit is stripped, the text remains partially readable in ASCII and may convert to syntactically correct KOI7. The letter case is inverted in the process, because lowercase Cyrillic letters are aligned with uppercase Latin letters and vice versa. For instance, "Русский Текст" in KOI8-R becomes "rUSSKIJ tEKST" ("Russian Text") if the 8th bit is stripped; attempting to interpret the ASCII string "rUSSKIJ tEKST" as KOI7 yields "РУССКИЙ ТЕКСТ". KOI8 was based on Russian Morse code, which was derived from Latin Morse code by sound similarity and therefore has the same relationship to the Latin Morse codes for A-Z as KOI8 has to ASCII.
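The bit-stripping behaviour and the pseudo-Roman ordering can be sketched in a few lines of Python. This is only an illustration: it assumes Python's built-in `koi8_r` codec, and the helper names `strip_high_bit` and `sort_by_koi8_bytes` are hypothetical, chosen for this example.

```python
def strip_high_bit(text: str) -> str:
    """Encode text as KOI8-R, clear the 8th bit of every byte,
    and read the result back as ASCII."""
    koi8_bytes = text.encode("koi8_r")
    stripped = bytes(b & 0x7F for b in koi8_bytes)
    return stripped.decode("ascii")


def sort_by_koi8_bytes(letters: str) -> str:
    """Sort letters by their KOI8-R byte value (hypothetical helper),
    which follows the pseudo-Roman order, not the Cyrillic alphabet."""
    return "".join(sorted(letters, key=lambda ch: ch.encode("koi8_r")))


print(strip_high_bit("Русский Текст"))  # rUSSKIJ tEKST
print(sort_by_koi8_bytes("абвгд"))      # абдгв (bytes C1, C2, C4, C7, D7)
```

The second call shows why sorting raw KOI8-R bytes does not give Cyrillic alphabetical order: the letters come out in the order of their Latin look-alikes A, B, D, G, W.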
Character set
The following table shows the KOI8-R encoding.[1][3] Each character is shown with its equivalent Unicode code point.
| | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0_ (0) | | | | | | | | | | | | | | | | |
| 1_ (16) | | | | | | | | | | | | | | | | |
| 2_ (32) | SP 0020 | ! 0021 | " 0022 | # 0023 | $ 0024 | % 0025 | & 0026 | ' 0027 | ( 0028 | ) 0029 | * 002A | + 002B | , 002C | - 002D | . 002E | / 002F |
| 3_ (48) | 0 0030 | 1 0031 | 2 0032 | 3 0033 | 4 0034 | 5 0035 | 6 0036 | 7 0037 | 8 0038 | 9 0039 | : 003A | ; 003B | < 003C | = 003D | > 003E | ? 003F |
| 4_ (64) | @ 0040 | A 0041 | B 0042 | C 0043 | D 0044 | E 0045 | F 0046 | G 0047 | H 0048 | I 0049 | J 004A | K 004B | L 004C | M 004D | N 004E | O 004F |
| 5_ (80) | P 0050 | Q 0051 | R 0052 | S 0053 | T 0054 | U 0055 | V 0056 | W 0057 | X 0058 | Y 0059 | Z 005A | [ 005B | \ 005C | ] 005D | ^ 005E | _ 005F |
| 6_ (96) | ` 0060 | a 0061 | b 0062 | c 0063 | d 0064 | e 0065 | f 0066 | g 0067 | h 0068 | i 0069 | j 006A | k 006B | l 006C | m 006D | n 006E | o 006F |
| 7_ (112) | p 0070 | q 0071 | r 0072 | s 0073 | t 0074 | u 0075 | v 0076 | w 0077 | x 0078 | y 0079 | z 007A | { 007B | \| 007C | } 007D | ~ 007E | |
| 8_ (128) | ─ 2500 | │ 2502 | ┌ 250C | ┐ 2510 | └ 2514 | ┘ 2518 | ├ 251C | ┤ 2524 | ┬ 252C | ┴ 2534 | ┼ 253C | ▀ 2580 | ▄ 2584 | █ 2588 | ▌ 258C | ▐ 2590 |
| 9_ (144) | ░ 2591 | ▒ 2592 | ▓ 2593 | ⌠ 2320 | ■ 25A0 | ∙ 2219 | √ 221A | ≈ 2248 | ≤ 2264 | ≥ 2265 | NBSP 00A0 | ⌡ 2321 | ° 00B0 | ² 00B2 | · 00B7 | ÷ 00F7 |
| A_ (160) | ═ 2550 | ║ 2551 | ╒ 2552 | ё 0451 | ╓ 2553 | ╔ 2554 | ╕ 2555 | ╖ 2556 | ╗ 2557 | ╘ 2558 | ╙ 2559 | ╚ 255A | ╛ 255B | ╜ 255C | ╝ 255D | ╞ 255E |
| B_ (176) | ╟ 255F | ╠ 2560 | ╡ 2561 | Ё 0401 | ╢ 2562 | ╣ 2563 | ╤ 2564 | ╥ 2565 | ╦ 2566 | ╧ 2567 | ╨ 2568 | ╩ 2569 | ╪ 256A | ╫ 256B | ╬ 256C | © 00A9 |
| C_ (192) | ю 044E | а 0430 | б 0431 | ц 0446 | д 0434 | е 0435 | ф 0444 | г 0433 | х 0445 | и 0438 | й 0439 | к 043A | л 043B | м 043C | н 043D | о 043E |
| D_ (208) | п 043F | я 044F | р 0440 | с 0441 | т 0442 | у 0443 | ж 0436 | в 0432 | ь 044C | ы 044B | з 0437 | ш 0448 | э 044D | щ 0449 | ч 0447 | ъ 044A |
| E_ (224) | Ю 042E | А 0410 | Б 0411 | Ц 0426 | Д 0414 | Е 0415 | Ф 0424 | Г 0413 | Х 0425 | И 0418 | Й 0419 | К 041A | Л 041B | М 041C | Н 041D | О 041E |
| F_ (240) | П 041F | Я 042F | Р 0420 | С 0421 | Т 0422 | У 0423 | Ж 0416 | В 0412 | Ь 042C | Ы 042B | З 0417 | Ш 0428 | Э 042D | Щ 0429 | Ч 0427 | Ъ 042A |
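Individual rows of the table can be reproduced programmatically. The following is a minimal sketch, assuming Python 3.9 or later and its built-in `koi8_r` codec; the function name `koi8r_row` is hypothetical and is not taken from any of the cited sources.

```python
def koi8r_row(high_nibble: int) -> list[str]:
    """Return the 16 cells of one table row as 'character CODEPOINT' strings.

    high_nibble is the row label, e.g. 0xC for the row starting at byte 0xC0.
    """
    cells = []
    for low_nibble in range(16):
        byte_value = (high_nibble << 4) | low_nibble
        # Decode a single KOI8-R byte to its Unicode character.
        ch = bytes([byte_value]).decode("koi8_r")
        cells.append(f"{ch} {ord(ch):04X}")
    return cells


# Print the row for bytes 0xC0-0xCF in the same layout as the table.
print(" | ".join(koi8r_row(0xC)))
```

Decoding one byte at a time keeps the mapping explicit: each cell pairs the character at a byte position with the four-digit hexadecimal Unicode code point shown in the table.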
See also
- KOI8-B (a derivation of KOI8-R with only the letter subset implemented)
- KOI character encodings
- RELCOM
References
- [1] "SBCS code page information - CPGID: 00878 / Name: Russian internet koi8-r". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. IBM. C-H 3-3220-050. Archived from the original on 2017-02-18. Retrieved 2017-02-18.
- [2] "CCSID information document; CCSID 878; KOI8-R CYRILLIC". IBM. Retrieved 2017-02-18.
- [3] Richter, Helmut (2016-01-04) [1999-08-18]. "KOI8-R.TXT". 2.0. Retrieved 2016-12-09.
Further reading
- Flohr, Guido; Kiss, Gabor; Chernov, Andrey A. (2016) [2006]. "Locale::RecodeData::KOI8_R - Conversion routines for KOI8-R". CPAN libintl-perl. 1.0. Archived from the original on 2017-01-15. Retrieved 2017-01-15.
- Kostis, Kosta. "koi8-r (Russian U*IX encoding, also used by RELCOM)". 1.20. Archived from the original on 2017-01-16. Retrieved 2017-01-16.
- RFC 1489
- "KOI8-R (RFC 1489)". Kermit. Columbia University. Retrieved 2017-02-18.
- Kornai, Andras; Birnbaum, David J.; da Cruz, Frank; Davis, Bur; Fowler, George; Paine, Richard B.; Paperno, Slava; Simonsen, Keld J.; Thobe, Glenn E.; Vulis, Dimitri; van Wingen, Johan W. (1993-03-13). "CYRILLIC ENCODING FAQ Version 1.3". 1.3. Retrieved 2017-02-18.
External links
- Universal Cyrillic decoder, an online program that may help recover Cyrillic texts with broken KOI8-R or other character encodings.
- "The Home of the KOI8-R since 1995". Retrieved 2016-12-05.
- Czyborra, Roman (1998-11-30) [1998-05-25]. "The Cyrillic Charset Soup". Archived from the original on 2016-12-03. Retrieved 2016-12-03.
- Hohlov, Yu. E. "Cyrillic Information Representation in Electronic Form - Character Set (Code Page) Tables". Archived from the original on 2016-12-05. Retrieved 2016-12-05.
- Nechayev, Valentin (2013) [2001]. "Review of 8-bit Cyrillic encodings universe". Archived from the original on 2016-12-05. Retrieved 2016-12-05.