Code Table

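This is a tabular overview of the similarities and differences of the following text encodings: ISO-8859-1 to ISO-8859-16 (there is no part 12), Windows-1250, Windows-1252, IBM CP437 and IBM CP850.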

Notes

ISO/IEC 8859 (without hyphen), as defined by ISO/IEC, does not assign control codes to the 0x00-0x1f and 0x7f-0x9f ranges. This is done by its superset ISO-8859 (with hyphen), as defined by IANA, which assigns the C0 and C1 control codes to these code points, as given below.

The "Latin-x" naming of the various ISO-8859 variants is non-continuous. Note the "holes" (8859-5 to 8859-8 and 8859-11 are not "Latin-x").

Standardization of ISO 8859-12 (Devanagari) was officially abandoned in 1997.

IBM CP858 differs from CP850 in only one character: 0xD5 (LATIN SMALL LETTER DOTLESS I), which was replaced with the Euro currency symbol.

Several devices from the IBM codepage era interpret code points 0x01 - 0x1F and 0x7F as graphic characters, but the official encoding tables list the same C0 control codes as given for the Windows and ISO/IEC codepages. The graphic characters are not given here.

Historically, CP1252 was based on an ANSI draft, and calling this encoding "ANSI" is still common in the Microsoft universe despite being a misnomer.

Unicode code points are given as 16-bit hexadecimal values on this page. This is enough to cover the Basic Multilingual Plane and, by implication, the characters in the code tables below. However, Unicode also specifies several Supplementary Planes (e.g. historic scripts, extended CJK ideographs, emoticons), which lie outside the 16-bit range. If you need to hold all conceivable Unicode code points, use a 32-bit integer.
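As a small illustration of the 16-bit limit, the following Python sketch checks whether a code point lies in the Basic Multilingual Plane. The helper name `in_bmp` is made up for this example and is not part of any library.

```python
def in_bmp(code_point: int) -> bool:
    """Return True if the code point fits into 16 bits (Basic Multilingual Plane)."""
    return 0 <= code_point <= 0xFFFF


# U+20AC (Euro sign) lies in the BMP; U+2F929 (a supplementary CJK ideograph) does not.
for cp in (0x20AC, 0x2F929):
    print(f"U+{cp:04X}: in BMP = {in_bmp(cp)}, bits needed = {cp.bit_length()}")
```

The highest Unicode code point, U+10FFFF, needs 21 bits, so a 32-bit integer is always sufficient.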

There are several characters that, from a casual look at their glyphs, seem to be identical, but are not. Special attention is advised at code point 0xd0 (Unicode characters 0x00d0 'Ð' and 0x0110 'Đ'). Other "close" characters are found at 0xd5, 0xd9, 0xe3, 0xf1, 0xf5, 0xfb and 0xf0.

Notation

The top row in each table cell is the character (or the acronym of a control code / special character, see note above).

Bottom row is the Unicode code point as 16-bit hexadecimal (but see note above).

On mouse-over, a tooltip appears with three lines of information:

The octal encoding is presented for use in C/C++ string literals (\ooo is well-defined, whereas \xhh fails to work as expected if the next character is also a valid hex digit).

Control codes and special characters are given as their two- or three-letter acronym, with a bold-line box around their table cell. (Character 0x20 is the standard space).

A code point not assigned a character in a given encoding is given a black box.

Code points that are assigned a character in the given encoding which differs from ISO-8859-1 are given a grey box. (In case of Windows-1250, grey boxes indicate differences from ISO-8859-2.)

Combining characters are given light grey boxes.

0x00 - 0x7f - the similarities

The first 128 code points (0x00 - 0x7f) are identical for all encodings, including UTF-8 (which was the whole point behind the latter).

Code  ...0  ...1  ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9  ...A  ...B  ...C  ...D  ...E  ...F
0...  NUL   SOH   STX   ETX   EOT   ENQ   ACK   BEL   BS    HT    LF    VT    FF    CR    SO    SI
      0000  0001  0002  0003  0004  0005  0006  0007  0008  0009  000a  000b  000c  000d  000e  000f
1...  DLE   DC1   DC2   DC3   DC4   NAK   SYN   ETB   CAN   EM    SUB   ESC   FS    GS    RS    US
      0010  0011  0012  0013  0014  0015  0016  0017  0018  0019  001a  001b  001c  001d  001e  001f
2...  SP    !     "     #     $     %     &     '     (     )     *     +     ,     -     .     /
      0020  0021  0022  0023  0024  0025  0026  0027  0028  0029  002a  002b  002c  002d  002e  002f
3...  0     1     2     3     4     5     6     7     8     9     :     ;     <     =     >     ?
      0030  0031  0032  0033  0034  0035  0036  0037  0038  0039  003a  003b  003c  003d  003e  003f
4...  @     A     B     C     D     E     F     G     H     I     J     K     L     M     N     O
      0040  0041  0042  0043  0044  0045  0046  0047  0048  0049  004a  004b  004c  004d  004e  004f
5...  P     Q     R     S     T     U     V     W     X     Y     Z     [     \     ]     ^     _
      0050  0051  0052  0053  0054  0055  0056  0057  0058  0059  005a  005b  005c  005d  005e  005f
6...  `     a     b     c     d     e     f     g     h     i     j     k     l     m     n     o
      0060  0061  0062  0063  0064  0065  0066  0067  0068  0069  006a  006b  006c  006d  006e  006f
7...  p     q     r     s     t     u     v     w     x     y     z     {     |     }     ~     DEL
      0070  0071  0072  0073  0074  0075  0076  0077  0078  0079  007a  007b  007c  007d  007e  007f
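The claim that all encodings agree in this range can be checked mechanically. The following is a minimal sketch, assuming the codec names that Python's standard library uses for the encodings on this page:

```python
# Codec names as known to Python's standard library (an assumption of this sketch).
CODECS = [f"iso8859_{n}" for n in range(1, 17) if n != 12]
CODECS += ["cp1250", "cp1252", "cp437", "cp850", "utf_8"]

ascii_bytes = bytes(range(0x80))            # byte values 0x00 .. 0x7f
reference = ascii_bytes.decode("ascii")     # U+0000 .. U+007F


def decodes_like_ascii(codec: str) -> bool:
    """True if the codec maps bytes 0x00-0x7f to U+0000-U+007F."""
    return ascii_bytes.decode(codec) == reference


mismatches = [codec for codec in CODECS if not decodes_like_ascii(codec)]
print("codecs differing from ASCII in 0x00-0x7f:", mismatches)
```

Note that the IBM codecs in Python follow the official tables, so 0x01 - 0x1f decode to control codes rather than to the graphic characters some devices display.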

0x80 - 0x9f - the twiddly bits

Having a text with codes in this area is a good indication that you are looking at an IBM or Windows encoding, not ISO/IEC (as the C1 control characters are hardly ever used in "normal" text).

Code  ISO-8859  Windows-1250  Windows-1252  CP437  CP850
0x80  PAD       €             €             Ç      Ç
      0080      20ac          20ac          00c7   00c7
0x81  HOP       --            --            ü      ü
      0081                                  00fc   00fc
0x82  BPH       ‚             ‚             é      é
      0082      201a          201a          00e9   00e9
0x83  NBH       --            ƒ             â      â
      0083                    0192          00e2   00e2
0x84  IND       „             „             ä      ä
      0084      201e          201e          00e4   00e4
0x85  NEL       …             …             à      à
      0085      2026          2026          00e0   00e0
0x86  SSA       †             †             å      å
      0086      2020          2020          00e5   00e5
0x87  ESA       ‡             ‡             ç      ç
      0087      2021          2021          00e7   00e7
0x88  HTS       --            ˆ             ê      ê
      0088                    02c6          00ea   00ea
0x89  HTJ       ‰             ‰             ë      ë
      0089      2030          2030          00eb   00eb
0x8A  VTS       Š             Š             è      è
      008a      0160          0160          00e8   00e8
0x8B  PLD       ‹             ‹             ï      ï
      008b      2039          2039          00ef   00ef
0x8C  PLU       Ś             Œ             î      î
      008c      015a          0152          00ee   00ee
0x8D  RI        Ť             --            ì      ì
      008d      0164                        00ec   00ec
0x8E  SS2       Ž             Ž             Ä      Ä
      008e      017d          017d          00c4   00c4
0x8F  SS3       Ź             --            Å      Å
      008f      0179                        00c5   00c5
0x90  DCS       --            --            É      É
      0090                                  00c9   00c9
0x91  PU1       ‘             ‘             æ      æ
      0091      2018          2018          00e6   00e6
0x92  PU2       ’             ’             Æ      Æ
      0092      2019          2019          00c6   00c6
0x93  STS       “             “             ô      ô
      0093      201c          201c          00f4   00f4
0x94  CCH       ”             ”             ö      ö
      0094      201d          201d          00f6   00f6
0x95  MW        •             •             ò      ò
      0095      2022          2022          00f2   00f2
0x96  SPA       –             –             û      û
      0096      2013          2013          00fb   00fb
0x97  EPA       —             —             ù      ù
      0097      2014          2014          00f9   00f9
0x98  SOS       --            ˜             ÿ      ÿ
      0098                    02dc          00ff   00ff
0x99  SGCI      ™             ™             Ö      Ö
      0099      2122          2122          00d6   00d6
0x9A  SCI       š             š             Ü      Ü
      009a      0161          0161          00dc   00dc
0x9B  CSI       ›             ›             ¢      ø
      009b      203a          203a          00a2   00f8
0x9C  ST        ś             œ             £      £
      009c      015b          0153          00a3   00a3
0x9D  OSC       ť             --            ¥      Ø
      009d      0165                        00a5   00d8
0x9E  PM        ž             ž             ₧      ×
      009e      017e          017e          20a7   00d7
0x9F  APC       ź             Ÿ             ƒ      ƒ
      009f      017a          0178          0192   0192

The ISO-8859 column applies to all parts (-1 to -16). Cells marked "--" are not assigned a character in that encoding.
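The indication described for this range can be turned into a quick check. The sketch below uses two hypothetical helper names and assumes the input is text in one of the single-byte encodings on this page. UTF-8 data also contains bytes in this range, so it must be ruled out first.

```python
def c1_range_bytes(data: bytes) -> list:
    """Return the distinct byte values in 0x80-0x9f that occur in data."""
    return sorted({b for b in data if 0x80 <= b <= 0x9F})


def looks_like_windows_or_ibm(data: bytes) -> bool:
    """Heuristic: bytes in 0x80-0x9f hint at a Windows or IBM codepage."""
    return bool(c1_range_bytes(data))


sample = "\u201cquoted\u201d".encode("cp1252")   # curly quotes become 0x93 and 0x94
print([hex(b) for b in c1_range_bytes(sample)], looks_like_windows_or_ibm(sample))
```

This is only a hint, not a proof: a file may legitimately contain C1 control codes, and a Windows-encoded text without characters from this range passes unnoticed.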

0xA0 - 0xFF - The Long March

This is where all encodings have their respective special characters. If you have text with codes from this area (and none in the 0x80 - 0x9f area above), you can only guess as to which encoding you are looking at. Try to find out which characters would make sense in their respective places, and strike off those encodings that would not give meaningful results. (Always keeping in mind that there might be typos in the text and perhaps not all characters do make sense.)

Code ISO-8859 Windows IBM
-1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -13 -14 -15 -16 -1250 -1252 CP437 CP850
0xA0 NBSP
00a0
á
00e1
0xA1 ¡
00a1
Ą
0104
Ħ
0126
Ą
0104
Ё
0401
 
2018
  ¡
00a1
Ą
0104

0e01

201d

1e02
¡
00a1
Ą
0104
ˇ
02c7
¡
00a1
í
00ed
0xA2 ¢
00a2
˘
02d8
ĸ
0138
Ђ
0402
 
2019
¢
00a2
Ē
0112

0e02
¢
00a2

1e03
¢
00a2
ą
0105
˘
02d8
¢
00a2
ó
00f3
0xA3 £
00a3
Ł
0141
£
00a3
Ŗ
0156
Ѓ
0403
  £
00a3
Ģ
0122

0e03
£
00a3
Ł
0141
Ł
0141
£
00a3
ú
00fa
0xA4 ¤
00a4
Є
0404
¤
00a4

20ac
¤
00a4
Ī
012a

0e04
¤
00a4
Ċ
010a

20ac
¤
00a4
ñ
00f1
0xA5 ¥
00a5
Ľ
013d
  Ĩ
0128
Ѕ
0405
 
20af
¥
00a5
Ĩ
0128

0e05

201e
ċ
010b
¥
00a5

201e
Ą
0104
¥
00a5
Ñ
00d1
0xA6 ¦
00a6
Ś
015a
Ĥ
0124
Ļ
013b
І
0406
  ¦
00a6
Ķ
0136

0e06
¦
00a6

1e0a
Š
0160
¦
00a6
¦
00a6
ª
00aa
0xA7 §
00a7
Ї
0407
  §
00a7

0e07
§
00a7
º
00ba
0xA8 ¨
00a8
Ј
0408
  ¨
00a8
Ļ
013b

0e08
Ø
00d8

1e80
š
0161
¨
00a8
¿
00bf
0xA9 ©
00a9
Š
0160
İ
0130
Š
0160
Љ
0409
  ©
00a9
Đ
0110

0e09
©
00a9

2310
®
00ae
0xAA ª
00aa
Ş
015e
Ē
0112
Њ
040a
  ͺ
037a
×
00d7
ª
00aa
Š
0160

0e0a
Ŗ
0156

1e82
ª
00aa
Ș
0218
Ş
015e
ª
00aa
¬
00ac
0xAB «
00ab
Ť
0164
Ğ
011e
Ģ
0122
Ћ
040b
  «
00ab
Ŧ
0166

0e0b
«
00ab

1e0b
«
00ab
«
00ab
«
00ab
½
00bd
0xAC ¬
00ac
Ź
0179
Ĵ
0134
Ŧ
0166
Ќ
040c
،
060c
¬
00ac
Ž
017d

0e0c
¬
00ac

1ef2
¬
00ac
Ź
0179
¬
00ac
¬
00ac
¼
00bc
0xAD SHY
00ad

0e0d
SHY
00ad
¡
00a1
0xAE ®
00ae
Ž
017d
  Ž
017d
Ў
040e
  ®
00ae
Ū
016a

0e0e
®
00ae
ź
017a
®
00ae
®
00ae
«
00ab
0xAF ¯
00af
Ż
017b
¯
00af
Џ
040f
 
2015
¯
00af
Ŋ
014a

0e0f
Æ
00c6
Ÿ
0178
¯
00af
Ż
017b
Ż
017b
¯
00af
»
00bb
0xB0 °
00b0
А
0410
  °
00b0

0e10
°
00b0

1e1e
°
00b0

2591
0xB1 ±
00b1
ą
0105
ħ
0127
ą
0105
Б
0411
  ±
00b1
ą
0105

0e11
±
00b1

1e1f
±
00b1
±
00b1
±
00b1

2592
0xB2 ²
00b2
˛
02db
²
00b2
˛
02db
В
0412
  ²
00b2
ē
0113

0e12
²
00b2
Ġ
0120
²
00b2
Č
010c
˛
02db
²
00b2

2593
0xB3 ³
00b3
ł
0142
³
00b3
ŗ
0157
Г
0413
  ³
00b3
ģ
0123

0e13
³
00b3
ġ
0121
³
00b3
ł
0142
ł
0142
³
00b3

2502
0xB4 ´
00b4
Д
0414
  ΄
0384
ī
012b

0e14

201c

1e40
Ž
017d
´
00b4

2524
0xB5 µ
00b5
ľ
013e
µ
00b5
ĩ
0129
Е
0415
  ΅
0385
µ
00b5
ĩ
0129

0e15
µ
00b5

1e41
µ
00b5

201d
µ
00b5
µ
00b5

2561
Á
00c1
0xB6
00b6
ś
015b
ĥ
0125
ļ
013c
Ж
0416
  Ά
0386

00b6
ķ
0137

0e16

00b6

00b6

00b6

2562
Â
00c2
0xB7 ·
00b7
ˇ
02c7
·
00b7
ˇ
02c7
З
0417
  ·
00b7

0e17
·
00b7

1e56
·
00b7
·
00b7
·
00b7

2556
À
00c0
0xB8 ¸
00b8
И
0418
  Έ
0388
¸
00b8
ļ
013c

0e18
ø
00f8

1e81
ž
017e
¸
00b8

2555
©
00a9
0xB9 ¹
00b9
š
0161
ı
0131
š
0161
Й
0419
  Ή
0389
¹
00b9
đ
0111

0e19
¹
00b9

1e57
¹
00b9
č
010d
ą
0105
¹
00b9

2563
0xBA º
00ba
ş
015f
ē
0113
К
041a
  Ί
038a
÷
00f7
º
00ba
š
0161

0e1a
ŗ
0157

1e83
º
00ba
ș
0219
ş
015f
º
00ba

2551
0xBB »
00bb
ť
0165
ğ
011f
ģ
0123
Л
041b
؛
061b
»
00bb
ŧ
0167

0e1b
»
00bb

1e60
»
00bb
»
00bb
»
00bb

2557
0xBC ¼
00bc
ź
017a
ĵ
0135
ŧ
0167
М
041c
  Ό
038c
¼
00bc
ž
017e

0e1c
¼
00bc

1ef3
Œ
0152
Ľ
013d
¼
00bc

255d
0xBD ½
00bd
˝
02dd
½
00bd
Ŋ
014a
Н
041d
  ½
00bd

2015

0e1d
½
00bd

1e84
œ
0153
˝
02dd
½
00bd

255c
¢
00a2
0xBE ¾
00be
ž
017e
  ž
017e
О
041e
  Ύ
038e
¾
00be
ū
016b

0e1e
¾
00be

1e85
Ÿ
0178
ľ
013e
¾
00be

255b
¥
00a5
0xBF ¿
00bf
ż
017c
ŋ
014b
П
041f
؟
061f
Ώ
038f
  ¿
00bf
ŋ
014b

0e1f
æ
00e6

1e61
¿
00bf
ż
017c
ż
017c
¿
00bf

2510
0xC0 À
00c0
ş
015f
À
00c0
Ā
0100
Р
0420
  ΐ
0390
  À
00c0
Ā
0100

0e20
Ą
0104
À
00c0
ş
015f
À
00c0

2514
0xC1 Á
00c1
С
0421
ء
0621
Α
0391
  Á
00c1

0e21
Į
012e
Á
00c1

2534
0xC2 Â
00c2
Т
0422
آ
0622
Β
0392
  Â
00c2

0e22
Ā
0100
Â
00c2

252c
0xC3 Ã
00c3
Ă
0102
  Ã
00c3
У
0423
أ
0623
Γ
0393
  Ã
00c3

0e23
Ć
0106
Ã
00c3
Ă
0102
Ă
0102

251c
0xC4 Ä
00c4
Ф
0424
ؤ
0624
Δ
0394
  Ä
00c4

0e24
Ä
00c4

2500
0xC5 Å
00c5
Ĺ
0139
Ċ
010a
Å
00c5
Х
0425
إ
0625
Ε
0395
  Å
00c5

0e25
Å
00c5
Ć
0106
Ĺ
0139
Å
00c5

253c
0xC6 Æ
00c6
Ć
0106
Ĉ
0108
Æ
00c6
Ц
0426
ئ
0626
Ζ
0396
  Æ
00c6

0e26
Ę
0118
Æ
00c6
Ć
0106
Æ
00c6

255e
ã
00e3
0xC7 Ç
00c7
Į
012e
Ч
0427
ا
0627
Η
0397
  Ç
00c7
Į
012e

0e27
Ē
0112
Ç
00c7

255f
Ã
00c3
0xC8 È
00c8
Č
010c
È
00c8
Č
010c
Ш
0428
ب
0628
Θ
0398
  È
00c8
Č
010c

0e28
Č
010c
È
00c8
Č
010c
È
00c8

255a
0xC9 É
00c9
Щ
0429
ة
0629
Ι
0399
  É
00c9

0e29
É
00c9

2554
0xCA Ê
00ca
Ę
0118
Ê
00ca
Ę
0118
Ъ
042a
ت
062a
Κ
039a
  Ê
00ca
Ę
0118

0e2a
Ź
0179
Ê
00ca
Ę
0118
Ê
00ca

2569
0xCB Ë
00cb
Ы
042b
ث
062b
Λ
039b
  Ë
00cb

0e2b
Ė
0116
Ë
00cb

2566
0xCC Ì
00cc
Ě
011a
Ì
00cc
Ė
0116
Ь
042c
ج
062c
Μ
039c
  Ì
00cc
Ė
0116

0e2c
Ģ
0122
Ì
00cc
Ě
011a
Ì
00cc

2560
0xCD Í
00cd
Э
042d
ح
062d
Ν
039d
  Í
00cd

0e2d
Ķ
0136
Í
00cd

2550
0xCE Î
00ce
Ю
042e
خ
062e
Ξ
039e
  Î
00ce

0e2e
Ī
012a
Î
00ce

256c
0xCF Ï
00cf
Ď
010e
Ï
00cf
Ī
012a
Я
042f
د
062f
Ο
039f
  Ï
00cf

0e2f
Ļ
013b
Ï
00cf
Ď
010e
Ï
00cf

2567
¤
00a4
0xD0 Ð
00d0
Đ
0110
  Đ
0110
а
0430
ذ
0630
Π
03a0
  Ğ
011e
Ð
00d0

0e30
Š
0160
Ŵ
0174
Ð
00d0
Đ
0110
Đ
0110
Ð
00d0

2568
ð
00f0
0xD1 Ñ
00d1
Ń
0143
Ñ
00d1
Ņ
0145
б
0431
ر
0631
Ρ
03a1
  Ñ
00d1
Ņ
0145

0e31
Ń
0143
Ñ
00d1
Ń
0143
Ń
0143
Ñ
00d1

2564
Ð
00d0
0xD2 Ò
00d2
Ň
0147
Ò
00d2
Ō
014c
в
0432
ز
0632
  Ò
00d2
Ō
014c

0e32
Ņ
0145
Ò
00d2
Ň
0147
Ò
00d2

2565
Ê
00ca
0xD3 Ó
00d3
Ķ
0136
г
0433
س
0633
Σ
03a3
  Ó
00d3

0e33
Ó
00d3

2559
Ë
00cb
0xD4 Ô
00d4
д
0434
ش
0634
Τ
03a4
  Ô
00d4

0e34
Ō
014c
Ô
00d4

2558
È
00c8
0xD5 Õ
00d5
Ő
0150
Ġ
0120
Õ
00d5
е
0435
ص
0635
Υ
03a5
  Õ
00d5

0e35
Õ
00d5
Ő
0150
Ő
0150
Õ
00d5

2552
ı
0131
0xD6 Ö
00d6
ж
0436
ض
0636
Φ
03a6
  Ö
00d6

0e36
Ö
00d6

2553
Í
00cd
0xD7 ×
00d7
з
0437
ط
0637
Χ
03a7
  ×
00d7
Ũ
0168

0e37
×
00d7

1e6a
×
00d7
Ś
015a
×
00d7

256b
Î
00ce
0xD8 Ø
00d8
Ř
0158
Ĝ
011c
Ø
00d8
и
0438
ظ
0638
Ψ
03a8
  Ø
00d8

0e38
Ų
0172
Ø
00d8
Ű
0170
Ř
0158
Ø
00d8

256a
Ï
00cf
0xD9 Ù
00d9
Ů
016e
Ù
00d9
Ų
0172
й
0439
ع
0639
Ω
03a9
  Ù
00d9
Ų
0172

0e39
Ł
0141
Ù
00d9

2518
0xDA Ú
00da
к
043a
غ
063a
Ϊ
03aa
  Ú
00da

0e3a
Ś
015a
Ú
00da

250c
0xDB Û
00db
Ű
0170
Û
00db
л
043b
  Ϋ
03ab
  Û
00db
  Ū
016a
Û
00db

2588
0xDC Ü
00dc
м
043c
  ά
03ac
  Ü
00dc
  Ü
00dc

2584
0xDD Ý
00dd
Ŭ
016c
Ũ
0168
н
043d
  έ
03ad
  İ
0130
Ý
00dd
  Ż
017b
Ý
00dd
Ę
0118
Ý
00dd

258c
¦
00a6
0xDE Þ
00de
Ţ
0162
Ŝ
015c
Ū
016a
о
043e
  ή
03ae
  Ş
015e
Þ
00de
  Ž
017d
Ŷ
0176
Þ
00de
Ț
021a
Ţ
0162
Þ
00de

2590
Ì
00cc
0xDF ß
00df
п
043f
  ί
03af

2017
ß
00df
฿
0e3f
ß
00df

2580
0xE0 à
00e0
ŕ
0155
à
00e0
ā
0101
р
0440
ـ
0640
ΰ
03b0
א
05d0
à
00e0
ā
0101

0e40
ą
0105
à
00e0
ŕ
0155
à
00e0
α
03b1
Ó
00d3
0xE1 á
00e1
с
0441
ف
0641
α
03b1
ב
05d1
á
00e1

0e41
į
012f
á
00e1
ß
00df
0xE2 â
00e2
т
0442
ق
0642
β
03b2
ג
05d2
â
00e2

0e42
ā
0101
â
00e2
Γ
0393
Ô
00d4
0xE3 ã
00e3
ă
0103
  ã
00e3
у
0443
ك
0643
γ
03b3
ד
05d3
ã
00e3

0e43
ć
0107
ã
00e3
ă
0103
ă
0103
ã
00e3
π
03c0
Ò
00d2
0xE4 ä
00e4
ф
0444
ل
0644
δ
03b4
ה
05d4
ä
00e4

0e44
ä
00e4
Σ
03a3
õ
00f5
0xE5 å
00e5
ĺ
013a
ċ
010b
å
00e5
х
0445
م
0645
ε
03b5
ו
05d5
å
00e5

0e45
å
00e5
ć
0107
ĺ
013a
å
00e5
σ
03c3
Õ
00d5
0xE6 æ
00e6
ć
0107
ĉ
0109
æ
00e6
ц
0446
ن
0646
ζ
03b6
ז
05d6
æ
00e6

0e46
ę
0119
æ
00e6
ć
0107
æ
00e6
µ
00b5
0xE7 ç
00e7
į
012f
ч
0447
ه
0647
η
03b7
ח
05d7
ç
00e7
į
012f

0e47
ē
0113
ç
00e7
τ
03c4
þ
00fe
0xE8 è
00e8
č
010d
è
00e8
č
010d
ш
0448
و
0648
θ
03b8
ט
05d8
è
00e8
č
010d

0e48
č
010d
è
00e8
č
010d
è
00e8
Φ
03a6
Þ
00de
0xE9 é
00e9
щ
0449
ى
0649
ι
03b9
י
05d9
é
00e9

0e49
é
00e9
Θ
0398
Ú
00da
0xEA ê
00ea
ę
0119
ê
00ea
ę
0119
ъ
044a
ي
064a
κ
03ba
ך
05da
ê
00ea
ę
0119

0e4a
ź
017a
ê
00ea
ę
0119
ê
00ea
Ω
03a9
Û
00db
0xEB ë
00eb
ы
044b
ً
064b
λ
03bb
כ
05db
ë
00eb

0e4b
ė
0117
ë
00eb
δ
03b4
Ù
00d9
0xEC ì
00ec
ě
011b
ì
00ec
ė
0117
ь
044c
ٌ
064c
μ
03bc
ל
05dc
ì
00ec
ė
0117

0e4c
ģ
0123
ì
00ec
ě
011b
ì
00ec

221e
ý
00fd
0xED í
00ed
э
044d
ٍ
064d
ν
03bd
ם
05dd
í
00ed

0e4d
ķ
0137
í
00ed
φ
03c6
Ý
00dd
0xEE î
00ee
ю
044e
َ
064e
ξ
03be
מ
05de
î
00ee

0e4e
ī
012b
î
00ee
ε
03b5
¯
00af
0xEF ï
00ef
ď
010f
ï
00ef
ī
012b
я
044f
ُ
064f
ο
03bf
ן
05df
ï
00ef

0e4f
ļ
013c
ï
00ef
ď
010f
ï
00ef

2229
´
00b4
0xF0 ð
00f0
đ
0111
  đ
0111

2116
ِ
0650
π
03c0
נ
05e0
ğ
011f
ð
00f0

0e50
š
0161
ŵ
0175
ð
00f0
đ
0111
đ
0111
ð
00f0

2261
SHY
00ad
0xF1 ñ
00f1
ń
0144
ñ
00f1
ņ
0146
ё
0451
ّ
0651
ρ
03c1
ס
05e1
ñ
00f1
ņ
0146

0e51
ń
0144
ñ
00f1
ń
0144
ń
0144
ñ
00f1
±
00b1
0xF2 ò
00f2
ň
0148
ò
00f2
ō
014d
ђ
0452
ْ
0652
ς
03c2
ע
05e2
ò
00f2
ō
014d

0e52
ņ
0146
ò
00f2
ň
0148
ò
00f2

2265

2017
0xF3 ó
00f3
ķ
0137
ѓ
0453
  σ
03c3
ף
05e3
ó
00f3

0e53
ó
00f3

2264
¾
00be
0xF4 ô
00f4
є
0454
  τ
03c4
פ
05e4
ô
00f4

0e54
ō
014d
ô
00f4

2320

00b6
0xF5 õ
00f5
ő
0151
ġ
0121
õ
00f5
ѕ
0455
  υ
03c5
ץ
05e5
õ
00f5

0e55
õ
00f5
ő
0151
ő
0151
õ
00f5

2321
§
00a7
0xF6 ö
00f6
і
0456
  φ
03c6
צ
05e6
ö
00f6

0e56
ö
00f6
÷
00f7
0xF7 ÷
00f7
ї
0457
  χ
03c7
ק
05e7
÷
00f7
ũ
0169

0e57
÷
00f7

1e6b
÷
00f7
ś
015b
÷
00f7

2248
¸
00b8
0xF8 ø
00f8
ř
0159
ĝ
011d
ø
00f8
ј
0458
  ψ
03c8
ר
05e8
ø
00f8

0e58
ų
0173
ø
00f8
ű
0171
ř
0159
ø
00f8
°
00b0
0xF9 ù
00f9
ů
016f
ù
00f9
ų
0173
љ
0459
  ω
03c9
ש
05e9
ù
00f9
ų
0173

0e59
ł
0142
ù
00f9
ů
016f
ù
00f9

2219
¨
00a8
0xFA ú
00fa
њ
045a
  ϊ
03ca
ת
05ea
ú
00fa

0e5a
ś
015b
ú
00fa
·
00b7
0xFB û
00fb
ű
0171
û
00fb
ћ
045b
  ϋ
03cb
  û
00fb

0e5b
ū
016b
û
00fb
ű
0171
û
00fb

221a
¹
00b9
0xFC ü
00fc
ќ
045c
  ό
03cc
  ü
00fc
  ü
00fc

207f
³
00b3
0xFD ý
00fd
ŭ
016d
ũ
0169
§
00a7
  ύ
03cd
LRM
200e
ı
0131
ý
00fd
  ż
017c
ý
00fd
ę
0119
ý
00fd
²
00b2
0xFE þ
00fe
ţ
0163
ŝ
015d
ū
016b
ў
045e
  ώ
03ce
RLM
200f
ş
015f
þ
00fe
  ž
017e
ŷ
0177
þ
00fe
ț
021b
ţ
0163
þ
00fe

25a0
0xFF ÿ
00ff
˙
02d9
џ
045f
  ÿ
00ff
ĸ
0138
 
2019
ÿ
00ff
˙
02d9
ÿ
00ff
NBSP
00a0
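The "strike off" step described at the start of this section can be partly automated: an encoding in which a byte of the text is not assigned at all cannot be the right one. This is a minimal sketch, assuming Python's standard codec names. The function name `plausible_encodings` is made up for this example.

```python
CANDIDATES = [f"iso8859_{n}" for n in range(1, 17) if n != 12] + ["cp1250", "cp1252"]


def plausible_encodings(data: bytes, candidates=CANDIDATES) -> list:
    """Return the candidate codecs that can decode data without error."""
    survivors = []
    for codec in candidates:
        try:
            data.decode(codec)
        except UnicodeDecodeError:
            continue  # a byte is not assigned in this encoding: strike it off
        survivors.append(codec)
    return survivors


# 0xd2 is unassigned in ISO-8859-7 (Greek) and ISO-8859-8 (Hebrew).
print(plausible_encodings(b"caf\xd2"))
```

The survivors still have to be judged by eye, because most of the Latin variants assign a character to every byte in this range.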

ISO 8859-1 vs. ISO 8859-15 vs. Windows-1252 vs. Unicode

The Latin-1, Latin-9 and Windows-1252 encodings are very difficult to tell apart, especially if the file in question has only a few characters in the "differing" range. The following table lists the characters in which Latin-1 and Latin-9 differ, together with their encodings in Windows-1252, Unicode and UTF-8.

Character  ISO 8859-1  ISO 8859-15  Windows-1252  Unicode  UTF-8
€          ---         A4           80            20ac     e2 82 ac
Š          ---         A6           8A            0160     c5 a0
š          ---         A8           9A            0161     c5 a1
Ž          ---         B4           8E            017d     c5 bd
ž          ---         B8           9E            017e     c5 be
Œ          ---         BC           8C            0152     c5 92
œ          ---         BD           9C            0153     c5 93
Ÿ          ---         BE           9F            0178     c5 b8
¤          A4          ---          A4            00a4     c2 a4
¦          A6          ---          A6            00a6     c2 a6
¨          A8          ---          A8            00a8     c2 a8
´          B4          ---          B4            00b4     c2 b4
¸          B8          ---          B8            00b8     c2 b8
¼          BC          ---          BC            00bc     c2 bc
½          BD          ---          BD            00bd     c2 bd
¾          BE          ---          BE            00be     c2 be
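The differing positions can also be derived programmatically. The following sketch assumes Python's codec names `iso8859_1`, `iso8859_15` and `cp1252`. The helper `differing_bytes` is hypothetical.

```python
def differing_bytes(codec_a: str, codec_b: str) -> list:
    """Return the byte values that decode to different characters in the two codecs."""
    diffs = []
    for value in range(0x100):
        raw = bytes([value])
        if raw.decode(codec_a) != raw.decode(codec_b):
            diffs.append(value)
    return diffs


for value in differing_bytes("iso8859_1", "iso8859_15"):
    raw = bytes([value])
    latin1, latin9 = raw.decode("iso8859_1"), raw.decode("iso8859_15")
    print(f"0x{value:02X}: Latin-1 {latin1!r}  Latin-9 {latin9!r}  "
          f"Latin-9 character in Windows-1252: {latin9.encode('cp1252').hex()}")
```

The loop only works for codecs that assign all 256 byte values, which is the case for both Latin-1 and Latin-9.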

Windows Codepages for ISO Encodings

If your environment forces you to state an encoding using codepage numbers, but you want to use ISO/IEC Latin encodings, here is a list of how Windows refers to those standard encodings:

ISO/IEC Latin Windows Codepage
ISO 8859-1 (Latin-1 Western European) Windows-28591
ISO 8859-2 (Latin-2 Central European) Windows-28592
ISO 8859-3 (Latin-3 South European) Windows-28593
ISO 8859-4 (Latin-4 North European) Windows-28594
ISO 8859-5 (Latin / Cyrillic) Windows-28595
ISO 8859-6 (Latin / Arabic) Windows-28596
ISO 8859-7 (Latin / Greek) Windows-28597
ISO 8859-8 (Latin / Hebrew) Windows-28598
ISO 8859-9 (Latin-5 Turkish) Windows-28599
ISO 8859-10 (Latin-6 Nordic) Windows-28600
ISO 8859-11 (Latin / Thai) Windows-874 comes close but is not identical...
ISO 8859-13 (Latin-7 Baltic Rim) Windows-28603
ISO 8859-14 (Latin-8 Celtic) Windows-28604
ISO 8859-15 (Latin-9) Windows-28605
ISO 8859-16 (Latin-10 South-Eastern European) Windows-28606 (?)
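The table follows a simple pattern (28590 plus the part number), which the sketch below encodes. The function name is made up for this example. The entry for ISO 8859-16 is marked as uncertain in the table and is equally uncertain here.

```python
def windows_codepage_for_iso8859(part: int) -> int:
    """Return the Windows codepage number for ISO 8859-<part>, following the table above."""
    if part == 11:
        # The table lists no exact match; Windows-874 is close but not identical.
        raise ValueError("ISO 8859-11 has no exact Windows codepage in the table")
    if part == 12 or not 1 <= part <= 16:
        raise ValueError(f"no such ISO 8859 part: {part}")
    return 28590 + part


for part in (1, 10, 15):
    print(f"ISO 8859-{part} -> Windows-{windows_codepage_for_iso8859(part)}")
```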

Demonstration of variable length vs. wide encoding

A common misconception is that UTF-16 is a "wide" (fixed-width) encoding. It is not: just like with UTF-8, a single code point can require more than one code unit (up to four 8-bit code units in UTF-8, up to two 16-bit code units in UTF-16). The only truly wide encoding is UTF-32, which takes 32 bits for every code point.

To showcase this, the following table lists some example characters and their respective encodings. All code units are hexadecimal.

Character    a         ä         Š         €         0x2f929
ISO-8859-15  61        e4        a6        a4        ---
UTF-32       00000061  000000e4  00000160  000020ac  0002f929
UTF-16       0061      00e4      0160      20ac      d87e dd29
UTF-8        61        c3 a4     c5 a0     e2 82 ac  f0 af a4 a9
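The code units in the table can be reproduced with a few lines of Python. This is a sketch. The helper `code_units` is hypothetical, and the big-endian codec variants are used so that no byte order mark is emitted.

```python
SAMPLES = ["a", "\u00e4", "\u0160", "\u20ac", "\U0002f929"]


def code_units(text: str, encoding: str, unit_bytes: int) -> list:
    """Encode text and split the result into hex strings of unit_bytes bytes each."""
    raw = text.encode(encoding)
    return [raw[i:i + unit_bytes].hex() for i in range(0, len(raw), unit_bytes)]


for ch in SAMPLES:
    print(f"U+{ord(ch):04X}",
          "UTF-32:", code_units(ch, "utf-32-be", 4),
          "UTF-16:", code_units(ch, "utf-16-be", 2),
          "UTF-8:", code_units(ch, "utf-8", 1))
```

Only the UTF-32 column always has exactly one code unit per code point. The last sample needs two units in UTF-16 and four in UTF-8.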

And then you still have to consider that a code point does not necessarily equal a character, and that a given character does not imply a specific code point.

The character Ü, for example, can be encoded as either 0x00dc (LATIN CAPITAL LETTER U WITH DIAERESIS), or as the sequence of the two code points 0x0055 0x0308 (LATIN CAPITAL LETTER U, COMBINING DIAERESIS).
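Comparing such strings usually requires Unicode normalization, which this page does not cover in detail. As a hedged illustration, Python's `unicodedata` module converts between the two forms of Ü:

```python
import unicodedata

precomposed = "\u00dc"   # LATIN CAPITAL LETTER U WITH DIAERESIS
decomposed = "U\u0308"   # LATIN CAPITAL LETTER U + COMBINING DIAERESIS

# The two strings look the same but are different code point sequences.
print(precomposed == decomposed, len(precomposed), len(decomposed))

# NFC composes, NFD decomposes; after normalization the strings compare equal.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)
```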

With UTF-16 and UTF-32, you also have to consider endianness. To enable parsers to determine the endianness of a text automatically, a special character, the Byte Order Mark (BOM, code point U+feff), is put at the beginning. If no BOM exists, big-endian should be assumed.

UTF-8 does not have this issue, and a BOM is not required.
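A minimal sketch of BOM-based detection for UTF-16 follows. The function name is made up for this example. UTF-32 is left out on purpose, because its little-endian BOM begins with the same two bytes as the UTF-16 one and would need to be checked first.

```python
import codecs


def detect_utf16_byte_order(data: bytes) -> str:
    """Return the byte order of UTF-16 data, based on a leading BOM if present."""
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes fe ff
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes ff fe
        return "little"
    return "big (assumed, no BOM)"


for raw in (b"\xfe\xff\x00a", b"\xff\xfea\x00", b"\x00a"):
    print(raw.hex(" "), "->", detect_utf16_byte_order(raw))
```

Python's generic `utf-16` codec performs the same check when decoding and removes the BOM from the result.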

Rule of thumb when looking at UTF-8: Any byte value of 0x80 or higher is part of a multi-byte sequence.

Rule of thumb when looking at UTF-16: Any 16-bit value between 0xd800 and 0xdfff is part of a surrogate pair.
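Both rules of thumb translate directly into range checks. The helper names in this sketch are hypothetical:

```python
def is_utf8_multibyte_part(byte: int) -> bool:
    """True if a UTF-8 byte belongs to a multi-byte sequence (lead or continuation byte)."""
    return byte >= 0x80


def is_surrogate(unit: int) -> bool:
    """True if a UTF-16 code unit is one half of a surrogate pair."""
    return 0xD800 <= unit <= 0xDFFF


utf8 = "\u00e4".encode("utf-8")                 # c3 a4
print([is_utf8_multibyte_part(b) for b in utf8])

utf16 = "\U0002f929".encode("utf-16-be")        # d8 7e dd 29
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
print([(hex(u), is_surrogate(u)) for u in units])
```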
