HTML character set


To display an HTML page correctly, the browser must know which character set (character encoding) to use.


What is the correct character encoding in HTML?

The default character encoding in HTML5 is UTF-8.

This was not always the case. The character encoding of the early web was ASCII.

Later, from HTML 2.0 to HTML 4.01, ISO-8859-1 was considered the standard.

With the arrival of XML and HTML5, UTF-8 finally took over and solved a large number of character encoding problems.

The following is a brief overview of the character encoding standards.


In the beginning: ASCII

Computer data (numbers, text, pictures) is stored electronically as binary 1s and 0s (for example, 01000101).

To standardize the storage of alphanumeric characters, ASCII (American Standard Code for Information Interchange) was created. It defines a unique 7-bit binary number for each storable character, and supports the digits 0-9, the upper- and lowercase English letters (a-z, A-Z), and some special characters such as ! $ + - ( ) @ < > .

Because ASCII uses one byte (7 bits represent the character, 1 bit is used for transmission parity control), it can represent only 128 different characters. Of these, 32 are reserved for control purposes.
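The 7-bit layout described above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name ascii_bits is made up for this example and is not part of any library.

```python
def ascii_bits(ch):
    """Return the 7-bit binary ASCII code of a single character.

    ascii_bits is a hypothetical helper written for this example.
    """
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code, "07b")


# 'E' has code 69, which is 1000101 in 7 bits.
# Padded to a full byte, it becomes the 01000101 shown earlier.
print(ascii_bits("E"))               # prints 1000101
print(format(ord("E"), "08b"))       # prints 01000101

# Letters outside the English alphabet cannot be encoded as ASCII.
try:
    "é".encode("ascii")
except UnicodeEncodeError as err:
    print("not representable in ASCII:", err.object[err.start])
```

The failed encode of "é" is the limitation that the later character sets were designed to remove.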

The biggest disadvantage of ASCII is that it excludes non-English letters.

ASCII is still widely used today, especially in large computer systems.

To learn more about ASCII, see the complete ASCII reference.


In Windows: ANSI

ANSI (also known as Windows-1252) was the default character set in Windows 95 and earlier Windows systems.

ANSI is an extension of ASCII that adds international characters. It uses a full byte (8 bits) to represent 256 different characters.

Since ANSI became the default character set in Windows, all browsers have supported ANSI.

To learn more about ANSI, see the complete ANSI reference.


In HTML 4: ISO-8859-1

Because most countries use characters that are not in ASCII, the default character encoding was changed to ISO-8859-1 in the HTML 2.0 standard.

ISO-8859-1 is an extension of ASCII that adds international characters. Like ANSI, it uses a full byte (8 bits) to represent 256 different characters.

Note: When a browser detects ISO-8859-1, it normally defaults to ANSI, because ANSI is identical to ISO-8859-1 except that ANSI has 32 extra characters.
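The difference between the two character sets can be made visible in Python, where ANSI is available as the cp1252 codec. The sketch below decodes the same bytes both ways; the function name differing_bytes is an assumption made for this example.

```python
# Three bytes from the range 0x80-0x9F, where the two character sets differ.
raw = bytes([0x80, 0x93, 0x94])

as_ansi = raw.decode("cp1252")         # euro sign and curly quotation marks
as_latin1 = raw.decode("iso-8859-1")   # non-printing control characters


def differing_bytes():
    """Return every byte value that decodes differently in the two codecs.

    differing_bytes is a hypothetical helper written for this example.
    Python's cp1252 codec leaves a few byte values undefined, so
    errors="replace" is used to avoid an exception on those bytes.
    """
    diffs = []
    for value in range(256):
        one = bytes([value])
        ansi = one.decode("cp1252", errors="replace")
        latin1 = one.decode("iso-8859-1")
        if ansi != latin1:
            diffs.append(value)
    return diffs


diffs = differing_bytes()
print(len(diffs), hex(diffs[0]), hex(diffs[-1]))   # prints 32 0x80 0x9f
```

All 32 differing byte values sit in one block, 0x80 to 0x9F. Every other byte decodes to the same character in both sets.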

If an HTML 4 web page uses a character set other than ISO-8859-1, it must be specified in the <meta> tag, as shown below:

Example

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-8">
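To illustrate why the declaration matters, the following Python sketch reads the charset from a <meta> tag like the one above and uses it to decode the page bytes. It is a deliberate simplification: real browsers follow a more elaborate detection procedure, and the function name declared_charset and the sample page are assumptions made for this example.

```python
import re


def declared_charset(html_bytes, default="iso-8859-1"):
    """Return the charset named in the page, or the HTML 4 default.

    declared_charset is a hypothetical helper, not a browser algorithm.
    The declaration itself is plain ASCII, so it can be searched for
    before the real encoding of the page is known.
    """
    head = html_bytes[:1024].decode("ascii", errors="replace")
    match = re.search(r'charset\s*=\s*["\']?([\w-]+)', head, re.IGNORECASE)
    return match.group(1).lower() if match else default


# Sample page: the four bytes in the paragraph are Hebrew letters in ISO-8859-8.
page = (
    b'<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-8">'
    b"<p>\xf9\xec\xe5\xed</p>"
)

charset = declared_charset(page)
print(charset)                  # prints iso-8859-8
print(page.decode(charset))     # the paragraph decodes to Hebrew text
```

Decoding the same bytes as ISO-8859-1 would produce accented Latin letters instead, which is the kind of garbled display a missing or wrong declaration causes.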

Note

The default character set in HTML5 is UTF-8.
All HTML 4 processors support UTF-8, and all HTML5 and XML processors support UTF-8 and UTF-16.

To learn more about ISO-8859-1, see the complete ISO-8859-1 reference.


In HTML5: Unicode (UTF-8)

Because the character sets listed above are limited in size and are not compatible in multilingual environments, the Unicode Consortium developed the Unicode Standard.

The Unicode Standard covers (almost) all characters, punctuation marks, and symbols.

Unicode makes the processing, storage, and transport of text independent of platform and language.

The default character encoding in HTML5 is UTF-8.
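As a rough illustration of how UTF-8 covers both ASCII and international characters, the Python sketch below encodes a few sample characters and reports how many bytes each one needs. The sample characters are chosen for this example only.

```python
# One sample each for 1, 2, 3 and 4 bytes: A, e-acute, euro sign, an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

lengths = []
for ch in samples:
    encoded = ch.encode("utf-8")
    lengths.append(len(encoded))
    print(f"U+{ord(ch):04X} takes {len(encoded)} byte(s): {encoded.hex(' ')}")

# ASCII characters keep their single-byte values in UTF-8,
# so plain ASCII text is already valid UTF-8.
assert "E".encode("utf-8") == "E".encode("ascii")

# Decoding with the same encoding restores the original text.
text = "".join(samples)
assert text.encode("utf-8").decode("utf-8") == text
```

Characters from the ASCII range take one byte each, and other characters take two to four bytes, so a single encoding can hold text in any mix of languages.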

To learn more about Unicode (UTF-8), see the complete Unicode reference.