// HZEncode.cpp: Defines the entry point for the console application.
//
/ *
References:
Chinese character encoding and representation
1) Interchange code (GB code): the Chinese character interchange code (GB code) is used mainly for exchanging character information.
GB code: in 1980 China's National Standards Bureau issued "Code of Chinese Graphic Character Set for Information Interchange: Primary Set" (code name GB 2312-80), which defines the interchange code as the national standard encoding for Chinese characters. GB 2312-80 contains 7445 graphic characters in total: 6763 Chinese characters, of which 3755 are level-1 characters (ordered by pinyin) and 3008 are level-2 characters (ordered by radical), plus 682 non-Chinese symbols. GB 2312-80 places all of these characters and symbols in a 94 x 94 matrix. Each row of the matrix is called a "zone" and each column a "position", so the matrix consists of 94 zones (numbered 01-94), each containing 94 positions (numbered 01-94). A character's zone number and position number together form its "zone-position code": written as four decimal digits, the high two digits are the zone number and the low two digits are the position number. Each zone-position code identifies exactly one character or symbol, and conversely each character or symbol has exactly one zone-position code, so there are no duplicate codes.
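As a minimal C++ sketch of the zone-position scheme, the helpers below split a four-digit zone-position code into its zone and position and map the pair to a unique cell of the 94 x 94 matrix. ZonePos, parseZonePos and matrixIndex are hypothetical names used only for illustration, not part of HZEncode.cpp, and the sketch assumes well-formed four-digit input.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type: a zone (row) and a position (column), each 1..94.
struct ZonePos {
    int zone;
    int pos;
};

// Splits a four-digit zone-position code such as "1601": the high two
// digits are the zone number, the low two digits are the position number.
// Assumes the input is exactly four decimal digits.
ZonePos parseZonePos(const std::string& code) {
    assert(code.size() == 4);
    int zone = std::stoi(code.substr(0, 2));
    int pos = std::stoi(code.substr(2, 2));
    assert(zone >= 1 && zone <= 94 && pos >= 1 && pos <= 94);
    return ZonePos{zone, pos};
}

// Maps a zone-position pair to a 0-based cell index in the 94 x 94 matrix.
// Distinct pairs give distinct indices, so no code is shared.
int matrixIndex(ZonePos zp) {
    return (zp.zone - 1) * 94 + (zp.pos - 1);
}
```

For "1601", parseZonePos yields zone 16 and position 1, and matrixIndex places it at cell 1410 (counting from 0); the last cell, zone 94 position 94, is 8835.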
The zones are allocated as follows:
Zone 1: assorted symbols not found on the keyboard
Zone 2: serial-number symbols
Zone 3: the keyboard symbols, in full-width (Chinese-style) form
Zones 4-5: Japanese kana (hiragana in zone 4, katakana in zone 5)
Zone 6: Greek alphabet
Zone 7: Russian alphabet
Zone 8: pinyin letters with tone marks, and phonetic (zhuyin) symbols
Zone 9: table-drawing symbols
Zones 10-15: unassigned (reserved for custom symbols)
Zones 16-55: level-1 Chinese characters (in pinyin order)
Zones 56-87: level-2 Chinese characters (in radical order)
Zones 88-94: user-defined character area
From this allocation, the 94 zones of characters and symbols fall into four groups:
① Zones 1-15: graphic symbols. Zones 1-9 hold the standard symbols; zones 10-15 are reserved for user-defined symbols.
② Zones 16-55: level-1 Chinese characters, 3755 in all, arranged in pinyin order; characters with the same pronunciation are ordered by stroke count.
③ Zones 56-87: level-2 Chinese characters, 3008 in all, arranged by radical and then by stroke count.
④ Zones 88-94: user-defined character area.
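The four-group split amounts to a range check on the zone number. A minimal sketch, where ZoneGroup and classifyZone are hypothetical names introduced only for illustration:

```cpp
#include <cassert>

// The four zone groups listed above.
enum class ZoneGroup { Symbols, Level1Hanzi, Level2Hanzi, UserDefined };

// Classifies a zone number (1..94) into its group.
ZoneGroup classifyZone(int zone) {
    assert(zone >= 1 && zone <= 94);
    if (zone <= 15) return ZoneGroup::Symbols;      // 1-9 standard, 10-15 custom
    if (zone <= 55) return ZoneGroup::Level1Hanzi;  // 3755 chars, pinyin order
    if (zone <= 87) return ZoneGroup::Level2Hanzi;  // 3008 chars, radical order
    return ZoneGroup::UserDefined;                  // zones 88-94
}
```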
The GB code represents every character (including the non-Chinese symbols) with 2 bytes. The highest bit of each byte is 0, so only the low 7 bits are used. Of the 2^7 = 128 seven-bit values, 34 are reserved (the 32 control codes, the space SP and DEL), leaving 2^7 - 34 = 94 codes per byte for Chinese characters. Two bytes therefore give 94 x 94 = 8836 character codes. Of the 2 bytes that represent a Chinese character, the high byte corresponds to the row of the matrix (the zone number) and the low byte to the column (the position number).
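The byte arithmetic above can be checked at compile time. This sketch assumes the 34 reserved values are the 32 control codes plus SP and DEL, as described in the next paragraph; the constant names and isGbByte are hypothetical.

```cpp
#include <cstdint>

// 128 seven-bit values minus 34 reserved ones (32 control codes, SP, DEL).
constexpr int kSevenBitCodes = 1 << 7;           // 128
constexpr int kReserved = 32 + 1 + 1;            // controls + SP + DEL
constexpr int kCodesPerByte = kSevenBitCodes - kReserved;
constexpr int kTwoByteCodes = kCodesPerByte * kCodesPerByte;

static_assert(kCodesPerByte == 94, "94 usable codes per byte");
static_assert(kTwoByteCodes == 8836, "94 x 94 = 8836 two-byte codes");

// The usable values are exactly 21H..7EH, which is 94 values.
static_assert(0x7E - 0x21 + 1 == kCodesPerByte, "21H..7EH spans 94 values");

// True if b is one of the 94 usable 7-bit byte values.
constexpr bool isGbByte(std::uint8_t b) {
    return b >= 0x21 && b <= 0x7E;
}
```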
In binary, the GB codes range from 00100001 00100001 to 01111110 01111110, i.e. from 2121H to 7E7EH: each byte runs from (1 + 32)10 = 33 to (94 + 32)10 = 126. The 7-bit ASCII code consists of 128 characters. Code values 0-31 (0000000-0011111) do not correspond to any printable character; these are the control characters, used for control functions in computer communication and computer devices. Code value 32 (0100000) is the space character SP, and code value 127 (1111111) is the delete character DEL.
The GB code starts at binary 00100001, i.e. (33)10, so that it skips the 32 ASCII control characters and the space character. Therefore the high and low bytes of the GB code are larger than the corresponding zone and position numbers by (32)10 = (00100000)2 = 20H, namely:
GB code high byte = zone number + 20H
GB code low byte = position number + 20H
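The offset of 20H translates directly into code. A minimal sketch, with zonePosToGb and gbToZonePos as hypothetical names, packing the high byte into the upper 8 bits of a 16-bit value:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Zone-position pair (each 1..94) -> 16-bit GB interchange code.
// Each byte is the zone or position number plus 20H.
std::uint16_t zonePosToGb(int zone, int pos) {
    assert(zone >= 1 && zone <= 94 && pos >= 1 && pos <= 94);
    std::uint8_t high = static_cast<std::uint8_t>(zone + 0x20);
    std::uint8_t low = static_cast<std::uint8_t>(pos + 0x20);
    return static_cast<std::uint16_t>((high << 8) | low);
}

// Inverse: subtract 20H from each byte to recover (zone, position).
std::pair<int, int> gbToZonePos(std::uint16_t gb) {
    return {(gb >> 8) - 0x20, (gb & 0xFF) - 0x20};
}
```

Zone-position 1601 gives GB code 3021H, and the range ends 0101 and 9494 give 2121H and 7E7EH.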
2) Machine code (internal code): the storage code for Chinese characters
The machine code (internal code) is the single representation of Chinese characters inside the computer, whatever input code was used to enter them. Characters typed with any of the various Chinese input methods are converted into this machine code for storage, and all internal processing of Chinese characters operates on it. Because a computer has to handle English text as well as Chinese, it must be able to tell Chinese characters and English characters apart. An English character is stored as 8-bit ASCII whose highest bit is 0. To avoid conflicting with 7-bit ASCII, the machine code of a Chinese character is formed by changing the highest bit of each GB code byte from 0 to 1 and leaving the remaining bits unchanged.
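Because only the highest bit separates the two kinds of character, a byte stream can be split by testing that bit. The sketch below assumes text stored in this machine code (the form usually called EUC-CN): a byte with the high bit 0 is one ASCII character, and a byte with the high bit 1 starts a two-byte Chinese character. CharCounts and countChars are hypothetical names, and a lead byte with no following byte is simply counted as malformed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct CharCounts {
    std::size_t ascii = 0;      // single bytes with the high bit 0
    std::size_t hanzi = 0;      // two-byte characters (high bit 1)
    std::size_t malformed = 0;  // lead byte with no trail byte
};

CharCounts countChars(const std::string& bytes) {
    CharCounts c;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        if ((b & 0x80) == 0) {              // high bit 0: ASCII
            ++c.ascii;
            i += 1;
        } else if (i + 1 < bytes.size()) {  // high bit 1: two-byte character
            ++c.hanzi;
            i += 2;
        } else {                            // lone lead byte at the end
            ++c.malformed;
            i += 1;
        }
    }
    // Every byte was consumed exactly once.
    assert(c.ascii + 2 * c.hanzi + c.malformed == bytes.size());
    return c;
}
```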
In binary, machine codes range from 10100001 10100001 to 11111110 11111110 (A1A1H to FEFEH). The high and low bytes of the machine code are larger than the corresponding GB code bytes by (128)10 = (10000000)2 = 80H, namely:
machine code high byte = GB code high byte + 80H
machine code low byte = GB code low byte + 80H
Since GB code high byte = zone number + 20H and GB code low byte = position number + 20H, it follows that:
machine code high byte = zone number + A0H
machine code low byte = position number + A0H
That is, the high and low bytes of the machine code are larger than the zone and position numbers by (160)10 = (10100000)2 = A0H.
Example: the character 啊 ("ah") has zone-position code 1601, so its zone number is (16)10 = 10H and its position number is (01)10 = 01H. Then:
machine code high byte = 10H + A0H = B0H
machine code low byte = 01H + A0H = A1H
so the machine code is B0A1H.
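Both routes to the machine code, from the GB code and directly from the zone-position pair, fit in a few lines. A minimal sketch with hypothetical function names, reproducing the worked example for zone-position 1601:

```cpp
#include <cassert>
#include <cstdint>

// GB code -> machine code. Each GB byte is at most 7EH, so setting the
// high bit of each byte (OR with 8080H) is the same as adding 80H to each.
std::uint16_t gbToMachine(std::uint16_t gb) {
    return static_cast<std::uint16_t>(gb | 0x8080);
}

// Zone-position pair -> machine code directly: each byte is the zone or
// position number plus A0H (20H + 80H).
std::uint16_t zonePosToMachine(int zone, int pos) {
    assert(zone >= 1 && zone <= 94 && pos >= 1 && pos <= 94);
    return static_cast<std::uint16_t>(((zone + 0xA0) << 8) | (pos + 0xA0));
}
```

Zone 16, position 1 gives B0A1H by either route (GB code 3021H with the high bits set), and the range ends map to A1A1H and FEFEH.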
The following is a quoted fragment: