ASCII, UNICODE, UTF8
Am I correct in thinking that ASCII is a code-point + encoding scheme, and that in modern times we use Unicode as the code-point scheme and UTF-8 as the encoding scheme?
Yes, except that UTF-8 is an encoding scheme. Other encoding schemes include UTF-16 (with two different byte orders) and UTF-32. (Confusingly, a UTF-16 scheme is called "Unicode" in Microsoft software.)
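To make the distinction concrete, here is a small Python sketch (Python and the euro-sign example are my own choices, not from the answer above): a character's code point is a single number, while each encoding scheme turns that number into a different sequence of bytes.

ch = "€"  # U+20AC EURO SIGN

print(hex(ord(ch)))            # 0x20ac -> the code point, one abstract number
print(ch.encode("utf-8"))      # bytes E2 82 AC       (3 bytes)
print(ch.encode("utf-16-le"))  # bytes AC 20          (2 bytes, little-endian)
print(ch.encode("utf-16-be"))  # bytes 20 AC          (2 bytes, big-endian)
print(ch.encode("utf-32-be"))  # bytes 00 00 20 AC    (4 bytes)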
And, to be exact, the American National Standard that defines ASCII specifies a collection of characters and their coding as 7-bit quantities, without specifying a particular transfer encoding in terms of bytes. In the past, it was used in different ways, e.g. five ASCII characters packed into one 36-bit storage unit, or 8-bit bytes using the extra bit for checking purposes (a parity bit) or for transfer control. Nowadays, however, ASCII is used so that one ASCII character is encoded as one 8-bit byte with the first bit set to zero. This is the de facto standard encoding scheme and is implied in a large number of specifications, but strictly speaking it is not part of the ASCII standard.
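As a quick check of that de facto rule, the Python sketch below (the sample characters are my own, not part of the standard being described) prints each ASCII character as one 8-bit byte whose leading bit is zero, and shows that a character outside the 128-character set is rejected.

# Each ASCII character encodes to one byte whose most significant bit is 0.
for ch in "Az9!":
    byte = ch.encode("ascii")[0]
    print(ch, hex(byte), format(byte, "08b"))

# ASCII defines only 128 characters, so anything outside that range fails.
try:
    "é".encode("ascii")
except UnicodeEncodeError as err:
    print(err)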
ASCII uses decimal code points 0-127 (the numeric values that take up the code space) to represent English characters and symbols; to encode them, each code point is stored in one byte, of which only 7 bits are actually needed.
Unicode defines far more code points (to support more languages), and UTF-8 uses 1-4 bytes to encode each code point. UTF-8 is backward compatible with ASCII: the ASCII characters (code points 0-127) are effectively a subset of UTF-8 and are encoded as the same single byte.
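A short Python sketch of both points (the example characters are my own choices, not from the original text): the number of UTF-8 bytes grows from 1 to 4 as the code point grows, and pure ASCII text encodes to identical bytes either way.

for ch in ["A", "é", "€", "😀"]:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s)")  # 1, 2, 3, 4 bytes respectively

# Pure ASCII text produces exactly the same bytes under ASCII and UTF-8.
print("hello".encode("ascii") == "hello".encode("utf-8"))  # True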
ASCII cannot represent all characters. A common workaround is to encode the text to bytes first (e.g. as UTF-8) and then Base64-encode those bytes into a string made up only of ASCII characters, so that data containing other symbols can still pass through ASCII-only channels.
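For illustration, here is a minimal Python sketch of that Base64 workaround, assuming UTF-8 as the intermediate byte encoding (the sample string is made up):

import base64

text = "héllo €"
utf8_bytes = text.encode("utf-8")          # bytes that plain ASCII text cannot carry
ascii_safe = base64.b64encode(utf8_bytes)  # Base64 output uses only ASCII characters
print(ascii_safe.decode("ascii"))          # aMOpbGxvIOKCrA==

round_trip = base64.b64decode(ascii_safe).decode("utf-8")
print(round_trip == text)                  # True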
Sources
Stack Overflow