ASCII is an acronym that you might have heard in relation to computer text, but it’s a term that's rapidly falling out of use thanks to a more powerful newcomer. But what is ASCII, and what is it used for?
What Does ASCII Stand For?
Perhaps the easiest place to begin is the acronym itself, so let’s expand it:
American Standard Code for Information Interchange
This mouthful of a phrase doesn’t really give the complete picture, but some parts immediately offer some clues, notably the first two words. ASCII is an American Standard, the significance of which will soon become apparent.
“Code for Information Interchange” suggests we’re talking about a format for passing data back and forth. Specifically, ASCII deals with textual data: characters making up words in a typically human-readable language.
ASCII solves the problem of how to assign values to letters and other characters so that, when they’re stored as ones and zeroes in a file, they can be translated back into letters when the file is read later. If different computer systems agree on the same code to use, such information can be interchanged reliably.
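This round trip is easy to see in practice. The short sketch below uses Python's built-in `encode` and `decode` methods (the method names are Python's, not part of ASCII itself) to turn text into numeric bytes and back:

```python
# Encode a string as ASCII bytes: one byte per character.
message = "Hello"
encoded = message.encode("ascii")
print(list(encoded))            # the agreed numeric codes, e.g. 72 for "H"

# Any system that knows the same code can turn the bytes back into text.
decoded = encoded.decode("ascii")
print(decoded)                  # Hello
```

Because both sides agree on the mapping, the numbers 72, 101, 108, 108, 111 mean "Hello" to every ASCII-aware system.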
The History of ASCII
Sometimes referred to as US-ASCII, ASCII was an American innovation developed in the 1960s. The standard has been revised several times since, most notably in 1977 and in 1986, when it was last updated.
Extensions and variations have built upon ASCII over the years, mainly to cater for the fact that ASCII omits many characters used, or even required, by languages other than US English. ASCII does not even cater for the UK currency symbol (“£”), although the pound is present in Latin-1, an 8-bit extension developed in the 1980s, which encodes several other currencies too.
ASCII was greatly extended and succeeded by Unicode, a much more comprehensive and ambitious standard, which is discussed below. In 2008, Unicode overtook ASCII in popularity for online usage.
What Characters Does ASCII Represent?
To a computer, the letter “A” is just as unfamiliar as the color purple or the feeling of jealousy. Computers deal in ones and zeroes, and it’s up to humans to decide how to use those ones and zeroes to represent numbers, words, images, and anything else.
You can think of ASCII as the Morse code of the digital world—the first attempt, anyway. Whilst Morse code is used to represent just 36 different characters (26 letters and 10 digits), ASCII was designed to represent up to 128 different characters in 7 bits of data.
ASCII distinguishes between cases, so it includes 52 upper- and lower-case letters of the English alphabet. Together with the 10 digits, that fills roughly half the available space.
Punctuation, mathematical, and typographic symbols occupy most of the remainder, along with a collection of control characters: special non-printable codes with functional meanings (more on these below).
Here are some typical characters that ASCII encodes:

| Binary | Decimal | Character |
|---|---|---|
| 100 0001 | 65 | A |
| 110 0001 | 97 | a |
| 011 0000 | 48 | 0 |
| 010 0100 | 36 | $ |
Note that the values chosen have some useful properties, in particular:
- Letters of the same case can always be sorted numerically since they're in order. For example, A has a lower value than B, which has a lower value than Z.
- Letters of different cases are offset by exactly 32. This makes it very easy to translate between lower and upper case since just a single bit needs to be switched for each letter, either way.
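Both properties can be checked with a couple of lines of Python, using the standard `ord` and `chr` built-ins:

```python
# Same-case letters sort numerically because their codes run in order.
print(ord("A"), ord("B"), ord("Z"))   # 65 66 90

# Upper and lower case differ by exactly 32, which is a single bit (0b100000).
print(ord("a") - ord("A"))            # 32

# Flipping that one bit converts case in either direction.
print(chr(ord("a") ^ 0b100000))       # A
print(chr(ord("G") ^ 0b100000))       # g
```

This is why early hardware could switch case so cheaply: no lookup table is needed, just one bit toggled per letter.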
Beyond letters, digits, and punctuation, ASCII reserves a number of control characters: special code points that do not produce printable output but instead carry functional meaning for whatever program consumes the data.
For example, ASCII 000 1001 is the horizontal tab character. It represents the space you’ll get when you press the TAB key. You won’t typically see such characters directly, but their effect will often be shown. Here are some more examples:
| Binary | Decimal | Name |
|---|---|---|
| 000 1001 | 9 | Horizontal Tab |
| 000 1010 | 10 | Line Feed |
| 001 0111 | 23 | End of Transmission Block |
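Control characters behave like any other byte value; they simply shape the output rather than print a glyph. A small Python sketch makes them visible:

```python
# Build control characters from their ASCII code points.
tab = chr(9)         # horizontal tab
line_feed = chr(10)  # line feed (the newline on Unix-like systems)

# They affect layout rather than printing a visible symbol.
print("name" + tab + "value" + line_feed + "x" + tab + "1")

# repr() reveals them as escape sequences instead of applying them.
print(repr(tab + line_feed))   # '\t\n'
```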
What About Other Characters?
ASCII was enormously successful during the early days of computing since it was simple and widely adopted. However, in a world with a more international outlook, one writing system just won’t cut it. Modern communications need to be possible in French, Japanese—in fact, any language we might want to store text in.
The Unicode character set can address a total of 1,112,064 different characters, although only around a tenth of those are currently defined. That might sound like a lot, but the standard not only caters for tens of thousands of Chinese characters; it also covers emoji (nearly one and a half thousand) and even extinct writing systems such as Jurchen.
Unicode acknowledged ASCII’s dominance in its choice of the first 128 characters: they are exactly the same as ASCII. This allows ASCII-encoded files to be used in situations where Unicode is expected, providing backward compatibility.
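That backward compatibility is easy to verify: bytes produced by an ASCII encoder decode identically under UTF-8, the most common Unicode encoding, as this small Python check shows:

```python
# Encode a string using plain ASCII.
ascii_bytes = "plain text".encode("ascii")

# The very same bytes are valid UTF-8 and decode to the same string,
# because Unicode's first 128 code points are exactly ASCII.
print(ascii_bytes.decode("utf-8"))   # plain text
```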
ASCII text represents the 26 letters of the English alphabet, with digits, punctuation, and a few other symbols thrown in. It served its purpose very well for the best part of half a century.
It has now been superseded by Unicode, which supports a huge number of languages and other symbols, including emoji. UTF-8 is, for all practical purposes, the encoding that should be used to represent Unicode characters online.
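One consequence of that design is variable width: UTF-8 stores ASCII characters in a single byte, while other characters take two to four. A brief Python illustration:

```python
# ASCII characters stay one byte in UTF-8; others need more.
print(len("A".encode("utf-8")))   # 1 byte (plain ASCII)
print(len("£".encode("utf-8")))   # 2 bytes (Latin-1 range)
print(len("€".encode("utf-8")))   # 3 bytes
```

This is why a file containing only ASCII text is byte-for-byte identical whether you call it ASCII or UTF-8.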