Definition

ASCII (American Standard Code for Information Interchange)

Peter Loshin

By

Peter Loshin, Former Senior Technology Editor

What is ASCII?

ASCII (American Standard Code for Information Interchange) is the most common character encoding format for text data in computers and on the internet. In standard ASCII-encoded data, there are unique values for 128 alphabetic, numeric or special additional characters and control codes.

ASCII encoding is based on character encoding used for telegraph data. The American National Standards Institute first published it as a standard for computing in 1963.

Characters in ASCII encoding include upper- and lowercase letters A through Z, numerals 0 through 9 and basic punctuation symbols. It also uses some non-printing control characters that were originally intended for use with teletype printing terminals.

ASCII characters may be represented in the following ways:

as pairs of hexadecimal digits -- base-16 numbers, represented as 0 through 9 and A through F for the decimal values of 10-15;
as three-digit octal (base 8) numbers;
as decimal numbers from 0 to 127; or
as 7-bit or 8-bit binary

For example, the ASCII encoding for the lowercase letter "m" is represented in the following ways:

Character	Hexadecimal	Octal	Decimal	Binary (7 bit)	Binary (8 bit)
m	0x6D	/155	109	110 1101	0110 1101

ASCII characters were initially encoded into 7 bits and stored as 8-bit characters with the most significant bit -- usually, the left-most bit -- set to 0.

Why is ASCII important?

ASCII was the first major character encoding standard for data processing. Most modern computer systems use Unicode, also known as the Unicode Worldwide Character Standard. It's a character encoding standard that includes ASCII encodings.

The Internet Engineering Task Force (IETF) adopted ASCII as a standard for internet data when it published "ASCII format for Network Interchange" as RFC 20 in 1969. That request for comments (RFC) document standardized the use of ASCII for internet data and was accepted as a full standard in 2015.

ASCII encoding is technically obsolete, having been replaced by Unicode. Yet, ASCII characters use the same encoding as the first 128 characters of the Unicode Transformation Format 8, so ASCII text is compatible with UTF-8.

In 2003, the IETF standardized the use of UTF-8 encoding for all web content in RFC 3629.

Almost all computers now use ASCII or Unicode encoding. The exceptions are some IBM mainframes that use the proprietary 8-bit code called Extended Binary Coded Decimal Interchange Code (EBCDIC).

How does ASCII work?

ASCII offers a universally accepted and understood character set for basic data communications. It enables developers to design interfaces that both humans and computers understand. ASCII codes a string of data as ASCII characters that can be interpreted and displayed as readable plain text for people and as data for computers.

Programmers use the design of the ASCII character set to simplify certain tasks. For example, using ASCII character codes, changing a single bit easily converts text from uppercase to lowercase.

The capital letter "A" is represented by the binary value:

0100 0001

The lowercase letter "a" is represented by the binary value:

0110 0001

The difference is the third most significant bit. In decimal and hexadecimal, this corresponds to:

Character	Binary	Decimal	Hexadecimal
A	0100 0001	65	0x41
a	0110 0001	97	0x61

The difference between upper- and lowercase characters is always 32 (0x20 in hexadecimal), so converting from upper- to lowercase and back is a matter of adding or subtracting 32 from the ASCII character code.

Similarly, hexadecimal characters for the digits 0 through 9 are as follows:

Character	Binary	Decimal	Hexadecimal
0	0011 0000	48	0x30
1	0011 0001	49	0x31
2	0011 0010	50	0x32
3	0011 0011	51	0x33
4	0011 0100	52	0x34
5	0011 0101	53	0x35
6	0011 0110	54	0x36
7	0011 0111	55	0x37
8	0011 1000	56	0x38
9	0011 1001	57	0x39

Using this encoding, developers can easily convert ASCII digits to numerical values by stripping off the four most significant bits of the binary ASCII values (0011). This calculation can also be done by dropping the first hexadecimal digit or by subtracting 48 from the decimal ASCII code.

Developers can also check the most significant bit of characters in a sequence to verify that a data stream, string or file contains ASCII values. The most significant bit of basic ASCII characters will always be 0; if that bit is 1, then the character is not an ASCII-encoded character.

ASCII variants and Unicode

When it was first introduced, ASCII supported English language text only. When 8-bit computers became common during the 1970s, vendors and standards bodies began extending the ASCII character set to include 128 additional character values. Extended ASCII incorporates non-English characters, but it is still insufficient for comprehensive encoding of text in most world languages, including English. Different extended ASCII character sets are common, depending on the vendor, language and country.

Initially, other character encoding standards were adopted for other languages. In some cases, the standards were designed for other countries with different requirements. In other cases, the encodings were hardware manufacturers' proprietary designs.

Unicode defines codespaces for the implementation of character encodings for different languages. Characters can be mapped to encodings using one of the following two methods:

UTF
Universal Coded Character Set (UCS)

Depending on the language and the mapping used, characters can be expressed in one to four 8-bit bytes (UTF-8), in two 16-bit units (UTF-16) or in a single 32-bit unit (UTF-32).

The UCS standard is maintained as an ISO (International Organization for Standardization) standard, ISO/IEC 10646. As of this writing, there are 143,859 different characters defined in version 13.0 of the Unicode standard.

ASCII advantages and disadvantages

After more than half a century of use, the advantages and disadvantages of using ASCII character encoding are well understood. That is one of the encoding format's great strengths.

Advantages

Universally accepted. ASCII character encoding is universally understood. Except for the IBM mainframes that use EBCDIC encoding, it is universally implemented in computing through the Unicode standard. Unicode character encoding replaces ASCII encoding, but it is backward-compatible with ASCII.
Compact character encoding. Standard codes can be expressed in 7 bits. This means data that can be expressed in the standard ASCII character set requires only as many bytes to store or send as the number of characters in the data.
Efficient for programming. The character codes for letters and numbers are well adapted to programming techniques for manipulating text and using numbers for calculations or storage as raw data.

Disadvantages

Limited character set. Even with extended ASCII, only 255 distinct characters can be represented. The characters in a standard character set are enough for English language communications. But even with the diacritical marks and Greek letters supported in extended ASCII, it is difficult to accommodate languages that do not use the Latin alphabet.
Inefficient character encoding. Standard ASCII encoding is efficient for English language and numerical data. Representing characters from other alphabets requires more overhead such as escape codes.

Converting text to ASCII code in Windows

There is more than one way to display text as ASCII codes in Windows. To use the Windows PowerShell command Format-Hex to display ASCII encoding for a text file, perform the following steps.

Open the Windows PowerShell application. Click on the search box in the lower left of your Windows 10 desktop. Type PowerShell and click on the PowerShell icon to start the application.

Format-Hex command. Enter the following command to display the ASCII encoding for a file called hello.txt in the c:\Users\userID\Documents directory:

format-hex .\hello.txt

View output. ASCII encoding for the file hello.txt will be displayed as in Figure 1 below. The top of the output shows that data is displayed in 16 columns, with one character per column. A running count of characters, in hexadecimal, is displayed along the left side of the output. In this case, in the last line, there are 0x60 (or 96 in decimal) characters at the start of the last line. ASCII encoding for the file's characters are shown in a grid 16 characters wide, with encoding in two-digit hexadecimal values. The original contents of the file are displayed to the right in 16 character groupings.

Note that the original file has two spaces (ASCII 0x20) followed by a CR (carriage return, ASCII 0x0D) and LF (line feed, ASCII 0A) characters. The CR-LF combination is used in ASCII files to show the end of a line.

Format-Hex command in PowerShell — Figure 1. View ASCII encoding for a text file using the 'Format-Hex' command in PowerShell.

Other options. Format-Hex can be used with other commands for easier command-line viewing of larger files. For example, the following command is used to page through ASCII encoding of a large file:

Format-Hex .\hello-long.txt | more

The output will look similar Figure 2, and you can view output one page at a time.

View ASCII encoding of longer files with Format-Hex and more — Figure 2. View ASCII encoding for a longer file using the 'Format-Hex' command with the more command.

ASCII code tables

As originally composed, ASCII includes 32 non-printing control codes. Many of those codes were to control the teletypewriter terminal devices used in the early days of computing to input and output data. The remaining 96 printing characters include the DEL (delete) and SPACE (space) characters, as well as all 26 letters of the alphabet, upper- and lowercase, numerals and punctuation symbols.

Extended ASCII uses 8 bits, creating a set of 127 additional characters. There is no single extended ASCII character set. These sets may differ depending on the operating system or vendor. Extended ASCII character sets typically include symbols, letters with diacritical marks, graphical markings, and mathematical symbols including some Greek letters.

Non-printing ASCII control codes

The ASCII values for 0 through 31 (binary: 000 0000 through 001 1111) are non-printing control codes. They were originally intended for controlling the flow of data and include codes that show the end or beginning of data components, codes to control or show the state of hardware used for data transmission, and codes for positioning of the cursor pointer in a data stream.

Non-printing ASCII control codes — Table 1. Non-printing ASCII control codes are used to manage data flows.

Printing ASCII characters

Although the ASCII codes for DEL and SPACE are non-printing, they are considered part of the printing characters, as they are used when sending character streams. The standard ASCII character set includes binary values from 0 (000 0000) through 127 (111 1111).

Printing ASCII characters — Table 2. ASCII characters include the characters a-z, A-Z, 0-9 and a selection of punctuation marks.

Extended ASCII characters

The standard ASCII character set is only 7 bits, and characters are represented as 8-bit bytes with the most significant bit set to 0. Modern computers almost universally use 8-bit bytes, and the extended ASCII character set includes 127 more 8-bit characters, where the most significant bit is set to 1. The extended ASCII characters includes the binary values from 128 (1000 0000) through 255 (1111 1111).

Unlike standard ASCII characters, there are multiple versions of the extended ASCII character set. Table 3 lists Microsoft's Windows-1252 character encoding of the Latin alphabet. This is the default extended ASCII character set for Windows that American and British English and other European languages use.

Microsoft Windows character encoding — Table 3. Microsoft Windows extended ASCII character encoding.

ASCII art

ASCII characters can be combined graphically to create an image. ASCII art is a common technique for creating graphical images on text-only media like a computer terminal or text-only printer. For example:

¯\_(ツ)_/¯

This is a simple example of ASCII art. Much more elaborate images are possible when using more lines and more characters, especially from extended ASCII character sets.

The FTP ascii command

The File Transfer Protocol (FTP) has an ascii command that is used to enable the transfer of ASCII-encoded files. When transferring files in ASCII mode in FTP, the receiving host may change the file so it will be formatted as ASCII on the destination host.

When FTP transfers files using the binary mode, those files are not changed in any way.

The future of ASCII character encoding

After more than half a century, and despite being subsumed into the Unicode standard, ASCII character encoding for universally accessible computer and network data is unrivaled. Given the need to preserve data stored over the past decades, ASCII will continue to be foundational for all computing.

Learn more about data storage management in general and how data retention policies are used to maintain access to data over the long term.

This was last updated in September 2021

Continue Reading About ASCII (American Standard Code for Information Interchange)

A developer guide to software localization

Encoding and decoding

Hone your PowerShell text manipulation skills

network scanning
Network scanning is a procedure for identifying active devices on a network by employing a feature or features in the network ...
networking (computer)
Networking, also known as computer networking, is the practice of transporting and exchanging data between nodes over a shared ...
What is SD-WAN (software-defined WAN)? Ultimate guide
Software-defined WAN is a technology that uses software-defined networking concepts to distribute network traffic across a wide ...

identity management (ID management)
Identity management (ID management) is the organizational process for ensuring individuals have the appropriate access to ...
fraud detection
Fraud detection is a set of activities undertaken to prevent money or property from being obtained through false pretenses.
single sign-on (SSO)
Single sign-on (SSO) is a session and user authentication service that permits a user to use one set of login credentials -- for ...

CIO

IT budget
IT budget is the amount of money spent on an organization's information technology systems and services. It includes compensation...
project scope
Project scope is the part of project planning that involves determining and documenting a list of specific project goals, ...
core competencies
For any organization, its core competencies refer to the capabilities, knowledge, skills and resources that constitute its '...

recruitment
Recruitment is the process of finding, screening, hiring and onboarding qualified job candidates.
Workday
Workday is a cloud-based software vendor that specializes in human capital management (HCM) and financial management applications.
recruitment management system (RMS)
A recruitment management system (RMS) is a set of tools designed to manage the employee recruiting and hiring process. It might ...

Customer Experience

martech (marketing technology)
Martech (marketing technology) refers to the integration of software tools, platforms, and applications designed to streamline ...
transactional marketing
Transactional marketing is a business strategy that focuses on single, point-of-sale transactions.
customer profiling
Customer profiling is the detailed and systematic process of constructing a clear portrait of a company's ideal customer by ...

Close