How come Ruby has two ASCII encodings?
Encoding.name_list.grep(/ASCII/) # => ["ASCII-8BIT", "US-ASCII"]
Which one is the normal one you should use for ASCII?
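One way to find out is to ask Ruby what the plain name "ASCII" resolves to. This is a minimal sketch, assuming the Encoding::ASCII constant and the "ASCII" lookup name are registered as they are in a standard Ruby build; the variable names are only illustrative:

```ruby
# Look up the encoding registered under the plain name "ASCII".
plain_ascii = Encoding.find("ASCII")

# The lookup result and the Encoding::ASCII constant should both be
# the very same object as Encoding::US_ASCII.
same_by_name     = plain_ascii.equal?(Encoding::US_ASCII)
same_by_constant = Encoding::ASCII.equal?(Encoding::US_ASCII)

puts same_by_name
puts same_by_constant
```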
It turns out that ASCII is just an alias for US-ASCII, so US-ASCII is the one to use. But then what is ASCII-8BIT for? The Encoding RDoc offers some help:
Encoding::ASCII_8BIT is a special encoding that is usually used for a byte string, not a character string. But as the name insists, its characters in the range of ASCII are considered as ASCII characters. This is useful when you use ASCII-8BIT characters with other ASCII compatible characters.
So basically, it is not a real character encoding: it represents an arbitrary stream of bytes (each with a value between 0 and 255). Use it for raw byte streams, or when you want to make clear that you do not know a string's encoding!
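The "arbitrary stream of bytes" idea can be sketched as follows. The byte values and variable names are made up for illustration; Array#pack with "C*" and String#b are standard Ruby methods that produce ASCII-8BIT strings:

```ruby
# Pack four arbitrary byte values (0..255) into a string.
# pack("C*") yields a binary (ASCII-8BIT) string.
raw = [0, 127, 128, 255].pack("C*")

binary_encoded = raw.encoding == Encoding::ASCII_8BIT
always_valid   = raw.valid_encoding? # any byte sequence is valid in ASCII-8BIT

# String#b returns a copy tagged as ASCII-8BIT without changing the bytes.
text       = "\u00e9" # "é", which takes two bytes in UTF-8
text_bytes = text.b

puts binary_encoded
puts always_valid
puts text.length       # counted in characters
puts text_bytes.length # counted in bytes, since every byte stands on its own
```

Seen this way, ASCII-8BIT works less like a character set and more like a label saying "these are just bytes".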
The ASCII charset only needs 7 bits, so in strict ASCII the 8th bit should never be set, and the allowed byte values range from 0 to 127. This is what the US-ASCII encoding is all about: it is used when dealing with ASCII-encoded strings. Think: "ASCII-7BIT".
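The 7-bit limit also shows up when converting text. This is a small sketch with illustrative variable names, using String#encode to convert to US-ASCII:

```ruby
# Pure ASCII text converts to US-ASCII without trouble: every byte is below 128.
plain     = "hello".encode("US-ASCII")
seven_bit = plain.bytes.all? { |byte| byte < 128 }

# A character outside the 7-bit range has no representation in US-ASCII,
# so the conversion raises instead of producing a byte with the 8th bit set.
conversion_failed =
  begin
    "\u00e9".encode("US-ASCII")
    false
  rescue Encoding::UndefinedConversionError
    true
  end

puts seven_bit
puts conversion_failed
```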
A simple example illustrating the difference:
out_of_ascii_range = 128.chr # => "\x80"
out_of_ascii_range.force_encoding("US-ASCII").valid_encoding? # => false
out_of_ascii_range.force_encoding("ASCII-8BIT").valid_encoding? # => true
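The RDoc remark about combining ASCII-8BIT with other ASCII-compatible strings can be explored in a similar way. The sketch below assumes a UTF-8 string as the "other" side; the variable names are hypothetical:

```ruby
utf8 = "\u00e9" # non-ASCII UTF-8 text

# A binary string that only contains ASCII-range bytes mixes fine with UTF-8.
ascii_bytes = "abc".b
combined    = ascii_bytes + utf8
mixes       = combined.encoding == Encoding::UTF_8

# A binary string containing a high byte is not compatible with
# non-ASCII UTF-8 text, so concatenation raises.
high_byte = 255.chr
clash =
  begin
    high_byte + utf8
    false
  rescue Encoding::CompatibilityError
    true
  end

puts mixes
puts clash
```

As long as the binary string stays within the ASCII range, Ruby treats its bytes as ASCII characters and lets the two strings be combined.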