Idiosyncratic Ruby: Meta Escape Control

Double-quoted strings can not only be used with interpolation, #{}, they also support various escape sequences, which are initiated with \. Escape sequences allow you to embed raw byte and codepoint values. Furthermore, there are shortcuts for common formatting and control characters.

Byte Sequences

There are two basic ways in which you can specify raw bytes to embed: \x00 (hexadecimal) or \000 (octal):

"\x20" # => " " # space
"\xab" # => "\xAB" # byte value 171
"\033" # => "\e" # escape
"\0"   # => "\u0000" # null byte

Meta Escapes

There is a meta escape syntax "\M-x" with x being a byte value. If the byte is below 128 (\x80), it will add 128, otherwise it will return the same value. Differently put: It will return the same byte value with 8th bit set. The x value can be escaped again:

"A".unpack("C")    # => [65]
"\M-A".unpack("C") # => [193]
"\M-\x01" # => "\x81"
"\M-\x81" # => "\x81"

Control Escapes

Another legacy syntax is the control escape syntax: "\C-x" (or "\cx") with x being a byte value. It will return the value of the 5 least significant bits, so the value will also between 0 and 31. The x value can be escaped again, or combined with meta escapes:

"\C-\x01" # => "\u0001"
"\C-!"    # => "\u0001"
"\C-A"    # => "\u0001"
"\M-\C-A" # => "\x81"
"\C-\M-A" # => "\x81"

Unicode Codepoints

Unicode characters are represented by codepoint values. If you know the numerical codepoint value, you can embed it in a double-quoted string using \u. You must use exactly 4 digits of the hexadecimal representation of the value, but casing is not relevant:

"\u0020" # => " " # space
"\u00A0" # => " " # no-break space
"\u203d" # => "‽" # interrobang

The \u syntax supports a more explicit {} flavor:

"\u{9}" # => "\t" # tab
"\u{2602}" # => "☂" # umbrella

The \u{} syntax is required if you want to display codepoints which need more than four hexadecimal digest, for example, U+1F6A1 AERIAL TRAMWAY:

"\u{1F6A1}" # "🚡"

It also allows you to specify multiple characters at once:

"\u{49 64 69 6f 73 79 6e 63 72 e4 74 69 63 20 52 75 62 79}"
# => "Idiosyncrätic Ruby"

Control and Formatting Characters

Some byte values used for common control/formatting characters have escape sequences:

Escape	Byte Value	Description
`\a`	7	terminal bell
`\b`	8	backspace
`\t`	9	tab
`\n`	10	newline
`\v`	11	vertical tab
`\f`	12	form feed
`\r`	13	carriage return
`\e`	27	start escape sequence
`\s`	32	(space)

More Idiosyncratic Ruby

Please Comment on GitHub
Next Article: Less Feature-Rich, More Fun
Previous Article: Escape Back Referencing