Idiosyncratic Ruby: Regex with Class

Ruby's regex engine defines a lot of shortcut character classes. Besides the common meta characters (\w, etc.), there are also POSIX-style expressions and the Unicode property syntax. This is an overview of all character classes:

Meta Chars

Char	Negation	ASCII	Unicode
`.`	-	¹ Any	¹ Any
`\X`	-	Any	Grapheme clusters (`\P{M}\p{M}*`)
`\d`	`\D`	`[0-9]`	² ASCII plus Decimal_Number (Nd)
`\h`	`\H`	`[0-9a-fA-F]`	Like ASCII
`\w`	`\W`	`[0-9a-zA-Z_]`	² ASCII plus Letter (LC / Ll / Lm / Lo / Lt / Lu), Mark (Mc / Me / Mn), Number (Nd / Nl / No), Connector_Punctuation (Pc)
`\s`	`\S`	`[ \t\r\v\n\f]`	² ASCII plus Separator (Zl / Zp / Zs)
`\R`	-	`[\n\v\f\r]`,`\r\n`	ASCII plus `\u0085` (Next Line), Line_Separator (Zl), Paragraph_Separator (Zp)

¹ Only matches line breaks when the /m flag is set
² These are ASCII-only by default; Unicode matching has to be turned on manually with the (?u) option
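
The two footnotes are easiest to see in action. The following is a minimal sketch, not an exhaustive test: the constant names and sample strings are made up for illustration, and it assumes UTF-8 strings on Ruby 2.4 or newer (for `match?`).

```ruby
# Footnote 1: "." skips "\n" unless the /m flag is set
DOT       = /a.b/
DOT_MULTI = /a.b/m

p "a\nb".match?(DOT)        # => false
p "a\nb".match?(DOT_MULTI)  # => true

# Footnote 2: \w (like \d and \s) stays ASCII-only until (?u) is switched on
ASCII_WORD   = /\w+/
UNICODE_WORD = /(?u)\w+/

p "Füße"[ASCII_WORD]    # => "F"
p "Füße"[UNICODE_WORD]  # => "Füße"

# \h only ever matches ASCII hex digits
HEX = /\h+/
p "0xBEEF".scan(HEX)    # => ["0", "BEEF"]

# \R treats "\r\n" as a single line break
LINEBREAK = /\R/
p "one\r\ntwo\nthree".split(LINEBREAK)  # => ["one", "two", "three"]

# \X matches a base character together with its combining marks
GRAPHEME = /\X/
p "e\u0301".scan(GRAPHEME).size         # => 1
```

Putting (?u) at the start of the pattern applies it to the whole expression, so the ASCII default can be kept everywhere else.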

POSIX and Unicode Property Style

POSIX	Negation	Property	Negation³	ASCII	Unicode
`[:alnum:]`	`[:^alnum:]`	`\p{Alnum}`	`\p{^Alnum}`	`[0-9a-zA-Z]`	Letter (LC / Ll / Lm / Lo / Lt / Lu), Mark (Mc / Me / Mn), Decimal_Number (Nd)
`[:alpha:]`	`[:^alpha:]`	`\p{Alpha}`	`\p{^Alpha}`	`[a-zA-Z]`	Letter (LC / Ll / Lm / Lo / Lt / Lu), Mark (Mc / Me / Mn)
`[:ascii:]`	`[:^ascii:]`	`\p{ASCII}`	`\p{^ASCII}`	`[\x00-\x7F]`	Like ASCII
`[:blank:]`	`[:^blank:]`	`\p{Blank}`	`\p{^Blank}`	`[ \t]`	`\t`, Space_Separator (Zs)
`[:cntrl:]`	`[:^cntrl:]`	`\p{Cntrl}`	`\p{^Cntrl}`	`[\x00-\x1F]`, `\x7F`	Other (Cc / Cf / Cn / Co / Cs)
`[:digit:]`	`[:^digit:]`	`\p{Digit}`	`\p{^Digit}`	`[0-9]`	ASCII plus Decimal_Number (Nd)
`[:graph:]`	`[:^graph:]`	`\p{Graph}`	`\p{^Graph}`	`[\x21-\x7E]`	ALL, EXCEPT: Separator (Zl / Zp / Zs), Control (Cc), Unassigned (Cn), Surrogate (Cs)
`[:lower:]`	`[:^lower:]`	`\p{Lower}`	`\p{^Lower}`	`[a-z]`	Lowercase_Letter (Ll)
`[:print:]`	`[:^print:]`	`\p{Print}`	`\p{^Print}`	`[\x20-\x7E]`	ALL, EXCEPT: Line_Separator (Zl), Paragraph_Separator (Zp), Control (Cc), Unassigned (Cn), Surrogate (Cs)
`[:punct:]`	`[:^punct:]`	`\p{Punct}`	`\p{^Punct}`	``[!-/:-@\[-`{-~]``	Punctuation (Pc / Pd / Pe / Pf / Pi / Po / Ps)
`[:space:]`	`[:^space:]`	`\p{Space}`	`\p{^Space}`	`[ \t\r\v\n\f]`	ASCII plus Separator (Zl / Zp / Zs)
`[:upper:]`	`[:^upper:]`	`\p{Upper}`	`\p{^Upper}`	`[A-Z]`	Uppercase_Letter (Lu)
`[:xdigit:]`	`[:^xdigit:]`	`\p{XDigit}`	`\p{^XDigit}`	`[0-9a-fA-F]`	Like ASCII
`[:word:]`	`[:^word:]`	`\p{Word}`	`\p{^Word}`	`[0-9a-zA-Z_]`	ASCII plus Letter (LC / Ll / Lm / Lo / Lt / Lu), Mark (Mc / Me / Mn), Number (Nd / Nl / No), Connector_Punctuation (Pc)

³ An alternative way of negating Unicode properties is \P{Property}
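
One thing the table does not show: a POSIX class only works inside a bracket expression, so it has to be written as `[[:alpha:]]`, not `[:alpha:]`. Here is a small sketch comparing the notations. The constant names and sample strings are invented for illustration, and UTF-8 strings are assumed.

```ruby
# POSIX bracket and property syntax match the same Unicode-aware set
ALPHA_POSIX    = /[[:alpha:]]+/
ALPHA_PROPERTY = /\p{Alpha}+/

p "naïve 42"[ALPHA_POSIX]     # => "naïve"
p "naïve 42"[ALPHA_PROPERTY]  # => "naïve"

# Footnote 3: both spellings of the negated property behave the same
NOT_ALPHA_CARET = /\p{^Alpha}+/
NOT_ALPHA_UPPER = /\P{Alpha}+/

p "naïve 42"[NOT_ALPHA_CARET]  # => " 42"
p "naïve 42"[NOT_ALPHA_UPPER]  # => " 42"

# Unlike \d, the POSIX digit class matches non-ASCII digits out of the box
DIGIT_META  = /\d/
DIGIT_POSIX = /[[:digit:]]/
arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE

p arabic_three.match?(DIGIT_META)   # => false
p arabic_three.match?(DIGIT_POSIX)  # => true
```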

More Properties

The above groups are only the tip of the iceberg. Using the \p{} syntax, you can match many more Unicode properties. See Episode 41: Proper Unicoding for details!
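
As a quick taste, here is a sketch using a general category and a script name with \p{}. The constant names and sample strings are only illustrative, and UTF-8 strings are assumed.

```ruby
# General category: Uppercase_Letter
UPPERCASE = /\p{Lu}/
p "Idiosyncratic Ruby".scan(UPPERCASE)  # => ["I", "R"]

# Script: Greek
GREEK = /\p{Greek}+/
p "Ruby αβγ"[GREEK]                     # => "αβγ"
```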
