Idiosyncratic Ruby: What the Regex?

Regexes, the go-to mechanism for string matching, must not only be written, but also applied. This episode acts as a reference with some style advice for working with regular expressions in Ruby. If you are looking for resources on writing the actual regexes, take a look at the link collection at the bottom.

1 - Task: Check if Regex Matches

1a) `match?`

This is the preferred way to check for a match since Ruby 2.4. It only returns true or false and, for better performance, does not store any match data:

"string".match? /1.3/ # => false
"123".match? /1.3/ # => true
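The difference from the other ways of matching becomes visible when you inspect `$~` afterwards. A minimal sketch; the two helper methods are made up purely for illustration and rely on `$~` being local to each method call:

```ruby
# Hypothetical helpers, only for illustration.
# $~ is local to each method call, so both start out with $~ == nil.

def last_match_after_predicate
  "123".match?(/1.3/) # returns true, but records no match data
  $~
end

def last_match_after_operator
  "123" =~ /1.3/ # returns 0 and records match data
  $~
end

last_match_after_predicate # => nil
last_match_after_operator  # => #<MatchData "123">
```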

1b) `=~`

This method is baked into Ruby's syntax, although its return value is rather special: It is the codepoint index in the string where the match occurred, or nil otherwise. However, it is a wise choice to only use it for its truthy/falsey value and to use the more self-explanatory String#index method otherwise. Unlike the previous match? approach, match data is set (this is the case with all other ways of matching); see the next section, "Find Single/First Occurrence", for ways to access it. Here is the example:

"string" =~ /1.3/ # => nil (falsey)
"123" =~ /1.3/ # => 0 (truthy)
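When the position itself is what you are after, String#index states that intent more directly. A short sketch (the variable names are only for illustration):

```ruby
text = "String with 42 things"

via_operator = text =~ /\d+/       # => 12
via_index    = text.index(/\d+/)   # => 12
no_match     = text.index(/xyz/)   # => nil
```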

The match operator's sibling is !~ which negates the match result:

"string" !~ /1.3/ # => true
"123" !~ /1.3/ # => false

More complicated matching can involve capture groups. Depending on the reference style (named or numbered), the way you can access it differs:

Numbered: `$1-$9`

The Perlish special variables contain the matches:

"String with 42 things" =~ /(\d+) things/
$1 # => "42"

Named: `$~`

The match data object contains the matches:

"String with 42 things" =~ /(?<thing_count>\d+) things/
$~[:thing_count] # => "42"

Note that regex matching with named captures can implicitly create local variables (this happens when a regex literal is on the left-hand side of =~). This is extremely confusing and you should rather use the above syntax, which is clearer, yet still concise.
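For reference, this is roughly what the implicit behavior looks like next to the explicit `$~` style. A sketch with illustrative names (`thing_count`, `other_count`, `implicit`, `explicit`):

```ruby
# Regex literal on the left-hand side: the named group becomes a local variable
if /(?<thing_count>\d+) things/ =~ "String with 42 things"
  implicit = thing_count # => "42"
end

# String on the left-hand side: no local variable is created,
# the capture is read from the match data instead
"String with 42 things" =~ /(?<other_count>\d+) things/
explicit = $~[:other_count] # => "42"
```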

1c) Case Compare

Regexp's === operator is also mapped to matching strings (it returns true or false). Although it should not be used directly¹, it allows you to write very expressive and readable case statements²:

case variable = "string or number"
when /\A\d+\z/
  variable.to_i
when /\A\d+\.\d+\z/
  variable.to_f
else
  variable.to_s
end

¹ The reason: The result depends on the order of the operands. The regex must come first, which is rather unintuitive. String's === operator has different semantics: it just compares two strings.
² For more general documentation about equality in Ruby, check out Episode 55: Struggling Four Equality.
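The operand order mentioned in the first footnote can be checked quickly. A small sketch (variable names are illustrative):

```ruby
regex_first  = /\A\d+\z/ === "42" # => true  (Regexp#=== performs a match)
string_first = "42" === /\A\d+\z/ # => false (String#=== compares for equality)
```

A case statement calls === on the value given to `when`, so the regex always ends up on the left there.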

2 - Task: Find Single/First Occurrence

2a) String#[]

A very readable way to return the matched part of the string is:

"String with 42 things"[/\d\d/] # => "42"

You can also use capture groups here:

"String with 42 things"[/\d(\d)/, 1] # => "2"
"String with 42 things"[/(?<first>\d)\d/, :first] # => "4"

2b) `=~` + `$&`

If you prefer the =~ syntax, you can retrieve the matched string with the special variable $&:

"String with 42 things" =~ /\d+/
$& # => "42"

2c) String#rindex

Worth mentioning is the special behavior of String#rindex. It starts the match process on the right side of the string and returns the first index where a match is possible:

"String with 42, sorry with 23 things".rindex /\d+/
$& # => "3"

Note that it does not match "23", but "3". If you want to match an expression in relation to the end of the string, you could use a positive lookahead in combination with \z:

"String with 42, sorry with 23 things" =~ /\d+(?=\D*\z)/
$& # => "23"
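Comparing the returned indices makes the difference between the two approaches easier to see. A sketch with illustrative variable names:

```ruby
text = "String with 42, sorry with 23 things"

from_right   = text.rindex(/\d+/)       # => 28, the position of "3"
right_match  = $&                       # => "3"

anchored     = text =~ /\d+(?=\D*\z)/   # => 27, the position of "23"
anchor_match = $&                       # => "23"
```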

3 - Task: Find All Occurrences

3) String#scan

Your friend is the scan method which returns an array of all results:

"String with 42, sorry with 23 things".scan /\d+/ # => ["42", "23"]
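If the regex contains capture groups, scan returns one array of captures per match instead of the matched strings. A brief sketch (variable names are illustrative):

```ruby
text = "String with 42, sorry with 23 things"

without_groups = text.scan(/\d+/)      # => ["42", "23"]
with_groups    = text.scan(/(\d)(\d)/) # => [["4", "2"], ["2", "3"]]
```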

4 - Task: Replace

The usual string replacement tool is gsub (global substitution) which replaces all matching occurrences of the regex. Should you only want to replace the first occurrence, use the sub method instead.
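The difference between the two shows up as soon as there is more than one match. A short sketch (the replacement "X" and the variable names are arbitrary):

```ruby
text = "String with 42, sorry with 23 things"

first_only = text.sub(/\d+/, "X")  # => "String with X, sorry with 23 things"
all        = text.gsub(/\d+/, "X") # => "String with X, sorry with X things"
```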

4a) String#gsub with String Argument

"String with 42 things".gsub /\d+/, "23" # => "String with 23 things"

You can use back references in the replacement string.
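A sketch of what such back references can look like. Single-quoted strings are used so that the backslashes reach gsub unchanged; the group name `count` and the variable names are only examples:

```ruby
text = "String with 42 things"

whole    = text.gsub(/\d+/, '<\0>')                  # => "String with <42> things"
numbered = text.gsub(/(\d+)/, '<\1>')                # => "String with <42> things"
named    = text.gsub(/(?<count>\d+)/, '<\k<count>>') # => "String with <42> things"
```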

4b) String#gsub with Block

"String with 42 things".gsub /\d+/ do
  $&.to_i + 1
end # => "String with 43 things"

You can use Perl-style regex globals in the replacement block.
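The numbered capture variables work inside the block as well. A small sketch with a made-up input string:

```ruby
doubled = "2 apples and 3 pears".gsub(/(\d+) (\w+)/) do
  # $1 is the number, $2 the word of the current match
  "#{$2}: #{$1.to_i * 2}"
end
# => "apples: 4 and pears: 6"
```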

Special Task: Split String Into Array

Splitting a string along a separator is the main way of converting it into a useful array:

array = "String with     42\nthings".split(/\s+/)
# => ["String", "with", "42", "things"]

Special Task: Filter Array of Strings

The Enumerable#grep method allows you to do so:

["String", "with", "42", "things"].grep(/\d/) # => ["42"]

There is also Enumerable#grep_v which returns all elements that do not match (think #reject):

["String", "with", "42", "things"].grep_v(/\d/) # => ["String", "with", "things"]

Special Task: Partition String

Ruby's String#partition divides a string into an array consisting of three elements:

The string before the regex match
The regex match
The string after the regex match

parts = "String with 42 things".partition(/\d+/)
parts # => ["String with ", "42", " things"]

Note that you can get the same result using the special pre- and post-match variables:

"String with 42 things" =~ /\d+/
parts = [$`, $&, $'] # => ["String with ", "42", " things"]
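If you would rather avoid the special variables, the match data object offers the same pieces through methods. A minimal sketch (variable names are illustrative):

```ruby
match = "String with 42 things".match(/\d+/)

parts = [match.pre_match, match[0], match.post_match]
# => ["String with ", "42", " things"]
```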

Regex Resources

RDoc: Regular expressions - Ruby regex documentation
RDoc: Regexp - Class docs (overlaps with general regex docs)
Episode 11: Regular Extremism - Collection of advanced regex syntaxes
Episode 21: Uniform Resource Matching - URL regex included in Ruby's standard library
Episode 30: Regex with Class - Overview of Unicode and POSIX-style character classes
Episode 41: Proper Unicoding - Regex Unicode Property syntax (\p{})
re method (part of irb.tools) - Displays first match (including capture groups) in the terminal
Rubular - Online regex testing
Onigmo - Upstream repository of Ruby's regex engine
