Regexes, the go-to mechanism for string matching, must not only be written but also applied. This episode acts as a reference with some style advice for working with regular expressions in Ruby. If you are looking for resources on writing the actual regexes, take a look at the link collection at the bottom.
What do you Want to Achieve?
- 1 - Task: Check if Regex Matches
- 2 - Task: Find Single/First Occurrence
- 3 - Task: Find All Occurrences
- 4 - Task: Replace
- Special Task: Split String Into Array
- Special Task: Filter Array of Strings
- Special Task: Partition String
1 - Task: Check if Regex Matches
String#match? has been the preferred way to check for a match since Ruby 2.4. It only returns true or false and does not store any match data, which makes it faster:
"string".match?(/1.3/) # => false
"123".match?(/1.3/) # => true
The =~ match operator is baked into Ruby's syntax, although its return value is rather special: it is the codepoint index in the string where the match occurred, or nil otherwise. However, it is wise to use it only for its truthy/falsey value and to use the more self-explanatory String#index method when you need the position. Unlike the previous match? approach, match data is set accordingly (this is the case with all other ways of matching); see the next section, "Find Single/First Occurrence", for ways to access it. Here is the example:
"string" =~ /1.3/ # => nil
"123" =~ /1.3/ # => 0
The match operator's sibling is !~ which negates the match result:
"string" !~ /1.3/ # => true
"123" !~ /1.3/ # => false
More complicated matching can involve capture groups. Depending on the reference style (named or numbered), the way you access them differs. With numbered groups, the special variables $1, $2, and so on contain the matches:
"String with 42 things" =~ /(\d+) things/
$1 # => "42"
With named groups, the match data object $~ contains the matches:
"String with 42 things" =~ /(?<thing_count>\d+) things/
$~[:thing_count] # => "42"
Note that regex matching with named captures can implicitly create local variables (this happens when a regex literal is the left operand of =~). This is extremely confusing, so you should rather use the above syntax, which is clearer yet still concise.
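The implicit local variable behavior is easiest to understand side by side with the explicit version. This is a sketch only; both method names are made up for illustration:

```ruby
# Hypothetical method names, used only to contrast the two styles.
def thing_count_via_implicit_local
  # A regex literal on the left-hand side of =~ assigns each named
  # group to a local variable of the same name.
  /(?<thing_count>\d+) things/ =~ "String with 42 things"
  thing_count
end

def thing_count_via_match_data
  # Explicit alternative: ask the MatchData object for the group.
  match = "String with 42 things".match(/(?<thing_count>\d+) things/)
  match && match[:thing_count]
end

p thing_count_via_implicit_local # => "42"
p thing_count_via_match_data     # => "42"
```

With the operands swapped (string on the left, regex on the right), no local variable is created, which is part of what makes the implicit form surprising.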
1c) Case Compare
Regexp's === operator is also mapped to matching strings (it returns true or false). However, although it should not be used directly¹, it allows you to write very expressive and readable case statements²:
case variable = "string or number"
when /\A\d+\z/
  variable.to_i
when /\A\d+\.\d+\z/
  variable.to_f
else
  variable.to_s
end
¹ The reason: It depends on the order of both operands. The regex must come first, which is rather unintuitive. String's === operator has different semantics: it just compares two strings.
² For more general documentation about equalness in Ruby, check out Episode 55: Struggling Four Equality.
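The operand order issue from the first footnote, and the case statement wrapped into a reusable method, might look like the following sketch. The method name coerce is an assumption introduced for this example:

```ruby
# Operand order matters: Regexp#=== matches, String#=== only compares.
p(/\A\d+\z/ === "42") # => true
p("42" === /\A\d+\z/) # => false

# Hypothetical helper: case/when calls Regexp#=== with the regex first.
def coerce(value)
  case value
  when /\A\d+\z/      then value.to_i
  when /\A\d+\.\d+\z/ then value.to_f
  else                     value.to_s
  end
end

p coerce("42")  # => 42
p coerce("4.2") # => 4.2
p coerce("abc") # => "abc"
```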
2 - Task: Find Single/First Occurrence
A very readable way to return the matched part of the string is:
"String with 42 things"[/\d\d/] # => "42"
You can also use capture groups here:
"String with 42 things"[/\d(\d)/, 1] # => "2"
"String with 42 things"[/(?<first>\d)\d/, :first] # => "4"
If you prefer the =~ syntax, you can retrieve the matched string with the special variable $&:
"String with 42 things" =~ /\d+/
$& # => "42"
Worth mentioning is the special behavior of String#rindex. It starts the match process at the right end of the string and returns the first index where a match is possible:
"String with 42, sorry with 23 things".rindex(/\d+/) # => 28
$& # => "3"
Note that it only matches "3", not "23". If you want to match an expression in relation to the end of the string, you could use a positive lookahead in combination with the \z anchor:
"String with 42, sorry with 23 things" =~ /\d+(?=\D*\z)/
$& # => "23"
3 - Task: Find All Occurrences
Your friend is the scan method which returns an array of all results:
"String with 42, sorry with 23 things".scan /\d+/ # => ["42", "23"]
4 - Task: Replace
The usual string replacement tool is gsub (global substitution) which replaces all matching occurrences of the regex. Should you only want to replace the first occurrence, use the sub method instead.
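The difference between the two methods, shown on a string with two matches (the placeholder "N" is arbitrary):

```ruby
text = "String with 42, sorry with 23 things"

first_only = text.sub(/\d+/, "N")  # replaces only the first match
all        = text.gsub(/\d+/, "N") # replaces every match

p first_only # => "String with N, sorry with 23 things"
p all        # => "String with N, sorry with N things"
```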
4a) String#gsub with String Argument
"String with 42 things".gsub /\d+/, "23" # => "String with 23 things"
You can use back references in the replacement string.
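A brief sketch of what such back references look like. Single-quoted replacement strings are used here so that the backslashes do not need to be doubled:

```ruby
text = "String with 42 things"

# \1 refers to the first numbered group
p text.gsub(/(\d+)/, '<\1>') # => "String with <42> things"

# \k<name> refers to a named group
p text.gsub(/(?<count>\d+)/, '[\k<count>]') # => "String with [42] things"

# \0 refers to the whole match
p text.gsub(/\d+/, '\0\0') # => "String with 4242 things"
```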
4b) String#gsub with Block
"String with 42 things".gsub(/\d+/) do
  $&.to_i + 1
end # => "String with 43 things"
You can use Perl-style regex globals in the replacement block.
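For instance, the numbered group variables $1 and $2 are updated for every match while the block runs. The sample sentence below is made up for illustration:

```ruby
text = "String with 42 things and 7 items"

# Swap number and noun for every match, using the group globals
swapped = text.gsub(/(\d+) (\w+)/) { "#{$2} (#{$1})" }
p swapped # => "String with things (42) and items (7)"

# Regexp.last_match is the non-cryptic equivalent of the globals
doubled = text.gsub(/\d+/) { (Regexp.last_match(0).to_i * 2).to_s }
p doubled # => "String with 84 things and 14 items"
```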
Special Task: Split String Into Array
Splitting a string along a separator is the main way of converting it into a useful array:
array = "String with 42\nthings".split(/\s+/) # => ["String", "with", "42", "things"]
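Two variations of split are easy to overlook: capture groups in the separator regex keep the separators in the result, and an optional second argument limits the number of parts. A short sketch:

```ruby
# Separators are dropped by default
p "a1b2c".split(/\d/) # => ["a", "b", "c"]

# A capture group keeps the separators in the result
p "a1b2c".split(/(\d)/) # => ["a", "1", "b", "2", "c"]

# The limit argument stops splitting after the given number of parts
p "a1b2c".split(/\d/, 2) # => ["a", "b2c"]
```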
Special Task: Filter Array of Strings
The Enumerable#grep method allows you to do so:
["String", "with", "42", "things"].grep(/\d/) # => ["42"]
There is also Enumerable#grep_v which returns all elements that do not match (think #reject):
["String", "with", "42", "things"].grep_v(/\d/) # => ["String", "with", "things"]
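Both methods also accept a block, which transforms every selected element, so filtering and mapping happen in one step. A small sketch:

```ruby
words = ["String", "with", "42", "things"]

# Select the matching elements and convert them in one go
p words.grep(/\d/) { |word| word.to_i } # => [42]

# Same idea for the non-matching elements
p words.grep_v(/\d/, &:upcase) # => ["STRING", "WITH", "THINGS"]
```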
Special Task: Partition String
Ruby's String#partition divides a string into an array consisting of three elements:
- The string before the regex match
- The regex match
- The string after the regex match
parts = "String with 42 things".partition(/\d+/)
parts # => ["String with ", "42", " things"]
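Since the result always has three elements, it works well with multiple assignment, even when nothing matches. A minimal sketch (the variable names are arbitrary):

```ruby
before, match, after = "String with 42 things".partition(/\d+/)
p before # => "String with "
p match  # => "42"
p after  # => " things"

# Without a match, the whole string ends up in the first element
p "No digits here".partition(/\d+/) # => ["No digits here", "", ""]
```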
Note that you can get the same result using the special pre- and post-match variables:
"String with 42 things" =~ /\d+/
parts = [$`, $&, $'] # => ["String with ", "42", " things"]
- RDoc: Regular expressions - Ruby regex documentation
- RDoc: Regexp - Class docs (overlaps with general regex docs)
- Episode 11: Regular Extremism - Collection of advanced regex syntaxes
- Episode 21: Uniform Resource Matching - URL regex included in Ruby's standard library
- Episode 30: Regex with Class - Overview of Unicode and POSIX-style character classes
- Episode 41: Proper Unicoding - Regex Unicode Property syntax
- re method (part of irb.tools) - Displays first match (including capture groups) in the terminal
- Rubular - Online regex testing
- Onigmo - Upstream repository of Ruby's regex engine