For an internal project at work, I recently had to parse the names of Heroku review applications to retrieve some data. The application names looked like this:
<project_name>-pr-<pull_request_id>
At first, since each part I needed was separated by a dash, I had some code that looked like this:
*project_name, _, pull_request_id = application_name.split('-')
project_name = project_name.join('-')
Because the project name could also contain dashes, I needed to rejoin it after extracting the pull request data. For a prototype, this worked fine. But when this internal project became an important part of my team's tooling, I started looking for a better and cleaner way to achieve the same result.
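To make the split-and-rejoin approach concrete, here is a minimal sketch of what happens when the project name itself contains dashes. The application name `my-cool-app-pr-1234` is a made-up example, not one from the actual project.

```ruby
# Hypothetical application name whose project part contains dashes.
application_name = 'my-cool-app-pr-1234'

# The splat collects every leading segment into an array, `_` discards the
# literal "pr" segment, and the last segment is the pull request ID.
*project_name, _, pull_request_id = application_name.split('-')
# At this point project_name is ["my", "cool", "app"].

# Rejoin the segments to rebuild the original project name.
project_name = project_name.join('-')

puts project_name     # prints "my-cool-app"
puts pull_request_id  # prints "1234"
```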
Since we were already validating the format of the application name with a regular expression, I figured I'd use it to also retrieve the data using named captures.
Regular expressions in Ruby
For a refresher on regular expressions, I highly recommend this article[1] by Dan Eden.
As a reminder, there are multiple ways to create regular expressions in Ruby:
- Using slashes: /xxxx/
- Using a percent literal: %r{}
- Using the class initializer: Regexp.new
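As a small illustration, the three forms can build the same regular expression. The pattern and variable names below are only examples chosen for this sketch.

```ruby
# Three equivalent ways to build the same regular expression.
with_slashes = /-pr-\d+/
with_percent = %r{-pr-\d+}
with_new     = Regexp.new('-pr-\d+') # single quotes keep the backslash as-is

puts with_slashes == with_percent  # prints "true"
puts with_slashes == with_new      # prints "true"

# The percent literal is handy when the pattern contains slashes,
# because they do not need to be escaped.
path_pattern = %r{/apps/\d+}
puts path_pattern.match?('/apps/42')  # prints "true"
```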
With your newly created regular expression, there are two main ways to check whether a string matches it:
- Calling String#match with the regular expression as argument:
'abc'.match(/a/)
# => #<MatchData "a">
- Calling Regexp#match on the regular expression with the string as argument:
/a/.match('abc')
# => #<MatchData "a">
If the string matches the regular expression, match returns a MatchData object; otherwise it returns nil. The MatchData object encapsulates the result of matching a String against a Regexp, including the different submatches. It also contains any captures and named captures.
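Here is a short sketch of both outcomes, using an example pattern and made-up application names:

```ruby
pattern = /-pr-\d+/

# A successful match returns a MatchData object.
match = pattern.match('my_app-pr-1234')
puts match.class      # prints "MatchData"
puts match[0]         # prints "-pr-1234" (the whole matched text)
puts match.pre_match  # prints "my_app" (the text before the match)

# A failed match returns nil, so check for it before reading any data.
no_match = pattern.match('my_app-staging')
puts no_match.inspect # prints "nil"
```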
Named captures
Named captures allow you to describe submatches of a regular expression and then retrieve them from the resulting MatchData object. In our case, our regular expression looked like this:
/.*-pr-\d+/
To use named captures, we first need to add capture groups to our regular expression. Adding a capture group is as simple as wrapping part of the pattern in parentheses:
/(.*)-pr-(\d+)/
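Before any names are added, these groups are already available by position. A minimal sketch, with an example application name:

```ruby
matches = /(.*)-pr-(\d+)/.match('my_app-pr-1234')

# Unnamed groups are numbered from 1, in order of their opening parenthesis.
puts matches[1]                # prints "my_app"
puts matches[2]                # prints "1234"
puts matches.captures.inspect  # prints ["my_app", "1234"]
```

Positional access works, but the indexes say nothing about what each group holds, which is the problem names solve.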
Finally, name the different captures. To do this, we need to prefix the content of each capture group with ?<name>, where name is the name we want to give it:
/(?<project_name>.*)-pr-(?<pull_request_id>\d+)/
Now that we've done this, we can easily retrieve the data we want from the application name using the resulting MatchData object:
expression = /(?<project_name>.*)-pr-(?<pull_request_id>\d+)/
application_name = 'my_app-pr-1234'
matches = expression.match(application_name)
matches[:project_name] # => 'my_app'
matches.named_captures # => {"project_name"=>"my_app", "pull_request_id"=>"1234"}
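Since match returns nil when the name has the wrong format, it can help to wrap the lookup in a small method. The sketch below is one possible shape, not the code from the actual project: the method name `parse_review_app_name` is hypothetical, and the `\A` and `\z` anchors and the integer conversion are additions made for this example.

```ruby
# Hypothetical helper: returns a hash of the parsed parts, or nil when the
# application name does not look like a review application.
def parse_review_app_name(application_name)
  # Same named captures as above; the \A and \z anchors are an assumption
  # added here so that the whole string has to match.
  pattern = /\A(?<project_name>.*)-pr-(?<pull_request_id>\d+)\z/

  matches = pattern.match(application_name)
  return nil if matches.nil?

  {
    project_name: matches[:project_name],
    pull_request_id: matches[:pull_request_id].to_i # captures are strings
  }
end

parsed = parse_review_app_name('my-cool-app-pr-1234')
puts parsed[:project_name]     # prints "my-cool-app"
puts parsed[:pull_request_id]  # prints "1234"

puts parse_review_app_name('not-a-review-app').inspect  # prints "nil"
```

Because `.*` is greedy, dashes in the project name are captured without any rejoining step.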
Thanks to Bachir Çaoui, Alexis Woo and Alexis Focheux for reviewing draft versions of this post.
Footnotes
[1] While the article is intended for designers and UX writers, I found it to be an excellent introduction to regular expressions for everyone.