Using named captures to extract information from Ruby strings

For an internal project at work, I recently had to parse the names of Heroku review applications to retrieve some data. The application names looked like this:

<project_name>-pr-<pull_request_id>

At first, since each part I needed was separated by a dash, I had some code that looked like this:

*project_name, _, pull_request_id = application_name.split('-')
project_name = project_name.join('-')

Because the project name could itself contain dashes, I needed to rejoin it after extracting the pull request data. For a prototype, this worked fine. But when this internal project became an important part of my team's tooling, I started looking for a cleaner way to achieve the same result.
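To see why the rejoin step matters, here is the split-based approach applied to a project name that contains dashes. The application name below is a made-up value used only for illustration.

```ruby
# Hypothetical application name whose project part contains dashes.
application_name = 'my-cool-app-pr-42'

# The splat collects every leading segment, `_` swallows the literal "pr",
# and the last segment is the pull request id.
*project_name, _, pull_request_id = application_name.split('-')
project_name # => ["my", "cool", "app"]

# The project name was split on its own dashes, so it has to be rejoined.
project_name = project_name.join('-')
project_name    # => "my-cool-app"
pull_request_id # => "42"
```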

Since we were already validating the format of the application name with a regular expression, I figured I'd use it to also retrieve the data using named captures.

Regular expressions in Ruby

For a refresher on regular expressions, I highly recommend this article^[1] by Dan Eden.

As a reminder, there are multiple ways to create regular expressions in Ruby:

Using the literal syntax: /xxxx/
Using the percent literal: %r{}
Using the class initializer: Regexp.new
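As a quick sketch, the three forms above can build the same pattern. The pattern and sample strings here are illustrative only.

```ruby
# Three ways to build the same regular expression.
literal     = /-pr-\d+/
percent     = %r{-pr-\d+}
constructed = Regexp.new('-pr-\d+') # single quotes keep the backslash as-is

# Regexp#== compares the source and the options, so all three are equal.
literal == percent     # => true
literal == constructed # => true

# %r{} is convenient when the pattern contains slashes: no escaping needed.
with_slashes = %r{/pulls/\d+}
with_slashes.match?('/pulls/42') # => true
```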

Once you have a regular expression, there are two main ways to check whether a string matches it:

Calling String#match with the regular expression as argument:

'abc'.match(/a/)
# => #<MatchData "a">

Calling Regexp#match on the regular expression with the string as argument:

/a/.match('abc')
# => #<MatchData "a">

If the string matches the regular expression, match returns a MatchData object; otherwise it returns nil. The MatchData object encapsulates the result of matching a String against a Regexp, including the different submatches. It also holds any captures and named captures.
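Because a failed match returns nil rather than an empty MatchData, it is worth checking the result before reading from it. A minimal sketch, where first_number is a hypothetical helper written only for this example:

```ruby
# Returns the first run of digits in `text`, or nil when there is none.
# `first_number` is a hypothetical helper used only for this illustration.
def first_number(text)
  match = /\d+/.match(text)
  return nil if match.nil? # no match: there is no MatchData to read from

  match[0] # index 0 holds the whole matched text
end

first_number('my_app-pr-1234') # => "1234"
first_number('my_app')         # => nil
```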

Named captures

Named captures allow you to describe submatches of a regular expression and then retrieve them from the resulting MatchData object. In our case, our regular expression looked like this:

/.*-pr-\d+/

To use named captures, we first need to add capture groups to our regular expression. Creating a capture group is as simple as wrapping part of the pattern in parentheses:

/(.*)-pr-(\d+)/
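Before naming them, these groups can already be read by position from the MatchData object. A short sketch using the same sample application name as the rest of this post:

```ruby
# Unnamed capture groups are numbered from 1, in order of their opening parenthesis.
numbered = /(.*)-pr-(\d+)/.match('my_app-pr-1234')

numbered[0]       # => "my_app-pr-1234" (the whole match)
numbered[1]       # => "my_app"
numbered[2]       # => "1234"
numbered.captures # => ["my_app", "1234"]
```

This works, but the indexes say nothing about what each group contains, which is where names help.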

Finally, we name the captures. To do this, we start the content of each capture group with ?<name>, where name is the name we want to give it:

/(?<project_name>.*)-pr-(?<pull_request_id>\d+)/

Now that we've done this, we can easily retrieve the data we want from the application name using the resulting MatchData object:

expression = /(?<project_name>.*)-pr-(?<pull_request_id>\d+)/
application_name = 'my_app-pr-1234'
matches = expression.match(application_name)

matches[:project_name] # => "my_app"
matches.named_captures # => {"project_name"=>"my_app", "pull_request_id"=>"1234"}
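Putting the pieces together, the extraction could be wrapped in a small method. This is only a sketch: parse_review_app_name is a hypothetical name, and the \A and \z anchors and the integer conversion are my own additions rather than part of the expression above.

```ruby
# Hypothetical helper: returns a hash with the extracted parts, or nil when
# the name does not look like a review application name.
def parse_review_app_name(application_name)
  # \A and \z (an assumption, not in the original expression) require the
  # whole string to match, so trailing text after the id is rejected.
  expression = /\A(?<project_name>.*)-pr-(?<pull_request_id>\d+)\z/
  matches = expression.match(application_name)
  return nil unless matches

  {
    project_name: matches[:project_name],
    pull_request_id: matches[:pull_request_id].to_i # captures are always strings
  }
end

result = parse_review_app_name('my-cool-app-pr-1234')
result[:project_name]    # => "my-cool-app"
result[:pull_request_id] # => 1234

parse_review_app_name('not-a-review-app') # => nil
```

Since .* is greedy, a project name that contains dashes is captured whole, with no rejoin step needed.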

Thanks to Bachir Çaoui, Alexis Woo and Alexis Focheux for reviewing draft versions of this post.

Footnotes

[1] While the article is intended for designers and UX writers, I found it to be an excellent introduction to regular expressions for everyone.
