| Recursion & Subroutines |
| Recursion |
| Subroutines |
| Infinite Recursion |
| Recursion & Quantifiers |
| Recursion & Capturing |
| Recursion & Backreferences |
| Recursion & Backtracking |
This tutorial introduced regular expression subroutines with this example that we want to match accurately:
Name: John Doe
Born: 17-Jan-1964
Admitted: 30-Jul-2013
Released: 3-Aug-2013
In Ruby, PCRE, or PCRE2, we can use this regular expression:
^Name:\ (.*)\n
Born:
Perl and Boost need slightly different syntax, which also works in PCRE and PCRE2:
^Name:\ (.*)\n
Born:
Unfortunately, these regex flavors differ in more than just the syntax of subroutine calls. First of all, in Ruby a subroutine call makes the capturing group store the text matched during the subroutine call. In Perl, PCRE, PCRE2, and Boost a subroutine call does not affect the group that is called.
When the Ruby solution matches the sample above, retrieving the contents of the capturing group “date” will get you 3-Aug-2013 which was matched by the last subroutine call to that group. When the Perl solution matches the same, retrieving $+{date} will get you 17-Jan-1964. In Perl, the subroutine calls did not capture anything at all. But the “Born” date was matched with a normal named capturing group which stored the text that it matched normally. Any subroutine calls to the group don’t change that. PCRE and PCRE2 behave as Perl in this case, even when you use the Ruby syntax with PCRE or PCRE2.
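The Ruby side of this difference can be checked with a short script. The sketch below does not reproduce the tutorial's regex. It assumes a simplified date pattern (`\d{1,2}-[A-Za-z]{3}-\d{4}`), a named group `name` for the first field, and a sample with one field per line. All of those are illustrative choices.

```ruby
# Sample record, one field per line (layout assumed for illustration).
sample = "Name: John Doe\nBorn: 17-Jan-1964\nAdmitted: 30-Jul-2013\nReleased: 3-Aug-2013"

# Simplified, hypothetical date pattern; the tutorial's real pattern is stricter.
# The "date" group matches the Born date directly and is then called twice
# as a subroutine with \g<date>.
date_re = /^Name:\ (?<name>.*)\n
  Born:\ (?<date>\d{1,2}-[A-Za-z]{3}-\d{4})\n
  Admitted:\ \g<date>\n
  Released:\ \g<date>$/x

date_match = date_re.match(sample)

# In Ruby, the group keeps the text matched by the last subroutine call,
# so this should print the Released date rather than the Born date.
puts date_match[:date]
```

Running the equivalent pattern in Perl or PCRE should leave the group holding the "Born" date instead, as described above.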
JGsoft V2 behaves like Ruby when you use the first regular expression. You can remember this by the fact that the \g syntax is a Ruby invention, later copied by PCRE. JGsoft V2 behaves like Perl when you use the second regular expression. You can remember this by the fact that Perl uses ampersands for subroutine calls in procedural code too.
Finally, you can use this regular expression, which uses the syntax that PCRE 4.0 introduced for subroutine calls. It was the only syntax supported by PCRE 6.7 and prior. These days it also works with PCRE2, Perl, and Boost. It works exactly the same as the above regex with these flavors. Ruby does not support this syntax.
^Name:\ (.*)\n
Born:
If you want to extract the dates from the match, the best solution is to add another capturing group for each date. Then you can ignore the text stored by the “date” group and this particular difference between these flavors. In Ruby, PCRE, or PCRE2:
^Name:\ (.*)\n
Born:
Perl and Boost need slightly different syntax, which also works in PCRE and PCRE2:
^Name:\ (.*)\n
Born:
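As a rough illustration of wrapping each date in its own capturing group, here is a minimal Ruby sketch. The group names `born`, `admitted`, and `released`, the simplified date pattern, and the sample layout are assumptions made for this example and are not taken from the regexes above.

```ruby
# Sample record, one field per line (layout assumed for illustration).
record = "Name: John Doe\nBorn: 17-Jan-1964\nAdmitted: 30-Jul-2013\nReleased: 3-Aug-2013"

# Each date gets its own named group. The "date" group only defines the
# (simplified, hypothetical) date pattern that the subroutine calls reuse.
record_re = /^Name:\ (?<name>.*)\n
  Born:\ (?<born>(?<date>\d{1,2}-[A-Za-z]{3}-\d{4}))\n
  Admitted:\ (?<admitted>\g<date>)\n
  Released:\ (?<released>\g<date>)$/x

record_match = record_re.match(record)

# The wrapper groups are ordinary capturing groups, so their contents do not
# depend on how the flavor treats the "date" group during subroutine calls.
puts record_match[:born]
puts record_match[:admitted]
puts record_match[:released]
```

Because the three wrapper groups are matched exactly once each, the value of the `date` group can simply be ignored.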
There are further differences between Perl, PCRE, and Ruby when your regex makes a subroutine call or recursive call to a capturing group that contains other capturing groups. The same issues also affect recursion of the whole regular expression if it contains any capturing groups. For the remainder of this topic, the term “recursion” applies equally to recursion of the whole regex, recursion into a capturing group, or a subroutine call to a capturing group.
PCRE, PCRE2, and Boost back up and restore capturing groups when entering and exiting recursion. When the regex engine enters recursion, it internally makes a copy of all capturing groups. This does not affect the capturing groups. Backreferences inside the recursion match text captured prior to the recursion unless and until the group they reference captures something during the recursion. After the recursion, all capturing groups are replaced with the internal copy that was made at the start of the recursion. Text captured during the recursion is discarded. This means you cannot use capturing groups to retrieve parts of the text that were matched during recursion.
Perl 5.18 and prior isolated capturing groups between each level of recursion. When Perl 5.18’s regex engine enters recursion, all capturing groups appear as if they have not participated in the match yet. Initially, all backreferences will fail. During the recursion, capturing groups capture as normal. Backreferences match text captured during the same recursion as normal. When the regex engine exits from the recursion, all capturing groups revert to the state they were in prior to the recursion. Perl 5.20 changed Perl’s behavior to back up and restore capturing groups the way that PCRE does.
For most practical purposes, however, you’ll only use backreferences after their corresponding capturing groups. Then the difference between the way Perl 5.18 and prior deal with capturing groups during recursion and the way PCRE and later versions of Perl do is academic.
Ruby’s behavior is completely different. When Ruby’s regex engine enters or exits recursion, it makes no changes to the text stored by capturing groups at all. Backreferences match the text stored by the capturing group during the group’s most recent match, irrespective of any recursion that may have happened. After an overall match is found, each capturing group still stores the text of its most recent match, even if that was during a recursion. This means you can use capturing groups to retrieve part of the text matched during the last recursion.
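A small Ruby sketch can make this visible. The pattern below is a hypothetical example written for this purpose, not one from the tutorial: a group `list` that recurses into itself to match a comma-separated list of numbers, with an inner group `item` for each number.

```ruby
# "list" matches one number, optionally followed by a comma and a recursive
# call to "list". "item" captures the number at every level of recursion.
list_re = /\A(?<list>(?<item>\d+)(?:,\g<list>)?)\z/

list_match = list_re.match("1,2,3")

# Ruby does not restore groups when recursion exits, so "item" should still
# hold the number matched at the deepest level of recursion.
puts list_match[:item]
```

In a flavor that backs up and restores capturing groups, such as PCRE, the same idea would leave `item` holding the number matched outside of all recursion.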
JGsoft V2 behaves like Ruby when you use the \g syntax borrowed from Ruby. It behaves like Perl 5.20 and PCRE when you use any other syntax.
In Perl and PCRE you can use \b(?'word'(?'letter'
Let’s see how this regex matches radar. The word boundary \b matches at the start of the string. The regex engine enters the two capturing groups. [
Because (?&word)
After matching (?&word)
The regex engine has again matched (?&word)
Now, \k'letter' matches the second a in the string. That’s because the regex engine has arrived back at the first recursion during which the capturing group matched the first a. The regex engine exits the first recursion. The capturing group is restored to the r which it matched prior to the first recursion.
Finally, the backreference matches the second r. Since the engine is not inside any recursion any more, it proceeds with the remainder of the regex after the group. \b matches at the end of the string. The end of the regex is reached and radar is returned as the overall match. If you query the groups “word” and “letter” after the match you’ll get radar and r. That’s the text matched by these groups outside of all recursion.
To match palindromes this way in Ruby, you need to use a special backreference that specifies a recursion level. If you use a normal backreference as in \b(?'word'(?'letter'
Let’s see why this regex does not match radar in Ruby. Ruby starts out like Perl and PCRE, entering the recursions until there are no characters left in the string for [
Because \g'word'
After matching \g'word'
The regex engine has again matched \g'word'
Now, \k'letter' matches the second a in the string. The regex engine exits the first recursion which successfully matched ada. The capturing group continues to hold a which is its most recent match that wasn’t backtracked.
The regex engine is now at the last character in the string. This character is r. The backreference fails because the group still holds a. The engine can backtrack once more, forcing (?'letter'
If the subject string is radaa, Ruby’s engine goes through nearly the same matching process as described above. Only the events described in the last paragraph change. When the regex engine reaches the last character in the string, that character is now a. This time, the backreference matches. Since the engine is not inside any recursion any more, it proceeds with the remainder of the regex after the group. \b matches at the end of the string. The end of the regex is reached and radaa is returned as the overall match. If you query the groups “word” and “letter” after the match you’ll get radaa and a. Those are the most recent matches of these groups that weren’t backtracked.
Basically, in Ruby this regex matches any word that is an odd number of letters long and in which all the characters to the right of the middle letter are identical to the character just to the left of the middle letter. That’s because Ruby only restores capturing groups when they backtrack, but not when it exits from recursion.
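The behavior described above can be tried in Ruby with a sketch like the following. The pattern is an assumption: it is written to have the shape the walkthrough describes (a group `word` that captures a `letter`, recurses into itself, and then requires a normal backreference to `letter`, with a single letter as the alternative), using Ruby's angle-bracket syntax for names.

```ruby
# Assumed pattern with a normal backreference \k<letter>.
plain_re = /\b(?<word>(?<letter>[a-z])\g<word>\k<letter>|[a-z])\b/

# A real palindrome is rejected, because "letter" still holds the text
# captured inside the recursion when the outer backreference is attempted.
radar_match = plain_re.match("radar")
puts radar_match.inspect

# A non-palindrome whose right half repeats the letter left of the middle
# is accepted instead.
radaa_match = plain_re.match("radaa")
puts radaa_match[0]
puts radaa_match[:letter]
```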
The solution, specific to Ruby, is to use a backreference that specifies a recursion level instead of the normal backreference used in the regex on this page.
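In Ruby, a backreference with a recursion level is written by appending the level to the group name, as in `\k<letter+0>`, where `+0` refers to the group's capture at the current level of recursion. The sketch below applies that to an assumed palindrome pattern of the same shape as the one discussed on this page. The pattern itself is an illustration, not a quotation.

```ruby
# Assumed pattern; \k<letter+0> refers to the text "letter" captured at the
# same recursion level as the backreference itself.
leveled_re = /\b(?<word>(?<letter>[a-z])\g<word>\k<letter+0>|[a-z])\b/

leveled_radar = leveled_re.match("radar")
leveled_radaa = leveled_re.match("radaa")

# With the level-aware backreference, the palindrome should match and the
# non-palindrome should not.
puts leveled_radar ? leveled_radar[0] : "no match"
puts leveled_radaa ? leveled_radaa[0] : "no match"
```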
Page URL: https://www.regular-expressions.info/recursecapture.html
Page last updated: 13 October 2025
Site last updated: 29 October 2025
Copyright © 2003-2025 Jan Goyvaerts. All rights reserved.