Optimizing Regex performance with RegexOptions.RightToLeft

Regex is fast when it is scanning text that doesn't match it's pattern at all. However, when finding text that almost matches, things can start to slow down. You really want the Regex to either match a text or discard it as soon as possible. Building up large potential matches and finding that they don't fit towards the end can end up being very costly. 

For patterns that are string literals, the Regex will run in linear time. So if you get lots of partial matches that get discarded after 20 characters and another Regex discards potential matches after 2 characters then the first will be in the order of ten times slower.

However for patterns that involve optional quantifiers and alternation constructs the performance can degrade exponentially.

So you really want your long and complex patterns to discard potential matches as soon as possible.

So how does RegexOptions.RightToLeft come into it? What if multiple patterns all start with the same pattern but their endings are different? Or what if your pattern initially finds potential matches of 20+ characters but discards 99% of these potential matches based on the last few characters? In these cases reversing the scan order from right to left can avoid all this costly effort of building up potential matches that end up being discarded.

Let's take an example.

Finding Exceptions and Stack Frames in a a Stack Trace

An exception might be: MyCompany.MyApplication.SomeBusinessException

A stack frame might be: MyCompany.MyApplication.SomeClass.SomeMethod(string someString)

We can identify exceptions with the pattern "\b(?:\w+\.)+\w+Exception" and identify stack frames with the pattern "\bat \b[^()\s=']+\((?:.*)\)"

The problem is the exception pattern. Stack traces often have a large number of stack frames in them and every stack frame will be identified initially by the exception pattern as a potential match and be discarded once it reaches the (. 

However if you reverse the scan order when looking for exceptions then no stack frames will ever be contenders in the first place.

Running this exception regex over a stack trace 10000 times took 38 seconds when scanning left to right. After passing the RegexOptions.RightToLeft option to the regex it took 25 milliseconds.

So next time you use regex and performance is important, think about which scan order will allow the regex engine to discard potential matches sooner or even avoid potential matches at all.