Why does my regular expression work in X but not in Y?

8 points by dasm


number5

This is where tools like https://regex101.com/ come in handy: it supports 8 different regex flavours.

srpablo

I’m now compelled to make a YouTube video called “standards are fake, actually” and point out that most file formats with a standard treat it more as a guideline, because none of the major implementations you’ll use ever really follow it faithfully.

Regexes are one example: most languages support roughly the same core features, but once you get to the fringes you almost always have to check your specific language’s documentation.
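End-of-string anchors are one such fringe. A minimal Python sketch follows; only Python's re is actually run here, and the Perl and JavaScript behaviour is described in comments, not executed.

```python
import re

text = "abc\n"  # e.g. a line read from a file, trailing newline still attached

# Without re.MULTILINE, Python's `$` matches at the end of the string OR just
# before a single trailing newline (Perl's `$` behaves the same way).
# JavaScript's `$` without the m flag matches only at the very end, so
# /abc$/.test("abc\n") is false there.
dollar = re.search(r"abc$", text) is not None

# Python's `\Z` matches only at the very end of the string. In Perl and PCRE,
# `\Z` also tolerates a trailing newline; their strict form is `\z`.
backslash_z = re.search(r"abc\Z", text) is not None

print(dollar, backslash_z)  # True False
```

The same pattern can therefore accept or reject the same input depending on which engine compiles it.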

Another would be SQL. There’s a SQL standard, but Postgres, MySQL, MS SQL Server, and Oracle all have extensions outside it or ways where they differ from the standard.

Markdown would be another. It was never formally specified, and Markdown parsers do wildly different things on edge cases (CommonMark was an attempt to unite them).

Even ISO 8601: IIRC, the output of Python’s datetime isoformat() was only guaranteed to be parseable back into the same datetime object by the datetime library itself (via fromisoformat()). I once got a format error when I passed that output to a database that expected valid ISO 8601.
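The exact error above isn't known. As a hedged sketch using only the standard library, these are a few datetime outputs that a strict ISO 8601 consumer might object to:

```python
from datetime import datetime, timezone

naive = datetime(2024, 1, 2, 3, 4, 5)
aware = datetime(2024, 1, 2, 3, 4, 5, 123456, tzinfo=timezone.utc)

print(naive.isoformat())  # 2024-01-02T03:04:05  (no UTC offset at all)
print(aware.isoformat())  # 2024-01-02T03:04:05.123456+00:00
print(str(aware))         # 2024-01-02 03:04:05.123456+00:00  (space, not "T")

# fromisoformat() is documented as the inverse of isoformat(), so the round
# trip always works. Before Python 3.11 it accepted little beyond what
# isoformat() emits; for example, a trailing "Z" raised ValueError.
assert datetime.fromisoformat(aware.isoformat()) == aware
```

Which of these a given database accepts depends on that database, not on Python.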


There’s also a funny storyline about Scheme here: the RnRS standards process, even after forking into large and small versions of the language, didn’t prevent each Scheme from making up its own world. I think the Steering Committee had a line in one of their documents like “the only benefit of a standardization process and spec is that things are consistent across implementations; and yet that hasn’t happened here either.”

Not throwing a value judgement on any of these examples, they just come to mind.

masklinn

The answer is missing an intermediate level: FA-based (finite-automaton) engines tend to have many PCRE features, but not all of them (specifically not lookaheads, lookbehinds, or backreferences).

special characters \n, \t, etc.; word boundaries \b and \B; word constituents \w and \W; …
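The constructs FA-based engines leave out are the ones that need backtracking or memory of earlier input. A minimal sketch in Python, whose re is a backtracking engine and accepts both; the claim that RE2, Go's regexp and Rust's regex crate reject these patterns at compile time is stated in the comments, not run:

```python
import re

# Backreference: "the same word twice in a row". The language this describes
# is not regular, so automaton-based engines (RE2, Go regexp, Rust regex)
# refuse to compile \1.
doubled = re.search(r"\b(\w+) \1\b", "this is is a typo")

# Lookahead: require at least one digit somewhere, without consuming input,
# then match 8+ word characters. The same engines reject (?=...).
has_digit = re.match(r"(?=.*\d)\w{8,}", "hunter42x")
no_digit = re.match(r"(?=.*\d)\w{8,}", "hunterxyz")

print(doubled.group(0), has_digit.group(0), no_digit)  # is is hunter42x None
```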

Note: the backslashed character classes (\d, \w, \s, …) are Perl / PCRE extensions to EREs.
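Even where \d exists, flavours disagree on what it means. A minimal Python sketch (the JavaScript and POSIX notes are comments, not executed):

```python
import re

arabic_indic_three = "\u0663"  # '٣', Unicode category Nd (decimal digit)

# For str patterns, Python's \d matches any Unicode decimal digit...
unicode_d = re.fullmatch(r"\d", arabic_indic_three) is not None
# ...unless re.ASCII restricts it to [0-9].
ascii_d = re.fullmatch(r"\d", arabic_indic_three, re.ASCII) is not None
# An explicit range never matches it. JavaScript's \d is always [0-9], even
# with the u flag; plain POSIX EREs spell it [0-9] or [[:digit:]], and
# Python's re does not understand [[:digit:]].
range_09 = re.fullmatch(r"[0-9]", arabic_indic_three) is not None

print(unicode_d, ascii_d, range_09)  # True False False
```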

fanf

There’s another subtlety that I learned recently from a paper about JavaScript regexes: JavaScript doesn’t follow the traditional Unix/Perl rules for reporting submatches, so ambiguous ( ) capture groups can be filled in differently by Henry Spencer-style engines, by JavaScript, and possibly by other regex engines too.
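A minimal sketch of two ways an ambiguous match can come out differently, using Python's re as a stand-in for a backtracking engine. The POSIX and JavaScript results are quoted in comments from those specifications' rules rather than run here, and neither example is taken from the paper:

```python
import re

# Alternation: POSIX specifies leftmost-longest matching, which would report
# 'ab'. Python's backtracking engine takes the first alternative that
# succeeds (leftmost-first) and reports 'a'.
alternation = re.match(r"a|ab", "ab").group()

# Captures inside a repeated group: Python keeps the last value each group
# captured in any iteration, so group 2 still holds 'b' from the first pass.
# The ECMAScript spec clears the groups inside a quantified atom at the start
# of each iteration, so /(a|(b))+/.exec("ba") gives ["ba", "a", undefined].
captures = re.fullmatch(r"(a|(b))+", "ba").groups()

print(alternation, captures)  # a ('a', 'b')
```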

abareplace

There are many differences between the engines, for example: