regex – headaches and solutions

Regex could save your bacon. Try it.

Need to parse a string that doesn’t have a consistent layout but does contain regular patterns that need to be keyed on to extract information (or change it)? Want to extract values from a string without complex nested IN(…, MID(…, SUBSTR(…, LEN(… functions?

Regex seems to be the tool to use, if you can wrap your mind around the syntax for the patterns you need to key on. There are plenty of examples that illustrate searching for phone numbers, social security numbers, ZIP codes, ZIP+4, IPv4 addresses and so on. Add in features like lookahead, lookbehind, and greedy or non-greedy matching, and I get the feeling there’s more than enough material for an entire graduate-level course on formulating and understanding regular expressions.

My own forays into regular expressions have focused mostly on mundane, simple patterns. I did, however, find a couple of uses for the lookbehind: extracting a substring of varying length from a string whose length also varied. To me, the function-based way of parsing the string would have been much more complex than working out the regex syntax.

In one case the substring was the size of a hard drive, and it varied in length depending on the size of the drive. In another case the substrings were a monitor's make and model, each of which could be a different length because more than one manufacturer was in the dataset and model names varied.

In both cases the target substring was preceded by a fixed string that could be used to locate the sought-after value within the larger string. The position of the target within the larger string varied, because the details before it changed from PC to PC. Rescued by the regex lookbehind!
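To make the comparison concrete, here is a minimal sketch of both approaches on one string. The sample line, its field names and its values are invented for illustration and are not taken from the real dataset; only the idea of a fixed marker followed by a quoted value comes from the cases described above.

```powershell
# Hypothetical inventory string; field names and values are made up for this example.
$line = 'pc: "LAB-07"~description: "Fixed hard disk media"~size: "465.76 GB"~status: "OK"'

# Function-based approach: locate the marker, then locate the closing quote after it.
$marker = 'media"~size: "'
$start  = $line.IndexOf($marker) + $marker.Length
$end    = $line.IndexOf('"', $start)
$viaSubstring = $line.Substring($start, $end - $start)

# Lookbehind approach: match the run of non-quote characters that directly follows the marker.
# (?<=...) asserts that the marker is there without making it part of the match.
$viaRegex = [regex]::Match($line, '(?<=media"~size: ")[^"]+').Value

$viaSubstring
$viaRegex
```

Both variables should end up holding the same value. The function-based version needs three steps and some index arithmetic, while the lookbehind states the marker and the shape of the value in a single pattern.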

I posted about it on LinkedIn, if you’re interested in taking a look: Samanage and Powershell – two tools to produce helpful reports.

The significant portions of the statements are below. They look into the string for a fixed marker, `Fixed hard disk media"~size: "`, `name: "` or `manufacturer: "`, and then extract the portion of the string that immediately follows it, up to the next double quote.

$regexdrv = '(?<=Fixed hard disk media"~size: ")([^"]+)'

Select-String -Pattern $regexdrv | %{$_.Matches}

$regexdisp = '(?<=name: ")([^"]+)|(?<=manufacturer: ")([^"]+)'

Select-String -Pattern $regexdisp -AllMatches | %{$_.Matches}
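Since those are only fragments, here is a sketch of how they might be run end to end. The two patterns are the ones shown above; the input lines in `$lines` and the variable names `$driveSizes` and `$displayInfo` are assumptions made up for this example, so the real report data will look different.

```powershell
# Hypothetical sample lines standing in for the real inventory data.
$lines = @(
    'pc: "LAB-07"~description: "Fixed hard disk media"~size: "465.76 GB"~status: "OK"'
    'monitor~manufacturer: "Dell"~name: "U2412M"~serial: "unknown"'
)

$regexdrv  = '(?<=Fixed hard disk media"~size: ")([^"]+)'
$regexdisp = '(?<=name: ")([^"]+)|(?<=manufacturer: ")([^"]+)'

# Without -AllMatches, Select-String reports the first match on each matching line,
# which is enough for a single drive size.
$driveSizes = $lines | Select-String -Pattern $regexdrv |
    ForEach-Object { $_.Matches } | ForEach-Object { $_.Value }

# -AllMatches is needed here because manufacturer and name sit on the same line.
$displayInfo = $lines | Select-String -Pattern $regexdisp -AllMatches |
    ForEach-Object { $_.Matches } | ForEach-Object { $_.Value }

$driveSizes
$displayInfo
```

Each `Matches` entry is a .NET match object, so `.Value` gives the extracted text without the marker that the lookbehind tested for. One thing to watch with a short marker like `name: "` is that it also matches inside longer field names that end the same way, so it is worth checking the pattern against real data.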

If you find yourself struggling to parse some information out of a string, take a look at regex. You may find the solution to your problem.