Regex and SQL Server: A Poor Man’s Quick Formatter.

I still value well formatted queries thought! Don’t give me junk!

Words: 854

Time to read: ~ 4.5 minutes

It’s only since I’ve started delving into PowerShell that I’ve started to use Regular Expressions (Regex).

Most of you probably don’t see an issue with this since you’ve encountered Regex before and know what it’s like.

For those that don’t, Google/Bing “xkcd 99 problems regex” to find out the gist of what people feel about it.

For those of the uniquely SQL Server persuasion, this may not even be a big deal.

So what? SQL Server doesn’t even have Regex. Why are you talking about this?

They

SSMS and Azure Data Studio

As I’m sure many of you have before, I recently received some SQL code that involved xml stored as nvarchar and some funky use of SUBSTRING() and PATINDEX() functions to get information out of it. e.g.

Write that Funky SQL…

This is a contrived example but I was given a script that got the “Discipline”, “DocumentVersion”, “DocumentNumber”, “SectionNumber”, and “SectionName” out of the above.

And while it works, I hate that formatting. Everything is all squashed and shoved together.

No, thanks. Let’s see if we can make this more presentable.

Manually

Manual effort

This is … okay. It gets old fast though and is very, very manual. It’s okay when everything lines up, like in the first part of the above gif, and we can use the “Ctrl + Alt & Mouse” trick but when they don’t line up, like in the second part, we have to do format each individual part.

Regex

There is an option in SSMS and Azure Data Studio when you hit “Ctrl + F” (Find) or “Ctrl + H” (Find and Replace) where you have the option to use Regex.

.* = whatever I guess

You can place regex in the “Find” box and it will use that for searching.

Let’s break the script I was given down into the formats I like:

  • I like spaces before and after brackets
    • So I need to find spaces
  • I like spaces after commas,
    • So I need to find commas
  • I like spaces before and after plus signs, and
    • So I need to find plus signs “+”
  • I like spaces before and after minus signs.
    • So I need to find minus signs “-“
  • I like nvarchar data types to have the N” before the single quotes.
    • So I need to find single quotes “‘”

Let’s turn that into Regex

Straight away, we hit a problem. Regex has the idea of “capturing groups” and the way to specify which is captured into a group is by wrapping things in brackets.

This means that if we are searching for brackets, we’ll need to escape them, which is nearly the same as escaping characters in SQL.

WHERE Name LIKE '\_underscore' ESCAPE '\';

The difference between SQL escape and Regex escape is that, in SQL, you can choose any character to be the escape character while, in Regex, the backslash “\” character is used.

As for Groups, I’ll show you later more about them and what you can do with them.

  • \(
  • ,
  • \+

Now that we have all that converted, we can put them into our search and add what we need to format them.

I added an extra search for double spaces to be replaced with single spaces since I kept replacing with a space before characters and a space after characters and that created double spaces! In the end, we are left with something more readable.

Regex and Groups

I told you earlier I’d show you more about Groups, especially what you can do with them.

Let’s take the example of changing the above example to use xQuery.

I’ve cheated a bit and created the OUTER APPLYs for the different nodes and created the XQuery for the Discipline record.

Now we can use Regex and groups to change all those NULL AS ColumnName to valid syntax

Regex time

We need the following regex

  • NULL AS (\w+)

What I’m searching for here is

  • The exact letters “NULL AS “
  • One or more “word” letters, here it means everything up to the comma or a newline, and
  • I’ve put that “word” regex in brackets so it’s our “capture group”

All together, this just finds the rows
NULL AS DocumentVersion,
...
NULL AS SectionName

Since we have a capture group, we can use that what is in that group, by specifying $1, so

$1s.$1.value(‘.’, ‘nvarchar(255)’) AS $1Returns
NULL AS DocumentVersionDocumentVersions.DocumentVersion.value(‘.’,’nvarchar(255)’) AS DocumentVersion
NULL AS SectionNameSectionNames.SectionName.value(‘.’, ‘nvarchar(255)’) AS SectionName

Since we only have 1 capture group we have access to $1 but if we have multiple capture groups, then we have access to multiple $1, $2, $3

And we have a fully functioning XQuery using Regex in SQL Server.

$

There’s always a case of cross pollination learning when you look into different technologies.

I’m happy to add another tool to my belt but at the same time, I’m happy that it’s only available in SSMS & Azure Data Studio and not the SQL Server engine.

If you need standard formatting on queries, and I assure you that you do, consider getting a tool that will do a standard formatting for you.

Regex is a tool, just like any other. Add it to your belt but know when you can and should not use it.

Author: Shane O'Neill

DBA, T-SQL and PowerShell admirer, Food, Coffee, Whiskey (not necessarily in that order)...

One thought on “Regex and SQL Server: A Poor Man’s Quick Formatter.”

What's your opinion?

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s