#TSQL2sday 171: Describe the Most Recent Issue you Closed

What happens when nothing changes, sp_send_dbmail gets given sysadmin, and you still can’t get emails.

Words: 817

Time to read: ~4 minutes

Invitation

Welcome to TSQL2sday, the monthly blogging party where we are given a topic and are asked to write a blog post about it.

This month we have Brent Ozar ( b ) asking us about the latest issue closed.

I don’t like posting about issues unless I fundamentally understand the root cause. That’s not the case here. A lot of the explanation here will be hand-waving while spouting “here be dragons, and giants, and three-headed dogs”, but I know enough to give you the gist of the issue.

sp_send_dbmail

Like most issues, even those not affecting production, this one was brought to the DBA team as critical and needed to be fixed yesterday.

A Dev team had raised that a subset of their SQL Agent jobs had failed. The error message stated:

Msg 22050, Level 16, State 1, Line 2
Failed to initialize sqlcmd library with error number -2147467259

That makes sense; the only jobs that were failing were ones that called sp_send_dbmail using a @query parameter. And I know that when you use that parameter, the code is given to the sqlcmd exe to run it for you.

Google fu

From research (most of the time, I ended up in the same post), the error fragment “failed to initialize sqlcmd library with error number” could be related to a number of things.

  1. The database object not existing
      No, the agent fails even when the query is SELECT 1;
  2. The wrong database context used
      No, SELECT 1;
  3. sqlcmd not being installed or enabled
      It was working beforehand, so I would say not.
  4. Permissions

Permissions

Well I had already tested that the query worked as long as it was valid SQL, so let’s try permissions.

I increase permissions…no luck.
I grant a bit more permissions…nope.
A bit more permissions…still nothing.
ALL the permissions… MATE, YOU’RE A SYSADMIN! WHAT ARE YOU ON ABOUT?!! …ahem… nothing.

Workaround

Strangely enough, the post mentioned that using a SQL Authentication account worked.
So we tested it using EXECUTE AS LOGIN = 'sa'; and it worked.
Which was weird, but I’ll take a workaround. Especially since it gave us time to investigate more.

Thanks to dbatools, I threw together a PowerShell script that went through all of the SQL Agent jobs that contained sp_send_dbmail and wrapped them up.

EXECUTE AS LOGIN = 'sa'; EXEC dbo.sp_send_dbmail ...; REVERT

I’m not going to share that script here cause it is a glorious mess of spaghetti code and if branches.
I gave up on regex and did line-by-line thanks to the massive combinations of what was there e.g.

  • EXEC msdb.dbo.sp_send_dbmail .
  • EXECUTE sp_send_dbmail.
  • EXEC msdb..sp_send_dbmail.
  • sp_send_dbmail in a cursor so we need to revert after each call in the middle of the cursor.
  • sp_send_dbmail where there are spaces in the @body parameter so I can’t split on empty lines.

What’s Happening?

After getting the workaround in place, the team lead and I noticed something strange.

Sure, EXECUTE AS LOGIN = 'sa'; worked, but try it as a Windows Domain login and you get something different.

Could not obtain information about Windows NT group/user '<login>', error code 0x5

Something weird was happening between SQL Server and Windows.
Great for me! I get to call in outside help.
Not great for ye! The remaining explanation is going to be shallower than the amount of water I put in my whiskey.

What Changed

Nothing!

Or so we were told. Repeatedly.
We did not believe that. Repeatedly.

Next, we started to get “The target principal name is incorrect. Cannot generate SSPI Context” on the servers.

Not being able to send emails from a SQL Agent Job is one thing, but not being able to Windows Authenticate into a SQL Instance at all is another thing all together.

Eventually, as awareness of the issue increased, the problem was narrowed down to a server configuration on a Domain Controller. I’m assuming that the name of this server configuration is “nothing”.

“Nothing” was set on one server but not the other meaning that using one DC over another meant Kerberos did something that it was not supposed to. I’m reliably informed that “nothing” has something to do with encryption and gatekeeping. I reliably replied that Kerberos should be spelt Cerberus but was ignored.

Testing

With “nothing” in place properly, the SSPI Context errors disappeared.

I reverted the workaround EXECUTE AS wrapper on sp_send_dbmail and emails started flowing again, even without the wrapper. Even with permissions reduced back to what they were.

Research

Sometimes the problem is actually outside SQL Server. Those are the good days. Other days it is a SQL Server problem and you have to fix it yesterday.

All I can do is take what happens, do a bit more research, and learn from them. That way, if it were to happen again, I can speed up the resolution process.

At least this way, if someone ever asks me if I know anything about Kerberos, I can tell them that I know about “nothing”.

I’ll enjoy that.

Author: Shane O'Neill

DBA, T-SQL and PowerShell admirer, Food, Coffee, Whiskey (not necessarily in that order)...

5 thoughts on “#TSQL2sday 171: Describe the Most Recent Issue you Closed”

  1. For reason as those (or always, when you use an Active Directory account to run the SQL server service and need to access any network ressources) there is the Kerberos Configuration Manager (https://www.microsoft.com/en-us/download/details.aspx?id=39046) which can be downloaded for free and produces (or executes, when you have the permissions) the commands to set up the correct Kerboros config.

    Needs to be done usually once per life for each SQL server installation.

    PS: I really hope, that you are using an explicit AD account for every server you run. It is okay, to use the same account for standby / fallback machines, but different SQL servers should use different accounts.

    PPS: I don’t know, if the Kerberos Configuration Manager would be able to fix the “Nothing”-error that you were facing -never had this problem by myself, either because our domain admins configured it correct / otherwise or because I use the tool always after setting up a new server every few years.

    1. Thanks! That’s good information, and it came up in our investigation.

      Unfortunately with big organisations, we’re restricted in what we can install without signoff from Security.
      Rather than go down the road of requesting, explaining, defending, and then installing the Kerberos Configuration Manager in X (hours/days), we kept going with our investigation.

      It’s a good point, though. Do we request it now in case something like this happens again?
      It’s something to think about when I get a chance.

      PS: No worries, we use explicit accounts for all machines except for replicas, etc.

      1. this is not tool you need to install, it’s just an EXE you can run as portable (okay, maybe you install it local and copy to somewhere).

        Theoretical you could even use a VM or your local PC, install there a similar SQL Version which uses the same account to run the SQL Server service, run the Kerberos Configuration Manager there and let it suggest / create the commands that needs to be executed.

        Then you take those two commands (are very simple, but I didn’t bother to rembember them), replace the server name in them and sent them to your Domain Admin, who will need execute them once.

Leave a ReplyCancel reply

Discover more from No Column Name

Subscribe now to keep reading and get access to the full archive.

Continue reading

Exit mobile version
%%footer%%