2022-12-01

Removing Transparent Data Encryption (TDE)

This month's blog post is about removing Transparent Data Encryption (TDE) from a SQL Server database. It's not as simple as you might think. If you do it carelessly, you will be doomed: the database that was previously encrypted will not mount, and you will face the error "Cannot find server certificate with thumbprint 'XXXX'.". If you then try to restore the database from a backup, you get the error "The file "DatabaseFile" failed to initialize correctly. Examine the error logs for more details." This happens because there are still encrypted values in the log file.

If you reach this point and you still have a copy of the certificate used to encrypt the database, restore that certificate to the master database and the previously encrypted database should mount. If you no longer have the certificate, you can try attaching just the mdf file, which should rebuild the log file. If you only have access to a backup file and not the certificate, then you really are stuck.
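
For reference, these two recovery paths look roughly like this (a minimal sketch; the certificate name, file paths, and password are hypothetical):

USE master;
GO
-- Re-create the TDE certificate from its backup files (hypothetical paths)
CREATE CERTIFICATE TDECert
    FROM FILE = N'D:\Backup\TDECert.cer'
    WITH PRIVATE KEY (
        FILE = N'D:\Backup\TDECert.pvk',
        DECRYPTION BY PASSWORD = N'YourCertificateBackupPassword');
GO
-- If the certificate is gone, try attaching just the mdf and rebuilding the log
CREATE DATABASE TestDB
    ON (FILENAME = N'D:\Data\TestDB.mdf')
    FOR ATTACH_REBUILD_LOG;
GO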

Now let's go through a sample script showing how to remove TDE correctly:

/* HOW TO REMOVE TDE
*/
-- CHECK
USE master
GO
SELECT DB_NAME(database_id), * FROM sys.dm_database_encryption_keys;
GO
--DECRYPT
ALTER DATABASE TestDB SET ENCRYPTION OFF
GO
-- CHECK: encryption_state should go from 3 (encrypted) to 1 (unencrypted); 5 means decryption in progress
SELECT DB_NAME(database_id), * FROM sys.dm_database_encryption_keys;
GO
-- WAIT until the decryption scan completes (encryption_state = 1)
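-- (A minimal sketch of the wait: poll the DMV until decryption finishes)
WHILE EXISTS (SELECT 1 FROM sys.dm_database_encryption_keys
              WHERE database_id = DB_ID('TestDB') AND encryption_state <> 1)
    WAITFOR DELAY '00:00:05';
GO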
USE TestDB
GO
DROP DATABASE ENCRYPTION KEY
GO
-- Clear Log
USE master
GO
ALTER DATABASE TestDB SET RECOVERY SIMPLE;
GO
USE [TestDB]
GO
DBCC SHRINKFILE (N'TestDB_log' , 0, TRUNCATEONLY)
GO
-- Reset Log
USE master
GO
ALTER DATABASE TestDB SET RECOVERY FULL;
GO
-- Make Compressed Backup
BACKUP DATABASE TestDB TO DISK = N'D:\Backup\TestDB.bak' WITH NOFORMAT, NOINIT, NAME = N'TestDB-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, COMPRESSION, STATS = 10
GO

2022-11-04

Error Creating AlwaysOn Availability Group

In this month's blog post, I would like to share an error case on creating a SQL Server availability group.

I had a pair of SQL Servers that formed a WSFC, with the AlwaysOn Availability Groups feature enabled in SQL Server Configuration Manager (SSCM). But later our system administrator said he needed to destroy and re-create the WSFC. After the WSFC was re-created, I tried to create an availability group and got the following errors:

"Failed to obtain cluster information. Either the specified instance of SQL Server is not running on a Windows Server Failover Cluster (WSFC) node, or the user lacks sysadmin permissions on the SQL Server instance to obtain the cluster information."

"The specified instance of SQL Server is hosted by a system that is not a Windows Server Failover Cluster (WSFC) node."

Here are the troubleshooting steps:

1. Double-check that the availability group feature is enabled on all SQL replicas (a quick T-SQL check is sketched after this list).

2. Check that your current login has sufficient rights to create an availability group on all SQL replicas; for simplicity, the sysadmin server role.

3. Check the cluster permissions by right-clicking the Windows cluster in Failover Cluster Manager -> Properties -> Cluster Permissions.

By default, the local Administrators group has Full Control on the WSFC. So for simplicity, the SQL sysadmin login you are using should be added to the OS local Administrators group too.

4. Run the following SQL queries on each SQL replica:
SELECT * FROM sys.dm_hadr_cluster_members;
SELECT * FROM sys.dm_hadr_cluster;

They should return some rows like below:

But if no rows are returned, that SQL replica needs the availability group feature disabled and re-enabled in SSCM.
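
For step 1, here is that quick T-SQL check (a minimal sketch): SERVERPROPERTY('IsHadrEnabled') returns 1 when the AlwaysOn Availability Groups feature is enabled on the instance.

-- Returns 1 if AlwaysOn Availability Groups is enabled on this instance
SELECT SERVERPROPERTY('IsHadrEnabled') AS IsHadrEnabled;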

Solution:
I disabled the AlwaysOn Availability Groups feature on the "bad" node using SSCM, then restarted the SQL Server service. Then I enabled the availability group feature again, followed by another SQL Server service restart. After finishing this, you will be able to get information about the cluster and nodes via the queries I mentioned earlier.

2022-10-02

Repeatable Read causing Deadlock

Last month, one of my clients consulted me about an incident on their production SQL Server. They found a stored procedure, called by multiple concurrent application threads, that was raising deadlock errors frequently; more seriously, the deadlock errors were accumulating. That means the SQL Server Lock Monitor (deadlock detection) was not resolving the deadlocks (by choosing one of the threads, the one that is least expensive to roll back, as the deadlock victim and killing it) as fast as the client application was spawning new threads. (Another ref.: https://learn.microsoft.com/en-us/answers/questions/988682/sql-server-deadlock-detection-only-kills-one-victi.html). This case was very hard to troubleshoot, because it is a single stored procedure, meaning all the threads access the tables/objects in exactly the same logical order. Below is the deadlock graph that was captured:

It looks like a very typical deadlock graph, with not enough clues provided. But when I opened the deadlock .xdl file as text, I saw isolationlevel="repeatable read (3)":

So I asked my client why the application needs to set REPEATABLE READ to call that stored procedure. And guess what they said? They told me the application was developed by an outsourced software vendor, and it's end-of-support now. In order to convince my client that REPEATABLE READ was the root cause, I showed them a simple demo provided in the Microsoft Tech Community:
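
The gist of that demo goes like this (a minimal sketch of my own, with a hypothetical dbo.Accounts table, not the exact Tech Community script). Each session reads the same row under REPEATABLE READ, which holds the S lock until commit; when both then try to update the row, each needs an X lock that the other session's S lock blocks, so one of them gets killed as the deadlock victim.

-- Session 1:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRAN;
SELECT Balance FROM dbo.Accounts WHERE Id = 1;  -- S lock held until COMMIT
-- ...now run Session 2 down to its SELECT, then continue here...
UPDATE dbo.Accounts SET Balance = Balance + 10 WHERE Id = 1;  -- wants X, blocked
COMMIT;

-- Session 2 (in another window):
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRAN;
SELECT Balance FROM dbo.Accounts WHERE Id = 1;  -- second S lock on the same row
UPDATE dbo.Accounts SET Balance = Balance + 10 WHERE Id = 1;  -- wants X --> deadlock
COMMIT;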


2022-09-03

Cursor and Lock

Last month one of my clients asked me a question: how does SQL Server lock a table and its rows while a cursor is looping over it? So let's do an experiment using the StackOverflow2013 sample database.

Run the query below in one SSMS window. (For simplicity I just show you FAST_FORWARD, the type most recommended by SQL Server MVPs. You can also try any other type of cursor, but I can tell you that the results are the same):

IF @@TRANCOUNT <> 0 ROLLBACK;
DECLARE @Id int;
DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT V.Id FROM Votes V JOIN Users U ON V.UserId = U.Id
OPEN cur
WHILE 1 = 1
BEGIN
    FETCH cur INTO @Id
    IF @@FETCH_STATUS <> 0 BREAK
END
CLOSE cur
DEALLOCATE cur

Get the SPID of the query; here I got SPID 57:

While the first query is running, open a new query window, change the spid value, then execute it:

SELECT      r.start_time [Start Time],session_ID [SPID],
            DB_NAME(database_id) [Database],
            SUBSTRING(t.text,(r.statement_start_offset/2)+1,
            CASE WHEN statement_end_offset=-1 OR statement_end_offset=0
            THEN (DATALENGTH(t.Text)-r.statement_start_offset/2)+1
            ELSE (r.statement_end_offset-r.statement_start_offset)/2+1
            END) [Executing SQL],
            Status,command,wait_type,wait_time,wait_resource,
            last_wait_type
FROM        sys.dm_exec_requests r
OUTER APPLY sys.dm_exec_sql_text(sql_handle) t
WHERE       session_id = 57    -- Change it to the SPID number of the first query
ORDER BY    r.start_time
EXEC sp_lock 57;    -- Change it to the SPID number of the first query
-- Change below values to the dbid and ObjId values that you got from above result
SELECT DB_NAME(12), OBJECT_NAME(149575571, 12), OBJECT_NAME(181575685, 12);
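
By the way, sp_lock is deprecated; roughly the same information can be pulled from sys.dm_tran_locks instead (a minimal sketch):

SELECT resource_type, resource_database_id, resource_associated_entity_id,
       request_mode, request_status
FROM sys.dm_tran_locks
WHERE request_session_id = 57;    -- Change it to the SPID number of the first query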

Run the second query multiple times while the first one is still executing. You can see the different locks held by the DECLARE CURSOR, OPEN, and FETCH statements:


Here is a summary of the locks held by each cursor statement:

DECLARE: just defines the attributes of the cursor and the query used to build its result set, but does NOT execute the query. It always runs too fast for you to catch its locks, and in fact it takes no locks at all.
OPEN: opens the cursor and populates it by executing the cursor query. Table/page/row locks are obtained by this statement. Under the default READ COMMITTED isolation level, those locks are released once the OPEN statement finishes, so there are NO table/page/row locks from the cursor query inside each loop iteration.
FETCH: retrieves one row from the cursor; it only takes a database-level S lock.

Final thought: cursors can be useful in some applications for serialized operations, but generally they should be avoided because of their negative impact on performance, especially when operating on large sets of data.

2022-08-04

Implicit Conversion performance hurt and how to solve it

SQL Server performs implicit data type conversion, converting values from their original data type to another data type, in order to compare two expressions of different types when the query doesn't explicitly specify the conversion on either side. SQL Server decides which side to convert based on data type precedence: the lower-precedence type is converted to the higher-precedence type. These hidden conversions can hurt performance, especially when they have to be applied row by row; most often they make the query engine ignore an otherwise useful index and fall back to scanning the table.

Let's do an experiment to show how the problem happens and how to solve it. Here I use the StackOverflow2013 sample database's Users table, but I altered the DisplayName column to type varchar, since varchar is a lower-precedence data type, and then created an index on this column.
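
The setup looked roughly like this (a minimal sketch; the column size and index name here are assumptions):

USE StackOverflow2013;
GO
-- Change DisplayName from nvarchar to varchar (the lower-precedence type)
ALTER TABLE dbo.Users ALTER COLUMN DisplayName varchar(40) NOT NULL;
GO
-- Index the column so that an index seek becomes possible
CREATE INDEX IX_Users_DisplayName ON dbo.Users (DisplayName);
GO

With that in place, let's see a query that will perform an implicit conversion: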
DECLARE @p Nvarchar(40);
SET @p = 'Community';
SELECT * FROM Users WHERE DisplayName = @p;

The input parameter @p is of type nvarchar, which has higher precedence than the column type varchar, so the CONVERT_IMPLICIT function is applied to the table column. In this case, the predicate is not SARGable, so the query cannot do an index seek and performs badly. A yellow warning sign on the left-most root SELECT operator also indicates there's an implicit type conversion warning.


Let's try to solve it by giving the parameter the same type as the column:
DECLARE @p varchar(40);
SET @p = 'Community';
SELECT * FROM Users WHERE DisplayName = @p;

Now the query becomes SARGable and can do an index seek.

Another solution is to do an explicit conversion on the scalar value:
DECLARE @p Nvarchar(40);
SET @p = 'Community';
SELECT * FROM Users WHERE DisplayName = CAST(@p AS varchar(40));

This gets the same good result.

Implicit conversions hurt performance, especially when SQL Server has to perform the conversion row by row against a table column. If we find an implicit conversion in a query, the best solution is to put the explicit conversion on the other side: rather than allowing SQL Server to convert the column to match the type of the scalar, we explicitly convert the scalar to match the type of the column. This allows SQL Server to choose an index seek and carry out the query with fewer overall resources.

2022-07-06

Will a Cancelled Query also count in Performance Statistics?

SQL Server has two very useful dynamic management views, sys.dm_exec_query_stats and sys.dm_exec_procedure_stats, which let us DBAs find out which queries and stored procs are expensive in terms of processing time, I/O, duration, etc. Every time a query or a stored procedure is executed, that execution's performance figures are added to the statistics. But what if the execution is cancelled by the client application, for example because it was blocked and the client timed out? Will the cancelled execution still be counted in the performance statistics? In this month's blog post I will do an experiment to find out.

1. Let's create a simple testing table and stored proc:
USE TestingDB;
GO
DROP TABLE IF EXISTS dbo.Table1;
CREATE TABLE Table1 (
    pk int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    col [varchar](50) NULL
);
GO
CREATE OR ALTER PROC TestSP
    @p varchar(50)
AS
BEGIN
SET NOCOUNT ON;
BEGIN TRAN
SELECT 'Before', * FROM Table1;
INSERT Table1 (col) VALUES (@p);
SELECT 'After', * FROM Table1;
COMMIT
END
GO

2. Execute the stored proc 3 times in order to populate the performance statistics:
EXEC TestSP 'X';
EXEC TestSP 'X';
EXEC TestSP 'X';


3. Check the performance statistics of the queries inside the stored proc, and the whole stored proc:
SELECT DB_NAME(qt.[dbid]) as DB, OBJECT_NAME(qt.objectid, qt.[dbid]) AS Obj,
    SUBSTRING(qt.[text], qs.statement_start_offset/2+1,
    (CASE WHEN qs.statement_end_offset = -1
        THEN LEN(CONVERT(nvarchar(max), qt.[text]))*2 ELSE qs.statement_end_offset END - qs.statement_start_offset)/2
    ) AS query_text, qs.statement_end_offset,
    qs.execution_count, qs.creation_time, qs.last_execution_time, qs.last_elapsed_time, qs.max_elapsed_time
FROM sys.dm_exec_query_stats AS qs CROSS APPLY sys.dm_exec_sql_text(qs.[sql_handle]) AS qt
WHERE qt.[dbid]=DB_ID('TestingDB') AND qt.objectid=OBJECT_ID('TestingDB.dbo.TestSP')
ORDER BY qs.statement_start_offset;

SELECT DB_NAME(database_id) AS DB, OBJECT_NAME([object_id], database_id) AS Obj,
    execution_count, cached_time, last_execution_time, last_elapsed_time, max_elapsed_time
FROM sys.dm_exec_procedure_stats
WHERE database_id=DB_ID('TestingDB') AND [object_id]=OBJECT_ID('TestingDB.dbo.TestSP');

As you can see, both the queries and the stored proc show an execution count of 3.

4. Open another query window and execute the query below to lock the whole table:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN
SELECT * FROM Table1;

5. In the previous window, execute the TestSP stored proc again; now it's blocked. You can check it on the Activity - All Blocking Transactions server report:

You can see that the query being blocked is the INSERT Table1 statement inside the stored proc.

6. Now press the red square button in SSMS to CANCEL the TestSP execution. Then check the performance statistics again:

It shows that the first SELECT statement in the stored proc was executed (execution_count = 4) while the last SELECT was skipped (execution_count = 3), as you would expect. The most interesting point is this: SQL Server really tried hard to execute the INSERT statement, but it was blocked, and then the whole stored proc was cancelled by the client application (yes, you, haha). Sadly, query_stats does NOT count that hardworking attempt (execution_count = 3). But procedure_stats, surprisingly, DOES count the cancelled execution.

Conclusion:
Cancelled QUERY executions are NOT counted in sys.dm_exec_query_stats.
But cancelled STORED PROC executions ARE counted in sys.dm_exec_procedure_stats.

The moral:
* That's why DBAs like stored procs rather than ORM/application-generated queries!
* To monitor blocking, we should enable a blocked process report session (see the sketch below).
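
A minimal sketch of how that could look (the 10-second threshold and session name are my own choices): tell SQL Server to raise a blocked process report for anything blocked longer than 10 seconds, and capture the reports with an Extended Events session.

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'blocked process threshold (s)', 10;
RECONFIGURE;
GO
-- Capture the reports with an Extended Events session
CREATE EVENT SESSION BlockedProcessReport ON SERVER
ADD EVENT sqlserver.blocked_process_report
ADD TARGET package0.event_file (SET filename = N'BlockedProcessReport');
GO
ALTER EVENT SESSION BlockedProcessReport ON SERVER STATE = START;
GO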

2022-06-10

[Microsoft][ODBC SQL Server Driver][DBNETLIB]SSL Security error

One of my clients encountered the following error when they tried to make a connection from an application to the SQL Server: "[Microsoft][ODBC SQL Server Driver][DBNETLIB]SSL Security error".

From my experience, this is most likely a TLS issue, where the TLS version being used on the application server doesn't match the TLS version required by the database server. The steps below demonstrate how to fix it.

1. On the database server, check which TLS versions are enabled by looking at the registry keys below:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.x\<Client and Server>\DisabledByDefault
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.x\<Client and Server>\Enabled
If you see that both TLS 1.0 and TLS 1.1 have DisabledByDefault=1 and Enabled=0, like in the screenshot below, then your database server is set to accept TLS 1.2 only.


2. Check the SQL Server version by running the SELECT @@VERSION query. Not every SQL Server release supports TLS 1.2; please refer to KB3135244 - TLS 1.2 support for Microsoft SQL Server to determine whether you need to upgrade or update your SQL Server.

3. On the application server, download and install the most recent ODBC driver for SQL Server (Microsoft ODBC Driver 17 for SQL Server already supports TLS 1.2). Then change the connection string in the application config to use the new driver.