2018-12-03

Export Query Result in CSV File using T-SQL

Let's say you have a program or a SQL Agent job which runs a stored procedure that extracts data from some tables, joins them, and does some calculations. You then want to export the result into a CSV file on a local drive or a shared folder somewhere in your network. You cannot fire the BCP utility through xp_cmdshell, because your company's security administrator does not allow your program to spawn Windows command shells, and if that stored procedure is executed on every user button click, spawning a shell each time will also hurt your SQL Server's performance.
In this post, I will show you how to do this by using OLE Automation Procedures in SQL Server. The prerequisites are that you must turn on this server option, and the owner of your user database must be the same as the owner of the master database (using the sa account as the owner of all your databases is actually the best practice).
USE master
GO
sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'Ole Automation Procedures', 1;
GO
RECONFIGURE;
GO
USE TESTDB
ALTER AUTHORIZATION ON DATABASE::[TESTDB] TO sa
GO
-- =============================================
-- Author:  Peter Lee
-- Description: Convert local temp table #TempTblCSV into a CSV string variable
-- =============================================
CREATE OR ALTER PROCEDURE ConvertLocalTempIntoCSV
 @csvOutput nvarchar(max) OUTPUT
AS
BEGIN
 SET NOCOUNT ON;

 -- get list of columns
 DECLARE @cols nvarchar(4000);
 SELECT @cols = COALESCE(@cols + ' + '', '' + ' + 'CAST([' + name + '] AS nvarchar(4000))', 'CAST([' + name +'] AS nvarchar(4000))')
  FROM tempdb.sys.columns WHERE [object_id] = OBJECT_ID('tempdb..#TempTblCSV') ORDER BY column_id;

 CREATE TABLE #TempRows (r nvarchar(4000));
 EXEC('INSERT #TempRows SELECT ' + @cols + ' FROM #TempTblCSV');
 SELECT @csvOutput = COALESCE(@csvOutput + CHAR(13)+CHAR(10) + r, r) FROM #TempRows;
END
GO

/****** Object:  StoredProcedure [dbo].[WriteToFile] ******/
CREATE OR ALTER PROC [WriteToFile]
 @file varchar(2000),
 @text nvarchar(max)
WITH EXECUTE AS 'dbo'
AS  
 DECLARE @ole int;
 DECLARE @fileId int;
 DECLARE @hr int;

 EXECUTE @hr = master.dbo.sp_OACreate 'Scripting.FileSystemObject', @ole OUT;
 IF @hr <> 0 
 BEGIN 
  RAISERROR('Error %d creating object.', 16, 1, @hr)
  RETURN
 END

 EXECUTE @hr = master.dbo.sp_OAMethod @ole, 'OpenTextFile', @fileId OUT, @file, 2, 1;  -- iomode 2 = ForWriting (overwrite), create = 1 (create the file if it does not exist)
 IF @hr <> 0 
 BEGIN 
  RAISERROR('Error %d opening file.', 16, 1, @hr)
  RETURN
 END

 EXECUTE @hr = master.dbo.sp_OAMethod @fileId, 'WriteLine', Null, @text;
 IF @hr <> 0 
 BEGIN 
  RAISERROR('Error %d writing file.', 16, 1, @hr)
  RETURN
 END

 EXECUTE @hr = master.dbo.sp_OADestroy @fileId;
 IF @hr <> 0 
 BEGIN 
  RAISERROR('Error %d closing file.', 16, 1, @hr)
  RETURN
 END

 EXECUTE master.dbo.sp_OADestroy @ole;
GO
-- Create a local temp table, name must be #TempTblCSV
-- you can define any columns inside, e.g.
CREATE TABLE #TempTblCSV (pk int, col1 varchar(9), dt datetime2(0));
-- Fill the local temp table with your query result, e.g.
INSERT #TempTblCSV VALUES (1, 'a', '20180112 12:00');
INSERT #TempTblCSV VALUES (2, 'b', '20180113 13:00');
INSERT #TempTblCSV VALUES (3, 'c', '20180113 14:00');

-- convert the local temp table data into a string variable, which is the CSV content
DECLARE @csv nvarchar(max);
EXEC ConvertLocalTempIntoCSV @csv OUTPUT;

-- write the CSV content into a file
EXEC WriteToFile 'H:\TEMP\PETER\output.csv', @csv;

-- Dropping the temp table is not a must, especially if you're writing this inside a stored proc (it is dropped automatically when the proc finishes)
DROP TABLE #TempTblCSV;

And here's the sample output:
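Given the three sample rows inserted above, the file content should look roughly like this (assuming the default nvarchar conversion of the datetime2 values):

1, a, 2018-01-12 12:00:00
2, b, 2018-01-13 13:00:00
3, c, 2018-01-13 14:00:00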


2018-11-09

Another Way to Change String Column Collation

Let's say you have a database table which contains millions of rows of string data. One day you need to change the collation, for example from case sensitive to case insensitive. According to SQL Server Books Online, you can change a column's collation, but first you need to drop all the objects referring to that column, including computed columns, indexes, statistics, CHECK constraints, and FOREIGN KEY constraints. It's not a simple task, and if you do it this way in your production database, you need a prolonged maintenance window. In this blog post, I want to show you another way to do this, but it needs a schema comparison tool and a data comparison tool, such as Red Gate® SQL Compare and Data Compare.
  1. Create a new database as the destination database, specify the new collation.
  2. From the source database, generate the creation script for all tables, clearing all options EXCEPT "Schema qualify object names". In this step, you just need to create the tables, without any indexes or constraints.
  3. Change the current query context database to the destination database, execute the table creation script.
  4. Generate the data copy script (INSERT…SELECT statements) for all tables, using below script:
    USE <source DB>
    GO
    DECLARE @destDb varchar(50) = '<destination DB>';
    SELECT
        CASE WHEN I.column_id IS NULL THEN ''
             ELSE 'SET IDENTITY_INSERT ' + @destDb + '.' + S.name + '.' + O.name + ' ON; ' END
        + 'INSERT ' + @destDb + '.' + S.name + '.' + O.name
        + ' (' + RTRIM(LTRIM(STUFF((SELECT ', [' + name + ']' FROM sys.columns
                                    WHERE object_id = O.object_id FOR XML PATH ('')), 1, 1, ''))) + ')'
        + ' SELECT ' + RTRIM(LTRIM(STUFF((SELECT ', [' + name + ']' FROM sys.columns
                                    WHERE object_id = O.object_id FOR XML PATH ('')), 1, 1, '')))
        + ' FROM ' + S.name + '.' + O.name + '; '
        + CASE WHEN I.column_id IS NULL THEN ''
               ELSE 'SET IDENTITY_INSERT ' + @destDb + '.' + S.name + '.' + O.name + ' OFF; ' END
    FROM sys.objects O
    JOIN sys.schemas S ON O.schema_id = S.schema_id
    LEFT JOIN sys.identity_columns I ON I.object_id = O.object_id
    WHERE O.[type] = 'U'
    ORDER BY S.name, O.name;
  5. Execute the generated INSERT…SELECT statements.
  6. Back up the destination database.
  7. Run Red Gate® SQL Compare to synchronize the table schema (PK, FK, check, index, etc.) and objects (view, stored proc, function, etc.) from the source database to the destination database.
  8. Freeze the source database, e.g. ALTER DATABASE <source DB> SET READ_ONLY;
  9. Run Red Gate® SQL Data Compare to compare and synchronize the source database and destination database.
  10. Rename the source database to XXX_OLD, and rename the destination database to the source database's original name (a sketch of this step follows below).
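As an illustration of step 10, the rename can be done with ALTER DATABASE … MODIFY NAME. A minimal sketch, where AppDB and AppDB_NEW are placeholder names for the source and destination databases:

USE master;
GO
-- make sure nobody is connected to the source database before renaming it
ALTER DATABASE AppDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
ALTER DATABASE AppDB MODIFY NAME = AppDB_OLD;
-- give the destination database the original name
ALTER DATABASE AppDB_NEW MODIFY NAME = AppDB;
ALTER DATABASE AppDB_OLD SET MULTI_USER;
GO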

2018-10-04

Using Service Broker to implement Asynchronous Trigger

Suppose you have an OLTP system that lets users place purchase orders (PO); the UI response to the user must be fast, but a time-consuming background process needs to run for each PO placed by the users. Such a system can be implemented by using triggers and a Service Broker service. A trigger queues a message that requests work from a Service Broker service. The trigger does not actually perform the requested work. Instead, the trigger creates a message that contains information about the work to be done and sends this message to a service that performs the work. The trigger then returns. When the original transaction commits, Service Broker delivers the message to the destination service. The program that implements the service performs the work in a separate transaction. By performing this work in a separate transaction, the original transaction can commit immediately. The application avoids system slowdowns that result from keeping the original transaction open while performing the work.
Below is a demonstration. There is a main table named DepartmentMaster. It has an insert trigger which sends a message to a Service Broker service. The service is implemented by an internal activation stored procedure on the target queue, with MAX_QUEUE_READERS = 5, meaning the maximum number of instances of the activation stored procedure that the queue starts at the same time is five. When you insert a row into the DepartmentMaster table, an XML file will eventually be created inside the folder C:\TEMP\PETER\ on your testing database server. For the sake of simplicity, the example uses the OLE Automation Stored Procedures of SQL Server to write the XML file.

USE master;
GO
EXEC sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
EXEC sp_configure 'Ole Automation Procedures', 1;
GO
RECONFIGURE;
GO

CREATE DATABASE [SBTEST];
GO
ALTER DATABASE [SBTEST] SET TRUSTWORTHY ON;
ALTER DATABASE [SBTEST] SET ENABLE_BROKER;
GO

USE [SBTEST]
GO
CREATE PROCEDURE [dbo].[WriteToFile]
  @file varchar(2000),
  @text nvarchar(max)
AS  
BEGIN
  DECLARE @ole int;  
  DECLARE @fileId int;
  EXECUTE sp_OACreate 'Scripting.FileSystemObject', @ole OUT;
  EXECUTE sp_OAMethod @ole, 'OpenTextFile', @fileId OUT, @file, 8, 1;  -- iomode 8 = ForAppending, create = 1 (create the file if it does not exist)
  EXECUTE sp_OAMethod @fileId, 'WriteLine', Null, @text;
  EXECUTE sp_OADestroy @fileId;
  EXECUTE sp_OADestroy @ole;
END
GO
CREATE MESSAGE TYPE [TestMessage] VALIDATION = WELL_FORMED_XML
GO
CREATE CONTRACT [TestContract] ([TestMessage] SENT BY INITIATOR)
GO
CREATE QUEUE [dbo].[TestQueue] WITH STATUS = OFF
GO
CREATE SERVICE [TestServiceInitiator]  ON QUEUE [dbo].[TestQueue] ([TestContract])
GO
CREATE SERVICE [TestServiceTarget]  ON QUEUE [dbo].[TestQueue] ([TestContract])
GO
CREATE TABLE [dbo].[auditlog](
 [xmlstring] [xml] NULL,
 [logTime] [datetime2](2) NOT NULL,
 [isSuccess] [bit] NOT NULL,
 [err_num] [int] NULL,
 [err_msg] [nvarchar](4000) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
-- DEMO TABLE
CREATE TABLE [dbo].[DepartmentMaster](
 [DepartmentId] [int] IDENTITY(1,1) NOT NULL,
 [Name] [varchar](50) NULL,
 [Description] [varchar](50) NULL,
 CONSTRAINT [PK_DepartmentMaster1] PRIMARY KEY CLUSTERED
(
 [DepartmentId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TRIGGER [dbo].[Trg_DepartmentMaster_INSERT]
ON [dbo].[DepartmentMaster]
FOR INSERT
AS
BEGIN
 SET NOCOUNT ON;
 DECLARE @MessageBody XML;
 -- get the relevant information from inserted and convert it to an XML message
 SET @MessageBody = (SELECT DepartmentId, Name, Description FROM inserted FOR XML AUTO);
 IF (@MessageBody IS NOT NULL)
 BEGIN
  -- **** INSERT TRIGGER SENDS a SB request message, then returns immediately ****
  DECLARE @Handle UNIQUEIDENTIFIER;
  BEGIN DIALOG CONVERSATION @Handle
   FROM SERVICE [TestServiceInitiator]
   TO SERVICE 'TestServiceTarget'
   ON CONTRACT [TestContract]
   WITH ENCRYPTION = OFF;
  SEND ON CONVERSATION @Handle
   MESSAGE TYPE [TestMessage](@MessageBody);
 END
END
GO
/* **** HANDLER SP, automatically fired by Service Broker **** */
CREATE PROCEDURE [dbo].[spMessageProcTest]
AS
SET NOCOUNT ON;
DECLARE @message_type varchar(100), @dialog uniqueidentifier, @message_body XML;
 
BEGIN TRY
   
 BEGIN TRAN;
 WAITFOR (
  RECEIVE TOP(1)
   @message_type = message_type_name,
   @message_body = CAST(message_body AS XML),
   @dialog = conversation_handle
  FROM dbo.TestQueue
 ), TIMEOUT 500
 ;
 -- Received a message
 IF (@@ROWCOUNT != 0 AND @message_body IS NOT NULL)
 BEGIN
  IF @message_type = 'TestMessage'
  BEGIN
   -- **** PROCESSING HERE, e.g. Export To File ****
   DECLARE @fileNum int = @message_body.value('(/inserted/@DepartmentId)[1]', 'int');
   DECLARE @fileContent nvarchar(max) = CAST(@message_body AS nvarchar(max));
   DECLARE @filePath varchar(2000) = 'C:\TEMP\PETER\' + CAST(@fileNum AS varchar(99)) + '.xml';
   EXEC dbo.WriteToFile @filePath, @fileContent;
   -- Success Log
   INSERT INTO auditlog (xmlstring, isSuccess) values(@message_body, 1);
  END
  END CONVERSATION @dialog;
 END
 COMMIT;
END TRY
BEGIN CATCH
 DECLARE @error int, @message nvarchar(4000);
 SELECT @error = ERROR_NUMBER(), @message = ERROR_MESSAGE();
 -- Uncommittable tx.
 IF XACT_STATE() = -1
 BEGIN
  ROLLBACK;
 END
 -- Error but Commitable tx.
 IF XACT_STATE() = 1
 BEGIN
  COMMIT;
 END
 -- FAILED Audit
 INSERT INTO auditlog (xmlstring, isSuccess, err_num, err_msg)
  VALUES (@message_body, 0, @error, @message);
 -- End Conversion with Error
 END CONVERSATION @dialog WITH error = @error DESCRIPTION = @message;
END CATCH
GO
ALTER QUEUE [dbo].[TestQueue] WITH STATUS = ON , RETENTION = OFF , ACTIVATION (  STATUS = ON , PROCEDURE_NAME = [dbo].[spMessageProcTest] , MAX_QUEUE_READERS = 5 , EXECUTE AS OWNER  ), POISON_MESSAGE_HANDLING (STATUS = ON);
GO
-- TESTING --
INSERT DepartmentMaster (Name, Description) VALUES ('IT Dept', 'IT DEPARTMENT');
SELECT TOP (1000) *, casted_message_body =
CASE message_type_name WHEN 'X'
  THEN CAST(message_body AS NVARCHAR(MAX))
  ELSE message_body
END
FROM [SBTEST].[dbo].[TestQueue] WITH(NOLOCK)
;
SELECT xmlstring.value('(/inserted/@DepartmentId)[1]', 'int'), * FROM auditlog;
SELECT * FROM DepartmentMaster;
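If the XML file does not appear, a quick check (not part of the original demo) is to query the Service Broker catalog views in the SBTEST database, to see whether messages are stuck in transit or a conversation ended with an error:

USE [SBTEST];
-- messages that have not been delivered yet, with the reason in transmission_status
SELECT * FROM sys.transmission_queue;
-- state of each conversation (e.g. CONVERSING, ERROR, CLOSED)
SELECT conversation_handle, state_desc, far_service FROM sys.conversation_endpoints;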

2018-09-07

Estimating SQL Server Recovery Time

Every time a SQL Server instance is restarted, whether it is a planned restart or a cluster failover, SQL Server undergoes the recovery process, which takes time to roll back transactions that were uncommitted at restart, and to roll forward committed transactions that were hardened to the log file but not yet applied to the data files. Depending on the workload on your databases and how many databases there are, recovery may take a long time to run. There is no exact way to calculate the time a recovery will take; however, you can get an estimation based on the last recovery that your SQL Server took. The script below uses the undocumented system stored procedure xp_readerrorlog to get the latest recovery start time and completion time, and calculates the duration of the last recovery:
SET NOCOUNT ON;
DECLARE @RecoveryStart datetime, @RecoveryStop datetime;
DECLARE @tbl_RECOVERY_START table (LogDate datetime, ProcessInfo nvarchar(max), [Text] nvarchar(max));
DECLARE @tbl_RECOVERY_STOP table (LogDate datetime, ProcessInfo nvarchar(max), [Text] nvarchar(max));
INSERT @tbl_RECOVERY_START EXEC xp_readerrorlog 0, 1, N'(c) Microsoft Corporation.', NULL, NULL, NULL, N'ASC';
INSERT @tbl_RECOVERY_STOP EXEC xp_readerrorlog 0, 1, N'Recovery is complete. This is an informational message only. No user action is required.', NULL, NULL, NULL, N'DESC';
SELECT @RecoveryStart = LogDate FROM @tbl_RECOVERY_START;
SELECT @RecoveryStop = LogDate FROM @tbl_RECOVERY_STOP;
SELECT DATEDIFF(second, @RecoveryStart, @RecoveryStop) AS RecoveryTimeSeconds;
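If you also want a per-database breakdown, the error log records a "Recovery completed for database ..." message for each database that needed recovery. A sketch that reads those entries the same way (assuming the message text is unchanged in your SQL Server version):

DECLARE @tbl_PER_DB table (LogDate datetime, ProcessInfo nvarchar(max), [Text] nvarchar(max));
INSERT @tbl_PER_DB EXEC xp_readerrorlog 0, 1, N'Recovery completed for database', NULL, NULL, NULL, N'ASC';
SELECT LogDate, [Text] FROM @tbl_PER_DB ORDER BY LogDate;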

2018-08-14

ClearTrace - Free Tool to summarize SQL Trace Results

Let's say you capture the workload on your production SQL Server by using SQL Profiler or SQL Trace to gather the events of query executions. In order to determine the most expensive queries within the workload, you need to summarize the trace results, group the events by query text, and order the summarized items by CPU, disk IO, or duration. You also want to normalize the query text, so that common statements can be grouped together. ClearTrace is a free tool that can help: it can import SQL Profiler and SQL Trace results, from SQL Server 2000 to the latest SQL Server version. Below are the steps demonstrating how to use ClearTrace:
  1. Download ClearTrace from scalesql.com. You don't need to set it up on the database server; it can be executed on any computer that is able to access the trace result files.
  2. Extract the downloaded ClearTrace.XX.zip compressed file.
  3. Execute the ClearTrace.exe executable file, which shows the ClearTrace GUI.
  4. The first time you start ClearTrace on your computer, you will be asked to set the database in the Tools -> Options menu. If the database doesn’t exist you will be given the option to create it. If it does exist, ClearTrace will create the necessary objects in the database. All tables and views are prefixed with “CT” so they are easy to identify.
  5. Select the first trace result file to import. Trace files in the same directory will also be imported in sequence. ClearTrace only processes RPC:Completed and SQL:BatchCompleted events. You can create the trace by yourself, including the required event columns: EventClass, TextData, HostName, ApplicationName, LoginName, SPID, Duration, EndTime, Reads, Writes, and CPU. Or you can download the sql script from the ClearTrace website. Anyway, I prefer creating it by myself.
  6. Press the Import Files button. ClearTrace will display the status. After the process completes, the summarized result will be shown. Then you can change the grouping and ordering of it.
  7. You can also get the summarized data from the ClearTrace database, by querying the view CTTraceSummaryView (see the example after this list).
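For example, a quick look at the summarized data could be as simple as the query below (run it against the ClearTrace database you configured in step 4; the view name comes from the tool, so check its columns for the grouping and ordering you want):

SELECT TOP (50) * FROM CTTraceSummaryView;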
Please be reminded that ClearTrace is still in BETA. But it is still a handy tool for database performance tuning, and it's free.

2018-07-17

Reduce PAGELATCH Waits in TempDB on MSSQL 2016

A Microsoft KB article, Performance issues occur in the form of PAGELATCH_EX and PAGELATCH_SH waits in TempDB when you use SQL Server 2016, really caught my eye. This article mentions that in order to work around the TempDB PAGELATCH wait symptom in SQL Server 2016, you should remove the DROP TABLE commands for any temporary tables that won't be reused within the same execution of the stored procedure. What? Don't drop temporary tables inside stored procedures? Dropping them before the stored procedure finishes is a coding practice that many SQL developers have believed in! But in fact, according to SQL Server Books Online, a local temporary table created in a stored procedure is dropped automatically when the stored procedure finishes, so SQL developers actually do not need to worry about it. Other suggestions mentioned in the KB article are also worth noting:
  • Avoid using the TRUNCATE TABLE command against temporary tables.
  • Avoid altering temporary tables after they have been created.
  • If indexes are created against the temporary table, consider moving these to inline index creation statements within the CREATE TABLE command.
Inline index creation was introduced in SQL Server 2014, and further improved in SQL Server 2016. Below is a simple example demonstrating the idea of that KB:

CREATE PROCEDURE uspTestTempTable
AS
BEGIN
    CREATE TABLE #tempTable1 (
        pk int PRIMARY KEY,
        col1 int INDEX idxTempTbl1Col1,
        col2 int,
        col3 int,
        INDEX idxTempTblCol2Col3 (col2, col3)
    );
    INSERT #tempTable1 (pk, col1, col2, col3) VALUES (0, 1, 2, 3);
    SELECT * FROM #tempTable1;
    /* DON'T DROP the TEMP TABLE */
END
GO

EXEC uspTestTempTable;
EXEC uspTestTempTable;





2018-06-14

SQL Stress Test using OStress

Although not bundled with SQL Server, Microsoft provides a SQL query stress tool named OStress, which comes with the RML Utilities package and is absolutely free of cost. OStress is a command line tool similar to the SQLCMD utility, so database developers and administrators should find it easy to pick up. You can download OStress in the RML Utilities package available here: Description of the Replay Markup Language (RML) Utilities for SQL Server. There are 32-bit and 64-bit versions; just download the right one for your server. Below are the steps to demonstrate how to use OStress to stress test your SQL Server:

1. Download the RML Utilities package as mentioned above.
2. Install the package, just tick the terms and conditions check box and press next to install.
3. Open the RML Utilities Command Prompt, you can find it at:
    Start > All Programs > RML Utilities for SQL Server > RML Cmd Prompt.
4. Type "ostress" then press enter, which shows you all the usage options.
5. For example, say you want to stress test a stored procedure named uspTest1, with 25 concurrent user connections each running 50 iterations, and output the result log files into a directory. The command should be like this:
ostress -S.\DEV2014 -dStackOverflow2010 -E -n25 -r50 -q -Q"EXEC uspTest1" -oC:\Temp\OStressOutput
6. Then you can check the query.out log files inside the output directory.

2018-05-16

Free SQL Code Formatter - ApexSQL Refactor

I found a free tool, ApexSQL Refactor, which can be integrated into your SSMS and used to format your SQL code better. Let me demonstrate how it can be done.
For example, you want to create a stored procedure that you found very useful on the internet, but it isn't formatted very well:
Then you can format the code inside your active query window, by choosing ApexSQL > ApexSQL Refactor > Format SQL by profile, then click the formatting profile you like:
Then your code will be formatted as below:
You can also create your own formatting profile, by going to ApexSQL > ApexSQL Refactor > Options... > New. There are many options that you can specify in order to create your own favorite SQL code formatting profile.


2018-04-22

Reclaim Data Space in VarBinary(Max) Column

SQL Server allows applications to store binary data, such as Word files and images, in a varbinary(max) column. Most likely the applications don't need to store that binary data permanently, so we can define a retention period for the binary data, and a data purging job should be created in order to free up disk space. Some applications only allow purging the binary data such as photos, while the related data fields, such as id-number and name, must be kept permanently. Note that updating a varbinary(max) column to NULL cannot free up the unused space; only deleting the row can. The example below is a proof:
CREATE TABLE [dbo].[TestBlob](
    [pk] [int] NOT NULL PRIMARY KEY,
    [blob] [varbinary](max) NULL
)
GO

TRUNCATE TABLE TestBlob;

SELECT 'EMPTY'
SELECT blob, COUNT(*) AS cnt FROM TestBlob GROUP BY blob;
EXEC sp_spaceused @updateusage = N'TRUE'
EXEC sp_spaceused 'TestBlob';

SET NOCOUNT ON;
DECLARE @i int = 1
WHILE @i < 100000
BEGIN
INSERT TestBlob (pk, blob) SELECT @i, CONVERT(varbinary(max), '
asdafdfdsfdsfdsfdfsdfsdgfgdfghghfjgfhjgkgjkjhkhlkljkljkljklkjljkljkljkljlkjlkkkkkkkkkkkkkkkkkkkkkkkkkkkkkjaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa12113433aaaaaaaaaaaaaaaaaaaaaaaaa
asdafdfdsfdsfdsfdfsdfsdgfgdfghghfjgfhjgkgjkjhkhlkljkljkljklkjljkljkljkljlkjlkkkkkkkkkkkkkkkkkkkkkkkkkkkkkjaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa12113433aaaaaaaaaaaaaaaaaaaaaaaaa
')
SET @i +=1
END

SELECT 'FULL'
SELECT blob, COUNT(*) AS cnt FROM TestBlob GROUP BY blob;
EXEC sp_spaceused @updateusage = N'TRUE'
EXEC sp_spaceused 'TestBlob';

UPDATE TestBlob SET blob = NULL;

SELECT 'UPDATE NULL';
SELECT blob, COUNT(*) AS cnt FROM TestBlob GROUP BY blob;
EXEC sp_spaceused @updateusage = N'TRUE'
EXEC sp_spaceused 'TestBlob';

DELETE TestBlob;

SELECT 'DELETE';
SELECT blob, COUNT(*) AS cnt FROM TestBlob GROUP BY blob;
EXEC sp_spaceused @updateusage = N'TRUE'
EXEC sp_spaceused 'TestBlob';

GO
Here's the result:
As the result shows, updating varbinary(max) to NULL cannot reduce the space used by the data; only DELETE can reduce it. In order to make it possible to free up disk space, one method is to separate the binary column into its own table, e.g.
CREATE TABLE dbo.UserTable (
    userId int PRIMARY KEY,
    username varchar(50)
    -- ... other permanent columns
);
CREATE TABLE dbo.PhotoTable (
    userId int PRIMARY KEY,
    photo varbinary(max)
    -- ... plus any retention-related columns, e.g. an upload date
);
Then the rows in PhotoTable can be deleted to reclaim the space, while the permanent fields stay in UserTable.
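A purge job can then delete the photo rows alone and keep the user rows. A minimal sketch, assuming a hypothetical uploadDate column on PhotoTable that drives the retention rule:

-- uploadDate is a hypothetical column; use whatever column defines your retention period
DELETE FROM dbo.PhotoTable
WHERE uploadDate < DATEADD(year, -1, SYSDATETIME());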

2018-03-22

Myth: Index Seek Always Better Than Table/Index Scan

Many DBAs and developers believe an index seek always performs better than a scan. But in fact, it depends. Let's take a look at an example:

SET STATISTICS IO, TIME ON;
SELECT TOP 1000 * FROM [Users] WHERE DisplayName LIKE 'B%';
SELECT TOP 1000 * FROM [Users] WITH(FORCESEEK) WHERE DisplayName LIKE 'B%';


In this example, I use the Stack Overflow public database StackOverflow2010, which is free to download. The Users table there has a clustered primary key index on its Id column, and a nonclustered index on its DisplayName column which has no included columns. Below is the index creation statement:
CREATE NONCLUSTERED INDEX IX_DisplayName ON Users (DisplayName);

When the 1st select query (without any table hints) is executed, the engine picks a Clustered Index Scan operator to run it. For the 2nd select query, as it has a FORCESEEK table hint, an Index Seek on IX_DisplayName and a Key Lookup on the primary key will be used. Below are the actual execution plans:

Many people will jump in and suggest optimizing the query with the FORCESEEK hint. No, it's not so simple. Let's take a look at the STATISTICS IO output:

Table 'Users'. Scan count 1, logical reads 1156, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Users'. Scan count 1, logical reads 3081, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The "tuned" 2nd query induces more reads, and so more memory usage and more disk IO. That's why the SQL Server engine decides to scan rather than seek by itself. The extra reads on the 2nd query come from the Key Lookup for each rows returned from the Index Seek operator. Also, Clustered Index Scan in the 1st query isn't really scan the whole table, it's because there's a residual predicate on the DisplayName column, and also the TOP operator already placed a limit on the number of rows to be read.
From this demonstration, we can conclude that scan sometimes can perform better than seek.

2018-02-22

SSMS New Security Features: Vulnerability Assessment and Data Classification

The latest SSMS version 17.x provides two new security features, Vulnerability Assessment and Data Classification. Vulnerability Assessment is supported for SQL Server 2012 and later. Data Classification is supported for SQL Server 2008 and later. Vulnerability Assessment runs a scan on your database, based on Microsoft’s recommended security best practices, highlights any vulnerabilities found, and gives you actionable steps to resolve those security issues. You can run the VA scan on your application database to check for any vulnerabilities in it, and also run it on the master database to check for server-level security issues. The VA scan can be started by expanding Databases > right-clicking the database to check > pointing to Tasks > selecting Vulnerability Assessment > clicking on Scan for Vulnerabilities. After the scan completes, the report of the VA results will be shown, containing passed and failed checking items. Each failed item has a suggested remediation, mostly an executable SQL script to fix the security issue. For example:
 
SQL Data Discovery and Classification is a new tool to discover and classify sensitive data in your database tables, helping you to meet data privacy standards such as the GDPR required by the EU. This tool scans your application database, every column in every table, discovers any possibly sensitive data, and classifies those columns by two metadata attributes: Sensitivity Labels - the main classification attributes defining the sensitivity level of the data stored in the column; and Information Types - additional granularity about the type of data stored in the column. The scan can be started by right-clicking the database > choosing Tasks > Classify Data. Below is the classification result of the AdventureWorks2017 database:


2018-01-23

SQL Server 2017 CU3 TempDB Spill Diagnostics

When SQL Server has poorly under-estimated the number of rows that will be returned from an operator in a query execution plan, less memory will be granted, the query execution will spill to tempdb, and the query runs slowly as more disk I/O is incurred. You can fix those queries by adding missing indexes and updating statistics with a reasonable sampling size. The latest SQL Server 2017 CU3 added some improvements to tempdb spill diagnostics in DMVs and Extended Events to let us find out which queries have a tempdb spilling problem. The query below uses the new _spills columns in the sys.dm_exec_query_stats DMV:
SELECT
DB_NAME(QP.[dbid]) AS [db_name],
OBJECT_NAME(QP.objectid, QP.[dbid]) AS [object_name],
SUBSTRING(ST.[text], (QS.statement_start_offset/2)+1,  
    ((CASE QS.statement_end_offset
    WHEN -1 THEN DATALENGTH(ST.[text]) 
    ELSE QS.statement_end_offset 
    END - QS.statement_start_offset)/2) + 1) AS stmt_text,
QS.execution_count AS execution_count,
QS.total_spills / 128.0 AS total_spills_mb,
QS.last_spills / 128.0 AS last_spills_mb,
QS.min_spills / 128.0 AS min_spills_mb,
QS.max_spills / 128.0 AS max_spills_mb,
(QS.total_spills / QS.execution_count) / 128.0 AS avg_spills_mb
FROM sys.dm_exec_query_stats QS
CROSS APPLY sys.dm_exec_sql_text(QS.[sql_handle]) ST
CROSS APPLY sys.dm_exec_query_plan(QS.plan_handle) QP
WHERE QS.total_spills > 0
ORDER BY total_spills_mb DESC;