Feed aggregator

How It Works: FileStream (RsFx) Garbage Collection–Part (2)

SQL Server SQL CSS - Fri, 04/04/2014 - 08:45

In a previous post I outlined the basics of FILESTREAM garbage collection: http://blogs.msdn.com/b/psssql/archive/2011/06/23/how-it-works-filestream-rsfx-garbage-collection.aspx

This post continues the discussion, outlining specific details as to how the garbage collection progresses.

A Single GC Thread

The instance of SQL Server contains a single background worker performing the garbage collection activities.  The FSGC worker wakes up every ~5 seconds and loops over each database, managing the tombstone table entries in small batches.

foreach database
      Do File Stream Garbage Collection (batch size)

Fixed Batch Size

The batch size for the background FILESTREAM garbage collection (FSGC) worker is currently 20 rows (a hard-coded value).  This allows FSGC to remain an unobtrusive, background process.

Note: The type of tombstone row and its status determine the action FSGC takes.  For example, a delete of a row enters a single delete entry into the tombstone table.  A truncate table may also enter a single row in the tombstone table; FSGC understands the single-file-delete vs. truncate status and may take broader action.

sp_filestream_force_garbage_collection (http://technet.microsoft.com/en-us/library/gg492195.aspx)

If you are like me, the first thing you did was the math on 20 rows per batch every 5 seconds and determined there is a finite number of rows FSGC can process in a 24-hour period.  Enter sp_filestream_force_garbage_collection, a new feature in SQL Server 2012.

The procedure allows you to execute the FSGC as a foreground process without the batching limits.  This can be helpful if you have a large number of changes (inserts/updates/deletes) to FILESTREAM data, where the tombstone rows and associated disk files have grown to a large number you wish to aggressively clean up.
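A minimal invocation, as a sketch (the database name is a placeholder):

exec sp_filestream_force_garbage_collection @dbname = N'MyFileStreamDb'

Files are only removed once the checkpoint/log backup requirements described below have been met, so the procedure may need to be run more than once.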

Delete and Status

When you update a FILESTREAM column or delete a row, an entry for the delete is placed in the tombstone table.

For example:

update tblDocuments set FSData = 'New Image' where PK = 1

Checkpoint and Log Backup: The 'Orig Image' can't be deleted until it is properly synchronized with the backup stream activities.  The first log backup after the update secures both the Orig and New Images.  This preserves point-in-time restore capability to just before the update.

If you select from the tombstone table before FSGC and a proper checkpoint/log backup have executed, the status column for the entry will contain a (####7) value such as 17 or 217.
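To peek at these rows, a sketch (the internal tombstone table is visible only over a dedicated admin connection, and its name includes a per-rowset id, so the name below is illustrative; sys.internal_tables lists the real name):

select * from sys.filestream_tombstone_2073058421  -- illustrative name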

Prepare for delete (7): The status column is a combination of bits used by FSGC to determine the state of the row.  A value of (7) in the lower position indicates prepare for delete to the FSGC.

Do physical file deletion (8): FSGC must transition the entry from a status of 17 or 273 to 18 or 283, and a second checkpoint and log backup must take place before the physical file can be removed from disk storage.

Confused yet?  I was, so let me walk you through the activity (checkpoints, log backups and status changes).

  1. Update – enters a new entry with status #7 into the tombstone table.
  2. FSGC runs – can't do anything; no log backup and checkpoint have advanced the checkpoint and backup LSNs (you can see these with dbcc dbtable).  Entries stay at #7 status.
  3. Checkpoint/log backup executes.
  4. FSGC runs – finds the first batch (20 rows) by lowest LSN values and attempts the appropriate actions.  Finds the entry at status #7 with the proper checkpoint/log backup completed and updates the status to #8, so the next FSGC run after another checkpoint/log backup can do the physical removal of the file.  (*See below for update details.)
  5. FSGC runs – can't do anything; no log backup and checkpoint have advanced the checkpoint and backup LSNs.  Entries stay at #8 status.
  6. Checkpoint/log backup executes.
  7. FSGC runs – finds the first batch (20 rows) by lowest LSN values and attempts the appropriate actions.  Finds the entry at status #8 with the proper checkpoint/log backup completed; removes the physical file and deletes the tombstone entry.

* The update of status #7 to #8 is not an in-place update.  The update is a delete (of the old tombstone row) and insert (of the updated row) pairing.  This matters to you for the following reason.

FSGC looks at the top 20 rows during each batch.  When it transitions the 1st row (shown above) from #7 to #8 status, the row is added to the end of the tombstone table because it now carries the newest LSN.

In this example, if you delete 10,000 rows from the base table you get 10,000 entries in the tombstone table.  The physical file deletion won't occur until FSGC has first transitioned all 10,000 entries from #7 to #8 status.  Since FSGC runs every ~5 seconds in batches of 20, the math is really 10,000 * 2 = 20,000 row operations (each row passes through both the #7 and #8 status changes): 20,000 / 20 = 1,000 FSGC batches @ 5 seconds per batch = 5,000 seconds, or ~83 minutes (assuming proper checkpoints and log backups), to physically remove the original files.
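If that is too slow, here is a sketch of driving the cleanup manually on SQL Server 2012 (database name and backup path are hypothetical); each checkpoint/log backup pair advances the LSNs FSGC waits on, which is why the sequence appears twice:

checkpoint
backup log MyFileStreamDb to disk = N'C:\Backups\MyFileStreamDb_log1.trn'
exec sp_filestream_force_garbage_collection @dbname = N'MyFileStreamDb'  -- transitions #7 entries to #8

checkpoint
backup log MyFileStreamDb to disk = N'C:\Backups\MyFileStreamDb_log2.trn'
exec sp_filestream_force_garbage_collection @dbname = N'MyFileStreamDb'  -- removes the physical files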

In Summary

Understanding the transition states, batch sizes and requirements for log backups and checkpoints should allow you to better maintain your databases containing FILESTREAM data.

  • Make sure you have regular log backups
  • If the database is inactive you may want to run manual checkpoint(s)
  • Consider sp_filestream_force_garbage_collection on SQL Server 2012 and newer versions

Bob Dorr - Principal SQL Server Escalation Engineer

Categories: SQL Server MS Blogs

I think I am getting duplicate query plan entries in SQL Server’s procedure cache

SQL Server SQL CSS - Thu, 04/03/2014 - 10:12

Before the post dives into the subject I need to point out that Keith did most of the work.  I just kept pestering him with various scenarios until he sent me the e-mail content I needed.   Thanks Keith –

Keith devised a set of steps that you can use to collect information about the plans and their key cache attributes.  Using these queries you can track down entries in procedure cache with the same handles and determine which attribute is different, indicating why there appear to be duplicate entries for the same query.  It is often as simple as a SET statement difference.

From: Keith Elmore

Bob asked me to take a quick look and see if I could make some headway on understanding why there appear to be duplicate plans in cache for the same sql_handle and query_hash. In researching this, I found that if you call a procedure that references a temp table created outside of that scope, the cached plan has the session_id as part of the cache key for the plan. From http://technet.microsoft.com/en-us/library/ee343986(v=SQL.100).aspx:

If a stored procedure refers to a temporary table not created statically in the procedure, the spid (process ID) gets added to the cache key. This means that the plan for the stored procedure would only be reused when executed again by the same session. Temporary tables created statically within the stored procedure do not cause this behavior.
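To illustrate the quoted behavior, a small sketch (object names are hypothetical); each session that runs this pattern gets its own cached plan for the inner statement:

create table #t (id int)  -- temp table created outside the scope that references it

exec sp_executesql N'select count(*) from #t'  -- this plan's cache key includes the spid

drop table #t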

Because the customer is invoking this query via sp_executesql and the temp table is created outside of the sp_executesql, the above condition applies, and the theory is that this could be causing the larger number of entries even though the sql_handle and query_hash are the same. But in order to confirm this theory we need some additional data. If the customer wants to pursue this, the following queries are what I'd want to run:

1. A single execution of this query from SSMS.

-- Look and see if there is any hash bucket with a large number of entries (> 20)
-- which may cause slower lookup of entries
select p1.* from sys.dm_exec_cached_plans p1
join (select bucketid, count(*) as cache_entries, count(distinct plan_handle) as distinct_plans
      from sys.dm_exec_cached_plans p
      group by bucketid
      having count(*) > 20) as p2 on p1.bucketid = p2.bucketid

2. Run the following query from SSMS, which will save all of these "duplicate" queries into a permanent table that we’ll retrieve.

-- Save all of the "duplicate" plans for this specific query in a table in tempdb
select qs.sql_handle, qs.statement_start_offset, qs.statement_end_offset,
       qs.creation_time,
       qs.execution_count,
       qs.plan_generation_num,
       p.*
into tempdb..DuplicateCachePlans
from sys.dm_exec_query_stats qs
join sys.dm_exec_cached_plans p on qs.plan_handle = p.plan_handle
where qs.sql_handle = 0x0200000093281821F68C927A031EDA1B661FC831C10898D0
and qs.query_hash = 0x07BD94E2146FD875  -- binary literal; quoting it would compare character bytes and never match

3. From a command prompt, bcp out the data from the table above, as well as the plan_attributes data for each of these plans (add the appropriate server name with the -S parameter, and optionally add a path to where you want the files written; the output file names below are cached_plans.out and plan_attributes.out):

bcp "select * from tempdb..DuplicateCachePlans" queryout cached_plans.out -n –T

bcp "select p.plan_handle, pa.* from tempdb..DuplicateCachePlans p cross apply sys.dm_exec_plan_attributes (p.plan_handle) as pa" queryout plan_attributes.out -n –T

4. Back in SSMS, you can drop the table created in step 2:

drop table tempdb..DuplicateCachePlans
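Once the data is collected, a query along these lines (a sketch built on the documented sys.dm_exec_plan_attributes columns) can pinpoint which cache-key attribute differs across the "duplicates"; set_options is a common culprit:

select pa.attribute,
       count(distinct convert(nvarchar(128), pa.value)) as distinct_values
from tempdb..DuplicateCachePlans p
cross apply sys.dm_exec_plan_attributes(p.plan_handle) as pa
where pa.is_cache_key = 1
group by pa.attribute
having count(distinct convert(nvarchar(128), pa.value)) > 1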

-Keith

 Bob Dorr - Principal SQL Server Escalation Engineer

Categories: SQL Server MS Blogs

SQL Server 2014’s new cardinality estimator (Part 1)

SQL Server SQL CSS - Tue, 04/01/2014 - 12:14

One of the performance improvements in SQL Server 2014 is the redesign of cardinality estimation. The component that does cardinality estimation (CE) is called the cardinality estimator. It is an essential component of the SQL query processor for query plan generation. Cardinality estimates are predictions of the final row count and the row counts of intermediate results (such as joins, filters and aggregations). These estimates have a direct impact on plan choices such as join order, join type, etc. Prior to SQL Server 2014, the cardinality estimator was largely based on the SQL Server 7.0 code base. SQL Server 2014 introduces a new design, and the new cardinality estimator is based on research on modern workloads and learning from past experience.

A whitepaper planned by the SQL Server product team will document specific scenarios where the new and old cardinality estimators differ. We will follow up with a later blog post when that paper is released. Additionally, Juergen Thomas has posted an overview of the feature on the "Running SAP on SQL Server" blog.

In this blog, we will provide a quick overview of how to control the new SQL Server 2014 feature, along with guidelines for troubleshooting issues. We plan to release more blog posts related to the new SQL Server 2014 cardinality estimator in the future.

One of the goals for this blog post is to make customers aware of this feature for upgrades and new deployments, as query plans may change. We encourage users to test sufficiently prior to upgrading to avoid performance surprises.

New deployments vs upgrade

SQL Server 2014 uses the database compatibility level to determine whether the new cardinality estimator will be used. If the database compatibility level is 120, the new cardinality estimator is used. If you create a new database on SQL Server 2014, its compatibility level will be 120. When you upgrade or restore from a previous version to SQL Server 2014, the user database compatibility level is not updated. In other words, you will continue to use the old cardinality estimator in upgrade and restore situations by default. This is to avoid plan-change surprises during upgrades. You can manually change the compatibility level to 120 so that the new cardinality estimator is used. Please refer to the online documentation on how to view and change the database compatibility level. Be aware that changing the database compatibility level will remove all existing query plans for the database from the plan cache.
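For example (the database name is a placeholder):

-- view the current compatibility level
select name, compatibility_level from sys.databases where name = N'MyDb'

-- use the new cardinality estimator; this also flushes the database's cached plans
alter database MyDb set compatibility_level = 120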

Please note the following:

  1. Which version of the cardinality estimator is used is based on the current database context where the query is compiled, even if the query references multiple databases. Let's assume you have db1 with a compatibility level of 120 and db2 with a compatibility level of 110, and you have a query that references both databases. If the query is compiled under db1, the new cardinality estimator will be used. But if the query is compiled under db2, the old cardinality estimator will be used.
  2. Regarding system databases and upgrade: the compatibility levels of model, msdb and tempdb will be changed to 120 following the upgrade to SQL Server 2014, but the master system database retains the compatibility level it had before the upgrade, per the online documentation.  Therefore, in upgrade scenarios, if your query is compiled under the context of master, the old cardinality estimator will be used, but the new cardinality estimator will be used for a query compiled under the context of model, msdb, or tempdb.

  3. If your query references temporary tables, the database context under which the query is compiled determines which version of the cardinality estimator is used. In other words, if your query is compiled under a user database, the user database compatibility level (not tempdb's) determines which version of the cardinality estimator is used, even though the query references a temp table.

How to tell if you are using the new cardinality estimator

There are two ways to tell whether the new cardinality estimator is used.

In the SQL Server 2014 XML plan, there is a new attribute on StmtSimple called CardinalityEstimationModelVersion. When the value is 120, the new cardinality estimator was used. If the value is 70, the old cardinality estimator was used. This XML attribute is only available in SQL Server 2014 and above.
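A quick way to check, using the same sample query that appears later in this post (FactCurrencyRate is from the AdventureWorksDW sample database):

set showplan_xml on
go
select * from FactCurrencyRate where DateKey = 20101201
go
set showplan_xml off
go
-- in the returned XML, look for:
--   <StmtSimple ... CardinalityEstimationModelVersion="120" ...>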

If you start capturing the new SQL Server 2014 XEvent called query_optimizer_estimate_cardinality, this event is produced during compilation when the new cardinality estimator is used. If the old cardinality estimator is used, this XEvent is not produced even if you enable the capture. We will talk more about how to use this XEvent to troubleshoot cardinality issues in future blogs.

Additional ways to control new cardinality estimator

In addition to the database compatibility level, you can use trace flags 2312 and 9481 to control whether the new or old cardinality estimator is used. Trace flag 2312 forces the new cardinality estimator, while 9481 forces the old cardinality estimator, regardless of the database compatibility level setting. If you enable both trace flags, neither is used; instead, the database compatibility level determines which version of the cardinality estimator is used. When such a case occurs, a new XEvent, "query_optimizer_force_both_cardinality_estimation_behaviors", is raised to warn the user (if you have enabled it).

You can enable these trace flags at the server, session or query level. To enable a trace flag at the query level, use the QUERYTRACEON hint documented in KB 2801413. Below is an example query:

select * from FactCurrencyRate where DateKey = 20101201 option (QUERYTRACEON 2312)
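At the session or server level the same flags are applied with DBCC TRACEON; adding -1 makes the flag global (the two lines below are alternatives, not meant to be combined):

dbcc traceon (2312)      -- this session only: force the new cardinality estimator
dbcc traceon (9481, -1)  -- server-wide: force the old cardinality estimator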

Precedence

Since we have multiple ways to control the behavior, let's talk about the order of precedence. If the query has a QUERYTRACEON hint to disable or enable the new cardinality estimator, it is respected regardless of server/session or database-level settings. If you have a trace flag enabled at the server or session level, it is used regardless of the database compatibility level setting.


Guidelines on query performance troubleshooting with new cardinality estimator

When you run into issues with the new cardinality estimator, you have the choice to revert to the old behavior. But we encourage you to spend time troubleshooting the query and finding out whether the new cardinality estimator even plays a role in your slow query performance. The basics of troubleshooting query performance stay the same.

Statistics

Regardless of the version of the cardinality estimator, the optimizer still relies on statistics for cardinality estimates. Make sure you enable auto-update and auto-create statistics for the database. Additionally, if you have large tables, the auto-update statistics threshold may be too high to trigger statistics updates frequently enough. You may need to schedule jobs to manually update statistics.
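For example (database and table names are placeholders):

alter database MyDb set auto_create_statistics on
alter database MyDb set auto_update_statistics on

-- for large tables, a scheduled job can refresh statistics explicitly
update statistics dbo.BigTable with fullscan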

Indexes

You may not have sufficient indexes on the tables involved in the slow query. Here are a few ways you can tune your indexes.

  1. XML Plan will display missing index warning for a query.
  2. Missing index DMVs. SQL Server tracks potential indexes that can improve performance in DMVs (see the sample query after this list). This blog has sample queries on how to use the DMVs. Additionally, SQL Nexus also has a report on missing indexes server-wide.
  3. Database Tuning Advisor (DTA) can be used to help you tune a specific query. Not only can DTA recommend indexes but also recommend statistics needed for the query. Auto create statistics feature of SQL Server doesn't create multi-column statistics. But DTA can identify and recommend multi-column statistics as well.
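A common shape for a missing-index DMV query (one variant; the ranking heuristic is a judgment call):

select top 20
       mid.statement as table_name,
       mid.equality_columns, mid.inequality_columns, mid.included_columns,
       migs.user_seeks, migs.avg_user_impact
from sys.dm_db_missing_index_details mid
join sys.dm_db_missing_index_groups mig on mid.index_handle = mig.index_handle
join sys.dm_db_missing_index_group_stats migs on mig.index_group_handle = migs.group_handle
order by migs.user_seeks * migs.avg_user_impact desc
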
Constructs not significantly addressed by the new cardinality estimator

There are a few constructs that are known to have cardinality estimate issues but are not addressed by the new cardinality estimator. Below are a few common ones.

  1. Table variables. You will continue to get a low estimate (1 row) for table variables (see the sketch after this list). This issue is documented in a previous blog.
  2. Multi-statement table valued functions (TVFs): a multi-statement TVF will continue to get a fixed estimate, now 100 rows instead of the 1 row used in earlier versions. This can still cause issues if your TVF returns many rows. See the blog for more details.
  3. The behaviors of table valued parameters (TVPs) and local variables are unchanged. The number of rows in the TVP at compile time is used for the cardinality estimate, regardless of whether the row count changes for future executions. Local variables will continue to be optimized for unknown.
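A minimal sketch of the table-variable behavior (object names are arbitrary; check the estimated plan to see the 1-row estimate on @t):

declare @t table (id int primary key)
insert into @t select object_id from sys.objects
-- the optimizer assumes @t holds 1 row at compile time, which can skew join choices
select * from @t as t join sys.objects as o on o.object_id = t.id
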
References
  1. New cardinality estimator online documentation.
  2. Juergen Thomas's blog on New cardinality estimator and SAP applications

In future blogs, we will document more on how to use the new XEvent query_optimizer_estimate_cardinality to troubleshoot query plan issues and how plan guides may be used to control the new cardinality estimator behavior.

Many thanks to Yi Fang, a Senior Software Design Engineer from SQL Server Query Processor team at Microsoft, for reviewing and providing technical details on this blog.


Jack Li - Senior Escalation Engineer and Bob Ward - Principal Escalation Engineer, Microsoft SQL Server Support

Categories: SQL Server MS Blogs

The DBA Detective: Disturbing Developments

Simple-Talk on SQL - Sun, 03/30/2014 - 17:00

Originally one of the articles in the first DBA Team series, Grant wonders what Raymond Chandler or Dashiell Hammett would have done if asked to write technical articles for Simple-Talk. He came up with the DBA detective, hard-boiled Joe Dee Beay.

Don’t Rely On a Static IP Address for Your SQL Database

SQL Server SQL CSS - Thu, 03/27/2014 - 13:34

I’ve seen a number of customers open support incidents because they couldn’t connect to their SQL Database server which was ultimately due to the incorrect assumption that the server’s IP address is static. In fact, the IP address of your logical server is not static and is subject to change at any time. All connections should be made using the fully qualified DNS name (FQDN) rather than the IP address.

The Windows Azure SQL Database Connection Management TechNet article includes a picture showing the network topology for a SQL Database cluster.

Your logical server (e.g., with an FQDN of xyz.database.windows.net) resides on a SQL Database cluster on one of the backend SQL Server nodes. Within a given region (e.g., North Central US, South Central US, North Europe, etc.) there are generally many SQL Database clusters, as required to meet the aggregate capacity of all customers.  All logical servers within a cluster are accessed through the network load balancer (shown in that article's diagram with the note "Load balancer forwards 'sticky' sessions...") via a virtual IP address.

If you do a reverse name lookup on your server's IP address you will actually see the name of the cluster load balancer. For example, when I ping one of my servers (whose actual server name starts with ljvt), the displayed name associated with the IP address is instead data.sn3-1.database.windows.net, where the sn3-1 portion of the name maps to the specific cluster in the region (South Central) hosting this server.

Microsoft may do an online migration of your logical server between clusters within a region, load balancing capacity across the clusters in that region. This move is a live operation, and there is no loss of availability to your database during the operation. When the migration completes, existing connections to your logical server are terminated, and upon reconnecting via the fully qualified domain name your app will be directed to the new cluster.  However, if your application caches the IP address or connects by IP address instead of FQDN, then your connection attempts will fail.

A migration moves all of your settings, including any SQL Database firewall rules that you have.  Consequently there are no Azure-specific changes required in order to connect.  However, if your on-premises network infrastructure blocks or filters outgoing TCP/IP traffic to port 1433 (the port used for SQL connections) and you had it restricted to a fixed IP address, then you may need to adjust your client firewall/router.  The IP address of your SQL Database server will always be part of the address ranges listed in the Windows Azure Datacenter IP Ranges list.  You should allow outgoing traffic on port 1433 to these address ranges rather than to a specific IP address.

Keith Elmore – Principal Escalation Engineer

Categories: SQL Server MS Blogs

14 SQL Server Indexing Questions You Were Too Shy To Ask

Simple-Talk on SQL - Mon, 03/24/2014 - 17:00

One of the first things one needs to understand well about SQL Server is indexes, but somehow many of the basic questions don't often get asked or answered much on forums. Rob Sheldon answers some of these shy-making questions about SQL Server Indexes and indexing: the ones we somehow feel silly asking in public, and think twice about doing so.

The Data Science Laboratory Series is Complete

Buck Woody - Mon, 03/24/2014 - 05:27

I wrote a series of articles on creating a Data Science Laboratory over on Simple-Talk - you can find the complete list of articles below. The series covers installing various software tools and packages on a Virtual Machine running the Windows operating system. I think there's no substitute for installing, configuring and experimenting with various systems to learn how they work, and when you should use them.

Just having tools isn't enough, of course - you should combine these tools with a solid understanding of data science, including statistics, programming, and analysis. With what you'll find in these articles, you should be able to implement that learning on your own system.


Categories: SQL Server MS Blogs

High Performance Relational Division in SQL Server

Simple-Talk on SQL - Sun, 03/23/2014 - 17:00

Relational division is used in SQL to select rows that conform to a number of different criteria. It is a neglected but effective technique for a number of tasks. Although the SQL can appear daunting and complex, it can perform very well if you reduce the rows as much as possible before applying the final logic. Dwain Camps explains how, and shows the performance gains.

Cumulative Update #16 for SQL Server 2008 SP3

SQL Server Release Blog - Mon, 03/17/2014 - 16:46
Dear Customers, The 16th cumulative update release for SQL Server 2008 Service Pack 3 is now available for download at the Microsoft Support site. Cumulative Update 16 contains all the hotfixes released since the initial release of SQL Server 2008...(read more)
Categories: SQL Server MS Blogs

Cumulative Update #9 for SQL Server 2012 SP1

SQL Server Release Blog - Mon, 03/17/2014 - 16:45
Dear Customers, The 9th cumulative update release for SQL Server 2012 SP1 is now available for download at the Microsoft Support site. Cumulative Update 9 contains all the SQL Server 2012 SP1 hotfixes which have been available since the initial release...(read more)
Categories: SQL Server MS Blogs

Using whoami to check for instant file initialization

Tibor Karaszi - Wed, 03/12/2014 - 04:22
Read this if you want to learn more about instant file initialization (IFI). In an earlier blog post, I posted a script that uses trace flags, creates a dummy database and then uses sp_readerrorlog to check for IFI. Another option is to use the OS command...(read more)

SQL Nexus 4.0 Released to codeplex

SQL Server SQL CSS - Tue, 03/11/2014 - 13:07


We have just released SQL Nexus 4.0 (https://sqlnexus.codeplex.com/), which supports the latest SQL Server (2012) with enhanced reports.

In addition to reading the release notes, make sure you also read the top issues. Please report any issues on the Issues page at https://sqlnexus.codeplex.com/.

New Release Notes (4.0.0.64):

You must meet the following requirements:

Categories: SQL Server MS Blogs

AlwaysOn Availability Groups - What Not to do when Adding Databases

Simple-Talk on SQL - Sun, 03/09/2014 - 17:00

SQL Server's AlwaysOn Availability Groups provide a very resilient way of providing High-availability for SQL Server databases, but there are inevitable limits to their capacity. How many databases can you reasonably add? It depends on the resources available and the workload, but you can come up with a reasonable estimate as Warwick Rudd explains.

Applying The Scientific Method to SQL Server Performance Tuning

Simple-Talk on SQL - Sun, 03/09/2014 - 17:00

Database administrators tend to rely more on science than intuition, and may therefore seem exasperatingly cautious to other IT people. A relational database is a complex machine, however, that is asked to deal with intricate problems. When things go wrong, it usually takes knowledge, deduction and a systematic approach to track down the root cause of problems and to determine the best fix.

The PoSh DBA – Specifying and Gathering Performance Counters

Simple-Talk on SQL - Thu, 03/06/2014 - 17:00

If you need to keep tabs on a number of servers and applications in a Windows domain, then performance counters provide the bedrock of information. It is important to identify the counters you need and gather baseline data so you can create alerts when abnormal conditions occur. When it comes to monitoring SQL Server, don't guess, collect.

How often do you rebuild your heaps?

Tibor Karaszi - Thu, 03/06/2014 - 04:46
Never? You are not alone. None of the maintenance solutions I use includes functionality to rebuild a heap, and that includes Maintenance Plans and Ola Hallengren's maintenance solution. "So why would you want to rebuild a heap, when it isn't sorted in...(read more)

Setting max server memory

Tibor Karaszi - Thu, 03/06/2014 - 01:27
If there is one server setting that is close to universal to configure, it is probably the "max server memory" setting. The setting is documented here. There are plenty of articles out there on this subject. The purpose of this article is for me...(read more)

On Handling Dates in SQL

Simple-Talk on SQL - Wed, 03/05/2014 - 17:00

The calendar is inherently complex by the very nature of the astronomy that underlies the year, and the conflicting historical conventions. The handling of dates in TSQL is even more complex because, when SQL Server was Sybase, it was forced by the lack of prevailing standards in SQL to create its own ways of processing and formatting dates and times. Joe Celko looks forward to a future when it is possible to write standard SQL date-processing code with SQL Server.

SQL Server 2008 R2 SP2 Cumulative update #11

SQL Server Release Blog - Wed, 03/05/2014 - 15:46
Dear Customers, The 11th cumulative update release for SQL Server 2008 R2 SP2 is now available for download at the Microsoft Support site. Cumulative Update 11 contains all the SQL Server 2008 R2 SP2 hotfixes which have been available since the initial...(read more)
Categories: SQL Server MS Blogs