SQL Server MS Blogs

Slow query using non-deterministic user defined function

SQL Server SQL CSS - Tue, 07/08/2014 - 16:00

Recently we worked with a customer who reported a query that used to run in a few seconds on SQL Server 2000 but never finishes on SQL Server 2008 R2 following an upgrade.

We tried quite a few things but couldn't get SQL Server 2008 R2 to generate a similar plan. Upon closer inspection of the 2008 R2 query plan, we noticed something unusual: the plan carries a "NO JOIN PREDICATE" warning, which means a Cartesian product has been introduced.

To illustrate the problem, let's use an example setup:

 

drop function dbo.myfunc
go
drop view v1, v2
go
drop table t1, t2
go
create table t1 (c1 int not null, c2 varchar(100))
go
create table t2 (c1 int not null, c2 varchar(100))
go
set nocount on
go
declare @i int
begin tran
select @i = 0
while (@i < 1000)
begin
insert into t1 (c1, c2) values (@i, 'a')
insert into t2 (c1, c2) values (@i, 'b')
select @i = @i + 1
end
commit tran
go
drop function dbo.myFunc
go
create function dbo.myfunc (@c1 int)
returns int
--with schemabinding
as
begin
return (@c1 * 100 )
end
go
create view v1 as select c1, c2, dbo.myfunc(c1) as c3 from t1
go
create view v2 as select c1, c2, dbo.myfunc(c1) as c3 from t2
go

 

 

Now, let's run the following query

dbcc freeproccache
go
set statistics profile on
go

-- But by pulling UDF above join in this query we actually introduce a cartesian product (NO JOIN PREDICATE)
-- UDF is called 1 million times instead of 1000 times each for the two views!
select count(*) from v1 as t1 join v2 as t2 on t1.c3 = t2.c3
go
set statistics profile off
go

 

The above query is very slow, as illustrated in the query plan below. On line 6 of the query plan there is a "no join predicate" warning. The join produced 1,000,000 rows (1,000 rows from each table, so 1,000 x 1,000).

On line 5, myfunc is called 2,000,000 times (1,000,000 times to compute t1.c3 and 1,000,000 times to compute t2.c3).

This is because, starting with SQL Server 2005, the optimizer has rule changes that disallow non-deterministic scalar functions from being 'pushed down' in some situations (such as this one).

 

 

Solution

 

Many times you can make a function deterministic simply by adding the schemabinding option. In the above example, re-writing the function with schemabinding makes the query much faster.

In the query plan, you will no longer see the "NO JOIN PREDICATE" warning. The scalar UDF is pushed down right after the table scan and applied only 1,000 times for each table.

drop function dbo.myFunc
go
create function dbo.myfunc (@c1 int)
returns int
with schemabinding
as
begin
return (@c1 * 100 )
end
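
To verify how SQL Server classifies the function, you can check its metadata. This is a small check added here for illustration (it is not part of the original post):

-- Returns 1 for the schemabound version above; returns 0 for the original version
-- without schemabinding and for the getdate() version shown below.
select OBJECTPROPERTY(OBJECT_ID('dbo.myfunc'), 'IsDeterministic') as IsDeterministic
go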

 

 

 

Obviously, the function above can be made deterministic. If you use the following instead, the function will not be deterministic even with schemabinding, because of getdate(). In such cases you will continue to see "NO JOIN PREDICATE" Cartesian product joins.

drop function dbo.myFunc
go
create function dbo.myfunc (@c1 int)
returns int
with schemabinding
as
begin
return (@c1 * 100 * datepart (mm,getdate()))
end

 

 

 

Jack Li | Senior Escalation Engineer | Microsoft SQL Server Support

 

 

Categories: SQL Server MS Blogs

SQL Server 2012 with SP2 Slipstream ISO images do not install SP2

SQL Server Release Blog - Tue, 07/08/2014 - 13:08
Hi all, We have been informed that SQL Server 2012 with SP2 Slipstream ISO images do not install SP2. Unfortunately this is the same issue that we had with SP1, documented here. The same workaround applies. We are working on fixing...(read more)
Categories: SQL Server MS Blogs

SQL Server 2008 R2 SP2 Cumulative update #13

SQL Server Release Blog - Mon, 06/30/2014 - 19:52
Dear Customers, The 13th cumulative update release for SQL Server 2008 R2 SP2 is now available for download at the Microsoft Support site. Cumulative Update 13 contains all the SQL Server 2008 R2 SP2 hotfixes which have been available since the initial...(read more)
Categories: SQL Server MS Blogs

Cumulative Update #2 for SQL Server 2014 RTM

SQL Server Release Blog - Fri, 06/27/2014 - 11:09
Dear Customers, The 2nd cumulative update release for SQL Server 2014 RTM is now available for download at the Microsoft Support site. To learn more about the release or servicing model, please visit: CU#2 KB Article: http://support.microsoft...(read more)
Categories: SQL Server MS Blogs

Getting Cross Domain Kerberos and Delegation working with SSIS Package

SQL Server SQL CSS - Thu, 06/26/2014 - 15:04

I started working on this issue by way of a Twitter conversation between myself, Jorge Segarra (@sqlchicken) and Matt Masson (@mattmasson). I then found out that Jorge was working with a customer that had opened a support case.  Also, special thanks to Joey Seifert for letting me bounce some Active Directory questions off of him.

The issue we were troubleshooting was that when running an SSIS 2012/2014 package from SQL Server A (Parent Domain) that connected to SQL Server B (Child Domain), it would fail with the dreaded “Login failed for user ‘NT AUTHORITY\ANONYMOUS’”.



Whenever I see this error, I always start with configuration; that is typically the cause.  Back in 2010 I wrote my Kerberos Checklist for validating configuration.  Jorge was aware of this and had gone through it. He also mentioned that they had run the Kerberos Configuration Manager and it didn’t find any issues, although in this scenario it wouldn’t have, as the tool doesn’t yet support cross domain topologies.

I was able to reproduce the issue they were seeing in a local environment on my end.  Here is what it looked like.

So, I have two domains (battlestar.local & cylons.battlestar.local).  The SQL Server in the Parent Domain (battlestar.local) is using a service account from the child domain (cylons.battlestar.local).  From a delegation standpoint, we are using full delegation; I’ll touch on Constrained Delegation later on. To make sure everyone understands what I mean by full delegation: on the CYLONS\sqlservice AD object, I have the following setting:

How SSIS 2012 and later work

When you run a package that is hosted in the SSIS Catalog, it will cause a child process to get spawned from the SQL Service itself. This process is the ISServerExec.exe.

The other thing to note is that this process runs in the context of the session that launched the package, not the SQL Server process context (service account).  Here you can see that ISServerExec is running as BATTLESTAR\asaxton whereas the SQL Service is running as CYLONS\sqlservice.

This is the case regardless of how you execute the package: through the SSMS GUI, via stored procedure, or even by way of DTExec.  If you want it to run under the context of the SQL Service account, you can “fake it” by doing a runas-like operation on a process (Command Prompt, SSMS or the SQL Agent job security account).

I initially thought that this was the cause of the problem, however I later found that it is not.  While I haven’t dug fully into it, I believe this is due to the way we launch the child process; my guess is it has to do with handle inheritance in the child process.

The Single Hop

Before I even get into the SSIS package, I want to verify that the first leg of the journey is working.  To do this, I connected with Management Studio from a remote machine to my SQL Server in the Parent Domain.  I then used DMVs to validate whether I had connected via NTLM or Kerberos.  Here is the query I used. Depending on where you run this from, you may want to include a predicate to exclude the session you are running it from if you are trying to look at a different session.

select c.session_id, s.login_name, c.auth_scheme, c.net_transport, st.text
from sys.dm_exec_connections c
JOIN sys.dm_exec_sessions s ON c.session_id = s.session_id
JOIN sys.dm_exec_requests r ON c.session_id = r.session_id
cross apply sys.dm_exec_sql_text(r.sql_handle) as st

This showed NTLM, which is not what we want. I had enabled Kerberos Event Logging, and I saw the following:

Log Name:      System
Source:        Microsoft-Windows-Security-Kerberos
Date:          6/26/2014 11:22:42 AM
Event ID:      3
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      DrBaltar.battlestar.local
Description:
A Kerberos error message was received:
on logon session
Client Time:
Server Time: 16:22:42.0000 6/26/2014 Z
Error Code: 0x7  KDC_ERR_S_PRINCIPAL_UNKNOWN
Extended Error:
Client Realm:
Client Name:
Server Realm: BATTLESTAR.LOCAL
Server Name: MSSQLSvc/captthrace.home:49375
Target Name: MSSQLSvc/captthrace.home:49375@BATTLESTAR.LOCAL

captthrace.home?  What is that?  This is because I’m working from home; at the office I saw a different domain. If we do a ping on captthrace, we see the same result. In this case, my machines are multi-homed.  The lookup was picking up the external NIC and not the internal one, which should have had an IP address of 10.0.0.10.

We want this to resolve to captthrace.battlestar.local [10.0.0.10].  On my end I can do that multiple ways: I could disable the external NIC or add an entry in the HOSTS file.  In my case I decided to update the DNS search suffix for the internal adapter, making sure that battlestar.local and cylons.battlestar.local were listed.  After doing that we get a proper result.

That looks better.  Retrying the connection to my Parent SQL Server shows that we are connected with Kerberos now.

Now when I try a single hop to the Destination SQL Server, I see Kerberos there as well.
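
As a quick sanity check for just the current session (a trimmed-down variant of the DMV query above, added here for convenience), you can run:

select auth_scheme, net_transport
from sys.dm_exec_connections
where session_id = @@SPID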

The Double Hop

Now I wanted to run the package from the client machine off of the Parent SQL Server.  When I did that, I got the error.

Looking at the Parent SQL Server, I saw the same issue as we had on the client box, so I adjusted the DNS suffixes on that machine as well. After that, the package connected successfully using Kerberos to the destination server in the Child Domain.

Cross Domain SPN Lookups with Active Directory

One item I ran into the first time I was going through this was that I kept getting NTLM even though name resolution was fine on the Parent SQL Server.  It was, however, using an account from the Child Domain, which held the SPN for the server.  When domains are within the same forest, the KDC should consult the GC (Global Catalog) and provide a referral if the account is in a different domain.  If the account is not in the same forest, you would need to define host mapping for the account unless you are using a forest trust, in which case you could define a Kerberos Forest Search Order.

What happened was that the Parent DC was not able to communicate with the Child DC.  I discovered this when I tried to force domain replication; it errored out saying it couldn’t find the CYLONS domain.  This could also lead to a failure because the SPN may not be visible from the GC perspective if replication isn’t working, so a change made in the Child Domain wouldn’t be picked up by the Parent Domain.

What about Constrained Delegation?

With the amount of work I do with SharePoint integration, Constrained Delegation comes up a lot when we talk about Claims to Windows Tokens.  Your environment may force you to use Constrained Delegation.  Before Windows 2012, this meant that all service accounts and the machines hosting the services had to be in the same domain; you were really restricted to one domain.  Starting with Windows 2012 you can cross domain boundaries, but the configuration for Constrained Delegation is different from what it used to be: it is modified via PowerShell commands.  If you want to read more about that, have a look at the following:

Kerberos Constrained Delegation Overview for Windows 2012
http://technet.microsoft.com/en-us/library/jj553400.aspx

How Windows Server 2012 Eases the Pain of Kerberos Constrained Delegation, Part 1
http://windowsitpro.com/security/how-windows-server-2012-eases-pain-kerberos-constrained-delegation-part-1

I did get this to work in my environment and will look to get some posts specific to how to get this to work in the future.

Takeaway – What’s in a name?

If you have verified your Kerberos configuration, be sure to also validate name resolution within your environment; it may not be resolving to the proper names.  When we go to build the SPN to use, we base it on the DNS name that was resolved from the NETBIOS name. If DNS resolution isn’t working properly, it can lead to all sorts of problems.

I’ve learned the hard way, over time, that DNS and Active Directory really blend together.  If DNS has issues, then AD will more than likely have issues too. Hopefully this helps show that it could be more than the normal Kerberos configuration items that are causing a problem.  Be sure to check DNS forwarders and network configuration (including network binding order and DNS suffixes if needed).

A Ping or NSLookup command should return the proper name.  If you have doubts, do an IPConfig /flushdns and try again.  Verify that the DCs can talk to and replicate with each other.  As you can see from the above, this works for full delegation; Constrained Delegation would work with some modifications.

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

Categories: SQL Server MS Blogs

Kerberos Configuration Manager updated for Analysis Services and SQL 2014

SQL Server SQL CSS - Thu, 06/26/2014 - 08:48

Kerberos Configuration Manager was released back in May of 2013.  It initially released with only SQL Server support. This was followed up with support for Reporting Services in November 2013. You can download the latest release from the following link:

Microsoft® Kerberos Configuration Manager for SQL Server®
http://www.microsoft.com/en-us/download/details.aspx?id=39046

This month we have released Version 3 of the Kerberos Configuration Manager which provides support for Analysis Services. This will work with Analysis Services 2005 and later (including 2014).

This release also includes support for SQL 2014 services.

 

Logging

If you happen to encounter something that I didn’t highlight above, you may be able to find additional information.  Each time you run the tool, we will create a log file.  The default location for this is the following:  C:\Users\<user>\AppData\Roaming\Microsoft\KerberosConfigMgr.

The details of the log file will be flushed when you close the program.  So, if it is blank, just close the tool and the log should populate.  You may also find some details in the Event Log under the Source “Kerberos Configuration Manager”.  If we encounter an error, it should be logged in the Application Event Log as well as the tool’s log file.

Limitations

Delegation

For Analysis Services, there is no Delegation check at this time.  The scenarios for that are limited and may be looked at in a future release.

SQL Browser DISCO SPN

This also does not do a validation on the SQL Browser SPN.

Multiple Domains

Right now, the tool will only work in a single domain scenario.  So, if you have the service installed in Domain A, but want to use a Service Account from Domain B, we won’t be able to discover and correct the issue appropriately.  As long as the machine the instance is in and the Service Account are in the same domain, you should be good to go.  This is true for Reporting Services and the SQL Server discovery.

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

Categories: SQL Server MS Blogs

A Writer's Toolkit

Bucky Woody - Tue, 06/24/2014 - 06:01

Even in a technical role, communication is paramount. If you don't read widely and communicate well, your progression suffers. And one of the primary modes of communication is writing. I've written several books, articles and blogs (see "Publications" on this page), but it isn't just writing books that requires clear communication skills. You should learn to write well for something as trivial as e-mails - or perhaps you should especially learn to write well for e-mails, since those might not be that trivial.

Over the years I've found a few books that have helped me write. This list contains references on grammar, structure and plot, all of which are important for technical writing as well as fiction. In fact, writing fiction helps you write better in every other area.

This isn't a complete list of all writing books by any means - it's been said that there are more books about writing than there are books, and a quick web search makes me believe it. Even so, I have a "core" set of books that I kept printed copies of, even though I keep my other thousands (yes, thousands) of books on an e-reader. I recently gave these away to a friend's son who I think will make an amazing author some day:

  • The Elements of Style - Small, concise, effective. It's on my desk all the time.
  • Writers Inc - A very old book, originally designed for pre-teens to use in school. It's one of the best I've ever used, very readable, and a lot of fun.
  • Elements of the Writing Craft - This is a great "read it and try it" book that has example texts and then what you can learn from that writing style.
  • Handbook for Freelance Writing - At some point you might want to make money at writing. This book is the one I used to learn to do that.
  • The Writer's Book of Checklists - I'm a checklist guy. A little outdated now, but I still refer to it when I write.
  • The Observation Deck - Not actually a book, it's a series of cards that helps you come up with writing ideas when you're stuck. And I get stuck a lot.
  • 20 Master Plots (And how to build them) - OK, this is my big secret. If you read this book and follow the plot contrivances in them, you can tell any kind of story to anyone. Spoiler alert: After you read this, you've seen every movie, TV show, read every book, and attended every play ever written. You have been warned.

I urge you to try these out - they've helped me tremendously. Do you have a favorite set of books that help you write well? Let us know in the comments.

 

 

Categories: SQL Server MS Blogs

Fix for index corruption issue now available for SQL Server 2012 Service Pack 2

SQL Server Release Blog - Fri, 06/20/2014 - 14:45
Fix for issue described in KB http://support.microsoft.com/kb/2969896 is now available for download for SP2 via the hotfix download link in the KB article....(read more)
Categories: SQL Server MS Blogs

These resources may help resolve your issue….

SQL Server SQL CSS - Thu, 06/19/2014 - 11:09

Some of you may have seen a list of links that pops up when opening a case through our Microsoft support site. These resources are internally referred to as solution assets; they are meant to be top bets for the problem category and are aimed at helping you solve the problem on your own.

We keep these updated on a regular basis, especially for the top problem categories. The content is curated with the help of our experts and based on support incident data. Currently these resources are only shown during the case creation process, except for the top 5 categories, for which they are also shared and maintained at the following blogs:

· Top Support Solutions for Microsoft SQL Server 2012

· Top Support Solutions for SQL Server 2008

Note: During the case creation process only the top 3 are shown by default; the rest can be seen by clicking Show More. The blog links above show the expanded list of these resources for the top 5 categories.

For example here are the top solutions for various AlwaysOn problems:

KB articles:

1. Troubleshooting AlwaysOn availability databases in a "recovery pending" or "suspect" state in SQL Server 2012

2. Troubleshooting automatic failover problems in SQL Server 2012 AlwaysOn environments

3. Troubleshooting AlwaysOn availability group listener creation in SQL Server 2012

4. Time-out error and you cannot connect to a SQL Server 2012 AlwaysOn availability group listener in a multi-subnet environment

5. "General Network error," "Communication link failure," or "A transport-level error" message when an application connects to SQL Server

6. Cannot create a high-availability group in Microsoft SQL Server 2012

7. Voting nodes are not available when you try to use AlwaysOn availability groups in SQL Server 2012

8. How to restore a replica to the primary role after the quorum is lost in SQL Server 2012

9. You experience slow synchronization between primary and secondary replicas in SQL Server 2012

Blog posts

1. Create Listener Fails with Message 'The WSFC cluster could not bring the Network Name resource online'

2. SQL Server 2012 - True Black Box Recorder - CSS SQL Server ...

3. Connecting to Availability Group Listener in Hybrid IT

Books Online

· AlwaysOn Availability Groups Troubleshooting and Monitoring Guide

Team blogs

1. Always On support team’s blog

2. SQL AlwaysOn Product Team’s Blog

Forums:

· SQL Server High availability and Disaster Recovery forum

As always please share any feedback you may have around these links. You can also refer to the following links for additional information and for finding more top solutions for other SQL topics and other Microsoft products:

· Top Solutions from Microsoft Support

· Microsoft Top Solutions app for Windows 8

Ramu Konidena
SQL Server Support

Categories: SQL Server MS Blogs

Using SQL Server in Microsoft Azure Virtual Machine? Then you need to read this…

SQL Server SQL CSS - Thu, 06/12/2014 - 14:41

Over the past few months we noticed some of our customers struggling with optimizing performance when running SQL Server in a Microsoft Azure Virtual Machine, specifically around the topic of I/O Performance.

We researched this problem further, did a bunch of testing, and discussed the topic at length among several of us in CSS, the SQL Server Product team, the Azure Customer Advisory Team (CAT), and the Azure Storage team.

Based on that research, we have revised some of the guidelines and best practices on how to best configure SQL Server in this environment. You can find this collective advice which includes a quick “checklist” at this location on the web:

http://msdn.microsoft.com/en-us/library/azure/dn133149.aspx

If you are running SQL Server already in Microsoft Azure Virtual Machine or making plans to do so, I highly encourage you to read over these guidelines and best practices.

There is other great advice in our documentation that covers more than just Performance Considerations. You can find all of these at this location:

http://msdn.microsoft.com/en-us/library/azure/jj823132.aspx

If you deploy any of these recommendations and find they are not useful, cause you problems, or are not effective, I want to hear from you. Please contact me at bobward@microsoft.com with your experiences.

Bob Ward
Microsoft


Categories: SQL Server MS Blogs

SQL Server 2012 Service Pack 2 (SP2) RTM has released.

SQL Server Release Blog - Tue, 06/10/2014 - 12:12
Dear Customers, Microsoft SQL Server team is pleased to announce the release of SQL Server 2012 Service Pack 2 (SP2). As part of our continued commitment to software excellence for our customers, this upgrade is free and doesn’t...(read more)
Categories: SQL Server MS Blogs

REPL_SCHEMA_ACCESS wait type

SQL Server SQL CSS - Tue, 06/03/2014 - 18:09

Recently we worked with a customer on a replication latency issue with transactional replication. The customer has over 30 published databases on a single server, all of them very active. Periodically they see up to 30 minutes of latency from publisher to distributor, and when that happens they see waits on REPL_SCHEMA_ACCESS. Below is a sample screenshot of sys.dm_exec_requests during a problem period.

 

What does this wait type mean?

Our current online documentation states "Occurs during synchronization of replication schema version information. This state exists when DDL statements are executed on the replicated object, and when the log reader builds or consumes versioned schema based on DDL occurrence." But this wait type is also used to synchronize memory access and prevent multiple log readers from corrupting internal structures on the publisher. Each time a log reader agent runs sp_replcmds, it needs to access a memory buffer. If that results in growing the buffer, the action needs to be synchronized among log reader agents with REPL_SCHEMA_ACCESS.

Contention can be seen on this wait type if you have many published databases on a single publisher with transactional replication and the published databases are very active.

Troubleshooting and reducing contention on REPL_SCHEMA_ACCESS waits

This issue depends on the number of log reader agents accessing the same publisher and on the transaction rate. If you have a single log reader agent accessing the publisher, you shouldn't see this type of contention.

In general, you can watch the Transactions/sec performance counter for all published databases to measure how active your system is. The higher your transaction rate, the more likely you are to hit the issue, assuming you have multiple log reader agents accessing the same publisher.
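
One way to read that counter from T-SQL (a minimal sketch added for illustration; the counter value is cumulative, so sample it twice and divide the difference by the elapsed seconds):

select instance_name, cntr_value
from sys.dm_os_performance_counters
where object_name like '%:Databases%'
and counter_name = 'Transactions/sec'
go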

We charted the waits on REPL_SCHEMA_ACCESS and Transactions/sec for multiple published databases and saw a very clear correlation.
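
For the wait side, something like the following can be sampled on an interval (again a hedged sketch, not the exact queries behind the charts below):

-- Cumulative REPL_SCHEMA_ACCESS waits since the wait statistics were last cleared
select wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
from sys.dm_os_wait_stats
where wait_type = 'REPL_SCHEMA_ACCESS'
go
-- Requests currently waiting on it (similar to the sys.dm_exec_requests screenshot above)
select session_id, command, wait_type, wait_time
from sys.dm_exec_requests
where wait_type = 'REPL_SCHEMA_ACCESS'
go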

SQL Nexus report

 

Transaction/sec for one example published database

 

 

Here are a few things you can do to reduce contention:
  1. Do not use large transactions. A large transaction that results in many commands can make the situation worse because of the higher memory buffer requirement. If you do have large transactions, experiment with the MaxCmdsInTran value.
  2. Try to spread transactions for different published databases across different times. For example, if you have batch load jobs for different databases, don't schedule them at the same time.
  3. Reduce the number of log readers. This customer had over 35 published databases on the same publisher, all of them active. The wait is at the server level, so if you split your published databases across two different instances (even on the same hardware), contention can be reduced.
  4. Experiment with decreasing the –PollingInterval for your log reader agent. The default is 5 seconds. Reducing the PollingInterval allows the log reader agent to catch up more frequently once the wait clears.

     

 

Jack Li

Senior Escalation Engineer | Microsoft SQL Server Support

Categories: SQL Server MS Blogs

Update on the Service Pack plans for SQL Server 2008 and SQL Server 2008 R2.

SQL Server Release Blog - Thu, 05/29/2014 - 11:35
Dear Customers, We are planning to ship one last Service Pack for both SQL Server 2008 and SQL Server 2008 R2. Because of the maturity of SQL Server 2008 and 2008 R2, these Service Pack(s) will be an exception in terms of timing and will ship...(read more)
Categories: SQL Server MS Blogs

Capping CPU using Resource Governor – The Concurrency Mathematics

SQL Server SQL CSS - Wed, 05/21/2014 - 10:44

Here is what you need to know: A = πr²

Okay, not really, as that is the formula for the area of a circle, but it does set the stage for this discussion.   I have been working with the CAP_CPU_PERCENT (RESOURCE POOL) setting as it relates to concurrency, and it turned into a mathematical exercise I was not planning on.

You all have had that one user who keeps running their ‘special report’, they just have to have, in the middle of the day.  No matter how many times you have asked them to stop doing this until after hours they continue to ignore you.   So one day you decide that you will put them in their own resource pool and workload group and cap their CPU to a small percentage and MAX DOP = 1.   This way they quit impacting the overall production server.  Shortly after you do this you discover you may have made matters worse.  How could that have happened?

Simply put, capping the CPU means you forced a wait and when you force a wait where shared state is involved you impact all entities attempting to use the shared resource.   I created a pathological example but it shows the behavior very well.

User 1:  Default Pool 100% CPU and Full DOP

User 2:  Limited to 1% CPU CAP and MAX DOP = 1
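
For reference, the capped pool and workload group could be created roughly like this (a hedged sketch; the pool, group and login names are illustrative and are not from the original test):

-- Illustrative names: PoolCapped, GroupCapped, ReportUser
create resource pool PoolCapped with (CAP_CPU_PERCENT = 1)
go
create workload group GroupCapped with (MAX_DOP = 1) using PoolCapped
go
-- Classifier function (create it in master) routing the report user to the capped group
create function dbo.fnClassifier() returns sysname with schemabinding
as
begin
    return case when SUSER_SNAME() = N'ReportUser' then N'GroupCapped' else N'default' end
end
go
alter resource governor with (CLASSIFIER_FUNCTION = dbo.fnClassifier)
alter resource governor reconfigure
go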

I have them both do a select with UPDLOCK on the same table.   I created an execution pattern of User 1, User 2, User 1, User 2, … acquiring and releasing the lock to each other.

begin tran

select count(*) from dbRG..tblRG with (UPDLOCK)    -- Only 1 row in the table

commit tran

I then used OStress to run the batch in a tight loop in various configurations.  The diagram below shows the basics of the CPU activity.

1. User 1 has ownership of the lock and owns the scheduler, using the CPU while User 2 is blocked waiting on the lock.  Standard blocking scenario where User 2 is not taking any CPU, it is just placed on a waiter list of the lock.

2. User 1 completes the transaction and releases the lock.   During release User 2 is granted ownership and placed on the runnable list of the scheduler.  User 1 then yields to the scheduler because it has used its quantum, placing User 1 at the tail of the runnable queue.

3. User 2 yields and User 1 attempts to access the lock.  If User 2 has cleared the lock the lock can be granted.  If User 2 still owns the lock User 1 will be blocked and added to the lock’s wait list.

The ‘Delay’ is the interesting part of this scenario.   When User 1 grants User 2 ownership of the lock, User 2 becomes the next runnable worker on the scheduler.   However, the resource pool for User 2 is set to cap the CPU percentage, which means SQL Server will delay User 2’s execution to keep it at the configured cap.   Even if User 1 is allowed to execute during the delay, User 1 simply becomes blocked on the shared resource and does not make meaningful forward progress.

What just happened is that by capping the CPU, the shared resource (in this example, the lock on the same row) ends up limiting the overall acquire and release frequency.

Now let’s get to some of the math behind this I promised you.  It won’t be all that difficult, you’ll see, I will use some nice, round numbers.

Assume each transaction takes 1ms, or 1000 transactions per second when utilizing the CPU at 100%.  If you cap the CPU at 10%, the math is 1000 * 0.10 = 100 transactions/sec, meaning the user in the 10% CPU-capped pool should only be able to execute the transaction 100 times for every 1000 times the uncapped user can.

When I combine the two users, the 10% cap introduces the delay and causes the conflict, lowering the combined transaction rate for both users to near the 100 mark.

Here are some actual numbers from my laptop running the pathological, tight loop on the same row.   

Transactions/Sec     User Stress Connections
192                  1 – 1% CPU CAPPED User
435                  1 – 100% CPU User
240                  1 – 1% CPU CAPPED User + 1 – 100% CPU User
920                  2 – 100% CPU Users
1125                 1 – 1% CPU CAPPED User + 2 – 100% CPU Users

Most of the time you won’t even notice the impact I am describing in this post.   As you can see, the 1125 transactions/sec level is achieved by 2 – 100% users and 1 – 1% user.   Back to the math: the 1% user is only 1/3 of the workers on the scheduler.  Each of the 2 – 100% users gets a full quantum, so the more workers there are, the more this scenario becomes a standard blocking issue, as if you had a slow client, for example.   You just have to figure out how to reduce the shared state interaction behavior(s) and things will run smoothly, as you expect them to.

I was looking at this in more detail and noticed I was not accumulating large numbers for LCK_M_U waits, wait time or signal time.   What I found is that the execution quantum is generally such that the lock can be acquired and released before the next user is allowed to execute.  In my attempt to create the pathological case I had tuned away some of the blocking aspects that would make it worse.    If I add more rows to my table I can get into the ultimate pathological I go, you go, I go, you go, … scenario.

Instead, it was enlightening that the delay necessary to control the 1% user introduces overall delay at the CPU resource.   The CPU became my shared entity, and when the 1% user exceeded its CPU target the scheduler may need to force the delay; in doing so, other users on the same scheduler can be impacted.   The more users I added to the testing, the less forced delay was required.

While I believe this is an edge case and it is unlikely you will encounter it, I wanted to share it so you can put it in your toolbox.  We are always looking for the connection that is holding a resource (usually a lock) and not responding to the client fast enough.   Instead, you could introduce some of this delay by attempting to isolate a user like this.

Bob Dorr - Principal SQL Server Escalation Engineer

Categories: SQL Server MS Blogs

Cumulative Update #10 for SQL Server 2012 SP1

SQL Server Release Blog - Mon, 05/19/2014 - 13:49
Dear Customers, The 10th cumulative update release for SQL Server 2012 SP1 is now available for download at the Microsoft Support site. Cumulative Update 10 contains all the SQL Server 2012 SP1 hotfixes which have been available since the initial...(read more)
Categories: SQL Server MS Blogs

The Case for Moving From TPC to Database Throughput Units in Database Performance Comparisons

Bucky Woody - Wed, 04/30/2014 - 05:28

Scientific testing is based on controls, transparency, and repeatability. Whenever we as technical professionals want to test the performance of a database system, we search for a series of tests that show the system’s metrics against a standard.

But the scientific basis for using the most common standard, the Transaction Processing Performance Council (TPC) measurements (http://www.tpc.org/), is difficult for most database professionals to apply. The TPC metrics are divided into “Benchmarks”, classified as C, DS, E, H and “Energy” as of this writing. These involve everything from measuring OLTP (in multiple types) and virtualization technology all the way to business-intelligence type workloads. It takes no small amount of study to understand what these measurements show and how they apply to the systems that are tested.

And that forms the main issue with TPC numbers – the testing is done by and for the various database vendors (Microsoft included), which leads to problems in the other areas – controls, transparency and repeatability. While the TPC standard is public (and lengthy, and sound), each vendor tunes the hardware, platform and workloads as much as possible to favor their database (controls), and doesn’t often disclose those parameters (transparency), which of course makes it hard for you to reproduce the results and verify them (repeatability).

And in the end, none of this matters anyway – your workloads don’t resemble those controls at all. They are a statistically spread, standardized way of measuring various vendor systems and hardware using transactions. In your case, you want something that resembles your current and future workloads, and you want a standard way of reproducing the results. So in many shops where I’ve worked, I created my own tests. This works, but I was never sure that I had covered all of the areas needed to ensure that the workloads were representative.

So at Microsoft we’re starting to focus more on a scientific methodology that more closely resembles real-world workloads, is repeatable on your own systems, and measured (starting with our SQL Databases offering in Microsoft Azure) in a published document. We call this new measurement “Database Throughput Units” or DTU.  You can find the complete document here: http://msdn.microsoft.com/en-us/library/azure/dn741327.aspx. It’s short – and that’s on purpose. A more simple description allows you to replicate what we’ve done, and change it to be more relevant to your own workloads. Almost all parts of the process are under your control. And while we have standards published based on our testing, we recommend you use the same methodology on all your systems and ours, to show a true benchmark. The culmination of the process is throughput – the time it takes a user to make a request for a database operation and get a result. That’s all they care about, and in the end it’s what your final decision will be judged on.

There are multiple areas in the standard, including:

  • The Schema – A variety and complexity within the structure to show the broadest range of operations.
  • Transactions – A mix of types within the CREATE, READ, UPDATE and DELETE operations (CRUD Matrix) that can be tuned to a real-world observation.
  • Workload Mix – A distribution of the above measures that more accurately resemble your environment.
  • Users and Pacing – The number of virtual “users” that a measurement should show, and how often each user performs each action to show spikes, lulls and other anomalies faced in real-world systems.
  • Scaling Rules – A scale factor applied to the number of virtual users per database.
  • Duration – The length of time for the test run – one hour is considered minimum, longer is better for a true statistical result.
  • Metrics – DTU focuses on only two end measurements for simplicity: throughput and response time.

You can read the full document at the link above. As always, all comments are welcomed.

 

Categories: SQL Server MS Blogs

How It Works: Behavior of a 1 Trillion Row Index Build (Gather Streams from SORT)

SQL Server SQL CSS - Tue, 04/29/2014 - 07:07

I ran into this behavior working on a 1 trillion row spatial index build, but the behavior can apply to any Gather Streams operator that retains the sort order as rows pass through it.   I was just surprised a bit by the behavior until I dug deeper to understand it.

The index was taking just short of 2 hours to build on my 64-way, 128 GB RAM test system.  The behavior I observed was the drop in CPU usage and parallelism: for the first ~40 minutes all 64 CPUs run at 100%, but for the last 1 hour and 10 minutes of the index build only the controlling worker consumes 100% CPU on a single processor.

Digging into this, I found the top of the plan looked like the following, with Node 1 being the final, sorted insert from the gather streams activity.

During the index build the lower part of the plan was executing the parallel, nested loop behavior on all 64 CPUs and building the large sort groupings, in parallel, on each of the child workers (first 40 minutes.)    Once all the rows are sorted, per worker, the gather streams activity has to merge the 64 individual sorts into the overall sort order as it performs the final inserts.

This is deemed an order preserving gather (merge) operation.  The consumer pulls a row from each worker and keeps an in-memory tree.  In the case of the 64 parallel workers the tree would have 64 entries.  If MAX DOP is 16 the tree would contain 16 entries.    The tree is maintained in sorted order so the pattern of execution will look like the following on a 4 processor system.

  1. Get Row From Worker/Partition #1 – Insert into tree
  2. Get Row From Worker/Partition #2 – Insert into tree
  3. Get Row From Worker/Partition #3 – Insert into tree
  4. Get Row From Worker/Partition #4 – Insert into tree
  5. While (entries in tree)
     {
        Output lowest, sorted value from tree
        Get Row from Worker/Partition you just removed from tree as the lowest value
     }

This design keeps the tree pruned to no more than the number of parallel sub-process workers.  The consumer performs the merging of the individual, parallel sort operations and inserts the results into the index as requested.   The serialization of the final sort order is what I am seeing during the final phase of my index build.

Since my table and index all fit into memory on this large system, the activity takes place in memory and fully leverages the CPU.  Looking at the details in sys.dm_exec_requests and sys.dm_os_waiting_tasks, I can see the gather streams activity, associated with Node 1 in the plan, driving the last ~01:10:00 on a single CPU.  In fact, if you set processor affinity you will observe the controlling worker’s CPU light up for the final 01:10:00 of the index build.

Observing the behavior, sys.dm_os_wait_stats shows a sharp increase in CXPACKET waits and wait times.   This is expected, as the final thread is pulling data from 64 workers that already ran at 100% CPU, so it is unlikely a single CPU can process the data as fast, and the CXPACKET exchange activity will encounter waits.
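
A query along these lines (a hedged sketch, not the author's exact statement) surfaces the wait and the resource description shown below:

select r.session_id, r.command, wt.wait_type, wt.wait_duration_ms, wt.resource_description
from sys.dm_exec_requests as r
join sys.dm_os_waiting_tasks as wt on r.session_id = wt.session_id
where r.command like 'CREATE INDEX%'  -- the filter is illustrative; match on your build session instead if preferred
go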

The description for the wait indicates a wait at node 1 asking for a new row from the producers.

exchangeEvent id=Pipe307f0be530 WaitType=e_waitPipeNewRow nodeId=1

There are lots of documents and posts about the best setting to optimize parallel plans.  Some state MAX DOP = 8, others MAX DOP = 32, and both can be correct depending on the type of plan (actual query patterns), as the performance may vary.   Since this was a create index, I decided to do some experiments with the DOP level; the timings are below.

  • 64 CPUs = 01:50:00
  • 32 CPUs = 02:17:00
  • 16 CPUs = 03:16:00
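
To pin the DOP for a given index build, as in the timings above, MAXDOP can be specified directly on the statement (a hedged sketch; the table and column names are illustrative, and the original test built a spatial index on a far larger table):

create index ix_c1 on dbo.t1 (c1) with (MAXDOP = 16, SORT_IN_TEMPDB = ON)
go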

What I observed is that for this specific create index (spatial) the lower the MAX DOP the larger the change in the original 40 minute part of the plan.   This is what I expected to see.   The first part of the plan is already CPU bound so adding more CPU resources lets that part of the plan execute faster.

The CXPacket waits on the final portion of the plan don’t change significantly with different DOP levels.  The time remains generally steady at 01:10:00.

This was a unique pattern because the controlling worker was not showing additional waits (usually the I/O will show up but because everything fit into memory the CPU was the dominant resource.)  The controlling worker only showed normal scheduler yield activities.

What I found was that it takes 01:10:00 on my 1.88GHz CPU to sort (merge) 1 trillion rows onto the index pages.  If I want to reduce the final portion of the plan, I would need to move to a faster CPU.   SQL Server did use parallel resources as much as possible to build the index.

Bob Dorr - Principal SQL Server Escalation Engineer

Categories: SQL Server MS Blogs

How to grab multiple parent/child elements from XML Data Source

SQL Server SQL CSS - Tue, 04/29/2014 - 06:51

We had a question come up about why an XML query wasn’t pulling a 2nd Parent/Child element from the resulting data.  This was grabbing data off of a Web Service.  The example I was given was a public Web Service call for getting Weather information.

http://wsf.cdyne.com/WeatherWS/Weather.asmx?op=GetCityForecastByZIP

The data that is returned looks like the following:

<ForecastReturn xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://ws.cdyne.com/WeatherWS/">
  <Success>true</Success>
  <ResponseText>City Found</ResponseText>
  <State>TX</State>
  <City>Keller</City>
  <WeatherStationCity>Euless</WeatherStationCity>
  <ForecastResult>
    <Forecast>
      <Date>2014-04-22T00:00:00</Date>
      <WeatherID>17</WeatherID>
      <Desciption>Drizzle</Desciption>
      <Temperatures>
        <MorningLow>64</MorningLow>
        <DaytimeHigh>89</DaytimeHigh>
      </Temperatures>
      <ProbabilityOfPrecipiation>
        <Nighttime>20</Nighttime>
        <Daytime>10</Daytime>
      </ProbabilityOfPrecipiation>
    </Forecast>

The problem we were seeing is that when we run the initial query, we don’t see the values under ProbabilityOfPrecipiation (yes, I know there is a typo there). 

We only see the items under Temperatures (MorningLow & DaytimeHigh).  The query that was being used is the following:

<Query>
  <Method Name="GetCityForecastByZIP" Namespace="http://ws.cdyne.com/WeatherWS/">
    <Parameters>
    <Parameter Name="ZIP" type="string">
           <DefaultValue>76244</DefaultValue>
     </Parameter>
    </Parameters>
  </Method>
  <SoapAction>
   http://ws.cdyne.com/WeatherWS/GetCityForecastByZIP
  </SoapAction>
  <ElementPath IgnoreNamespaces="true">*</ElementPath>
</Query>

You’ll notice that ElementPath is using *.  Using * is the same as leaving ElementPath blank, and it causes the query to use the default element path: the first path to a leaf node collection, which would be Temperatures in this case, something like ForecastReturn/ForecastResult/Forecast/Temperatures.  As a result, we don’t see the values in ProbabilityOfPrecipiation.  The following MSDN document outlines this as well; it is written for RS 2005 but still applicable.

Reporting Services: Using XML and Web Service Data Sources
http://technet.microsoft.com/en-US/library/aa964129(v=SQL.90).aspx

With regards to Auto-Detection of XML Structure:
Multiple parent-child hierarchies are not supported. In this example, Customer has both Orders and Returns. The provider may return only one set. Because the Orders hierarchy is specified first, auto-derivation will resolve it as the skeleton.

With a single query, you can’t grab multiple Parent-Child elements.  You can grab it with a second query and then use the Lookup function to pull related data.  I created a second query with the following syntax:

<Query>
  <Method Name="GetCityForecastByZIP" Namespace="http://ws.cdyne.com/WeatherWS/">
    <Parameters>
    <Parameter Name="ZIP" type="string">
           <DefaultValue>76244</DefaultValue>
     </Parameter>
    </Parameters>
  </Method>
  <SoapAction>
   http://ws.cdyne.com/WeatherWS/GetCityForecastByZIP
  </SoapAction>
  <ElementPath IgnoreNamespaces="true">GetCityForecastByZIPResponse{}/GetCityForecastByZIPResult{}/ForecastResult{}/Forecast/ProbabilityOfPrecipiation</ElementPath>
</Query>

You may be asking where ForecastReturn is, as that is the root element.  For the Web Service call, you need to put the headers for the actual method call itself, which is GetCityForecastByZIP.  As a result, we have to use GetCityForecastByZIPResponse and then GetCityForecastByZIPResult.  You’ll also notice the {}.  This is to indicate I don’t want the values from those elements.  This is what we see for the field list.

Now we have two queries and Date is the unique item here.  So, we can do a lookup on the Date across the two DataSets.  I’ll start the main table off of the first DataSet.  Then add two columns to the Table with the following Expressions:

Daytime Probability
=Lookup(Fields!Date.Value, Fields!Date.Value, Fields!Daytime.Value, "Precipitation")

Nighttime Probability
=Lookup(Fields!Date.Value, Fields!Date.Value, Fields!Nighttime.Value, "Precipitation")

The result is the output that we originally wanted.

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

Categories: SQL Server MS Blogs

SharePoint Adventures : Using Claims with Reporting Services

SQL Server SQL CSS - Mon, 04/28/2014 - 09:17

Back in February of 2011, I created a blog that walked through using Kerberos with Reporting Services. Since then, we have moved Reporting Services to a shared service within SharePoint.  This changes the game and we are now in the Claims world.  I’ve been asked a bunch of times about Claims configuration, and to clear up some general confusion.  I have also presented at PASS on this topic and thought it was time to get the blog post out there.  This blog will show SharePoint 2013, but the steps are the same in SharePoint 2010.  To start, I’ll reference a few other blogs for background that we can refer back to.

Reference Blogs:

My Kerberos Checklist…
What SPN do I use and how does it get there?
SharePoint Adventures : How to identify if you are using Claims Authentication

Isn’t Kerberos Dead?

I’ve heard some comments along the lines of – “Well now that I’m using claims, I don’t need to worry about Kerberos.”  This isn’t true.  Claims changes the perspective a bit, but if our goal is to get to a back end data source using Windows Authentication, we need Kerberos.  Within the Claims/SharePoint Bubble, we don’t have a Windows Token.  We have a Claims Token.  When we want to leave the bubble, we need to go get a Windows Token.  This is done by way of the Claims to Windows Token Service (C2WTS).  From there it is all Kerberos.  So, everything you know about Kerberos is still relevant.  We just need to add a few things to your utility belt.

Shared Service

Starting with Reporting Services 2012, we are now a shared service within SharePoint.  We are no longer an external service as we were with RS 2008 R2 and earlier versions, which means we are inside of the SharePoint bubble.  In Using Kerberos with Reporting Services, I talk a lot about needing to have the front-end SPNs (HTTP) in place.  However, now that we are inside of the SharePoint bubble, we don’t need the HTTP SPNs any longer.  Everything from the client (browser) to the Report Server no longer requires Kerberos.  You can still set up Kerberos for the SharePoint Web Front End (WFE), but when we go to hit the Report Server, it will be Claims.  Any communication with the Report Server is done via a WCF web service and will be Claims auth, regardless of how the WFE is configured.  So, in this setup, we really only care about the RS service and going back into the back end.  It’s all about delegation now. 

Common Errors

Before getting into the configuration, I wanted to highlight some of the errors you may see that are related to this topic.  These are at least ones I’ve seen.

Cannot convert claims identity to a windows token.  This may be due to user not logging in using windows credentials.

Login failed for user ‘NT AUTHORITY\ANONYMOUS’

Could not load file or assembly ‘System.EnterpriseServices, Version=2.0.0.0, culture=neutral, <-- see this blog post

Claims to Windows Token Service (C2WTS)

This is where the magic happens.  As mentioned above, we are using a Claims token when we are within the RS shared service; we are in the SharePoint bubble.  So, what happens when we want to leave the bubble?  We need a helper: the Claims to Windows Token Service (C2WTS). Its whole purpose in life is to extract the User Principal Name (UPN) claim from a non-Windows security token, in our case a SAML token, and generate an impersonation-level Windows token.  Think Kerberos delegation. This is actually a Windows service that sits on the same machine as the service that calls into it to get the Windows token.

This service is enabled via Central Admin –> Application Management –> Service Applications –> Manage services on server.

Be sure to start it here as opposed to the Windows Service directly.  The SharePoint Timer jobs will just stop the service if you start it manually.

C2WTS Configuration

There are a few things to check to make sure you configure C2WTS correctly.  We will have a look at everything except the delegation piece; we will save that for last.

Service Account

You will need to decide what service account you want to use. By default, C2WTS is set to use the Local System account.  I’ve seen people use this, and it will work fine.  However, I don’t ever recommend using Local System for a service account.  This is purely a security standpoint and the idea of least privilege: Local System has a lot of power on the machine.  So, I typically recommend a domain account.  On my deployment, I use a Claims service account that I created.  If you use an account you created, you will need to add it as a managed account within Central Admin.  This is done via Security –> General Security –> Configure managed accounts.

After that is done, you need to change the C2WTS service to use that managed account.  This is done via Security –> General Security –> Configure service accounts.  Then select C2WTS from the drop down.

When you do this second step, it should also add the service account to the WSS_WPG local group on the SharePoint boxes.

Local Admin Group

You will need to add this service account to the Local Admin Group on the machine that it will be used on.  If you have two SharePoint boxes and one is a WFE and the other is the App Server that will be using it, you only need to do this on the App Server.  C2WTS will not work unless it is in the local admin group.  I haven’t narrowed down what exact permissions it requires to avoid the local admin group.  If someone has figured this out, let me know. 

Local Security Policy

The service account you are using needs to be listed in the Act as part of the operating system policy right. Again, this only needs to be done on the SharePoint box that will be using the service. 

c2wtshost.exe.config

Remember the WSS_WPG group?  This is why we want the service account in that group.  The location of this config file is C:\Program Files\Windows Identity Foundation\v3.5.  In this config file will be defined who can make use of C2WTS.  If your account isn’t listed here, or covered by a group that is listed, it won’t work.

<allowedCallers>
  <clear />
  <add value="WSS_WPG" />
</allowedCallers>

RS Shared Service Configuration

The only real configuration point here is with regards to the service account.  Again, I would recommend a Domain Account for use with this.  In my deployment, my account is rsservice.  We will need to make sure that account is added as a managed account within SharePoint (see above under the claims account).  Once that is done, it will be added to the local WSS_WPG group.  The addition into the WSS_WPG group allows for the RS Service to call into C2WTS because that group is set in the config file.

We then need to associate that account to the RS Service, if you didn’t already do that during initial configuration of the RS Service.

Delegation

The last part of our journey is configuring delegation.  Remember, we mentioned that we don’t care about the front-end piece of this, so we don’t need to be concerned with HTTP SPNs at all.  We just want to configure delegation for C2WTS and the RS service.  Both need to be configured in order for this to work, and they need to match with regard to which service you want to hit. I would start with the RS service account, and then make sure that the C2WTS account matches what the RS service account has.

NOTE:  The C2WTS service may have other services configured that RS doesn’t need.  This could be due to other services making use of C2WTS such as Excel Services or PerformancePoint.

To configure this, we need to go into Active Directory Users and Computers.  There are other ways to configure delegation, but this is probably the easiest.  Uh oh!  Where’s the delegation tab?  The delegation tab only shows up if there is an SPN configured on the account.  But we said we didn’t need the HTTP SPN that we would have had with RS 2008 R2.  As a result, nothing was configured on the RS service account and we don’t see the delegation tab.  What’s the fix?  Add a fake SPN.

Here you can see I added an SPN called my/spn.  This won’t hurt anything and won’t otherwise be used.

For this to work, we need to choose the settings for Constrained Delegation.  More specifically we need to enable Protocol Transitioning (Use any authentication protocol).  This is because we are transitioning from one authentication scheme (Claims) to another (Windows Token).  This also has the adverse effect of limiting you to a single domain for your services and computer accounts.  This has changed starting in Windows 2012 R2, but I haven’t tested that yet to see how it works. I’ve read that you can do cross domain traffic with constrained delegation in Windows 2012 R2.

After that, I add the service that I want to delegate to.  Basically, what data sources are you hitting with your reports.  In this case, I added my SQL Server.  This assumes you have your SQL SPN in place.  You can reference the other blog posts at the top of this blog if you need assistance getting your SQL SPN configured.

We then need to make sure that the Claims Service matches this configuration. Don’t forget the fake spn on the Claims service account.

And that’s it!  After that, we should see that our data source works.

 

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

Categories: SQL Server MS Blogs

Version 9.04.0013 of the RML Utilities for x86 and x64 has been released to the download center

SQL Server SQL CSS - Thu, 04/24/2014 - 14:17

 

 

 

 

X64: http://download.microsoft.com/download/0/a/4/0a41538e-2d57-40ff-ae85-ec4459f7cdaa/RMLSetup_AMD64.msi

X86: http://download.microsoft.com/download/4/6/a/46a3217e-f523-4cc6-96e9-df73dd0fdd04/RMLSetup_X86.msi

 

This build encompasses previous features, fixes and enhancements designed from recent case work. 

 

·         SQL Server 2014 compliant

·         SQL Server 2012 and 2014 XEL input to ReadTrace compliant (sample .XEL capture script shipped with help documentation) no need for .TRC anymore. (PSSDiag updates align with XEL capabilities as well)

·         Microsoft Azure SQL Database (formerly WASD) connectivity compliant

·         OStress true replay of MARS connections

·         Addition of client (OStress Expressions) allowing client side Sleep, Repeats, Identity and Guid generation to craft additional scenarios

·         Tuned for larger memory and cpu systems

·         Updated compression libraries for formats such as RAR5

·         Enhanced TVP capabilities

·         Custom CLR Expression capabilities

·         Additional filtering options

·         Workarounds for some .XEL capture bugs such as invalid packet size captured in Existing Connection events

·         … and more …

Categories: SQL Server MS Blogs