Feed aggregator
Analysis Services Thread Pool Changes in SQL Server 2012
As part of the SQL Server 2012 release several changes were made to Analysis Services that should alleviate some issues previously seen on large NUMA node machines.
Separation of IO and Processing Jobs
To better understand how the changes that were made work, it is helpful to have some background information about what types of issues were seen in earlier releases.
During testing, it was found that a 2 NUMA node 24 core server was handling roughly 100 queries per second during stress testing. The same tests were then run on a 4 NUMA node server with 32 cores and the queries answered per second actually decreased. Investigation indicated the cause as cross NUMA node access to the system file cache. For example if the first job that read a file was scheduled on a CPU assigned to NUMA node 1 and a subsequent job, which needed to read the same pages from the file, was scheduled on a CPU assigned to NUMA node 3, a significant performance penalty was seen reading the cached file page. Deeper investigation determined that the bottleneck appeared to be in accessing the standby list of file cache pages. The performance impact appeared to become more pronounced the higher the number of CPUs and NUMA nodes on the machine. For machines with less than 4 nodes the impact was not found to be significant. However, on machines with 4 or more nodes the impact was such that performance could begin to regress.
One possible work around is to open files in Random mode since this changes the file cache access behavior. To accommodate this SSAS 2012 now allows Random file access to be configured through the msmdsrv.ini file by setting the RandomFileAccessMode property to a value of 1. This server property does not require a service restart to take effect, but a restart is recommended. If the server is not restarted then Analysis Services will not release open files or change the way it accesses open files, but the setting will take effect for newly opened or created files.
While changing the file access mode may provide some relief, eliminating cross NUMA node file access whenever possible is a better long term solution. Towards this goal, SQL Server 2012 Analysis Services now has separate Process and IOProcess thread pools which are NUMA node aware. The new IOProcess thread pool will handle read jobs while the Process thread pool will continue to handle ROLAP and processing related jobs, including writing new files. On machines with 4 or more NUMA nodes, the IOProcess thread pool is not a single thread pool but instead a collection of thread pools with each NUMA node having its own pool of IOProcess threads. Assigning each NUMA node its own IOProcess pool doesn’t by itself help with cross NUMA node file access unless file IO operations are consistently assigned to the same IOProcess thead pool, so an algorithm for assigning partition file read operations to specific IO thread pools was also added. At RTM, the algorithm spreads partition reads across IO thread pools based on the ordinal position of the partition in the partitions collection for a measure group. The algorithm is subject to change without notice in future builds so design decisions should not be made based on this behavior.
An attentive reader will notice that the discussion has highlighted partition reads to this point, what about dimension operations? Dimension read jobs are always assigned to the IOProcess thread pool for NUMA node 0. While one could argue that this scheme could result in NUMA node 0 being assigned a larger percentage of the work, it is expected that most dimension operations will be operating on cached data in memory and won’t have a noticeable impact.
Because IOProcess threads are expected to perform short duration read operations on files and as such do not register themselves as cancellable objects. This means that even if a query which requested an I/O operation was cancelled, the in-process I/O jobs could continue to run for a short period of time after the query was cancelled, however new I/O jobs should not be created for the canceled query.
As with other SSAS features, the default behavior of the per NUMA node IOProcess thread pools is intended to cover the most common scenarios. However, in some situations it may make sense to override the 4 NUMA node threshold for the NUMA node affinitized IOProcess thread pools. To allow administrators to revert back to a single IOProcess thread pool or force per NUMA node IOProcess thread pools for machines with less than 4 NUMA nodes a new entry has been added to the Analysis Services configuration file (msmdsrv.ini). The PerNumaNode setting under ThreadPool\IOProcess has a default value of -1 which tells the server to use the automatic 4 NUMA node threshold. Changing this value to 0 disables the per NUMA node thread pool behavior, while setting it to 1 will turn on this behavior (even if there are less than 4 nodes).
The splitting of Process and I/O jobs into separate thread pools and the assignment of I/O jobs to consistent NUMA nodes should alleviate some of the performance impacts of cross NUMA node operations, significantly increasing the performance of SSAS workloads on higher end servers.
Greater than 64 CPU support and Thread Affinitization
In adition to giving IO operations their own thread pools, Analysis Services now supports more than 64 CPUs, something the relational engine has had for a while.
In order to add support for more than 64 CPUs, Windows uses the concept of processor groups. A processor group in Windows can contain a maximum of 64 CPUs and systems with more than 64 CPUs will contain multiple processor groups. For more details on processor groups and support for more than 64 CPUs read the following:
http://msdn.microsoft.com/en-us/library/dd405503%28VS.85%29.aspx
http://blogs.msdn.com/b/saponsqlserver/archive/2010/09/28/windows-2008-r2-groups-processors-sockets-cores-threads-numa-nodes-what-is-all-this.aspx
To support multiple processor groups and thus more than 64 CPUs, SSAS 2012 was updated to set the process affinity mask for the msmdsrv.exe process to span multiple processor groups. Along with this capability a new configuration property named GroupAffinity was added for each thread pool in the server. This property allows an SSAS administrator to have fine grain control over which CPUs on a machine are used for each thread pool. The GroupAffinity setting is a bitmask that is used to determine which CPUs in a processor group can be used for the thread pool in which the GroupAffinity mask is defined. For example if the following entry:
<GroupAffinity>0xFFFF,0xFFFF</GroupAffinity>
were to appear under <ThreadPool> <Process> in the msmdsrv.ini file, it would affinitize threads to 16 logical processors in the first two processor groups on the server. Where the following entry:
<GroupAffinity>0x00F0,0xFFFFFFFF</GroupAffinity>
would affinitize threads to the CPUs 4-7 in the first processor group, and the first 32 CPUs in the second processor group. The GroupAffinity property can have as many comma separated hex values as there are defined CPU groups on a server. If the mask contains less bits than the number of CPUs for the processor group then it is assumed that non-specified bits are zeros. If no GroupAffinity value is specified for a thread pool (default) then that thread pool is allowed to spread work across processor groups and CPUs on the box.
For diagnostic purposes the msmdsrv.log file now contains entries at service start that reflect the size of each of the five thread pools (Query, ParsingShort, ParsingLong, Processing, and IOProcessing) and their settings, including affinity.
Example:
(12/21/2011 4:16:04 PM) Message: The Query thread pool now has 1 minimum threads, 10 maximum threads, and a concurrency of 4. Its thread pool affinity mask is 0x0000000000000003. (Source: \\?\C:\Program Files\Microsoft SQL Server\MSAS11.DENALIRC0\OLAP\Log\msmdsrv.log, Type: 1, Category: 289, Event ID: 0x4121000A)
Note: Although VertiPaq can use more than 64 CPUs, GroupAffinitization is not currently supported for the VertiPaq thread pool, even though an entry exists in the msmdsrv.ini file.
While the GroupAffinity setting was added as part of the work to support more than 64 CPUs, this property can also be used on servers with less than 64 CPUs to control which CPUs are used for specific operations. Through use of the GroupAffinity mask administrators can push I/O, processing, query, or parsing threads to specific CPUs to obtain optimal resource usage, or better enable resource sharing across multiple processes on the same server.
Wayne Robertson, Sr. Escalation Engineer Analysis Services
Management Studio tricks you may or may not know about
I was sitting through SQL Server 2012 training, and Ajay Jagannathan was showing us Management Studio. Eric Burgess had worked on this topic. He started covering some neat things, that have apparently been there for a while, but I just never played around with it enough to see what all you could do. I’m sure a lot of others are not aware of these either, so I thought I would highlight some of them. These may help to make you more productive, or not - depending on how long you spend playing with these. The biggest change for Management Studio within SQL Server 2012 is that we are using the Visual Studio 2010 Shell.
These items aren’t exactly new, but they were new to me.
1. Keyboard shortcutsThere are a ton of Keyboard shortcuts that you can use within Visual Studio. The default settings are based on Visual Studio 2010. Here is a list of those shortcuts. SQL Server Management Studio Keyboard Shortcuts
You can get to these through Tools -> Options -> Environment/Keyboard
2. Block Selection
Sometimes you may only want to select and copy a column of text as opposed to the normal text selection done by holding down the Shift Key. To do Block selection, you can do SHIFT+ALT and drag your mouse to only select certain areas of your text in column fashion.
3. Status Bar
Most people should be familiar with the Status Bar at the bottom of a query window.
Under Tools -> Options -> Text Editor/Editor Tab and Status Bar, there are a bunch of options that you can play with.
One is the Status Bar Location. By default, this is set to Bottom, but you can move it to the top if you choose. This was a little weird for me though - although I’m just not use to it. It does put it more front and center.
From a color perspective, you’ll notice Group Connections and Single Server Connections. I’ll talk about Group connections below, but just be aware that you can change the coloring for when it is a group connection.
4. Cycle through Query Windows
I’ve known about ALT+TAB to cycle through programs (Windows). I’ve also known about CTRL+TAB to cycle through components within a given application. For example in Excel you could use CTRL+TAB to move between worksheets. I’ve never tried it in Management Studio, but it allows you to cycle through the Query Windows.
Also, CTRL + F6 will cycle through the actual tabs without the graphic switching display.
5. Group ConnectionsYou can go to View and select to show Registered Servers. Within Registered Servers you can create a group of SQL Servers. This then allows you to start a query that will be run against all of the servers within the group. This is where the Group Connection Color for the status bar comes into play.
This could be really handy if you need to execute items across multiple servers. The color of the status bar is there to help you realize that that query is a group query as opposed to a single server connection.
Hopefully these are useful to you.
Adam W. Saxton | Microsoft Escalation Services
http://twitter.com/awsaxton
Performance impact: the cost of NUMA remote memory access
Book Review (Book 8) - The Elements of Persuasion
This is a continuation of the books I challenged myself to read to help my career - one a month, for year. You can read my first book review here. The book I chose for January 2012 was: The Elements of Persuasion by Richard Maxwell and Robert Dickman.
Why I chose this Book:
As I mentioned in a previous review, I think good storytelling is an essential part of any career. Communication is basic in not only our professional but personal lives, and everyone I’ve met responds well to stories, from children to executive audiences. Not only that, learning to tell a story helps you formulate concepts about the topic, which is yet another way of learning.
I heard about this book from a couple of folks, and it landed within search of “storytelling” and “business”. Whenever I just search for “Storytelling” I either end up with lists of stories (which is fine) or lists of children’s books on storytelling (which is also fine) but neither of these are quite what I’m looking for.
A quick search on Amazon and I located the book, and then a quick check of my various e-library offerings and I downloaded it to my laptop for reading.
What I learned:
This is a “selling” book, but not like you might think. It’s not a book of a quick sale like at a car-lot or a “quick-sale” environment. It’s more along the lines at the executive level and longer-term sales - those involve stories as well.
Sadly, this is another “business book” - the kind I normally don’t like much. There are typical case-study layouts with lots of examples, but in my mind not enough didactic information to actually help you develop a good story-telling mantra.
Even so, I learned some interesting things about the process these authors use. Some of the case studies are interesting, and I did pull out that a story should work towards a single, defining sentence. This isn’t unique to this book, but it is a reinforcement of what I’ve learned elsewhere. Although nothing to do with storytelling, I did like the reference to Lockheed’s “14 Rules”, which I hadn’t read before. They also break down the storytelling process into five elements, which is actually covered better (in my mind) in a book called “20 Master Plots” , which may actually be the storytelling book I’ve been searching for.
Or perhaps I should just write the one I’m looking for.
At any rate, not sure I would recommend this book to others - perhaps as a check-out, but not a purchase, at least if this is for the same reason I looked it up.
Raw Notes:
As I read, I take notes - it’s called “reading with a pencil”. These are the notes I made to myself, in no particular order and with no context other than the book itself:
- Stories are interesting to us all.
- Describes five elements in a story, but in fact this is for only one type of story. Other books describes more story types.
- Very standard business book, but there are good tips in some of the chapters.
- Explained how to connect with the audience, good points Spends a lot of time referring to other books The book of five rings Work towards a single, memorable sentence.
- Changes partway through into stories about stories. This is better.
- A mix of storytelling and sales, although this was touted for sales, feels much more like selling than storytelling, advertisements.
- Interesting story about memory championships, where contestants memorize cards. They use unusual stories.
- Look up Lockheed and the 14 rules
How to Document and Configure SQL Server Instance Settings
The case of the incorrect page numbers
As you may or may not know, SSRS 2008 R2 added the ability to automatically create page breaks on group changes. Historically, people attempted to use custom code to accomplish this and, while it worked, the pagination logic we use in SSRS 2008 R2 breaks the standard implementation of this for several reasons. You can see the typical implementation of this at http://blogs.msdn.com/b/chrishays/archive/2006/01/05/resetpagenumberongroup.aspx.
Unfortunately, in SSRS 2008 R2 the pagination engine does multiple passes in some scenarios thus breaking this logic. Also, the logic referenced above makes the assumption that the report is rendered from page 1 to page N and this assumption is not necessarily true either. We recognized that losing that ability to do the pagination at a group level is a key feature so we actually rolled it completely into the product. Robert Bruckner (one of the SSRS developers) gives a really good explanation of all of the ins and outs of the feature at http://blogs.msdn.com/b/robertbruckner/archive/2010/04/25/report-design-reset-page-number-on-group.aspx.
I wish I could say that this feature has been completely without pain, but that is not true. If any of you have been working with SSRS for any length of time, you know that there are all kinds of quirks to the pagination process. Given the correct information, it is always fairly easy to figure out why the pagination engine resulted in a certain pagination, but understanding all the pertinent details is….a bit difficult. In a lot of ways, I view pagination sort of like the query optimizer in the SQL Server engine. It seems like total black magic if you don’t know the rules it follows, but once you understand the rules it generally makes good sense. However, again like the query optimizer, the pagination logic is very complex and is something that we are constantly refining. As you can imagine, people regularly find edge cases where the pagination either doesn’t function as they want or flat out makes a mistake. In addition, layering group level page breaks on top of normal pagination makes the scenario even more complicated. Thus, I have to make the admission that we have had several fixes released in this specific area. Sigh…
The good news is that I have also generally found that a good chunk of my page numbering problems go away by using the latest CU. The bad news is that a simple fix like upgrading to the most recent CU didn’t fix my most recent problem.
So, here’s the situation. My customer wanted to reset the page number whenever the group changed. No big deal in theory. As documented both in the MSDN documentation and Robert’s blog, all they had to do was to add a couple of properties to the target group:
No big deal, right? They completely expected to see the pagination look something like this:
This is what my customer saw instead:
What the heck????!?!!?!?!
Needless to say, this is when they decided to open a case with Microsoft Support. The problem got even weirder when I discovered that under no circumstances could I duplicate the behavior!!! I could reproduce a whole host of problems (2 of 1, 9 of 6), but by applying some workarounds and the latest CU’s I could always get back to normal page numbering resets. At this point, I was completely stumped. I even engaged the SSRS Product Group to ask if there was some sort of race condition that could lead to this behavior. Unfortunately, the answer I got back was that there was no identifiable race condition scenario where 1 of 337 could happen.
Then, as always seems to be the case, I put the problem aside for a couple of days to let it stew. And then, while driving into work the next Sunday, it hit me. Target server version!
You see, it had occurred to me that one of the good things about BIDS 2008 R2 is that it allows you to publish against both a 2008 and a 2008 R2 instance of SSRS. *However*, the thing I had never checked before was how we dealt with a 2008 R2 feature when deploying against 2008. It turns out we strip the feature out during the publish process. When I change my project properties to target a 2008 instance even though my instance is actually 2008 R2, this is what I saw:
Building ‘FailingScenario.rdl' for SQL Server 2008 Reporting Services.
A PageBreak property was removed from the data region, rectangle, or group ‘GroupOnWhichToIterate’. SQL Server 2008 Reporting Services does not support the PageBreak property ResetPageNumber.
The PageName property specified on the data region, rectangle, or group ‘GroupOnWhichToIterate’was removed from the report. SQL Server 2008 Reporting Services does not support the PageName property.
Build complete -- 0 errors, 2 warnings
This led me to ask my customer what version of SSRS their instance of BIDS was targeting as guess what? It was SSRS 2008!
The good news is that getting the desired group level page breaks was as easy as setting the target version to Reporting Services 2008 R2 and deploying the report.
No respect: NUMA affinity meets query parallelism
The official release of System Center Advisor…
If you have followed this blog, you have probably seen a series of posts documenting the life of a project I’ve been working on, Atlanta, to a product called System Center Advisor. Today marks the official release of that product. This has been a particularly rewarding journey for me to see an idea about giving our knowledge in CSS to customers turn into a full proactive assessment service powered by System Center and the cloud.
In my past blog posts, I’ve shown you examples of the knowledge this product brings in the form of alerts and configuration history. With the official release comes new features such as a new dashboard about the assessment of your servers, a revised and simplified setup program, voting buttons to tell us if the alerts are useful, and the ability to provision other users to view or administer your console.
While I love to see new features like these that make this product compelling, the power of this product is in the knowledge. So while System Center Advisor has been ramping up towards this release while being a release candidate, our CSS teams have been adding knowledge in the form of alerts each month based on actual customer experiences.
Here is an example. Several months ago, one of our engineers in CSS pointed out to me that we have seen a trend where our customers would contact us and report the following error in Management Studio when trying to expand the Databases Node:
After some investigation, we found out the customer had disabled the ‘guest’ user in the msdb database. As it turns out, Management Studio requires the login connecting to have “CONNECT” access to msdb. If the login is not mapped directly to a user in msdb (usually not the case), it needs to use guest. If guest is disabled, then any access to msdb (Ex. use msdb) would result in this error.
We looked into this further and found out this was not a one time occurrence. Several customers had reported this problem to us and from other sources (Microsoft Connect, …). Why would so many people hit this? We asked one customer and they said they were simply following guidelines established in our documentation as outlined at:
http://msdn.microsoft.com/en-us/library/ff848752.aspx
In this part of the docs, the tip for the guest account says:
While the server itself doesn’t allow you to disable guest in master or tempdb (You get the error Msg 15182 Cannot disable access to the guest user in master or tempdb. if you try), for msdb we unfortunately allow this. While the server will run in this scenario, the use of tools like Management Studio will have problems. So customers were just doing what we told them.
While changing the product to prevent this is the right thing to do, what do we do in the meantime to warn customers this could be a problem? We can certainly publish articles and documentation for them to discover but why not automate the check? When I heard about this issue, I thought What a perfect rule for System Center Advisor
So as part of our monthly update to rules for SQL Server we introduced this check on your system. Should you disable guest for msdb you will get an alert which will direct you to the following technical article written by CSS talking about the problem and how to solve it:
http://support.microsoft.com/kb/2539091
This is the power of what Advisor can bring to you. We continue to update our rules each month as we discover trends about common customer problems or customer problems that may be difficult to detect or find by just searching the web.
This journey is far from over. While we to continue to create new rules for SQL Server, both the System Center Advisor team and CSS will continue to enhance this product including:
- Rules for other server products (there are already several rules for the core Windows OS, Active Directory, and Hyper-V)
- Provide support for SQL Server 2012 (and new rules unique to that version)
- Expand features and capabilities of the System Center Advisor software and portal
System Center Advisor is available in 26 countries and comes at no charge to customers that have a Software Assurance agreement. Don’t have Software Assurance but want to try it out for free? Install it from the System Center Advisor site and choose the 60 day free trial option. Don’t know if your company has Software Assurance? Talk to the team in your company that handles volume licensing for Microsoft purchases (they typically have access to the Volume Licensing Service Center site).
If you try System Center Advisor and want to provide feedback, there is a Feedback link at the top of the Advisor console. However, I’m always personally interested to hear directly from customer experiences good and bad on how this product works for you. So feel free to comment on this post on your experiences.
Bob Ward
Microsoft
T-SQL stored procedure for finding/replacing strings in a text file. Really?
Performance impact: SQL2008 R2 audit and trace
How It Works: sys.dm_tran_session_transactions
For some reason I have been looking at DMV output closely the last couple of weeks. I just blogged about the pending I/O requests and now I have a behavior to outline for dm_tran_session_transactions for todays blog.
select * from sys.dm_tran_session_transactions
The scenario I was looking at is as follows.
Server A Server B Broker calls activation procedure Begin Tran (local) Linked Server Query Transaction Promoted to DTC Transaction Imported from Server A xp_cmdshell Looped back and blockedSeparate Transaction Blocking
- The process is based on SQL Service Broker.
- The service broker session on Server A is 20s.
- It starts a local transaction and eventually performs a linked server query.
- This causes the transaction to be promoted to a DTC transaction.
At this point in time the *active transaction* DMVs on both Server A and B show the enlisted UOW of the DTC transaction. That is, except the sys.dm_tran_session_transactions on server A.
After some digging I uncovered that the dm_tran_session_transactions DMV only outputs rows for sessions (s) that are NOT system level sessions. Since the broker activity is handled on a system session the DMV will not materialize a row for Session 20s on Server A in this example. Instead you have to use the additional *active transaction* tables to track the UOW across this system.
Note: I am able to use any transaction (local or DTC) as part of SQL Service Broker activation which will not show rows in the session transactions DMV because it is considered a system session.
Bob Dorr - Principal SQL Server Escalation Engineer
Team Foundation Server (TFS) in the Cloud - My Experience So Far
I recently joined a software development project that involves not only myself and other internal Microsoft employees, but a partner and a customer as well. We are building a hybrid solution that uses assets on premises as well as Windows Azure for processing. When we put the team together we picked a methodology (Agile) for the project (we use multiple methodologies at Microsoft - whatever the project needs) and then we started talking about Source Control.
We’re all comfortable with various tools for check-in-check-out, branching, and so on. We have all used GIT, SVN, and TFS. Some of us have even used Source Safe in past, but that’s another post. Each company has a full set of Source Control systems in place. But using each other’s systems requires logins, firewalls and the like - so we decided to use the TFS Service Preview to run the entire project from “the cloud”. Here are my experiences with that.
The process was really simple. In fact, we talked about using the cloud TFS in the first SCRUM, and the team was working from the Work Items list that afternoon. The original account login provides a web interface to allow people to join the team. Each of us happened to have a Live.Com address, so we just invited those addresses to join and they got a link, like this:
projectname.tfspreview.com
I’m using Visual Studio, and it’s a requirement for TFS preview to have SP1 installed, and this patch: KB2581206
From there, I opened Visual Studio and navigated from the main menu to Team and then Connect to Team Foundation Server. I’m given this menu:
Selecting port 443 and HTTPS (for security) and then ensuring the lower link has the “tfs” appended as the location, I opened the project.
(This VSTS screenshot is of a project I did in my University of Washington class I teach - I never show client code or names in a blog post)
From there it’s a normal set of operations. Right now the preview doesn’t have some things I’d really like, such as an automated build or some of the testing tools, but you can read this blog entry to learn more about the entire sign-up process, and what the team has planned.
Each day I log in to the project, and I’m given this new sign-in option:
I click the option, and I open the environment, hit My Work Items query, and get to work. All in all, a seamless - although basic - experience. The speed at which we could set up and work on a project was really sweet. It’s remarkable how un-remarkable this is - I just do my work each day, everything is running and backed up in the cloud. I think that’s the point.
How It Works: sys.dm_io_pending_io_requests
I have been working an issue where the DMV was returning io_pending_ms_ticks incorrectly. The following output is an example of ~164 hour delay. Unlikely to occur without any other side effects noted.
select * from sys.dm_io_pending_io_requests
In the process of this investigation I uncovered several details that I found helpful to share.
Full Text and File Streams
There is a small set of I/Os that could show a large io_pending_ms_ticks value but io_pending should = 1. There are administrator interfaces for both the full text and file stream features. Think if them a bit like a new TDS connection to the server. When full text or a file stream request arrives the request needs to be processed. These requests are simply waiting on the arrival of a new request from the respective features. They seldom show up on this list and the file handle will not map to any of the handles exposed in the DMF - virtual_file_stats.
io_pending is the key
The io_pending column indicates 1 if the I/O is still pending within the kernel. A value of 0 indicates the I/O only needs to be processed by the local SOS scheduler. In this case we are not getting any I/O delay warnings, performance monitor is not showing I/O issues and there are no SOS scheduler issues.
Dirty Read
After some more digging the issue is a DIRTY READ. SQL Server maintains a set of I/O request structures (request dispenser). When the I/O completes the request structure is returned to the dispenser and can be reused for another I/O. The DMV needs to materialize the list without causing scheduler issues on the system so it is designed to perform (NO LOCK) dirty reads of the I/O list. This is where your system may return you incorrect io_pending_ms_ticks value when the io_pending flag = 0.
Take the diagram below as an example. It is possible that the DMV is being materialized on one scheduler but the I/O is being completed on the I/Os owning scheduler. (Each scheduler has its own I/O list).
The DMV query can get access to the I/O request structure but it does not hold any locks on the structure. If it did it could lead to unwanted blocking. Take for instance that the application fetching the DMV output stalls. If a lock is held on the I/O list the lock can't be released until the entire list is properly traversed. Because the client is not fetching results it could lead to the I/O being stalled on the scheduler for a long period of time.
To avoid this the list is run in a dirty fashion. However, this means the IO request could finish I/O completion in parallel with the reading of the structure data. The SQL Server protects the DMV query but does not indicate to the DMV user that no lock data movement has occurred. Instead it is possible that the output can become skewed, as shown in previous output. The IO data structure can be re-assigned between the time the DMV query starts to read the information and the time all columns are produced, leading to unexpected output.
SQL Server 2012 (Denali) updates this behavior by adding a signature to the I/O request. This allows SQL to maintain the dirty read capability while also identifing I/O requests that fall into this category.
When you see large pending_io_ms_ticks consider the io_pending flag and additional scheduler warning information (178** messages) in the error log.
Bob Dorr - Principal SQL Server Escalation Engineer
Assigning SQL Server, SQL Agent to a Processor Group (OOM, Hang, Performance Counters Always Zero for Buffer Pool, …)
Suresh brought to my attention that we have been getting questions as to why SQL Server starts on group 1 and then group 2 and it is not predictable? Then Tejas brought up another issues and since I worked on this way back before we released SQL 2008 R2 I went back to my notes to pull up some details that I thought might be helpful.
The answer is that you need a SQL Server that is Group Aware to use more than one group worth of CPUs in Windows. However, if you have an older version or a SKU that does not support enough CPUs to span groups the default is for Windows to start the service on any group.
SQL Server 2008 R2 uses the group aware so it will use new APIs and establish proper use of the entire system.
Legacy XPROC / COM Object / Linked Server using a CPU based scheme.
Some designs of legacy components may not be group aware or safe. In the following example, if originally loaded on Group 2 the initialization would see 60 CPUs and create 60 partitions for a local memory manager which might work perfectly with proper synchronization. However, if the original initialization of the partitions occurs on Group 1 it will only create 40 partitions and access from CPUs 41 ... 60 on Group 2 may fail as they don't exist. (SQL Server does not have any of these components.)
Great Legacy Reference: http://msdn.microsoft.com/en-us/windows/hardware/gg463349.aspx
Windows allows you to configure node to group assignments
By configuring Windows to see the same number of CPUs and Nodes within each processor group the SQL Server, SQL Agent, … services can start on any processor group and they will see the same amount of resources.
Note: There is a hotfix for Windows 2008 R2 that automatically balances the processor assignments are corrects the issue for most systems: http://support.microsoft.com/kb/2510206
The following knowledge base article outlines how to assign nodes to the processor groups so you have control over the group assignments. How to manually configure K-Group assignment on multiprocessor machines: http://support.microsoft.com/kb/2506384
You can use Windows Task Manager to see and set the affinity for the process
Start with the Windows Task Manager | Process Tab.
Select a process, Right Mouse | Set Affinity -- the following dialog is presented showing you the Processor Groups (K-Group), Nodes and CPUs on the machine.
The Problem
Even after all this work there is an issue using older versions of SQL Server (SQL 2008 pre-SP3 and SQL 2008 R2 pre-SP2) which requires SQL Server to startup in the group that holds physical NUMA Node = 0 and at least 1 CPU assigned from physical node 0 to the instance, or lazywriter activity is not started property. This can lead to buffer pool sizing, performance counters not getting updated and other stall and hang like behaviors (OOM, Latch Timeouts, …) of the SQL Server process.
As a general rule is best to start them in group 0 (usually contains physical NUMA node = 0 but you must confirm this.)
You can enforce this on the system by setting the ImageFileExecutionOptions - NodeOptions value and by making sure the SQL Server affinity mask contains at least 1 CPU in physical NUMA node 0. However, this forces all instances of SQL Server to start on the same node and all instances have to share physical NUMA node = 0.
Excerpt from the Windows 2008 R2 Release Notes
"Allocation of child processes among the ideal nodes of Non-Uniform Memory Access (NUMA) nodes is not efficient, which results in performance degradation, increased latency, and cache misses, depending on the affected processes.
To correct this, edit the registry to change the inheritance of ideal NUMA nodes so that generated child processes are assigned the same IdealNode as their parent process. This setting is not system wide, but per-process.
To change the NUMA inheritance
-
Open Registry Editor (Regedit.exe).
-
Add the following registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\<process_name>.exe
where process_name is the name of the process you want to change the inheritance behavior for.
Assign a DWORD value to this key with the name NodeOptions and data 0x00001000.
Close Registry Editor and restart the computer."
Windows does not currently provide a per service, instance option for starting the service in a specific group. To avoid starting multiple instances of SQL Server on the same processor group use SQL Server 2008 R2 or newer version and establish the proper processor affinity.
SQL Server 2005
SQL Server 2008 Requires SQL Server Affinity Mask to use a CPU in physical NUMA Node = 0 as presented by the operating system. Use NodeOptions to start SQL Server instance in proper group that presents CPUs from physical NUMA Node = 0
- or -
Apply Windows SMC.exe QFE to assign startup group to a specific group containing physical NUMA node = 0 SQL Server 2008 SP3 No longer requires use of CPU in physical NUMA Node = 0 Requires Windows SMC.exe QFE to assign startup to a specific group or you can allow random group startup behavior. SQL Server 2008 R2 No longer requires CPU in physical NUMA Node = 0 Is group aware so it does not need the NodeOptions or SCM.exe QFE.
Other Helpful References
- http://blogs.msdn.com/b/saponsqlserver/archive/2011/10/08/uneven-windows-processor-groups.aspx
- http://msdn.microsoft.com/en-us/library/aa363804(VS.85).aspx
- http://archive.msdn.microsoft.com/64plusLP
- http://technet.microsoft.com/en-us/magazine/2008.03.kernel.aspx
- http://support.microsoft.com/default.aspx?scid=kb;EN-US;2028687
Bob Dorr - Principal SQL Server Escalation Engineer
How does Microsoft IT do it? Check out these IT Showcase resources for SharePoint Server 2010
One common question we get from customers who are evaluating SharePoint Server 2010 or a specific SharePoint scenario is: “How did someone else do it?” Or, more specifically: “How did Microsoft IT do it?”
Fortunately, Microsoft IT documents and publishes a set of resources known as Microsoft IT Showcase, which describes how they designed and deployed various Microsoft technologies and products. These resources include Webcasts, TechNet Edge videos, and Quick Reference Guides.
Here is a list of Microsoft IT Showcase resources for SharePoint Server 2010:
- Architecting a Global Human Resources System Using SharePoint, Azure, & Duet
- Building the Next Generation Intranet Using SharePoint 2010
- Consolidating and Sharing Business Intelligence Reports for Finance TechNet Radio
- Consolidating and Sharing Business Intelligence Reports for Finance Video
- Consumerization of IT
- Creating BI Dashboards Using SharePoint 2010 Without Programming
- Creating Scorecards to Report Organizational Rhythm of Business
- Customizing the User Experience with FAST Search
- Enabling Managed Metadata for Information Discovery at Microsoft
- Enabling Self Service BI from EDW using SharePoint 2010 and SQL Server 2008 R2
- Facilitating Employee Rewards Management at Microsoft
- How Microsoft Built Its Internal Social Video Platform
- How Microsoft IT Enables Office Web Applications in SharePoint 2010
- How Microsoft IT is Moving to the Cloud to Meet and Exceed the Needs of the Business
- How Microsoft IT Manages Project and Portfolio Management
- Implementing Social Computing at Microsoft
- Initial Best Practices & Lessons Learned from Microsoft IT's SharePoint 2010 Upgrade
- Interactive Web Technologies Deliver Financial Message to Investors and Analysts
- An Inside Look at Finance at Microsoft: Leveraging Technology
- IT Showcase On: Business Intelligence (BI)
- IT Showcase On: Microsoft SharePoint
- Managing a Parallel Upgrade to SharePoint 2010
- Measuring and Driving Microsoft IT Using Microsoft BI Solutions
- Microsoft IT CIO Office Scorecards – Delivering Dashboards from KPI Catalog to the Masses (IT Pro Webcast)
- Microsoft IT CIO Office Scorecards – Delivering Dashboards from KPI Catalog to the Masses (Video)
- Migrating to SharePoint 2010
- New FAST Search Solution Speeds Retrieval of Relevant BI Data
- The Power of ECM2: Enterprise Content Management Using SharePoint 2010
- Server Virtualization
- SharePoint 2010 Governance and Life Cycle Management
- Using Excel Services to Improve Financial Data Retrieval Performance
This is the same list from the SharePoint Server MSIT Showcase page (http://technet.microsoft.com/library/bb687797). Check this page for the most current list and for links to IT Showcase resources about SharePoint Server 2007.
To get started with SharePoint Server 2010, see the following:
- SharePoint Server 2010 TechCenter (http://technet.microsoft.com/sharepoint/ee263917)
- SharePoint Server 2010 Technical Library on TechNet (http://technet.microsoft.com/library/cc303422)
- SharePoint Developer Center (http://msdn.microsoft.com/sharepoint/aa905688)
- Getting Started with SharePoint Server 2010 (http://office.microsoft.com/en-us/sharepoint-server-help/?CTT=97)
Enjoy!
Joe Davies
Principal Writer
SharePoint Server Platforms and Infrastructure Writing Team
Confessions of a DBA: My worst mistake
Performance impact: hyperthreading on Intel Westmere-EP processors (X5690)
Bug-Out Bags and Cloud Architecture Considerations
I served in the U.S. Military for a while, and as part of my training we had to maintain a “Bug-Out Bag”, which was a large duffle-bag full of certain items that we could live on/fight with in an emergency. I’ve carried the spirit of that idea forward with me into civilian life, in Florida and especially here in the Pacific Northwest.
In Florida we dealt with the threat of hurricanes - I went through four of those in one year that hit my area. You’re without power, it floods quickly, and it gets wicked hot. You roof might be gone, whatever. Here in the Pacific Northwest, I live near one of the largest volcano's in the world, we have flooding, and recently we were hit with an ice-storm. Now I’ve lived all over the world, from Alaska to North Dakota and even near the Kamchatka Peninsula in Russia, and I can handle the snow. But ice - that’s a toughie no matter where you live. We had so much that it split my little pine tree in front of the house in half.
We lost power - although I think the folks at Puget Sound Energy did an amazing job at getting us back up in less than 24 hours, but we weren’t worried anyway. That bug-out bag mentality carried forward to a “second pantry” we keep in the garage.
We have a large plastic box (that will fit in the back of the Subaru) with dried goods like pasta, and canned goods and even a little cook stove. We have 25 gallons of clean water in Jerry-Cans. We have batteries, candles and matches. And we have flashlights around every door. We use supplies from the “pantry” to fill our house pantry, and then refill the emergency one from the grocery store. That way everything is fresh, rotated, and we can “bug-out” here at home or on the road.
So what does this have to do with Distributed Computing Architectures?
It’s the thought process. In both the military and civilian life, I’ve done a few things:
- Sat down and thought carefully about exactly what I need. Did I include a can-opener? A small shovel to dig out of whatever I got stuck in? Then I weed out what I *really* don’t need.
- Put those things into a small, manageable container.
- Tried them - even when (especially when) I didn’t have an emergency
- Tweaked the process to see what I could do better.
Have you done this when you moved an app to the “cloud”? Each of these has a computing parallel - do you know what you would do if you couldn’t access the Distributed Computing Environment?
I’ve found these thoughts are actually a great place to start - keeps the process simplified from the start, and gives you a sense of assurance when you’re asked if you can recover from an emergency.
