A great way to find out where your sql sever bottle necks are is to monitor the wait stats.
A technique to monitor your wait stats for the purpose of diagnosing a server can be found here.
What are wait statistics ?
Whenever a sql server thread starts on the cpu it is given a certain amount of time to complete it’s task (5ms). There can only ever be one thread on the processor at any time (this it true for multi thread systems where the common misconception is that multiple threads can be executed at any one time). Whenever the thread cannot complete its task, either because it has to wait for something such as a disc i/o to complete, or retrieve some data from memory, then the thread leaves the cpu and is placed in a Wait queue until the resource becomes available, whereby the thread then enters the ‘Runable’ queue where it moves up the stack until it can get back onto the cpu to complete.
SQL Server keeps track of how long these threads have been waiting in the form of ‘wait statistics’. These can be queried from the dynamic Management View (dmv’s) which will give you an overall picture of where the bottlenecks lie.
Here is a list of some of the most common wait types with a brief explanation of what they mean:
CXPacket
This is usually a sign that you are experiencing queries that are splitting into multiple threads and running in parallel. This is not necessarily a bad thing. It just means that one of the threads has finished processing its data and is waiting for the other threads that spawned to finish also. The threshold at which a query breaks into parallel threads is governed by the setting ‘Cost Threshold for Parallelism’. The default is set to 5 which these days on multi core/multi processor systems is generally way way to low. Also the ‘Max Degree of Parrallism’ or MAXDOP setting determines how many cpu’s will be involved in the processing of a query with 0 being the default and meaning all available processors will be used. So in a 8 CPU box, with a MAXDOP setting of 4, only half of the available cpu’s will be used in any parallel execution. Things to check:
- Check the execution plan of any queries that are running in parallel as any potential issues may show up here
- Tune queries where you can
- Make sure statistics are upto date on your tables
- If PageIOLatch waits are high check the query plan for large table scans
- Check your default setting for ‘Cost Threshold for Parallelism’ – it may be way to low and still on the default of 5
- Check the MAXDOP setting *Please note setting this to 1 is bad practice – It essentially limits queries to 1 processor so while it will remove the CXPacket wait (because NO queries can can now go parallel ) it can have a negative impact on performance.
PageIOLatch
Think of a latch as the smaller relation of a lock ! It’s a lightweight lock taken out by the system when ever it needs to manage access to the page buffer. Whenever sql server has to read pages from the disc into memory or vice-versa then it must place these latches on the buffer pages for this process to complete.
If you are seeing wait times above 10ms for this wait type, then it suggests you are under some kind of pressure on the I/O sub system. Things to check:
- Check the latency of your disc sub system
- Check your disc average read and average write time metrics
- Run a query to check the most intensive I/O queries on your system and look at the execution plans for missing indexes, large table scans
- Make sure log files are on separate discs to your data files and consider partitioning your data to alleviate pressure
BackupIO
If you are backing up to a network location you may see this wait type which indicates that your network is suffering from some kind of latency. This is also very common if you are backing up to a tape system. Things to check :
- Check the ping results between systems to test any latency
- Check the NIC’s in your server are configured correctly
- You could perhaps separate your backups and access networks with separate network switches
- Check the network storage to see if it is under any kind of pressure which will in turn cause waits on your backups
Aysnc_Network_IO
This is generally an indication of network issues between sql server and your clients. SQl server will hold data in the output buffer until it receive’s an acknowledgement from the client that it has finished consuming the data. This wait type is a good indicator that the client is not able to consume all the data in a timely manner – so if the network is poor, you will see this wait type while the client receives all its data from sql server and then sends the acknowledgement back. Things to check:
- Check your network hardware between client and server – check
- It may be that you application is requesting too much data in one chunk and would be better served by breaking this down into smaller requests.
- Also look to see if the filtering is done on the client side as opposed to the server side. Why send extra amounts of data over the network for it to be filtered at the client when sql server can do this far more efficiently ?