If you aren't subscribing to Database Weekly, then I would highly suggest it. It is a simple news aggregator sponsored by Red Gate, and pulls most of its stories from SQLServerCentral.com.
Today, via the Database Weekly e-mail, I received notice of another article posted on solid state storage; in particular, this article references some interesting developments in IBM's lab with solid state arrays.
For more direct from IBM's site, go here.
Back in July, I had posted about Solid State Storage and its beneficial use for SQL Server; as a follow up, a few of my co-workers have attempted to use these drives, but it seems that current drive manufacturers need to catch up with Vista and in particular, Server 2008 so that we can get steady and reliable performance benefits from using them. Once we do find a drive that works and provides a performance benefit with Server 2008, I'll be sure to post the details.
Tags: storage subsystem,
sql server,
hardware
Categories: SQL Server
If you haven't picked up on a theme yet in my posts, then shame on you for not reading everything on this site….. To be frank, I love working with performance issues on databases (mostly from a design and code perspective, my hardware skills are a little out of date, but I try to keep up).
This code snippet will be just one more for your tool belt in evaluating performance and looking for optimizations. The script below will give you the name of the statistic and the table column it is associated with for a given database. Stats names that start with PK are usually going to be your primary keys whereas names that are similar to WA_Sys are going to be system generated statistics for your objects.
select object_name(s.object_id) as TableName
, s.name as StatisticName
, COL_NAME(s.object_id, sc.column_id)as ColumnName
from sys.stats s
inner join sys.stats_columns sc
on s.stats_id = sc.stats_id
and s.object_id = sc.object_id
where OBJECTPROPERTY(s.object_id, 'ISMSShipped') = 0
order by object_name(s.object_id)
With this output, you can generate details about specific stat objects (the StatisticName output column from above). The output will give you three result sets:
- First result set is general information, such as update date, number of rows in the table, rows sampled for the statistics, etc. The step value in here should be equal to the number of rows in your third set, which is the detailed histogram data for this specific index.
- If DENSITY_VECTOR is specified, then the second result set will describes the columns returned in the result set
-
Third result set is the detailed data for every step in the statistic. From books online:
- RANGE_HI_KEY: Upper bound value of a histogram step
- RANGE_ROWS: Estimated number of rows from the table that fall within a histogram step, excluding the upper bound
- EQ_ROWS: Estimated number of rows from the table that are equal in value to the upper bound of the histogram step
- DISTINCT_RANGE_ROWS: Estimated number of distinct values within a histogram step, excluding the upper bound
- AVG_RANGE_ROWS: Frequency or average number of duplicate values within a histogram step, excluding the upper bound (RANGE_ROWS / DISTINCT_RANGE_ROWS for DISTINCT_RANGE_ROWS > 0)
For specifics on strings, Karen Delany wrote this article for the http://www.sqlcommunity.com website.
Tags: performance,
t-sql,
statistics
Categories: SQL Programming |
SQL Server
To follow up with my previous post, Looking at Query IO stats, I thought I would add in one more performance notion that you are able to negotiate with your hosting provider (talking about majority, not specialists): SQL Server file groups.
What does additional file groups help solve? Many things, but the basic concept from a performance perspective is to let SQL Server use parallel processing by splitting the IO tasks over multiple threads. Even file groups on the same logical disk can improve throughput for a given query (all other variables held static).
Of course, if you do get to the point where this really is a necessity, hopefully you are doing well enough with your venture that you can get a more customized SQL Server environment; either way, the script below will help you take a look at your objects per file group.
Some of the basic concepts inside SQL Server file groups have not changed for a long time. Take, for example, the stored procedure sp_objectfilegroup, which has been around since SQL Server 7.0, and has a created date of 1996-08-30.
Here is the basic script:
select ds.name, o.name, o.type_desc
from sys.objects o
inner join sys.indexes i
on i.object_id = o.object_id
and o.is_ms_shipped = 0
and i.index_id < 2 –- excludes non-clustered indexes
inner join sys.data_spaces ds
on ds.data_space_id = i.data_space_id
If you want to build upon this to get information on the sizes of individual tables for planning, you can utilize the stored procedure sp_spaceused to get details on the space a table is using. I posted some detail on this back in May in post: Tables and Sizes.
I would be interested to hear from anyone that is using a hoster that is friendly to data intensive web applications; most host providers I have come across are geared heavily towards web and web platform offerings, but tend to skimp on features you can purchase database side. I have one that I am happy with for now, but if anyone else has an experience they would like to share, please feel free to comment or e-mail me.
Tags: performance,
management,
storage subsystem
Categories: SQL Server
A friend and co-worker of mine operates his own website for a business he has started. We have talked a lot about database design for performance, query stats and so on, but one of his difficulties is that he is currently using a hosting service for both web and database, thereby limiting his ability to access the SQL Server and really make some underlying performance enhancements – lucky for him, his site is growing, so he should be able to change that soon.
In hosted situations, IO is going to suffer. For most basic plans, you are sharing a SQL Server with many, many, many other people and you will have absolutely no say in how that server is maintained, operated and configured. I hate to say it, but very few hosting providers have done a good job providing SQL service. I have used and interacted with over 20 of the larger host providers over the past two years, and of those, only three even kept their SQL Servers adequately patched.
Disk setups are basic in these situations, not that I am blaming these guys, you get what you are paying for, and most people just need a basic service – it's is really only a pain point for those who have a good product and are experiencing growth and the performance pain with that growth – so onto the point of this post:
In a hosted situation, your options to debug for IO performance is limited as you cannot run traces or even know the disk configuration, but what you can do is debug your queries inside of SSMS to understand the IO statistics given the configuration you are using. By using SET STATISTIC IO ON, you are telling SQL Server to print information in the messages tab:
So what does this information mean and how can you use it? From Books On Line:
- Table: Name of the table.
- Scan count: Number of index or table scans performed.
- logical reads: Number of pages read from the data cache.
- physical reads: Number of pages read from disk.
- read-ahead reads: Number of pages placed into the cache for the query.
- lob logical reads: Number of text, ntext, image, or large value type (varchar(max), nvarchar(max), varbinary(max)) pages read from the data cache.
- lob physical reads: Number of text, ntext, image or large value type pages read from disk.
- lob read-ahead reads: Number of text, ntext, image or large value type pages placed into the cache for the query.
Scans are bad – if this number is large for one of your queries, you will want to look at how you wrote it, are you using aliases so SQL is not doing lookups, are your where clauses and joins hitting indexes? The number of scans will directly impact performance as it means that the query does not have a pointer into the B-tree, or that you are going against a table that is a heap.
The higher the logical reads, the more that query is hitting plan or data cache from other queries. Depending upon maintenance tasks run on the server, this will change easily from one day to the next.
Read ahead can be both a blessing and a curse. If you have a large number of columns in your set returned that are variable character columns, then you could be taking a performance hit. With data types such as nvarchar and varchar, the SQL Engine does not know how many pages it can exactly pre-fetch as all the rows are potentially different in length – whereas is can estimate with a fixed column of char(50). Because of this, SQL may either read ahead too many or too few pages. In some cases, it is best to take the space hit in setting a fixed character length for often used columns in order to have better performance in read ahead and pre-fetch scenarios.
Tags: performance,
t-sql
Categories: SQL Programming
Jamie Thomson's post T-SQL: Do you use aliases? Sparked a lively discussion inside the comment thread, with several people chiming in on pros and cons of the different ways to use aliases, whether to use them, or if fully qualified names were the way to go. I have my own preferred method which we implemented as standard at my team when I was at Dell (for the curious, it matches Jamie's suggestion).
That being said, books on coding standards are a dime a dozen, and if you work over any given time period on N number of projects, you will encounter just about every method with a devout developer backing up his/her methods as being most productive for a given assignment. Thankfully, the SQL alias argument is a little simpler in scope than naming conventions, error handling, etc. in C#.
I would suggest reading the post and thread, but if you are too lazy, here is a summation of the cogent points against using aliases:
- Using aliases instead of fully qualified names can make large stored procedures hard to read as you have to find the associated alias to know where the column comes from in the database(s)
- Aliases do not convey meaning or source as most developers use a cryptic or non-repeated pattern for aliases, ex: sys.columns uses alias a – what is the association? why a?
- Multiple developers on a single project will use their own rules for aliases, making one letter aliases such as c mean different things in different batches of code
All of these are very direct and considerable points. I think that most of us would agree that the end goal pertinent to this discussion is readable, maintainable and relatively self-documenting code.
In the argument for using aliases instead of fully qualified names, SSMS 2008 helps those worried about incoherent aliases not being able to identify underlying objects (hover over the alias in c.name to get the ToolTip):
You also have to love what SSMS 2008 intellisense does for making coding easier(easily available with the use of an alias, or full object name, whichever you prefer to type):
While this feature may not have an impact on the practices of some of the people in Jamie's thread, it will, at a minimum, make it easier for those of us who do use aliases, and for anyone that inherits a large script.
Tags: t-sql,
practices
Categories: SQL Programming
SQL 2008 RTM it out and you've managed to get a copy installed (legally), but wonder, where are my samples?
Many things are being posted to CodePlex, and this is one more in a long string. Here is the samples home page for SQL 2008: http://www.codeplex.com/MSFTDBProdSamples.
To get your detail flavors by processor type, this page holds the many variations.
Enjoy.
Tags: sql 2008,
code samples
Categories: SQL Server
I almost overlooked the early registration myself, but luckily one of my fellow Catapult colleagues is on the ball and sent around a reminder to all of us.
For those out there interested in attending, you can register here for the 2008 BI conference. Microsoft just extended the early registration discount to August 22nd from this Friday, August 8th. It is a $300 dollar difference, so well worth the extra planning.
BI Conference 2008 Tracks:
- Microsoft BI Platform and Infrastructure
- Microsoft BI Clients and
- Deployment and Best Practices
- Customer and Industry Solutions
- Business Value of Business
- Partner Training Track
Tracks and sessions are listed, but no specific times yet for the individual sessions.
Anyone else going? Leave a comment or send me an e-mail.
Tags: business intelligence
Categories: Business Intelligence
I have been a little lazy in getting this one resolved - however for anyone facing a similar issue in getting this hot fix to install on their SQL 2005 box, there is a simple fix.
For most people I know who have experience this issue, the error message is always the same:
" Error Number 1920, Error Description: MSP Error: 1920 Service 'SQL Server VSS Writer' (SQLWriter) failed to start. Verify that you have sufficient privileges to start system services."
The failure message is based on the fact that many of us do not have the VSS service enabled:
Expecting UAC in Vista to be a potential hindrance, I even tried downloading the package separately, right clicking on it and selecting "Run as Administrator", however that produces the same message, so it appears the hot fix does not try to enable the service itself, rather, it expects it to be enabled and/or running.
If you simply enable this service, the hot fix can be applied and you can then disable the service again if it is not needed.
If you've manually downloaded the hot fix like I do, then you'll get:
Once you apply the hot fix, you will go to version 9.0.3068 from 9.0.3045.
Tags: sql server,
management,
service packs
Categories: SQL Server