*If necessary
Paging data in your application, be it thin or thick client, is a frequent need: list of customers, address directory, invoice details, etc. – any number of situations may crop up where the paging grid in the UI becomes a necessity.
If your data is big enough to be paged in your display, then, for the most part, it is a large enough set that you are not advised to pull all of the set back at once to be cached by the UI, particularly if your storage medium is SQL Server. A lot of you are probably familiar with using row_number, the OVER statement and maybe even a CTE to resolve ordering the data and grabbing groups at a time for the application. But what do you do if you need to support paging, and you are not sure what the table is going to be?
I will point out that the title is asterisked, my caveat to this whole discussion is: should this be necessary. Beside that point though, it is a relatively interesting issue. How would you implement a paging query when you don't know what table you might want, or when the schema might be unknown to you? (Again, this should not be used widespread, but scenarios can come up where this makes sense).
The first requirement in this scenario is getting table meta-data back to your application; in SQL 2005 and above, this is a simple matter of querying sys.tables. After you have the table list, the other two variables to declare in this scenario are your page size and what page in the paged data you desire. The full script is provided below – most of it is pretty straight forward, except that you will notice that I pull a little stunt in getting the column(s) to do the order by on for the row_number and over functions.
Given that we need something to order this data by, and we don't know the schema, I query the information_schema tables to get the tables primary key column or columns – if none, it will just take the first column. Not the most advanced, but then again, this script shoots blind – which is the point of this example.
The other little bonus is that it also removes column types from the query that would likely choke an auto-column generating .Net control – varbinary, xml, etc. are excluded in the columns dynamically selected from the table parameter – to get a closer look at this, you can modify the script and print the @TableSelectColumns variable.
I would be interested to hear how others might apply this scenario, so if you are up for it, post how you used this or a similar script!
declare
@TableObjectId int, @PageSize int,
@PageIndex int, @TableRows int,
@TableName nvarchar(75), @KeyOrderColumn nvarchar(50),
@StrSQL nvarchar(4000), @PageRowStart int,
@PageRowEnd int, @TableSelectColumns nvarchar(2000),
@Params nvarchar(50)
-- Set the parameters
set @TableName = Your Table Name Here
set @PageSize = Your Page Size Here
set @PageIndex = Your Page Index here
set @StrSQL = 'select @columnsOUT = coalesce(@columnsOUT + '','', '''')+ ''[''+c.name+'']''
from sys.columns c
inner join sys.types t on t.system_type_id = c.system_type_id
where c.object_id = OBJECT_ID(''' + @TableName + ''')
and t.system_type_id not in (34,35,98,99, 165,173,241,231)'
set @Params = '@columnsOUT nvarchar(2000) output'
exec sp_executesql @StrSQL,@Params, @columnsOUT = @TableSelectColumns output
if(@PageIndex < 2)
set @PageRowStart = 0
if(@PageIndex > 1)
set @PageRowStart = ((@PageIndex * @PageSize) - @PageSize)
set @PageRowEnd = (@PageIndex * @PageSize)
set @TableObjectId = (select OBJECT_ID(@TableName))
-- select the key column(s)
select @KeyOrderColumn = coalesce(@KeyOrderColumn + ', ', '')
+ '['+convert(nvarchar(25),B.COLUMN_NAME,20)+']'
from INFORMATION_SCHEMA.TABLE_CONSTRAINTS A
, INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE B
where CONSTRAINT_TYPE = 'PRIMARY KEY'
and A.CONSTRAINT_NAME = B.CONSTRAINT_NAME
and object_id(A.TABLE_NAME) = @TableObjectId
-- Grab the first column since we have no key column declared
if(len(@KeyOrderColumn) < 1 or @KeyOrderColumn is null)
begin
set @KeyOrderColumn = ( select name from sys.syscolumns where id = @TableObjectId and colorder = 1)
end
set @TableRows = (select rows from sys.partitions where object_id = @TableObjectId and index_id < 2)
if(@PageSize > @TableRows)
begin
set @StrSQL = 'select ' + @TableSelectColumns + ' from ' + @TableName + ' order by ' + @KeyOrderColumn
end
if(@PageSize < @TableRows)
begin
set @StrSQL = 'select ' + @TableSelectColumns
+ ', RowNumber into #Results from (select *, row_number() over(order by '
+ @KeyOrderColumn + ' ) as RowNumber from '
+ @TableName + ') SortTable where RowNumber > '
+ cast(@PageRowStart as nvarchar) + ' and RowNumber < '
+ cast(@PageRowEnd as nvarchar)
+ ' alter table #Results drop column RowNumber '
+ ' select * from #Results'
end
exec sp_executesql @StrSQL
Tags: paging,
t-sql
Categories: SQL Programming
I have been meaning to post this for a while, but you know how work gets in the way….
This came out in some of the Microsoft newsletters and as a poster in SQL Server magazine: www.getonthecase.com
The sessions offered online are as follows:
- Working with asynchronous data in SQL Server 2008
- SQL Server 2008 and ADO.Net entity framework integration
- Uncovering T-SQL on SQL Server 2008
- Deploying managed code to SQL Server 2008
- Communicating with SQL Server 2008 using HTTP
- Working with unstructured data in SQL Server 2008
- Developing for SQL Server 2008 using Visual Studio
- Working with SQL Server 2008 and disconnected clients
For anyone who loves web design, the site is also an incredibly cool exhibit of SilverLight 2.0 programming.
Tags: training,
sql 2008,
sql server
Categories: SQL Programming
In my post, Why use aliases?, I spoke in favor of using a certain syntax in your SQL programming, particularly because SQL Server 2008 gives you almost no reason not to with integrated Intellisense.
One little note for those who may have run into a frustration while developing – Intellisense in SQL 2008 does not always refresh timely. This can be particularly annoying if you are in the middle of developing, make some schema changes and then try to write a script – intelllisense will not prompt you with the correct values, or tell you that objects and schemas do not exist.
Only two known ways to resolve this for now:
- Main Menu: Edit > Intellisense > Refresh Local Cache
- Or, shortcut keys: CTRL + Shift + R
Tags: sql 2008,
t-sql
Categories: SQL Programming
If you haven't picked up on a theme yet in my posts, then shame on you for not reading everything on this site….. To be frank, I love working with performance issues on databases (mostly from a design and code perspective, my hardware skills are a little out of date, but I try to keep up).
This code snippet will be just one more for your tool belt in evaluating performance and looking for optimizations. The script below will give you the name of the statistic and the table column it is associated with for a given database. Stats names that start with PK are usually going to be your primary keys whereas names that are similar to WA_Sys are going to be system generated statistics for your objects.
select object_name(s.object_id) as TableName
, s.name as StatisticName
, COL_NAME(s.object_id, sc.column_id)as ColumnName
from sys.stats s
inner join sys.stats_columns sc
on s.stats_id = sc.stats_id
and s.object_id = sc.object_id
where OBJECTPROPERTY(s.object_id, 'ISMSShipped') = 0
order by object_name(s.object_id)
With this output, you can generate details about specific stat objects (the StatisticName output column from above). The output will give you three result sets:
- First result set is general information, such as update date, number of rows in the table, rows sampled for the statistics, etc. The step value in here should be equal to the number of rows in your third set, which is the detailed histogram data for this specific index.
- If DENSITY_VECTOR is specified, then the second result set will describes the columns returned in the result set
-
Third result set is the detailed data for every step in the statistic. From books online:
- RANGE_HI_KEY: Upper bound value of a histogram step
- RANGE_ROWS: Estimated number of rows from the table that fall within a histogram step, excluding the upper bound
- EQ_ROWS: Estimated number of rows from the table that are equal in value to the upper bound of the histogram step
- DISTINCT_RANGE_ROWS: Estimated number of distinct values within a histogram step, excluding the upper bound
- AVG_RANGE_ROWS: Frequency or average number of duplicate values within a histogram step, excluding the upper bound (RANGE_ROWS / DISTINCT_RANGE_ROWS for DISTINCT_RANGE_ROWS > 0)
For specifics on strings, Karen Delany wrote this article for the http://www.sqlcommunity.com website.
Tags: performance,
t-sql,
statistics
Categories: SQL Programming |
SQL Server
A friend and co-worker of mine operates his own website for a business he has started. We have talked a lot about database design for performance, query stats and so on, but one of his difficulties is that he is currently using a hosting service for both web and database, thereby limiting his ability to access the SQL Server and really make some underlying performance enhancements – lucky for him, his site is growing, so he should be able to change that soon.
In hosted situations, IO is going to suffer. For most basic plans, you are sharing a SQL Server with many, many, many other people and you will have absolutely no say in how that server is maintained, operated and configured. I hate to say it, but very few hosting providers have done a good job providing SQL service. I have used and interacted with over 20 of the larger host providers over the past two years, and of those, only three even kept their SQL Servers adequately patched.
Disk setups are basic in these situations, not that I am blaming these guys, you get what you are paying for, and most people just need a basic service – it's is really only a pain point for those who have a good product and are experiencing growth and the performance pain with that growth – so onto the point of this post:
In a hosted situation, your options to debug for IO performance is limited as you cannot run traces or even know the disk configuration, but what you can do is debug your queries inside of SSMS to understand the IO statistics given the configuration you are using. By using SET STATISTIC IO ON, you are telling SQL Server to print information in the messages tab:
So what does this information mean and how can you use it? From Books On Line:
- Table: Name of the table.
- Scan count: Number of index or table scans performed.
- logical reads: Number of pages read from the data cache.
- physical reads: Number of pages read from disk.
- read-ahead reads: Number of pages placed into the cache for the query.
- lob logical reads: Number of text, ntext, image, or large value type (varchar(max), nvarchar(max), varbinary(max)) pages read from the data cache.
- lob physical reads: Number of text, ntext, image or large value type pages read from disk.
- lob read-ahead reads: Number of text, ntext, image or large value type pages placed into the cache for the query.
Scans are bad – if this number is large for one of your queries, you will want to look at how you wrote it, are you using aliases so SQL is not doing lookups, are your where clauses and joins hitting indexes? The number of scans will directly impact performance as it means that the query does not have a pointer into the B-tree, or that you are going against a table that is a heap.
The higher the logical reads, the more that query is hitting plan or data cache from other queries. Depending upon maintenance tasks run on the server, this will change easily from one day to the next.
Read ahead can be both a blessing and a curse. If you have a large number of columns in your set returned that are variable character columns, then you could be taking a performance hit. With data types such as nvarchar and varchar, the SQL Engine does not know how many pages it can exactly pre-fetch as all the rows are potentially different in length – whereas is can estimate with a fixed column of char(50). Because of this, SQL may either read ahead too many or too few pages. In some cases, it is best to take the space hit in setting a fixed character length for often used columns in order to have better performance in read ahead and pre-fetch scenarios.
Tags: performance,
t-sql
Categories: SQL Programming
Jamie Thomson's post T-SQL: Do you use aliases? Sparked a lively discussion inside the comment thread, with several people chiming in on pros and cons of the different ways to use aliases, whether to use them, or if fully qualified names were the way to go. I have my own preferred method which we implemented as standard at my team when I was at Dell (for the curious, it matches Jamie's suggestion).
That being said, books on coding standards are a dime a dozen, and if you work over any given time period on N number of projects, you will encounter just about every method with a devout developer backing up his/her methods as being most productive for a given assignment. Thankfully, the SQL alias argument is a little simpler in scope than naming conventions, error handling, etc. in C#.
I would suggest reading the post and thread, but if you are too lazy, here is a summation of the cogent points against using aliases:
- Using aliases instead of fully qualified names can make large stored procedures hard to read as you have to find the associated alias to know where the column comes from in the database(s)
- Aliases do not convey meaning or source as most developers use a cryptic or non-repeated pattern for aliases, ex: sys.columns uses alias a – what is the association? why a?
- Multiple developers on a single project will use their own rules for aliases, making one letter aliases such as c mean different things in different batches of code
All of these are very direct and considerable points. I think that most of us would agree that the end goal pertinent to this discussion is readable, maintainable and relatively self-documenting code.
In the argument for using aliases instead of fully qualified names, SSMS 2008 helps those worried about incoherent aliases not being able to identify underlying objects (hover over the alias in c.name to get the ToolTip):
You also have to love what SSMS 2008 intellisense does for making coding easier(easily available with the use of an alias, or full object name, whichever you prefer to type):
While this feature may not have an impact on the practices of some of the people in Jamie's thread, it will, at a minimum, make it easier for those of us who do use aliases, and for anyone that inherits a large script.
Tags: t-sql,
practices
Categories: SQL Programming
With the introduction of the varchar(max) and nvarchar(max) data types, it would seem that one would be safe to use T-SQL and T-SQL functions for large string manipulation. Hold tight though, there are a some instances where this seems to break. Take the replace function for example. Run the code below:
<
You will get the following error:
I'll save you a little time and explain what I see happenning. Replace works with varchar or nvarchar max types for the first parameter, however, it appears the second parameter, or search function, does not accept anything greater than nvarchar(4000) or varchar(8000). If you simply change the replicate on @str1 to replicate 8000 or less times , then you do not even have to recast the varchar(max) to varchar(8000) and it will work.
Interesting behavior and just a little food for thought when consider your large text data types.
Tags: t-sql
Categories: SQL Programming
I found an old USB memory key of mine in an old work bag and plugged it up to see what lost treasures might be on it -- it's a small purvue into old thoughts and coding patterns. One of the interesting things I found among the different code snippets was a mortgage calculator I had written a few years back when my wife and I had bought our first house. It prints your whole schedule, principal and interest and shows the current value of balance at the specific points in time. I never expanded it beyond the typical fixed period, so now that I found it again, I'll probably look at making it more dynamic in so far as allowing you to do schedules for ARM and expected rate increases/decreases.
There are a couple of values to set in order to get your schedule printed out:
-
Quoted Interest Rate
-
Compounding Period your bank uses, most will use months, so this value would be 12
-
Loan Amount
-
Amoritization Period -- this needs to be in the same format as the compounding period, which is months
-
Mortgage Start Date - -when you first start paying the loan
Here's the code, have fun:
/*** Loan Calculation Variables ***/
@InterestRate FLOAT, @Loan FLOAT, @AmoritizationMonths FLOAT, @Payment FLOAT, @Period FLOAT,
/*** CALCULATED PROCEDURAL VARIABLES ***/
@Payment2 FLOAT, @TotalPayment FLOAT, @FinanceCharges FLOAT, @CompoundingPeriod FLOAT,
@CompoundingInterest FLOAT,
/*** CALCULATED LOAN VARIABLES ***/
@CurrentBalance FLOAT,
@Principal FLOAT,
@Interest FLOAT,
/*** MORTGAGE TIME VARIABLES ***/
@MortgageStartDate SMALLDATETIME, @MortgageEndDate SMALLDATETIME,
@MortgagePayDate SMALLDATETIME, @MortgageDueDate SMALLDATETIME
/*** USER VARIABLES ***/
SET @InterestRate = 0.0719
SET @CompoundingPeriod = 12
SET @Loan = 100000
SET @AmoritizationMonths = 360
SET @MortgageStartDate = '2005-2-01'
/*** END USER VARIABLES ***/
SET @CompoundingInterest = @InterestRate/@CompoundingPeriod
SET @Payment =
ROUND(
(
((@InterestRate/12) * @Loan)/(1- ( POWER( (1 + (@InterestRate/12)),(-1 * @AmoritizationMonths) )))
),2)
SET @TotalPayment = @Payment * @AmoritizationMonths
SET @FinanceCharges = @TotalPayment - @Loan
if exists(select object_id from tempdb.sys.objects where name like '#Schedule%')
begin
drop table #SCHEDULE
end
create table #SCHEDULE
(
PERIOD INT
,PAYDATE SMALLDATETIME
,PAYMENT MONEY
,CURRENT_BALANCE MONEY
,INTEREST MONEY
,PRINCIPAL MONEY
)
SET @Period = 1
SET @MortgageEndDate = DATEADD(year,@AmoritizationMonths/12,@MortgageStartDate)
SET @MortgagePayDate = @MortgageStartDate
BEGIN
WHILE (@Period < = @AmoritizationMonths)
BEGIN
SET @CurrentBalance = ROUND (
@Loan * POWER( (1 + @CompoundingInterest) , @Period )
- ( (@Payment/@CompoundingInterest) * (POWER((1 + @CompoundingInterest),@Period ) - 1) )
,2)
SET @Principal = CASE
WHEN @Period = 1 THEN ROUND((@Loan - @CurrentBalance),2)
ELSE ROUND((SELECT CURRENT_BALANCE - @CurrentBalance
FROM #SCHEDULE WHERE PERIOD = @Period -1),2)
END
SET @Interest = ROUND(ABS(@Payment - @Principal),2)
SET @MortgageDueDate = @MortgagePayDate
INSERT #SCHEDULE(PERIOD, PAYDATE, PAYMENT, CURRENT_BALANCE, INTEREST, PRINCIPAL)
SELECT @Period, @MortgageDueDate, @Payment, @CurrentBalance, @Interest, @Principal
SET @Period = @Period + 1 SET @MortgagePayDate = DATEADD(MM,1,@MortgagePayDate) END
END
SELECT * FROM #SCHEDULE
Tags: t-sql
Categories: SQL Programming