Calender
<<  January 2009  >>
MoTuWeThFrSaSu
2930311234
567891011
12131415161718
19202122232425
2627282930311
2345678
Wire Up
Posted on: November 25, 2007, 22:19 by Sid

Pivot is a useful function in 2005, but it still has its limits. When I first attempted to use this new feature, I was a bit disappointed. I have to admit, I usually try something before I read the details on how it works – mostly to see if something works the way I would picture it to work. As you can imagine, pivot is a nice function to use for prepping data for display, ad-hoc reporting, etc. I'll go over a trick here to make it dynamic the way I expected the pivot operator to be.

I first tried something like this:

select    territoryid, OrderDate

from    Sales.SalesOrderHeader h

pivot

    (    count(orderdate) for

        OrderDate in (select OrderDate from    Sales.SalesOrderHeader where OrderDate < '2001-07-10')

    ) p

Quite often, date is the dimension on which you would like to pivot data on – and you do not always know what range you would like. Additionally, if you would like to make this retrieval operation into a stored procedure that accepts a date range as parameters, you would now have a problem in generating a cross-tab data set using the pivot operator. How does a working query look? Here is one:

select * from

(select    territoryid, OrderDate

from    Sales.SalesOrderHeader) h

pivot

    (    count(orderdate) for

        OrderDate in ([2001-07-01], [2001-07-02], [2001-07-03], [2001-07-04], [2001-07-05], [2001-07-06], [2001-07-07], [2001-07-08], [2001-07-09])

    ) p

The first thing to note is that pivot relies directly on table aliases; in the query above I have built a nested select that gives us the first alias table in order to build our data to pivot on. This is the basic format a query will need to be in order to use the pivot function, though you can also make use of CTE to help keep your code neat. The second thing to note is that your pivot columns must be enclosed in the [] brackets for the operation to work.

So how can we make the pivot columns dynamic? The first step is to build a delimited list of the values you wish to pivot on. For this example, we are using the AdventureWorks database (not the dimensional database, the OLTP example). We'll use a trick that worked in SQL 2000 as well to build a result set into a single delimited string using coalesce. Here is how it looks:

select    @StrSql = coalesce(@StrSql + ', ', '')+ '['+convert(nvarchar(10),OrderDate,20)+']'

from        Sales.SalesOrderHeader

where        OrderDate < '2001-07-10'

group by    OrderDate

order by    OrderDate

The output of this looks like:

[2001-07-01], [2001-07-02], [2001-07-03], [2001-07-04], [2001-07-05], [2001-07-06], [2001-07-07], [2001-07-08], [2001-07-09]

Now that we have a way to build our pivot columns string, then is a simple matter of building a dynamic statement to execute:

set @StrSql =

            '

            select * from

            (select    territoryid, OrderDate

            from    Sales.SalesOrderHeader) h

            pivot

                (    count(orderdate) for

                    OrderDate in (' + @StrSqlP +')

                ) p

            '

exec sp_executesql @StrSql

And there you go – a better way to use the pivot function in SQL 2005. As you can image, you can add a lot of different variables to this in order to get a more dynamic result set from your query – the biggest trick is getting your delimted list of pivot columns.

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


Posted on: November 20, 2007, 09:47 by Sid

Developers using Vista have some minor adjustments to make with the new security model in Microsoft's latest operating system. One of the first annoyances you might notice after installing SQL 2005 (I'm running Developer Edition), is that you will get permission errors when trying to create a new database. Luckily, there is an easy way around this using one of the new management features in SQL 2005: the Surface Area Configuration Tool. Simply launch this from All Programs > Microsoft SQL Server 2005 > Configuration Tools > Surface Area Configuration.

You will get the screen below; what we want to do is click on the "Add New Administrator" button.

Our next screen recognizes that we are on a Vista machine. In my case, I have two instances of SQL 2005 Developer edition on my laptop. Select one of both and move it across to your privileges to be granted. Click OK. You are now configured and will be able to create databases on your selected SQL instances.

My laptop, by default, came with SQL Express installed. I had worked on a few items before I noticed the need for Developer edition to test some extended functionality in SQL. After uninstalling SQL Express and installing Developer edition, I went to reattach my databases and got the following error:

This did not make a lot of sense – the SQL service is running under Local System, which has full permissions to the Data folder where all the database files are stored. What appears to have happened is that Vista reset the permissions on the existing files (i.e., my old databases) during the install, and these files did not inherit the permissions from the Data folder after Developer was done installing. Don't be tempted to tell all files to inherit permissions from the Data folder, as it is marked as read only and this would cause you other problems – apply the appropriate permissions to the database files and then you will be able to reattach without issue. It's a strange quirk of Vista, one that XP does not have, but still workable.

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


Posted on: November 15, 2007, 18:56 by Sid

If you have not seen this feature on Live.com yet, then check it out as it is an intersting way to filter or attempt pre-validation of searching for particular topic areas; the Windows Live Gallery is here.

I have created my own search macro for business intelligence -- this is my first stab at using this, so it will get better over time as I add more tags and specific content areas for it to search, but please use and let me know how it works for you. Also, if you have your own site(s) covering BI topics that you feel my search macro is not covering, please e-mail me and let me know.

Here is my Business Intelligence Search Macro, enjoy!

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


Posted on: November 15, 2007, 10:47 by Sid

What is data profiling? The concept is pretty simple – data profiling is a process or set of processes you walk through in order to gather statistics and information about an existing data system, regardless of type, vendor, etc. In profiling, you are building a picture of the existing system so that you can plan and thus meet the goals of the project – what you do during your data profiling will depend on the goals and needs of the project. For example, in versioning an application and its data store, a lot of your profiling might be focused on checking referential integrity and building a plan to improve this for better data quality as customers complained about this in the old system. A warehousing project will most likely not have the scope to touch the applications sourcing the data and thus domain and pattern would be higher on the list for the data profiling efforts.

For some further reading, here is a quick reference to one of Ralph's design tips on data profiling. If you don't already, I would recommend subscribing to the Kimball Group's Design Tips; their website is: http://www.kimballgroup.com.

As you have probably guessed by now, I will focus on data profiling with respect to warehousing and business intelligence, however the same concepts presented here can be useful for those engaging in other project types, i.e. versioning/upgrading an application with a data store, etc. This entry will be the start of my thread on data profiling; subsequent posts will have details and examples from past and current projects as well as some script and report downloads.

Now let's get to our initial question on what is data profiling -- focusing on what data profiling is with respect to a business intelligence/reporting project. If read Kimball link above, one of the major deliverables of profiling he outlines is project feasibility. It sounds simple, and it would seem that everyone would know to do this, however, I have seen a large number of projects proceed to the development phase, have full warehouse and reporting table designs with partial ETL, only to realize that the most important domain data did not meet the project criteria. In a reporting project, the customer is asking you to build a way to measure a process – either because they do not currently have measurements or they want to see the metrics faster, more efficiently … the list goes on. If you have properly mapped out the process or processes the customer is trying to report on, then it you will readily be situated to delve into the data to see if you can build the requested metrics.

This step is one of those that is better seen than described – I will use the AdventureWorks database as a backdrop, describe a process we want to build some reporting around, and show how we might do some basic profiling on the data to determine what, if any, actions that will need to be taken.

For anyone still unsure of where you do your profiling, I will reiterate that this is a planning step – you want to be able to either exit early if the project is not feasible, or know how your scope is increasing/decreasing given the data environment. The work you do may feel like development, but it is essential for building your timeline, particularly for warehousing and intelligence projects where the ETL design will depend on the source data and what is contained within.

I will be following this post up with the concrete examples, please feel free to e-mail or comment.

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


Posted on: November 11, 2007, 18:27 by Sid

Just a quick footnote to go with the SQL Server 2005 Proxies: An Intro posting. It is important to note that any SSIS package that you will use the proxy configuration on, you will need the underlying credential account to have access to the WINNT\temp folder on the SQL server -- if you have script tasks in the package. During execution, a temp compiled file is created that represents the run time executable for your code, and that file is created in the WINNT\temp folder.

Hope that saves some people time!

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


Posted on: November 7, 2007, 10:01 by Sid

I received a couple of questions following my post: Extracting SharePoint Data using SSIS, about how to set the package to run properly given that many WSS lists are secured by NT Group or list of accounts. This led right into a discussion on security and what you could now do in SQL 2005 that was not available in SQL 2000. Both SQL 2000 and 2005 use a proxy account to access resource and run the T-SQL subsystem, however in 2005, you can break out permissions more granularly. The screenshot below shows the different subsystems that are available in a typical SQL 2005 server:

Why go over this? To achieve our goal of pulling the list data from our WSS site and saving it as an XML file, we needed our scheduled job to execute with the proper permissions; if you remember, the line in our script task that set our credentials will inherit the credentials of the executing account – which by default is the SQL Agent Service Account. In my case, the process account running the SQL Agent service does not have permission to our WSS list. To get our script task to perform properly, one of my choices would be to request access to the list for the SQL Agent process account – this would work fine in some situations, however in larger environments, you end up with one account that has access to many, many different bits of information and you have not tiered information access and security in any way. What I will walk through is setting up a proxy that an SSIS package can use during execution in a scheduled job. Our end state will be the Run as selection box below having more than one option:

Our first step will be to create Credentials that our proxy will use. To do this using SMS, under ServerName > Security > Credentials, right click and select New Credential. Fill in the information for the account you will be using. You can also create the credential from SQL script: CREATE CREDENTIAL MyNewCredential WITH IDENTITY = 'TestUser', SECRET = 'TestUserPassword'.

After you have create the credential, in SMS expand the options below SQL Server Agent > Proxies. You will see the list from Figure 1. We are going to create a proxy for SSIS Package Execution: right click and select New Proxy. Notice that in the New Proxy Account window that you can select multiple SQL subsystems for a given proxy.

After you create the proxy, we will now have our selection in our Job Step as we stated in our goal. Note that the selections you made for which subsystems the proxy is active for will also determine when it shows in the Run As box below; in this example, we gave our proxy access to the SQL Server Integration Services Package subsystem, and thus we get the proxy as an option below as they Type matches. If the proxy does not have access to a subsystem that you have chosen as your type in the Job Step, then it will not appear as an available option for you.

This covers the basics on how and why you might want to create a proxy; be sure to post if you have any more detailed questions.

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5