Calender
<<  January 2009  >>
MoTuWeThFrSaSu
2930311234
567891011
12131415161718
19202122232425
2627282930311
2345678
Wire Up

This all started off in trying to help out a fellow SSIS enthusiast, and now I have made a complete project as a sample….. catch the MSDN forum thread here.

There are many reasons why you would want to remap source columns to destination at runtime – I would not necessarily recommend this type of solution for some of your larger and more impactful data imports, but there are many cases where being able to modify the mapping metadata at runtime can be incredibly useful.

I have posted the project example in the files directory on my blog, you can download the Visual Studio 2008 project here – keep in mind that I did this in VS 2008 Team System Architecture Edition, so if you don't have this version, then you might have an issue with the unit test project attached.

This should not pose a problem for many though, as the only reason I threw the unit test in there was to provide a simple and lazy way to call into the class and perform the remapping.

This example demonstrates the following:

  • Load an existing SSIS package and indentify the source and destination components
  • Instantiate the source and destination components to read their column metadata
  • Remap a source column to a destination column
  • Save the modified SSIS package

When you open the SSISExamples solution, build it, and then either copy the DynamicRemap.dtsx file from the bin directory to your c:\ root or modify the PackageToLoad string in the CreatePackage class.

Once you call into the CreateAndRunPackage method, you will end up with a copy of the package in your root named PackageModified.dtsx.

Please feel free to e-mail me with any questions. I'll try to make a few improvements to the example when I have free time over the coming holiday, but for now, enjoy!

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


Posted on: November 7, 2008, 11:16 by Sid

I've been trying to be a good citizen and post more in the SQL Server section of the MSDN forums, and so I came across a question in the SQL Server > SSIS section about start-up performance that made me realize not a lot of people are aware of how a package initially executes.

Since Kirk Haselden was my original source for some of this information so I will point to his post, Caching in The SSIS Service, for anyone to reference.

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


Posted on: September 19, 2008, 13:07 by Sid

 

Kevin Idzi of Microsoft was kind enough to leave a comment on my Extracting SharePoint Data using SSIS post informing me of Integration Services enhancements posted on CodePlex.

I had been delaying posting as I wanted to have a full sample showing it in use, but between work and buying a new house recently, I have not had the time to put together my virtual machine images for a complete demo.

The community sample has the following components available to download (in separate install packages):

  • XML Destination Pipeline component
  • Regular Expression Flat File for regular expression based flat file parsing
  • Delimited Flat File Parser, for files with rows that are missing column fields
  • Package Generation Sample
  • Hello World Sample for demonstrating using the UI API
  • SharePoint List Source and Destination – this component was written by Kevin
  • WMI Source Component

All of these components are available for download here.

The base requirements for installing these components are .NET 3.5 and SQL 2008 Integration Services.

Once you run the MSI, you will be able to select the two new adapters from the "Choose Toolbox Items" dialogue:

One of the nice features that Kevin included in this adapter is the ability to execute CAML queries against the SharePoint List:

 

 

If you do not know how to write CAML or are not good at it like I am, then use the U2U CAML Query Builder written by Patrick Tisseghem; I've got a quick tutorial here: CAML Query Builder.

If you have any feedback about using this SharePoint List Adapter with SSIS, please be sure to leave comments or suggestions on the CodePlex site for Kevin and his team.

Currently rated 5.0 by 1 people

  • Currently 5/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


Posted on: June 23, 2008, 15:10 by Sid

It is hard enough to manage your data repository, modify it to meet new business needs, and add to that: keeping documentation and metadata up to date and relevant for your consumers.

Several friends of mine who are Informatica fans let me know on a repeated basis that the data manager in the Informatica Suite helps them out here; while functional; I was never fully impressed with the feature set. I showed my colleagues the SSIS API and they were blown away with how much untapped information sat inside SSIS.

The next question – why hasn't Microsoft put any of the metadata features directly into the SSIS tool?

While I cannot answer that, I do know that Microsoft has a very active user community and has supported this exact type of development. It is, in fact, nothing new and you can get the SQL Server 2005 Metadata white paper from Microsoft off of the download center, available from December 19, 2005 forward.

The trick though is the samples – for some reason, they seem to have an issue in keeping the sample projects online, so I have made it available from my site as well so insure that it is available to the general public. (See below)

What the MSI contains:

  • DependencyAnalyzer.exe – Tool that evaluates and loads into a database the lineage on SSIS packages, Analysis Services and SQL Server. All the source code for this program is provided.
  • DependencyViewer.exe – A tool that lets you graphically see the dependencies and lineage of objects in the lineage repository. Source code is provided for this program.
  • Data Source View – A DSV that connects to the lineage repository (SSIS META database) that can be used by Reporting Services.
  • Lineage Repository – A database called SSIS_META that can be used to house metadata from nearly any system.
  • Reports – Some standard reports for impact analysis studies. You will find two key reports out of the box with several sub-reports.
  • Report Model – A report model that you can use with Report Builder to allow end-users to create ad-hoc reports.
  • Integration Services Samples – A few sample packages to start auditing and viewing lineage on.

For anyone who reads from the Kimball line, the ideas presented in this paper look familiar, particularly if you have read the Microsoft Data Warehouse Toolkit book. In chapter 13 of this book, they discuss a metadata plan for a warehouse -- the website has extra materials: sample scripts that make use of the extended properties and reporting services reports to pull out metadata repository information.

I encourage reading both the whitepaper and the book, particularly the Kimball book as the Whitepaper assumes that you know the importance of metadata in your warehouse. The Kimball Microsoft Warehouse book is great in that it takes the concepts that they have developed over the years and applies it directly to the Microsoft stack.

SQL 2005 Metadata Samples (.msi, 311 kb)

Feel free to contact me for any questions. Thanks!

Currently rated 3.0 by 1 people

  • Currently 3/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


Posted on: February 29, 2008, 17:58 by Sid

Yes, it has been too long since I last posted – My wife and I are in the middle of selling our house, so it has been a busy month doing little projects here and there. For anyone doing ETL projects or working on a BI project and in the ETL phase, auditing is a major part of your work. In this instance, I am specifically looking at physical statistics on a table in my database. Here is a simple schema for a table to track these statistics:

CREATE TABLE [dbo].[TrackTableProcessLog]

(       [TableProcessKey] [int] IDENTITY(1,1) NOT NULL,       [ProcessLogKey] [int] NOT NULL,       [Object] [varchar](250) NOT NULL,       [ObjectDatabase] [varchar](75) NOT NULL,       [CountExtractRows] [int] NULL,       [CountInsertRows] [int] NULL,       [CountUpdatedRows] [int] NULL,       [CountDeletedRows] [int] NULL,       [CountErrorRows] [int] NULL,       [CountInitialRows] [bigint] NULL,       [CountFinalRows] [bigint] NULL,       [RecordInsertDate] [datetime] NOT NULL CONSTRAINT [DF_TableProcessLog_RecordInsertDate]  DEFAULT (getdate()), CONSTRAINT [PK_TrackTableProcessLog] PRIMARY KEY CLUSTERED (       [TableProcessKey] ASC)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)

As you can see, I am tracking counts on several different items for my table objects in my database. Most of the counts are derived directly from the ETL process, inside my SSIS package, however, two of the counts are not derived from the ETL action itself: CountInitialRows and CountFinalRows. Both of these are counts of the rows in my table object before and after the ETL operation – I need this to be exact as this table is used for auditing purposes and all the numbers need to line up properly (or not, so that I can be alerted of an issue with the ETL process).

Thankfully, with SQL 2005, there are several ways to readily grab these figures, even on very large tables, without resorting to “select count(*) from …”. I will illustrate the several methods available, and you can pick which one you like best. Just as a note, each method returns the same result – I have run large data operations against some target tables and test that each method returns the same answer, so thus far, any or a combination thereof should return the same result.

First, let’s look at the system stored procedure: sp_spaceused. In executing this stored procedure with a table object as the parameter, your output will be: name, rows, reserved, data, index_size and unused. In this case, I just need rows, so while the other information is nice, it is unnecessary. The procedure also performs some other steps, so if I am shooting for efficiency in my data movement processes, then I want to eliminate any extraneous querying.

If you break open sp_spaceused, you will find that it uses the dynamic management view sys.dm_db_partition_stats to deliver the row metric. If you want to use this dynamic management view, you will need to be sensitive to the fact that a single table object can live on multiple partitions – and your indexes can live on other partitions as well. To query this table, you will need to sum the row_count column for a given object_id and where the index_id value is less than 2; this is necessary to exclude counts from any non-clustered indexes related to a table object.

Row count data is also available in sys.sysindexes – to get this data you will filter on the id column = object_id of your target object, however this table is not necessarily purposed for this information, so while possible, I would recommend using one of the other methods.

The last way to conveniently access row count information is through sys.partitions table. The filter for this data is the same as the dynamic management view; you will need to sum across all rows for a specific object_id and include only rows where the index_id is less than 2.So all in all, here are several methods to use to retrieve row count information for your table object. Each method provides sub-second response times for my ETL operations, allowing me to conveniently gather statistics about my tables as I populate my warehouse.Please feel free to post or e-mail with any questions!

References:

Sys.partitions: http://msdn2.microsoft.com/en-us/library/ms175012.aspx

Sys.dm_dm_partition_stats: http://msdn2.microsoft.com/en-us/library/ms187737.aspx

Table and index organization: http://msdn2.microsoft.com/en-us/library/ms189051.aspx  

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


Posted on: January 23, 2008, 13:01 by Sid

Recently a package of mine with a good running history suddenly started throwing errors. This package had been scheduled and running just fine for over a month – no changes to the code had been made. The error is consistently thus:

The script threw an exception: A deadlock was detected while trying to lock variables MyVariables for read access and variables MyVariables for read/write access. A lock cannot be acquired after 16 attempts. The locks timed out.Task MyTask failed.

Luckily, I am not the only one who has run into this issue, and it has now inspired a new practice for me in designing Script Tasks inside my SSIS packages. Previously in one of my posts, I had shown the following screenshot and described the standard method for using read only and read/write variables in a script task:

For many situations, this will probably work out just fine. However, my variable locking issue has evolved from my use of a Foreach container that holds my Script Task. There are other examples in my package store not involving a Foreach that also throw this error, however all of my packages that consistently throw this error all have in common the use of a Script Task inside a repeated process – for some reason, SSIS is not properly releasing the variables between runs of the step.

Daniel Read has written a detailed method on how to work around this; you may find his post here: http://www.developerdotstar.com/community/node/512/

Jamie Thomson also has a post over the same issue – he actually has some environmental information on what updates/patches could have been the culprit, but it is not entirely conclusive. Here is his post: http://blogs.conchango.com/jamiethomson/archive/2007/08/28/Beware-of-variable-usage-in-script-tasks.aspx

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5