Posted on: October 30, 2007, 16:44 by Sid

One important follow up note -- in my post Extracting SharePoint Data using SSIS, I use a script task to do the bulk of the work. If you follow this example and extend it to use variables (or any script task using variables), there is an important and often overlooked step to enable the script task to use your package variables. On the Script Task editor, on the Script Tab, there are two properties that you can set:

  • ReadOnlyVariables
  • ReadWriteVariables

It is easy to overlook these two properties, both take a comma separated list of the variables that are in scope for the script task (another point, make sure your variables have proper scope, which trips a lot of developers up when trying to have certain control flow items access a variable). Here is a screen shot of what a set variable on your script task looks like:

With ListViewGUID in the ReadOnlyVariables property, I can now access this item inside my Script Task:

spView = Dts.Variables("ListViewGUID").Value.ToString()

Currently rated 5.0 by 1 people

  • Currently 5/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


Posted on: October 25, 2007, 04:04 by Sid

!!!!   Follow-up to this posted September 19, 2008: SQL 2008 SSIS Adapter for SharePoint from Microsoft Dev team posted to Codeplex !!!!

 

SharePoint and WSS are fantastic in that almost everything is driven by Web Services. For the data and reporting gurus, our job is a lot easier when it comes to extracting data from the mini-applications and workflows that the business community creates inside a WSS or SharePoint Portal framework – though not completely straight forward, so that is what we’ll discuss today: using SSIS to extract data from WSS lists using the out of the box WSS web services.

 

In this example, we have a user community that has data in a WSS list that needs to be extracted so it can be reported on. The data in this list represents a collection of information about a business process.

 

One of our major limitations in using out of the box SSIS features is the Web Service Task. If you look at Books Online, article: http://msdn2.microsoft.com/en-us/library/ms140114.aspx, the first note indicates that the Web Service task supports primitive types only. If you look at the WSDL for our particular method we’ll be using, GetListItems, you can see from the WSDL below that we will have a problem:

 

To get around this we’ll use a proxy since the Web Service task is the only item currently that can use a web reference. For our first step, we will use the same tool the VS 2005 uses when you add a web reference to a web or windows project: wsdl.exe. We’ll create our class by executing the following from the command line (wsdl.exe should be located in your C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\Bin directory):

 Wsdl.exe /language:VB /out:SharePointListService.vb http://SharePointURL/TargetSite/_vti_bin/lists.asmx?wsdl 

Note that you will need to do this again for any sub-site that you are targeting for your import as part of the WSDL description is the location of the service.

 

Our SSIS package will have three simple control flow steps:

-         Script Task

-         XML Task

-         Data Flow Task

Script Task

After you run this command, open your newly created vb file (I normally code C#, but the SSIS script task only uses VB). In your script task, we are going to make a new class reference. Right click on the solution, go to Add > Add Class. You can use the same name as newly created file from wsdl.exe or something that will more appropriately fit given your project. Once you create the new class file, erase any text/code automatically generated and then paste the content of your wsdl.exe outputted file into this code file in your project. Add references to the System.XML and System.Web.Services.

After you have completed this step, you are ready to consume your new class and mimic the Web Service SSIS task.

 

Here is sample code from the default ScriptMain file. My code in this sample uses the proxy class Lists (generated by wsdl.exe) to call the service, pass the necessary parameters and then save the output to an xml file. Note that for the view name for a list, you have to pass the GUID. It seems that the method was not set to return values using the common view name.

One of the easiest ways to grab the GUID is to go to modify the view you want to use to pull the data from and look in the URL, the portion of the query string after View= has the GUID for this view.

Another way to do this is to use the Web Service Task against the Views web service in WSS/SharePoint, call the method GetViewCollection, pass the List name as a parameter and you can get an XML output with all the views and GUIDs associated with a list item.

 

After implementing the code above, your Script Task is complete and your output will be an XML file of all the data from the list given the view you hit. An important point to note is to set your rowLimit parameter. The others can be blank or null, but if the rowLimit is not set, then by default you will only get the data that displays in the default view paging size, which is usually 100 records. By setting it to a figure larger than the collection, you will get the entire collection of records.

 

XML Task

 

The SSIS XML Source in the Data Flow section of your project does not like XML files that have multiple namespaces, which is almost a guarantee when you are retrieving data from a SharePoint List.

 

The XML Task is pretty basic in that it strips the extraneous namespaces from the XML file. I got this from Ashvini Sharma’s blog when trying to solve the same problem in hitting RSS feeds for a real estate project. Here is the link to his post:

http://sqljunkies.com/WebLog/ashvinis/archive/2005/05/25/15654.aspx

 

Right click on the XML Task, choose edit, and then we will set the properties on the task as such:

 

 

Under the SecondOperand property, paste the following XSLT:

 

 <?xml version="1.0" encoding="utf-8" ?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="no" />

<xsl:template match="/|comment()|processing-instruction()">

<xsl:copy>

<xsl:apply-templates />                        

</xsl:copy>           

</xsl:template>           

<xsl:template match="*">                                    

<xsl:element name="{local-name()}">                       

<xsl:apply-templates select="@*|node()" />                       

</xsl:element>           

</xsl:template>           

<xsl:template match="@*">                       

<xsl:attribute name="{local-name()}">                                   

<xsl:value-of select="." />                        

</xsl:attribute>           

</xsl:template>

</xsl:stylesheet>

 

I have the FileConnection set to the file that my first task outputted, and the operation saves the results so that I end up with a clean version of my outputted WSS/SharePoint list data.

 

 

Data Flow Task

 

Setting up the import can be one of the more painful tasks, depending on the number of columns you have on the view and list your are hitting. The XML that you will have after pulling the data will have GUIDs for all of the user defined columns in your list, which probably is where most of your data is.

To know exactly how the GUIDs in your XML export map up to columns on your list, you can export the schema for SharePoint. In the Lists web service, hit the method GetList and pass it the name of the List you are using. Your output will have the schema of the list. Here is a sample for one column:

 

 <Field ReadOnly="TRUE" DisplayName="TestColumn" Name="xd__x007b_8CEEC711_x002d_0B7C_x002d_4C6A_x002d_9F82_x002d_5584303A997F_x007d_" XName="{8CEEC711-0B7C-4C6A-9F82-5584303A997F}" Node="/my:myFields/my:TestList/my:TestColumn" Type="Text" ColName="nvarchar18" /> 

The XName element has the GUID for the column that will show up in the data export. You can use the output from the GetList method to build mappings for your export.

 

Your Data Flow Task at a minimum will have two objects: an XML Source and your OLE DB Destination. I would recommend putting a Data Conversion object between the two – that way you can save on space on the insert into the database and have a way to redirect failed rows for analysis should you get something really strange in your incoming dataset.

  

That rounds it out; please feel free to post any questions.

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


Posted on: October 12, 2007, 20:47 by Sid

Derived Columns are a great feature in adding new data or metadata to your transforms in SSIS. In this example, I will show you how I used the Slowly Changing Dimension transform against the Customers table in the Northwind database to build a Type II transform. There are several choices you have when marking your active records; my preference is to sue three columns: a start date, and end date and an active flag.

In the SSIS SCD Wizard, you will get this screen in choosing how to mark records that are in transition. I use the wizard to set my bit flag and will later show you how I use the derived columns to set the dates:

 Fig. 1

Here's a quick screen grab of the entire Data Flow Task, our first step is to modify the item highlighted:

 Fig. 2

Which we will set thusly; note the second entry for RecordStartDate, which is my manual input. The wizard handled adding the derived column information for the IsCurrent column:

 Fig. 3

And for the final step, we edit the OLE DB Destination object (called "Insert Destination" in transform):

Fig. 4

Next, we'll edit the secondary branch in the SCD transform -- the branch that deals with changed records. From the overall transform map in Fig. 2, this is the branch on the right side of the picture. The Derived Column object details will look the same as Fig. 3 except that you will name the column RecordEndDate. After editing the Derived Column object, you will next modify the OLE DB Command object, which in Fig. 2 is very blandly called OLE DB Command. There are two places that you will edit this, the first being on the Component Properties table, under the SQLCommand property. You will then set the SQL as in Fig. 5:

After editing the SQL, go to the Column Mappings tab. Here is what it will look like before you edit:

And here is the after picture:

The parameter mappings are done in order of appearance in the SQLCommand property, thus CustomerID moves to the last parameter and RecordEndDate takes over the second spot.

And that's it! You have not set your Slowly Changing Dimension transform to use three columns to indicate the state of a Type II record. I find that using the bit flag for indicating a record's active state works great for performance when querying the system to always get current attributes for a record set. The dates allow for flexibility in looking at point in time states for attribute records around your fact tables -- particularly good for processes you are reporting on that require auditing. Feel free to post any questions/comments.

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5


Posted on: October 3, 2007, 23:19 by Sid

I'm re-purposing my Dell Dimension E510 and trying it out as a full media center machine using Windows Vista Ultimate. I have Vista Business on my work laptop currently (mixed reviews in using development tools and VPN connections). I'll post how the work goes and what issues I run into in making it work for recording shows, using surround sound, T.V. and general media use.

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5