SharePoint and WSS are fantastic in that almost everything is driven by Web Services. For the data and reporting gurus, our job is a lot easier when it comes to extracting data from the mini-applications and workflows that the business community creates inside a WSS or SharePoint Portal framework – though not completely straight forward, so that is what we’ll discuss today: using SSIS to extract data from WSS lists using the out of the box WSS web services.
In this example, we have a user community that has data in a WSS list that needs to be extracted so it can be reported on. The data in this list represents a collection of information about a business process.
One of our major limitations in using out of the box SSIS features is the Web Service Task. If you look at Books Online, article: http://msdn2.microsoft.com/en-us/library/ms140114.aspx, the first note indicates that the Web Service task supports primitive types only. If you look at the WSDL for our particular method we’ll be using, GetListItems, you can see from the WSDL below that we will have a problem:
To get around this we’ll use a proxy since the Web Service task is the only item currently that can use a web reference. For our first step, we will use the same tool the VS 2005 uses when you add a web reference to a web or windows project: wsdl.exe. We’ll create our class by executing the following from the command line (wsdl.exe should be located in your C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\Bin directory):
Wsdl.exe /language:VB /out:SharePointListService.vb http://SharePointURL/TargetSite/_vti_bin/lists.asmx?wsdl Note that you will need to do this again for any sub-site that you are targeting for your import as part of the WSDL description is the location of the service.
Our SSIS package will have three simple control flow steps:
- Script Task
- XML Task
- Data Flow Task
Script Task
After you run this command, open your newly created vb file (I normally code C#, but the SSIS script task only uses VB). In your script task, we are going to make a new class reference. Right click on the solution, go to Add > Add Class. You can use the same name as newly created file from wsdl.exe or something that will more appropriately fit given your project. Once you create the new class file, erase any text/code automatically generated and then paste the content of your wsdl.exe outputted file into this code file in your project. Add references to the System.XML and System.Web.Services.
After you have completed this step, you are ready to consume your new class and mimic the Web Service SSIS task.
Here is sample code from the default ScriptMain file. My code in this sample uses the proxy class Lists (generated by wsdl.exe) to call the service, pass the necessary parameters and then save the output to an xml file. Note that for the view name for a list, you have to pass the GUID. It seems that the method was not set to return values using the common view name.

One of the easiest ways to grab the GUID is to go to modify the view you want to use to pull the data from and look in the URL, the portion of the query string after View= has the GUID for this view.
Another way to do this is to use the Web Service Task against the Views web service in WSS/SharePoint, call the method GetViewCollection, pass the List name as a parameter and you can get an XML output with all the views and GUIDs associated with a list item.
After implementing the code above, your Script Task is complete and your output will be an XML file of all the data from the list given the view you hit. An important point to note is to set your rowLimit parameter. The others can be blank or null, but if the rowLimit is not set, then by default you will only get the data that displays in the default view paging size, which is usually 100 records. By setting it to a figure larger than the collection, you will get the entire collection of records.
XML Task
The SSIS XML Source in the Data Flow section of your project does not like XML files that have multiple namespaces, which is almost a guarantee when you are retrieving data from a SharePoint List.
The XML Task is pretty basic in that it strips the extraneous namespaces from the XML file. I got this from Ashvini Sharma’s blog when trying to solve the same problem in hitting RSS feeds for a real estate project. Here is the link to his post:
http://sqljunkies.com/WebLog/ashvinis/archive/2005/05/25/15654.aspx
Right click on the XML Task, choose edit, and then we will set the properties on the task as such:
Under the SecondOperand property, paste the following XSLT:
<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="no" />
<xsl:template match="/|comment()|processing-instruction()">
<xsl:copy>
<xsl:apply-templates />
</xsl:copy>
</xsl:template>
<xsl:template match="*">
<xsl:element name="{local-name()}">
<xsl:apply-templates select="@*|node()" />
</xsl:element>
</xsl:template>
<xsl:template match="@*">
<xsl:attribute name="{local-name()}">
<xsl:value-of select="." />
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
I have the FileConnection set to the file that my first task outputted, and the operation saves the results so that I end up with a clean version of my outputted WSS/SharePoint list data.
Data Flow Task
Setting up the import can be one of the more painful tasks, depending on the number of columns you have on the view and list your are hitting. The XML that you will have after pulling the data will have GUIDs for all of the user defined columns in your list, which probably is where most of your data is.
To know exactly how the GUIDs in your XML export map up to columns on your list, you can export the schema for SharePoint. In the Lists web service, hit the method GetList and pass it the name of the List you are using. Your output will have the schema of the list. Here is a sample for one column:
<Field ReadOnly="TRUE" DisplayName="TestColumn" Name="xd__x007b_8CEEC711_x002d_0B7C_x002d_4C6A_x002d_9F82_x002d_5584303A997F_x007d_" XName="{8CEEC711-0B7C-4C6A-9F82-5584303A997F}" Node="/my:myFields/my:TestList/my:TestColumn" Type="Text" ColName="nvarchar18" />
The XName element has the GUID for the column that will show up in the data export. You can use the output from the GetList method to build mappings for your export.
Your Data Flow Task at a minimum will have two objects: an XML Source and your OLE DB Destination. I would recommend putting a Data Conversion object between the two – that way you can save on space on the insert into the database and have a way to redirect failed rows for analysis should you get something really strange in your incoming dataset.
That rounds it out; please feel free to post any questions.
Be the first to rate this post
- Currently 0/5 Stars.
- 1
- 2
- 3
- 4
- 5