Gather stats from SharePoint

I was not actually looking for this, but that does not mean it is not incredibly useful.

I was actually looking for what was available from the PerformancePoint Team blog in the way of updates to the SDK documentation, or heck, an SDK versus just a single page in the MSDN canon, when I came across this code sample posted on code.msdn.microsoft.com (what the real difference between this site and codeplex.com is, I do not know, but it gets confusing having both).

This code sample, titled "Gather information from SharePoint to a database for reporting", does not come with a lot of documentation or instruction, but by odd coincidence, it seems I found this on its publishing day – today, May 11. The code sample consists of two parts, the C# console app project and the database.

To get this tool up and running:

  • Download the project file for the app and the database
  • Unless you are in France or have French collation set up on your server or personal machine, I would recommend hand editing the script to remove that language collation……………Or just do what I did and create a database called Stats and used the BAK file included to restore to (the DDL does not have any table create scripts?)
  • Start up VS 2008 and open the unzipped project. From here you can either compile so you have an executable or debug through it and watch it run

One important note: This app is not yet configured for remote execution, so it has a dependency on being run directly on the SharePoint server in order to gather stats. SQL Server configuration is just a connection string in the app.config, so that can be a different box/remote server.

This handy little app produces quite a few usage statistics on a SharePoint farm, with the most rich being at the site level. Be interesting to see where this goes next, I have no immediate need for it, but I figured it is interesting enough of a find to post and share!

Tags: , , , ,
Categories: .NET | SharePoint

May 11, 2009 20:23 by Sid
E-mail | Permalink | Comments (2) | Comment RSSRSS comment feed

Decoding IBM DB2

I'd almost mark this to a new category titled: "Trivia".

At least, that's what it feels like – particularly when it came to finding out how to do this.

Scenario: Client has a financial package running on an AS/400, IBM OS v5 r4, which has DB2 for the iSeries installed. We are building a lightweight application to do some gap filling on this package so some basic reporting can be done while they wait to move to the new one. New data will be stored in SQL Server, however, necessary key descriptions and financial data points live in the AS 400 – and no one on the staff wants to maintain an SSIS package to pull over just a little bit of data.

So what to do? Connect directly to the DB2 database via .Net and surface up the few necessary data points in the application. Sounds simple, right?

Ah – the other bit in the scenario - -we need to do this using .Net OleDB versus the IBM native .Net driver so we don't have to configure the production web servers (also being upgraded soon).

Still sounds like it should be simple though – and connecting is.

And that is where you will probably come across text and other variable character data that comes over as binary. This is where the fun begins..

DB2 has some interesting data types for storing text data (reference this link for supported sql data types for the DB2 .Net data provider), and when you run across these in work with DB2, it can be frustrating trying to translate the binary that is coming over into usable text.

One of the keys to resolving the mystery is figuring out what code page the IBM machine is using – in many cases this can be a little known (now a days) and IBM only code page lovingly titled "Extended Binary Coded Decimal Interchange Code" or EBCDIC. In using C# to decode the binary into text, this is a crucial bit of information.

In this instance, I am using EntLib, and the connection is pretty much the same as any DBMS. Below is the pertinent code that decodes the record:

using (IDataReader reader = db.ExecuteReader(System.Data.CommandType.Text, commandtext))

{

while (reader.Read())

{

byte[] b = (byte[])reader[0];

//here is where we tell the Encoder what code page we need

Decoder ebcdicDecoder = Encoding.GetEncoding(37).GetDecoder();

//initialize a char array that will hold the decoded output

char[] c = new char[ebcdicDecoder.GetCharCount(b, 0, b.Length)];

//call into the decoder and get the decoded char array

ebcdicDecoder.GetChars(b,0,b.Length,c,0);

//convert to a string if necessary

string accountCodeId = new string(c);

//do whatever else you want to do with the results now

}

}

When I was first getting the byte array back, a coworker and I were puzzled at the values in the byte array, as they did not match up to most U.S. character sets we would have expected to see—then we stumbled across Wikipedia's entry for IBM code pages (and EBCDIC in particular) and the pieces started to come together. We were able to look at the integer values coming across in the byte array and see in the code table what the text values were – and that intelligible values were in those arrays.

The next bit was finding what code page on the Microsoft side to use to decode the byte array – which did not turn out to be as easy as I expected, but in the end, I stumbled across this MSDN entry on Host Integration Server 2000 and EBCDIC code page support. With this last bit (the 37 you see in the code on Encoding.GetEncoding(37), I was finally able to get the records translated.

Hopefully this will save some other time, headache and pain pill abuse – please feel free to e-mail me if you have any questions about this or leave a comment!

Tags: , , ,
Categories: .NET

April 23, 2009 06:49 by Sid
E-mail | Permalink | Comments (12) | Comment RSSRSS comment feed

Extending the Data Profiling Task

 

SQL Server 2008 RTM has been out a while now and yet, I am just now making my way to using the Data Profiling Task, new to SSIS in this release. Time and client demands are always a factor governing when and what I get to look at, however I had read up on the features and was hoping that what I had read was not all the new task could do.

The new task is a good start, but unfortunately, it is a little more limited than I would have hoped. Fortunately though, with the SSIS APIs open to us, we can use the class object as we please.

So first -- what did I find limiting about the Data Profiling Task?

It appears that its intended design was to serve as a checkpoint during loads/transforms and that conditional logic can be applied based upon results; this is incredibly useful. My major concern is with activities that happen before the design phase – and quite directly, before any development begins at all.

Reference this great little tip from Kimball here on data profiling and its necessity according to his framework – profiling is tantamount to understanding the system that you are getting ready to work with in the course of a project.

To get to the heart of what I wanted to see the Data Profiling task do: I want to use it to generate an aggregate view of a given system so that I can view the results and learn more about the data within the database. Many of us have been given a database, custom built or extended far from the original software vendor's spec, with little documentation.

Even for those who are not preparing to move this data for warehousing or reporting, anything that provides a reference to what the data is, how it is stored, how unique, etc. is a boon for developers.

The Data Profiling task can deliver this data -- but only one table at a time?

There are no expression settings or variables that you can use to set which table and where the output goes. For two of the profiling tasks, I can see this as a bit of a problem – both the Candidate Key and Value Inclusion profile do not lend themselves to being easily generated dynamically via iterating over all the tables and views in a database.

But all the others are useful in this aggregated manner, so why, if I am looking at a client database that has 96 tables, must I build 96 tasks via Visual Studio and still not end up with an aggregate report (just 96 xml files)? I would like to be able to run this on the entire database, and start looking at the hotspots or data profile results that are the fat tails in the distribution chart.

My solution is to build the SSIS package programmatically, run it and then read and aggregate the results. I still need to clean up the solution a bit and finish some unit tests, so I will post it soon for everyone to download and enjoy.

My envisioned endgame for this project is for the output to be configurable for user desired thresholds and for that output to render in HTML or Word – still a little ways away from that, but I have the bulk of the SSIS work done. If anyone is interested in contributing, let me know and I will see about getting this on CodePlex.

Notes for anyone who wants to work with this project or attempt this on their own:

  • This effort makes use of the Microsoft.DataDebugger.DataProfiling class, which is not necessarily a supported API. An example of using this class is on the SSIS Team Blog here.
  • DLL references necessary for this project come from multiple places. For example, the Microsoft.SqlServer.DataProfilingTask is in ~\Program Files\Microsoft SQL Server\100\DTS\Tasks and Microsoft.SqlServer.DataProfiler is in ~\Program Files\Microsoft SQL Server\100\DTS\Binn
  • It is a good idea to read up on building a package programmatically on MSDN as there are class objects, such as TaskHost, which are not immediately intuitive as they are not seen when using the designer
Tags: , , , ,
Categories: .NET | SSIS

March 31, 2009 09:19 by Sid
E-mail | Permalink | Comments (4) | Comment RSSRSS comment feed

Dynamically remapping source to destination in SSIS

This all started off in trying to help out a fellow SSIS enthusiast, and now I have made a complete project as a sample….. catch the MSDN forum thread here.

There are many reasons why you would want to remap source columns to destination at runtime – I would not necessarily recommend this type of solution for some of your larger and more impactful data imports, but there are many cases where being able to modify the mapping metadata at runtime can be incredibly useful.

I have posted the project example in the files directory on my blog, you can download the Visual Studio 2008 project here – keep in mind that I did this in VS 2008 Team System Architecture Edition, so if you don't have this version, then you might have an issue with the unit test project attached.

This should not pose a problem for many though, as the only reason I threw the unit test in there was to provide a simple and lazy way to call into the class and perform the remapping.

This example demonstrates the following:

  • Load an existing SSIS package and indentify the source and destination components
  • Instantiate the source and destination components to read their column metadata
  • Remap a source column to a destination column
  • Save the modified SSIS package

When you open the SSISExamples solution, build it, and then either copy the DynamicRemap.dtsx file from the bin directory to your c:\ root or modify the PackageToLoad string in the CreatePackage class.

Once you call into the CreateAndRunPackage method, you will end up with a copy of the package in your root named PackageModified.dtsx.

Please feel free to e-mail me with any questions. I'll try to make a few improvements to the example when I have free time over the coming holiday, but for now, enjoy!

Tags: ,
Categories: .NET | SSIS

November 14, 2008 11:21 by Sid
E-mail | Permalink | Comments (2) | Comment RSSRSS comment feed

C# Forms and Controls Threading……. C#?

Yes, it is odd for me to post an entry on this blog covering thread safe ways to talk to windows user controls in C#, but I was surprised at how much trolling I had to do on Google/MSN in order to piece together a complete answer, so I figured I would share my complete and final(for now) solution.

In this particular situation, I have a windows form with a tab control, with one of the tabs hosting a text box control that displays messages. This control is not initially visible to the user, yet it is instantiated during the main form construction.

The first issue to resolve was to get the logging class and other components to make thread safe calls to this control. Microsoft has a great tutorial posted about making thread safe calls to Windows forms controls, so I won't repeat that code or explanation here.

Where I had to step beyond the MSDN article was in handling the thread safe calls to a form whose handle property may or may not be created. I am not a Windows GDI expert, I barely know what it means, but one thing I did find out during this process is that I could not guarantee when the handle for the control was created, even though its constructor was called by the containing form constructor during the application start.

To handle this, there were two steps I needed to take:

  1. Ensure that the control's handle was instantiated and available on the same thread as the main UI. To do this, I simply added this line to my code following the construction of the user control on the main form: IntPtr userControlHandle = textUserControl.Handle; This ensures that the control's handle is not created on the background thread when you access properties or methods on the control in other parts of your application
  2. I next added these lines of code to my user control:

    delegate void SolveThreadingDelegate();

    public void UserControl()

{

if (this.InvokeRequired)

{

this.Invoke(new SolveThreadingDelegate (UserControl));

}

else

{

}

}

The logging pounds this control pretty well and so far I have gotten around some of the previous locking issues I experienced – it now handles well thousands of messages an hour flying at it (not all displayed on the screen at a time, but still a constant stream of information) and I have yet to have a cross thread access violation pop up.

Hopefully this helps anyone else stuck in a similar situation; if anyone has improvements, suggestions or questions, please feel free to post.

Tags: , ,
Categories: .NET

October 3, 2008 16:25 by Sid
E-mail | Permalink | Comments (4) | Comment RSSRSS comment feed

Sponsors

Author

Tags

Recent Posts