SharePoint 2010 Excel Services and SSIS

Tuesday, January 7, 2014
by jsalvo

As increasing amounts of data are stored in SharePoint in various formats such as lists, Excel documents and Word documents, gaining access to this data is an on-going challenge for Business Intelligence developers.  Fortunately, SharePoint 2010 provides some nice features that make accessing this data a bit less onerous. 

For example, Excel Services in SharePoint 2010 provides two new programmatic models to help developers gain access to data in workbooks.  These models are the ECMAScript (JScript, JavaScript) object model and the Excel Services REST API. 

This blog post will focus specifically on the Excel Services REST API and how this API may be used in conjunction with a SSIS script component to retrieve data from a SharePoint based Excel spreadsheet in a data flow.

As a first step, you will need to ensure that the data in your spreadsheet is structured as an Excel table.  This isn’t necessary to use the Excel Services REST API in general, but it is required for the example I will demonstrate in this blog post. 

To structure your data as an Excel Table, simply highlight the range of data in your Excel sheet and click ‘Table’ (under the INSERT menu).  This will add a Table called Table1.

image

Once you have the data in an Excel Table format, you will need to identify the appropriate URL to leverage the REST API features.  The URL will be similar to the following:

https://intranet.mysite.com/sites/mysite/_vti_bin/ExcelRest.aspx/Shared%20Documents/book1.xlsx/model $format=atom

Let’s break the URL down into smaller components:

URL Component Description
http://intranet.mysite.com/sites/mysite The path to the site or sub-site where the Excel workbook resides.
/_vti_bin/excelrest.aspx The relative path to the web service
/shared%20documents/book1.xlsx The relative path to the Excel workbook
/model$format=atom The command or commands for the Web service. In this case, you are requesting a “model” of the workbook in an ATOM feed.

Since we are interested in the data contained in Table1, we will modify the URL slightly. 

https://intranet.mysite.com/sites/mysite/\_vti\_bin/ExcelRest.aspx/Shared%20Documents/book1.xlsx/**Model/Tables%28%27Table1%27%29?%24format=ato*

We have now reached the point where we can add a Script Component to our data flow and start writing some code to access the REST API.  I am going to assume you already have an understanding of Script Components in SSIS, so I won’t go into too much detail here other than the following points:

1.  When you add the script component to your dataflow, be sure to select the option labeled ‘Source’, since we are using the script to pull data from SharePoint.

image

2.  Make sure you add the columns you intend to pull from the Excel spreadsheet as Output Columns in your Script Component and set their data types accordingly.  An example is shown below:

image

Now, we will edit the script and add some code (I am using the C# language in this example).

First, you’ll need to add a new Service Reference to the Script Component.  To accomplish this, right click on ‘Service References’ and select ‘Add Service Reference’.

You’ll need to populate the Address field with the correct service URL, which should be similar to: https://intranet.mysite.com/sites/mysite/_vti_bin/ExcelService.asmx

You’ll also want to update the Namespace field with a logical name, in my example I called the Namespace ‘ExcelService’.  When finished, click ‘OK’.

image

You’ll also need to add the following references and namespaces to your script, if they aren’t already included.

#region Namespaces  
using System;  
using System.Data;  
using System.Collections.Generic;  
using System.Linq;  
using System.Text;  
using System.Xml.Linq;  
using System.IO;  
using System.Xml;  
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;  
using Microsoft.SqlServer.Dts.Runtime.Wrapper;  
#endregion

Two namespaces are used in the XML you will be querying:  the ATOM feed namespace and the Excel Services REST namespace.  We need to include strings in our class that specify these namespaces. 

Within the class ScriptMain, add the following two lines of code:


const string atomNameSpace = "http://www.w3.org/2005/Atom";  
const string xlsvcNameSpace = "http://schemas.microsoft.com/office/2008/07/excelservices/rest";

We’ll now add a method called LoadRanges to our ScriptMain class.  In the LoadRanges method we will set credentials to access the API, build a URI and return a XDocument via the XDocument.Load method that stores the data from the spreadsheet. Sample code is shown below:

Note:  Variables.SharePointURI is referencing a variable in my package that stores a URL path similar to the one shown earlier in this blog post (https://intranet.mysite.com/sites/mysite/\_vti\_bin/ExcelRest.aspx/Shared%20Documents/book1.xlsx/Model/Tables%28%27Table1%27%29?%24format=at)

private XDocument LoadRanges()  
{  
  // Use XMLUrlResolver to pass credentials to the Web service.  
  XmlUrlResolver resolver = new XmlUrlResolver();  
  resolver.Credentials = System.Net.CredentialCache.DefaultCredentials; 

  // Build the URI to pass the resolver.  
  Uri fullUri = new Uri(Variables.SharePointURI); 

  return XDocument.Load((Stream)resolver.GetEntity(fullUri, null, typeof(Stream)));  
} 

We now need to update the CreateNewOutputRows() method.  You won’t be able to simply copy the code below because this code is specific to the spreadsheet structure (number of columns, ordering of columns, etc.).  You will likely need to spend some time debugging to determine what logic works for your specific Excel workbook.

public override void CreateNewOutputRows()
{
    XNamespace a = atomNameSpace;
    XNamespace x = xlsvcNameSpace;

    var doc = LoadRanges();

    // Query the XDocument for all the row elements, skip first row with headers
    var rows = doc.Descendants(x + "row").Skip(1).ToList();

    foreach (var r in rows)
    {
        var allValues = r.Descendants().Where(n => n.Name == (x + "v") || n.Name == (x + "fv")).ToList();

        Output0Buffer.AddRow();
        Output0Buffer.Rank = Convert.ToInt32(allValues[0].Value);
        Output0Buffer.Category = allValues[2].Value;
        Output0Buffer.Score = Convert.ToDecimal(allValues[11].Value);
        Output0Buffer.DRI = allValues[13].Value;
    }
}

Hopefully, this example code helps you get started with Excel Services.  If you need a more detailed overview, I suggest you check out this documentation.

Comments

comments powered by Disqus