SSIS: Dataflow Script Component

Saturday, November 5, 2011
by jsalvo

The SSIS dataflow script component is useful for complex data cleansing and data validation.  I recently used the dataflow script component to validate the format of emails and phone numbers based on a regular expression.

In the Toolbox, click on the ‘Script Component’ button.

image

You are then prompted to select the ‘Script Component Type’.  The options are:

image

  • Source: Used when you want to apply advanced logic to data from a source system.
  • Destination: Used when you need to write data to non-OLE DB destinations.
  • Transformation: Most common scenario used when you need to perform advanced logic for data cleansing or validation.

Inputs and Outputs

Assuming you selected ‘Transformation’ as the script component type, you will have the ability to specify input and output columns.  In the ‘Script Transformation Editor’, the input columns are specified by checking the desired checkboxes in the ‘Available Input Columns’ list. 

image

The outputs are specified in the ‘Inputs and Outputs’ tab.  You need to include an output for any column that you wish to use in subsequent tasks in the dataflow.

Click the ‘Add Column’ button to add additional columns to the Output.  In the example below, columns are being added to the default Output0.

image

Variable Usage

In the Script Transformation Editor window, click ‘Script’.  Under ‘Custom Properties’ you can specify ReadOnlyVariables and ReadWriteVariables.  You may specify any SSIS variables you created.   In the example below, the ‘ValidEmail’ variable is specified as a read-write variable.  If you only need to read the value stored in the variable, you should set it to read-only.

image

The following code snippet demonstrates how to read/write from an SSIS variable in the ‘Script Component’ (Note: the code shown below is not applicable to the Control Flow Script Task.  Todd McDermid’s blog article provides an excellent comparison between variable usage in the Control Flow Script Task and Data Flow Script Component Task).

Read:

bool scriptVariable;

scriptVariable = Variables.ValidEmail;

Write:

Variables.ValidEmail = true;

It is important to note that while read-only variables can be accessed anywhere in the script, the read-write variables can only be used in the PostExecute function.

Edit Script

You are now ready to add code to the script.  On the Script tab, click the ‘Edit Script’ button.

SNAGHTML2c1be6c

To read data from the input buffer and write it to the output buffer, you need to modify the Input0_ProcessInputRow function.

public override void Input0_ProcessInputRow(Input0Buffer Row)

The follow code snippet demonstrates reading data from an input column:

string emailAddress;

emailAddress = Row.EmailAddress;

The following code snippet adds an output row and sets the column values.  FirstName, MiddleName and Surname were specified as columns in the Ouput0Buffer (in the Inputs and Outputs tab).

Output0Buffer.AddRow();

Output0Buffer.FirstName = Row.FirstName;  
Output0Buffer.MiddleName = Row.MiddleName;  
Output0Buffer.Surname = Row.Surname;

The following are some blog articles that provide additional information regarding the Script Component task:

Demystifying-the-SSIS-Script-Component-Part-1.aspx (links to Parts 2 and 3 are included in the article)

Variables In SSIS Scripts

Comments

comments powered by Disqus