Using the SSIS transformation script component in an ETL

 

Notice! Warning!

I’m not using this method anymore. Lately I have been using Jamie Thompson’s superior method (see link below). He talks about using a Lookup (or Merge Join) together with a Conditional Split component. The method on this web page is still good for some odd scenarios but, for most common ETL’s I highly recommend Jamie’s method.

 

http://blogs.conchango.com/jamiethomson/archive/2006/09/12/SSIS_3A00_-Checking-if-a-row-exists-and-if-it-does_2C00_-has-it-changed.aspx

 

Written by: Greg Van Mullem

Date: September 13, 2006

Updated: March 29, 2009

 

This web page is part of a collection of web pages containing difficult to find technical information about Microsoft SQL Server 2005 Standard Edition. Click here to go to the main index.

 

 

The algorithm documentation:

This example shows how to ETL (Extract Translate and Load) customer records from a source (staging) table to a destination table. It uses a standard update or insert algorithm. My code is a very loose adaptation of the unrelated code samples found in these Microsoft web pages. You might be able to get a little extra understanding from looking at them.

 

Creating a Destination with the Script Component

Developing Specific Types of Script Components

 

The algorithm extracts each record from the source table and looks it up in the destination table. If the record does not exist in the destination then it is inserted. If the destination record does exist then it will be compared to the source record. If there is a difference then the record will be updated. If the records are exactly the same then an update statement will not be issued. Deletes are not allowed with this data so I’m skipping that part. The algorithm does not rely on date fields to detect changes.

 

During the ETL I will use counter components to calculate the following values that will be written to a custom logging table.

 

1. RowsComplete; Count of the rows extracted from the source table.

2. RowsInserted; Count of the rows inserted.

3. RowsUpdated; Count of the rows updated.

4. RowsNotUpdated; Count of the rows where no action was taken. They matched exactly in the source and destination.

 

Here is my data flow screen:

The “Test For Insert or Update” task is where the magic happens. It is a “script component” configured with the “Transformation” option.

 

I have 4 connection managers:

1. LoggingDW is the custom logging database. The counter variables will be written to it in a later control flow task.

2. StagingDW is the source database. The customer records were inserted into it by a previous ETL that got them from our AS400.

3. OrderTracking is the destination database. This connection uses a “Native OLE DB\SQL Native Client” provider.

4. OrderTrackingADONET is used by the script component to look up records in the destination table. This connection uses a “.Net Provider\SQLClient Data Provider” provider. Depending or your particular package you may be able to use this as the only destination database connection manager.

 

Here are complete screen shots of the Script Transformation Editor:

 

1. The usage type for all input columns must be ReadWrite. CUNO is an integer primary key and CalcCustomerName is a string (varchar). Note: this data is coming from a mainframe style AS400 database so I’m using some odd field naming conventions (they actually make a little sense in the grand scheme of things).

 

2. I have configured 1 input and 3 outputs. I did not configure any output columns. They will be auto created. Note that the Input ID is 423.

 

3. All of my outputs are configured exactly the same but with different names. I have set the ExclusionGroup to 1. I have also set the SyncronousInputID to the input ID value from the input configuration (see the previous screen). The SyncronousInputID will be different for every component input you create. Please read the Microsoft help documentation for these 2 properties. They are critical to making the script work.

 

4. Here is my IgnoreRecordsOutput properties.

 

5. Here is my UpdateRecordsOutput properties.

 

6. Here is the Connection Managers screen.

 

7. Here is the script screen. The Name property is the only thing I changed in here.

 

8. After all this configuration is done you can press the “Design Script…” button. Here is the script editor that pops up.

 

9. Here is my complete source code for the script:

 

 

' Microsoft SQL Server Integration Services user script component

' This is your new script component in Microsoft Visual Basic .NET

' ScriptMain is the entrypoint class for script components

 

Imports System

Imports System.Data

Imports System.Math

Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper

Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Imports Microsoft.SqlServer.Dts.Runtime

Imports System.Data.SqlClient

 

Public Class ScriptMain

    Inherits UserComponent

 

    Dim connMgr As IDTSConnectionManager90

    Dim sqlConn As SqlConnection

    Dim sqlCmd As SqlCommand

    Dim sqlParam As SqlParameter

 

    Public Overrides Sub AcquireConnections(ByVal Transaction As Object)

        connMgr = Me.Connections.LookupConnection

        sqlConn = CType(connMgr.AcquireConnection(Nothing), SqlConnection)

    End Sub

 

    Public Overrides Sub PreExecute()

        sqlCmd = New SqlCommand("SELECT KeyCustomer, CustomerName FROM tblCustomer WHERE(KeyCustomer = @KeyCustomer)", sqlConn)

        sqlParam = New SqlParameter("@KeyCustomer", SqlDbType.Int)

        sqlCmd.Parameters.Add(sqlParam)

    End Sub

 

    Public Overrides Sub CustomerRecordsInput_ProcessInputRow(ByVal Row As CustomerRecordsInputBuffer)

        Dim reader As SqlDataReader

        sqlCmd.Parameters("@KeyCustomer").Value = Row.CUNO

        reader = sqlCmd.ExecuteReader()

        If reader.Read() Then

            'do all field comparisons here to determine if

            '    the record changed since the last ETL.

            If (reader("CustomerName").ToString() <> Row.CalcCustomerName) Then

                Row.DirectRowToUpdateRecordsOutput()

            Else

                Row.DirectRowToIgnoreRecordsOutput()

            End If

        Else

            Row.DirectRowToInsertRecordsOutput()

        End If

        reader.Close()

    End Sub

 

    Public Overrides Sub ReleaseConnections()

        connMgr.ReleaseConnection(sqlConn)

    End Sub

 

End Class

 

 

10. The End…