Tuesday, September 27, 2011

SCD Type 2

Configuring the SCD for a Type 2 Dimension

The first work you need to do for a Type 2 dimension actually resides in your dimension table design – you need to decide whether you are going to track changes in your table using either a simple indicator to identify current and expired records, or if you want to use effective dates – the component doesn’t natively allow you to use both, though you can customize the output to do so. The Current / Expired indicator actually uses a small text string which can either be set to the string value pairs ”True” / ”False” or “Current” / “Expired” – no customization of these is allowed in the component (again, you can customize the output to change this, but the wizard will only allow mapping of the column to one that will accept text strings). The Effective dates option requires a start and end date date time column, and in the wizard you use a variable to set the time used. The sample package demonstrates a few possibilities but below I will describe using effective dates.

b

Fig 2: Select a Dimension Table and Keys

First of all, note when on the first page, Select a Dimension Table and Keys the Effective dates (and Current indicator) are not mapped. Because I have named the columns in line with what the SCD expects for such indicators, it ignores them completely in the mapping – they cannot even be selected as Input Columns. If you name them differently in your design, simply map them as “Not a Key Column”.

b

Fig 3: Slowly Changing Dimension Columns

In the Slowly Changing Dimension Columns page set the change type of each column to “Historic” so the component will track history of changes.

b

Fig 4: Historical Attribute Options

The wizard will present a page that is only displayed when you have selected Historic change type columns. Here the start and end date columns are specified, and the component needs a datetime variable to use to set the expiry of old records and the start date of new records. Here I have just used the Package Start time variable – in practice you may well want to specify a variable populated with something else, such as the extract date of the data.

b

Fig 5: Finishing the SCD Wizard

When go to finish the wizard, you will note the additional “Historical Attribute Output” will be generated. In practice this means a set-of components will be output to manage the changes, which are illustrated below (click to zoom in). The derived columns add the effective start and end date columns and the OLE DB Command expires old records. Please review the sample component to see how this works in practice.

b

Fig 6: Historical Change Outputs

SCD Considerations for Type 2 Dimensions

One of the most important things to bear in mind is that the component is not intelligent in terms of knowing which data is new – so if you had two records for a given key in the sample file, you would have to sort it so it would feed it the most recent item last so that item would be the current one. It also provides no support for data which has its own change dates – for example if a record had an update date and you wished to use that to form the effective date.

The SCD component is only really suitable for tracking Type 2 changes in sources where there will be one record per key per extract and the source itself has no change tracking capabilities. Given this weakness and the difficulties with using this component generally (in terms of configuration and performance) – you may well want to look at the alternatives I mentioned in my original post about the SCD. This is a component that definitely needs an overhaul for the next release.

No comments:

Post a Comment