SQL Server 2005 Data Transformation Services (DTS) - A First Look
 
This article is based on the pre-release version of SQL Server 2005. SQL Server 2005 is currently in its Beta-2 version. The features and capabilities discussed in this article may change in future versions.

With the disclaimer out of way, let us dive into the new Data Transformation Services solution in SQL Server 2005. Before we begin the discovery of this new tool, there is one word that can immediately describe it: WOW!! That is the first reaction I had when I started playing around with DTS. Its a totally different beast in SQL Server 2005 and is a lot better than the SQL Server 2000 counterpart. Although the DTS tool in SQL Server 2000 is quite powerful, certain simple tasks (like conditional branching, looping etc), required ActiveX script code and there was also no strong support for debugging etc. All this (and more) is fixed in the new version of DTS in SQL Server 2005. With the advent of this new release, DTS is now a true ETL (Extract, Transformation and Load) tool that is worth learing and adding to your arsenal of expertise.

SQL Server 2005 Data Transformation Services (DTS) is an enterprise data transformation and data integration solution that you can use to extract, transform, and load (ETL) data from disparate sources and then move it to single or multiple destinations. DTS provides a complete set of services, graphical tools, programmable objects, and application programming interfaces (APIs) that you can use to work with DTS. The biggest advantage however is that you do not have to be a programmer to use DTS. With little or no coding you can build very powerful solutions. But, if the programmer in you wants finer control, you can use the APIs to create custom elements or integrated other capabilities into your solutions.

DTS can be used to build the following kinds of solutions:

  1. Merge data from heterogeneous data stores
  2. Refresh data in data warehouses and data marts
  3. Cleanse data before loading it destinations
  4. Bulk-load data into online transaction processing (OLTP) and online analytical processing (OLAP) databases
  5. Send notifications
  6. Build business intelligence into a data transformation process
  7. Automate administrative functions

To build DTS solutions, we now have a new designer called the BI (Business Intelligence) Development Studio. This environment represents a Visual Studio like interface and gives you all the flexibility and capability of a rich IDE like Visual Studio. You now build DTS solutions using the familiar metaphor of projects, solutions etc and also compile, debug and execute DTS solutions.

The architecture of DTS is much more structured than what it was in SQL Server 2000. The various components are clearly demarcated and identified with specific tasks. The following are the architectural elements of DTS:

  • DTS Service. The DTS service tracks DTS packages that are running and manages the storage of packages. The service is implemented as a Windows service and is off by default. When you run packages, the service is automatically started
  • DTS run-time engine its executables saves package layouts, runs packages, and provides support for logging, debugging etc. The run-time executables are the containers and tasks that DTS provides
  • DTS data-flow engine and its components provides the in-memory buffers that move data from source to destination and calls the source adapters that extract data from files and relational databases
  • DTS clients include all the tools and utilities that are provided by DTS

DTS provides various objects that you can use to program with. These objects work together to provide all the functionality that you will ever need. The following are some of the core objects that DTS provides:

  • The package is the most fundamental object of DTS. It is the package that you will create, maintain, execute and deploy. This is the topmost object in the hierarchy and will contain one or more of the following types of objects.
    • Tasks and workflow objects provide the functionality and control flow for a package. Control flow includes familiar constructs like looping, conditional branching, precedence constraints etc
    • Data flow, data adapters and transformation objects provide the ability to connect to build data flows and also extract, transform and load data into an ordered flow
    • Connection managers that connect to different types data sources at run time to extract and load data
    • Variables that can be used in expressions to dynamically update column values, control execution of repeating workflows, and define other types of conditions
    • Event handlers that run in response to various events that are raised by the DTS system
    • Log providers that provide various options for logging package run-time information

OK, with some of the basics out of the way, let us consider a simple DTS package that allows us to extract data from a SQL Server table and write it to a flat-file. After building this simple solution, you will see that you have written no code, but have got a simple data export working.

Start the BI Development Studio and choose File > New > Project. In the New Project dialog box, choose Business Intelligence Projects as the project type and Data Transformation Project as the template and provide a name for our project. Let's call our sample DTSSample. The following figure shows the dialog box with information about our project.

Once you click the Ok button, the environment shows the package in design mode. Each package has 4 tabs to it:

  • Control Flow. The control flow designer can be used to design the control flow for a package. The control flow provides the structure for the package like branching, looping etc. A package can have a single control flow and one or more data flows
  • Data Flow. The data flow designer includes various data adapters for connecting to various sources and destinations and transformations that can be used to aggregate data
  • Event Handlers. The event handler designer can be used to build various event handlers for a package
  • Package Explorer. The package explorer reflects the container hierarchy of the DTS object model. The package container is at the top of the hierarchy, and you expand the package to view the connections, executables, event handlers, log providers, precedence constraints, and variables in the package

You then have toolboxes that show the various tools available for you in each of the tab. Note that the concept of control flows and data flows allow you to clearly seggregate package flow management and data flow management. Something that was not present in the 2000 version of DTS.

Click on the control flow tab and from the toolbox select the data flow task and drop it onto the design surface. The following figure shows the toolbox item that you will need to select.

Now click on the data flow tab and you should now see the data flow task that we added in the control flow designer in the data flow task combo-box. Let us now define what we want to happen in this data flow. As per our earlier problem statement, we will write a data flow to transfer data from a table into a text file. For the sake of this example, we will transfer the contents of the authors table (pubs database) into a flat-file. To transfer contents from a table to a text file, we need to create connections to an instance of SQL Server and also create the destination file. For this, right click on the Connections area (found at the bottom of the designer). You will now see the following popup menu.

For our example, we will need an OLE DB connection and a Flat-file selection. Let us first create an OLE DB connection. When you choose New OLD DB Connection..., you get the Select Data Connection dialog box. Choose New to create a new connection. You will see the following dialog box in which you can fill-out the information as shown. Click on Ok to create the connection and then click Ok on the select data connection dialog box.

At this point, before we create the flat-file connection, we need to drag-and-drop two tasks from the toolbox. The two tasks that we need are OLD DB Source and then the Flat file Destination task. Once you drop them on the design surface, drag the green arrow from the OLE DB source task to the Flat file destination task. This establishes the source and destination that we need. The following figure shows how the designer will look after you complete this.

Now, double-click the OLD DB Source task. This shows the OLE DB Source Editor dialog box. In this you define the table that you want to get the data from and then the columns that you need to get data from. The following figure shows the dialog.

Note that we have selected the authors table from the list. You can leave the other items to their default. For example, if you choose the Columns output, you can decide what columns you want to export from the table and the Error Ouput option lets you control how you want to react to the errors. We do not need to change any of these, so click on the Ok button to close the dialog box. Now double-click on the Flat file destination task. This opens the Flat File Destination Editor where you can define the properties of the flat-file into which you want to export the data. Again, the only input that we will provide in this dialog is the name of the file to which to export the data. The other options will be left to their defaults (but make sure that you click on the Columns options and the Mappings option to ensure that all defaults take effect). The following figure shows the ensuing dialog boxes.

Note that if you click on the Columns option, you can see that the flat-file connection shows the columns that we selected from the authors table. Close all the dialog boxes by clicking on the OK button.

Actually, you are all set now! Click on the green Execute button on the toolbar and you should see the compiler kick into place and if everything is fine, you should see that 23 rows are exported. The designer reflects this by shading both the task items to green as shown in the following figure.

You can see that a file called Authors.txt has been created in the C:\ folder as specified by us. Well that's it!! A rather lengthy article, but in the end worth it :-) I went into a great detail to explain the various elements and options that you need to choose and once you get the hang of it, things get a lot simpler. In future articles, we will get into more details about various other packages that you can create. Till then, have fun with the new DTS!

Home