Creating Hierarchical Data Using DataSets

Last week, when I was playing around with the Microsoft Office Research Pane (which I will deal with in a seperate article), one of my requirements was to get data from multiple tables and then relate them in a hierarchical fashion. At this point of time, I had to understand how the DataRelation object worked in DataSets and how you can use this concept to get data for related tables and display them. In this article I will put down what I had learnt.
 
Linking data has always been left to the realm of relational databases. If you take the example of SQL Server, you can create what are called foreign keys between two tables to relate data in them. For example, if you have a Orders table and a LineItems table, you can relate these two tables using the OrderNumber. When you want to now query data from these tables, SQL Server provides the INNER JOIN operator that you can use to get all the line items for an order. If you were to extend this concept into .NET, you have the most powerful in-memory database called DataSet. Within a data set, you can have many tables, which are represented by the DataTable object. Each table will have multiple columns, represented by the DataColumn object. You can relate the tables in a data set by using relations, which are represented by the DataRelation object. Conceptually, these relations look like the following (the actual representation inside the DataSet are different).
 
 
In this article, we will see how to create a hierarchy inside a data set. For our example, we will consider the pubs database. This database has a table called authors. Each author could have authored a set of books and these are represented by a couple of tables called titles and titleauthor. There will also be authors who would not have authored any books. The steps then, to create a relation between these two tables via a data set is as follows:
  1. Define a data adapter for fetching data from the authors table.
  2. Define a data adapter for fetching data from the titles and the titleauthor tables.
  3. Create a data set and fill-in data from both these adapters.
  4. Create a DataRelation object and specify the columns that need to be related in both the tables.
  5. Iterate through the master table (authors in this case) and fetch the child records.

Here is the code that I've written as a command line application.


' **********************************************************************
' This example code shows how to create a dataset that has two tables
' and then relate these tables together. This code uses the following
' two stored procedures:
' 1. dbo.GetAllAuthors
' 2. dbo.GetTitlesForAllAuthors
' **********************************************************************

Imports System.Data.SqlClient

Module HierarchicalDataSet

    Sub Main()
        Dim authors As String, authorTitles As String
        Dim myConnection As SqlConnection
        Dim authorsAdapter As SqlDataAdapter, titlesAdapter As SqlDataAdapter
        Dim result As DataSet
        Dim authorTitle As DataRelation
        Dim authorRow As DataRow, titleRow As DataRow

        ' Populate the two strings that will be used to fetch the data
        authors = "SELECT au_id, au_lname, au_fname, city, state, zip FROM authors"
        authorTitles = "SELECT ta.au_id, ta.title_id, t.title, t.price, t.notes FROM titleauthor ta INNER JOIN titles t ON ta.title_id = t.title_id"

        ' Create a connection
        myConnection = New SqlConnection("Server=.;Database=pubs;Uid=sa;Pwd=password")
        result = New DataSet

        ' Create and initialize the adapters for fetching data
        authorsAdapter = New SqlDataAdapter(authors, myConnection)
        titlesAdapter = New SqlDataAdapter(authorTitles, myConnection)

        Try
            ' Open the connection
            myConnection.Open()

            ' Fill the dataset with data from the two adapters
            authorsAdapter.Fill(result, "authors")
            titlesAdapter.Fill(result, "titles")

            ' Important to close the connection early
            myConnection.Close()

            ' Define the relation between the authors and titles table
            authorTitle = New DataRelation("authorTitles", result.Tables("authors").Columns("au_id"), result.Tables("titles").Columns("au_id"))
            authorTitle.Nested = True
            result.Relations.Add(authorTitle)

            ' Iterate through the master and child tables
            For Each authorRow In result.Tables("authors").Rows
                For Each titleRow In authorRow.GetChildRows("authorTitles")

                    ' Print author and title information only if there is title data
                    Console.WriteLine(authorRow("au_id") & " - " & authorRow("au_lname") & ", " & authorRow("au_fname"))
                    Console.WriteLine(vbTab & titleRow("title") & ", " & titleRow("price"))
                Next
            Next
        Catch ex As SqlException
            Console.WriteLine(ex.Message)
        End Try

        Console.WriteLine("Finished Operation. Press any key...")
        Console.Read()
    End Sub

End Module

 
The code is pretty self-explanatory, but I'll spend some time going over some of the major pieces of the code. The code defines the two strings that will be required to fetch data from the various tables involved in this example. We then create a connection to SQL Server and then initialize a new data set. Since we require data from two tables in this case, we create two data adapters and initialize them with the command that we want to execute and the connection that is to be used. We now open the connection and then fill the data set with data from the two adapters. After that, note that we close the connection. It is important for us to close the connection early because at this stage, all the information required for us is present in the data set. Not releasing connections in an early fashion can compromise the performance of your application.
 
Having fetched the required data, the next step is the creation of a relation that will tie the data in the two tables together. We do this by creating a new DataRelation object and giving it a name and then specifying the source and destination columns that need to be linked. In our example, it is the au_id column that we need to link. Next, we specify that the nested property of the relation is true. This will ensure that the data set will contain data in a related and hierarchical fashion. If you were to save the data set as an XML file, you will notice title elements under each author element. Finally, we add the relation to the data set.
 
That's all there is to it! If you now want to access each author and print the titles associated with the author, we will need to create two loops as shown in the example. The first loop will iterate over all the rows in the authors table and the next loop will loop over all the titles associated with the current author. This looping is achieved by using the GetChildRows method of the author row. The parameter to this method is the name of the relation to use to get the related rows. Note that we print the contents of the two tables inside the innermost loop. The reason for this is, I did not want to print those authors for which there are no titles (and these kind of records do exist). If you want to see how this works, move the first Console.WriteLine that prints the au_id, au_lname and au_fname columns outside the innermost loop to the outer one.
 
As such, the code that we have written will produce the following output:
 
 
The results have been abridged to keep the output small, but you get the basic idea about how the results will look. Finally, we print a console message to the user and then close the application.
 
Well, that all there is to it. Relating data present inside a data set is a very common operation and will almost always be required in all applications. The advantage that we have in performing such operations in the application tier is that the results could have come from different data sources and yet you can create a relation between them. For example, if all your customers data reside in your Oracle database and all purchase order data for customers resided in your SQL Server database, you can still fetch data from these two data sources and then relate them inside the data set. Without this capability, it would have been practically impossible to do this operation and ensure data integrity and proper data values.
 

Home