SQLCLR Streaming Table Valued Functions (TVFs)
 
This article is based on the September CTP of SQL Server 2005 and Visual Studio 2005 RTM release.

Table valued functions are not new to SQL Server 2005. They existed in SQL Server 2000 also. A table valued function is one that returns its output as a table that can then be consumed as part of say, a SELECT statement. Here is an example of table valued functions as it exists today (this example was done in SQL Server 2005 in the AdventureWorks database).


IF (OBJECT_ID ('Commission') IS NOT NULL)
	DROP FUNCTION Commission
GO
CREATE FUNCTION Commission () RETURNS
	@result TABLE
	(
		UnitedStates	DECIMAL (5,2),
		Canada			DECIMAL (5,2),
		France			DECIMAL (5,2),
		Germany			DECIMAL (5,2),
		Australia		DECIMAL (5,2),
		GreatBritain	DECIMAL (5,2),
		None			DECIMAL (5,2)
	) AS
BEGIN
	INSERT INTO @result
		SELECT
			5.0, 3.0, 3.0, 2.0, 2.5, 4.5, 2.0
	RETURN
END
GO

The idea behind this table valued function is very simple. I basically want to simulate a table of constant values with only one row. For my case, I have defined the function called Commission to return a table that contains the various commission percentages for different countries. Once, I have defined this table valued function, I can now use it in a query like this:


SELECT
	s.SalesPersonID, s.TerritoryID, s.SalesYTD, t.CountryRegionCode,
	Commission = 
		CASE t.CountryRegionCode
			WHEN 'US' THEN c.UnitedStates
			WHEN 'CA' THEN c.Canada
			WHEN 'FR' THEN c.France
			WHEN 'DE' THEN c.Germany
			WHEN 'AU' THEN c.Australia
			WHEN 'GB' THEN c.GreatBritain
			ELSE c.None
		END
FROM
	Sales.SalesPerson s
	LEFT OUTER JOIN Sales.SalesTerritory t ON s.TerritoryID = t.TerritoryID
	CROSS JOIN (SELECT * FROM Commission()) c

In the above example, we iterate through all the SalesPerson records and for each sales person, based on the territory that they are assigned to, we associate the new commission value from the table valued function that we defined earlier. Its a very simple example, but you get the point of the usage of table-valued functions.

We all know that with the integration of the CLR with SQL Server, you can now write many T-SQL objects using CLR languages. This also means that you can write functions (scalar and table-valued functions (hereinafter called TVF)) in any CLR language. However, there is an important distinction between a T-SQL TVF and a CLR TVF. T-SQL TVFs materialize the results into a temporary table and then return the results back to the caller. This means that the caller has to wait until the entire table has been materialized. The CLR TVF on the other hand, provides a streaming alternative. Here, the caller can start receiving the results as soon as they are available (and keep pooling for more). This means that when the TVF has to return a large number of rows, a CLR implementation of the TVF may be beneficial, since the results are immediately available.

In this article, we are going to understand how to write streaming TVFs using the CLR features. We will implement some interesting examples to understand how they work and behave. By the time you finish reading this article, I hope to give you a full sense of the features so that you can start using this capability as you see fit.

Implementing a CLR TVF

To implement a streaming TVF in SQLCLR, the number of steps required is quite minimal. The following is a broad list of what you will have to do:

  • Implement a class that implements the IEnumerable interface.
  • Typically, the class that implements the IEnumerable interface, will also have a class that implements the IEnumerator interface to provide the iteration aspects of the objects.
  • Mark-up the functions with suitable attributes that helps the SQL Server query optimizer to call into the various methods.

Looks pretty simple, right? It is. But before we actually implement an example in SQL Server 2005, let us first understand what it takes to implement classes that support these interfaces. Once we understand this model, mapping it to SQL Server's equivalent is very simple.

Implementing a regular .NET Enumerable Object

Ok, let us first define the example problem that we are trying to implement. My 1 year old son has a full box of toys. The box contains many things apart from toys too and once in a day, he empties the box just for fun! The way he proceeds is that he pulls out each toy out and leaves it on the ground (you can imagine who gets to put it back :-)). This presents a very simple problem that can be implemented using the enumeration interfaces. We will simulate a toy box, fill it with toys and implement an enumeration function that can be used to remove toys from the box. Here is the .NET class code for the same:


using System;
using System.Collections.Generic;
using System.Collections;
using System.Text;

// This class indicates a collection of toys in a bin that can
// be enumerated using a custom enumerator
class ToysCollection : IEnumerable
{
    // This holds the collection of all the toys
    private ArrayList toys;

    // In the default constructor, we create a toys holder
    // with some default capacity
    public ToysCollection() { toys = new ArrayList(10); }

    // This method allows the storage of anything of
    // interest into the toy bin
    public void Store(object someThing) { toys.Add(someThing); }

    // This method returns the enumerator object that allows us
    // to inspect what is there in the toy box
    public IEnumerator GetEnumerator() { return new ToysEnumerator(this); }

    // This class implements the enumeration of the toy box
    private class ToysEnumerator : IEnumerator
    {
        ToysCollection myToys;
        object currentToy;

        // Make locak copies of whatever we need
        public ToysEnumerator (ToysCollection toys)
        {
            this.myToys = toys;
            Reset();
        }

        // Returns the current identified toy in the box
        public object Current { get { return currentToy; } }

        // This method allows us to iterate in the toy box and
        // return the top most toy that was found
        public bool MoveNext()
        {
            int totalToys = myToys.toys.Count;

            if (totalToys > 0)
            {
                currentToy = (object) myToys.toys[totalToys-1];
                myToys.toys.RemoveAt(totalToys - 1);
                return true;
            }
            else
                return false;
        }

        // This method is used to initialize the starting point. Since
        // we rely on build in properties to help us navigate, we just
        // have an empty implementation
        public void Reset() { ; }
    }
}

The code is quite simple. We create a class called ToysCollection that implements the IEnumerable interface. Classes that implement this interface, must have a method called GetEnumerator that returns an object that implements the IEnumerator interface. To implement the IEnumerator interface, we have another class (declared as private) called ToysEnumerator. Any class that implements IEnumerator, has to provide implementation for 3 methods: Current, MoveNext and Reset. The rest of the code is quite self-explanatory from the in-code comments. We just create an instance of an ArrayList to hold the toys and the MoveNext method simply fetches the toys off the list (much like a stack). Let us now see how we will use this class. The following are the steps that we will need to implement:

  1. Create an instance of the ToysCollection class
  2. Call the GetEnumerator method and store the reference into an IEnumerator object
  3. Implement a while loop that loops over the collection using the MoveNext method

Here is the code that implements the above steps to examine our toy box:


class Program
{
    static void Main(string[] args)
    {
        ToysCollection mySonsToys;

        // Add some toys to the collection
        mySonsToys = new ToysCollection();
        mySonsToys.Store("Car");
        mySonsToys.Store("Turtle");
        mySonsToys.Store("Caterpillar");
        mySonsToys.Store("Bunny");

        // Get the enumeration object and print out the collection
        IEnumerator t = mySonsToys.GetEnumerator();
        while (t.MoveNext())
        {
            Console.WriteLine("{1}, {0}", t.Current, t.Current.GetType().ToString());
        }
        Console.ReadLine();
    }
}

When you run the above program, it prints all the contents in the toy box along with the .NET type of that object. Since we have implemented the collection of toys as an object, you can put anything into it and retrieve it back.

Now that we have implemented a simple example, let us see its correspondence with respect to SQL Server's CLR TVF. It turns out that the SQL Server query execution mechanism for CLR TVFs behaves the same way like our Main program. The query execution engine calls into the object that implements the IEnumerable interface and then consumes the results in an incremental fashion. This model makes it very easy for you to implement streaming TVFs that can then be consumed in other queries.

Requirements for a CLR TVF

To implement a function in the CLR that can act as a TVF, we need to follow some conventions in our coding. There are two things that we need to do:

  • Implement a function that will act as an entry point for SQL Server to call. This function will return an instance of the object that implements the IEnumerable interface.
    • The function defined above will have to be decorated with the [Microsoft.SqlServer.Server.SqlFunction (FillRowMethodName=methodName, TableDefinition=tableDefinition)] attribute. This attribute has two parameters that we need to fill for implementing a TVF. The FillRowMethodName indicates a function that SQL Server will call to decode the values returned from the enumeration. This is required because SQL Server has no understanding of what the enumeration will return. The TableDefinition parameter indicates the structure of the table that represents the return value from the function. This is similar to mentioning the TABLE definition when we create a regular table valued function.
  • Implement a function that will help decode the values returned from the enumeration. The function will take in as parameter an object and returns a series of out parameters that indicate the values in each column of the table defined in the attribute mentioned above.

Now that we have seen the basics of how to implement a SQLCLR TVF, let us now build a sample to implement the above requirements. The example that we are going to build will enumerate the contents of a text file line-by-line. Thus, you would be able to do a SELECT from our function passing in the file-name to read and the output would be each line of the file as a separate row. Along with this, we also print the number of characters in each line and number of words in each line (a word for us is any sequence of characters separated by a blank space). Before, we actually see the code, let us create a database project in Visual Studion 2005. To do this:

  1. Choose File > New > Project
  2. In the Project Types pane, choose the Visual C# tree and choose the Database node
  3. Give a name for the project called ReadFile
  4. If you asked to select a database connection, choose the required database connection (or create a new one). This is where the assembly will be registered
  5. Once the project has been created, right-click the project and choose Add > User Defined Function
  6. In the resulting dialog box, give a name for the function as readFile.cs
  7. Visual Studio will now create a skeleton code for you. Replace the entire code block with the snippet below:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Collections;
using System.IO;

public partial class UserDefinedFunctions
{
    // This class implements the reading of a text file and
    // enumerating its contents
    public class FileReader : IEnumerable
    {
        // Represents the full path to the file
        string filePath;

        // The constructor that takes as argument the file path and
        // name to read
        public FileReader(SqlString filePath)
        {
            this.filePath = filePath.Value;
        }

        // Called when the contents of the file have to be enumerated
        // This method just returns another object that implements the
        // enumeration
        public IEnumerator GetEnumerator()
        {
            return new FileEnumerator(this);
        }

        // This class implements the actual enumeration of the file
        private class FileEnumerator : IEnumerator
        {
            FileReader reader;
            StreamReader sr;
            string line;

            // In the constructor, we just store a reference to the caller
            // and then initialize the reading
            public FileEnumerator(FileReader reader)
            {
                this.reader = reader;
                Reset();
            }

            // Return the current line that was read from the file
            public object Current
            {
                get { return line; }
            }

            // This method is called to iterate through the file. Each
            // line is read in succession
            public bool MoveNext()
            {
                line = sr.ReadLine();
                if (line != null)
                    return true;
                else
                {
                    // When there are no more lines, we can close the
                    // stream and return failure
                    sr.Close();
                    return false;
                }
            }

            // Initialize the reader and open the stream
            public void Reset()
            {
                if (File.Exists(this.reader.filePath))
                    sr = new StreamReader(this.reader.filePath);
                else
                    throw new Exception(String.Format("File {0} Does not exist", this.reader.filePath));

            }
        }
    }

    // This function is called by SQL Server to inialize the TVF. This function
    // simply returns a reference to our FileReader class
    [Microsoft.SqlServer.Server.SqlFunction (
            FillRowMethodName="readRow",
            TableDefinition="Line NVARCHAR(MAX), NumCharacters INT, NumWords INT"
        )]
    public static IEnumerable readFile(SqlString filePath)
    {
        return new FileReader(filePath);
    }

    // This method is called when SQL Server needs to translate and understand what
    // the enumeration returned. Here we do some processing and return the various
    // result columns
    public static void readRow(object fileLine, out SqlString theLine,
        out SqlInt32 numCharacters, out SqlInt32 numWords)
    {
        string line = (string)fileLine;

        theLine = line;
        numCharacters = line.Length;
        numWords = line.Split(' ').Length;
    }
};

The code is quite similar to our enumeration example shown earlier. We first define a class called FileReader that is responsible for enumerating a file. This class implements the IEnumerable interface. In the GetEnumerator method of this class, we just return a new instance of the FileEnumerator class. The FileEnumerator class implements the IEnumerator interface. In the Reset method, we open the stream using the provided file name and in the MoveNext method, we just keep reading lines from the file. The Current property just returns the line that was read. Note that wherever applicable, we have used SQL Server Datatypes (as indicated by the namespace System.Data.SqlTypes) so that SQL Server can correctly pass the values around. To get the .NET underlying data type from a SQL Server type, we just refer to the Value property of the SQL Server type. The Main portion from our earlier example, is now substituted by two new functions called readFile and readRow. The readFile is the entry point for SQL Server and this function is annotated by the two attributes that we mentioned earlier (note their definition). The readRow function is called by SQL Server to understand the enumerated value and in this function, we return the line content, its character count and word count through the various declared out parameters.

Once you are done with understanding the code, you can now compile the code. Before compiling, we need to set the PERMISSION_SET for the assembly. For this, right-click the project, choose Properties and in the property page, choose the Database node and set the value of the Permission Level combo to External. This informs the CLR hosting layer that this code does some external access to, for example, the file system. Once this done, you can build the project and if there is nothing wrong, you should see the "Build Succeeded" message in the status bar. You can then right-click the project and choose Deploy. This will deploy the assembly into SQL Server (into the database that you originally selected) and if there were no errors, you will see the "Deploy Succeeded" message in the status bar. At this stage, you are all set! Let us now test the code with a simple query like this:


SELECT * FROM readFile
	('C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\CONFIG\machine.config')

Basically, I'm reading the contents of my machine.config file. If there were no hitches during compilation, you should see the output as shown in the following figure (results abridged)

Output result of the TVF

Pretty cool, was'nt it? Did you also notice that it was quite fast as it read this file? This is because, the results are shown to you as soon as they are there. You can try around with different file sizes and see for yourself. Having the two extra columns helps to write queries like: Show me all lines of the file that have less than 20 characters in a line. Here is how we would write this:


SELECT * FROM readFile
	('C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\CONFIG\machine.config')
	WHERE NumCharacters < 20

This is great flexibility! I leave it to your imagination to improvise on this example and build other useful samples to test this great feature. For example, you can write code that would enumerate the event log.


Note that there are certain classes in the .NET Framework that already implement IEnumerable. If you want to use such objects as the output of a TVF, you can just implement just the two functions and return the object of interest in the first function. You would just mention how you want the results to be parsed. For example, if you want to implement a TVF for reading the event log, the EventLog class already has a GetEnumerator class, thus we can just open the log of interest and return the collection of entries. We would then just indicate how to parse the EventLogEntry object in the second function so that SQL Server can display the results. I will probably post this and a few other possibilities in my blog.

Ok, that brings me to the end of this article on how to use the new CLR streaming TVFs to build some really cool stuff. There are a lots of use cases that can be solved by this new mechansim which earlier required some lengthly and complex coding in SQL Server 2000. Do send me your feedback and comments to srisamp@gmail.com.

Home