|
SQLCLR Streaming Table Valued Functions (TVFs)
|
|
|
This article is based on the
September CTP of SQL Server 2005 and Visual Studio 2005 RTM release.
Table valued functions are not new to SQL Server 2005. They existed in SQL Server 2000 also. A table valued function is one that returns its output as a table that can then be consumed as part of say, a SELECT statement. Here is an example of table valued functions as it exists today (this example was done in SQL Server 2005 in the AdventureWorks database).
The idea behind this table valued function is very simple. I basically want to simulate a table of constant values with only one row. For my case, I have defined the function called Commission to return a table that contains the various commission percentages for different countries. Once, I have defined this table valued function, I can now use it in a query like this:
In the above example, we iterate through all the SalesPerson records and for each sales person, based on the territory that they are assigned to, we associate the new commission value from the table valued function that we defined earlier. Its a very simple example, but you get the point of the usage of table-valued functions. We all know that with the integration of the CLR with SQL Server, you can now write many T-SQL objects using CLR languages. This also means that you can write functions (scalar and table-valued functions (hereinafter called TVF)) in any CLR language. However, there is an important distinction between a T-SQL TVF and a CLR TVF. T-SQL TVFs materialize the results into a temporary table and then return the results back to the caller. This means that the caller has to wait until the entire table has been materialized. The CLR TVF on the other hand, provides a streaming alternative. Here, the caller can start receiving the results as soon as they are available (and keep pooling for more). This means that when the TVF has to return a large number of rows, a CLR implementation of the TVF may be beneficial, since the results are immediately available. In this article, we are going to understand how to write streaming TVFs using the CLR features. We will implement some interesting examples to understand how they work and behave. By the time you finish reading this article, I hope to give you a full sense of the features so that you can start using this capability as you see fit. Implementing a CLR TVF To implement a streaming TVF in SQLCLR, the number of steps required is quite minimal. The following is a broad list of what you will have to do:
Looks pretty simple, right? It is. But before we actually implement an example in SQL Server 2005, let us first understand what it takes to implement classes that support these interfaces. Once we understand this model, mapping it to SQL Server's equivalent is very simple. Implementing a regular .NET Enumerable Object Ok, let us first define the example problem that we are trying to implement. My 1 year old son has a full box of toys. The box contains many things apart from toys too and once in a day, he empties the box just for fun! The way he proceeds is that he pulls out each toy out and leaves it on the ground (you can imagine who gets to put it back :-)). This presents a very simple problem that can be implemented using the enumeration interfaces. We will simulate a toy box, fill it with toys and implement an enumeration function that can be used to remove toys from the box. Here is the .NET class code for the same:
The code is quite simple. We create a class called ToysCollection that implements the IEnumerable interface. Classes that implement this interface, must have a method called GetEnumerator that returns an object that implements the IEnumerator interface. To implement the IEnumerator interface, we have another class (declared as private) called ToysEnumerator. Any class that implements IEnumerator, has to provide implementation for 3 methods: Current, MoveNext and Reset. The rest of the code is quite self-explanatory from the in-code comments. We just create an instance of an ArrayList to hold the toys and the MoveNext method simply fetches the toys off the list (much like a stack). Let us now see how we will use this class. The following are the steps that we will need to implement:
Here is the code that implements the above steps to examine our toy box:
When you run the above program, it prints all the contents in the toy box along with the .NET type of that object. Since we have implemented the collection of toys as an object, you can put anything into it and retrieve it back. Now that we have implemented a simple example, let us see its correspondence with respect to SQL Server's CLR TVF. It turns out that the SQL Server query execution mechanism for CLR TVFs behaves the same way like our Main program. The query execution engine calls into the object that implements the IEnumerable interface and then consumes the results in an incremental fashion. This model makes it very easy for you to implement streaming TVFs that can then be consumed in other queries. Requirements for a CLR TVF To implement a function in the CLR that can act as a TVF, we need to follow some conventions in our coding. There are two things that we need to do:
Now that we have seen the basics of how to implement a SQLCLR TVF, let us now build a sample to implement the above requirements. The example that we are going to build will enumerate the contents of a text file line-by-line. Thus, you would be able to do a SELECT from our function passing in the file-name to read and the output would be each line of the file as a separate row. Along with this, we also print the number of characters in each line and number of words in each line (a word for us is any sequence of characters separated by a blank space). Before, we actually see the code, let us create a database project in Visual Studion 2005. To do this:
The code is quite similar to our enumeration example shown earlier. We first define a class called FileReader that is responsible for enumerating a file. This class implements the IEnumerable interface. In the GetEnumerator method of this class, we just return a new instance of the FileEnumerator class. The FileEnumerator class implements the IEnumerator interface. In the Reset method, we open the stream using the provided file name and in the MoveNext method, we just keep reading lines from the file. The Current property just returns the line that was read. Note that wherever applicable, we have used SQL Server Datatypes (as indicated by the namespace System.Data.SqlTypes) so that SQL Server can correctly pass the values around. To get the .NET underlying data type from a SQL Server type, we just refer to the Value property of the SQL Server type. The Main portion from our earlier example, is now substituted by two new functions called readFile and readRow. The readFile is the entry point for SQL Server and this function is annotated by the two attributes that we mentioned earlier (note their definition). The readRow function is called by SQL Server to understand the enumerated value and in this function, we return the line content, its character count and word count through the various declared out parameters. Once you are done with understanding the code, you can now compile the code. Before compiling, we need to set the PERMISSION_SET for the assembly. For this, right-click the project, choose Properties and in the property page, choose the Database node and set the value of the Permission Level combo to External. This informs the CLR hosting layer that this code does some external access to, for example, the file system. Once this done, you can build the project and if there is nothing wrong, you should see the "Build Succeeded" message in the status bar. You can then right-click the project and choose Deploy. This will deploy the assembly into SQL Server (into the database that you originally selected) and if there were no errors, you will see the "Deploy Succeeded" message in the status bar. At this stage, you are all set! Let us now test the code with a simple query like this:
Basically, I'm reading the contents of my machine.config file. If there were no hitches during compilation, you should see the output as shown in the following figure (results abridged)
Pretty cool, was'nt it? Did you also notice that it was quite fast as it read this file? This is because, the results are shown to you as soon as they are there. You can try around with different file sizes and see for yourself. Having the two extra columns helps to write queries like: Show me all lines of the file that have less than 20 characters in a line. Here is how we would write this:
This is great flexibility! I leave it to your imagination to improvise on this example and build other useful samples to test this great feature. For example, you can write code that would enumerate the event log.
Ok, that brings me to the end of this article on how to use the new CLR streaming TVFs to build some really cool stuff. There are a lots of use cases that can be solved by this new mechansim which earlier required some lengthly and complex coding in SQL Server 2000. Do send me your feedback and comments to srisamp@gmail.com. |
| Home |