|
SQL Server 2005 User Defined Aggregates
|
|
This article is based on the
pre-release version of SQL Server 2005. SQL Server 2005 is currently in its
Beta-2 version. The features and capabilities discussed in this article may
change in future versions.
Aggregates have always been part of SQL Server. For example, each one of us would have used functions like SUM, AVG, COUNT etc. These functions iterate over a set of rows (called a group) and then provide the output for each group. For example, the following statement:
Provides the total sales in each territory. Note that the SUM function will iterate over each group of territories as specified in the GROUP BY clause. While the built aggregate functions are sufficient for most cases, sometimes there is a need to write some sort of a custom aggregate functions. For example, one of the often asked questions in the public newsgroups is to provide a comma-separated list of items. For example, if you have an PurchaseOrderHeader table and a PurchaseOrderDetail table, let us suppose that you need an output like the following:
What we basically see as the requirement is, for each order we need a comma-separated list of the items in the order. How do you do it?? The simplest and most easy way is to write a CURSOR based solution that iterates over the list of orders and then accumulates the list of order line items into a string and then output the string each time the order changes. Here is a sample implementation of the same written off the AdventureWorks database:
Although the implementation works fine, the approach that we have taken is a structured programming approach and not a set-based approach (something that SQL Server is great at). Because of this structured programming approach, we need to think about issues like code maintenance, performance etc. Rather, would it not be nice if we could write the above as:
In the above statement, note that FormDelimitedString (itemName) is the name of a custom aggregation function. Where is its implementation and how does it look like?? Enter User Defined Aggregates! User Defined Aggregates (UDAs) are a great feature in SQL Server 2005 which allows you write custom accumulation function (like the above). With the user-defined aggregate functions made possible by the .NET Framework, you now only need to implement the code for the accumulation logic. The iteration is managed by the query processor, and any user-defined aggregates referenced by the query are automatically accumulated and returned with the query result set. This approach can be orders of magnitude faster than using cursors. In this article, we will see how you can write UDAs using SQL Server 2005 and Visual Studio 2005. Writing User Defined Aggregates Writing a UDA is very simple and is broadly broken into the following steps:
We will now see how to implement each of the above steps. Note that if you are using Visual Studio 2005, step (2) is done automatically for you. Step 1: Authoring the .NET Component The first step is to author a .NET class that implements the contract required for UDAs. The contract for a UDA requires:
The SQLUserDefinedAggregate indicates to SQL Server that the .NET type conforms to the contract for a user-defined aggregate. There are two required properties on the SqlUserDefinedAggregate attribute that control the serialization format used. These are:
Apart from these properties, a set of optional properties can also be specified as shown. These properties do not provide any automatic functionality to your program, but help the optimizer to choose a better plan. You will still need to implement specific behaviors if needed (for example, if you need to eliminate duplicates for your UDA, you will need to manage that).
After specifying the attribute, the next step is to provide the various aggregate instance methods. The following are the instance methods that need to be supported:
Let us now implement a user defined aggregate. Actually, we will implement two:
To implement the order ranking UDA, start Visual Studio 2005 and then choose a new database project and name the project as SQLServerProjects as shown.
Once you have created the project the next step is to make sure that the project deploys to the correct database. Right-click the project and choose Properties and specify the database connection properties as shown:
Once you have done the above steps, right-click the project and choose Add New Item and in the resultant dialog box, choose Aggregate as shown:
For our example, we will name the aggregate as RankFunction. Note that as soon as you accept the dialog box, Visual Studio generates a template class code for you will all the relevant namespace included and sample code stub written out. For our RankOrder function, we will need to change the code as follows (changes are highlighted in bold):
The logic is very simple. When the Init function is called, we initialize a variable called mOrderQuantity to 0 and then increment it by the value argument in the Accumulate function. Finally, in the Terminate function, we return an appropriate value based on the value accumulated. Note that the Accumulate function needs to represent the data type that it expects. The Terminate function also should return the appropriate function. Note that we have specified that the serialization format is Native and this requires us to specify the Layoutkind.Sequential attribute which is required to specify the layout of an object. For more information on these attributes refer to the books online. That's it!! Its very simple!! You can now deploy the UDA into SQL Server and see how it works. For this, right-click the project and choose Deploy and if everything works well, you should see the Deploy Succeeded message in the status bar. It is interesting to see what Visual Studio is doing behind the scenes for deploying this project. You can use SQL Server Profiler to trace the calls and among the many calls that Visual Studio makes, you will see a line like the following:
Note that the information that we specified in the Accumulate and Terminate functions are used to decide the input and output values of the aggregate. Once you have successfully deployed the aggregate function, you can use the following query to test the same (note that I'm using the tables in the new AdventureWorks database).
When you execute the above query, you can see an output that consists of 1, 2, or 3 in the [Rank] column. Note that UDAs are invoked using the two part schema like: schema_name.udagg_name. Let us see another example of an UDA. In this scenario, we will form a comma-separated list of items for an order and display the result. Follow the same steps as above to create a new UDA and name it FormDelimitedString. The following is the code for this UDA.
Some things that you notice different here are:
Apart from this, the code is very similar to the earlier UDA and I encourage you to get a grasp of it without me taking you through the code :-). Deploy the above UDA as done earlier and you can test the aggregate with the following SQL statement.
If you execute the above code, you will see the following output:
Well that was easy! You can create many other aggregate functions as your situation demands. However, before you start writing .NET classes for all types of aggregations, verify whether the existing SQL aggregates itself can be used for the same and do performance tests to ensure that you aggregates perform the way you want them too. Sometimes, it may be cheaper to do the aggregation in the application tier of your application and free resources in the database server. Apply caution always!! That brings us to the end of this article and we explored the new feature of user defined aggregates in SQL Server 2005. Have fun! |
| Home |