SQL Server 2005 User Defined Aggregates
 
This article is based on the pre-release version of SQL Server 2005. SQL Server 2005 is currently in its Beta-2 version. The features and capabilities discussed in this article may change in future versions.

Aggregates have always been part of SQL Server. For example, each one of us would have used functions like SUM, AVG, COUNT etc. These functions iterate over a set of rows (called a group) and then provide the output for each group. For example, the following statement:


SELECT
	Territory, SUM(Sales)
FROM
	TerritorySales
GROUP BY
	Territory

Provides the total sales in each territory. Note that the SUM function will iterate over each group of territories as specified in the GROUP BY clause. While the built aggregate functions are sufficient for most cases, sometimes there is a need to write some sort of a custom aggregate functions. For example, one of the often asked questions in the public newsgroups is to provide a comma-separated list of items. For example, if you have an PurchaseOrderHeader table and a PurchaseOrderDetail table, let us suppose that you need an output like the following:


OrderID	ItemName, ItemName, ItemName
OrderID	ItemName, ItemName

What we basically see as the requirement is, for each order we need a comma-separated list of the items in the order. How do you do it?? The simplest and most easy way is to write a CURSOR based solution that iterates over the list of orders and then accumulates the list of order line items into a string and then output the string each time the order changes. Here is a sample implementation of the same written off the AdventureWorks database:


SET NOCOUNT ON
-- Declare some scratch variables
DECLARE @itemString VARCHAR(8000)
DECLARE @orderID INT
DECLARE @itemName VARCHAR(10)
DECLARE @outputTable TABLE (OrderID INT, itemString VARCHAR(8000))

-- Declare a cursor that gets the order header records
DECLARE orderList CURSOR FOR
	SELECT PurchaseOrderID FROM Purchasing.PurchaseOrderHeader
		WHERE PurchaseOrderID < 10
		
-- Iterate through the orders
OPEN orderList
FETCH NEXT FROM orderList INTO @orderID
WHILE (@@FETCH_STATUS = 0)
BEGIN
	SET @itemString = ''

	-- Get the list of order items for the order that was fetched earlier
	DECLARE orderItems CURSOR FOR
		SELECT p.productNumber FROM Purchasing.PurchaseOrderDetail pd
			INNER JOIN Production.Product p ON pd.ProductID = p.ProductID
		WHERE pd.PurchaseOrderID = @orderID
		
	-- Iterate through the order items
	OPEN orderItems
	FETCH NEXT FROM orderItems INTO @itemName
	WHILE (@@FETCH_STATUS = 0)
	BEGIN
		SET @itemString = @itemString + @itemName + ', '
		FETCH NEXT FROM orderItems INTO @itemName
	END
	CLOSE orderItems
	DEALLOCATE orderItems

	-- When the group ends, insert a record into the temporary table
	INSERT INTO @outputTable VALUES (@orderID, @itemString)
	FETCH NEXT FROM orderList INTO @orderID
END
CLOSE orderList
DEALLOCATE orderList

-- Output the final table
SELECT * FROM @outputTable

Although the implementation works fine, the approach that we have taken is a structured programming approach and not a set-based approach (something that SQL Server is great at). Because of this structured programming approach, we need to think about issues like code maintenance, performance etc. Rather, would it not be nice if we could write the above as:


SELECT
	orderName, FormDelimitedString (itemName)
FROM
	orders o
	INNER JOIN orderItems oi ON o.orderID = oi.orderID
GROUP BY
	o.orderID

In the above statement, note that FormDelimitedString (itemName) is the name of a custom aggregation function. Where is its implementation and how does it look like?? Enter User Defined Aggregates! User Defined Aggregates (UDAs) are a great feature in SQL Server 2005 which allows you write custom accumulation function (like the above). With the user-defined aggregate functions made possible by the .NET Framework, you now only need to implement the code for the accumulation logic. The iteration is managed by the query processor, and any user-defined aggregates referenced by the query are automatically accumulated and returned with the query result set. This approach can be orders of magnitude faster than using cursors.

In this article, we will see how you can write UDAs using SQL Server 2005 and Visual Studio 2005.

Writing User Defined Aggregates

Writing a UDA is very simple and is broadly broken into the following steps:

  • Authoring the .NET class that implements the UDA contract
  • Registering the aggregate using CREATE AGGREGATE
  • Using the aggregate functions

We will now see how to implement each of the above steps. Note that if you are using Visual Studio 2005, step (2) is done automatically for you.

Step 1: Authoring the .NET Component

The first step is to author a .NET class that implements the contract required for UDAs. The contract for a UDA requires:

  • A SQLUserDefinedAggregate attribute
  • A public constructor
  • Set of instance methods for the aggregation

The SQLUserDefinedAggregate indicates to SQL Server that the .NET type conforms to the contract for a user-defined aggregate. There are two required properties on the SqlUserDefinedAggregate attribute that control the serialization format used. These are:

  • Format. This property indicates the serialization format for the type. The query processor uses serialization when it needs to propagate temporary results of an aggregation into work-tables.
  • MaxByteSize. This property indicates the maximum size in bytes needed to store the result of the aggregation. The maximum size is 8000. Note that this also means that your aggregated string cannot be greater than 8000 bytes in length.

Apart from these properties, a set of optional properties can also be specified as shown. These properties do not provide any automatic functionality to your program, but help the optimizer to choose a better plan. You will still need to implement specific behaviors if needed (for example, if you need to eliminate duplicates for your UDA, you will need to manage that).

  • IsInvariantToDuplicates. This property indicates whether the aggregate function is invariant to duplicate values.
  • IsInvariantToNulls. This property indicates whether the aggregate function is invariant to null values.
  • IsInvariantToOrder. This property indicates whether the aggregate function is invariant to order. Being invariant to order allows the optimizer to be flexible on choosing the execution plans.
  • IsNullIfEmpty. This property indicates whether the aggregate function will return NULL if no values have been accumulated.

After specifying the attribute, the next step is to provide the various aggregate instance methods. The following are the instance methods that need to be supported:

  • Init. This method is called when a new group is encountered. This method usually initializes some container (say a StringBuilder) to hold the accumulated results.
  • Accumulate. This method is called for each value in a group. The parameter to this method is the value in the group. Note that an appropriate data type must be specified.
  • Merge. This method is called when the result of one aggregation needs to be merged with the current instance.
  • Terminate. This method is called at the end of each group and is expected to return the result of the aggregation.

Let us now implement a user defined aggregate. Actually, we will implement two:

  • The first aggregate function is called RankOrder, which we will use to accumulate the quantity of items in the order and then rank the order
  • The second aggregate function is called FormDelimitedString, which we will use to form a comma-separated string of order items

To implement the order ranking UDA, start Visual Studio 2005 and then choose a new database project and name the project as SQLServerProjects as shown.

Once you have created the project the next step is to make sure that the project deploys to the correct database. Right-click the project and choose Properties and specify the database connection properties as shown:

Once you have done the above steps, right-click the project and choose Add New Item and in the resultant dialog box, choose Aggregate as shown:

For our example, we will name the aggregate as RankFunction. Note that as soon as you accept the dialog box, Visual Studio generates a template class code for you will all the relevant namespace included and sample code stub written out. For our RankOrder function, we will need to change the code as follows (changes are highlighted in bold):


Imports System
Imports System.Data.Sql
Imports System.Data.SqlTypes
Imports System.Runtime.InteropServices

<Serializable()> _
<StructLayout(LayoutKind.Sequential)> _
<SqlUserDefinedAggregate(Format.Native, IsInvariantToDuplicates:=False, IsInvarianttoNulls:=True, _
	IsInvarianttoOrder:=True, IsNullIfEmpty:=True)> _
Public Class RankOrder
    Private mOrderQuantity As SqlInt32

    Public Sub Init()
        ' Initialize the order count
        mOrderQuantity = 0
    End Sub

    Public Sub Accumulate(ByVal value As SqlInt32)
        ' For each order line item encountered, increment the order quantity
        If (value.IsNull) Then
            Return
        Else
            mOrderQuantity = mOrderQuantity + value
        End If
    End Sub

    Public Sub Merge(ByVal value As RankOrder)
        mOrderQuantity = mOrderQuantity + value.mOrderQuantity
    End Sub

    Public Function Terminate() As SqlInt32
        ' Rank the order here and return the result
        Dim nRank As Integer

        nRank = 0
        Select Case mOrderQuantity
            Case Is < 100
                nRank = 1
            Case Is < 200
                nRank = 2
            Case Is > 200
                nRank = 3
        End Select

        Return New SqlInt32(nRank)

    End Function
End Class

The logic is very simple. When the Init function is called, we initialize a variable called mOrderQuantity to 0 and then increment it by the value argument in the Accumulate function. Finally, in the Terminate function, we return an appropriate value based on the value accumulated. Note that the Accumulate function needs to represent the data type that it expects. The Terminate function also should return the appropriate function. Note that we have specified that the serialization format is Native and this requires us to specify the Layoutkind.Sequential attribute which is required to specify the layout of an object. For more information on these attributes refer to the books online.

That's it!! Its very simple!! You can now deploy the UDA into SQL Server and see how it works. For this, right-click the project and choose Deploy and if everything works well, you should see the Deploy Succeeded message in the status bar. It is interesting to see what Visual Studio is doing behind the scenes for deploying this project. You can use SQL Server Profiler to trace the calls and among the many calls that Visual Studio makes, you will see a line like the following:


CREATE AGGREGATE [RankOrder](@value int) RETURNS [int] EXTERNAL NAME [SQLServerProjects].[SQLServerProjects.RankOrder]

Note that the information that we specified in the Accumulate and Terminate functions are used to decide the input and output values of the aggregate. Once you have successfully deployed the aggregate function, you can use the following query to test the same (note that I'm using the tables in the new AdventureWorks database).


SELECT
	ph.PurchaseOrderID, [Rank] = dbo.RankOrder (pd.OrderQty)

FROM
	Purchasing.PurchaseOrderHeader ph
	INNER JOIN Purchasing.PurchaseOrderDetail pd ON ph.PurchaseOrderID = pd.PurchaseOrderID
	INNER JOIN Production.Product p ON pd.ProductID = p.ProductID
WHERE
	ph.PurchaseOrderID < 10
GROUP BY
	ph.PurchaseOrderID

When you execute the above query, you can see an output that consists of 1, 2, or 3 in the [Rank] column. Note that UDAs are invoked using the two part schema like: schema_name.udagg_name.

Let us see another example of an UDA. In this scenario, we will form a comma-separated list of items for an order and display the result. Follow the same steps as above to create a new UDA and name it FormDelimitedString. The following is the code for this UDA.


Imports System
Imports System.Data.Sql
Imports System.Data.SqlTypes
Imports System.Text
Imports System.Runtime.Serialization

<Serializable()> _
<SqlUserDefinedAggregate(Format.UserDefined, IsInvariantToDuplicates:=False, IsInvariantToNulls:=True, _
	IsInvariantToOrder:=False, IsNullIfEmpty:=True, MaxByteSize:=8000)> _
Public Class FormDelimitedString
    Implements IBinarySerialize

    ' Variable to hold the return result
    Private returnString As StringBuilder

    Public Sub Init()
        ' Initialize the string builder. This method will be called for each
        ' group that is accumulated
        returnString = New StringBuilder()
    End Sub

    Public Sub Accumulate(ByVal value As SqlString)
        ' Accumulate the value into the string builder. If the value is NULL
        'do not do anything. This method is called for each row of data
        If (value.IsNull) Then
            Return
        Else
            returnString.Append(value.Value).Append(", ")
        End If
    End Sub

    Public Sub Merge(ByVal value As FormDelimitedString)
        ' Merge the current instance with the other instance.
        returnString.Append(value.returnString)
    End Sub

    Public Function Terminate() As SqlString
        ' Return the result if the string has some content
        If (returnString.Length > 0) Then
            Return New SqlString(returnString.ToString(0, returnString.Length - 2))
        End If

        ' Return an empty string otherwise
        Return New SqlString("")
    End Function

    Public Sub Read(ByVal r As System.IO.BinaryReader) Implements System.Data.Sql.IBinarySerialize.Read
        returnString = New StringBuilder(r.ReadString())
    End Sub

    Public Sub Write(ByVal w As System.IO.BinaryWriter) Implements System.Data.Sql.IBinarySerialize.Write
        w.Write(returnString.ToString())

    End Sub
End Class

Some things that you notice different here are:

  • The serialization format is specified as UserDefined. This requires us to import the System.Runtime.Serialization namespace and also specify the MaxByteSize. The class implement the IBinarySerialize interface.
  • We implement two methods called Read and Write that (de)serializes the data when required.

Apart from this, the code is very similar to the earlier UDA and I encourage you to get a grasp of it without me taking you through the code :-). Deploy the above UDA as done earlier and you can test the aggregate with the following SQL statement.


SELECT
	ph.PurchaseOrderID, [Items] = dbo.FormDelimitedString (p.ProductNumber)

FROM
	Purchasing.PurchaseOrderHeader ph
	INNER JOIN Purchasing.PurchaseOrderDetail pd ON ph.PurchaseOrderID = pd.PurchaseOrderID
	INNER JOIN Production.Product p ON pd.ProductID = p.ProductID
WHERE
	ph.PurchaseOrderID < 10
GROUP BY
	ph.PurchaseOrderID

If you execute the above code, you will see the following output:

Well that was easy! You can create many other aggregate functions as your situation demands. However, before you start writing .NET classes for all types of aggregations, verify whether the existing SQL aggregates itself can be used for the same and do performance tests to ensure that you aggregates perform the way you want them too. Sometimes, it may be cheaper to do the aggregation in the application tier of your application and free resources in the database server. Apply caution always!!

That brings us to the end of this article and we explored the new feature of user defined aggregates in SQL Server 2005. Have fun!

Home