| The PIVOT and UNPIVOT Operators Let's face it—users usually want to see data in tabular format, which is a bit of a challenge given that data in SQL Server is most often stored in a highly relational form. PIVOT is a T-SQL operator that you can specify in your FROM clause to rotate rows into columns and create a traditional crosstab query. Using PIVOT is easy. In your SELECT statement, you specify the values you want to pivot on. The following example in the AdventureWorks2008 database uses the order years (calculated using the DatePart function) as the columns. The FROM clause looks normal except for the PIVOT statement. This statement creates the value you want to show in the rows of the newly created columns. This example uses the aggregate SUM of TotalDue (a calculated field in the FROM clause). Then we use the FOR operator to list the values we want to pivot on in the OrderYear column. The example is shown in Listing 2-11. Listing 2-11. Creating tabular results with the PIVOT operator | SELECT CustomerID, [2001] AS Y2001, [2002] AS Y2002, [2003] AS Y2003, [2004] AS Y2004 FROM ( SELECT CustomerID, DATEPART(yyyy, OrderDate) AS OrderYear, TotalDue FROM Sales.SalesOrderHeader ) AS piv PIVOT ( SUM(TotalDue) FOR OrderYear IN([2001], [2002], [2003], [2004]) ) AS child ORDER BY CustomerID | Here are the results: CustomerID Y2001 Y2002 Y2003 Y2004 ----------- ---------- ----------- ------------ ------------ 1 40732.6067 72366.1284 NULL NULL 2 NULL1 5653.6715 12118.0275 4962.2705 3 39752.8421 168393.7021 219434.4265 51925.3549 4 NULL 263025.3113 373484.299 143525.6018 5 NULL 33370.6901 60206.9999 20641.1106 6 NULL NULL 668.4861 2979.3473 7 NULL 6651.036 3718.7804 NULL 8 NULL NULL 19439.2466 10900.0347 9 NULL 320.6283 11401.5975 5282.8652 10 NULL 96701.7401 291472.2172 204525.9634 ... That's all there is to it. Of course, this example is simplified to show you the concept; other, more sophisticated, aggregates are possible, and you can even use CTEs in the FROM clause. In any case, using PIVOT is simple. Using UNPIVOT You can use the UNPIVOT operator to normalize data that is already pivoted. For example, suppose you obtain pivoted data that shows, for each vendor, the number of orders placed by each employee. The code in Listing 2-12 creates such a table. Listing 2-12. Example table containing pivoted data | CREATE TABLE VendorEmployee (VendorID int, Emp1Orders int, Emp2Orders int, Emp3Orders int, Emp4Orders int, Emp5Orders int) GO INSERT INTO VendorEmployee VALUES(1, 4, 3, 5, 4, 4) INSERT INTO VendorEmployee VALUES(2, 4, 1, 5, 5, 5) INSERT INTO VendorEmployee VALUES(3, 4, 3, 5, 4, 4) INSERT INTO VendorEmployee VALUES(4, 4, 2, 5, 4, 4) INSERT INTO VendorEmployee VALUES(5, 5, 1, 5, 5, 5) | Our table looks like this: VendorID Emp1Orders Emp2Orders Emp3Orders Emp4Orders Emp5Orders ----------- ----------- ----------- ----------- ----------- ----------- 1 4 3 5 4 4 2 4 1 5 5 5 3 4 3 5 4 4 4 4 2 5 4 4 5 5 1 5 5 5 You might want to unpivot the data to display columns for vendor ID, employee, and number of orders. Listing 2-13 shows how to use the UNPIVOT operator to achieve this goal. Listing 2-13. Using the UNPIVOT operator | SELECT VendorId, Employee, Orders AS NumberOfOrders FROM (SELECT VendorId, Emp1Orders, Emp2Orders, Emp3Orders, Emp4Orders, Emp5Orders FROM VendorEmployee ) AS p UNPIVOT ( Orders FOR Employee IN (Emp1Orders, Emp2Orders, Emp3Orders, Emp4Orders, Emp5Orders) ) AS unpvt | Here are the results: VendorID Employee NumberOfOrders --------- ------------ ---------------------- 1 Emp1Orders 4 1 Emp2Orders 3 1 Emp3Orders 5 1 Emp4Orders 4 1 Emp5Orders 4 2 Emp1Orders 4 ...
Dynamically Pivoting Columns The problem with PIVOT is the same problem with CASE and other methods: you have to specify the columns. Consider the code in Listing 2-14. Listing 2-14. Statically driven PIVOT | SELECT * FROM (SELECT CustomerID, YEAR(OrderDate) AS OrderYear, TotalDue FROM Sales.SalesOrderHeader) AS header PIVOT ( SUM(TotalDue) FOR orderyear IN([2002],[2003],[2004]) ) AS piv | The results show us a nice crosstab query with the years displayed as columns: CustomerID 2002 2003 2004 ----------- ---------- ---------- ----------- 14324 NULL 2264.2536 3394.9247 22814 NULL 5.514 NULL 11407 NULL 59.659 NULL 28387 NULL NULL 645.2869 19897 NULL NULL 659.6408 15675 2699.9018 2682.9953 2580.1529 24165 NULL 2699.9018 666.8565 ... Because this data goes only up to 2004, what happens when you add 2005 to the data? Do you want to go into all your queries and add the current year to the IN clause? We can accommodate new years in the data by dynamically building the IN clause and then pro-grammatically writing the entire SQL statement. Once you have dynamically written the SQL statement, you can execute it using sp_executesql, as shown in Listing 2-15. Since all we have to do is generate dynamically the IN clause, creating a dynamic PIVOT in SQL Server is much easier than creating a dynamic CASE statement. The results are exactly the same as those shown following Listing 2-14, except that as new yearly data is added to the table, the query dynamically adds the column for it. Remember that your reporting engine will most likely not accommodate the new dynamic columns, but data-bound controls will. Listing 2-15. Dynamically driven PIVOT | DECLARE @tblOrderDate AS TABLE(y int NOT NULL PRIMARY KEY) INSERT INTO @tblOrderDate SELECT DISTINCT YEAR(OrderDate) FROM Sales.SalesOrderHeader DECLARE @cols AS nvarchar(max) DECLARE @years AS int SET @years = (SELECT MIN(y) FROM @tblOrderDate) SET @cols = N'' WHILE @years IS NOT NULL BEGIN SET @cols = @cols + N',[' + CAST(@years AS nvarchar(max)) + N']' SET @years = (SELECT MIN(y) FROM @tblOrderDate WHERE y > @years) END SET @cols = SUBSTRING(@cols, 2, LEN(@cols)) DECLARE @sql AS nvarchar(max) SET @sql = ' SELECT * FROM (SELECT CustomerId, YEAR(OrderDate) AS OrderYear, TotalDue FROM Sales.SalesOrderHeader ) AS a PIVOT ( SUM(TotalDue) FOR OrderYear IN(' + @cols + N') ) AS b ' PRINT @sql -- for debugging EXEC sp_executesql @sql | You can accomplish the same results using the newer CTE syntax instead of using the table variable, as shown in Listing 2-16. Listing 2-16. Dynamically driven PIVOT using a CTE | DECLARE @cols AS nvarchar(MAX) WITH YearsCTE AS (SELECT DISTINCT YEAR(OrderDate) as [Year] FROM Sales.SalesOrderHeader) SELECT @cols = ISNULL(@cols + ',[', '[') + CAST([YEAR] AS nvarchar(10)) + ']' FROM YearsCTE ORDER BY [YEAR] -- Construct the full T-SQL statement and execute it dynamically. DECLARE @sql AS nvarchar(MAX) SET @sql = ' SELECT * FROM (SELECT CustomerId, YEAR(OrderDate) AS OrderYear, TotalDue FROM Sales.SalesOrderHeader ) AS a PIVOT ( SUM(TotalDue) FOR OrderYear IN(' + @cols + N') ) AS b ' PRINT @sql -- for debugging EXEC sp_executesql @sql | The APPLY Operator APPLY is an operator that you specify in the FROM clause of a query. It enables you to invoke a table-valued function (TVF) for each row of an outer table. The flexibility of APPLY is evident when you use the outer table's columns as your function's arguments. The APPLY operator has two forms: CROSS APPLY and OUTER APPLY. CROSS APPLY doesn't return the outer table's row if the TVF returns an empty set for it; the OUTER APPLY returns a row with NULL values instead of the function's columns. To see how APPLY works, we'll first create a TVF that returns a table. Listing 2-17 shows a simple function that returns as a table the top n rows for a customer from the SalesOrderHeader table. Listing 2-17. Returning a table | CREATE FUNCTION fnGetCustomerOrders(@CustomerID int, @TopRecords bigint) RETURNS TABLE AS RETURN SELECT TOP (@TopRecords) * FROM Sales.SalesOrderHeader WHERE CustomerID = @CustomerID ORDER BY OrderDate DESC | After creating the fnGetCustomerOrders TVF, we call it from the query, as shown in Listing 2-18. Listing 2-18. Executing a query with APPLY | SELECT * FROM Sales.Customer cust CROSS APPLY fnGetCustomerOrders(CustomerID, 100) | This query returns all the records from the Customers table and then, as additional fields, the records from the Orders table (by way of the fnGetCustomerOrders function) that match for the customer ID because that's what's being passed in dynamically as the first argument to fnGetCustomerOrders. Because we passed the value 100 for the second parameter, rows for up to the first 100 orders per customer are generated and returned by this query. |
No comments:
Post a Comment