Skip to main content

Formatting XML in SQL Server 2008

Since SQL SERVER 2005 it is possible to store XML data in database by using dedicated data type - XML. This type solved many programming problems such storing XML data as simple text which was very unpleasant to maintenance  and detecting errors in document structure. Now stroring XML data is very simple and requires only the creation of column in XML data type. But sometimes we may want to do reverse operation. I mean generating XML document or fragment from non XML columns in a single SELECT statement. Of course SQL Server meets our expectations and allow to transform any results of SELECT statement (which always returns table) to XML structure.

To start transforming selected results to XML format we must use FOR XML clause right after last statemnt in normal SELECT. For example instead of using:

Code Snippet
  1. SELECT ProductNumber,Name,ListPrice,Color
  2. FROM Production.Product


we should use:

Code Snippet
  1. USE [AdventureWorks]
  2. SELECT ProductNumber,Name,ListPrice,Color
  3. FROM Production.Product
  4. FOR XML


But if  we want run  this query server going to return an error 'Incorrect syntax near 'XML'. So now we realize that it is so easy - sorry :) We just tell SQL Server to prepare XML for us but we didn`t tell them how to do that. In SQL Server 2005 and above there is a short list of modes in this topic:

  • FOR XML RAW - generate a single <row> for each single row in result table returned by SELECT statement.
  • FOR XML AUTO - generate nesting structure of XML document based on structure of the SELECT statement. 
  • FOR XML EXPLICIT - allow to generate complex structure which we can control. This provides a lot of flexibility but does so at the cost of much more greater complexity and performance.
  • FOR XML PATH - as EXPLICIT its allow to customize output XML  but here we can do it much more easier - queries are less complex.
Ok, so now we are able to fix our last query so lets do this by using each of them:

Code Snippet
  1. USE [AdventureWorks]
  2. SELECT ProductNumber,Name,ListPrice,Color FROM Production.Product
  3. FOR XML RAW
  4.  
  5. <row ProductNumber="AR-5381" Name="Adjustable Race" ListPrice="0.0000" />
  6. <row ProductNumber="BA-8327" Name="Bearing Ball" ListPrice="0.0000" />
  7. <row ProductNumber="BE-2349" Name="BB Ball Bearing" ListPrice="0.0000" />
  8. ------------------------------------------------------------------------------
  9. SELECT ProductNumber,Name,ListPrice,Color FROM Production.Product
  10. FOR XML AUTO
  11.  
  12. <Production.Product ProductNumber="AR-5381" Name="Adjustable Race" ListPrice="0.0000" />
  13. <Production.Product ProductNumber="BA-8327" Name="Bearing Ball" ListPrice="0.0000" />
  14. <Production.Product ProductNumber="BE-2349" Name="BB Ball Bearing" ListPrice="0.0000" />
  15. ------------------------------------------------------------------------------
  16. SELECT ProductNumber,Name,ListPrice,Color FROM Production.Product
  17. FOR XML EXPLICIT
  18.  
  19. --Msg 6803, Level 16, State 1, Line 1
  20. --FOR XML EXPLICIT requires the first column to hold positive integers that represent XML tag IDs.
  21. ------------------------------------------------------------------------------
  22. SELECT ProductNumber,Name,ListPrice,Color FROM Production.Product
  23. FOR XML PATH
  24. <row>
  25.   <ProductNumber>AR-5381</ProductNumber>
  26.   <Name>Adjustable Race</Name>
  27.   <ListPrice>0.0000</ListPrice>
  28. </row>
  29. <row>
  30. ------------------------------------------------------------------------------


You can see that three FOR XML type returns different XML structure (RAW, AUTO, PATH) and one returns error message (EXPLICIT). To explain behavior of each type we need to look at them more precisely.

This mode returns a  single XML fragment (not whole document) which consist of many <row ... /> nodes with many attributes (each column or alias from base SELECT command related to one attribute in output XML) but this was only default behavior. We may tell SQL SERVER to generate sub-nodes insted of attributes by using FOR XML RAW, ELEMENTS clause.

Code Snippet
  1. SELECT ProductNumber,Name,ListPrice,Color FROM Production.Product
  2. FOR XML RAW, ELEMENTS
  3.  
  4. <row>
  5.   <ProductNumber>AR-5381</ProductNumber>
  6.   <Name>Adjustable Race</Name>
  7.   <ListPrice>0.0000</ListPrice>
  8. </row>
  9. <row>
  10.   <ProductNumber>BA-8327</ProductNumber>
  11.   <Name>Bearing Ball</Name>
  12.   <ListPrice>0.0000</ListPrice>
  13. </row>
We may use much more options in RAW mode and these are:

  • FOR XML RAW ('custom_name') - rename <row />  node to name given in brackets but we can using white space in node name
  • FOR XML RAW, ROOT - result XML has exactly one root node. By default root is named <root /> but this may be changes to custom name in the same way as in case <row/> (ex. FOR XML RAW ('node'), ROOT ('my_root'))
If You want to use RAW mode to obtain hierarchical structure this is not god idea....


This mode by default creates list on nodes without root, but unlike the RAW by default generates nodes named the same as in SELECT statement. Additionally allow to create nesting structure of XML document depends of the structure of the SELECT. The root,elements and custom root name rules are the  same as in RAW mode. To demonstrate how AUTO mode works lets focus on following example.

Code Snippet
  1. SELECT pm.Name, p.ProductNumber,p.Name,p.Color
  2. FROM Production.ProductModel as pm
  3. JOIN Production.Product as p on p.ProductModelID = pm.ProductModelID
  4. FOR XML AUTO,ROOT ('Inventory')
  5.  
  6. <Inventory>
  7.   <pm Name="HL Road Frame">
  8.     <p ProductNumber="FR-R92B-58" Name="HL Road Frame - Black, 58" />
  9.     <p ProductNumber="FR-R92R-58" Name="HL Road Frame - Red, 58" Color="Red" />
  10.   </pm>
  11.   <pm Name="Sport-100">
  12.     <p ProductNumber="HL-U509-R" Name="Sport-100 Helmet, Red" Color="Red" />
  13.     <p ProductNumber="HL-U509" Name="Sport-100 Helmet, Black" Color="Black" />
  14.   </pm>
  15.   <pm Name="Mountain Bike Socks">
  16.     <p ProductNumber="SO-B909-M" Name="Mountain Bike Socks, M" Color="White" />
  17.   </pm>
  18. </Inventory>

As we can see by using AUTO mode we are able to generate hierarchical XML  structure. But here we have several problem to fix. Firstly we not descriptive node names. This because AUTO mode uses name/aliases from original SELECT to name result nodes. To fix this just use more descriptive aliases for columns. Second things is sorting, because we don`t use any ordering nodes in result XML are non sorted and whole document looks  like unpleasant. Just use ORDER BY clause to fix problem. Last problem is that   first product  'FR-R92B-58' in their node haven`t attribute Color. Such thing may occur if value in database was set to NULL and SQL SERVER omit this value. Fix is very simple, just add special keyword XSINIL (we can use this only with ELEMENTS directive) at the end of FOR XML - this will add the schema to handle  xsi:nil="true". After  all changes result presents as follow.


Code Snippet
  1. SELECT [ProductModel].Name, [Product].ProductNumber,[Product].Name,[Product].Color
  2. FROM Production.ProductModel as [ProductModel]
  3. JOIN Production.Product as [Product] on [Product].ProductModelID = [ProductModel].ProductModelID
  4. ORDER BY [ProductModel].Name,[Product].Name
  5. FOR XML AUTO , ROOT('inventory'), ELEMENTS XSINIL
  6.  
  7. <inventory xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  8.   <ProductModel>
  9.     <Name>All-Purpose Bike Stand</Name>
  10.     <Product>
  11.       <ProductNumber>ST-1401</ProductNumber>
  12.       <Name>All-Purpose Bike Stand</Name>
  13.       <Color xsi:nil="true" />
  14.     </Product>
  15.   </ProductModel>
  16.   <ProductModel>
  17.     <Name>Bike Wash</Name>
  18.     <Product>
  19.       <ProductNumber>CL-9009</ProductNumber>
  20.       <Name>Bike Wash - Dissolver</Name>
  21.       <Color xsi:nil="true" />
  22.     </Product>
  23.   </ProductModel>
  24.   <ProductModel>
  25.     <Name>Cable Lock</Name>
  26.     <Product>
  27.       <ProductNumber>LO-C100</ProductNumber>
  28.       <Name>Cable Lock</Name>
  29.       <Color xsi:nil="true" />
  30.     </Product>
  31.   </ProductModel>
  32.   <ProductModel>
  33.     <Name>Chain</Name>
  34.     <Product>
  35.       <ProductNumber>CH-0234</ProductNumber>
  36.       <Name>Chain</Name>
  37.       <Color>Silver</Color>
  38.     </Product>
  39.   </ProductModel>
  40.   <ProductModel>
  41.     <Name>Classic Vest</Name>
  42.     <Product>
  43.       <ProductNumber>VE-C304-L</ProductNumber>
  44.       <Name>Classic Vest, L</Name>
  45.       <Color>Blue</Color>
  46.     </Product>
  47.   </ProductModel>
  48. </inventory>

EXPLICIT
This mode gives the greatest control over result XML document but requires work the most. It`s good in the situation where you want to have hierarchical and complex document. Because this mode requires special table structure returned by SELECT  statement good idea is firstly create appropriate table and then add for command to construct XML. But what 'special table structure' does mean? As we can see in picture 1. if we want to use EXPLICIT mode we have to create table that contains:

  • TAG column at first position: this column is used by SQL SERVER to detect nesting level of each row.
  • Parent: this column at second position tell SQL SERVER into which node each of children node is located. For top level nodes (not root node!) we pass NULL and for each next node in nesting hierarchy we increment this value at 1 point.
  • Rest of columns: each columns from position 3 in table  will be converted into node or attribute in result XML. But as You can see naming of those columns play the very important role and general pattern of naming looks NodeName!TagID!AttributeName for store value as attribute or  NodeName!TagID!AttributeName!Element for store value as element text - please notice that in this case 'Element' is constant.

Pic.1. 'Special table structure'.
Now when we have our 'special table' we can add FOR XML EXPLICIT clause to produce XML  document (additionally with the root element). Please remembaer abut sorting to produce more readable output.


Code Snippet
  1. SELECT 1 as TAG, NULL as Parent,
  2.              [ProductModel].ProductModelID as [Model!1!ID],
  3.              [ProductModel].Name as [Model!1!Name],
  4.              NULL as [Product!2!Number!Element],
  5.              NULL as [Product!2!Name],
  6.              NULL as [Product!2!Color]
  7. FROM Production.ProductModel as [ProductModel]
  8. JOIN Production.Product as [Product] on [Product].ProductModelID = [ProductModel].ProductModelID
  9. UNION ALL
  10. SELECT 2, 1,
  11.             [ProductModel].ProductModelID,
  12.              NULL,
  13.              [Product].ProductNumber,
  14.              [Product].Name,
  15.              [Product].Color
  16. FROM Production.ProductModel as [ProductModel]
  17. JOIN Production.Product as [Product] on [Product].ProductModelID = [ProductModel].ProductModelID
  18. ORDER BY [Model!1!ID]
  19. FOR XML EXPLICIT, ROOT ('Inventory')
  20.  
  21. <Inventory>
  22.   <Model ID="1" Name="Classic Vest">
  23.     <Product Name="Classic Vest, S" Color="Blue">
  24.       <Number>VE-C304-S</Number>
  25.     </Product>
  26.     <Product Name="Classic Vest, M" Color="Blue">
  27.       <Number>VE-C304-M</Number>
  28.     </Product>
  29.     <Product Name="Classic Vest, L" Color="Blue">
  30.       <Number>VE-C304-L</Number>
  31.     </Product>
  32.     <Product Name="AWC Logo Cap" Color="Multi">
  33.       <Number>CA-1098</Number>
  34.     </Product>
  35.   </Model>
  36.   <Model ID="2" Name="Cycling Cap" />
  37.   <Model ID="3" Name="Full-Finger Gloves">
  38.     <Product Name="Full-Finger Gloves, S" Color="Black">
  39.       <Number>GL-F110-S</Number>
  40.     </Product>
  41.     <Product Name="Full-Finger Gloves, M" Color="Black">
  42.       <Number>GL-F110-M</Number>
  43.     </Product>
  44.     <Product Name="Full-Finger Gloves, L" Color="Black">
  45.       <Number>GL-F110-L</Number>
  46.     </Product>
  47.     <Product Name="Half-Finger Gloves, S" Color="Black">
  48.       <Number>GL-H102-S</Number>
  49.     </Product>
  50.     <Product Name="Half-Finger Gloves, M" Color="Black">
  51.       <Number>GL-H102-M</Number>
  52.     </Product>
  53.     <Product Name="Half-Finger Gloves, L" Color="Black">
  54.       <Number>GL-H102-L</Number>
  55.     </Product>
  56.   </Model>
  57. </Inventory>


PATH
We can read in MSDN 'PATH mode is also a simpler way to introduce additional nesting for representing complex properties' and whole is true. PATH mode is much more easier way than EXPLICITY mode to create complex XML. This mode based on additional technology called XPATH which allow to query XML data but in this case reverse application was used. Its  means that XML document hierarchy is contructed at runtime based on column name from the underlying SELECT statement.

The easiest way  to introduce custom attributes to XML output is to add column aliases to each column in base SELECT statement (result as below). The problem with outputted XML is that we have a lot of <row /> node which is obviously not a professional, additionally we haven`t root node but as You can see by using @ before column alias we may create a attribute of current node ex. @Name. Next problem is that we don`t have any nested nodes because if we add some XPATX name like 'node/test' the result will be far from true.
Code Snippet
  1. SELECT         [ProductModel].ProductModelID as '@ID',
  2.              [ProductModel].Name as '@Name'
  3. FROM Production.ProductModel as [ProductModel]
  4. ORDER BY  [ProductModel].ProductModelID
  5. FOR XML PATH
  6.  
  7. <row ID="1" Name="Classic Vest" />
  8. <row ID="2" Name="Cycling Cap" />
  9. <row ID="3" Name="Full-Finger Gloves" />
  10. <row ID="4" Name="Half-Finger Gloves" />
  11. <row ID="5" Name="HL Mountain Frame" />
  12. <row ID="6" Name="HL Road Frame" />
  13. <row ID="7" Name="HL Touring Frame" />

To fix problems listed above we have to make some changes in base SELECT query. Let`s assume that we want to have root node named 'Models'. As You can read earlier in this article this problem can be solve by adding ROOT('Models') at the end of FOR XML PATH statement. Furthermore we want to remove odd <row /> nodes from output XML replacing them with 'Model' nodes - this can be done easily by adding  name after FOR XML PATH ex. FOR XML PATH ('Model'). Last this we want to do is to attach products list in each 'Model' node. This is not as intuitive as we can think and to solve this problem we must be familiar  with TYPE  SQL keyword. By using it we can create SQL type content (hyperlinked in Management Studio) in sub-query so we are able to return base data time in the same row as XML. But in this case we need to use TYPE keyword to create sub-nodes collection in our output document. Because we cant to add list of products to each product model (example below) we need to use sub-query inside in which we want to make a list of products where each products contains two attributes (name and color) and list (sub-nodes) of  product number. To achieve this we need to make a sub-query in our SELECT statement and inside it we have to make correlated query (in example. [Product].ProductModelID = [ProductModel].ProductModelID). Whole sub-query result need an alias which will be parent node for whole product list in output example. Our sub-query need to return XML element so to to this right we have to use TYPE keyword with FOR XML PATH mode.

Code Snippet
  1. SELECT         [ProductModel].ProductModelID as '@ID',
  2.              [ProductModel].Name as '@Name',
  3.              (SELECT
  4.                  [Product].Name  as '@Name',
  5.                  [Product].Color as '@Color',
  6.                  [Product].ProductNumber as 'Number'
  7.               FROM  Production.Product [Product]
  8.               WHERE [Product].ProductModelID = [ProductModel].ProductModelID
  9.               FOR XML PATH ('Product'), TYPE
  10.               ) as 'Products'
  11. FROM Production.ProductModel as [ProductModel]
  12.   ORDER BY[ProductModel].ProductModelID
  13. FOR XML PATH('Model'),ROOT('Models')
  14.  
  15. <Models>
  16.   <Model ID="1" Name="Classic Vest">
  17.     <Products>
  18.       <Product Name="Classic Vest, S" Color="Blue">
  19.         <Number>VE-C304-S</Number>
  20.       </Product>
  21.       <Product Name="Classic Vest, M" Color="Blue">
  22.         <Number>VE-C304-M</Number>
  23.       </Product>
  24.       <Product Name="Classic Vest, L" Color="Blue">
  25.         <Number>VE-C304-L</Number>
  26.       </Product>
  27.     </Products>
  28.   </Model>
  29.   <Model ID="2" Name="Cycling Cap">
  30.     <Products>
  31.       <Product Name="AWC Logo Cap" Color="Multi">
  32.         <Number>CA-1098</Number>
  33.       </Product>
  34.     </Products>
  35.   </Model>
  36. </Models>

After making appropriate adjustments our query is ready to deploy on server to returns pretty looks XML document.


Thank You...

Popular posts from this blog

Full-Text Search with PDF in Microsoft SQL Server

Last week I get interesting task to develop. The task was to search input text in PDF file stored in database as FileStream. The task implementation took me some time so I decided to share it with other developers. Here we are going to use SQL Server 2008 R2 (x64 Developers Edition), external driver from Adobe, Full-Text Search technology and FileStream technology.Because this sems a little bit comlicated let`s make this topic clear and do it step by step. 1) Enable FileStream - this part is pretty easy, just check wheter You already have enabled filestream on Your SQL Server instance - if no simply enable it as in the picture below. Picture 1. Enable filestream in SQL Server instance. 2) Create SQL table to store files  - mainly ther will be PDF file stored but some others is also be allright. Out table DocumentFile will be created in dbo schema and contain one column primary key with default value as sequential GUID. Important this is out table contains FileStream

Playing with a .NET types definition

In the last few days I spent some time trying to unify structure of one of the project I`m currently working on. Most of the changes were about changing variable types because it`s were not used right way. That is why in this post I want to share my observations and practices with you. First of all we need to understand what ' variable definition ' is and how it`s different from ' variable initialization '. This part should be pretty straightforward:   variable definition  consist of data type and variable name only <data_type> <variable_name> ; for example int i ; . It`s important to understand how variable definition affects your code because it behaves differently depends weather you work with value or reference types. In the case of value types after defining variable it always has default value and it`s never null value. However after defined reference type variable without initializing it has null value by default. variable initialization  is

Persisting Enum in database with Entity Framework

Problem statement We all want to write clean code and follow best coding practices. This all engineers 'North Star' goal which in many cases can not be easily achievable because of many potential difficulties with converting our ideas/good practices into working solutions.  One of an example I recently came across was about using ASP.NET Core and Entity Framework 5 to store Enum values in a relational database (like Azure SQL). Why is this a problem you might ask... and my answer here is that you want to work with Enum types in your code but persist an integer in your databases. You can think about in that way. Why we use data types at all when everything could be just a string which is getting converted into a desirable type when needed. This 'all-string' approach is of course a huge anti-pattern and a bad practice for many reasons with few being: degraded performance, increased storage space, increased code duplication.  Pre-requirements 1. Status enum type definition