Pages

Thursday, June 9, 2011

BizTalk WCF-SQL Adapter to load a flat file into a SQL Database


Working with FlatFiles came through one of blog written by Thiago Almeida and found useful information and thought to share; below is extract from that

I have a flat file that contains a list of products. I need to load the contents of this flat file into a SQL Server 2008 table using BizTalk Server 2009 and the WCF-SQL Adapter. The table might already have some of the products in the flat file, and in this case the product row should be updated.


The data in my sample flat file was extracted from the Adventure Works sample database in the SQL Server 2008 samples in Codeplex, which gave me 504 products to play with.

Flat File and Debatching
-------------------------------------
As you might already know, messages with multiple items in them (multiple Products in this case) coming into BizTalk can be disassembled and debatched on their way in by the disassembler pipeline components. In this case, since it is a flat file that we are receiving, we will use the flat file disassembler component that comes out of the box with BizTalk.

In this post I want to go over loading the file into SQL Server in two ways: one by splitting the Product items into individual messages and loading them individually with the WCF-SQL Adapter; and another by not splitting the Product items and sending one single message to the WCF-SQL Adapter with all the products.

For that I created two flat file schemas, and both look like the below:

On one of the schemas the Product node has its ‘Max Occurs’ set to ‘unbounded’. The other schema has the Product node’s ‘Max Occurs’ set to 1. This property is what tells the flat file disassembler pipeline component if it should debatch the Products or not.

I created two BizTalk pipelines to handle the two different schemas. I dragged the flat file disassembler pipeline component to the disassemble stage of each pipeline, and selected the appropriate schema for each.




SQL Server Table and Stored Procedure
---------------------------------------------------------
On the SQL Server database side, we have a table called Product (what a surprise!) with the following columns:


We are going to call a stored procedure for each line in the flat file to load each product. An easy way to either insert the product if it doesn’t exist in the table or update it if it exists is to use the MERGE statement that is new in SQL Server 2008. So all we have in our stored procedure is the following:

CREATE PROCEDURE [dbo].[ADD_PRODUCT]

@ProductShortDescription varchar(50), @ProductFullDescription varchar(max), @UOM nchar(10), @UnitPrice money

AS BEGIN
SET NOCOUNT ON;
–Use merge statement to either insert or update product based on product short description

MERGE INTO Product AS Target

USING (SELECT @ProductShortDescription, @ProductFullDescription, @UOM, @UnitPrice)
AS Source(ProductShortDescription, ProductFullDescription, UOM, UnitPrice)
ON (Target.ProductShortDescription = Source.ProductShortDescription)
WHEN matched THEN

UPDATE SET ProductFullDescription = Source.ProductFullDescription, UOM = Source.UOM, UnitPrice = Source.UnitPrice
WHEN not matched THEN
INSERT (ProductShortDescription, ProductFullDescription, UOM, UnitPrice)
VALUES (Source.ProductShortDescription, Source.ProductFullDescription, Source.UOM, source.UnitPrice);
END

WCF-SQL Adapter Schemas
--------------------------------------------
To add the SQL Server schemas used by the WCF-SQL Adapter from the BizTalk solution you can right click on the BizTalk project, select Add, and then ‘Add Generated Items’. From there you can either choose the ‘Add Adapter Metadata’ or the ‘Consume Adapter Service’ options. They will both bring the ‘Consume Adapter Service’ wizard where you can connect to the target SQL Server database and select what items and operations you want to consume. In our case we are only interested on the ADD_PRODUCT strongly typed stored procedure:


This will give you a schema like the following for the ADD_PRODUCT stored procedure

Note that the ADD_PRODUCT node is the root node, and therefore can only exist once in the XML instances for this schema. This is the schema we are going to map to for the debatched Product information we get from the flat file schema with a max occurs of 1.  It is a straight map then from the single product flat file schema to the single stored procedure schema

That takes care of mapping the products when the flat file is being debatched into single product messages. Now what do we do about mapping all the products in the flat file to only one XML that is sent to the WCF-SQL Adapter? Here’s where the WCF-SQL Adapter’s composite operations come in handy. The composite operations in the adapter have been described on Richard Seroter’s book (free sample chapter on the WCF-SQL Adapter). I created a new schema with a root node ‘Request’ and a second root node ‘RequestResponse’. The first root node name isn’t really important, as long as the second root node name is the same as the first with a ‘Respose’ suffix. I then added the single ADD_PRODUCT schema as an XSD Import to my composite schema. This allows me to create an unbounded record under the ‘Request’ node and change it to have a data structure type of ns0:ADD_PRODUCT, and an unbounded record under the ‘RequestResponse’ node and change it to have a data structure type of ns0:ADD_PRODUCTResponse.


This allows us to map from the non debatched flat file schema to the schema created above

Calling the WCF-SQL Adapter
--------------------------------------------------------
After deploying the solution I created two receive ports and two respective receive locations – one of them configured with the debatching pipeline and the single product map, and the other configured with the single file pipeline and the composite operation map.


I then created two one way send ports with the WCF-Custom Adapter and the sqlBinding, each with a filter for one of the receive ports. The send port that filters on the debatched single product insert receive port is configured as follows, with the TypedProcedure/dbo/ADD_PRODUCT action:

The send port that filters on the single file with multiple products and composite operation map is configured as follows, with the CompositeOperation action:

Both send ports had a binding type of sqlBinding of course, with the default values (make sure useAmbientTransaction is enabled so that the stored procedure calls are inside a transaction):

Transactions and Conclusion
---------------------------------------------
So when we debatch the Products flat file on the way in we end up with multiple concurrent calls to the stored procedure via the WCF-SQL Adapter, each in its own transaction

When we map the entire file to the composite schema we end up with one transaction that wraps around all the stored procedure calls:

If we monitor the Transactions/sec for the database we see barely any activity when we use the single file method:

If we use the debatch multiple message method we some spikes as the multiple transaction to the database are made

As expected the single file method performs much faster for loading the 504 rows into the table. By placing a datetime column on the products table I could see the difference from the first insert to the last is only 254 milliseconds. With the debatch method BizTalk goes through the debatched records at a much slower pace taking around 16 seconds to load them all, since it has to map each debatched message, route multiple messages to the send port, create multiple transactions against SQL Server, etc.



After looking into it a bit more I also noticed that for the debatched scenario the message delivery throttling and message publishing throttling were kicking off for the BizTalk host loading the messages into SQL Server. By simply changing the number of samples that the host should base its throttling decision on to something over the 504 records being inserted the time for the debatched inserts went down to 4 seconds from the 16 seconds mentioned above:

The debatch method is still useful in many situations – if you need to perform extra steps for each message in the batch, or if your DBAs require one transaction for each stored procedure call, etc.

No comments:

Post a Comment