I ran across an interesting problem the other day while trying to tune some SSIS data source performance issues. I had noticed that we were doing a pretty large select out of a table, filtered on a handful of date-related columns, and pulling in most of the columns from that table. Well, SQL Server saw this query and apparently figured that since we needed most of the columns, a table scan would be the most efficient way to get that data. Since in real life we only need a very small percentage of those rows, I decided to rewrite the select to pull in the PK values first, then pull in the data based on that subset.
My first attempt was to use a temp table. Sadly, temp tables do not work in SSIS data sources, even if you wrap one inside a stored procedure. Disappointed, but not deterred, I decided to use a table variable instead. I quickly made the changes and tested them with some values I knew would return quite a few results. Great – no worries. I was still using a stored procedure at the time and started it off with some of our standard code, so I had no issues.
Ultimately, I didn’t want to write a bunch of stored procedures that I would need to maintain outside of SSIS just for this particular load. We already had the code inside the OLE DB Sources, just not optimized as much as I’d like. I started pulling out the old queries, adding a table variable, populating it with the necessary PK values, then using that to filter down the main data set. Save, deploy, run, see major speed increase.
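As a rough sketch of the pattern (the Orders table, its OrderID key, and the date filter below are hypothetical stand-ins, not the actual load):

    DECLARE @Keys TABLE (OrderID INT PRIMARY KEY);

    -- Step 1: grab just the PK values using the narrow date filter
    INSERT INTO @Keys (OrderID)
    SELECT OrderID
    FROM dbo.Orders
    WHERE ModifiedDate >= '2020-01-01';

    -- Step 2: join back on the clustered PK to pull only the rows we need
    SELECT o.OrderID, o.CustomerID, o.OrderTotal
    FROM dbo.Orders AS o
    INNER JOIN @Keys AS k ON k.OrderID = o.OrderID;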
Then I looked at the actual insert/update counts. Yes, it ran much faster, but in the process it didn’t pick up any results. I was pretty confused: the step was successful and there were no errors or warnings. I re-verified that running that same code returned results in SSMS; it did. I started doing some searching to see what might be happening and found this post by Dustin Ryan.
In short, it seems that if you have row counts being returned (the default behavior), the insert into the table variable returns a number of rows affected, which results in a “Done in Proc” type message. That in turn tells SSIS the data flow is done, so it moves on to the next step without ever reading the real result set. The workaround? Add SET NOCOUNT ON; to the beginning of your OLE DB Source query (or make sure this is in your stored proc – we tend to put that in our procs by default).
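Using the same hypothetical query from the sketch above, the fixed OLE DB Source query just gains that one line at the top:

    SET NOCOUNT ON;  -- suppress the "rows affected" message from the INSERT

    DECLARE @Keys TABLE (OrderID INT PRIMARY KEY);

    INSERT INTO @Keys (OrderID)
    SELECT OrderID
    FROM dbo.Orders
    WHERE ModifiedDate >= '2020-01-01';

    SELECT o.OrderID, o.CustomerID, o.OrderTotal
    FROM dbo.Orders AS o
    INNER JOIN @Keys AS k ON k.OrderID = o.OrderID;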
Once again, thanks to Dustin Ryan for his post on this topic. There’s a good possibility I’d still be struggling with the root cause of this without that assistance.
Holy cow! I was having the exact same problem and this article gave me the fix. Thanks so much for posting this.
--Calvin
Glad it was helpful, Calvin. I messed around with this for a day or so before finding Dustin's post. Figured it wouldn't hurt to summarize in case others had this problem.
What do you mean by PK values?
PK values == Primary Key values. In my particular case, the clustered indexes tend to be on the primary key, hence my desire to pull those values into a table that can be used in a join against the main table. For example, if you have a table called People with a primary key of PeopleID that is also the clustered index, my query would first find the relevant PeopleID values and put them into a table variable. From there, I would join back to the People table, using the primary key, to fetch the actual data needed.
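A quick sketch of what I mean (the column names and date filter here are just made-up examples):

    DECLARE @Ids TABLE (PeopleID INT PRIMARY KEY);

    -- find the PeopleID values we actually care about
    INSERT INTO @Ids (PeopleID)
    SELECT PeopleID
    FROM dbo.People
    WHERE ModifiedDate >= '2020-01-01';

    -- join back to People on its clustered PK for the full rows
    SELECT p.PeopleID, p.FirstName, p.LastName
    FROM dbo.People AS p
    INNER JOIN @Ids AS i ON i.PeopleID = p.PeopleID;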
Thank you! Had the same issue... SET NOCOUNT ON works!!