Friday 21 June 2013

Data Distribution on Performacne DB- Internal Testing


We had an  performance issue which we found in our internal performance lab ; The issues was logged as blocker and release was pushed off! The whole  team was on fire!. The release went into red! Round the clock focus went on the bug. After the development team few rounds of analysis after push back for  reasons below
1. as environment issues , this task got completed in my environment very early.
2. Unit testing we are not seeing this issues( mind unit testing in DW is heavy transaction and sometime bring down your system)

We finally nailed down the issues with one ETL package talking more time and it was because of data in one of our transaction system which we have increased 3 times . We were less on time to fix the bug in the cycle so we thought to review our data against the existing customers in production . On our analysis the largest current customer data on the said table was less than the data we have originally in performance database.Then we started the process to analysis potential customer and we saw of all 10+ biggest customer database we analyzed only 1 Customer DB has data more than originally present in Performance DB.

Sometime you might be investing effort on one off requirement whose frequency of happening is less than 1 %.  We changed our strategy of Test data and moved back to original data in this table and plan to include the data only when we will onboard this Customer.
For the general testing , The data to be populated in Database should be more  towards the mean than on extreme to avoid the such conditions!. We can plot the normal distribution of data and try to populate the data in Performance DB which is more towards the median.


No comments:

Post a Comment