Blog Archive

Sunday, February 27, 2011

how i learned data warehousing and OLAP

It was my view in my academic life that only learning building a SaaS is not going to give me much scope in market as already thousands of Saas is roaming in market even as open source. And I felt like that the way corporate are adopting Saas in a adhoc basis they will be surely in trouble with having intelligent reports and it would be hard for them to create decisions from them. And I was searching such solutions which can integrate data from sporadic Saas and provide a combine view. And I found Data warehousing. And as my third years project I proposed to my supervisor the subject “Designing Data warehouse” and he wondered but accepted. But I was in sea and not getting a way to see any practical implementation my nearby, meanwhile I have covered Bill Inmon and Ralph Kimball. But it was not satisfying my thirst as all was just theory. I was dying to implement a demo of my own. And this thrust turned me to Oracle. At that time I heard that Oracle 9i got release with Data warehousing concepts. I jumped for it and installed in my PC. But still I was in dark .. not getting where to start. 

Thousands and Thousands thanks to oracle team who has included SH (sales history) schema in oracle. After getting this schema I saw a light of having my dream as true. I got bulk of data provided by them.
I have created a dummy sales transactional database and used SH schema data in a modified way to pour that. Huh! Now I have a huge transactional database and I have already an empty start Schema (SH) database.  Here comes the need of ETL now. I was so much pleased to reach in such position. Baby now it is time to do some ETL .. but how?  That time the tool “Sql loader” showed me the easiest way to do that. And created some sql scripts and succeeded to perform ETL and schedule that. I was so happy to have the data in my SH schema (Data warehouse :D ).  And I cannot forget learning building the Time table for the first time. Such a table was most interesting thing for me. Now data is there, fact table there.. time table there but how to create report from them and how to provide decision supporting reports.  And a simple query takes huge time. 

Here comes the Advanced query rollup… with clause .. thanks oracle for these.  But how to minimize the query time… :D here I learned the materialized View.. I was highly excited to implement them one by one and getting the outputs. But the reports were not satisfying me.  What is the benefit of having this much data and only some ordinary reports.

Where comes the OLAP. Thanks again to oracle. I created first OLAP metadata with PL/SQL definitions. It was really much interesting to create dimensions and form a cube. Earlier I have read lot of cube but writing one’s definition in my own hand was really excellent. WOW..  It was now sounding to me I have now a real useful data warehouse build by me. But I was failing to make my friends and teachers understand this thing as they were just trying to see some outputs. And I was getting frustrated that I was unable make them understand the benefit of data warehouse with OLAP only because of having lack of some beautiful UI.
And where comes the Oracle Discoverer. WOW that this tool was really a striker for me. I used that over my SH data warehouse and showed my interested friends and teachers and now they got what I was saying in the name of Data warehouse.

I was trying to switch my java programming job and get a data warehouse related job but interestingly found that corporate are expecting some white haired aged guy for their data warehouses and getting me in their interview boards they are not satisfied. I tried several places but I saw doubt in their eyes and at the end I did not get the jobs.

Here comes the Pentaho. An American company was searching some java guy and in requirement they have written – having data warehouse experience is advantage. WOW it sounded like it is for me. I submitted C.V. and they called for interview and interestingly I got the job next day.  And I discovered it is Pentaho with which I have to build data warehouse on Telecom billing and do reporting.
WOW its excellent . I was amazed to get “Kettle”  instead of “sql loader” for the ETL .. it is really a dynamic tool and very much useful.  Doing star schema in Mysql was also a interesting thing for me. And then Mondrian OLAP server  instead of Oracle OLAP.. I was just happy to get lost with that and then MDX with Jpivot instead of Oracle Discoverer.. I was really happy in working as it was all in Java and I was playing with it shaping all in any way I liked.

I was writing the whole to store some memory in written and also in a hope that some new  data warehouse  interested guy may read it and find some tips to go ahead.

  

1 comment: