Monday, February 1, 2010

Open Source Tool for Data Cleansing and Master Data Management

Last weekend SQL Power released an improved version of SQL Power DQguru (formerly known as SQL Power MatchMaker), one of the few open source tools available for data cleansing and master data management (MDM). Version 0.96 brings a new feature that lets you run SQL Power DQguru from the command line, so you can integrate it into batch scripts and ETL jobs.
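Being able to run a tool from the command line means an ETL job can treat it like any other external step. As a minimal, hypothetical sketch (plain Python, not part of DQguru), the helper below runs a command, logs its output and stops the job when the command returns a non-zero exit code. The actual DQguru launcher and its arguments are not shown here and would have to come from the DQguru documentation; the sketch also assumes the tool reports failure through its exit code. The placeholder command just runs a tiny Python snippet.

```python
import subprocess
import sys


def run_batch_step(command, log=print):
    """Run one external command as a step of a batch/ETL job.

    `command` is a list of arguments. Returns the exit code (0) on success
    and raises RuntimeError on a non-zero exit code, so the surrounding job
    stops instead of silently continuing. Assumption: the tool being called
    signals failure through a non-zero exit code.
    """
    log("Running: " + " ".join(command))
    result = subprocess.run(command, capture_output=True, text=True)
    if result.stdout:
        log(result.stdout.rstrip())
    if result.returncode != 0:
        raise RuntimeError(
            f"Step failed with exit code {result.returncode}: {result.stderr.strip()}"
        )
    return result.returncode


if __name__ == "__main__":
    # Placeholder command: replace it with the real DQguru command line.
    run_batch_step([sys.executable, "-c", "print('cleansing step done')"])
```

Raising an exception (rather than just printing a warning) keeps later steps, such as loading the cleansed data, from running on a failed cleansing run.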

As a BI consultant for SQL Power I have used SQL Power DQguru in several projects, and it has made my job a lot easier. Some of the features I like most are:

  • Easy connection to any database with a JDBC driver, including SQL Server, Oracle, MySQL and Postgres
  • Lets you create complex merge rules so that your dependent data is always updated when you merge records.
  • A match rule can combine more than 25 different steps to find possible duplicates, for example:
    • Word Count
    • Regular Expressions
    • Substrings
    • Retain certain characters
    • Translate Words (you can create your own translation rules)
  • You can preview what your data will look like when you apply the match rules
  • Automatic address correction (for Canadian addresses, Premium version)
Here is an example of what a simple match rule using some of the available steps could look like:


Even though the user interface is mostly straightforward, it can be worth getting the user guide, which is available for a small fee. Once you know how to use it, you will see that SQL Power DQguru is very powerful.
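DQguru's match rules are configured in its user interface, so the following is not DQguru code. As a rough, hypothetical illustration of the idea of chaining transformation steps into a match key, here is a minimal Python sketch: each value is lowercased, passed through a regex-based "retain characters" step and a "translate words" step, and records whose resulting keys are equal are reported as possible duplicates. The step functions, the translation table and the sample records are assumptions made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical translation table in the spirit of the "Translate Words" step.
TRANSLATIONS = {"street": "st", "avenue": "ave"}


def retain_chars(value):
    """Keep only lowercase a-z, digits and spaces (also drops accented letters)."""
    return re.sub(r"[^a-z0-9 ]", "", value)


def translate_words(value):
    """Replace words using TRANSLATIONS; split() also collapses extra spaces."""
    return " ".join(TRANSLATIONS.get(word, word) for word in value.split())


# The steps of this hypothetical match rule, applied in order.
STEPS = [retain_chars, translate_words]


def match_key(value, steps):
    """Lowercase and strip the value, then run it through each step in turn."""
    key = value.lower().strip()
    for step in steps:
        key = step(key)
    return key


def find_duplicates(records, steps):
    """Group record ids whose match keys are equal; return groups of 2 or more."""
    groups = defaultdict(list)
    for record_id, value in records:
        groups[match_key(value, steps)].append(record_id)
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    records = [
        (1, "123 Main Street"),
        (2, "123 main st."),
        (3, "45 Oak Avenue"),
        (4, "45 Oak Ave"),
        (5, "7 Elm Road"),
    ]
    print(find_duplicates(records, STEPS))  # [[1, 2], [3, 4]]
```

The order of the steps matters: punctuation is removed before translating, so "st." and "Avenue!" are reduced to plain words first. This sketch only groups records whose normalized keys are identical; it does no fuzzy comparison.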

Friday, January 29, 2010

Using the Community Build Framework for Pentaho

Recently I had to prepare an installation of the Pentaho BI Server (CE), and I decided to try the Community Build Framework (CBF) by Pedro Alves. I had to install the server in both a test and a production environment, so it seemed a perfect fit for my requirements.

It is working fine now, and its clean structure helps a lot when applying changes to the installation. Still, it took me quite a few hours to get it working (probably because I'm not an expert when it comes to Ant & Co.).

Here are some issues you should be aware of:

  • You'll need Java 1.6.
  • Make sure the paths to Ant, Java and especially the project folder don't contain any spaces. Spaces will only cause problems.
  • Tomcat 6 is not supported yet.
  • I recommend setting the solution paths to the folder "C:/...../project-client/solution" until you have figured out how CBF works in detail.
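Since spaces in the paths caused most of the trouble, a small pre-flight check before starting the build can save time. The Python sketch below is a hypothetical helper, not part of CBF: it assumes Ant and Java are located through the common ANT_HOME and JAVA_HOME environment variables and simply warns about any of those paths, or the project folder, containing a space.

```python
import os


def paths_with_spaces(paths):
    """Return the (name, path) pairs whose path contains a space."""
    return [(name, path) for name, path in paths.items() if " " in path]


def check_cbf_paths(project_dir):
    """Warn about spaces in the Ant, Java and project folder paths.

    Assumption: Ant and Java are found via ANT_HOME and JAVA_HOME.
    Unset variables are treated as empty strings and therefore pass.
    Returns True if none of the checked paths contains a space.
    """
    paths = {
        "ANT_HOME": os.environ.get("ANT_HOME", ""),
        "JAVA_HOME": os.environ.get("JAVA_HOME", ""),
        "project folder": os.path.abspath(project_dir),
    }
    problems = paths_with_spaces(paths)
    for name, path in problems:
        print(f"Warning: {name} contains spaces: {path}")
    return not problems


if __name__ == "__main__":
    # Check the current directory as the (hypothetical) project folder.
    if check_cbf_paths("."):
        print("No spaces found in the checked paths.")
```

The check does not verify the Java version; that still has to be confirmed separately.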
You will have your CBF ready to run a lot faster than I did if you keep these issues in mind. I'm sure I'll use CBF a lot more often in the future.

If you need more information on how to set up your own Pentaho installation, I highly recommend checking out this website http://www.prashantraju.com/ (besides the Pentaho Wiki).

Friday, January 22, 2010

Stop annoying applications in your Facebook news feed

As I get more and more friends on Facebook, I also get a whole bunch of them playing Mafia Wars, Farmville and all these other applications. And of course, every day they have to share their latest score with me. I just found a good way to stop these annoying updates from showing up in my news feed:

  • In the application's entry in your news feed, click on the application name (highlighted in the screenshot). This will take you to the application page.


  • On the application page you will see a link "Block Application" (highlighted again). Click on it.




  • Facebook will ask you to confirm your decision. Confirm to block it.
  • Go back to your Facebook start page (news feed). All entries from this application will be gone!


Thursday, December 10, 2009

SQL Power's Wabit - A feature overview

Last week I found some time to create a short screencast that shows some features of the open source version of SQL Power's Wabit. The video could be more professional (my headset didn't like me too much), but I was too busy with other projects to do more revisions. You are very welcome to share your ideas, criticism and comments.

Here you go:

(Watch on YouTube: SQL Power's Wabit - Feature overview)

Tuesday, October 20, 2009

Real-Time Business Intelligence with Wabit & SQLstream

Last week I got the chance to prepare a screencast of SQL Power's new real-time BI solution. It uses a SQLstream server as the back end and Wabit as the BI reporting tool. Both the Open Source and the Enterprise Edition of Wabit can be used for it.

Here is the screencast (it's best watched in full screen):



Comments are welcome!

UPDATE: The full offering is now available: SQL Power Real-Time BI solution.

Saturday, October 3, 2009

Delta Generation with Kettle

In one of my current projects I have to do a lot of delta generation to figure out whether any data has changed, so that I can handle each record differently depending on whether it is unchanged, new, changed or deleted. I came up with the following transformation:
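The transformation itself is built in Kettle, but the classification it has to perform can be sketched in plain Python as well. The snippet below is a minimal illustration under my own assumptions (two in-memory snapshots keyed by a primary key, made-up sample data), not a copy of the transformation: it flags every key as identical, changed, new or deleted. Kettle's Merge Rows (diff) step produces the same four flags when comparing a reference stream with a compare stream, but it expects both streams to be sorted by the key, whereas the dictionary lookups here need no sorting.

```python
def generate_delta(old_rows, new_rows):
    """Flag each key as identical, changed, new or deleted.

    old_rows, new_rows: dicts mapping a business key to a tuple of column
    values (the previous and the current snapshot of the data).
    Returns a list of (key, flag) tuples.
    """
    delta = []
    for key, values in new_rows.items():
        if key not in old_rows:
            delta.append((key, "new"))
        elif old_rows[key] == values:
            delta.append((key, "identical"))
        else:
            delta.append((key, "changed"))
    # Keys that only exist in the old snapshot were deleted.
    for key in old_rows:
        if key not in new_rows:
            delta.append((key, "deleted"))
    return delta


if __name__ == "__main__":
    old = {1: ("Anna", "Toronto"), 2: ("Ben", "Oakville"), 3: ("Carl", "Ottawa")}
    new = {1: ("Anna", "Toronto"), 2: ("Ben", "Mississauga"), 4: ("Dora", "Hamilton")}
    for key, flag in generate_delta(old, new):
        print(key, flag)
```

With the sample data this prints "1 identical", "2 changed", "4 new" and "3 deleted", one per line. Holding both snapshots in memory is only practical for small tables; for large ones, a sorted, streaming comparison like the Kettle step scales better.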


Wednesday, September 23, 2009

Products you don't expect to be 'Made in China' - Del Monte fruit cups

Since I moved to Canada back in March, I have started to realize how many products are actually made in China. Back in Germany you could also buy lots of stuff from China, but you mostly had the choice between German or European products and Chinese ones.
When I went to Food Basics in Oakville a couple of weeks ago to get some apples, I stood in front of a huge tray of Chinese apples! Aren't there enough apples in Ontario, Canada or the US? Even Mexico would probably be closer than China.
Another day my wife bought Del Monte fruit cups at the grocery store. I checked the label when I was about to eat one, and I decided to leave it in the fridge. First of all, it is 'Made in China' (again, I guess no other country in the world has fruit), and second, it contains artificial flavor. How bad must the fruit inside be if you need artificial flavor (and does anybody in China control how it is made)?
For my part, I'll check the labels more closely whenever I buy any kind of product, especially food. My health, but also the economy of our country (and the other western countries), is too important to me to ruin with something that's just a few cents cheaper.

(This blog entry will hopefully become part of a small series on 'Made in China' related subjects.)