Install and Configure Pentaho Data Integration

Introduction

Pentaho is an open source business intelligence suite owned by Hitachi Data Systems. It provides a wide range of products:

  1. Big Data Solutions.
  2. Data Integration (ETL).
  3. OLAP Services.
  4. Reporting tools.
  5. Dashboarding.
  6. Data mining.
In this post, I will discuss Pentaho Data Integration; I will cover the other tools in later posts.
Pentaho Data Integration has two main editions:
  1. Pentaho Community Edition.
  2. Pentaho Enterprise Edition.
There are two main differences between these editions: scheduling and technical support options are available only with the Enterprise Edition. In this tutorial, I will show you how to install and configure the Pentaho Community Edition (CE).

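What is E-T-L?

E-T-L stands for Extract, Transform, Load: data is read from source systems, cleaned and reshaped, and then written to a target system. The picture below is an abstract view of the E-T-L architecture. If you want to know more about E-T-L, read this Wiki article.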

ETL_input_output
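
To make the three stages concrete, here is a minimal sketch in plain Python. It is not how Pentaho works internally; the function names, the sample CSV data and the list standing in for a target table are all made up for illustration.

```python
import csv
import io


def extract(csv_text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: tidy up the names and convert amounts to numbers."""
    return [
        {"name": row["name"].strip().title(), "amount": float(row["amount"])}
        for row in rows
    ]


def load(rows, target):
    """Load: append the cleaned rows to the target (a list stands in for a table)."""
    target.extend(rows)
    return len(rows)


# Hypothetical sample input; a real source would be a file or a database.
source = "name,amount\n alice ,10.5\nBOB,3\n"
warehouse = []
loaded = load(transform(extract(source)), warehouse)
```

In Pentaho you build the same kind of pipeline graphically: input steps extract, transformation steps clean the data, and output steps load it.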

Download

1. Download Pentaho CE from
http://community.pentaho.com/
Click the download section button.

image

2. Click the all OS button to start the download from SourceForge.

image

Install Pentaho CE

1. Once the download finishes, extract the zip file to any location and open the extracted folder. You will not find an installer there, because Pentaho CE is a portable version: no installation process is needed to start it. (If you buy the Enterprise Edition or download its trial, you get an executable installer instead.) To run Pentaho, find and run the file below.
spoon.bat

image

2. Pentaho CE opens.

image
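
If you prefer to start Pentaho from a script, the launch step can be sketched as below. This is only a sketch under assumptions: the folder path is hypothetical, and on Linux or macOS it assumes the spoon.sh script that sits in the same folder as spoon.bat.

```python
import platform
from pathlib import Path


def spoon_launcher(pdi_home, system=None):
    """Return the path of the Spoon start script for the given OS.

    pdi_home is the extracted Pentaho CE folder (hypothetical path).
    system defaults to the OS this script is running on.
    """
    system = system or platform.system()
    # spoon.bat is the Windows launcher; spoon.sh is assumed elsewhere.
    script = "spoon.bat" if system == "Windows" else "spoon.sh"
    return Path(pdi_home) / script


windows_launcher = spoon_launcher("data-integration", "Windows")
linux_launcher = spoon_launcher("data-integration", "Linux")
```

You could then pass the returned path to subprocess.run to open Pentaho, with the extracted folder as the working directory.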

Configure Pentaho CE

Pentaho has two types of repository. (If you want to create a DB repository, the first step is to create a DB user for the Pentaho repository.)

  1. Database repository
  2. File system repository


pentaho-img05

First, I will show you how to create a DB repository; then I will explain how to create a file repository. A DB repository gives you extra features: for example, you can find transformation details by querying the repository tables, and you can keep historical log details in separate tables.
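
The DB user for the repository is created with ordinary SQL. The sketch below only builds the statements for Oracle; it does not run them. The user name pdi_repo and the password are placeholders I made up, and the grants are a common minimal choice, not something Pentaho prescribes. Depending on your Oracle version, the user may also need a tablespace quota.

```python
def oracle_repo_user_sql(username, password):
    """Build Oracle statements that create a user for the repository.

    Both values are placeholders; run the result as a DBA in your SQL client.
    """
    if not username.isidentifier():
        raise ValueError("username must be a simple identifier")
    if '"' in password:
        raise ValueError("password must not contain double quotes")
    return [
        f'CREATE USER {username} IDENTIFIED BY "{password}"',
        f"GRANT CONNECT, RESOURCE TO {username}",
    ]


statements = oracle_repo_user_sql("pdi_repo", "secret")
```

In MySQL you would create a separate database and a user with rights on it instead; the statements differ, but the idea is the same.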

Create a database repository

1. First, create a DB user/schema (or a separate DB in MySQL) in your database. (In this tutorial I use Oracle DB, but you can use any relational DB system for this repository, such as MySQL, MSSQL or DB2.)

image

2. Once you have created the user, open Pentaho and follow these steps.
  • Tools –> Repository –> Connect
image

3. Click the add new repository button.

image

4. Select ‘Kettle database repository’ and click ‘OK’.

image

5. Click the ‘New’ button to add a new database connection.

image

6. Select your database type, fill in the details and click ‘OK’. (If you want, you can test your connection using the Test button.)

image

Test result

image

7. Select the created database connection from 'Select Database Connection', enter a name and description for the repository, and click ‘Create or Upgrade’ to create the repository in the database.

image

8. Click 'Yes' if you agree.

image

9. Click 'Yes' if you agree.

image

10. These are the SQL scripts that create the repository tables in the database. Click ‘Execute’ to run them.

image

11. This is the SQL execution result. Check the end of the result: on success, it should read ‘XX SQL statements executed’. If it succeeded, click ‘OK’, close the SQL script window, and click ‘OK’ in the repository information window to finish creating the repository.

image

12. Select your repository name, enter the login details and click ‘OK’.

image

Default login details for the admin user
User Name : admin
Password : admin

13. This is the Pentaho main window.

image

14. You can explore your repository by following these steps.
Tools –> Repository –> Explore
image

15. This is the repository explorer window.

image
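
Because a database repository lives in ordinary tables, you can also inspect it with SQL from outside Pentaho. The sketch below shows the idea against an in-memory SQLite stand-in so that it runs anywhere. The table name R_TRANSFORMATION and its columns are assumptions on my part, as are the sample rows; check the tables the script actually created in your schema, and connect with your own database driver.

```python
import sqlite3


def list_transformations(conn):
    """Return the names of the transformations stored in the repository."""
    # Assumed table and column names; verify them in your own schema.
    cursor = conn.execute("SELECT NAME FROM R_TRANSFORMATION ORDER BY NAME")
    return [row[0] for row in cursor.fetchall()]


# Stand-in repository with made-up rows, so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE R_TRANSFORMATION (ID_TRANSFORMATION INTEGER PRIMARY KEY, NAME TEXT)"
)
conn.executemany(
    "INSERT INTO R_TRANSFORMATION (ID_TRANSFORMATION, NAME) VALUES (?, ?)",
    [(1, "load_customers"), (2, "clean_orders")],
)
names = list_transformations(conn)
conn.close()
```

Only read from the repository tables this way; changes should go through Pentaho itself.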

Create a file repository

1. Go to Tools –> Repository –> Connect.

image

2. This is the repository connection window.

image

3. Click the add new repository button.

image

4. Select ‘Kettle file repository’ as your repository type.

image

5. This is the file repository configuration window. Fill in the required details, such as the file directory and the repository name; you can also specify whether the repository is read-only. Then click ‘OK’.

image

6. Now select your repository name and click ‘OK’ to open the repository. (A file system repository does not need a user account.)

image

7. This is the Pentaho interface.

image
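
A file repository is just a directory on disk, so you can look at its contents with any tool. As a small sketch, the function below lists the transformations (.ktr files) and jobs (.kjb files) found under a repository directory. The folder and file names in the demo are made up, and a temporary directory stands in for your real repository path.

```python
import tempfile
from pathlib import Path


def list_repository_files(repo_dir):
    """Group the files under repo_dir into transformations and jobs."""
    root = Path(repo_dir)
    found = {"transformations": [], "jobs": []}
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root).as_posix()
        if path.suffix == ".ktr":
            found["transformations"].append(relative)
        elif path.suffix == ".kjb":
            found["jobs"].append(relative)
    return found


# Demo with a throwaway directory and hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "jobs").mkdir()
    (Path(tmp) / "load.ktr").write_text("<transformation/>")
    (Path(tmp) / "jobs" / "nightly.kjb").write_text("<job/>")
    result = list_repository_files(tmp)
```

This also makes a file repository easy to back up or put under version control, since everything is stored as plain files.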


Reviewed by Lilantha Lakmal on 2:40:00 PM
