Tuesday, June 25, 2013

How-to: Hadoop Integration with OBIEE 11g

The newest release of Oracle Business Intelligence, 11.1.1.7, shows Oracle's continued effort to integrate its Business Intelligence platform with big data technologies such as Hadoop and Hive. Specifically, I'm talking about OBIEE 11g's ability to use Hadoop as a data source.

What is Hadoop?

Hadoop is a framework that distributes data across many servers (nodes) using its own distributed file system, HDFS (Hadoop Distributed File System). Rather than being stored in a single database, the data is spread across the nodes of a cluster.
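To make the 'distributed' part concrete, here is a minimal JavaScript sketch of a file being cut into fixed-size blocks, with each block copied to several nodes. The 64 MB block size and replication factor of 3 reflect common HDFS defaults of the time; the node names and the simple round-robin placement are illustrative assumptions (real HDFS placement is rack-aware).

```javascript
// Simplified sketch: split a file into blocks and store each block on several nodes.
// Assumption: round-robin placement; real HDFS uses rack-aware placement.
function placeBlocks(fileSizeBytes, blockSizeBytes, nodes, replication) {
  const blockCount = Math.ceil(fileSizeBytes / blockSizeBytes);
  const placements = [];
  for (let b = 0; b < blockCount; b++) {
    const replicas = [];
    // Each replica of block b goes to a different node, starting at a rotating offset.
    for (let r = 0; r < Math.min(replication, nodes.length); r++) {
      replicas.push(nodes[(b + r) % nodes.length]);
    }
    placements.push({ block: b, replicas });
  }
  return placements;
}

const MB = 1024 * 1024;
// Hypothetical 200 MB file on a hypothetical four-node cluster: 4 blocks, 3 copies each.
const layout = placeBlocks(200 * MB, 64 * MB, ["node1", "node2", "node3", "node4"], 3);
layout.forEach((p) => console.log(`block ${p.block}: ${p.replicas.join(", ")}`));
```

Because every block lives on more than one node, a query can read different blocks on different machines at the same time, and losing one node does not lose data.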

How does Hadoop process data stored in multiple nodes?

Hadoop uses a programming model called 'MapReduce' to process data in parallel across multiple nodes. At a high level, it consists of two steps:
  1. Map step: the data is divided into smaller sets, and each set is distributed to a worker node for processing
  2. Reduce step: the results from all of the worker nodes are collected and aggregated into a single 'output'
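The two steps can be sketched in a few lines of plain JavaScript. This is an in-memory illustration of the model, not Hadoop's Java API; the flight records and the 'average delay per carrier' job are hypothetical. The grouping between the two steps (the 'shuffle') is something Hadoop does for you.

```javascript
// In-memory sketch of the MapReduce model (not Hadoop's API).
// Hypothetical job: average departure delay per carrier.
const flights = [
  { carrier: "AA", delay: 10 },
  { carrier: "UA", delay: 30 },
  { carrier: "AA", delay: 20 },
  { carrier: "UA", delay: 6 },
  { carrier: "DL", delay: 0 },
];

// Map step: turn each record into [key, value] pairs.
function mapFn(record) {
  return [[record.carrier, record.delay]];
}

// Shuffle: group intermediate values by key (Hadoop does this between the steps).
function shuffle(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Reduce step: aggregate all values for one key into a single output.
function reduceFn(key, values) {
  const total = values.reduce((sum, v) => sum + v, 0);
  return [key, total / values.length];
}

function mapReduce(records, splits) {
  // Divide the input into chunks; on a cluster each chunk would be mapped
  // by a different worker in parallel, here they simply run one after another.
  const chunkSize = Math.ceil(records.length / splits);
  const intermediate = [];
  for (let i = 0; i < records.length; i += chunkSize) {
    for (const record of records.slice(i, i + chunkSize)) {
      intermediate.push(...mapFn(record));
    }
  }
  const output = {};
  for (const [key, values] of shuffle(intermediate)) {
    const [k, avg] = reduceFn(key, values);
    output[k] = avg;
  }
  return output;
}

console.log(mapReduce(flights, 2)); // { AA: 15, UA: 18, DL: 0 }
```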

What is Hive?

MapReduce functions are typically written in Java and require someone with deep knowledge of both Hadoop and MapReduce. The folks at Facebook created a technology called 'Hive', a data warehouse infrastructure that sits on top of Hadoop. Put simply, Hive does the 'heavy lifting' of generating the MapReduce jobs: instead of writing MapReduce code to query the Hadoop distributed file system, you write SQL-style queries in Hive's query language, HiveQL (commonly called 'HQL').

Why does this matter in the Oracle Business Intelligence / Analytics space?

The analytics space is experiencing a shift in both technology and function. Traditional BI projects required a 'data warehouse' to store data in a series of star schemas (denormalized models) for fast query generation and data retrieval. The data warehouse is developed and supported by a team of ETL developers whose main focus is creating the mappings that transform data from the source to the target.
Unless the functional requirements are clearly understood during this phase, value is often lost in the transformation, and relevant data may be dropped altogether.

Using OBIEE 11g's Hadoop integration through a Hive ODBC driver, OBIEE can query data in the Hadoop distributed file system directly via Hive. What does this mean? The potential now exists to reduce, or even eliminate, the need for ETL, since we can now query enormous file systems directly.

The saving grace for ETL developers is that someone still needs to write the HQL that defines and populates the 'tables' OBIEE uses. Ultimately, it may simply change how ETL is developed.

How do you integrate OBIEE 11g with Hadoop?

Step 1: Download the Hive ODBC Drivers from http://support.oracle.com

For details, see the Oracle Support note 'Using Oracle Hadoop ODBC Driver with BI Administration Tool [ID 1520733.1]'.

Step 2: Create a Hive ODBC Connection via the ODBC Data Source Administrator

Just as you would create an ODBC connection to edit the repository online, create a new ODBC connection, but this time select 'Oracle Apache Hadoop Hive WP Driver' as the driver.



Once you've created the ODBC data source, you can configure the driver setup under the 'General' tab:


Step 3: Configure Database Connection

Moving into the repository, you're going to create a new database connection like you would for any data source in the physical layer. Note that you need to specify the database type as 'Apache Hadoop' (this is important!).

Step 4: Create Connection Pool

Within the Apache Hadoop database you created in Step 3, create a connection pool that points to the Hive ODBC data source from Step 2 and set its call interface to 'ODBC 2.0' or 'ODBC 3.5'. The call interface should not be 'Apache Hadoop' (you've already set the database type to Apache Hadoop!). If you set the call interface to 'Apache Hadoop', you will receive the following error:
Your connection pool should be similar to the following:

You should now be able to import tables and columns through this connection pool just as you would with any other. The BI Server generates ordinary SQL statements, as if it were querying a traditional Oracle database; the Hive ODBC driver passes them to Hive as HQL, and Hive compiles the HQL into MapReduce jobs that query the Hadoop distributed file system across multiple nodes.

 
keywords: hadoop, obiee 11g, hive, mapreduce, HQL

Monday, June 24, 2013

How-to: Data Visualization with External Javascript Libraries (D3)

One of the great features of Oracle's Business Intelligence 11g foundation is the ability to integrate external applications via API calls or JavaScript libraries. In a previous article I discussed how to use JavaScript functions with OBIEE 11g's native UserScripts.js. Today we're going to build on this by integrating third-party data visualization scripts. One popular JavaScript library for data visualization is 'Data-Driven Documents' (D3). This open source library lets users visualize data in ways not available natively in OBIEE 11g. Kevin McGinley first wrote about this in 2012, and the guys over at Rittman Mead recently posted an overview of D3 / OBIEE integration. Below we cover all the steps required to implement a D3 visualization.

Before we get started, you can browse all of the D3 visualization examples on GitHub. In the example below we're going to use airline data and D3's Calendar View to visualize average flight delays. You will need OBIEE 11.1.1.6.2 or higher (this example uses OBIEE 11.1.1.7.0) and IE 9+.


 

Step 0: Create an Answers Report

This report should contain a year dimension, a date dimension and an aggregate fact column. In the airline example I've selected 'Date', 'Year' and 'Average Departure Delay'. Take note of the column order as you will have to reference the column number in a narrative.


Step 1: Download the D3 Javascript Library from github

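This downloads a 'd3-master.zip' file containing the JavaScript library files needed for the integration. Unzip them into OBIEE 11g's analytics ear deployment under the WebLogic domain home, naming the extracted folder 'd3' so that it matches the /common/d3/ paths used in the narrative later. The deployment directory is located at (the '7dezjl' folder name is generated by WebLogic and will differ on your server):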
 user_projects\domains\bifoundation_domain\servers\bi_server1\tmp\_WL_user\analytics_11.1.1\7dezjl\war\res\b_mozilla\common

Step 2: Create a CSS File for Calendar Formatting

The Calendar View example consists of one script, one helper function and some CSS, all stored together in the example's index.html on GitHub. For this view to play nicely with OBIEE 11g, we need to split those pieces into narrative sections and a separate CSS file. The first step is to take the CSS:
#my_chart {
  font: 10px sans-serif;
  shape-rendering: crispEdges;
}
.day {
  fill: #fff;
  stroke: #ccc;
}
.month {
  fill: none;
  stroke: #000;
  stroke-width: 2px;
}
and save it to its own CSS file (calendar.css) at:
user_projects\domains\bifoundation_domain\servers\bi_server1\tmp\_WL_user\analytics_11.1.1\7dezjl\war\res\b_mozilla\common\d3\examples\calendar\calendar.css (you will need to create the directory, since it doesn't exist)
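The URLs used in the next step follow from where the files sit on disk: judging by the paths in this post, anything under the deployment's war directory is served under the /analytics context root. The hypothetical helper below (an inference from these paths, not documented WebLogic behavior) turns a file path into the URL a narrative would use, which is a quick way to check that a script or link tag points at the file you actually created.

```javascript
// Hypothetical helper: map a file under the deployed analytics "war" directory
// to the URL the browser loads it from.
// Assumption (inferred from this post's paths): war\res\... is served at /analytics/res/...
function warPathToUrl(diskPath, contextRoot = "/analytics") {
  const normalized = diskPath.replace(/\\/g, "/"); // Windows separators -> URL separators
  const marker = "/war/";
  const idx = normalized.indexOf(marker);
  if (idx === -1) {
    throw new Error("Path is not inside a deployed war directory: " + diskPath);
  }
  return contextRoot + "/" + normalized.slice(idx + marker.length);
}

const cssPath =
  "user_projects\\domains\\bifoundation_domain\\servers\\bi_server1\\tmp\\_WL_user\\analytics_11.1.1\\7dezjl\\war\\res\\b_mozilla\\common\\d3\\examples\\calendar\\calendar.css";
console.log(warPathToUrl(cssPath));
// /analytics/res/b_mozilla/common/d3/examples/calendar/calendar.css
```

If the extracted library folder were still named d3-master, the same mapping would produce a URL containing /common/d3-master/ rather than the /common/d3/ used in the script tags.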

Step 3: Create an Answers Narrative to Execute the Javascript Library

Now that we've laid the groundwork for calling the D3 library, the next step is to integrate the Calendar View code into an Answers narrative.

First, add the script and link tags that load the D3 library and its style sheets. This code goes in the narrative's Prefix (make sure 'Contains HTML Markup' is checked so the tags are rendered rather than displayed as text):
<script type="text/javascript" src="/analytics/res/b_mozilla/common/d3/d3.js"></script>
<link type="text/css" rel="stylesheet" href="/analytics/res/b_mozilla/common/d3/lib/colorbrewer/colorbrewer.css"/>
<link type="text/css" rel="stylesheet" href="/analytics/res/b_mozilla/common/d3/examples/calendar/calendar.css"/>
Next, take the Calendar View code and copy the entire block from the margin and width variable declarations through the selectAll call that draws the month outlines, placing it after the tags above. Your Prefix should look similar to:

<script type="text/javascript" src="/analytics/res/b_mozilla/common/d3/d3.js"></script>
<link type="text/css" rel="stylesheet" href="/analytics/res/b_mozilla/common/d3/lib/colorbrewer/colorbrewer.css"/>
<link type="text/css" rel="stylesheet" href="/analytics/res/b_mozilla/common/d3/examples/calendar/calendar.css"/>
<div id="my_chart"></div>
<script type="text/javascript">
var margin = {top: 19, right: 20, bottom: 20, left: 19},
    width = 720- margin.right - margin.left, // width
    height = 136 - margin.top - margin.bottom, // height
    cellSize = 12; // cell size
var day = d3.time.format("%w"),
    week = d3.time.format("%U"),
    percent = d3.format(".1%"),
    format = d3.time.format("%Y-%m-%d");
var color = d3.scale.quantize()
    .domain([5,30])
    .range(d3.range(9));
var svg = d3.select("#my_chart").selectAll("svg")
    .data(d3.range(year_range1, year_range2))
  .enter().append("svg")
    .attr("width", width + margin.right + margin.left)
    .attr("height", height + margin.top + margin.bottom)
    .attr("class", "RdYlGn")
  .append("g")
    .attr("transform", "translate(" + (margin.left + (width - cellSize * 53) / 2) + "," + (margin.top + (height - cellSize * 7) / 2) + ")");
svg.append("text")
    .attr("transform", "translate(-6," + cellSize * 3.5 + ")rotate(-90)")
    .attr("text-anchor", "middle")
    .text(String);
var rect = svg.selectAll("rect.day")
    .data(function(d) { return d3.time.days(new Date(d, 0, 1), new Date(d + 1, 0, 1)); })
  .enter().append("rect")
    .attr("class", "day")
    .attr("width", cellSize)
    .attr("height", cellSize)
    .attr("x", function(d) { return week(d) * cellSize; })
    .attr("y", function(d) { return day(d) * cellSize; })
    .datum(format);
rect.append("title")
    .text(function(d) { return d; });
svg.selectAll("path.month")
    .data(function(d) { return d3.time.months(new Date(d, 0, 1), new Date(d + 1, 0, 1)); })
  .enter().append("path")
    .attr("class", "month")
    .attr("d", monthPath);
var csv = []; // filled by the narrative, one {Date, Metric} object per result row

Notes About this Code

Although this code does most of the heavy lifting and can be left largely unmodified, a few specific lines can be changed, or made dynamic with presentation variables.

Color Thresholds:

The color variable maps each value into one of nine classes on the red/yellow/green scale, and its domain sets the lower and upper bounds. In this case I treat airline delays of 5 minutes and 30 minutes as the minimum and maximum:
var color = d3.scale.quantize()
    .domain([5,30])
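To see how the domain turns a delay into a color, here is a plain-JavaScript re-implementation of the bucket arithmetic that d3.scale.quantize() performs with domain [5, 30] and nine output classes. It is written for illustration and is not D3's own code; dayClass is a hypothetical helper that mirrors the class string built in the post-fix.

```javascript
// Illustrative re-implementation of quantize bucketing (not D3's source):
// split [domainMin, domainMax] into bucketCount equal-width buckets.
function quantizeIndex(value, domainMin, domainMax, bucketCount) {
  const k = bucketCount / (domainMax - domainMin);
  const i = Math.floor(k * (value - domainMin));
  // Clamp so values outside the domain fall into the first or last bucket.
  return Math.max(0, Math.min(bucketCount - 1, i));
}

// Hypothetical helper mirroring the post-fix: "day q" + bucket + "-9".
function dayClass(delayMinutes) {
  return "day q" + quantizeIndex(delayMinutes, 5, 30, 9) + "-9";
}

[2, 5, 12.5, 17.5, 30, 45].forEach((d) => console.log(d, dayClass(d)));
```

Anything at or below 5 minutes lands in q0-9 and anything at or above 30 lands in q8-9. In colorbrewer's RdYlGn palette q0-9 is the red end and q8-9 the green end, so with this domain the shortest delays render red; if you want long delays to show red instead, one option is to reverse the range, e.g. .range(d3.range(9).reverse()).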

Chart Size Adjustment:

The chart size is controlled by the margin, width, height and cellSize declarations:
var margin = {top: 19, right: 20, bottom: 20, left: 19},
    width = 720- margin.right - margin.left, // width
    height = 136 - margin.top - margin.bottom, // height
    cellSize = 12; // cell size
The height, width and cell size can be made adjustable by replacing the hardcoded values with presentation variables (optionally with a default, e.g. @{Width}{720}) such as:
  • @{Width}
  • @{Height}
  • @{CellSize}
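If you parameterize these values, keep in mind that each year is drawn as a grid of up to 53 week columns by 7 day rows, centered inside the area left after the margins. The sketch below reproduces that layout arithmetic from the code above as a function; the function name and the 'fits' check are illustrative additions, not part of the D3 example.

```javascript
// Layout arithmetic from the calendar code, parameterized the way the
// hypothetical @{Width}, @{Height} and @{CellSize} presentation variables would be.
function calendarLayout(totalWidth, totalHeight, cellSize) {
  const margin = { top: 19, right: 20, bottom: 20, left: 19 };
  const width = totalWidth - margin.right - margin.left;
  const height = totalHeight - margin.top - margin.bottom;
  // A year spans up to 53 week columns and 7 weekday rows.
  const gridWidth = cellSize * 53;
  const gridHeight = cellSize * 7;
  return {
    // Same centering offsets as the "translate(...)" on each year's <g> element.
    offsetX: margin.left + (width - gridWidth) / 2,
    offsetY: margin.top + (height - gridHeight) / 2,
    fits: gridWidth <= width && gridHeight <= height,
  };
}

console.log(calendarLayout(720, 136, 12)); // { offsetX: 41.5, offsetY: 25.5, fits: true }
console.log(calendarLayout(720, 136, 14).fits); // false: the 742px grid exceeds the 681px area
```

In other words, raising @{CellSize} usually means raising @{Width} and @{Height} as well.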

Date Formatting:

The formatting variables:
var day = d3.time.format("%w"),
    week = d3.time.format("%U"),
    percent = d3.format(".1%"),
    format = d3.time.format("%Y-%m-%d");
The format variable defines the date pattern the script expects; by default the Calendar View uses 'YYYY-MM-DD' (%Y-%m-%d). If your OBIEE date column uses a different format or includes a timestamp, you will need to change the column's data format so that it outputs YYYY-MM-DD, as shown below.
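The format matters because the post-fix looks up each calendar cell by its formatted date string, so the Date values coming from OBIEE must match that string exactly. The functions below are plain-JavaScript equivalents, for illustration only, of the three date patterns above: %Y-%m-%d is the lookup key, %w picks the row and %U picks the column.

```javascript
// Plain-JavaScript equivalents (illustration only) of the d3.time.format
// patterns used above, computed in local time.
const pad2 = (n) => String(n).padStart(2, "0");

// "%Y-%m-%d": the key the narrative's Date values must match.
function isoDate(date) {
  return date.getFullYear() + "-" + pad2(date.getMonth() + 1) + "-" + pad2(date.getDate());
}

// "%w": day of week, 0 = Sunday ... 6 = Saturday (the cell's row).
function dayOfWeek(date) {
  return date.getDay();
}

// "%U": week of year with Sunday as the first day; days before the
// first Sunday are week 0 (the cell's column).
function sundayWeek(date) {
  const jan1 = new Date(date.getFullYear(), 0, 1);
  const midnight = new Date(date.getFullYear(), date.getMonth(), date.getDate());
  // Math.round absorbs 23/25-hour days caused by daylight saving time.
  const dayOfYear = Math.round((midnight - jan1) / 86400000);
  return Math.floor((dayOfYear + jan1.getDay()) / 7);
}

const d = new Date(2013, 5, 25); // 25 June 2013, a Tuesday
console.log(isoDate(d), dayOfWeek(d), sundayWeek(d)); // 2013-06-25 2 25
```

A value such as '2013-06-25 00:00:00' would never equal the key '2013-06-25', which is why a timestamp column leaves the calendar uncolored until its format is changed.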

Modifying the Date Range:

By default, the Calendar View code hardcodes a date range of 1990 to 2011. You will most likely need to change these values for your data set, or create presentation variables that let users change the date range dynamically. Note that the selection target also changes from "body" to the "#my_chart" div. The original code:
var svg = d3.select("body").selectAll("svg")
    .data(d3.range(1990, 2011))
Could be modified to:
var svg = d3.select("#my_chart").selectAll("svg")
    .data(d3.range(year_range1, year_range2))
In the upcoming steps I will show how these variables can be set.
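One detail worth knowing when choosing these values: d3.range(start, stop) includes start but stops before stop, so d3.range(1990, 2011) draws 1990 through 2010. The plain-JavaScript equivalent below (integer steps only, for illustration) shows the effect; the year_range values are hypothetical.

```javascript
// Plain-JavaScript equivalent of d3.range(start, stop) with a step of 1:
// start is included, stop is excluded.
function range(start, stop) {
  const out = [];
  for (let v = start; v < stop; v++) out.push(v);
  return out;
}

const defaultYears = range(1990, 2011);
console.log(defaultYears[0], defaultYears[defaultYears.length - 1], defaultYears.length); // 1990 2010 21

// Hypothetical user input: to draw both 2012 and 2013, the stop value must be 2014.
const year_range1 = 2012;
const year_range2 = 2013;
console.log(range(year_range1, year_range2 + 1)); // [ 2012, 2013 ]
```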

Step 4: Populate the Narrative and Post-Fix

In the narrative you will need to specify the Date and Metric you want to pass to the javascript function using the corresponding column number (see step 0 if you forgot!)
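OBIEE repeats the Narrative text once per result row, replacing @1, @2, @3 with that row's column values. Given that the prefix ends with var csv = [] and the post-fix reads d.Date and d.Metric, a narrative along the lines of csv.push({Date: "@1", Metric: "@3"}); would fit (an assumption based on the Step 0 column order, not the author's exact narrative). The sketch below simulates that row-by-row output with hypothetical rows, then reproduces in plain JavaScript what the post-fix's d3.nest() call builds from csv.

```javascript
// Hypothetical result rows in Step 0 order: [Date, Year, Average Departure Delay].
const rows = [
  ["2012-01-01", "2012", "12.5"],
  ["2012-01-02", "2012", "31.0"],
  ["2012-01-03", "2012", "4.2"],
];

var csv = []; // declared at the end of the Prefix

// What the repeated narrative amounts to: one push per row, @1 -> Date, @3 -> Metric.
for (const row of rows) {
  csv.push({ Date: row[0], Metric: row[2] });
}

// Plain-JavaScript equivalent of the post-fix's
// d3.nest().key(d => d.Date).rollup(d => d[0].Metric).map(csv):
// one entry per date, keeping the first Metric seen for that date.
const data = {};
for (const entry of csv) {
  if (!(entry.Date in data)) data[entry.Date] = entry.Metric;
}

console.log(data);
```

Only dates present as keys in data get a color class, so a date missing from the results simply stays as an empty cell.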

The Post-Fix should contain the remainder of the Calendar View code. This can remain unmodified:
var data = d3.nest()
    .key(function(d) { return d.Date; })
    .rollup(function(d) { return d[0].Metric; })
    .map(csv);
  rect.filter(function(d) { return d in data; })
      .attr("class", function(d) { return "day q" + color(data[d]) + "-9"; })
    .select("title")
      .text(function(d) { return d + ": " + (data[d]); });
function monthPath(t0) {
  var t1 = new Date(t0.getFullYear(), t0.getMonth() + 1, 0),
      d0 = +day(t0), w0 = +week(t0),
      d1 = +day(t1), w1 = +week(t1);
  return "M" + (w0 + 1) * cellSize + "," + d0 * cellSize
      + "H" + w0 * cellSize + "V" + 7 * cellSize
      + "H" + w1 * cellSize + "V" + (d1 + 1) * cellSize
      + "H" + (w1 + 1) * cellSize + "V" + 0
      + "H" + (w0 + 1) * cellSize + "Z";
}
</script>
Your narrative should be similar to:

Step 5: Create a Second Narrative for the Date Range

This narrative is optional, but if you want to give users the ability to modify the date range, take the variables you referenced in the 'Modifying the Date Range' section (in my case year_range1 and year_range2) and set each of them equal to a presentation variable, as shown below. Because the calendar script uses these variables as soon as it runs, place this narrative above the calendar narrative.
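Since presentation variables are substituted into the narrative as plain text, a blank or non-numeric value would break the calendar script. The sketch below is one defensive way to set the two variables; the function name, the StartYear/EndYear variable names and the default years are hypothetical, and in OBIEE the string arguments would be substituted values such as "@{StartYear}{2012}" (the second braces hold the default).

```javascript
// Hypothetical defensive version of the date-range narrative.
// startText/endText stand in for substituted presentation variables.
function yearRange(startText, endText, fallbackStart, fallbackEnd) {
  let start = parseInt(startText, 10);
  let end = parseInt(endText, 10);
  if (Number.isNaN(start)) start = fallbackStart; // blank or non-numeric input
  if (Number.isNaN(end)) end = fallbackEnd;
  if (end < start) [start, end] = [end, start]; // tolerate reversed input
  // d3.range(a, b) stops before b, so add 1 to include the end year.
  return { year_range1: start, year_range2: end + 1 };
}

const r = yearRange("2012", "2013", 2012, 2013);
console.log(r); // { year_range1: 2012, year_range2: 2014 }
```

In the narrative itself you would then assign var year_range1 = r.year_range1; and var year_range2 = r.year_range2; before the calendar script runs.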

Step 6: View Narratives in Answers

Adding both narratives to a single view, your end result should look similar to:


This guide barely scratches the surface of D3 / OBIEE integration, but it serves as a good example of how third-party APIs and JavaScript libraries can be integrated into OBIEE 11g. I encourage all BI architects to explore the full D3 library and consider how it could fit into their current engagements.



 
keywords: OBIEE 11g, Data-Driven Documents, OBIEE 11.1.1.7.0, UserScripts.js, Answers, javascript