A few months ago I was in a discussion
with professionals in my network about how to apply agile methods, or agility, in
Business Intelligence (BI) projects.
This blog post shares some of the key aspects involved in
considering agile methods for BI or data warehousing projects. In addition to implementing or considering
methodology-specific ceremonies (for example, Scrum's Release Planning, Sprint
Planning, Daily Stand-up, and the like), I think one must do certain
things that are specific to BI or data warehousing projects. What are those?
Read on.
[Images: ‘BI Lifecycle’ and ‘Data Extraction in Small Chunks’]
1) Agile Approach: The BI life cycle has several key milestones (see images). These are ‘Tool Selection & Proof of Concept’,
‘BI Charter & High Level Planning’, ‘Data Extraction’, ‘Data
Transformation’, ‘Data Loading’, ‘Reporting & Analysis’, ‘Governance’, ‘Enhancement’,
and ‘DW/BI Maintenance’. Some of these can be done in parts. For example,
in large projects ‘Data Extraction’ can be done in increments of one or two weeks.
How? Data extraction from a set of
tables in DataSource-1 can be done during week 1, data extraction from
the remaining tables, in DataSource-2, can be done during week 2, and so on.
The lessons learned during each week can be applied to the activities of
the next week. This is a way to implement continuous improvement.
This approach can also improve visibility and
predictability, and it gives us an opportunity to do what matters (to
business users) first. This is how we embrace agile
principles in BI projects. (Similarly, ‘Data Transformation’, ‘Data Loading’,
and other ‘mini-projects’ can also happen in parts.)
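To make the idea of extracting in small chunks concrete, here is a minimal sketch in Python. It only plans the work: it groups extraction tasks into weekly batches, with the tables that matter most to business users first. The table names, the priority values, and the plan_iterations helper are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionTask:
    source: str    # e.g. "DataSource-1"
    table: str     # hypothetical table name
    priority: int  # 1 = matters most to business users (assumed scale)


def plan_iterations(tasks, tables_per_week):
    """Group extraction tasks into weekly batches, highest priority first."""
    if tables_per_week < 1:
        raise ValueError("tables_per_week must be at least 1")
    ordered = sorted(tasks, key=lambda t: (t.priority, t.source, t.table))
    return [
        ordered[i:i + tables_per_week]
        for i in range(0, len(ordered), tables_per_week)
    ]


# Illustrative backlog; none of these tables come from a real project.
tasks = [
    ExtractionTask("DataSource-2", "shipments", 3),
    ExtractionTask("DataSource-1", "orders", 1),
    ExtractionTask("DataSource-2", "invoices", 2),
    ExtractionTask("DataSource-1", "customers", 1),
]

plan = plan_iterations(tasks, tables_per_week=2)
for week, batch in enumerate(plan, start=1):
    names = ", ".join(f"{t.source}.{t.table}" for t in batch)
    print(f"week-{week}: {names}")
```

In a real project the batch size would come from the team's observed velocity, and the plan would be revisited each week as lessons are learned.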
2) Automation: When we follow an approach like this in BI projects, we must consider
automation. Automation is required to improve productivity and
quality, and it can happen in small steps. Some of the candidates for
automation include:
a. Test Bed Setup (Data Cleanup & Loading)
b. Test Execution
c. Analysis of Test Results
d. Referential Integrity Checks
e. Validation using aggregates (sum, average, etc.)
f. Data Quality Assurance
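Two of these candidates, referential integrity checks and validation using aggregates, can be sketched in a few lines. The example below uses Python's built-in sqlite3 module with an in-memory database. The table names (src_orders, dim_customer, fact_orders) and the sample rows are assumptions made up for illustration, not part of any particular warehouse.

```python
import sqlite3

# Tiny in-memory test bed; all tables and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE fact_orders (order_id INTEGER, customer_id INTEGER, amount REAL);

    INSERT INTO src_orders VALUES (1, 10, 100.0), (2, 11, 250.0), (3, 12, 75.5);
    INSERT INTO dim_customer VALUES (10), (11);
    INSERT INTO fact_orders VALUES (1, 10, 100.0), (2, 11, 250.0), (3, 12, 75.5);
""")


def orphan_count(conn):
    """Referential integrity: count fact rows with no matching dimension row."""
    row = conn.execute("""
        SELECT COUNT(*)
        FROM fact_orders f
        LEFT JOIN dim_customer d ON d.customer_id = f.customer_id
        WHERE d.customer_id IS NULL
    """).fetchone()
    return row[0]


def totals_match(conn, tolerance=0.01):
    """Aggregate validation: compare row count and SUM(amount), source vs target."""
    query = "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM {}"
    src = conn.execute(query.format("src_orders")).fetchone()
    tgt = conn.execute(query.format("fact_orders")).fetchone()
    return src[0] == tgt[0] and abs(src[1] - tgt[1]) <= tolerance


orphans = orphan_count(conn)
matched = totals_match(conn)
print("orphan fact rows:", orphans)
print("source and target totals match:", matched)
```

Here the load itself is complete (counts and sums agree), yet one fact row points to a customer that is missing from the dimension. Checks like these are cheap to run after every iteration.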
3) Planning: It is critical to invest time in planning and creating a BI road map. This
planning activity may require 2 or 3 months at the start of a BI project.
One may argue that this is not ‘agile’. However, one must agree that BI
projects cannot aim to deliver something in the first week
or even the first month. A good amount of planning followed by an
iterative (and incremental) approach helps. So, a large BI project can
be seen as a project that starts with adequate planning (to arrive at a
road map) followed by several small projects implementing agile practices. In fact, there are several myths about agile. One of them is 'Agile means no planning'. Read my post 'Agile Myths and Misunderstandings'; it links to a free PDF on this topic. Happy reading!
4) Working Software and Feedback: When we adopt agile practices in BI projects, it is important
to keep in mind that some iterations may not result in demonstrable software
useful to ‘business users’. This is because of the inherent nature of BI
projects. BI projects may require several initial iterations to set up the
target data source and populate data. When data is ready and
reports are working, end users can see the working product. Until that
time the technical architect or data architect is your end user, and she consumes
what you deliver and provides feedback on it.
5) BI Tools: Tools used in BI projects can be of two
types: a) commercial tools (for data extraction, loading, etc.) and
b) homemade tools (from small scripts to big routines). Teams must be
ready to see the potential of homemade tools, think deeply, collaborate, and
create small tools that can help in several ways. One approach is
to identify at least 1 or 2 engineers who are ‘tool smiths’ in the team and
encourage everyone in the team to come up with new ideas and tools. Tool smiths
can help in implementing these ideas.
6) Why Agile? Adopting agile practices in BI projects helps in identifying risks at
early stages and also enables proactive thinking and preparedness for
production release. For example, iterative and incremental
approaches help BI project teams estimate, measure and optimize the downtime
required to launch the warehouse. In classical or traditional
approaches, this happens at the final lap of the project.
7) Budget Control: Agile adoption is a feasible way to release a BI project in parts
(in an incremental manner) to the world (or business users). This helps
in budget control as well as optimization (in the form of process reuse or
component reuse). Instead of delivering all 40+ reports in one stretch, you can
deliver subsets of prioritized reports in batches. This can save the budget that would otherwise go into developing
the low-priority ones.
8) Data Quality Assurance (DQA): DQA is one of the key activities in BI projects.
This is because unless we find the right steps (or processes) to
assess the quality of data flowing from different sources, the
data in the BI store or warehouse may get contaminated. This is obvious.
However, in practice, during the maintenance phase of BI projects, a
number of defects reported by end users are found to be related to data quality
issues. Agile practices can be leveraged to identify potential data
quality issues ahead of time and implement periodic checks to assess data
quality (through automated scripts).
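As a rough illustration of such an automated script, the sketch below checks one incoming batch for duplicate keys and empty required columns. The check_quality function, the column names, and the sample rows are hypothetical; a real project would agree on its own rules and thresholds with the data owners.

```python
def check_quality(rows, key, required, max_null_ratio=0.0):
    """Return a list of human-readable data quality issues found in rows."""
    total = len(rows)
    if total == 0:
        return ["no rows received"]

    issues = []

    # Duplicate business keys often point to a faulty extract or join.
    seen, duplicates = set(), set()
    for row in rows:
        value = row.get(key)
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    if duplicates:
        issues.append(f"duplicate {key} values: {sorted(duplicates)}")

    # Required columns must not exceed the agreed share of empty values.
    for column in required:
        nulls = sum(1 for row in rows if row.get(column) in (None, ""))
        if nulls / total > max_null_ratio:
            issues.append(f"{column}: {nulls} of {total} rows are empty")
    return issues


# Illustrative batch; the columns and values are assumptions.
batch = [
    {"customer_id": 1, "email": "a@example.com", "country": "IN"},
    {"customer_id": 2, "email": None, "country": "US"},
    {"customer_id": 2, "email": "c@example.com", "country": ""},
]

issues = check_quality(batch, key="customer_id", required=["email", "country"])
for issue in issues:
    print(issue)
```

Scheduling a script like this after each load turns data quality from a maintenance-phase surprise into a routine, per-iteration check.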
When we adopt agile practices, we
come across several opportunities not only to provide predictability and
visibility but also to improve the level of automation and hence to deliver
home-grown automation tools (such as scripts for data validation or data
quality assurance) to customers. These
automation tools can provide long-term value in BI
projects.
What else do you do to improve agility in BI or data warehousing projects?
Related Post: TDD in ETL Projects