Software Engineering: June 2014

Friday, June 20, 2014

Tips on Embracing Agility in BI Projects

Months ago I was in a discussion with professionals in my network about how to apply agile methods in Business Intelligence (BI) projects. This post shares some of the key aspects to consider when adopting agile methods in BI or data warehousing projects. In addition to the methodology-specific ceremonies (for example, Scrum's Release Planning, Sprint Planning, and Daily Stand-up), I think one must do certain things that are specific to BI or data warehousing projects. What are those? Read on.

1) Agile Approach: The BI life cycle has several key milestones (see images). These are ‘Tool Selection & Proof of Concept’, ‘BI Charter & High Level Planning’, ‘Data Extraction’, ‘Data Transformation’, ‘Data Loading’, ‘Reporting & Analysis’, ‘Governance’, ‘Enhancement’, and ‘DW/BI Maintenance’. Some of these can be done in parts. For example, ‘Data Extraction’ in large projects can be done one or two weeks at a time. How? Data extraction from a set of tables in DataSource-1 can be done during week-1, extraction of the remaining tables, from DataSource-2, can be done during week-2, and so on. The lessons learned during each week can be applied to the activities of the next week. This is a way to implement continuous improvement. This approach also has the potential to improve visibility and predictability, and it gives us an opportunity to do what matters to business users first. This is how we embrace agile principles in BI projects. (Similarly, ‘Data Transformation’, ‘Data Loading’, and other ‘mini-projects’ can also happen in parts.)

[Image: BI Lifecycle]

[Image: Data Extraction in Small Chunks]
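The week-by-week extraction described above can be planned with a very small script. This is only a sketch: the function name `plan_extraction_batches`, the table names, and the batch size are hypothetical and chosen for illustration, not taken from any real project.

```python
def plan_extraction_batches(tables_by_source, tables_per_week):
    """Split each source's tables into weekly batches, one source after another."""
    batches = []
    for source, tables in tables_by_source.items():
        for start in range(0, len(tables), tables_per_week):
            batches.append((source, tables[start:start + tables_per_week]))
    return batches


# Hypothetical sources and tables, listed in business-priority order.
plan = plan_extraction_batches(
    {
        "DataSource-1": ["customers", "orders", "invoices"],
        "DataSource-2": ["products", "suppliers"],
    },
    tables_per_week=2,
)

for week, (source, tables) in enumerate(plan, start=1):
    print(f"week-{week}: {source} -> {', '.join(tables)}")
```

Listing the tables in priority order is what lets the team deliver what matters to business users first; the batch size can be adjusted after each week based on lessons learned.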

2) Automation: When we follow an approach like this in BI projects, we must consider automation. Automation is required to improve productivity and quality, and it can happen in small steps. Some of the candidates for automation include:

a. Test Bed Setup (Data Cleanup & Loading)

b. Test Execution

c. Analysis of Test Results

d. Referential Integrity Checks

e. Validation using aggregates (sum, average, etc.)

f. Data Quality Assurance
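Items d and e in the list above lend themselves to small scripts. The following is a minimal sketch, assuming a SQLite database and hypothetical table names (`staging_orders`, `dw_orders`, `dw_customers`); a real warehouse would use its own driver and schema.

```python
import sqlite3


def orphan_count(conn, child, fk, parent, pk):
    """Referential integrity: count child rows whose foreign key has no parent row."""
    # Table and column names are interpolated, so they must come from trusted config.
    sql = (
        f"SELECT COUNT(*) FROM {child} c "
        f"LEFT JOIN {parent} p ON c.{fk} = p.{pk} "
        f"WHERE c.{fk} IS NOT NULL AND p.{pk} IS NULL"
    )
    return conn.execute(sql).fetchone()[0]


def sums_match(conn, source_table, target_table, column, tolerance=1e-9):
    """Aggregate validation: compare SUM(column) between source and target."""
    totals = [
        conn.execute(f"SELECT COALESCE(SUM({column}), 0) FROM {table}").fetchone()[0]
        for table in (source_table, target_table)
    ]
    return abs(totals[0] - totals[1]) <= tolerance


# Hypothetical test bed: order 12 points at a customer that was never loaded.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dw_customers (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE staging_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE dw_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO dw_customers VALUES (1), (2);
    INSERT INTO staging_orders VALUES (10, 1, 100.0), (11, 2, 50.0), (12, 3, 25.0);
    INSERT INTO dw_orders VALUES (10, 1, 100.0), (11, 2, 50.0), (12, 3, 25.0);
""")

orphans = orphan_count(conn, "dw_orders", "customer_id", "dw_customers", "customer_id")
totals_ok = sums_match(conn, "staging_orders", "dw_orders", "amount")
print(f"orphan orders: {orphans}, totals match: {totals_ok}")
```

In this sample the totals agree while one order is orphaned, which shows why aggregate checks alone are not enough and why both kinds of check are worth automating.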

3) Planning: It is critical to invest time in planning and creating a BI road map. This planning activity may require 2 or 3 months at the start of a BI project. One may argue that this is not ‘agile’. However, one must agree that BI projects cannot have the objective of delivering something from the first week or even the first month. A good amount of planning followed by an iterative (and incremental) approach helps. So, a large BI project can be seen as a project that starts with adequate planning (to arrive at a road map) followed by several small projects implementing agile practices. In fact, there are several myths about agile. One of them is 'Agile means no planning'. Read my post 'Agile Myths and Misunderstandings', which links to a free PDF on this topic. Happy reading!

4) Working Software and Feedback: When we adopt agile practices in BI projects, it is important to keep in mind that some iterations may not result in demonstrable software useful to ‘business users’. This is because of the inherent nature of BI projects, which may require several initial iterations to set up the target data source and populate data. When data is ready and reports are working, end users can see the working product. Until then, the technical architect or data architect is your end user: she consumes what you deliver and provides feedback on it.

5) BI Tools: Tools used in BI projects can be of two types: a) commercial tools (for data extraction, loading, etc.) and b) homemade tools (from small scripts to big routines). Teams must be ready to see the potential of homemade tools, think deeply, collaborate, and create small tools that can help in several ways. One approach is to identify at least one or two engineers who are ‘tool smiths’ in the team and encourage everyone in the team to come up with new ideas and tools. Tool smiths can help in implementing these ideas.

6) Why Agile? Adopting agile practices in BI projects helps in identifying risks at early stages and also enables proactive thinking and preparedness for production release. For example, iterative and incremental approaches help BI project teams estimate, measure and optimize the downtime required to launch the warehouse. In classical or traditional approaches, this happens at the final lap of the project.

7) Budget Control: Agile adoption is a feasible way to release a BI project in parts (incrementally) to the world (or business users). This helps in budget control as well as optimization (in the form of process reuse or component reuse). Instead of delivering all 40+ reports in one stretch, you can deliver a subset of prioritized reports in batches. This can save the budget that would otherwise go into developing the low-priority ones.

8) Data Quality Assurance (DQA): DQA is one of the key activities in BI projects. Unless we find the right steps (or processes) to assess the quality of data flowing from different sources, the data in the BI store or warehouse may get contaminated. This is obvious. However, in practice, during the maintenance phase of BI projects, a number of defects reported by end users are found to be related to data quality issues. Agile practices can be leveraged to identify potential data quality issues ahead of time and implement periodic checks to assess data quality (through automated scripts).
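A periodic data quality check of the kind mentioned above can start very small. The sketch below is an assumption-laden example, not a prescribed method: the function `profile_rows`, the column names, and the sample batch are all hypothetical. It counts duplicate keys and missing required values in a batch of incoming rows.

```python
def profile_rows(rows, key, required):
    """Return simple data-quality metrics for a batch of rows (dicts)."""
    seen = set()
    duplicates = 0
    missing = {column: 0 for column in required}
    for row in rows:
        value = row.get(key)
        if value in seen:
            duplicates += 1  # same key already appeared in this batch
        seen.add(value)
        for column in required:
            if row.get(column) in (None, ""):
                missing[column] += 1  # absent, NULL, or empty string
    return {"rows": len(rows), "duplicate_keys": duplicates, "missing": missing}


# Hypothetical batch arriving from a source system.
batch = [
    {"customer_id": 1, "email": "a@example.com", "country": "IN"},
    {"customer_id": 2, "email": "", "country": "US"},
    {"customer_id": 2, "email": "b@example.com", "country": None},
]

report = profile_rows(batch, key="customer_id", required=["email", "country"])
print(report)
```

Running such a script in every iteration, and tracking the numbers over time, is one way to catch data quality issues before end users report them as defects.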

When we adopt agile practices, we come across several opportunities not only to provide predictability and visibility but also to improve the level of automation, and hence to deliver home-grown automation tools (such as scripts for data validation or data quality assurance) to customers. These automation tools can provide long-term value in BI projects.

What else do you do to improve agility in BI or data warehousing projects?

Related Post: TDD in ETL Projects

Monday, June 16, 2014

Architectural Considerations for Multi-Tenancy – Part 2

In the first part of this series, I listed a set of questions to understand the architectural requirements when you want to enable multi-tenancy. In this final part, I cover three broad areas of architecture that influence multi-tenancy.

Data Architecture: One approach to data architecture is a shared schema. In this approach, you store the data of all tenants in a single schema and include an identifier column such as Tenant_ID to identify the data sets that belong to each tenant. Another approach is to have a schema per tenant. This helps when you have to deal with a large volume of data per client. There is yet another approach, a simple and fundamental one: having a separate database per client. Every approach comes with some pros and cons.
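To make the shared-schema approach concrete, here is a minimal sketch using SQLite. The `invoices` table, the tenant names, and the helper `invoices_for` are hypothetical; the only idea taken from the text is the tenant identifier column (written here as `tenant_id`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table for all tenants; tenant_id is part of the primary key.
    CREATE TABLE invoices (
        tenant_id  TEXT    NOT NULL,
        invoice_id INTEGER NOT NULL,
        amount     REAL    NOT NULL,
        PRIMARY KEY (tenant_id, invoice_id)
    );
    INSERT INTO invoices VALUES
        ('acme', 1, 120.0),
        ('acme', 2, 80.0),
        ('globex', 1, 500.0);
""")


def invoices_for(conn, tenant_id):
    """Return only the rows that belong to the given tenant."""
    return conn.execute(
        "SELECT invoice_id, amount FROM invoices "
        "WHERE tenant_id = ? ORDER BY invoice_id",
        (tenant_id,),
    ).fetchall()


acme_rows = invoices_for(conn, "acme")
globex_rows = invoices_for(conn, "globex")
print(acme_rows, globex_rows)
```

The main risk of this approach shows up clearly here: every query must carry the tenant filter, so it is usually worth funnelling data access through helpers like this one rather than writing the filter by hand each time.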

Code Base: Can you afford to have the same code base for all tenants and maximize sharing, while allowing some percentage of the code to be unique to each client? Or do you want to keep the code bases separate and have them run on separate instances of the application server? Which is right for you? It depends on the needs of your application and tenants.

Implementation View: How is your application going to be deployed in production? Is there a need to share instances? Or do you need to have an instance per tenant? What are the run-time requirements? This is another broad area that is going to influence your architecture and design.

This makes it clear that the multi-tenancy needs of your application or product are going to determine how you implement multi-tenancy. There are different multi-tenant systems running in production across organizations. Their architectures and designs are different and they offer different degrees of multi-tenancy. When you want to define an architecture and design for multi-tenancy, you need to look at these broad areas, ask questions (I shared those in the first part of this series), and make your decisions. When you do this, I am sure you will come up with a meaningful architecture.

Have you come across any difficulties or challenges in doing this? Let us discuss.


Architectural Considerations for Multi-Tenancy – Part 1

How do you go about defining an architecture that supports multi-tenancy? What are the design considerations? Let us explore answers to these questions. In the first part of this two-part series, I am setting the context and presenting a list of questions that will help you get adequate information on the task at hand. In the next and final part I will discuss the broad areas of architecture that influence multi-tenancy and provide some references.

The term multi-tenancy became popular with the advent of Software-as-a-Service or SaaS applications. One of the characteristics of SaaS-based products is the ability to serve multiple customers through a single installation. This is known as multi-tenancy. With multi-tenancy, every customer or company has administrator and user credentials to access the system. This calls for privacy and security. In order to achieve multi-tenancy, it is essential to ensure configurability and scalability. With configurability, each customer or company can configure the product and customize the UI as well as other elements (such as business logic, report formats, user preferences, personalization, etc.). Hence new customers can be added with ease; there is no need to get the product installed again! This architecture is also required to provide cross-platform support; for example, it has to support a pre-defined set of browsers. Above all, a SaaS-enabled product needs to be highly robust and offer ease of integration.

What matters is the kind of questions you ask when a problem like this is thrown at you. The typical questions are:

1) How do we define the details of multi-tenancy requirements for our application?

2) How many tenants, both minimum and maximum, are to be served?

3) How will the application and other routines identify a specific tenant?

4) What are the security requirements (across all layers)?

5) How are we going to validate users?

6) What are the configurable parameters needed to support multiple tenants?

7) How are we going to add new tenants?

8) How are we going to retire an existing tenant?

9) What is going to be the volume of data over the next five years? What are the scalability and performance requirements?

10) What are the UI requirements? Do we need to enable tenants or customers to define their UI or theme? (Obviously, yes. But to what extent?)

11) What are the external systems and interfaces? Are these different for different customers? How are we going to handle these?

12) What is the implementation view? Are we going to have multiple databases or application servers? Why? Why not?

13) How many browsers and languages are going to be supported?

14) What are the other devices from which the application will be accessed (such as tablets, smart phones)?

15) What is the expected load (number of transactions) during different times in a year?

16) What are the expected benefits of implementing multi-tenancy and what are the corresponding architectural or design considerations?

17) What are the maintenance routines and dashboards required to monitor the performance?

18) What kind of dashboard is required by the administrator or super user(s) of each tenant?

19) What is going to be the maintenance and release cycle? How will it affect multiple tenants?

20) Is there a need to implement usage-based pricing or to monitor the number of transactions per tenant? If yes, what are the design considerations?

21) What is going to be the level of reuse to improve maintainability?
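Questions 3 and 20 above (identifying a tenant, and counting transactions per tenant) can be explored with a small prototype before committing to a design. The sketch below assumes, purely for illustration, that tenants are identified by subdomain; the domain `example.com`, the tenant names, and both function names are hypothetical.

```python
from collections import Counter

KNOWN_TENANTS = {"acme", "globex"}  # hypothetical tenant registry
usage = Counter()                   # transactions recorded per tenant


def resolve_tenant(host, base_domain="example.com"):
    """Map a host such as 'acme.example.com' to 'acme'; None if unknown."""
    host = host.split(":")[0].lower()  # drop an optional port, ignore case
    suffix = "." + base_domain
    if not host.endswith(suffix):
        return None
    tenant = host[: -len(suffix)]
    return tenant if tenant in KNOWN_TENANTS else None


def record_transaction(host):
    """Attribute one transaction to the tenant that owns the host."""
    tenant = resolve_tenant(host)
    if tenant is None:
        raise ValueError(f"unknown tenant for host {host!r}")
    usage[tenant] += 1
    return tenant


for request_host in ("acme.example.com", "ACME.example.com:8443", "globex.example.com"):
    record_transaction(request_host)

print(dict(usage))
```

Other identification schemes (a login-time lookup, a request header, a URL path prefix) would change only `resolve_tenant`; keeping that decision in one function is what makes per-tenant metering and pricing easier to add later.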

If you have designed (or are designing) a multi-tenant application, these questions will lead you to additional questions. Please feel free to share those questions. I will add them to this list. Finding answers to all these questions will help you define an architecture that fits well.

In the next and final part of this series, I discuss the broad areas of architecture that influence multi-tenancy.

Tuesday, June 3, 2014

Succeeding with Distributed Agile Teams

Today I published a blog post on the Global Distributed Agile Consortium blog. That post is about providing a single and stable platform to geographically distributed teams. I used the term ‘a single and stable platform’ as a metaphor to mean a seamless work atmosphere or ecosystem in terms of technology, people practices, issue management, collaboration, knowledge management, and so on. That is a tall order. I agree.

When there is no single and stable platform, teams operate in different directions. Besides, when there is a lack of attention or focus or governance, software projects go out of control.

Years ago, I started coaching a team. This team was doing all it could to deliver working software iteration after iteration. However, everyone in that team lived with a ‘we-are-being-pushed’ feeling. The team members were supposed to be working collaboratively with the other two geographically distributed teams. All three teams put together were not collaborative enough. One team was persistent in demonstrating one-upmanship. There were double standards in adhering to standard operational procedures or established policies. There was constant nudging from one side to the other. The other two teams were taking the load and stretching to deliver. As a result, the atmosphere was getting polluted with animosity, indifference, ambiguity, artificial harmony and lack of empathy.

On a positive note, they had good infrastructure, tools and systems in place. Not a bad start at all. Within my first two weeks I could perceive the absence of something: a ‘single and stable platform’. There was no shared vision. There was no handshake or coordination among teams. There was some level of disconnect. I heard team members in one location telling someone on another continent, ‘I don’t think we are on the same page. It does not work that way!’ Days went by. And they stayed on their own pages. I am sure you get what I mean. They went on for several days without resolving a bunch of cold issues. Reason? Absence of a single and stable platform. Every team was on its own platform, at a different level, and operating with its own perspective.

I had a month and a half to turn around the situation. Not a complete turnaround, but one that could reduce negativity and boost motivation levels. That was our first goal. We accomplished that. And we worked together over the next two months to improve the situation further.

So, let me ask. What do we need to do to establish a single and stable platform in geographically distributed teams? Here, platform does not mean technology platform. It encompasses several things, right from establishing a shared vision to ensuring inclusion across teams. There are many more. When you ensure all these you have a single and stable platform. That is the lifeline of teams.

I am not intending to sound philosophical. You know, this is not about philosophy. This is about our day-to-day challenges and things that happen on the ground.

The metaphor of a 'single and stable platform' consolidates both the hard and soft aspects involved in enabling seamless delivery in geographically distributed teams. Distributed development is a complex thing. The rules of the game are different here. To make things better, we must convert sound practices into principles and start adhering to those principles so that the consequences are positive. Violation of fundamental principles is what hurts projects, project teams and stakeholders. That is the truth.

Adhering to principles that enable success in distributed agile is, I think, a graceful start in this journey. For more information on these principles, read my post ‘Distributed Agile: A Single and Stable Platform’.
