Friday, December 26, 2014

Scaling Agile: What do Architects Deliver?

Last year, I wrote a four-part blog series titled ‘Software Architecture and Design in Agile World’. You can read all four parts of that series in less than ten minutes. If you haven’t read them yet, I suggest you do so before proceeding, because this post is a sequel to that series.
One of the factors I considered in Part-4 of that series is ‘Project Complexity’, which is based on two key aspects – the size of the project and the complexity of its requirements. These two aspects, along with time-to-market constraints, become a significant determinant of team size. When you need to run a large project or program with hundreds of team members and follow Agile software development, you are scaling Agile. ‘Scrum of Scrums’ is a pattern for scaling Scrum. SAFe is a framework for executing large projects with agile practices. Disciplined Agile Delivery (DAD) provides a flexible and pragmatic platform for large-scale distributed projects.
When you have 200 or 300 team members spread across two time zones, you are going to have at least one team of architects – for example, a team of 5 to 7 architects. That is not all. You will also have designers, story authors, developers, testers and so on. In a situation like this, in addition to a team of architects, you may need one or more teams of designers, one or more teams of product owners and story authors, several feature teams, some independent verification teams, etc. Why do we need a team structure like this? What difference does it make? I will answer these questions in another post. For now, let us focus on the main question – in large projects or programs, what do architects deliver?
Do they deliver a roadmap? Do they deliver prototypes? Do they deliver some high-level code? Or do they deliver something else?  What do you want them to deliver?
To answer these questions, we must wear the hat of program managers. Assume that you are program managing a large-scale, multi-year, multi-release agile program distributed across two or three time zones, and your teams are working on emerging technologies. What questions will you ask week after week, month after month, to make sure your teams are moving in the right direction? Which of those questions will focus significantly on your team of architects?
Here are the key questions.
1) Is decision making effective and timely in our project? This includes all decisions – architectural, design, integration, and so on.
2) Are we able to assess, measure, or understand the cost of ‘not doing’ certain things in a timely manner? And are we highlighting those costs to our product owners?
3) Do we have a prioritized backlog of technical stories (related to architecture, design, complex coding issues, etc.)? How do we assess the ‘technical debt’ associated with those stories and track them to closure?
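To make question 3 concrete, here is a minimal sketch of such a prioritized technical backlog with a rough debt figure attached to each open story. All names and numbers are hypothetical; a real team would track this in its backlog tool rather than in code:

```python
from dataclasses import dataclass

@dataclass
class TechnicalStory:
    title: str
    category: str       # e.g. "architecture", "design", "coding"
    debt_estimate: int  # rough cost (in story points) of deferring this story
    priority: int       # lower number = higher priority
    closed: bool = False

class TechnicalBacklog:
    def __init__(self):
        self.stories = []

    def add(self, story):
        self.stories.append(story)

    def prioritized(self):
        # Open stories only, highest priority first
        return sorted((s for s in self.stories if not s.closed),
                      key=lambda s: s.priority)

    def open_debt(self):
        # A rough view of the technical debt still to be paid down
        return sum(s.debt_estimate for s in self.stories if not s.closed)

backlog = TechnicalBacklog()
backlog.add(TechnicalStory("Define caching strategy", "architecture", 8, 1))
backlog.add(TechnicalStory("Refactor payment interface", "design", 5, 2))
backlog.stories[1].closed = True  # decision made, story tracked to closure
print(backlog.open_debt())  # 8
```

The point of the sketch is only that the backlog makes deferred decisions visible and countable, which is what lets you answer question 2 for your product owners.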
Questions like these will help us figure out the deliverables of architects. While writing this post, I read an article by Eltjo R. Poort published in the September/October 2014 issue of IEEE Software, titled ‘Driving Agile Architecting with Cost and Risk’. I have provided a link to this article at the end of this post. In it, the author says, “Decisions are your main deliverables. Thus, the role of the architect is to make sure that the group’s decisions are consistent across the whole solution, even when multiple teams are working on it simultaneously.”
Decisions! Yes, a whole lot of decisions – decisions related to the basic building blocks, trade-offs, integration points, interfaces, dependencies, data storage, performance, scalability, deployment, cost, maintainability, and so on. Decisions are the main deliverables. Also, as the author says, architects need to maintain a backlog of architecture concerns and convert them into decisions in a timely manner. This is a good practice to make sure that nothing is ignored, forgotten, or assumed.
Have you been part of large-scale Agile projects? Did your project have a team of architects? If yes, what did they deliver? What more did you expect them to deliver?
Reference: ‘Driving Agile Architecting with Cost and Risk’, Eltjo R. Poort, IEEE Software, September/October 2014

Thursday, November 27, 2014

Managing Complex Requirements - Part 4

In Part-3 of this series, we discussed a simple anatomy of business requirements and raised a couple of questions: ‘Is it enough if we focus on INVEST in large projects? What else should we do to manage complex requirements?’ In this post, I will share my thoughts and some takeaways.
Severability: The severability of a user story (or use case, or any piece of requirement) indicates how easily it can be separated from other user stories or epics – in other words, it is the inverse of its connectedness or interdependency. High severability means a user story can be developed and deployed at any time in the development cycle, in any iteration. Low severability means a high level of connectedness or dependency. Obviously, developing and deploying user stories with low severability requires a significant amount of planning, dependency resolution, and regression testing.
How does severability relate to complexity? Let me try to answer. If you have an epic or user story that is complex but highly severable, it is relatively easy to deal with. You can design, develop, test, and deploy it by budgeting extra time for analyzing and understanding the complex requirement and designing appropriate test scenarios. You can treat it as a separate release comprising two or more iterations. Presumably, there will be little or no rework on the rest of the system.
On the other hand, if you have to deal with a complex user story or epic with medium or low severability, you will find it very challenging, because you need to understand all the dependencies, resolve them, and sequence the development activities (making this part of your ongoing prioritization exercise). When you ignore this, you accumulate technical debt, which can lead to significant refactoring at a later stage.
When you do iterative and incremental development using Agile methods, you need to understand the severability of epics and user stories and manage dependencies through proactive architecture or design, backlog prioritization, and release planning. Project teams will be able to identify a greater number of user stories that can be constructed and tested with minimal rework or impact on related stories when
  1. Architecture and design trade-offs, other decisions, and issue resolutions happen in a timely manner, and the architectural sub-systems are mature enough,
  2. Release planning is effective, and
  3. Prioritization of backlog is based on business needs as well as dependencies.
In a way, this approach can improve our ability to deliver user stories or features as individual (and independent) components or apps.
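As an illustration of the idea, here is a small sketch that scores severability from a dependency graph of user stories; the story names are hypothetical. Stories with fewer connections in either direction are the easier ones to build and ship independently:

```python
# story -> stories it depends on (a hypothetical backlog)
deps = {
    "login": [],
    "profile": ["login"],
    "checkout": ["login", "cart"],
    "cart": [],
    "reports": [],
}

def severability(story):
    # Count direct connections in either direction; fewer connections
    # means the story is easier to sever (build and ship independently).
    inbound = sum(story in d for d in deps.values())
    outbound = len(deps[story])
    return -(inbound + outbound)  # higher value = more severable

ranked = sorted(deps, key=severability, reverse=True)
print(ranked[0])  # "reports": no dependencies in either direction
```

A ranking like this is only a starting point for release planning – business priority still comes first – but it makes the low-severability stories visible early, which is where the planning effort belongs.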
So, what are the takeaways?
  1. Build architecture and design components or sub-systems and mature them iteratively, as early as you can. Avoid procrastination. Do not, in the name of ‘Agile’, let every aspect of your technical components (or sub-systems) keep emerging until the final iteration.
  2. Validate whether user stories can include all related NFRs in your project. In some projects this may not be possible; in that case, plan for technical stories that address NFRs at regular intervals to suit your release plan.
  3. At the iteration level, focus on INVEST. At the release or product-roadmap level, focus on everything else discussed in this blog series. If your project is large, consider disciplined or structured parallel agile teams working at multiple levels such as features, design/architecture, independent verification, etc.
  4. Focus on proactive grooming and on creating INVEST stories for upcoming iterations. If user stories are volatile inside iterations, find ways to improve the grooming sessions. With multiple small teams moving forward in parallel at a sustainable pace, you will need the right number of teams identifying and creating user stories for upcoming iterations.
  5. Identify complex requirements and come up with techniques to deal with them. Complex requirements need extra grooming expertise and effort compared to other requirements. Resolve their dependency issues. Follow techniques such as TDD, ATDD, or BDD, and ensure a high level of collaboration with the business during development, testing, and acceptance.

Thursday, November 20, 2014

Managing Complex Requirements - Part 3

What are complex requirements? How challenging is managing them? How can we succeed in managing complex requirements? That is what I plan to cover in this blog series. In Part-1 of this series, I shared my thoughts on analyzing and understanding requirements in terms of factors such as complexity, stability, clarity, and availability. In Part-2, I shared a classification scheme and the potential opportunities or advantages it offers. In Part-3, we will discuss some basics and connect them with what is happening in the Agile world. I plan to end this series with Part-4, where I will share some takeaways.

Anatomy of Business Requirements: At a broad level, business requirements can be categorized as functional and non-functional requirements. That is a simple categorization for this discussion, and it holds true in most cases. A major source of business requirements in product development organizations is market demand – also known as market requirements.
There are many types of non-functional requirements, as shown in this diagram – you can add Security, Compliance, etc. to the list. If there are more, let me know through your comments.

In the Agile world, team members are getting used to visualizing requirements as user stories. That is fine. However, understanding this simple classification of requirements remains a fundamental need.
In small projects, it may be possible to define non-functional requirements (NFRs) along with user stories so that each user story carries its related NFRs. However, in large projects or programs, attempting to define NFRs within user stories is not sufficient to cover them all. You will need to create technical stories to address system-level NFRs.
Complexity can arise on either side – functional or non-functional. Functional complexity can be dealt with in several ways – e.g., close collaboration with business users, requirements workshops, prototyping, POCs, active feedback, early acceptance, etc. Complexity in non-functional requirements poses a different kind of challenge, and there are ways to deal with it. For example, architecture patterns, design patterns, code optimization techniques, tool selection, and test strategy help us in many of these non-functional areas.
So, what is the point? It is obvious: understand where you anticipate complexity and deal with it. It is about anticipation, awareness, and action planning.

In Agile projects, we talk about epics and user stories. Functional requirements are either explicitly or implicitly seen as groups of epics belonging to individual modules. An epic is a large business case or scenario, broken into multiple user stories. We all know that. However, let me tell you: when you attempt to draw a diagram like this for an enterprise agile project, it is not going to be as clear and simple as this one! This is because a good number of user stories can be interrelated.
So what? Understand the relationships among epics and user stories. You may come across situations where user stories are related to each other. Remember use cases and their relationships – a) includes and b) extends. Similar relationships can exist among epics and user stories. This will help you further understand complexity in terms of such relationships and dependencies.
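If it helps to make this concrete, story relationships of the ‘includes’/‘extends’ kind can be captured in a simple relation table; the story names below are hypothetical:

```python
# Hypothetical story relationships modelled like use-case "includes"/"extends"
relations = [
    ("place_order", "includes", "validate_payment"),
    ("place_order", "includes", "update_inventory"),
    ("place_order", "extends", "apply_gift_wrap"),
]

def related_stories(story):
    # Everything that must be understood (and possibly re-tested)
    # when this story changes
    return sorted({target for source, _, target in relations if source == story})

print(related_stories("place_order"))
# ['apply_gift_wrap', 'update_inventory', 'validate_payment']
```

Even a tiny table like this, maintained during grooming, answers the practical question behind the relationships: which stories are pulled into the blast radius when one of them changes.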
INVEST is a popular acronym in the agile world. It stands for Independent, Negotiable, Valuable, Estimable, Small, and Testable. User stories that satisfy these six properties enable project teams to create software that satisfies acceptance criteria and delivers business value. You will find more information on INVEST in Chapter 2, ‘Writing Stories’, of User Stories Applied for Agile Software Development. Click this link to download the free PDF of this chapter.
Now, the questions are: ‘Is it enough if we focus on INVEST in large projects? What else should we do to manage complex requirements?’ I will share some thoughts and answer these in the next part, which will be the final part of this series.

Thursday, November 13, 2014

Managing Complex Requirements - Part 2

What are complex requirements? How challenging is managing them? How can we succeed in managing complex requirements? That is what I plan to cover in this blog series. In Part-1 of this series, I shared my thoughts on analyzing and understanding requirements in terms of factors such as complexity, stability, clarity, and availability. That is the first step.
The next step is to come up with a classification scheme for these factors. For example, in terms of complexity, requirements can be classified as simple, medium, or complex. There are systematic approaches to deriving and defining such a classification.
Stability of requirements can be classified into three levels. In every project you will find requirements that are very stable – the number of such requirements varies from project to project. For example, in reengineering projects a vast majority of requirements will be stable, whereas in new product development the number of stable requirements will be very low and the number of volatile requirements very high.
Clarity is about understandability as well as testability of requirements at the initial stage of their availability. Requirements that are ambiguous will require considerable communication and coordination efforts in order to refine and make them clear.
Availability refers to the availability of subject matter experts or business users who know the business problem well and can help you define and refine requirements. In the Agile world, this is about the availability of product owners and the business users who assist them in identifying and prioritizing the product backlog, identifying the Sprint backlog, grooming user stories, defining acceptance criteria, attending product demos, and accepting user stories.
When we apply this classification, we get an opportunity to group requirements into different segments. When we do this,
1) Team members’ perception of ‘managing complex requirements’ will move closer to reality. There will no longer be a nagging feeling of complexity everywhere.
2) Project managers can identify risks related to requirements and arrive at mitigation plans for those risks.
3) Program managers or senior leaders can identify the right set of requirements to assign to remote or virtual teams, and assign requirements at the other end of the spectrum to teams co-located with business users.
4) Product owners can take proactive measures in requirements analysis, elicitation, grooming, etc., and eliminate the anti-pattern of grooming user stories within iterations.
So, what is the point? Requirements prioritization is necessary but not sufficient. In Agile parlance, prioritization and reprioritization of the product backlog is necessary but not sufficient. Managing requirements by analyzing and understanding them in terms of factors such as complexity, stability, clarity, and availability will make product backlog management effective, especially in large, complex projects or programs.
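A minimal sketch of such a segmentation, with hypothetical requirement scores and an illustrative (not prescriptive) mapping from total risk to a handling strategy:

```python
# Each requirement scored on the four factors (hypothetical 1-3 scale, where 3
# means high complexity, low stability, low clarity, or low SME availability).
requirements = [
    {"id": "R1", "complexity": 3, "stability": 1, "clarity": 2, "availability": 1},
    {"id": "R2", "complexity": 1, "stability": 3, "clarity": 3, "availability": 2},
    {"id": "R3", "complexity": 1, "stability": 1, "clarity": 1, "availability": 1},
]

def segment(req):
    # Illustrative thresholds only; each project would calibrate its own.
    risk = (req["complexity"] + req["stability"]
            + req["clarity"] + req["availability"])
    if risk >= 9:
        return "co-located team, extra grooming"
    if risk >= 6:
        return "standard grooming"
    return "candidate for remote/virtual team"

for r in requirements:
    print(r["id"], "->", segment(r))
```

Note how R2 lands in the highest-attention segment even though it is functionally simple: instability and lack of clarity, not complexity, drive the risk – which is exactly the lesson of this series.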
In addition to this, we need effective dependency management in order to create release plans, don't we? This post must have triggered some interesting thoughts and questions in your mind. Please feel free to share them, and we will discuss.
In Part-3 of this series, I am going to share some more critical factors to be considered while managing complex requirements.

Sunday, November 9, 2014

Managing Complex Requirements – Part 1


What are complex requirements?  How challenging is it to manage them?  How can we succeed in managing complex requirements?  This is what I am planning to cover in this blog series.
So, have you got complex requirements to manage? Here are some high-level things you need to consider.
  1. Do your homework – let the creators of requirements (e.g., business analysts) think them through, specify them, and put them up for review before communicating them to a broad audience.
  2. Create test scenarios with test data – complex scenarios need illustrations of multiple paths with test data.
  3. Conduct walk-throughs – they enable collaboration and improve understanding.
  4. Facilitate Q&A sessions.
  5. Involve the creator (business analyst or business user) in test case/test data reviews and in testing.
Well, these are well-known guidelines. In spite of them, managing complex requirements continues to be a complex affair. Why? In most cases it is not about the complexity of the requirements themselves; it is about something else. So it is worth understanding why project teams perceive something as a complex requirement.

In one of my projects, there was a constant issue reported by team members: complex and changing requirements. When we analyzed the situation, we found that only 20% of the requirements were very or highly complex. The other 80% were simple or medium in complexity – obviously, we had our own mechanism for deciding the complexity of requirements.

‘So, what is the real issue here? Is it about managing complex requirements?’ I asked my team members.

‘About 20% of our requirements are complex but in general requirements are not stable. We see changes every other day.’

‘So, is this about ongoing changes to requirements?’

‘There is more to it.  We are not very clear about some of the requirements. So, we ask questions to understand different scenarios. In this process, requirements evolve.’

'What else?’

‘Some of the business users or product owners are very busy. They are not available to answer our questions.  We have to do several follow-ups. That makes things complex.’

‘Oh. Ok. Our issues are because of several factors – complexity of requirements, requirement stability, clarity of different scenarios, and availability of business users or subject matter experts. Is that right?’

‘Yes. That is right.’

That was a learning moment: it helped me understand that managing complex requirements is not about complex requirements alone! It is about the complexity of managing requirements, driven by factors such as complexity, stability, clarity, availability, and so on.

One way to manage requirements is to analyze and understand them in terms of such factors and deal with them accordingly. How do you manage complex requirements? How do you do it when you follow Agile methods such as Scrum, XP, DSDM, SAFe, or a home-grown method? In Part-2 of this series, I will share some thoughts on how requirements can be categorized and effectively managed from this point of view.

Wednesday, October 1, 2014

Documentation Handoff – The Trap and Tips!

Believe it or not, most software projects cannot do without documentation, and many of them require significant documentation handoff from the initial days. Project charters, requirement specifications, architecture definitions, design specifications and the like, in some form or other, become essentials in such projects. This happens even when you follow agile methods; some of these documents are required when you follow agile with geographically distributed teams. Beware! Documentation handoff can and will become a trap in your project. You need to be cautious. I am writing this post to share some tips on how to optimize documentation efforts.
Understand the purpose of every document. Some are live documents; others, such as status reports, are periodic and short-lived. Ask questions: Why this document? Who are the consumers? What are the expected outcomes? What is the ROI?
Set expectations before you see chaos.  When a document is handed over for consumption or knowledge transfer, avoid
      Redlined documents – they need to go back to the authors, who must go through all the redlines and accept or reject the changes.
      Documents without version history – live documents get updated, and updated documents come to you every week or two. Without version history, it is difficult to identify what is new or updated compared to the previous version. Including a table that provides the version history and lists the changes in each version helps readers. Live documents need to have a life – a life that can be traced back through history – and this is not possible when you ignore version control.
      Large documents – documents with hundreds of pages that take several minutes to download? These need not be large unless there are compelling reasons. Find ways to break large documents into multiple small documents. Move generic or common content out of the document, or create annexures or separate reference documents. Why should you embed publicly available content, external content, or common references in a document?
      Documents without reviews or approvals – are you receiving documents before reviews and approvals? Are you expected to read them and create software or other work products? You need to take a step back and politely ask for reviews and approvals. Otherwise, you will fall into the documentation trap!
Sometimes there is a mandate to generate a new document after reading an existing one. Many of us do this – for example, software testers read a requirements specification and write test cases. Another example is reading user stories and creating working code. The key question is whether this approach enables communication between the two parties involved. If, in spite of these documents, there is a communication gap, you need to try a different approach. Face-to-face discussions, video conferencing, whiteboard sessions, etc. play a vital role and improve the overall effectiveness of knowledge transfer.
Another interesting point is about ‘live’ documents. You hear someone say, ‘That is a live document. We will get updates next week.’ If that is the case, do all that you can to avoid falling into the documentation trap. Huh? How? Read, ask questions, and get answers. It is not that simple. Also, be curious to find out when such documents will reach end-of-life. Live documents are not ‘live’ forever; they need to stop growing, stabilize, and retire. Above all, they must create value for someone other than their creator! When was the last time you saw an up-to-date architecture or design document that was adored and frequently used by maintainers or product support groups?
Check whether documents contribute to testability. For example, a document full of rhetoric does not help; the content in every section or page should correspond to a verification measure or parameter. Ask yourself: have you found at least two callouts per page in your architecture or design document? Do those callouts lead to architecture or design guidelines? Do those guidelines help you in software verification or refactoring? If your answer is no, the document you are using is most probably not serving its purpose or providing any return on investment.
Documentation handoff as a standalone activity does not contribute to software development. It requires several follow-up activities that enhance the outcome. Here are some examples.
  • Facilitate walk-through sessions.
  • Encourage collaboration and resolve queries in a timely manner.
  • Validate understanding. For example, let the readers present it to subject matter experts.
With all these in place, how do you measure the effectiveness of a handoff? You can measure it by considering the following.
  1. The quantity and quality of questions asked by readers
  2. Level of participation or collaboration in walk-through sessions
  3. Quality of document handed off and user feedback
  4. Quality of the outcome (for example, the next level of work products such as test cases or source code).
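One simple way to roll these four measures into a single indicator is a weighted average; the scale, weights, and names below are purely illustrative, not a standard:

```python
def handoff_effectiveness(scores, weights=None):
    """Combine the four handoff measures (each rated on, say, a 1-5 scale)
    into one score. Names and weights are illustrative assumptions."""
    keys = ["questions", "participation", "document_quality", "outcome_quality"]
    weights = weights or {k: 1.0 for k in keys}
    total_weight = sum(weights[k] for k in keys)
    return sum(scores[k] * weights[k] for k in keys) / total_weight

# Example: weight the quality of the resulting work products double,
# since that is the ultimate purpose of the handoff.
score = handoff_effectiveness(
    {"questions": 4, "participation": 3, "document_quality": 5, "outcome_quality": 4},
    weights={"questions": 1, "participation": 1,
             "document_quality": 1, "outcome_quality": 2},
)
print(round(score, 2))  # (4 + 3 + 5 + 8) / 5 = 4.0
```

The single number matters less than the conversation it forces: a low score on ‘questions asked’ with a high score on ‘document quality’ usually means the readers are not engaging, not that the handoff went well.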
To repeat: documentation handoff can and will become a trap in your project. Be cautious!

Tuesday, July 29, 2014

TDD in ETL Projects

How can we implement TDD (Test Driven Development) in Business Intelligence or Data Warehousing projects that involve ETL processes?  Is there a way to implement TDD when the project team is using a tool for ETL or developers are writing programs to extract, transform and load data into the data warehouse or a set of target tables?  Let us explore.
Whether you consider extract, transform, and load as distinct steps or as a single step, keep in mind the nature of TDD you can implement here. It is not going to be the typical JUnit approach of using assertions. All your data are going to be in schemas or tables; you are not going to embed input and output data in your test scripts. If you embed input and output data in your assertions and write pure-play JUnit tests, you can test only the ‘T’ of ETL – transform. So, what can you do to include the E and the L – extract and load?
Think of flight operations at airports. Before the departure of every flight, there are several tests or checks to ensure the aircraft satisfies a set of pre-conditions. These include verification of a whole lot of technical parameters as well as things like the number of passengers and other inventory such as food and supplies. At the destination, there is another set of checks to ensure that all passengers, including the crew, have reached the destination. These also include routine verification of the aircraft's technical parameters against a set of post-conditions.
Now apply this analogy to implementing tests in an ETL context. The first step is to articulate the pre-conditions and write tests to assert or verify all of them. Obviously, if you know all the pre-conditions and are checking them one by one by hand, the next wise step is to automate them. That will save time and eliminate manual errors.
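As a minimal sketch (using SQLite and a hypothetical src_orders table), pre-condition checks can be automated as a function that returns a list of failures; an empty list means the ETL run may start:

```python
import sqlite3

# A hypothetical source table, stood up in memory for illustration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_orders (id INTEGER PRIMARY KEY, "
             "amount REAL, customer_id INTEGER)")
conn.executemany("INSERT INTO src_orders VALUES (?, ?, ?)",
                 [(1, 10.0, 100), (2, 25.5, 101), (3, 7.25, 100)])

def check_preconditions(conn):
    failures = []
    # Pre-condition 1: the source table must not be empty
    if conn.execute("SELECT COUNT(*) FROM src_orders").fetchone()[0] == 0:
        failures.append("src_orders is empty")
    # Pre-condition 2: no NULL amounts before extraction
    null_count = conn.execute(
        "SELECT COUNT(*) FROM src_orders WHERE amount IS NULL").fetchone()[0]
    if null_count > 0:
        failures.append("NULL amounts in src_orders")
    return failures

assert check_preconditions(conn) == []  # safe to start the ETL run
```

In a real project the same function would run against the actual source schema, with one entry per articulated pre-condition, and it would gate the ETL job the way pre-flight checks gate departure.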
Let us assume your source tables satisfied all pre-conditions and you initiated the ETL process. When the ETL process completes – when the flight has reached its destination – it is necessary to run data quality checks. This is a high-level health check; you need scripts to verify row counts, data integrity, aggregates (sum, average), and so on. You can call it a sanity test.
The next step is verifying all post-conditions. These are the next-level, detailed tests, many of which depend on the pre-conditions and the data values present in the source tables. One way to handle this is to have a meta-table (or intermediate table) plus scripts that compute and store the expected results corresponding to the post-conditions. Using this table, you run another set of scripts to verify whether your ETL process populated the target tables with the right data. The meta-table holds your expected results; the target tables hold the actual results. Depending on your project's test strategy, design the meta-table to hold as many rows or pieces of data as needed. In complex projects, you may need multiple meta-tables.
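Here is a small sketch of the meta-table idea, again with SQLite and hypothetical tables: expected aggregates are computed from the source before the run, the (deliberately trivial) ETL step populates the target, and a post-condition query compares the two:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (region TEXT, amount REAL)")
conn.executemany("INSERT INTO src VALUES (?, ?)",
                 [("east", 10.0), ("east", 5.0), ("west", 7.5)])

# Meta-table: expected results computed from the source BEFORE the ETL run
conn.execute("CREATE TABLE expected AS "
             "SELECT region, SUM(amount) AS total FROM src GROUP BY region")

# The (illustrative) ETL step: aggregate the source into the target table.
# A real project would use its ETL tool or job here.
conn.execute("CREATE TABLE target AS "
             "SELECT region, SUM(amount) AS total FROM src GROUP BY region")

# Post-condition: every expected row must appear in the target exactly
mismatches = conn.execute("""
    SELECT e.region FROM expected e
    LEFT JOIN target t ON e.region = t.region AND e.total = t.total
    WHERE t.region IS NULL
""").fetchall()
assert mismatches == []  # ETL populated the target with the right data
```

Because the expected results live in a table rather than in the test code, the same comparison script scales from three rows to millions, which is what makes this workable where JUnit-style embedded fixtures are not.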
If you are verifying post-conditions manually, convert those checks into scripts. Write the results of your scripts to a table or a log file, and write a script to analyze the errors. This can save you a lot of time and money.
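The error-analysis step can be as simple as filtering a check log for failures; the check names here are made up for illustration:

```python
# Hypothetical check log: (check_name, passed) rows written by the test scripts
log = [
    ("row_count", True),
    ("referential_integrity", False),
    ("sum_amount", True),
    ("null_check", False),
]

# The analysis script: surface only the failed checks for investigation
failures = [name for name, passed in log if not passed]
print(failures)  # ['referential_integrity', 'null_check']
```

In practice the log would be a database table or file populated by each check script, but the principle is the same: the run ends with a short, machine-produced list of what needs attention.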
Doing all this in small steps – incrementally and iteratively – will help you adopt test-driven development. If you instead do it all as one big chunk somewhere in the project, mostly at the end, that is the classical phase-driven approach. Try TDD!
Are there other ways to implement TDD in ETL projects?  Let us discuss.

Related Posts:  Tips on Embracing Agility in BI Projects

Friday, June 20, 2014

Tips on Embracing Agility in BI Projects

Months ago, I was discussing with professionals in my network how to apply agile methods, or agility, in Business Intelligence (BI) projects. This post shares some of the key aspects involved in adopting agile methods for BI or data warehousing projects. In addition to the methodology-specific ceremonies – for example Release Planning, Sprint Planning, Daily Stand-up, and the like in Scrum – I think one must do certain things that are specific to BI or data warehousing projects. What are those? Read on.

1) Agile Approach: The BI life cycle has several key milestones (see images): ‘Tool Selection & Proof of Concept’, ‘BI Charter & High-Level Planning’, ‘Data Extraction’, ‘Data Transformation’, ‘Data Loading’, ‘Reporting & Analysis’, ‘Governance’, ‘Enhancement’, and ‘DW/BI Maintenance’. Some of these can be done in parts. For example, ‘Data Extraction’ in large projects can be done one or two weeks at a time. How? Extraction from a set of tables in DataSource-1 can be done during week-1, extraction from the rest of the tables in DataSource-2 during week-2, and so on. The lessons learned each week can be applied to the next week's activities – a way to implement continuous improvement. This approach also has the potential to improve visibility and predictability, and it gives us an opportunity to do what matters most to business users first. This is how we embrace agile principles in BI projects. (Similarly, ‘Data Transformation’, ‘Data Loading’, and other ‘mini-projects’ can also happen in parts.)
BI Lifecycle
Data Extraction in Small Chunks

2) Automation: When we follow an approach like this in BI projects, we must consider automation, which is required to improve productivity and quality. Automation can happen in small steps. Candidates for automation include:

a. Test bed setup (data cleanup & loading)
b. Test execution
c. Analysis of test results
d. Referential integrity checks
e. Validation using aggregates (sum, average, etc.)
f. Data quality assurance
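Two of these candidates – referential integrity checks (d) and aggregate validation (e) – can be sketched as follows, using SQLite and hypothetical customers/orders tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 5.0), (11, 2, 7.0), (12, 3, 9.0)])  # customer 3 missing

def orphan_orders(conn):
    # Referential integrity: orders pointing at non-existent customers
    return conn.execute("""
        SELECT o.id FROM orders o
        LEFT JOIN customers c ON o.customer_id = c.id
        WHERE c.id IS NULL
    """).fetchall()

def total_amount(conn):
    # Aggregate validation: compare this against an expected control total
    return conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

print(orphan_orders(conn))  # [(12,)]
print(total_amount(conn))   # 21.0
```

Scripts like these are exactly the kind of small, homemade tools a team's ‘toolsmiths’ can build incrementally; run after every load, they turn data quality from a maintenance-phase surprise into a routine check.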

3) Planning: It is critical to invest time in planning and creating a BI roadmap. This planning activity may require two or three months at the start of a BI project. One may argue that this is not ‘agile’. However, one must agree that BI projects cannot aim to deliver something in the first week or even the first month. A good amount of planning followed by an iterative (and incremental) approach helps. So a large BI project can be seen as one that starts with adequate planning (to arrive at a roadmap) followed by several small projects implementing agile practices. In fact, there are several myths about agile; one of them is ‘Agile means no planning’. Read my post ‘Agile Myths and Misunderstandings’ – it links to a free PDF on this topic. Happy reading!

4)    Working Software and Feedback: When we adopt agile practices in BI projects, it is important to keep in mind that some iterations may not result in demonstrable software useful to ‘business users’.  This is because of the inherent nature of BI projects: they may require several initial iterations to set up the target data source and populate data.  When the data is ready and reports are working, end users can see the working product.  Until then, the technical architect or data architect is your end user; she consumes what you deliver and provides feedback on it.

5)   BI Tools: Tools used in BI projects can be of two types: a) commercial tools (for data extraction, loading, etc.) and b) homemade tools (from small scripts to big routines).  Teams must be ready to see the potential of homemade tools; think deeply, collaborate and create small tools that can help in several ways.  One approach is to identify at least one or two engineers in the team who are ‘toolsmiths’ and encourage everyone in the team to come up with new ideas and tools. The toolsmiths can help in implementing these ideas.

6)    Why Agile? Adopting agile practices in BI projects helps in identifying risks at early stages and also enables proactive thinking and preparedness for the production release.  For example, iterative and incremental approaches help BI project teams estimate, measure and optimize the downtime required to launch the warehouse.  In classical or traditional approaches, this happens only in the final lap of the project.

7)     Budget Control: Agile adoption is a feasible way to release a BI project in parts (in an incremental manner) to the world (or to business users).  This helps with budget control as well as optimization (in the form of process reuse or component reuse). Instead of delivering all 40+ reports in one stretch, you can deliver subsets of prioritized reports in batches.  This helps you save the budget that would have gone into developing the low-priority ones.

8)     Data Quality Assurance (DQA): DQA is one of the key activities in BI projects.  This is because, unless we find the right steps (or processes) to identify or assess the quality of data flowing in from different sources, the data in the BI store or warehouse may get contaminated.  This sounds obvious. In practice, however, during the maintenance phase of BI projects, a number of defects reported by end users turn out to be data quality issues.  Agile practices can be leveraged to identify potential data quality issues ahead of time and to implement periodic checks (through automated scripts) that assess data quality.
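Such periodic DQA checks can start as a very small script. The sketch below runs a set of named rules against a table, where each rule is a SQL predicate that matches bad rows; the table, column names and rules in the usage are made-up examples, not from any specific project:

```python
import sqlite3

def data_quality_report(conn, table, rules):
    """Run simple data-quality rules against a table.

    `rules` maps a rule name to a SQL predicate matching BAD rows;
    the report gives the number of offending rows per rule."""
    report = {}
    for name, bad_row_predicate in rules.items():
        count = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {bad_row_predicate}"
        ).fetchone()[0]
        report[name] = count
    return report

# Example rules (hypothetical): flag missing emails and negative amounts.
EXAMPLE_RULES = {
    "missing_email": "email IS NULL",
    "negative_amount": "amount < 0",
}
```

Scheduling a report like this after every load turns data quality from a maintenance-phase surprise into a routine, visible metric.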

When we adopt agile practices, we come across several opportunities not only to provide predictability and visibility but also to improve the level of automation, and hence to deliver homegrown automation tools (such as scripts for data validation or data quality assurance) to customers.  These automation tools have the potential to provide long-term value in BI projects.
What else do you do to improve agility in BI or data warehousing projects?
Related Post: TDD in ETL Projects

Monday, June 16, 2014

Architectural Considerations for Multi-Tenancy – Part 2

In the first part of this series, I listed a set of questions to help you understand the architectural requirements when you want to enable multi-tenancy.  In this final part, I cover three broad areas of architecture that influence multi-tenancy.
Data Architecture:    One common approach to data architecture is a shared schema. In this approach, you store the data of all tenants in a single schema and include an identifier column such as Tenant_ID to mark the data sets belonging to each tenant.  Another approach is to have a schema per tenant; this helps when you have to deal with large volumes of data per tenant.  There is yet another approach, a simple and fundamental one: a separate database per tenant. Every approach comes with its own pros and cons.
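To make the shared-schema approach concrete, here is a minimal sketch using SQLite. The `invoices` table, its columns and the `invoices_for` helper are hypothetical; the point is that one table serves all tenants and every query must filter on `tenant_id`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Shared schema: one table for all tenants, discriminated by tenant_id.
# The composite primary key keeps invoice numbers unique per tenant only.
conn.execute("""
    CREATE TABLE invoices (
        tenant_id  INTEGER NOT NULL,
        invoice_no INTEGER NOT NULL,
        amount     REAL,
        PRIMARY KEY (tenant_id, invoice_no)
    )
""")
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)",
                 [(1, 100, 25.0), (1, 101, 40.0), (2, 100, 99.0)])

def invoices_for(tenant_id):
    # Every query filters on tenant_id so tenants stay isolated.
    return conn.execute(
        "SELECT invoice_no, amount FROM invoices "
        "WHERE tenant_id = ? ORDER BY invoice_no",
        (tenant_id,)).fetchall()
```

The trade-off is visible even in this toy: isolation now depends on discipline in every query, which is why some teams prefer the schema-per-tenant or database-per-tenant options instead.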
Code Base:  Can you afford to have the same code base for all tenants, maximizing sharing while allowing some percentage of code that is unique to each tenant?  Or do you want to keep the code bases separate and have them run on separate instances of the application server?  Which is right for you? It depends on the needs of your application and tenants.
Implementation View:  How is your application going to be deployed in production?  Is there a need to share instances? Or do you need to have an instance per tenant?  What are the run-time requirements? This is another broad area that is going to influence your architecture and design.
This makes it clear that the multi-tenancy needs of your application or product are going to determine how you implement multi-tenancy.  There are many multi-tenant systems running in production across organizations; their architectures and designs differ, and they offer different degrees of multi-tenancy.  When you want to architect and design for multi-tenancy, you need to look at these broad areas, ask the questions I shared in the first part of this series, and make your decisions.  When you do this, I am sure you will come up with a meaningful architecture.
Have you come across any difficulties or challenges in doing this? Let us discuss.

  1. Multi-Tenant Data Architecture
  2. Introduction to Multi-Tenant Architecture