
How to Do Deep Dives

You are sitting down and talking about your problems to a complete stranger. They keep asking what made you feel that way. You answer, but things start popping out: your childhood traumas, the kid who bullied you, your first romantic rejection.

 

After a few minutes, you realize that you are in a therapy session. Oh, I’m sorry, it’s not you that I am talking about, it’s me. 

 

But, why? Why am I talking about therapy?

 

Because doing deep dives (DDs) is very much like going to therapy. The only way to truly fix an issue is by identifying and solving the true underlying root cause.

 

 

Before diving into details, let’s take a look at why we should do deep dives. 

 

Welcome to agile therapeutics 101.

 

Why Should We Do Deep Dives?

 

An old idiom says “the devil is in the details”, and I deeply believe it is true, especially in complex and dynamic systems and environments such as software development and digital products. 

 

To truly understand the issue, we need to get to the appropriate level of detail (of a problem or impediment, but more on that later) and identify the correct actions to fix it long-term. 

 

The best scrum masters are constantly in the deep. They listen carefully to developers talking about issues in daily meetings. They speak regularly to people in every team of the company, and click their way through Jira tickets, Confluence/Team and PR links to find details. 

 

Top-notch scrum masters are not only comfortable with details, technical and otherwise, but also dive into them regularly. They insist on details, just like my therapist.

 

Similar to how we timebox weekly Refinement sessions, we should find time to do DDs and troubleshoot issues during the sprint rather than waiting for the Sprint Retrospective.

 

The reason is that during a sprint meeting there can be many different issues in play at the same time, and you are not able to address them all. Usually it is not enough to have a single broad discussion with everyone present in the meeting. 

 

On the other hand, an issue usually involves only a few people from the team, those who were directly impacted by it or involved with it in some way. 

 

For the others, the discussion is only partly relevant in the best case. In the worst case, it is a discussion they cannot even understand or contribute to because they are missing the details. 

 

On yet another (third?) hand, why wait until the end of the sprint to fix things and make improvements in the first place?

 

Importantly, what kind of issues can you encounter during DDs?

 

 

Two Types of Issues

 

DDs are tied to and are executed on a very specific issue. We typically recognize two types of issues. 

 

An issue can be something that increases the risk of the team failing to deliver a specific work item on the board (just like my urge to insert jokes increases the risk of my boss telling me I am going over the line).

 

It can be something quite obvious. For example, a developer doesn’t have access to a repository. 

 

Or it can be something quite subtle. For example, the communication is not open enough within the team and key details are not shared in time (which again reminds me to be more open with my therapist). 

 

In Scrum, we know these kinds of issues as impediments.

 

On the other hand, there are circumstances in which the team already failed to deliver a work item as expected.  

 

For example, a specific bug that has been reopened twice, a bug that cannot be reproduced on a local or dev environment but that we try to fix anyway, a feature that was deployed but does not work (not even the happy flow), a feature that was deployed much later than expected, etc. 

 

We recognize these kinds of issues as problems.

 

Both kinds of issues are associated with specific work items that failed (or could fail), which is why we should enforce the Definition of Done (DoD) first. 

 

How to Do DDs?

 

The reason we do DDs is the same reason I go to therapy – find the root cause and then come up with a proper action to fix the identified root cause long-term. 

 

To start, you should look at a specific user story or a bug that failed and ask yourself ‘Why did this happen?’ Instead of going into despair (no, I will not insert another therapy joke, why would going into despair make you think it has any resemblance with me going to therapy?) you should expect an answer from the team.

 

The quick answer that comes immediately to mind is usually something stereotypical (e.g. the team was not attentive enough, or communication was inadequate). 

 

However, such a cause is what we call a “surface cause”. It’s not very difficult to identify the surface cause of why something failed, but doing so does not provide any value. For instance: we were unable to meet a sprint goal because we had improperly estimated our tasks.

 

What characterizes a surface cause is that it does not lead to a concrete action plan, so there is no way to isolate it and solve it long-term. You need to go deeper than my accountant (what, you expected me to write ‘my therapist’?).

 

Therefore, to find a true root cause we need to get down into details. Simply, once you have your first answer to ‘Why did this work item fail?’ ask another ‘Why?’ and keep asking ‘Why?’ to your answers until you are at the root of the problem. 

 

Most teams are not successful in applying DDs because they do not bother to go deep enough. They are satisfied with abstract root causes and abstract action points that can be neither executed nor measured. 

 

The status quo holds and everyone is happy, but in reality nothing works (just like our political system).

 

 

How to Know When You Are Deep Enough in the Details?

 

The answer to that is – when you come up with a very specific reason for why something failed and therefore a specific action of how to fix it long-term. 

 

If you’re looking at the source code to understand how we crashed production, or looking at the comments and speaking to the people involved to understand how we missed crucial details relevant to the feature, you’re probably in the right spot. 

 

For example, a page won’t load in production, and we notice a NullPointerException in the logs when we try to load the page. Further investigation reveals that the user’s missing data is what led to the NullPointerException (happy childhood is my NullPointerException).

 

When you determine a specific reason, all that is left to do is to create an action plan to fix it. Be concrete, assign actions, agree with the team on how and when it will be delivered and make sure your team does not get into the same situation again.

 

Detailed Instructions for Doing DDs

 

Here are the steps for doing DDs.

 

1. Identify at least 3 work items which recently failed the DoD. Having work items that fail the DoD (whether at the PR or user story level) is a normal result of enforcing the DoD within your team.

 

 

2. You need to have the right mindset. During DDs you want to be:

 

Data driven: don’t jump to conclusions! Every conclusion should be supported by concrete, proven facts, not hearsay or intuition (for example, don’t assume I’m crazy just because I mentioned my therapist a couple of times).

 

Observe the big picture: observe the problem as a whole, and not just the one specific instance of the problem, and its context, that was brought to your attention. Ask yourself:

 

“Is the observed problem the actual problem or is there a deeper problem behind it that is causing it?”

 

“Can this same problem happen to us on other work items as well?”

 

To be successful at DDs, you need to be:

 

Value-oriented: look at every problem through the eyes of your customer and understand its impact! Don’t justify or downplay the problem with metrics, formalities, or politeness.


Farsighted: don’t just fix the problem for a particular work item. Instead, put an improvement in place so that a whole category of similar problems will be prevented. Prevent your team from facing the same issues again.


Quantitative: quantify, as best you can, the value that the team would get out of solving each particular issue. 

 

Quantification can be done as the percentage of improvement. For example, out of 10 issues, 3 have this root cause. This means that we would improve in this specific regard by 30% if the issue is fixed long-term.

 

The improvements that are identified can be formulated as updates to the DoD at the appropriate level. Which concrete DoD checklist should have prevented this issue from occurring? Does that DoD need to be:

 

improved? (i.e. we should add a new item to the DoD in order to prevent this kind of miscommunication)

 

or

 

enforced? (i.e. our current DoD is good enough and would have caught this issue, but it was not properly enforced)

 

3. For each work item, do a deep dive to identify the root cause of the issue. To get to the right depth, you can use the 5 Whys technique. Simply ask yourself why the work item failed the DoD. When you have the answer, ask why again, and repeat until you have asked at least three to five whys.

 

4. When you’ve identified the true root cause of the issue, build an action plan for an improvement that will fix this issue and the whole category of similar future issues. 
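
The quantification mentioned in step 2 can be sketched in a few lines of Python. This is only an illustration under assumptions: the root cause labels below are invented, and the function name `improvement_share` is hypothetical. It counts how many of the failed work items share each root cause and expresses that as a percentage, mirroring the "3 out of 10 issues means 30%" reasoning above.

```python
from collections import Counter

# Hypothetical deep-dive results: one identified root cause per failed work item.
# The labels are made up for illustration; use whatever your own DDs uncovered.
root_causes = [
    "regression set misses new-user flow",
    "manual deploy step skipped",
    "regression set misses new-user flow",
    "requirements detail not shared",
    "duplicated logic changed in one place only",
    "regression set misses new-user flow",
    "manual deploy step skipped",
    "requirements detail not shared",
    "developer did not self-test",
    "unclear acceptance criteria",
]


def improvement_share(causes):
    """Return {root cause: percentage of issues it explains}, largest first."""
    total = len(causes)
    counts = Counter(causes)
    return {cause: 100 * n / total for cause, n in counts.most_common()}


shares = improvement_share(root_causes)
for cause, pct in shares.items():
    print(f"{pct:5.1f}%  {cause}")
```

Sorting by share gives the team a rough order in which to tackle the action plans: the root cause behind the most failed work items comes first.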

 

This leads us to the four types of actions that are at your disposal.

 

Four Types of Actions

 

Since the whole exercise is based on specific work items, the action plan is expected to be very specific. Something like “let’s just communicate better next sprint” certainly does not qualify: it is neither concrete nor measurable, and it is not an action plan. There are only ever four types of actions:

 

1. Improve/Enforce a DoD

 

Without an enforced DoD, nobody knows when tasks are really done and what their responsibilities are. A good DoD needs to be objective and documented (just like my psychological diagnosis).

 

You need to run daily standups and retrospectives with a focus on quality. An important step towards that goal is to enforce the DoD.

 

Another important thing is to perform gemba walks and observe each aspect of work in action. Lastly, you need to personally review every single PR or delivered user story for quality.

 

2. Improve the Process for the Team 

 

Without clear and well understood team processes, people tend to make assumptions, oversights and improvisations that can negatively impact a team in one way or another.

 

For example, we see that the deploy process isn’t performed properly: we’re doing it manually, which leads to human mistakes such as missing a few steps.

 

The solution is to introduce an automatic deploy script that will execute all the steps and eliminate the possibility of a human error. This applies to build, deployment, grooming, daily standup, just to name a few.
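
A minimal sketch of such a script, in Python, might look like the following. The step names and the `run_deploy` function are assumptions made for illustration, and the actions are placeholders rather than real build or release commands. The point it shows is that the steps live in one ordered list, so none can be forgotten, and that a failing step stops the run.

```python
from typing import Callable, List, Tuple

# Each step is a (name, action) pair. The actions here are placeholders that
# only record that they ran; in a real script they would call your build tool,
# test runner, migration tool, and so on (all hypothetical here).
Step = Tuple[str, Callable[[], None]]


def run_deploy(steps: List[Step]) -> List[str]:
    """Run every step in order and return the names of the steps completed.

    Any exception stops the run immediately, so a half-finished deploy is
    visible instead of silently continuing with a missing step.
    """
    completed = []
    for name, action in steps:
        print(f"running: {name}")
        action()  # raises on failure, which aborts the remaining steps
        completed.append(name)
    return completed


log: List[str] = []
steps: List[Step] = [
    ("build", lambda: log.append("build")),
    ("run tests", lambda: log.append("tests")),
    ("migrate database", lambda: log.append("migrate")),
    ("release", lambda: log.append("release")),
]

done = run_deploy(steps)
```

Keeping the list of steps as data also makes the process reviewable: adding a step to the deploy becomes a one-line change that the whole team can see.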

 

3. Improve the Product or Codebase 

 

Even though we can have perfect procedures and enforce the DoD for every single work item, we must ensure that we are building on a strong foundation (which the product or codebase is), because if not, everything will eventually crumble (just like reality does when one of my delusional episodes kicks in).

 

For example, we have the same functionality implemented in several places in the code. Every time we want to change something about that functionality, we need to change it in all those places. 

 

We often forget to update one of them. The solution is to refactor the code so that the implementation lives in exactly one place, and that is where we make modifications as necessary.

 

4. Coach the Individual Team Member 

 

Lastly, developers frequently send tasks to testing without first testing them themselves. This creates a lot of back-and-forth (the developer sends the task, and it is returned for rework several times). 

 

Teaching the developer to play to win and thoroughly test each task before sending it to testing is the solution to this problem. 

 

We can now look at a real example of doing a DD.

 

Real Example of a DD

 

Let’s imagine that we have a web application that enables people to place online meal orders for delivery. Its primary functionality is to allow users to enter their home address, and the application will then provide a list of restaurants that deliver nearby.

 

Let’s say we are having trouble since we are not getting a result after entering a new address. To solve the issue, we need to start asking questions. The first why can be: Why are we not getting a result after entering a new address?

 

After we take a look at the database records, we realize that the address has not been verified at all and has not been associated with nearby restaurants that deliver. What a shocker! The second why may be: Why haven’t the addresses been verified in the database?

 

That should not happen, as we have hooks that capture and calculate addresses. By checking the logs, we quickly get our answer: the hooks were indeed never triggered. Our curiosity is piqued, and we certainly shouldn’t stop here. We can ask the third why: Why have the hooks not been triggered? 

 

Once again, after debugging the code line by line, we get the answer: the error is within the code. One statement was changed as part of a new feature we built, and it had an unexpected impact on the hook trigger scope. Again, we probe further with the fourth why: Why did this error get past us without being fixed before the release? And soon enough we discover the root cause:  

 

In the regression testing phase (which serves as insurance that the increment still performs all critical functionality without disrupting the user), nobody tested the fairly common case of a new user placing an order. The testers only used existing addresses throughout the testing process.

 

As you can see, once you get to the crux of the matter, you find the root cause and, most importantly, the right actions that will solve the problem long-term. Here is such an action for this example:

 

Update the regression test set to include a test case for restaurant search using a brand-new address that has never been used in the system.
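
To make that action concrete, here is a rough Python sketch of what such a regression test case could look like. Everything in it is an assumption for illustration: `DeliverySearch` is a tiny in-memory stand-in for the application, and `on_address_entered` plays the role of the address hook from the example. In a real project the test would run against your actual application.

```python
import uuid


class DeliverySearch:
    """Tiny in-memory stand-in for the meal-ordering app (hypothetical API)."""

    def __init__(self, restaurants_by_area):
        self._restaurants_by_area = restaurants_by_area
        self._verified = {}  # address -> area, filled in by the address hook

    def on_address_entered(self, address, area):
        # Stand-in for the hook that verifies an address and links it to an area.
        self._verified[address] = area

    def search(self, address):
        # An address the hook never processed has no area, so nothing is found.
        area = self._verified.get(address)
        return self._restaurants_by_area.get(area, [])


def test_search_with_fresh_address():
    app = DeliverySearch({"downtown": ["Pizza Place", "Noodle Bar"]})
    # A random suffix guarantees the address has never been used before.
    fresh_address = f"1 Test Street #{uuid.uuid4()}"
    app.on_address_entered(fresh_address, "downtown")
    assert app.search(fresh_address) == ["Pizza Place", "Noodle Bar"]


test_search_with_fresh_address()
```

Generating the address inside the test, instead of reusing a fixed one, is the design choice that matters here: it keeps the test from quietly turning back into an "existing address" test over time.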

 

The Bottom Line

 

Getting into details and fixing issues long term are crucial actions when it comes to DDs. 

 

DDs are excellent for avoiding issues and investigating those that already exist. In other words, they can be applied to overcome impediments and problems.

 

Therefore, be sure to probe deeply enough to identify the main causes behind surface-level issues. You’ll be able to avoid problems in the future and greatly simplify your life as a result.

Schedule a free consultation (or a deep dive) with us if you require assistance with software development or simply want to hear our thoughts!
