Simplifying the Complexities of Institutional Risk
Background
One of the largest banks in the United States asked us to help completely overhaul the way it manages institutional risk. Ever since the 2008 financial crisis, banks such as this one have been required to manage and report their risk to the U.S. government with greater transparency. After almost a decade, the bank found that it had created a byzantine documentation process that took the entire year and required an enormous task force to run. This process lived on an old legacy system that made nobody happy. Control managers, the employees who had to document risk inside this tool, found it slow and difficult to navigate. Directors were dissatisfied with the inconsistent, poor-quality information they could pull from it, and with the fact that the firm looked at risk only once a year. They wanted to move toward a model where control managers reviewed risk data regularly and took action to fix risks as they arose.
Where We Came In
Given the size of the effort, the bank had spent almost a year planning before it hired Pivotal. It had mapped out a full service design overhaul and intended to restructure the jobs of existing control managers. It specifically wanted my team to build out “Triggers,” a module within the larger risk management suite that would function as an alarm system, alerting users when any risks were elevated or when any of the measures meant to reduce risk were failing.
My Role
I was the primary designer on this part of the product. Our module plugged into a larger suite of risk management products that were all being built at the same time. On my product team, I worked with 2 PMs, 1 client designer who was mostly staffed on another product in the suite, and a team of 6 developers.
De-risking Foregone Conclusions
We were skeptical of many of the assumptions made up to this point. These are the key ones we wanted to tackle:
No one had actually tested the new way control managers were supposed to assess risk. People generally assumed that this assessment would be very complex and visual (aka lots of graphs).
The directors wanted the software to dictate how control managers would perform their new jobs, instead of first defining a good way to perform control management and then building the tool around that process.
The directors wanted to build out all of the software, then release the new system and turn off the old one for everyone all at once.
Because the domain was so dense, it took a lot of trial and error just to get to the point where we could run experiments. Notably, it took us a long time to identify a real set of end users (not just the managers of those end users) and to learn enough risk management jargon to understand the problems users faced. Once we had established that baseline through exploratory interviews, we followed this general experimental loop:
1: Identify a proposed workflow and the assumptions behind it.
2: Conduct exploratory interviews to learn whether those assumptions hold and to understand the context around the workflow.
3: Build a lean prototype that addresses some of the users’ pain points and needs in that process.
4: Test it with those same users.
5: Repeat steps 2-4.
Example of This Process
1: Assumption: Users will want a dashboard showing all of their triggers (aka risk alarms), with sophisticated graphs telling them which triggers to prioritize first. They will then perform a complex analysis on each one to see why it’s breaching and decide whether to heighten the related risk or dismiss the breach.
2: I worked with the PMs to define the learning goals for our initial round of interviews. We wanted to find out how control managers prioritized their breaching triggers, how they analyzed each breach, and how they decided whether to heighten the risk or dismiss it. I then wrote an interview script of non-leading questions asking each control manager for examples of how they had handled these situations in the past, and conducted the interviews.
We found that control managers prioritized breached triggers from most recent to oldest, checked whether a trigger had breached at other times in the recent past, and called colleagues to dig into why the underlying data was causing the trigger to breach. They didn’t need a lot of complex graphs; they mostly just checked whether the trigger’s value was above or below thresholds they had set.
3: Based on those findings, we designed the following workflow for control managers:
Breach Management Page
Breaches would be ordered from most recent to oldest, and only breaching triggers would appear on this page.
Trigger Detail Page
Clicking a specific breach would take the user to its detail page. Our initial hypothesis was that the current value, plus how long the trigger had been breaching, would tell a control manager (CM) whether they needed to take action. We also included an audit log so that CMs could see what actions had been taken on the trigger in the past.
4: We tested these two pages many times, since together they formed the main happy path for a control manager assessing a trigger breach. Key findings from multiple rounds of research and building included:
Control managers didn’t spend much time deciding which breach to look at first. They knew they had to get through the whole list, so they simply started with the most recent and worked through the breaches one at a time.
Control managers cared less about the current value than about the historical trend: whether or not that trigger had breached many times before.
Control managers wanted to see whether a breach had a domino effect on other triggers and risks.
Control managers wanted to see what specific actions had been taken to resolve previous breaches, without scrolling through the audit log to find them.
Subsequent Version:
Newer Breach Management Page
- Reduced the information shown for each breach to just the essentials
- Removed the number badge from each tab
- Renamed the tabs to make them easier to understand
Newer Trigger Details Page
- Added more linkage information and direct links to related items
- Added the ability to hover over a breach on the timeline to see its resolution comment
- Represented the current value on the timeline at the same size as all other data points
- Added more information about potentially affected risks