Why is CSAT (Customer Satisfaction Score) expensive to measure?

CSAT is expensive on four dimensions. First, response rates are catastrophic - typically 5-15% for B2C surveys and often lower for B2B, meaning 85-95% of your customers never enter the measurement at all. Second, the customers who do respond are systematically biased (unusually happy or unusually unhappy), making the average unrepresentative. Third, enterprise CSAT programs (survey platforms like Qualtrics or Medallia, plus integration engineering, plus weekly score review meetings) typically run six figures annually. Fourth, the metric is delayed - by the time CSAT scores tell you something is wrong, the churn has already happened. The combined cost-to-insight ratio is poor compared to behavioral alternatives.
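The sampling problem is easy to see with arithmetic. The sketch below uses an entirely hypothetical customer base (segment sizes, scores, and responder counts are illustrative assumptions, not data from any real survey) and compares the average a survey would report with the average across all customers.

```python
# Hypothetical customer base; every number here is an illustrative assumption.
# Each row: (segment, customers, true satisfaction on a 1-5 scale, survey responders)
SEGMENTS = [
    ("delighted", 1_000, 5, 250),  # strong feelings, responds often
    ("content", 7_000, 4, 350),    # the quiet majority, rarely responds
    ("unhappy", 2_000, 2, 400),    # strong feelings, responds often
]


def response_rate(segments):
    """Share of all customers who answered the survey."""
    customers = sum(n for _, n, _, _ in segments)
    responders = sum(r for _, _, _, r in segments)
    return responders / customers


def true_mean(segments):
    """Average satisfaction across every customer, responders or not."""
    customers = sum(n for _, n, _, _ in segments)
    return sum(n * score for _, n, score, _ in segments) / customers


def surveyed_mean(segments):
    """Average satisfaction the survey reports, weighted by who responded."""
    responders = sum(r for _, _, _, r in segments)
    return sum(r * score for _, _, score, r in segments) / responders


if __name__ == "__main__":
    print(f"response rate: {response_rate(SEGMENTS):.0%}")
    print(f"true mean:     {true_mean(SEGMENTS):.2f}")
    print(f"surveyed mean: {surveyed_mean(SEGMENTS):.2f}")
```

With these made-up numbers, a 10% response rate reports 3.45 while the full customer base sits at 3.70. Different assumptions could push the error the other way, which is the real problem: from the survey alone you cannot tell the direction of the bias.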

Why does CSAT misdirect organizational investment?

Beyond the mechanical problems of CSAT (low response rates, sampling bias, high cost, delayed signal), the deepest issue is strategic: CSAT points organizations at the wrong problem. Because CSAT is measured almost entirely at the point of support interaction, the answer to 'how do we improve our CSAT?' almost always becomes 'improve support' - better scripts, more agents, smarter chatbots, deeper knowledge transfer to support staff. But the real way to improve customer satisfaction is to reduce the need for support in the first place: a clear product, self-service that actually self-serves, in-flow onboarding, well-written documentation, and a community or brand that creates loyalty before any ticket is opened. When the product is built right, support becomes a small, sharp team handling genuinely special cases - which is the only context where deep personal service and tight product-feedback loops are possible. A small support team is not a deficit; it is a signal that the product is doing its job. CSAT pushes investment in the opposite direction - more support headcount, more support tooling - making the support function look like the problem when the product is actually the leverage point. CSAT does not measure the product; it measures the recovery from the product's failures. Behavioral metrics (usage, churn, online reviews) push investment back toward the product, where the leverage lives.

What are cheaper, more honest alternatives to CSAT?

Five alternatives that are cheaper and more accurate than survey-based CSAT - four behavioral, one a single-question survey: (1) Behavioral usage data - what customers do is more honest than what they say; daily active usage, feature adoption, session length, and time-to-value tell the real story; (2) Support ticket patterns - volume, sentiment, recurring topics, and resolution time give real-time signal; (3) Online review crawling - public reviews on Trustpilot, G2, App Store, Google Reviews, and Reddit; sentiment analysis on the last 6 months of reviews surfaces honest customer truth; (4) Churn analysis - cohort retention curves and exit interviews give ground truth no survey can; (5) Customer Effort Score (CES) - still a survey, but a single question; Matthew Dixon's HBR research (2010) shows CES is a stronger loyalty predictor than CSAT. Fred Reichheld, the creator of NPS, made a similar pivot in his 2021 book Winning on Purpose, introducing Earned Growth Rate based on behavioral data rather than surveys.
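As one illustration of the churn-analysis option, a cohort retention curve needs nothing more than signup and churn dates. The following is a minimal sketch under stated assumptions: the function name, the input shape, and the ten-customer cohort are all hypothetical, and a real pipeline would derive the inputs from billing or event data.

```python
def retention_curve(months_retained, horizon):
    """Share of one signup cohort still active k months after signup.

    months_retained: one entry per customer, the number of months the customer
    stayed before churning, or None if still active at the end of observation.
    horizon: number of months this cohort has been observed.
    """
    cohort_size = len(months_retained)
    if cohort_size == 0:
        return []
    curve = []
    for k in range(horizon + 1):
        # A customer counts as retained at month k if they never churned
        # or stayed at least k months.
        survivors = sum(1 for m in months_retained if m is None or m >= k)
        curve.append(survivors / cohort_size)
    return curve


# Hypothetical cohort of ten customers who signed up in the same month.
JANUARY_COHORT = [1, 1, 2, 3, 3, 6, None, None, None, None]

if __name__ == "__main__":
    for month, share in enumerate(retention_curve(JANUARY_COHORT, horizon=6)):
        print(f"month {month}: {share:.0%} retained")
```

Plotting one such curve per signup month shows whether newer cohorts retain better or worse than older ones, with no survey involved.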

How do departmental KPIs create internal civil wars?

When departments are given metrics set in isolation, the metrics often create destructive interactions. The textbook example from fintech: the Risk department is measured on minimizing risk (low fraud, low defaults); the Product team is measured on successful onboarding (high signup-to-active conversion). In isolation, both KPIs are defensible. Together: Risk tightens onboarding criteria to lower fraud, rejecting more applicants; Product responds by spending more on acquisition with broader targeting to maintain conversion volume. The customers Product is acquiring are increasingly the ones Risk will reject. CAC rises. Risk-adjusted onboarding falls. Both teams show wins on their dashboards while the company is burning money on the wrong customers. Conway's Law, applied to measurement: organizations build internal warfare when their measurement structures don't talk to each other. Same pattern hits Sales vs Customer Success, Engineering vs Product, Security vs Engineering.
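The Risk-vs-Product dynamic can be made concrete with a few lines of arithmetic. This sketch uses hypothetical figures (spend, signups, and approvals are assumptions chosen only to illustrate the pattern) and shows how Product's number improves and cost per signup stays flat while the cost of each customer Risk actually approves climbs.

```python
def onboarding_view(acquisition_spend, signups, approved):
    """Summarise one period as Product, Risk, and the company would see it."""
    return {
        "signups": signups,                                 # Product's dashboard
        "approval_rate": approved / signups,                # falls as Risk tightens criteria
        "cost_per_signup": acquisition_spend / signups,     # looks stable
        "cost_per_approved": acquisition_spend / approved,  # what the company actually pays
    }


# Hypothetical periods: Risk tightens criteria, Product broadens targeting.
BEFORE = onboarding_view(acquisition_spend=100_000, signups=2_000, approved=1_200)
AFTER = onboarding_view(acquisition_spend=150_000, signups=3_000, approved=1_050)

if __name__ == "__main__":
    for label, view in (("before", BEFORE), ("after", AFTER)):
        print(
            f"{label}: signups={view['signups']}, "
            f"approval rate={view['approval_rate']:.0%}, "
            f"cost/signup={view['cost_per_signup']:.2f}, "
            f"cost/approved={view['cost_per_approved']:.2f}"
        )
```

Only the last column, a metric neither department owns in this illustration, shows the money being burned.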

What was the Wells Fargo cross-selling scandal?

Between 2002 and 2016, Wells Fargo set aggressive cross-selling KPIs for branch employees - including the famous 'Eight is Great' target (eight Wells Fargo products per customer). The metric was supposed to indicate deep customer relationships. What it actually produced, by the time it was uncovered, was approximately 3.5 million unauthorized accounts opened in customers' names by employees trying to hit their numbers. Final cost to the bank: over $3 billion in fines and settlements with US regulators and prosecutors, multiple CEO departures, a Federal Reserve growth cap that lasted years, and lasting reputational damage. The scandal is the cleanest recent demonstration of Goodhart's Law in corporate life: the KPI did exactly what executives asked, but the metric, having become the target, no longer measured what it was supposed to measure.

Why is measuring AI adoption by token usage a KPI trap?

Measuring AI adoption by tokens consumed (or calls per employee, or active seats) is a textbook Goodhart's Law trap: token volume is an activity-and-cost metric, not an outcome metric, and the moment it becomes the target, teams optimize for consumption rather than value. Prompts get more verbose, simple tasks get routed through multiple model calls, and content gets auto-generated whether or not anyone uses it, so the usage dashboard turns green while the monthly bill climbs and nothing measurable gets faster or better. The metric also inverts the incentive: the disciplined user who solves a problem in one tight prompt looks like a laggard next to a colleague burning ten times the tokens for the same result. The fix is to measure outcomes instead of activity - tasks completed per unit of time, cycle-time reduction on a named process, error and rework rates, hours returned on real work - while tracking token spend separately as a cost to manage, never as proof of success. Adoption is not how many tokens you burned; it is how much better, faster, or cheaper the work became because of them.

What are the three roles of KPIs in a healthy organization?

Healthy organizations operate three distinct categories of KPI, each serving a different role and feeding the next: Outcome KPIs, Execution KPIs, and Foundation KPIs. (1) Outcome KPIs (also called Business KPIs) measure growth and profitability - revenue, gross margin, CAC, LTV, retention, runway. They are lagging indicators; by the time an Outcome KPI shows a problem, the underlying cause has been happening for months. (2) Execution KPIs (also called Process KPIs) measure effective resource management and output quality - cycle time, defect rate, onboarding completion, time-to-resolution, throughput per engineer. They are mid-stream indicators that show up weeks before Outcome KPIs respond, but in isolation they can produce operational efficiency at producing the wrong thing, and trigger the departmental civil wars described elsewhere in the article. (3) Foundation KPIs (also called Culture KPIs) measure the conditions that make innovation, sound decision-making, authenticity, professionalism, and personal growth possible. Behavioral examples (not survey-based) include innovation velocity (time from internal idea to experiment), decision quality signals (percentage of strategic decisions with documented rationale findable six months later, cross-functional involvement in major decisions), authenticity and professionalism (cadence of one-on-ones held, time between issue raised and addressed in writing, public disagreement frequency in leadership forums), and growth (internal mobility rate, learning time per quarter, percentage of senior roles filled from internal promotion). The hierarchy: Foundation enables Execution enables Outcome. Most organizations invest in the opposite order - Outcome KPIs first, Execution second, Foundation not at all - which is why their Execution and Outcome metrics slowly degrade and nobody can explain why. 
The smallest meaningful stack a growing company should run is about two metrics in each layer, coupled so that movement in one is visible to the others. Six to eight metrics total, not 47.

Why is GTO so hard to follow in live poker - and what does that teach about KPI execution?

GTO (Game Theory Optimal) is almost impossible to execute correctly in a live game, and this is the most underrated lesson the poker world offers business measurement. The framework is beautiful on a solver screen at 3am; at a live table it collapses against real conditions. Cognitive load: GTO is not a strategy but a library of thousands of decision points - retrieving the right answer in 15 seconds under social pressure exceeds human memory. Mixed strategies: many GTO-correct plays require true randomization (do X 65% of the time, Y 35%) - humans cannot generate genuinely random sequences and develop exploitable patterns over hundreds of hours. Bet sizing precision: solvers prescribe specific sizes (47% pot, 73% pot, all-in) but chips come in fixed denominations, forcing estimation error that compounds. Multi-way pots: most solvers are heads-up but real games are 3-, 4-, or 5-way at the flop, where multi-way GTO is not fully solved. Missing data: online players have HUDs but live players have only memory and tells, breaking the data layer GTO assumes. Fatigue: live sessions run 8-12+ hours, degrading prefrontal cortex performance by hour six. Tilt: bad beats and stress collapse the framework into emotional play. The business parallel: a well-designed KPI framework (Three Roles, Six Principles, clean data foundation) is exactly as hard to execute as GTO at a live table. The cognitive load of remembering coupled metrics, ownership, review cadence, deprecation, and signal interpretation exceeds what tired managers can do well at hour seven of a long week. The fix is not 'design a better framework' - it is design a framework simple enough that tired humans can actually use it, supported by data foundation and proactive alerting that lets the system do the work humans can't. Six metrics the team actually understands beat 47 nobody has time to interpret. Alerts that fire when something changes beat dashboards that nobody opens. 
If your KPI architecture cannot survive contact with a tired human at 4pm on a Tuesday after a hard meeting, it is a solver output, not a strategy.

What is exploitative play in poker and how does it apply to KPIs?

Exploitative play is poker's most important contribution to business measurement thinking - and it is the missing half of the GTO conversation. GTO (Game Theory Optimal) is the unexploitable baseline strategy: a player following GTO cannot be systematically beaten, whatever strategy the opponent plays. But against opponents who are not playing GTO - which is to say, almost everyone in real games - GTO leaves money on the table. Exploitative play is the deliberate deviation from GTO based on opponent-specific reads, designed to extract maximum value. Concrete examples: against a player who folds too often, bluff more frequently than GTO prescribes (you win more pots even though some bluffs are technically 'incorrect'); against a calling station who never folds, value-bet thinner than GTO prescribes (you extract more money even though some bets are marginal); against a hyper-aggressive opponent, tighten your defending range below the GTO recommendation (you concede small pots to avoid big losses). The most profitable poker players maintain a strong GTO baseline AND deviate exploitatively based on what they read at their specific table. GTO is the floor; exploitative play is the ceiling. The business translation: a well-designed KPI architecture (Outcome / Execution / Foundation KPIs with proper coupling) is your GTO baseline - it prevents you from being obviously wrong about what to measure. But the framework alone will not win the game. The ceiling comes from exploitative deviations based on what is happening in your specific market: a competitor bleeding talent makes Foundation KPIs around retention temporarily disproportionate; an incoming regulatory change makes risk KPIs sharper than peers'; a high-LTV acquisition channel demands a metric that explains why even if it wasn't on your standard list. The discipline is strong baseline, deliberate deviations.
A company using only the framework will be unexploitable but will not maximize value; a company using only exploitative thinking will catch local opportunities but lose to its own inconsistency. The skill is holding both at once.

Can a well-chosen KPI still fail an organization (the GTO lesson)?

Yes - and this is the most uncomfortable failure mode of KPI architecture, because nothing was gamed and no one cheated. The metric was honored, the framework was respected, and the business still struggled. The analogy from high-stakes poker is instructive: the dominant strategic framework in modern poker is GTO (Game Theory Optimal), which prescribes a mathematically correct play at every decision point. Top pros spend years studying GTO solver outputs to align their decisions with the theoretical optimum. The community optimizes for one metric: decision quality measured against GTO. Yet most serious GTO students at mid-stakes are losing money. The metric (GTO purity) has become disconnected from the goal (profitability). The game itself is not 'play the mathematically correct move' - the game is 'make money.' A pro chasing GTO purity often misses the factors that actually determine whether money moves: the wrong table choice (GTO-correct play against weak opponents wins less than exploitative play targeting their specific leaks), variance and bankroll management (positive-EV plays can bankrupt you before the long run arrives), emotional state (mathematically correct decisions made on tilt still lose money), and game selection over decision selection (the biggest profitability decision is which table to play at, not which action to take). The business lesson: even a well-chosen KPI optimized correctly can fail to deliver the outcome it was supposed to indicate. If your company is doing everything right by the framework and still struggling, the framework is incomplete. The KPI is measuring a proxy for success and the proxy has drifted from what success actually requires. A healthy organization treats KPIs the way a sophisticated poker player treats GTO - as a useful tool, not a destination. Use the framework. Watch the numbers. And always ask the prior question: are we playing the right game?

What data prerequisites are needed before defining good KPIs?

Before any KPI architecture can work, six data-foundation prerequisites must be in place: (1) Data modeling that matches the business - schemas that capture events at the right granularity, entity relationships that reflect how customers actually move through the product, and time-series tables that preserve state changes. If 'active user' is defined three different ways in three tables, no KPI built on it can mean what you think. (2) Instrumentation - events captured reliably at the right time, with monitoring for client-side failures (ad-blockers, privacy tools), server drops during traffic spikes, and schema-change breaks. Most data quality problems are collection problems, not analysis problems. (3) Single source of truth - or an explicit, written map of which system is authoritative for which metric (CRM, analytics, finance, customer success). Without this, every meeting starts with reconciliation and ends without decisions. (4) Metric definitions written down and shared - a data dictionary covering active user, churn, conversion, engagement, and every other term used differently across teams. Most metric disputes are definition disputes wearing a measurement costume. (5) Data ownership and quality monitoring - clear owners for each metric's data quality and modern observability tools (Monte Carlo, Bigeye, Anomalo) or simple freshness checks so pipeline failures don't surface in board meetings. (6) Governance - a light intake process for new metrics and a quarterly review where unused or contradictory metrics are retired, so the metric stack doesn't accrete into 400 contradictory KPIs. None of this is technically hard. The blockers are organizational: nobody owns the data layer, nobody writes the definitions down, nobody wants to tell the VP that their favorite KPI is built on broken data. Most KPI projects fail at this foundation layer, long before any metric mis-design. Build the foundation before you build the floor.
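Two of these prerequisites, written metric definitions and simple freshness checks, can be approximated without any observability platform. The sketch below is a minimal illustration; the function names, table names, and definitions are hypothetical, and a real setup would read them from a data catalog and from pipeline metadata.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def conflicting_definitions(definitions):
    """Return metrics that are defined in more than one way.

    definitions: iterable of (metric_name, source_system, definition_text).
    """
    by_metric = defaultdict(set)
    for metric, _source, text in definitions:
        # Normalise case and whitespace so cosmetic differences are ignored.
        by_metric[metric].add(text.strip().lower())
    return sorted(m for m, texts in by_metric.items() if len(texts) > 1)


def stale_tables(last_loaded, max_age, now):
    """Return tables whose most recent load is older than max_age."""
    return sorted(
        name for name, loaded_at in last_loaded.items() if now - loaded_at > max_age
    )


# Hypothetical inventory; names and definitions are made up for illustration.
DEFINITIONS = [
    ("active_user", "analytics", "logged in within the last 30 days"),
    ("active_user", "crm", "any transaction within the last 90 days"),
    ("active_user", "finance", "paid invoice in the current quarter"),
    ("churn", "analytics", "no login for 60 days"),
    ("churn", "customer_success", "No login for 60 days"),
]

NOW = datetime(2026, 5, 22, 9, 0, tzinfo=timezone.utc)
LAST_LOADED = {
    "events.signups": NOW - timedelta(hours=2),
    "events.payments": NOW - timedelta(days=3),
}

if __name__ == "__main__":
    print("conflicting:", conflicting_definitions(DEFINITIONS))
    print("stale:", stale_tables(LAST_LOADED, timedelta(hours=24), NOW))
```

Neither check is sophisticated. What they need is an owner who runs them and acts on the result, which is the organizational blocker the paragraph above describes.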

How do you fix misaligned KPIs?

Five principles for healthier KPI architecture: (1) Couple metrics across functions; never measure in isolation - if Risk gets a 'minimize risk' KPI, give them a coupled 'approve N qualified onboardings' KPI. If Product gets 'maximize onboarding,' couple it with 'of those onboardings, X% must pass risk review and Y% must be active at day 90.' (2) Prefer behavior over surveys by default - data the user produces by action is harder to fake than survey responses. (3) Sunset metrics on a regular cadence - a KPI unchanged in two years is either solved or unchangeable; delete either way. (4) Pair every 'go' KPI with a 'stop' KPI - growth metrics without quality or risk metrics produce Wells Fargo; stop metrics without go metrics produce risk-averse paralysis. (5) Measure the cost of measurement - if the time and budget the measurement stack consumes exceeds the marginal value of the metric, you have a measurement-stack problem; the fix is fewer KPIs, not more.
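Principle (1) can be expressed as a gate in which the volume target only counts when the quality conditions also hold. This is a sketch under assumptions: the function name is hypothetical, and the thresholds stand in for the X% and Y% that each company would set for itself.

```python
def coupled_onboarding_kpi(onboarded, passed_risk_review, active_day_90,
                           target_onboarded, min_pass_rate, min_day90_rate):
    """Product's onboarding KPI is met only if volume AND quality conditions hold."""
    if onboarded == 0:
        return {"volume": False, "risk_quality": False, "retention": False, "met": False}
    checks = {
        "volume": onboarded >= target_onboarded,
        "risk_quality": passed_risk_review / onboarded >= min_pass_rate,
        "retention": active_day_90 / onboarded >= min_day90_rate,
    }
    checks["met"] = all(checks.values())
    return checks


# Hypothetical quarters; the 60% and 40% thresholds are placeholders for X and Y.
VOLUME_ONLY = coupled_onboarding_kpi(3_000, 1_050, 900, 2_500, 0.60, 0.40)
BALANCED = coupled_onboarding_kpi(2_600, 1_820, 1_300, 2_500, 0.60, 0.40)

if __name__ == "__main__":
    print("volume only:", VOLUME_ONLY)
    print("balanced:   ", BALANCED)
```

In the first quarter Product beats its volume target and still misses the KPI, because the onboardings fail risk review and do not stay active. That is the coupling at work.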

What is proactive KPI monitoring and how does it differ from traditional BI dashboards?

Traditional BI monitoring assumes a developer builds dashboards and humans watch them periodically - usually in weekly review meetings. The data is stale by the time anyone looks at it. Proactive KPI monitoring inverts this: instead of humans watching dashboards, the system watches itself and alerts humans when thresholds are breached or patterns shift. A spike in churn triggers an alert. A sudden drop in onboarding triggers an alert. An unexpected surge in conversion from a new source triggers an alert (potentially a positive signal worth investigating). The dashboard remains as a forensic tool you visit after the alert, not as a constant attention sink. Benefits: cheaper (no full-time BI watching), faster (real-time signals vs weekly meetings), and returns team attention to growth work instead of monitoring. Tools that enable this: Datadog, Grafana with alerting, anomaly detection in modern BI platforms like Sisense or ThoughtSpot, AI-powered pattern detection, and custom threshold-based notifications via Slack or email. The shift is philosophical as much as technical - the goal was never to monitor; the goal was always to grow. Most companies under 500 employees can stand up effective proactive monitoring without hiring a single BI developer.
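A custom threshold-based notification can be very small. The sketch below is one possible approach, not a description of how any of the tools named above work: it compares the latest value of each metric with a recent baseline using a z-score and calls a `notify` callback only when something moved. The metric names, the seven-day histories, and the threshold of 3 are assumptions; `notify` is a hypothetical hook that could post to Slack or send email.

```python
from statistics import mean, stdev


def check_metric(history, latest, z_threshold=3.0):
    """Compare the latest value with its recent baseline.

    Returns "spike", "drop", or None when the value is within the normal range.
    """
    if len(history) < 2:
        return None  # not enough data to estimate a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any movement at all is a change.
        if latest == baseline:
            return None
        return "spike" if latest > baseline else "drop"
    z_score = (latest - baseline) / spread
    if z_score >= z_threshold:
        return "spike"
    if z_score <= -z_threshold:
        return "drop"
    return None


def run_checks(metrics, notify):
    """Check every metric and call notify only when something changed."""
    for name, (history, latest) in metrics.items():
        status = check_metric(history, latest)
        if status is not None:
            notify(f"{name}: {status} (latest={latest})")


# Hypothetical daily metrics: (last seven days, today's value).
METRICS = {
    "daily_churned_accounts": ([10, 12, 11, 9, 10, 11, 10], 25),
    "daily_onboardings": ([200, 210, 195, 205, 198, 202, 207], 120),
    "daily_signups": ([500, 510, 495, 505, 498, 502, 507], 503),
}

if __name__ == "__main__":
    run_checks(METRICS, print)
```

A z-score on a short window is a deliberately crude choice: it ignores seasonality and trend, so a production setup would likely need something more robust. The point of the sketch is the inversion. Nobody looks at `daily_signups` today because nothing happened to it.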

Is the measurement stack itself a source of risk?

Yes - and it is one of the largest sources of invisible organizational risk in most growing companies, precisely because almost no one measures it. The measurement stack consumes engineering time (instrumentation), platform budget (BI tools, survey software), and senior attention (review meetings, KPI debates). It produces internal civil wars through misaligned departmental KPIs (Conway's Law of Measurement). It exposes the company to Goodhart's Law dynamics every time a metric is tied to compensation. And it makes leadership confident in numbers that systematically understate or overstate reality (CSAT being the worst case). A five-question diagnostic: (1) Do departments achieve their KPI by harming another department's KPI? (2) Is more than 15% of engineering or ops time spent on dashboards or instrumentation? (3) Are leadership KPIs unchanged from two years ago? (4) Do CSAT KPIs run on under-20%-response-rate surveys? (5) Could an employee hit their KPI by doing something the company would consider unethical? Three or more yes answers means the measurement stack itself is a hidden risk.
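The diagnostic is simple enough to encode directly, which makes it easy to rerun each quarter. A minimal sketch follows; the function and constant names are hypothetical, and the questions and the three-yes threshold come from the text.

```python
DIAGNOSTIC_QUESTIONS = (
    "Do departments achieve their KPI by harming another department's KPI?",
    "Is more than 15% of engineering or ops time spent on dashboards or instrumentation?",
    "Are leadership KPIs unchanged from two years ago?",
    "Do CSAT KPIs run on under-20%-response-rate surveys?",
    "Could an employee hit their KPI by doing something the company would consider unethical?",
)

RISK_THRESHOLD = 3  # "three or more yes answers"


def measurement_stack_at_risk(answers):
    """answers: five booleans, one per question, where True means 'yes'.

    Returns (number of yes answers, whether the stack is flagged as a hidden risk).
    """
    if len(answers) != len(DIAGNOSTIC_QUESTIONS):
        raise ValueError("expected one answer per diagnostic question")
    yes_count = sum(bool(a) for a in answers)
    return yes_count, yes_count >= RISK_THRESHOLD


if __name__ == "__main__":
    example_answers = [True, False, True, True, False]  # hypothetical company
    print(measurement_stack_at_risk(example_answers))
```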

Risk Management · Critical Essay

The KPI Trap: When Your Measurement Stack Becomes the Risk

What is Goodhart's Law and how does it apply to KPIs?

Goodhart's Law, originally formulated by British economist Charles Goodhart in 1975 and popularized in its current form by Marilyn Strathern in 1997, states: 'When a measure becomes a target, it ceases to be a good measure.' Applied to KPIs: any metric you choose to optimize for stops measuring what it originally measured, because the people being measured will optimize for the metric directly - not for the underlying thing the metric was supposed to indicate. The Wells Fargo cross-selling scandal (3.5 million unauthorized accounts opened to hit cross-sell KPIs) is the textbook recent case. The same principle, articulated by Donald Campbell in 1976 as Campbell's Law, explains why quantitative indicators used for decision-making tend to be corrupted by the people being measured.

May Mor

• May 22, 2026 • 33 min read

Quick Answer

Modern KPI culture has flipped from servant to master. Organizations now spend significant resources monitoring metrics that don't move outcomes - and worse, the metrics themselves frequently become the source of risk. Goodhart's Law warns: when a measure becomes a target, it ceases to be a good measure. The Wells Fargo $3 billion scandal (3.5 million fake accounts opened to hit cross-selling KPIs), the UK PPI mis-selling crisis (£30+ billion in compensation, driven by sales-volume KPIs), and most cross-functional civil wars between Risk, Sales, and Product teams trace to the same root cause: KPIs that look reasonable in isolation become destructive in the system. The newest version of the same mistake is already spreading: companies measuring AI adoption by tokens consumed rather than by the productivity or quality the AI was supposed to deliver.

This article makes six arguments: (1) Some popular KPIs cost more to measure than the insight they produce - CSAT being the worst offender, both because of its mechanical problems AND because it strategically misdirects companies toward fixing support instead of fixing the product; (2) Behavioral signals (usage data, support patterns, online reviews, churn) are both cheaper and more honest than surveys - and push investment toward the product, where the real leverage lives; (3) Departmental KPIs set in isolation create internal civil wars - the textbook case being Risk minimizing risk while Product maximizes onboarding, leading to acquisition spend on the wrong customers; (4) The biggest hidden risk in most growing organizations is the measurement stack itself; (5) The era of BI-developer-built dashboards that humans stare at is ending - the new model is proactive alerting, where the system watches itself and pings humans only when something significant happens (good or bad); (6) Some of the most valuable investments have no short-term KPI at all - internal knowledge bases, human wellbeing, manager training - and a healthy KPI architecture knows which work to measure carefully and which to invest in on strategic logic alone, without demanding a dashboard line that moves every quarter.

Underneath all six arguments sits a question most KPI conversations skip entirely: is the data trustworthy? Data modeling, instrumentation, single source of truth, written metric definitions, data ownership, and governance are the prerequisites for any of this to work. Most KPI projects fail at this foundation layer, long before anyone has a chance to misuse the metric.

The article also lays out the three roles of KPIs: Outcome KPIs (growth and profitability - lagging indicators), Execution KPIs (resource management and output quality - mid-stream indicators), and Foundation KPIs (innovation velocity, decision-quality signals, authenticity, professional growth - leading indicators most companies skip entirely). The names teach the hierarchy: Foundation enables Execution enables Outcome - the compounding logic runs bottom-up. Most organizations invest top-down and wonder why their numbers slowly degrade. Six to eight metrics total - about two in each layer, coupled so movement in one is visible to the others. Not 47.

And there is one final argument worth flagging: even a well-chosen KPI optimized correctly can fail. The poker world's GTO framework prescribes "mathematically correct" play, and yet most serious GTO students at mid-stakes are still losing money. The metric (decision quality) has drifted from the goal (profitability). The same trap exists in business: a company can be hitting all the "right" numbers and still losing the actual game. The article closes by explaining why - and what to ask when this happens.

Author: May Mor - Operating Architect. She helps operators align their people, systems, and processes so growth scales the business instead of breaking it. She holds an M.Sc. in AI and spent 10+ years inside regulated fintech, where she built credit-and-risk infrastructure for a digital bank and lived this Risk-vs-Product civil war from the inside.

The Measurement Industrial Complex

I spent a decade inside a digital bank watching one pattern repeat: the leadership team would gather around a dashboard, point at a number that had moved, and spend the next forty-five minutes guessing why. Not analyzing - guessing. The data underneath the number had been quietly broken for two months, and no one knew.

The day I realized the dashboard was the problem, not the cure, is the day this article became necessary.

Fred Reichheld critiqued surveys and gave us NPS. John Doerr argued for ambitious objectives and popularized OKRs. Cassie Kozyrkov made the case for decision rigor over analysis. Each was right about a piece of the puzzle. This article is about the piece they leave on the floor: the measurement stack itself has become one of the largest sources of invisible risk inside growing companies, and it is the conversation almost nobody is having.

There is a quiet, expensive industry inside most growing companies. It does not appear in the customer-facing P&L. It does not appear in the headline KPIs reported to the board. But it consumes real budget, real time, and real headcount.

It is the measurement stack.

The dashboarding platforms (Tableau, Looker, Power BI). The survey infrastructure (Qualtrics, Medallia, SurveyMonkey). The analytics implementation (engineering hours instrumenting events). The weekly KPI review meetings, multiplied by 30 stakeholders. The OKR consultant the board insisted on hiring.

The global business intelligence and analytics tooling market reached approximately $30 billion in 2024, growing at over 10% annually. That is just the tooling layer. It doesn't count the human time spent producing, reviewing, and arguing about the numbers - which, in a mid-size company, easily exceeds the cost of the tools themselves.

For most companies under 1,000 employees, this is not a problem of insufficient measurement. It is a problem of misallocated measurement. Time spent watching dashboards is time not spent shipping product. Time spent debating CSAT scores is time not spent talking to actual customers.

The hidden assumption underneath modern KPI culture is that more measurement creates better decisions. The evidence suggests the opposite, more often than is comfortable.

And the classical BI model - "developer builds the dashboard, humans watch it, decisions follow" - is showing its age. In practice, nobody watches the dashboard often enough; by the time a weekly review meeting notices a problem, the data is days or weeks old. The new model, which we'll return to in the principles section, inverts the relationship: the dashboard becomes a forensic tool you visit after something is flagged, and the day-to-day work runs on alerts. The system watches itself. Humans get pinged only when something matters - good or bad - and spend the rest of their time building, not staring.

When the Metric Eats the Mission (Goodhart's Law in Action)

In 1975, the British economist Charles Goodhart published an observation about monetary policy that has since escaped its origins and infected almost every domain of measurement. Marilyn Strathern's 1997 generalization is the version most people quote today:

"When a measure becomes a target, it ceases to be a good measure."

Translation: any metric you choose to optimize for stops measuring what it originally measured, because the people being measured will optimize for the metric directly - not for the underlying thing the metric was supposed to indicate.

The American social psychologist Donald Campbell said the same thing in 1976, just longer:

"The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."

This is Campbell's Law. It is older than every KPI dashboard you have ever seen, and it remains undefeated.

The Wells Fargo $3 Billion Lesson

The most spectacular recent demonstration of Goodhart's Law in corporate life is the Wells Fargo cross-selling scandal. Between 2002 and 2016, the bank set aggressive cross-selling KPIs for branch employees - the famous "Eight is Great" target, meaning each customer should hold eight Wells Fargo products. The metric was supposed to indicate deep customer relationships and high loyalty.

What the metric actually indicated, by the time it was uncovered, was that employees had opened approximately 3.5 million unauthorized accounts in customers' names to hit their numbers. The scandal cost the bank over $3 billion in fines and settlements, multiple CEO departures, a Federal Reserve growth cap that ran for several years, and a reputational hit that still drags on the franchise nearly a decade later.

The KPI did exactly what executives wanted. Cross-selling numbers went up. But the metric, having become the target, no longer measured what it was supposed to measure. It measured something else: the willingness of pressured employees to commit fraud.

The UK PPI Scandal

A second example, even larger in dollar terms: the UK Payment Protection Insurance scandal. British banks set aggressive sales-volume KPIs on PPI products from the late 1990s through 2010. Branch staff hit their numbers. The product itself was often mis-sold to customers who did not qualify or did not understand what they were buying.

Final cost to UK banks in compensation and fines: over £30 billion. To put that in perspective, that compensation bill is roughly equivalent to two full years of total UK consulting industry revenue.

The Modern SaaS Version

If banking examples feel too far from your world, the same pattern shows up in tech every quarter. A growth-stage SaaS company sets a single Outcome KPI on its growth team: monthly active users. The team responds rationally. They aggressively expand definitions of "active" (a notification opened counts), they ship features designed to manufacture engagement rather than deliver value, they buy paid traffic that pads the top-of-funnel and quietly churns at 90 days. The MAU number climbs every month for six quarters. Revenue does not. Retention degrades. The board, looking at MAU, congratulates the team. The CEO eventually fires the head of growth and replaces them with someone who promises a "more disciplined approach to engagement metrics." The new head of growth runs the same play with different vocabulary.

This is Goodhart's Law in pure form, and the pattern occurs across SaaS, marketplaces, fintech, e-commerce, ad-tech, and creator-platforms. The dollar costs are smaller than Wells Fargo's. The structural cost is the same: years of misallocated capital, talent burned out on numbers that did not produce the business they were supposed to indicate. The article you are reading is the framework for catching this before it costs you a CEO firing.

The AI-Era Version: Token Spend as a Vanity Metric

The newest instance of this trap is barely a year old. As companies rushed to "adopt AI" through 2025 and 2026, most needed a number to prove the adoption was real, and they reached for the one their vendor dashboard made easiest to pull: tokens consumed. Total tokens, calls per employee, seats actively generating. The metric was meant to indicate that the organization was becoming AI-native. What it actually measures is consumption, not value.

Goodhart's Law does the rest. Once "AI usage" becomes the target, teams optimize for usage directly. Prompts get longer because more context feels like deeper adoption. Tasks a person could finish in one step get routed through three model calls. Documents get auto-summarized whether or not anyone reads the summary. The usage dashboard turns green, the monthly bill climbs into territory nobody budgeted for, and finance starts asking why the AI line item doubled while nothing measurable got faster or better. The metric did exactly what it was asked to do; it simply never measured the thing that mattered.

The failure is a category error: token volume is an activity-and-cost signal wearing the costume of an outcome signal. A high token bill is equally consistent with a team genuinely transforming its workflows and a team generating impressive-looking busywork, and the metric cannot tell the two apart. Worse, it inverts the incentive. The most disciplined user, the one who writes a tight prompt that solves the problem in a single call, looks like a laggard next to the colleague burning ten times the tokens to produce the same result.

The fix is the one this article keeps returning to: measure the outcome, not the activity. The right AI-adoption KPIs are productivity and quality measures tied to specific workflows - tasks completed per unit of time, cycle-time reduction on a named process, error and rework rates, hours returned on work the team actually does - with token spend tracked separately as a cost to be managed, never as the proof of success. Adoption is not how many tokens you burned. It is how much better, faster, or cheaper the work became because of them. A team that cut its reporting cycle from two days to two hours has adopted AI, whether that took a million tokens or ten thousand. A team whose token bill tripled while its output looks the same has adopted a cost.
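To make the distinction concrete, here is a minimal sketch that reports an outcome measure (cycle-time change on one named workflow) first and token spend second, as a cost. The `WorkflowSnapshot` schema and all the figures are hypothetical illustrations, not data from any vendor dashboard.

```python
from dataclasses import dataclass


@dataclass
class WorkflowSnapshot:
    """One measurement period for a single named workflow (hypothetical schema)."""
    tasks_completed: int
    total_hours: float   # human hours spent on the workflow in the period
    tokens_used: int     # tracked as a cost input, never as proof of success


def hours_per_task(snapshot: WorkflowSnapshot) -> float:
    if snapshot.tasks_completed == 0:
        raise ValueError("no completed tasks in this period")
    return snapshot.total_hours / snapshot.tasks_completed


def adoption_report(before: WorkflowSnapshot, after: WorkflowSnapshot) -> dict:
    """Compare two periods: outcome first, token cost second."""
    before_cycle = hours_per_task(before)
    after_cycle = hours_per_task(after)
    return {
        # Outcome: did the work get faster? Negative means faster.
        "cycle_time_change_pct": round(100 * (after_cycle - before_cycle) / before_cycle, 1),
        # Cost: what did each completed task consume?
        "tokens_per_task": round(after.tokens_used / after.tasks_completed),
    }


# Illustrative numbers only: 16 working hours per report vs. 2 hours per report.
baseline = WorkflowSnapshot(tasks_completed=40, total_hours=640.0, tokens_used=0)
team_a = WorkflowSnapshot(tasks_completed=40, total_hours=80.0, tokens_used=10_000)
team_b = WorkflowSnapshot(tasks_completed=40, total_hours=640.0, tokens_used=3_000_000)

report_a = adoption_report(baseline, team_a)
report_b = adoption_report(baseline, team_b)
```

In this sketch team A shows a large cycle-time reduction at a small token cost per task, while team B shows no change in cycle time at a far higher cost per task. A usage dashboard ranked by token volume would put team B on top.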

The banking scandals, the SaaS version, and the AI-adoption version all share a common shape:

Leadership chose a KPI that looked reasonable.
The KPI was tied to compensation and career progression.
Employees did what was rational under the incentive system.
The actions were destructive to customers, to the firm, and to the original business intent.

This is not a story about bad apples. It is a story about Goodhart's Law operating exactly as predicted. Harvard Business Review covered this dynamic in "Don't Let Metrics Undermine Your Business" (Harris and Tayler, 2019), arguing that the failure mode is not exotic - it is the default outcome whenever metrics become the management substitute for judgment.

CSAT Is Expensive Theater

Of all the KPIs commonly worshipped, the Customer Satisfaction Score (CSAT) is the one most likely to cost more than the insight it provides.

The standard CSAT survey asks customers to rate their satisfaction on a 5- or 7-point scale, usually after a service interaction or a purchase. The score is reported up the org chart, included in board decks, and tied to compensation in many companies.

The problems with CSAT begin with the survey itself.

Response rates are typically catastrophic. Industry benchmarks place B2C CSAT survey response rates at 5-15%. B2B is often lower. The 85-95% of customers who do not respond - whose opinions you literally have not measured - are the silent majority. The 5-15% who do respond are systematically biased: they tend to be either the unusually happy or the unusually unhappy. The middle - the customer who quietly used your product and quietly stopped using your product - never shows up in CSAT.

The cost is real. Enterprise CSAT programs typically run six figures annually when you account for survey platform subscriptions, integration engineering, analytics implementation, and the human time spent reviewing weekly score reports. For a 200-person company, that is real budget that could be spent on the product itself.

The insight is often wrong. Customers who answer CSAT surveys lie - not maliciously, but because survey psychology pushes them toward the response that ends the interaction fastest. A customer who clicks "Very Satisfied" may simply have wanted the popup to go away.

The metric is delayed. By the time CSAT scores tell you something is wrong, your churn has already happened.

The Deeper Problem: CSAT Misdirects Your Strategy

The mechanical critique above - cost, response rates, biased samples, delay - is real. But the worst thing CSAT does to a company is more interesting than any of those: it points the organization at the wrong problem.

CSAT is measured, almost by construction, at the point of support interaction. So the answer to "how do we improve our CSAT?" almost always becomes: improve the support function. Better scripts. Better knowledge transfer to agents. More headcount in the support team. A smarter chatbot. Deeper FAQs. The whole CSAT discussion runs through support, because that is where the metric lives.

That is the wrong place to optimize.

The real way to improve customer satisfaction is not to fix the support function. It is to reduce the need for support in the first place. And that path runs through the product, not through the support team:

A product clear enough that customers don't need to ask how to use it
Self-service that actually self-serves
Onboarding that explains the right things at the right moment, in the flow itself
Documentation customers actually read because it's well written and findable
A community or brand presence that creates loyalty and engagement before any ticket gets opened

When the product is built right, support becomes a small, sharp team handling genuinely special cases - and that is the only context where you can really invest in deep product expertise, personal service, and tight feedback loops back to the product team. A small support function is not a deficit. It is a signal that the product is doing its job.

CSAT pushes the organization in the opposite direction. Bigger support team. More tooling for support. More training for support. The metric makes the support function look like the problem. The product is the actual problem (or rather, the actual leverage point) - and CSAT will never tell you that, because CSAT does not measure the product. It measures the recovery from the product's failures.

This is Goodhart's Law again, but at the strategic level. The metric becomes the target, and the company invests in the metric. The underlying thing the metric was supposed to indicate - customer happiness - receives less investment than ever, because the investment flows into the support function instead of the product that would prevent the support interaction.

The behavioral alternatives below do not have this failure mode. Usage data tells you about the product. Support-ticket patterns tell you which product flows are failing. Online reviews tell you what customers actually love and hate. Churn tells you the truth in dollar terms. None of them push you toward "fix support." All of them push you toward "fix the product."

What's Cheaper and More Honest

Most of the information CSAT is trying to capture already exists - in cheaper, more accurate, more timely form - in data you already have:

1. Behavioral usage data. What customers do tells you what they actually feel - far more reliably than what they say. Daily active usage, feature adoption, session length, time-to-value on key flows: customers cannot fake these signals the way they can click through a survey, and they are effectively free to measure because your product already produces them. (A team can still game the definitions, as the MAU example showed, which is why the definitions need an owner.) A 20% decline in usage over six weeks is a louder signal than a CSAT score, and it arrives months earlier.

2. Support ticket patterns. Volume, sentiment, resolution time, and recurring topics. A spike in tickets about a specific feature is a real-time customer satisfaction signal. Most companies already have ticketing tools; they just don't analyze the patterns systematically.

3. Online review crawling. Public reviews on Trustpilot, G2, App Store, Google Reviews, Reddit, and specialty forums. Customers who write reviews are self-selected, but the selection bias is different from CSAT selection bias - and the language is unconstrained. A simple sentiment-analysis pipeline applied to the last six months of reviews surfaces more honest customer truth than 12 months of CSAT averages.

4. Churn analysis. The most expensive form of customer feedback is the customer who silently leaves. Cohort retention curves, time-to-churn, and exit interviews (for customers who notify you of cancellation) give you ground truth that no survey can match.

5. Customer Effort Score (CES). Research published in Harvard Business Review by Matthew Dixon, Karen Freeman, and Nicholas Toman - "Stop Trying to Delight Your Customers" (2010) - argues that customer loyalty correlates more strongly with how easy a company is to deal with, measured by Customer Effort Score, than with how satisfied customers report being. CES asks one question: "How easy was it to get your issue resolved?" Cheaper, more predictive, less gameable.

6. Earned Growth Rate. Fred Reichheld - the creator of NPS - published Winning on Purpose in 2021, in part to address the criticism that NPS, like CSAT, is too easy to game. His new framework, Earned Growth Rate, is based on behavioral data: how much of your growth came from returning customers and the referrals they bring in, as opposed to customers you paid to acquire. Behaviorally derived, hard to fake.

The pattern: behavioral data is cheaper, more honest, more timely, and harder to game than survey data. Almost every CSAT program at a growing company would be improved by redirecting 80% of its budget to a serious effort to instrument and analyze behavior.
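As an illustration of the usage-decline signal from item 1, the following sketch compares the latest week of activity with the week six weeks earlier. The function names, the weekly series, and the -20% threshold are assumptions chosen to mirror the example in the text, not a recommended standard.

```python
def usage_change_pct(weekly_active: list[int], weeks: int = 6) -> float:
    """Percent change between the latest week and the week `weeks` earlier."""
    if len(weekly_active) <= weeks:
        raise ValueError("need more than `weeks` data points")
    start = weekly_active[-(weeks + 1)]
    end = weekly_active[-1]
    return 100 * (end - start) / start


def is_decline_alert(weekly_active: list[int], weeks: int = 6,
                     threshold_pct: float = -20.0) -> bool:
    """True when usage fell by at least the threshold over the window."""
    return usage_change_pct(weekly_active, weeks) <= threshold_pct


# Hypothetical weekly active users, oldest first.
series = [1000, 980, 950, 900, 870, 830, 790]
change = usage_change_pct(series)
alert = is_decline_alert(series)
```

A check like this only says that usage moved. Whether the move is real or seasonal still needs the statistical filter discussed later in the article.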

The Departmental Civil War

The most expensive failure mode of KPI culture is not a single bad metric. It is when multiple departments are each given metrics that look reasonable in isolation - but whose interaction creates an internal war.

I lived this. I spent ten years inside a digital bank where my team built the credit-and-risk infrastructure that processed over 500,000 loan applications. The Risk team I worked with had a clean, defensible KPI. The Product team next door also had a clean, defensible KPI. The two KPIs, in interaction, were producing one of the most expensive operational patterns I have ever watched a company normalize. It took us almost two years to name what was happening. Here is the textbook version of it, drawn from fintech and digital banking but transferable to almost any growing company:

The Risk department is measured on minimizing risk. Specifically: fraud rates, default rates, regulatory fines avoided. The KPI is rational. Risk teams exist to protect the company from losses. The lower the risk number, the better Risk has done its job.

The Product team is measured on successful onboarding. Specifically: conversion rate from signup to active customer. The KPI is rational. Product teams exist to grow the customer base. The higher the onboarding number, the better Product has done its job.

In isolation, both KPIs are defensible. Together, they create a civil war.

To minimize risk, the Risk team tightens onboarding criteria. More rejected applicants. More flagged accounts. More friction in the customer journey. The fraud rate goes down. Risk wins their KPI.

To maximize onboarding, the Product team adjusts the funnel: more aggressive acquisition spend, broader targeting, faster onboarding flow with less friction. More customers reach "active" status. Product wins their KPI.

What is happening underneath both teams' wins?

The customers Product is acquiring with aggressive ad spend are increasingly the customers Risk is going to reject downstream. Real CAC dollars are being spent acquiring users who never make it through. Product appears to be performing on its dashboard. Risk appears to be performing on its dashboard. The CFO sees CAC rising and post-Risk conversion falling, and cannot figure out why.

The cost of this misalignment is invisible until you instrument the right metric: the risk-adjusted onboarding rate. That is: of all the users Product pays to acquire, what fraction reaches a state that both Risk approves of AND Product counts as activated, weighted by each customer's eventual contribution to the business and set against the acquisition dollars spent.
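One way to instrument a risk-adjusted onboarding rate is sketched below. It is an assumption-laden illustration: the `AcquiredUser` fields, the `expected_contribution` estimate, and the three output ratios are hypothetical choices, and a real implementation would depend on how Risk and Product record their decisions.

```python
from dataclasses import dataclass


@dataclass
class AcquiredUser:
    """One user Product paid to acquire (hypothetical schema)."""
    acquisition_cost: float
    risk_approved: bool
    activated: bool
    expected_contribution: float  # assumed estimate of eventual contribution


def onboarding_views(users: list[AcquiredUser]) -> dict:
    """Contrast Product's dashboard view with the joint Risk-and-Product view."""
    total = len(users)
    spend = sum(u.acquisition_cost for u in users)
    qualified = [u for u in users if u.risk_approved and u.activated]
    return {
        # What Product's dashboard shows on its own.
        "activation_rate": sum(u.activated for u in users) / total,
        # Users who satisfy both teams' criteria.
        "risk_adjusted_rate": len(qualified) / total,
        # Expected contribution from qualified users per acquisition dollar.
        "contribution_per_dollar": sum(u.expected_contribution for u in qualified) / spend,
    }


# Illustrative cohort of four users at 50 per acquisition.
cohort = [
    AcquiredUser(50.0, risk_approved=True, activated=True, expected_contribution=300.0),
    AcquiredUser(50.0, risk_approved=False, activated=True, expected_contribution=0.0),
    AcquiredUser(50.0, risk_approved=True, activated=False, expected_contribution=0.0),
    AcquiredUser(50.0, risk_approved=False, activated=False, expected_contribution=0.0),
]
views = onboarding_views(cohort)
```

In this toy cohort Product's own activation rate is twice the risk-adjusted rate, which is the gap the CFO in the story could not see.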

The KPI architecture that produces this kind of mess is depressingly common. It happens whenever leadership sets departmental metrics without modeling the system. It is not a bad-people problem. It is a bad-design problem.

Conway's Law of Measurement

In 1968, Melvin Conway observed that organizations design systems that mirror their communication structures. The corollary for KPI architecture: organizations build internal warfare when their measurement structures don't talk to each other.

Some additional standard cases:

Sales measured on closed revenue. Customer Success measured on retention. Sales closes deals with customers who shouldn't have bought. Retention drops. CS escalates to Sales. Both teams blame each other.
Engineering measured on velocity. Product measured on quality. Engineering ships fast and incomplete. Product files bugs. The backlog grows. The product gets worse.
Security measured on incidents avoided. Engineering measured on shipping speed. Security blocks releases. Engineering does shadow deploys. The next incident is worse than the one Security prevented.

In every case, the underlying KPIs are defensible. The system they create is not.

How to Tell If Your KPI Stack Is the Risk

A five-question diagnostic. If your honest answer to three or more is "yes," your measurement stack is itself a hidden risk:

Do any departments achieve their KPI primarily by making another department's KPI worse? (Conway's Law of Measurement violation.)
Is more than 15% of your engineering or operations time spent producing dashboards or maintaining instrumentation? (Measurement industrial complex.)
Are KPIs in your weekly leadership review unchanged from KPIs reviewed two years ago? (Failure to sunset metrics.)
Are any of your customer-satisfaction KPIs based on surveys with a response rate under 20%? (CSAT-as-theater.)
Could a determined employee hit their KPI by doing something the company would consider unethical or harmful? (Goodhart's Law exposure.)

Each "yes" answer represents a class of invisible risk that is exactly the kind of thing risk assessments should surface - and rarely do, because measurement stacks are politically protected. The people who built the dashboards have careers tied to them.

Is the Movement Real? A Quick Statistical Filter

Before treating any KPI move as a signal, run it through three simple checks. Most organizations skip these and waste leadership attention on noise.

Sample size. A 5% drop in a metric with 200 daily users might be ten people on a bad day. The same drop in a metric with 200,000 daily users is a real signal. The rule of thumb: under ~30 observations per cohort, distrust the trend; under ~300 observations, distrust the magnitude. Above that, the math starts to behave.
Seasonality and known baselines. Conversion drops on Sundays. Retention drops in August. B2B usage drops over the December holidays. Before declaring an anomaly, compare against the same window the previous quarter and the same window the previous year. This is the first thing a competent analyst does, and the one thing the dashboard never does on its own.
The envelope check. What is the metric's normal range over the trailing 14 to 30 days? If the latest value is within two standard deviations of the mean, treat it as noise unless it persists. If it's outside, it's worth investigating. This is the cheapest filter in measurement and the one that prevents the most pointless meetings.

The leadership team that asks "is this move statistically real?" before asking "what caused the move?" wastes much less attention. The team that asks the second question first runs an organization permanently on tilt.
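The sample-size rule of thumb and the envelope check can be sketched in a few lines. The function names and the 14-day series are hypothetical, the thresholds are the approximate ones given above, and the seasonality comparison is left out because it depends on how your history is stored.

```python
from statistics import mean, stdev


def sample_size_verdict(observations: int) -> str:
    """Apply the ~30 / ~300 rule of thumb from the text."""
    if observations < 30:
        return "distrust the trend"
    if observations < 300:
        return "distrust the magnitude"
    return "large enough to read"


def outside_envelope(history: list[float], latest: float, k: float = 2.0) -> bool:
    """True when `latest` is more than k sample standard deviations from the trailing mean."""
    center = mean(history)
    spread = stdev(history)  # needs at least two data points
    return abs(latest - center) > k * spread


# Hypothetical trailing 14 days of a daily metric (mean is 100).
trailing = [100, 102, 98, 101, 99, 100, 103, 97, 100, 100, 101, 99, 100, 100]

noise = outside_envelope(trailing, 102)       # inside the two-sigma band
worth_a_look = outside_envelope(trailing, 96)  # outside the two-sigma band
```

Two standard deviations is a convention, not a law. A noisy metric may warrant a wider band, and a value that sits just inside the band for several days in a row may still deserve attention.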

Before You Pick a KPI: The Data Foundation

Before any of the principles below can actually work, something underneath has to be true: the data your KPIs are built on has to be trustworthy. In most growing companies, it is not. And the cost of skipping this layer is that every metric you produce - however beautifully designed - is some unknown fraction fiction.

Six prerequisites have to be in place before any KPI conversation is worth having:

1. Data modeling that matches the business

Your data schema is the skeleton everything else stands on. Events captured at the wrong granularity (too aggregated, too noisy). Entity relationships that don't reflect how customers actually move through your product. Time-series tables that lose information about state changes. Slowly changing dimensions that aren't modeled, so historical comparisons silently lie. All of these silently corrupt every metric you build on top.

If "active user" is defined three different ways in three different tables - one based on login, one based on a feature event, one based on a session record - no KPI built on "active users" can mean what you think it means. The schema is the first place to look when metrics across teams refuse to reconcile.
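A toy illustration of the problem, assuming a hypothetical event log with three event types: the same raw data yields three different "active user" numbers depending on which definition a team picked.

```python
# Hypothetical event log; the field names and event types are illustrative.
events = [
    {"user": "u1", "type": "login"},
    {"user": "u1", "type": "feature_used"},
    {"user": "u1", "type": "session_start"},
    {"user": "u2", "type": "login"},
    {"user": "u2", "type": "session_start"},
    {"user": "u3", "type": "login"},
    {"user": "u3", "type": "feature_used"},
    {"user": "u3", "type": "session_start"},
    {"user": "u4", "type": "session_start"},
]


def active_users(event_log: list[dict], event_type: str) -> set:
    """Distinct users with at least one event of the given type."""
    return {e["user"] for e in event_log if e["type"] == event_type}


definitions = {
    "login_based": active_users(events, "login"),
    "feature_based": active_users(events, "feature_used"),
    "session_based": active_users(events, "session_start"),
}

counts = {name: len(users) for name, users in definitions.items()}
# Users that every definition agrees are active.
agreed = set.intersection(*definitions.values())
```

Here the three definitions report three, two, and four active users from the same nine events, and only two users are active under all of them. Running this kind of comparison on real tables is a quick way to find out how far apart the teams' numbers actually are.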

2. Instrumentation that captures the right events at the right time

Most data quality problems are not analysis problems. They are collection problems. Events fired client-side that fail silently when ad-blockers or privacy tools run. Server-side events that get dropped during traffic spikes because the pipeline can't backfill. Critical user actions that nobody thought to track until eight months later. Schema changes that break historical comparisons. The KPI dashboard looks fine. The data underneath has holes you cannot see.

Investing in instrumentation - the unglamorous tracking layer - is one of the highest-leverage things a growing data team can do. It rarely gets prioritized, because the work doesn't produce a dashboard. It just produces honest data.

3. A single source of truth (or an explicit map of which sources count)

When Marketing's CRM, Product's analytics platform, Finance's revenue system, and Customer Success's tool all show different numbers for "customers who churned last quarter," every KPI conversation devolves into "whose data is right?"

The fix is not always to consolidate to one tool. The fix is to have an explicit, written agreement about which system is authoritative for which metric - and to keep that agreement updated as the stack evolves. Without this, every meeting starts with reconciliation and ends without decisions. This is the same disease as the Departmental Civil War, but at the data layer instead of the KPI layer.
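The written agreement can be as small as a lookup table. The sketch below assumes hypothetical system names and metric names; the point is only that the authoritative source is decided once, in writing, rather than argued in each meeting.

```python
# Hypothetical agreement: one authoritative system per metric.
AUTHORITATIVE_SOURCE = {
    "churned_customers": "finance_revenue_system",
    "active_users": "product_analytics",
    "new_leads": "marketing_crm",
}


def authoritative_value(metric: str, reported: dict) -> float:
    """Return the value from the agreed source, ignoring the other systems."""
    source = AUTHORITATIVE_SOURCE.get(metric)
    if source is None:
        raise KeyError(f"no authoritative source agreed for {metric!r}")
    if source not in reported:
        raise KeyError(f"{source!r} did not report a value for {metric!r}")
    return reported[source]


def disagreement(metric: str, reported: dict) -> dict:
    """Gap between every other system and the authoritative one."""
    truth = authoritative_value(metric, reported)
    source = AUTHORITATIVE_SOURCE[metric]
    return {system: value - truth for system, value in reported.items() if system != source}


# Four systems, four answers for last quarter's churn (illustrative numbers).
last_quarter = {
    "marketing_crm": 140,
    "product_analytics": 118,
    "finance_revenue_system": 125,
    "cs_tool": 131,
}
churn = authoritative_value("churned_customers", last_quarter)
gaps = disagreement("churned_customers", last_quarter)
```

Keeping the gaps visible, rather than hiding the non-authoritative numbers, makes it easier to notice when a system drifts far enough that the agreement itself needs revisiting.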

4. Metric definitions written down and shared

"Active user." "Churn." "Conversion." "Engagement." Every one of these terms means something specific to the team that uses it - and something different to every other team. A data dictionary is unglamorous work. It is also one of the highest-leverage things a growing company can do, because every meeting that begins with "wait, what do you mean by X?" is paid for in senior leadership time.

Most metric disputes are definition disputes wearing a measurement costume. Write the definitions down. Make them findable. Update them when they change. Reference them in every metric review.

5. Data ownership and quality monitoring

Someone needs to be on the hook when the data breaks. A pipeline that silently drops 10% of events for two weeks should not be discovered by accident in a board meeting. Modern data observability tools (Monte Carlo, Bigeye, Anomalo, or even simpler in-house freshness and completeness checks) make this cheap.

The absence of clear ownership is what makes data quality expensive, not the tooling. If you cannot answer "who owns this metric's data quality?" within two seconds, that metric is undefended.
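A simple in-house completeness check, tied to a named owner, might look like the sketch below. The registry, the email address, and the 10% threshold are illustrative assumptions; dedicated observability tools do considerably more than this.

```python
from statistics import mean

# Hypothetical ownership registry: one named human per critical metric.
DATA_QUALITY_OWNER = {"signup_events": "dana@example.com"}


def completeness_check(metric: str, trailing_daily_counts: list[int],
                       today_count: int, max_drop: float = 0.10):
    """Return an alert message when today's volume falls well below the recent average."""
    baseline = mean(trailing_daily_counts)
    drop = (baseline - today_count) / baseline
    if drop < max_drop:
        return None  # volume looks complete enough
    # An unowned metric is reported as such, which is itself the finding.
    owner = DATA_QUALITY_OWNER.get(metric, "UNOWNED")
    return f"{metric}: volume down {drop:.0%} vs trailing average, notify {owner}"


recent = [10000, 10200, 9800, 10000]
quiet_day = completeness_check("signup_events", recent, 9600)
broken_day = completeness_check("signup_events", recent, 8900)
unowned = completeness_check("orders", [100, 100], 50)
```

The `UNOWNED` fallback is deliberate in this sketch: an alert with nobody to route it to surfaces the ownership gap instead of silently dropping the warning.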

6. Governance: how metrics are created, changed, and retired

Without governance, the metric stack accretes. New metrics get added by anyone who wants one. Old metrics never get removed because "someone might still use them." Definitions drift as people leave and new analysts reinvent them with slightly different rules. Six months later, the data team is maintaining 400 metrics, 60 of which contradict each other, and nobody knows which to trust.

Light governance prevents this without becoming bureaucratic: a simple intake process for new metrics (what's the question, who's the owner, what's the definition, what's the kill criterion), and a quarterly review where unused or contradictory metrics get retired. This is the meta-version of Principle 3 (sunset metrics regularly) - applied to the metric system itself, not just individual KPIs.

Where to actually start - one concrete move per prerequisite

Each prerequisite above can feel like its own quarterly project. It does not have to. One concrete starting move per layer, doable in a week each:

Data modeling: pick the single most-disputed metric in your company (the one that means different things to different teams) and write a one-page schema reconciliation: which events count, which entities are involved, which time grain. That document is the data model for that metric, made real.
Instrumentation: add a one-line freshness assertion to the pipeline of your top three metrics. If the metric does not update by N AM daily, the data team gets paged. This catches silent failures before the board meeting does.
Single source of truth: publish a one-page list of "this metric, this source." Nothing else. Five rows. Send it to the leadership team. Watch the arguments surface immediately - that's the work.
Definitions: build a "data dictionary" Google Doc with one row per metric the leadership team uses. Definition, owner, source. Iterate weekly. Done is not the point; the practice of writing them down is.
Ownership: assign a single named human to each critical metric's data quality. Put their name on the dashboard. Notify them when the pipeline alerts.
Governance: add one calendar event: a quarterly thirty-minute review where you ask "which metrics should we kill?" Nothing else. The discipline is in the cadence, not the elaboration.

Six moves. Six weeks. Most teams that try this report that just the act of doing the modeling step honestly surfaces three or four KPIs that were quietly meaningless.
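The freshness assertion from the instrumentation move could look roughly like this. It is a sketch that assumes naive datetimes in a single timezone and uses a raised exception as a stand-in for paging; the function names and the 08:00 deadline are hypothetical.

```python
from datetime import datetime, time


def is_fresh(last_updated: datetime, now: datetime, deadline: time) -> bool:
    """True unless today's deadline has passed without an update today."""
    if now.time() < deadline:
        return True  # too early to judge today's run
    return last_updated.date() == now.date()


def assert_fresh(metric: str, last_updated: datetime, now: datetime, deadline: time) -> None:
    """The one-line check a pipeline would call; raising stands in for paging."""
    if not is_fresh(last_updated, now, deadline):
        raise AssertionError(
            f"{metric} not refreshed by {deadline:%H:%M} "
            f"(last update {last_updated:%Y-%m-%d %H:%M})"
        )


deadline = time(8, 0)
now = datetime(2026, 3, 10, 9, 0)

# Updated this morning: passes silently.
assert_fresh("monthly_active_users", datetime(2026, 3, 10, 6, 0), now, deadline)

# Last updated yesterday and the deadline has passed: this would raise.
stale = not is_fresh(datetime(2026, 3, 9, 6, 0), now, deadline)
```

In practice `now` would come from the scheduler and `last_updated` from the warehouse table's load timestamp, both of which are environment-specific and omitted here.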

Metric versioning: the discipline most data teams skip

Every metric has a version, whether you track it or not. The calculation that produced "monthly active users" in March 2024 is not the same calculation that produces "monthly active users" in March 2026. The product changed. The event taxonomy changed. The marketing team renamed a campaign. A junior analyst quietly fixed what they thought was a bug.

The discipline is to treat metric definitions like code. Each metric has a version, a changelog, and a defined behavior. When the definition changes, the version bumps. The chart shows v2 going forward, with a clear marker where the v1-to-v2 transition happened. Historical comparisons across the version boundary are flagged, not silently broken.

Without this, you get the worst kind of organizational lie: a chart that looks continuous but is actually three different metrics stitched together. Most leadership teams have at least one of these on their dashboard right now. The fix is semver applied to measurement: metric_name v1.0, metric_name v1.1, with the rule that any definition change bumps the version and is documented.
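A minimal sketch of what treating metric definitions like code could look like. The class names, the two example definitions, and the dates are hypothetical; the idea is only that every definition change is recorded, and comparisons across a version boundary can be flagged automatically.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class MetricVersion:
    version: str          # e.g. "1.0", "1.1"
    effective_from: date  # first day this definition applies
    definition: str
    change_note: str


@dataclass
class Metric:
    name: str
    versions: list = field(default_factory=list)

    def bump(self, version: str, effective_from: date,
             definition: str, change_note: str) -> None:
        """Record a definition change; versions must be added in date order."""
        if self.versions and effective_from <= self.versions[-1].effective_from:
            raise ValueError("new version must take effect after the previous one")
        self.versions.append(MetricVersion(version, effective_from, definition, change_note))

    def version_on(self, day: date) -> MetricVersion:
        """The definition that was in force on a given day."""
        in_force = [v for v in self.versions if v.effective_from <= day]
        if not in_force:
            raise LookupError(f"{self.name} had no definition on {day}")
        return in_force[-1]

    def comparable(self, day_a: date, day_b: date) -> bool:
        """False when the two days fall on different sides of a version boundary."""
        return self.version_on(day_a) == self.version_on(day_b)


mau = Metric("monthly_active_users")
mau.bump("1.0", date(2024, 1, 1),
         "distinct users with a login in the trailing 30 days",
         "initial definition")
mau.bump("1.1", date(2025, 6, 1),
         "distinct users with a core feature event in the trailing 30 days",
         "login alone no longer counts as activity")

same_metric = mau.comparable(date(2024, 3, 1), date(2026, 3, 1))
```

A charting layer could call `comparable` before drawing a year-over-year line and add the transition marker whenever it returns `False`.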

Ownership in detail - a RACI for KPIs

"Nobody owns the data layer" is the most common diagnosis in this work. The fix is to name the roles explicitly, even if one person plays multiple of them in a smaller company. The four roles that have to exist for any KPI:

Role	What they own	Common real title
Metric Owner	Why the metric exists, what decision it informs, when it should be retired	Functional VP or director (CRO, CTO, COO)
Data Steward	The metric's definition, version history, and changes	Analytics lead or senior analyst
Pipeline Owner	The technical pipeline that produces the metric; freshness and integrity	Data engineer or platform engineer
Governance Owner	The framework itself; intake of new metrics, sunset cadence, cross-metric coherence	COO, Head of Operations, or Chief of Staff

In a 30-person company, one person may play all four roles. In a 500-person company, they may be four different humans. What matters is that each role is explicitly assigned. A KPI without a Metric Owner is a candidate for sunset. A KPI without a Data Steward will silently drift. A KPI without a Pipeline Owner will break on a Tuesday morning. A KPI without a Governance Owner will multiply.
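The role check is easy to automate once assignments are written down. In the sketch below the role keys, the KPI names, and the assignees are hypothetical; the risk text paraphrases the consequences listed above.

```python
# The four roles from the table, each mapped to the risk of leaving it unassigned.
REQUIRED_ROLES = {
    "metric_owner": "candidate for sunset",
    "data_steward": "definition will silently drift",
    "pipeline_owner": "pipeline breakage will go unowned",
    "governance_owner": "metrics will multiply unchecked",
}


def role_gaps(assignments: dict) -> dict:
    """Return the unassigned roles for one KPI, with the risk each gap carries."""
    return {role: risk for role, risk in REQUIRED_ROLES.items() if not assignments.get(role)}


# Hypothetical KPI register; one person may hold several roles.
kpis = {
    "net_revenue_retention": {
        "metric_owner": "CRO",
        "data_steward": "analytics lead",
        "pipeline_owner": "data engineer",
        "governance_owner": "COO",
    },
    "weekly_active_teams": {
        "metric_owner": "CPO",
        "pipeline_owner": "data engineer",
    },
}

audit = {name: role_gaps(roles) for name, roles in kpis.items()}
```

An empty result means all four roles are named for that KPI. Anything else is a short, specific list of who still needs to be assigned.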

Sizing the framework: how this changes by company stage

The discipline above is the same at every size. The implementation is not. Here is what changes:

Under 20 employees: 3 to 5 KPIs total, one founder is all four owner-roles, the "data foundation" is a single Google Sheet with written definitions. No BI tool. The goal is to know what you are tracking and why, not to operationalize.
20 to 100 employees: 5 to 8 KPIs across the Three Pillars, separate Metric Owner and Data Steward roles emerge (often the COO and an analyst), a light data warehouse is in place (for example BigQuery's free tier or a small Snowflake account), and the data dictionary lives in a Notion or Google Doc. Quarterly sunset reviews start here.
100 to 500 employees: 8 to 12 KPIs, distinct Pipeline Owner role (a data engineer), formal data warehouse with documented schema, eval suite for top metrics, dedicated weekly KPI review meeting. This is where Goodhart-trap risk increases sharply because compensation gets tied to KPIs.
500 to 2000 employees: 10 to 15 KPIs at the top level (departments have their own), real Analytics Engineering function, data observability tooling (Monte Carlo, Anomalo, Bigeye), formal metric versioning, Governance Owner role becomes a full-time responsibility for someone.
2000+ employees: measurement is its own function. Data Mesh patterns emerge. The KPI stack at this size requires a dedicated platform team. The framework still applies, but the cost of getting it wrong is now measured in tens of millions of dollars and years of strategic drift.

The single most common mistake: companies in the 100 to 500 band try to run tooling built for the 500 to 2000 stage because they hired a Head of Data who insists on it. They get the dashboards and skip the discipline. Match the implementation to the stage.

The pattern across all six

None of this is technically hard. The blockers are organizational. Nobody owns the data layer. Nobody has time to write the metric definitions down. Nobody wants to be the person who tells the VP that their favorite KPI is built on broken data.

This is why most KPI projects fail before they start. Leadership picks beautiful KPIs. A BI developer is hired. Dashboards get built. Six months later, the dashboards exist, the meetings happen, and the decisions get made - on data that is some unknown fraction wrong. Goodhart's Law gets even worse here: the people whose careers depend on the metric have every incentive to ignore the fact that the metric is broken.

The discipline is to build the foundation before you build the floor. Get the data model right. Instrument the events. Write down the definitions. Assign ownership. Implement light governance. Only then start talking about which KPIs to optimize.

It is unsexy work. It is also the single highest-leverage investment most growing companies underweight - and it is exactly what an honest assessment will surface as the first thing to fix.

The Three Roles of KPIs: Outcome, Execution, Foundation

So far the article has treated KPIs as one category. In practice, healthy organizations operate three distinct categories of metric - Outcome, Execution, and Foundation. Each serves a different role. Each feeds the next. The failure mode in most companies is to overweight one category (usually Outcome) and underweight or skip entirely the others (usually Foundation).

The names themselves teach the hierarchy: the foundation holds the building up, execution is the construction work, the outcome is what stands at the end. The right order to invest in them is bottom-up. The order most companies actually invest in is top-down - and that gap is where most KPI architecture quietly fails.

Figure 1: The Three Pillars hierarchy. Foundation enables Execution enables Outcome. Most organizations invest top-down (Outcome first, Foundation never).

Outcome KPIs (also called Business KPIs)

What they measure: Growth and profitability. Revenue, gross margin, customer acquisition cost, lifetime value, retention, market share, fundraise milestones, runway. The metrics the board cares about.

What they're for: Telling you whether the business is winning. Lagging indicators of everything else.

Failure mode: Companies that measure only Outcome KPIs treat their organization as a black box. The numbers move - up or down - and nobody knows why. By the time the Outcome KPI shows a problem, the underlying cause has been happening for months.

Execution KPIs (also called Process KPIs)

What they measure: Effective resource management and quality of output. Cycle time. Defect rate. Onboarding completion rate. Time-to-resolution. Throughput per engineer. Cost per ticket. Pipeline velocity. Quality scores tied to deliverables.

What they're for: Telling you whether the work is being done well, with the resources you have, at the quality bar you set. Mid-stream indicators that show up weeks or months before Outcome KPIs respond.

Failure mode: Companies that measure only Execution KPIs become operationally efficient at producing the wrong thing. Or worse, they create the Departmental Civil Wars described earlier - because execution metrics in isolation reward local optimization at the cost of system performance. A team can hit every Execution KPI while the customer experience quietly collapses.

Foundation KPIs (also called Culture KPIs)

What they measure: The conditions that make innovation, sound decision-making, authenticity, professionalism, and personal growth possible. This is the category most companies skip entirely - either because they don't know how to measure it, or because they assume culture is what you do when there's spare time.

Behavioral examples (not survey-based):

Innovation velocity: Time from an internal idea being raised to a small experiment being run. Number of "we tried it, it didn't work, here's what we learned" reviews per quarter.
Decision quality signals: Percentage of strategic decisions with a documented rationale findable six months later. Frequency of cross-functional involvement in major decisions. Explicit retrospectives on past decisions that turned out wrong.
Authenticity and professionalism: Cadence of one-on-ones being held (not just scheduled). Average time between an issue being raised and being addressed in writing. Public disagreement frequency in leadership forums - a healthy team disagrees in the open.
Growth: Internal mobility rate. Time spent on learning per quarter. Percentage of senior roles filled from internal promotion.

What they're for: Leading indicators of everything else. Foundation metrics predict Execution metrics, which predict Outcome metrics - usually on a multi-quarter lag.

Failure mode: Companies that ignore Foundation KPIs cannot explain why their Execution and Outcome metrics are slowly degrading. The dashboard looks reasonable. The culture has quietly collapsed. By the time anyone notices, the senior talent is gone, decisions have stopped getting documented, and the org is running on tribal knowledge that walks out the door every Friday.

The hierarchy: Foundation enables Execution enables Outcome

The names themselves carry the argument. Foundation is what holds the building up. Execution is how the work happens day to day. Outcome is what shows on the scorecard at the end.

Most organizations invest in the opposite order: Outcome KPIs first (the board demands them), Execution KPIs second (operations needs them), Foundation KPIs not at all (nobody owns them).

The compounding logic runs the other way - bottom-up:

A team with weak Foundation (no documented decisions, no psychological safety, no innovation velocity) will struggle to produce stable Execution metrics.
A team with weak Execution metrics will produce Outcome results that are either lucky or unsustainable.
A team with all three layers healthy will compound. The bottom predicts the middle, which predicts the top, on a multi-quarter lag.

The discipline is to measure across the hierarchy, not just at the top. Each layer enables the layer above. Each layer needs to be coupled to the others (see Principle 1: never measure in isolation). A Foundation KPI that hurts an Execution KPI - or an Execution KPI that hurts an Outcome KPI - is a coupling failure, not a measurement success.

The smallest meaningful KPI stack a growing company should run is one or two metrics in each layer, coupled so that movement in one is visible to the others. Three to six metrics total. Not 47. The 47-metric dashboard is what happens when companies confuse "more measurement" with "better measurement" - and skip the Foundation layer entirely because they don't know how to measure it.
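
The "one or two metrics in each layer, coupled" rule is simple enough to check mechanically. The following is a minimal sketch under stated assumptions: the `Kpi` structure, the `coupled_to` field, and the example KPI names are all hypothetical.

```python
from dataclasses import dataclass, field

LAYERS = ("foundation", "execution", "outcome")


@dataclass
class Kpi:
    name: str
    layer: str
    coupled_to: list[str] = field(default_factory=list)  # names of paired KPIs


def stack_problems(stack: list[Kpi]) -> list[str]:
    """Return human-readable problems; an empty list means the stack passes."""
    problems = []
    names = {k.name for k in stack}
    # Rule 1: every layer carries one or two KPIs.
    for layer in LAYERS:
        count = sum(1 for k in stack if k.layer == layer)
        if not 1 <= count <= 2:
            problems.append(f"{layer}: expected 1-2 KPIs, found {count}")
    # Rule 2: every KPI is coupled to at least one other KPI in the stack.
    for k in stack:
        if k.layer not in LAYERS:
            problems.append(f"{k.name}: unknown layer {k.layer!r}")
        if not any(c in names for c in k.coupled_to):
            problems.append(f"{k.name}: not coupled to any other KPI in the stack")
    return problems


# Hypothetical stack: three coupled KPIs plus one orphan.
stack = [
    Kpi("decision_doc_coverage", "foundation", ["cycle_time"]),
    Kpi("cycle_time", "execution", ["decision_doc_coverage", "net_revenue_retention"]),
    Kpi("net_revenue_retention", "outcome", ["cycle_time"]),
    Kpi("signups", "outcome"),
]
for problem in stack_problems(stack):
    print(problem)
```

In this example only `signups` is reported, because it has no counter-metric. A check like this could run whenever someone proposes adding a metric.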

What the System Actually Looks Like, End to End

The Three Pillars frame answers what to measure. The next question every operator asks is: how is this actually wired together? The honest answer: most companies already own 80% of the stack. They have the warehouse, the BI tool, the analytics platform, the CRM. What they do not have is the layer between the warehouse and the dashboard - the part that turns raw signal into governed metrics, narrates the swings in plain English, and retires what no longer matters. That layer is where the discipline lives, and it is what this section is about.

Reference Architecture - End to End

Four columns. Data flows left to right. The loop runs back. The discipline lives in column three.

01 · Sources
Where signal lives

Code & tickets: GitHub · Jira · Linear
Product analytics: Mixpanel · Amplitude
CRM & sales: Salesforce · HubSpot
Finance: NetSuite · QBO · Stripe
Customer feedback: Support tickets · surveys

02 · Ingest + Store
One source of truth

ETL / ELT pipelines: Fivetran · Airbyte · custom
Data warehouse: Snowflake · BigQuery · Postgres
Semantic layer: dbt · MetricFlow · Cube
Metric registry: definitions · owners · versions
Pipeline observability: Monte Carlo · custom checks

03 · Agent Layer
The discipline runs here

Change Detection: commits · orgs · behavior shifts
KPI Definition: drafts · versions · approvals
Data Pipeline: freshness · schema · alerts
Calculation + Anomaly: LLM-narrated swings
Reporting + Lifecycle: weekly brief · retire stale

04 · Endpoints
Where humans decide

Leadership weekly: 3 sentences per KPI · Slack/email
Live dashboard: Looker · Metabase · Hex
Real-time alerts: Slack · PagerDuty · webhooks
API / MCP endpoint: downstream tools · copilots
Quarterly review pack: board · investors · ops

SOURCES → INGEST + STORE → AGENT LAYER → ENDPOINTS ··· LOOP BACK →

Figure 2: A reference architecture for a healthy KPI system. Columns one, two, and four are mostly commodity tooling that most companies already own. Column three - the agent layer, working from the metric registry in column two - is where the discipline lives, and it is exactly the part most companies do not have.

How to read this diagram

Column 1 - Sources. Where raw signal lives in the business: code commits and tickets (the system of work), product analytics (the system of usage), CRM (the system of revenue), finance (the system of money), and customer feedback (the system of voice). Most companies have all five. They are rarely connected.

Column 2 - Ingest and store. The pipes and the warehouse. ETL/ELT pulls from sources. The warehouse becomes the one place anyone is allowed to argue from. The semantic layer (dbt, MetricFlow, Cube) is where a "definition" stops being tribal knowledge and becomes a versioned, owned artifact. The metric registry is the most important single component in this column and is usually missing entirely: a flat table that lists every metric, its current definition, its owner, its version history, and its retirement date.
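
To make the metric registry concrete, here is a minimal sketch of one registry row with its version history. The class names, fields, and the example metric are assumptions for illustration. In practice this would more likely live as a warehouse table or alongside the semantic layer than as Python objects.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class MetricVersion:
    version: int
    definition: str
    effective_from: date


@dataclass
class RegistryEntry:
    name: str
    owner: str
    versions: list[MetricVersion]
    retire_on: Optional[date] = None  # None means no retirement date set yet

    @property
    def current(self) -> MetricVersion:
        """The highest-numbered version is the definition in force."""
        return max(self.versions, key=lambda v: v.version)

    def redefine(self, definition: str, effective_from: date) -> None:
        """Append a new version instead of overwriting the old definition."""
        self.versions.append(
            MetricVersion(self.current.version + 1, definition, effective_from)
        )

    def is_active(self, today: date) -> bool:
        return self.retire_on is None or today < self.retire_on


# Hypothetical entry: the definition changes, and the history is preserved.
entry = RegistryEntry(
    name="active_customers",
    owner="head_of_product",
    versions=[
        MetricVersion(1, "Accounts with a login in the last 30 days", date(2023, 1, 1))
    ],
)
entry.redefine("Accounts with a billable action in the last 30 days", date(2024, 4, 1))
print(entry.current.version, "-", entry.current.definition)
print(len(entry.versions), "versions on record")
```

The design choice that matters is append-only versioning: when a definition changes, the old one stays findable, which is what lets someone later explain why the number moved.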

Column 3 - Agent layer. The discipline. Six small, narrow-purpose agents that run continuously and feed humans only when humans are needed. Stage 0 (Change Detection) watches code commits, org changes, and customer behavior shifts and flags when the world the KPI was designed for has changed. Stage 1 (KPI Definition) drafts and versions definitions; humans approve. Stage 2 (Data Pipeline) watches freshness and schema and alerts the owner the moment collection breaks. Stage 3 (Calculation + Anomaly) runs the math, detects swings against historical ranges, and uses an LLM to narrate why rather than just plot a chart. Stage 4 (Reporting) builds the leadership weekly - three sentences per KPI: where it is, what moved it, what to decide. Stage 5 (Lifecycle) runs the quarterly review and flags metrics no one acts on, definitions that have drifted, and KPIs that are now measuring a world that no longer exists.
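
As one example of how narrow these agents can be, the freshness half of Stage 2 reduces to comparing each source's last load time against a tolerated age. This is a sketch only: the source names, the thresholds, and the way load timestamps are obtained are all assumptions.

```python
from datetime import datetime, timedelta, timezone


def stale_sources(last_loaded, max_age, now=None):
    """Return {source: age} for sources whose last load is older than allowed.

    last_loaded: dict of source name -> timezone-aware datetime of the last load
    max_age:     dict of source name -> timedelta of tolerated staleness
    """
    now = now or datetime.now(timezone.utc)
    stale = {}
    for source, loaded_at in last_loaded.items():
        age = now - loaded_at
        if age > max_age[source]:
            stale[source] = age
    return stale


# Hypothetical snapshot taken at 09:00 UTC on a Monday.
now = datetime(2024, 6, 3, 9, 0, tzinfo=timezone.utc)
last_loaded = {
    "crm": datetime(2024, 6, 3, 8, 30, tzinfo=timezone.utc),
    "billing": datetime(2024, 6, 1, 23, 0, tzinfo=timezone.utc),
}
max_age = {"crm": timedelta(hours=1), "billing": timedelta(hours=24)}

for source, age in stale_sources(last_loaded, max_age, now).items():
    print(f"ALERT owner of {source}: last load was {age} ago")
```

Here only `billing` is flagged, since its last load is 34 hours old against a 24-hour tolerance. Schema checks and owner routing would sit alongside this in a real pipeline.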

Column 4 - Endpoints. Where humans actually consume the output. The leadership weekly is the highest-signal endpoint - a short Slack or email digest with three sentences per metric. The live dashboard becomes a forensic tool for after the alert, not the primary attention sink. Real-time alerts go where the owner already lives (Slack, PagerDuty). The API endpoint - increasingly an MCP endpoint - lets downstream tools and copilots query metrics with their governed definitions instead of inventing their own. The quarterly review pack feeds the board.

The loop runs back: yesterday's leadership weekly identified that retention is degrading; Stage 0 picks up a recent change in onboarding flow; Stage 1 drafts a new Onboarding-Quality Foundation KPI; humans approve; the pipeline now collects it; Stage 5 retires an unused metric to make room. This is what a healthy measurement system looks like when it is running, not when it is being explained.

How This Relates to OKRs and Other Goal-Setting Frameworks

A common question at this point: "We already run OKRs. Do I throw them out?"

No. The Three Pillars are a structural framing. OKRs are a quarterly-rhythm framing. The two compose.

OKRs answer two questions: "What are we trying to accomplish this quarter?" (the Objective) and "How will we know we did it?" (the Key Results). They are designed for ambition, alignment, and time-boxing. They are at their best when ambitious, slightly uncomfortable, and small in number.

The Three Pillars answer a different question: "Of the metrics we look at, which type is this, and how does it relate to the others?" They are designed for architectural health, not quarterly cadence.

The two work together like this:

Your Outcome KPIs are usually your headline OKR Key Results - the ones the board sees. Revenue growth, NRR, CAC payback.
Your Execution KPIs are often the supporting Key Results within an Objective - the operational measures that tell you whether the work that produces the Outcome is being done well.
Your Foundation KPIs are the ones that almost never appear in an OKR because OKRs are quarterly and Foundation moves on multi-quarter timescales. Innovation velocity, internal mobility, decision documentation coverage - you watch them across the year, not within a quarter.

The same principle applies to SMART goals, the Balanced Scorecard, and the North Star Metric. Each one is a framework for part of the measurement system. The Three Pillars frame the whole system. The combination is what gives you both the strategic ambition (OKRs) and the structural health (Three Pillars).

The mistake to avoid: treating the Three Pillars as a replacement for OKRs. The opposite mistake: treating OKRs as if they cover every type of metric. They don't, by design.

Principles for Healthier KPI Architecture

Six principles, drawn from the academic literature on incentive design and from the lived experience of building measurement inside regulated fintech:

1. Couple metrics across functions; never measure in isolation. If you give Risk a "minimize risk" KPI, give them a coupled "approve N qualified onboardings" KPI. If you give Product a "maximize onboarding" KPI, couple it with "of those onboardings, X% must pass risk review and Y% must be active at day 90." A KPI without a counter-KPI is a recipe for Goodhart's Law.

2. Prefer behavior over surveys, by default. Default to data the user produces by their own actions. Reserve surveys for questions behavior cannot answer (e.g., why did you leave?). Behavioral data is cheaper, more honest, more timely, and harder to game.

3. Sunset metrics on a regular cadence. A KPI that has been in the leadership dashboard for two years without changing is probably either fully solved (delete it) or unchangeable (also delete it). Standing metrics absorb attention from the changing ones that actually matter.

4. Always pair a "go" KPI with a "stop" KPI. Growth metrics (go) without quality or risk metrics (stop) produce Wells Fargo. Stop metrics without go metrics produce risk-averse paralysis. Pair them.

5. Measure the cost of measurement. Track how much engineering, ops, and senior-leadership time the measurement stack itself consumes. If the cost of measuring exceeds the marginal value of the metric, you have a measurement-stack problem - and the solution is fewer KPIs, not more.

6. Replace dashboard-watching with proactive alerting. The previous generation of measurement assumed a BI developer would build dashboards and humans would watch them in weekly review meetings. In practice: nobody watches the dashboard often enough, the data is stale by the time anyone reviews it, and the cost of paying senior people to stare at numbers is enormous.

The new model is proactive monitoring. Define the thresholds and patterns. Let the system alert you when something unusual happens - good or bad. A spike in churn? Alert. A sudden drop in onboarding completion? Alert. An unexpected surge in conversion from a specific source? Alert (worth investigating - you may have stumbled into a goldmine).
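
A minimal sketch of "alert on anything unusual, good or bad" is a two-sided check of the latest reading against recent history. The z-score rule and the threshold of 3 are assumptions chosen for illustration. Real platforms typically use more robust methods that account for seasonality and trend.

```python
from statistics import mean, stdev


def check_swing(name, history, latest, z_threshold=3.0):
    """Return an alert string if `latest` is unusual versus `history`, else None."""
    if len(history) < 2:
        return None  # not enough history to define "usual"
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return None if latest == mu else f"{name}: moved off a flat baseline of {mu}"
    z = (latest - mu) / sigma
    if abs(z) < z_threshold:
        return None
    direction = "above" if z > 0 else "below"
    return (
        f"{name}: {latest} is {abs(z):.1f} standard deviations "
        f"{direction} the recent mean of {mu:.2f}"
    )


# Hypothetical weekly readings, in percent.
readings = {
    "churn": ([2.1, 1.9, 2.0, 2.2, 2.0, 1.8], 3.4),                    # spike
    "onboarding_completion": ([71.0, 69.5, 70.2, 70.8, 69.9, 70.6], 58.0),  # drop
    "conversion": ([4.0, 4.2, 3.9, 4.1, 4.0, 3.8], 4.1),               # normal
}
for name, (history, latest) in readings.items():
    alert = check_swing(name, history, latest)
    if alert:
        print("ALERT", alert)
```

The check fires in both directions on purpose: the churn spike and the onboarding drop are problems to fix, and an unexpected surge would be flagged the same way as something to investigate.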

Your teams go back to doing what they were hired to do: building. The system does the watching. It is cheaper, faster, and works 24/7 without exhausting anyone. The dashboard becomes a forensic tool you visit after the alert, not a constant attention sink.

This shift is also philosophical: the goal was never to monitor. The goal was always to grow. Proactive alerts return your attention to the thing that actually matters - building the product, serving the customer, scaling the org - while the measurement layer takes care of itself. The tools to do this are now cheap and accessible: anomaly detection in modern analytics platforms (Datadog, Grafana, Sisense, ThoughtSpot), threshold alerts via Slack or email, AI-powered pattern detection. Most companies under 500 employees can stand up effective proactive monitoring without hiring a single BI developer.

Some Investments Don't Need a KPI

A thoughtful KPI architecture has one more property that does not get discussed enough: it knows what it is NOT measuring, and why.

Some of the most important things you can do for your organization have no short-term metric at all. They will not appear in a dashboard. They will not show up in a quarterly review. And they should still be invested in - because the long-term strategic logic is strong, even when the short-term measurement is absent.

The internal knowledge base

Documentation. Decision logs. Onboarding wikis. Architectural records. Lessons captured from past projects. There is no clean short-term KPI for "did we capture this knowledge well this week." The team writing it does not see a number move. The week-over-week dashboard is unchanged.

But the long-term return is enormous:

New hires ramp 2-3x faster when good internal documentation exists
Engineers write better code when prior architectural decisions are findable instead of re-debated
Customers self-serve better when the public documentation is genuinely good (which reduces support load - back to the CSAT argument)
Decisions improve when past reasoning is preserved - especially when past decisions turned out to be wrong. A documented mistake teaches the next decision-maker more than ten correct decisions with no recorded reasoning.
Bottlenecks dissolve when knowledge lives in the system, not in heads. The team stops queuing for the one person who knows. Dependency on specific roles - and on specific individuals inside those roles - drops sharply. The organization stops being held hostage by anyone walking out the door, going on vacation, or moving to a different team.

The strategic upside is bigger than the operational one. A well-built knowledge base lets people maintain alignment with the vision and the goals without anyone having to enforce it. When the reasoning behind a decision is documented and findable, the next person facing a similar decision can locate it, understand it, and make a consistent call - without needing a manager to re-explain, a leader to reapprove, or a cross-functional meeting to re-align. The organization gets passive alignment instead of active enforcement. That difference is the difference between a team that scales and a team that needs more managers every quarter to keep everyone pointed in the same direction.

This is also why the absence of a knowledge base is invisible until it isn't. The bottleneck looks fine while the key person is at their desk. The "alignment" looks fine while the founder is in every meeting. Then one of them goes on parental leave - or quits - and the silence in their absence is suddenly louder than anyone expected. The risk was always there. The metric just never caught it.

The case for investing in knowledge capture cannot be made in KPI terms. It must be made in strategic terms: "we are paying a cost today that we will recover many times over the next two years - in reduced bottlenecks, in autonomous decision-making, in alignment that holds without supervision - even though we cannot show you the line in next quarter's dashboard." The discipline is making the strategic case before you start, so that when the work goes uncelebrated week to week, it still gets done.

Human wellbeing

Time off. Mental-health support. Workload sustainability. Manager training. Genuine flexibility. There is no short-term KPI that captures "did we treat our people well this week." Productivity dashboards often show the opposite of what's actually happening - a stressed team can hit deliverables for months before the breakdown becomes visible.

But the long-term cost of not investing is enormous: burnout-driven attrition, eroded judgment under chronic stress, the slow corrosion of a culture you used to be proud of. The replacement cost of a senior employee is conservatively estimated at 1.5-2x their annual salary. The cost of a bad strategic decision made by an exhausted manager - working a 70-hour week with no margin - can be much higher than that.

These investments are not "metrics-free zones because the work doesn't matter." They are metrics-free zones because the work matters too much to be reduced to a single number. The investment case rests on strategic logic, not on a dashboard line moving.

The discipline of investing without measuring

A healthy KPI architecture has space for both: the things you measure carefully (using the six principles above), and the things you invest in without measuring, because the long-term logic justifies the cost.

The discipline is knowing which is which - and not letting "we can't measure it" become an excuse to not invest in it.

This is where most "metrics-driven cultures" actually fail. By demanding a number for every investment, they systematically underinvest in the things whose returns are real but not measurable in 90-day cycles. The result is a company that looks good on the quarterly dashboard and is quietly hollowing out underneath. The dashboard says everything is fine, right up until the senior engineer quits, the new hire takes eight months to ramp, and the latest critical incident reveals that no one documented the original architectural decision.

The fix is leadership courage: be willing to invest in things you cannot prove with numbers today. Make the strategic case explicit. Revisit it. Defend it when the metrics-driven voices in the room ask for ROI on Tuesday. The compounding return is real - it just doesn't fit into a chart.

The GTO Lesson: Even the "Right" Metric Can Mislead

A note from a different game.

In high-stakes poker, the dominant strategic framework is GTO - Game Theory Optimal. The premise is simple: there is a mathematically correct play at every decision point, and a player who consistently makes that play will, in the long run, be unbeatable. Top pros spend years studying solver outputs. The community optimizes for one metric: decision quality measured against the theoretical optimum.

Here is the uncomfortable fact: most pros studying GTO are still losing money.

The metric (GTO purity) has become disconnected from the goal (profitability). The game is not "play the mathematically correct move." The game is "make money." Those are correlated, not identical. The correlation breaks in ways the framework refuses to admit: wrong table choice, variance and bankroll mismanagement, emotional state, the fact that game selection matters more than action selection. A pro can be technically perfect in every hand and still go broke if they cannot read the table they are sitting at.

The lesson for KPI design is direct: even a well-chosen metric, optimized correctly, can fail to deliver the outcome it was supposed to indicate. If your company is doing "everything right" by the framework and still struggling, the framework is incomplete. The KPI is a proxy for success, and the proxy has drifted from what success actually requires.

This failure mode is uncomfortable because it has no villain. Goodhart's Law produces an obvious one - the metric was gamed, the people who hit it were rewarded, the company lost. The GTO failure mode is quieter: the metric was honored, nobody cheated, the decisions were correct on paper, and the business still underperformed. There is no one to blame. The whole framework was the problem.

Why GTO Is So Hard to Follow in the Real World

There is a deeper reason most GTO students lose money: GTO is almost impossible to execute correctly in a live game. The framework is beautiful on a solver screen at 3am. At a live table - chips moving, a stranger across from you, the dealer waiting - it collapses under conditions humans actually play in: cognitive load (GTO is a library of thousands of decisions, not a strategy), mixed strategies that require true randomization humans cannot generate, bet-sizing precision that exceeds chip denominations, multi-way pots solvers cannot fully solve, missing data at live tables that solvers assume exists, fatigue across an 8-12 hour session, and tilt when a bad beat hits. The framework assumes a player with no limbic responses, with perfect memory, and with access to data that does not exist at the table.

The business parallel is direct, and harder than most leadership teams want to admit.

A well-designed KPI framework (Three Pillars, six principles, properly coupled, sourced from a clean data foundation) is exactly as hard to execute in a real organization as GTO is at a live table. The cognitive load of remembering which metric is coupled to which, who owns each one, when to review, what to deprecate, how to act on the signal - it exceeds what tired managers can do well at hour seven of a long week. People get political. Strategies drift. The "framework" exists in a slide deck. The day-to-day decisions are made by tired humans reacting to the loudest signal in the room.

The answer is not "design a better framework." The answer is to design a framework simple enough that tired humans can actually use it, supported by a data foundation and proactive alerting that let the system do what the humans cannot:

Six metrics the team understands beat 47 nobody has time to interpret.
Alerts that fire when something changes beat dashboards nobody opens.
One owner per metric beats committee-by-default.
A written definition findable in two seconds beats a tribal one that drifts every quarter.

If your KPI architecture is theoretically beautiful but cannot survive contact with a tired human at 4pm on a Tuesday after a hard meeting, it is a solver output. Not a strategy.

The GTO Floor, the Exploitative Ceiling

This critique can read as a rejection of GTO. It is not. The most profitable poker players use GTO as a baseline, then deliberately deviate from it based on what they read at the specific table they are sitting at. The term is exploitative play: a GTO baseline means opponents cannot systematically exploit you; exploitative deviations extract maximum value from the imperfect opponents you actually face. GTO is the floor. Exploitative play is the ceiling.

The business translation: your KPI architecture - Outcome / Execution / Foundation, coupled, governed - is your GTO baseline. It prevents you from being obviously wrong about what to measure. But the framework alone will not win the game. The ceiling comes from exploitative deviations: a competitor bleeding senior talent makes Foundation retention metrics disproportionately important; an incoming regulatory change makes your risk metrics need to be sharper than peers' before the regulator arrives; a new acquisition channel producing unusually high-LTV customers deserves its own metric even if it did not make the "official" list last quarter.

The discipline is the same as in poker: strong baseline, deliberate deviations. The framework keeps you from being stupid. The exploitative thinking makes you better than your competitors. A company that uses only the framework will be unexploitable but will not maximize value. A company that uses only exploitative thinking will catch local opportunities but lose to its own inconsistency. The skill is holding both at once.

This is the difference between best practices and best outcomes. Best practices give you a defensible floor. Best outcomes require knowing when, where, and why to depart from them. The companies that compound the longest respect the baseline and read their specific table.

A healthy organization treats KPIs the way a sophisticated poker player treats GTO: as a useful tool, not a destination. Use the framework. Watch the numbers. Deviate when the table tells you to. And always ask the prior question - are we playing the right game? If the numbers are healthy and the business is still losing, the answer is no. Step away from the dashboard. Look at the table.

KPI Architecture Is Risk Architecture

If there is one argument this article makes, it is this: in any growing company, the measurement stack is one of the largest sources of invisible risk - and almost no one is measuring it.

Goodhart's Law tells us that any KPI that becomes a target will be gamed. Conway's Law, applied to measurement, tells us that isolated departmental metrics will produce isolated departmental wars. The Wells Fargo and PPI scandals tell us that the cost of getting this wrong, at scale, is measured in billions and decades. The CSAT literature tells us that some of our most-watched metrics are also our most-fictional.

The work of fixing this is not "more measurement." It is better measurement architecture. Fewer KPIs, more coupled. Fewer surveys, more behavior. Fewer dashboards, more conversations with the people doing the work. Fewer hero metrics, more system thinking.

This is the same shape of work as every other piece of organizational risk management: see what is invisible, manage the risk before it becomes expensive, and build systems that bend without breaking. In the new era, those systems don't just measure - they alert. The dashboard becomes the forensic record; the alert becomes the operating mode. Teams stop watching numbers and start moving them.

Most companies treat their KPI stack as a fixed feature of management orthodoxy. It is not. It is a designed artifact - and like every designed artifact, it can be redesigned.

It usually should be.

The Economics: What This Returns, What Skipping It Costs

The argument so far has been structural. The economic argument is worth making separately, because most CFOs and COOs reading this will want to know: if we invest in fixing the measurement stack, what does that produce - and what does it cost us if we do nothing?

The cost of skipping this

Wells Fargo lost over $3 billion in fines. UK banks lost over £30 billion on PPI. Those are the cinematic failure modes. They are useful as warnings but unhelpful as estimates, because most companies will never reach that scale of failure - the cost is paid in quieter, more continuous ways:

Misallocated CAC. A growth team optimizing the wrong Outcome KPI spends 6 to 18 months acquiring customers the business is not built to retain. For a Series B company with $5M in marketing spend, this typically wastes $1-2M before anyone notices.
Engineering attrition. Building features tied to a metric the team knows is gamed is one of the most common reasons senior engineers leave growth-stage companies. Replacement cost of a senior engineer is 1.5x to 2x annual salary. Each preventable departure tied to misaligned measurement is a $200K to $400K cost.
Strategic drift. A leadership team running on incoherent metrics makes incoherent decisions. Across two to three years, this compounds into a strategy that no longer fits the business. The cost is unmeasurable but real, and shows up as missed fundraises, product-market fit erosion, or acquisition multiples being lower than they should have been.
Time on tilt. Leadership time spent debating numbers that are not real signals (sample size too small, definitions unclear, no statistical filter) is the single most-wasted resource in mid-stage companies. A weekly two-hour metrics meeting attended by ten people, two-thirds of which produces no decision, is $300K to $500K of leadership salary per year burned on theater.
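
The meeting-cost figure is easy to reproduce for your own organization. The sketch below is a back-of-envelope calculation, not a costing model. The hourly rates are hypothetical, and the result is very sensitive to them and to whether preparation and follow-up time are counted, which this version leaves out.

```python
def wasted_meeting_cost(hours_per_week, attendees, loaded_hourly_rate,
                        wasted_fraction, weeks_per_year=52):
    """Annual cost of the share of a recurring meeting that produces no decision."""
    person_hours = hours_per_week * attendees * weeks_per_year
    return person_hours * wasted_fraction * loaded_hourly_rate


# Two-hour weekly meeting, ten attendees, two-thirds producing no decision.
# The rates below are assumptions; substitute your own fully loaded figures.
for rate in (150, 300, 450):
    cost = wasted_meeting_cost(2, 10, rate, 2 / 3)
    print(f"${rate}/hour -> ${cost:,.0f} per year")
```

Running this with your real attendee list and loaded rates is usually more persuasive internally than any generic range.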

None of these failure modes is catastrophic in a single quarter. All of them compound. The bill from skipping this work is rarely visible as a line item; it is visible as a slope that bends the wrong direction over years.

What this work returns

The return on KPI architecture work is hard to quantify in a single number, because it is structural rather than operational. What I have seen in client engagements:

Faster, cheaper decisions. When metrics are honest and coupled, leadership debates collapse from "is this number even real?" (statistical) and "do we agree what it means?" (definitional) to "given that we agree, what do we do?" (strategic). The shift recovers 30 to 50 percent of the time leadership currently spends on measurement-adjacent disagreement.
Fewer rebuilds. Pipelines that get caught by an eval suite take hours to fix. Pipelines that quietly break and surface in board meetings cost weeks of forensic work, plus leadership trust and political capital. Catching breakage early is one of the highest-ROI engineering investments most companies underweight.
Reduced Goodhart exposure. The single most expensive failure mode (KPIs gamed by employees responding rationally to incentives) is structurally prevented by coupled counter-metrics. The expected value of "we did not become a Wells Fargo case study" is high even when the probability of becoming one is hard to pin down.
Talent retention. Senior operators stay at companies where measurement makes sense. They leave companies where the dashboard is theater. The retention effect alone justifies most of the investment.

In rough numbers, for a Series B to Series D company: the work is typically a one-time investment of $40K to $150K in consulting plus 3 to 6 months of internal owner time. The annualized cost reduction in misallocated decisions, prevented attrition, and recovered leadership time typically runs three to ten times that, depending on company size and the depth of dysfunction.

The return that matters most, though, is not financial. It is the leadership team that walks into a Monday review meeting and trusts the numbers. That trust compounds in ways that resist easy quantification, and it is the strongest single argument for doing this work before the next scaling chapter rather than after.

Where to Start: A 30/60/90 Roadmap

If this essay surfaced uncomfortable questions about your own measurement stack, the natural next question is: where do you actually start? Here is a 90-day path that has worked in client engagements - including in companies that started in worse shape than they realized.

Days 1-30: Audit

The first month is about seeing clearly. Most teams have never inventoried their KPI stack as a system; they have just accumulated metrics over years.

Inventory every KPI on every dashboard the leadership team looks at. Include the formal KPIs and the informal "you know, the number we always check" metrics. You will find more than you expected.
Map each one to its pillar: Outcome / Execution / Foundation. You will find the distribution is heavily lopsided toward Outcome, sparse on Execution, and effectively zero on Foundation.
Run the five-question diagnostic from earlier in this article. Be honest. Three or more yes answers mean the measurement stack itself is the risk.
Identify the civil wars. Which Execution KPIs are owned by single teams without coupled counter-metrics? Which incentive structures pay people to optimize one number at the expense of another?
Document what is broken. Not what is missing - that comes later. What you are documenting now is the architecture you have, not the one you wish you had.
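
The inventory and mapping steps above can start life as a plain list before any tooling is involved. In the sketch below, the rows, field names, and KPI names are hypothetical. It counts KPIs per pillar and lists Execution KPIs that have no counter-metric.

```python
from collections import Counter

PILLARS = ("outcome", "execution", "foundation")


def pillar_distribution(inventory):
    """Count KPIs per pillar, including pillars with zero entries."""
    counts = Counter(row["pillar"] for row in inventory)
    return {p: counts.get(p, 0) for p in PILLARS}


def uncoupled_execution_kpis(inventory):
    """Execution KPIs with no counter-metric: candidate 'civil wars'."""
    return [
        row["kpi"]
        for row in inventory
        if row["pillar"] == "execution" and not row.get("counter_kpi")
    ]


# Hypothetical audit inventory.
inventory = [
    {"kpi": "ARR growth", "pillar": "outcome", "counter_kpi": "Net revenue retention"},
    {"kpi": "Net revenue retention", "pillar": "outcome", "counter_kpi": "ARR growth"},
    {"kpi": "New logos", "pillar": "outcome", "counter_kpi": None},
    {"kpi": "Onboardings approved", "pillar": "execution", "counter_kpi": None},
    {"kpi": "Fraud loss rate", "pillar": "execution", "counter_kpi": None},
    {"kpi": "Ticket resolution time", "pillar": "execution", "counter_kpi": "Reopen rate"},
]

print(pillar_distribution(inventory))
print(uncoupled_execution_kpis(inventory))
```

A zero in the Foundation count and a pair of opposing, uncoupled Execution KPIs (here, onboarding approvals and fraud loss) are exactly the findings the day-30 document should name.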

The deliverable at day 30 is a single document: the current state of your measurement architecture, with the risks named.

Days 31-60: Decouple, Couple, Sunset

The second month is about removing damage and starting the foundation.

Pick the three most damaging patterns from the day-30 audit. Probably one Goodhart trap, one departmental civil war, and one zombie KPI that has not moved in two years.
Sunset the zombies. A KPI that has not changed in two years is either solved or unchangeable; either way, delete. Free the attention.
Build coupled counter-metrics for the civil wars. If Risk is optimizing minimum fraud while Product is optimizing maximum onboarding, introduce a Risk-Adjusted Onboarding Rate that both teams share. The principle of pairing every "go" KPI with a "stop" KPI applies directly here.
Start the data foundation work. Pick two of the six prerequisites (data modeling, instrumentation, source of truth, definitions, ownership, governance) and assign owners. You are not finishing this in 30 days. You are starting it visibly so it has organizational momentum.
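
Two of the steps above lend themselves to a short sketch: flagging zombie KPIs and computing a shared coupled metric. Both functions rest on assumptions. The 2% tolerance for "has not moved" is arbitrary, and the formula for a Risk-Adjusted Onboarding Rate is one possible definition chosen for illustration, not a standard one.

```python
def is_zombie(history, tolerance=0.02):
    """True if every reading stays within +/- tolerance (relative) of the first.

    history: chronological readings covering the review window,
             for example two years of quarterly values.
    """
    if len(history) < 2:
        return False  # not enough data to judge
    baseline = history[0]
    if baseline == 0:
        return all(v == 0 for v in history)
    return all(abs(v - baseline) / abs(baseline) <= tolerance for v in history)


def risk_adjusted_onboarding_rate(applications, onboarded, flagged_later):
    """Share of applications that became onboarded accounts with no later fraud flag.

    Assumed definition: it falls when Product onboards accounts that Risk later
    flags, and it also falls when Risk rejects applicants who would have been clean.
    """
    if applications == 0:
        return 0.0
    return (onboarded - flagged_later) / applications


# Hypothetical quarterly readings over two years.
quarterly = {
    "uptime_pct": [99.9, 99.9, 99.9, 99.8, 99.9, 99.9, 99.9, 99.9],
    "onboarding_completion_pct": [61.0, 63.5, 58.0, 66.0, 70.2, 68.9, 72.0, 74.5],
}
print([name for name, values in quarterly.items() if is_zombie(values)])

# Hypothetical policies applied to 1,000 applications each.
policies = {
    "approve almost everyone": (1000, 900, 180),
    "reject anything doubtful": (1000, 500, 5),
    "balanced": (1000, 800, 24),
}
for label, args in policies.items():
    print(f"{label}: {risk_adjusted_onboarding_rate(*args):.3f}")
```

With these numbers the balanced policy scores highest, which is the point of a shared metric: neither team can win it by optimizing only its own side.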

The deliverable at day 60 is: three damaging patterns retired or restructured, and a data foundation work-stream with an owner.

Days 61-90: Install the Discipline

The third month is about making sure the next KPI added to the system does not recreate the same problems.

Adopt the engineering discipline for any new KPI. PRD before you build. Backlog with prioritization. Sprint cycle with shadow mode, beta, and general availability. Eval suite that catches drift. Sunset policy with five trigger conditions. The full version of this discipline is in Ship KPIs Like Features, the companion piece on the engineering process.
Stand up an eval layer for the critical KPIs that remain. Definition drift. Pipeline integrity. Behavioral sanity. The kind of monitoring that catches a broken pipeline before the board meeting does.
Establish the quarterly sunset cadence. Calendar it. Without it, the stack grows monotonically.
Communicate the framework to the leadership team. Not as theory. As the operating discipline for how this company adds, measures, and retires metrics going forward.
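
The three eval categories named above (definition drift, pipeline integrity, behavioral sanity) can each start as a single cheap check. The following is a minimal sketch under stated assumptions: the `KpiSnapshot` fields, the 26-hour freshness limit, and the 50% jump threshold are all hypothetical, and hashing the query text is only one crude way to notice that a definition changed without sign-off.

```python
import hashlib
from dataclasses import dataclass, replace
from datetime import datetime, timedelta


@dataclass(frozen=True)
class KpiSnapshot:
    name: str
    definition_sql: str       # the query currently producing the KPI
    approved_hash: str        # hash of the definition that was signed off
    last_loaded_at: datetime  # when the pipeline last delivered data
    row_count: int            # rows behind the current value
    value: float
    previous_value: float


def definition_hash(sql: str) -> str:
    """Hash the query text, ignoring whitespace and letter case (a crude normalization)."""
    normalized = " ".join(sql.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def run_evals(
    s: KpiSnapshot,
    now: datetime,
    max_age: timedelta = timedelta(hours=26),  # assumed freshness limit
    max_relative_jump: float = 0.5,            # assumed plausibility limit
) -> list[str]:
    """Return a list of failure messages; an empty list means all checks passed."""
    failures = []
    if definition_hash(s.definition_sql) != s.approved_hash:
        failures.append("definition drift: query differs from the approved definition")
    if now - s.last_loaded_at > max_age or s.row_count == 0:
        failures.append("pipeline integrity: data is stale or empty")
    if s.previous_value != 0:
        jump = abs(s.value - s.previous_value) / abs(s.previous_value)
        if jump > max_relative_jump:
            failures.append("behavioral sanity: value moved more than the allowed jump")
    return failures


# Hypothetical KPI and data.
now = datetime(2024, 5, 1, 9, 0)
sql = "SELECT COUNT(*) FROM active_accounts"
healthy = KpiSnapshot(
    name="active_accounts",
    definition_sql=sql,
    approved_hash=definition_hash(sql),
    last_loaded_at=now - timedelta(hours=3),
    row_count=1200,
    value=1010.0,
    previous_value=1000.0,
)
broken = replace(healthy, last_loaded_at=now - timedelta(days=3), value=200.0)
print(run_evals(healthy, now))
print(run_evals(broken, now))
```

Run on a schedule ahead of the reporting cycle, checks like these surface a stale or silently redefined KPI as a failed eval rather than as a surprise in the board deck.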

The deliverable at day 90 is: a measurement architecture you trust, a discipline for adding to it, and an executive team that understands the framework as a shared operating system.

Why 90 Days

This is not an arbitrary timeline. It is the shortest window in which the architectural shift is visible enough to defend against political pressure to revert. Most KPI work gets undone in the first quarter after a leadership change, when the new arrival does not understand the architecture and starts adding their preferred metrics back. A 90-day visible installation, with documented rationale and shared discipline, survives the leadership turn.

It is also the longest you should wait. Audits that take longer than a month stall. Decoupling work that takes longer than a month gets political. Discipline that takes longer than a month to install never installs.

This Is the Work I Do

Operating Architecture is the discipline I practice - aligning the people, systems, and processes that actually run a company - and measurement architecture is one of its largest surfaces. I walk into companies whose dashboards have stopped serving them, run them through the diagnostic above, identify the Goodhart traps and the Conway's Law violations that are quietly eating growth, and help them rebuild the architecture from the data foundation up.

What clients get back is a measurement stack that drives decisions instead of producing theater. Fewer KPIs. More coupled. More honest about what they measure and what they cannot. A discipline for adding to the stack that survives the next leadership change.

If this essay surfaced uncomfortable questions about your own measurement architecture, those are exactly the questions worth a conversation. Here is how to start.

Related Reading

The 10x Rule of Organizational Change - why every risk caught late costs 10x more than the same risk caught early. The economics behind why KPI architecture matters now, not next quarter.
The Operator-Consultant Method - the four-phase methodology behind every Operating Architecture engagement.
Use Cases from the Field - real patterns I've caught in similar organizations.

About the author

May Mor

Operating Architect. I help operators align their people, systems, and processes so growth scales the business instead of breaking it - including the parts of the measurement stack that have quietly become the risk. M.Sc. in AI, 10+ years inside regulated fintech (built credit-and-risk infrastructure at a digital bank, lived this Risk-vs-Product civil war from the inside).

If reading this surfaced uncomfortable questions about your own KPI stack, here's how to get started:

Scale Readiness Assessment: from €5,000 / 6 weeks, includes KPI architecture review

Book a 30-min intro call: talk through your specific KPI situation

Take the free Risk Scan: 5 minutes, surface your hidden risk patterns
