Why 80% of AI Projects Fail & How Staff+ Engineers Can Save Them (HBR Strategy)

Table of contents
- TLDR: Why Your AI Thing Will Probably Fail (And How Staff+ Can Stop It)
- AI won. Sort of.
- AI Isn’t Your IT Department’s Pet Project — It’s a Wild, Probabilistic Beast
- The Algorithm Was Flawless. The Users Ghosted It.
- Don’t Start With Tech: Start With Impact
- Minimum Viable Learning > Minimum Viable Product: Hypothesis-Driven Experimentation
- The Algorithm "Worked." Then It Remapped the World. (Unintended Consequences)
- The Parting Shot: It Fails at the Margins

TLDR: Why Your AI Thing Will Probably Fail (And How Staff+ Can Stop It)
- AI Fails (A Lot): ~80% of AI projects don't die from bad code, but from a thousand cuts at the margins: trust, relevance, and understanding the probabilistic weirdness.
- Staff+ Are The Fixers (Not Just Coders): Your job is less about debugging models, more about debugging expectations. Think system-level therapist for AI.
- Key Battlefronts:
  - AI isn't IT: Manage probabilistic outcomes, not deterministic specs.
  - No Trust = No Users: Build it in from day zero; you can't bolt it on.
  - Impact > Feasibility: Solve real, painful problems, not just chase shiny tech.
  - Learn, Don't Just Ship: Test hypotheses, not just massive features.
  - Audit Like You Mean It: AI remaps ecosystems; monitor the unintended.
- The Real Work: Slower, harder decisions beat viral hype. This HBR podcast nails why.
AI won. Sort of.
AI steamrolled the roadmap, bypassed the process, and slipped through the org chart like a smug ghost. A thousand breathless demos and a mountain of GPUs later, we’re drowning in tools that hallucinate "facts" with unwavering confidence, generate code of… variable utility, and promise a planetary transformation before lunchtime.
But here’s the dirty little secret whispered in stand-ups and buried in post-mortems—the part no one wants on their OKRs: Most AI projects—around 80%, according to some analyses—still spectacularly flame out.
Not because the core math was wrong. Not because the cloud provider sent a bill that could bankrupt a small nation. Not even, usually, because the underlying model wasn’t clever enough.
They die a death of a thousand papercuts. In the yawning chasm between the dazzling hype and the dreary reality of actual usage. Between the triumphant launch announcement and the deafening silence of non-adoption. Between what leadership dreamed the AI would do, and what users, in their infinite wisdom, actually trusted it with (or didn't).
For Staff Engineers, Principal Developers, and technical leads, this isn't news. It's Tuesday. You're living in that chasm. You’re the one tasked with bridging the unbridgeable—connecting executive ambition with engineering alignment, abstract theory with hardened production systems, faint signals with overwhelming operational noise. You become the involuntary friction interpreter, the system-level therapist, and the person who has to gently explain why the "revolutionary AI feature" is currently gathering dust in the digital equivalent of a feature graveyard.
I recently absorbed an HBR On Strategy episode—"The Right Way to Launch an AI Initiative," featuring Iavor Bojinov[^2] (Harvard prof, ex-LinkedIn AI leader)[^1]. It’s a rare gem: no magic tricks, no silver bullets, just a candid walk through the messy, often counter-intuitive reality of making AI work in the real world. (The HBR podcast is the primary source for the discussion that follows).
Here’s what stuck, and why anyone in a Staff+ role should consider it required listening.
- Podcast: HBR On Strategy – "The Right Way to Launch an AI Initiative," Iavor Bojinov interviewed by Curt Nickisch (originally aired as HBR IdeaCast Episode 913, "Making Sure Your AI Initiative Pays Off," May 16, 2023). Searchable on HBR.org and major podcast platforms.
- Guest: Iavor Bojinov, Harvard Business School professor, former LinkedIn AI leader.
- Core Focus: Best practices for ensuring AI project success by navigating common pitfalls related to AI's probabilistic nature, user trust, project selection, experimentation, and real-world impact.
- Key Failure Points Discussed: No value add, low accuracy, bias/unfairness, lack of user trust/adoption.
- Recommended For: Staff+ engineers, PMs, and tech leads grappling with the strategy, development, or scaling of AI features.
AI Isn’t Your IT Department’s Pet Project — It’s a Wild, Probabilistic Beast
"The model didn’t fail. It just… had a different opinion this time."
Traditional IT projects come with an implicit contract: same input, same output, predictable delivery. Generative AI lives in ambiguity. Same input, different output. That’s not a bug — it’s the deal.
With AI, you're not just in a different room from that contract; you're in a different dimension where the laws of physics are suggestions. Its probabilistic nature means the same prompt, the same input data, can yield wildly different outputs.[^3][^4] The initial quality is often a shrug emoji. This isn't a bug; it's a core feature of the uncertainty you've signed up for.
I’ve seen it: a developer meticulously inputs a prompt into a generative model. Result: garbage. They spot a single, almost imperceptible typo. Fix it. Rerun. Entirely different universe of an answer. Better, maybe. But not incrementally so. The whole premise shifted.
As a Staff+ engineer, your primary job isn't to debug the AI's output; it's to debug your organisation's expectations. You need to become a translator, explaining to leadership, to product, to your own teams, that AI is less like a predictable software component and more like an ongoing, occasionally surreal, negotiation with probability. This means architecting for drift, designing for variance, and building operational muscle around managing inherent weirdness.
You can't spec "consistently delightful vibes." But you can design systems that monitor for trust thresholds and output quality. This requires more intensive upfront design for observability and instrumentation, not less. It means fostering a culture that relentlessly asks, "Why on earth did it do THAT?"—and being unnervingly comfortable when the honest answer is, "We have a strong hypothesis and some data, but we don't fully, deterministically know. Here’s how we’re tracking it."
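To make "designing for variance" a little more concrete, here's a minimal sketch (my illustration, not anything prescribed in the podcast): re-run the same prompt a few times and track how much the answers agree, flagging prompts whose stability drops below a baseline. The `generate` stub and the threshold are placeholder assumptions for whatever model client and calibration you actually have.

```python
import random
from difflib import SequenceMatcher
from statistics import mean

def generate(prompt: str) -> str:
    """Stand-in for a real model client -- deliberately noisy so the demo shows variance."""
    fillers = ["roughly", "approximately", "about", "somewhere near"]
    return f"The forecast is {random.choice(fillers)} {random.randint(40, 60)} units."

def output_stability(prompt: str, n_samples: int = 5) -> float:
    """Average pairwise text similarity across repeated generations (1.0 = identical)."""
    outputs = [generate(prompt) for _ in range(n_samples)]
    scores = [
        SequenceMatcher(None, a, b).ratio()
        for i, a in enumerate(outputs)
        for b in outputs[i + 1:]
    ]
    return mean(scores) if scores else 1.0

STABILITY_THRESHOLD = 0.6  # illustrative; calibrate against your own baseline data

if __name__ == "__main__":
    score = output_stability("Summarise next quarter's demand forecast.")
    if score < STABILITY_THRESHOLD:
        # In production this would feed a dashboard or alert, not stdout.
        print(f"High output variance (stability={score:.2f}) -- review before shipping")
    else:
        print(f"Outputs reasonably stable (stability={score:.2f})")
```

The point isn't this particular metric; it's that variance becomes something you measure and discuss, rather than something that ambushes you in a demo.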
The Algorithm Was Flawless. The Users Ghosted It.
Bojinov recounts a classic from his LinkedIn tenure: his team built a technically sophisticated AI product for data analysis, slashing processing time from weeks to days. A home run, technologically speaking. The result? Crickets. Minimal adoption.
Why? Not because the tool was faulty. Because users didn’t trust it. They hadn't been part of the journey. The "how" was a black box. So, the output, however miraculous, felt alien, suspect. And when users don't trust the black box, they'll revert to their trusty, rusty, but understandable spreadsheets every single time. Bojinov’s insight: "If you build it, they will not [necessarily] come."
You can’t A/B test your way to trust after launch. It’s not a feature you can bolt on. Trust has to be woven into the fabric from the very first thread. It’s not just about algorithmic fairness and transparency (though those are table stakes). It's also, crucially, about users trusting the developers and the intent—believing the AI was designed to solve their actual problems, with their input valued.
For Staff+ engineers, this is about more than just elegant architecture. It’s about socio-technical system design. Orchestrating a process where users are involved early and continuously, not as passive test subjects, but as active co-creators in the solution. It's about championing transparency, even when it means admitting, "This thing is guessing, albeit very intelligently. Here are its limitations."
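One simplified way to bake that transparency into the system contract, rather than hoping a UI adds it later, is to make uncertainty part of the response shape itself. This is a sketch of the idea, not anything from Bojinov; the field names and rendering are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class AssistedAnswer:
    """Response shape that carries uncertainty alongside the answer itself."""
    answer: str
    confidence: float                                   # 0.0-1.0, however your system estimates it
    sources: list[str] = field(default_factory=list)    # what the answer was grounded on
    caveats: list[str] = field(default_factory=list)    # known limitations, stated plainly

def render_for_user(result: AssistedAnswer) -> str:
    """The UI decides how loudly to surface uncertainty; here it is always visible."""
    lines = [result.answer, f"(confidence: {result.confidence:.0%})"]
    lines += [f"Note: {caveat}" for caveat in result.caveats]
    return "\n".join(lines)

print(render_for_user(AssistedAnswer(
    answer="Projected churn for Q3 is 4.2%.",
    confidence=0.7,
    sources=["crm_export_2024_06.csv"],
    caveats=["Trained on data through March; the recent pricing change is not reflected."],
)))
```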
Don’t Start With Tech: Start With Impact
AI projects often ignite from a spark of technical feasibility: "Can we fine-tune this new foundation model? Can our infra even handle this?" According to Bojinov, this is ass-backwards.
He urges us to flip the lens: start with Impact. Does this project align with genuine, critical company strategy? If this AI performs its magic flawlessly, will it solve a problem so painful, or unlock an opportunity so significant, that it actually matters to the business or the users? Too many data science teams, Bojinov notes, get seduced by the siren song of the "latest and best" tech, prioritizing novelty over tangible business value. Most organizations, especially those newer to AI, don't need bleeding-edge models to see significant returns.
A crucial Staff+ role here is to be the impact-realist and ethical gatekeeper. Before a single line of PoC code is written, you should be the one asking:
"Does this earn its complexity and the inherent risks of being probabilistic?"
"What are the ethical implications—privacy, fairness, transparency—and have we addressed these before starting, not as an afterthought?" Responsible AI isn't a "bolt-on" fix; trying to address it mid-project is a recipe for costly restarts or, worse, shipping harm.[^5]
Sometimes, the most valuable contribution a Staff+ engineer can make is to build the case for not doing the shiny AI project, and instead, redirecting that energy to a "boring" problem with real, validated user friction. That's leadership.
Minimum Viable Learning > Minimum Viable Product: Hypothesis-Driven Experimentation
The cautionary tale of Etsy's "infinite scroll" is a masterclass. They embarked on a significant UI re-architecture, months of work, to implement it. The user reaction? A collective shrug. Zero discernible impact.
Why? Because, as Bojinov highlights, their single "infinite scroll" experiment was actually a bundle of unverified assumptions:
- More results hypothesis: Do users buy more if they see more products per page?
- Faster results hypothesis: Do users hate pagination delays, and will quicker access to more items drive engagement?
Etsy could have tested these far more cheaply. For "more results," simply change a display parameter. For "faster results" (or the impact of delay), they could have artificially slowed down loading for a segment. Their follow-up simpler experiments showed these individual hypotheses didn't hold as strongly as assumed for their unique marketplace.
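To make the "test it cheaply" point concrete: isolating one hypothesis can be as small as a display-parameter flag plus a boring statistical test. Here's a minimal sketch with invented counts (my illustration, not Etsy's actual analysis):

```python
from math import sqrt, erf

def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z statistic, two-sided p-value) comparing conversion rates of A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Control: 24 results per page. Variant: 48 results per page -- one display parameter, not a rewrite.
z, p = two_proportion_ztest(conv_a=310, n_a=10_000, conv_b=327, n_b=10_000)
print(f"z={z:.2f}, p={p:.3f}")  # a large p-value says "more results" alone isn't moving purchases
```

Either outcome is a win: you've spent a feature flag and a query, not months of re-architecture, to learn whether the assumption holds.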
The lesson for Staff+ is stark: champion an architecture and culture of Minimum Viable Learning. It’s not just about shipping an MVP; it's about designing the smallest possible experiment to validate (or invalidate) the core hypothesis underpinning a feature. If your experimentation framework requires a papal bull and a committee of VPs, you're already failing. Your role is to advocate for and help build systems, guardrails, and processes that empower teams to test hypotheses rapidly and safely. Define what "safe to fail" means. Teach your teams to treat experiments as diagnostics, not referendums on their worth.
The Algorithm "Worked." Then It Remapped the World. (Unintended Consequences)
LinkedIn's "People You May Know" (PYMK) algorithm was built to do one thing: increase successful connection requests. Clear metric. Achievable goal. An audit a year later revealed something astonishing: the algorithm was profoundly impacting what jobs people were getting. By subtly shifting the proportion of "weak ties" (arm's length connections, as per Granovetter's theory)[^6] it suggested, PYMK was inadvertently boosting users' access to novel information and job opportunities. The AI didn't "know" it was doing this. But it was.
This is the ghost in the machine of deployed AI: ecosystem effects and unintended consequences. AI doesn't operate in a vacuum. It interacts with the entire company, its users, and sometimes society, in ways you cannot fully predict in a test environment. Most products, Bojinov warns, initially have a neutral or negative impact on the very metrics they aim to improve until these interactions are understood and tuned.
For Staff+ engineers, this means auditing and monitoring aren't afterthoughts; they are continuous, architectural responsibilities. Treat it like system ownership, not post-mortem archaeology. From day one, ask: "What second-order effects could this AI ripple outwards in six months? Whose metrics will it silently inflate or decimate? How will we even know?" Design for traceability. Instrument for long-term impact. Give your future self a map of "normal" before the AI subtly redraws it. LinkedIn learned from this, incorporating long-term job-related metrics into PYMK's monitoring—a testament to the power of proactive auditing.
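What might "a map of normal" look like in practice? One low-tech sketch (mine, with invented numbers and a crude three-sigma band, not LinkedIn's actual tooling): record pre-launch baselines for metrics the feature was never meant to move, then keep checking them after launch.

```python
from statistics import mean, stdev

class MetricBaseline:
    """Snapshot of a metric's pre-launch behaviour, used to spot post-launch drift."""

    def __init__(self, name: str, pre_launch_values: list[float]):
        self.name = name
        self.mean = mean(pre_launch_values)
        self.stdev = stdev(pre_launch_values)

    def check(self, observed: float, sigmas: float = 3.0) -> str | None:
        """Return a warning if the observed value drifts outside the baseline band."""
        if self.stdev and abs(observed - self.mean) > sigmas * self.stdev:
            return (f"{self.name}: {observed:.1f} vs baseline {self.mean:.1f}"
                    f" (±{self.stdev:.1f}) -- investigate second-order effects")
        return None

# Track a metric the feature was NOT built to move (here, an invented long-term outcome metric).
baseline = MetricBaseline("weekly_job_applications_per_1k_users", [120.0, 118.0, 125.0, 121.0, 119.0])
alert = baseline.check(observed=152.0)
if alert:
    print(alert)
```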
The Parting Shot: It Fails at the Margins
Most AI initiatives don’t collapse because the core technology was fatally flawed. They bleed out at the margins.
The margin of user trust. The margin of strategic relevance. The margin of organisational understanding of its probabilistic nature. The margin of ethical foresight.
Bojinov's insights aren't about quick fixes.[^7] They’re about acknowledging that AI projects are, as he concludes, significantly harder than most other endeavors a company undertakes. But the potential payoff is equally tremendous. It's not hopeless. It just requires a different kind of rigor—recognizing the distinct stages, and, as leaders and senior engineers, championing and building the infrastructure (technical, process, cultural) to navigate each stage. This is how you reduce failure, increase adoption, and actually create value.
No secret algorithm. No viral hack. Just better, harder decisions, made with more deliberation and humility. Sometimes, seeing the messy system clearly, with all its flaws and pressures, and still choosing to build carefully, responsibly, and humanely within it, is the most radical—and most valuable—engineering work you can do.
Primary Source: The insights in this post are primarily drawn from Iavor Bojinov's discussion on the HBR On Strategy podcast (originally HBR IdeaCast, "Making Sure Your AI Initiative Pays Off," Episode 913, May 16, 2023).
[^1]: The podcast episode referenced is "The Right Way to Launch an AI Initiative" from the HBR On Strategy podcast series, featuring Iavor Bojinov. It originally aired as an HBR IdeaCast episode: "Making Sure Your AI Initiative Pays Off" (HBR IdeaCast Episode 913, May 16, 2023). You can typically find this episode and other HBR podcasts on major podcast platforms or by searching the episode title or number on HBR.org.
[^2]: Iavor Bojinov's faculty profile at Harvard Business School: https://www.hbs.edu/faculty/Pages/profile.aspx?facId=1199332
[^3]: For more on non-deterministic AI outputs, see Statsig, "What Are Non-Deterministic AI Outputs?": https://www.statsig.com/perspectives/what-are-non-deterministic-ai-outputs-
[^4]: Understanding AI-generated output variability is further explored by Wizard AI: https://wizard-ai.com/understanding-ai-generated-output-variability/
[^5]: Building a scalable and adaptable AI governance program is detailed by OCEG: https://www.oceg.org/building-a-scalable-and-adaptable-ai-governance-program/
[^6]: A study on the employment value of weak ties by MIT IDE: https://ide.mit.edu/insights/new-study-proves-that-weak-ties-have-strong-employment-value/
[^7]: Oliver Wight EAME discusses essential questions leaders must ask before engaging with AI: https://oliverwight-eame.com/news/seven-essential-questions-leaders-must-ask-before-engaging-with-ai