{"id":34,"date":"2026-04-08T03:22:28","date_gmt":"2026-04-08T03:22:28","guid":{"rendered":"https:\/\/llama.gs\/blog\/?p=34"},"modified":"2026-04-08T03:22:28","modified_gmt":"2026-04-08T03:22:28","slug":"whats-actually-broken","status":"publish","type":"post","link":"https:\/\/llama.gs\/blog\/index.php\/2026\/04\/08\/whats-actually-broken\/","title":{"rendered":"What&#8217;s Actually Broken"},"content":{"rendered":"<p>Amazon&#8217;s weekly operations meeting in March <a href=\"https:\/\/www.theregister.com\/2026\/03\/10\/amazon_ai_coding_outages\/\">reportedly focused<\/a> on a \u201ctrend of incidents\u201d characterised by \u201chigh blast radius\u201d and \u201cGen-AI assisted changes.\u201d The Financial Times, which saw the briefing note, reported that AI-generated code had been implicated in a series of outages \u2014 including one that took down Amazon\u2019s entire e-commerce website for several hours. Amazon\u2019s response was to deny the problem existed, which is the corporate equivalent of the AI itself: confidently wrong and hoping nobody checks. James Gosling, the creator of Java, who left AWS in 2024, was less diplomatic. He observed that the company\u2019s AI-driven restructuring had \u201cdemolished\u201d the teams responsible for infrastructure stability, and that the ROI analysis behind the decision was, in his words, \u201cdisastrously shortsighted.\u201d One does not need a diagnostic engine to identify the fault here. A company replaced the engineers who understood its systems with a technology that does not, and the systems fell over. In one of the reported incidents, the circuit breaker that the AI removed \u2014 the one it classified as \u201credundant\u201d \u2014 had been added after a previous outage. The AI could not distinguish a safety mechanism from dead code, because it had no model of the system. It had statistics. Statistics told it the breaker rarely fired. 
A model would have told it <em>why<\/em>.<\/p>\n<p>This is the difference between machine learning and model-based reasoning, and it is the difference that this post \u2014 and the toolchain I am releasing today \u2014 is about.<\/p>\n<h2>An Unexpected Reception<\/h2>\n<p><a href=\"https:\/\/llama.gs\/blog\/index.php\/2026\/04\/06\/diagnosing-circuits-into-existence\/\">Yesterday\u2019s post<\/a> announcing <a href=\"https:\/\/gitlab.llama.gs\/logic\/qbf-designer\">qbf-designer<\/a>, a tool for exact digital circuit synthesis via Quantified Boolean Formula solving, generated rather more attention than I had anticipated. Twenty-two thousand LinkedIn impressions, a hundred-odd reactions, and five hundred profile views in twenty-four hours, for a post about problems at the second level of the polynomial hierarchy and FPGA technology mapping. One concludes that there is an audience for work that produces correct answers, even \u2014 or perhaps especially \u2014 in an era when the prevailing technology cannot reliably tell you which end of a circuit is up.<\/p>\n<h2>Dusting Off the Arsenal<\/h2>\n<p>To continue with my plans for commercialising formal methods for EDA through Llama Logic Corporation, I have to excavate, modernise, and release the full inventory of tools and concepts I have built over nearly two decades. There are many reusable components in this stack \u2014 logic representations, solver bindings, encoding schemes, diagnostic algorithms \u2014 and they need to be cleaned up, documented, and made available. The qbf-designer release was the first. Today\u2019s is the second.<\/p>\n<p>Today I am releasing <a href=\"https:\/\/gitlab.llama.gs\/constraints\/lydia\">LyDiA<\/a>, a language and toolchain for <a href=\"https:\/\/en.wikipedia.org\/wiki\/Diagnosis_(artificial_intelligence)\">Model-Based Diagnosis<\/a>. 
LyDiA was the core of my doctoral research at <a href=\"https:\/\/en.wikipedia.org\/wiki\/Delft_University_of_Technology\">Delft University of Technology<\/a>. I will not be using LyDiA itself going forward \u2014 the modern <a href=\"https:\/\/gitlab.llama.gs\/logic\/llogic\">llogic<\/a> packages have fixed all of its imprecise notions and provide a cleaner foundation for everything I am building \u2014 but LyDiA was where it all started. It was my first serious work on the diagnosis of circuits, and it contains ideas and algorithms that remain relevant. It deserves to be available.<\/p>\n<h2>Model-Based Diagnosis in 15 Seconds<\/h2>\n<div style=\"width: 1200px;\" class=\"wp-video\"><video class=\"wp-video-shortcode\" id=\"video-34-1\" width=\"1200\" height=\"600\" loop autoplay preload=\"metadata\" controls=\"controls\"><source type=\"video\/mp4\" src=\"https:\/\/llama.gs\/blog\/wp-content\/uploads\/2026\/04\/lydia_demo.mp4?_=1\" \/><a href=\"https:\/\/llama.gs\/blog\/wp-content\/uploads\/2026\/04\/lydia_demo.mp4\">https:\/\/llama.gs\/blog\/wp-content\/uploads\/2026\/04\/lydia_demo.mp4<\/a><\/video><\/div>\n<p>The demo takes two inputs. The <strong>model<\/strong> (<a href=\"https:\/\/gitlab.llama.gs\/constraints\/lydia\/-\/blob\/main\/examples\/diagnosis\/2adder-weak.sys\"><code>2adder-weak.sys<\/code><\/a>) describes a two-bit <a href=\"https:\/\/en.wikipedia.org\/wiki\/Adder_(electronics)#Full_adder\">full adder<\/a> \u2014 a hierarchical composition of half-adders built from XOR and AND gates. Every gate has a Boolean health variable: true means the gate works correctly, false means it is faulty and its output is unconstrained. We do not specify <em>how<\/em> a gate fails, only that its output can no longer be trusted. 
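The health-variable convention is easy to sketch. What follows is an illustration only, in plain Python rather than LyDiA\u2019s modelling language, with hypothetical gate names and a made-up observation: a half adder whose XOR and AND gates each carry a health variable, and a brute-force search for the minimal diagnoses.

```python
from itertools import product

# Weak fault model, illustrated on a half adder (sum = a XOR b, carry = a AND b).
# Each gate has a health variable: when True, the gate computes its function;
# when False, its output is unconstrained. We say only that it cannot be trusted.

def consistent(h_xor, h_and, a, b, s, c):
    """Is this health assignment consistent with the model and the observation?"""
    ok_sum = (s == (a ^ b)) if h_xor else True    # a healthy XOR constrains the sum
    ok_carry = (c == (a & b)) if h_and else True  # a healthy AND constrains the carry
    return ok_sum and ok_carry

# Observation: inputs a=1, b=1; outputs sum=1, carry=1.
# Correct behaviour would be sum=0, carry=1, so something is broken.
a, b, s, c = 1, 1, 1, 1

# Enumerate every health assignment consistent with model plus observation,
# keeping the fault set: the gates whose health variable is False.
fault_sets = [
    frozenset(name for name, h in (("XOR", hx), ("AND", ha)) if not h)
    for hx, ha in product([True, False], repeat=2)
    if consistent(hx, ha, a, b, s, c)
]

# A diagnosis is minimal if no proper subset of it is also a diagnosis.
minimal = [f for f in fault_sets if not any(g < f for g in fault_sets)]
print(minimal)  # [frozenset({'XOR'})]: the XOR gate alone explains the misbehaviour
```

A real engine replaces the enumeration with a SAT solver over the same constraints; the point here is only the shape of the encoding. Health variables guard the gate equations, and minimality is a subset check over the consistent fault sets.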
This is called a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Diagnosis_(artificial_intelligence)#Definitions\">weak fault model<\/a>.<\/p>\n<p>The <strong>observation<\/strong> (<a href=\"https:\/\/gitlab.llama.gs\/constraints\/lydia\/-\/blob\/main\/examples\/diagnosis\/2adder.obs\"><code>2adder.obs<\/code><\/a>) records what actually happened: specific values on the inputs and outputs of the circuit that are inconsistent with correct behaviour. Something is broken. We do not know what. The <code>diag<\/code> command hands both files to the GOTCHA engine \u2014 which computes <em>all<\/em> minimal sets of component failures that explain the discrepancy. Not one guess. Not the most likely answer. Every combination of gate failures that is logically consistent with the model and the observation, with no redundancy.<\/p>\n<p>The <code>fm<\/code> command lists the results: six double-fault diagnoses, each a minimal set of gates whose simultaneous failure is sufficient to produce the observed misbehaviour. For example, <code>d4 = { !FA.HA1.X.h, !FA.O.h }<\/code> means the XOR gate in the first half-adder and the OR gate are both broken. There is no single-fault explanation \u2014 at least two gates must be faulty, and the engine has proven this by exhaustive enumeration.<\/p>\n<h3>Why Circuits?<\/h3>\n<p>Writing software to diagnose a fabricated IC does not make practical sense. You would use <a href=\"https:\/\/en.wikipedia.org\/wiki\/Automatic_test_pattern_generation\">ATPG<\/a> and scan chains for that. We use digital circuits as benchmarks because they have the properties that matter for diagnosis research: compositional structure, many components, well-defined fault models, and known-correct reference behaviour. These are the same properties that make diagnosis hard in complex engineered systems generally. 
This is why the <a href=\"https:\/\/en.wikipedia.org\/wiki\/ISCAS\">ISCAS-85<\/a> suite has been the standard MBD benchmark for thirty years.<\/p>\n<p>Where diagnosis <em>does<\/em> apply directly in EDA is design verification. Suppose an engineer places a NAND gate instead of an AND gate for the carry computation in the adder above. The circuit passes some tests but fails on specific input vectors. The diagnostic engine, given the intended specification and the observed misbehaviour, will isolate the carry gate as the faulty component \u2014 even if the designer has never seen this particular mistake before, even if there are multiple simultaneous design errors. It reasons from the structure of the circuit, not from a database of past bugs.<\/p>\n<h2>The Modelling Problem<\/h2>\n<p>During my early attempts at commercialisation, I encountered a pattern that I suspect anyone in formal methods has seen. People looked at LyDiA diagnosing circuits and said: \u201cWonderful. Can it diagnose my HVAC system? My chemical plant? My supply chain?\u201d And so they tried to model non-circuits as circuits, and things did not work, because the difficulty of modelling is the hard part.<\/p>\n<p>Circuit diagnosis is tractable in part because digital circuits have a natural, compositional, Boolean structure. An AND gate is an AND gate. An HVAC system is a tangle of continuous dynamics, feedback loops, thermal gradients, and human behaviour. Cramming that into a Boolean framework requires heroic abstraction, and the resulting models are either too coarse to be useful or too large to be solvable. The aerospace fuel system model included in LyDiA \u2014 with its typed fault modes for leaking tanks, stuck sensors, and degraded pumps \u2014 hints at what multi-valued modelling can achieve, but it remains a toy compared to the real thing.<\/p>\n<p>That said, LyDiA was never only about circuits. 
The distribution includes models of the N-queens problem, map colouring, Sudoku, and SEND+MORE=MONEY \u2014 general constraint satisfaction problems expressed in the same language. The diagnostic framework is, at its core, a constraint solver with a notion of health variables. This generality is both its strength and its curse: it can express anything, but making it <em>useful<\/em> for a specific domain requires domain expertise that no tool can substitute.<\/p>\n<h2>What LyDiA Got Wrong: Probability<\/h2>\n<p>LyDiA assigns fault probabilities to components \u2014 each gate gets a prior like 0.99 healthy, 0.01 faulty \u2014 but the probabilistic reasoning was never worked out correctly. The probabilities were treated as independent priors, multiplied together to rank diagnoses, with no rigorous account of how observations update beliefs or how correlations between faults propagate through the system.<\/p>\n<p>The correct formulation turns out to be a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Sharp-P\">#P<\/a> problem \u2014 a counting problem. To compute the exact posterior probability of a diagnosis, you need to count the satisfying assignments of the diagnostic formula: how many ways can the internal signals of the circuit be assigned such that the model, the observation, and a given fault assumption are all consistent? The probability of a diagnosis is the ratio of its satisfying assignment count to the total. This is model counting, and it is <a href=\"https:\/\/en.wikipedia.org\/wiki\/Sharp-P-complete\">#P-complete<\/a> \u2014 harder than NP.<\/p>\n<p>One consequence is that all diagnostic probabilities are rationals. They are ratios of integers \u2014 counts of discrete satisfying assignments. This has some puzzling implications for the relationship between fault probability and physical failure rates that I have not yet fully worked out.<\/p>\n<p>There is also a quantum angle. 
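Setting that angle aside for a moment, the ratio-of-counts formulation can be made concrete by brute force. This is a toy stand-in for a real #SAT-based engine, not LyDiA\u2019s implementation: a single half adder (sum = XOR, carry = AND), the illustrative 0.99\/0.01 priors mentioned above, and hypothetical signal names throughout.

```python
from itertools import product

# Brute-force weighted model counting for a half adder.
# Priors are illustrative: each gate is healthy with probability 0.99.
P_HEALTHY = 0.99

def prior(healths):
    """Independent prior over a tuple of health variables."""
    w = 1.0
    for h in healths:
        w *= P_HEALTHY if h else 1.0 - P_HEALTHY
    return w

a, b = 1, 1          # observed inputs
s_obs, c_obs = 1, 1  # observed outputs (correct would be sum=0, carry=1)

weights, total = {}, 0.0
for hx, ha in product([True, False], repeat=2):
    # Count the assignments of the circuit signals that are consistent with
    # the model, the observation, and this particular health assignment.
    count = sum(
        1
        for s, c in product([0, 1], repeat=2)
        if s == s_obs and c == c_obs     # signals pinned by the observation
        and (not hx or s == (a ^ b))     # a healthy XOR constrains the sum
        and (not ha or c == (a & b))     # a healthy AND constrains the carry
    )
    w = prior((hx, ha)) * count
    if w > 0:
        weights[(hx, ha)] = w
        total += w

# Posterior of each diagnosis: its weighted count divided by the total.
# The single-fault XOR diagnosis dominates.
for h, w in sorted(weights.items()):
    print(h, round(w / total, 4))
```

Because every signal here is observed, each count is 0 or 1; with unobserved internal signals the counts grow, and computing them exactly is precisely the #P-complete model counting problem described above. Note also that each posterior is a ratio of counts (weighted by rational priors), which is where the all-probabilities-are-rational observation comes from.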
Faults are inherently stochastic \u2014 a gate either works or it does not, and before you test it, the fault state is indeterminate in precisely the sense that a qubit is indeterminate before measurement. I showed in earlier work that placing health qubits in superposition and propagating them through a quantum circuit that mirrors the classical circuit under diagnosis computes the full probability distribution over all diagnoses simultaneously. This connects to <a href=\"https:\/\/en.wikipedia.org\/wiki\/John_von_Neumann\">von Neumann\u2019s<\/a> foundational work on the relationship between logic and probability. The practical implication is <a href=\"https:\/\/en.wikipedia.org\/wiki\/Grover%27s_algorithm\">Grover\u2019s algorithm<\/a>: a quadratic speedup for searching the diagnostic space. I need to finish this work and implement a proper Grover-based diagnostic engine. It is on the list.<\/p>\n<h2>Why Machine Learning Cannot Do This<\/h2>\n<p>In February, a company called Algorhythm Holdings \u2014 formerly a manufacturer of karaoke machines, with a market capitalisation of six million dollars \u2014 <a href=\"https:\/\/www.cnbc.com\/2026\/02\/12\/trucking-and-logistics-stocks-tumble-on-release-of-ai-freight-scaling-tool.html\">announced<\/a> that its AI platform could \u201coptimise\u201d freight logistics, scaling volumes by 300\u2013400% without adding staff. The announcement wiped seventeen billion dollars off U.S. transportation stocks in a single day. C.H. Robinson fell 15%. RXO fell 20%. The Russell 3000 Trucking Index dropped 6.6%. DHL, DSV, and Kuehne+Nagel followed in Europe. All of this because a former karaoke company claimed, in effect, to have solved optimal planning \u2014 a problem that is <a href=\"https:\/\/en.wikipedia.org\/wiki\/PSPACE-complete\">PSPACE-complete<\/a>. 
If <a href=\"https:\/\/en.wikipedia.org\/wiki\/Alan_Turing\">Alan Turing<\/a> and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Stephen_Cook\">Stephen Cook<\/a> could be reached for comment, I suspect they would have questions.<\/p>\n<p>The same magical thinking pervades \u201cAI for diagnostics.\u201d A machine learning model trained on historical failures will recognise patterns it has seen before. Show it a novel fault \u2014 a combination that never appeared in the training data \u2014 and it has nothing to generalise from. It will either misclassify the failure or express high confidence in a wrong answer. This is not a limitation that more data or a larger model can fix. It is a structural property of inductive inference: you cannot learn what you have not observed, and complex systems fail in ways that are combinatorially vast and fundamentally unpredictable from examples alone.<\/p>\n<p>Model-based diagnosis does not have this problem. If you have a model of the system, you can diagnose faults you have never observed, in configurations you have never tested, because the reasoning is deductive rather than inductive. The <a href=\"https:\/\/en.wikipedia.org\/wiki\/SAT_solver\">SAT solver<\/a> asks: <em>is there an assignment of health variables that is consistent with the model and the observations?<\/em> The answer is provably correct with respect to the model. This is why NASA uses model-based diagnosis for spacecraft and why the automotive industry uses it for on-board diagnostics. Nobody uses a neural network to diagnose a flight-critical system. The neural network might get it right 95 percent of the time. 
The other 5 percent is a smoking crater.<\/p>\n<h2>What\u2019s Next<\/h2>\n<p>The modern diagnosis packages in <a href=\"https:\/\/gitlab.llama.gs\/logic\/llogic\">llogic<\/a> have addressed all of LyDiA\u2019s imprecisions \u2014 cleaner encodings, correct probabilistic inference, proper multi-valued support \u2014 but those are a story for a separate post.<\/p>\n<p>There is also Lydia-NG, a framework I built that extends model-based diagnosis to analog systems using a built-in SPICE simulation engine. Rethinking Lydia-NG connects us directly to the analog side of EDA \u2014 a domain where formal methods have barely made an appearance and where the tools are, to put it charitably, showing their age.<\/p>\n<p>And that is the longer ambition. Cadence Virtuoso dates from 1991 \u2014 thirty-five years old. Vivado is newer (2012), but its place-and-route lineage descends from NeoCAD, acquired in 1995, and its synthesis from MINC, acquired in 1998. Synopsys Design Compiler has been around since the late 1980s. The EDA industry is running on architectural foundations that predate the web browser. These tools work \u2014 in the sense that a 1991 Toyota also works \u2014 but the algorithms inside them are heuristic, the interfaces are hostile, and nobody has rethought the fundamentals in decades.<\/p>\n<p>The goal of Llama Logic Corporation is to challenge this. Modern EDA with proper AI-augmented formal methods \u2014 analog, digital, and FPGA. New languages. New solvers. New tools. Not \u201cAI for EDA\u201d in the Silicon Valley sense of wrapping an LLM around Verilog and hoping for the best, but the real thing: algorithms with correctness guarantees, backed by the mathematical foundations that already exist and that the industry has been too comfortable to adopt.<\/p>\n<p>In the next instalment, I will demonstrate qbf-designer doing FPGA technology mapping \u2014 covering a small circuit with k-input Look-Up Tables. The formal methods stack is growing. 
The software works. It does not hallucinate.<\/p>\n<p>The repository: <a href=\"https:\/\/gitlab.llama.gs\/constraints\/lydia\">LyDiA<\/a> \u2014 language and toolchain for <a href=\"https:\/\/en.wikipedia.org\/wiki\/Diagnosis_(artificial_intelligence)\">Model-Based Diagnosis<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Amazon&#8217;s weekly operations meeting in March reportedly focused on a \u201ctrend of incidents\u201d characterised by \u201chigh blast radius\u201d and \u201cGen-AI assisted changes.\u201d The Financial Times, which saw the briefing note, reported that AI-generated code had been implicated in a series of outages \u2014 including one that took down Amazon\u2019s entire e-commerce website for several hours. [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,17,18],"tags":[28,14,32,21],"class_list":["post-34","post","type-post","status-publish","format-standard","hentry","category-computer-science-and-philosophy","category-projects","category-research","tag-applied-science","tag-formal-methods","tag-model-based-diangosis","tag-open-source-2"],"_links":{"self":[{"href":"https:\/\/llama.gs\/blog\/index.php\/wp-json\/wp\/v2\/posts\/34","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/llama.gs\/blog\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/llama.gs\/blog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/llama.gs\/blog\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/llama.gs\/blog\/index.php\/wp-json\/wp\/v2\/comments?post=34"}],"version-history":[{"count":1,"href":"https:\/\/llama.gs\/blog\/index.php\/wp-json\/wp\/v2\/posts\/34\/revisions"}],"predecessor-version":[{"id":36,"href":"https:\/\/llama.gs\/blog\/index.php\/wp-json\/wp\/v2\/posts\/34\/revisions\/36"}],"wp:att
achment":[{"href":"https:\/\/llama.gs\/blog\/index.php\/wp-json\/wp\/v2\/media?parent=34"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/llama.gs\/blog\/index.php\/wp-json\/wp\/v2\/categories?post=34"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/llama.gs\/blog\/index.php\/wp-json\/wp\/v2\/tags?post=34"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}