Alexander Feldman's Blog

Tag: open-source

On the Micro-Parallelism of FPGAs, Simulating AXI Busses, and Blinking LEDs
Field-Programmable Gate Arrays and the languages for programming (ahem, configuring) them have the reputation of being difficult to master. I still remember when I first heard of them: at the beginning of my Ph.D. studies at TU Delft, a new friend of mine, visiting the computer engineering department, came home and told me he was supposed to implement some kind of GPS signal processing that required a lot of linear algebra operations with vectors and matrices (duh, isn’t all computing, including the insanely large vector-matrix multiplications of LLMs, like that?).

What struck me at the time was that my friend thought he could just automatically translate Matlab code to VHDL (this is the European version of Verilog, for you Yankees) and benefit from the “micro-parallelism” of the platform. By this phase of my studies, I had already been told that turning a sequential algorithm into a parallel one cannot be done by a machine.

But I digress. Years later, I still have problems with saying that we program FPGAs. Verilog, VHDL, and SystemC are not classical programming languages like C++ and Python. They have both declarative features for synthesis and procedural features for simulation and for fancier techniques like model checking (here I come with my novel ideas and techniques). For what an FPGA physically does, such as transforming input electrical signals into output signals, the term “FPGA configuration” is more apt than programming. But a job ad for an FPGA configurator would sound strange, thus FPGA programming it is!

Demo

Let us fast-forward to the demo. I have felt the gusto of blinking an LED numerous times, so what could be better than a bunch of them? In the video below, you can see in action the auto-generated Verilog from the previous article combined with driving the 7-segment LED display of the Basys 3.

Of course, this has been done many times and would be at the level of a high-school student, were it not for the connection to the AXI bus, the behavioral simulation, and the composition of IP blocks.

Composability and an AXI Architecture for Synthesis

In my previous article on the topic of FPGAs, I showed that not all Verilog should be manually written. To that end, I implemented common digital circuits in Python, saving them as DSL descriptions and translating them to Verilog. Of course, testing these circuits with real FPGA hardware requires a lot of infrastructure, and I used a Vivado soft processor to do that. Vivado provides a quick graphical way to connect various IP blocks.
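To give a flavour of the Python-to-Verilog route, here is a minimal sketch. It is not the DSL from the previous article: the `to_verilog` function, the gate-tuple format, and the half-adder example are all hypothetical and exist only for illustration.

```python
# Hypothetical sketch: a netlist of two-input gates rendered as structural Verilog.
# The tuple format (kind, output, input1, input2) is an assumption of this example.

OPS = {"and": "&", "or": "|", "xor": "^"}


def to_verilog(name, inputs, outputs, gates):
    """Return Verilog source for a combinational netlist of two-input gates."""
    declared = set(inputs) | set(outputs)
    # Any gate output that is not a port becomes an internal wire.
    wires = [out for _, out, _, _ in gates if out not in declared]
    ports = [f"  input  wire {p}" for p in inputs]
    ports += [f"  output wire {p}" for p in outputs]
    lines = [f"module {name}(", ",\n".join(ports), ");"]
    lines += [f"  wire {w};" for w in wires]
    lines += [f"  assign {out} = {a} {OPS[kind]} {b};" for kind, out, a, b in gates]
    lines.append("endmodule")
    return "\n".join(lines)


half_adder = [("xor", "s", "a", "b"), ("and", "c", "a", "b")]
print(to_verilog("half_adder", ["a", "b"], ["s", "c"], half_adder))
```

The point of the exercise is that the circuit lives as plain data, so the same description can feed a Verilog writer, a simulator, or a solver.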

I will make a small digression here. Although these days the boundaries between hardware and software are fuzzy, there is far less open-source hardware than open-source software. I believe there are two reasons. First, shipping hardware costs cash on top of your time. The second reason is more psychological: developing hardware provides less instantaneous gratification than developing software. That is why the building blocks of electronic chip design are called IP blocks, where IP stands for Intellectual Property. Anyhow. Because companies still want to provide closed-source, obfuscated IP blocks, Vivado provides methods for composing these blocks into architectures. And if you want to be anybody in the chip-making business, you have to adhere to these practices.

One should always be suspicious of acronyms that have both the words Advanced and eXtensible in them (the I in AXI stands for Interface). But this one is good. It is a part of AMBA and is a standard for fast connections between the blocks on a chip. This is not unlike SPI or I2C, only faster and with many wires.

In the previous blog, I showed you the IP block diagram for the RISC-V architecture that uses autogenerated logic circuits. For the demo that follows, I have developed another IP block that drives the 7 segment LEDs of the Basys 3 development board. The new IP block diagram that includes this LED control block is shown below.

AXI Architecture for Simulation

Because simulation is distinct from synthesis, it is handy to have a separate Vivado block diagram for simulation. The reason is that we need a lot of infrastructure to generate the AXI bus signals, and this is already provided by the big players who have skin in the AXI game: AMD, ARM and the like. It would be very difficult to toggle all AXI signals of a transaction by hand (remember that an AXI-compliant IP has tens of ports; one has to address a 32-bit address space, for example). Thankfully, this is already done by an IP block that contains only simulation code and is called “AXI Verification IP”. The resulting block diagram for the simulation is shown below.

It turns out that connecting to such a beast as an AXI bus is not done with the relatively basic simulation primitives of the original Verilog. One needs more abstraction, and it is provided by the dynamic features of SystemVerilog, which extends the original Verilog for system verification. The code excerpt below shows the gist of the test-bench and illustrates how easy it is to generate an AXI transaction.
```
`timescale 1ns / 1ps

import axi_vip_pkg::*;
import design_2_axi_vip_0_0_pkg::*;

module tb_tof_2481_axi;
  // ⋮ (module definition and DUT instantiation)
  initial begin
      // ⋮ (code omitted)
      master = new("master", dut.design_2_i.axi_vip_0.inst.IF);
      master.start_master();
      // ⋮ (code omitted)
      master.AXI4LITE_WRITE_BURST(32'h44A0_0000,
                                  3'b000,
                                  32'h0000_BEEF,
                                  resp);
      $display("WRITE 0 resp=%0d", resp);
      master.AXI4LITE_WRITE_BURST(32'h44A0_0004,
                                  3'b000,
                                  32'h0000_FFFF,
                                  resp);
      $display("WRITE 1 resp=%0d", resp);
      // ⋮ (code omitted)
      $finish;
  end
endmodule
```
Having concocted the above test-bench for sending the input signals to the simulation of the AXI-connected 7-segment LED display, we can click in Vivado and, lo and behold, waveforms come out.

Of course, to perform even further testing and validation of the seven-segment LEDs, one could convert the individual signals driving the LEDs to hexadecimal digits and compare what the LEDs show to what the AXI master sent (in our case the hexadecimal value 0xbeef). But the journey toward model checking, automatic diagnostics and testing is more interesting, and we have discussed LEDs more than enough.
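For the curious, that comparison is easy to sketch in Python. Everything below is an assumption made for illustration and is not taken from the repository: the segment order (bit 0 is segment a, bit 6 is segment g), the common hexadecimal glyph table, and the active-low polarity.

```python
# Hypothetical sketch: turn sampled 7-segment patterns back into a 16-bit value.
# Assumed glyph table for digits 0..F, active-high, bit 0 = segment a, bit 6 = segment g.
SEGMENTS = [0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07,
            0x7F, 0x6F, 0x77, 0x7C, 0x39, 0x5E, 0x79, 0x71]
DECODE = {pattern: digit for digit, pattern in enumerate(SEGMENTS)}


def encode(value, active_low=True):
    """Segment patterns for the four hex digits of value, most significant first."""
    patterns = []
    for shift in (12, 8, 4, 0):
        pattern = SEGMENTS[(value >> shift) & 0xF]
        patterns.append(pattern ^ 0x7F if active_low else pattern)
    return patterns


def decode(patterns, active_low=True):
    """Inverse of encode; raises KeyError on a pattern that is not a hex glyph."""
    value = 0
    for pattern in patterns:
        if active_low:
            pattern ^= 0x7F
        value = (value << 4) | DECODE[pattern & 0x7F]
    return value


sampled = encode(0xBEEF)            # stand-in for values read off the waveform
assert decode(sampled) == 0xBEEF    # what the display shows matches what was sent
```

In a real test-bench the `sampled` list would come from the simulated cathode signals, one pattern per digit, rather than from `encode`.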

Reflection

FPGA programming tool chains and Vivado are a hairball of design by committee: we do something this way because we did it the same way when we were young and when we used to bike ten miles to school every day (uphill both ways and into the wind). On the more positive side, these are complex tools, and they work, and people use them for their digital design.

All that being said, our understanding of both the theory and practice of computing has improved dramatically since the mid-20th century, and it is time to revisit these old ways of designing and implementing circuits. If we do this carefully, maybe, maybe, we will design hardware, software, and even AI algorithms that are not shameful to write about and use.

What is Next?

In what follows, we will grow our demo and I will show you how crappy the actual FPGA implementation that Vivado produces is. We will also discuss more accurate simulations, clocks, frequencies, timing analysis, and what proper AI algorithms look like (not the ones everybody is discussing, but the ones that are not used for fraud and deception).

The Real Deal

Unlike what is common practice these days in most of Silicon Valley, everything I talk about is accessible and reproducible. So here is the repository that allowed me to write this blog article and make the demo:
https://gitlab.llama.gs/llogic_basys3

Ceterum censeo slopem esse delendam.

(Cato the Elder ended every speech in the Roman Senate with “Carthage must be destroyed” — regardless of the topic. This is that, but for AI slop.)
May 4, 2026
What’s Actually Broken

Amazon’s weekly operations meeting in March reportedly focused on a “trend of incidents” characterised by “high blast radius” and “Gen-AI assisted changes.” The Financial Times, which saw the briefing note, reported that AI-generated code had been implicated in a series of outages — including one that took down Amazon’s entire e-commerce website for several hours. Amazon’s response was to deny the problem existed, which is the corporate equivalent of the AI itself: confidently wrong and hoping nobody checks. James Gosling, the creator of Java, who left AWS in 2024, was less diplomatic. He observed that the company’s AI-driven restructuring had “demolished” the teams responsible for infrastructure stability, and that the ROI analysis behind the decision was, in his words, “disastrously shortsighted.” One does not need a diagnostic engine to identify the fault here. A company replaced the engineers who understood its systems with a technology that does not, and the systems fell over. The circuit breaker that the AI removed — the one it classified as “redundant” — had been added after a previous outage. The AI could not distinguish a safety mechanism from dead code, because it had no model of the system. It had statistics. Statistics told it the breaker rarely fired. A model would have told it why.

This is the difference between machine learning and model-based reasoning, and it is the difference that this post — and the toolchain I am releasing today — is about.

An Unexpected Reception

Yesterday’s post announcing qbf-designer, a tool for exact digital circuit synthesis via Quantified Boolean Formula solving, generated rather more attention than I had anticipated. Twenty-two thousand LinkedIn impressions, a hundred-odd reactions, and five hundred profile views in twenty-four hours, for a post about problems at the second level of the polynomial hierarchy and FPGA technology mapping. One concludes that there is an audience for work that produces correct answers, even — or perhaps especially — in an era when the prevailing technology cannot reliably tell you which end of a circuit is up.

Dusting Off the Arsenal

To continue with my plans for commercialising formal methods for EDA through Llama Logic Corporation, I have to excavate, modernise, and release the full inventory of tools and concepts I have built over nearly two decades. There are many reusable components in this stack — logic representations, solver bindings, encoding schemes, diagnostic algorithms — and they need to be cleaned up, documented, and made available. The qbf-designer release was the first. Today’s is the second.

Today I am releasing LyDiA, a language and toolchain for Model-Based Diagnosis. LyDiA was the core of my doctoral research at Delft University of Technology. I will not be using LyDiA itself going forward — the modern llogic packages have fixed all of its imprecise notions and provide a cleaner foundation for everything I am building — but LyDiA was where it all started. It was my first serious work on the diagnosis of circuits, and it contains ideas and algorithms that remain relevant. It deserves to be available.

Model-Based Diagnosis in 15 Seconds

The demo takes two inputs. The model (2adder-weak.sys) describes a two-bit full adder: a hierarchical composition of half-adders, each built from an XOR and an AND gate, joined by an OR gate for the carry. Every gate has a Boolean health variable: true means the gate works correctly, false means it is faulty and its output is unconstrained. We do not specify how a gate fails, only that its output can no longer be trusted. This is called a weak fault model.

The observation (2adder.obs) records what actually happened: specific values on the inputs and outputs of the circuit that are inconsistent with correct behaviour. Something is broken. We do not know what. The diag command hands both files to the GOTCHA engine — which computes all minimal sets of component failures that explain the discrepancy. Not one guess. Not the most likely answer. Every combination of gate failures that is logically consistent with the model and the observation, with no redundancy.

The fm command lists the results: six double-fault diagnoses, each a minimal set of gates whose simultaneous failure is sufficient to produce the observed misbehaviour. For example, d4 = { !FA.HA1.X.h, !FA.O.h } means the XOR gate in the first half-adder and the OR gate are both broken. There is no single-fault explanation — at least two gates must be faulty, and the engine has proven this by exhaustive enumeration.
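For readers who want to poke at the idea, here is a minimal brute-force sketch of minimal-diagnosis enumeration under a weak fault model. It is not LyDiA or GOTCHA code. The circuit is a one-bit full adder (smaller than the demo's model), the observation is chosen for illustration, and all names are hypothetical, so it will not reproduce the six diagnoses listed above.

```python
from itertools import combinations, product

GATES = ["HA1.X", "HA1.A", "HA2.X", "HA2.A", "O"]


def consistent(faulty, obs):
    """True if some assignment of the internal signals agrees with every healthy gate."""
    a, b, ci, s, co = obs
    # p, g, t are the unobserved internal wires; a faulty gate constrains nothing.
    for p, g, t in product((0, 1), repeat=3):
        checks = {
            "HA1.X": p == (a ^ b),
            "HA1.A": g == (a & b),
            "HA2.X": s == (p ^ ci),
            "HA2.A": t == (p & ci),
            "O": co == (g | t),
        }
        if all(ok for gate, ok in checks.items() if gate not in faulty):
            return True
    return False


def minimal_diagnoses(obs):
    """All subset-minimal sets of faulty gates that explain the observation."""
    found = []
    for size in range(len(GATES) + 1):          # smallest candidates first
        for candidate in combinations(GATES, size):
            candidate = set(candidate)
            if any(d <= candidate for d in found):
                continue                        # a subset already explains it
            if consistent(candidate, obs):
                found.append(candidate)
    return found


# Inputs a=1, b=1, ci=0 should give s=0, co=1; we observe s=1, co=0 instead.
for diagnosis in minimal_diagnoses((1, 1, 0, 1, 0)):
    print(sorted(diagnosis))
```

Exhaustive enumeration is exponential in the number of gates, which is exactly why a real engine needs something smarter; the sketch only shows what "all minimal diagnoses" means.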

Why Circuits?

Writing software to diagnose a fabricated IC does not make practical sense. You would use ATPG and scan chains for that. We use digital circuits as benchmarks because they have the properties that matter for diagnosis research: compositional structure, many components, well-defined fault models, and known-correct reference behaviour. These are the same properties that make diagnosis hard in complex engineered systems generally. This is why the ISCAS-85 suite has been the standard MBD benchmark for thirty years.

Where diagnosis does apply directly in EDA is design verification. Suppose an engineer places a NAND gate instead of an AND gate for the carry computation in the adder above. The circuit passes some tests but fails on specific input vectors. The diagnostic engine, given the intended specification and the observed misbehaviour, will isolate the carry gate as the faulty component — even if the designer has never seen this particular mistake before, even if there are multiple simultaneous design errors. It reasons from the structure of the circuit, not from a database of past bugs.

The Modelling Problem

During my early attempts at commercialisation, I encountered a pattern that I suspect anyone in formal methods has seen. People looked at LyDiA diagnosing circuits and said: “Wonderful. Can it diagnose my HVAC system? My chemical plant? My supply chain?” And so they tried to model non-circuits as circuits, and things did not work, because modelling is the hard part.

Circuit diagnosis is tractable in part because digital circuits have a natural, compositional, Boolean structure. An AND gate is an AND gate. An HVAC system is a tangle of continuous dynamics, feedback loops, thermal gradients, and human behaviour. Cramming that into a Boolean framework requires heroic abstraction, and the resulting models are either too coarse to be useful or too large to be solvable. The aerospace fuel system model included in LyDiA — with its typed fault modes for leaking tanks, stuck sensors, and degraded pumps — hints at what multi-valued modelling can achieve, but it remains a toy compared to the real thing.

That said, LyDiA was never only about circuits. The distribution includes models of the N-queens problem, map colouring, Sudoku, and SEND+MORE=MONEY — general constraint satisfaction problems expressed in the same language. The diagnostic framework is, at its core, a constraint solver with a notion of health variables. This generality is both its strength and its curse: it can express anything, but making it useful for a specific domain requires domain expertise that no tool can substitute.

What LyDiA Got Wrong: Probability

LyDiA assigns fault probabilities to components — each gate gets a prior like 0.99 healthy, 0.01 faulty — but the probabilistic reasoning was never worked out correctly. The probabilities were treated as independent priors, multiplied together to rank diagnoses, with no rigorous account of how observations update beliefs or how correlations between faults propagate through the system.

The correct formulation turns out to be a #P problem, that is, a counting problem. To compute the exact posterior probability of a diagnosis, you need to count the satisfying assignments of the diagnostic formula: how many ways can the internal signals of the circuit be assigned such that the model, the observation, and a given fault assumption are all consistent? The probability of a diagnosis is the ratio of its satisfying assignment count to the total. This is model counting, and it is #P-complete: at least as hard as NP, and believed to be strictly harder.

One consequence is that all diagnostic probabilities are rationals. They are ratios of integers — counts of discrete satisfying assignments. This has some puzzling implications for the relationship between fault probability and physical failure rates that I have not yet fully worked out.
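A brute-force sketch of this counting view, under loud assumptions: a one-bit full adder rather than the demo's model, an observation chosen for illustration, every consistent assignment weighted equally (no priors), and hypothetical function names. It is not how LyDiA or the llogic packages do it, and real model counters do not enumerate.

```python
from fractions import Fraction
from itertools import product


def count_models(health, obs):
    """Count internal-signal assignments consistent with model, observation and health."""
    a, b, ci, s, co = obs
    h_x1, h_a1, h_x2, h_a2, h_o = health        # True means the gate is healthy
    count = 0
    for p, g, t in product((0, 1), repeat=3):   # unobserved internal wires
        ok = ((not h_x1 or p == (a ^ b)) and
              (not h_a1 or g == (a & b)) and
              (not h_x2 or s == (p ^ ci)) and
              (not h_a2 or t == (p & ci)) and
              (not h_o or co == (g | t)))
        count += ok
    return count


def posterior(obs):
    """Ratio of each health assignment's model count to the total, as exact rationals."""
    counts = {h: count_models(h, obs) for h in product((False, True), repeat=5)}
    total = sum(counts.values())
    return {h: Fraction(c, total) for h, c in counts.items() if c}


# Same faulty observation as before: a=1, b=1, ci=0 but s=1, co=0.
for health, prob in sorted(posterior((1, 1, 0, 1, 0)).items(), key=lambda kv: -kv[1]):
    print(health, prob)
```

Using `Fraction` keeps every probability as an explicit ratio of two integers, which makes the rationality of the result visible rather than hiding it behind floating point.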

There is also a quantum angle. Faults are inherently stochastic — a gate either works or it does not, and before you test it, the fault state is indeterminate in precisely the sense that a qubit is indeterminate before measurement. I showed in earlier work that placing health qubits in superposition and propagating them through a quantum circuit that mirrors the classical circuit under diagnosis computes the full probability distribution over all diagnoses simultaneously. This connects to von Neumann’s foundational work on the relationship between logic and probability. The practical implication is Grover’s algorithm: a quadratic speedup for searching the diagnostic space. I need to finish this work and implement a proper Grover-based diagnostic engine. It is on the list.

Why Machine Learning Cannot Do This

In February, a company called Algorhythm Holdings — formerly a manufacturer of karaoke machines, with a market capitalisation of six million dollars — announced that its AI platform could “optimise” freight logistics, scaling volumes by 300–400% without adding staff. The announcement wiped seventeen billion dollars off U.S. transportation stocks in a single day. C.H. Robinson fell 15%. RXO fell 20%. The Russell 3000 Trucking Index dropped 6.6%. DHL, DSV, and Kuehne+Nagel followed in Europe. All of this because a former karaoke company claimed, in effect, to have solved optimal planning — a problem that is PSPACE-complete. If Alan Turing and Stephen Cook could be reached for comment, I suspect they would have questions.

The same magical thinking pervades “AI for diagnostics.” A machine learning model trained on historical failures will recognise patterns it has seen before. Show it a novel fault — a combination that never appeared in the training data — and it has nothing to generalise from. It will either misclassify the failure or express high confidence in a wrong answer. This is not a limitation that more data or a larger model can fix. It is a structural property of inductive inference: you cannot learn what you have not observed, and complex systems fail in ways that are combinatorially vast and fundamentally unpredictable from examples alone.

Model-based diagnosis does not have this problem. If you have a model of the system, you can diagnose faults you have never observed, in configurations you have never tested, because the reasoning is deductive rather than inductive. The SAT solver asks: is there an assignment of health variables that is consistent with the model and the observations? The answer is provably correct with respect to the model. This is why NASA uses model-based diagnosis for spacecraft and why the automotive industry uses it for on-board diagnostics. Nobody uses a neural network to diagnose a flight-critical system. The neural network might get it right 95 percent of the time. The other 5 percent is a smoking crater.

What’s Next

The modern diagnosis packages in llogic have addressed all of LyDiA’s imprecisions — cleaner encodings, correct probabilistic inference, proper multi-valued support — but those are a story for a separate post.

There is also Lydia-NG, a framework I built that extends model-based diagnosis to analog systems using a built-in SPICE simulation engine. Rethinking Lydia-NG connects us directly to the analog side of EDA — a domain where formal methods have barely made an appearance and where the tools are, to put it charitably, showing their age.

And that is the longer ambition. Cadence Virtuoso dates from 1991 — thirty-five years old. Vivado is newer (2012), but its place-and-route lineage descends from NeoCAD, acquired in 1995, and its synthesis from MINC, acquired in 1998. Synopsys Design Compiler has been around since the late 1980s. The EDA industry is running on architectural foundations that predate the web browser. These tools work — in the sense that a 1991 Toyota also works — but the algorithms inside them are heuristic, the interfaces are hostile, and nobody has rethought the fundamentals in decades.

The goal of Llama Logic Corporation is to challenge this. Modern EDA with proper AI-augmented formal methods — analog, digital, and FPGA. New languages. New solvers. New tools. Not “AI for EDA” in the Silicon Valley sense of wrapping an LLM around Verilog and hoping for the best, but the real thing: algorithms with correctness guarantees, backed by the mathematical foundations that already exist and that the industry has been too comfortable to adopt.

In the next instalment, I will demonstrate qbf-designer doing FPGA technology mapping — covering a small circuit with k-input Look-Up Tables. The formal methods stack is growing. The software works. It does not hallucinate.

The repository: LyDiA — language and toolchain for Model-Based Diagnosis.

April 8, 2026
Friday Archaeology: A Quarter-Century-Old Crypto Library, the Cult of the Dead Cow, and a Rijndael Buffer Overwrite
It is Friday. El Reg informs us that 45 percent of AI-generated code now ships with security flaws, that vibe-coded apps are leaking student data to unauthenticated attackers, and that rogue AI agents have learned to escalate privileges and exfiltrate secrets without being asked. In this climate of automated incompetence, I thought it might be instructive to look at some code written by a human, with a book, in 1999. Today we are going down memory lane — this code has never once hallucinated a dependency.

The Dig

Because everything I do is technical, even nostalgia comes with a tarball. I unearthed this:

https://gitlab.llama.gs/attic/scl

SCL, the Small Crypto Library, and its companion SSSL, the Small Secure Socket Library. Approximately 20,000 lines of C++ implementing, from scratch: a bignum library, RSA, DSA, ElGamal, Rabin-Williams, Blum-Goldwasser, Diffie-Hellman, MQV, all five AES finalists (Rijndael was selected in October 2000, so I was ahead of the news cycle), seven hash functions, seven block cipher modes, DER encoding, a secure socket layer with protocol negotiation, and the beginnings of a TLS 1.0 implementation. Written between 1999 and 2001. I was 22.

Now, it is received wisdom that when programmers look at their old code, they recoil in horror, as one might upon discovering a photograph of oneself in flared trousers at a school disco. I looked at mine and thought: actually, this is rather good.

This puts me in mind of Bill Bryson’s observation in Neither Here Nor There about his friend Stephen Katz’s relationship with women. Bryson notes that most men, as they age, gradually lower their standards. Katz, however, had actually raised his — he had started from such a comprehensively low base that the only possible direction was up. My situation is the inverse but structurally identical: the class hierarchy is clean, the block cipher modes compose correctly, the DER encoder works. I looked at 22-year-old me’s code with twenty-five more years of context, and the younger version passed review. Standards were apparently already set.

The Story

Some context. In the summer of 1999, I was in Varna, Bulgaria, teaching UNIX courses to save money and waiting for my B.Sc. to finish. I ordered Bruce Schneier’s Applied Cryptography from Amazon. It cost me a significant fraction of a Bulgarian salary. The book arrived. I read it. Then I did what any reasonable person would do: I implemented everything in it.

The test vectors in the repository? Typed by hand from Schneier’s appendices. Every single DES permutation. Every Blowfish round. The IDEA vectors. The lot. The first three lines of the test vector file read:
```
# This is a comment.
# I like comments very much.
# The next line is empty.
```
That is a 22-year-old testing his flex parser.

In June 2000, I graduated and made aliyah to Israel, where I joined Zend Technologies in Ramat Gan — the company behind the PHP language engine. The crypto library came with me. The CVS timestamps tell the whole story: initial import Saturday June 9, 2001, a furious week of refactoring the DER encoding layer, and then silence. The last commit is Saturday June 16, 2001. I had renamed AddPrimitive to addValue and Write to toFile halfway through, left half the callers using the old names, commented out a constructor I hadn’t finished implementing, and walked away.

Why? Because the Israeli army started sending letters. I had already done my time as a conscript in the Bulgarian navy — an experience that cured me permanently of any romantic notions about military service — and I was not about to do it again. I left for the Netherlands in rather a hurry. The crypto library stayed behind, frozen mid-refactor, a monument to the universal truth that API migrations are never completed.

(In a parallel timeline, I might have stayed. Before Zend, I had applied for a master’s at the Weizmann Institute. The admissions interview was with Adi Shamir — yes, that Shamir, the S in RSA, whose algorithm I had just finished implementing. They asked basic mathematics questions. Nobody told me I should prepare. I didn’t get in. Ended up doing both a master’s and a doctorate at Delft instead, which worked out rather well. Zero regrets, but it remains a good dinner party story.)

The Hacktivists

Here is where it gets interesting. Towards the end of my time in Israel, I started receiving emails about SCL. They came from hacked accounts — which should have been the first clue about the correspondents — and referenced mailing lists populated by legitimate security researchers. The group was interested in using SCL+SSSL as the crypto layer for an anti-censorship tool.

The group was the Cult of the Dead Cow. The tool was Peekabooty.

For those too young or insufficiently misspent to remember: cDc was the hacking collective founded in a Texas slaughterhouse in 1984, famous for Back Orifice, for coining the term “hacktivism,” and for having a membership roster that included a future U.S. congressional candidate (Beto O’Rourke) and the man who would become DARPA’s Chief Information Officer (Peiter “Mudge” Zatko). Their offshoot Hacktivismo, led by the pseudonymous Oxblood Ruffin, was building tools to punch through national firewalls — specifically China’s.

Peekabooty was a peer-to-peer anonymity network that routed web requests through encrypted relays using standard SSL, so that censors couldn’t distinguish it from ordinary e-commerce traffic. The design started in July 2000 — the exact month I arrived at Zend. Paul Baranowski and Joey deVilla built it in Toronto, previewed it at DEF CON 9 in the summer of 2001, and it was, in concept, a direct predecessor of Tor.

They needed a small, BSD-licensed, self-contained C++ crypto library with an SSL socket layer. In 2001, the options were OpenSSL (enormous, carrying its own advertising-clause licence, and famously hostile to casual integration) or mine. The emails were real. The interest was genuine.

And then I left for the Netherlands, the library sat unfinished, and Peekabooty eventually shipped using other crypto. The world got Tor instead. Oxblood Ruffin is now in Berlin. Mudge is running IT at DARPA. Joey deVilla plays accordion at tech conferences in Tampa. Baranowski designs card games in New York. And the library sat in a tarball on a backup drive for twenty-five years.

The Resurrection

This week, I made it compile on Slackware 15.0. This involved: modernising the autotools, fixing a DER API that was half-refactored in June 2001, discovering a buffer overwrite in the Rijndael key schedule that had been silently scribbling past the end of an array since 1999, finding the same bug in Blowfish, mass-replacing register keywords that C++17 no longer tolerates, const-correcting approximately four hundred string literals, and explaining to a 26-year-old libtool that SONAME is not optional.

The Rijndael bug is worth mentioning. AES-256 needs 60 round key words. The first 8 are the key itself, and the key schedule macro generates 8 more per iteration. Six iterations bring the total to 56, but you still need indices 56 through 59, so a seventh iteration is necessary. That iteration also writes indices 60 through 63, which are past the end of the wEncryptionKey[60] array. This has been undefined behaviour since the Clinton administration. It worked because whatever sat after the array in memory didn’t matter. The fix is to make the array 64 elements. The compiler finally noticed in 2026.
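The index arithmetic is simple enough to check mechanically. The sketch below is not the SCL code; it only models a schedule that copies the 8 key words and then emits a full batch of 8 words per iteration, with hypothetical names throughout.

```python
NK = 8                          # AES-256 key length in 32-bit words
NR = 14                         # AES-256 number of rounds
WORDS_NEEDED = 4 * (NR + 1)     # 60 round key words


def overrun(array_len):
    """Return (iterations, out-of-bounds indices) for a full-batch key schedule."""
    iterations = 0
    # Keep expanding until at least WORDS_NEEDED words exist (key words included).
    while NK + iterations * NK < WORDS_NEEDED:
        iterations += 1
    # Iteration k writes indices 8k .. 8k+7, so the last one ends at index 63.
    written = range(NK, NK + iterations * NK)
    return iterations, [i for i in written if i >= array_len]


print(overrun(60))   # array sized to the 60 words that are actually needed
print(overrun(64))   # array padded to a whole number of batches
```

Padding the array trades sixteen bytes for a simpler loop; the alternative is to stop the last iteration after its first four words.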

The code is now on GitLab, in the attic where it belongs:

https://gitlab.llama.gs/attic/scl

Next Week

Normal service resumes. There are things to open-source and a rather long arc to lay out properly. The crypto library was the prologue. The interesting parts come next.
April 3, 2026