Tag: FPGA

  • On the Micro-Parallelism of FPGAs, Simulating AXI Busses, and Blinking LEDs

    Field-Programmable Gate Arrays and the languages for programming (ahem, configuring) them have the reputation of being difficult to master. I still remember when I heard about them for the first time: at the beginning of my Ph.D. studies at TU Delft, a new friend of mine, visiting the computer engineering department, came home and told me he was supposed to implement some kind of GPS signal processing that required a lot of linear algebra operations with vectors and matrices (duh, isn’t all computing, including the insanely large vector-matrix multiplications of LLMs, like that?).

    What struck me at the time is that my friend thought he could just automatically translate Matlab code to VHDL (this is the European version of Verilog, for you Yankees) and benefit from the “micro-parallelism” of the platform. By that phase of my studies, I had already been told that taking a sequential algorithm and making it parallel is not something a machine can do for you.

    But I digress. Years later, I still have a problem with saying that we program FPGAs. Verilog, VHDL, and SystemC are not classical programming languages like C++ and Python. They have both declarative features for synthesis and procedural features for simulation, plus fancier techniques like model checking (here I come with my novel ideas and techniques). For what physically happens on an FPGA (transforming input electrical signals into output signals), the term “FPGA configuration” is more apt than programming. But a job ad for an FPGA configurator would sound strange, thus FPGA programming it is!

    Demo

    Let us fast-forward to the demo. I have felt numerous times the gusto of blinking an LED, so what could be better than a bunch of them? In the video below, you can see in action the auto-generated Verilog from the previous article combined with driving the 7-segment LED display of the Basys 3.

    Of course, this has been done countless times and would be at the level of a high-school project, were it not for the connection to the AXI bus, the behavioral simulation, and the composition of IP blocks.

    Composability and an AXI Architecture for Synthesis

    In my previous article on the topic of FPGAs, I showed that not all Verilog should be written by hand. To do that, I implemented common digital circuits in Python, expressed them in a DSL, and translated them to Verilog. Of course, testing these circuits on real FPGA hardware requires a lot of infrastructure, and I used a soft processor in Vivado to do that. Vivado provides a quick graphical way to connect various IP blocks.
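
    To give a flavor of what I mean (the actual code lives in the repository linked below; the names Gate and to_verilog here are made up for illustration, not the real llogic2verilog API), here is a minimal sketch of rendering a Python-described gate list as Verilog:

```python
# Hypothetical sketch: a tiny combinational-circuit DSL rendered as Verilog.
# Gate and to_verilog are illustrative names, not the llogic2verilog API.

from dataclasses import dataclass

@dataclass
class Gate:
    op: str        # "and", "or", "xor", "not"
    out: str       # output wire name
    ins: tuple     # input wire names

VERILOG_OPS = {"and": "&", "or": "|", "xor": "^"}

def to_verilog(name, inputs, outputs, gates):
    """Render a gate list as a synthesizable Verilog module."""
    ports = ", ".join(["input " + i for i in inputs] +
                      ["output " + o for o in outputs])
    lines = [f"module {name}({ports});"]
    internal = {g.out for g in gates} - set(outputs)
    for w in sorted(internal):                 # declare internal wires
        lines.append(f"  wire {w};")
    for g in gates:
        if g.op == "not":
            rhs = "~" + g.ins[0]
        else:
            rhs = f" {VERILOG_OPS[g.op]} ".join(g.ins)
        lines.append(f"  assign {g.out} = {rhs};")
    lines.append("endmodule")
    return "\n".join(lines)

# A half adder: sum = a ^ b, carry = a & b.
src = to_verilog("half_adder", ["a", "b"], ["s", "c"],
                 [Gate("xor", "s", ("a", "b")),
                  Gate("and", "c", ("a", "b"))])
print(src)
```

    The point is not the ten lines of string formatting but the fact that the circuit exists as a Python data structure first, where it can be analyzed and optimized before any Verilog is emitted.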

    I will make a small digression here. Although these days the boundaries between hardware and software are fuzzy, there is far less open-source hardware than open-source software. The reasons are, I believe, two: first, shipping hardware costs cash on top of your time. The second reason, I believe, is more psychological: developing hardware provides less instantaneous gratification than developing software. That is why the building blocks of electronic chip design are called IP blocks, where IP stands for Intellectual Property. Anyhow. Because companies still want to provide closed-source, obfuscated IP blocks, Vivado provides methods for composing these blocks into architectures. And if you want to be anybody in the chip-making business, you have to adhere to these practices.

    One should always be suspicious of acronyms that contain both the words Advanced and eXtensible (the I in AXI stands for Interface). But this one is good. It is part of AMBA and is a standard for on-chip blocks to communicate with each other at high speed. This is not unlike SPI or I2C, only faster and with many more wires.
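
    To get a feel for the “many more wires”: even a single AXI4-Lite write involves separate address, data, and response channels, each with its own valid/ready handshake. A toy model of one such handshake (illustrative only, not cycle-accurate):

```python
# Toy model of an AXI-style valid/ready handshake (illustrative, not
# cycle-accurate). A transfer completes on every cycle where both valid
# and ready are high; either side may stall the other.

def handshake_trace(valid_seq, ready_seq):
    """Return the cycle indices on which a transfer completes."""
    return [cycle for cycle, (v, r) in enumerate(zip(valid_seq, ready_seq))
            if v and r]

# The master asserts valid from cycle 1 on; the slave is ready on cycles 3-4.
transfers = handshake_trace(valid_seq=[0, 1, 1, 1, 1],
                            ready_seq=[0, 0, 0, 1, 1])
print(transfers)   # [3, 4]
```

    Multiply this little dance by five channels and a 32-bit address and data bus, and you see why nobody wants to wiggle those signals by hand.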

    In the previous blog post, I showed you the IP block diagram for the RISC-V architecture that uses autogenerated logic circuits. For the demo that follows, I have developed another IP block that drives the 7-segment LEDs of the Basys 3 development board. The new IP block diagram, which includes this LED control block, is shown below.

    AXI Architecture for Simulation

    Because simulation is distinct from synthesis, it is handy to have a separate Vivado block diagram for simulation. The reason is that we need a lot of infrastructure to generate the AXI bus signals, and this is already provided by the big players who have skin in the AXI game: AMD, ARM, and the like. It would be very difficult to toggle all the AXI signals of a transaction by hand (remember that an AXI-compliant IP has tens of ports; one has to address a 32-bit address space, for example). Thankfully, this is already done by an IP block that contains only simulation code and is called the “AXI Verification IP”. The resulting block diagram for the simulation is shown below.

    It turns out that connecting to such a beast as an AXI bus cannot be done with the relatively basic simulation primitives of the original Verilog. One needs more abstraction, and it is provided by the dynamic features of SystemVerilog, which extends the original Verilog for system verification. The code excerpt below shows the gist of the test-bench and illustrates how easy it is to generate an AXI transaction.

    `timescale 1ns / 1ps
    
    import axi_vip_pkg::*;
    import design_2_axi_vip_0_0_pkg::*;
    
    module tb_tof_2481_axi;
      // ⋮ (module definition and DUT instantiation)
      initial begin
          // ⋮ (code omitted)
          master = new("master", dut.design_2_i.axi_vip_0.inst.IF);
          master.start_master();
          // ⋮ (code omitted)
          master.AXI4LITE_WRITE_BURST(32'h44A0_0000,
                                      3'b000,
                                      32'h0000_BEEF,
                                      resp);
          $display("WRITE 0 resp=%0d", resp);
          master.AXI4LITE_WRITE_BURST(32'h44A0_0004,
                                      3'b000,
                                      32'h0000_FFFF,
                                      resp);
          $display("WRITE 1 resp=%0d", resp);
          // ⋮ (code omitted)
          $finish;
      end
    endmodule

    Having concocted the above test-bench for sending the input signals to the simulation of the AXI-connected 7-segment LED display, we can click in Vivado, and lo and behold, waveforms come out.

    Of course, to perform even further testing and validation of the seven-segment LEDs, one could convert the individual signals driving the LEDs to hexadecimal values and compare what the LEDs show to what the AXI master sent (in our case the hexadecimal value 0xBEEF). But the journey toward model checking, automatic diagnostics, and testing is more interesting, and we have discussed LEDs more than enough.
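
    As a sketch of that check (assuming the common active-low segment convention of the Basys 3, with the pattern packed in the order abcdefg; the table is illustrative, not copied from the IP block):

```python
# Decode the seven segment-driving signals back into hex digits and
# compare against what the AXI master wrote (0xBEEF). Assumes active-low
# segments packed as a 7-bit pattern abcdefg; a 0 bit lights a segment.

HEX_TO_SEG = {
    0x0: 0b0000001, 0x1: 0b1001111, 0x2: 0b0010010, 0x3: 0b0000110,
    0x4: 0b1001100, 0x5: 0b0100100, 0x6: 0b0100000, 0x7: 0b0001111,
    0x8: 0b0000000, 0x9: 0b0000100, 0xA: 0b0001000, 0xB: 0b1100000,
    0xC: 0b0110001, 0xD: 0b1000010, 0xE: 0b0110000, 0xF: 0b0111000,
}
SEG_TO_HEX = {seg: digit for digit, seg in HEX_TO_SEG.items()}

def decode_display(segment_words):
    """Map per-digit segment patterns (most significant first) to an int."""
    value = 0
    for seg in segment_words:
        value = (value << 4) | SEG_TO_HEX[seg]
    return value

# What the testbench would assert after the two AXI writes:
shown = decode_display([HEX_TO_SEG[d] for d in (0xB, 0xE, 0xE, 0xF)])
assert shown == 0xBEEF
```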

    Reflection

    FPGA programming tool chains and Vivado are a hairball of design by committee: we do something this way because we did it the same way when we were young, when we used to bike ten miles to school every day (uphill both ways and into the wind). On the more positive side, these are complex tools that work, and people use them for their digital designs.

    All that being said, our understanding of both the theory and practice of computing has improved dramatically since the mid-20th century, and it is time to revisit these old ways of designing and implementing circuits. If we do this carefully, maybe, maybe, we will design the hardware, software, and even AI algorithms that are not shameful to write about and use.

    What is Next?

    In what follows, we will grow our demo, and I will show you how crappy the actual FPGA implementation that Vivado produces is. We will also discuss more accurate simulations, clocks, frequencies, timing analysis, and what proper AI algorithms look like (not the ones everybody is discussing, but the ones that are not used for fraud and deception).

    The Real Deal

    Unlike what passes for practice these days in most of Silicon Valley, everything I talk about is accessible and reproducible. So here is the repository that allowed me to write this blog article and make the demo:
    https://gitlab.llama.gs/llogic_basys3

    Ceterum censeo slopem esse delendam.

    (Cato the Elder ended every speech in the Roman Senate with “Carthage must be destroyed” — regardless of the topic. This is that, but for AI slop.)

  • Diagnosing Circuits into Existence

    Cadence recently unveiled ChipStack AI, which El Reg memorably described as “vibe coding for chips.” The idea is that an LLM agent will design your next processor for you, provided you don’t mind the occasional hallucinated transistor. One Reg commenter recalled Jensen Huang’s declaration that nobody needs to learn programming anymore, and suggested he try designing his next GPU with it. Quite. Meanwhile, a water desalination company spent $200,000 on AI-generated engineering advice that turned out to be — and I use the technical term — wrong. They then built a second AI to filter out the nonsense from the first one, which is the Silicon Valley equivalent of hiring a second drunk to drive the first one home. One does wonder what the industry will achieve once it sobers up. In the meantime, I have been doing something unfashionable: using mathematics to design circuits that are provably correct.

    Today I am releasing qbf-designer, a tool for exact digital circuit synthesis from arbitrary component libraries via Quantified Boolean Formula solving. It is the top of a dependency stack that has also been modernised and released: llogic for logic representation, transformation, and solving; lcfgen for generating circuit primitives used both in the QBF encoding itself and as benchmark specifications; and a collection of solver bindings — pydepqbf, pylgl, pyllq, and pycudd — that connect the Python layer to the C/C++ solvers doing the actual heavy lifting. The software works. It synthesises provably minimal circuits from specifications. It found a five-gate full-subtractor that improves on the seven-gate textbook design. It does not hallucinate.

    Now, explaining what “provably minimal circuit design” actually means turns out to require rather more than a single blog post — so this is the first in a series. The short version: given a functional specification (“I want a circuit that adds two numbers”) and a bag of components (“here are some AND, OR, and XOR gates”), find the smallest circuit that does the job. The practical application is technology mapping for FPGAs, where you need to cover a circuit with k-input Look-Up Tables using as few LUTs as possible. The silicon is already on the chip and you have already paid for it — every LUT you save is space freed for more logic, letting you fit a larger design onto the same device. Current industry tools — Vivado, Yosys — use heuristics for this. qbf-designer gives you the exact answer, at least for sub-circuits small enough to chew on. Early results are promising: on a 2-bit comparator mapped to 3-LUTs, the solver finds a 5-LUT implementation where heuristic methods produce 6. One does not need to be a venture capitalist to notice that 5 is fewer than 6.
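
    To convey the flavor of exact synthesis without the QBF machinery (the real tool encodes the problem as a quantified formula; the brute-force enumeration below only works at toy sizes), here is how one would find the minimal NAND-only circuit for XOR:

```python
from itertools import product

def nand(x, y):
    return ~(x & y) & 0b1111          # truth tables over 2 inputs, 4 bits

def min_nand_gates(target, max_gates=6):
    """Smallest NAND-only circuit computing `target` (a 4-bit truth table).
    Pure enumeration: each gate picks two fan-ins among the primary
    inputs and earlier gates, and the last gate is the output."""
    a, b = 0b0101, 0b0011             # truth tables of the two inputs
    for n in range(1, max_gates + 1):
        choices = [list(product(range(2 + g), repeat=2)) for g in range(n)]
        for wiring in product(*choices):
            sigs = [a, b]
            for i, j in wiring:
                sigs.append(nand(sigs[i], sigs[j]))
            if sigs[-1] == target:
                return n
    return None

print(min_nand_gates(0b0101 ^ 0b0011))  # XOR needs 4 NAND gates
```

    Enumeration dies combinatorially long before interesting circuit sizes; the whole point of the QBF encoding is to let a solver prune that space instead of walking it.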

    There is a fundamental difference between this work and the “AI for chip design” circus currently touring Silicon Valley. Circuit synthesis — the problem of finding a minimum-size circuit equivalent to a specification — sits at the second level of the polynomial hierarchy (Σ₂ᵖ-complete, for those keeping score). This is not a problem you can solve by autocompleting Verilog. It has a precise computational complexity classification, a formal proof of correctness, and a guarantee of optimality. In other words, it is science, the kind that involves theorems rather than pitch decks, and where “it works” means something more rigorous than “the demo didn’t crash during the investor meeting.”

    My interest in circuit synthesis comes from an unexpected direction: breaking things. I spent years working on Model-Based Diagnosis of digital circuits — the problem of figuring out which component in a circuit has failed, given observed misbehaviour. My colleague Johan de Kleer, who has been thinking about this sort of thing since before most AI entrepreneurs were born, used to describe synthesis as “diagnosing a circuit into existence.” The idea is beautifully perverse: start with an empty circuit, treat the absence of every gate as a “fault,” and ask the diagnostic engine what collection of fixes would make the circuit behave like, say, a 32-bit ALU. It turns out that the mathematical machinery for diagnosis and synthesis is nearly identical — the same ∃∀ quantifier structure, the same miter-based equivalence checking, the same PSPACE-hard satisfaction problems. The only difference is whether you are looking for what went wrong or what should be there in the first place.
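
    The miter construction at the heart of both problems is simple to state: feed both circuits the same inputs, XOR their outputs, and ask whether the result can ever be 1. A brute-force sketch of the idea (real tools hand this question to a SAT or QBF solver rather than enumerating):

```python
from itertools import product

def miter_unsat(f, g, n_inputs):
    """Brute-force miter: f and g are equivalent iff XOR-ing their outputs
    never yields 1, i.e. the miter is unsatisfiable. Real tools hand this
    question to a SAT or QBF solver instead of enumerating."""
    return all(f(*bits) == g(*bits)
               for bits in product([0, 1], repeat=n_inputs))

# Two implementations of a full-adder carry-out:
carry_textbook = lambda a, b, c: (a & b) | (a & c) | (b & c)
carry_majority = lambda a, b, c: (a & b) | (c & (a ^ b))

print(miter_unsat(carry_textbook, carry_majority, 3))   # True: equivalent
```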

    The theoretical foundations and experimental results have been written up and submitted to Constraints, a journal that publishes work reviewed by people who can tell the difference between a proof and a press release. The paper covers the QBF encoding, the universal component cell, the configurable interconnection fabric, symmetry breaking, and extensive benchmarks on arithmetic circuits, 74XXX integrated circuits, and exact synthesis function sets. I mention this not to boast but to draw a gentle contrast with the prevailing approach to AI research in the Valley, where the peer review process consists of checking whether the blog post got enough likes on Twitter X and the replication methodology is “we lost the weights.” Should the reviewers find fault with the work, they will at least be able to point to a specific equation rather than gesturing vaguely at a loss curve and muttering about emergence.

    In the next instalment, I will demonstrate qbf-designer doing FPGA technology mapping — taking a circuit, mapping it to k-input Look-Up Tables, and producing a result that uses fewer LUTs than Xilinx Vivado. Not by a little. Not by accident. By mathematics.

  • The Synthesis Problem: Why I’m Building a New Logic Toolchain

    Modern chip design leaves performance on the table. A lot of it. Meanwhile, billionaire CEOs with the technical depth of a drunk high-schooler who wants to be new age when he grows up keep calling a glorified autocomplete “AGI.” Nobody’s asking if the circuit itself is well-designed — just whether the output sounds smart.

    The tools we use to go from a logical specification to a physical circuit are decades old in their core ideas. They work — billions of transistors ship every year — but they settle for “good enough” at almost every stage of the pipeline. Synthesis heuristics that don’t explore the real optimization space. Representation formats that can’t talk to each other. A wall between the people who study formal logic and the people who tape out silicon.

    I want to build better circuits. Not a better CPU, not a better GPU — better circuits, generally. Classical, reversible, quantum. The kind of improvement that comes from rethinking the synthesis process itself, not from adding more transistors.

    That’s what this project is about.

    What I Actually Built

    Over the past several years, I’ve been assembling an open source toolchain that connects formal logic to real hardware. Each piece exists because I hit a wall with existing tools.

    llogic is the foundation — a library of logic representations. Boolean formulas, CNF, DNF, OBDDs, QBF, combinational circuits, reversible circuits, quantum circuits. They all live under one roof because they share more structure than the textbooks let on. A circuit is a formula is a constraint problem. If your tools understand that, you can move between representations and optimize across them.
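
    As a tiny illustration of “a circuit is a formula” (the variable numbering and function names here are mine for illustration, not llogic’s API), here is the standard Tseitin encoding of a single AND gate as CNF clauses:

```python
# Tseitin encoding of a single AND gate o <-> (a AND b) as CNF over
# integer literals (positive = variable, negative = its negation). The
# numbering and helper names are illustrative, not llogic's actual API.

def tseitin_and(o, a, b):
    """CNF for o <-> (a AND b): (-a | -b | o), (a | -o), (b | -o)."""
    return [[-a, -b, o], [a, -o], [b, -o]]

def satisfied(clauses, assignment):
    """Evaluate a CNF under an assignment mapping variable -> bool."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)

cnf = tseitin_and(o=3, a=1, b=2)
assert satisfied(cnf, {1: True, 2: True, 3: True})       # o = a & b holds
assert not satisfied(cnf, {1: True, 2: False, 3: True})  # inconsistent
```

    Apply this gate by gate and any combinational circuit becomes a CNF formula, which is exactly the door through which SAT and QBF solvers enter circuit design.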

    lcfgen generates circuit families — parameterized circuit structures that let you explore design spaces systematically instead of hand-wiring one instance at a time.
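
    A minimal sketch of what I mean by a circuit family (the representation is hypothetical, not lcfgen’s actual API): a ripple-carry adder parameterized by width, with a predictable five gates per bit:

```python
# A parameterized circuit family (illustrative structure, not lcfgen's
# API): an n-bit ripple-carry adder as a flat gate list, five 2-input
# gates per full-adder stage.

def ripple_carry_adder(n):
    """Return the gate list (op, out, in1, in2) of an n-bit adder."""
    gates = []
    carry = "cin"
    for i in range(n):
        a, b, p = f"a{i}", f"b{i}", f"p{i}"
        gates += [("xor", p, a, b),                   # propagate
                  ("xor", f"s{i}", p, carry),         # sum bit
                  ("and", f"g{i}", a, b),             # generate
                  ("and", f"t{i}", p, carry),         # propagated carry
                  ("or",  f"c{i}", f"g{i}", f"t{i}")] # carry out
        carry = f"c{i}"
    return gates

sizes = {n: len(ripple_carry_adder(n)) for n in (1, 4, 16)}
print(sizes)   # {1: 5, 4: 20, 16: 80}
```

    Once the family is a function of its parameters, sweeping a design space is a loop rather than an afternoon of schematic entry.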

    llogic2verilog translates circuits from llogic’s internal representation into synthesizable Verilog. This is the bridge from formal logic to hardware toolchains.

    lverilog is a Verilog parser that produces a clean AST, because I needed one that I could actually inspect and transform programmatically without fighting a legacy codebase.

    llogic_basys3 is the proof that this isn’t academic exercise. It targets the Digilent Basys3 board — a Xilinx Artix-7 FPGA — and runs brute-force integer factorization by testing 16×16 bit multiplication at 50 MHz. A MicroBlaze soft processor drives the circuit over AXI, exposed as a UART interface. You feed it a number, it searches for factors — on a $150 hobby board, clocking through candidates at 50 million per second.
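
    In software terms, the search the hardware performs looks like this (the sketch below uses trial division; the FPGA streams one candidate through the 16×16-bit multiplier per clock edge):

```python
# Software model of the FPGA demo's search: the hardware streams candidate
# pairs through the 16x16-bit multiplier, one per 50 MHz clock edge; this
# sketch performs the equivalent search by trial division.

def find_factors(n):
    """Return (p, q) with p * q == n and 1 < p <= q, or None if n is prime."""
    assert 1 < n < 2**32              # fits a product of two 16-bit numbers
    p = 2
    while p * p <= n:
        if n % p == 0:
            return (p, n // p)
        p += 1
    return None

print(find_factors(0xBEEF))   # (3, 16293): 0xBEEF is composite
print(find_factors(65521))    # None: 65521 is the largest 16-bit prime
```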

    Theory in. Hardware out. No marketing budget required, no claims of sentience.

    Why This Matters

    The connection between Boolean satisfiability, quantified Boolean formulas, and circuit structure is well-studied in theory. My published work on QBF-based circuit synthesis showed that you can use the structure of quantified formulas to derive circuits with provable properties. But the research community largely stops at the paper. The tooling to go from that theory to running hardware didn’t exist.

    It does now.

    And the scope is broader than classical digital logic. The same formal framework that represents a combinational circuit can represent a reversible circuit or a quantum circuit. The same optimization that simplifies a Boolean formula can simplify a quantum algorithm’s gate structure. There’s a deep connection to Bayesian inference here too — probabilistic reasoning over circuit structure — that I’ll write about separately.
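
    A small taste of that claim (nothing here is llogic API; it is a back-of-the-envelope illustration): a reversible gate is just a permutation of basis states.

```python
# A reversible gate is a permutation of basis states: the Toffoli (CCNOT)
# gate flips the target bit exactly when both control bits are set, and
# applying it twice undoes it. State (c1, c2, t) is packed into 0..7.

def toffoli(state):
    c1 = (state >> 2) & 1
    c2 = (state >> 1) & 1
    return state ^ (c1 & c2)          # flip bit 0 iff both controls are 1

perm = [toffoli(s) for s in range(8)]
print(perm)                            # [0, 1, 2, 3, 4, 5, 7, 6]

assert sorted(perm) == list(range(8))                    # a permutation
assert all(toffoli(toffoli(s)) == s for s in range(8))   # self-inverse
```

    Swap the permutation for a unitary matrix and the same bookkeeping describes a quantum gate, which is why one representation layer can serve all three circuit classes.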

    Where This Goes

    I’m not building a toolchain for the sake of building a toolchain. I care about two things: scalability and energy efficiency. Better synthesis means smaller circuits. Smaller circuits mean less power, less area, more throughput. At scale, this is the difference between a computation that’s feasible and one that isn’t.

    The implications reach beyond hardware design. Optimized circuit structures have direct applications in machine learning acceleration — which is to say, making the very large circuits that people mistake for intelligence actually run efficiently. The same goes for cryptanalysis and scientific computing — anywhere you’re bottlenecked by the gap between what you want to compute and what the hardware can deliver. I’ll write about those connections in future posts.

    The FPGA demo is the first milestone — a hobby board factoring integers to prove the pipeline works end-to-end. The next steps involve pushing the optimization boundaries, extending to quantum targets, and making the case — with working hardware — that this approach produces better circuits.

    If you’re a researcher working on synthesis, a hardware engineer frustrated with your tools, or a program manager looking for the next leap in design methodology: let’s talk.

    The code is open source. The results are reproducible. The ambition is to build circuits more powerful than anything that exists today.

    Repositories: