Category: Uncategorized

  • On the Micro-Parallelism of FPGAs, Simulating AXI Busses, and Blinking LEDs

    Field-Programmable Gate Arrays and the languages for programming (ahem, configuring) them have a reputation of being difficult to master. I still remember when I heard this for the first time: at the beginning of my Ph.D. studies at TU Delft, a new friend of mine, visiting the computer engineering department, came home and told me he was supposed to implement some kind of GPS signal processing that required a lot of linear algebra operations with vectors and matrices (duh, isn’t all computing, including the insanely large vector-matrix multiplications of LLMs, like that?).

    What struck me at the time is that my friend thought he could just automatically translate Matlab code to VHDL (the European version of Verilog, for you Yankees) and benefit from the “micro-parallelism” of the platform. By that phase of my studies, I had already been told that taking a sequential algorithm and making it parallel is not something a machine can do.

    But I am digressing. Years later, I still have trouble saying that we program FPGAs. Verilog, VHDL, and SystemC are not classical programming languages like C++ and Python. They have declarative features for synthesis, procedural features for simulation, and hooks for fancier techniques like model checking (here I come with my novel ideas and techniques). For the actual physical manifestation of an FPGA, transforming input electrical signals into output signals, the term “FPGA configuration” is more apt than programming. But a job ad for an FPGA configurator would sound strange, thus FPGA programming it is!

    Demo

    Let us fast-forward to the demo. I have felt the gusto of blinking an LED numerous times, so what could be better than a bunch of them? In the video below, you can see in action the auto-generated Verilog from the previous article, combined with driving the 7-segment LED display of the Basys 3.

    Of course, this has been done plenty of times and would be at the level of a high-school student, were it not for the connection to the AXI bus, the behavioral simulation, and the composition of IP blocks.

    Composability and an AXI Architecture for Synthesis

    In my previous article on the topic of FPGAs, I showed that not all Verilog should be written by hand. To do that, I implemented common digital circuits in Python, expressed them in a DSL, and translated them to Verilog. Of course, testing these circuits on real FPGA hardware requires a lot of infrastructure, and I used a soft processor in Vivado to do that. Vivado provides a quick graphical way to connect various IP blocks.

    I will make a small digression here. Although the boundaries between hardware and software are fuzzy these days, there is far less open-source hardware than open-source software. I believe there are two reasons: first, shipping hardware costs cash on top of your time. The second reason is more psychological: developing hardware provides less instantaneous gratification than developing software. That is why the building blocks of electronic chip design are called IP blocks, where IP stands for Intellectual Property. Anyhow. Because companies still want to ship closed-source, obfuscated IP blocks, Vivado provides methods for composing these blocks into architectures. And if you want to be anybody in the chip-making business, you have to adhere to these practices.

    One should always be suspicious of acronyms that contain both the words Advanced and eXtensible (the I in AXI stands for Interface). But this one is good. AXI is part of AMBA and is a standard for on-chip components to communicate with each other at speed. It is not unlike SPI or I2C, only faster and with many more wires.
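    To make the “many wires” concrete: at the heart of every AXI channel is a two-signal valid/ready handshake, and a data beat is transferred only on a cycle where both are high. Here is a toy Python model of one channel (my own illustration, not code from the repository):

```python
# Toy model of the AXI valid/ready handshake (illustrative only).
# Each AXI channel transfers one beat on the clock edge where both
# VALID and READY are high.

def simulate_channel(valid_trace, ready_trace):
    """Return the cycle indices on which a beat is transferred."""
    beats = []
    for cycle, (valid, ready) in enumerate(zip(valid_trace, ready_trace)):
        if valid and ready:  # handshake completes this cycle
            beats.append(cycle)
    return beats

# Master asserts VALID on cycle 1; slave is not READY until cycle 3.
valid = [0, 1, 1, 1, 0]
ready = [0, 0, 0, 1, 1]
print(simulate_channel(valid, ready))  # → [3]
```

    Multiply this by the five channels of a full AXI interface (read/write address, read/write data, write response) and the port count of an AXI-compliant IP becomes clear.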

    In the previous post, I showed you the IP block diagram for the RISC-V architecture that uses auto-generated logic circuits. For the demo that follows, I have developed another IP block that drives the 7-segment LED display of the Basys 3 development board. The new IP block diagram that includes this LED control block is shown below.

    AXI Architecture for Simulation

    Because simulation is distinct from synthesis, it is handy to have a separate Vivado block diagram for simulation. The reason is that we need a lot of infrastructure to generate the AXI bus signals, and this is already provided by the big players who have skin in the AXI game: AMD, ARM, and the like. It would be very difficult to toggle all AXI signals of a transaction by hand (remember that an AXI-compliant IP has tens of ports; one has to drive a 32-bit address bus, for example). Thankfully, this is already done by an IP block that contains only simulation code, called the “AXI Verification IP”. The resulting block diagram for the simulation is shown below.

    It turns out that connecting to such a beast as an AXI bus cannot be done with the relatively basic simulation primitives of the original Verilog. One needs more abstraction, and it is provided by the dynamic features of SystemVerilog, which extends the original Verilog for system verification. The code excerpt below shows the gist of the test bench and illustrates how easy it is to generate an AXI transaction.

    `timescale 1ns / 1ps
    
    import axi_vip_pkg::*;
    import design_2_axi_vip_0_0_pkg::*;
    
    module tb_tof_2481_axi;
      // ⋮ (module definition and DUT instantiation)
      initial begin
          // ⋮ (code omitted)
          master = new("master", dut.design_2_i.axi_vip_0.inst.IF);
          master.start_master();
          // ⋮ (code omitted)
          master.AXI4LITE_WRITE_BURST(32'h44A0_0000,
                                      3'b000,
                                      32'h0000_BEEF,
                                      resp);
          $display("WRITE 0 resp=%0d", resp);
          master.AXI4LITE_WRITE_BURST(32'h44A0_0004,
                                      3'b000,
                                      32'h0000_FFFF,
                                      resp);
          $display("WRITE 1 resp=%0d", resp);
          // ⋮ (code omitted)
          $finish;
      end
    endmodule

    Having concocted the above test bench for sending the input signals to the simulation of the AXI-connected 7-segment LED display, we can click in Vivado, and lo and behold, waveforms come out.

    Of course, to test and validate the seven-segment LEDs further, one could convert the individual signals driving the LEDs to hexadecimal digits and compare what the LEDs show to what the AXI master sent (in our case the hexadecimal value 0xBEEF). But the journey toward model checking, automatic diagnostics, and testing is more interesting, and we have discussed LEDs more than enough.
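    For the curious, such a checker is not hard to sketch. Below is a toy Python decoder that maps segment-drive patterns back to hex digits and folds the four display digits into one value; the segment bit order and polarity are my assumptions, not code from the repository (on the Basys 3 the segment outputs are active-low, so invert them first):

```python
# Sketch of a checker for 7-segment output (illustrative only).
# Bit order is abcdefg with bit 6 = segment a, active-high after
# inverting the board's active-low drive signals.

SEG_TO_HEX = {
    0b1111110: 0x0, 0b0110000: 0x1, 0b1101101: 0x2, 0b1111001: 0x3,
    0b0110011: 0x4, 0b1011011: 0x5, 0b1011111: 0x6, 0b1110000: 0x7,
    0b1111111: 0x8, 0b1111011: 0x9, 0b1110111: 0xA, 0b0011111: 0xB,
    0b1001110: 0xC, 0b0111101: 0xD, 0b1001111: 0xE, 0b1000111: 0xF,
}

def decode_display(patterns):
    """Fold a list of per-digit segment patterns into one hex value."""
    value = 0
    for p in patterns:
        value = (value << 4) | SEG_TO_HEX[p]
    return value

# The four digits of 0xBEEF as segment patterns: b, E, E, F
digits = [0b0011111, 0b1001111, 0b1001111, 0b1000111]
assert decode_display(digits) == 0xBEEF
```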

    Reflection

    FPGA programming tool chains and Vivado are a hairball of design by committee: we do something this way because we did it the same way when we were young, when we biked ten miles to school every day (uphill both ways and into the wind). On the more positive side, these are complex tools, they work, and people use them for their digital designs.

    All that being said, our understanding of both the theory and practice of computing has improved dramatically since the mid-20th century, and it is time to revisit these old ways of designing and implementing circuits. If we do this carefully, maybe, just maybe, we will design hardware, software, and even AI algorithms that we are not ashamed to write about and use.

    What is Next?

    In what follows, we will grow our demo and I will show you how crappy the actual FPGA implementation produced by Vivado is. We will also discuss more accurate simulations, clocks, frequencies, timing analysis, and what proper AI algorithms look like (not the ones everybody is discussing, but ones that are not used for fraud and deception).

    The Real Deal

    Unlike what is common practice these days in most of Silicon Valley, everything I talk about is accessible and reproducible. So here is the repository that allowed me to write this article and make the demo:
    https://gitlab.llama.gs/llogic_basys3

    Ceterum censeo slopem esse delendam.

    (Cato the Elder ended every speech in the Roman Senate with “Carthage must be destroyed” — regardless of the topic. This is that, but for AI slop.)

  • Teaching a Language to Think in Hierarchies

    Bitcoin miners are liquidating their holdings to pivot into AI hosting. The machines that wasted electricity producing imaginary money will now waste it producing imaginary intelligence. Anthropic has secured 3.5 gigawatts of compute — the consumption of three and a half million households — to serve language models.

    GCC compiles the entire Linux kernel in fifteen minutes on a single machine drawing 200 watts. Fifty watt-hours. A light bulb left on for an afternoon. It manages this because it is not guessing. It has a grammar, a type system, and an optimisation pipeline where every transformation preserves semantics. There is no temperature parameter. There is no “try again and hope.”

    A compiler’s cost is \(O(n \log n)\) in the size of the input. A language model’s cost is \(O(n \cdot d)\) where \(d\) is the dimensionality of a model that cannot tell you whether the answer is correct. When the task has a formal specification, you do not need gigawatts. You need a parser.
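    The arithmetic behind these numbers is simple enough to spell out (my own back-of-the-envelope; the per-household draw is an assumed average, not a figure from the text):

```python
# Back-of-the-envelope check of the energy claims above.

build_power_w = 200          # single build machine
build_time_h = 15 / 60       # fifteen minutes
build_energy_wh = build_power_w * build_time_h
print(build_energy_wh)       # → 50.0 watt-hours

datacenter_power_w = 3.5e9   # 3.5 GW of secured compute
household_power_w = 1000     # ~1 kW average per household (assumption)
print(datacenter_power_w / household_power_w)  # → 3500000.0 households
```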

    I have been writing parsers for twenty years. Today I started improving the one that matters most: the circuit description language at the heart of llogic, qbf-designer, and the formal methods toolchain I am building at Llama Logic.

    My first encounter with a compiler was at Zend Technologies in Ramat Gan in 2000. I was twenty-two, fresh off the plane from Bulgaria, and I did not know what a parser was. Zend built the PHP language engine. I watched a small team turn a grammar into a working language that ran half the web. I did not understand how.

    A few years later, at Delft, I read the Dragon Book and took Koen Langendoen’s compiler construction course. We became friends over my many years at the university. That course turned out to be one of the most useful things I have ever learned. It is the skill that lets me write software that works — not approximately, not statistically, not when the vibes are right, but deterministically, on all inputs, by construction.

    It is also how I got into diagnosis. At the end of my master’s I went to Koen and asked for a Ph.D. position in compiler construction. He told me “compilers are passé” — but I could go work with Arjan J.C. van Gemund doing diagnostics. Arjan has since retired north to compose music, which is a better use of a fine mind than supervising Ph.D. students, though he was good at both. They needed a compiler for LyDiA, the diagnostic modelling language. So I built one. Then I built many more. Every research system I have worked on since — LyDiA, the DXC framework at NASA Ames, the synthesis tools at PARC, and now llogic — has a parser at its core. The compiler is never the point. The compiler is always the point.

    A domain-specific language is a small language built for one job. SQL is a DSL for databases. Regular expressions are a DSL for pattern matching. Makefiles are a DSL for build dependencies. You do not write an operating system in SQL. You do not query a database with a Makefile. The language fits the problem, and because it fits, it can enforce constraints that a general-purpose language cannot.

    This is the point that the vibe-coding movement misses entirely. A grammar is not a convenience. It is a contract. When I write a parser for a circuit description language, the grammar specifies exactly what constitutes a valid circuit. If you misspell a gate type, the parser rejects your input. If you connect an output to a nonexistent signal, the parser tells you. If you instantiate a module that does not exist, you get an error message with a line number — not a plausible-looking circuit that silently computes the wrong function.

    This is what determinism means in practice. The parser either accepts or rejects. There is no 95% confidence. There is no temperature. The same input produces the same result every time, on every machine, for every user. A QBF solver receiving a malformed netlist will produce garbage. A diagnosis engine receiving an inconsistent model will compute meaningless results. The parser is the gate that keeps garbage out. It costs milliwatts. It works.
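    To illustrate the “parser as gate” idea, here is a toy checker for a netlist format of my own invention (deliberately far simpler than the llogic grammar): it rejects misspelled gates and undriven signals deterministically, with line numbers, exactly the failure modes described above.

```python
# Toy netlist checker (my own mini-format, not llogic's grammar).
# Lines look like:  out = gate(in1, in2)

import re

GATES = {"and", "or", "xor", "not", "dff"}
LINE = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\((\s*\w+(\s*,\s*\w+)*)\)\s*$")

def check_netlist(text, inputs):
    """Reject unknown gates and undriven signals, with line numbers."""
    defined = set(inputs)
    for lineno, line in enumerate(text.strip().splitlines(), start=1):
        m = LINE.match(line)
        if not m:
            raise SyntaxError(f"line {lineno}: cannot parse {line!r}")
        out, gate, args = m.group(1), m.group(2), m.group(3)
        if gate not in GATES:
            raise SyntaxError(f"line {lineno}: unknown gate {gate!r}")
        for arg in (a.strip() for a in args.split(",")):
            if arg not in defined:
                raise SyntaxError(f"line {lineno}: undriven signal {arg!r}")
        defined.add(out)
    return defined

good = "s = xor(a, b)\nc = and(a, b)"
print(sorted(check_netlist(good, {"a", "b"})))  # → ['a', 'b', 'c', 's']

try:
    check_netlist("s = xore(a, b)", {"a", "b"})  # misspelled gate type
except SyntaxError as e:
    print(e)  # → line 1: unknown gate 'xore'
```

    Same input, same result, every time. No temperature parameter in sight.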

    There is a second reason, less often discussed. Humans need to read these things. An engineer debugging a faulty adder needs to look at the circuit description and understand it. A reviewer verifying a synthesis result needs to confirm that the specification matches the intent. This is not a machine-to-machine format. It is a language — with the same design obligations as any language: clarity, consistency, and the ability to say exactly what you mean and nothing else.

    The circuit DSL in llogic had outgrown its grammar. The new format adds modules, arrays, imports, and arbitrary nesting. Here is an adder family, from half-adder primitives to a 4-bit module with array slicing:

    # 4-bit ripple carry adder
    
    import "std_logic.circ"
    
    module half_adder(input a, b; output s, c):
        x: s = xor(a, b)
        a: c = and(a, b)
    end
    
    module full_adder(input a, b, ci; output s, co):
        wire f, p, q
    
        inst half_adder ha1(a=a, b=b, s=f, c=p)
        inst half_adder ha2(a=ci, b=f, s=s, c=q)
        o: co = or(p, q)
    end
    
    module adder2(input a[2], b[2], ci; output s[2], co):
        wire c0
    
        inst full_adder bit0(a=a[0], b=b[0], ci=ci, s=s[0], co=c0)
        inst full_adder bit1(a=a[1], b=b[1], ci=c0, s=s[1], co=co)
    end
    
    module adder4(input a[4], b[4], ci; output s[4], co):
        wire cm
    
        inst adder2 lo(a=a[0:1], b=b[0:1], ci=ci, s=s[0:1], co=cm)
        inst adder2 hi(a=a[2:3], b=b[2:3], ci=cm, s=s[2:3], co=co)
    end

    Four levels of nesting. Modules, arrays, slices, named connections. The flattener — a recursive tree walk, the same algorithm I used in LyDiA for system descriptions — traverses the instantiation tree and emits the flat netlist the solver has always consumed. The hierarchy is for the engineer. The solver does not know it exists.
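    The flattening walk itself fits in a few lines. Here is a toy Python version over a hand-rolled module table (my own illustration; the real llogic data structures differ):

```python
# Toy flattener: a recursive walk over the instantiation tree that
# renames signals by instance path (or port binding) and emits one
# flat gate list. Module table mirrors the half/full adder above.

MODULES = {
    "half_adder": {
        "gates": [("s", "xor", ["a", "b"]), ("c", "and", ["a", "b"])],
        "insts": [],
    },
    "full_adder": {
        "gates": [("co", "or", ["p", "q"])],
        "insts": [
            ("ha1", "half_adder", {"a": "a", "b": "b", "s": "f", "c": "p"}),
            ("ha2", "half_adder", {"a": "ci", "b": "f", "s": "s", "c": "q"}),
        ],
    },
}

def flatten(module, prefix="", binding=None, out=None):
    """Walk the instantiation tree and emit a flat netlist."""
    binding = binding or {}
    out = [] if out is None else out
    rename = lambda sig: binding.get(sig, prefix + sig)
    for target, gate, args in MODULES[module]["gates"]:
        out.append((rename(target), gate, [rename(a) for a in args]))
    for inst, child, ports in MODULES[module]["insts"]:
        child_binding = {port: rename(net) for port, net in ports.items()}
        flatten(child, prefix + inst + ".", child_binding, out)
    return out

for gate in flatten("full_adder"):
    print(gate)
```

    The solver sees only the returned flat list; the hierarchy exists nowhere in it.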

    Sequential circuits work the same way. A 4-bit serial adder with synchronous reset:

    # 4-bit serial adder with synchronous reset
    
    module shift4(input d, rst; output q):
        wire d1, d2, d3
    
        f1: d1 = dff(d, rst)
        f2: d2 = dff(d1, rst)
        f3: d3 = dff(d2, rst)
        f4: q = dff(d3, rst)
    end
    
    module seq_adder4(input a, b, rst; output s, co):
        wire i1, i2, ci
    
        inst shift4 sa(d=a, rst=rst, q=i1)
        inst shift4 sb(d=b, rst=rst, q=i2)
        inst full_adder fa(a=i1, b=i2, ci=ci, s=s, co=co)
        c: ci = dff(co, rst)
    end

    A dff with one argument is a plain register. Two arguments: synchronous reset. This maps directly to the standard Verilog template always @(posedge clk) if (rst) q <= 0; else q <= d; — making translation between the two languages mechanical.
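    That mechanical mapping can be sketched in a few lines of Python (the function name is my invention, not llogic2verilog’s API):

```python
# Sketch of the dff-to-Verilog mapping: the argument count selects
# the register template. Illustrative only.

def emit_dff(q, args):
    """Emit the Verilog always block for a dff gate driving q."""
    if len(args) == 1:                       # plain register
        d, = args
        return (f"always @(posedge clk)\n"
                f"    {q} <= {d};")
    d, rst = args                            # synchronous reset
    return (f"always @(posedge clk)\n"
            f"    if ({rst}) {q} <= 0;\n"
            f"    else {q} <= {d};")

print(emit_dff("q", ["d"]))
print(emit_dff("q", ["d", "rst"]))
```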

    So why not just use Verilog?

    Because Verilog is a simulation language that has been coerced into serving as a synthesis input. A synthesis tool reads an always block, pattern-matches the sensitivity list, and infers what is a register and what is combinational logic. The engineer writes behaviour and hopes the tool’s heuristics match their intent. In llogic, a dff is a dff. An and is an and. There is no inference. The circuit says what it is.

    This matters for formal methods. Diagnosis requires knowing exactly what components exist. Synthesis requires a precise specification of the design space. Neither tolerates a language that hides structure behind inference rules. Verilog is the right language for RTL designers who want to describe behaviour and let tools figure out the structure. Llogic is the right language when the structure is the point.

    The parser, AST, and flattener should take a few days. When they are done I will update the llogic repository on the feature/hierarchical-dsl branch.

    Three and a half million households’ worth of electricity to serve a model that cannot tell whether it is thinking deeply or not. Fifty watt-hours to compile a kernel. Considerably less to parse a circuit. The tools that work have always been quiet, small, and correct. The software will continue to not hallucinate.

    Ceterum censeo slopem esse delendam.


    Repository: llogic

  • From DSL to FPGA: Closing the Loop

    This quarter, 4,500 CEOs told PwC their AI investments produced nothing. Separately, someone used AI to rewrite SQLite in Rust — 2,000 times slower. I have a cunning plan: what if we used computers to do actual computing?

    Last week I said I was building a toolchain that goes from formal logic to real hardware. That post was a manifesto. This one is a receipt.

    Seven days later: a full ALU — add, subtract, multiply, divide, integer factorization — running at 100MHz on a $150 Basys3 FPGA. UART command line with tab completion and history. No manual Verilog. No hand-optimized netlists. No venture capital. No pitch deck.

    The arithmetic circuits are generated programmatically in llogic, translated to synthesizable Verilog by llogic2verilog, and deployed on a MicroBlaze soft processor over AXI4-Lite. The entire path from logical specification to working silicon is automated. One person, one week, open source.

    The factor command brute-forces integer factorization by driving the multiplier at clock speed — 100 million candidates per second on a hobby board. Not a simulation. Not a testbench. Electrons moving through gates on a Xilinx Artix-7.
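    In host-side form, the brute force looks like this (my own Python rendering; on the board the candidate test is a hardware multiply-and-compare running at clock speed, not a modulo):

```python
# Trial-division factorization, as the factor command does in
# hardware (illustrative software rendering).

def factor(n):
    """Return the smallest nontrivial factor pair of n, or None."""
    for candidate in range(2, n):
        if candidate * candidate > n:        # passed sqrt(n): n is prime
            return None
        if n % candidate == 0:               # hardware compares products instead
            return candidate, n // candidate
    return None

print(factor(0xBEEF))  # → (3, 16293)
```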

    Nobody wrote this Verilog

    That’s the point. Not the ALU — any undergraduate can write an adder. The point is that no human touched the HDL.

    The circuit specifications live in llogic’s DSL — a formal representation that spans Boolean formulas, CNF, circuits, and reversible/quantum circuits under one roof. lcfgen generates parameterized circuit families from that representation. llogic2verilog translates them to synthesizable Verilog. Vivado takes it the rest of the way.

    llogic DSL → lcfgen → llogic2verilog → Vivado → FPGA

    Every step automated. Every component open source. No license fees, no NDAs, no EDA vendor lock-in.

    What’s next

    If you work on synthesis, hardware, or you’re funding research — the code is open and the board costs $150.

    Next post: cryptographic circuit generators for DES and SHA, synthesized from the DSL, deployed to the FPGA. After that: an open-source architecture for SHA-1 collision hunting that makes Bitcoin’s address space look rather less comfortable. All designs public — because if the vulnerability exists, pretending otherwise is just poor manners. Any coins found can fund something useful. Clean energy. Quantum computing. Not espresso machines with a subscription model.

    Ceterum censeo slopem esse delendam.


    Repositories: