Presentation at Northeastern University, November 2008
It's Not The Tools, It's the Language
The Effective Depreciation of RTL
through Bluespec SystemVerilog
Previously: "An Individual Engineer's Opinion of the Most Usefully-Disruptive Change to the Business of Circuit Design since RTL"
Agenda
- Motivation
- Landscape: RTLs and ESLs
- Language Features
- Target Independence
- Exploration, Productivity and Correctness
- Abstract Inside
- Platform Blue
- Q&A
Motivation
- Digital Systems Design: Need to do more with more, in less time...
- "The Productivity Gap"
- Evolution of schematics, netlists, PALs, ASICs, FPGAs, RTLs...
- Now what?
Productivity
Time-to-Proficiency --> Time-to-Solution
Correctness and Scalability
Enjoyable Practice: Innovation can be fun!
Let the compiler do the mundane, error-prone parts
RTLs
- VHDL and Verilog have been around for a long time; SystemVerilog (SV) is more contemporary
- Established and Stable: are not going to go away (or change much)
- Can describe behavior and/or structure
- An excellent intermediate platform (stable and sufficient)
- Think of RTL as a technology-independent "byte code"
- Lack contemporary software-language features
- Will that signal/variable/wire be a register's output?
- RTLs infer registers based on assignments from clocked, concurrent processes
- Clock cycle-based abstraction scales poorly
- Fine for simple circuits in isolation
- System-level reasoning about clocks can be challenging
RTL Balance Sheet
- Pros
- Established and stable
- Can express behavior or structure
- Today's mainstream design expression for FPGA/ASIC
- Sequential, imperative process semantics for verification
- Cons
- Lack contemporary (software) language features
- Abstracting beyond the clock-cycle (e.g. TLM) is difficult
- Not so concise (perhaps 5x as many source lines vs. 'C')
- Sequential, imperative expression difficult to synthesize
ESL: "C to Gates"
- Most C-to-Gates ESLs are CDFG elaborators
- Function signature translated to circuit inputs and outputs
- Constraint-driven CDFG parallelization and scheduling
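To make the scheduling step concrete, here is a minimal Python sketch of as-soon-as-possible (ASAP) scheduling over a small data-flow graph. The graph, the operation names, and the one-cycle latency per operation are all illustrative assumptions; real C-to-Gates tools also handle control flow and resource constraints, which this sketch ignores.

```python
# Minimal ASAP scheduling sketch for a data-flow graph (illustrative only).
# Hypothetical example: y = (a * b) + (c * d), then z = y - e.
# Each operation maps to the list of operations it depends on.
deps = {
    "mul1": [],               # a * b
    "mul2": [],               # c * d
    "add": ["mul1", "mul2"],  # y
    "sub": ["add"],           # y - e
}


def asap_schedule(deps):
    """Assign each op the earliest cycle after all of its inputs are ready.

    Assumes every operation takes exactly one cycle and that hardware
    resources are unlimited (both are simplifying assumptions).
    """
    cycle = {}

    def visit(op):
        if op not in cycle:
            # Ops with no dependencies land in cycle 0.
            cycle[op] = 1 + max((visit(d) for d in deps[op]), default=-1)
        return cycle[op]

    for op in deps:
        visit(op)
    return cycle


schedule = asap_schedule(deps)
for op in sorted(schedule, key=lambda o: (schedule[o], o)):
    print(f"cycle {schedule[op]}: {op}")
```

The two multiplies share cycle 0, which is the spatial parallelism the tool extracts from sequential source; the add and subtract remain serialized by their data dependencies.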

C-to-Gates ESL Balance Sheet
- Pros
- Familiar syntax (C or C-like)
- Familiar behavioral expression
- Cons
- (Possibly) Non-ANSI language features (e.g. pragmas)
- Difficult to predict FMAX and Area (historically poor)
- A sequential, imperative expression of a concurrent behavior...
- C-to-gates does not bring "scalable atomicity" any more than an imperative language supports threads. In some cases it may propagate "The Problem with Threads" [Lee2006] into the hardware camp.
Sidebar: Spatial v. Temporal

[DeHon2000]
- Digital design is a trade-space balancing act
- Revisit your motivation for a h/w vs. s/w solution
- Most FPGA applications require some spatial expression to achieve sufficient speedup over common (x86) ISAs
Segue and Analogy
- If not RTL, then what? And why?
- "Balance" and "simplicity" often matter
- Just because a new technology excels at something doesn't make it necessary
- RTL is certainly good enough for many things; like C is good enough for Hello, world.
- But what if you have a tough problem that needs to scale?
A Sudoku solver, for example: 'C', or your favorite OOP-lang?
Problem: Abstraction and Detail
- How does one reconcile the seemingly conflicting objectives of
A) high-level, abstract, scalable, component-based design
with
D) complete control over every single bit of state in the system?
- Why? Both are important concerns!
A) Correct-by-Construction, Scalability, Reuse
D) FMAX and Area (LUTs, FFs) matter; µArchitecture matters
(Note: Quick dive into the deep end, hold on)
Solution: Scalable Atomicity (1/2)
- Scalable Atomicity is the ability to reason about discrete, step-wise, "atomic" behaviors independent of scale
- Consider what actions govern the change of state of the following circuit elements:
A single flip-flop
A counter
A FIFO
A completion buffer
An endpoint or root-complex
A chip, board, or system-level aggregation of these and other circuits
- RTL View: Clock-cycle reasoning is a super-linear effort
With more bits, modules, and interactions comes an exponentially greater design and verification burden
Solution: Scalable Atomicity (2/2)
- Scalable Atomicity in a design language is the ability to describe atomic, state-changing, rule-step actions at any level of hierarchy
Updating the state of a flip-flop
ENQ or DEQ from a FIFO
An interface transaction
A system-level transaction
- Rules either execute completely ("fire") or not; consequently
Behavior (and changes to behavior) are concise
Complex interactions are no longer intractable
(We dove into the deep end, now back to basics)
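The all-or-nothing behavior of a rule can be modeled in a few lines of Python. This is only a software analogy: the `fire_atomically` helper, the `transfer` rule, and the dictionary used as state are hypothetical names chosen for illustration, and the shadow-copy approach is not a claim about how any compiler implements rules.

```python
import copy


class GuardFailed(Exception):
    """Raised when an action inside a rule cannot proceed."""


def fire_atomically(state, rule):
    """Run `rule` on a shadow copy; commit only if every action succeeds."""
    shadow = copy.deepcopy(state)
    try:
        rule(shadow)
    except GuardFailed:
        return False          # rule did not fire; state is untouched
    state.clear()
    state.update(shadow)      # rule fired; all updates become visible together
    return True


def transfer(s):
    """Hypothetical rule: DEQ from fifo_in, ENQ to fifo_out, bump a counter."""
    s["count"] += 1
    if not s["fifo_in"]:
        raise GuardFailed("fifo_in is empty")
    item = s["fifo_in"].pop(0)
    if len(s["fifo_out"]) >= s["depth"]:
        raise GuardFailed("fifo_out is full")
    s["fifo_out"].append(item)


state = {"fifo_in": [7, 8], "fifo_out": [], "depth": 1, "count": 0}
first = fire_atomically(state, transfer)    # fires: 7 moves across
second = fire_atomically(state, transfer)   # fifo_out is full: nothing changes
print(first, second, state)
```

On the second attempt the counter increment and the DEQ are both discarded along with the failed ENQ, so there is no partially updated state to reason about at any level of hierarchy.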
Bluespec SystemVerilog (BSV)
- Research in term rewriting systems (TRS), λ-calculus, and functional programming helped set the stage for BSV
- Bluespec Inc. sells a compiler that translates the BSV language into technology-agnostic IEEE Verilog RTL
- BSV looks similar to SV syntax
Uses as much SV syntax as practical
No procedural always@(posedge CLK) blocks (rules instead)
Interfaces are composed of methods (rule semantics again)
Rules
- Rules make it easier to reason about segments of code that can change the state of a circuit
- They either FIRE or they don't: all or nothing --> Atomic!
- Rules can have explicit conditions, like a predicate
- And implicit conditions, such as the readiness of a method within the rule's context
- The BSV compiler generates a deterministic schedule to resolve rule and resource conflicts
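As a rough illustration of explicit conditions, implicit conditions, and a deterministic schedule, consider the toy Python model below. Everything in it (`Fifo`, `Rule`, `run_cycle`, the resource names, and the fixed list-order priority) is an assumption made for this sketch; the actual BSV compiler performs a far more thorough static analysis.

```python
class Fifo:
    """Tiny FIFO model whose methods carry readiness (implicit) conditions."""

    def __init__(self, depth):
        self.depth = depth
        self.items = []

    def can_enq(self):
        return len(self.items) < self.depth

    def can_deq(self):
        return len(self.items) > 0


class Rule:
    def __init__(self, name, explicit, implicit, action, resources):
        self.name = name
        self.explicit = explicit      # the designer's predicate
        self.implicit = implicit      # readiness of the methods the rule calls
        self.action = action
        self.resources = resources    # shared resources the rule touches

    def will_fire(self):
        return self.explicit() and self.implicit()


def run_cycle(rules):
    """Fire enabled rules in fixed priority order, skipping resource conflicts."""
    enabled = [r for r in rules if r.will_fire()]  # guards see start-of-cycle state
    used, fired = set(), []
    for rule in enabled:              # list order is the deterministic priority
        if not (rule.resources & used):
            rule.action()
            used |= rule.resources
            fired.append(rule.name)
    return fired


fifo, out = Fifo(depth=2), []
rules = [
    Rule("produce_a", lambda: True, fifo.can_enq,
         lambda: fifo.items.append("a"), {"fifo.enq"}),
    Rule("produce_b", lambda: True, fifo.can_enq,
         lambda: fifo.items.append("b"), {"fifo.enq"}),
    Rule("consume", lambda: len(out) < 2, fifo.can_deq,
         lambda: out.append(fifo.items.pop(0)), {"fifo.deq"}),
]
trace = [run_cycle(rules) for _ in range(5)]
print(trace)
```

In this model `produce_b` never fires because it always loses the ENQ port to `produce_a`, `consume` stops once its explicit predicate goes false, and in the last cycle nothing fires because the full FIFO makes the implicit ENQ condition false.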
Atomicity and Energy Efficiency
- Consider any sequential circuit that has a requirement for computational energy efficiency
- If you could wrap-up the costly operation in a rule, which only "fires" when there is real work to be done...
- Reasoning about energy efficiency might be easier
Whether you actually produce a circuit that is more or less efficient is up to your design
- Generally, expect some improvement
Easy to apply VSS or VDD when not fired
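One way to reason about this is to count how often the costly operation actually runs. The Python sketch below uses a made-up valid/data stream and treats the number of active cycles as a crude proxy for switching activity; that proxy, the `run` function, and the squaring workload are assumptions for illustration, not a power model.

```python
# Illustrative only: a valid/data stream where most cycles carry no work.
stream = [(False, 0), (True, 5), (False, 0), (False, 0), (True, 9), (False, 0)]


def run(stream, guarded):
    """Return (accumulator, number of cycles the costly operation ran)."""
    acc, active = 0, 0
    for valid, data in stream:
        if guarded and not valid:
            continue              # rule does not fire: no state change this cycle
        acc += data * data if valid else 0
        active += 1               # the costly datapath was exercised
    return acc, active


print("unguarded:", run(stream, guarded=False))
print("guarded:  ", run(stream, guarded=True))
```

Both versions compute the same result, but the guarded version exercises the datapath only on the two cycles that carry real work.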
BSV Language
"... there are two ways of constructing a design: One way is to make
it so simple that there are obviously no deficiencies and the other
way is to make it so complicated that there are no obvious deficiencies.
The first method is far more difficult."
Tony Hoare, 1980 ACM Turing Award Lecture
- BSV brings many of the features of its Haskell-based roots
It is concise
It has powerful type-classes (polymorphic interfaces)
It is functional programming with explicit state
As a result, initial "Time-to-Proficiency" with the language is an issue for some
(almost) Everything Synthesizes
- Unlike RTL, BSV does not have a "synthesizable subset"
- If the interface can be ripped to bits, it can be synthesized
Minor exceptions ($display calls, file I/O)
- Allowing the designer to easily perform unit-level regressions as part of the edit-compile-build cycle provides:
Reduction in the number of unit-level defects
Continuous feedback on FMAX and area
A comfort zone "sandbox" in which to explore
Target Substrate Independence
- The BSV compiler emits technology-agnostic RTL
- If you create a vector of registers and incrementally nudge up the width and depth of the register file, the RTL will look substantially the same...
- ...but if you observe the technology mapping from the RTL synthesis tool { Synplify | XST | Quartus }, you will see the implementation change from discrete registers, to distributed RAM, to BRAM or MRAM
- BSV source codes are insulated from substrate-specific features, unless they wish to expose them
Exploration, Productivity, and Correctness
- Beyond any specific feature, BSV allows the digital circuit designer to easily explore and measure alternatives
With RTL, you often have so much "housekeeping" that a specific µArchitecture has all but been pre-selected from the start: you get what you get
With BSV, I find myself frequently exploring different architectures for the same module and often gaining valuable insights I would otherwise have motored past
- BSV allows the designer to separate correct "function" from correct "performance"
Verification Impact: Fewer crosscutting concerns
Abstract Inside
- Get-Put semantics allow SDF/KPN behavior without presupposing any particular (e.g. FIFO) implementation
- Core IP may be coded agnostic to any particular protocol
- Healthy exercise to help precipitate the designer's distinction of interface and implementation characteristics
- Still more helpful orthogonalization
Adjust interfaces separately from implementation
Verification: Test interfaces separately from implementation
- Late-stage ECO and tech refresh costs lowered
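The separation of interface from implementation can be sketched in Python with two structural interfaces. The names here (`Get`, `Put`, `FifoChannel`, `RegisterChannel`, `doubler_step`) are hypothetical, and the explicit `can_get`/`can_put` methods stand in for the readiness conditions that a rule-based language would attach to the methods implicitly.

```python
from collections import deque
from typing import Protocol


class Get(Protocol):
    def can_get(self) -> bool: ...
    def get(self) -> int: ...


class Put(Protocol):
    def can_put(self) -> bool: ...
    def put(self, value: int) -> None: ...


class FifoChannel:
    """One possible implementation: a bounded FIFO."""

    def __init__(self, depth: int) -> None:
        self._q = deque()
        self._depth = depth

    def can_put(self) -> bool:
        return len(self._q) < self._depth

    def put(self, value: int) -> None:
        self._q.append(value)

    def can_get(self) -> bool:
        return bool(self._q)

    def get(self) -> int:
        return self._q.popleft()


class RegisterChannel:
    """Another implementation: a single register with a full flag."""

    def __init__(self) -> None:
        self._value, self._full = 0, False

    def can_put(self) -> bool:
        return not self._full

    def put(self, value: int) -> None:
        self._value, self._full = value, True

    def can_get(self) -> bool:
        return self._full

    def get(self) -> int:
        self._full = False
        return self._value


def doubler_step(src: Get, dst: Put) -> bool:
    """Core logic: fires only when a token is available and there is room."""
    if src.can_get() and dst.can_put():
        dst.put(2 * src.get())
        return True
    return False


def run(make_channel):
    """Push three tokens through the core using the given channel type."""
    src, dst = make_channel(), make_channel()
    results = []
    for token in [1, 2, 3]:
        src.put(token)            # channel is empty here, so put is safe
        doubler_step(src, dst)
        results.append(dst.get())
    return results


print(run(lambda: FifoChannel(2)), run(RegisterChannel))
```

The core `doubler_step` is written once against the interfaces and behaves the same with either channel, so the implementation can be adjusted or tested separately.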
Platform Blue
- Interoperability and performance test platform
- Common BSV source on different vendors' FPGA silicon
- Measure latency and throughput in different deployment scenarios

[Siegel2008]
3-Node V5 ML555 Scale-Up

[Siegel2008]
Learning More
- The materials on the Bluespec website [BSV] are good
- The 13-lecture, 3-day BSV training is much better
Appreciate the lectures and read the "reference-guide"
- MIT teaches BSV in Digital Systems 6.375
Online courseware and projects
- Actually solving your own (sufficiently hard) problem is best
Homework Question
- Give one way in which describing a design in Bluespec is superior to writing RTL level code. And give one way in which describing a design at the RTL level is superior to Bluespec.
Thank you!
Shepard Siegel, CTO
Atomic Rules LLC
Acknowledgements
These companies (alphabetically) have provided support for this work:
Altera: FPGA dev kit and Quartus software
Bluespec: BSV compiler and Bluespec workstation
Mentor: ModelSim simulator
PLX: PCIe switch models
Synopsys: Synplicity/SynplifyPro synthesis
Xilinx: FPGA dev kit and ISE software
References
[BSV] Bluespec Inc. www.bluespec.com
[Lee2006] The Problem With Threads
[DeHon2000] The Density Advantage of Configurable Computing, Andre DeHon, IEEE Computer, April 2000
[Rishiyur2007] Bluespec Sudoku
[Siegel2008] Author's blog www.atomicrules.blogspot.com/