Designing and Configuring Custom, Ultra-Low Power FPGAs

Seyi Ayorinde
University of Virginia
February 17th, 2015
Motivation: Low-power sensors in Ubiquitous Computing

- Requirements
  - Low Power/Energy Consumption
  - Substantial Processing Capability
  - Flexible Hardware
  - Low Development and Deployment Cost

http://www.valencell.com/blog/2013/12/wearable-technology-all-about-people

Current Options

- Build System w/ Commercial-Off-The-Shelf (COTS) parts
  - Flexible, but too high power consumption and size
- Build ultra-low power (ULP) SoCs
  - Efficient and powerful, but inflexible

**Problem** – neither of these options fulfills all of the requirements for pervasive low-power sensing

**Solution** – design of ULP Field Programmable Gate Arrays (FPGAs) for balance between efficiency and flexibility
Outline

- Motivation
  - Ultra-Low Power FPGAs
  - FPGA Background
  - Custom-FPGA Design
- Thrust 1: FPGA Sub-Circuit Design Exploration
- Thrust 2: FPGA Architecture Re-examination
- Thrust 3: RCGC – Reconfigurable Circuit Generation and Configuration
- Thrust 4: Embedded FPGAs in ULP SoCs
- Timeline & Publications
- High Level Impact
FPGA Background

Connection Boxes (CBs)

Configurable Logic Blocks (CLBs)

Global Interconnect

Switch Boxes (SBs)
Motivation – Custom-FPGA Design

- Circuit-level and architectural optimizations for ULP FPGAs need to be tested at the system-level
  - Build full FPGA schematic
  - Configure FPGA schematic

- Problems
  - Building FPGA schematics by hand is infeasible
    - # of transistors
    - # of design knobs
  - No tools for configuration
    - Commercial tools only work for specific hardware
    - Open-source tools (e.g., VTR) produce abstract FPGA mappings, not configuration bit locations
Proposed Solution

- **Toolflow – Reconfigurable Circuit Generation and Configuration (RCGC)**
  - Generate schematics of FPGA fabrics
  - Generate configurations for schematics
Thesis Statement(s)

- ULP FPGAs combine efficiency, flexibility, and computing capability to create a single, low-cost platform for ULP applications.
- ULP FPGA fabrics can also serve as small IP-blocks to create flexibility and low-overhead testability in ULP SoCs.
- Extending FPGA mapping tools to generate configurations and schematics for custom-FPGA fabrics allows thorough design verification and validation.
## ULP FPGAs in Industry/Academia

<table>
<thead>
<tr>
<th>FPGA</th>
<th>Size (# of LUTs)</th>
<th>Power (µW)</th>
<th>Configuration Bit Topology</th>
<th>Frequency (MHz)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Lattice iCE40(^1)</td>
<td>384-7680</td>
<td>Static: 21-250 Active: just under 1k(^7,8)</td>
<td>SRAM</td>
<td>275</td>
</tr>
<tr>
<td>Microsemi IGLOO nano(^1)</td>
<td>100-3000</td>
<td>Static: 2 Active: 400(^6)</td>
<td>Flash</td>
<td>160-250</td>
</tr>
<tr>
<td>Ryan et al [6](^2)</td>
<td>1134</td>
<td>Static: ~35(^3,4) Active: ~12.5(^3,4)</td>
<td>5T-SRAM</td>
<td>~33(^3)</td>
</tr>
<tr>
<td>Grossmann et al [7](^2)</td>
<td>128</td>
<td>Static: 8.9 Active: 34.6</td>
<td>6T Latch</td>
<td>16.7</td>
</tr>
<tr>
<td>Tuan et al [8](^2)</td>
<td>1500-15000</td>
<td>Static: 46-460 Active: 13k-130k</td>
<td>SRAM</td>
<td>244(^5)</td>
</tr>
</tbody>
</table>

1. Commercial ULP FPGAs
2. Academic ULP FPGAs
3. Estimated from plots in the paper
4. Simulation result of 780 LUTs
5. Reported approx. 27% reduction from Xilinx Spartan-3
6. Obtained from Microsemi Power Calculator worksheet
7. Mid-range iCE40 model
8. From a news article in EE Times: "Ultra-low power FPGAs enable always-on sensor solutions for context-aware mobile apps"
Motivation: FPGA Sub-Circuit Exploration

- **Problem:** FPGAs overlooked for ULP applications
  - High overhead for flexibility

- **Research Question:** How can we redesign the circuit elements in FPGAs to minimize power consumption, while still providing adequate functionality and performance for ULP applications?
Approach: FPGA Sub-Circuit Exploration

For each sub-circuit:

1. Survey subckts; run MC sims on any that are untested.
2. Compare subckts and check for a new optimal design.
3. If there are any new optimizations, build the FPGA w/ RCGC and run full-FPGA sims; if not, recommend existing subckts.
4. If the full FPGA is better than the state of the art, recommend new subckts; otherwise, recommend existing subckts.
Knobs: FPGA Sub-Circuit Exploration

- Circuit topology
  - Routing switches: pass gate, buffer, etc.
  - CLBs: intra-CLB connectivity
  - Configuration bits: SRAMs, latches, etc.
- Operating voltage
- Transistor type
  - High $V_T$, etc.
- Transistor sizing
- Path length (for routing switches)
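The knobs above multiply quickly, which is part of why hand-built schematics do not scale. A minimal sketch of enumerating such a design space follows; every knob value in it is a hypothetical placeholder chosen for illustration, not a value taken from this work.

```python
from itertools import product

# Hypothetical knob values, for illustration only.
KNOBS = {
    "switch_topology": ["pass_gate", "buffer"],
    "config_bit": ["sram", "latch"],
    "vdd": [0.4, 0.6, 0.8],            # operating voltage in volts (assumed)
    "vt_flavor": ["regular_vt", "high_vt"],
    "width_multiplier": [1, 2, 4],     # transistor sizing (assumed)
    "path_length": [1, 2, 4],          # routing switches in series (assumed)
}


def enumerate_design_points(knobs):
    """Yield one dict per combination of knob values."""
    names = list(knobs)
    for values in product(*(knobs[name] for name in names)):
        yield dict(zip(names, values))


points = list(enumerate_design_points(KNOBS))
print(len(points), "design points")
```

Even this small, made-up grid yields a few hundred candidate sub-circuit configurations, each of which would need its own simulation run.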
Metrics of Importance: FPGA Sub-Circuit Exploration

- Area
- Power consumption
- Energy consumption
- Robustness
  - Process, voltage, and temperature (PVT) variations
- Routability (for CLBs)
- Hold Margin (for configuration bits)
- Retention Voltage (for configuration bits)
CLB Topology Exploration

- **Mux-Based CLB**
  - Standard practice for FPGAs
  - Knob – depopulation

- **Mini-FPGA CLB**
  - Use FPGA-style connectivity for the CLB to connect BLEs
  - Knob – channel width

VPR version 5.0 manual

Ryan et al CICC ’10
Preliminary Results: Area CLB Topology Exploration

Small N - Mux-based CLBs minimize area

Large N - Mini-FPGA CLBs minimize area
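The crossover between the two topologies can be illustrated with a toy area model. The sketch below is not the area model used for these results: the cost functions, the `sb_cost` constant, and the rule for the number of CLB inputs are all assumptions made only to show why a break-even N exists (mux area grows roughly quadratically with N, mini-FPGA area roughly linearly).

```python
def mux_clb_area(n, k, depop):
    """Toy area of a mux-based CLB, counted in mux inputs.

    Assumes I = K*(N+1)/2 CLB inputs and that each of the N*K BLE inputs
    selects from (I + N) signals, thinned by the depopulation fraction.
    """
    clb_inputs = k * (n + 1) // 2
    return n * k * (clb_inputs + n) * (1.0 - depop)


def mini_fpga_clb_area(n, k, w, sb_cost=6):
    """Toy area of a mini-FPGA CLB: per-BLE cost scales with channel width W.

    sb_cost is a hypothetical per-track switch-box cost.
    """
    return n * w * (k + 1 + sb_cost)


def break_even_n(k, w, depop, max_n=64):
    """Smallest N at which the mini-FPGA CLB is no larger than the mux CLB."""
    for n in range(1, max_n + 1):
        if mini_fpga_clb_area(n, k, w) <= mux_clb_area(n, k, depop):
            return n
    return None  # no crossover within the searched range


for w in (2, 4):
    for depop in (0.0, 0.5):
        print(f"K=4 W={w} depop={depop:.0%}: N={break_even_n(4, w, depop)}")
```

Under these assumptions the break-even N rises with both channel width and depopulation. The absolute values depend entirely on the assumed cost functions.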
Preliminary Results: Area CLB Topology Exploration

<table>
<thead>
<tr>
<th>Channel Width</th>
<th>Break Even Points @ Different Depopulation %'s</th>
<th>K = 4</th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td>0%</td>
<td>50%</td>
<td>66%</td>
</tr>
<tr>
<td>2</td>
<td>Always Less</td>
<td>N = 4</td>
<td>N = 5</td>
<td>N = 6</td>
<td></td>
</tr>
<tr>
<td>4</td>
<td></td>
<td>N = 3</td>
<td>N = 8</td>
<td>N = 11</td>
<td>N = 14</td>
</tr>
<tr>
<td>6</td>
<td></td>
<td>N = 6</td>
<td>N = 11</td>
<td>N = 16</td>
<td>N = 22</td>
</tr>
<tr>
<td>8</td>
<td></td>
<td>N = 9</td>
<td>N = 15</td>
<td>N = 23</td>
<td>N = 29</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Channel Width</th>
<th>Break Even Points @ Different Depopulation %'s</th>
<th>K = 6</th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
<td>0%</td>
<td>50%</td>
<td>66%</td>
</tr>
<tr>
<td>2</td>
<td>Always Less</td>
<td>N = 2</td>
<td>N = 3</td>
<td>N = 4</td>
<td></td>
</tr>
<tr>
<td>4</td>
<td></td>
<td>N = 2</td>
<td>N = 4</td>
<td>N = 5</td>
<td>N = 8</td>
</tr>
<tr>
<td>6</td>
<td></td>
<td>N = 4</td>
<td>N = 6</td>
<td>N = 7</td>
<td>N = 11</td>
</tr>
<tr>
<td>8</td>
<td></td>
<td>N = 4</td>
<td>N = 6</td>
<td>N = 14</td>
<td>N = 16</td>
</tr>
</tbody>
</table>
Contributions: FPGA Sub-Circuit Exploration

- Survey of different techniques for design of FPGA sub-circuits for ULP operation
  - Configuration Bits
  - Routing switches
  - Configurable Logic Blocks (CLBs)
- Design space exploration across circuit-level and architectural knobs
- Recommendations for circuit-level optimizations for ULP FPGA design
Motivation: FPGA Architecture Re-examination

- **Problem:** Driving force for FPGA design in industry is performance
  - GHz performance
  - ULP applications - Low performance requirements (kHz – MHz)

- **Research Question:** How does the optimal FPGA architecture change with a different set of primary metrics, namely area and power consumption?
Approach: FPGA Architecture Re-examination
Knobs: FPGA Architecture Re-examination

- Intra-CLB architecture $(k, N)$
- Channel width $(W)$*
- Channel Fanout (FC)
  - Different for CLB inputs, CLB outputs, and I/O blocks
- Segment Length $(L)$
  - Commercial FPGAs – distributions of $L$
- Uni- vs. bi-directionality of interconnect wires
Metrics of Importance: FPGA Architecture Re-examination

- VTR Exploration
  - Channel Utilization
  - FPGA Size
  - Routing, Logic, and Total Area
  - Power consumption
  - Channel Width

- Simulation of generated FPGAs
  - Leakage Power
  - Total Power
  - Area
  - Energy/Op
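Energy/Op ties the power and frequency metrics together. As a rough illustration, the sketch below divides active power by clock frequency, using the Grossmann et al [7] row of the comparison table (34.6 µW active at 16.7 MHz) as input. It assumes one operation per clock cycle and that the reported power was measured at the reported frequency, neither of which is stated in the table.

```python
def energy_per_cycle_pj(active_power_uw, frequency_mhz):
    """Energy per clock cycle in picojoules.

    uW / MHz = (1e-6 W) / (1e6 Hz) = 1e-12 J, so the ratio is already in pJ.
    """
    return active_power_uw / frequency_mhz


# Values from the Grossmann et al [7] table row; one op per cycle is assumed.
print(round(energy_per_cycle_pj(34.6, 16.7), 2), "pJ/cycle")
```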
Contributions: FPGA Architecture Re-examination

- Thorough design space exploration of FPGA architectures across different knobs
- Recommendations for architecture parameters for ultra-low power FPGA design
- Both CAD- and simulation-based exploration
- Simulated comparisons of proposed architectures w/ current commercial and academic FPGA architectures
Motivation: RCGC

- **Research Question:** How can we extend available FPGA mapping tools to incorporate circuit-level parameters and configuration?
Approach: RCGC
Current Progress: RCGC

- Benchmark Circuit (.v)
  - VTR Flow
    - Virtual Mapping
    - Bitstream Generator
  - Architecture File Generator
    - Architecture File (.xml)
  - WL/BL Map (.txt)
    - Configuration Bitstream
    - Simulation Initial Condition Statements
    - Simulation Files
  - Parameters (Architecture and Circuit-level)
    - WL/BL Map Generator
    - Schematic Generator
      - FPGA Schematic
    - ED Curves

- Legend: Completed / Not Completed / In Progress / Other student
Contributions: RCGC

- Generates FPGA schematic from set of circuit-level and architectural parameters
- Enables rapid design space exploration (circuit-level & architecture)
- Generates configurations for custom-FPGAs
  - Initial conditions and configuration bitstream
- Enables architectural and circuit-level co-optimizations for full custom-FPGA
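To make the "initial conditions" output concrete, here is a minimal sketch of turning a configuration bitstream into SPICE `.ic` statements through a wordline/bitline (WL/BL) mapping. It is not RCGC's actual implementation: the row-major bit ordering, the node naming scheme `xfpga.xcfg_wl<i>_bl<j>.q`, and the 0.5 V supply are hypothetical choices made for illustration.

```python
def initial_condition_statements(bits, n_bitlines, vdd=0.5):
    """Map a flat list of 0/1 configuration bits to SPICE .ic lines.

    Bits are assumed to be stored row-major: index = wl * n_bitlines + bl.
    The node hierarchy below is a hypothetical naming convention.
    """
    lines = []
    for index, bit in enumerate(bits):
        wl, bl = divmod(index, n_bitlines)
        node = f"xfpga.xcfg_wl{wl}_bl{bl}.q"
        voltage = vdd if bit else 0.0
        lines.append(f".ic V({node})={voltage:g}")
    return lines


for line in initial_condition_statements([1, 0, 1], n_bitlines=2):
    print(line)
```

Presetting the storage nodes this way lets a full-fabric simulation start from a configured state instead of simulating the write of every configuration bit.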
Motivation: Embedded FPGAs in ULP SoCs

- **Problem:** ULP SoCs are effective, low-power solutions, but are inflexible and costly to update

- **Research Question:** Can embedding FPGA fabric in ULP SoCs improve flexibility while keeping the power consumption low enough to maintain ULP functionality?
Approach: Embedded FPGAs in ULP SoCs
Metrics of Importance: Embedded FPGAs in ULP SoCs

- FPGA Size
- Power Consumption
- Energy Consumption
- Testability
  - Resources necessary for node BIST
Contributions: Embedded FPGAs in ULP SoCs

- Body Sensor Node (BSN) algorithm implementations on ULP FPGA fabric
- Comparison between ASIC and FPGA implementations for BSN algorithms
- Recommendation of feasibility for FPGA implementation on ULP SoCs
- FPGA implementation of test structures for ULP SoCs
## Timeline

<table>
<thead>
<tr>
<th>Research Thrust</th>
<th>#</th>
<th>Task Description</th>
<th>Status</th>
<th>Related Publications</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Background</strong></td>
<td>1</td>
<td>Characterization of Commercial LP FPGAs</td>
<td>March '15</td>
<td>[OAA2]</td>
</tr>
<tr>
<td></td>
<td>2</td>
<td>Initial Routing Switch Exploration</td>
<td>Completed</td>
<td></td>
</tr>
<tr>
<td></td>
<td>3</td>
<td>Initial Sense Amp exploration</td>
<td>Completed</td>
<td></td>
</tr>
<tr>
<td><strong>Circuit-Level Optimization</strong></td>
<td>4</td>
<td>CLB Simulations</td>
<td>February '15</td>
<td>[OAA3]</td>
</tr>
<tr>
<td></td>
<td>5</td>
<td>5T Bitcell Testing</td>
<td>March '15</td>
<td></td>
</tr>
<tr>
<td></td>
<td>6</td>
<td>Revisited Routing Switch Sims</td>
<td>April '15</td>
<td>[OAA4] [OAA5]</td>
</tr>
<tr>
<td></td>
<td>7</td>
<td>Configuration Bit Simulations</td>
<td>May '15</td>
<td></td>
</tr>
<tr>
<td></td>
<td>8</td>
<td>Small FPGA/Test Structure Tapeout</td>
<td>Summer '15</td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td>Chip Testing</td>
<td>December '15</td>
<td></td>
</tr>
<tr>
<td><strong>Architecture Optimization</strong></td>
<td>1</td>
<td>Initial Architecture Exploration</td>
<td>Completed</td>
<td>[OAA6]</td>
</tr>
<tr>
<td></td>
<td>2</td>
<td>Characterization of FPGA sub-circuits for VTR</td>
<td>August '15</td>
<td></td>
</tr>
<tr>
<td></td>
<td>3</td>
<td>FPGA Architecture Design Space Exploration (using VTR)</td>
<td>September '15</td>
<td></td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>FPGA Architecture Simulations</td>
<td>October '15</td>
<td></td>
</tr>
<tr>
<td><strong>RCGC Toolflow</strong></td>
<td>1</td>
<td>Finish Architecture File Generator</td>
<td>Completed</td>
<td>[OAA7]</td>
</tr>
<tr>
<td></td>
<td>2</td>
<td>Finish Schematic Generator</td>
<td>February '15</td>
<td></td>
</tr>
<tr>
<td></td>
<td>3</td>
<td>Finish Bitstream Generator</td>
<td>Completed</td>
<td></td>
</tr>
<tr>
<td></td>
<td>4</td>
<td>Finish Toolflow Wrapper</td>
<td>March '15</td>
<td></td>
</tr>
<tr>
<td></td>
<td>5</td>
<td>Proof of concept simulations</td>
<td>April '15</td>
<td></td>
</tr>
<tr>
<td><strong>Embedded FPGA Fabric</strong></td>
<td>1</td>
<td>Determine algorithms for embedded FPGA</td>
<td>December '15</td>
<td>[OAA8]</td>
</tr>
<tr>
<td></td>
<td>2</td>
<td>Comparison of FPGA vs. ASIC implementations</td>
<td>January '16</td>
<td></td>
</tr>
<tr>
<td></td>
<td>3</td>
<td>Feasibility analysis for embedded FPGAs</td>
<td>February '16</td>
<td></td>
</tr>
</tbody>
</table>
Publications

Completed:

Planned:
2. Dynamic power consumption in commercial ULP FPGAs
3. Using FPGA-style Local Interconnect in CLBs for Low-Power FPGAs
4. Exploring routing switch topologies for ULP FPGA interconnects
5. Configuration Bits for ULP FPGAs
6. A new architecture for Sub-mW FPGAs
7. RCGC: A toolflow for generating custom FPGA schematics and configurations
8. Feasibility Analysis of Embedded FPGAs for ULP SoCs
High Level Impact

Current State

- Limited options for ULP FPGAs
- Inability to configure custom-FPGAs
- FPGA-level design space exploration is infeasible
- Inflexible ULP SoCs

Future State

- In-depth circuit and architectural exploration of ULP FPGA fabrics
- Recommendations for FPGAs as sole, low-cost solutions for low power sensors
- RCGC – enabling rapid, thorough design space exploration
- Feasibility analysis of embedded FPGAs in ULP SoCs
References


Thank you!
Backup Slide: Prior Work in ULP FPGA Sub-Circuits

- Anderson et al [9] – Interconnect routing switches
  - Lower power by adding sleep modes to routing buffers
- Grossmann et al [7] – Compared configuration bit topologies
  - Suggested 6T latches (no ratio’d circuits)
- Tuan et al [8] – uses mid-oxide high-$V_T$ devices
Backup Slide: Prior Work in FPGA Architecture Analysis

  - K = 4-6, N = 3-10 → best area-delay product (ADP)
- Li et al [3] – optimize k, N, L, and switch topology for power minimization
  - K = 4 minimizes power, N = 12 minimizes power and power-delay product
  - High frequency: unidirectional → lower energy
  - Low frequency: bidirectional → lower energy
Backup Slide: Prior Work in Custom-FPGA toolflows

- DAGGER – Extension of Virtual Place-and-Route (VPR)
  - Designed to configure specific device
- Soni et al – Open source bitstream generation tool
  - Designed for use on existing FPGA devices
- XBits
  - Bitstream generation for custom FPGA using XML format
Backup Slide: Determining algorithms for FPGA

- Specific algorithms for different applications