🪰 Super Fly! 🪰
 ═══┄┄┄┄┄┄┄┄┄┄┄═══
September 27, 2025

Overview
════════
 • Forth FPGA Synthesis for the iCE40
 • Why an object-oriented approach?
   • The Flyweight Pattern
   • Super Fly!
 • Status of the project

Background
══════════
• Forth 2023: gifted a pico-ice by Christopher Lozinski
  - Raspberry Pi Pico (RP2040) - 264K RAM
  - iCE40UP5K FPGA ★
  - 4MB SPI Flash for CPU
  - 4MB SPI Flash for FPGA
  - 8MB low power qSPI RAM
  - Shared RGB LED, all RP2040 + iCE40 pins exposed
  - RP2040 can feed configuration to FPGA!
• By December, ported uEforth
  - and added the ability to send an FPGA image

iCE40UP5K
═════════
• 5280 LUTs
• 1 Mbit single port RAM
• 120 Kbit dual port RAM
• 8 x DSP blocks
• Part of a larger family with similar structure

IceStorm/Yosys
══════════════
• Community has reverse engineered
  the iCE40 bitstream format!
• Built an open source Verilog toolchain
• icepack captures a fairly simple config layout
https://prjicestorm.readthedocs.io/en/latest/format.html

Format Challenge!
═════════════════
• Struggled to unravel + simplify open source representation
  - things are represented as data instead of code
• Realized routing is VERY COMPLEX
• But the architecture is relatively regular!

@ice40_structure.png


@ice40_plb.png


iCE40 Format
════════════
Layout of Config RAM is a bitmap in 4 banks:
  if (right_half)
    cram_x = bank_xoff + column_width - 1 - bit_x;
  else
    cram_x = bank_xoff + bit_x;
  if (top_half)
    cram_y = bank_yoff + (15 - bit_y);
  else
    cram_y = bank_yoff + bit_y;
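
The same mapping as a Forth sketch. The names column-width, bank-xoff and bank-yoff are placeholders, not names from the project; 54 is the width of a logic tile bitmap, and the bank offsets are left at zero purely for illustration.

```forth
\ Placeholder parameters (assumptions for illustration only).
54 constant column-width   \ a logic tile is 54 bits wide
 0 value bank-xoff         \ x offset of the current bank
 0 value bank-yoff         \ y offset of the current bank

\ Mirror x in the right half of the chip.
: cram-x ( bit-x right-half? -- x )
  if column-width 1- swap - then bank-xoff + ;

\ Mirror y (within a 16-row tile) in the top half of the chip.
: cram-y ( bit-y top-half? -- y )
  if 15 swap - then bank-yoff + ;

3  0 cram-x .   \ 3   left half: unchanged
3 -1 cram-x .   \ 50  right half: 54 - 1 - 3
2 -1 cram-y .   \ 13  top half: 15 - 2
```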

@ice40_banks.png


@ice40_cram.png


@ice40_spans.png


.logic_tile_bitmap
Nobrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
--orrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbbbCbbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb
bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb
bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb
bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb
bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb
bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb
bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
- ... unknown bit
r ... routing
b ... buffer
l ... logic bits
o ... ColBufCtrl
C ... CarryInSet
N ... NegClk

@tilebits.jpg


chipdb-5k.txt
-------------
.net 514
0 2 sp4_h_r_0
1 2 sp4_h_r_13
2 2 sp4_h_r_24
3 2 sp4_h_r_37
4 2 sp4_h_l_37
 
.routing 25 30 103375 B6[11] B6[13] B7[12]
001 86500
010 98041
011 103318
100 86493
101 98040
110 98045
111 103314

.buffer 25 30 103209 B6[14] B7[14] B7[15] B7[16] B7[17]
00001 103357
00011 78827
00101 103265
00111 90387
01001 103344
01011 64134
01111 103317
10011 98042
10101 94028
10111 98431
11001 103257
11011 98176
11101 94151
11111 98441
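
A hedged reading of these entries, based on the IceStorm chipdb format: the header names a destination net in tile (25, 30) plus the config bits that select its source, and each following line pairs a bit pattern with a source net. B6[11] is row 6, column 11 of the tile bitmap. The sketch below applies one pattern to a scratch tile. The storage layout, the word names, and the assumption that the leftmost pattern digit belongs to the first listed bit are all illustrative.

```forth
\ Scratch 16-row x 54-column logic tile, one byte per config bit (illustrative storage).
54 constant tile-width
create tile  16 tile-width * allot
tile 16 tile-width * 0 fill
: tile-bit ( row col -- addr ) swap tile-width * + tile + ;

\ Config bits from the .routing header above: B6[11] B6[13] B7[12]
create route-bits  6 c, 11 c,   6 c, 13 c,   7 c, 12 c,
3 constant #route-bits

\ Write a pattern into the tile; leftmost digit goes to the first listed bit (assumed).
: apply-pattern ( pattern -- )
  #route-bits 0 do
    dup #route-bits 1- i - rshift 1 and     ( pattern bit )
    route-bits i 2* +  dup c@ swap 1+ c@    ( pattern bit row col )
    tile-bit c!
  loop drop ;

3 apply-pattern      \ pattern 011: source net 103318 drives net 103375
6 11 tile-bit c@ .   \ 0
6 13 tile-bit c@ .   \ 1
7 12 tile-bit c@ .   \ 1
```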

Visualizing the Structure
═════════════════════════
• Python script to decode data file
  • Visualize as a graph with graphviz? → Fail
  • Visualize in a big chart!

@ice40_graphviz.png


@visualize1.png


@visualize2.png


@visualize3.png


@visualize4.png


@visualize5.png


         ┌───────┐
 ────────┥       │  ┌─────┐
 ────────┥ LUT4  ┣──┥ FF? ┣──────
 ────────┥       │  └─────┘
 ────────┥       │
         └───────┘
( optional carry )

output       /  ────➤  local_gX_Y  ────➤  lutff_A/in_B  ────➤  Span4 H V RV /
B T BL L     /            X=0..3              A=0..7              Span12 H V
TR TL BL R   /            Y=0..7              B=0..3
Span4 H V RV /
Span12 H V   /
Globals

How to do this?
═══════════════
 • Model as a Graph
   • Node for each wire
   • 26 x 32 x 8 x 4 in several variants
   • That's a lot of objects?
     • Only 264K main memory!
 • OO + Flyweight?
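
A rough sizing of the problem, assuming a conventional heap object with four 32-bit fields (vtable pointer, x, y, bit). The 16-byte figure is an assumption for illustration, not a measurement.

```forth
26 32 * 8 * 4 *  constant #wires   \ wire nodes for a single variant
16 constant bytes/object           \ assumed: 4 fields x 4 bytes on a 32-bit CPU

#wires .                 \ 26624
#wires bytes/object * .  \ 425984 bytes of objects
264 1024 * .             \ 270336 bytes of RP2040 RAM
```

Under that assumption, a single variant already overflows main memory before the Forth system itself is counted.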

When to OO?
═══════════
 • OO is often overused
   • Objects easily have impedance mismatch
     "It is better to have 100 functions operate
      on one data structure than 10 functions
      on 10 data structures." -- Alan Perlis
   • Requires lots of packing and repacking data
     • Abstractions leak
 • But...
   • Works well for polymorphic simulations
     • Applying common interface to varied data

Flyweight Design Pattern
════════════════════════
 • Design Patterns:
     Elements of Reusable
       Object-Oriented Software (1994)
   - Gang of Four:
     Erich Gamma, Richard Helm,
     Ralph Johnson, and John Vlissides
   - Higher level template for problem solving
 • Flyweight Pattern
   • Separate "intrinsic" from "extrinsic" state
   • Keep one immutable flyweight per possible "intrinsic" state
   • Pass "extrinsic" state (context) in as a parameter
   • Canonical example: One object per letter
Flyweight + ICE40 Routing
═════════════════════════
 • Intrinsic State:
   • Type of wire
   • Position within a cell
 • Extrinsic State:
   • Cell X Y
   • Selection of route
 • Observations:
   • Extrinsic state is tiny aside from route
   • Route has to be stored in CRAM bitmap
   • What if we kept mutable state in the CRAM bitmap?
   • All the state except for the route is tiny
     • Why keep a whole object on the heap?

Super Fly!
══════════
 • Store intrinsic state packed
   in a single machine word
 • Store extrinsic state
   in a global structure (CRAM bitmap)
 • Keep type as tag in that word instead of a VTable
   • Dynamic dispatch on the tag

object ptr ────➤  OBJECT
                   ------
                   vtbl ptr ────➤  VTABLE
                   x                ------
                   y                .print
                   bit              .neighbors
                                    .draw

[ type x y bit ]
  
DISPATCH TABLE ( classes x methods )
--------------
.print .neighbors .draw  (class 1)
.print .neighbors .draw  (class 2)
.print .neighbors .draw  (class 3)
...
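
A minimal sketch of tag-based dispatch with a single method, using standard Forth words only. The 4-bit tag width and the names >tag, >payload, make, print-table, print-wire and print-logic are made up for this example; the flyclass/method words generalize the idea to a full classes x methods table.

```forth
4 constant tag-bits                         \ low bits of the word hold the class tag
: >tag     ( o -- tag ) 1 tag-bits lshift 1- and ;
: >payload ( o -- n )   tag-bits rshift ;
: make     ( payload tag -- o ) swap tag-bits lshift or ;

: print-wire  ( o -- ) ." Wire("  >payload . ." ) " ;
: print-logic ( o -- ) ." Logic(" >payload . ." ) " ;

\ One-column dispatch table: row n holds the .print implementation of class n.
create print-table  ' print-wire ,  ' print-logic ,
: .print ( o -- ) dup >tag cells print-table + @ execute ;

7 0 make .print   \ runs print-wire
7 1 make .print   \ runs print-logic
```

The object is just a number on the stack: no heap allocation and no vtable pointer, only an index into a shared table.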

type = 0..9 ( 10 )
x = -10..10 ( 21 )
y = -10..10 ( 21 )
type + (x + 10) * 10 + (y + 10) * 10 * 21
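
A worked version of this mixed-radix packing in standard Forth. x and y are shifted by 10 so that each digit starts at zero; the word names are invented for the example.

```forth
\ w = type + (x+10)*10 + (y+10)*10*21
: pack-radix ( type x y -- w ) 10 + 21 *  swap 10 + +  10 * + ;
: radix-type ( w -- type ) 10 mod ;
: radix-x    ( w -- x )    10 / 21 mod 10 - ;
: radix-y    ( w -- y )    10 / 21 /   10 - ;

3 -10 10 pack-radix dup .   \ 4203
dup radix-type .            \ 3
dup radix-x .               \ -10
radix-y .                   \ 10
```

Each field read needs a divide and a modulo.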

type = 0..9 ( 10 - 4 bits )
x = -10..10 ( 21 - 5 bits )
y = -10..10 ( 21 - 5 bits )
[ type 4 | x 5 | y 5 ]
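
The bit-field layout as a standard Forth sketch, with the type placed in the low 4 bits (an assumption, chosen so that dispatch can simply mask the low bits). Word names are invented for the example.

```forth
\ [ type 4 | x 5 | y 5 ], type in the least significant bits
: pack-bits ( type x y -- w ) 10 + 9 lshift  swap 10 + 4 lshift or  or ;
: bits-type ( w -- type ) 15 and ;
: bits-x    ( w -- x )    4 rshift 31 and 10 - ;
: bits-y    ( w -- y )    9 rshift 31 and 10 - ;

3 -10 10 pack-bits dup .   \ 10243
dup bits-type .            \ 3
dup bits-x .               \ -10
bits-y .                   \ 10
```

This layout spends 14 bits where the mixed-radix form needs 13 (4410 states), in exchange for shifts and masks instead of division.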

0 value classes
1 value methods
0 value dispatch
0 value implementing
   
: flyclass   create classes , 1 +to classes does> @ ;
: method& ( m cls -- a ) classes mod methods * + cells dispatch + ;
: accrued ( -- a ) 0 implementing method& ;
: method   create methods , 1 +to methods does> @ over method& @ execute ;
: implementation ( cls -- ) to implementing ;
: >min ( a -- n ) cell+ @ ;         : >max ( a -- n ) @ ;
: >below ( a -- n ) 2 cells + @ ;   : >above ( a -- n ) 3 cells + @ ;
: field ( min max "name" -- )
   create 2dup , , accrued @ , swap - 1+ accrued @ * dup , accrued !
   does> >r r@ >above mod r@ >below / r> >min + ;
: doput ( n o a -- o ) >r dup r@ >below mod swap r@ >above / r@ >above * +
                       swap r@ >max min r@ >min - r> >below * + ;
: put ( "name" -- ) ( run-time: n o -- o ) ' >body postpone literal postpone doput ; immediate
: extension ( cls -- ) 0 swap method& accrued methods cells cmove ;
: initiate   here to dispatch
             classes 1- for classes , methods 1- 1- for ['] abort , next next ;
: do:: ( o cls m -- ) swap method& @ execute ;
: :: ( o cls "name" -- ) ' >body @ postpone literal postpone do:: ; immediate
: m:   ' >body @ :noname ;
: ;m   postpone ; swap implementing method& ! ; immediate

0 value classes
1 value methods
0 value dispatch
0 value implementing
0 value dispatch-mask
  
: bits ( n -- n ) 0 begin over while 1+ swap 2/ swap repeat nip ;
: bits>mask ( n -- n ) 1 swap lshift 1- ;
: flyclass   create classes , 1 +to classes does> @ ;
: method& ( m cls -- a ) dispatch-mask and methods * + cells dispatch + ;
: accrued ( -- a ) 0 implementing method& ;
: method   create methods , 1 +to methods does> @ over method& @ execute ;
: implementation ( cls -- ) to implementing ;
: >min ( a -- n ) cell+ @ ;         : >max ( a -- n ) @ ;
: >below ( a -- n ) 2 cells + @ ;   : >mask ( a -- n ) 3 cells + @ ;
: field ( min max "name" -- )
   create 2dup , , accrued @ , swap - bits dup bits>mask , accrued +!
   does> >r r@ >below rshift r@ >mask and r> >min + ;
: doput ( n o a -- o ) >r r@ >mask r@ >below lshift invert and
                       swap r@ >max min r@ >min - r> >below lshift or ;
: put ( "name" -- ) ( run-time: n o -- o ) ' >body postpone literal postpone doput ; immediate
: extension ( cls -- ) 0 swap method& accrued methods cells cmove ;
: initiate   here to dispatch
             classes 1- bits bits>mask to dispatch-mask
             classes 1- for
               classes 1- bits , methods 1- 1- for ['] abort , next
             next ;
: do:: ( o cls m -- ) swap method& @ execute ;
: :: ( o cls "name" -- ) ' >body @ postpone literal postpone do:: ; immediate
: m:   ' >body @ :noname ;
: ;m   postpone ; swap implementing method& ! ; immediate

flyclass CramBit
flyclass CramCell
  flyclass Output
    flyclass Input
      flyclass Input0
      flyclass Input1
      flyclass Input2
      flyclass Input3
  flyclass LocalG
    flyclass LocalG0
    flyclass LocalG1
    flyclass LocalG2
    flyclass LocalG3
  flyclass SpanWire
    flyclass Sp4HR
    flyclass Sp4VB
    flyclass Sp12HR
    flyclass Sp12VB
flyclass NotConnected

WIRE "INTERFACE"
════════════════
method .create ( <various> o -- o )
method .optionCount ( o -- n )
method .optionWire ( i o -- wire )
method .getOption ( o -- n )
method .setOption ( n o -- )
method .print ( o -- )

ROUTING ALGORITHM
═════════════════
: route { src dst -- f }
  src dst = if -1 exit then
  dst .getOption { p }
  p if src p dst .optionWire recurse exit then
  dst .optionCount { n }
  n 0 ?do
    i dst .setOption
    src i dst .optionWire recurse if -1 unloop exit then
    0 dst .setOption
  loop 0
;

method .getXY ( o -- x y )
method .getBit ( o -- b ) ( overloaded for wires and CramBits )
method .setBit ( b o -- )
method .inside ( x y o -- o' )
method .isLogic? ( o -- f )
method .isRam? ( o -- f )
method .isIO? ( o -- f )
method .isInside? ( o -- f )
method .listBits ( x o -- )
method .enableBit ( o -- bit )
method .setNoResetBit ( o -- bit )
method .asyncResetBit ( o -- bit )
method .carryEnableBit ( o -- bit )
method .dffEnableBit ( o -- bit )
method .setPath ( n o -- )
method .getPath ( o -- n )
method .getInput ( n o -- wire )
method .setLogic ( n o -- )
method .getLogic ( o -- n )
method .routes ( xt target o -- ) ( xt gets: bit wire )
method .walk ( xt o -- ) ( xt gets: bit wire )

initiate
  
CramBit implementation
  0 cram-bank-width 2* 1- field x
  0 cram-height 1- field y
  m: .create ( x y o -- o ) put y put x ;m
  m: .print { o -- } ." CramBit(" o x . ."  , " o y . ." ) " ;m
  m: .setBit { b o -- } b o x o y cram! ;m
  m: .getBit { o -- b } o x o y cram@ ;m

Input0 implementation Input extension
  m: .print { o -- } ." Input0(" o .getXY swap . . ."  , " o .getBit . ." ) " ;m
  m: .enableBit { o -- wire } 29 o .getBit 2* 1+ o .inside ;m
  m: .listBits { x o -- } 26 o .getBit 2* 1+ o .inside x execute
                          26 o .getBit 2*    o .inside x execute
                          27 o .getBit 2* 1+ o .inside x execute
                          28 o .getBit 2* 1+ o .inside x execute ;m
  m: .optionWire ( i o -- wire ) $a $5 inOptWire ;m

Status
══════
• Implemented objects for all major wire types:
    Input, Output, LocalG, Span4, Span12
• What works:
  • Can take input in a synthesis language
  • Routes greedily
  • Output loads in visualizer
• What doesn't work:
  • Routing fails pretty easily
    • No cross span routing
    • LUT4 inputs can be permuted
    • Backtracks only within one route
  • No IO Pins / PLLs / BRAMs / DSPs
  • No globals
  • Can't simulate in design
  • cram_write not used (but could be)

needs ice40.fs
ice40 synthesis
 
10 1 >locus
4 REGISTER constant v1
8 7 >locus
4 REGISTER constant v2
 
10 5 >locus
v1 INVERT constant v1i
10 7 >locus
v2 INVERT constant v2i
 
10 6 >locus
v1i v2i XOR constant xorval
11 6 >locus
v1i v2i AND constant andval
 
ice40 storage
s" out/craft.bin" save

@synth1.png


@synth2.png


   60 ice40_allocation.fs
   37 ice40_config.fs
    3 ice40.fs
  406 ice40_layout.fs
  175 ice40_storage.fs
   75 ice40_synthesis.fs
   30 flyclasses.fs
  786 TOTAL

What's Next?
════════════
• Allow for more backtracking
• More validation of routes
• IO Pins

DEMO

QUESTIONS❓
    🙏
 Thank you!