🪰 Super Fly! 🪰
━━━━━━━━━━━━━━━━━
September 27, 2025

Overview
━━━━━━━━
• Forth FPGA Synthesis for the ICE40
• Why an object-oriented approach?
• The Flyweight Pattern
• Super Fly!
• Status of the project

Background
━━━━━━━━━━
• Forth 2023: gifted a pico-ice by Christopher Lozinski
  - Raspberry Pi Pico (RP2040)
  - 264K RAM
  - iCE40UP5K FPGA
  - 4MB SPI Flash for CPU
  - 4MB SPI Flash for FPGA
  - 8MB low power qSPI RAM
  - Shared RGB LED, all RP2040 + iCE40 pins exposed
  - RP2040 can feed configuration to FPGA!
• By December, ported uEforth
  - and added the ability to send an FPGA image

iCE40UP5K
━━━━━━━━━
• 5280 LUTs
• 1Mbit single port RAM
• 120Kb dual port RAM
• 8 x DSP blocks
• Part of a larger family with similar structure

icestorm/Yosys
━━━━━━━━━━━━━━
• Community has reverse engineered the iCE40 bitstream format!
• Built an open source Verilog toolchain
• icepack captures a fairly simple config layout
  https://prjicestorm.readthedocs.io/en/latest/format.html

Format Challenge!
━━━━━━━━━━━━━━━━━
• Struggled to unravel + simplify the open source representation
  - things are represented as data instead of code
• Realized routing is VERY COMPLEX
• But the architecture is relatively regular!
@ice40_structure.png
@ice40_plb.png
iCE40 Format
━━━━━━━━━━━━
Layout of Config RAM is a bitmap in 4 banks:

if (right_half)
    cram_x = bank_xoff + column_width - 1 - bit_x;
else
    cram_x = bank_xoff + bit_x;
if (top_half)
    cram_y = bank_yoff + (15 - bit_y);
else
    cram_y = bank_yoff + bit_y;
@ice40_banks.png
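The mirroring rule above can be tried out directly in Forth. This is a minimal sketch, not code from the project: the words `cram-x` and `cram-y` and the three values are hypothetical names, and the numbers are placeholders (54 is assumed as the width of a logic tile column).

```forth
\ Sketch only: names and values below are illustrative assumptions.
0  value bank-xoff      \ x offset of the tile column inside its bank
0  value bank-yoff      \ y offset of the tile row inside its bank
54 value column-width   \ assumed: width of a logic tile column in bits

\ Right-half banks are mirrored in x, left-half banks are not.
: cram-x ( bit-x right-half? -- cram-x )
  if column-width 1- swap - then bank-xoff + ;

\ Top-half banks are mirrored in y (tiles are 16 rows tall).
: cram-y ( bit-y top-half? -- cram-y )
  if 15 swap - then bank-yoff + ;
```

With the placeholder values, `3 -1 cram-x .` prints 50 and `3 0 cram-x .` prints 3; `2 -1 cram-y .` prints 13.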
@ice40_cram.png
@ice40_spans.png
.logic_tile_bitmap
Nobrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
--orrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbbbCbbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb
bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb
bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb
bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb
bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb
bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb
bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb

- ... unknown bit
r ... routing
b ... buffer
l ... logic bits
o ... ColBufCtrl
C ... CarryInSet
N ... NegClk
@tilebits.jpg
chipdb-5k.txt
-------------
.net 514
0 2 sp4_h_r_0
1 2 sp4_h_r_13
2 2 sp4_h_r_24
3 2 sp4_h_r_37
4 2 sp4_h_l_37

.routing 25 30 103375 B6[11] B6[13] B7[12]
001 86500
010 98041
011 103318
100 86493
101 98040
110 98045
111 103314

.buffer 25 30 103209 B6[14] B7[14] B7[15] B7[16] B7[17]
00001 103357
00011 78827
00101 103265
00111 90387
01001 103344
01011 64134
01111 103317
10011 98042
10101 94028
10111 98431
11001 103257
11011 98176
11101 94151
11111 98441

Visualizing the Structure
━━━━━━━━━━━━━━━━━━━━━━━━━
• Python script to decode data file
• Visualize as a graph with graphviz? → Fail
• Visualize in a big chart!
@ice40_graphviz.png
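One way to read a `.routing` entry from chipdb-5k.txt is as a small multiplexer: the listed config bits form a pattern, and each pattern selects one source net. A minimal sketch using the entry for net 103375 shown earlier. The names `route-103375` and `route-source` are hypothetical, and treating the unlisted pattern 000 as "nothing selected" (stored as 0) is an assumption.

```forth
\ Sketch only: one .routing entry as a lookup table.
\ Index = the 3-bit pattern read as a binary number.
create route-103375
  0 ,        \ 000  not listed in chipdb; assumed "nothing selected"
  86500 ,    \ 001
  98041 ,    \ 010
  103318 ,   \ 011
  86493 ,    \ 100
  98040 ,    \ 101
  98045 ,    \ 110
  103314 ,   \ 111

\ Look up the source net that a given bit pattern connects to net 103375.
: route-source ( pattern -- net )  cells route-103375 + @ ;
```

For example, `2 route-source .` prints 98041, the source selected by pattern 010.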
@visualize1.png
@visualize2.png
@visualize3.png
@visualize4.png
@visualize5.png
        ┏━━━━━━━┓
━━━━━━━━┥       ┃  ┏━━━━━┓
━━━━━━━━┥ LUT4  ┣━━┥ FF? ┣━━━━━
━━━━━━━━┥       ┃  ┗━━━━━┛
━━━━━━━━┥       ┃
        ┗━━━━━━━┛
    ( optional carry )

output /       ────➤ local_gX_Y ────➤ lutff_A/in_B ────➤ Span4 H V RV /
B T BL L /           X=0..3           A=0..7             Span12 H V
TR TL BL R /         Y=0..7           B=0..3
Span4 H V RV /
Span12 H V /
Globals

How to do this?
━━━━━━━━━━━━━━━
• Model as a Graph
• Node for each wire
• 26 x 32 x 8 x 4 in several variants
• That's a lot of objects?
• Only 264K main memory!
• OO + Flyweight?

When to OO?
━━━━━━━━━━━
• OO is often overused
• Objects easily have impedance mismatch
  "It is better to have 100 functions operate on one data structure
   than 10 functions on 10 data structures." -- Alan Perlis
• Requires lots of packing and repacking of data
• Abstractions leak
• But...
• Works well for polymorphic simulations
• Applying common interface to varied data

Flyweight Design Pattern
━━━━━━━━━━━━━━━━━━━━━━━━
• Design Patterns: Elements of Reusable Object-Oriented Software (1994)
  - Gang of Four: Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides
  - Higher level template for problem solving
• Flyweight Pattern
  • Separate "intrinsic" from "extrinsic" state
  • Keep one immutable flyweight per possible "intrinsic" state
  • Pass "extrinsic" state (context) in as a parameter
  • Canonical example: One object per letter

Flyweight + ICE40 Routing
━━━━━━━━━━━━━━━━━━━━━━━━━
• Intrinsic State:
  • Type of wire
  • Position within a cell
• Extrinsic State:
  • Cell X Y
  • Selection of route
• Observations:
  • Extrinsic state is tiny aside from route
  • Route has to be stored in CRAM bitmap
• What if we kept mutable state in the CRAM bitmap?
• All the state except for the route is tiny
• Why keep a whole object on the heap?

Super Fly!
━━━━━━━━━━
• Store intrinsic state packed in a single machine word
• Store extrinsic state in a global structure (CRAM bitmap)
• Keep type as tag in that word instead of a VTable
• Dynamic dispatch on the tag

object ptr ────➤ OBJECT
                 ------
                 vtbl ptr ────➤ VTABLE
                 x              ------
                 y              .print
                 bit            .neighbors
                                .draw

[ type x y bit ]      DISPATCH TABLE ( classes x methods )
                      --------------
                      .print .neighbors .draw   (class 1)
                      .print .neighbors .draw   (class 2)
                      .print .neighbors .draw   (class 3)
                      ...

type = 0..9      ( 10 )
x    = -10..10   ( 21 )
y    = -10..10   ( 21 )

type + x * 10 + y * 10 * 21

type = 0..9      ( 10 - 4 bits )
x    = -10..10   ( 21 - 5 bits )
y    = -10..10   ( 21 - 5 bits )

[ type 4 | x 5 | y 5 ]

0 value classes
1 value methods
0 value dispatch
0 value implementing
: flyclass   create classes , 1 +to classes does> @ ;
: method& ( m cls -- a ) classes mod methods * + cells dispatch + ;
: accrued ( -- a ) 0 implementing method& ;
: method   create methods , 1 +to methods does> @ over method& @ execute ;
: implementation ( cls -- ) to implementing ;

\ field record: max, min, multiplier below the field, multiplier above it
: >min ( a -- n ) cell+ @ ;
: >max ( a -- n ) @ ;
: >below ( a -- n ) 2 cells + @ ;
: >above ( a -- n ) 3 cells + @ ;
: field ( min max -- "name" )
  create 2dup , , accrued @ , swap - 1+ accrued @ * dup , accrued !
  does> >r r@ >above mod r@ >below / r> >min + ;
: doput ( n o a -- o' )
  >r dup r@ >below mod swap r@ >above / r@ >above * +
  swap r@ >max min r@ >min - r> >below * + ;
: put ( "name" -- ) ( run: n o -- o' )
  ' >body postpone literal postpone doput ; immediate

: extension ( cls -- ) 0 swap method& accrued methods cells cmove ;
: initiate
  here to dispatch
  classes 1- for classes , methods 1- 1- for ['] abort , next next ;
: do:: ( o cls m -- ) swap method& @ execute ;
: :: ( o cls "name" -- ) ' >body @ postpone literal postpone do:: ; immediate
: m:   ' >body @ :noname ;
: ;m   postpone ; swap implementing method& !
; immediate

0 value classes
1 value methods
0 value dispatch
0 value implementing
0 value dispatch-mask
: bits ( n -- n ) 0 begin over while 1+ swap 2/ swap repeat nip ;
: bits>mask ( n -- n ) 1 swap lshift 1- ;
: flyclass   create classes , 1 +to classes does> @ ;
: method& ( m cls -- a ) dispatch-mask and methods * + cells dispatch + ;
: accrued ( -- a ) 0 implementing method& ;
: method   create methods , 1 +to methods does> @ over method& @ execute ;
: implementation ( cls -- ) to implementing ;

\ field record: max, min, shift below the field, mask of the field
: >min ( a -- n ) cell+ @ ;
: >max ( a -- n ) @ ;
: >below ( a -- n ) 2 cells + @ ;
: >mask ( a -- n ) 3 cells + @ ;
: field ( min max -- "name" )
  create 2dup , , accrued @ , swap - bits dup bits>mask , accrued +!
  does> >r r@ >below rshift r@ >mask and r> >min + ;
: doput ( n o a -- o' )
  >r r@ >mask r@ >below lshift invert and
  swap r@ >max min r@ >min - r> >below lshift or ;
: put ( "name" -- ) ( run: n o -- o' )
  ' >body postpone literal postpone doput ; immediate

: extension ( cls -- ) 0 swap method& accrued methods cells cmove ;
: initiate
  here to dispatch
  classes 1- bits bits>mask to dispatch-mask
  classes 1- for classes 1- bits , methods 1- 1- for ['] abort , next next ;
: do:: ( o cls m -- ) swap method& @ execute ;
: :: ( o cls "name" -- ) ' >body @ postpone literal postpone do:: ; immediate
: m:   ' >body @ :noname ;
: ;m   postpone ; swap implementing method& !
; immediate

flyclass CramBit    flyclass CramCell
flyclass Output
flyclass Input      flyclass Input0   flyclass Input1   flyclass Input2   flyclass Input3
flyclass LocalG     flyclass LocalG0  flyclass LocalG1  flyclass LocalG2  flyclass LocalG3
flyclass SpanWire   flyclass Sp4HR    flyclass Sp4VB    flyclass Sp12HR   flyclass Sp12VB
flyclass NotConnected

WIRE "INTERFACE"
━━━━━━━━━━━━━━━━
method .create ( <various> o -- o )
method .optionCount ( o -- n )
method .optionWire ( i o -- wire )
method .getOption ( o -- n )
method .setOption ( n o -- )
method .print ( o -- )

ROUTING ALGORITHM
━━━━━━━━━━━━━━━━━
: route { src dst -- f }
  src dst = if -1 exit then
  dst .getOption { p }
  p if src p dst .optionWire recurse exit then
  dst .optionCount { n }
  n 0 ?do
    i dst .setOption
    src i dst .optionWire recurse if -1 unloop exit then
    0 dst .setOption
  loop
  0 ;

method .getXY ( o -- x y )
method .getBit ( o -- b ) ( overloaded for wires and CramBits )
method .setBit ( b o -- )
method .inside ( x y o -- o' )
method .isLogic? ( o -- f )
method .isRam? ( o -- f )
method .isIO? ( o -- f )
method .isInside? ( o -- f )
method .listBits ( x o -- )
method .enableBit ( o -- bit )
method .setNoResetBit ( o -- bit )
method .asyncResetBit ( o -- bit )
method .carryEnableBit ( o -- bit )
method .dffEnableBit ( o -- bit )
method .setPath ( n o -- )
method .getPath ( o -- n )
method .getInput ( n o -- wire )
method .setLogic ( n o -- )
method .getLogic ( o -- n )
method .routes ( xt target o -- ) ( xt gets: bit wire )
method .walk ( xt o -- ) ( xt gets: bit wire )
initiate

CramBit implementation
0 cram-bank-width 2* 1- field x
0 cram-height 1- field y
m: .create ( x y o -- o ) put y put x ;m
m: .print { o -- } ." CramBit(" o x . ." , " o y . ." ) " ;m
m: .setBit { b o -- } b o x o y cram! ;m
m: .getBit { o -- b } o x o y cram@ ;m

Input0 implementation
Input extension
m: .print { o -- } ." Input0(" o .getXY swap . . ." , " o .getBit . ." ) " ;m
m: .enableBit { o -- wire } 29 o .getBit 2* 1+ o .inside ;m
m: .listBits { x o -- }
  26 o .getBit 2* 1+ o .inside x execute
  26 o .getBit 2*    o .inside x execute
  27 o .getBit 2* 1+ o .inside x execute
  28 o .getBit 2* 1+ o .inside x execute ;m
m: .optionWire ( i o -- wire ) $a $5 inOptWire ;m

Status
━━━━━━
• Implemented objects for all major wire types:
  Input, Output, LocalG, Span4, Span12
• What works:
  • Can input in synthesis language
  • Routes greedily
  • Output loads in visualizer
• What doesn't work:
  • Routing fails pretty easily
  • No cross span routing
  • LUT4 inputs can be permuted
  • Backtracks only within one route
  • No IO Pins / PLLs / BRAMs / DSPs
  • No globals
  • Can't simulate in design
  • cram_write not used (but could be)

needs ice40.fs

ice40 synthesis
10 1 >locus 4 REGISTER constant v1
8 7 >locus 4 REGISTER constant v2
10 5 >locus v1 INVERT constant v1i
10 7 >locus v2 INVERT constant v2i
10 6 >locus v1i v2i XOR constant xorval
11 6 >locus v1i v2i AND constant andval

ice40 storage
s" out/craft.bin" save
@synth1.png
@synth2.png
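Looking back at the packed-word idea from the Super Fly! slides, the mixed-radix layout `type + x * 10 + y * 10 * 21` can be checked with a few lines of plain Forth. This is a sketch under assumptions: the names `pack`, `>type`, `>x` and `>y` are hypothetical and are not the accessors generated by `field`, and x and y are stored biased by +10 so that the range -10..10 maps to 0..20.

```forth
\ Sketch only: mixed-radix packing of [ type x y ] into one cell.
\ type = 0..9 (radix 10), x and y = -10..10 (radix 21, biased by +10).
: pack ( type x y -- o )
  10 + 21 *        \ biased y, scaled past x
  swap 10 + +      \ add biased x
  10 * + ;         \ scale past type, add type

: >type ( o -- type )  10 mod ;
: >x    ( o -- x )     10 / 21 mod 10 - ;
: >y    ( o -- y )     210 / 10 - ;
```

For example, `3 -2 5 pack` leaves 3233 on the stack, and `>type`, `>x`, `>y` recover 3, -2 and 5 from it. The bit-field version trades the `mod` and `/` for shifts and masks at the cost of a few unused codes per field.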
   60 ice40_allocation.fs
   37 ice40_config.fs
    3 ice40.fs
  406 ice40_layout.fs
  175 ice40_storage.fs
   75 ice40_synthesis.fs
   30 flyclasses.fs
  786 TOTAL

What's Next?
━━━━━━━━━━━━
• Allow for more backtracking
• More validation of routes
• IO Pins

DEMO

QUESTIONS❓

🙏 Thank you!