µEforth on the pico-ice ═══════════════════════ December 16, 2023 🎁 🧊 🎁 On Forth Day 2023 Christopher Lozinski 🧊 generously gifted 🧊 me a pico-ice! 🎁 🧊 🎁 pico-ice ════════ • Raspberry Pi Pico (RP2040) + iCE40UP5K FPGA • 4MB SPI Flash for CPU • 4MB SPI Flash for FPGA • 8MB low power qSPI RAM • Shared RGB LED • All RP2040 + iCE40 pins exposed • 2 pushbuttons (one mostly for reset) • FPGA clock controlled by RP2040 • RP2040 can feed configuration to FPGA! RP2040 ══════ • 133 MHz dual ARM Cortex-M0+ • 264K SRAM • 30 GPIO pins (4 can be analog) • 2 UARTs • 2 SPI controllers • 2 I²C controllers • 16 PWM channels • USB 1.1 controller and PHY, with host and device support • 8 programmed input–output (PIO) state machines • Boots over SPI or USB PIO ═══ • 2 x Versatile programmable I/O blocks - 4 Shift registers + state machines each • Hardware can handle: - 8080 and 6800 parallel bus - I2C, 3-pin I2S - SDIO - SPI, DSPI, QSPI - UART - DVI or VGA (via resistor DAC) @pio_layout.pngPIO Model ═════════ • Two 32-bit shift registers – either direction, any shift count • Two 32-bit scratch registers • 4×32-bit bus FIFO in each direction (TX/RX), reconfigurable as 8×32 in a single direction • Fractional clock divider (16 integer, 8 fractional bits) • Flexible GPIO mapping (all 30 pins) • DMA interface, sustained throughput up to 1 word per clock from system DMA • IRQ flag set/clear/status @pio_model.png
@pio_square.png
@pio_instr.png
iCE40UP5K ═════════ • 5280 LUTs • 1Mbit single port RAM • 120Kb dual port RAM • 8 x DSP blocks @ice40-top.png
sysDSP ══════ • 8-bit x 8-bit Multiplier • 16-bit x 16-bit Multiplier • 16-bit Adder/Subtracter • 32-bit Adder/Subtracter • 16-bit Accumulator • 32-bit Accumulator • 8-bit x 8-bit Multiply-Accumulate • 16-bit x 16-bit Multiply-Accumulate @ice40-dsp.png
@ice40-io.png
@ice40-dpram.png
@ice40-spram.png
Programming the pico-ice ════════════════════════ • pico-sdk • pico-ice-sdk • Transfer via Flash drive + UF2 • I/O over one or more virtual serial devices UF2 ═══ • UF2 = USB Flashing Format • Created by Microsoft for MakeCode • 512 Blocks to align with FAT Flash drive blocks • Whole blocks arrive or not • Each block has: - magic numbers at the beginning and at the end - address where the data should be flashed - up to 476 bytes of data µEforth ═══════ • Indirect-threaded Forth • Written in C • Extensive use of X-Macros • Ports to: - ESP32 (Known as ESP32forth) - Linux/Posix - Windows - Web (via Asm.js) - pico-ice? X-MACROS ════════ • Use the C Preprocessor • Lists of data to build code • Make the caller a parameter #define PRIMITIVE_LIST \ X("+", ADD, tos += *sp; --sp) \ Y(DROP, tos = *sp; --sp) \ Y(SWAP, w = tos; tos = *sp; *sp = w) \ Y(AND, tos = tos & *sp; --sp) \ Y(OR, tos = tos | *sp; --sp) \ enum { #define X(sname, name, code) OP_ ## name, PRIMITIVE_LIST #undef X }; Porting µEforth ═══════════════ • git submodules for SDKs • Add CMake build invoked by Make • Leave out fault handling to start • New builtins.h + main.cpp • And... it didn't run... Debugging ═════════ • Confirmed basic IO samples work • Added a "sim" variant like ESP32 • Found failures in the startup before IO ready • And... it came up! Bindings ════════ • memory support • yield + ms-ticks support • general posix files • ice - ice_cram_* - ice_flash_* - ice_fpga_* - ice_led_* - ice_spi_* - ice_sram_* • pico - adc_* - TODO: MORE! Making it Faster ════════════════ • Shallow submodules • Init only the needed modules • Build only your target! • cmake -G Ninja Publishing ══════════ • Some refactoring of documentation • For now don't publish by default • Beta only for now What's Next? ════════════ • Expose more USB bindings • Expose more hardware bindings • Add BLOCKs + use FPGA Flash for it • Figure out how to build directly/faster? - Arduino tools build instead? • Synthesize for the FPGA in Forth! icestorm/YoSYS ══════════════ • Community has reverse engineered the iCE40 bitstream format! • Built an open source verilog • icepack capture fairly simple config layout @ice40-plb.png
@ice40-viewer.png
https://knielsen.github.io/ice40_viewer/ice40_viewer.html iCE40 Format ════════════ • icepack capture fairly simple config layout: if (right_half) cram_x = bank_xoff + column_width - 1 - bit_x; else cram_x = bank_xoff + bit_x; if (top_half) cram_y = bank_yoff + (15 - bit_y); else cram_y = bank_yoff + bit_y; .logic_tile_bitmap Nobrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb --orrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbbbCbbb bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb - ... unknown bit r ... routing b ... buffer l ... logic bits o ... ColBufCtrl C ... CarryInSet N ... NegClk @ice40-spans.png
Plan for Synthesis ══════════════════ • Keep an allocation map for LUTs + Routes • Reserve LUTs + routes greedily, provide manual words to move allocation point • Group signal in link list "bus" value BUS ( bool-wire bus -- bus ) UNBUS ( bus -- bool-wire bus ) 2LUT ( a b T -- v ) AT ( x y -- ) : AND ( a b -- v ) $8 2LUT ; : OR ( a b -- v ) $E 2LUT ; : XOR ( a b -- v ) $9 2LUT ; : HA ( a b -- s c ) 2DUP AND >R XOR R> ; : FA ( a b c -- s c ) HA >R HA R> OR ; : ADDER ( cin a b n -- s cout ) NEWBUS SWAP 0 DO >R UNBUS >R SWAP UNBUS >R FA SWAP R> SWAP R> SWAP R> BUS LOOP NIP NIP SWAP ; 1 IN PIN go 1 OUT PIN result 8 BIT REGISTER counter go counter counter ADDER ON-CLOCK counter counter UNBUS DROP ALWAYS result DEMO QUESTIONS❓ 🙏 Thank you!