µEforth on the pico-ice
═══════════════════════
December 16, 2023
🎁 🧊 🎁
On Forth Day 2023
Christopher Lozinski
🧊 generously gifted 🧊
me a pico-ice!
🎁 🧊 🎁
pico-ice
════════
• Raspberry Pi Pico (RP2040) + iCE40UP5K FPGA
• 4MB SPI Flash for CPU
• 4MB SPI Flash for FPGA
• 8MB low power qSPI RAM
• Shared RGB LED
• All RP2040 + iCE40 pins exposed
• 2 pushbuttons (one mostly for reset)
• FPGA clock controlled by RP2040
• RP2040 can feed configuration to FPGA!
RP2040
══════
• 133 MHz dual ARM Cortex-M0+
• 264K SRAM
• 30 GPIO pins (4 can be analog)
• 2 UARTs
• 2 SPI controllers
• 2 I²C controllers
• 16 PWM channels
• USB 1.1 controller and PHY,
with host and device support
• 8 programmable I/O (PIO)
state machines
• Boots over SPI or USB
PIO
═══
• 2 x Versatile programmable I/O blocks
- 4 Shift registers + state machines each
• Hardware can handle:
- 8080 and 6800 parallel bus
- I²C, 3-pin I²S
- SDIO
- SPI, DSPI, QSPI
- UART
- DVI or VGA (via resistor DAC)
@pio_layout.png
PIO Model
═════════
• Two 32-bit shift registers – either direction, any shift count
• Two 32-bit scratch registers
• 4×32-bit bus FIFO in each direction (TX/RX),
reconfigurable as 8×32 in a single direction
• Fractional clock divider (16 integer, 8 fractional bits)
• Flexible GPIO mapping (all 30 pins)
• DMA interface, sustained throughput
up to 1 word per clock from system DMA
• IRQ flag set/clear/status
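As a rough illustration of the fractional clock divider, the state machine clock is the system clock divided by (integer part + fractional part / 256). The helper name `pio_sm_clock_hz` below is hypothetical and not part of pico-sdk. An integer part of 0 is a hardware special case that this sketch does not model.

```c
#include <assert.h>
#include <stdint.h>

/* Effective PIO state machine clock for a 16.8 fixed-point divider.
 * sys_hz:   system clock in Hz
 * div_int:  integer part of the divider (1..65535; 0 is not modeled here)
 * div_frac: fractional part, in steps of 1/256
 * Hypothetical helper for illustration only. */
static double pio_sm_clock_hz(uint32_t sys_hz, uint16_t div_int, uint8_t div_frac) {
  assert(div_int != 0);
  double divider = (double) div_int + (double) div_frac / 256.0;
  return (double) sys_hz / divider;
}
```

For example, a 125 MHz system clock with a divider of 62 + 128/256 = 62.5 gives a 2 MHz state machine clock.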
@pio_model.png
@pio_square.png
@pio_instr.png
iCE40UP5K
═════════
• 5280 LUTs
• 1Mbit single port RAM
• 120Kb dual port RAM
• 8 x DSP blocks
@ice40-top.png
sysDSP
══════
• 8-bit x 8-bit Multiplier
• 16-bit x 16-bit Multiplier
• 16-bit Adder/Subtracter
• 32-bit Adder/Subtracter
• 16-bit Accumulator
• 32-bit Accumulator
• 8-bit x 8-bit Multiply-Accumulate
• 16-bit x 16-bit Multiply-Accumulate
@ice40-dsp.png
@ice40-io.png
@ice40-dpram.png
@ice40-spram.png
Programming the pico-ice
════════════════════════
• pico-sdk
• pico-ice-sdk
• Transfer via Flash drive + UF2
• I/O over one or more
virtual serial devices
UF2
═══
• UF2 = USB Flashing Format
• Created by Microsoft for MakeCode
• 512-byte blocks, aligned with FAT Flash drive blocks
• Each block arrives whole or not at all
• Each block has:
- magic numbers at the beginning and at the end
- address where the data should be flashed
- up to 476 bytes of data
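The block layout above can be sketched as a C struct. This is a minimal sketch that assumes a little-endian host (UF2 fields are little-endian). The magic values come from the published UF2 format. The names `uf2_block_t`, `uf2_block_init` and `uf2_block_valid` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UF2_MAGIC_START0 0x0A324655u  /* bytes "UF2\n" on disk */
#define UF2_MAGIC_START1 0x9E5D5157u
#define UF2_MAGIC_END    0x0AB16F30u

typedef struct {
  uint32_t magic_start0;
  uint32_t magic_start1;
  uint32_t flags;
  uint32_t target_addr;            /* address where the data should be flashed */
  uint32_t payload_size;           /* bytes used in data[], up to 476 */
  uint32_t block_no;               /* index of this block */
  uint32_t num_blocks;             /* total blocks in the file */
  uint32_t file_size_or_family_id;
  uint8_t  data[476];
  uint32_t magic_end;
} uf2_block_t;

/* One UF2 block matches one 512-byte drive block. */
static_assert(sizeof(uf2_block_t) == 512, "UF2 block must be 512 bytes");

/* Fill in one block; unused payload bytes are zeroed. */
static void uf2_block_init(uf2_block_t *b, uint32_t target_addr,
                           const void *payload, uint32_t len,
                           uint32_t block_no, uint32_t num_blocks) {
  assert(len <= sizeof(b->data));
  memset(b, 0, sizeof(*b));
  b->magic_start0 = UF2_MAGIC_START0;
  b->magic_start1 = UF2_MAGIC_START1;
  b->target_addr = target_addr;
  b->payload_size = len;
  b->block_no = block_no;
  b->num_blocks = num_blocks;
  memcpy(b->data, payload, len);
  b->magic_end = UF2_MAGIC_END;
}

/* A receiver can reject any 512-byte block whose magics do not match. */
static int uf2_block_valid(const uf2_block_t *b) {
  return b->magic_start0 == UF2_MAGIC_START0 &&
         b->magic_start1 == UF2_MAGIC_START1 &&
         b->magic_end == UF2_MAGIC_END &&
         b->payload_size <= sizeof(b->data) &&
         b->block_no < b->num_blocks;
}
```

Because each block carries its own target address and magic numbers, blocks can be checked and flashed independently, in whatever order the host writes them.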
µEforth
═══════
• Indirect-threaded Forth
• Written in C
• Extensive use of X-Macros
• Ports to:
- ESP32 (Known as ESP32forth)
- Linux/POSIX
- Windows
- Web (via Asm.js)
- pico-ice?
X-MACROS
════════
• Use the C Preprocessor
• Lists of data to build code
• Make the caller a parameter
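/* Y is shorthand for primitives whose Forth name matches the C name */
#define Y(name, code) X(#name, name, code)
#define PRIMITIVE_LIST \
  X("+", ADD, tos += *sp; --sp) \
  Y(DROP, tos = *sp; --sp) \
  Y(SWAP, w = tos; tos = *sp; *sp = w) \
  Y(AND, tos = tos & *sp; --sp) \
  Y(OR, tos = tos | *sp; --sp)
/* The caller defines X, expands the list, then undefines X */
enum {
#define X(sname, name, code) OP_ ## name,
  PRIMITIVE_LIST
#undef X
};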
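The same list can be expanded more than once with different definitions of X. The sketch below is illustrative and is not µEforth's actual source. It reuses the primitive list to build the opcode enum, a name table, and a switch-based executor. `cell_t`, `op_names`, `run_ops` and `OP_COUNT` are names assumed for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t cell_t;

#define Y(name, code) X(#name, name, code)
#define PRIMITIVE_LIST \
  X("+", ADD, tos += *sp; --sp) \
  Y(DROP, tos = *sp; --sp) \
  Y(SWAP, w = tos; tos = *sp; *sp = w) \
  Y(AND, tos = tos & *sp; --sp) \
  Y(OR, tos = tos | *sp; --sp)

/* Expansion 1: opcode numbers. */
enum {
#define X(sname, name, code) OP_ ## name,
  PRIMITIVE_LIST
#undef X
  OP_COUNT
};

/* Expansion 2: Forth-visible names, indexed by opcode. */
static const char *const op_names[] = {
#define X(sname, name, code) sname,
  PRIMITIVE_LIST
#undef X
};

/* Expansion 3: one switch case per primitive.
 * tos caches the top of stack; sp points at the second item. */
static cell_t run_ops(const int *ops, int n, cell_t *sp, cell_t tos) {
  cell_t w;
  for (int i = 0; i < n; ++i) {
    switch (ops[i]) {
#define X(sname, name, code) case OP_ ## name: { code; } break;
      PRIMITIVE_LIST
#undef X
      default: assert(!"unknown opcode");
    }
  }
  return tos;
}
```

Adding a primitive then means adding one line to the list; the enum, the name table and the executor all pick it up.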
Porting µEforth
═══════════════
• git submodules for SDKs
• Add CMake build invoked by Make
• Leave out fault handling to start
• New builtins.h + main.cpp
• And... it didn't run...
Debugging
═════════
• Confirmed basic IO samples work
• Added a "sim" variant like ESP32
• Found failures in the startup before IO ready
• And... it came up!
Bindings
════════
• memory support
• yield + ms-ticks support
• general POSIX files
• ice
- ice_cram_*
- ice_flash_*
- ice_fpga_*
- ice_led_*
- ice_spi_*
- ice_sram_*
• pico
- adc_*
- TODO: MORE!
Making it Faster
════════════════
• Shallow submodules
• Init only the needed modules
• Build only your target!
• cmake -G Ninja
Publishing
══════════
• Some refactoring of documentation
• For now don't publish by default
• Beta only for now
What's Next?
════════════
• Expose more USB bindings
• Expose more hardware bindings
• Add BLOCKs + use FPGA Flash for it
• Figure out how to build directly/faster?
- Arduino tools build instead?
• Synthesize for the FPGA in Forth!
IceStorm/Yosys
══════════════
• Community has reverse engineered
the iCE40 bitstream format!
• Built an open source Verilog toolchain
• icepack captures a fairly simple config layout
@ice40-plb.png
@ice40-viewer.png
https://knielsen.github.io/ice40_viewer/ice40_viewer.html
iCE40 Format
════════════
• icepack captures a fairly simple config layout:
if (right_half)
cram_x = bank_xoff + column_width - 1 - bit_x;
else
cram_x = bank_xoff + bit_x;
if (top_half)
cram_y = bank_yoff + (15 - bit_y);
else
cram_y = bank_yoff + bit_y;
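Wrapped into a function, the mapping might look like the sketch below. The variable names follow the snippet above. The function name `tile_bit_to_cram`, the struct `cram_pos_t`, and the range checks (16 rows per tile, matching the bitmap height) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int x, y; } cram_pos_t;

/* Map a bit position inside a tile to a configuration RAM position.
 * Tiles in the right half are mirrored in x; tiles in the top half
 * are mirrored in y. Hypothetical wrapper around the mapping above. */
static cram_pos_t tile_bit_to_cram(int bit_x, int bit_y,
                                   int bank_xoff, int bank_yoff,
                                   int column_width,
                                   bool right_half, bool top_half) {
  assert(bit_x >= 0 && bit_x < column_width);
  assert(bit_y >= 0 && bit_y < 16);
  cram_pos_t p;
  p.x = right_half ? bank_xoff + column_width - 1 - bit_x
                   : bank_xoff + bit_x;
  p.y = top_half ? bank_yoff + (15 - bit_y)
                 : bank_yoff + bit_y;
  return p;
}
```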
.logic_tile_bitmap
Nobrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
--orrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbbbCbbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb
bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb
bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb
bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb
bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb
bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
bbbrrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb-bbbb
bborrrr-rrrrrrbbbbbb-bbbbbbbbbbbbbbbllllllllllbbb--bbb
- ... unknown bit
r ... routing
b ... buffer
l ... logic bits
o ... ColBufCtrl
C ... CarryInSet
N ... NegClk
@ice40-spans.png
Plan for Synthesis
══════════════════
• Keep an allocation map for LUTs + Routes
• Reserve LUTs + routes greedily,
provide manual words to move allocation point
• Group signals in a linked-list "bus" value
BUS ( bool-wire bus -- bus )
UNBUS ( bus -- bool-wire bus )
2LUT ( a b T -- v )
AT ( x y -- )
: AND ( a b -- v ) $8 2LUT ;
: OR ( a b -- v ) $E 2LUT ;
: XOR ( a b -- v ) $6 2LUT ;
: HA ( a b -- s c ) 2DUP AND >R XOR R> ;
: FA ( a b c -- s c ) HA >R HA R> OR ;
: ADDER ( cin a b n -- s cout )
NEWBUS
SWAP 0 DO
>R UNBUS >R SWAP UNBUS >R FA
SWAP R> SWAP R> SWAP R> BUS
LOOP
NIP NIP SWAP
;
1 IN PIN go
1 OUT PIN result
8 BIT REGISTER counter
go counter counter ADDER ON-CLOCK counter
counter UNBUS DROP ALWAYS result
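To sanity-check the gate constants and the adder structure, here is a small software model in C. It assumes the 2LUT constant is a 4-bit truth table indexed by the two inputs (bit a + 2·b). Under that reading $8 is AND, $E is OR, and $6 is XOR ($9 would be XNOR). All function names here are illustrative; the model works on plain integers rather than wires and buses.

```c
#include <assert.h>
#include <stdint.h>

/* 2-input LUT: bit (a + 2*b) of the 4-bit truth table is the output. */
static unsigned lut2(unsigned table, unsigned a, unsigned b) {
  return (table >> ((a & 1u) | ((b & 1u) << 1))) & 1u;
}

static unsigned gate_and(unsigned a, unsigned b) { return lut2(0x8, a, b); }
static unsigned gate_or(unsigned a, unsigned b)  { return lut2(0xE, a, b); }
static unsigned gate_xor(unsigned a, unsigned b) { return lut2(0x6, a, b); }

/* HA ( a b -- s c ) */
static void half_adder(unsigned a, unsigned b, unsigned *s, unsigned *c) {
  *c = gate_and(a, b);
  *s = gate_xor(a, b);
}

/* FA ( a b c -- s c ): two half adders plus an OR of the carries. */
static void full_adder(unsigned a, unsigned b, unsigned cin,
                       unsigned *s, unsigned *cout) {
  unsigned s1, c1, c2;
  half_adder(b, cin, &s1, &c1);
  half_adder(a, s1, s, &c2);
  *cout = gate_or(c1, c2);
}

/* ADDER ( cin a b n -- s cout ): ripple carry over n bits, LSB first. */
static uint32_t ripple_adder(unsigned cin, uint32_t a, uint32_t b,
                             unsigned n, unsigned *cout) {
  assert(n >= 1 && n <= 32);
  uint32_t sum = 0;
  unsigned carry = cin & 1u;
  for (unsigned i = 0; i < n; ++i) {
    unsigned s;
    full_adder((a >> i) & 1u, (b >> i) & 1u, carry, &s, &carry);
    sum |= (uint32_t) s << i;
  }
  *cout = carry;
  return sum;
}
```

With n = 8, adding 200 and 100 wraps to 44 with the carry out set, which is the behavior expected from an 8-bit ripple-carry adder.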
DEMO
QUESTIONS❓
🙏
Thank you!