ESP32 Corner Asm & Disasm News 🙘
November 12, 2022

Errata 🙘
• Fixed CASE, thanks mpetremann93m!
• Thanks for the suggestion to invoke Arduino build at the command line!
  - PORT=com3 make esp32-flash
  - PORT=com6 make esp32c3-flash

Why an Assembler? 🙘
• ML kernels, may need to hand optimize
• Might need thunks when working with native code
• No Forth is complete without one
• Work towards ditching the C compiler

Why a Disassembler? 🙘
• Better understand the code generated by C
  - Register assignment
  - Quality of the generated code
• Examine boot code
• Learn more about the undocumented raw WiFi / Bluetooth layers
• Reach out and touch the machine

Target Systems 🙘
• ESP32forth
  - C based Forth, no ASM out of the box
• ESP32/-S2/-S3 Tensilica Xtensa LX6/LX7
  - https://0x04.net/~mwk/doc/xtensa.pdf
• ESP32-C3 RISC-V
  - https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf

Xtensa LX6 (ESP32/-S2 flavor) 🙘
• SIP Core with lots of options
• 2 and 3 byte instructions
• 16 x 32-bit integer registers a0-a15
  - Internal register window, slid in multiples of 4
  - 64 physical registers
• 16 x 32-bit floating point registers f0-f15

RISC-V (ESP32-C3 flavor) 🙘
• Open-source core with layered instruction set
• 2 and 4 byte instructions
• 32 x 32-bit integer registers x0-x31
  - x0 / zero is fixed at zero, used for some encodings
  - jump and link style RISC
• 16 x 32-bit floating point registers f0-f15

@add.png
@add.n.png
@slli.png
@j.png
CALL0 offset
------------
Return address goes in a0
@call0.png
@callx0.png
CALL4 offset
------------
a0  → pushed
a1  → pushed
a2  → pushed
a3  → pushed
a4  → a0
a5  → a1
a6  → a2
a7  → a3
a8  → a4
a9  → a5
a10 → a6
a11 → a7
a12 → a8
a13 → a9
a14 → a10
a15 → a11
new → a12
new → a13
new → a14
new → a15
@call4.png
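The CALL4 table boils down to "add the window increment to the register number". Here is a minimal sketch in standard Forth, assuming DO/LOOP is available. The word `.window` is a hypothetical name for illustration, not part of ESP32forth.

```forth
\ Sketch only: .window is illustrative, not an ESP32forth word.
\ After CALLn, the callee's a(k) is the register the caller saw as a(k+n).
decimal
: .window ( n -- )
  16 0 do
    dup i +                 \ caller-side register number for callee a(i)
    dup 16 < if
      ." a" .               \ still inside the caller's 16 visible registers
    else
      drop ." new "         \ fresh register exposed by the rotation
    then
    ." -> a" i . cr
  loop drop ;

4 .window                   \ first line printed: a4 -> a0
```

Passing 8 or 12 instead of 4 prints the same table for the larger window increments.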
@entry.png
How to Approach an Assembler? 🙘
• Forth is excellent at domain specific languages
• Pick a syntax that is "easy" to implement
• Brad Rodriguez's excellent tutorial:
  https://www.bradrodriguez.com/papers/tcjassem.txt

HEX
: NOP, 12 C, ;

HEX
: INHERENT CREATE C, DOES> C@ C, ;
12 INHERENT NOP,
3A INHERENT ABX,
3D INHERENT MUL,

HEX
: ORCC, ( n -- ) 1A C, C, ;
0F ORCC,

HEX
89 GENERAL-OP ADC,
84 GENERAL-OP AND,
85 GENERAL-OP BIT,

Bill Ragsdale's AMAZING 6502 Assembler 🙘
• 96 lines in 6 64x16 screens
• Structured flow control
@6502a.png
@6502b.png
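To try the defining-word style from the tutorial snippets without touching the dictionary, the bytes can go into a scratch buffer instead. This is a minimal sketch in standard Forth. The names `buf`, `there` and `byte,` are hypothetical, and the opcode values are the ones from the snippets above.

```forth
\ Sketch only: buf, there and byte, are illustrative names.
hex
create buf 10 allot          \ 16-byte scratch buffer
variable there               \ next free offset into buf
: byte, ( c -- ) buf there @ + c!  1 there +! ;

\ Defining word: each child stores one opcode byte and emits it when run.
: inherent ( c "name" -- ) create c, does> c@ byte, ;
12 inherent nop,
3A inherent abx,
3D inherent mul,

\ Fixed opcode followed by one immediate byte.
: orcc, ( n -- ) 1A byte, byte, ;

0 there !
nop, abx, 0F orcc,
buf there @ dump             \ buffer now holds 12 3A 1A 0F
decimal
```

Each mnemonic is just a Forth word, so macros and loops in the assembler source come for free.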
What about Disassembly? 🙘
• Decode instructions like the CPU
• Use a big CASE statement or jump table
• Emit operands and opcodes

Aren't these two kind of the same? 🙘
• Both list out all the opcodes
• Both describe the meaning of operands / addressing modes
• But one is a forward transform, the other the reverse
• Could we describe both at once?
• Could we make it general purpose?

What do we need? 🙘
• A few kinds of operand "bit" words:
  - Register-like operands
  - Number-like operands
• Addressing modes
• Then describe each opcode
@add.png
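One way to make "describe both at once" concrete is to keep a pattern of fixed bits plus a mask of which bits are fixed, per opcode. The sketch below hand-codes that for the Xtensa ADD encoding, with register fields r, s and t in bits 15-12, 11-8 and 7-4. All the names (`add-pattern`, `add-mask`, `add-encode`, `add?`, `field`, `add-decode`) are hypothetical and not part of ESP32forth.

```forth
\ Sketch only: one opcode described once, used in both directions.
hex
800000 constant add-pattern   \ fixed bits of ADD
FF000F constant add-mask      \ which bits are fixed

\ Forward: drop register numbers into their fields.
: add-encode ( r s t -- insn )
  4 lshift swap 8 lshift or swap 0C lshift or add-pattern or ;

\ Reverse: test the fixed bits, then pull the fields back out.
: add? ( insn -- f ) add-mask and add-pattern = ;
: field ( insn shift -- n ) rshift 0F and ;
: add-decode ( insn -- r s t )
  dup 0C field swap dup 8 field swap 4 field ;

3 4 5 add-encode dup .        \ 803450
dup add? .                    \ true flag
add-decode . . .              \ 5 4 3 (t s r, top of stack first)
decimal
```

A general-purpose version would build the pattern, mask and field list from a single declaration, rather than writing the encoder and decoder by hand as done here.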
1 0 0 0 0 0 0 0 r r r r s s s s t t t t 0 0 0 0 OP ADD,

l o o o o o o o r r r r s s s s t t t t o o o o OP ADD,

1 0 0 0 | 0 0 0 0 | r r r r | s s s s | t t t t | 0 0 0 0 ← Instruction
---------------------------------------------------------
1 0 0 0 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 ← Pattern
1 1 1 1 | 1 1 1 1 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 1 1 1 1 ← Mask
---------------------------------------------------------
0 0 0 0 | 0 0 0 0 | 1 1 1 1 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 ← r Op-mask
0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 1 1 1 1 | 0 0 0 0 | 0 0 0 0 ← s Op-mask
0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 1 1 1 1 | 0 0 0 0 ← t Op-mask

What does each bit word do? 🙘
o → 1 in mask, 0 in pattern, 0 elsewhere
l → 1 in mask, 1 in pattern, 0 elsewhere
r → 1 in r, 0 elsewhere
s → 1 in s, 0 elsewhere
t → 1 in t, 0 elsewhere

Make it easy to pass parameters 🙘
• Patterns likely need to be parameters, not other operands.
→ bits ( val n -- ) Do o/l for each bit in val.
→ : pattern o o o o 4 bits r r r r s s s s OP ;
  8 pattern FOO,
  9 pattern BAR,

Operands 🙘
• Defining an operand requires 3 things:
  - How to go from its stack value to bit pattern
  - How to go from its bit pattern to print it
  - Mask of the affected bits

16 names a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15
: reg. ( n -- ) base @ >r decimal ." a" . r> base ! ;
: register ( -- in print ) ['] nop ['] reg. ;
register operand r : rrrr r r r r ;
register operand s : ssss s s s s ;
register operand t : tttt t t t t ;

l o o o o o o o rrrr ssss tttt o o o o OP ADD,
o l l o o o o o rrrr o o o o tttt o o o o OP NEG,
o l l o o o o o rrrr o o o l tttt o o o o OP ABS,
o l l o o o o l sr tttt o o o o OP XSR,

: ALU 4 bits o o o o rrrr ssss tttt o o o o OP ;
$1 ALU AND,  $2 ALU OR,    $3 ALU XOR,   ( $6 ABS/NEG )
$8 ALU ADD,  $9 ALU ADDX2, $a ALU ADDX4, $b ALU ADDX8,
$c ALU SUB,  $d ALU SUBX2, $e ALU SUBX4, $f ALU SUBX8,

( Offsets for J )
: >ofs ( n -- n ) chere - 4 - ;
: ofs. ( n -- ) 18 sextend address @ + 4 + . ;
' >ofs ' ofs.
operand ofs
: offset 18 for aft ofs then next ;
offset o o o l l o OP J,

: BRANCH2 imm12 ssss 4 bits o l l o OP ;
: BRANCH2a imm8 rrrr ssss 4 bits o l l o OP ;
: BRANCH2e entry12 ssss 4 bits o l l o OP ;
( $0 J, ) $1 BRANCH2 BEQZ, $2 BRANCH2a BEQI, $3 BRANCH2e ENTRY,
( $4 J, ) $5 BRANCH2 BNEZ, $6 BRANCH2a BNEI, ( BRANCH2b's )
( $8 J, ) $9 BRANCH2 BLTZ, $a BRANCH2a BLTI, $b BRANCH2a BLTUI,
( $c J, ) $d BRANCH2 BGEZ, $e BRANCH2a BGEI, $f BRANCH2a BGEUI,

: BRANCH2b imm8 4 bits ssss o l l l o l l o OP ;
$0 BRANCH2b BF, $1 BRANCH2b BT,
$8 BRANCH2b LOOP, $9 BRANCH2b LOOPNEZ, $a BRANCH2b LOOPGTZ,

: ALU.S 4 bits l o l o rrrr ssss tttt o o o o OP ;
$0 ALU.S ADD.S,   $1 ALU.S SUB.S,   $2 ALU.S MUL.S,
$4 ALU.S MADD.S,  $5 ALU.S MSUB.S,
$8 ALU.S ROUND.S, $9 ALU.S TRUNC.S, $a ALU.S FLOOR.S, $b ALU.S CEIL.S,
$c ALU.S FLOAT.S, $d ALU.S UFLOAT.S, $e ALU.S UTRUNC.S,

: ALU2.S l l l l l o l o rrrr ssss 4 bits o o o o OP ;
$0 ALU2.S MOV.S, $1 ALU2.S ABS.S,
$4 ALU2.S RFR,   $5 ALU2.S WFR,   $6 ALU2.S NEG.S,

Code Words 🙘
• Registers assigned by C are arbitrary and hidden
• Use a C call signature for stable ABI
• Not all memory can be used for ESP32 code
• ESP32 and Linux support only at the moment
• Assembler coming soon
• Lazy loaded by ASM word

CODE ( "name" )  -- Define a code word
END-CODE         -- End a code word
CHERE ( -- a )   -- Get code pointer
CALLOT ( n -- )  -- Allocate code memory
CODE1, ( n -- )  -- Compile 1 byte
CODE2, ( n -- )  -- Compile 2 bytes
CODE3, ( n -- )  -- Compile 3 bytes
CODE4, ( n -- )  -- Compile 4 bytes

Forth to C Entry
SIGNATURE: cell_t *myfunc(cell_t *sp, float **fp);
-------
On x64:
  %rdi - Data stack pointer in
  %rax - Data stack pointer out
  %rsi - Pointer to floating stack pointer
-------
On Xtensa (ESP32):
  a0 - Return Address
  a1 - Stack Pointer
  a2 - Data stack pointer in & out
  a3 - Pointer to floating stack pointer
-------
On RISC-V:
  x0 - zero
  x1 - Return Address
  x2 - Stack Pointer
  x10 - Data stack pointer in & out
  x11 - Pointer to floating stack pointer

YV(internals, CALLCODE,
   float *t_fp = fp; DUP; \
   sp = (cell_t *) (*(call_t*) (w + sizeof(cell_t)))(sp, &t_fp); \
   fp = t_fp; DROP) \

ASM FORTH
CODE my2*
  $48 CODE1, $89 CODE1, $f8 CODE1, ( mov %rdi, %rax )
  $48 CODE1, $d1 CODE1, $27 CODE1, ( shlq [%rdi] )
  $c3 CODE1,                       ( ret )
END-CODE

ASM FORTH
CODE my2*
  $004136 CODE3, ( entry a1, 32 )
  $0288 CODE2,   ( l32i.n a8, a2, 0 )
  $1188f0 CODE3, ( slli a8, a8, 1 )
  $0289 CODE2,   ( s32i.n a8, a2, 0 )
  $f01d CODE2,   ( retw.n )
END-CODE

' - 20 hex dump
416208 34 2A 40 00 00 00 00 00
416210 B2 3E 41 00 00 00 00 00 08 03 00 00 00 00 00 00
416220 16 2A 40 00 00 00 00 00

enmask & demask 🙘
HEX
2468 F0F0 demask → 26
12 F0F0 enmask → 1020

-1 1 rshift invert constant high-bit
: odd? ( n -- f ) 1 and ;
: >>1 ( n -- n ) 1 rshift ;
: enmask ( n m -- n )
  0 -rot cell 8 * 1- for
    rot >>1 -rot
    dup odd? if
      over odd? if rot high-bit or -rot then
      swap >>1 swap
    then
    >>1
  next 2drop ;
: demask ( n m -- n )
  0 >r begin dup while
    dup 0< if
      over 0< if r> 2* 1+ >r else r> 2* >r then
    then
    2* swap 2* swap
  repeat 2drop r> ;

variable length
variable pattern
variable mask
: bit! ( n a -- ) dup @ 2* rot 1 and or swap ! ;
: skip 1 length +! 0 mask bit! 0 pattern bit! advance ;
: bit ( n -- ) 1 length +! 1 mask bit! pattern bit! advance ;
: bits ( val n ) 1- for dup r@ rshift bit next drop ;
: o 0 bit ;
: l 1 bit ;

( struct: pattern next inop printop )
: operand ( inop printop "name" )
  create 0 , operands @ , latestxt operands ! swap , ,
  does> skip 1 swap +! ;
: names ( n "names"*n --) 0 swap 1- for dup constant 1+ next drop ;

: op ( "name" )
  create opcodes @ , latestxt opcodes !
  length @ , pattern @ , mask @ ,
  ['] op-snap for-operands 0 , reset
  does> >xt >r r@ >pattern
    0 r@ >operands begin dup @ while >r 1+ r> 2 cells + repeat
    swap for aft
      2 cells - dup >r swap >r dup cell+ @ >r @ >inop execute
      r> enmask r> or r>
    then next
    drop r> >length coden, ;

variable address
: matchit ( a xt -- a )
  >r dup m@ r@ >mask and r@ >pattern = if
    r@ >operands begin dup @ while
      >r dup m@ r@ cell+ @ demask r@ @ >printop execute r> 2 cells +
    repeat drop
    r@ see. r@ >length 8 / istep !
  then rdrop ;
: disasm1 ( a -- a )
  dup address ! dup . ." -- " 0 istep !
  ['] matchit for-ops
  istep @ 0= if 1 istep ! ." UNKNOWN!!!" then
  9 emit 9 emit ." -- " dup m@ istep @ m.
  istep @ + cr ;
: disasm ( a n -- ) for aft disasm1 then next drop ;

CODE MY2*
  a1 32 ENTRY,
  a8 a2 0 L32I.N,
  a8 a8 1 SLLI,
  a8 a2 0 S32I.N,
  RETW.N,
END-CODE

--> decimal ' my2* cell+ @ 5 disasm
1074356660 -- a1 32 ENTRY,       -- 004136
1074356663 -- a8 a2 0 L32I.N,    -- 0288
1074356665 -- a8 a8 1 SLLI,      -- 1188F0
1074356668 -- a8 a2 0 S32I.N,    -- 0289
1074356670 -- RETW.N,            -- F01D
ok
--> hex ' my2* cell+ @ 5 disasm
400961B4 -- a1 20 ENTRY,       -- 004136
400961B7 -- a8 a2 0 L32I.N,    -- 0288
400961B9 -- a8 a8 1 SLLI,      -- 1188F0
400961BC -- a8 a2 0 S32I.N,    -- 0289
400961BE -- RETW.N,            -- F01D
ok

Properties 🙘
• ~113 lines generic assembler / disassembler
• ~304 lines Xtensa specific code
• ~152 lines RISC-V specific code
• Lazy loaded on first run
• Layered vocabularies: FORTH > asm > xtensa/riscv
• Memory safe / misaligned reads mixed in

Future Directions 🙘
• Wrap up a few missing opcodes
• Fix some numeric operands
• Add structured flow control
• Better tests
• Clean up what's there
• Implement x86 / x64
• What about emulation?

Maybe emulation like this? 🙘
l o o o o o o o r r r r s s s s t t t t o o o o OP: ADD, s@ t@ + r! ;

DEMO

QUESTIONS? 🏵

Thank you!