Assembler / Disassembler
     for the ESP32
          🙘
    October 22, 2022

Target System
  🙘
• ESP32forth
  - C based Forth, no ASM out of the box
• Tensilica Xtensa LX6
• ESP32-S2/S3 use LX7
• https://0x04.net/~mwk/doc/xtensa.pdf

Why an Assembler?
  🙘
• ML kernels may need hand optimization
• Might need thunks when working with native code
• No Forth is complete without one
• Work towards ditching the C compiler

Why a Disassembler?
  🙘
• Better understand the code generated by C
  - Register assignment
  - Quality of the generated code
• Examine boot code
• Learn more about the undocumented raw
                WiFi / Bluetooth layers
• Reach out and touch the machine

Xtensa LX6 (ESP32 flavor)
  🙘
• 2- and 3-byte instructions
• 16 x 32-bit integer registers a0-a15
  - Internal register window, slid in multiples of 4
  - 64 physical registers
• 16 x 32-bit floating point registers f0-f15

@add.png


@add.n.png


@slli.png


@j.png


CALL0 offset
------------
Return address goes in a0

@call0.png


@callx0.png


CALL4 offset
------------
 a0 → pushed 
 a1 → pushed
 a2 → pushed
 a3 → pushed
 a4 → a0
 a5 → a1
 a6 → a2
 a7 → a3
 a8 → a4
 a9 → a5
a10 → a6
a11 → a7
a12 → a8
a13 → a9
a14 → a10
a15 → a11
new → a12
new → a13
new → a14
new → a15
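Here is a minimal C model of the table above. It assumes 64 physical registers and a WindowBase counted in units of 4. The names `phys_reg` and `callee_to_caller` are hypothetical. The model ignores window-overflow spills and how CALL4 encodes the return address. It only shows which caller register each callee register aliases after CALL4 followed by ENTRY.

```c
#include <assert.h>

enum { NPHYS = 64, NLOGICAL = 16, WINDOW_UNITS = NPHYS / 4 };

/* Physical register behind logical a<n> for a given WindowBase
 * (WindowBase counts in units of 4 registers). */
static int phys_reg(int window_base, int n) {
  assert(n >= 0 && n < NLOGICAL);
  return (window_base * 4 + n) % NPHYS;
}

/* CALL4 + ENTRY slides the window up by one unit. Return the caller
 * register that the callee's a<n> aliases, or -1 for a register the
 * caller cannot see (the "new" rows in the table). */
static int callee_to_caller(int caller_base, int n) {
  int callee_base = (caller_base + 1) % WINDOW_UNITS;
  int p = phys_reg(callee_base, n);
  for (int k = 0; k < NLOGICAL; k++)
    if (phys_reg(caller_base, k) == p)
      return k;
  return -1;
}
```

The physical file wraps around, so a caller near the top of the file still sees its a4 become the callee's a0.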

@call4.png


@entry.png


How to Approach an Assembler?
  🙘
• Forth is excellent at domain specific languages
• Pick a syntax that is "easy" to implement
• Brad Rodriguez's excellent tutorial:
  https://www.bradrodriguez.com/papers/tcjassem.txt

HEX
: NOP,   12 C, ;

HEX
: INHERENT CREATE C, DOES> C@ C, ;
12 INHERENT NOP,
3A INHERENT ABX,
3D INHERENT MUL,

HEX
: ORCC, ( n -- ) 1A C, C, ;
0F ORCC,

HEX 89 GENERAL-OP ADC,
    84 GENERAL-OP AND,
    85 GENERAL-OP BIT,

Bill Ragsdale's
   AMAZING 6502 Assembler
     🙘
• 96 lines in 6 64x16 screens
• Structured flow control

@6502a.png


@6502b.png


What about Disassembly?
  🙘
• Decode instructions like the CPU
• Use a big CASE statement or jump table
• Emit operands and opcodes
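Before anything can be matched, the decoder has to know how long the instruction is. The sketch below is in C. It assumes the low nibble (op0) selects the length: op0 0x8 to 0xD are the 2-byte density forms, and everything else is treated as 3 bytes. The disassembly listing later in the deck is consistent with this. `insn_length` and `fetch` are hypothetical names, and the "big CASE statement" is a `switch`.

```c
#include <assert.h>
#include <stdint.h>

/* Instruction length from the first byte's op0 nibble.
 * Assumption: 0x8..0xD are narrow (2-byte); all others are 3 bytes. */
static int insn_length(uint8_t b0) {
  switch (b0 & 0xF) {
    case 0x8: case 0x9: case 0xA: case 0xB: case 0xC: case 0xD:
      return 2;
    default:
      return 3;
  }
}

/* Read one instruction little-endian starting at p; report its length. */
static uint32_t fetch(const uint8_t *p, int *len) {
  *len = insn_length(p[0]);
  uint32_t v = (uint32_t)p[0] | (uint32_t)p[1] << 8;
  if (*len == 3)
    v |= (uint32_t)p[2] << 16;
  return v;
}
```

Fed the bytes of `my2*` (36 41 00 88 02 ...), this walks the same 3, 2, 3, 2, 2 byte split that the disassembler listing shows.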

Aren't these two kind of the same?
  🙘
• Both list out all the opcodes
• Both describe the meaning of
  operands / addressing modes
• But one is a forward transform,
  the other the reverse
• Could we describe both at once?
• Could we make it general purpose?

What do we need?
  🙘
• A few kinds of operand "bit" words:
  - Register-like operands
  - Number-like operands
• Addressing modes
• Then describe each opcode

@add.png


1 0 0 0  0 0 0 0  r r r r  s s s s  t t t t  0 0 0 0  OP ADD,

l o o o  o o o o  r r r r  s s s s  t t t t  o o o o  OP ADD,

1 0 0 0 | 0 0 0 0 | r r r r | s s s s | t t t t | 0 0 0 0 ← Instruction
---------------------------------------------------------
1 0 0 0 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 ← Pattern
1 1 1 1 | 1 1 1 1 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 1 1 1 1 ← Mask
---------------------------------------------------------
0 0 0 0 | 0 0 0 0 | 1 1 1 1 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 ← r Op-mask
0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 1 1 1 1 | 0 0 0 0 | 0 0 0 0 ← s Op-mask
0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 0 0 0 0 | 1 1 1 1 | 0 0 0 0 ← t Op-mask
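The table reads as one test plus a few extractions. The sketch below handles ADD only, with the pattern, mask, and op-masks copied from the table. The macro and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* ADD, from the table: 1000 0000 rrrr ssss tttt 0000 */
#define ADD_PATTERN 0x800000u
#define ADD_MASK    0xFF000Fu
#define R_OPMASK    0x00F000u
#define S_OPMASK    0x000F00u
#define T_OPMASK    0x0000F0u

/* An instruction is an ADD when its fixed bits equal the pattern. */
static int is_add(uint32_t insn) {
  return (insn & ADD_MASK) == ADD_PATTERN;
}

/* Pull out one operand. These fields are contiguous, so shifting right by
 * the op-mask's trailing zero count is enough (demask generalizes this). */
static unsigned field(uint32_t insn, uint32_t opmask) {
  assert(opmask != 0);
  unsigned shift = 0;
  while (((opmask >> shift) & 1u) == 0)
    shift++;
  return (unsigned)((insn & opmask) >> shift);
}
```

For example, 0x803450 passes the test and yields r = 3, s = 4, t = 5 (ADD a3, a4, a5). Changing op2 to 9 (ADDX2) or setting a bit in op0 makes the test fail.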

What does each bit word do?
  🙘
o → 1 in mask, 0 in pattern, 0 elsewhere
l → 1 in mask, 1 in pattern, 0 elsewhere
r → 1 in r, 0 elsewhere
s → 1 in s, 0 elsewhere
t → 1 in t, 0 elsewhere

Make it easy to pass parameters
  🙘
• Some pattern bits need to come from parameters,
  not from operands.
    → 
  bits ( val n -- ) Do o/l for each of the low n bits of val,
                    most significant first.
    → 
  : pattern  o o o o  4 bits  r r r r  s s s s  OP ;
  8 pattern FOO,   9 pattern BAR,
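The bit words only ever shift a set of accumulators left and set the new low bit. The sketch below is a hypothetical C mirror of that bookkeeping. `Builder`, `fixed_bit`, `operand_bit`, and `build_pattern` are illustrative names. `build_pattern` reproduces the `pattern` example: four fixed zeros, the 4-bit parameter, then the r and s fields.

```c
#include <assert.h>
#include <stdint.h>

enum { OP_R, OP_S, OP_T, NOPERANDS };

/* Accumulators that the bit words shift into, one bit per word. */
typedef struct {
  int length;
  uint32_t pattern, mask;
  uint32_t opmask[NOPERANDS];
} Builder;

/* Every bit word shifts all accumulators left by one. */
static void advance(Builder *b) {
  b->length++;
  b->pattern <<= 1;
  b->mask <<= 1;
  for (int i = 0; i < NOPERANDS; i++)
    b->opmask[i] <<= 1;
}

/* o and l: a fixed bit, set in the mask, its value taken from v. */
static void fixed_bit(Builder *b, unsigned v) {
  advance(b);
  b->mask |= 1u;
  b->pattern |= v & 1u;
}

/* r, s, t: a bit that belongs to one operand. */
static void operand_bit(Builder *b, int which) {
  assert(which >= 0 && which < NOPERANDS);
  advance(b);
  b->opmask[which] |= 1u;
}

/* bits ( val n -- ): o/l for the low n bits of val, most significant first. */
static void bits(Builder *b, uint32_t val, int n) {
  for (int i = n - 1; i >= 0; i--)
    fixed_bit(b, (unsigned)(val >> i));
}

/* : pattern  o o o o  4 bits  r r r r  s s s s  OP ; */
static Builder build_pattern(uint32_t param) {
  Builder b = {0};
  for (int i = 0; i < 4; i++) fixed_bit(&b, 0);      /* o o o o */
  bits(&b, param, 4);                                 /* 4 bits  */
  for (int i = 0; i < 4; i++) operand_bit(&b, OP_R);  /* r r r r */
  for (int i = 0; i < 4; i++) operand_bit(&b, OP_S);  /* s s s s */
  return b;
}
```

`build_pattern(8)` and `build_pattern(9)` differ only in their pattern (0x0800 vs 0x0900). They share the mask 0xFF00 and the r/s op-masks 0x00F0 and 0x000F.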

Operands
  🙘
• Defining an operand requires 3 things:
  - How to go from its stack value to bit pattern
  - How to go from its bit pattern to print it
  - Mask of the affected bits

16 names a0 a1 a2 a3
         a4 a5 a6 a7
         a8 a9 a10 a11
         a12 a13 a14 a15
: reg. ( n -- ) base @ >r decimal ." a" . r> base ! ;
: register ( -- in print ) ['] nop ['] reg. ;

register operand r   : rrrr   r r r r ;
register operand s   : ssss   s s s s ;
register operand t   : tttt   t t t t ;

l o o o  o o o o  rrrr  ssss  tttt  o o o o  OP ADD,

o l l o  o o o o  rrrr  o o o o  tttt  o o o o  OP NEG,
o l l o  o o o o  rrrr  o o o l  tttt  o o o o  OP ABS,
o l l o  o o o l  sr             tttt  o o o o  OP XSR,

: ALU   4 bits  o o o o  rrrr  ssss  tttt  o o o o  OP ;
              $1 ALU AND,   $2 ALU OR,     $3 ALU XOR,
( $6 ABS/NEG )
$8 ALU ADD,   $9 ALU ADDX2, $a ALU ADDX4,  $b ALU ADDX8,
$c ALU SUB,   $d ALU SUBX2, $e ALU SUBX4,  $f ALU SUBX8,
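Every row in that table is the same template with a different op2 nibble. The sketch below is a C version of that template. It also includes a byte emitter in the spirit of `CODE3,`, which assumes little-endian order as on the ESP32. `alu`, `emit3`, and the `OP2_*` names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* op2 values for the RRR ALU family, from the table above. */
enum {
  OP2_AND = 0x1, OP2_OR = 0x2, OP2_XOR = 0x3,
  OP2_ADD = 0x8, OP2_ADDX2 = 0x9, OP2_ADDX4 = 0xa, OP2_ADDX8 = 0xb,
  OP2_SUB = 0xc, OP2_SUBX2 = 0xd, OP2_SUBX4 = 0xe, OP2_SUBX8 = 0xf
};

/* One template, many opcodes: op2 | op1=0 | r | s | t | op0=0 */
static uint32_t alu(unsigned op2, unsigned r, unsigned s, unsigned t) {
  assert(op2 < 16 && r < 16 && s < 16 && t < 16);
  return (uint32_t)op2 << 20 | (uint32_t)r << 12 |
         (uint32_t)s << 8 | (uint32_t)t << 4;
}

/* CODE3,-style emit: low byte first (little-endian). */
static void emit3(uint8_t *p, uint32_t insn) {
  p[0] = (uint8_t)(insn & 0xFF);
  p[1] = (uint8_t)((insn >> 8) & 0xFF);
  p[2] = (uint8_t)((insn >> 16) & 0xFF);
}
```

For instance, `alu(OP2_ADD, 3, 4, 5)` gives 0x803450. Emitting 0x004136 (the `entry a1, 32` word from `my2*`) stores the bytes 36 41 00.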

( Offsets for J )
: >ofs ( n -- n ) chere - 4 - ;
: ofs. ( n -- ) 18 sextend address @ + 4 + . ;
' >ofs ' ofs. operand ofs
: offset   18 for aft ofs then next ;

offset  o o  o l l o  OP J,
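`>ofs` and `ofs.` are inverses around the PC+4 base. The sketch below is in C and assumes the J layout above: an 18-bit signed offset, then n = 00, then op0 = 0110. `j_encode` and `j_target` are hypothetical names. `sextend` mirrors the Forth word of the same name.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `width` bits of v. */
static int32_t sextend(uint32_t v, int width) {
  assert(width > 0 && width < 32);
  uint32_t sign = 1u << (width - 1);
  v &= (1u << width) - 1;
  return (int32_t)(v ^ sign) - (int32_t)sign;
}

/* >ofs: the offset is relative to the J's own address plus 4. */
static uint32_t j_encode(uint32_t pc, uint32_t target) {
  int64_t ofs = (int64_t)target - (int64_t)pc - 4;
  assert(ofs >= -(1 << 17) && ofs < (1 << 17));   /* 18-bit signed range */
  return ((uint32_t)ofs & 0x3FFFFu) << 6 | 0x6u;   /* offset | n=00 | op0=0110 */
}

/* ofs.: recover the absolute target of a J located at pc. */
static uint32_t j_target(uint32_t pc, uint32_t insn) {
  return pc + 4 + (uint32_t)sextend(insn >> 6, 18);
}
```

A J that jumps to itself encodes an offset of -4, so its offset field is all ones except the low two bits.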

: BRANCH2   imm12  ssss  4 bits  o l l o  OP ;
: BRANCH2a   imm8  rrrr  ssss  4 bits  o l l o  OP ;
: BRANCH2e   entry12  ssss  4 bits  o l l o  OP ;
( $0 J, )  $1 BRANCH2 BEQZ,  $2 BRANCH2a BEQI,  $3 BRANCH2e ENTRY,
( $4 J, )  $5 BRANCH2 BNEZ,  $6 BRANCH2a BNEI,  ( BRANCH2b's )
( $8 J, )  $9 BRANCH2 BLTZ,  $a BRANCH2a BLTI,  $b BRANCH2a BLTUI,
( $c J, )  $d BRANCH2 BGEZ,  $e BRANCH2a BGEI,  $f BRANCH2a BGEUI,
: BRANCH2b   imm8  4 bits  ssss  o l l l  o l l o  OP ;
$0 BRANCH2b BF,    $1 BRANCH2b BT,
$8 BRANCH2b LOOP,  $9 BRANCH2b LOOPNEZ,  $a BRANCH2b LOOPGTZ,

: ALU.S   4 bits  l o l o  rrrr  ssss  tttt  o o o o  OP ;
$0 ALU.S ADD.S,    $1 ALU.S SUB.S,    $2 ALU.S MUL.S,
$4 ALU.S MADD.S,   $5 ALU.S MSUB.S,
$8 ALU.S ROUND.S,  $9 ALU.S TRUNC.S,  $a ALU.S FLOOR.S,  $b ALU.S CEIL.S,
$c ALU.S FLOAT.S,  $d ALU.S UFLOAT.S, $e ALU.S UTRUNC.S,
: ALU2.S   l l l l  l o l o  rrrr  ssss  4 bits  o o o o  OP ;
$0 ALU2.S MOV.S,   $1 ALU2.S ABS.S,
$4 ALU2.S RFR,     $5 ALU2.S WFR,     $6 ALU2.S NEG.S,

Code Words
  🙘
• Use a C call signature for stable ABI
• Not all memory can be used for ESP32 code
• ESP32 and Linux support only at the moment
• Assembler coming soon
• Lazy loaded by ASM word

CODE ( "name" ) -- Define a code word
END-CODE        -- End a code word
CHERE ( -- a )  -- Get code pointer
CALLOT ( n -- ) -- Allocate code memory
CODE1, ( n -- ) -- Compile 1 byte
CODE2, ( n -- ) -- Compile 2 bytes
CODE3, ( n -- ) -- Compile 3 bytes
CODE4, ( n -- ) -- Compile 4 bytes

Forth to C Entry SIGNATURE:
cell_t *myfunc(cell_t *sp, float **fp);
-------
On x64:
  %rdi - Data stack pointer in
  %rax - Data stack pointer out
  %rsi - Pointer to floating stack pointer
-------
On Xtensa (ESP32):
  a0 - Return Address
  a1 - Stack Pointer
  a2 - Data stack pointer in & out
  a3 - Pointer to floating stack pointer

YV(internals, CALLCODE, float *t_fp = fp; DUP; \
    sp = (cell_t *) (*(call_t*) (w + sizeof(cell_t)))(sp, &t_fp); \
    fp = t_fp; DROP) \
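For comparison with the hand-assembled `my2*` words, here is the same contract as a plain C function. This sketch assumes a cell is pointer-sized and that `sp` points at the top item, which is the cell `shlq [%rdi]` operates on. In the macro, `DUP` and `DROP` appear to spill and reload a cached top-of-stack value. The hypothetical `callcode` below keeps the whole stack in memory instead.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t cell_t;                      /* assumption: cell = pointer size */
typedef cell_t *(*code_fn)(cell_t *sp, float **fp);

/* my2* with the entry signature: double the top of the data stack in
 * place and return sp unchanged. */
static cell_t *my2star(cell_t *sp, float **fp) {
  (void)fp;                                   /* float stack untouched */
  *sp = (cell_t)((uintptr_t)*sp << 1);
  return sp;
}

/* Roughly the shape of CALLCODE: pass sp and the address of a local fp
 * copy, then adopt whatever values come back. */
static cell_t *callcode(code_fn fn, cell_t *sp, float **fp) {
  float *t_fp = *fp;
  sp = fn(sp, &t_fp);
  *fp = t_fp;
  return sp;
}
```

Returning `sp` rather than `void` is what lets a code word push or pop: it just hands back a moved pointer.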

ASM FORTH
CODE my2*
  $48 CODE1, $89 CODE1, $f8 CODE1, ( mov %rdi, %rax )
  $48 CODE1, $d1 CODE1, $27 CODE1, ( shlq [%rdi] )
  $c3 CODE1,                       ( ret )
END-CODE

ASM FORTH
CODE my2*
  $004136 CODE3, ( entry   a1, 32 )
  $0288   CODE2, ( l32i.n  a8, a2, 0 )
  $1188f0 CODE3, ( slli    a8, a8, 1 )
  $0289   CODE2, ( s32i.n  a8, a2, 0 )
  $f01d   CODE2, ( retw.n )
END-CODE

' - 20 hex dump
 
416208                                      34 2A 40 00 00 00 00 00
416210              B2 3E 41 00 00 00 00 00 08 03 00 00 00 00 00 00
416220              16 2A 40 00 00 00 00 00

enmask & demask
  🙘
HEX
2468 F0F0 demask → 26
12 F0F0 enmask → 1020

-1 1 rshift invert constant high-bit
: odd? ( n -- f ) 1 and ;
: >>1 ( n -- n ) 1 rshift ;
: enmask ( n m -- n )
  0 -rot cell 8 * 1- for
    rot >>1 -rot
    dup odd? if
      over odd? if rot high-bit or -rot then
      swap >>1 swap
    then
    >>1
  next
  2drop
;

: demask ( n m -- n )
  0 >r begin dup while
    dup 0< if over 0< if r> 2* 1+ >r else r> 2* >r then then
    2* swap 2* swap
  repeat 2drop r>
;
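Here are the same two operations in C. x86 BMI2 provides them as PDEP (enmask) and PEXT (demask). This sketch scans the mask from the low bit up instead of rotating the result as the Forth `enmask` does. The answer is the same.

```c
#include <assert.h>
#include <stdint.h>

/* enmask: scatter the low bits of n into the set bits of m, lowest first. */
static uint32_t enmask(uint32_t n, uint32_t m) {
  uint32_t out = 0;
  for (uint32_t bit = 1; bit != 0; bit <<= 1) {
    if (m & bit) {
      if (n & 1u) out |= bit;
      n >>= 1;                 /* consume one bit of n per mask bit */
    }
  }
  return out;
}

/* demask: gather the bits of n selected by m into a contiguous low field. */
static uint32_t demask(uint32_t n, uint32_t m) {
  uint32_t out = 0, dst = 1;
  for (uint32_t bit = 1; bit != 0; bit <<= 1) {
    if (m & bit) {
      if (n & bit) out |= dst;
      dst <<= 1;               /* next output bit */
    }
  }
  return out;
}
```

With the slide's values, `demask(0x2468, 0xF0F0)` gives 0x26 and `enmask(0x12, 0xF0F0)` gives 0x1020. `demask` undoes `enmask` for the same mask.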

variable length   variable pattern   variable mask
: bit! ( n a -- ) dup @ 2* rot 1 and or swap ! ;

: skip  1 length +!  0 mask bit!  0 pattern bit!  advance ;
: bit ( n -- ) 1 length +!  1 mask bit!  pattern bit!  advance ;
: bits ( val n -- ) 1- for dup r@ rshift bit next drop ;
: o   0 bit ;   : l   1 bit ;

( struct: pattern next inop printop )
: operand ( inop printop "name" )
   create 0 , operands @ , latestxt operands ! swap , ,
   does> skip 1 swap +! ;
: names ( n "names"*n --) 0 swap 1- for dup constant 1+ next drop ;

: op ( "name" )
   create opcodes @ , latestxt opcodes !
          length @ , pattern @ , mask @ ,
          ['] op-snap for-operands 0 , reset
   does> >xt >r
         r@ >pattern
         0 r@ >operands begin dup @ while >r 1+ r> 2 cells + repeat
         swap for aft
           2 cells - dup >r swap >r dup cell+ @ >r @ >inop execute r> enmask r> or r>
         then next
         drop
         r> >length coden,
;

variable address
: matchit ( a xt -- a )
  >r dup m@ r@ >mask and r@ >pattern = if
    r@ >operands begin dup @ while
      >r dup m@ r@ cell+ @ demask r@ @ >printop execute r> 2 cells +
    repeat drop
    r@ see.
    r@ >length 8 / istep !
  then rdrop ;
: disasm1 ( a -- a )
  dup address ! dup . ."  --  " 0 istep ! ['] matchit for-ops
  istep @ 0= if 1 istep ! ." UNKNOWN!!!" then
  9 emit 9 emit ." -- " dup m@ istep @ m.
  istep @ +
  cr
;
: disasm ( a n -- ) for aft disasm1 then next drop ;
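One table row carries the pattern, mask, and operand masks, so the same data can drive both directions. The sketch below is a compressed C version with register operands only and a three-row table. `Op`, `assemble`, and `disasm1` are illustrative and do not reflect the Forth words' real memory layout. `matchit` tries every opcode, whereas this version stops at the first match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t enmask(uint32_t n, uint32_t m) {   /* scatter */
  uint32_t out = 0;
  for (uint32_t b = 1; b != 0; b <<= 1)
    if (m & b) { if (n & 1u) out |= b; n >>= 1; }
  return out;
}

static uint32_t demask(uint32_t n, uint32_t m) {   /* gather */
  uint32_t out = 0, d = 1;
  for (uint32_t b = 1; b != 0; b <<= 1)
    if (m & b) { if (n & b) out |= d; d <<= 1; }
  return out;
}

/* One row describes an opcode for both directions. */
typedef struct {
  const char *name;
  uint32_t pattern, mask;
  int nops;
  uint32_t opmask[3];   /* operands in the order they are written */
} Op;

static const Op ops[] = {
  {"ADD,", 0x800000u, 0xFF000Fu, 3, {0x00F000u, 0x000F00u, 0x0000F0u}},
  {"SUB,", 0xC00000u, 0xFF000Fu, 3, {0x00F000u, 0x000F00u, 0x0000F0u}},
  {"NEG,", 0x600000u, 0xFF0F0Fu, 2, {0x00F000u, 0x0000F0u}},
};

/* Assemble: start from the pattern, scatter each operand into its field. */
static uint32_t assemble(const Op *op, const unsigned *vals) {
  uint32_t insn = op->pattern;
  for (int i = 0; i < op->nops; i++)
    insn |= enmask(vals[i], op->opmask[i]);
  return insn;
}

/* Disassemble: find a row whose fixed bits match, gather each operand and
 * print e.g. "a3 a4 a5 ADD,". Returns 0 (empty buf) if no row matches. */
static int disasm1(uint32_t insn, char *buf, size_t len) {
  assert(len > 0);
  buf[0] = '\0';
  for (size_t k = 0; k < sizeof ops / sizeof ops[0]; k++) {
    const Op *op = &ops[k];
    if ((insn & op->mask) != op->pattern)
      continue;
    size_t used = 0;
    for (int i = 0; i < op->nops; i++) {
      int w = snprintf(buf + used, len - used, "a%u ",
                       (unsigned)demask(insn, op->opmask[i]));
      assert(w >= 0 && (size_t)w < len - used);
      used += (size_t)w;
    }
    snprintf(buf + used, len - used, "%s", op->name);
    return 1;
  }
  return 0;
}
```

Round-tripping works because `demask` undoes `enmask` for the same op-mask. An encoding no row claims (ABS here, since it is not in the table) comes back as unknown, like the `UNKNOWN!!!` path in `disasm1`.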

CODE MY2*
  a1 32 ENTRY,
  a8 a2 0 L32I.N,
  a8 a8 1 SLLI,
  a8 a2 0 S32I.N,
  RETW.N,
END-CODE

--> decimal ' my2* cell+ @ 5 disasm
1074356660  --  a1 32 ENTRY,            -- 004136
1074356663  --  a8 a2 0 L32I.N,         -- 0288
1074356665  --  a8 a8 1 SLLI,           -- 1188F0
1074356668  --  a8 a2 0 S32I.N,         -- 0289
1074356670  --  RETW.N,                 -- F01D
 ok

--> hex ' my2* cell+ @ 5 disasm
400961B4  --  a1 20 ENTRY,              -- 004136
400961B7  --  a8 a2 0 L32I.N,           -- 0288
400961B9  --  a8 a8 1 SLLI,             -- 1188F0
400961BC  --  a8 a2 0 S32I.N,           -- 0289
400961BE  --  RETW.N,                   -- F01D
 ok

Properties
  🙘
• ~113 lines generic assembler / disassembler
• ~286 lines Xtensa specific code
• Lazy-loaded on first use
• Memory-safe, misalignment-tolerant reads built in

Future Directions
  🙘
• Wrap up a few missing opcodes
• Fix some numeric operands
• Add structured flow control
• Better tests
• Clean up what's there
• Try it for another architecture like:
  RISC-V, x64
• What about emulation?

Maybe emulation like this?
  🙘
l o o o  o o o o  r r r r  s s s s  t t t t  o o o o
  OP: ADD,  s@ t@ + r! ;
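Here is one way the `OP:` idea could look in C. It reuses ADD's pattern and mask to recognize the instruction, then applies `s@ t@ + r!` to a register array. This hypothetical sketch ignores windowing and models only the 16 visible registers.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
  uint32_t a[16];   /* visible registers only; windowing ignored */
  uint32_t pc;
} Cpu;

/* ADD, recognized by the same pattern/mask as the assembler row, then
 * executed as s@ t@ + r!. Returns 0 if insn is not an ADD. */
static int step_add(Cpu *cpu, uint32_t insn) {
  if ((insn & 0xFF000Fu) != 0x800000u)
    return 0;
  unsigned r = (insn >> 12) & 0xFu;
  unsigned s = (insn >> 8) & 0xFu;
  unsigned t = (insn >> 4) & 0xFu;
  cpu->a[r] = cpu->a[s] + cpu->a[t];   /* unsigned add wraps mod 2^32 */
  cpu->pc += 3;                        /* ADD is a 3-byte instruction */
  return 1;
}
```

The operand extraction here is the contiguous-field special case of `demask`. A table-driven emulator would demask through the row's op-masks exactly as the disassembler does.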

DEMO

QUESTIONS?
   🏵
 Thank you!