EFORTH
 IMPLEMENTED IN C
         
   Brad Nelson
       😷
    Forth Day
November 21, 2020

Why C?

MOTIVATION
    🙜
 • Interface with the OS
 • Temptation of Libraries
 • Tame modern architectures
 • Suffer the tools we deserve
 • Fight complexity with its own tools

BUT WHY _C_?
     🙜
 • Portable Assembler: x86, ARM, ESP32
 • Basis for nearly all OS ABIs
 • Wide range of substantive libraries
 • Rust: Arch support good, but rides the same stack
 • Go: Tiny Go does ESP32, but LLVM at bottom

    💱
 Interface
with the OS

OS COMPLEXITY
     🙜
 • I live in a multi-OS world
    • OSX and Linux at Work
    • Windows and Linux at Home
    • Raspberry Pi
    • ESP8266/ESP32 RTOS

OS COMPLEXITY
     🙜
 • OSes do more than file systems
    • Graphics with shaders
    • TCP/IP, Wi-Fi, BLE, Intents
    • Encyption, certificates
    • Image, audio, and video formats
    • Clipboards, icons, compression

OS COMPLEXITY
     🙜
 • Complex executables
    • Dynamic linking
    • Multi-size icons
    • Fat binaries

    📞
  Calling
Conventions

LINUX/OSX x64 CONVENTION
           🙜
  Arguments: RDI, RSI, RDX, RCX, R8, R9
  Caller saved: RAX, R10, R11
  Callee saved: RBX, RBP,
                RSP, R12, R13, R14, R15
     16-byte aligned stack, 128-byte red zone

MICROSOFT x64 CONVENTION
           🙜
  Arguments: RCX, RDX, R8, R9
  Caller saved: RAX, R10, R11
  Callee saved: RBX, RBP, RDI, RSI,
                RSP, R12, R13, R14, R15
  16-byte aligned stack, 32-byte shadow space

typedef struct {
  int type;             /* of event */
  unsigned long serial; /* # of last request processed by server */
  Bool send_event;      /* true if this came from a SendEvent request */
  Display *display;     /* Display the event was read from */
  Window window;        /* "event" window it is reported relative to */
  Window root;          /* root window that the event occurred on */
  Window subwindow;     /* child window */
  Time time;            /* milliseconds */
  int x, y;             /* pointer x, y coordinates in event window */
  int x_root, y_root;   /* coordinates relative to root */
  unsigned int state;   /* key or button mask */
  unsigned int keycode; /* detail */
  Bool same_screen;     /* same screen flag */
} XKeyEvent;

class WiFiClient : public ESPLwIPClient
{
protected:
    std::shared_ptr clientSocketHandle;
    std::shared_ptr _rxBuffer;
    bool _connected;
    
public:
    WiFiClient *next;
    WiFiClient();
    WiFiClient(int fd);
    ~WiFiClient();
    int connect(IPAddress ip, uint16_t port);
    int connect(IPAddress ip, uint16_t port, int32_t timeout);
    int connect(const char *host, uint16_t port);
    int connect(const char *host, uint16_t port, int32_t timeout);
    size_t write(uint8_t data);
    size_t write(const uint8_t *buf, size_t size);
    size_t write_P(PGM_P buf, size_t size);
    size_t write(Stream &stream);
    int available();
    int read();
    int read(uint8_t *buf, size_t size);
    int peek();
    void flush();
    void stop();
    uint8_t connected();
...

Dynamic
Linking

( Xlib functions / macros )
c-function BlackPixel BlackPixel a n -- n
c-function DefaultColormap DefaultColormap a n -- n
c-function DefaultScreen DefaultScreen a -- n
c-function RootWindow RootWindow a n -- n
c-function WhitePixel WhitePixel a n -- n
c-function XCheckMaskEvent XCheckMaskEvent a n a -- n
c-function XCreateGC XCreateGC a n n a -- a

c-function XCreateImage XCreateImage a a n n n a n n n n -- a
c-function XCreateSimpleWindow XCreateSimpleWindow a n n n n n n n n -- n
c-function XDefaultDepth XDefaultDepth a n -- n
c-function XDefaultVisual XDefaultVisual a n -- a
c-function XDestroyImage XDestroyImage a -- void
c-function XFlush XFlush a -- void

c-function XLookupString XLookupString a a n a a -- n
c-function XMapWindow XMapWindow a n -- void
c-function XNextEvent XNextEvent a a -- void
c-function XOpenDisplay XOpenDisplay a -- a
c-function XPutImage XPutImage a n a a n n n n n n -- void
c-function XSelectInput XSelectInput a n n -- void

Structures

( Accessors for members of XEvent )
\c #define XEventSize() sizeof(XEvent)
c-function XEventSize XEventSize -- n
\c #define XEventType(e) (((XEvent*)(e))->type)
c-function XEventType XEventType a -- n
\c #define XEventConfigureWidth(e) (((XEvent*)(e))->xconfigure.width)
c-function XEventConfigureWidth XEventConfigureWidth a -- n
\c #define XEventConfigureHeight(e) (((XEvent*)(e))->xconfigure.height)
c-function XEventConfigureHeight XEventConfigureHeight a -- n

\c #define XEventKeyEvent(e) (&((XEvent*)(e))->xkey)
c-function XEventKeyEvent XEventKeyEvent a -- a
\c #define XEventKeycode(e) (((XEvent*)(e))->xkey.keycode)
c-function XEventKeycode XEventKeycode a -- n

\c #define XEventButton(e) (((XEvent*)(e))->xbutton.button)
c-function XEventButton XEventButton a -- n
\c #define XEventX(e) (((XEvent*)(e))->xmotion.x)
c-function XEventX XEventX a -- n
\c #define XEventY(e) (((XEvent*)(e))->xmotion.y)
c-function XEventY XEventY a -- n
\c #define XEventExposeCount(e) (((XEvent*)(e))->xexpose.count)
c-function XEventExposeCount XEventExposeCount a -- n

    🏗️
   Code
Generation

CPU COMPLEXITY?
      🙜
 • Microarchitecture
   • 16-byte blocks of code
 • red-zones, floating point state
 • SIMD
 • 👻 Spectre 👻

NOP

  More than one
way to do nothing!

90
6690
0f1f00
0f1f4000
0f1f440000
660f1f440000
0f1f8000000000
0f1f840000000000
660f1f840000000000

NOP
66 NOP
NOP DWORD ptr [EAX]
NOP DWORD ptr [EAX + 00H]
NOP DWORD ptr [EAX + EAX*1 + 00H]
66 NOP DWORD ptr [EAX + EAX*1 + 00H]
NOP DWORD ptr [EAX + 00000000H]
NOP DWORD ptr [EAX + EAX*1 + 00000000H]
66 NOP DWORD ptr [EAX + EAX*1 + 00000000H]

  🛠️
Trapped
by Tools

INTEL SYNTAX (nasm/yasm)
     🙜
instr   foo,segreg:[base+index*scale+disp]
mov     eax,[ebx+20h]
add     eax,[ebx+ecx*2h
lea     eax,[ebx+ecx]
sub     eax,[ebx+ecx*4h-20h]

AT&T SYNTAX (GNU as)
     🙜
instr   %segreg:disp(base,index,scale),foo
movl    0x20(%ebx),%eax
addl    (%ebx,%ecx,0x2),%eax
leal    (%ebx,%ecx),%eax
subl    -0x20(%ebx,%ecx,0x4),%eax

ESP32
  🙜
 • ESP-IDF Toolchain - 727MB
   • LibC, RTOS, Wi-Fi, BLE
   • Really needs Linux
 • Arduino
   • Easy to Install
   • Package Manager
   • Enlarge who can play

WINDOWS RESOURCES
       🙜
 • .png x n → .ico
 • .ico + .rc → .res / .obj
 • .obj x m → .EXE

S😎LD

So How?

INTERPRETER APPROACHES
         🙜
 • Words as Functions
 • Token Threaded
 • Computed Goto
 • Hybrid

         MOVING FORTH
              🙜
https://www.bradrodriguez.com
           /papers/moving1.htm

DIRECT THREADED
      🙜
   (IP) -> W   fetch memory pointed by IP into "W" register
   IP+2 -> IP  advance IP (assuming 2-byte addresses)
   JP (W)      jump to the address in the W register

INDIRECT THREADED
       🙜
   (IP) -> W  fetch memory pointed by IP into "W" register
              ...W now holds address of the Code Field
   IP+2 -> IP advance IP, just like a program counter
              (assuming 2-byte addresses in the thread)
   (W) ->  X  fetch memory pointed by W into "X" register
              ...X now holds address of the machine code 
   JP (X)     jump to the address in the X register

Words as Functions

void add() {
  tos += *sp;
}

memory[cfa] = &add;

for(;;) {
  code = (void (*)()) *ip++;
  code();
}

ADD:
   0:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 7 
   7:   48 8b 00                mov    (%rax),%rax
   a:   48 01 05 00 00 00 00    add    %rax,0x0(%rip)        # 11 
  11:   c3                      retq
  12:   66 66 2e 0f 1f 84 00    data16 nopw %cs:0x0(%rax,%rax,1)
  19:   00 00 00 00
  1d:   0f 1f 00                nopl   (%rax)

MAIN LOOP:
 430:   48 8d 42 08             lea    0x8(%rdx),%rax
 434:   48 8b 12                mov    (%rdx),%rdx
 437:   48 89 05 00 00 00 00    mov    %rax,0x0(%rip)        # 43e 
 43e:   31 c0                   xor    %eax,%eax
 440:   ff 12                   callq  *(%rdx)
 442:   48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # 449 
 449:   eb e5                   jmp    430 

   ❌
X-Macros

X-MACROS
   🙜
 • Use the C Preprocessor
 • Lists of data to build code
 • Make the caller a parameter

 #define PRIMITIVE_LIST \
   X("DROP", DROP, tos = *sp; --sp) \
   X("SWAP", SWAP, w = tos; tos = *sp; *sp = w) \
   X("AND", AND, tos = tos & *sp; --sp) \
   X("OR", OR, tos = tos | *sp; --sp) \

 enum {
 #define X(sname, name, code) OP_ ## name,
 PRIMITIVE_LIST
 #undef X
 };

FROM STENOFORTH
      🙜
 for (;;) {
   ir = ip[w];
   ++ip;
   switch (ir & 0xff) {
 #define X(sname, name, code) case OP_ ## name: code; break;
     PRIMITIVE_LIST
 #undef X
     default:
       break;
   }
   break;
 }

      🎟️
Token Threaded

GCC (token) - AND
    🙜
 3b0:   48 89 f0                mov    %rsi,%rax
 3b3:   48 89 d7                mov    %rdx,%rdi
 3b6:   48 8d 76 f8             lea    -0x8(%rsi),%rsi
 3ba:   48 23 18                and    (%rax),%rbx
 3bd:   e9 66 ff ff ff          jmpq   328 
 3c2:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)

GCC (token) - DISPATCH
    🙜
 328:   48 83 3f 07             cmpq   $0x7,(%rdi)
 32c:   48 8d 57 08             lea    0x8(%rdi),%rdx
 330:   77 2e                   ja     360 
 332:   48 8b 07                mov    (%rdi),%rax
 335:   48 63 04 81             movslq (%rcx,%rax,4),%rax
 339:   48 01 c8                add    %rcx,%rax
 33c:   ff e0                   jmpq   *%rax
 33e:   66 90                   xchg   %ax,%ax
    🙜
 360:   4c 89 c8                mov    %r9,%rax
 363:   4d 8d 49 f8             lea    -0x8(%r9),%r9
 367:   48 89 50 f8             mov    %rdx,-0x8(%rax)
 36b:   48 8b 7a c0             mov    -0x40(%rdx),%rdi
 36f:   eb b7                   jmp    328 
 371:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)

CLANG (token) - DISPATCH
    🙜
 2f0:   48 83 c6 f8             add    $0xfffffffffffffff8,%rsi
 2f4:   48 89 cf                mov    %rcx,%rdi
 2f7:   48 8d 4f 08             lea    0x8(%rdi),%rcx
 2fb:   48 8b 07                mov    (%rdi),%rax
 2fe:   48 83 c0 ff             add    $0xffffffffffffffff,%rax
 302:   48 83 f8 06             cmp    $0x6,%rax
 306:   77 14                   ja     31c 
 308:   ff 24 c5 00 00 00 00    jmpq   *0x0(,%rax,8)
    🙜
 31c:   48 89 4a f8             mov    %rcx,-0x8(%rdx)
 320:   48 83 c2 f8             add    $0xfffffffffffffff8,%rdx
 324:   48 8b 7f c8             mov    -0x38(%rdi),%rdi
 328:   eb cd                   jmp    2f7 

CLANG (token) - AND & OR
    🙜
 32a:   4c 23 06                and    (%rsi),%r8
 32d:   eb c1                   jmp    2f0 
 32f:   4c 0b 06                or     (%rsi),%r8
 332:   eb bc                   jmp    2f0 

    💻↷ 
Computed Goto

and:
  tos &= *sp--;
  NEXT;

memory[cfa] = (cell_t) && and;
...
goto memory[xyz];

Must be in the
same function!

 #define IDT_NEXT \
   w = (cell_t *) *ip++; \
   x = (cell_t *) *w; \
   goto *x;

 #define DT_NEXT \
   w = (cell_t *) *ip++; \
   goto *w;

 #define CORE_WORDS \
   X(dup, *++sp = tos) \
   X(add, tos += *sp--) \
   X(and, tos &= *sp--) \
   X(or, tos |= *sp--) \
   X(dolit, *++sp = tos; tos = *ip++) \
   X(docolon, *++rp = (cell_t) ip; ip = ((cell_t *) w) + 1) \
   X(doexit, ip = (cell_t *) *rp--) \
   X(load, tos = * (cell_t *) tos) \
   X(store, *(cell_t *) tos = *sp--; tos = *sp--) \

 #define COMMON_REGS \
   register cell_t *ip asm("rdi") = memory; \
   register cell_t *sp asm("rsi") = dstack; \
   register cell_t *rp asm("r9") = rstack; \
   register cell_t tos asm("rbx") = *sp--; \
   register cell_t *w asm("rcx"); \
   int i = 0; \

 #define X(name, code) memory[i++] = (cell_t) && name;
 CORE_WORDS
 #undef X

 #define X(name, code) name: code; IDT_NEXT;
 CORE_WORDS
 #undef X

GCC (direct)
     🙜
 2d0:   48 89 f0                mov    %rsi,%rax
 2d3:   48 8d 76 f8             lea    -0x8(%rsi),%rsi
 2d7:   48 23 18                and    (%rax),%rbx
 2da:   48 89 f8                mov    %rdi,%rax
 2dd:   48 8d 7f 08             lea    0x8(%rdi),%rdi
 2e1:   48 8b 00                mov    (%rax),%rax
 2e4:   48 89 c1                mov    %rax,%rcx
 2e7:   ff e0                   jmpq   *%rax
 2e9:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)

GCC (indirect)
     🙜
  c0:   48 89 f8                mov    %rdi,%rax
  c3:   48 8d 7f 08             lea    0x8(%rdi),%rdi
  c7:   48 8b 00                mov    (%rax),%rax
  ca:   48 89 c1                mov    %rax,%rcx
  cd:   48 8b 00                mov    (%rax),%rax
  d0:   ff e0                   jmpq   *%rax
  d2:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
     🙜
 160:   48 89 f0                mov    %rsi,%rax
 163:   48 8d 76 f8             lea    -0x8(%rsi),%rsi
 167:   48 23 18                and    (%rax),%rbx
 16a:   e9 51 ff ff ff          jmpq   c0 
 16f:   90                      nop

CLANG (indirect)
      🙜
  d0:   48 23 06                and    (%rsi),%rax
  d3:   48 83 c6 f8             add    $0xfffffffffffffff8,%rsi
  d7:   48 8b 0f                mov    (%rdi),%rcx
  da:   48 83 c7 08             add    $0x8,%rdi
  de:   ff 21                   jmpq   *(%rcx)

CLANG (direct)
     🙜
 240:   48 23 06                and    (%rsi),%rax
 243:   48 83 c6 f8             add    $0xfffffffffffffff8,%rsi
 247:   48 8b 0f                mov    (%rdi),%rcx
 24a:   48 83 c7 08             add    $0x8,%rdi
 24e:   ff e1                   jmpq   *%rcx

  🐏
Memory
 Model

MEMORY CONSIDERATIONS
         🙜
 • OS Memory is big and full of faults
 • Address Space Layout Randomization
 • Relocatable Code

MEMORY MODELS
     🙜
 • Classic EForth 16-bit
   • poke / peek outside
   • 8-bit wrap around stacks
 • Win32Forth
   • Base Address in EDI
 • Direct (Gforth)
   • Signal handler

    📕
 Boostrap
Dictionary

BOOSTRAP APPROACHES
        🙜
 • Generate in Forth (which one?)
 • Generate in Python (Dr. Ting's early EForth)
 • Generate in JavaScript (Stenoforth)
 • Generate in C (Later EForth)

  X("+", PLUS, top += stack[(unsigned char)S--]) \
  X("INVERSE", INVERSE, top = -top-1) \
  X("NEGATE", NEGATE, top = 0 - top) \
  ...

 #define MACRO_LIST \
   X(BEGIN, pushR=P) \
   X(AGAIN, data[P++]=BRANCH; data[P++]=popR*sizeof(cell_t)) \
   X(UNTIL, data[P++]=QBRANCH; data[P++]=popR*sizeof(cell_t)) \
   X(WHILE, data[P++]=QBRANCH; data[P++]=0; k=popR; pushR=(P-1); pushR=k) \
   X(REPEAT, data[P++]=BRANCH; data[P++]=popR*sizeof(cell_t); \
     data[popR]=P*sizeof(cell_t)) \
   X(IF, data[P++]=QBRANCH; pushR=P; data[P++]=0) \
   X(ELSE, data[P++]=BRANCH; data[P++]=0; \
     data[popR]=P*sizeof(cell_t); pushR=P-1) \
   ...

int LBRAC=COLON_IMMEDIATE("[", DOLIT,INTER,TEVAL,STORE,EXIT);
int DOTOK=COLON(".OK", CR,DOLIT,INTER,TEVAL,AT,EQUAL,IF,
                  TOR,TOR,TOR,DUP,DOT,RFROM,DUP,DOT,RFROM,DUP,DOT,
                  RFROM,DUP,DOT,DOTQ," ok> ",THEN,EXIT);
int EVAL=COLON("EVAL", LBRAC,BEGIN,TOKEN,DUP,AT,WHILE,TEVAL,ATEXE,
               REPEAT,DROP,NOP,EXIT);
int QUIT=COLON("QUIT", LBRAC,BEGIN,DOTOK,QUERY,EVAL,AGAIN,EXIT);

 #define COLON_WITH_FLAGS(intOLON(...) COLON_WITH_FLAGS(0, __VA_ARGS__)
 #define COLON_IMMEDIATE(...) COLON_WITH_FLAGS(IMEDD, __VA_ARGS__)

int COLON_WITH_FLAGS(int flags, const char *name, ...) {
  HEADER(flags, name);
  int addr=IP;
  P=IP/sizeof(cell_t);
  data[P++] = as_DOLIST;
  va_list argList;
  va_start(argList, name);
  ...

IMPLEMENTATIONS
      🙜
 • Stenoforth
   • github.com/flagxor/stenoforth
 • ESPForth (EForth for ESP32)
   • github.com/flagxor/espforth
     • may_54 - with forth boot words in a data file
     • eforth63 - web support, with forth words built in C, macros
     • bg - web support, 2 core support, boot words in a data file
     • master - builds with the esp-idf toolchain (not arduino),
                or native linux

   What's up
with the slides?

PRESENTATION FORMAT
         🙤
 • Takahashi Method
   • Minimalistic presentation style
   • suckless →  sent →  websent
   • 200 LOC
 • Focus on content
   • Unicode text file as input
   • Show each paragraph fullscreen
   • @image.png for full page images

@stenoforth.png


trees
🌲🌲🌲🌲
 
evil
🙈🙉🙊
 
♤ spades
♥ hearts
♧ clubs
♦ diamonds

trees
🌲🌲🌲🌲

evil
🙈🙉🙊

♤ spades
♥ hearts
♧ clubs
♦ diamonds

QUESTIONS?
    ⚘
thank you!