EFORTH IMPLEMENTED IN C Brad Nelson 😷 Forth Day November 21, 2020 Why C? MOTIVATION 🙜 • Interface with the OS • Temptation of Libraries • Tame modern architectures • Suffer the tools we deserve • Fight complexity with its own tools BUT WHY _C_? 🙜 • Portable Assembler: x86, ARM, ESP32 • Basis for nearly all OS ABIs • Wide range of substantive libraries • Rust: Arch support good, but rides the same stack • Go: Tiny Go does ESP32, but LLVM at bottom 💱 Interface with the OS OS COMPLEXITY 🙜 • I live in a multi-OS world • OSX and Linux at Work • Windows and Linux at Home • Raspberry Pi • ESP8266/ESP32 RTOS OS COMPLEXITY 🙜 • OSes do more than file systems • Graphics with shaders • TCP/IP, Wi-Fi, BLE, Intents • Encyption, certificates • Image, audio, and video formats • Clipboards, icons, compression OS COMPLEXITY 🙜 • Complex executables • Dynamic linking • Multi-size icons • Fat binaries 📞 Calling Conventions LINUX/OSX x64 CONVENTION 🙜 Arguments: RDI, RSI, RDX, RCX, R8, R9 Caller saved: RAX, R10, R11 Callee saved: RBX, RBP, RSP, R12, R13, R14, R15 16-byte aligned stack, 128-byte red zone MICROSOFT x64 CONVENTION 🙜 Arguments: RCX, RDX, R8, R9 Caller saved: RAX, R10, R11 Callee saved: RBX, RBP, RDI, RSI, RSP, R12, R13, R14, R15 16-byte aligned stack, 32-byte shadow space typedef struct { int type; /* of event */ unsigned long serial; /* # of last request processed by server */ Bool send_event; /* true if this came from a SendEvent request */ Display *display; /* Display the event was read from */ Window window; /* "event" window it is reported relative to */ Window root; /* root window that the event occurred on */ Window subwindow; /* child window */ Time time; /* milliseconds */ int x, y; /* pointer x, y coordinates in event window */ int x_root, y_root; /* coordinates relative to root */ unsigned int state; /* key or button mask */ unsigned int keycode; /* detail */ Bool same_screen; /* same screen flag */ } XKeyEvent; class WiFiClient : public ESPLwIPClient { protected: std::shared_ptrclientSocketHandle; std::shared_ptr _rxBuffer; bool _connected; public: WiFiClient *next; WiFiClient(); WiFiClient(int fd); ~WiFiClient(); int connect(IPAddress ip, uint16_t port); int connect(IPAddress ip, uint16_t port, int32_t timeout); int connect(const char *host, uint16_t port); int connect(const char *host, uint16_t port, int32_t timeout); size_t write(uint8_t data); size_t write(const uint8_t *buf, size_t size); size_t write_P(PGM_P buf, size_t size); size_t write(Stream &stream); int available(); int read(); int read(uint8_t *buf, size_t size); int peek(); void flush(); void stop(); uint8_t connected(); ... Dynamic Linking ( Xlib functions / macros ) c-function BlackPixel BlackPixel a n -- n c-function DefaultColormap DefaultColormap a n -- n c-function DefaultScreen DefaultScreen a -- n c-function RootWindow RootWindow a n -- n c-function WhitePixel WhitePixel a n -- n c-function XCheckMaskEvent XCheckMaskEvent a n a -- n c-function XCreateGC XCreateGC a n n a -- a c-function XCreateImage XCreateImage a a n n n a n n n n -- a c-function XCreateSimpleWindow XCreateSimpleWindow a n n n n n n n n -- n c-function XDefaultDepth XDefaultDepth a n -- n c-function XDefaultVisual XDefaultVisual a n -- a c-function XDestroyImage XDestroyImage a -- void c-function XFlush XFlush a -- void c-function XLookupString XLookupString a a n a a -- n c-function XMapWindow XMapWindow a n -- void c-function XNextEvent XNextEvent a a -- void c-function XOpenDisplay XOpenDisplay a -- a c-function XPutImage XPutImage a n a a n n n n n n -- void c-function XSelectInput XSelectInput a n n -- void Structures ( Accessors for members of XEvent ) \c #define XEventSize() sizeof(XEvent) c-function XEventSize XEventSize -- n \c #define XEventType(e) (((XEvent*)(e))->type) c-function XEventType XEventType a -- n \c #define XEventConfigureWidth(e) (((XEvent*)(e))->xconfigure.width) c-function XEventConfigureWidth XEventConfigureWidth a -- n \c #define XEventConfigureHeight(e) (((XEvent*)(e))->xconfigure.height) c-function XEventConfigureHeight XEventConfigureHeight a -- n \c #define XEventKeyEvent(e) (&((XEvent*)(e))->xkey) c-function XEventKeyEvent XEventKeyEvent a -- a \c #define XEventKeycode(e) (((XEvent*)(e))->xkey.keycode) c-function XEventKeycode XEventKeycode a -- n \c #define XEventButton(e) (((XEvent*)(e))->xbutton.button) c-function XEventButton XEventButton a -- n \c #define XEventX(e) (((XEvent*)(e))->xmotion.x) c-function XEventX XEventX a -- n \c #define XEventY(e) (((XEvent*)(e))->xmotion.y) c-function XEventY XEventY a -- n \c #define XEventExposeCount(e) (((XEvent*)(e))->xexpose.count) c-function XEventExposeCount XEventExposeCount a -- n 🏗️ Code Generation CPU COMPLEXITY? 🙜 • Microarchitecture • 16-byte blocks of code • red-zones, floating point state • SIMD • 👻 Spectre 👻 NOP More than one way to do nothing! 90 6690 0f1f00 0f1f4000 0f1f440000 660f1f440000 0f1f8000000000 0f1f840000000000 660f1f840000000000 NOP 66 NOP NOP DWORD ptr [EAX] NOP DWORD ptr [EAX + 00H] NOP DWORD ptr [EAX + EAX*1 + 00H] 66 NOP DWORD ptr [EAX + EAX*1 + 00H] NOP DWORD ptr [EAX + 00000000H] NOP DWORD ptr [EAX + EAX*1 + 00000000H] 66 NOP DWORD ptr [EAX + EAX*1 + 00000000H] 🛠️ Trapped by Tools INTEL SYNTAX (nasm/yasm) 🙜 instr foo,segreg:[base+index*scale+disp] mov eax,[ebx+20h] add eax,[ebx+ecx*2h lea eax,[ebx+ecx] sub eax,[ebx+ecx*4h-20h] AT&T SYNTAX (GNU as) 🙜 instr %segreg:disp(base,index,scale),foo movl 0x20(%ebx),%eax addl (%ebx,%ecx,0x2),%eax leal (%ebx,%ecx),%eax subl -0x20(%ebx,%ecx,0x4),%eax ESP32 🙜 • ESP-IDF Toolchain - 727MB • LibC, RTOS, Wi-Fi, BLE • Really needs Linux • Arduino • Easy to Install • Package Manager • Enlarge who can play WINDOWS RESOURCES 🙜 • .png x n → .ico • .ico + .rc → .res / .obj • .obj x m → .EXE S😎LD So How? INTERPRETER APPROACHES 🙜 • Words as Functions • Token Threaded • Computed Goto • Hybrid MOVING FORTH 🙜 https://www.bradrodriguez.com /papers/moving1.htm DIRECT THREADED 🙜 (IP) -> W fetch memory pointed by IP into "W" register IP+2 -> IP advance IP (assuming 2-byte addresses) JP (W) jump to the address in the W register INDIRECT THREADED 🙜 (IP) -> W fetch memory pointed by IP into "W" register ...W now holds address of the Code Field IP+2 -> IP advance IP, just like a program counter (assuming 2-byte addresses in the thread) (W) -> X fetch memory pointed by W into "X" register ...X now holds address of the machine code JP (X) jump to the address in the X register Words as Functions void add() { tos += *sp; } memory[cfa] = &add; for(;;) { code = (void (*)()) *ip++; code(); } ADD: 0: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 7 7: 48 8b 00 mov (%rax),%rax a: 48 01 05 00 00 00 00 add %rax,0x0(%rip) # 11 11: c3 retq 12: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1) 19: 00 00 00 00 1d: 0f 1f 00 nopl (%rax) MAIN LOOP: 430: 48 8d 42 08 lea 0x8(%rdx),%rax 434: 48 8b 12 mov (%rdx),%rdx 437: 48 89 05 00 00 00 00 mov %rax,0x0(%rip) # 43e 43e: 31 c0 xor %eax,%eax 440: ff 12 callq *(%rdx) 442: 48 8b 15 00 00 00 00 mov 0x0(%rip),%rdx # 449 449: eb e5 jmp 430 ❌ X-Macros X-MACROS 🙜 • Use the C Preprocessor • Lists of data to build code • Make the caller a parameter #define PRIMITIVE_LIST \ X("DROP", DROP, tos = *sp; --sp) \ X("SWAP", SWAP, w = tos; tos = *sp; *sp = w) \ X("AND", AND, tos = tos & *sp; --sp) \ X("OR", OR, tos = tos | *sp; --sp) \ enum { #define X(sname, name, code) OP_ ## name, PRIMITIVE_LIST #undef X }; FROM STENOFORTH 🙜 for (;;) { ir = ip[w]; ++ip; switch (ir & 0xff) { #define X(sname, name, code) case OP_ ## name: code; break; PRIMITIVE_LIST #undef X default: break; } break; } 🎟️ Token Threaded GCC (token) - AND 🙜 3b0: 48 89 f0 mov %rsi,%rax 3b3: 48 89 d7 mov %rdx,%rdi 3b6: 48 8d 76 f8 lea -0x8(%rsi),%rsi 3ba: 48 23 18 and (%rax),%rbx 3bd: e9 66 ff ff ff jmpq 328 3c2: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) GCC (token) - DISPATCH 🙜 328: 48 83 3f 07 cmpq $0x7,(%rdi) 32c: 48 8d 57 08 lea 0x8(%rdi),%rdx 330: 77 2e ja 360 332: 48 8b 07 mov (%rdi),%rax 335: 48 63 04 81 movslq (%rcx,%rax,4),%rax 339: 48 01 c8 add %rcx,%rax 33c: ff e0 jmpq *%rax 33e: 66 90 xchg %ax,%ax 🙜 360: 4c 89 c8 mov %r9,%rax 363: 4d 8d 49 f8 lea -0x8(%r9),%r9 367: 48 89 50 f8 mov %rdx,-0x8(%rax) 36b: 48 8b 7a c0 mov -0x40(%rdx),%rdi 36f: eb b7 jmp 328 371: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) CLANG (token) - DISPATCH 🙜 2f0: 48 83 c6 f8 add $0xfffffffffffffff8,%rsi 2f4: 48 89 cf mov %rcx,%rdi 2f7: 48 8d 4f 08 lea 0x8(%rdi),%rcx 2fb: 48 8b 07 mov (%rdi),%rax 2fe: 48 83 c0 ff add $0xffffffffffffffff,%rax 302: 48 83 f8 06 cmp $0x6,%rax 306: 77 14 ja 31c 308: ff 24 c5 00 00 00 00 jmpq *0x0(,%rax,8) 🙜 31c: 48 89 4a f8 mov %rcx,-0x8(%rdx) 320: 48 83 c2 f8 add $0xfffffffffffffff8,%rdx 324: 48 8b 7f c8 mov -0x38(%rdi),%rdi 328: eb cd jmp 2f7 CLANG (token) - AND & OR 🙜 32a: 4c 23 06 and (%rsi),%r8 32d: eb c1 jmp 2f0 32f: 4c 0b 06 or (%rsi),%r8 332: eb bc jmp 2f0 💻↷ Computed Goto and: tos &= *sp--; NEXT; memory[cfa] = (cell_t) && and; ... goto memory[xyz]; Must be in the same function! #define IDT_NEXT \ w = (cell_t *) *ip++; \ x = (cell_t *) *w; \ goto *x; #define DT_NEXT \ w = (cell_t *) *ip++; \ goto *w; #define CORE_WORDS \ X(dup, *++sp = tos) \ X(add, tos += *sp--) \ X(and, tos &= *sp--) \ X(or, tos |= *sp--) \ X(dolit, *++sp = tos; tos = *ip++) \ X(docolon, *++rp = (cell_t) ip; ip = ((cell_t *) w) + 1) \ X(doexit, ip = (cell_t *) *rp--) \ X(load, tos = * (cell_t *) tos) \ X(store, *(cell_t *) tos = *sp--; tos = *sp--) \ #define COMMON_REGS \ register cell_t *ip asm("rdi") = memory; \ register cell_t *sp asm("rsi") = dstack; \ register cell_t *rp asm("r9") = rstack; \ register cell_t tos asm("rbx") = *sp--; \ register cell_t *w asm("rcx"); \ int i = 0; \ #define X(name, code) memory[i++] = (cell_t) && name; CORE_WORDS #undef X #define X(name, code) name: code; IDT_NEXT; CORE_WORDS #undef X GCC (direct) 🙜 2d0: 48 89 f0 mov %rsi,%rax 2d3: 48 8d 76 f8 lea -0x8(%rsi),%rsi 2d7: 48 23 18 and (%rax),%rbx 2da: 48 89 f8 mov %rdi,%rax 2dd: 48 8d 7f 08 lea 0x8(%rdi),%rdi 2e1: 48 8b 00 mov (%rax),%rax 2e4: 48 89 c1 mov %rax,%rcx 2e7: ff e0 jmpq *%rax 2e9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) GCC (indirect) 🙜 c0: 48 89 f8 mov %rdi,%rax c3: 48 8d 7f 08 lea 0x8(%rdi),%rdi c7: 48 8b 00 mov (%rax),%rax ca: 48 89 c1 mov %rax,%rcx cd: 48 8b 00 mov (%rax),%rax d0: ff e0 jmpq *%rax d2: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) 🙜 160: 48 89 f0 mov %rsi,%rax 163: 48 8d 76 f8 lea -0x8(%rsi),%rsi 167: 48 23 18 and (%rax),%rbx 16a: e9 51 ff ff ff jmpq c0 16f: 90 nop CLANG (indirect) 🙜 d0: 48 23 06 and (%rsi),%rax d3: 48 83 c6 f8 add $0xfffffffffffffff8,%rsi d7: 48 8b 0f mov (%rdi),%rcx da: 48 83 c7 08 add $0x8,%rdi de: ff 21 jmpq *(%rcx) CLANG (direct) 🙜 240: 48 23 06 and (%rsi),%rax 243: 48 83 c6 f8 add $0xfffffffffffffff8,%rsi 247: 48 8b 0f mov (%rdi),%rcx 24a: 48 83 c7 08 add $0x8,%rdi 24e: ff e1 jmpq *%rcx 🐏 Memory Model MEMORY CONSIDERATIONS 🙜 • OS Memory is big and full of faults • Address Space Layout Randomization • Relocatable Code MEMORY MODELS 🙜 • Classic EForth 16-bit • poke / peek outside • 8-bit wrap around stacks • Win32Forth • Base Address in EDI • Direct (Gforth) • Signal handler 📕 Boostrap Dictionary BOOSTRAP APPROACHES 🙜 • Generate in Forth (which one?) • Generate in Python (Dr. Ting's early EForth) • Generate in JavaScript (Stenoforth) • Generate in C (Later EForth) X("+", PLUS, top += stack[(unsigned char)S--]) \ X("INVERSE", INVERSE, top = -top-1) \ X("NEGATE", NEGATE, top = 0 - top) \ ... #define MACRO_LIST \ X(BEGIN, pushR=P) \ X(AGAIN, data[P++]=BRANCH; data[P++]=popR*sizeof(cell_t)) \ X(UNTIL, data[P++]=QBRANCH; data[P++]=popR*sizeof(cell_t)) \ X(WHILE, data[P++]=QBRANCH; data[P++]=0; k=popR; pushR=(P-1); pushR=k) \ X(REPEAT, data[P++]=BRANCH; data[P++]=popR*sizeof(cell_t); \ data[popR]=P*sizeof(cell_t)) \ X(IF, data[P++]=QBRANCH; pushR=P; data[P++]=0) \ X(ELSE, data[P++]=BRANCH; data[P++]=0; \ data[popR]=P*sizeof(cell_t); pushR=P-1) \ ... int LBRAC=COLON_IMMEDIATE("[", DOLIT,INTER,TEVAL,STORE,EXIT); int DOTOK=COLON(".OK", CR,DOLIT,INTER,TEVAL,AT,EQUAL,IF, TOR,TOR,TOR,DUP,DOT,RFROM,DUP,DOT,RFROM,DUP,DOT, RFROM,DUP,DOT,DOTQ," ok> ",THEN,EXIT); int EVAL=COLON("EVAL", LBRAC,BEGIN,TOKEN,DUP,AT,WHILE,TEVAL,ATEXE, REPEAT,DROP,NOP,EXIT); int QUIT=COLON("QUIT", LBRAC,BEGIN,DOTOK,QUERY,EVAL,AGAIN,EXIT); #define COLON_WITH_FLAGS(intOLON(...) COLON_WITH_FLAGS(0, __VA_ARGS__) #define COLON_IMMEDIATE(...) COLON_WITH_FLAGS(IMEDD, __VA_ARGS__) int COLON_WITH_FLAGS(int flags, const char *name, ...) { HEADER(flags, name); int addr=IP; P=IP/sizeof(cell_t); data[P++] = as_DOLIST; va_list argList; va_start(argList, name); ... IMPLEMENTATIONS 🙜 • Stenoforth • github.com/flagxor/stenoforth • ESPForth (EForth for ESP32) • github.com/flagxor/espforth • may_54 - with forth boot words in a data file • eforth63 - web support, with forth words built in C, macros • bg - web support, 2 core support, boot words in a data file • master - builds with the esp-idf toolchain (not arduino), or native linux What's up with the slides? PRESENTATION FORMAT 🙤 • Takahashi Method • Minimalistic presentation style • suckless → sent → websent • 200 LOC • Focus on content • Unicode text file as input • Show each paragraph fullscreen • @image.png for full page images @stenoforth.png trees 🌲🌲🌲🌲 evil 🙈🙉🙊 ♤ spades ♥ hearts ♧ clubs ♦ diamonds trees 🌲🌲🌲🌲 evil 🙈🙉🙊 ♤ spades ♥ hearts ♧ clubs ♦ diamonds QUESTIONS? ⚘ thank you!