Perfect! I have gathered all the configuration information. Let me create a comprehensive summary report for you.

## Build & Toolchain Configuration Report: stripe-dashboard

### Project Type: Single Crate with Local Dependencies
The project is **not a workspace** but a single ESP32 binary crate with local path dependencies. The repository also contains several supporting crates (sprite-gen, display-core, stripe-core, simulator, guide-capture, web-sim); display-core and stripe-core are consumed by the main crate as `path` dependencies.

---

## 1. Cargo.toml - Main Package

**File**: `/Users/[REDACTED]/Projects/stripe-dashboard/Cargo.toml`
```toml
[package]
name = "stripe-dashboard"
version = "0.2.0"
edition = "2021"
resolver = "2"

[dependencies]
esp-idf-hal = "0.45"
esp-idf-svc = { version = "0.51", features = ["experimental"] }
esp-idf-sys = "0.36"
embedded-hal = "1.0"
embedded-svc = "0.28"
smart-leds = "0.4"
ws2812-esp32-rmt-driver = { version = "0.13", features = ["smart-leds-trait"] }
display-core = { path = "display-core" }
stripe-core = { path = "stripe-core", features = ["logging"] }
serde = { version = "1", default-features = false, features = ["derive"] }
serde_json = { version = "1", default-features = false, features = ["alloc"] }
log = "0.4"
base64 = { version = "0.22", default-features = false, features = ["alloc"] }
anyhow = "1"
libc = "0.2"

[features]
auto-update = []

[build-dependencies]
embuild = { version = "0.32", features = ["espidf"] }
flate2 = "1"
tera = { version = "1", default-features = false }

[profile.dev]
opt-level = 2 # WiFi requires opt-level >= 2
debug = true

[profile.release]
opt-level = 2
debug = true # Keep debug symbols for stack traces
lto = "off"  # ESP-IDF doesn't support LTO well
```

**Key Dependencies Summary**:
- **esp-idf-hal 0.45** - Hardware abstraction layer
- **esp-idf-svc 0.51** - System services (with experimental features)
- **esp-idf-sys 0.36** - Low-level bindings
- **smart-leds 0.4** - LED control traits
- **ws2812-esp32-rmt-driver 0.13** - WS2812B LED driver (with smart-leds-trait)
- **Local deps**: display-core, stripe-core (with logging feature)
- **Serialization**: serde 1.x, serde_json 1.x (both with default-features = false)
- **Build tools**: embuild 0.32, flate2 1, tera 1

---

## 2. .cargo/config.toml - Cargo Configuration

**File**: `/Users/[REDACTED]/Projects/stripe-dashboard/.cargo/config.toml`

```toml
[build]
target = "xtensa-esp32-espidf"

[target.xtensa-esp32-espidf]
linker = "ldproxy"
runner = "espflash flash --monitor"

[unstable]
build-std = ["std", "panic_abort"]

[env]
MCU = "esp32"
ESP_IDF_VERSION = "v5.3"
```

**Key Settings**:
- **Target**: `xtensa-esp32-espidf` (Xtensa architecture; requires Espressif's forked Rust/LLVM toolchain)
- **Linker**: `ldproxy` (wrapper that forwards to the linker chosen by the ESP-IDF build)
- **Runner**: `espflash flash --monitor` (automatic flashing and serial monitoring)
- **Build STD**: builds `std` from source with `panic_abort` instead of unwinding
- **ESP-IDF Version**: v5.3

---

## 3. build.rs - Build Script

**File**: `/Users/[REDACTED]/Projects/stripe-dashboard/build.rs`

```rust
use flate2::write::GzEncoder;
use flate2::Compression;
use std::io::Write;
use std::path::PathBuf;
use tera::{Context, Tera};

fn render_template(name: &str, source: &str, ctx: &Context) -> String {
    let mut tera = Tera::default();
    tera.add_raw_template(name, source)
        .unwrap_or_else(|e| panic!("Failed to parse template {}: {}", name, e));
    tera.render(name, ctx)
        .unwrap_or_else(|e| panic!("Failed to render template {}: {}", name, e))
}

fn main() {
    embuild::espidf::sysenv::output();

    // Re-run if the partition table changes so the build picks up updates.
    println!("cargo:rerun-if-changed=partitions.csv");
    println!("cargo:rerun-if-changed=partitions-4mb.csv");
    println!("cargo:rerun-if-changed=release.toml");
    println!("cargo:rerun-if-changed=web/setup.html");
    println!("cargo:rerun-if-changed=web/index.html");

    // Pass RELEASE_NAME env var through to the crate (set by `make release`).
    if let Ok(name) = std::env::var("RELEASE_NAME") {
        println!("cargo:rustc-env=RELEASE_NAME={}", name);
    }

    // Pass FLASH_VARIANT (8mb/4mb) so the firmware knows which OTA asset to download.
    println!(
        "cargo:rustc-env=FLASH_VARIANT={}",
        std::env::var("FLASH_VARIANT").unwrap_or_else(|_| "8mb".into())
    );
    println!("cargo:rerun-if-env-changed=FLASH_VARIANT");

    // Template context for web portal HTML.
    let mut ctx = Context::new();

    // Stripe App install link: set STRIPE_APP_URL to include "Quick Setup"
    // sections in the web portal. When unset, those sections are hidden and
    // only the manual key setup flow is shown.
    let stripe_app_url = std::env::var("STRIPE_APP_URL").ok();
    println!("cargo:rerun-if-env-changed=STRIPE_APP_URL");
    ctx.insert("stripe_app_url", &stripe_app_url);

    // Auto-update UI sections are only included when the feature is enabled.
    let has_auto_update = std::env::var("CARGO_FEATURE_AUTO_UPDATE").is_ok();
    ctx.insert("auto_update", &has_auto_update);

    // Gzip-compress web portal HTML at build time.
    // Served with Content-Encoding: gzip — browsers decompress transparently.
    let out_dir = PathBuf::from(std::env::var("OUT_DIR").unwrap());
    for name in &["setup", "index"] {
        let raw = std::fs::read_to_string(format!("web/{}.html", name))
            .unwrap_or_else(|e| panic!("Failed to read web/{}.html: {}", name, e));

        let html = render_template(name, &raw, &ctx);

        let gz_path = out_dir.join(format!("{}.html.gz", name));
        let file = std::fs::File::create(&gz_path).unwrap();
        let mut encoder = GzEncoder::new(file, Compression::best());
        encoder.write_all(html.as_bytes()).unwrap();
        encoder.finish().unwrap();
    }
}
```

**Key Functions**:
- Calls `embuild::espidf::sysenv::output()` to set up ESP-IDF environment variables
- Watches partition tables and web HTML files for changes
- Passes RELEASE_NAME and FLASH_VARIANT (8mb/4mb) to the crate as env vars
- Renders web portal HTML with Tera templates (conditional Stripe app URL, auto-update UI)
- Gzip-compresses web/setup.html and web/index.html for efficient serving

---

## 4. sdkconfig.defaults - ESP-IDF Configuration (8MB Flash)

**File**: `/Users/[REDACTED]/Projects/stripe-dashboard/sdkconfig.defaults`

```
# WiFi
CONFIG_ESP_WIFI_ENABLED=y
# Pin main task to CPU1 so RMT interrupt (LED matrix) runs on a different
# core from WiFi and system interrupts (CPU0), preventing WS2812B signal corruption
CONFIG_ESP_MAIN_TASK_AFFINITY_CPU1=y

# TLS - use certificate bundle for HTTPS client connections
CONFIG_MBEDTLS_CERTIFICATE_BUNDLE=y
CONFIG_MBEDTLS_CERTIFICATE_BUNDLE_DEFAULT_FULL=y

# HTTP server config
CONFIG_HTTPD_MAX_REQ_HDR_LEN=1024
CONFIG_HTTPD_MAX_URI_LEN=512

# Partition table - custom OTA layout (two 3MB app slots)
CONFIG_PARTITION_TABLE_CUSTOM=y
CONFIG_PARTITION_TABLE_CUSTOM_FILENAME="partitions.csv"

# OTA rollback - revert to previous firmware if new one fails to boot
CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE=y

# Flash size
CONFIG_ESPTOOLPY_FLASHSIZE_8MB=y

# NTP/SNTP
CONFIG_LWIP_SNTP_MAX_SERVERS=2

# Stack sizes - bump for TLS
CONFIG_ESP_MAIN_TASK_STACK_SIZE=16384
CONFIG_PTHREAD_TASK_STACK_SIZE_DEFAULT=4096

# Logging
CONFIG_LOG_DEFAULT_LEVEL_INFO=y
```

---

## 5. sdkconfig-4mb.defaults - ESP-IDF Configuration (4MB Flash Variant)

**File**: `/Users/[REDACTED]/Projects/stripe-dashboard/sdkconfig-4mb.defaults`

```
# 4MB flash overrides (layered on top of sdkconfig.defaults)
CONFIG_ESPTOOLPY_FLASHSIZE_4MB=y

# Use common CA bundle (~50% smaller) to fit in tighter 4MB partitions.
# Covers 99% of sites including Stripe (DigiCert).
CONFIG_MBEDTLS_CERTIFICATE_BUNDLE_DEFAULT_CMN=y

# No rollback on 4MB — slots are too tight for the overhead
CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE=n
```

---

## 6. Partition Tables

### partitions.csv (8MB Flash - Default)
**File**: `/Users/[REDACTED]/Projects/stripe-dashboard/partitions.csv`

```
# ESP-IDF Partition Table — OTA support (8MB flash)
# Name, Type, SubType, Offset, Size, Flags
nvs,      data, nvs,   0x9000,   0x6000,
otadata,  data, ota,   0xf000,   0x2000,
phy_init, data, phy,   0x11000,  0x1000,
ota_0,    app,  ota_0, 0x20000,  0x300000,
ota_1,    app,  ota_1, 0x320000, 0x300000,
```

Layout: NVS (24KB) → OTA data (8KB) → PHY init (4KB) → Two 3MB app slots for OTA updates

### partitions-4mb.csv (4MB Flash Variant)
**File**: `/Users/[REDACTED]/Projects/stripe-dashboard/partitions-4mb.csv`

```
# ESP-IDF Partition Table — Dual OTA (4MB flash)
# Name, Type, SubType, Offset, Size
nvs,      data, nvs,   0x9000,   0x6000,
otadata,  data, ota,   0xf000,   0x2000,
phy_init, data, phy,   0x11000,  0x1000,
ota_0,    app,  ota_0, 0x20000,  0x1F0000,
ota_1,    app,  ota_1, 0x210000, 0x1F0000,
```

Layout: Same structure, but with 0x1F0000 (1,984 KB ≈ 1.94 MB) app slots instead of 3 MB — a tighter fit for 4MB chips

---

## 7. Makefile - Build Automation

**File**: `/Users/[REDACTED]/Projects/stripe-dashboard/Makefile`

**Key Variables**:
```makefile
BINARY := target/xtensa-esp32-espidf/debug/stripe-dashboard
RELEASE_BINARY := target/xtensa-esp32-espidf/release/stripe-dashboard

export PATH := $(HOME)/.cargo/bin:$(HOME)/.rustup/toolchains/esp/bin:$(HOME)/.rustup/toolchains/esp/xtensa-esp-elf/esp-15.2.0_20250920/xtensa-esp-elf/bin:$(PATH)
export LIBCLANG_PATH := $(HOME)/.rustup/toolchains/esp/xtensa-esp32-elf-clang/esp-20.1.1_20250829/esp-clang/lib
export RUSTUP_TOOLCHAIN := esp
```

**Toolchain Details**:
- Uses the **esp** rustup toolchain (not nightly)
- Xtensa GCC: **esp-15.2.0_20250920**
- Clang: **esp-20.1.1_20250829**

**Key Targets**:
- `make build` - Build for 8MB (FLASH_VARIANT=8mb)
- `make build-4mb` - Build for 4MB (FLASH_VARIANT=4mb)
- `make flash` / `make flash-4mb` - Build + flash with auto-retry
- `make release` / `make release-4mb` - Create release binaries with version info
- `make test` - Run tests in sprite-gen, stripe-core, display-core (all with the stable toolchain)
- `make sim` - Run simulator with the stable toolchain
- `make web` - Build web-sim with wasm-pack
- `make monitor` - Serial monitoring with espflash
- `make erase` - Erase flash with esptool
- `make clean` - Clean builds

**Build Configuration**:
- Sets the `FLASH_VARIANT` env var (8mb/4mb) that gets passed to build.rs
- Dynamically creates `target/sdkconfig.partitions` with the partition table filename
- Supports a custom `STRIPE_APP_URL` and the `AUTO_UPDATE` feature flag

---


## 8. Local Dependencies Structure

### display-core
**File**: `/Users/[REDACTED]/Projects/stripe-dashboard/display-core/Cargo.toml`

```toml
[package]
name = "display-core"
version = "0.1.0"
edition = "2021"

[dependencies]
smart-leds-trait = "0.3"

[build-dependencies]
sprite-gen = { path = "../sprite-gen" }
```

### stripe-core
**File**: `/Users/[REDACTED]/Projects/stripe-dashboard/stripe-core/Cargo.toml`

```toml
[package]
name = "stripe-core"
version = "0.1.0"
edition = "2021"

[dependencies]
serde = { version = "1", features = ["derive"] }
serde_json = "1"
log = { version = "0.4", optional = true }

[features]
default = []
logging = ["log"]
```

---


## 9. Key Configuration Summary for Replication

**For a new Rust ESP32 project, you'll need**:

1. **Rust Toolchain**: ESP-specific toolchain (installed via rustup with the `esp` toolchain)
2. **Target**: `xtensa-esp32-espidf`
3. **ESP-IDF**: v5.3 (managed by esp-idf-sys via embuild)
4. **Edition**: 2021 with resolver = "2"
5. **Critical Dependencies**:
   - esp-idf-hal 0.45
   - esp-idf-svc 0.51 (with experimental features)
   - esp-idf-sys 0.36
   - embuild 0.32 (with espidf feature)

6. **Build Profile Requirements**:
   - dev: opt-level = 2 (WiFi requirement), debug = true
   - release: opt-level = 2, debug = true, lto = "off"

7. **Linker & Runner**:
   - Linker: ldproxy
   - Runner: espflash flash --monitor
   - Build-std: ["std", "panic_abort"]

8. **Optional Configurations**:
   - Support for both 8MB and 4MB flash variants
   - Custom partition tables with OTA support
   - Dynamic feature flags (auto-update)
   - Build scripts for gzip compression and templating

All absolute file paths have been provided for reference.
User has approved your plan. You can now start coding. Start with updating your todo list if applicable.

Your plan has been saved to: /Users/[REDACTED]/.claude/plans/joyful-cuddling-jellyfish.md
You can refer back to it if needed during implementation.

## Approved Plan:
# ESP32 Rust Port of microgpt

## Context

Karpathy's [microgpt](https://karpathy.github.io/2026/02/12/microgpt/) is a ~200-line pure Python GPT that trains and runs inference with zero dependencies. It includes a scalar autograd engine, a 1-layer transformer (~4,192 params), and trains on a names dataset. The goal is to port this to Rust and run it on an ESP32 — training and inference on-device, output via serial monitor.

The existing [stripe-dashboard](/Users/[REDACTED]/Projects/stripe-dashboard) project provides a proven Rust+ESP32 build setup to replicate.

## Key Design Decision: No Scalar Autograd

The Python version creates a computation graph node per scalar operation (~30K-50K nodes per forward pass, 1-2MB). This exceeds ESP32's 520KB SRAM. Instead, we implement **explicit matrix-level forward and backward passes**, storing only the activations needed for backprop (~27KB). This is the standard approach in production ML frameworks and keeps memory bounded.

## Model Specs (matching the Python original)

| Parameter | Value |
|---|---|
| Embedding dim | 16 |
| Attention heads | 4 |
| Layers | 1 |
| Block size | 16 |
| Vocab size | 27 (a-z + BOS) |
| Total params | ~4,192 |
| Training steps | 1,000 |
| Optimizer | Adam (lr=0.01, β1=0.85, β2=0.99) |

## Memory Budget (~520KB SRAM, ~300KB usable without WiFi)

- Model parameters: 4,192 × 4 bytes = **~17KB**
- Gradients: **~17KB**
- Adam state (m + v): **~34KB**
- Activations cache for backprop: **~27KB**
- Dataset line offsets: **~5KB** (indices only; raw text stays in flash)
- Stack + overhead: **~50KB**
- **Total: ~150KB** — fits comfortably

## Project Structure

```
esp32gpt/
├── .cargo/
│   └── config.toml      # ESP32 target, linker, build-std
├── src/
│   ├── main.rs          # ESP entry point, training loop, inference loop
│   ├── tensor.rs        # Simple Matrix struct (Vec<f32> + shape), basic ops
│   ├── model.rs         # GPT forward pass, parameter storage, weight init
│   ├── backward.rs      # Manual backward pass for all ops (the hard part)
│   ├── optimizer.rs     # Adam optimizer over flat parameter buffer
│   ├── tokenizer.rs     # Char-level encode/decode (a-z + BOS)
│   └── rng.rs           # Xorshift32 PRNG + Box-Muller for Gaussian init
├── data/
│   └── names.txt        # Training dataset (embedded via include_str!)
├── Cargo.toml
├── build.rs             # embuild ESP-IDF setup
├── sdkconfig.defaults   # No WiFi, generous stack
├── partitions.csv       # Single app partition (no OTA needed)
└── Makefile             # Build/flash/monitor commands
```

## Implementation Steps

### Step 1: Project scaffolding
Create the ESP32 project skeleton replicating the build setup from stripe-dashboard:
- `Cargo.toml` with esp-idf-hal/svc/sys, embuild, log, anyhow
- `.cargo/config.toml` targeting xtensa-esp32-espidf
- `build.rs` calling `embuild::espidf::sysenv::output()`
- `sdkconfig.defaults` disabling WiFi, setting stack size to 16384
- `partitions.csv` with a single app partition
- `Makefile` with build/flash/monitor targets
- Minimal `main.rs` that boots and logs "hello" to serial

### Step 2: Core math — `tensor.rs`
Simple `Matrix` struct:
- `data: Vec<f32>`, `rows: usize`, `cols: usize`
- Operations: matmul, add, element-wise multiply, transpose, softmax, ReLU, scaled
- All operations return new matrices (no in-place mutation needed at this scale)

### Step 3: Tokenizer — `tokenizer.rs`
- `encode(name: &str) -> Vec<usize>` — BOS (0) + char indices (a=1..z=26)
- `decode(token: usize) -> char`
- `VOCAB_SIZE = 27`, `BOS = 0`
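
A sketch of this interface (assumes lowercase ASCII input; rendering BOS as `'.'` in `decode` is an assumption, not from the original):

```rust
// Char-level tokenizer: BOS = 0, 'a' -> 1 .. 'z' -> 26.
pub const VOCAB_SIZE: usize = 27;
pub const BOS: usize = 0;

pub fn encode(name: &str) -> Vec<usize> {
    let mut out = vec![BOS];
    // Assumes the dataset is already lowercase a-z.
    out.extend(name.bytes().map(|b| (b - b'a') as usize + 1));
    out
}

pub fn decode(token: usize) -> char {
    // '.' for BOS is an arbitrary display choice.
    if token == BOS { '.' } else { (b'a' + (token - 1) as u8) as char }
}
```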

### Step 4: RNG — `rng.rs`
- Xorshift32 PRNG (seeded from ESP32 hardware RNG or fixed seed)
- `next_f32()` → uniform [0, 1)
- `next_gaussian()` → Box-Muller transform for weight initialization
- `sample_from_probs(probs: &[f32]) -> usize` → categorical sampling
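
A possible sketch, using Marsaglia's standard xorshift triple (13, 17, 5):

```rust
// Xorshift32 PRNG + Box-Muller, per Step 4. Seeding is caller's choice.
pub struct Rng {
    state: u32,
}

impl Rng {
    pub fn new(seed: u32) -> Self {
        Rng { state: seed.max(1) } // xorshift state must be non-zero
    }

    pub fn next_u32(&mut self) -> u32 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.state = x;
        x
    }

    // Uniform in [0, 1): take the top 24 bits so the f32 mantissa is exact.
    pub fn next_f32(&mut self) -> f32 {
        (self.next_u32() >> 8) as f32 / (1u32 << 24) as f32
    }

    // Box-Muller: two uniforms -> one standard normal sample.
    pub fn next_gaussian(&mut self) -> f32 {
        let u1 = self.next_f32().max(1e-7); // avoid ln(0)
        let u2 = self.next_f32();
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f32::consts::PI * u2).cos()
    }

    // Categorical sampling from a probability vector summing to ~1.
    pub fn sample_from_probs(&mut self, probs: &[f32]) -> usize {
        let mut r = self.next_f32();
        for (i, p) in probs.iter().enumerate() {
            r -= p;
            if r <= 0.0 {
                return i;
            }
        }
        probs.len() - 1 // fall through on rounding error
    }
}
```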

### Step 5: Model forward pass — `model.rs`
Flat parameter buffer with named offset ranges:
- `wte` — token embedding (27 × 16)
- `wpe` — position embedding (16 × 16)
- `wq`, `wk`, `wv`, `wo` — attention projections (16 × 16 each)
- `w1` — FFN up-projection (16 × 64)
- `w2` — FFN down-projection (64 × 16)
- `wout` — output projection (16 × 27)

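These shapes can be sanity-checked against the ~4,192 figure quoted earlier:

```rust
// Parameter count implied by the shapes above.
pub const D: usize = 16; // embedding dim
pub const V: usize = 27; // vocab size
pub const B: usize = 16; // block size
pub const H: usize = 64; // FFN hidden dim (4 * D)

pub fn param_count() -> usize {
    let wte = V * D;         // 27 * 16 = 432
    let wpe = B * D;         // 16 * 16 = 256
    let attn = 4 * D * D;    // wq + wk + wv + wo = 1024
    let ffn = D * H + H * D; // w1 + w2 = 2048
    let wout = D * V;        // 16 * 27 = 432
    wte + wpe + attn + ffn + wout
}
```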
Forward pass processes tokens sequentially (like the Python KV cache approach):
1. Look up token + position embeddings, sum them
2. Compute Q for current position, K and V appended to cache
3. Attention: Q @ K^T / sqrt(d), causal mask, softmax, @ V
4. Residual connection
5. FFN: ReLU(x @ W1) @ W2 + residual
6. Output logits: hidden @ Wout

Store all intermediate activations in a cache struct for backward pass.
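
The attention step (3) for one position can be sketched as below; names are illustrative. Causality is implicit here because the cache only ever holds past positions:

```rust
// Scaled dot-product attention for the current query against cached K/V.
fn attend(q: &[f32], k_cache: &[Vec<f32>], v_cache: &[Vec<f32>]) -> Vec<f32> {
    let d = q.len() as f32;
    // Scores: q . k / sqrt(d) for every cached key.
    let mut scores: Vec<f32> = k_cache
        .iter()
        .map(|k| q.iter().zip(k).map(|(a, b)| a * b).sum::<f32>() / d.sqrt())
        .collect();
    // Softmax over the scores (max-subtracted for stability).
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let mut sum = 0.0;
    for s in scores.iter_mut() {
        *s = (*s - max).exp();
        sum += *s;
    }
    // Weighted sum of cached values.
    let mut out = vec![0.0; q.len()];
    for (w, v) in scores.iter().zip(v_cache) {
        for (o, x) in out.iter_mut().zip(v) {
            *o += (w / sum) * x;
        }
    }
    out
}
```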

### Step 6: Backward pass — `backward.rs`
Manual gradient computation mirroring each forward step in reverse:
- Cross-entropy loss gradient → output projection grad
- FFN backward (with ReLU mask)
- Attention backward (Q/K/V grads with accumulated KV cache grads)
- Embedding gradients (scatter-add into wte/wpe grad rows)

This is the most complex and error-prone file. We'll validate correctness with numerical gradient checking in tests.
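
The gradient-checking harness is simple: compare each analytic gradient against a central finite difference. A sketch, demonstrated on a toy loss f(w) = Σw² whose exact gradient is 2w (the real tests would call the model's loss instead):

```rust
// Central-difference numerical gradient: perturb w[i] by ±eps and take
// (f(w+) - f(w-)) / (2 * eps).
pub fn numeric_grad(f: &dyn Fn(&[f32]) -> f32, w: &[f32], i: usize) -> f32 {
    let eps = 1e-3;
    let (mut wp, mut wm) = (w.to_vec(), w.to_vec());
    wp[i] += eps;
    wm[i] -= eps;
    (f(&wp) - f(&wm)) / (2.0 * eps)
}
```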

### Step 7: Adam optimizer — `optimizer.rs`
Simple loop over the flat parameter + gradient buffers:
- First moment: `m = β1 * m + (1 - β1) * grad`
- Second moment: `v = β2 * v + (1 - β2) * grad²`
- Bias correction + parameter update
- Learning rate with linear decay over 1000 steps
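
The steps above amount to a few lines per parameter; a sketch with the plan's hyperparameters (lr=0.01, β1=0.85, β2=0.99; the epsilon of 1e-8 is a conventional choice, not from the original):

```rust
// One Adam step over flat parameter/gradient/moment buffers. `step` is 1-based.
pub fn adam_step(params: &mut [f32], grads: &[f32], m: &mut [f32], v: &mut [f32], step: usize) {
    let (lr0, b1, b2, eps) = (0.01f32, 0.85f32, 0.99f32, 1e-8f32);
    let lr = lr0 * (1.0 - step as f32 / 1000.0).max(0.0); // linear decay over 1000 steps
    for i in 0..params.len() {
        m[i] = b1 * m[i] + (1.0 - b1) * grads[i];
        v[i] = b2 * v[i] + (1.0 - b2) * grads[i] * grads[i];
        // Bias-corrected moment estimates.
        let m_hat = m[i] / (1.0 - b1.powi(step as i32));
        let v_hat = v[i] / (1.0 - b2.powi(step as i32));
        params[i] -= lr * m_hat / (v_hat.sqrt() + eps);
    }
}
```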

### Step 8: Training loop + inference — `main.rs`
- Load dataset from `include_str!("../data/names.txt")`
- Parse into list of encoded names
- Training loop: sample random name, forward, loss, backward, optimizer step
- Every 100 steps: log loss to serial
- After training: generate 20 names via autoregressive sampling with temperature=0.8
- Log generated names to serial

### Step 9: Testing
- All core modules (tensor, model, backward, optimizer, tokenizer, rng) are ESP-independent
- `cargo test` on host with numerical gradient verification for backward pass
- Flash to ESP32 and verify training + generation via serial monitor

## Files to Reference

- [stripe-dashboard/.cargo/config.toml](/Users/[REDACTED]/Projects/stripe-dashboard/.cargo/config.toml) — ESP32 target config
- [stripe-dashboard/Cargo.toml](/Users/[REDACTED]/Projects/stripe-dashboard/Cargo.toml) — dependency versions
- [stripe-dashboard/build.rs](/Users/[REDACTED]/Projects/stripe-dashboard/build.rs) — embuild setup
- [stripe-dashboard/Makefile](/Users/[REDACTED]/Projects/stripe-dashboard/Makefile) — toolchain PATH setup
- [stripe-dashboard/sdkconfig.defaults](/Users/[REDACTED]/Projects/stripe-dashboard/sdkconfig.defaults) — ESP-IDF config reference

## Verification

1. `cargo test` — run unit tests on host (tensor ops, gradient checking, tokenizer)
2. `make build` — confirm it compiles for ESP32
3. `make flash` — flash to device, observe serial output:
   - Training loss decreasing over 1000 steps
   - 20 generated names that look plausible (e.g. "emma", "alex", not random garbage)

BINARY := target/xtensa-esp32-espidf/debug/stripe-dashboard
RELEASE_BINARY := target/xtensa-esp32-espidf/release/stripe-dashboard

export PATH := $(HOME)/.cargo/bin:$(HOME)/.rustup/toolchains/esp/bin:$(HOME)/.rustup/toolchains/esp/xtensa-esp-elf/esp-15.2.0_20250920/xtensa-esp-elf/bin:$(PATH)
export LIBCLANG_PATH := $(HOME)/.rustup/toolchains/esp/xtensa-esp32-elf-clang/esp-20.1.1_20250829/esp-clang/lib
export RUSTUP_TOOLCHAIN := esp

VERSION := $(shell grep '^version' Cargo.toml | head -1 | sed 's/.*"\(.*\)"/\1/')
CODENAME := $(shell grep '^codename' release.toml 2>/dev/null | sed 's/.*"\(.*\)"/\1/')
CODENAME_LOWER := $(shell echo "$(CODENAME)" | tr 'A-Z' 'a-z')

# Optional features (e.g. AUTO_UPDATE=1 make build)
CARGO_FEATURES :=
ifdef AUTO_UPDATE
CARGO_FEATURES += auto-update
endif
ifneq ($(CARGO_FEATURES),)
FEATURES_FLAG := --features "$(strip $(CARGO_FEATURES))"
endif

.PHONY: build flash monitor erase clean test sim web web-serve build-4mb flash-4mb release release-4mb release-full release-full-4mb guide

build:
	@mkdir -p target
	@echo 'CONFIG_PARTITION_TABLE_CUSTOM_FILENAME="$(CURDIR)/partitions.csv"' > target/sdkconfig.partitions
	FLASH_VARIANT=8mb ESP_IDF_SDKCONFIG_DEFAULTS="$(CURDIR)/sdkconfig.defaults;$(CURDIR)/target/sdkconfig.partitions" cargo build $(FEATURES_FLAG)

flash: build
	@until espflash flash --baud 115200 --partition-table partitions.csv $(BINARY); do \
		echo "Flash failed, retrying..."; \
		sleep 1; \
	done
	espflash monitor

build-4mb:
	@mkdir -p target
	@echo 'CONFIG_PARTITION_TABLE_CUSTOM_FILENAME="$(CURDIR)/partitions-4mb.csv"' > target/sdkconfig.partitions
	FLASH_VARIANT=4mb ESP_IDF_SDKCONFIG_DEFAULTS="$(CURDIR)/sdkconfig.defaults;$(CURDIR)/sdkconfig-4mb.defaults;$(CURDIR)/target/sdkconfig.partitions" cargo build $(FEATURES_FLAG)

flash-4mb: build-4mb
	@until espflash flash --baud 115200 --partition-table partitions-4mb.csv --flash-size 4mb $(BINARY); do \
		echo "Flash failed, retrying..."; \
		sleep 1; \
	done
	espflash monitor

monitor:
	espflash monitor

erase:
	uvx esptool --chip esp32 erase-flash

test:
	@echo "==> Testing sprite-gen"
	cd sprite-gen && RUSTUP_TOOLCHAIN=stable cargo test
	@echo "==> Testing stripe-core"
	cd stripe-core && RUSTUP_TOOLCHAIN=stable cargo test
	@echo "==> Testing display-core"
	cd display-core && RUSTUP_TOOLCHAIN=stable cargo test

sim:
	cd simulator && RUSTUP_TOOLCHAIN=stable cargo run

web:
	cd web-sim && RUSTUP_TOOLCHAIN=stable wasm-pack build --target web --release --out-dir www/pkg

web-serve: web
	cd web-sim/www && python3 -m http.server 8080

release:
	@if [ -z "$(CODENAME)" ]; then echo "Error: No codename in release.toml"; exit 1; fi
	@mkdir -p target
	@echo 'CONFIG_PARTITION_TABLE_CUSTOM_FILENAME="$(CURDIR)/partitions.csv"' > target/sdkconfig.partitions
	@echo 'CONFIG_APP_PROJECT_VER_FROM_CONFIG=y' >> target/sdkconfig.partitions
	@echo 'CONFIG_APP_PROJECT_VER="$(VERSION) $(CODENAME) 8mb"' >> target/sdkconfig.partitions
	RELEASE_NAME="$(CODENAME)" \
	FLASH_VARIANT=8mb \
	ESP_IDF_SDKCONFIG_DEFAULTS="$(CURDIR)/sdkconfig.defaults;$(CURDIR)/target/sdkconfig.partitions" \
	cargo build --release $(FEATURES_FLAG)
	espflash save-image --chip esp32 $(RELEASE_BINARY) \
		target/stripe-dashboard-$(VERSION)-$(CODENAME_LOWER)-8mb.bin
	@echo ""
	@echo "Release build complete:"
	@ls -lh target/stripe-dashboard-$(VERSION)-$(CODENAME_LOWER)-8mb.bin
	@echo "Version: $(VERSION) $(CODENAME)"

release-full: release
	espflash save-image --chip esp32 --merge \
		--partition-table partitions.csv \
		$(RELEASE_BINARY) \
		target/stripe-dashboard-$(VERSION)-$(CODENAME_LOWER)-8mb-full.bin
	@echo ""
	@echo "Full flash image (bootloader + partition table + app):"
	@ls -lh target/stripe-dashboard-$(VERSION)-$(CODENAME_LOWER)-8mb-full.bin
	@echo "Flash with: espflash write-bin 0x0 target/stripe-dashboard-$(VERSION)-$(CODENAME_LOWER)-8mb-full.bin"

release-4mb:
	@if [ -z "$(CODENAME)" ]; then echo "Error: No codename in release.toml"; exit 1; fi
	@mkdir -p target
	@echo 'CONFIG_PARTITION_TABLE_CUSTOM_FILENAME="$(CURDIR)/partitions-4mb.csv"' > target/sdkconfig.partitions
	@echo 'CONFIG_APP_PROJECT_VER_FROM_CONFIG=y' >> target/sdkconfig.partitions
	@echo 'CONFIG_APP_PROJECT_VER="$(VERSION) $(CODENAME) 4mb"' >> target/sdkconfig.partitions
	RELEASE_NAME="$(CODENAME)" \
	FLASH_VARIANT=4mb \
	ESP_IDF_SDKCONFIG_DEFAULTS="$(CURDIR)/sdkconfig.defaults;$(CURDIR)/sdkconfig-4mb.defaults;$(CURDIR)/target/sdkconfig.partitions" \
	cargo build --release $(FEATURES_FLAG)
	espflash save-image --chip esp32 --flash-size 4mb $(RELEASE_BINARY) \
		target/stripe-dashboard-$(VERSION)-$(CODENAME_LOWER)-4mb.bin
	@echo ""
	@echo "Release build complete:"
	@ls -lh target/stripe-dashboard-$(VERSION)-$(CODENAME_LOWER)-4mb.bin
	@echo "Version: $(VERSION) $(CODENAME)"

release-full-4mb: release-4mb
	espflash save-image --chip esp32 --merge --flash-size 4mb \
		--partition-table partitions-4mb.csv \
		$(RELEASE_BINARY) \
		target/stripe-dashboard-$(VERSION)-$(CODENAME_LOWER)-4mb-full.bin
	@echo ""
	@echo "Full flash image (bootloader + partition table + app):"
	@ls -lh target/stripe-dashboard-$(VERSION)-$(CODENAME_LOWER)-4mb-full.bin
	@echo "Flash with: espflash write-bin 0x0 target/stripe-dashboard-$(VERSION)-$(CODENAME_LOWER)-4mb-full.bin"

guide:
	cd guide-capture && RUSTUP_TOOLCHAIN=stable cargo run
	cd guide-capture && uv run embed_images.py

clean:
	rm -rf ./target
	cargo clean

<retrieval_status>success</retrieval_status>

<task_id>bwx7jcxzx</task_id>

<task_type>local_bash</task_type>

<status>failed</status>

<exit_code>101</exit_code>

<output>
[Truncated. Full output: /private/tmp/claude-501/-Users-[REDACTED]-Projects-esp32gpt/tasks/bwx7jcxzx.output]

/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/nvs_flash /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/nvs_sec_provider /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/openthread /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/partition_table /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/perfmon /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/protobuf-c /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/protocomm /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/pthread /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/sdmmc /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/soc /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/spi_flash /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/spiffs /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/tcp_transport /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/ulp /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/unity /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/usb /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/vfs /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/wear_levelling /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/wifi_provisioning /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/wpa_supplicant /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/xtensa |
| 15 | -- Configuring done |
| 16 | -- Generating done |
| 17 | -- Build files have been written to: /Users/[REDACTED]/Projects/esp32gpt/target/xtensa-esp32-espidf/debug/build/esp-idf-sys-ea3ffe5aeae90cf4/out/build |
| 18 | |
| 19 | --- stderr |
| 20 | Build configuration: BuildConfig { |
| 21 | esp_idf_tools_install_dir: None, |
| 22 | esp_idf_sdkconfig: None, |
| 23 | esp_idf_sdkconfig_defaults: None, |
| 24 | mcu: Some( |
| 25 | "esp32", |
| 26 | ), |
| 27 | native: NativeConfig { |
| 28 | esp_idf_version: Some( |
| 29 | Tag( |
| 30 | "v5.3", |
| 31 | ), |
| 32 | ), |
| 33 | esp_idf_repository: None, |
| 34 | esp_idf_cmake_generator: None, |
| 35 | idf_path: None, |
| 36 | extra_components: [], |
| 37 | esp_idf_components: None, |
| 38 | esp_idf_component_manager: None, |
| 39 | }, |
| 40 | esp_idf_sys_root_crate: None, |
| 41 | } |
| 42 | Using managed esp-idf repository: RemoteSdk { repo_url: None, git_ref: Tag("v5.3") } |
| 43 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3'... |
| 44 | warning: refs/tags/v5.3 3f12347da960a1136acaed48b5890cb439a432f1 is not a commit! |
| 45 | Note: switching to 'e0991facf5ecb362af6aac1fae972139eb38d2e4'. |
| 46 | |
| 47 | You are in 'detached HEAD' state. You can look around, make experimental |
| 48 | changes and commit them, and you can discard any commits you make in this |
| 49 | state without impacting any branches by switching back to a branch. |
| 50 | |
| 51 | If you want to create a new branch to retain commits you create, you may |
| 52 | do so (now or later) by using -c with the switch command. Example: |
| 53 | |
| 54 | git switch -c <new-branch-name> |
| 55 | |
| 56 | Or undo this operation with: |
| 57 | |
| 58 | git switch - |
| 59 | |
| 60 | Turn off this advice by setting config variable advice.detachedHead to false |
| 61 | |
| 62 | Updating files: 46% (6950/14814)
Updating files: 47% (6963/14814)
Updating files: 48% (7111/14814)
Updating files: 49% (7259/14814)
Updating files: 50% (7407/14814)
Updating files: 51% (7556/14814)
Updating files: 52% (7704/14814)
Updating files: 53% (7852/14814)
Updating files: 54% (8000/14814)
Updating files: 55% (8148/14814)
Updating files: 56% (8296/14814)
Updating files: 57% (8444/14814)
Updating files: 58% (8593/14814)
Updating files: 59% (8741/14814)
Updating files: 60% (8889/14814)
Updating files: 61% (9037/14814)
Updating files: 62% (9185/14814)
Updating files: 63% (9333/14814)
Updating files: 64% (9481/14814)
Updating files: 65% (9630/14814)
Updating files: 66% (9778/14814)
Updating files: 67% (9926/14814)
Updating files: 68% (10074/14814)
Updating files: 69% (10222/14814)
Updating files: 70% (10370/14814)
Updating files: 71% (10518/14814)
Updating files: 72% (10667/14814)
Updating files: 73% (10815/14814)
Updating files: 74% (10963/14814)
Updating files: 75% (11111/14814)
Updating files: 76% (11259/14814)
Updating files: 77% (11407/14814)
Updating files: 78% (11555/14814)
Updating files: 79% (11704/14814)
Updating files: 80% (11852/14814)
Updating files: 81% (12000/14814)
Updating files: 82% (12148/14814)
Updating files: 83% (12296/14814)
Updating files: 84% (12444/14814)
Updating files: 85% (12592/14814)
Updating files: 86% (12741/14814)
Updating files: 87% (12889/14814)
Updating files: 88% (13037/14814)
Updating files: 89% (13185/14814)
Updating files: 90% (13333/14814)
Updating files: 90% (13433/14814)
Updating files: 91% (13481/14814)
Updating files: 92% (13629/14814)
Updating files: 93% (13778/14814)
Updating files: 94% (13926/14814)
Updating files: 95% (14074/14814)
Updating files: 96% (14222/14814)
Updating files: 97% (14370/14814)
Updating files: 98% (14518/14814)
Updating files: 99% (14666/14814)
Updating files: 100% (14814/14814)
Updating files: 100% (14814/14814), done. |
| 63 | Submodule 'components/bootloader/subproject/components/micro-ecc/micro-ecc' (https://github.com/kmackay/micro-ecc.git) registered for path 'components/bootloader/subproject/components/micro-ecc/micro-ecc' |
| 64 | Submodule 'components/bt/controller/lib_esp32' (https://github.com/espressif/esp32-bt-lib.git) registered for path 'components/bt/controller/lib_esp32' |
| 65 | Submodule 'components/bt/controller/lib_esp32c2/esp32c2-bt-lib' (https://github.com/espressif/esp32c2-bt-lib.git) registered for path 'components/bt/controller/lib_esp32c2/esp32c2-bt-lib' |
| 66 | Submodule 'components/bt/controller/lib_esp32c3_family' (https://github.com/espressif/esp32c3-bt-lib.git) registered for path 'components/bt/controller/lib_esp32c3_family' |
| 67 | Submodule 'components/bt/controller/lib_esp32c5/esp32c5-bt-lib' (https://github.com/espressif/esp32c5-bt-lib.git) registered for path 'components/bt/controller/lib_esp32c5/esp32c5-bt-lib' |
| 68 | Submodule 'components/bt/controller/lib_esp32c6/esp32c6-bt-lib' (https://github.com/espressif/esp32c6-bt-lib.git) registered for path 'components/bt/controller/lib_esp32c6/esp32c6-bt-lib' |
| 69 | Submodule 'components/bt/controller/lib_esp32h2/esp32h2-bt-lib' (https://github.com/espressif/esp32h2-bt-lib.git) registered for path 'components/bt/controller/lib_esp32h2/esp32h2-bt-lib' |
| 70 | Submodule 'components/bt/esp_ble_mesh/lib/lib' (https://github.com/espressif/esp-ble-mesh-lib.git) registered for path 'components/bt/esp_ble_mesh/lib/lib' |
| 71 | Submodule 'components/bt/host/nimble/nimble' (https://github.com/espressif/esp-nimble.git) registered for path 'components/bt/host/nimble/nimble' |
| 72 | Submodule 'components/cmock/CMock' (https://github.com/ThrowTheSwitch/CMock.git) registered for path 'components/cmock/CMock' |
| 73 | Submodule 'components/esp_coex/lib' (https://github.com/espressif/esp-coex-lib.git) registered for path 'components/esp_coex/lib' |
| 74 | Submodule 'components/esp_phy/lib' (https://github.com/espressif/esp-phy-lib.git) registered for path 'components/esp_phy/lib' |
| 75 | Submodule 'components/esp_wifi/lib' (https://github.com/espressif/esp32-wifi-lib.git) registered for path 'components/esp_wifi/lib' |
| 76 | Submodule 'components/heap/tlsf' (https://github.com/espressif/tlsf.git) registered for path 'components/heap/tlsf' |
| 77 | Submodule 'components/json/cJSON' (https://github.com/DaveGamble/cJSON.git) registered for path 'components/json/cJSON' |
| 78 | Submodule 'components/lwip/lwip' (https://github.com/espressif/esp-lwip.git) registered for path 'components/lwip/lwip' |
| 79 | Submodule 'components/mbedtls/mbedtls' (https://github.com/espressif/mbedtls.git) registered for path 'components/mbedtls/mbedtls' |
| 80 | Submodule 'components/mqtt/esp-mqtt' (https://github.com/espressif/esp-mqtt.git) registered for path 'components/mqtt/esp-mqtt' |
| 81 | Submodule 'components/openthread/lib' (https://github.com/espressif/esp-thread-lib.git) registered for path 'components/openthread/lib' |
| 82 | Submodule 'components/openthread/openthread' (https://github.com/espressif/openthread.git) registered for path 'components/openthread/openthread' |
| 83 | Submodule 'components/protobuf-c/protobuf-c' (https://github.com/protobuf-c/protobuf-c.git) registered for path 'components/protobuf-c/protobuf-c' |
| 84 | Submodule 'components/spiffs/spiffs' (https://github.com/pellepl/spiffs.git) registered for path 'components/spiffs/spiffs' |
| 85 | Submodule 'components/unity/unity' (https://github.com/ThrowTheSwitch/Unity.git) registered for path 'components/unity/unity' |
| 86 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/bootloader/subproject/components/micro-ecc/micro-ecc'... |
| 87 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/bt/controller/lib_esp32'... |
| 88 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/bt/controller/lib_esp32c3_family'... |
| 89 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/cmock/CMock'... |
| 90 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/bt/controller/lib_esp32c5/esp32c5-bt-lib'... |
| 91 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/bt/controller/lib_esp32c2/esp32c2-bt-lib'... |
| 92 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/esp_phy/lib'... |
| 93 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/bt/controller/lib_esp32c6/esp32c6-bt-lib'... |
| 94 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/esp_coex/lib'... |
| 95 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/bt/host/nimble/nimble'... |
| 96 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/heap/tlsf'... |
| 97 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/bt/esp_ble_mesh/lib/lib'... |
| 98 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/json/cJSON'... |
| 99 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/bt/controller/lib_esp32h2/esp32h2-bt-lib'... |
| 100 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/mqtt/esp-mqtt'... |
| 101 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/openthread/lib'... |
| 102 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/openthread/openthread'... |
| 103 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/spiffs/spiffs'... |
| 104 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/lwip/lwip'... |
| 105 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/unity/unity'... |
| 106 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/protobuf-c/protobuf-c'... |
| 107 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/mbedtls/mbedtls'... |
| 108 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/esp_wifi/lib'... |
| 109 | From https://github.com/kmackay/micro-ecc |
| 110 | * branch 24c60e243580c7868f4334a1ba3123481fe1aa48 -> FETCH_HEAD |
| 111 | From https://github.com/espressif/esp32-bt-lib |
| 112 | * branch a4d7731a95db8a6cfb98e5068b6757c32ecfaa2a -> FETCH_HEAD |
| 113 | From https://github.com/espressif/esp32c2-bt-lib |
| 114 | * branch 8ddd8acac498fcbb76b5a39c5c7d4025238298ab -> FETCH_HEAD |
| 115 | From https://github.com/espressif/esp32c3-bt-lib |
| 116 | * branch 4b1338827fa19fbacc02dd9e46e76be2b0dd17a9 -> FETCH_HEAD |
| 117 | From https://github.com/espressif/esp32c5-bt-lib |
| 118 | * branch 5f428f914114c88470bf0a785f08840c2b35abca -> FETCH_HEAD |
| 119 | From https://github.com/espressif/esp32c6-bt-lib |
| 120 | * branch ed6c0b4e0ab3b8ddce5d8bc65e417b1adcbca5b4 -> FETCH_HEAD |
| 121 | From https://github.com/espressif/esp32h2-bt-lib |
| 122 | * branch 2d69367e13a928afb73d1a8c579c0dad98eb9393 -> FETCH_HEAD |
| 123 | From https://github.com/espressif/esp-ble-mesh-lib |
| 124 | * branch 4934ca903807dd74f7f808dadcd9a478e18fc6c3 -> FETCH_HEAD |
| 125 | From https://github.com/espressif/esp-nimble |
| 126 | * branch 73112f9b4068ef7dc541c88c555ff829bebb9f8f -> FETCH_HEAD |
| 127 | From https://github.com/ThrowTheSwitch/CMock |
| 128 | * branch eeecc49ce8af123cf8ad40efdb9673e37b56230f -> FETCH_HEAD |
| 129 | Submodule 'vendor/c_exception' (https://github.com/throwtheswitch/cexception.git) registered for path 'components/cmock/CMock/vendor/c_exception' |
| 130 | Submodule 'vendor/unity' (https://github.com/throwtheswitch/unity.git) registered for path 'components/cmock/CMock/vendor/unity' |
| 131 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/cmock/CMock/vendor/c_exception'... |
| 132 | Cloning into '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/cmock/CMock/vendor/unity'... |
| 133 | From https://github.com/throwtheswitch/cexception |
| 134 | * branch 71b47be7c950f1bf5f7e5303779fa99a16224bb6 -> FETCH_HEAD |
| 135 | From https://github.com/throwtheswitch/unity |
| 136 | * branch cf949f45ca6d172a177b00da21310607b97bc7a7 -> FETCH_HEAD |
| 137 | From https://github.com/espressif/esp-coex-lib |
| 138 | * branch 56d324c3fe3fb7649f8736bbb3b9f00b7f612449 -> FETCH_HEAD |
| 139 | From https://github.com/espressif/esp-phy-lib |
| 140 | * branch 06e7625de197bc12797dd701d6762229bca01826 -> FETCH_HEAD |
| 141 | From https://github.com/espressif/esp32-wifi-lib |
| 142 | * branch 17509c30aecde2c38bf4d3cc3e860b9297cf23e8 -> FETCH_HEAD |
| 143 | From https://github.com/espressif/tlsf |
| 144 | * branch 8fc595fe223cd0b3b5d7b29eb86825e4bd38e6e8 -> FETCH_HEAD |
| 145 | From https://github.com/DaveGamble/cJSON |
| 146 | * branch acc76239bee01d8e9c858ae2cab296704e52d916 -> FETCH_HEAD |
| 147 | From https://github.com/espressif/esp-lwip |
| 148 | * branch f79221431fa9042b3572d271d687de66da7560c4 -> FETCH_HEAD |
| 149 | From https://github.com/espressif/mbedtls |
| 150 | * branch 72aa687352a469044cbb946f3fdb261430e41ce1 -> FETCH_HEAD |
| 151 | From https://github.com/espressif/esp-mqtt |
| 152 | * branch aa6f889fb4f6f743b3a550aa587713aabbdca1fc -> FETCH_HEAD |
| 153 | From https://github.com/espressif/esp-thread-lib |
| 154 | * branch 34d698a274940730901b934caa023a3281aca53e -> FETCH_HEAD |
| 155 | From https://github.com/espressif/openthread |
| 156 | * branch be7d36e4ff9cf7df6dfce54e58a31163c87b93f7 -> FETCH_HEAD |
| 157 | From https://github.com/protobuf-c/protobuf-c |
| 158 | * branch abc67a11c6db271bedbb9f58be85d6f4e2ea8389 -> FETCH_HEAD |
| 159 | From https://github.com/pellepl/spiffs |
| 160 | * branch 0dbb3f71c5f6fae3747a9d935372773762baf852 -> FETCH_HEAD |
| 161 | From https://github.com/ThrowTheSwitch/Unity |
| 162 | * branch bf560290f6020737eafaa8b5cbd2177c3956c03f -> FETCH_HEAD |
| 163 | Using esp-idf v5.3.0 at '/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3' |
| 164 | running: cd "/Users/[REDACTED]/Projects/esp32gpt/target/xtensa-esp32-espidf/debug/build/esp-idf-sys-ea3ffe5aeae90cf4/out/build" && CMAKE_PREFIX_PATH="" ESP_ROM_ELF_DIR="/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/tools/esp-rom-elfs/20240305/" EXTRA_COMPONENT_DIRS="" IDF_COMPONENT_MANAGER="1" IDF_PATH="/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3" IDF_TARGET="esp32" IDF_TOOLS_PATH="/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif" LC_ALL="C" PATH="/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/tools/xtensa-esp-elf/esp-13.2.0_20240530/xtensa-esp-elf/bin:/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/tools/esp-clang/16.0.1-fe4f10a809/esp-clang/bin:/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/tools/esp32ulp-elf/2.38_20240113/esp32ulp-elf/bin:/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/tools/cmake/3.24.0/CMake.app/Contents/bin:/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/tools/ninja/1.11.1:/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/tools/esp-rom-elfs/20240305:/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/python_env/idf5.3_py3.14_env/bin:/Users/[REDACTED]/.cargo/bin:/Users/[REDACTED]/.rustup/toolchains/esp/bin:/Users/[REDACTED]/.rustup/toolchains/esp/xtensa-esp-elf/esp-15.2.0_20250920/xtensa-esp-elf/bin:/Users/[REDACTED]/.local/bin:/opt/homebrew/opt/libpq/bin:/Users/[REDACTED]/.cargo/bin:/Users/[REDACTED]/.rustup/toolchains/esp/bin:/Users/[REDACTED]/.rustup/toolchains/esp/xtensa-esp-elf/esp-15.2.0_20250920/xtensa-esp-elf/bin:/Users/[REDACTED]/.local/bin:/opt/homebrew/opt/libpq/bin:$PATH:/Applications/Visual Studio Code.app/Contents/Resources/app/bin:/Users/[REDACTED]/.lmstudio/bin:/Applications/Visual Studio 
Code.app/Contents/Resources/app/bin:/Users/[REDACTED]/.lmstudio/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Library/Apple/usr/bin" PROJECT_DIR="/Users/[REDACTED]/Projects/esp32gpt" SDKCONFIG_DEFAULTS="/Users/[REDACTED]/Projects/esp32gpt/target/xtensa-esp32-espidf/debug/build/esp-idf-sys-ea3ffe5aeae90cf4/out/gen-sdkconfig.defaults;/Users/[REDACTED]/Projects/esp32gpt/sdkconfig.defaults" "cmake" "/Users/[REDACTED]/Projects/esp32gpt/target/xtensa-esp32-espidf/debug/build/esp-idf-sys-ea3ffe5aeae90cf4/out" "-B" "/Users/[REDACTED]/Projects/esp32gpt/target/xtensa-esp32-espidf/debug/build/esp-idf-sys-ea3ffe5aeae90cf4/out/build" "-G" "Ninja" "-DCMAKE_TOOLCHAIN_FILE=/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/tools/cmake/toolchain-esp32.cmake" "-DCMAKE_BUILD_TYPE=" "-DPYTHON=/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/python_env/idf5.3_py3.14_env/bin/python" "-DCMAKE_INSTALL_PREFIX=/Users/[REDACTED]/Projects/esp32gpt/target/xtensa-esp32-espidf/debug/build/esp-idf-sys-ea3ffe5aeae90cf4/out" "-DCMAKE_C_FLAGS= -mlongcalls -Wno-frame-address -fno-builtin-memcpy -fno-builtin-memset -fno-builtin-bzero -fno-builtin-stpcpy -fno-builtin-strncpy -w" "-DCMAKE_CXX_FLAGS= -mlongcalls -Wno-frame-address -fno-builtin-memcpy -fno-builtin-memset -fno-builtin-bzero -fno-builtin-stpcpy -fno-builtin-strncpy -w" "-DCMAKE_ASM_FLAGS= -mlongcalls -w" |
| 165 | info: INFO: Symbol IDF_TARGET_LINUX defined in multiple locations (see below). Please check if this is a correct behavior or a random name match: |
| 166 | /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/Kconfig:162 |
| 167 | /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/Kconfig:78 |
| 168 | CMake Warning at /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/partition_table/project_include.cmake:23 (message): |
| 169 | Partition table CSV file |
| 170 | /Users/[REDACTED]/Projects/esp32gpt/target/xtensa-esp32-espidf/debug/build/esp-idf-sys-ea3ffe5aeae90cf4/out/partitions.csv |
| 171 | not found. Change custom partition CSV path in menuconfig. |
| 172 | Call Stack (most recent call first): |
| 173 | /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/tools/cmake/build.cmake:400 (include) |
| 174 | /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/tools/cmake/build.cmake:632 (__build_process_project_includes) |
| 175 | /Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/tools/cmake/project.cmake:710 (idf_build_process) |
| 176 | CMakeLists.txt:28 (project) |
| 177 | |
| 178 | |
| 179 | Traceback (most recent call last): |
| 180 | File "/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/partition_table/parttool.py", line 363, in <module> |
| 181 | main() |
| 182 | ~~~~^^ |
| 183 | File "/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/partition_table/parttool.py", line 332, in main |
| 184 | target = ParttoolTarget(**target_args) |
| 185 | File "/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3/components/partition_table/parttool.py", line 88, in __init__ |
| 186 | with open(partition_table_file, 'rb') as f: |
| 187 | ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 188 | FileNotFoundError: [Errno 2] No such file or directory: '/Users/[REDACTED]/Projects/esp32gpt/target/xtensa-esp32-espidf/debug/build/esp-idf-sys-ea3ffe5aeae90cf4/out/partitions.csv' |
| 229 | running: cd "/Users/[REDACTED]/Projects/esp32gpt/target/xtensa-esp32-espidf/debug/build/esp-idf-sys-ea3ffe5aeae90cf4/out/build" && ESP_ROM_ELF_DIR="/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/tools/esp-rom-elfs/20240305/" EXTRA_COMPONENT_DIRS="" IDF_COMPONENT_MANAGER="1" IDF_PATH="/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/esp-idf/v5.3" IDF_TARGET="esp32" IDF_TOOLS_PATH="/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif" LC_ALL="C" PATH="/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/tools/xtensa-esp-elf/esp-13.2.0_20240530/xtensa-esp-elf/bin:/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/tools/esp-clang/16.0.1-fe4f10a809/esp-clang/bin:/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/tools/esp32ulp-elf/2.38_20240113/esp32ulp-elf/bin:/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/tools/cmake/3.24.0/CMake.app/Contents/bin:/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/tools/ninja/1.11.1:/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/tools/esp-rom-elfs/20240305:/Users/[REDACTED]/Projects/esp32gpt/.embuild/espressif/python_env/idf5.3_py3.14_env/bin:/Users/[REDACTED]/.cargo/bin:/Users/[REDACTED]/.rustup/toolchains/esp/bin:/Users/[REDACTED]/.rustup/toolchains/esp/xtensa-esp-elf/esp-15.2.0_20250920/xtensa-esp-elf/bin:/Users/[REDACTED]/.local/bin:/opt/homebrew/opt/libpq/bin:/Users/[REDACTED]/.cargo/bin:/Users/[REDACTED]/.rustup/toolchains/esp/bin:/Users/[REDACTED]/.rustup/toolchains/esp/xtensa-esp-elf/esp-15.2.0_20250920/xtensa-esp-elf/bin:/Users/[REDACTED]/.local/bin:/opt/homebrew/opt/libpq/bin:$PATH:/Applications/Visual Studio Code.app/Contents/Resources/app/bin:/Users/[REDACTED]/.lmstudio/bin:/Applications/Visual Studio 
Code.app/Contents/Resources/app/bin:/Users/[REDACTED]/.lmstudio/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Library/Apple/usr/bin" PROJECT_DIR="/Users/[REDACTED]/Projects/esp32gpt" SDKCONFIG_DEFAULTS="/Users/[REDACTED]/Projects/esp32gpt/target/xtensa-esp32-espidf/debug/build/esp-idf-sys-ea3ffe5aeae90cf4/out/gen-sdkconfig.defaults;/Users/[REDACTED]/Projects/esp32gpt/sdkconfig.defaults" "cmake" "--build" "/Users/[REDACTED]/Projects/esp32gpt/target/xtensa-esp32-espidf/debug/build/esp-idf-sys-ea3ffe5aeae90cf4/out/build" "--config" "RelWithDebInfo" "--parallel" "10" |
| 230 | ninja: error: '/Users/[REDACTED]/Projects/esp32gpt/target/xtensa-esp32-espidf/debug/build/esp-idf-sys-ea3ffe5aeae90cf4/out/partitions.csv', needed by 'partition_table/partition-table.bin', missing and no known rule to make it |
| 231 | |
| 232 | thread 'main' (101036720) panicked at /Users/[REDACTED]/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/cmake-0.1.57/src/lib.rs:1132:5: |
| 233 | |
| 234 | command did not execute successfully, got: exit status: 1 |
| 235 | |
| 236 | build script failed, must exit now |
| 237 | note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace |
| 238 | </output> |
| 1 | /// Manual backward pass for the GPT model. |
| 2 | /// |
| 3 | /// Computes parameter gradients by backpropagating through each position in |
| 4 | /// reverse order. The tricky part is attention: position t's query attends to |
| 5 | /// all keys/values at positions 0..=t, so key/value gradients accumulate
| 6 | /// contributions from multiple future positions. |
| 7 | |
| 8 | use crate::model::*; |
| 9 | use crate::tensor::*; |
| 10 | use crate::tokenizer::VOCAB_SIZE; |
| 11 | |
| 12 | /// Backpropagate through the full sequence, accumulating gradients. |
| 13 | /// |
| 14 | /// Returns the average cross-entropy loss over the sequence. |
| 15 | /// `targets[t]` is the target token for position t (i.e., `tokens[t+1]` in the |
| 16 | /// encoded name). `grads` must be pre-zeroed. |
| 17 | pub fn backward( |
| 18 | params: &[f32], |
| 19 | cache: &ForwardCache, |
| 20 | targets: &[usize], |
| 21 | grads: &mut [f32], |
| 22 | ) -> f32 { |
| 23 | let seq_len = cache.seq_len; |
| 24 | assert_eq!(targets.len(), seq_len); |
| 25 | let scale = 1.0 / (HEAD_DIM as f32).sqrt(); |
| 26 | let inv_seq_len = 1.0 / seq_len as f32; |
| 27 | |
| 28 | // Accumulated key/value gradients: d_k_cache[pos] and d_v_cache[pos] collect |
| 29 | // contributions from all positions that attend to them. |
| 30 | let mut d_k_cache = vec![0.0f32; seq_len * EMBED_DIM]; |
| 31 | let mut d_v_cache = vec![0.0f32; seq_len * EMBED_DIM]; |
| 32 | |
| 33 | let mut total_loss = 0.0f32; |
| 34 | |
| 35 | // Process positions in reverse so that when we reach position t, |
| 36 | // d_k_cache[t] and d_v_cache[t] are complete. |
| 37 | for t in (0..seq_len).rev() { |
| 38 | let target = targets[t]; |
| 39 | |
| 40 | // --- Cross-entropy loss gradient --- |
| 41 | let logits = cache.logits_at(t); |
| 42 | let probs = softmax(logits); |
| 43 | total_loss += -probs[target].max(1e-10).ln(); |
| 44 | |
| 45 | // d_logits = (probs - one_hot(target)) / seq_len |
| 46 | let mut d_logits = probs; |
| 47 | d_logits[target] -= 1.0; |
| 48 | for v in d_logits.iter_mut() { |
| 49 | *v *= inv_seq_len; |
| 50 | } |
| 51 | |
| 52 | // --- Output projection backward: logits = res2 @ Wout --- |
| 53 | let res2 = cache.res2_at(t); |
| 54 | let d_res2 = vec_mat_mul_backward_x(&d_logits, wout(params), EMBED_DIM, VOCAB_SIZE); |
| 55 | vec_mat_mul_backward_w(res2, &d_logits, &mut grads[WOUT_OFFSET..], EMBED_DIM, VOCAB_SIZE); |
| 56 | |
| 57 | // --- FFN residual backward: res2 = res1 + ffn_out --- |
| 58 | // d_res1 and d_ffn_out both receive d_res2 |
| 59 | let mut d_res1 = d_res2.clone(); |
| 60 | let d_ffn_out = d_res2; |
| 61 | |
| 62 | // --- FFN down-projection backward: ffn_out = ffn_relu @ W2 --- |
| 63 | let ffn_relu = cache.ffn_relu_at(t); |
| 64 | let d_ffn_relu = vec_mat_mul_backward_x(&d_ffn_out, w2(params), FFN_DIM, EMBED_DIM); |
| 65 | vec_mat_mul_backward_w(ffn_relu, &d_ffn_out, &mut grads[W2_OFFSET..], FFN_DIM, EMBED_DIM); |
| 66 | |
| 67 | // --- ReLU backward --- |
| 68 | let ffn_hidden = cache.ffn_hidden_at(t); |
| 69 | let d_ffn_hidden: Vec<f32> = d_ffn_relu.iter().zip(ffn_hidden.iter()) |
| 70 | .map(|(&dg, &h)| if h > 0.0 { dg } else { 0.0 }) |
| 71 | .collect(); |
| 72 | |
| 73 | // --- FFN up-projection backward: ffn_hidden = res1 @ W1 --- |
| 74 | let res1 = cache.res1_at(t); |
| 75 | let d_res1_from_ffn = vec_mat_mul_backward_x(&d_ffn_hidden, w1(params), EMBED_DIM, FFN_DIM); |
| 76 | vec_mat_mul_backward_w(res1, &d_ffn_hidden, &mut grads[W1_OFFSET..], EMBED_DIM, FFN_DIM); |
| 77 | vec_add_inplace(&mut d_res1, &d_res1_from_ffn); |
| 78 | |
| 79 | // --- Attention residual backward: res1 = emb + proj --- |
| 80 | let d_proj = d_res1.clone(); |
| 81 | let mut d_emb = d_res1; // emb receives same gradient from residual |
| 82 | |
| 83 | // --- Output projection backward: proj = att_out @ Wo --- |
| 84 | let att_out = cache.att_out_at(t); |
| 85 | let d_att_out = vec_mat_mul_backward_x(&d_proj, wo(params), EMBED_DIM, EMBED_DIM); |
| 86 | vec_mat_mul_backward_w(att_out, &d_proj, &mut grads[WO_OFFSET..], EMBED_DIM, EMBED_DIM); |
| 87 | |
| 88 | // --- Multi-head attention backward --- |
| 89 | let mut d_q = vec![0.0f32; EMBED_DIM]; |
| 90 | |
| 91 | for h in 0..NUM_HEADS { |
| 92 | let ho = h * HEAD_DIM; |
| 93 | let d_out_h = &d_att_out[ho..ho + HEAD_DIM]; |
| 94 | let q_h = &cache.query_at(t)[ho..ho + HEAD_DIM]; |
| 95 | |
| 96 | // out_h = sum_i probs[i] * v_h[i], so d_v_cache_h[i] += probs[i] * d_out_h.
| 97 | // (d_probs[i] = dot(d_out_h, v_h[i]) is computed in the next loop, where
| 98 | // softmax backward needs the full vector.)
| 99 | for i in 0..=t {
| 100 | let prob = cache.att_prob(t, h, i);
| 101 | for d in 0..HEAD_DIM {
| 102 | d_v_cache[i * EMBED_DIM + ho + d] += prob * d_out_h[d];
| 103 | }
| 104 | }
| 105 | 
| 106 | // Softmax backward: d_scores = probs * (d_probs - sum(probs * d_probs)),
| 107 | // with d_probs[i] = dot(d_out_h, v_h[i]).
| 116 | let mut d_probs = vec![0.0f32; t + 1]; |
| 117 | for i in 0..=t { |
| 118 | let v_h = &cache.v_cache[i * EMBED_DIM + ho..i * EMBED_DIM + ho + HEAD_DIM]; |
| 119 | d_probs[i] = vec_dot(d_out_h, v_h); |
| 120 | } |
| 121 | |
| 122 | let mut dot_sum = 0.0f32; |
| 123 | for i in 0..=t { |
| 124 | dot_sum += cache.att_prob(t, h, i) * d_probs[i]; |
| 125 | } |
| 126 | |
| 127 | for i in 0..=t { |
| 128 | let prob = cache.att_prob(t, h, i); |
| 129 | let d_score = prob * (d_probs[i] - dot_sum) * scale; |
| 130 | |
| 131 | // scores[i] = dot(q_h, k_h[i]) * scale |
| 132 | // d_q_h += d_score * k_h[i] (scale already applied above) |
| 133 | let k_h = &cache.k_cache[i * EMBED_DIM + ho..i * EMBED_DIM + ho + HEAD_DIM]; |
| 134 | for d in 0..HEAD_DIM { |
| 135 | d_q[ho + d] += d_score * k_h[d]; |
| 136 | d_k_cache[i * EMBED_DIM + ho + d] += d_score * q_h[d]; |
| 137 | } |
| 138 | } |
| 139 | } |
| 140 | |
| 141 | // --- Q projection backward: q = emb @ Wq --- |
| 142 | let emb = cache.emb_at(t); |
| 143 | let d_emb_from_q = vec_mat_mul_backward_x(&d_q, wq(params), EMBED_DIM, EMBED_DIM); |
| 144 | vec_mat_mul_backward_w(emb, &d_q, &mut grads[WQ_OFFSET..], EMBED_DIM, EMBED_DIM); |
| 145 | vec_add_inplace(&mut d_emb, &d_emb_from_q); |
| 146 | |
| 147 | // --- K, V projection backward (d_k_cache[t] and d_v_cache[t] are now complete) --- |
| 148 | let d_k_t = &d_k_cache[t * EMBED_DIM..(t + 1) * EMBED_DIM]; |
| 149 | let d_emb_from_k = vec_mat_mul_backward_x(d_k_t, wk(params), EMBED_DIM, EMBED_DIM); |
| 150 | vec_mat_mul_backward_w(emb, d_k_t, &mut grads[WK_OFFSET..], EMBED_DIM, EMBED_DIM); |
| 151 | vec_add_inplace(&mut d_emb, &d_emb_from_k); |
| 152 | |
| 153 | let d_v_t = &d_v_cache[t * EMBED_DIM..(t + 1) * EMBED_DIM]; |
| 154 | let d_emb_from_v = vec_mat_mul_backward_x(d_v_t, wv(params), EMBED_DIM, EMBED_DIM); |
| 155 | vec_mat_mul_backward_w(emb, d_v_t, &mut grads[WV_OFFSET..], EMBED_DIM, EMBED_DIM); |
| 156 | vec_add_inplace(&mut d_emb, &d_emb_from_v); |
| 157 | |
| 158 | // --- Embedding backward: emb = wte[token] + wpe[pos] --- |
| 159 | let token = cache.tokens[t]; |
| 160 | let wte_start = WTE_OFFSET + token * EMBED_DIM; |
| 161 | for d in 0..EMBED_DIM { |
| 162 | grads[wte_start + d] += d_emb[d]; |
| 163 | } |
| 164 | let wpe_start = WPE_OFFSET + t * EMBED_DIM; |
| 165 | for d in 0..EMBED_DIM { |
| 166 | grads[wpe_start + d] += d_emb[d]; |
| 167 | } |
| 168 | } |
| 169 | |
| 170 | total_loss * inv_seq_len |
| 171 | } |
| 172 | |
| 173 | #[cfg(test)] |
| 174 | mod tests { |
| 175 | use super::*; |
| 176 | use crate::rng::Rng; |
| 177 | |
| 178 | /// Numerical gradient check: compare analytic gradients from backward() |
| 179 | /// against finite-difference approximations for every parameter. |
| 180 | #[test] |
| 181 | fn test_gradient_check() { |
| 182 | let mut rng = Rng::new(42); |
| 183 | let mut params = vec![0.0f32; NUM_PARAMS]; |
| 184 | init_weights(&mut params, &mut rng); |
| 185 | |
| 186 | // Short sequence for faster checking |
| 187 | let input_tokens = vec![0, 5, 13]; // BOS, e, m |
| 188 | let targets = vec![5, 13, 0]; // e, m, BOS |
| 189 | |
| 190 | // Analytic gradients |
| 191 | let mut cache = ForwardCache::new(); |
| 192 | forward(¶ms, &input_tokens, &mut cache); |
| 193 | let mut grads = vec![0.0f32; NUM_PARAMS]; |
| 194 | let loss = backward(¶ms, &cache, &targets, &mut grads); |
| 195 | assert!(loss.is_finite()); |
| 196 | |
| 197 | // Numerical gradients for a random subset of parameters |
| 198 | let eps = 1e-3; |
| 199 | let mut max_err = 0.0f32; |
| 200 | let mut checked = 0; |
| 201 | let check_indices: Vec<usize> = (0..NUM_PARAMS).step_by(17).collect(); |
| 202 | for &i in &check_indices { |
| 203 | let mut p_plus = params.clone(); |
| 204 | p_plus[i] += eps; |
| 205 | let mut c_plus = ForwardCache::new(); |
| 206 | forward(&p_plus, &input_tokens, &mut c_plus); |
| 207 | let mut g_dummy = vec![0.0f32; NUM_PARAMS]; |
| 208 | let loss_plus = backward(&p_plus, &c_plus, &targets, &mut g_dummy); |
| 209 | |
| 210 | let mut p_minus = params.clone(); |
| 211 | p_minus[i] -= eps; |
| 212 | let mut c_minus = ForwardCache::new(); |
| 213 | forward(&p_minus, &input_tokens, &mut c_minus); |
| 214 | let mut g_dummy2 = vec![0.0f32; NUM_PARAMS]; |
| 215 | let loss_minus = backward(&p_minus, &c_minus, &targets, &mut g_dummy2); |
| 216 | |
| 217 | let numerical = (loss_plus - loss_minus) / (2.0 * eps); |
| 218 | let analytic = grads[i]; |
| 219 | let err = (analytic - numerical).abs() / (analytic.abs() + numerical.abs() + 1e-8); |
| 220 | if err > max_err { |
| 221 | max_err = err; |
| 222 | } |
| 223 | if err > 0.01 { |
| 224 | panic!( |
| 225 | "Gradient check failed at param {}: analytic={:.6}, numerical={:.6}, rel_err={:.6}", |
| 226 | i, analytic, numerical, err |
| 227 | ); |
| 228 | } |
| 229 | checked += 1; |
| 230 | } |
| 231 | assert!(checked > 100, "checked too few params: {}", checked); |
| 232 | eprintln!("Gradient check passed: {} params checked, max relative error = {:.6}", checked, max_err); |
| 233 | } |
| 234 | } |
| 235 | |
| 1 | Perfect! Here's a concise summary of each file for your README: |
| 2 | |
| 3 | ## File-by-File Summary |
| 4 | |
| 5 | ### src/main.rs (100 lines) |
| 6 | Main training and inference loop that demonstrates the full pipeline: |
| 7 | - Loads dataset (names.txt) from flash |
| 8 | - Initializes weights and Adam optimizer |
| 9 | - Trains for 1000 steps with: |
| 10 | - Random batch sampling from dataset |
| 11 | - Forward pass (computes logits) |
| 12 | - Backward pass (computes gradients) |
| 13 | - Adam optimizer step with linear learning rate decay |
| 14 | - Loss logging every 100 steps |
| 15 | - Generates 20 sample names via autoregressive sampling with temperature control |
| 16 | - Handles both ESP32 (xtensa) and native targets with conditional initialization |
| 17 | |
| 18 | ### src/model.rs (339 lines) |
| 19 | 1-layer transformer architecture with key components: |
| 20 | - **Constants:** |
| 21 | - EMBED_DIM=16, NUM_HEADS=4, HEAD_DIM=4, FFN_DIM=64, BLOCK_SIZE=16 |
| 22 | - Total parameters: 4,192 |
| 23 | - **Parameter layout:** Token embedding (WTE), position embedding (WPE), attention weights (WQ, WK, WV, WO), and feed-forward (W1, W2), output (WOUT) |
| 24 | - **ForwardCache struct:** Stores all intermediate activations needed for backprop (embeddings, queries, KV cache, attention probs, FFN hidden states, logits, etc.) |
| 25 | - **forward():** Full transformer forward pass with multi-head causal attention, residual connections, and ReLU FFN |
| 26 | - **generate():** Autoregressive text generation with temperature sampling |
| 27 | - Tests for parameter count, forward pass correctness, and generation |
| 28 | |
| 29 | ### src/backward.rs (summarized from its doc comment and signature)
| 30 | - Manual backpropagation through the full sequence in reverse |
| 31 | - Computes parameter gradients via reverse-mode autodiff |
| 32 | - Returns average cross-entropy loss |
| 33 | - Handles tricky attention gradient accumulation (key/value gradients from multiple future positions) |
| 34 | |
| 35 | ### src/tokenizer.rs (55 lines) |
| 36 | Character-level tokenizer with 27 tokens: |
| 37 | - VOCAB_SIZE=27 (token 0=BOS, tokens 1-26=a-z) |
| 38 | - **encode():** Converts string → [BOS, char tokens..., BOS] |
| 39 | - **decode():** Converts single token back to character ('.' for BOS, 'a'-'z' otherwise) |
| 40 | - Simple, efficient for embedded devices |
| 41 | - Tests for encode/decode roundtrip |
| 42 | |
| 43 | ### src/rng.rs (108 lines) |
| 44 | Lightweight PRNG for embedded use: |
| 45 | - **Xorshift32:** Fast, decent quality for training |
| 46 | - **Box-Muller:** Gaussian sample generation (caches second value) |
| 47 | - **next_f32():** Uniform [0, 1) |
| 48 | - **random_index():** Pick random element in range |
| 49 | - **sample_from_probs():** Categorical sampling for temperature-based generation |
| 50 | - Minimal dependencies—uses only core math (ln, sin, cos) |
| 51 | - Tests verify uniform range, Gaussian mean, and probability sampling |
| 52 | |
| 53 | ### src/tensor.rs (145 lines) |
| 54 | Flat-array tensor math (row-major storage): |
| 55 | - **vec_mat_mul():** Vector-matrix multiplication x @ W |
| 56 | - **vec_mat_mul_backward_x/w():** Gradient computation for backward pass |
| 57 | - **vec_add/vec_add_inplace/vec_dot:** Element-wise operations |
| 58 | - **softmax():** Numerically stable (max-subtraction trick) |
| 59 | - **relu():** Element-wise ReLU |
| 60 | - **vec_scale():** Scalar multiplication |
| 61 | - Tests include numerical gradient checks |
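The max-subtraction trick named above can be sketched in a few lines (a hedged reconstruction, not the actual `src/tensor.rs` code):

```rust
/// Numerically stable softmax: subtracting the max before exponentiating
/// keeps every exp() argument <= 0, so nothing overflows.
pub fn softmax(x: &[f32]) -> Vec<f32> {
    let max = x.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = x.iter().map(|&v| (v - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
```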
| 62 | |
| 63 | ### src/optimizer.rs (57 lines) |
| 64 | Adam optimizer: |
| 65 | - Maintains first moment (m) and second moment (v) estimates |
| 66 | - Configurable beta1 (0.85) and beta2 (0.99) for momentum and adaptive learning rate |
| 67 | - Bias correction in each step |
| 68 | - Epsilon=1e-8 for numerical stability |
| 69 | - Test: converges x from 5.0 to ~0 minimizing f(x)=x² |
| 70 | |
| 71 | ### Cargo.toml |
| 72 | - Package: esp32gpt 0.1.0 (2021 edition) |
| 73 | - Dependencies: |
| 74 | - log 0.4, anyhow 1 |
| 75 | - Conditional: esp-idf-hal/svc/sys for xtensa (ESP32), env_logger for native |
| 76 | - Build dependencies: embuild 0.32 with espidf feature |
| 77 | - Optimization: opt-level=2, debug info enabled, LTO disabled |
| 78 | |
| 79 | ### Makefile |
| 80 | Build, flash, and test targets: |
| 81 | - **build:** Compiles for ESP32 (xtensa) with custom partition table |
| 82 | - **flash:** Flashes to ESP32 and starts monitor (with retry) |
| 83 | - **monitor:** Starts serial monitor |
| 84 | - **erase:** Erases ESP32 flash |
| 85 | - **test:** Runs tests on native target (aarch64-apple-darwin) |
| 86 | - **clean:** Removes build artifacts |
| 87 | - Configures ESP toolchain, xtensa compiler, and libclang path |
| 88 | |
| 89 | ## Total Line Count |
| 90 | **1,044 lines** of Rust source code across all .rs files in src/ |
| 91 | agentId: afadb563b62b8d1a6 (for resuming to continue this agent's work if needed) |
| 92 | <usage>total_tokens: 28860 |
| 93 | tool_uses: 11 |
| 94 | duration_ms: 23295</usage> |
| 1 | Based on my analysis of the esp32gpt project files, here's a comprehensive summary for your CLAUDE.md: |
| 2 | |
| 3 | ## Project Summary |
| 4 | |
| 5 | **esp32gpt** is a Rust project that implements a micro-scale GPT model (a name generator) trained and run directly on an ESP32 microcontroller: a complete machine-learning pipeline in an embedded-systems context.
| 6 | |
| 7 | ## Key Configuration Details |
| 8 | |
| 9 | ### Build System & Toolchain |
| 10 | - **Target**: `xtensa-esp32-espidf` (ESP-IDF framework) |
| 11 | - **ESP-IDF Version**: v5.3 |
| 12 | - **MCU**: ESP32 with 4MB flash |
| 13 | - **Rust Edition**: 2021 with resolver v2 |
| 14 | - **Toolchain Setup**: Custom PATH configuration in Makefile pointing to: |
| 15 | - ESP Rust toolchain at `~/.rustup/toolchains/esp/` |
| 16 | - Xtensa compiler at esp-15.2.0_20250920 |
| 17 | - ESP-Clang at esp-20.1.1_20250829 |
| 18 | |
| 19 | ### Build Targets & Conditional Compilation |
| 20 | The project uses aggressive conditional compilation: |
| 21 | - **Xtensa (ESP32)**: Links `esp-idf-hal`, `esp-idf-svc`, `esp-idf-sys` for embedded functionality |
| 22 | - **Non-Xtensa (development)**: Uses `env_logger` for logging on desktop |
| 23 | - **build.rs** only runs ESP-IDF setup when TARGET contains "xtensa" |
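A sketch of how such target-gated dependencies typically look in Cargo.toml (versions are assumptions, not read from the actual file):

```toml
[target.'cfg(target_arch = "xtensa")'.dependencies]
esp-idf-hal = "0.45"   # versions assumed
esp-idf-svc = "0.51"
esp-idf-sys = "0.36"

[target.'cfg(not(target_arch = "xtensa"))'.dependencies]
env_logger = "0.11"    # version assumed
```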
| 24 | |
| 25 | ### Important Compilation Flags |
| 26 | - Both dev and release profiles use `opt-level = 2` (moderate optimization, keeping builds debuggable)
| 27 | - Release builds have debug symbols enabled (`debug = true`) |
| 28 | - LTO is disabled in release (`lto = "off"`) |
| 29 | |
| 30 | ### ESP-IDF Configuration (sdkconfig.defaults) |
| 31 | - **No WiFi**: Explicitly disabled to free 50-80KB SRAM for model training |
| 32 | - **Memory**: Generous stack allocations for matrix operations: |
| 33 | - Main task stack: 32KB |
| 34 | - Pthread default: 4KB |
| 35 | - **Custom partition table**: Uses custom partitions.csv for app allocation |
| 36 | - **Flash**: 4MB configured |
| 37 | - **Logging**: Default level set to INFO |
| 38 | |
| 39 | ### Partition Table (partitions.csv) |
| 40 | Standard 3-partition layout: |
| 41 | - **nvs** (0x6000 bytes): Non-volatile storage |
| 42 | - **phy_init** (0x1000 bytes): PHY initialization data |
| 43 | - **factory** (0x3F0000 bytes): Main application (4MB - 64KB reserved) |
| 44 | |
| 45 | ### Main Application (src/main.rs) |
| 46 | The app demonstrates: |
| 47 | - **Dataset**: Embedded text file (`data/names.txt`) loaded at compile-time |
| 48 | - **Training**: 1000 steps with Adam optimizer (lr: 0.01, momentum: 0.85/0.99) |
| 49 | - **Model**: Multi-head transformer with configurable embedding dimension |
| 50 | - **Inference**: Temperature-based name generation (T=0.8) |
| 51 | - **Conditional Logging**: `#[cfg]` blocks for ESP32 vs. desktop initialization |
| 52 | - **Safety**: Halts in an infinite sleep loop on ESP32 instead of returning (prevents a watchdog reset)
| 53 | |
| 54 | ### Makefile Targets |
| 55 | - `build`: Compiles with custom partition config |
| 56 | - `flash`: Flashes binary with 115200 baud, includes retry logic |
| 57 | - `monitor`: Opens serial monitor via espflash |
| 58 | - `erase`: Full chip erase via esptool |
| 59 | - `test`: Runs unit tests on aarch64-apple-darwin (desktop) |
| 60 | - `clean`: Removes build artifacts |
| 61 | |
| 62 | ### Dependencies |
| 63 | **Runtime**: |
| 64 | - `log` (0.4): Logging facade |
| 65 | - `anyhow` (1): Error handling |
| 66 | - Platform-specific: esp-idf-{hal,svc,sys} for embedded or env_logger for desktop |
| 67 | |
| 68 | **Build**: |
| 69 | - `embuild` (0.32): ESP-IDF build system integration |
| 70 | |
| 71 | ## Important Notes |
| 72 | |
| 73 | 1. **Dual-Mode Architecture**: Code is designed to compile for both ESP32 hardware and desktop (for testing) |
| 74 | 2. **Memory Constraints**: No WiFi enabled specifically to preserve SRAM for ML model |
| 75 | 3. **Model Training on Device**: Training happens on-device, not just inference |
| 76 | 4. **.gitignore is MISSING**: The project does not have a .gitignore file — one should be created to exclude: |
| 77 | - `target/` directory (build artifacts) |
| 78 | - `sdkconfig` files (generated) |
| 79 |    - `Cargo.lock` (debatable; lockfiles are usually committed for binary crates)
| 80 | - IDE files (.vscode, .idea, etc.) |
| 81 | 5. **Custom Build Script**: The Makefile generates `target/sdkconfig.partitions` at build time to pass partition config |
| 82 | 6. **Compiler Versions**: Uses specific, pinned versions of Xtensa toolchain (esp-15.2.0, esp-clang 20.1.1) |
| 83 | agentId: af59108f243e0b857 (for resuming to continue this agent's work if needed) |
| 84 | <usage>total_tokens: 16396 |
| 85 | tool_uses: 8 |
| 86 | duration_ms: 16838</usage> |
| 1 | # ESP32 Rust Port of microgpt |
| 2 | |
| 3 | ## Context |
| 4 | |
| 5 | Karpathy's [microgpt](https://karpathy.github.io/2026/02/12/microgpt/) is a ~200-line pure Python GPT that trains and runs inference with zero dependencies. It includes a scalar autograd engine, a 1-layer transformer (~4,192 params), and trains on a names dataset. The goal is to port this to Rust and run it on an ESP32 — training and inference on-device, output via serial monitor. |
| 6 | |
| 7 | The existing [stripe-dashboard](/Users/[REDACTED]/Projects/stripe-dashboard) project provides a proven Rust+ESP32 build setup to replicate. |
| 8 | |
| 9 | ## Key Design Decision: No Scalar Autograd |
| 10 | |
| 11 | The Python version creates a computation graph node per scalar operation (~30K-50K nodes per forward pass, 1-2MB). This exceeds ESP32's 520KB SRAM. Instead, we implement **explicit matrix-level forward and backward passes**, storing only the activations needed for backprop (~27KB). This is the standard approach in production ML frameworks and keeps memory bounded. |
| 12 | |
| 13 | ## Model Specs (matching the Python original) |
| 14 | |
| 15 | | Parameter | Value | |
| 16 | |---|---| |
| 17 | | Embedding dim | 16 | |
| 18 | | Attention heads | 4 | |
| 19 | | Layers | 1 | |
| 20 | | Block size | 16 | |
| 21 | | Vocab size | 27 (a-z + BOS) | |
| 22 | | Total params | ~4,192 | |
| 23 | | Training steps | 1,000 | |
| 24 | | Optimizer | Adam (lr=0.01, β1=0.85, β2=0.99) | |
| 25 | |
| 26 | ## Memory Budget (~520KB SRAM, ~300KB usable without WiFi) |
| 27 | |
| 28 | - Model parameters: 4,192 × 4 bytes = **~17KB** |
| 29 | - Gradients: **~17KB** |
| 30 | - Adam state (m + v): **~34KB** |
| 31 | - Activations cache for backprop: **~27KB** |
| 32 | - Dataset line offsets: **~5KB** (indices only; raw text stays in flash) |
| 33 | - Stack + overhead: **~50KB** |
| 34 | - **Total: ~150KB** — fits comfortably |
| 35 | |
| 36 | ## Project Structure |
| 37 | |
| 38 | ``` |
| 39 | esp32gpt/ |
| 40 | ├── .cargo/ |
| 41 | │ └── config.toml # ESP32 target, linker, build-std |
| 42 | ├── src/ |
| 43 | │ ├── main.rs # ESP entry point, training loop, inference loop |
| 44 | │ ├── tensor.rs # Simple Matrix struct (Vec<f32> + shape), basic ops |
| 45 | │ ├── model.rs # GPT forward pass, parameter storage, weight init |
| 46 | │ ├── backward.rs # Manual backward pass for all ops (the hard part) |
| 47 | │ ├── optimizer.rs # Adam optimizer over flat parameter buffer |
| 48 | │ ├── tokenizer.rs # Char-level encode/decode (a-z + BOS) |
| 49 | │ └── rng.rs # Xorshift32 PRNG + Box-Muller for Gaussian init |
| 50 | ├── data/ |
| 51 | │ └── names.txt # Training dataset (embedded via include_str!) |
| 52 | ├── Cargo.toml |
| 53 | ├── build.rs # embuild ESP-IDF setup |
| 54 | ├── sdkconfig.defaults # No WiFi, generous stack |
| 55 | ├── partitions.csv # Single app partition (no OTA needed) |
| 56 | └── Makefile # Build/flash/monitor commands |
| 57 | ``` |
| 58 | |
| 59 | ## Implementation Steps |
| 60 | |
| 61 | ### Step 1: Project scaffolding |
| 62 | Create the ESP32 project skeleton replicating the build setup from stripe-dashboard: |
| 63 | - `Cargo.toml` with esp-idf-hal/svc/sys, embuild, log, anyhow |
| 64 | - `.cargo/config.toml` targeting xtensa-esp32-espidf |
| 65 | - `build.rs` calling `embuild::espidf::sysenv::output()` |
| 66 | - `sdkconfig.defaults` disabling WiFi, setting stack size to 16384 |
| 67 | - `partitions.csv` with a single app partition |
| 68 | - `Makefile` with build/flash/monitor targets |
| 69 | - Minimal `main.rs` that boots and logs "hello" to serial |
| 70 | |
| 71 | ### Step 2: Core math — `tensor.rs` |
| 72 | Simple `Matrix` struct: |
| 73 | - `data: Vec<f32>`, `rows: usize`, `cols: usize` |
| 74 | - Operations: matmul, add, element-wise multiply, transpose, softmax, ReLU, scalar multiply
| 75 | - All operations return new matrices (no in-place mutation needed at this scale) |
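A minimal sketch of that struct with one representative operation (an assumed shape for the final code, not a definitive implementation):

```rust
/// Row-major matrix over a flat buffer, as described above.
pub struct Matrix {
    pub data: Vec<f32>,
    pub rows: usize,
    pub cols: usize,
}

impl Matrix {
    pub fn zeros(rows: usize, cols: usize) -> Self {
        Matrix { data: vec![0.0; rows * cols], rows, cols }
    }

    /// out[i][j] = sum_k self[i][k] * other[k][j]; returns a new matrix.
    pub fn matmul(&self, other: &Matrix) -> Matrix {
        assert_eq!(self.cols, other.rows);
        let mut out = Matrix::zeros(self.rows, other.cols);
        for i in 0..self.rows {
            for k in 0..self.cols {
                let a = self.data[i * self.cols + k];
                for j in 0..other.cols {
                    out.data[i * other.cols + j] += a * other.data[k * other.cols + j];
                }
            }
        }
        out
    }
}
```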
| 76 | |
| 77 | ### Step 3: Tokenizer — `tokenizer.rs` |
| 78 | - `encode(name: &str) -> Vec<usize>` — BOS (0) + char indices (a=1..z=26) |
| 79 | - `decode(token: usize) -> char` |
| 80 | - `VOCAB_SIZE = 27`, `BOS = 0` |
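The whole tokenizer fits in a few lines; a sketch assuming input names are already lowercase a-z:

```rust
pub const VOCAB_SIZE: usize = 27;
pub const BOS: usize = 0;

/// "emma" -> [0, 5, 13, 13, 1, 0]: BOS, then a=1..z=26, then BOS again.
pub fn encode(name: &str) -> Vec<usize> {
    let mut tokens = vec![BOS];
    tokens.extend(name.bytes().map(|b| (b - b'a' + 1) as usize));
    tokens.push(BOS);
    tokens
}

/// BOS prints as '.' so generated names are easy to delimit on serial.
pub fn decode(token: usize) -> char {
    if token == BOS { '.' } else { (b'a' + (token - 1) as u8) as char }
}
```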
| 81 | |
| 82 | ### Step 4: RNG — `rng.rs` |
| 83 | - Xorshift32 PRNG (seeded from ESP32 hardware RNG or fixed seed) |
| 84 | - `next_f32()` → uniform [0, 1) |
| 85 | - `next_gaussian()` → Box-Muller transform for weight initialization |
| 86 | - `sample_from_probs(probs: &[f32]) -> usize` → categorical sampling |
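The core generator is tiny; a sketch of the xorshift32 step and the uniform helper (Box-Muller and categorical sampling build on `next_f32`):

```rust
pub struct Rng {
    state: u32,
}

impl Rng {
    pub fn new(seed: u32) -> Self {
        // Xorshift must never hold state 0, or it stays at 0 forever.
        Rng { state: if seed == 0 { 1 } else { seed } }
    }

    /// Marsaglia's xorshift32: three shift-xor steps, period 2^32 - 1.
    pub fn next_u32(&mut self) -> u32 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.state = x;
        x
    }

    /// Uniform in [0, 1): the top 24 bits fill an f32 mantissa exactly.
    pub fn next_f32(&mut self) -> f32 {
        (self.next_u32() >> 8) as f32 / (1u32 << 24) as f32
    }
}
```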
| 87 | |
| 88 | ### Step 5: Model forward pass — `model.rs` |
| 89 | Flat parameter buffer with named offset ranges: |
| 90 | - `wte` — token embedding (27 × 16) |
| 91 | - `wpe` — position embedding (16 × 16) |
| 92 | - `wq`, `wk`, `wv`, `wo` — attention projections (16 × 16 each) |
| 93 | - `w1` — FFN up-projection (16 × 64) |
| 94 | - `w2` — FFN down-projection (64 × 16) |
| 95 | - `wout` — output projection (16 × 27) |
| 96 | |
| 97 | Forward pass processes tokens sequentially (like the Python KV cache approach): |
| 98 | 1. Look up token + position embeddings, sum them |
| 99 | 2. Compute Q for current position, K and V appended to cache |
| 100 | 3. Attention: Q @ K^T / sqrt(d), causal mask, softmax, @ V |
| 101 | 4. Residual connection |
| 102 | 5. FFN: ReLU(x @ W1) @ W2 + residual |
| 103 | 6. Output logits: hidden @ Wout |
| 104 | |
| 105 | Store all intermediate activations in a cache struct for backward pass. |
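Steps 2-3 for a single head can be sketched as one function (a hypothetical helper using per-head cache slices, not the flat layout the real code may choose; the causal mask is implicit because the caches only hold positions 0..=t):

```rust
/// Attention for one head at the current position: scaled dot-product
/// scores against all cached keys, softmax, then a weighted sum of values.
fn attend(q: &[f32], k_cache: &[Vec<f32>], v_cache: &[Vec<f32>]) -> Vec<f32> {
    let d = q.len() as f32;
    // scores[i] = dot(q, k[i]) / sqrt(head_dim)
    let scores: Vec<f32> = k_cache
        .iter()
        .map(|k| q.iter().zip(k.iter()).map(|(a, b)| a * b).sum::<f32>() / d.sqrt())
        .collect();
    // Numerically stable softmax over the scores.
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|&s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    // out = sum_i probs[i] * v[i]
    let mut out = vec![0.0f32; q.len()];
    for (w, v) in exps.iter().zip(v_cache.iter()) {
        for (o, &vi) in out.iter_mut().zip(v.iter()) {
            *o += (w / sum) * vi;
        }
    }
    out
}
```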
| 106 | |
| 107 | ### Step 6: Backward pass — `backward.rs` |
| 108 | Manual gradient computation mirroring each forward step in reverse: |
| 109 | - Cross-entropy loss gradient → output projection grad |
| 110 | - FFN backward (with ReLU mask) |
| 111 | - Attention backward (Q/K/V grads with accumulated KV cache grads) |
| 112 | - Embedding gradients (scatter-add into wte/wpe grad rows) |
| 113 | |
| 114 | This is the most complex and error-prone file. We'll validate correctness with numerical gradient checking in tests. |
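For reference, the first item in the list above is the standard softmax + cross-entropy identity (per position, before averaging over the sequence):

```latex
p_i = \frac{e^{z_i}}{\sum_j e^{z_j}}, \qquad
L = -\log p_y
\quad\Longrightarrow\quad
\frac{\partial L}{\partial z_i} = p_i - \mathbf{1}[i = y]
```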
| 115 | |
| 116 | ### Step 7: Adam optimizer — `optimizer.rs` |
| 117 | Simple loop over the flat parameter + gradient buffers: |
| 118 | - First moment: `m = β1 * m + (1 - β1) * grad` |
| 119 | - Second moment: `v = β2 * v + (1 - β2) * grad²` |
| 120 | - Bias correction + parameter update |
| 121 | - Learning rate with linear decay over 1000 steps |
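The bullets above collapse into one loop. A sketch, assuming a 0-based `step` and a caller that passes in the already-decayed learning rate:

```rust
/// One Adam update over flat parameter/gradient/moment buffers.
pub fn adam_step(
    params: &mut [f32],
    grads: &[f32],
    m: &mut [f32],
    v: &mut [f32],
    lr: f32,
    step: usize,
) {
    let (beta1, beta2, eps) = (0.85f32, 0.99f32, 1e-8f32);
    let t = (step + 1) as i32;
    // Bias correction: moments start at zero, so early estimates are scaled up.
    let bc1 = 1.0 - beta1.powi(t);
    let bc2 = 1.0 - beta2.powi(t);
    for i in 0..params.len() {
        m[i] = beta1 * m[i] + (1.0 - beta1) * grads[i];
        v[i] = beta2 * v[i] + (1.0 - beta2) * grads[i] * grads[i];
        params[i] -= lr * (m[i] / bc1) / ((v[i] / bc2).sqrt() + eps);
    }
}
```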
| 122 | |
| 123 | ### Step 8: Training loop + inference — `main.rs` |
| 124 | - Load dataset from `include_str!("../data/names.txt")` |
| 125 | - Parse into list of encoded names |
| 126 | - Training loop: sample random name, forward, loss, backward, optimizer step |
| 127 | - Every 100 steps: log loss to serial |
| 128 | - After training: generate 20 names via autoregressive sampling with temperature=0.8 |
| 129 | - Log generated names to serial |
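Temperature sampling from the last list item can be sketched self-contained (here `u` stands in for a uniform draw from the Step 4 PRNG):

```rust
/// Divide logits by the temperature, softmax, then sample by inverse CDF:
/// walk the cumulative probability until it passes the uniform draw u.
fn sample_with_temperature(logits: &[f32], temperature: f32, u: f32) -> usize {
    let scaled: Vec<f32> = logits.iter().map(|&z| z / temperature).collect();
    // Numerically stable softmax.
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|&z| (z - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    // Inverse-CDF categorical sampling.
    let mut acc = 0.0;
    for (i, e) in exps.iter().enumerate() {
        acc += e / sum;
        if u < acc {
            return i;
        }
    }
    exps.len() - 1 // guard against rounding leaving acc slightly below 1.0
}
```

Lower temperatures sharpen the distribution toward the argmax; T=0.8 keeps some variety while avoiding random garbage.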
| 130 | |
| 131 | ### Step 9: Testing |
| 132 | - All core modules (tensor, model, backward, optimizer, tokenizer, rng) are ESP-independent |
| 133 | - `cargo test` on host with numerical gradient verification for backward pass |
| 134 | - Flash to ESP32 and verify training + generation via serial monitor |
| 135 | |
| 136 | ## Files to Reference |
| 137 | |
| 138 | - [stripe-dashboard/.cargo/config.toml](/Users/[REDACTED]/Projects/stripe-dashboard/.cargo/config.toml) — ESP32 target config |
| 139 | - [stripe-dashboard/Cargo.toml](/Users/[REDACTED]/Projects/stripe-dashboard/Cargo.toml) — dependency versions |
| 140 | - [stripe-dashboard/build.rs](/Users/[REDACTED]/Projects/stripe-dashboard/build.rs) — embuild setup |
| 141 | - [stripe-dashboard/Makefile](/Users/[REDACTED]/Projects/stripe-dashboard/Makefile) — toolchain PATH setup |
| 142 | - [stripe-dashboard/sdkconfig.defaults](/Users/[REDACTED]/Projects/stripe-dashboard/sdkconfig.defaults) — ESP-IDF config reference |
| 143 | |
| 144 | ## Verification |
| 145 | |
| 146 | 1. `cargo test` — run unit tests on host (tensor ops, gradient checking, tokenizer) |
| 147 | 2. `make build` — confirm it compiles for ESP32 |
| 148 | 3. `make flash` — flash to device, observe serial output: |
| 149 | - Training loss decreasing over 1000 steps |
| 150 | - 20 generated names that look plausible (e.g. "emma", "alex", not random garbage) |
| 1 | BINARY := target/xtensa-esp32-espidf/debug/stripe-dashboard |
| 2 | RELEASE_BINARY := target/xtensa-esp32-espidf/release/stripe-dashboard |
| 3 | |
| 4 | export PATH := $(HOME)/.cargo/bin:$(HOME)/.rustup/toolchains/esp/bin:$(HOME)/.rustup/toolchains/esp/xtensa-esp-elf/esp-15.2.0_20250920/xtensa-esp-elf/bin:$(PATH) |
| 5 | export LIBCLANG_PATH := $(HOME)/.rustup/toolchains/esp/xtensa-esp32-elf-clang/esp-20.1.1_20250829/esp-clang/lib |
| 6 | export RUSTUP_TOOLCHAIN := esp |
| 7 | |
| 8 | VERSION := $(shell grep '^version' Cargo.toml | head -1 | sed 's/.*"\(.*\)"/\1/') |
| 9 | CODENAME := $(shell grep '^codename' release.toml 2>/dev/null | sed 's/.*"\(.*\)"/\1/') |
| 10 | CODENAME_LOWER := $(shell echo "$(CODENAME)" | tr 'A-Z' 'a-z') |
| 11 | |
| 12 | # Optional features (e.g. AUTO_UPDATE=1 make build) |
| 13 | CARGO_FEATURES := |
| 14 | ifdef AUTO_UPDATE |
| 15 | CARGO_FEATURES += auto-update |
| 16 | endif |
| 17 | ifneq ($(CARGO_FEATURES),) |
| 18 | FEATURES_FLAG := --features "$(strip $(CARGO_FEATURES))" |
| 19 | endif |
| 20 | |
| 21 | .PHONY: build flash monitor erase clean test sim web web-serve build-4mb flash-4mb release release-4mb release-full release-full-4mb guide |
| 22 | |
| 23 | build: |
| 24 | @mkdir -p target |
| 25 | @echo 'CONFIG_PARTITION_TABLE_CUSTOM_FILENAME="$(CURDIR)/partitions.csv"' > target/sdkconfig.partitions |
| 26 | FLASH_VARIANT=8mb ESP_IDF_SDKCONFIG_DEFAULTS="$(CURDIR)/sdkconfig.defaults;$(CURDIR)/target/sdkconfig.partitions" cargo build $(FEATURES_FLAG) |
| 27 | |
| 28 | flash: build |
| 29 | @until espflash flash --baud 115200 --partition-table partitions.csv $(BINARY); do \ |
| 30 | echo "Flash failed, retrying..."; \ |
| 31 | sleep 1; \ |
| 32 | done |
| 33 | espflash monitor |
| 34 | |
| 35 | build-4mb: |
| 36 | @mkdir -p target |
| 37 | @echo 'CONFIG_PARTITION_TABLE_CUSTOM_FILENAME="$(CURDIR)/partitions-4mb.csv"' > target/sdkconfig.partitions |
| 38 | FLASH_VARIANT=4mb ESP_IDF_SDKCONFIG_DEFAULTS="$(CURDIR)/sdkconfig.defaults;$(CURDIR)/sdkconfig-4mb.defaults;$(CURDIR)/target/sdkconfig.partitions" cargo build $(FEATURES_FLAG) |
| 39 | |
| 40 | flash-4mb: build-4mb |
| 41 | @until espflash flash --baud 115200 --partition-table partitions-4mb.csv --flash-size 4mb $(BINARY); do \ |
| 42 | echo "Flash failed, retrying..."; \ |
| 43 | sleep 1; \ |
| 44 | done |
| 45 | espflash monitor |
| 46 | |
| 47 | monitor: |
| 48 | espflash monitor |
| 49 | |
| 50 | erase: |
| 51 | uvx esptool --chip esp32 erase-flash |
| 52 | |
| 53 | test: |
| 54 | @echo "==> Testing sprite-gen" |
| 55 | cd sprite-gen && RUSTUP_TOOLCHAIN=stable cargo test |
| 56 | @echo "==> Testing stripe-core" |
| 57 | cd stripe-core && RUSTUP_TOOLCHAIN=stable cargo test |
| 58 | @echo "==> Testing display-core" |
| 59 | cd display-core && RUSTUP_TOOLCHAIN=stable cargo test |
| 60 | |
| 61 | sim: |
| 62 | cd simulator && RUSTUP_TOOLCHAIN=stable cargo run |
| 63 | |
| 64 | web: |
| 65 | cd web-sim && RUSTUP_TOOLCHAIN=stable wasm-pack build --target web --release --out-dir www/pkg |
| 66 | |
| 67 | web-serve: web |
| 68 | cd web-sim/www && python3 -m http.server 8080 |
| 69 | |
| 70 | release: |
| 71 | @if [ -z "$(CODENAME)" ]; then echo "Error: No codename in release.toml"; exit 1; fi |
| 72 | @mkdir -p target |
| 73 | @echo 'CONFIG_PARTITION_TABLE_CUSTOM_FILENAME="$(CURDIR)/partitions.csv"' > target/sdkconfig.partitions |
| 74 | @echo 'CONFIG_APP_PROJECT_VER_FROM_CONFIG=y' >> target/sdkconfig.partitions |
| 75 | @echo 'CONFIG_APP_PROJECT_VER="$(VERSION) $(CODENAME) 8mb"' >> target/sdkconfig.partitions |
| 76 | RELEASE_NAME="$(CODENAME)" \ |
| 77 | FLASH_VARIANT=8mb \ |
| 78 | ESP_IDF_SDKCONFIG_DEFAULTS="$(CURDIR)/sdkconfig.defaults;$(CURDIR)/target/sdkconfig.partitions" \ |
| 79 | cargo build --release $(FEATURES_FLAG) |
| 80 | espflash save-image --chip esp32 $(RELEASE_BINARY) \ |
| 81 | target/stripe-dashboard-$(VERSION)-$(CODENAME_LOWER)-8mb.bin |
| 82 | @echo "" |
| 83 | @echo "Release build complete:" |
| 84 | @ls -lh target/stripe-dashboard-$(VERSION)-$(CODENAME_LOWER)-8mb.bin |
| 85 | @echo "Version: $(VERSION) $(CODENAME)" |
| 86 | |
| 87 | release-full: release |
| 88 | espflash save-image --chip esp32 --merge \ |
| 89 | --partition-table partitions.csv \ |
| 90 | $(RELEASE_BINARY) \ |
| 91 | target/stripe-dashboard-$(VERSION)-$(CODENAME_LOWER)-8mb-full.bin |
| 92 | @echo "" |
| 93 | @echo "Full flash image (bootloader + partition table + app):" |
| 94 | @ls -lh target/stripe-dashboard-$(VERSION)-$(CODENAME_LOWER)-8mb-full.bin |
| 95 | @echo "Flash with: espflash write-bin 0x0 target/stripe-dashboard-$(VERSION)-$(CODENAME_LOWER)-8mb-full.bin" |
| 96 | |
| 97 | release-4mb: |
| 98 | @if [ -z "$(CODENAME)" ]; then echo "Error: No codename in release.toml"; exit 1; fi |
| 99 | @mkdir -p target |
| 100 | @echo 'CONFIG_PARTITION_TABLE_CUSTOM_FILENAME="$(CURDIR)/partitions-4mb.csv"' > target/sdkconfig.partitions |
| 101 | @echo 'CONFIG_APP_PROJECT_VER_FROM_CONFIG=y' >> target/sdkconfig.partitions |
| 102 | @echo 'CONFIG_APP_PROJECT_VER="$(VERSION) $(CODENAME) 4mb"' >> target/sdkconfig.partitions |
| 103 | RELEASE_NAME="$(CODENAME)" \ |
| 104 | FLASH_VARIANT=4mb \ |
| 105 | ESP_IDF_SDKCONFIG_DEFAULTS="$(CURDIR)/sdkconfig.defaults;$(CURDIR)/sdkconfig-4mb.defaults;$(CURDIR)/target/sdkconfig.partitions" \ |
| 106 | cargo build --release $(FEATURES_FLAG) |
| 107 | espflash save-image --chip esp32 --flash-size 4mb $(RELEASE_BINARY) \ |
| 108 | target/stripe-dashboard-$(VERSION)-$(CODENAME_LOWER)-4mb.bin |
| 109 | @echo "" |
| 110 | @echo "Release build complete:" |
| 111 | @ls -lh target/stripe-dashboard-$(VERSION)-$(CODENAME_LOWER)-4mb.bin |
| 112 | @echo "Version: $(VERSION) $(CODENAME)" |
| 113 | |
| 114 | release-full-4mb: release-4mb |
| 115 | espflash save-image --chip esp32 --merge --flash-size 4mb \ |
| 116 | --partition-table partitions-4mb.csv \ |
| 117 | $(RELEASE_BINARY) \ |
| 118 | target/stripe-dashboard-$(VERSION)-$(CODENAME_LOWER)-4mb-full.bin |
| 119 | @echo "" |
| 120 | @echo "Full flash image (bootloader + partition table + app):" |
| 121 | @ls -lh target/stripe-dashboard-$(VERSION)-$(CODENAME_LOWER)-4mb-full.bin |
| 122 | @echo "Flash with: espflash write-bin 0x0 target/stripe-dashboard-$(VERSION)-$(CODENAME_LOWER)-4mb-full.bin" |
| 123 | |
| 124 | guide: |
| 125 | cd guide-capture && RUSTUP_TOOLCHAIN=stable cargo run |
| 126 | cd guide-capture && uv run embed_images.py |
| 127 | |
| 128 | clean: |
| 129 | rm -rf ./target |
| 130 | cargo clean |
| 131 | |
| 1 | use flate2::write::GzEncoder; |
| 2 | use flate2::Compression; |
| 3 | use std::io::Write; |
| 4 | use std::path::PathBuf; |
| 5 | use tera::{Context, Tera}; |
| 6 | |
| 7 | fn render_template(name: &str, source: &str, ctx: &Context) -> String { |
| 8 | let mut tera = Tera::default(); |
| 9 | tera.add_raw_template(name, source) |
| 10 | .unwrap_or_else(|e| panic!("Failed to parse template {}: {}", name, e)); |
| 11 | tera.render(name, ctx) |
| 12 | .unwrap_or_else(|e| panic!("Failed to render template {}: {}", name, e)) |
| 13 | } |
| 14 | |
| 15 | fn main() { |
| 16 | embuild::espidf::sysenv::output(); |
| 17 | |
| 18 | // Re-run if the partition table changes so the build picks up updates. |
| 19 | println!("cargo:rerun-if-changed=partitions.csv"); |
| 20 | println!("cargo:rerun-if-changed=partitions-4mb.csv"); |
| 21 | println!("cargo:rerun-if-changed=release.toml"); |
| 22 | println!("cargo:rerun-if-changed=web/setup.html"); |
| 23 | println!("cargo:rerun-if-changed=web/index.html"); |
| 24 | |
| 25 | // Pass RELEASE_NAME env var through to the crate (set by `make release`). |
| 26 | if let Ok(name) = std::env::var("RELEASE_NAME") { |
| 27 | println!("cargo:rustc-env=RELEASE_NAME={}", name); |
| 28 | } |
| 29 | |
| 30 | // Pass FLASH_VARIANT (8mb/4mb) so the firmware knows which OTA asset to download. |
| 31 | println!( |
| 32 | "cargo:rustc-env=FLASH_VARIANT={}", |
| 33 | std::env::var("FLASH_VARIANT").unwrap_or_else(|_| "8mb".into()) |
| 34 | ); |
| 35 | println!("cargo:rerun-if-env-changed=FLASH_VARIANT"); |
| 36 | |
| 37 | // Template context for web portal HTML. |
| 38 | let mut ctx = Context::new(); |
| 39 | |
| 40 | // Stripe App install link: set STRIPE_APP_URL to include "Quick Setup" |
| 41 | // sections in the web portal. When unset, those sections are hidden and |
| 42 | // only the manual key setup flow is shown. |
| 43 | let stripe_app_url = std::env::var("STRIPE_APP_URL").ok(); |
| 44 | println!("cargo:rerun-if-env-changed=STRIPE_APP_URL"); |
| 45 | ctx.insert("stripe_app_url", &stripe_app_url); |
| 46 | |
| 47 | // Auto-update UI sections are only included when the feature is enabled. |
| 48 | let has_auto_update = std::env::var("CARGO_FEATURE_AUTO_UPDATE").is_ok(); |
| 49 | ctx.insert("auto_update", &has_auto_update); |
| 50 | |
| 51 | // Gzip-compress web portal HTML at build time. |
| 52 | // Served with Content-Encoding: gzip — browsers decompress transparently. |
| 53 | let out_dir = PathBuf::from(std::env::var("OUT_DIR").unwrap()); |
| 54 | for name in &["setup", "index"] { |
| 55 | let raw = std::fs::read_to_string(format!("web/{}.html", name)) |
| 56 | .unwrap_or_else(|e| panic!("Failed to read web/{}.html: {}", name, e)); |
| 57 | |
| 58 | let html = render_template(name, &raw, &ctx); |
| 59 | |
| 60 | let gz_path = out_dir.join(format!("{}.html.gz", name)); |
| 61 | let file = std::fs::File::create(&gz_path).unwrap(); |
| 62 | let mut encoder = GzEncoder::new(file, Compression::best()); |
| 63 | encoder.write_all(html.as_bytes()).unwrap(); |
| 64 | encoder.finish().unwrap(); |
| 65 | } |
| 66 | } |
| 67 | |
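As a consumer-side companion to the build script above: values emitted with `cargo:rustc-env` become compile-time environment variables in the firmware crate, readable with `env!`/`option_env!`. The firmware's actual consuming code is not shown in this report, so the following is a hypothetical sketch (the `flash_variant` helper is illustrative, not a real project function); it uses `option_env!` so it also compiles outside the project where these variables are unset.

```rust
// Hypothetical consumer-side sketch: how a crate reads values that a build
// script exports via `cargo:rustc-env`. Not taken from the project source.

/// Release name set by `make release`; `None` in a plain `cargo build`,
/// because build.rs only emits RELEASE_NAME when the env var is present.
const RELEASE_NAME: Option<&str> = option_env!("RELEASE_NAME");

/// Flash variant. build.rs always emits FLASH_VARIANT (defaulting to "8mb"),
/// but defaulting again here keeps this sketch compilable standalone.
fn flash_variant() -> &'static str {
    option_env!("FLASH_VARIANT").unwrap_or("8mb")
}

fn main() {
    println!("release: {:?}, variant: {}", RELEASE_NAME, flash_variant());
}
```

`env!` would be the stricter choice inside the project (compile error if the variable is missing), while `option_env!` degrades gracefully.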
| 1 | //! Lightweight tensor math operating on flat f32 slices.
| 2 | //!
| 3 | //! All "matrices" are stored row-major: M[i][j] = slice[i * cols + j].
| 4 | //! Vectors are just &[f32] slices of the appropriate length.
| 5 | |
| 6 | /// y = x @ W where x: [n], W: [n × m] row-major, y: [m]. |
| 7 | pub fn vec_mat_mul(x: &[f32], w: &[f32], n: usize, m: usize) -> Vec<f32> { |
| 8 | debug_assert_eq!(x.len(), n); |
| 9 | debug_assert_eq!(w.len(), n * m); |
| 10 | let mut y = vec![0.0; m]; |
| 11 | for i in 0..n { |
| 12 | let xi = x[i]; |
| 13 | let row = &w[i * m..(i + 1) * m]; |
| 14 | for j in 0..m { |
| 15 | y[j] += xi * row[j]; |
| 16 | } |
| 17 | } |
| 18 | y |
| 19 | } |
| 20 | |
| 21 | /// Backward of y = x @ W w.r.t. x: d_x[i] = sum_j d_y[j] * W[i][j]. |
| 22 | pub fn vec_mat_mul_backward_x(d_y: &[f32], w: &[f32], n: usize, m: usize) -> Vec<f32> { |
| 23 | debug_assert_eq!(d_y.len(), m); |
| 24 | debug_assert_eq!(w.len(), n * m); |
| 25 | let mut d_x = vec![0.0; n]; |
| 26 | for i in 0..n { |
| 27 | let row = &w[i * m..(i + 1) * m]; |
| 28 | let mut sum = 0.0; |
| 29 | for j in 0..m { |
| 30 | sum += d_y[j] * row[j]; |
| 31 | } |
| 32 | d_x[i] = sum; |
| 33 | } |
| 34 | d_x |
| 35 | } |
| 36 | |
| 37 | /// Backward of y = x @ W w.r.t. W: d_W[i][j] += x[i] * d_y[j] (accumulates). |
| 38 | pub fn vec_mat_mul_backward_w(x: &[f32], d_y: &[f32], d_w: &mut [f32], n: usize, m: usize) { |
| 39 | debug_assert_eq!(x.len(), n); |
| 40 | debug_assert_eq!(d_y.len(), m); |
| 41 | debug_assert_eq!(d_w.len(), n * m); |
| 42 | for i in 0..n { |
| 43 | let xi = x[i]; |
| 44 | let row = &mut d_w[i * m..(i + 1) * m]; |
| 45 | for j in 0..m { |
| 46 | row[j] += xi * d_y[j]; |
| 47 | } |
| 48 | } |
| 49 | } |
| 50 | |
| 51 | /// Element-wise a + b. |
| 52 | pub fn vec_add(a: &[f32], b: &[f32]) -> Vec<f32> { |
| 53 | debug_assert_eq!(a.len(), b.len()); |
| 54 | a.iter().zip(b.iter()).map(|(x, y)| x + y).collect() |
| 55 | } |
| 56 | |
| 57 | /// a += b in place. |
| 58 | pub fn vec_add_inplace(a: &mut [f32], b: &[f32]) { |
| 59 | debug_assert_eq!(a.len(), b.len()); |
| 60 | for (ai, bi) in a.iter_mut().zip(b.iter()) { |
| 61 | *ai += bi; |
| 62 | } |
| 63 | } |
| 64 | |
| 65 | /// Dot product. |
| 66 | pub fn vec_dot(a: &[f32], b: &[f32]) -> f32 { |
| 67 | debug_assert_eq!(a.len(), b.len()); |
| 68 | a.iter().zip(b.iter()).map(|(x, y)| x * y).sum() |
| 69 | } |
| 70 | |
| 71 | /// Softmax of a slice, returning a new Vec. |
| 72 | pub fn softmax(v: &[f32]) -> Vec<f32> { |
| 73 | let max = v.iter().cloned().fold(f32::NEG_INFINITY, f32::max); |
| 74 | let exps: Vec<f32> = v.iter().map(|&x| (x - max).exp()).collect(); |
| 75 | let sum: f32 = exps.iter().sum(); |
| 76 | exps.iter().map(|&e| e / sum).collect() |
| 77 | } |
| 78 | |
| 79 | /// Element-wise ReLU. |
| 80 | pub fn relu(v: &[f32]) -> Vec<f32> { |
| 81 | v.iter().map(|&x| if x > 0.0 { x } else { 0.0 }).collect() |
| 82 | } |
| 83 | |
| 84 | /// Scale each element by s. |
| 85 | pub fn vec_scale(v: &[f32], s: f32) -> Vec<f32> { |
| 86 | v.iter().map(|&x| x * s).collect() |
| 87 | } |
| 88 | |
| 89 | #[cfg(test)] |
| 90 | mod tests { |
| 91 | use super::*; |
| 92 | |
| 93 | #[test] |
| 94 | fn test_vec_mat_mul() { |
| 95 | // x = [1, 2], W = [[3, 4], [5, 6]] (row-major: [3, 4, 5, 6]) |
| 96 | // y = [1*3 + 2*5, 1*4 + 2*6] = [13, 16] |
| 97 | let x = [1.0, 2.0]; |
| 98 | let w = [3.0, 4.0, 5.0, 6.0]; |
| 99 | let y = vec_mat_mul(&x, &w, 2, 2); |
| 100 | assert_eq!(y, vec![13.0, 16.0]); |
| 101 | } |
| 102 | |
| 103 | #[test] |
| 104 | fn test_backward_x() { |
| 105 | // Numerical gradient check for vec_mat_mul w.r.t. x |
| 106 | let x = [1.0, 2.0, 3.0]; |
| 107 | let w = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]; |
| 108 | let d_y = [1.0, 1.0]; |
| 109 | let d_x = vec_mat_mul_backward_x(&d_y, &w, 3, 2); |
| 110 | |
| 111 | let eps = 1e-4; |
| 112 | for i in 0..3 { |
| 113 | let mut x_plus = x.to_vec(); |
| 114 | x_plus[i] += eps; |
| 115 | let mut x_minus = x.to_vec(); |
| 116 | x_minus[i] -= eps; |
| 117 | let y_plus = vec_mat_mul(&x_plus, &w, 3, 2); |
| 118 | let y_minus = vec_mat_mul(&x_minus, &w, 3, 2); |
| 119 | let numerical: f32 = y_plus.iter().zip(y_minus.iter()) |
| 120 | .zip(d_y.iter()) |
| 121 | .map(|((yp, ym), dy)| (yp - ym) / (2.0 * eps) * dy) |
| 122 | .sum(); |
| 123 | assert!((d_x[i] - numerical).abs() < 1e-3, |
| 124 | "d_x[{}]: analytic={}, numerical={}", i, d_x[i], numerical); |
| 125 | } |
| 126 | } |
| 127 | |
| 128 | #[test] |
| 129 | fn test_softmax() { |
| 130 | let v = [1.0, 2.0, 3.0]; |
| 131 | let s = softmax(&v); |
| 132 | let sum: f32 = s.iter().sum(); |
| 133 | assert!((sum - 1.0).abs() < 1e-6); |
| 134 | assert!(s[2] > s[1] && s[1] > s[0]); |
| 135 | } |
| 136 | |
| 137 | #[test] |
| 138 | fn test_relu() { |
| 139 | let v = [-1.0, 0.0, 1.0, -0.5, 2.0]; |
| 140 | let r = relu(&v); |
| 141 | assert_eq!(r, vec![0.0, 0.0, 1.0, 0.0, 2.0]); |
| 142 | } |
| 143 | } |
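The outer-product rule implemented by `vec_mat_mul_backward_w` above — `d_W[i][j] = x[i] * d_y[j]` for `y = x @ W` — can be spot-checked against finite differences. The sketch below is standalone (it re-implements the forward multiply locally rather than importing the module), using a scalar loss `L = dot(y, d_y)` so that `d_y` plays the role of the upstream gradient:

```rust
// Standalone finite-difference check of d_W[i][j] = x[i] * d_y[j]
// for y = x @ W with W stored row-major, mirroring tensor.rs.

fn vec_mat_mul(x: &[f32], w: &[f32], n: usize, m: usize) -> Vec<f32> {
    let mut y = vec![0.0; m];
    for i in 0..n {
        for j in 0..m {
            y[j] += x[i] * w[i * m + j];
        }
    }
    y
}

fn main() {
    let (n, m) = (3, 2);
    let x = [1.0f32, -2.0, 0.5];
    let w = [0.1f32, 0.2, 0.3, 0.4, 0.5, 0.6];
    let d_y = [1.0f32, -1.0]; // upstream gradient

    let eps = 1e-3;
    for i in 0..n {
        for j in 0..m {
            // Analytic gradient: outer product of x and d_y.
            let analytic = x[i] * d_y[j];

            // Numerical gradient of L = dot(y, d_y) w.r.t. W[i][j],
            // via central differences.
            let mut wp = w.to_vec();
            wp[i * m + j] += eps;
            let mut wm = w.to_vec();
            wm[i * m + j] -= eps;
            let yp = vec_mat_mul(&x, &wp, n, m);
            let ym = vec_mat_mul(&x, &wm, n, m);
            let numerical: f32 = yp.iter().zip(ym.iter()).zip(d_y.iter())
                .map(|((a, b), d)| (a - b) / (2.0 * eps) * d)
                .sum();

            assert!((analytic - numerical).abs() < 1e-2,
                "W[{i}][{j}]: analytic={analytic}, numerical={numerical}");
        }
    }
    println!("outer-product gradient rule verified");
}
```

This is the same pattern the module's own `test_backward_x` uses for the input-side gradient, applied to the weight side.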
| 1 | //! GPT model: 1-layer transformer with multi-head self-attention.
| 2 | //!
| 3 | //! All parameters live in a single flat f32 buffer. The forward pass processes
| 4 | //! tokens sequentially, building a KV cache, and stores all intermediate
| 5 | //! activations needed by the backward pass.
| 6 | |
| 7 | use crate::rng::Rng; |
| 8 | use crate::tensor::{softmax, vec_add, vec_mat_mul, relu}; |
| 9 | use crate::tokenizer::{VOCAB_SIZE, BOS}; |
| 10 | |
| 11 | // Architecture constants (matching Karpathy's microgpt). |
| 12 | pub const EMBED_DIM: usize = 16; |
| 13 | pub const NUM_HEADS: usize = 4; |
| 14 | pub const HEAD_DIM: usize = EMBED_DIM / NUM_HEADS; // 4 |
| 15 | pub const FFN_DIM: usize = EMBED_DIM * 4; // 64 |
| 16 | pub const BLOCK_SIZE: usize = 16; |
| 17 | |
| 18 | // Parameter layout in the flat buffer. |
| 19 | pub const WTE_OFFSET: usize = 0; |
| 20 | pub const WTE_SIZE: usize = VOCAB_SIZE * EMBED_DIM; // 27 × 16 = 432 |
| 21 | |
| 22 | pub const WPE_OFFSET: usize = WTE_OFFSET + WTE_SIZE; // 432 |
| 23 | pub const WPE_SIZE: usize = BLOCK_SIZE * EMBED_DIM; // 16 × 16 = 256 |
| 24 | |
| 25 | pub const WQ_OFFSET: usize = WPE_OFFSET + WPE_SIZE; // 688 |
| 26 | pub const WQ_SIZE: usize = EMBED_DIM * EMBED_DIM; // 256 |
| 27 | |
| 28 | pub const WK_OFFSET: usize = WQ_OFFSET + WQ_SIZE; // 944 |
| 29 | pub const WK_SIZE: usize = EMBED_DIM * EMBED_DIM; // 256 |
| 30 | |
| 31 | pub const WV_OFFSET: usize = WK_OFFSET + WK_SIZE; // 1200 |
| 32 | pub const WV_SIZE: usize = EMBED_DIM * EMBED_DIM; // 256 |
| 33 | |
| 34 | pub const WO_OFFSET: usize = WV_OFFSET + WV_SIZE; // 1456 |
| 35 | pub const WO_SIZE: usize = EMBED_DIM * EMBED_DIM; // 256 |
| 36 | |
| 37 | pub const W1_OFFSET: usize = WO_OFFSET + WO_SIZE; // 1712 |
| 38 | pub const W1_SIZE: usize = EMBED_DIM * FFN_DIM; // 1024 |
| 39 | |
| 40 | pub const W2_OFFSET: usize = W1_OFFSET + W1_SIZE; // 2736 |
| 41 | pub const W2_SIZE: usize = FFN_DIM * EMBED_DIM; // 1024 |
| 42 | |
| 43 | pub const WOUT_OFFSET: usize = W2_OFFSET + W2_SIZE; // 3760 |
| 44 | pub const WOUT_SIZE: usize = EMBED_DIM * VOCAB_SIZE; // 432 |
| 45 | |
| 46 | pub const NUM_PARAMS: usize = WOUT_OFFSET + WOUT_SIZE; // 4192 |
| 47 | |
| 48 | // Slice accessors for parameter groups. |
| 49 | pub fn wte(p: &[f32]) -> &[f32] { &p[WTE_OFFSET..WTE_OFFSET + WTE_SIZE] } |
| 50 | pub fn wpe(p: &[f32]) -> &[f32] { &p[WPE_OFFSET..WPE_OFFSET + WPE_SIZE] } |
| 51 | pub fn wq(p: &[f32]) -> &[f32] { &p[WQ_OFFSET..WQ_OFFSET + WQ_SIZE] } |
| 52 | pub fn wk(p: &[f32]) -> &[f32] { &p[WK_OFFSET..WK_OFFSET + WK_SIZE] } |
| 53 | pub fn wv(p: &[f32]) -> &[f32] { &p[WV_OFFSET..WV_OFFSET + WV_SIZE] } |
| 54 | pub fn wo(p: &[f32]) -> &[f32] { &p[WO_OFFSET..WO_OFFSET + WO_SIZE] } |
| 55 | pub fn w1(p: &[f32]) -> &[f32] { &p[W1_OFFSET..W1_OFFSET + W1_SIZE] } |
| 56 | pub fn w2(p: &[f32]) -> &[f32] { &p[W2_OFFSET..W2_OFFSET + W2_SIZE] } |
| 57 | pub fn wout(p: &[f32]) -> &[f32] { &p[WOUT_OFFSET..WOUT_OFFSET + WOUT_SIZE] } |
| 58 | |
| 59 | /// Embedding row for a given token. |
| 60 | pub fn wte_row(p: &[f32], token: usize) -> &[f32] { |
| 61 | let start = WTE_OFFSET + token * EMBED_DIM; |
| 62 | &p[start..start + EMBED_DIM] |
| 63 | } |
| 64 | |
| 65 | /// Position embedding row. |
| 66 | pub fn wpe_row(p: &[f32], pos: usize) -> &[f32] { |
| 67 | let start = WPE_OFFSET + pos * EMBED_DIM; |
| 68 | &p[start..start + EMBED_DIM] |
| 69 | } |
| 70 | |
| 71 | /// Initialize all weights with small Gaussian noise. |
| 72 | pub fn init_weights(params: &mut [f32], rng: &mut Rng) { |
| 73 | for p in params.iter_mut() { |
| 74 | *p = rng.next_gaussian() * 0.1; |
| 75 | } |
| 76 | } |
| 77 | |
| 78 | /// Cached activations from the forward pass, needed by backward. |
| 79 | /// |
| 80 | /// All 2D data is stored flat with row-major indexing: `[position * dim + i]`. |
| 81 | /// Attention probs are stored as `[position * NUM_HEADS * BLOCK_SIZE + head * BLOCK_SIZE + i]`. |
| 82 | pub struct ForwardCache { |
| 83 | pub seq_len: usize, |
| 84 | pub tokens: Vec<usize>, |
| 85 | |
| 86 | // Per-position activations, flat: [pos * dim + i] |
| 87 | pub embeddings: Vec<f32>, // [BLOCK_SIZE × EMBED_DIM] |
| 88 | pub queries: Vec<f32>, // [BLOCK_SIZE × EMBED_DIM] |
| 89 | pub k_cache: Vec<f32>, // [BLOCK_SIZE × EMBED_DIM] |
| 90 | pub v_cache: Vec<f32>, // [BLOCK_SIZE × EMBED_DIM] |
| 91 | pub att_outs: Vec<f32>, // [BLOCK_SIZE × EMBED_DIM] |
| 92 | pub proj_outs: Vec<f32>, // [BLOCK_SIZE × EMBED_DIM] |
| 93 | pub residual1: Vec<f32>, // [BLOCK_SIZE × EMBED_DIM] |
| 94 | pub ffn_hidden: Vec<f32>, // [BLOCK_SIZE × FFN_DIM] |
| 95 | pub ffn_relu: Vec<f32>, // [BLOCK_SIZE × FFN_DIM] |
| 96 | pub ffn_out: Vec<f32>, // [BLOCK_SIZE × EMBED_DIM] |
| 97 | pub residual2: Vec<f32>, // [BLOCK_SIZE × EMBED_DIM] |
| 98 | pub logits: Vec<f32>, // [BLOCK_SIZE × VOCAB_SIZE] |
| 99 | |
| 100 | // Attention probs: [pos * NUM_HEADS * BLOCK_SIZE + head * BLOCK_SIZE + i] |
| 101 | pub att_probs: Vec<f32>, // [BLOCK_SIZE × NUM_HEADS × BLOCK_SIZE] |
| 102 | } |
| 103 | |
| 104 | impl ForwardCache { |
| 105 | pub fn new() -> Self { |
| 106 | Self { |
| 107 | seq_len: 0, |
| 108 | tokens: Vec::new(), |
| 109 | embeddings: vec![0.0; BLOCK_SIZE * EMBED_DIM], |
| 110 | queries: vec![0.0; BLOCK_SIZE * EMBED_DIM], |
| 111 | k_cache: vec![0.0; BLOCK_SIZE * EMBED_DIM], |
| 112 | v_cache: vec![0.0; BLOCK_SIZE * EMBED_DIM], |
| 113 | att_outs: vec![0.0; BLOCK_SIZE * EMBED_DIM], |
| 114 | proj_outs: vec![0.0; BLOCK_SIZE * EMBED_DIM], |
| 115 | residual1: vec![0.0; BLOCK_SIZE * EMBED_DIM], |
| 116 | ffn_hidden: vec![0.0; BLOCK_SIZE * FFN_DIM], |
| 117 | ffn_relu: vec![0.0; BLOCK_SIZE * FFN_DIM], |
| 118 | ffn_out: vec![0.0; BLOCK_SIZE * EMBED_DIM], |
| 119 | residual2: vec![0.0; BLOCK_SIZE * EMBED_DIM], |
| 120 | logits: vec![0.0; BLOCK_SIZE * VOCAB_SIZE], |
| 121 | att_probs: vec![0.0; BLOCK_SIZE * NUM_HEADS * BLOCK_SIZE], |
| 122 | } |
| 123 | } |
| 124 | |
| 125 | /// Reset for a new sequence without reallocating. |
| 126 | pub fn clear(&mut self) { |
| 127 | self.seq_len = 0; |
| 128 | self.tokens.clear(); |
| 129 |         // Buffers are not zeroed: the forward pass overwrites every element it reads for the new sequence.
| 130 | } |
| 131 | |
| 132 | // Accessors for a specific position's data. |
| 133 | |
| 134 | pub fn emb_at(&self, pos: usize) -> &[f32] { |
| 135 | &self.embeddings[pos * EMBED_DIM..(pos + 1) * EMBED_DIM] |
| 136 | } |
| 137 | pub fn query_at(&self, pos: usize) -> &[f32] { |
| 138 | &self.queries[pos * EMBED_DIM..(pos + 1) * EMBED_DIM] |
| 139 | } |
| 140 | pub fn k_at(&self, pos: usize) -> &[f32] { |
| 141 | &self.k_cache[pos * EMBED_DIM..(pos + 1) * EMBED_DIM] |
| 142 | } |
| 143 | pub fn v_at(&self, pos: usize) -> &[f32] { |
| 144 | &self.v_cache[pos * EMBED_DIM..(pos + 1) * EMBED_DIM] |
| 145 | } |
| 146 | pub fn att_out_at(&self, pos: usize) -> &[f32] { |
| 147 | &self.att_outs[pos * EMBED_DIM..(pos + 1) * EMBED_DIM] |
| 148 | } |
| 149 | pub fn res1_at(&self, pos: usize) -> &[f32] { |
| 150 | &self.residual1[pos * EMBED_DIM..(pos + 1) * EMBED_DIM] |
| 151 | } |
| 152 | pub fn ffn_hidden_at(&self, pos: usize) -> &[f32] { |
| 153 | &self.ffn_hidden[pos * FFN_DIM..(pos + 1) * FFN_DIM] |
| 154 | } |
| 155 | pub fn ffn_relu_at(&self, pos: usize) -> &[f32] { |
| 156 | &self.ffn_relu[pos * FFN_DIM..(pos + 1) * FFN_DIM] |
| 157 | } |
| 158 | pub fn res2_at(&self, pos: usize) -> &[f32] { |
| 159 | &self.residual2[pos * EMBED_DIM..(pos + 1) * EMBED_DIM] |
| 160 | } |
| 161 | pub fn logits_at(&self, pos: usize) -> &[f32] { |
| 162 | &self.logits[pos * VOCAB_SIZE..(pos + 1) * VOCAB_SIZE] |
| 163 | } |
| 164 | |
| 165 | /// Attention prob for position `pos`, head `h`, attending to position `i`. |
| 166 | pub fn att_prob(&self, pos: usize, h: usize, i: usize) -> f32 { |
| 167 | self.att_probs[pos * NUM_HEADS * BLOCK_SIZE + h * BLOCK_SIZE + i] |
| 168 | } |
| 169 | } |
| 170 | |
| 171 | /// Run the forward pass for a full sequence of input tokens. |
| 172 | /// |
| 173 | /// Input tokens are the tokens to process (excluding the final target). |
| 174 | /// For a name like "emma", the encoded form is [BOS, e, m, m, a, BOS], |
| 175 | /// and the input tokens would be [BOS, e, m, m, a] (first 5 tokens). |
| 176 | /// The targets are [e, m, m, a, BOS] (last 5 tokens). |
| 177 | pub fn forward(params: &[f32], tokens: &[usize], cache: &mut ForwardCache) { |
| 178 | let seq_len = tokens.len(); |
| 179 | assert!(seq_len <= BLOCK_SIZE); |
| 180 | cache.seq_len = seq_len; |
| 181 | cache.tokens = tokens.to_vec(); |
| 182 | |
| 183 | let scale = 1.0 / (HEAD_DIM as f32).sqrt(); |
| 184 | |
| 185 | for t in 0..seq_len { |
| 186 | let token = tokens[t]; |
| 187 | |
| 188 | // 1. Token + position embeddings |
| 189 | let tok_emb = wte_row(params, token); |
| 190 | let pos_emb = wpe_row(params, t); |
| 191 | let emb = vec_add(tok_emb, pos_emb); |
| 192 | cache.embeddings[t * EMBED_DIM..(t + 1) * EMBED_DIM].copy_from_slice(&emb); |
| 193 | |
| 194 | // 2. Q, K, V projections |
| 195 | let q = vec_mat_mul(&emb, wq(params), EMBED_DIM, EMBED_DIM); |
| 196 | cache.queries[t * EMBED_DIM..(t + 1) * EMBED_DIM].copy_from_slice(&q); |
| 197 | |
| 198 | let k = vec_mat_mul(&emb, wk(params), EMBED_DIM, EMBED_DIM); |
| 199 | cache.k_cache[t * EMBED_DIM..(t + 1) * EMBED_DIM].copy_from_slice(&k); |
| 200 | |
| 201 | let v = vec_mat_mul(&emb, wv(params), EMBED_DIM, EMBED_DIM); |
| 202 | cache.v_cache[t * EMBED_DIM..(t + 1) * EMBED_DIM].copy_from_slice(&v); |
| 203 | |
| 204 | // 3. Multi-head causal self-attention |
| 205 | let mut att_out = vec![0.0f32; EMBED_DIM]; |
| 206 | for h in 0..NUM_HEADS { |
| 207 | let q_h = &q[h * HEAD_DIM..(h + 1) * HEAD_DIM]; |
| 208 | |
| 209 | // Compute attention scores for positions 0..t+1 |
| 210 | let mut scores = vec![0.0f32; t + 1]; |
| 211 | for i in 0..=t { |
| 212 | let k_i = &cache.k_cache[i * EMBED_DIM + h * HEAD_DIM..i * EMBED_DIM + (h + 1) * HEAD_DIM]; |
| 213 | let mut dot = 0.0; |
| 214 | for d in 0..HEAD_DIM { |
| 215 | dot += q_h[d] * k_i[d]; |
| 216 | } |
| 217 | scores[i] = dot * scale; |
| 218 | } |
| 219 | |
| 220 | // Softmax |
| 221 | let probs = softmax(&scores); |
| 222 | for i in 0..=t { |
| 223 | cache.att_probs[t * NUM_HEADS * BLOCK_SIZE + h * BLOCK_SIZE + i] = probs[i]; |
| 224 | } |
| 225 | |
| 226 | // Weighted sum of values |
| 227 | for i in 0..=t { |
| 228 | let v_i = &cache.v_cache[i * EMBED_DIM + h * HEAD_DIM..i * EMBED_DIM + (h + 1) * HEAD_DIM]; |
| 229 | for d in 0..HEAD_DIM { |
| 230 | att_out[h * HEAD_DIM + d] += probs[i] * v_i[d]; |
| 231 | } |
| 232 | } |
| 233 | } |
| 234 | cache.att_outs[t * EMBED_DIM..(t + 1) * EMBED_DIM].copy_from_slice(&att_out); |
| 235 | |
| 236 | // 4. Output projection + residual |
| 237 | let proj = vec_mat_mul(&att_out, wo(params), EMBED_DIM, EMBED_DIM); |
| 238 | cache.proj_outs[t * EMBED_DIM..(t + 1) * EMBED_DIM].copy_from_slice(&proj); |
| 239 | let res1 = vec_add(&emb, &proj); |
| 240 | cache.residual1[t * EMBED_DIM..(t + 1) * EMBED_DIM].copy_from_slice(&res1); |
| 241 | |
| 242 | // 5. FFN: ReLU(x @ W1) @ W2 |
| 243 | let hidden = vec_mat_mul(&res1, w1(params), EMBED_DIM, FFN_DIM); |
| 244 | cache.ffn_hidden[t * FFN_DIM..(t + 1) * FFN_DIM].copy_from_slice(&hidden); |
| 245 | let activated = relu(&hidden); |
| 246 | cache.ffn_relu[t * FFN_DIM..(t + 1) * FFN_DIM].copy_from_slice(&activated); |
| 247 | let ffn = vec_mat_mul(&activated, w2(params), FFN_DIM, EMBED_DIM); |
| 248 | cache.ffn_out[t * EMBED_DIM..(t + 1) * EMBED_DIM].copy_from_slice(&ffn); |
| 249 | |
| 250 | // 6. FFN residual |
| 251 | let res2 = vec_add(&res1, &ffn); |
| 252 | cache.residual2[t * EMBED_DIM..(t + 1) * EMBED_DIM].copy_from_slice(&res2); |
| 253 | |
| 254 | // 7. Output logits |
| 255 | let logits = vec_mat_mul(&res2, wout(params), EMBED_DIM, VOCAB_SIZE); |
| 256 | cache.logits[t * VOCAB_SIZE..(t + 1) * VOCAB_SIZE].copy_from_slice(&logits); |
| 257 | } |
| 258 | } |
| 259 | |
| 260 | /// Generate a name by autoregressive sampling. |
| 261 | pub fn generate(params: &[f32], cache: &mut ForwardCache, rng: &mut Rng, temperature: f32) -> String { |
| 262 | let mut tokens = vec![BOS]; |
| 263 | |
| 264 | for _ in 0..BLOCK_SIZE - 1 { |
| 265 | cache.clear(); |
| 266 | forward(params, &tokens, cache); |
| 267 | |
| 268 | let logits = cache.logits_at(tokens.len() - 1); |
| 269 | let scaled: Vec<f32> = logits.iter().map(|&l| l / temperature).collect(); |
| 270 | let probs = softmax(&scaled); |
| 271 | let next = rng.sample_from_probs(&probs); |
| 272 | |
| 273 | if next == BOS { |
| 274 | break; |
| 275 | } |
| 276 | tokens.push(next); |
| 277 | } |
| 278 | |
| 279 | use crate::tokenizer::decode; |
| 280 | tokens[1..].iter().map(|&t| decode(t)).collect() |
| 281 | } |
| 282 | |
| 283 | #[cfg(test)] |
| 284 | mod tests { |
| 285 | use super::*; |
| 286 | |
| 287 | #[test] |
| 288 | fn test_param_count() { |
| 289 | assert_eq!(NUM_PARAMS, 4192); |
| 290 | } |
| 291 | |
| 292 | #[test] |
| 293 | fn test_forward_smoke() { |
| 294 | let mut rng = Rng::new(42); |
| 295 | let mut params = vec![0.0f32; NUM_PARAMS]; |
| 296 | init_weights(&mut params, &mut rng); |
| 297 | |
| 298 | let tokens = vec![0, 5, 13]; // BOS, e, m |
| 299 | let mut cache = ForwardCache::new(); |
| 300 | forward(¶ms, &tokens, &mut cache); |
| 301 | |
| 302 | // Logits should be finite |
| 303 | for t in 0..3 { |
| 304 | let logits = cache.logits_at(t); |
| 305 | for &l in logits { |
| 306 | assert!(l.is_finite(), "non-finite logit"); |
| 307 | } |
| 308 | // Softmax of logits should sum to 1 |
| 309 | let probs = softmax(logits); |
| 310 | let sum: f32 = probs.iter().sum(); |
| 311 | assert!((sum - 1.0).abs() < 1e-5); |
| 312 | } |
| 313 | } |
| 314 | |
| 315 | #[test] |
| 316 | fn test_generate_smoke() { |
| 317 | let mut rng = Rng::new(42); |
| 318 | let mut params = vec![0.0f32; NUM_PARAMS]; |
| 319 | init_weights(&mut params, &mut rng); |
| 320 | |
| 321 | let mut cache = ForwardCache::new(); |
| 322 | let name = generate(¶ms, &mut cache, &mut rng, 1.0); |
| 323 |         // Should produce some non-empty string of lowercase letters
| 324 |         assert!(!name.is_empty());
| 325 |         assert!(name.chars().all(|c| c.is_ascii_lowercase()));
| 326 | } |
| 327 | } |
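The input/target split described in the `forward` doc comment can be made concrete with a small standalone sketch. It assumes the tokenizer maps BOS to 0 and `'a'..'z'` to 1..26 — the tokenizer module is not shown in this report, but those values are consistent with `VOCAB_SIZE = 27` and with the test tokens `[0, 5, 13]` standing for BOS, e, m:

```rust
// Hypothetical sketch of the encoding convention assumed by forward():
// BOS = 0, 'a'..'z' = 1..26. The actual tokenizer module is not shown here.

const BOS: usize = 0;

fn encode(name: &str) -> Vec<usize> {
    let mut toks = vec![BOS];
    toks.extend(name.bytes().map(|b| (b - b'a') as usize + 1));
    toks.push(BOS); // trailing BOS doubles as the end-of-name target
    toks
}

fn main() {
    let enc = encode("emma"); // [BOS, e, m, m, a, BOS]
    assert_eq!(enc, vec![0, 5, 13, 13, 1, 0]);

    // Inputs drop the last token; targets drop the first,
    // so targets[t] is the token that follows inputs[t].
    let inputs = &enc[..enc.len() - 1]; // [BOS, e, m, m, a]
    let targets = &enc[1..];            // [e, m, m, a, BOS]
    assert_eq!(inputs.len(), targets.len());
    println!("inputs={:?} targets={:?}", inputs, targets);
}
```

This is the same next-token setup the gradient-check test in the backward module uses (`input_tokens = [0, 5, 13]`, `targets = [5, 13, 0]`).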
| 1 | //! Manual backward pass for the GPT model.
| 2 | //!
| 3 | //! Computes parameter gradients by backpropagating through each position in
| 4 | //! reverse order. The tricky part is attention: position t's query attends to
| 5 | //! all keys/values at positions 0..t, so key/value gradients accumulate
| 6 | //! contributions from multiple future positions.
| 7 | |
| 8 | use crate::model::*; |
| 9 | use crate::tensor::*; |
| 10 | |
| 11 | /// Backpropagate through the full sequence, accumulating gradients. |
| 12 | /// |
| 13 | /// Returns the average cross-entropy loss over the sequence. |
| 14 | /// `targets[t]` is the target token for position t (i.e., `tokens[t+1]` in the |
| 15 | /// encoded name). `grads` must be pre-zeroed. |
| 16 | pub fn backward( |
| 17 | params: &[f32], |
| 18 | cache: &ForwardCache, |
| 19 | targets: &[usize], |
| 20 | grads: &mut [f32], |
| 21 | ) -> f32 { |
| 22 | let seq_len = cache.seq_len; |
| 23 | assert_eq!(targets.len(), seq_len); |
| 24 | let scale = 1.0 / (HEAD_DIM as f32).sqrt(); |
| 25 | let inv_seq_len = 1.0 / seq_len as f32; |
| 26 | |
| 27 | // Accumulated key/value gradients: d_k_cache[pos] and d_v_cache[pos] collect |
| 28 | // contributions from all positions that attend to them. |
| 29 | let mut d_k_cache = vec![0.0f32; seq_len * EMBED_DIM]; |
| 30 | let mut d_v_cache = vec![0.0f32; seq_len * EMBED_DIM]; |
| 31 | |
| 32 | let mut total_loss = 0.0f32; |
| 33 | |
| 34 | // Process positions in reverse so that when we reach position t, |
| 35 | // d_k_cache[t] and d_v_cache[t] are complete. |
| 36 | for t in (0..seq_len).rev() { |
| 37 | let target = targets[t]; |
| 38 | |
| 39 | // --- Cross-entropy loss gradient --- |
| 40 | let logits = cache.logits_at(t); |
| 41 | let probs = softmax(logits); |
| 42 | total_loss += -probs[target].max(1e-10).ln(); |
| 43 | |
| 44 | // d_logits = (probs - one_hot(target)) / seq_len |
| 45 | let mut d_logits = probs; |
| 46 | d_logits[target] -= 1.0; |
| 47 | for v in d_logits.iter_mut() { |
| 48 | *v *= inv_seq_len; |
| 49 | } |
| 50 | |
| 51 | // --- Output projection backward: logits = res2 @ Wout --- |
| 52 | let res2 = cache.res2_at(t); |
| 53 | let d_res2 = vec_mat_mul_backward_x(&d_logits, wout(params), EMBED_DIM, VOCAB_SIZE); |
| 54 | vec_mat_mul_backward_w(res2, &d_logits, &mut grads[WOUT_OFFSET..], EMBED_DIM, VOCAB_SIZE); |
| 55 | |
| 56 | // --- FFN residual backward: res2 = res1 + ffn_out --- |
| 57 | // d_res1 and d_ffn_out both receive d_res2 |
| 58 | let mut d_res1 = d_res2.clone(); |
| 59 | let d_ffn_out = d_res2; |
| 60 | |
| 61 | // --- FFN down-projection backward: ffn_out = ffn_relu @ W2 --- |
| 62 | let ffn_relu = cache.ffn_relu_at(t); |
| 63 | let d_ffn_relu = vec_mat_mul_backward_x(&d_ffn_out, w2(params), FFN_DIM, EMBED_DIM); |
| 64 | vec_mat_mul_backward_w(ffn_relu, &d_ffn_out, &mut grads[W2_OFFSET..], FFN_DIM, EMBED_DIM); |
| 65 | |
| 66 | // --- ReLU backward --- |
| 67 | let ffn_hidden = cache.ffn_hidden_at(t); |
| 68 | let d_ffn_hidden: Vec<f32> = d_ffn_relu.iter().zip(ffn_hidden.iter()) |
| 69 | .map(|(&dg, &h)| if h > 0.0 { dg } else { 0.0 }) |
| 70 | .collect(); |
| 71 | |
| 72 | // --- FFN up-projection backward: ffn_hidden = res1 @ W1 --- |
| 73 | let res1 = cache.res1_at(t); |
| 74 | let d_res1_from_ffn = vec_mat_mul_backward_x(&d_ffn_hidden, w1(params), EMBED_DIM, FFN_DIM); |
| 75 | vec_mat_mul_backward_w(res1, &d_ffn_hidden, &mut grads[W1_OFFSET..], EMBED_DIM, FFN_DIM); |
| 76 | vec_add_inplace(&mut d_res1, &d_res1_from_ffn); |
| 77 | |
| 78 | // --- Attention residual backward: res1 = emb + proj --- |
| 79 | let d_proj = d_res1.clone(); |
| 80 | let mut d_emb = d_res1; // emb receives same gradient from residual |
| 81 | |
| 82 | // --- Output projection backward: proj = att_out @ Wo --- |
| 83 | let att_out = cache.att_out_at(t); |
| 84 | let d_att_out = vec_mat_mul_backward_x(&d_proj, wo(params), EMBED_DIM, EMBED_DIM); |
| 85 | vec_mat_mul_backward_w(att_out, &d_proj, &mut grads[WO_OFFSET..], EMBED_DIM, EMBED_DIM); |
| 86 | |
| 87 | // --- Multi-head attention backward --- |
| 88 | let mut d_q = vec![0.0f32; EMBED_DIM]; |
| 89 | |
| 90 | for h in 0..NUM_HEADS { |
| 91 | let ho = h * HEAD_DIM; |
| 92 | let d_out_h = &d_att_out[ho..ho + HEAD_DIM]; |
| 93 | let q_h = &cache.query_at(t)[ho..ho + HEAD_DIM]; |
| 94 | |
| 95 |             // out_h = sum_i probs[i] * v_cache_h[i], so each value row
| 96 |             // receives d_v_cache_h[i] += probs[i] * d_out_h.
| 97 |             for i in 0..=t {
| 98 |                 let prob = cache.att_prob(t, h, i);
| 99 |                 for d in 0..HEAD_DIM {
| 100 |                     d_v_cache[i * EMBED_DIM + ho + d] += prob * d_out_h[d];
| 101 |                 }
| 102 |             }
| 103 |
| 104 |             // Softmax backward: d_scores = probs * (d_probs - sum(probs * d_probs)),
| 105 |             // where d_probs[i] = dot(d_out_h, v_cache_h[i]).
| 115 | let mut d_probs = vec![0.0f32; t + 1]; |
| 116 | for i in 0..=t { |
| 117 | let v_h = &cache.v_cache[i * EMBED_DIM + ho..i * EMBED_DIM + ho + HEAD_DIM]; |
| 118 | d_probs[i] = vec_dot(d_out_h, v_h); |
| 119 | } |
| 120 | |
| 121 | let mut dot_sum = 0.0f32; |
| 122 | for i in 0..=t { |
| 123 | dot_sum += cache.att_prob(t, h, i) * d_probs[i]; |
| 124 | } |
| 125 | |
| 126 | for i in 0..=t { |
| 127 | let prob = cache.att_prob(t, h, i); |
| 128 | let d_score = prob * (d_probs[i] - dot_sum) * scale; |
| 129 | |
| 130 | // scores[i] = dot(q_h, k_h[i]) * scale |
| 131 | // d_q_h += d_score * k_h[i] (scale already applied above) |
| 132 | let k_h = &cache.k_cache[i * EMBED_DIM + ho..i * EMBED_DIM + ho + HEAD_DIM]; |
| 133 | for d in 0..HEAD_DIM { |
| 134 | d_q[ho + d] += d_score * k_h[d]; |
| 135 | d_k_cache[i * EMBED_DIM + ho + d] += d_score * q_h[d]; |
| 136 | } |
| 137 | } |
| 138 | } |
| 139 | |
| 140 | // --- Q projection backward: q = emb @ Wq --- |
| 141 | let emb = cache.emb_at(t); |
| 142 | let d_emb_from_q = vec_mat_mul_backward_x(&d_q, wq(params), EMBED_DIM, EMBED_DIM); |
| 143 | vec_mat_mul_backward_w(emb, &d_q, &mut grads[WQ_OFFSET..], EMBED_DIM, EMBED_DIM); |
| 144 | vec_add_inplace(&mut d_emb, &d_emb_from_q); |
| 145 | |
| 146 | // --- K, V projection backward (d_k_cache[t] and d_v_cache[t] are now complete) --- |
| 147 | let d_k_t = &d_k_cache[t * EMBED_DIM..(t + 1) * EMBED_DIM]; |
| 148 | let d_emb_from_k = vec_mat_mul_backward_x(d_k_t, wk(params), EMBED_DIM, EMBED_DIM); |
| 149 | vec_mat_mul_backward_w(emb, d_k_t, &mut grads[WK_OFFSET..], EMBED_DIM, EMBED_DIM); |
| 150 | vec_add_inplace(&mut d_emb, &d_emb_from_k); |
| 151 | |
| 152 | let d_v_t = &d_v_cache[t * EMBED_DIM..(t + 1) * EMBED_DIM]; |
| 153 | let d_emb_from_v = vec_mat_mul_backward_x(d_v_t, wv(params), EMBED_DIM, EMBED_DIM); |
| 154 | vec_mat_mul_backward_w(emb, d_v_t, &mut grads[WV_OFFSET..], EMBED_DIM, EMBED_DIM); |
| 155 | vec_add_inplace(&mut d_emb, &d_emb_from_v); |
| 156 | |
| 157 | // --- Embedding backward: emb = wte[token] + wpe[pos] --- |
| 158 | let token = cache.tokens[t]; |
| 159 | let wte_start = WTE_OFFSET + token * EMBED_DIM; |
| 160 | for d in 0..EMBED_DIM { |
| 161 | grads[wte_start + d] += d_emb[d]; |
| 162 | } |
| 163 | let wpe_start = WPE_OFFSET + t * EMBED_DIM; |
| 164 | for d in 0..EMBED_DIM { |
| 165 | grads[wpe_start + d] += d_emb[d]; |
| 166 | } |
| 167 | } |
| 168 | |
| 169 | total_loss * inv_seq_len |
| 170 | } |
| 171 | |
| 172 | #[cfg(test)] |
| 173 | mod tests { |
| 174 | use super::*; |
| 175 | use crate::rng::Rng; |
| 176 | |
| 177 | /// Numerical gradient check: compare analytic gradients from backward() |
| 178 | /// against finite-difference approximations for every parameter. |
| 179 | #[test] |
| 180 | fn test_gradient_check() { |
| 181 | let mut rng = Rng::new(42); |
| 182 | let mut params = vec![0.0f32; NUM_PARAMS]; |
| 183 | init_weights(&mut params, &mut rng); |
| 184 | |
| 185 | // Short sequence for faster checking |
| 186 | let input_tokens = vec![0, 5, 13]; // BOS, e, m |
| 187 | let targets = vec![5, 13, 0]; // e, m, BOS |
| 188 | |
| 189 | // Analytic gradients |
| 190 | let mut cache = ForwardCache::new(); |
| 191 | forward(¶ms, &input_tokens, &mut cache); |
| 192 | let mut grads = vec![0.0f32; NUM_PARAMS]; |
| 193 | let loss = backward(¶ms, &cache, &targets, &mut grads); |
| 194 | assert!(loss.is_finite()); |
| 195 | |
| 196 | // Numerical gradients for a random subset of parameters |
| 197 | let eps = 1e-3; |
| 198 | let mut max_err = 0.0f32; |
| 199 | let mut checked = 0; |
| 200 | let check_indices: Vec<usize> = (0..NUM_PARAMS).step_by(17).collect(); |
| 201 | for &i in &check_indices { |
| 202 | let mut p_plus = params.clone(); |
| 203 | p_plus[i] += eps; |
| 204 | let mut c_plus = ForwardCache::new(); |
| 205 | forward(&p_plus, &input_tokens, &mut c_plus); |
| 206 | let mut g_dummy = vec![0.0f32; NUM_PARAMS]; |
| 207 | let loss_plus = backward(&p_plus, &c_plus, &targets, &mut g_dummy); |
| 208 | |
| 209 | let mut p_minus = params.clone(); |
| 210 | p_minus[i] -= eps; |
| 211 | let mut c_minus = ForwardCache::new(); |
| 212 | forward(&p_minus, &input_tokens, &mut c_minus); |
| 213 | let mut g_dummy2 = vec![0.0f32; NUM_PARAMS]; |
| 214 | let loss_minus = backward(&p_minus, &c_minus, &targets, &mut g_dummy2); |
| 215 | |
| 216 | let numerical = (loss_plus - loss_minus) / (2.0 * eps); |
| 217 | let analytic = grads[i]; |
| 218 | let err = (analytic - numerical).abs() / (analytic.abs() + numerical.abs() + 1e-8); |
| 219 | if err > max_err { |
| 220 | max_err = err; |
| 221 | } |
| 222 | if err > 0.01 { |
| 223 | panic!( |
| 224 | "Gradient check failed at param {}: analytic={:.6}, numerical={:.6}, rel_err={:.6}", |
| 225 | i, analytic, numerical, err |
| 226 | ); |
| 227 | } |
| 228 | checked += 1; |
| 229 | } |
| 230 | assert!(checked > 100, "checked too few params: {}", checked); |
| 231 | eprintln!("Gradient check passed: {} params checked, max relative error = {:.6}", checked, max_err); |
| 232 | } |
| 233 | } |
| 1 | mod backward; |
| 2 | mod model; |
| 3 | mod optimizer; |
| 4 | mod rng; |
| 5 | mod tensor; |
| 6 | mod tokenizer; |
| 7 | |
| 8 | use model::{ForwardCache, NUM_PARAMS}; |
| 9 | use optimizer::Adam; |
| 10 | use rng::Rng; |
| 11 | |
| 12 | const TRAINING_STEPS: usize = 1000; |
| 13 | const LEARNING_RATE: f32 = 0.01; |
| 14 | const TEMPERATURE: f32 = 0.8; |
| 15 | const NUM_SAMPLES: usize = 20; |
| 16 | |
| 17 | fn main() { |
| 18 | // ESP-IDF boilerplate: link patches and initialize logging. |
| 19 | #[cfg(target_arch = "xtensa")] |
| 20 | { |
| 21 | esp_idf_svc::sys::link_patches(); |
| 22 | esp_idf_svc::log::EspLogger::initialize_default(); |
| 23 | } |
| 24 | |
| 25 | #[cfg(not(target_arch = "xtensa"))] |
| 26 | { |
| 27 | env_logger::init(); |
| 28 | } |
| 29 | |
| 30 | log::info!("=== esp32gpt: microgpt in Rust on ESP32 ==="); |
| 31 | log::info!("Model: {} params, {} heads, {}-dim embeddings", |
| 32 | NUM_PARAMS, model::NUM_HEADS, model::EMBED_DIM); |
| 33 | |
| 34 | // --- Dataset --- |
| 35 | let dataset: &str = include_str!("../data/names.txt"); |
| 36 | let num_names = dataset.lines().filter(|l| !l.is_empty()).count(); |
| 37 | log::info!("Dataset: {} names loaded from flash", num_names); |
| 38 | |
| 39 | // --- Initialize --- |
| 40 | let mut rng = Rng::new(42); |
| 41 | let mut params = vec![0.0f32; NUM_PARAMS]; |
| 42 | model::init_weights(&mut params, &mut rng); |
| 43 | |
| 44 | let mut adam = Adam::new(NUM_PARAMS, 0.85, 0.99); |
| 45 | let mut grads = vec![0.0f32; NUM_PARAMS]; |
| 46 | let mut cache = ForwardCache::new(); |
| 47 | |
| 48 | // --- Training --- |
| 49 | log::info!("Training for {} steps...", TRAINING_STEPS); |
| 50 | |
| 51 | for step in 0..TRAINING_STEPS { |
| 52 |         // Pick a random name. Names are re-scanned from flash each step;
| 53 |         // collecting 32K `&str`s into a Vec would cost too much RAM.
| 53 | let name_idx = rng.random_index(num_names); |
| 54 | let name = dataset.lines() |
| 55 | .filter(|l| !l.is_empty()) |
| 56 | .nth(name_idx) |
| 57 | .unwrap(); |
| 58 | let encoded = tokenizer::encode(name); |
| 59 | |
| 60 | // Skip names that are too long for our block size |
| 61 | if encoded.len() > model::BLOCK_SIZE + 1 { |
| 62 | continue; |
| 63 | } |
| 64 | |
| 65 | // Input = all tokens except last, targets = all tokens except first |
| 66 | let input_tokens = &encoded[..encoded.len() - 1]; |
| 67 | let target_tokens: Vec<usize> = encoded[1..].to_vec(); |
| 68 | |
| 69 | // Forward |
| 70 | cache.clear(); |
| 71 | model::forward(¶ms, input_tokens, &mut cache); |
| 72 | |
| 73 | // Backward |
| 74 |         grads.fill(0.0);
| 75 | let loss = backward::backward(¶ms, &cache, &target_tokens, &mut grads); |
| 76 | |
| 77 | // Optimizer step with linear LR decay |
| 78 | let lr = LEARNING_RATE * (1.0 - step as f32 / TRAINING_STEPS as f32); |
| 79 | adam.step(&mut params, &grads, lr, step + 1); |
| 80 | |
| 81 | if step % 100 == 0 || step == TRAINING_STEPS - 1 { |
| 82 | log::info!("step {:>4}/{}: loss = {:.4}", step, TRAINING_STEPS, loss); |
| 83 | } |
| 84 | } |
| 85 | |
| 86 | // --- Inference --- |
| 87 | log::info!("--- Generated names (temperature={}) ---", TEMPERATURE); |
| 88 | for i in 0..NUM_SAMPLES { |
| 89 | let name = model::generate(¶ms, &mut cache, &mut rng, TEMPERATURE); |
| 90 | log::info!(" {:>2}. {}", i + 1, name); |
| 91 | } |
| 92 | |
| 93 | log::info!("=== Done ==="); |
| 94 | |
| 95 | // On ESP32, halt rather than returning (prevents watchdog reset). |
| 96 | #[cfg(target_arch = "xtensa")] |
| 97 | loop { |
| 98 | std::thread::sleep(std::time::Duration::from_secs(3600)); |
| 99 | } |
| 100 | } |
| 1 | /// Manual backward pass for the GPT model. |
| 2 | /// |
| 3 | /// Computes parameter gradients by backpropagating through each position in |
| 4 | /// reverse order. The tricky part is attention: position t's query attends to |
| 5 | /// all keys/values at positions 0..=t, so key/value gradients accumulate
| 6 | /// contributions from multiple future positions. |
| 7 | |
| 8 | use crate::model::*; |
| 9 | use crate::tensor::*; |
| 10 | use crate::tokenizer::VOCAB_SIZE; |
| 11 | |
| 12 | /// Backpropagate through the full sequence, accumulating gradients. |
| 13 | /// |
| 14 | /// Returns the average cross-entropy loss over the sequence. |
| 15 | /// `targets[t]` is the target token for position t (i.e., `tokens[t+1]` in the |
| 16 | /// encoded name). `grads` must be pre-zeroed. |
| 17 | pub fn backward( |
| 18 | params: &[f32], |
| 19 | cache: &ForwardCache, |
| 20 | targets: &[usize], |
| 21 | grads: &mut [f32], |
| 22 | ) -> f32 { |
| 23 | let seq_len = cache.seq_len; |
| 24 | assert_eq!(targets.len(), seq_len); |
| 25 | let scale = 1.0 / (HEAD_DIM as f32).sqrt(); |
| 26 | let inv_seq_len = 1.0 / seq_len as f32; |
| 27 | |
| 28 | // Accumulated key/value gradients: d_k_cache[pos] and d_v_cache[pos] collect |
| 29 | // contributions from all positions that attend to them. |
| 30 | let mut d_k_cache = vec![0.0f32; seq_len * EMBED_DIM]; |
| 31 | let mut d_v_cache = vec![0.0f32; seq_len * EMBED_DIM]; |
| 32 | |
| 33 | let mut total_loss = 0.0f32; |
| 34 | |
| 35 | // Process positions in reverse so that when we reach position t, |
| 36 | // d_k_cache[t] and d_v_cache[t] are complete. |
| 37 | for t in (0..seq_len).rev() { |
| 38 | let target = targets[t]; |
| 39 | |
| 40 | // --- Cross-entropy loss gradient --- |
| 41 | let logits = cache.logits_at(t); |
| 42 | let probs = softmax(logits); |
| 43 | total_loss += -probs[target].max(1e-10).ln(); |
| 44 | |
| 45 | // d_logits = (probs - one_hot(target)) / seq_len |
| 46 | let mut d_logits = probs; |
| 47 | d_logits[target] -= 1.0; |
| 48 | for v in d_logits.iter_mut() { |
| 49 | *v *= inv_seq_len; |
| 50 | } |
| 51 | |
| 52 | // --- Output projection backward: logits = res2 @ Wout --- |
| 53 | let res2 = cache.res2_at(t); |
| 54 | let d_res2 = vec_mat_mul_backward_x(&d_logits, wout(params), EMBED_DIM, VOCAB_SIZE); |
| 55 | vec_mat_mul_backward_w(res2, &d_logits, &mut grads[WOUT_OFFSET..], EMBED_DIM, VOCAB_SIZE); |
| 56 | |
| 57 | // --- FFN residual backward: res2 = res1 + ffn_out --- |
| 58 | // d_res1 and d_ffn_out both receive d_res2 |
| 59 | let mut d_res1 = d_res2.clone(); |
| 60 | let d_ffn_out = d_res2; |
| 61 | |
| 62 | // --- FFN down-projection backward: ffn_out = ffn_relu @ W2 --- |
| 63 | let ffn_relu = cache.ffn_relu_at(t); |
| 64 | let d_ffn_relu = vec_mat_mul_backward_x(&d_ffn_out, w2(params), FFN_DIM, EMBED_DIM); |
| 65 | vec_mat_mul_backward_w(ffn_relu, &d_ffn_out, &mut grads[W2_OFFSET..], FFN_DIM, EMBED_DIM); |
| 66 | |
| 67 | // --- ReLU backward --- |
| 68 | let ffn_hidden = cache.ffn_hidden_at(t); |
| 69 | let d_ffn_hidden: Vec<f32> = d_ffn_relu.iter().zip(ffn_hidden.iter()) |
| 70 | .map(|(&dg, &h)| if h > 0.0 { dg } else { 0.0 }) |
| 71 | .collect(); |
| 72 | |
| 73 | // --- FFN up-projection backward: ffn_hidden = res1 @ W1 --- |
| 74 | let res1 = cache.res1_at(t); |
| 75 | let d_res1_from_ffn = vec_mat_mul_backward_x(&d_ffn_hidden, w1(params), EMBED_DIM, FFN_DIM); |
| 76 | vec_mat_mul_backward_w(res1, &d_ffn_hidden, &mut grads[W1_OFFSET..], EMBED_DIM, FFN_DIM); |
| 77 | vec_add_inplace(&mut d_res1, &d_res1_from_ffn); |
| 78 | |
| 79 | // --- Attention residual backward: res1 = emb + proj --- |
| 80 | let d_proj = d_res1.clone(); |
| 81 | let mut d_emb = d_res1; // emb receives same gradient from residual |
| 82 | |
| 83 | // --- Output projection backward: proj = att_out @ Wo --- |
| 84 | let att_out = cache.att_out_at(t); |
| 85 | let d_att_out = vec_mat_mul_backward_x(&d_proj, wo(params), EMBED_DIM, EMBED_DIM); |
| 86 | vec_mat_mul_backward_w(att_out, &d_proj, &mut grads[WO_OFFSET..], EMBED_DIM, EMBED_DIM); |
| 87 | |
| 88 | // --- Multi-head attention backward --- |
| 89 | let mut d_q = vec![0.0f32; EMBED_DIM]; |
| 90 | |
| 91 | for h in 0..NUM_HEADS { |
| 92 | let ho = h * HEAD_DIM; |
| 93 | let d_out_h = &d_att_out[ho..ho + HEAD_DIM]; |
| 94 | let q_h = &cache.query_at(t)[ho..ho + HEAD_DIM]; |
| 95 | |
| 96 |             // out_h = sum_i probs[i] * v_cache_h[i], so each cached value row
| 97 |             // receives d_v_cache_h[i] += probs[i] * d_out_h.
| 98 |             for i in 0..=t {
| 99 |                 let prob = cache.att_prob(t, h, i);
| 100 |                 for d in 0..HEAD_DIM {
| 101 |                     d_v_cache[i * EMBED_DIM + ho + d] += prob * d_out_h[d];
| 102 |                 }
| 103 |             }
| 104 |
| 105 |             // Softmax backward: d_scores = probs * (d_probs - sum(probs * d_probs)),
| 106 |             // where d_probs[i] = dot(d_out_h, v_h[i]).
| 116 | let mut d_probs = vec![0.0f32; t + 1]; |
| 117 | for i in 0..=t { |
| 118 | let v_h = &cache.v_cache[i * EMBED_DIM + ho..i * EMBED_DIM + ho + HEAD_DIM]; |
| 119 | d_probs[i] = vec_dot(d_out_h, v_h); |
| 120 | } |
| 121 | |
| 122 | let mut dot_sum = 0.0f32; |
| 123 | for i in 0..=t { |
| 124 | dot_sum += cache.att_prob(t, h, i) * d_probs[i]; |
| 125 | } |
| 126 | |
| 127 | for i in 0..=t { |
| 128 | let prob = cache.att_prob(t, h, i); |
| 129 | let d_score = prob * (d_probs[i] - dot_sum) * scale; |
| 130 | |
| 131 | // scores[i] = dot(q_h, k_h[i]) * scale |
| 132 | // d_q_h += d_score * k_h[i] (scale already applied above) |
| 133 | let k_h = &cache.k_cache[i * EMBED_DIM + ho..i * EMBED_DIM + ho + HEAD_DIM]; |
| 134 | for d in 0..HEAD_DIM { |
| 135 | d_q[ho + d] += d_score * k_h[d]; |
| 136 | d_k_cache[i * EMBED_DIM + ho + d] += d_score * q_h[d]; |
| 137 | } |
| 138 | } |
| 139 | } |
| 140 | |
| 141 | // --- Q projection backward: q = emb @ Wq --- |
| 142 | let emb = cache.emb_at(t); |
| 143 | let d_emb_from_q = vec_mat_mul_backward_x(&d_q, wq(params), EMBED_DIM, EMBED_DIM); |
| 144 | vec_mat_mul_backward_w(emb, &d_q, &mut grads[WQ_OFFSET..], EMBED_DIM, EMBED_DIM); |
| 145 | vec_add_inplace(&mut d_emb, &d_emb_from_q); |
| 146 | |
| 147 | // --- K, V projection backward (d_k_cache[t] and d_v_cache[t] are now complete) --- |
| 148 | let d_k_t = &d_k_cache[t * EMBED_DIM..(t + 1) * EMBED_DIM]; |
| 149 | let d_emb_from_k = vec_mat_mul_backward_x(d_k_t, wk(params), EMBED_DIM, EMBED_DIM); |
| 150 | vec_mat_mul_backward_w(emb, d_k_t, &mut grads[WK_OFFSET..], EMBED_DIM, EMBED_DIM); |
| 151 | vec_add_inplace(&mut d_emb, &d_emb_from_k); |
| 152 | |
| 153 | let d_v_t = &d_v_cache[t * EMBED_DIM..(t + 1) * EMBED_DIM]; |
| 154 | let d_emb_from_v = vec_mat_mul_backward_x(d_v_t, wv(params), EMBED_DIM, EMBED_DIM); |
| 155 | vec_mat_mul_backward_w(emb, d_v_t, &mut grads[WV_OFFSET..], EMBED_DIM, EMBED_DIM); |
| 156 | vec_add_inplace(&mut d_emb, &d_emb_from_v); |
| 157 | |
| 158 | // --- Embedding backward: emb = wte[token] + wpe[pos] --- |
| 159 | let token = cache.tokens[t]; |
| 160 | let wte_start = WTE_OFFSET + token * EMBED_DIM; |
| 161 | for d in 0..EMBED_DIM { |
| 162 | grads[wte_start + d] += d_emb[d]; |
| 163 | } |
| 164 | let wpe_start = WPE_OFFSET + t * EMBED_DIM; |
| 165 | for d in 0..EMBED_DIM { |
| 166 | grads[wpe_start + d] += d_emb[d]; |
| 167 | } |
| 168 | } |
| 169 | |
| 170 | total_loss * inv_seq_len |
| 171 | } |
| 172 | |
| 173 | #[cfg(test)] |
| 174 | mod tests { |
| 175 | use super::*; |
| 176 | use crate::rng::Rng; |
| 177 | |
| 178 | /// Numerical gradient check: compare analytic gradients from backward() |
| 179 |     /// against finite-difference approximations for a strided subset of parameters.
| 180 | #[test] |
| 181 | fn test_gradient_check() { |
| 182 | let mut rng = Rng::new(42); |
| 183 | let mut params = vec![0.0f32; NUM_PARAMS]; |
| 184 | init_weights(&mut params, &mut rng); |
| 185 | |
| 186 | // Short sequence for faster checking |
| 187 | let input_tokens = vec![0, 5, 13]; // BOS, e, m |
| 188 | let targets = vec![5, 13, 0]; // e, m, BOS |
| 189 | |
| 190 | // Analytic gradients |
| 191 | let mut cache = ForwardCache::new(); |
| 192 | forward(¶ms, &input_tokens, &mut cache); |
| 193 | let mut grads = vec![0.0f32; NUM_PARAMS]; |
| 194 | let loss = backward(¶ms, &cache, &targets, &mut grads); |
| 195 | assert!(loss.is_finite()); |
| 196 | |
| 197 |         // Numerical gradients for a strided subset of parameters (every 17th)
| 198 | let eps = 1e-3; |
| 199 | let mut max_err = 0.0f32; |
| 200 | let mut checked = 0; |
| 201 | let check_indices: Vec<usize> = (0..NUM_PARAMS).step_by(17).collect(); |
| 202 | for &i in &check_indices { |
| 203 | let mut p_plus = params.clone(); |
| 204 | p_plus[i] += eps; |
| 205 | let mut c_plus = ForwardCache::new(); |
| 206 | forward(&p_plus, &input_tokens, &mut c_plus); |
| 207 | let mut g_dummy = vec![0.0f32; NUM_PARAMS]; |
| 208 | let loss_plus = backward(&p_plus, &c_plus, &targets, &mut g_dummy); |
| 209 | |
| 210 | let mut p_minus = params.clone(); |
| 211 | p_minus[i] -= eps; |
| 212 | let mut c_minus = ForwardCache::new(); |
| 213 | forward(&p_minus, &input_tokens, &mut c_minus); |
| 214 | let mut g_dummy2 = vec![0.0f32; NUM_PARAMS]; |
| 215 | let loss_minus = backward(&p_minus, &c_minus, &targets, &mut g_dummy2); |
| 216 | |
| 217 | let numerical = (loss_plus - loss_minus) / (2.0 * eps); |
| 218 | let analytic = grads[i]; |
| 219 | let err = (analytic - numerical).abs() / (analytic.abs() + numerical.abs() + 1e-8); |
| 220 | if err > max_err { |
| 221 | max_err = err; |
| 222 | } |
| 223 | if err > 0.01 { |
| 224 | panic!( |
| 225 | "Gradient check failed at param {}: analytic={:.6}, numerical={:.6}, rel_err={:.6}", |
| 226 | i, analytic, numerical, err |
| 227 | ); |
| 228 | } |
| 229 | checked += 1; |
| 230 | } |
| 231 | assert!(checked > 100, "checked too few params: {}", checked); |
| 232 | eprintln!("Gradient check passed: {} params checked, max relative error = {:.6}", checked, max_err); |
| 233 | } |
| 234 | } |
| 235 | |
| 1 | # esp32gpt |
| 2 | |
| 3 | A Rust port of Karpathy's [microgpt](https://karpathy.github.io/2026/02/12/microgpt/) that trains and runs inference **entirely on an ESP32**. |
| 4 | |
| 5 | The model learns to generate human-like names from scratch — no pre-trained weights, no cloud API, just 4,192 parameters training on a microcontroller. |
| 6 | |
| 7 | ``` |
| 8 | step 0/1000: loss = 3.3071 |
| 9 | step 100/1000: loss = 2.4193 |
| 10 | step 500/1000: loss = 1.9888 |
| 11 | step 999/1000: loss = 2.0980 |
| 12 | --- Generated names (temperature=0.8) --- |
| 13 | arona, raeli, cealin, malie, sunaya, arishel, mosile ... |
| 14 | ``` |
| 15 | |
| 16 | ## Architecture |
| 17 | |
| 18 | A 1-layer GPT transformer matching the original Python implementation: |
| 19 | |
| 20 | | | | |
| 21 | |---|---| |
| 22 | | Parameters | **4,192** (16.4 KB) | |
| 23 | | Embedding dim | 16 | |
| 24 | | Attention heads | 4 | |
| 25 | | Layers | 1 | |
| 26 | | Block size | 16 | |
| 27 | | Vocab | 27 tokens (a-z + BOS) | |
| 28 | | Optimizer | Adam (lr=0.01, beta1=0.85, beta2=0.99) | |
| 29 | | Training | 1,000 steps on 32K names | |
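The Adam update behind those hyperparameters, sketched in Rust (a sketch only; the project's `optimizer.rs` may differ in details):

```rust
// Bias-corrected Adam, matching the signature used in main.rs:
// Adam::new(n, beta1, beta2) and adam.step(&mut params, &grads, lr, t).
struct Adam {
    m: Vec<f32>, // first-moment (mean) estimate
    v: Vec<f32>, // second-moment (uncentered variance) estimate
    beta1: f32,
    beta2: f32,
}

impl Adam {
    fn new(n: usize, beta1: f32, beta2: f32) -> Self {
        Adam { m: vec![0.0; n], v: vec![0.0; n], beta1, beta2 }
    }

    /// `t` is the 1-based step count, used for bias correction.
    fn step(&mut self, params: &mut [f32], grads: &[f32], lr: f32, t: usize) {
        let eps = 1e-8;
        let bc1 = 1.0 - self.beta1.powi(t as i32);
        let bc2 = 1.0 - self.beta2.powi(t as i32);
        for i in 0..params.len() {
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * grads[i];
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * grads[i] * grads[i];
            let m_hat = self.m[i] / bc1; // undo EMA warm-up bias
            let v_hat = self.v[i] / bc2;
            params[i] -= lr * m_hat / (v_hat.sqrt() + eps);
        }
    }
}
```

With `beta1 = 0.85` the first-moment average decays faster than the common 0.9 default, so the momentum term tracks recent gradients more closely.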
| 30 | |
| 31 | ### Why not just port the autograd engine? |
| 32 | |
| 33 | The Python microgpt uses a scalar-level autograd (one graph node per multiply/add). For a single forward pass, this creates ~30K-50K nodes consuming 1-2 MB — more than the ESP32's entire 520 KB of SRAM. |
| 34 | |
| 35 | Instead, this port uses **explicit matrix-level forward and backward passes**, storing only the activations needed for backprop (~25 KB). The backward pass is hand-derived and verified against numerical gradients. |
| 36 | |
| 37 | ## Memory budget |
| 38 | |
| 39 | | | | |
| 40 | |---|---| |
| 41 | | Model parameters | 17 KB | |
| 42 | | Gradients | 17 KB | |
| 43 | | Adam state (m + v) | 34 KB | |
| 44 | | Activation cache | 25 KB | |
| 45 | | Dataset (in flash, not RAM) | 0 KB | |
| 46 | | **Total SRAM** | **~100 KB** of ~300 KB available | |
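The per-buffer figures fall straight out of 4-byte `f32`s (`buffer_kb` below is just illustrative arithmetic, not project code):

```rust
// Where the SRAM figures come from: every big buffer holds NUM_PARAMS f32s
// (or two of them, in Adam's case).
pub const NUM_PARAMS: usize = 4192;

/// Size in KB of a buffer holding `n` f32 values.
pub fn buffer_kb(n: usize) -> f32 {
    (n * std::mem::size_of::<f32>()) as f32 / 1024.0
}

// buffer_kb(NUM_PARAMS)     -> 16.375 KB  (params; grads are the same size)
// buffer_kb(2 * NUM_PARAMS) -> 32.75 KB   (Adam m + v)
```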
| 47 | |
| 48 | ## Project structure |
| 49 | |
| 50 | ``` |
| 51 | src/ |
| 52 | main.rs Training loop + inference entry point |
| 53 | model.rs GPT forward pass, parameter layout, KV cache |
| 54 | backward.rs Manual backward pass with gradient accumulation |
| 55 | optimizer.rs Adam optimizer |
| 56 | tensor.rs Vector-matrix math primitives |
| 57 | tokenizer.rs Character-level encode/decode (a-z + BOS) |
| 58 | rng.rs Xorshift32 PRNG + Box-Muller for Gaussian init |
| 59 | data/ |
| 60 | names.txt 32K training names (embedded in flash via include_str!) |
| 61 | ``` |
| 62 | |
| 63 | ~1,000 lines of Rust. All core logic is platform-independent and testable on the host. |
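The two pieces of `rng.rs` can be sketched like this (illustrative; the real file may differ in details):

```rust
// Sketch of the PRNG: Xorshift32 for uniforms, Box-Muller for Gaussians.
struct Rng {
    state: u32,
}

impl Rng {
    fn new(seed: u32) -> Self {
        // Xorshift state must be nonzero or the generator gets stuck at 0.
        Rng { state: seed.max(1) }
    }

    /// Xorshift32: three shift-xor steps; full period over 2^32 - 1 states.
    fn next_u32(&mut self) -> u32 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.state = x;
        x
    }

    /// Uniform draw in (0, 1]; never exactly 0, which keeps ln() finite below.
    fn uniform(&mut self) -> f32 {
        (self.next_u32() as f64 / u32::MAX as f64) as f32
    }

    /// Box-Muller: two uniforms -> one standard-normal sample.
    fn gaussian(&mut self) -> f32 {
        let u1 = self.uniform().max(1e-7);
        let u2 = self.uniform();
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f32::consts::PI * u2).cos()
    }
}
```

A 32-bit xorshift keeps the PRNG state to a single register, which is all the Gaussian init and sampling need.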
| 64 | |
| 65 | ## Prerequisites |
| 66 | |
| 67 | - [Rust ESP32 toolchain](https://docs.esp-rs.org/book/) (`espup install`) |
| 68 | - `espflash` for flashing (`cargo install espflash`) |
| 69 | - An ESP32 dev board (any ESP32-WROOM-32 variant) |
| 70 | |
| 71 | ## Usage |
| 72 | |
| 73 | ```bash |
| 74 | # Run tests on host |
| 75 | make test |
| 76 | |
| 77 | # Build for ESP32 |
| 78 | make build |
| 79 | |
| 80 | # Flash and monitor serial output |
| 81 | make flash |
| 82 | |
| 83 | # Just monitor (already flashed) |
| 84 | make monitor |
| 85 | ``` |
| 86 | |
| 87 | ## Running on host (no ESP32 needed) |
| 88 | |
| 89 | The project compiles and runs natively for development: |
| 90 | |
| 91 | ```bash |
| 92 | RUST_LOG=info RUSTUP_TOOLCHAIN=stable cargo run --target aarch64-apple-darwin |
| 93 | ``` |
| 94 | |
| 95 | ## How it works |
| 96 | |
| 97 | **Training:** For each of 1,000 steps, a random name is sampled from the dataset, encoded as tokens, and fed through the transformer. The cross-entropy loss is backpropagated through every operation — attention, FFN, embeddings — and Adam updates the weights. |
| 98 | |
| 99 | **Inference:** Starting from the BOS token, the model autoregressively samples one character at a time (with temperature scaling) until it produces another BOS or hits the block size limit. |
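Temperature sampling can be sketched as follows (illustrative only; the actual logic lives in `model::generate`, and `sample` here is a hypothetical helper that takes the uniform draw as an argument to stay deterministic):

```rust
// Sample a token index from logits at a given temperature.
// `uniform` is a random draw in [0, 1) supplied by the caller's RNG.
fn sample(logits: &[f32], temperature: f32, uniform: f32) -> usize {
    // Scale logits, then softmax with max-subtraction for numerical stability.
    let scaled: Vec<f32> = logits.iter().map(|&l| l / temperature).collect();
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|&s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();

    // Walk the cumulative distribution until it passes the uniform draw.
    let mut acc = 0.0f32;
    for (i, &e) in exps.iter().enumerate() {
        acc += e / sum;
        if uniform < acc {
            return i;
        }
    }
    exps.len() - 1 // guard against rounding at the top of the CDF
}
```

Lower temperatures sharpen the distribution toward the argmax token; higher temperatures flatten it toward uniform.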
| 100 | |
| 101 | **The hard part** is the attention backward pass: position *t*'s query attends to all keys/values at positions 0 through *t*, so key and value gradients accumulate contributions from every later position. Processing backward through the sequence ensures each position's KV gradients are complete before they're used.
| 1 | # esp32gpt |
| 2 | |
| 3 | Rust port of Karpathy's microgpt — a 1-layer GPT transformer (4,192 params) that trains from scratch and generates names, running entirely on an ESP32. |
| 4 | |
| 5 | ## Build & Run |
| 6 | |
| 7 | ```bash |
| 8 | make test # Host unit tests (stable toolchain, aarch64-apple-darwin) |
| 9 | make build # Cross-compile for ESP32 (esp toolchain, xtensa-esp32-espidf) |
| 10 | make flash # Build + flash + serial monitor |
| 11 | make monitor # Serial monitor only |
| 12 | ``` |
| 13 | |
| 14 | Host development (no ESP32 needed): |
| 15 | ```bash |
| 16 | RUST_LOG=info RUSTUP_TOOLCHAIN=stable cargo run --target aarch64-apple-darwin |
| 17 | ``` |
| 18 | |
| 19 | ## Architecture |
| 20 | |
| 21 | ### Dual-target design |
| 22 | The codebase compiles for both ESP32 and desktop. All ML logic (tensor, model, backward, optimizer, tokenizer, rng) is platform-independent. Only `main.rs` has `#[cfg(target_arch = "xtensa")]` blocks for ESP-IDF initialization vs `env_logger`. |
| 23 | |
| 24 | ### Module responsibilities |
| 25 | - `model.rs` — Parameter layout (flat `Vec<f32>`, 4192 elements), forward pass with KV cache, generation |
| 26 | - `backward.rs` — Manual backward pass. Processes positions in reverse; key/value gradients accumulate across positions |
| 27 | - `tensor.rs` — `vec_mat_mul`, `vec_mat_mul_backward_x/w`, `softmax`, `relu`, `vec_add` |
| 28 | - `optimizer.rs` — Adam with bias correction |
| 29 | - `tokenizer.rs` — 27-token vocab: BOS=0, a=1..z=26 |
| 30 | - `rng.rs` — Xorshift32 + Box-Muller for Gaussian init |
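The tokenizer bullet corresponds to roughly this (sketch; the real `tokenizer.rs` may differ):

```rust
// Character-level tokenizer sketch: BOS = 0, 'a' = 1 ... 'z' = 26.
pub const VOCAB_SIZE: usize = 27;
pub const BOS: usize = 0;

/// Wrap a lowercase name in BOS tokens: "emma" -> [0, 5, 13, 13, 1, 0].
pub fn encode(name: &str) -> Vec<usize> {
    let mut tokens = vec![BOS];
    tokens.extend(
        name.bytes()
            .filter(|b| b.is_ascii_lowercase())
            .map(|b| (b - b'a') as usize + 1),
    );
    tokens.push(BOS);
    tokens
}

/// Drop BOS markers and map token indices back to characters.
pub fn decode(tokens: &[usize]) -> String {
    tokens
        .iter()
        .filter(|&&t| t != BOS)
        .map(|&t| (b'a' + (t - 1) as u8) as char)
        .collect()
}
```

The BOS token does double duty: it starts generation and, when sampled again, terminates the name.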
| 31 | |
| 32 | ### Parameter layout (flat buffer, row-major) |
| 33 | ``` |
| 34 | WTE [0..432) 27×16 token embeddings |
| 35 | WPE [432..688) 16×16 position embeddings |
| 36 | WQ [688..944) 16×16 query projection |
| 37 | WK [944..1200) 16×16 key projection |
| 38 | WV [1200..1456) 16×16 value projection |
| 39 | WO [1456..1712) 16×16 output projection |
| 40 | W1 [1712..2736) 16×64 FFN up |
| 41 | W2 [2736..3760) 64×16 FFN down |
| 42 | WOUT [3760..4192) 16×27 output head |
| 43 | ``` |
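Expressed as constants, the layout above is (values follow directly from the table; the `*_OFFSET` names mirror those used in `backward.rs`):

```rust
// Parameter-buffer layout as offset constants, derived from the table above.
pub const VOCAB_SIZE: usize = 27;
pub const EMBED_DIM: usize = 16;
pub const BLOCK_SIZE: usize = 16;
pub const FFN_DIM: usize = 64;

pub const WTE_OFFSET: usize = 0;
pub const WPE_OFFSET: usize = WTE_OFFSET + VOCAB_SIZE * EMBED_DIM;  // 432
pub const WQ_OFFSET: usize = WPE_OFFSET + BLOCK_SIZE * EMBED_DIM;   // 688
pub const WK_OFFSET: usize = WQ_OFFSET + EMBED_DIM * EMBED_DIM;     // 944
pub const WV_OFFSET: usize = WK_OFFSET + EMBED_DIM * EMBED_DIM;     // 1200
pub const WO_OFFSET: usize = WV_OFFSET + EMBED_DIM * EMBED_DIM;     // 1456
pub const W1_OFFSET: usize = WO_OFFSET + EMBED_DIM * EMBED_DIM;     // 1712
pub const W2_OFFSET: usize = W1_OFFSET + EMBED_DIM * FFN_DIM;       // 2736
pub const WOUT_OFFSET: usize = W2_OFFSET + FFN_DIM * EMBED_DIM;     // 3760
pub const NUM_PARAMS: usize = WOUT_OFFSET + EMBED_DIM * VOCAB_SIZE; // 4192
```

Deriving each offset from the previous one keeps the layout self-consistent if a dimension ever changes.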
| 44 | |
| 45 | ### Forward pass flow (per position t) |
| 46 | 1. `emb = wte[token] + wpe[t]` |
| 47 | 2. `q = emb @ Wq`, `k = emb @ Wk` (append to cache), `v = emb @ Wv` (append to cache) |
| 48 | 3. Multi-head attention: split into 4 heads, causal softmax, weighted value sum |
| 49 | 4. `proj = att_out @ Wo`, `res1 = emb + proj` |
| 50 | 5. `ffn = relu(res1 @ W1) @ W2`, `res2 = res1 + ffn` |
| 51 | 6. `logits = res2 @ Wout` |
| 52 | |
| 53 | No layer norm, no bias terms. |
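A host-side sketch of one position of this flow (illustrative only: helper names follow the conventions above, but this is not the project's `model.rs`):

```rust
const EMBED_DIM: usize = 16;
const NUM_HEADS: usize = 4;
const HEAD_DIM: usize = EMBED_DIM / NUM_HEADS;
const FFN_DIM: usize = 64;
const VOCAB: usize = 27;

/// y = x @ W, with W stored row-major as [n x m].
fn vec_mat_mul(x: &[f32], w: &[f32], n: usize, m: usize) -> Vec<f32> {
    let mut y = vec![0.0f32; m];
    for i in 0..n {
        for j in 0..m {
            y[j] += x[i] * w[i * m + j];
        }
    }
    y
}

fn softmax(x: &[f32]) -> Vec<f32> {
    let max = x.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let e: Vec<f32> = x.iter().map(|&v| (v - max).exp()).collect();
    let s: f32 = e.iter().sum();
    e.iter().map(|v| v / s).collect()
}

/// Steps 2-6 for position t; step 1's embedding sum is passed in as `emb`.
/// k_cache/v_cache hold one EMBED_DIM row per position seen so far.
fn forward_pos(
    emb: &[f32], t: usize,
    k_cache: &mut Vec<f32>, v_cache: &mut Vec<f32>,
    wq: &[f32], wk: &[f32], wv: &[f32], wo: &[f32],
    w1: &[f32], w2: &[f32], wout: &[f32],
) -> Vec<f32> {
    // 2. Projections; keys/values are appended to the cache.
    let q = vec_mat_mul(emb, wq, EMBED_DIM, EMBED_DIM);
    k_cache.extend(vec_mat_mul(emb, wk, EMBED_DIM, EMBED_DIM));
    v_cache.extend(vec_mat_mul(emb, wv, EMBED_DIM, EMBED_DIM));

    // 3. Causal multi-head attention over positions 0..=t.
    let scale = 1.0 / (HEAD_DIM as f32).sqrt();
    let mut att_out = vec![0.0f32; EMBED_DIM];
    for h in 0..NUM_HEADS {
        let ho = h * HEAD_DIM;
        let scores: Vec<f32> = (0..=t)
            .map(|i| {
                let k = &k_cache[i * EMBED_DIM + ho..i * EMBED_DIM + ho + HEAD_DIM];
                let dot: f32 = q[ho..ho + HEAD_DIM].iter().zip(k.iter()).map(|(a, b)| a * b).sum();
                dot * scale
            })
            .collect();
        let probs = softmax(&scores);
        for i in 0..=t {
            let v = &v_cache[i * EMBED_DIM + ho..i * EMBED_DIM + ho + HEAD_DIM];
            for d in 0..HEAD_DIM {
                att_out[ho + d] += probs[i] * v[d];
            }
        }
    }

    // 4. Output projection + residual.
    let proj = vec_mat_mul(&att_out, wo, EMBED_DIM, EMBED_DIM);
    let res1: Vec<f32> = emb.iter().zip(&proj).map(|(a, b)| a + b).collect();

    // 5. FFN with ReLU + residual.
    let hidden: Vec<f32> = vec_mat_mul(&res1, w1, EMBED_DIM, FFN_DIM)
        .into_iter().map(|v| v.max(0.0)).collect();
    let ffn = vec_mat_mul(&hidden, w2, FFN_DIM, EMBED_DIM);
    let res2: Vec<f32> = res1.iter().zip(&ffn).map(|(a, b)| a + b).collect();

    // 6. Logits.
    vec_mat_mul(&res2, wout, EMBED_DIM, VOCAB)
}
```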
| 54 | |
| 55 | ### Memory budget (~300KB SRAM available with WiFi disabled) |
| 56 | - Params + grads + Adam state: ~68 KB |
| 57 | - Activation cache: ~25 KB |
| 58 | - Dataset stays in flash (`include_str!`), not RAM |
| 59 | |
| 60 | ## ESP32 toolchain |
| 61 | |
| 62 | - **Target**: `xtensa-esp32-espidf` |
| 63 | - **ESP-IDF**: v5.3 (managed by embuild) |
| 64 | - **Toolchain**: `esp` rustup toolchain (not nightly) |
| 65 | - **Linker**: `ldproxy` |
| 66 | - **Runner**: `espflash flash --monitor` |
| 67 | - **build-std**: `["std", "panic_abort"]` |
| 68 | - **opt-level**: 2 for both dev and release (required for ESP-IDF) |
| 69 | - WiFi is disabled in `sdkconfig.defaults` to free SRAM |
| 70 | - Main task stack: 32KB (for matrix ops) |
| 71 | - Single factory partition, no OTA |
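Those settings typically live in `.cargo/config.toml`; a plausible sketch (not the project's exact file — the `[env]` entries in particular are assumptions):

```toml
[build]
target = "xtensa-esp32-espidf"

[target.xtensa-esp32-espidf]
linker = "ldproxy"
runner = "espflash flash --monitor"

[unstable]
build-std = ["std", "panic_abort"]

[env]
MCU = "esp32"
ESP_IDF_VERSION = "v5.3"
```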
| 72 | |
| 73 | ## Testing |
| 74 | |
| 75 | `make test` runs 14 unit tests on the host: |
| 76 | - Tensor ops + numerical gradient check |
| 77 | - Forward pass smoke test |
| 78 | - Backward pass gradient verification (relative-error check with a small absolute floor, tolerance 0.01)
| 79 | - Tokenizer encode/decode roundtrip |
| 80 | - RNG distribution checks |
| 81 | - Adam convergence test |
| 82 | - Generation smoke test |
| 83 | |
| 84 | The gradient check perturbs each parameter by `eps = 1e-3` and allows relative errors up to 0.01: at f32 precision (especially at opt-level=2), smaller perturbations push the loss deltas toward machine epsilon and the finite differences become noise.
| 85 | |
| 86 | ## Conventions |
| 87 | |
| 88 | - Weight matrices are stored flat in row-major order: `M[i][j] = slice[i * cols + j]` |
| 89 | - `vec_mat_mul(x, w, n, m)` computes `y = x @ W` where x:[n], W:[n×m], y:[m] |
| 90 | - Gradient accumulation passes open-ended slices (`&mut grads[OFFSET..]`); each `vec_mat_mul_backward_w` call writes only the first `n * m` elements of the slice it receives
| 91 | - The `ForwardCache` is pre-allocated once and reused via `cache.clear()` to avoid heap fragmentation |