Akshay's Expedition Logs

Deep dives into distributed systems, ML infrastructure, and the architectures that make AI at scale actually work

Featured

11 min read•2026-07-04

Harness Engineering: The Model Is Half the Car

In 2026, F1 made the engine only half the car. AI had the same year: the model stopped being the differentiator. Harness engineering is building the other half.

Codex

10 min read•2026-04-27

Eight Friends at the Buffet

One craving per word isn't enough. Multi-head attention sends eight specialized versions of every word to eight parallel buffets — and the whole upgrade is free, parameter for parameter, with one of the most beautiful tricks in the architecture.

Codex

9 min read•2026-04-19

The Seat Ribbons

Self-attention treats a sentence as a bag of words with no order. The fix — the one that powered the original Transformer — was to blend a unique positional pattern into every word before the buffet.

Codex

14 min read•2026-04-18

Attention is a Potluck

A tactile, phone-readable walkthrough of the single mechanism that makes every modern LLM actually understand language — no matrices first, no jargon, no mush.
