hamza

Writing

Thoughts on technology, creativity, and the things I'm learning along the way.

SalesBench: The Long-Horizon Agent-to-Agent Eval

May 14, 2026

A long-horizon RL environment where a small model learns to manage an insurance sales pipeline against an LLM buyer, scored by revenue closed instead of by an LLM judge. The trained model vastly outperforms the untrained base, and the gap widens as the eval gets harder.

12 min read

The Agent Research Loop

Mar 16, 2026

What Karpathy's autoresearch really means, where agent systems are headed, and an open-source harness that ran 550 experiments over a weekend.

8 min read

I Run a Personal AI Agent 24/7 on a Mac Mini. Here's How It Actually Works.

Mar 7, 2026

A Mac Mini, some markdown files, and seven communication channels. Inside the setup that gives me a 24/7 AI assistant that monitors my email, iMessage, WhatsApp, and Twitter, and actually does useful things.

12 min read

I Let AI Agents Train Their Own Models. Here's What Actually Happened.

Feb 8, 2026

Two frontier agents, a pile of bugs, and a reality check on the future of autonomous AI research.

7 min read

© 2026 Hamza Mostafa

Minimal, on purpose.