Software engineer building systems that catch failures before production.
Backend · Reliability · Developer Tooling · AI Infrastructure. Production backend work at Thales Group; 4 merged fixes to the Temporal Go SDK.
Live Cloud Run system that detects unsafe AI outputs, blocks them with eval gates, and converts failures into AutoOps incident intelligence.
Detected recurring CI failures and blocked unreliable releases: grouped failures into incident families and generated hold/ship decisions with 0.91 confidence.
hold_release / investigate with confidence scores
Stale workers commit after losing ownership. Lease expiry stops new claims; it doesn't stop an old worker from writing late. Fencing tokens fix the write boundary at the database, not the application.
safe_to_operate=false
4 merged PRs in the Temporal Go SDK + 2 Azure SDK PRs in review.
Looking for backend, platform, QA automation, or reliability engineering roles. I build systems that prevent failures before production.
New grad · Dec 2025 · Open to relocation