Kriti Behl
New-grad software engineer building backend and distributed systems that stay correct under failure. Built production backend systems at Thales Group, contributed merged fixes to the Temporal Go SDK, and built proof-heavy projects: 0 duplicate commits across 1,500 fault-injected race reproductions, and resilience checks that catch unsafe behavior even when probes still report healthy.
Projects
These projects are built to show not just that systems work, but what happens when they fail, drift, recover, or become unsafe in ways surface health checks can miss.
- PostgreSQL: FOR UPDATE SKIP LOCKED, lease-based ownership, fencing tokens, UNIQUE(job_id, fencing_token) DB-enforced idempotency, bounded exponential backoff
- FaultProxy wraps psycopg2 to inject latency, drops, and timeouts: 0 duplicate commits and 1,500 stale-write rejections across 1,500 race reproductions at 0%, 5%, and 10% fault rates
- 29-assertion drill suite across 16 failure scenarios · 11 Prometheus metrics · 12 automated tests
- YAML disruption scenarios (CPU stress, pod kills, network partition) with baseline-vs-observed comparison and composite resilience scorecards
- Readiness false-positive detection: surfaces cases where probes report healthy while service metrics still show degradation
- CPU-stress validated: 8s recovery · ~210ms p95 · ~2% error rate · resilience score 86/100
- 11 failure families · config/rules.yaml drives detection patterns, ownership hints, and remediation; no backend code changes required
- Admin audit log (rule ID, actor, timestamp, before/after state) · python cli.py replay <incident_id> for repeatable triage
- 11 FastAPI endpoints · 5 Prometheus counters · 16 passing tests · runbook in docs/runbook.md
- Deterministically isolated first divergence at event index 5 across a 20-event trace · preserves 4 artifacts per run
- Swift companion (DetTraceAnalyzer): async/await, actor-isolated AnalysisStore, JSON + Markdown reports · 3 passing tests
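
A first-divergence check like the one described above could look something like this. It is a sketch under assumptions: traces are modeled as plain Python lists of comparable events, and first_divergence is a hypothetical name rather than the tool's actual API.

```python
from itertools import zip_longest

# Sentinel so that a shorter trace diverges at the first missing event.
_MISSING = object()


def first_divergence(expected, observed):
    """Return the index of the first event that differs, or None if the traces match."""
    pairs = zip_longest(expected, observed, fillvalue=_MISSING)
    for index, (want, got) in enumerate(pairs):
        if want != got:
            return index
    return None
```

Because the scan is a single ordered pass over both traces, the same pair of inputs always yields the same index, which is what makes the result repeatable across runs.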
- Pipeline: cases.jsonl → DistilBERT inference → RAG-overlap or classification-label scorer → runs/ → reports/ → compare/ → gate/
- Release gate blocks on avg score drop / pass-rate drop / per-case regressions · validated by test_real_model_regression_gate.py
- FastAPI: POST /evaluate, /compare, /gate · full CLI · Dockerfile · 11 tests
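
The three release-gate conditions above (average score drop, pass-rate drop, per-case regressions) might be combined along these lines. This is an illustrative sketch only: the function name, the thresholds, and the assumption that scores are floats in [0, 1] keyed by case ID are all hypothetical and not taken from the project.

```python
from dataclasses import dataclass, field


@dataclass
class GateResult:
    passed: bool
    reasons: list = field(default_factory=list)


def release_gate(baseline, candidate, *, max_avg_drop=0.02,
                 max_pass_rate_drop=0.02, max_case_drop=0.10,
                 pass_threshold=0.5):
    """Compare two runs (case_id -> score) and block on any configured regression."""
    shared = sorted(set(baseline) & set(candidate))
    if not shared:
        raise ValueError("no shared case IDs to compare")

    def mean(scores):
        return sum(scores[c] for c in shared) / len(shared)

    def pass_rate(scores):
        return sum(scores[c] >= pass_threshold for c in shared) / len(shared)

    reasons = []
    avg_drop = mean(baseline) - mean(candidate)
    if avg_drop > max_avg_drop:
        reasons.append(f"avg score dropped by {avg_drop:.3f}")
    rate_drop = pass_rate(baseline) - pass_rate(candidate)
    if rate_drop > max_pass_rate_drop:
        reasons.append(f"pass rate dropped by {rate_drop:.3f}")
    regressed = [c for c in shared if baseline[c] - candidate[c] > max_case_drop]
    if regressed:
        reasons.append("per-case regressions: " + ", ".join(regressed))
    return GateResult(passed=not reasons, reasons=reasons)
```

Returning the reasons alongside the pass/fail flag keeps a blocked release explainable, since each failed condition is reported separately.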
Open Source Impact
Open-source contributions to production systems SDKs, including 2 merged PRs and 1 open PR in the Temporal Go SDK, plus 2 PRs under review in the Azure Go SDK.
- OnWorkflow matchers see the same headers as real workflow execution.
- doneChannel. Added idempotent closure with sync.Once and a regression test that fails without the fix and passes with it.
- realClose() transport failures with request errors using errors.Join so callers can inspect retry-path failures instead of losing them silently.
- traceparent and tracestate propagation via OpenTelemetry propagators and validated header injection with tests.
Skills & Stack
Experience
- Built a PostgreSQL-backed backend analytics service processing ~100k state-transition records per run, giving operations teams real-time visibility into resource utilization across distributed pools.
- Designed deterministic state-resolution logic and timestamp-delta aggregation over historical event logs to compute configurable utilization metrics across 24-hour to 30-day reporting windows.
- Built REST APIs and operational dashboards for resource- and group-level efficiency reporting, enabling capacity planners to identify underutilized resources and optimization opportunities without affecting live request-processing paths.
- Built backend REST services on AWS using Node.js and Express, strengthening API behavior with improved input validation and structured error handling.
- Optimized database query execution plans and indexing, reducing endpoint latency by ~15–25% in performance tests.
- Built Java backend modules for procurement workflows with transactional safeguards and log-recovery simulation for safer pre-production behavior.
Selected Writing
Contact
New grad, Dec 2025 · Open to relocation.