Projects

Deep-dive work focused on GPU-accelerated inference, distributed serving architectures, and performance-critical AI systems.

ML Systems & GPU Inference

Hands-on system design and performance analysis of modern LLM inference pipelines, focusing on latency, throughput, memory behavior, and scaling limits.

Open-Source & Production Contributions

Production deployments, reference architectures, and open-source work supporting large-scale GenAI and cloud workloads.