Blog

Technical deep dives into GPU systems, AI infrastructure, LLM optimization, and vision model engineering.

Personal Blogs

Deep dives on GPU inference internals, LLM performance tradeoffs, and the systems design decisions that shape modern AI workloads.

Overview of SAM 2.1 and how to run segmentation workflows in SageMaker JumpStart.

A breakdown of the Llama 4 model suite and its deployment patterns.

End-to-end RAG pipeline using embeddings, retrieval, and Llama 3 inference.

How to train Llama 2 using Trainium chips on SageMaker at scale.

Applying Llama 3 multimodal models for OCR, VQA, image reasoning, and more.

Architectures for scalable and production-grade RAG pipelines.

Launch optimized Neuron environments for training and inference workloads.