Writing
Technical Notes & Published Work
A curated index of writing on AI systems, model deployment, vision workloads, RAG, and accelerator-backed cloud infrastructure.
Personal Deep Dives
Coming soon: concise breakdowns of GPU inference bottlenecks, ML systems tradeoffs, and cloud-native deployment patterns from my own projects.
Focus Areas
GPU Inference, ML Systems, Cloud Infra, RAG, Multimodal
AWS Published Blogs
Selected external publications on applied AI workloads.
Meta SAM 2.1 in SageMaker JumpStart
Vision: Segmentation workflows and model access through JumpStart.
Llama 4 in SageMaker JumpStart
LLMs: Model family overview and deployment patterns.
RAG QnA with Llama 3
RAG: Retrieval, embeddings, and Llama-based answer generation.
Training Llama 2 with AWS Trainium
Training: Large-model training using Trainium on SageMaker.
Vision use cases with Llama 3.2
Multimodal: OCR, VQA, image reasoning, and multimodal workflows.
Advanced RAG patterns on SageMaker
RAG: Architecture patterns for scalable RAG systems.
Trainium and Inferentia with Neuron
Accelerators: Optimized Neuron environments for training and inference.