Building High-Performance AI Systems & GPU-Accelerated Pipelines

Cloud AI Architect specializing in LLM optimization, distributed inference, multimodal vision systems, and GPU-accelerated ML infrastructure.

Interactive GPU Inference Simulator

Explore how GPUs accelerate large language model inference compared to CPUs. Adjust model size, batch size, GPU type, and inference mode to see how parallelism affects throughput and latency.

This simulator is intentionally simplified to illustrate inference dynamics (parallelism, batching, and latency trends), not to model cycle-accurate GPU performance.
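
For a sense of the arithmetic behind those trends, here is a minimal first-order sketch in TypeScript. Everything in it is an illustrative assumption, not the simulator's actual code: the hardware figures are round numbers, prefill cost is approximated as two FLOPs per parameter per token, and names like estimateInference are invented for this sketch.

// First-order inference-time model (illustrative assumptions only).

interface Hardware {
  flops: number;     // peak compute, FLOP/s
  bandwidth: number; // memory bandwidth, bytes/s
}

// Assumed round numbers for a CPU and a data-center GPU.
const CPU: Hardware = { flops: 1e12, bandwidth: 100e9 };
const GPU: Hardware = { flops: 300e12, bandwidth: 2e12 };

interface Workload {
  params: number;        // model parameters
  bytesPerParam: number; // 2 for fp16/bf16 weights
  promptTokens: number;
  outputTokens: number;
  batchSize: number;
}

// Prefill is compute-bound: ~2 FLOPs per parameter per prompt token,
// processed in parallel. Decode is memory-bound: each generated token
// re-reads the full weights, and that single weight pass serves every
// sequence in the batch, which is how batching amortizes the cost.
function estimateInference(hw: Hardware, w: Workload) {
  const weightBytes = w.params * w.bytesPerParam;
  const prefillSec = (2 * w.params * w.promptTokens * w.batchSize) / hw.flops;
  const decodeSec = (weightBytes / hw.bandwidth) * w.outputTokens;
  const totalSec = prefillSec + decodeSec;
  const tokensPerSec = (w.outputTokens * w.batchSize) / totalSec;
  return { prefillSec, decodeSec, tokensPerSec };
}

// Example: a 7B-parameter model in fp16, 512-token prompt, 128 output tokens.
const workload: Workload = {
  params: 7e9, bytesPerParam: 2,
  promptTokens: 512, outputTokens: 128, batchSize: 8,
};
console.log("CPU:", estimateInference(CPU, workload));
console.log("GPU:", estimateInference(GPU, workload));

Under these assumed numbers, the GPU's prefill finishes in a fraction of a second where the CPU takes most of a minute, while decode throughput is capped by memory bandwidth on both: the same trends the simulator animates.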

Simulator views: CPU (Serial Processing) vs. GPU (Compute-Heavy Prefill)

Color indicates inference phase: emerald for compute-heavy prefill, blue for memory-bound decode. GPU blocks represent conceptual parallel execution.
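
A quick back-of-envelope shows why decode is drawn as memory-bound. The figures below (7B parameters in fp16, ~2 TB/s of HBM bandwidth) are assumed round numbers, not measurements:

// Each decode step re-reads the full model weights, so per-token latency
// is floored by weights / memory bandwidth. Assumed round numbers only.
const weightsGB = (7e9 * 2) / 1e9;                     // 7B params × 2 bytes (fp16) ≈ 14 GB
const bandwidthGBs = 2000;                             // ~2 TB/s HBM (assumed)
const msPerToken = (weightsGB / bandwidthGBs) * 1000;  // ≈ 7 ms per decode step
console.log(`≈${msPerToken.toFixed(1)} ms/token, ≈${Math.round(1000 / msPerToken)} tokens/s per stream`);

This is also why batching raises aggregate throughput without lowering per-token latency: one weight pass per step can serve every sequence in the batch.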