Building High-Performance AI Systems & GPU-Accelerated Pipelines
AI Infrastructure Engineer focused on understanding GPU performance, ML systems, and cloud infrastructure for modern AI workloads.
Interactive GPU Inference Simulator
Explore how GPUs accelerate large language model inference compared to CPUs. Adjust model size, batch size, GPU type, and inference mode to see how VRAM, compute, and memory bandwidth change the bottleneck.
This simulator is intentionally simplified to illustrate inference dynamics (parallelism, batching, and latency trends), not to model cycle-accurate GPU performance.
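One way to picture how compute and memory bandwidth trade off is a roofline-style comparison. The sketch below is illustrative and is not the simulator's actual code. It assumes roughly 2 FLOPs per parameter per token, that the weights are read from memory once per forward step, and that KV-cache and activation traffic can be ignored. The function names and hardware numbers are hypothetical placeholders, not the specifications of any particular GPU.

```python
def step_cost(params, tokens, bytes_per_param=2.0):
    """Rough cost of one forward step over `tokens` tokens.

    Assumptions (simplified): ~2 FLOPs per parameter per token, and the
    weights are streamed from memory once per step. KV-cache reads and
    activation traffic are ignored.
    """
    flops = 2.0 * params * tokens
    bytes_moved = params * bytes_per_param
    return flops, bytes_moved


def classify_bottleneck(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Compare time spent computing with time spent moving bytes."""
    compute_time = flops / peak_flops
    memory_time = bytes_moved / peak_bandwidth
    label = "compute-bound" if compute_time >= memory_time else "memory-bound"
    return label, max(compute_time, memory_time)


# Placeholder hardware numbers chosen for illustration only.
PEAK_FLOPS = 100e12      # FLOP/s
PEAK_BANDWIDTH = 600e9   # bytes/s
PARAMS = 7e9             # assumed 7B-parameter model in FP16

# Prefill processes the whole prompt in one step (here 512 tokens).
prefill = classify_bottleneck(*step_cost(PARAMS, 512), PEAK_FLOPS, PEAK_BANDWIDTH)

# Decode produces one token per sequence per step (batch size 1 here).
decode = classify_bottleneck(*step_cost(PARAMS, 1), PEAK_FLOPS, PEAK_BANDWIDTH)

# A large decode batch reuses each weight read across many sequences.
batched_decode = classify_bottleneck(*step_cost(PARAMS, 256), PEAK_FLOPS, PEAK_BANDWIDTH)

print(prefill[0], decode[0], batched_decode[0])
```

Under these assumptions, prefill and large-batch decode come out compute-bound while single-sequence decode is memory-bound, which is the trend the batch size and inference mode controls are meant to show.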
Estimated footprint
14.1 GB
14.0 GB weights + 0.1 GB KV cache
Selected GPU
A10G
24 GB VRAM (fits)
Active bottleneck
Compute-bound prefill
Tensor Core throughput
Emerald shows compute-heavy prefill; blue shows memory-bound decode. If the footprint exceeds VRAM, the simulation adds a penalty to represent the cost of fitting the model through quantization, tensor parallelism, or CPU offload.
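The footprint readout and the VRAM fit check can be approximated with a few lines of arithmetic. This is a minimal sketch rather than the simulator's own formula, which is not given here. The model shape (7B parameters, 32 layers, 32 KV heads, head dimension 128), the FP16 precision, and the 256-token context are all assumed values, and the helper names are hypothetical.

```python
GB = 1e9  # decimal gigabytes


def weights_gb(params, bytes_per_param=2.0):
    """Memory for the model weights (FP16 assumed: 2 bytes per parameter)."""
    return params * bytes_per_param / GB


def kv_cache_gb(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem=2.0):
    """Memory for the KV cache: one key and one value vector
    per layer, per KV head, per token, per sequence in the batch."""
    elements = 2 * layers * kv_heads * head_dim * seq_len * batch
    return elements * bytes_per_elem / GB


def fits(total_gb, vram_gb):
    """True when the estimated footprint fits in the GPU's VRAM."""
    return total_gb <= vram_gb


# Assumed configuration, for illustration only.
weights = weights_gb(7e9)
kv = kv_cache_gb(layers=32, kv_heads=32, head_dim=128, seq_len=256, batch=1)
total = weights + kv

print(f"{total:.1f} GB = {weights:.1f} GB weights + {kv:.1f} GB KV")
print("fits in 24 GB:", fits(total, 24.0))
```

With these assumed inputs the estimate lands close to the 14.1 GB shown in the readout. Raising the batch size or context length grows only the KV term, while changing precision scales both terms.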