My technical work spans GPU computing, embedded systems, and machine learning infrastructure. Below is an overview of what I work with regularly.
C and C++ are my primary languages, used daily for both embedded firmware on STM32 microcontrollers and GPU kernel development. I write CUDA for NVIDIA GPU workloads and WGSL for WebGPU shaders. Python is my go-to for scripting, ML experimentation, and tooling, and I use JavaScript for WebGPU-based browser applications.
GPU computing is my core focus area. I work with WebGPU and CUDA for compute kernel development, optimizing for memory bandwidth, thread occupancy, and synchronization overhead. I profile with NVIDIA Nsight, optimize inference with TensorRT, and author kernels in Triton. For distributed workloads I have experience with OpenMP, MPI, and Slurm.
Experience deploying and optimizing ML models using PyTorch, TensorFlow, and ONNX. My work on GPU-accelerated medical imaging involved quantization, precision tuning, and operator fusion using TensorRT — reducing inference latency by 50% while maintaining accuracy.
Firmware development on STM32 using STM32CubeIDE. Real-time task management with FreeRTOS. Sensor interfacing over SPI, I2C, and UART protocols. Wireless telemetry via MQTT over Wi-Fi and Bluetooth for IoT data pipelines.
Linux (primary development environment), Git, and standard command-line toolchains.