My technical work spans GPU computing, embedded systems, and machine learning infrastructure. Below is an overview of what I work with regularly.
C and C++ are my primary languages, used daily for both embedded firmware on STM32 microcontrollers and GPU kernel development. I write CUDA for NVIDIA GPU workloads and WGSL for WebGPU shaders. Python is my go-to for scripting, ML experimentation, and tooling, and I use JavaScript for WebGPU-based browser applications.
GPU computing is my core focus area. I work with WebGPU and CUDA for compute kernel development, optimizing for memory bandwidth, thread occupancy, and synchronization overhead. I profile with NVIDIA Nsight, optimize inference with TensorRT, and author kernels in Triton. For distributed workloads I have experience with OpenMP, MPI, and Slurm.
Experience deploying and optimizing ML models using PyTorch, TensorFlow, and ONNX. My work on GPU-accelerated medical imaging involved quantization, precision tuning, and operator fusion using TensorRT — reducing inference latency by 50% while maintaining accuracy.
Firmware development on STM32 using STM32CubeIDE. Real-time task management with FreeRTOS. Sensor interfacing over SPI, I2C, and UART protocols. Wireless telemetry via MQTT over Wi-Fi and Bluetooth for IoT data pipelines.
Linux (primary development environment), Git, and standard command-line toolchains.