Encoding. Encrypting. Transporting over HTTP. Receiving. Decrypting. Decoding. Rendering to surface. Syncing audio. At scale. I’ve had the privilege of shipping software that has run on 100M+ devices globally, from set-top boxes to mobile phones to deeply embedded Linux systems. A significant portion of that work was in the media domain, where performance, security, latency,…
NVIDIA GPU Deep Dive: Tensor and Ray Tracing Cores
This blog is Part 3 in our NVIDIA GPU blog series. In Parts 1 and 2, we explored the fundamentals of NVIDIA GPU architecture, from CUDA cores to streaming multiprocessors (SMs) and warp schedulers. Today, we turn to what sets modern NVIDIA GPUs apart: Tensor Cores and Ray Tracing (RT) Cores, two hardware innovations that fuel…
Deep Dive into GPU Compute Hierarchy
Modern NVIDIA GPUs are feats of hierarchical design, optimized to maximize parallelism, minimize latency, and deliver staggering computational throughput. Building on Part 1, which introduced the high-level architecture of NVIDIA GPUs, this is Part 2: a deep dive into the GPU compute hierarchy of Graphics Processing Clusters (GPCs), Texture Processing Clusters (TPCs), Streaming Multiprocessors (SMs), and CUDA…
✅ Introduction to NVIDIA GPU Architecture: Hierarchy, Cores, and Parallelism
👋 Welcome to GPU Architecture 101. This is the first post in a five-part series, in which I introduce you to the NVIDIA GPU architecture, a foundation for parallel computing used in gaming, scientific computing, artificial intelligence, and more. This guide is designed for engineering students and beginner developers who want to understand how modern GPUs work, from their evolution…
🔄 Google’s Agent2Agent (A2A) Protocol: A New Era of AI Agent Interoperability
Google has introduced the Agent2Agent (A2A) protocol, a revolutionary open standard designed to enable seamless communication between AI agents. This innovation, backed by over 50 major tech partners, allows autonomous agents to collaborate across platforms, applications, and organizations, even when built using different technologies. 🌐 Why the Agent2Agent Protocol Is a Game-Changer. As AI agents become…
Timeline from Transformers to LLMs and Agentic AI
Since the groundbreaking 2017 paper “Attention is All You Need” introduced the Transformer architecture, the field of artificial intelligence has undergone a rapid and transformative evolution. This blog post will explore the chronology of important events that have shaped the AI landscape, leading up to the current era of Large Language Models (LLMs) and agentic…
A Deep Dive into PyTorch’s GPU Memory Management
Here is an error I got while running an image-generation deep learning model. It is a common error engineers hit when using PyTorch on a GPU, and solving it requires a deep dive into PyTorch’s GPU memory management. So fasten your seat belts 🙂 torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 58.00…
🚀 The Evolution of YOLO 🚀
The YOLO (You Only Look Once) series is a family of real-time object detection algorithms built on convolutional neural networks (CNNs). It has dramatically shaped the landscape of real-time computer vision. Each iteration of YOLO brings something unique to the table, enhancing the capabilities and applications of object detection. Let’s dive into the evolution of YOLO, details…
Basic Machine Learning Optimization Algorithms
Learn basic machine learning optimization algorithms. Explore gradient descent and Adam, which uses an adaptive learning rate for each parameter.
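The teaser above contrasts plain gradient descent, which applies one shared learning rate to every parameter, with Adam's per-parameter adaptive step. As a minimal sketch (not the post's own code), here are both in pure Python on a hypothetical, badly scaled objective f(x, y) = x**2 + 10 * y**2. The objective, starting point, step counts, and learning rates are illustrative assumptions; beta1 = 0.9, beta2 = 0.999, and eps = 1e-8 are the commonly used Adam defaults.

```python
import math


def grad(params):
    # Gradient of the hypothetical objective f(x, y) = x**2 + 10 * y**2
    x, y = params
    return [2 * x, 20 * y]


def gradient_descent(params, lr=0.04, steps=200):
    params = list(params)
    for _ in range(steps):
        g = grad(params)
        # One shared learning rate for every parameter; it must stay below
        # 2 / 20 = 0.1 or the steep y-direction diverges
        params = [p - lr * gi for p, gi in zip(params, g)]
    return params


def adam(params, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    params = list(params)
    m = [0.0] * len(params)  # running mean of gradients, per parameter
    v = [0.0] * len(params)  # running mean of squared gradients, per parameter
    for t in range(1, steps + 1):
        g = grad(params)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        # Bias correction: m and v start at zero, so early estimates are too small
        m_hat = [mi / (1 - beta1 ** t) for mi in m]
        v_hat = [vi / (1 - beta2 ** t) for vi in v]
        # Each parameter's step is scaled by its own gradient history,
        # giving an effective learning rate lr / (sqrt(v_hat) + eps) per parameter
        params = [
            p - lr * mh / (math.sqrt(vh) + eps)
            for p, mh, vh in zip(params, m_hat, v_hat)
        ]
    return params


if __name__ == "__main__":
    print("gradient descent:", gradient_descent([3.0, 2.0]))
    print("adam:", adam([3.0, 2.0]))
```

Calling `adam([3.0, 2.0], steps=1)` moves both coordinates by roughly `lr`, to about `[2.9, 1.9]`, even though their gradients are 6 and 40. Gradient descent, by contrast, takes steps proportional to each raw gradient, so its single learning rate is capped by the steepest direction.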
Handwritten notes on Neural Networks and ML course by Andrew Ng
Around 2018, when I started working on machine learning, I took many courses. Here are my handwritten notes on the Neural Networks and ML course by Andrew Ng. They focus on the fundamental concepts covered in the course, including logistic regression, neural networks, and softmax regression. Buckle up for some equations and diagrams! Part 1:…


