Is Object Detection Dead? A Case for Recognizing LEGO Bricks

Abstract

With the rise of foundation models and zero-shot segmentation, fine-tuning classic object detection models can feel outdated. But is it? Over 90 000 different LEGO bricks have been produced, in almost 200 colors, and a single photo can easily contain hundreds of them. This makes LEGO recognition a demanding stress test for both traditional object detectors and the latest generation of vision models.

In this talk, I will walk you through a practical comparison of approaches to LEGO brick detection. I will start with the classic object detection pipeline: dataset creation, annotation, and training with models like NanoDet and RF-DETR. Then I will pit these detectors against zero-shot approaches: SAM 3 (Segment Anything Model 3) and vision-language models, covering both closed-source APIs like Gemini and open-source alternatives like Qwen-VL. Along the way, I will share the pitfalls, surprising results, and lessons learned, including cases where a fine-tuned lightweight detector still outperforms models orders of magnitude larger.