How I Got Into Computer Vision (And Why I'm Still Exploring)
I'm not a CV researcher. I'm a full stack developer who fell into computer vision and found that the most exciting part isn't the models, it's building the tools around them.
tl;dr: I'm a full stack developer who got into computer vision not through research, but through building. I like making tools that make CV easier and cheaper for people. My first real CV app was the YOLO NDJSON Converter.
I'll be honest, I didn't get into computer vision through a textbook or a research paper. I got into it because I thought it was cool that a machine could look at an image and know what's in it. That's it. That was the whole reason.
Still Exploring, Still Building
I want to be upfront about where I am. I'm not deep into the mathematical core of CV. I haven't spent months implementing convolutions from scratch or reading dense papers about attention mechanisms. I'm still very much exploring, and honestly? That's what makes it exciting.
Every week I'm finding something new. A new way models handle edge cases. A new approach to annotation that saves hours of work. A weird trick that makes inference faster. It's like being a kid in a massive playground where you keep discovering new corners.
But here's the thing. I'm a full stack developer at heart. I like building things. I like taking something complex and making it usable. And that's exactly what drew me deeper into the CV world.
The models are impressive, sure. But what I kept noticing was that the tooling around computer vision is... rough. Data pipelines are fragile. Format conversions are a nightmare. Deployment is harder than it should be. And for someone who comes from a web development background where we have polished tools for everything, that gap was impossible to ignore.
So instead of trying to become a CV researcher, I leaned into what I'm actually good at: building tools that make computer vision easier and cheaper for people.
My First Real CV App
That mindset is exactly how I ended up building the YOLO NDJSON Converter. If you've ever worked with Ultralytics YOLO, you know the pain. You train a model, export your annotations, and then realize the format doesn't match what your next tool expects.
I kept running into this. Convert from NDJSON to COCO. Then COCO to Pascal VOC. Then realize you actually needed YOLO v8 format, not v5. It was hours of scripting the same boring conversion logic over and over.
So I built a desktop app that handles it. You give it your YOLO NDJSON exports, pick your target format (there are 12+ supported), and it just works. Runs offline, works cross-platform, parallel downloads, the whole thing.
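To give a feel for what "boring conversion logic" means in practice, here's a minimal sketch of the two core steps: reading an NDJSON export line by line, and mapping a YOLO-style normalized box (center x, center y, width, height) to a COCO-style pixel box (x_min, y_min, width, height). This is illustrative only, not the app's actual code, and the field names are hypothetical; Ultralytics' real NDJSON schema may differ.

```python
import json


def read_ndjson(path):
    """Yield one record per non-empty line of an NDJSON file."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def yolo_to_coco_bbox(cx, cy, w, h, img_w, img_h):
    """Convert one normalized YOLO box to a COCO pixel box.

    YOLO: (cx, cy, w, h), all in [0, 1] relative to image size.
    COCO: [x_min, y_min, width, height] in absolute pixels.
    """
    abs_w = w * img_w
    abs_h = h * img_h
    x_min = cx * img_w - abs_w / 2
    y_min = cy * img_h - abs_h / 2
    return [x_min, y_min, abs_w, abs_h]


# e.g. a box centered in a 640x480 image, half its width and height:
# yolo_to_coco_bbox(0.5, 0.5, 0.5, 0.5, 640, 480) -> [160.0, 120.0, 320.0, 240.0]
```

The math is trivial; the tedium is in doing it per record, per class map, per target format, which is exactly the kind of repetition a tool should absorb.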
It's not a flashy AI demo. It's a tool that saves people time. And that felt way more meaningful to me than building another object detection tutorial.
What I've Learned So Far
A few things I've picked up along the way:
- You don't need to understand everything to build useful things. I don't fully understand how every layer in YOLO works, but I understand the inputs, the outputs, and the pain points around them. That's enough to build real tools.
- The dev experience in CV is years behind web dev. There's so much room for someone with a frontend/backend background to make an impact here.
- Data work is underrated. Everyone wants to talk about models. Nobody wants to talk about annotation formats, data cleaning, or pipeline reliability. But that's where most of the time actually goes.
What's Next
I'm going to keep building in this space. Keep making tools. Keep exploring. I've got ideas around making annotation workflows faster and making model deployment less painful for small teams.
And in my next post, I'll do a proper deep dive into the YOLO NDJSON Converter, how I built it, the architecture decisions, and what I'd do differently now.
If you're also a developer who stumbled into CV and feels like you're "not enough" of a researcher to be here, trust me, you are. The field needs builders just as much as it needs researchers.