Choosing a Custom Computer Vision AI Provider: Mind the Error Budget

Apr 14, 2026 · 9 min read

A 95%-accurate vision demo sounds great until you cost the other 5% at your volume. Choose a custom computer vision AI provider on the error budget and the human-review fallback, not the headline number.

A custom computer vision AI provider will show you a demo with a number on it: 95% accuracy, maybe 98%. It sounds like a finished product. It is not. The number you actually need is the cost of the errors at your volume. If you process 50,000 images a month, a 5% error rate is 2,500 wrong calls, and whether that is fine or catastrophic depends entirely on what a wrong call does. The provider who designs around that error budget is the one worth hiring.
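
The arithmetic above is worth running with your own numbers. A minimal sketch, assuming a flat cost per wrong call; the two cost figures and the function name `monthly_error_cost` are hypothetical and only there to show how far the answer swings:

```python
def monthly_error_cost(images_per_month: int, accuracy: float,
                       cost_per_error: float) -> tuple[int, float]:
    """Return (expected wrong calls, expected cost) for one month."""
    wrong_calls = round(images_per_month * (1.0 - accuracy))
    return wrong_calls, wrong_calls * cost_per_error


# 50,000 images a month at 95% accuracy, priced two hypothetical ways:
# a cheap mistake (a few seconds of rework) and an expensive one (a return).
for cost in (0.10, 40.0):
    wrong, total = monthly_error_cost(50_000, 0.95, cost)
    print(f"{wrong} wrong calls at {cost:.2f} each -> {total:,.2f} per month")
```

The error count is the same in both cases; only the cost per wrong call decides whether the demo number is acceptable.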

Why accuracy alone is the wrong way to judge computer vision development

Accuracy is a single number hiding two very different failures. A false positive and a false negative rarely cost the same. In quality inspection, missing a defect (false negative) might ship a faulty part to a customer, while flagging a good part (false positive) just wastes a few seconds of human review. Those are not equal, and a model tuned for headline accuracy optimises the wrong balance.

Good computer vision development starts by asking which error is expensive and tuning the system to avoid that one, accepting more of the cheap error in return. This is why a 99% accurate model can be worse for your business than a 94% one that almost never makes the costly mistake. The provider who leads with a single accuracy figure and no discussion of the error trade-off is selling a demo, not a custom computer vision solution.
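
To make the 99% versus 94% comparison concrete, here is a rough sketch. Every number in it is invented for illustration (the batch size, the error counts, and the two unit costs), and the class name `Outcome` is hypothetical; the point is only that the ranking can flip once errors are priced separately.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    name: str
    total: int             # items inspected
    false_positives: int   # good parts flagged
    false_negatives: int   # defects missed

    @property
    def accuracy(self) -> float:
        return 1.0 - (self.false_positives + self.false_negatives) / self.total

    def error_cost(self, fp_cost: float, fn_cost: float) -> float:
        return self.false_positives * fp_cost + self.false_negatives * fn_cost


# Hypothetical batch of 10,000 parts.
model_a = Outcome("A", total=10_000, false_positives=10, false_negatives=90)
model_b = Outcome("B", total=10_000, false_positives=598, false_negatives=2)

# Hypothetical unit costs: a false alarm is a quick re-check, a miss is a return.
FP_COST, FN_COST = 0.50, 500.0

for m in (model_a, model_b):
    print(f"model {m.name}: accuracy {m.accuracy:.0%}, "
          f"error cost {m.error_cost(FP_COST, FN_COST):,.2f}")
```

Under these assumed costs, model A is the more accurate one and model B is the cheaper one to run, because B makes far fewer of the expensive mistakes.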

Before any vendor quotes a build, answer two questions: what does a false positive cost you, and what does a false negative cost you? The gap between those two numbers is the single most important input to the design. Most buyers never compute it.
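
One way to see why the gap matters: if the model's score can be treated as a calibrated probability that an item is bad (an assumption, and often not true out of the box), the break-even point for flagging follows directly from the two costs. The function name and the example costs below are hypothetical.

```python
def break_even_threshold(fp_cost: float, fn_cost: float) -> float:
    """Probability above which flagging is cheaper in expectation than passing.

    With p = probability the item is bad:
      flagging costs (1 - p) * fp_cost in expectation,
      passing costs p * fn_cost.
    Flagging wins once p > fp_cost / (fp_cost + fn_cost).
    """
    return fp_cost / (fp_cost + fn_cost)


print(break_even_threshold(1.0, 1.0))    # equal costs
print(break_even_threshold(0.5, 500.0))  # a miss costs 1,000x a false alarm
```

With equal costs the line sits at 0.5. When a miss is far more expensive than a false alarm, the line drops close to zero, which means flagging almost anything the model has doubts about.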

How to choose a computer vision provider: ask about the fallback

The systems that work in production almost never aim for full automation. They aim for confident automation plus a human path for the rest. Ask any provider how they handle the cases the model is unsure about.

  • A confidence threshold, not a verdict. The model should return how sure it is, not just a label. High-confidence cases get handled automatically; everything below the line goes somewhere safe.
  • A human-review queue for the rest. The uncertain cases route to a person. This is not a failure of the system. It is the design. Over time the review volume shrinks as the model improves on your real images.
  • Training on your images, not a stock dataset. A model trained on clean public data fails on your lighting, your angles, your edge cases. The provider needs your data, or a plan to collect it, before the accuracy number means anything.
  • A retraining story. Cameras move, products change, seasons shift the lighting. Ask who retrains the model, how often, and what it costs. A vision system is a living thing, not a one-time delivery.
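
The first two points in the list can be sketched in a few lines. This is an illustration of the routing idea under simple assumptions, not any particular provider's implementation; `Prediction`, `Router`, and the 0.90 threshold are all hypothetical names and values.

```python
from dataclasses import dataclass, field


@dataclass
class Prediction:
    image_id: str
    label: str
    confidence: float  # the model's own estimate, in [0, 1]


@dataclass
class Router:
    threshold: float
    review_queue: list[Prediction] = field(default_factory=list)

    def route(self, pred: Prediction) -> str:
        """Handle confident cases automatically; queue the rest for a person."""
        if pred.confidence >= self.threshold:
            return "auto"
        self.review_queue.append(pred)
        return "human_review"


router = Router(threshold=0.90)
print(router.route(Prediction("img-001", "ok", 0.97)))
print(router.route(Prediction("img-002", "ok", 0.62)))
print(len(router.review_queue), "item(s) waiting for review")
```

Keeping the threshold as a plain parameter is deliberate: it is the value you revisit after each retraining round.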

A live example of AI image recognition with a human boundary

When we built Memórias do Jamor, an AI-moderated fan memory wall for the football club SCU Torreense, the design was exactly this shape. Fans upload photos to a public wall, and the system has to keep offensive content off it without a human checking every image in real time. Content the model rates as safe with high confidence is published automatically, and everything else routes to a human review queue. The confidence threshold is the product. It lets a fan post in seconds while keeping a person in the loop on exactly the cases that need one.

That pattern generalises across AI image recognition: defect detection, document scanning, content moderation, inventory counting. The model handles the obvious cases at machine speed, the people handle the ambiguous ones, and the line between them is a dial you tune to your error budget. Pushing the threshold to full automation before the model has earned it is one of the surer ways to turn a working pilot into a failed project. If you want to see how we scope this kind of build, the audit is where the error budget gets defined.
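
Tuning that dial can be sketched as a sweep over a labelled validation set: for each candidate threshold, measure how much gets automated and how often the automated calls are wrong, then take the lowest threshold that stays inside the budget. The validation data, the candidate thresholds, and both function names below are made up for illustration.

```python
from typing import Optional


def sweep(validation: list[tuple[float, bool]], thresholds: list[float]):
    """Return (threshold, share automated, error rate among automated) rows."""
    rows = []
    for t in thresholds:
        auto = [correct for conf, correct in validation if conf >= t]
        if not auto:
            rows.append((t, 0.0, 0.0))  # nothing automated, nothing to get wrong
            continue
        share = len(auto) / len(validation)
        error_rate = auto.count(False) / len(auto)
        rows.append((t, share, error_rate))
    return rows


def lowest_threshold_within_budget(rows, max_error_rate: float) -> Optional[float]:
    """Lowest threshold (most automation) whose automated error rate fits."""
    ok = [t for t, _share, err in rows if err <= max_error_rate]
    return min(ok) if ok else None


# Hypothetical validation set: (model confidence, was the model right?)
validation = [(0.99, True), (0.97, True), (0.95, True), (0.92, True),
              (0.88, False), (0.85, True), (0.70, True), (0.60, False),
              (0.55, True), (0.40, False)]

rows = sweep(validation, [0.5, 0.8, 0.9])
for t, share, err in rows:
    print(f"threshold {t:.2f}: {share:.0%} automated, {err:.1%} of those wrong")
print("chosen:", lowest_threshold_within_budget(rows, max_error_rate=0.05))
```

A tighter budget pushes the threshold up and sends more images to people; a looser one does the reverse. A real sweep would need far more than ten validation images to be trustworthy.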

When you do not need a custom vision system

Three situations where bespoke vision is not the answer.

  • A cloud API already does it. Reading text from documents, detecting common objects, basic face or barcode detection: the big cloud vision APIs handle these off the shelf. Pay per call before you build.
  • You cannot supply training data. If you have no labelled images and no realistic way to collect them, a custom model has nothing to learn from. The honest first project is data collection.
  • The volume is tiny. If a person could eyeball the whole daily volume in an hour, the automation rarely pays for the build and the maintenance. Vision earns its cost at scale.

Key takeaways

  • A demo’s accuracy number is meaningless until you cost the errors at your real volume.
  • False positives and false negatives rarely cost the same. Tune the system to avoid the expensive one.
  • The architecture that ships is confident automation plus a human-review queue, with the threshold set to your error budget.
  • Insist on training with your own images and a clear retraining plan. A model on stock data fails on your reality.
  • Skip the custom build if a cloud API fits, you cannot supply training data, or the volume is small enough for a person.

The right custom computer vision AI services start from your error budget and build the human boundary in from day one, rather than chasing an accuracy figure that hides the trade-off. gamgi runs a two-week audit that defines which error is expensive, sets the threshold, and scopes a first build you own. What does a wrong call actually cost you, per image?
