Assessing AI in Conservation - is it any good?

david.swann.new.zeal · April 15, 2026, 4:40am

There’s a lot of AI emerging in conservation. I’ll put my cards on the table: I think that AI has much to offer in conservation… but only if the models are adequately trained. And that’s a bit of a challenge.

To set the context, the models that we use outside conservation might be trained on billions or even trillions of data points. So identifying a car, a train, a cat or a window… those are well-trained models (because every time you’ve done a Captcha, that’s what you’ve been doing!). But identifying a mouse, a rat, a stoat, a kea, a kiwi… there aren’t so many training data sets. A million training points perhaps? Certainly not a billion!

The other challenge in conservation is that as AI gets added to devices like traps and cameras in the field, the AI chip has a tiny power budget… and so the model simply isn’t as powerful. When a model is running on Amazon Web Services, power isn’t an issue, so the AI results should be better. When the model is running on a chip with a power budget measured in milliamps, not so much.

We have a bunch of AI-enabled technology - bird monitors, trail cams and of course our AT520-AI traps. So how good is the AI? Subjectively, I had a feeling that there’s room for improvement. Objectively, I needed a metric. And that’s what this post is about.

What I’ve done is take 2,500 images (the same could be done with bird song recordings… but that’s more difficult and out of scope for this post). For the 80 AT520-AI traps in our core trapping areas, that’s about five days of data. For each image, the AI has done its classification (nothing, rodent, possum, cat etc). I then review each image and enter my own classification. I hasten to add that I’m fair in this assessment. An AI image classifier has no context. It knows nothing of prior images, nothing of scale, nothing about trap architecture. So I assess realistically, judging each image on what it shows by itself.

The end result is a matrix like this:

Column 2 is what the AI saw - Row 2 is what I saw. The green cells are where we agreed, with the totals against each species.
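For anyone wanting to build the same kind of matrix from their own review, here is a minimal Python sketch. The label pairs are made-up placeholders (not our trap data), the function names are just for illustration, and the orientation (rows = what the AI saw, columns = what the reviewer saw) is one reading of the description above.

```python
from collections import Counter

# Hypothetical (ai_label, human_label) pairs, one per reviewed image.
# Illustrative placeholders only, not real trap results.
reviews = [
    ("rodent", "rodent"),
    ("rodent", "bird"),
    ("nothing", "rodent"),
    ("bird", "bird"),
    ("possum", "possum"),
    ("bird", "rodent"),
]


def confusion_matrix(pairs):
    """Count how often each (AI label, human label) combination occurs."""
    counts = Counter(pairs)
    labels = sorted({label for pair in pairs for label in pair})
    # Assumed orientation: rows are the AI's label, columns the reviewer's label.
    matrix = {ai: {human: counts[(ai, human)] for human in labels} for ai in labels}
    return labels, matrix


def format_matrix(labels, matrix):
    """Render the matrix as plain text, one row per AI label."""
    width = max(len(label) for label in labels) + 2
    header = "AI \\ me".ljust(width) + "".join(label.rjust(width) for label in labels)
    rows = [
        ai.ljust(width) + "".join(str(matrix[ai][human]).rjust(width) for human in labels)
        for ai in labels
    ]
    return "\n".join([header] + rows)


labels, matrix = confusion_matrix(reviews)
print(format_matrix(labels, matrix))
```

The diagonal (same label on the row and the column) corresponds to the green cells; everything off the diagonal is a disagreement.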

The orange cells show where the AI either saw a bird or nothing… but there was a predator there. That’s bad because that’s an opportunity lost to kill the predator. (Noting that the majority of the possum errors were a dead possum under the kill bar!)

The red cells are worse. That’s where the AI ‘saw’ a rodent but it was a bird (invariably a waxeye in our traps). That would be a dead bird. So not good.

Overall the green cells represent 67% - so the AI got it right 67% of the time. For monitoring, that’s not bad. For trap control, there’s some improvement needed.
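As a rough sketch of how the headline figure and the two error types could be computed from the cell counts, assuming the counts are keyed by (AI label, reviewer label): the numbers below are invented for illustration and are not our results, and the `PREDATORS` and `NON_TARGETS` groupings are my assumption about how the classes split.

```python
# Hypothetical cell counts keyed by (ai_label, human_label); not real trap data.
counts = {
    ("rodent", "rodent"): 5,
    ("bird", "bird"): 2,
    ("nothing", "rodent"): 1,  # predator present, AI saw nothing
    ("bird", "possum"): 1,     # predator present, AI saw a bird
    ("rodent", "bird"): 1,     # bird present, AI saw a rodent
}

PREDATORS = {"rodent", "possum", "cat"}  # assumed target classes
NON_TARGETS = {"bird", "nothing"}        # assumed non-target classes


def summarise(counts):
    """Return agreement rate plus counts of the two kinds of costly error."""
    total = sum(counts.values())
    # Green cells: AI and reviewer gave the same label.
    agreed = sum(n for (ai, human), n in counts.items() if ai == human)
    # Orange cells: AI saw a bird or nothing, but a predator was there.
    missed = sum(
        n for (ai, human), n in counts.items()
        if ai in NON_TARGETS and human in PREDATORS
    )
    # Red cells: AI saw a rodent, but it was a bird.
    bird_as_rodent = sum(
        n for (ai, human), n in counts.items()
        if ai == "rodent" and human == "bird"
    )
    return {
        "total": total,
        "agreed": agreed,
        "accuracy": agreed / total,
        "missed_predators": missed,
        "birds_seen_as_rodent": bird_as_rodent,
    }


summary = summarise(counts)
print(f"Agreement: {summary['accuracy']:.0%} of {summary['total']} images")
print(f"Missed predators: {summary['missed_predators']}")
print(f"Birds seen as rodents: {summary['birds_seen_as_rodent']}")
```

Reporting the orange and red counts separately from the overall percentage matters because the same agreement rate can hide very different mixes of the two errors.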

(I do need to add some important context. Motupohue is a low-predator environment and so we have a LOT of birds (predominantly waxeyes) feeding in our AT520s… a lot of mice, some rats, fewer possums. If we were in an early-stage control environment I suspect the results would be better.)

This is probably an approach that others might find useful for assessing ‘your’ AI. Share your results - I’d be really interested to hear them.

We’re now going to use this approach to assess past performance from a year ago, when we first started with the AT520s… and then repeat it every time a new model is released so that we can confirm improvement.

I hope this helps

David

marianmilnenz · April 17, 2026, 1:38am

Your matrix is a super confusing representation… Perhaps a different grid would be easier to follow. Nice work, and as you say the AI training is a challenge and it can only get better.

vickyg · April 17, 2026, 1:50am

Glad it wasn’t just me! I know you’ve provided an explanation David, but intuitively when looking at a chart like this, the natural inclination is to assume a relationship between the rows and the columns (e.g. how many instances are bird and rodent both present; 3 or 10).
Much simpler to have the animal types across the columns, and AI and Manual as the rows. Then you can have another couple of rows and/or colour coding that details where there is a difference in numbers between both detection types.
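A minimal sketch of that alternative layout, assuming the reviews are available as (AI label, manual label) pairs. The data below is made up for illustration and `totals_table` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical (ai_label, manual_label) pairs; illustrative only.
reviews = [
    ("rodent", "rodent"),
    ("rodent", "bird"),
    ("bird", "rodent"),
    ("nothing", "rodent"),
    ("bird", "bird"),
]


def totals_table(pairs):
    """Per-animal totals for AI and Manual, plus the difference between them."""
    ai_totals = Counter(ai for ai, _ in pairs)
    manual_totals = Counter(manual for _, manual in pairs)
    animals = sorted(set(ai_totals) | set(manual_totals))
    return {
        animal: {
            "AI": ai_totals[animal],
            "Manual": manual_totals[animal],
            "Difference": ai_totals[animal] - manual_totals[animal],
        }
        for animal in animals
    }


table = totals_table(reviews)
for animal, row in table.items():
    print(f"{animal:<8} AI={row['AI']}  Manual={row['Manual']}  Diff={row['Difference']:+d}")
```

One thing this toy data shows: the bird column has a difference of zero even though two of the bird-related images were misclassified, so a totals view is easier to read but can hide errors that offset each other.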

lenbok · April 19, 2026, 4:58am

The matrix representation is pretty standard; it’s known as a “confusion matrix” (but it’s meant to illustrate the predictive algorithm’s confusion, not your confusion when reading it :-D).

Nice evaluation - is it possible to use this as a public dataset that could be used to improve the models that the AutoTraps guys provide?

Cheers,
Len.