Neural networks designed to recognize images are getting steadily better at describing what they see, and a recent video from developer Kyle McDonald shows this off in real time, along with some obvious limitations.
Armed with his MacBook Pro and a modified version of image recognition software NeuralTalk2, McDonald walked the streets of Amsterdam and recorded as his computer tried to identify what it was seeing. The results are fascinating, if not consistently accurate.
If you look out your window right now, you can describe what you see instantly and accurately. It seems simple, but teaching computers to do the same thing is complicated. The computer needs to analyze the scene, identify individual components, and then figure out how they relate to each other.
This video makes that clear. When the captions are accurate it’s uncanny. “A boat is docked in the water near a city” and “a row of bikes parked next to each other” are both as accurate as they are quintessentially Amsterdam. And the look on the face of “a man eating a hot dog in a crowd” alone is worth the price of admission.
But most of the captions are totally wrong, and that’s possibly even more interesting. Why does the neural network describe McDonald, who is clearly wearing a hoodie, as “wearing a suit and tie”? Is it confusing his zipper for a tie? Why is it constantly seeing clocks where none exist? Why does it perceive so many colorful things as black and white?
NeuralTalk2 is open-source software designed to look at photos and caption them, identifying things in the images and attempting to put them into context. You can set NeuralTalk2 up yourself if you want to make a Thanksgiving project out of it, but you should know that it’s not exactly user-friendly. You’re going to need to invest some time and some smarts in this one.
But don’t worry. If you’re not a programming wizard, you can just enjoy the video, or check out more caption examples here.