When Computers Can See:
How Visual AI Will Fuel Our Future
We are already past the tipping point: Artificial Intelligence (AI), in its many forms, is no longer confined to the dreams of sci-fi writers. It powers many of our web searches, social network feeds, photo apps, Alexa requests, and self-parking cars, just to name a fraction of the everyday ways we already rely on AI. And this is just the beginning.
One particular flavor of AI—specifically, computer vision (CV)—is going to be at the fore in a process that will, with no exaggeration, fundamentally change our world. Put simply, CV describes the ability of machines to process and understand visual data. As we’ll explore, that has far-reaching and thrilling implications—not only for the way we interact with machines, but also for the way machines interact with machines when humans aren’t around. In fact, as we enter what we’ll call the Age of AI, we think computer vision just may be the defining influence.
When machines talk back
One of the most powerful and immediate expressions of AI from a mainstream perspective has been the recent but rapid shift in the way we interface with all of our various computing devices—our phones and tablets and laptops and desktops, but also our TVs and, increasingly, smart home devices. For decades, we relied almost solely on the keyboard-and-mouse mode of text-based input to create, process, find, or share data. The arrival of fast AI-enabled personal digital assistants like Alexa, Cortana, Google Assistant, and Siri has not only made voice-based machine control possible, but has also given humans access to machines that are to various degrees semi-autonomous and increasingly intelligent.
Similarly, there has been a rapid refinement of computer vision AI techniques, which not only enable yet another new way for humans to interface with machines, but, more importantly, allow machines a new way to interface with the world. CV lets machines quickly and accurately “see” what’s inside images, graphics, and videos, and in the most sophisticated examples, actually “understand” them.
The arrival of fast AI-enabled personal digital assistants like Alexa, Cortana, Google Assistant, and Siri have given humans access to machines that are increasingly intelligent.
As just one example, think of what happens when a computer can watch footage from a camera, process it, and make decisions on how to act—all in real time. Connect that computer to 5 or 10 cameras, or even 100 or 1,000 cameras, spread across a home, factory, farm, or city. At the most basic level this allows cities to have a real-time sense of population densities and foot traffic—perhaps the exact number of people in a protest march or parade. But think about what happens when cameras see every road and smartly configure traffic lights in conjunction with your self-driving car. What if streets were literally accident-free and gridlock became extinct as a concept? Imagine the perfect efficiency of a highway that is completely safe, even at high speeds, and you start to get an idea of what just one facet of CV can provide. As we’ll see, computer vision has seemingly endless applications.
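The pipeline described above—each camera feeding a vision model, with the results aggregated in real time—can be sketched in a few lines. This is purely illustrative: `count_people` is a hypothetical stand-in for a real detection model, and the "frames" are just lists of detected-object labels.

```python
def count_people(frame):
    # Hypothetical stand-in for a real computer vision model; here a
    # "frame" is simply a list of labels the model would have detected.
    return sum(1 for obj in frame if obj == "person")

def citywide_density(camera_feeds):
    # camera_feeds maps camera id -> latest frame. In a real deployment
    # this loop would run continuously over live video from hundreds or
    # thousands of cameras.
    return {cam: count_people(frame) for cam, frame in camera_feeds.items()}

feeds = {
    "main-street": ["person", "car", "person"],
    "park": ["person"],
    "depot": ["car", "truck"],
}
print(citywide_density(feeds))  # → {'main-street': 2, 'park': 1, 'depot': 0}
```

The interesting part is not any single camera but the aggregation step: once every feed reduces to structured numbers, a city can reason about all of them at once.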
How computer vision came to be
Before we dig deeper, we should address more fully what we mean by computer vision AI and how it works. Artificial intelligence has been around as a computing concept for several decades, with all kinds of theories and models for how it would work. Traditionally, scientists tended to think that artificial intelligence in machines—in the sense of computers that have the ability to think and interact like humans—would require uploading vast amounts of knowledge data to a computer, and then equally vast sets of rules for how that data would be processed—for instance, all the grammar and syntax rules of a language. Once the machine had been properly loaded up, the thinking went, it was just a question of tweaking the algorithms, code, and rules to find the right combination that enabled “intelligent” behavior—and presto, your machine would speak. This may very well work some day, but as researchers have learned, the process of uploading the entirety of human knowledge is both time-consuming and pricey—and so far still doesn’t enable true intelligence.
AI research got a major boost, however, back in 2011, when researchers at Google saw the fruits of their labors in working with neural networks and deep learning—computing techniques modeled on how human brains are thought to work. While not a new idea—the initial theory has been around since the 1940s—neural networks didn’t quite work as advertised until modern hardware made them powerful enough to tackle huge amounts of data quickly.
And so Google’s teams decided that instead of telling a computer all the rules and forcing it to process everything, they would reverse the process and feed their computer huge amounts of data that had been painstakingly labeled, letting the machines learn the patterns themselves. It was a eureka moment! In short order, the company found their machines were able to decipher objects in new, unlabeled images as well as or even better than humans could (an approach known as supervised learning, since the training data carries human-supplied labels). In one landmark test, their machines correctly diagnosed diabetic retinopathy about 90 percent of the time, versus human doctors who were correct only 80 percent of the time on average. To achieve that, the company first had to train its machines on some 128,000 medical images.
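The core idea—learning from labeled examples rather than hand-coded rules—fits in a few lines of code. The sketch below is a deliberately tiny illustration, not Google’s system: it trains a single-neuron logistic classifier on made-up 2×2 “images” (four pixel intensities each) labeled bright or dark, then classifies a new, unlabeled image on its own.

```python
import math

def predict(weights, bias, pixels):
    # Weighted sum of pixel intensities squashed to a probability
    # that the image is "bright".
    z = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1 / (1 + math.exp(-z))

def train(examples, epochs=500, lr=0.5):
    # Gradient descent on log-loss: no rules are programmed in;
    # the weights are nudged toward whatever the labels imply.
    weights, bias = [0.0] * 4, 0.0
    for _ in range(epochs):
        for pixels, label in examples:
            err = predict(weights, bias, pixels) - label
            weights = [w - lr * err * x for w, x in zip(weights, pixels)]
            bias -= lr * err
    return weights, bias

# Painstakingly labeled training data (pixel intensities in [0, 1]).
labeled = [
    ([0.9, 0.8, 0.95, 0.7], 1),   # bright
    ([0.1, 0.2, 0.05, 0.3], 0),   # dark
    ([0.8, 0.9, 0.6, 0.85], 1),   # bright
    ([0.2, 0.1, 0.3, 0.15], 0),   # dark
]
weights, bias = train(labeled)

# A new, unlabeled image: the trained model classifies it unaided.
print(predict(weights, bias, [0.85, 0.9, 0.8, 0.75]) > 0.5)  # → True
```

Scale the same recipe up from four pixels to millions, and from four examples to 128,000 labeled scans, and you have the outline of how a modern image classifier is trained.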
“Neural networks didn’t quite work as advertised until modern hardware made them powerful enough to tackle huge amounts of data quickly”
What computer vision can do
The potential applications for computer vision truly are limitless, touching every aspect of life. But there are plenty of big-picture examples of where CV is already being put to use to make life safer and more efficient. CV enables farm and factory equipment and other robots to roam autonomously while avoiding obstacles—that includes the new breed of self-driving cars, such as those from Tesla and Volvo, among others. Hand the driving over to your Tesla and it relies on a host of cameras as well as sonar not only to keep your car from drifting out of a lane, but even to see what other objects and vehicles are around you. With Uber, Google, and Ford also developing self-driving cars, it will be years, not decades, before we can expect highways where a significant number of drivers are simply along for the ride.
Since 90 percent of all medical data is image based and machines are equal to or better than humans in many tests of visual accuracy, there’s no question we’ll see better outcomes from computer vision.
Another obvious big-picture role where CV has already proven revolutionary is healthcare, where some form of image recognition is already used to analyze X-rays, MRI, CAT, mammography, and other scans. Since 90 percent of all medical data is image based and machines are equal to or better than humans in many tests of visual accuracy, there’s no question we’ll see better outcomes. Beyond diagnostics, CV can also be put to use in actual medical procedures, via robots that create perfect sutures—and thus vastly safer outcomes, with fewer chances of complications or infections.
On a day-to-day level, image recognition is responsible for such handy features as searching your untagged photo collection on Google Photos with any term you choose, or translating road signs and menus in real time via Google Translate. And on the purely-for-fun side, image recognition powers basic augmented reality (AR) apps that bring tattoos to life.
In terms of business, image recognition has already become a crucial tool for processing the firehose of visual data that is uploaded and shared online every second. (For example, image recognition is used for everything from serving dynamically customized contextual ads into relevant images to calculating the true value of sports sponsorships across multiple platforms.) And there are new business models, like Amazon’s drone delivery service, which might make product delivery as quick and simple as ordering a pizza—the company recently made its first successful test delivery in England and hired an army of computer vision experts to develop sense-and-avoid technology for its autonomous aircraft. Google/Alphabet is also charging ahead with a push to make homes and offices smarter by using computer vision techniques for its Nest line of smart home cameras, allowing them to tell the difference between an intruder and your sleepwalking Uncle Bill through facial recognition.
And of course, the gaming world is afire with the arrival of a host of competing computer-vision-based Virtual Reality (VR) platforms that will revolutionize interactive experiences—perhaps even spawning a completely immersive new entertainment medium, as yet unnamed. These are just a tiny sampling of the myriad ways computer vision is already applied.
A solution to visual overload
Fun fact: To most humans, our experience of the world is intensely visual, and it’s estimated that we use about half of our brain power simply to process the information we observe.
We have become a world of image generators, snapping photos and sharing them to the tune of 3 billion per day and increasing.
Take that notion and then consider that we have become a world of image generators, snapping photos and sharing them to the tune of 3 billion per day and increasing. Another fun bit of trivia: photo-sharing site Photoworld determined it would take a person 10 entire years to even look at all the photos shared on Snapchat in just one hour. (And of course, in those 10 years, another 880,000 years’ worth of photos would have been spawned.)
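The parenthetical figure above follows from simple arithmetic: if one hour of Snapchat uploads takes 10 years to view, then every hour inside those 10 years of viewing adds another 10 years to the backlog.

```python
# If one hour of shared photos takes 10 viewing-years, how much
# accumulates during those 10 years of viewing?
hours_in_ten_years = 10 * 365 * 24           # 87,600 hours
backlog_years = hours_in_ten_years * 10      # 10 viewing-years per hour
print(backlog_years)  # → 876000, roughly the 880,000 figure cited
```

In other words, the backlog grows about 87,600 times faster than any human could clear it—exactly the kind of scale only machines can handle.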
All of those images are data, rich with useful information and context, at a scale that only massively powerful computers are capable of managing. Essentially all of the tantalizing technological breakthroughs on the horizon—the smart office, the Internet of Things, self-driving cars, hyper efficient earth-friendly farming, and advanced robotics—will depend on computer vision. In order to move forward, we will need to let our machines lead the way.