How ‘Object Recognition’ Is Helping the Blind

Digital health technology has improved the way people manage their conditions. If you take prescription medication, you can use dose tracking apps. If you have heart disease, you can use wearables, like Fitbit, to monitor a healthy lifestyle. That’s great, but what about underserved populations like the blind or visually impaired?
Technologies with object recognition are proving quite useful. For instance, Amazon has a smart display called, “Echo Show,” which has a “Show and Tell” feature. The feature uses object recognition to identify household pantry items for blind or visually impaired people. It works like this:
1) A person holds up a common pantry item in front of the camera.
2) They say, “Alexa, what am I holding?”
3) Alexa tells them the name of the object.
It sounds simple enough, but there’s way more to it than that.
Object recognition works by identifying objects in images and videos. This is made possible through computer vision and machine learning — a subset of artificial intelligence. The computer vision technique allows a computer to literally “see” and understand content. The process by which this happens involves machine learning.
With machine learning, a computer uses models and algorithms to learn for itself. It starts by recognizing features in objects (manual feature extraction) and then grouping them into a specific class.

As you can see, the computer is able to recognize different versions of one thing and categorize it correctly. Based on what it has learned from data, it can recognize patterns, and ultimately make decisions on its own. Here’s another example:

The two different types of cats are classified as “cat” and stored into an algorithm. The more cats the computer sees, the better it becomes at recognizing one.
Deep Learning vs. Machine Learning
Deep learning, a subset of machine learning, does this on a massive scale. It’s similar to machine learning but provides far greater accuracy. That’s because deep learning can involve showing a computer thousands (or millions) of images of cats and dogs until the system is able to automatically distinguish the two. Some deep learning models, such as convolutional neural networks (CNNs) make this possible by mimicking the human brain’s neural networks.

It can take enormous quantities of data to achieve deep learning. So, if you don’t possess a large quantity of training images, it may be best to use machine learning.
Just remember, no matter how you get there, if your goal is to achieve object recognition, you may be doing more than creating a cool feature. You could also be helping someone see.
References:
https://www.mathworks.com/solutions/image-video-processing/object-recognition.html