Custom Computer Vision
Computer Vision is a scientific field that deals with how computers can “see” and understand digital images and videos, sometimes in real time. It tries to reproduce the human ability to see and identify objects in different places and from different angles.
Custom Vision (customvision.ai) is a cloud-based, end-to-end Artificial Intelligence service for applying computer vision to your specific scenario: you bring your own data and train your model using the power of a cloud environment.
In this example, I will be using the Microsoft cloud-based service customvision.ai to explore the process and results.
The scope
Using customvision.ai, create a database with some images, train the model, and make the model recognize different images and classify them. This is a good example of supervised machine learning (image classification).
How does it work?
The first step is to load the images and tag each image with the respective meaning.
Then you train the model to connect each image with its tag, so that the model learns what each image represents.
The last step is to evaluate the model and verify the results (% accuracy) for each classification using different photos (not the same ones used to train the model).
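The point about evaluating with photos that were not used for training can be sketched in a few lines of Python. This is only a minimal illustration, not part of the Custom Vision service: the function name split_by_tag, the 20% hold-out fraction, and the file names are all assumptions made for the example.

```python
import random


def split_by_tag(images_by_tag, holdout_fraction=0.2, seed=0):
    """Split each tag's images into a training list and a held-out evaluation list."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, evaluation = {}, {}
    for tag, images in images_by_tag.items():
        shuffled = list(images)
        rng.shuffle(shuffled)
        # Keep at least one image per tag aside for evaluation.
        n_eval = max(1, int(len(shuffled) * holdout_fraction))
        evaluation[tag] = shuffled[:n_eval]
        train[tag] = shuffled[n_eval:]
    return train, evaluation


# Hypothetical file names, for illustration only.
images_by_tag = {
    "airplane": [f"airplane_{i}.jpg" for i in range(10)],
    "train": [f"train_{i}.jpg" for i in range(10)],
}
train_set, eval_set = split_by_tag(images_by_tag)
print({tag: (len(train_set[tag]), len(eval_set[tag])) for tag in images_by_tag})
```

The training lists would be uploaded and tagged, while the evaluation lists are kept aside for the final check.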
customvision.ai/projects (you will need an Azure Subscription to access this service)
Create a new project
Adding the info of your project
Add the images
Selected images to train the model
In this case, I selected 4 different kinds of images: helicopters, airplanes, ships, and trains. The next step is to load a set of images and tag them with a specific name.
Loading airplanes <TAG>
Loading helicopters <TAG>
Loading ships <TAG>
Loading trains <TAG>
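Before uploading, it can help to keep the photos organized so that each tag is obvious. The sketch below assumes a local layout with one subfolder per tag (airplane, helicopter, ship, train). That layout and the helper name collect_images_by_tag are assumptions for illustration, not something the portal requires. The demo builds a throwaway folder so it runs without real photos.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def collect_images_by_tag(root):
    """Map each subfolder name (used as the tag) to the image files inside it."""
    images_by_tag = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        files = sorted(
            p for p in folder.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
        if files:  # skip folders that contain no images
            images_by_tag[folder.name] = files
    return images_by_tag


# Demo on a temporary folder with one empty placeholder file per tag.
with tempfile.TemporaryDirectory() as root:
    for tag in ("airplane", "helicopter", "ship", "train"):
        (Path(root) / tag).mkdir()
        (Path(root) / tag / "example.jpg").touch()
    counts = {tag: len(files) for tag, files in collect_images_by_tag(root).items()}

print(counts)
```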
After you load all the photos, train the model
Select training type
Validating the results
Precision: measures how accurate your predictions are, i.e. the percentage of your predictions that are correct.
Recall: measures how well you find all the positives. For example, we can find 70% of the possible positive cases in the top K predictions.
AP (Average Precision): summarizes precision and recall in a single number, the area under the precision-recall curve.
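To make these metrics concrete, here is a minimal sketch of how they can be computed by hand for a single tag. It uses the common textbook definitions and one common non-interpolated form of AP, which is not necessarily the exact calculation the service performs. The scores, the labels, and the 0.5 threshold are made up for the example.

```python
def precision_recall(scores, labels, threshold=0.5):
    """Precision and recall for one tag at a given probability threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(predicted, labels))        # true positives
    fp = sum(p and not l for p, l in zip(predicted, labels))    # false positives
    fn = sum((not p) and l for p, l in zip(predicted, labels))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scores, labels):
    """Mean of the precision values at each rank where a true positive appears."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Made-up scores for the tag "airplane"; True marks images that really are airplanes.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [True, False, True, True, False]
precision, recall = precision_recall(scores, labels)
ap = average_precision(scores, labels)
print(f"precision={precision:.2f} recall={recall:.2f} AP={ap:.2f}")
```

With these numbers, three images score above the threshold and two of them are real airplanes, so precision is 2/3. Two of the three real airplanes are found, so recall is also 2/3.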
You can check more details about precision, recall, and AP in this great and detailed post from Jonathan Hui:
Checking the results -> airplane
Checking the results -> train
Checking the results -> ship
Checking the results -> helicopter
Conclusion
As you can see, even when the test image is a drawing of the tagged object, the model is able to recognize it: the tag with the highest probability among the 4 different tags is the correct one.
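Reading the result comes down to picking the tag with the highest probability. A minimal sketch of that step is shown here. The probabilities are made-up values for one test image, and top_prediction is a hypothetical helper, not part of the service.

```python
def top_prediction(probabilities):
    """Return the (tag, probability) pair with the highest probability."""
    return max(probabilities.items(), key=lambda item: item[1])


# Made-up probabilities for one test image, for illustration only.
probabilities = {"airplane": 0.08, "helicopter": 0.85, "ship": 0.04, "train": 0.03}
tag, probability = top_prediction(probabilities)
print(f"{tag}: {probability:.0%}")  # prints "helicopter: 85%"
```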
Service Availability
One important note: this is a cloud-based service, and it may not be available in all regions.
The Purpose
The purpose of this work is ONLY to present a simple solution for the proposed scope using free resources (or at least free for students) available on the web.
There IS NO intention to present the most optimized solution or to showcase the best tools. It is also important to recognize that, in some cases, more steps than needed have been applied in order to test and validate different features and services.
The main idea of this project is to explore possibilities, resources, and solutions that could be an alternative to solve the presented problem/scope in an effective way.
References
Thanks! ;-)