HUY NGUYEN

Computer Vision

September 22, 2024

What is Computer Vision?

Computer Vision is now a prominent application of Artificial Intelligence, relying primarily on image inputs but also utilizing other data types such as videos, depth maps, and point clouds for specific use cases. Advancements in the field, with models like Midjourney, DALL-E, Stable Diffusion, Segment Anything Model, and YOLO, have kept it at the forefront of AI research in recent years. By 2020, Computer Vision accounted for 31.7% of all published articles on arXiv, making it one of the most actively researched domains. Its growing popularity is largely attributed to the rise of Deep Learning and advances in parallel computing, which enable neural networks with millions of parameters to process an image in milliseconds. At that latency, a model can keep up with a real-time video stream of around 30 images per second.
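To make the link between per-image latency and real-time speed concrete, here is a minimal arithmetic sketch. The function names and the 20 ms latency are illustrative assumptions, not measurements of any particular model.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to process one image at a given frame rate, in milliseconds."""
    return 1000.0 / fps


def max_fps(latency_ms: float) -> float:
    """Highest frame rate a model can sustain if each image takes latency_ms."""
    return 1000.0 / latency_ms


# At 30 images per second, each image must be handled in about 33 ms.
print(f"{frame_budget_ms(30):.1f} ms per image at 30 fps")

# A hypothetical model that needs 20 ms per image would sustain 50 fps.
print(f"{max_fps(20):.0f} fps at 20 ms per image")
```

In other words, "real time" at 30 images per second is a budget of roughly 33 milliseconds per image, covering everything the system does with that image.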

What are the applications?

Computer Vision algorithms have become an integral part of daily life, appearing in various technologies we use regularly. In 2017, Apple introduced Face ID with the launch of the iPhone X, a feature that has since been included in every new iPhone model. Additionally, airports around the world have significantly streamlined boarding and verification processes through the adoption of biometric technology. Border control gates now employ automated systems powered by artificial vision algorithms, notably facial recognition and digital fingerprint matching.

ePassport gates in Heathrow Airport (Terminal 4). Source

In the realm of traffic and autonomous vehicles, Tesla, Elon Musk's automotive company, is intensifying competition among manufacturers with four distinct car models: the Model S, Model 3, Model X, and Model Y. With 8 cameras, Tesla vehicles have 360-degree vision and a maximum detection range of 250 meters. Advanced embedded computers process the data rapidly, enabling the vehicle to interpret its environment in real time. Elsewhere on the road, AI-powered visual identification systems recognize vehicles quickly, streamlining toll processing. Traffic management has also become more efficient in many countries, as computer vision algorithms can count vehicles accurately, a task that previously posed significant challenges.

Tesla Autopilot technology running in real time. Source

Yuka, developed by Julie Chapon, Benoît and François Martin in 2017, is changing many consumers’ habits. Powered by Scandit’s barcode scanning technology, the on-device application reads barcodes quickly and reliably, then gives the product a score out of 100. This gives buyers insight into the composition of any registered product and helps them judge how healthy it is. A low score indicates that the product might contain unhealthy ingredients, such as ultra-processed components, high levels of salt or sugar, or potentially harmful preservatives. Yuka is currently available in 12 countries and has over 55 million users. In the United States, the company registers around 600,000 new users monthly, with 25 products scanned every second. Source.

A customer using Yuka in a supermarket. Source

How does Computer Vision work?

The transition from analog to digital imaging occurred gradually over several decades. The first digital image sensor was developed by Willard S. Boyle and George E. Smith in 1969, based on MOS capacitor technology. After this groundbreaking development, extensive research followed, allowing camera manufacturers to release countless prototypes. About 20 years later, in 1990, the Dycam Model 1 was commercialized. It used a CCD image sensor and stored pictures digitally on a semiconductor memory card.

The arrival of digital cameras called for new image processing approaches. Darkrooms and hazardous chemicals are of no use for digital photos. In 1987, Thomas and John Knoll created the first version of what would become the iconic image-editing software, Photoshop. Unlike human vision, digital images are processed by computers or dedicated hardware devices. Images are captured by sensing devices such as cameras or radars, then processed by computers to produce a result. Essentially, a digital image is an electronic signal, and Computer Vision is a methodology for processing that image signal.
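To illustrate the idea that a digital image is just a signal a computer can process, here is a minimal sketch in plain Python. The tiny 4x6 grayscale image and the helper names (threshold, horizontal_gradient) are made up for this example; real pipelines would typically rely on a library such as NumPy or OpenCV.

```python
# A tiny 4x6 grayscale "image": each value is a pixel intensity in 0..255.
# Left half dark, right half bright (hypothetical data for illustration).
image = [
    [10, 12, 11, 200, 205, 198],
    [9, 14, 10, 202, 199, 201],
    [11, 10, 13, 197, 203, 200],
    [12, 11, 9, 201, 200, 204],
]


def threshold(img, level=128):
    """Binarize: 1 where the pixel is brighter than `level`, else 0."""
    return [[1 if p > level else 0 for p in row] for row in img]


def horizontal_gradient(img):
    """Difference between each pixel and its left neighbor (a crude edge detector)."""
    return [[row[x] - row[x - 1] for x in range(1, len(row))] for row in img]


mask = threshold(image)
edges = horizontal_gradient(image)

# The largest jump in each row marks the column where dark meets bright.
edge_columns = [max(range(len(r)), key=lambda i: abs(r[i])) + 1 for r in edges]

print(mask[0])       # [0, 0, 0, 1, 1, 1]
print(edge_columns)  # [3, 3, 3, 3]
```

Both operations are ordinary arithmetic on numbers, which is the sense in which an image is treated as a signal: thresholding separates bright from dark regions, and the gradient locates the boundary between them.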

Human vision vs Computer Vision. Source

Yann LeCun, one of the 'fathers' of Convolutional Neural Networks, demonstrating how they work. Source

Digital image processing research branches

Vision-based AI is actively researched across many branches, such as:

  • Classification
  • Semantic and Instance Segmentation
  • Object detection and tracking
  • 3D reconstruction
  • Content generation
  • Super resolution
  • Depth estimation
  • Augmented / Virtual reality
  • And many more

While research strengthens the position of digital vision in many sectors, it also makes it more popular and therefore more vulnerable. One common offensive strategy against smart vision systems is the adversarial attack, in which subtle alterations to images, often imperceptible to the human eye, cause models to misbehave. Such attacks can have serious consequences, for instance in autonomous driving, when the attacked vehicle misinterprets road signs, or in facial recognition, when fraudsters are identified as registered users. Other vulnerabilities include data poisoning, in which malicious data is introduced into the training set to corrupt the algorithm’s learning process and goes undetected if no verification procedures are in place.

Demonstration of an adversarial attack against Computer Vision. A patch printed with a carefully crafted colored pattern can make the subject invisible to vision-AI systems. Source
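The principle behind such attacks can be sketched on a toy model. The example below applies a gradient-sign perturbation (in the spirit of the Fast Gradient Sign Method) to a hypothetical 4-pixel "image" classified by a small logistic model. The weights, pixel values, and epsilon are invented for illustration, and the step is far larger than a truly imperceptible one would be. It is not the patch attack shown in the figure, only the underlying idea of nudging pixels in the direction that increases the model's loss.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability that input x belongs to class 1 under a logistic model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def sign(g):
    """Return -1, 0, or 1 depending on the sign of g."""
    return (g > 0) - (g < 0)


def fgsm_perturb(weights, bias, x, label, epsilon):
    """Shift every pixel by +/- epsilon in the direction that increases the loss.

    For the logistic (cross-entropy) loss, d(loss)/d(x_i) = (p - label) * w_i.
    """
    p = predict(weights, bias, x)
    grad = [(p - label) * w for w in weights]
    # Clip so the result stays a valid normalized pixel value in [0, 1].
    return [min(1.0, max(0.0, xi + epsilon * sign(g))) for xi, g in zip(x, grad)]


# Hypothetical toy model and 4-pixel "image" (values normalized to [0, 1]).
weights = [2.0, -3.0, 1.5, -1.0]
bias = 0.1
x = [0.6, 0.3, 0.5, 0.4]
label = 1  # the true class of x

x_adv = fgsm_perturb(weights, bias, x, label, epsilon=0.15)

print(predict(weights, bias, x))      # above 0.5: classified as class 1
print(predict(weights, bias, x_adv))  # below 0.5: now classified as class 0
```

No pixel moves by more than epsilon, yet the predicted class flips. With images of thousands of pixels, many tiny coordinated changes can add up in the same way, which is why such perturbations can stay hard to see.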

This points computer vision researchers and developers in a new direction: digital security for artificial intelligence. Methods like adversarial training, robust feature extraction, and secure gradient update mechanisms are essential to combat these risks. A proper data verification technique is also crucial to maintain the integrity and reliability of computer vision systems in sensitive environments.
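As one possible form of data verification, a training set can be fingerprinted once it has been reviewed, and every later copy checked against those fingerprints. The sketch below uses SHA-256 from Python's standard hashlib module. The file names, byte contents, and helper names are hypothetical, and a real pipeline would read the bytes from disk.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """SHA-256 digest of a sample's raw bytes, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(samples: dict) -> dict:
    """Record a fingerprint for every reviewed sample (name -> digest)."""
    return {name: fingerprint(blob) for name, blob in samples.items()}


def find_tampered(samples: dict, manifest: dict) -> list:
    """Names whose content is missing from the manifest or no longer matches it."""
    return sorted(
        name for name, blob in samples.items()
        if manifest.get(name) != fingerprint(blob)
    )


# Hypothetical reviewed training set (placeholder bytes stand in for image files).
trusted = {"cat_001.png": b"cat image bytes", "stop_sign.png": b"stop sign bytes"}
manifest = build_manifest(trusted)

# Later, before training: one file was altered and one unknown file was added.
incoming = dict(trusted)
incoming["stop_sign.png"] = b"stop sign bytes + hidden patch"
incoming["extra.png"] = b"unreviewed bytes"

print(find_tampered(incoming, manifest))  # ['extra.png', 'stop_sign.png']
```

A check like this only detects changes made after the manifest was built. It does not tell whether the reviewed data was clean in the first place, so it complements rather than replaces the review of incoming data.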