Object classification through a single-pixel detector
by Maxim Batalin
This article was originally published in Biophotonics.World
Machine vision systems are used in a wide range of applications, including self-driving cars, intelligent manufacturing, robotic surgery and biomedical imaging. Most of these systems rely on lens-based cameras: an image or video is captured, typically with a few megapixels per frame, and a digital processor then performs machine learning tasks such as object classification and scene segmentation. This traditional machine vision architecture has several drawbacks. First, the large volume of digital data makes high-speed image and video analysis difficult, especially on mobile and battery-powered devices. Second, the captured images usually contain redundant information, which burdens the digital processor with unnecessary computation and increases power and memory requirements. Third, beyond visible wavelengths, fabricating high-pixel-count image sensors like those in mobile phone cameras is challenging and expensive, which limits the use of standard machine vision methods at longer wavelengths, such as the terahertz part of the spectrum.
UCLA researchers have reported a new single-pixel machine vision framework designed to mitigate these shortcomings and inefficiencies. They used deep learning to design optical networks made of successive diffractive surfaces, which perform computation and statistical inference as the input light passes through these specially designed, 3D-fabricated layers. Unlike a regular lens-based camera, such a diffractive optical network processes the incoming light at selected wavelengths to extract the spatial features of an input object and encode them onto the spectrum of the diffracted light, which is then collected by a single-pixel detector. Each object type, or class of data, is assigned to a different wavelength of light. The input objects are therefore classified optically, using only the output spectrum measured by a single pixel, without the need for an image-sensor array or a digital processor. Coupling a diffractive network to a single-pixel detector in this way enables all-optical inference and machine vision, with major advantages in frame rate, memory requirements and power efficiency that are especially important for mobile computing applications.
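To make the idea of spectral encoding more concrete, here is a minimal numerical sketch of how one set of physical layers can produce a different single-pixel power at each wavelength. It assumes scalar diffraction, thin phase-only layers whose phase delay is 2π(n − 1)h/λ with a fixed, dispersion-free refractive index, and angular spectrum propagation between layers; none of these modeling choices, names or parameter values (grid size, pixel pitch, layer spacing, wavelengths, `n_material`, detector size) are taken from the paper. In the actual work the layers are optimized with deep learning; here the layer heights are random and untrained, so the predicted class is arbitrary and only the readout mechanics are illustrated.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, dx, distance):
    """Propagate a square complex 2D field over `distance` in free space
    using the angular spectrum method (evanescent components are dropped)."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx, indexing="ij")
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    propagating = arg > 0
    kz = 2 * np.pi * np.sqrt(np.where(propagating, arg, 0.0))
    transfer = np.where(propagating, np.exp(1j * kz * distance), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

def single_pixel_spectrum(obj, heights, wavelengths, dx, distance,
                          n_material=1.7, pixel_halfwidth=2):
    """Return the power collected by a small central square 'pixel'
    at each wavelength after the light passes through all layers."""
    powers = []
    c = obj.shape[0] // 2
    for wl in wavelengths:
        field = obj.astype(complex)  # amplitude-encoded input object
        for h in heights:            # one height map per diffractive layer
            field = angular_spectrum_propagate(field, wl, dx, distance)
            # Thin-element phase delay; material dispersion is ignored.
            phase = 2 * np.pi * (n_material - 1) * h / wl
            field = field * np.exp(1j * phase)
        field = angular_spectrum_propagate(field, wl, dx, distance)  # to detector plane
        window = field[c - pixel_halfwidth:c + pixel_halfwidth + 1,
                       c - pixel_halfwidth:c + pixel_halfwidth + 1]
        powers.append(np.sum(np.abs(window) ** 2))
    return np.array(powers)

# Hypothetical setup: 64x64 grid, 0.4 mm pitch, 30 mm spacing, 3 layers,
# and ten wavelengths, one per class 0..9.
rng = np.random.default_rng(0)
N, dx, d = 64, 0.4e-3, 30e-3
wavelengths = np.linspace(0.8e-3, 1.2e-3, 10)
heights = [rng.uniform(0, 1e-3, (N, N)) for _ in range(3)]  # untrained layers
obj = np.zeros((N, N))
obj[24:40, 30:34] = 1.0  # a crude vertical stroke, loosely like a "1"

spectrum = single_pixel_spectrum(obj, heights, wavelengths, dx, d)
predicted_class = int(np.argmax(spectrum))  # class = strongest wavelength
```

The design choice worth noting is that the same layers act differently at each wavelength, both through the wavelength-dependent phase delay and through diffraction itself, which is what lets a single detector reading per wavelength carry class information once the layers are trained.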
In a study published in Science Advances, a journal of the American Association for the Advancement of Science (AAAS), the UCLA researchers experimentally demonstrated their framework at terahertz wavelengths by classifying images of handwritten digits using a single-pixel detector and 3D-printed diffractive layers. Each of ten wavelengths was assigned to one digit class (0 through 9), and each input object (a handwritten digit) was classified optically according to which of these wavelengths produced the maximum signal at the detector. Despite the use of a single-pixel detector, an optical classification accuracy of more than 96% was achieved. The experimental proof-of-concept study with 3D-printed diffractive layers showed close agreement with the numerical simulations, demonstrating the efficacy of the single-pixel machine vision framework for building low-latency, resource-efficient machine learning systems. Beyond object classification, the researchers connected the same single-pixel diffractive optical network to a simple, shallow electronic neural network that rapidly reconstructs images of the input objects from only the power detected at the ten distinct wavelengths, demonstrating task-specific image decompression.
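The classification rule described here reduces to an argmax over ten detector readings. The following sketch assumes the ten powers per input have already been measured; how the diffractive layers produce them is not modeled. For the reconstruction step it swaps in a ridge-regression linear decoder as a deliberately simpler, hypothetical stand-in for the shallow electronic neural network used in the study. The function names and the `ridge` parameter are illustrative only.

```python
import numpy as np

NUM_CLASSES = 10  # one wavelength per digit class 0..9

def classify_by_max_power(powers):
    """powers: array (num_samples, NUM_CLASSES) of single-pixel power at
    each class wavelength. Returns the index of the strongest wavelength."""
    return np.argmax(powers, axis=1)

def classification_accuracy(powers, labels):
    """Fraction of samples whose strongest wavelength matches the label."""
    return float(np.mean(classify_by_max_power(powers) == labels))

def fit_linear_decoder(powers, images, ridge=1e-3):
    """Hypothetical stand-in for the shallow electronic network: ridge
    regression from the spectral powers (plus a bias) to flattened images."""
    X = np.hstack([powers, np.ones((powers.shape[0], 1))])
    Y = images.reshape(images.shape[0], -1)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)

def decode(powers, W, image_shape):
    """Reconstruct images of shape `image_shape` from spectral powers."""
    X = np.hstack([powers, np.ones((powers.shape[0], 1))])
    return (X @ W).reshape((powers.shape[0],) + tuple(image_shape))

# Two made-up measurements (not data from the study).
powers = np.array([[0.1, 0.9, 0.2, 0.0, 0.1, 0.0, 0.3, 0.1, 0.0, 0.2],
                   [0.5, 0.1, 0.1, 0.7, 0.0, 0.2, 0.1, 0.0, 0.1, 0.0]])
labels = np.array([1, 3])
print(classify_by_max_power(powers))            # [1 3]
print(classification_accuracy(powers, labels))  # 1.0
```

A linear decoder can only recover image content that is a linear function of the ten measured powers; the study's shallow neural network is presumably more expressive, so this sketch illustrates the data flow (ten numbers in, an image out) rather than the reported reconstruction quality.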
This single-pixel object classification and image reconstruction framework could pave the way for new machine vision systems that use spectral encoding of object information to perform a specific inference task in a resource-efficient manner, with low latency, low power consumption and a low pixel count. The framework can also be extended to various spectral-domain measurement systems, such as optical coherence tomography and infrared spectroscopy, to create fundamentally new 3D imaging and sensing modalities that integrate diffractive network-based encoding of spectral and spatial information.
This research was led by Professor Aydogan Ozcan, associate director of the California NanoSystems Institute (CNSI) and Volgenau Chair for Engineering Innovation in the Electrical and Computer Engineering (ECE) department at UCLA, along with Professor Mona Jarrahi, director of the Terahertz Electronics Laboratory at UCLA. The other authors of this work are graduate students Jingxi Li, Deniz Mengu, Yi Luo, Xurong Li and Muhammed Veli, postdoctoral senior researcher Dr. Nezih T. Yardimci and Adjunct Professor Dr. Yair Rivenson, all with the ECE department at UCLA.
Link to paper: https://advances.sciencemag.org/content/7/13/eabd7690