CDIO Oticon

Project description

Modern hearing aids (HAs) have greatly improved the quality of life of people with impaired hearing. However, HAs still largely fall short in noisy environments. The main issue is that a hearing aid amplifies all sounds, including the unwanted ones. The difficulty of picking out one voice among many competing sounds is known as the Cocktail Party Problem and was described as early as 1953. Luckily, sensor technology has come a long way since then. It would be desirable for the HA to have increased awareness of sound sources and other landmarks in the surroundings. This knowledge could be used to separate the wanted sounds from the unwanted ones, perhaps simply by gazing at the desired target. With state-of-the-art eye trackers, inertial measurement units, cameras and microphones, the idea of using sensor fusion for hearing aid control is promising.

Project Goals

Create a simulation environment for the system.
Develop a method for depth estimation.
Develop a SLAM algorithm that can map speaker positions.

Subsystems

The project was divided into three subsystems, one for each of the project goals, and the project members were split between them.

Simulation Environment

One of the major advantages of a simulation environment is that algorithms can be developed more efficiently, since it simplifies trial and error and shortens each iteration. A robust simulation environment makes future development easier for several reasons. Firstly, it gives total control of the setup and the noise sources involved, which lets researchers simulate scenarios that would be difficult or costly to stage in reality. Secondly, it makes the researchers practically independent of the hardware during software development. This also makes it possible to estimate the impact of additional sensors before buying them.

Distance Perception

Two different techniques for distance estimation have been investigated. The first is based on the eye tracking provided by the glasses. The relative angle between the gaze vectors of the two eyes changes with the distance to the object being looked at. This is called vergence and can be used to estimate the distance to targets at close range. The second technique identifies objects of roughly known size, such as faces, in the video stream and calculates the distance to them from their apparent size in the camera image.
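Both techniques reduce to simple geometry, which the minimal sketch below illustrates. It is not the project's implementation: the function names, the assumption that the target lies straight ahead of the user (symmetric vergence), the pinhole camera model and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def distance_from_vergence(ipd_m, vergence_rad):
    """Distance to the fixation point from the vergence angle.

    Assumes (hypothetically) that the target is straight ahead, so the two
    gaze vectors and the line between the eyes form an isosceles triangle.
    ipd_m is the distance between the eyes in metres.
    """
    if vergence_rad <= 0.0:
        raise ValueError("gaze vectors must converge (angle > 0)")
    return (ipd_m / 2.0) / math.tan(vergence_rad / 2.0)


def distance_from_apparent_size(focal_px, real_size_m, size_px):
    """Distance to an object of known size under a pinhole camera model.

    focal_px is the focal length in pixels, real_size_m the assumed real
    width of the object and size_px its measured width in the image.
    """
    if size_px <= 0.0:
        raise ValueError("apparent size must be positive")
    return focal_px * real_size_m / size_px


# Illustrative values only: 63 mm between the eyes, target 1 m away.
ipd = 0.063
angle_at_1m = 2.0 * math.atan((ipd / 2.0) / 1.0)
d_vergence = distance_from_vergence(ipd, angle_at_1m)

# Illustrative values only: a 0.15 m wide face seen as 100 px wide
# by a camera with a focal length of 1000 px.
d_face = distance_from_apparent_size(1000.0, 0.15, 100.0)
```

The vergence angle shrinks roughly in proportion to the inverse of the distance, so at longer ranges a small error in the measured angle gives a large error in the estimated distance. This is why the vergence technique is suited to close range, while the apparent-size technique depends instead on how well the real size of the object is known.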

Simultaneous Localization And Mapping (SLAM)

The purpose of SLAM is to give a better understanding of the environment in which the user is located. SLAM is an algorithm that simultaneously estimates the positions of landmarks in an environment and its own position and orientation relative to those landmarks. A graphical interface is used to display the user's position and the speakers' positions relative to the user. This situational awareness could be a cornerstone in creating sensor-fusion-controlled HAs.
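The relation between the user's pose and a speaker's position can be sketched as two coordinate transforms. This is only a geometric illustration under assumed conventions (a 2D map, heading measured counter-clockwise from the x-axis, a range-and-bearing observation of each speaker), not the project's SLAM algorithm, which also has to estimate the pose itself and handle measurement noise. All function names are hypothetical.

```python
import math


def speaker_to_map(user_x, user_y, user_heading, rng, bearing):
    """Place an observed speaker in the map frame.

    rng is the measured distance to the speaker and bearing the angle to it
    relative to the user's heading (both assumed measurement conventions).
    """
    angle = user_heading + bearing
    return (user_x + rng * math.cos(angle), user_y + rng * math.sin(angle))


def speaker_to_user_frame(user_x, user_y, user_heading, spk_x, spk_y):
    """Express a mapped speaker position relative to the user.

    The result has x pointing along the user's heading and y to the left,
    which is the view a user-centred display would need.
    """
    dx, dy = spk_x - user_x, spk_y - user_y
    c, s = math.cos(-user_heading), math.sin(-user_heading)
    return (c * dx - s * dy, s * dx + c * dy)


# Illustrative values only: user at (1, 2) facing along +y,
# speaker observed 2 m straight ahead.
spk_map = speaker_to_map(1.0, 2.0, math.pi / 2.0, 2.0, 0.0)
spk_rel = speaker_to_user_frame(1.0, 2.0, math.pi / 2.0, *spk_map)
```

In a full SLAM filter, the first transform would initialise a new landmark and the second would form the predicted measurement that is compared with each new observation.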

Results

By combining distance perception and SLAM, a map of the user and speakers in an environment has been successfully created. This works for both static and semi-dynamic scenarios, on real data as well as on data from the simulation environment. Future efforts could focus on the real-time aspects, making sure the algorithms run smoothly during a live test or when fed live data from the simulation environment.

Achievements

A simulation environment that can simulate different scenarios with a moving user and moving speakers.
A distance perception module that can estimate the distance to objects in the scene.
A SLAM module with a graphical interface to estimate and present the states of the user and speakers.