This project is a cooperation between Oticon AB, a company developing hearing healthcare solutions, and Linköping University, where a yearly CDIO project conducts research in this field. This year, the project will continue developing the modules created by previous groups while researching new implementations within simulation technology, face analysis and sound filtering. New sensor information should improve the precision of the existing Simultaneous Localization And Mapping (SLAM) module. View the results below or on the poster under Documents.
Background
Modern hearing aid controllers are severely challenged in certain environments.
Modern hearing aid devices are not passive amplifiers, but utilize sophisticated digital signal processing to improve hearing for a person with hearing loss. Such processing includes noise reduction and microphone directionality control, i.e. beamforming. However, a hard problem in the research field is the so-called "cocktail party problem", which is encountered when multiple sound sources in the world present a complex mixture of sounds, requiring the brain to separate out and focus on individual sources in the mix. This project aims to utilize advances in modern technologies such as eye tracking glasses and computer vision to gain an understanding of the scene. It is conceivable that this information can be used in a hearing aid product to allow the user to focus on the relevant sounds.
The system currently uses an eye tracking system which aids the hearing aid controller. The system uses an IMU and image processing to estimate the user's head orientation. By comparing a detected face to a predetermined average face size, the system can calculate the approximate distance of each person relative to the user. This estimate is compared with the distance calculated from the user's gaze vectors to further improve the distance estimate. The information is then processed by a SLAM module to give a good approximation of the positions of the people surrounding the user. Expanding on this work, the project aims to introduce a face mesh to extract more detailed information from the environment. While the current system operates on image analysis alone, introducing modules capable of sound analysis will be paramount for future improvements.
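The two distance cues described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the face-size cue follows the standard pinhole-camera relation, the gaze cue takes the convergence point of the two gaze rays, and the focal length, average face width and fusion weight are assumed placeholder values.

```python
import numpy as np

AVG_FACE_WIDTH_M = 0.15   # assumed average real face width (metres)
FOCAL_LENGTH_PX = 900.0   # assumed camera focal length (pixels)

def distance_from_face(face_width_px: float) -> float:
    """Pinhole model: distance = focal_length * real_width / pixel_width."""
    return FOCAL_LENGTH_PX * AVG_FACE_WIDTH_M / face_width_px

def distance_from_gaze(o_l, d_l, o_r, d_r):
    """Distance to the convergence point of the two gaze rays,
    taken as the midpoint of the shortest segment between them."""
    o_l, d_l = np.asarray(o_l, float), np.asarray(d_l, float)
    o_r, d_r = np.asarray(o_r, float), np.asarray(d_r, float)
    d_l = d_l / np.linalg.norm(d_l)
    d_r = d_r / np.linalg.norm(d_r)
    w = o_l - o_r
    b = d_l @ d_r
    denom = 1.0 - b * b
    if denom < 1e-9:               # near-parallel rays: no convergence
        return None
    t_l = (b * (d_r @ w) - (d_l @ w)) / denom
    t_r = ((d_r @ w) - b * (d_l @ w)) / denom
    p = 0.5 * ((o_l + t_l * d_l) + (o_r + t_r * d_r))
    return float(np.linalg.norm(p - 0.5 * (o_l + o_r)))

def fuse_distances(d_face, d_gaze, w_face=0.5):
    """Simple weighted fusion of the two independent estimates."""
    if d_gaze is None:
        return d_face
    return w_face * d_face + (1.0 - w_face) * d_gaze
```

For example, with the assumed focal length a 67.5-pixel-wide face corresponds to roughly two metres, and gaze rays from two eye positions aimed at the same point converge on the same distance.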
Parallel to this, a simulation environment has been created. It contains movable cylindrical characters with 2D face images pasted at the top to give a simple replica of a human. To aid analysis software that is to analyse both video and sound, the project will further develop the simulation environment to include sound. An easily operated GUI will also be introduced to the simulation environment.
With a face mesh introduced, the system can now determine the location and direction of people's faces in the user's field of view. New modules, based on sound and camera data, were created that classify whether a person is talking or not. The simulation environment now runs with a user-friendly menu and with some sound physics. A module that tracks features in the environment was also created. All of this serves the long-term goal of better understanding the environment, with the results eventually being integrated into the SLAM module.
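A talking classifier that combines a sound cue with a camera cue could, in its simplest form, look like the sketch below. This is an illustrative toy under stated assumptions, not the project's module: the audio cue is a short-term energy threshold, the visual cue is the variance of a normalised lip-opening measurement (e.g. derived from face-mesh landmarks), and the thresholds are arbitrary placeholders.

```python
import numpy as np

def audio_is_speech(frame: np.ndarray, energy_thresh: float = 1e-3) -> bool:
    """Crude voice-activity cue: mean short-term energy above a threshold."""
    return float(np.mean(frame ** 2)) > energy_thresh

def lips_are_moving(mouth_openings: np.ndarray, var_thresh: float = 1e-4) -> bool:
    """Visual cue: variance of the normalised lip opening over recent
    video frames (vertical lip distance divided by face height)."""
    return float(np.var(mouth_openings)) > var_thresh

def is_talking(audio_frame: np.ndarray, mouth_openings: np.ndarray) -> bool:
    """Label a person as talking only when both cues agree."""
    return audio_is_speech(audio_frame) and lips_are_moving(mouth_openings)
```

Requiring both cues to agree makes the toy classifier robust to the obvious failure cases: background noise without lip movement, and silent lip movement such as chewing.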