Disabled people should have equal opportunities: to be loved, to express their feelings and be understood, and to compete fairly for employment. We strongly believe that technology should be used to help people and serve the community. Hence, we aim to build a better and fairer world by removing barriers for disabled people.

Audio information is one of the essential media for communication. It connects people emotionally and conveys expressiveness. In this project, we redesign the representation of audio information so that deaf people can understand and learn natural speech through other presentation formats, such as visual or tactile form; mute people can communicate with others using synthesised expressive speech; and blind people can give accurate vocal instructions with the help of expressive component removal.

Audio information retrieval (AIR) has been a hot topic in recent years. The bottleneck of AIR research is that most current results are either not practical, not universal, or not transformative. Our project is novel in that audio features are handled semantically, so the results can serve many different applications. In our work, atomic and fundamental audio features are carefully defined. These features are extracted from large quantities of real-world audio samples using data-mining approaches, which provides a strong technological foundation. The features are then trained with machine-learning approaches and used to define high-level audio descriptors. These semantic high-level descriptors will be used for comparison, visualisation, normalisation, or synthesis, and presented in a redesigned format that disabled users can operate easily. For example, Thayer's emotional model of speech (as shown in Figure 1) can be presented in another format. This can be done with a regression model that is trained on audio parameters and performs classification, as shown in Figure 2. Both accuracy and computational complexity will be addressed. This work will contribute to the field of Human-Computer Interaction (HCI), where the back-end technologies and our target end-user communities meet in the form of identifying suitable presentation modalities, interaction strategies, visualisation methods, and the affordances of various interactive platforms. Figure 3 shows an iPhone app called SoundMitate, PI Simon Lui's recent work: a sound-imitation game in which users follow a sound-visualisation guide and speak aloud.
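As a minimal sketch (not the project's implementation) of the pipeline above, low-level audio features can be regressed onto Thayer's valence/arousal plane and the predicted coordinates classified into an emotion quadrant. The feature names, toy training data, and quadrant labels here are illustrative assumptions:

```python
# Hypothetical sketch: regress low-level audio features onto Thayer's
# valence/arousal plane, then classify the emotion quadrant.
import numpy as np

def fit_regression(X, Y):
    """Least-squares map from feature vectors X to (valence, arousal) targets Y."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

def predict(W, x):
    """Predict (valence, arousal) for a single feature vector."""
    return np.append(x, 1.0) @ W

def thayer_quadrant(valence, arousal):
    """Label the quadrant of the valence/arousal plane (one common naming)."""
    if arousal >= 0:
        return "exuberance" if valence >= 0 else "anxiety"
    return "contentment" if valence >= 0 else "depression"

# Toy training set: rows are assumed [energy, brightness] features
# extracted from audio samples; targets are hand-assigned
# (valence, arousal) labels in [-1, 1].
X = np.array([[0.9, 0.8], [0.8, 0.2], [0.1, 0.7], [0.2, 0.1]])
Y = np.array([[0.7, 0.8], [-0.6, 0.7], [0.6, -0.5], [-0.7, -0.6]])

W = fit_regression(X, Y)
valence, arousal = predict(W, np.array([0.85, 0.75]))  # a bright, energetic sample
quadrant = thayer_quadrant(valence, arousal)
```

In practice, the inputs would be real extracted features (e.g. spectral statistics) and the regressor one of the trained models described above; the least-squares fit stands in for those here.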

Figure 1. Thayer’s emotional model. How can these emotions be represented in visual or tactile format such that users find them easy to understand?

Figure 2. A 3D regression model separating emotional speech so that it can easily be represented in other formats. We added an Aggressiveness axis to the traditional Thayer model.

Figure 3. Screenshot of the sound-visualisation game SoundMitate, developed by PI Simon Lui. (Photo credit: CNN International)