Clear Speech Considerations for the Deaf and Hard-of-Hearing

Grant Powell
School of Behavioral and Brain Sciences, University of Texas at Dallas
ACN 6763 Speech Perception
Professor Peter Assmann
October 27, 2022

Introduction

In everyday verbal communication, the fundamental goal among humans has always been to communicate with each other as clearly as possible. We pursue this goal from the perspectives of both speaker and listener. As speakers, we convey our message through the physical manipulation of the vocal tract and through our knowledge of the overall structure of the language we speak, confident that our listener speaks and understands the same language. As listeners, with that same confidence, we attend to particular vowel and consonant sounds, pauses, spacing between words, sentence structure, and changes in pitch that indicate, for example, whether a question is being asked. Yet no matter what language we speak or how much we refine it, clear communication remains fundamentally dependent on our physical faculties: how we use the vocal tract and how well the auditory system is functioning.

Researchers have identified this goal of achieving clear communication through the vocal tract and auditory system as “clear speech.” This type of speech is especially necessary in everyday communication settings with background noise and in conversations involving listeners who are deaf or hard-of-hearing (Bradlow & Smiljanic, 2009). Because the clear speech under investigation is based primarily on how the English language is used, the modifications speakers have been observed making include speaking more slowly, speaking more loudly, articulating in a more exaggerated manner, speaking with a higher voice pitch, and employing a more variable voice pitch (Bradlow & Smiljanic, 2009; Ferguson & Quene, 2014). Researchers measure clear speech through global and segmental measurements.

Global measurements include speaking rate, pause frequency and duration, fundamental frequency average and range, long-term spectra (i.e., how spectral energy is distributed over the course of an utterance), and temporal envelope modulations (i.e., changes over time in the amplitude and frequency of the sound perceived by the listener) (Bradlow & Smiljanic, 2009). Segmental measurements include vowel formant changes (i.e., changes over the course of a vowel in the first and second formant frequencies, F1 and F2), vowel space (i.e., the two-dimensional area bounded by lines connecting the F1 and F2 coordinates of the vowel categories), segment duration, consonant-vowel ratio, voice onset time, short-term spectra (i.e., the spectrum of the speech signal at a particular point in time), sound insertion, and stop consonant burst elimination (Bradlow & Smiljanic, 2008; Berisha et al., 2013). Based on these measurements, researchers have identified the features that make clear speech clear.
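As a concrete illustration of the vowel space measurement, the area of the polygon bounded by the corner vowels can be computed with the shoelace formula over (F1, F2) coordinates. The sketch below uses rough textbook formant values for adult American English, not measurements from any of the studies cited here.

```python
# Vowel space area via the shoelace formula over (F1, F2) coordinates.
# Formant values are rough illustrative figures for adult American
# English, not data from the cited studies.

def vowel_space_area(vertices):
    """Area (Hz^2) of the polygon whose vertices are (F1, F2) pairs,
    listed in order around the perimeter (shoelace formula)."""
    n = len(vertices)
    area = 0.0
    for i in range(n):
        f1_a, f2_a = vertices[i]
        f1_b, f2_b = vertices[(i + 1) % n]
        area += f1_a * f2_b - f1_b * f2_a
    return abs(area) / 2.0

# Corner vowels /i/, /ae/, /a/, /u/ ordered around the quadrilateral.
corners = [(270, 2290), (660, 1720), (730, 1090), (300, 870)]
print(f"Vowel space area: {vowel_space_area(corners):.0f} Hz^2")
```

A larger area under this metric corresponds to the "expanded vowel space" discussed below.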

The features that make clear speech clear span a wide range of acoustic and articulatory adjustments: a decreased speaking rate involving longer segments and longer, more frequent pauses; a wider dynamic pitch range; greater sound-pressure levels; more noticeably clear stop releases; greater root-mean-square intensity of the non-silent portions of obstruent consonants (i.e., release burst, frication, or aspiration); increased energy in the 1000-3000 Hz range of the long-term spectrum; and higher voice intensity. Of all these acoustic features, the one that stands out most consistently is “vowel space expansion.”
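The increase in long-term spectral energy between 1000 and 3000 Hz can be quantified as a band-energy fraction of a power spectral density. The sketch below is one straightforward way to compute it, applied here to white noise as a stand-in for a real recording; the function name and parameters are illustrative, not taken from the cited work.

```python
import numpy as np
from scipy.signal import welch

def band_energy_fraction(x, fs, lo=1000.0, hi=3000.0):
    """Fraction of long-term spectral energy falling between `lo` and
    `hi` Hz, estimated from a Welch power spectral density."""
    freqs, psd = welch(x, fs=fs, nperseg=1024)
    band = (freqs >= lo) & (freqs <= hi)
    return psd[band].sum() / psd.sum()

rng = np.random.default_rng(1)
fs = 16000
x = rng.standard_normal(fs)  # stand-in for one second of speech
print(band_energy_fraction(x, fs))
```

For clear versus conversational tokens of the same utterance, a higher fraction in this band would reflect the spectral change described above.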

Role of Clear Speech Articulatory Cues on Older Adult Hard-of-Hearing Listeners

According to the research literature, vowel space expansion, or certain elements of it, may help listeners with hearing loss improve their speech intelligibility. The global clear speech acoustic changes mentioned earlier, speaking more slowly, more loudly, with a higher voice pitch, and with a more variable voice pitch, are also accompanied by vowel modifications such as an expanded vowel space, greater dynamic formant movement, and longer vowel durations (Ferguson & Quene, 2014). In studies in which speakers produced clear speech by speaking as if addressing someone who has difficulty understanding them, listeners such as adults with sensorineural hearing loss and adults who wear cochlear implants benefitted from the clearer speech for speech identification (Ferguson & Quene, 2014). As other studies show, however, each speaker's idea of speaking clearly varies from speaker to speaker, especially when multiple speakers are involved.

Although the benefit exists, there is no consensus on how much vowel space expansion, or which of its characteristics, improves speech intelligibility in listeners with hearing loss. For example, in a study by Ferguson (2012) in which listeners heard clear speech from multiple individual speakers, vowel intelligibility improved for elderly hearing-impaired (EHI) listeners, who were tested at a more favorable signal-to-noise ratio (SNR) than the young normal-hearing (YNH) listeners (-3 dB vs. -10 dB SNR). However, in a study by Ferguson and Kewley-Port (2002) in which only one speaker produced clear speech, EHI listeners received no benefit. There are a few reasons for this.
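The SNR conditions mentioned above refer to the speech-to-noise power ratio in decibels at which stimuli were presented. As a sketch of how such stimuli are typically constructed, the noise can be rescaled so that the mixture reaches a target SNR; the arrays here are random placeholders, not the studies' actual stimuli.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that the speech-to-noise power ratio equals
    `snr_db`, then return the mixture. Assumes equal-length arrays."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Target noise power satisfies P_s / P_n = 10^(snr_db / 10).
    target_noise_power = p_speech / (10 ** (snr_db / 10))
    return speech + noise * np.sqrt(target_noise_power / p_noise)

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)  # stand-in for a speech token
noise = rng.standard_normal(16000)   # stand-in for multitalker babble
for snr in (-3, -10):                # EHI vs. YNH presentation levels
    mixed = mix_at_snr(speech, noise, snr)
```

At -10 dB the noise power is five times greater relative to the speech than at -3 dB, which is why the YNH condition is considerably harder.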

One reason is that, of the three traditional vowel acoustic cues used to identify vowels (steady-state formant frequencies, dynamic formant movement, and vowel duration), the relative importance of the cues from the listener's perspective differed between the EHI and YNH groups (Ferguson & Quene, 2014). Clear speech acoustic changes that benefit YNH listeners may not benefit EHI listeners, or most listeners with hearing loss in general, because hearing loss alters the way acoustic cues are used to identify vowels (Ferguson & Quene, 2014). A second reason is that only 3 of the 41 speakers in the database used by Ferguson (2012) produced the same clear speech acoustic cues as the single speaker in Ferguson and Kewley-Port (2002), cues that improved vowel intelligibility only for YNH listeners. This underscores the importance of continued research into which clear speaking styles and acoustic cues benefit people with hearing loss.

The study by Ferguson and Quene (2014) emphasizes that point: their results confirmed their first hypothesis, that the relationship between acoustic characteristics and vowel intelligibility in clear and conversational speech would differ between YNH and EHI listeners. Specifically, the hypothesis predicted that EHI listeners would depend more heavily on vowel duration cues than YNH listeners (Ferguson & Quene, 2014). This was expected not for all listeners with hearing loss but mostly for older adults with hearing loss, given well-documented temporal processing deficits in older adults, including age-related decline in cognitive processing, that may make increased vowel duration a more helpful clear speech acoustic change for vowel intelligibility (Ferguson & Quene, 2014). The generalized linear mixed modeling results that confirmed this hypothesis showed that vowel duration, along with F1, had a stronger effect than F2 on vowel intelligibility for EHI listeners than for YNH listeners (Ferguson & Quene, 2014).

Longer duration was associated with better intelligibility in EHI listeners mainly for tense vowels (Ferguson & Quene, 2014). For high vowels, lower F1 values led to improved intelligibility in EHI listeners (Ferguson & Quene, 2014). Because hearing loss makes vowel formants less audible, and because age was expected to bring poorer temporal processing and slower cognitive processing, the finding that vowel duration improved intelligibility specifically for tense vowels confirmed that expectation (Ferguson & Quene, 2014). Longer vowel duration helped EHI listeners by allowing more processing time for vowel identification and by increasing the temporal contrast between spectrally similar tense-lax pairs such as /i/-/ɪ/ and /u/-/ʊ/ (Ferguson & Quene, 2014). For low vowels, with F1 values between 550 and 750 Hz, higher F1 values led to better intelligibility to the same degree for both EHI and YNH listeners (Ferguson & Quene, 2014). Along with F1 information and vowel duration, EHI listeners did use F2 information when identifying vowels, but in a different way than YNH listeners (Ferguson & Quene, 2014). Although it is generally agreed that, in English, steady-state F1 and F2 frequencies, the dynamic movement of these formants, and duration are the three acoustic cues that mainly determine vowel identity, the analysis by Ferguson and Quene (2014) also indicated that other acoustic changes not captured by their measurements influenced vowel intelligibility in EHI listeners.

Acoustic changes that may have played such a role include voice quality, fundamental frequency, formant bandwidth, and other aspects of the spectral envelope. Continued research on these changes is needed to improve our overall understanding of vowel perception, of what makes vowels more intelligible in clear speech in general, and of the extent, if any, to which they improve vowel intelligibility in listeners with hearing loss. While the study by Ferguson and Quene (2014) focuses on older adults with hearing loss, let us turn to listeners with hearing loss who are not older adults.

Role of Clear Speech Articulatory Cues on Younger Hard-of-Hearing Listeners

Berguson et al. (2015) examined vowel characteristics and the clear speech attribute of vowel space expansion in mothers' infant-directed (ID) and adult-directed (AD) speech to children with hearing loss, to determine which was clearer and more beneficial. They found that ID speech showed a more expanded vowel space area and greater vowel space dispersion than AD speech when used for children with and without hearing loss (Berguson et al., 2015). The reason is that mothers produced more distinctive point vowels in ID speech than in AD speech, especially for children with hearing loss (Berguson et al., 2015). This supports the researchers' prediction that mothers produce vowels clearly when speaking to children with and without hearing loss (Berguson et al., 2015). This knowledge, and continued research in this area, is important for determining the clear speaking style and acoustic cues that may benefit the overall growth and development of a child with hearing loss.
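Vowel space dispersion is commonly operationalized as the mean distance of vowel tokens from their grand centroid in (F1, F2) space. The sketch below illustrates that metric with hypothetical point-vowel tokens in which the ID tokens are more spread out than the AD tokens; the numbers are invented for illustration and are not data from Berguson et al. (2015).

```python
import math

def vowel_space_dispersion(tokens):
    """Mean Euclidean distance of (F1, F2) vowel tokens from their
    grand centroid, one common operationalization of dispersion."""
    n = len(tokens)
    c1 = sum(f1 for f1, _ in tokens) / n
    c2 = sum(f2 for _, f2 in tokens) / n
    return sum(math.hypot(f1 - c1, f2 - c2) for f1, f2 in tokens) / n

# Hypothetical point-vowel tokens /i/, /a/, /u/: ID speech more
# dispersed than AD speech, as the study's pattern would predict.
id_tokens = [(250, 2400), (850, 1250), (280, 800)]
ad_tokens = [(320, 2200), (750, 1300), (350, 950)]
print(vowel_space_dispersion(id_tokens) > vowel_space_dispersion(ad_tokens))
# → True
```

Greater dispersion means the vowel categories sit farther apart acoustically, which is the sense in which ID speech is "clearer" here.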

A good next step would be to gather the evidence from these and other studies in the research literature and, with continued research, apply their findings to help prevent and treat language delays in children with hearing loss (Berguson et al., 2015). This is needed because speech-language delays are common in this population and speech-language outcomes vary significantly, especially for children with hearing loss who use cochlear implants (Berguson et al., 2015). The high variability in speech and language outcomes could also be reduced by having speech-language pathologists use findings from this type of research to close gaps in the quality of maternal speech input to children with hearing loss (Berguson et al., 2015). While it is important to expand research in this area to improve the treatment of language delays in children with hearing loss, it is best to conduct that research in the manner of Berguson et al. (2015), examining relatively large samples of children with actual rather than simulated hearing loss, to make the findings more generalizable to the population.

The results of Berguson et al. (2015) also partially supported the hypothesis that point vowels in ID speech are produced in more phonologically contrastive articulatory positions than in AD speech, consistent with the hyper-articulation hypothesis of speech directed at children with hearing loss. This means that mothers' ID speech, through an expanded acoustic vowel space area and increased vowel space dispersion, may facilitate language acquisition in children with hearing loss (Berguson et al., 2015). This is beneficial because vowel space modification, according to the research literature, has been linked to enhanced speech sound discrimination and word recognition. Together with the findings mentioned earlier, this indicates the potential benefit of clinical interventions designed to shape the vowel space characteristics of mothers' speech to children with hearing loss. However, more research is still needed, because most of the evidence in the literature indicates that the nature of phonetic changes in ID speech is complex, and there is not enough consistent evidence that such changes would benefit the learning of phonetic categories (Berguson et al., 2015). Berguson et al. (2015) also found that speech directed to children with cochlear implants showed increased F2 frequencies for the vowels /i/ and /a/ relative to speech to groups matched on chronological age (for /a/) and on hearing experience (for /i/). With hearing aids, speech directed to children showed an increase only in F1 frequencies for /i/ relative to the group matched on hearing experience (Berguson et al., 2015).

This means that prosodic differences such as slowed rate and prosodic position could be playing a role between ID and AD speech, and future research should assess the influence of these factors on language acquisition in children with hearing loss as a function of the degree of hearing loss and the type of assistive device used (Berguson et al., 2015). None of these findings, however, would have been evident without assistive hearing devices such as cochlear implants (CIs) and hearing aids (HAs), which are designed to help listeners with hearing loss hear speech clearly.

Role of Assistive Hearing Technology on Clear Speech

The improved speech intelligibility among listeners with HAs and CIs evident in the studies described above might well not have been possible without these assistive devices, which are designed to capture the temporal fine structure (TFS) and temporal envelope (ENV) of the speech signal and deliver them to the cochlea along its basilar membrane (Hong & Moon, 2014). The many acoustic cues the human auditory system uses to interpret and understand speech, as pointed out so far, are classified by their temporal and spectral properties (Hong & Moon, 2014). Hong and Moon (2014) define the ENV as the slow variation in the amplitude of the speech signal over time and the TFS as the rapid oscillations at a rate close to the center frequency of the band. Both kinds of information are represented in the auditory system by the timing of neural discharges (Hong & Moon, 2014). They are also vital for speech perception in quiet and noisy backgrounds, the TFS especially, which has been identified as most important for pitch perception and sound localization in experiments using auditory “chimeras,” stimuli constructed by combining the ENV of one sound with the TFS of another to determine the relative perceptual importance of the ENV and TFS in different acoustic settings (Hong & Moon, 2014).
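The ENV/TFS decomposition described by Hong and Moon (2014) is standardly computed from the analytic signal given by the Hilbert transform: the envelope is its magnitude and the fine structure is the cosine of its phase. The sketch below applies this to a synthetic amplitude-modulated tone rather than to real speech.

```python
import numpy as np
from scipy.signal import hilbert

# Decompose a narrowband signal into temporal envelope (ENV) and
# temporal fine structure (TFS) using the analytic signal. The test
# signal is a 500 Hz carrier (fine structure) modulated at 4 Hz
# (envelope); real band-filtered speech would be handled the same way.
fs = 16000
t = np.arange(0, 0.05, 1 / fs)
signal = (1 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.cos(2 * np.pi * 500 * t)

analytic = hilbert(signal)
env = np.abs(analytic)            # slow amplitude variation (ENV)
tfs = np.cos(np.angle(analytic))  # rapid oscillation near 500 Hz (TFS)
reconstructed = env * tfs         # ENV x TFS recovers the signal
```

A chimera in the sense above would pair the `env` of one sound with the `tfs` of another before multiplying them back together.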

Another reason the TFS is important for pitch perception is that listeners with cochlear hearing loss, or who use CIs, have poor pitch perception: they have trouble separating simultaneous sounds in noisy environments on the basis of their overall pitch qualities (Hong & Moon, 2014). These listeners may therefore run into situations where the fundamental frequency, F0, and its perceptual correlate, pitch, of simultaneous sounds are perceived as the same, or as a single entity (Hong & Moon, 2014). What separates the TFS from the ENV, performance-wise, from the listener's perspective is that, while the ENV is enough to support speech intelligibility in quiet environments, it is not enough to fully perceive pitch where there is background noise (Hong & Moon, 2014).

This is because the ENV by itself is not enough to perceptually separate mixtures of sound; as a result, the TFS is necessary for speech perception in noisy environments, especially fluctuating noise where multiple speakers talk at once (Hong & Moon, 2014). For example, in one study from the research literature, listeners with normal hearing showed an improvement in simulated speech reception threshold of about 15 dB in a noisy environment when more TFS information was added, confirming the significant role the TFS plays in a listener's ability to identify speech against a noisy, fluctuating background (Hong & Moon, 2014). With this knowledge, researchers have developed strategies for delivering TFS information to listeners with cochlear hearing loss who wear either a CI or an HA.

CI users face limitations such as reduced sensitivity to temporal modulation in electric hearing and an inability to process changes in the repetition rate of the electric waveform above approximately 300 Hz, a serious constraint given that the TFS typically oscillates at a much higher rate. To overcome these limitations, CI specialists have employed the HiRes strategy, which uses a relatively high envelope cutoff frequency and pulse rate to improve the delivery of TFS information (Hong & Moon, 2014). They have also implemented a strategy that adds a frequency modulation (FM) signal by transforming the rapidly varying TFS into a slowly varying FM signal; this has been found to improve sentence recognition in CI users by as much as 71% in babble noise (Hong & Moon, 2014). With HAs, TFS information is delivered through non-linear compression, in which the gain applied to a signal is inversely related to the input level: intense sounds are amplified less than weak sounds (Hong & Moon, 2014). Although studies from the research literature have suggested that slow compression is better by measures of listening comfort and fast compression better by measures of speech intelligibility, HA users with good sensitivity to TFS information may benefit more from fast compression (Hong & Moon, 2014). This is because TFS information supports listening in the dips of a noisy, fluctuating background, and fast compression increases the audibility of signals in those dips (Hong & Moon, 2014). However, when normal-hearing listeners were tested with vocoded signals, speech intelligibility improved significantly more with fast compression than with slow compression regardless of condition (Hong & Moon, 2014). This suggests that the availability of TFS information does not affect the optimal compression speed, a conclusion that should be confirmed, if it has not been already, by testing listeners with actual cochlear hearing loss (Hong & Moon, 2014).
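The input-gain relationship of non-linear compression can be sketched as a simple static gain rule: full gain below a knee point, and above it only a fraction of each additional input decibel passed through to the output. The knee, compression ratio, and maximum gain below are illustrative values, not parameters from the article or from any real fitting formula.

```python
def compression_gain(input_db, knee_db=45.0, ratio=3.0, max_gain_db=30.0):
    """Gain (dB) of a simple wide-dynamic-range compressor: full gain
    below the knee; above it, each 1 dB increase in input yields only
    1/ratio dB increase in output. All parameter values are
    illustrative, not taken from the cited article."""
    if input_db <= knee_db:
        return max_gain_db
    # Above the knee, output grows at 1/ratio the rate of the input.
    return max_gain_db - (input_db - knee_db) * (1.0 - 1.0 / ratio)

# Weak sounds receive more gain than intense sounds.
print(compression_gain(40))  # below knee -> 30.0 dB
print(compression_gain(70))  # above knee -> reduced gain
```

The "fast" versus "slow" distinction discussed above concerns how quickly this gain tracks level changes over time; a fast compressor reaches the higher gain during brief dips in a fluctuating background, boosting the audibility of those dips.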

Conclusion

Overall, qualities of the vowel space expansion consistently found in clear speech, such as longer vowel duration, have proven to be important acoustic cues for speakers to emphasize when speaking to older adult listeners with hearing loss. This is especially so given the cognitive and temporal processing declines that begin as listeners with hearing loss enter older adulthood. Since clear speech acoustic cues besides vowel duration may also have improved vowel intelligibility but were not measured, continued research is needed to improve our overall understanding of vowel perception. Such research can help determine exactly which unmeasured acoustic qualities make vowels more intelligible in clear speech and may benefit listeners with hearing loss. Further research in this area would also greatly benefit young listeners with hearing loss.

Understanding the clear speech acoustic cues associated with vowel space expansion that benefit young listeners with hearing loss can allow speech-language pathologists to better guide a mother's speech to her child and thereby improve the child's overall growth and development in speech and language. Finally, better knowledge gained from continued research on clear speech acoustic cues such as vowel space expansion can help CI and HA specialists continue to develop better strategies for delivering TFS information through assistive hearing devices. This can allow listeners with hearing loss to hear speech clearly by giving them access to the acoustic cues that make clear speech clear, improving their speech intelligibility.

References

Berguson, T. R., Burnham, E. B., Kondaurova, M., & Wieland, E. A. (2015). Vowel space characteristics of speech directed to children with and without hearing loss. Journal of Speech, Language, and Hearing Research, 58, 254-267.

Berisha, V., Liss, J. M., Utianski, R. L., Sandoval, S., & Spanias, A. (2013). Automatic assessment of vowel space area. Journal of the Acoustical Society of America, 134(5). https://doi.org/10.1121/1.4826150

Bradlow, A. R., & Smiljanic, R. (2009). Speaking and hearing clearly: Talker and listener factors in speaking style changes. Language and Linguistics Compass, 3(1), 236-264. https://doi.org/10.1111/j.1749-818X.2008.00112.x

Ferguson, S. H., & Quene, H. (2014). Acoustic correlates of vowel intelligibility in clear and conversational speech for young normal-hearing and elderly hearing-impaired listeners. Journal of the Acoustical Society of America, 135(6).

Hong, S. H., & Moon, I. J. (2014). What is temporal fine structure and why is it important? Korean Journal of Audiology, 18(1), 1-7. http://dx.doi.org/10.7874/kja.2014.18.1.1