Presentation Summary on Text Captioning, Background Noise, Hearing Loss, and Speech Memory

Summary


The research article titled “Text Captioning Buffers Against the Effects of Background Noise and Hearing Loss on Memory for Speech” by Brennan R. Payne, Jack W. Silcox, Hannah A. Crandell, Amanda Lash, Sarah Hargus Ferguson, and Monika Lohani tested the following research hypothesis: “The presentation of realistic, assistive text-captioned speech will help offset the effects of background noise and hearing impairment on measures of speech memory involving immediate recall and delayed recognition memory accuracy, and delayed recognition confidence.” There were two sets of independent variables. The first was the captioning variable, with two levels: “without text captioning” and “with text captioning.” The second was the noise-level variable, with three levels: “quiet,” “signal-to-noise ratio (SNR) of +7 dB,” and “SNR of +3 dB.” Crossing these independent variables produced a 2 x 3 factorial within-subjects design: 2 (caption vs. no caption) x 3 (no noise vs. +7 dB SNR vs. +3 dB SNR). The dependent (measured) variables were “immediate speech recall accuracy,” “delayed sentence recognition memory accuracy,” and “sentence recognition confidence.” The researchers had several reasons for testing these independent and dependent variables.
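The 2 x 3 within-subjects design described above can be sketched as a simple condition grid. The condition labels below are illustrative placeholders, not the authors' actual stimulus coding:

```python
from itertools import product

# Hypothetical labels for the two within-subjects factors
captioning = ["no_caption", "caption"]            # factor 1: 2 levels
noise = ["quiet", "SNR_+7dB", "SNR_+3dB"]         # factor 2: 3 levels

# Crossing the factors yields the six cells of the 2 x 3 design;
# in a within-subjects design, every participant experiences all of them.
conditions = list(product(captioning, noise))
assert len(conditions) == 2 * 3  # 6 conditions

for cap, snr in conditions:
    print(cap, snr)
```

Because every participant contributes data to all six cells, each person serves as their own control across caption and noise conditions.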

One reason they tested these variables is that sensorineural hearing loss is increasingly common among the public; it has become the third most prevalent chronic medical condition in older adults. This type of hearing loss makes it difficult to understand speech, and it affects speech comprehension and memory during conversation and listening. A second reason is that this type of hearing loss, in a noisy environment, can impose an additional cognitive load that interferes with successful memory encoding. A good way to appreciate the difficulty people with hearing loss have with memory encoding of speech in noisy environments is to consider P. M. Rabbitt’s 1968 study, “Channel-capacity, Intelligibility and Immediate Recall.”

Rabbitt presented the first half of a list of spoken digits to young adult listeners with no hearing loss in quiet, then introduced background noise for the second half of the list. Afterward, memory recall for the first half of the list was poorer than recall for the second half. Even though the first half was heard clearly and only the second half was acoustically degraded, the noise interfered with the cognitive processes required for successful memory encoding of the first half. In other words, while the first half of the list was being held in short-term memory, awaiting rehearsal in working memory for later consolidation into long-term memory, the listener was preoccupied with the effort of understanding the second half, to the point that opportunities to encode the first half were lost. While this illustrates the difficulty people with hearing loss face in noisy environments, the research literature also offered a promising aid for dealing with noise, which served as a third reason for testing the variables in the text-captioning study.

The research literature has found that speechreading, also known as lip-reading, helps improve speech comprehension in noisy environments. Similarly, orthographic cues from text presented simultaneously with speech were found to provide a secondary, direct channel for word recognition. In other words, if an assistive visual signal like speechreading can help people, how much more might text captioning help resolve difficulties with speech understanding? This also speaks to the need to expand the research literature on the subject, which is the fourth reason for testing the variables in the study.

Expanding the limited research literature on text captioning can provide a better understanding of the cognitive mechanisms behind its benefits. For example, the researchers of the text-captioning article state, “Text cues may modulate the perception of degraded speech,” but “it is less clear what downstream benefits this improved clarity has on adults’ subsequent speech memory.” Finally, expanding the literature can help validate and confirm the ecological validity of studies with generalizable, real-world applications, such as “Real-time Captioning for Improving Informed Consent: Patient and Physician Benefits,” in which real-time text captioning helped patients retain more information during simulated informed consent. Now, let’s look at how the main research study on text captioning was conducted.

The study was split into two experiments: the research hypothesis was tested on young adult listeners without hearing loss in the first experiment and on a “cohort of older adults with a wide range of hearing acuity” in the second. Forty-eight young adults participated in the first experiment, although two subjects were excluded from further analysis because they did not complete the entire protocol. Thirty-one older adults participated in the second experiment. However, because of their wide age range (61 to 80 years), variation in hearing loss and acuity levels, and inconsistent, nontransparent self-reports of overall hearing ability, a hearing assessment, vision screening, cognitive assessment, and auditory control task had to be conducted.

The older adults were found to have, on average, slight (16-25 dB) to mild (26-40 dB) hearing loss. None of them showed signs of dementia or Alzheimer’s disease. All of them could see the captions without difficulty, and they demonstrated the ability to understand speech at the required threshold in the two noise environments, the SNRs of +3 dB and +7 dB. Despite the different participant groups, both experiments were conducted in the same manner. Ninety propositionally dense sentences, each eighteen words long and covering diverse topics in science, nature, and history, were presented in two blocks of forty-five sentences: one block without text captioning and the other with it. Both blocks spanned the three noise environments of no noise and the two SNR levels of +7 and +3 dB. For immediate speech recall, the sentence (and its caption, when present) disappeared from the screen 1,000 ms (1 second) after the offset of the final word, and 5,000 ms (5 seconds) later the subjects verbalized aloud as much of the sentence as they remembered. For delayed recognition memory, after each block of forty-five sentences, subjects were shown a set of forty-two sentences and indicated whether they had heard each one. Finally, for recognition memory confidence, they rated their confidence in each response on a Likert scale of 1 to 5 (1 = no confidence, 5 = complete confidence). The data collected led to several findings.
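The trial timing described above can be sketched as a small timeline helper. The constant values come from the summary; the function and field names are assumptions for illustration, not the authors' code:

```python
# Illustrative timeline for one sentence trial, based on the procedure
# summarized above; names and structure are assumptions, not the authors' code.
OFFSET_TO_SENTENCE_CLEAR_MS = 1_000  # sentence cleared 1 s after final-word offset
RETENTION_INTERVAL_MS = 5_000        # recall prompt a further 5 s later

def trial_timeline(final_word_offset_ms: int) -> dict:
    """Return key event times (in ms from trial start) for one trial."""
    clear_ms = final_word_offset_ms + OFFSET_TO_SENTENCE_CLEAR_MS
    recall_ms = clear_ms + RETENTION_INTERVAL_MS
    return {
        "final_word_offset": final_word_offset_ms,
        "sentence_cleared": clear_ms,
        "recall_prompt": recall_ms,
    }

# Example: suppose the final spoken word ends 7,200 ms into the trial
print(trial_timeline(7_200))
```

The fixed 5-second retention interval matters: it forces recall to operate on what was successfully encoded, rather than on a still-audible or still-visible sentence.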

Based on the statistical data, there were significant main effects of “Caption” and “Noise,” and significant “Caption” x “Noise” interactions, in both experiments for immediate speech recall, delayed recognition memory, and recognition memory confidence. Moreover, in experiment two there were statistically significant “Caption” x “Noise” x “Hearing Level,” “Noise” x “Hearing Level,” and “Caption” x “Hearing Level” interactions. Based on these findings, the researchers concluded, “Text captions improved not only immediate recall, but also long-term memory outcomes in younger and older adults [in both experiments] and [reduced] both the effects of increased background noise [in both experiments] and hearing loss [in experiment two].” While the benefit from captions was greatest for older adults with hearing loss, captioning also provided broad benefits to literate adult listeners with or without hearing loss. Despite the positive findings, the exact reason why text captions improve speech memory cannot be gleaned from the results.
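To make the Caption x Noise interaction concrete, here is a toy illustration with entirely made-up accuracy values (not the study's data). An interaction means the caption benefit is not constant across noise levels; the hypothetical numbers below show the qualitative pattern of a benefit that grows as noise increases:

```python
# Entirely hypothetical recall accuracies (proportions), for illustration only.
recall = {
    ("no_caption", "quiet"):    0.70,
    ("caption",    "quiet"):    0.72,
    ("no_caption", "SNR_+7dB"): 0.55,
    ("caption",    "SNR_+7dB"): 0.65,
    ("no_caption", "SNR_+3dB"): 0.40,
    ("caption",    "SNR_+3dB"): 0.60,
}

# A Caption x Noise interaction means the caption benefit differs by noise level:
for level in ["quiet", "SNR_+7dB", "SNR_+3dB"]:
    benefit = recall[("caption", level)] - recall[("no_caption", level)]
    print(f"{level}: caption benefit = {benefit:+.2f}")
```

If the benefit were identical at every noise level, there would be two main effects but no interaction; the widening gap under noise is what the interaction term captures.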

To conclude, future research could examine how text and speech information is used in real time during encoding through electroencephalography (EEG) or eye-tracking. Future research could also investigate the role intermodal asynchrony plays in the benefits provided by text captions and could test subjects with severe-to-profound hearing loss, both younger and older.