Speech intelligibility and room acoustics of a bar, investigated by means of an impulse response measurement
This experiment was done as a student's submission for the class Signals and Systems 2 at the Royal Conservatory, The Hague, in February 2012. It was carried out by Kathrin Grenzdörffer, MA 1st year.
In a certain bar, workers sometimes complain that it is difficult to communicate with each other or to understand clients' orders. This reportedly happens especially when more than 20 people are in the bar and the music is at a low level. The room then seems filled with constant mumbling and chattering: the speech sounds of the guests add up, and intelligible communication becomes difficult. To mask this effect, the music is usually turned up. Acoustically speaking, this can lead to a vicious circle, as guests start to talk louder, the music is adjusted again, and so on. Moreover, when the music is rather loud, the room starts to resonate and low frequencies become blurry.
To find out what exactly may cause these impressions, an acoustic measurement was conducted in the room itself. To this end, an impulse response was measured using forward and reverse sweeps. Salient reflections are related back to actual features of the space. Speech intelligibility is considered a desirable feature of the space, as clients and workers alike prefer to listen and speak effortlessly.
Hypothesis: due to a long reverberation time for frequencies below 200 Hz, the room a) has a low-pass filter characteristic and b) suffers from a lack of speech intelligibility.
If this holds, the findings should show significant reflections in the lower part of the spectrum. For the range essential to speech transmission (200–3600 Hz)*, they should reveal a markedly irregular amplitude response.
The room is assumed to be a linear system.
*The frequency range used in telephone lines suggests that human speech is sufficiently comprehensible when transmitted between 200 and 3600 Hz. In the following sections I will consider this range to be essential.
- microphone: Shure SM 58
- passive loudspeaker*: self-made by EWP Koninklijk Conservatorium
- amplifier: AKAI AM-U110
- interface: Focusrite Saffire Pro 8
- computer: MacBook Intel Core 2 Duo
* Specifications were not available. Single membrane with a diameter of 17 cm, height 35 cm, top diagonal 32 cm, metal grille in front of the membrane, wooden body.
- Logic Pro 9 to record and save in WAV format
- Praat version 5.3.03 to generate sweeps and inverted sweeps, perform convolution, and carry out spectral analyses
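The sweep was generated in Praat with the following script command provided by Peter Pabon (2012). The variables periodT (sweep duration in s), startFreq (start frequency in Hz) and multFact (growth rate of the frequency in 1/s) have to be defined beforehand.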
Create Sound... ExpSweep 0 periodT 44100 exp(multFact*0.5*(x-periodT))*sin(2*pi*startFreq/multFact*exp(multFact*x))
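For readers without Praat, the formula in the command above can be sampled in Python to see what it produces. This is a minimal sketch, not Pabon's script: the values of periodT, startFreq and multFact are not given in this report, so the constants below (a 1 s sweep from 20 Hz to 20 kHz) are hypothetical, and sample times are simply taken as i/fs. The phase term 2π·startFreq/multFact·e^(multFact·x) gives an instantaneous frequency of startFreq·e^(multFact·x), and the envelope e^(multFact·0.5·(x−periodT)) grows with the square root of that frequency, which flattens the otherwise pink (−3 dB per octave) spectrum of an exponential sweep.

```python
import math

# Hypothetical parameters: the report does not state the values used with the
# Praat script, so these are placeholders for illustration only.
FS = 44100            # sampling rate in Hz, as in the Praat command
PERIOD_T = 1.0        # sweep duration in s (assumed)
START_FREQ = 20.0     # start frequency in Hz (assumed)
END_FREQ = 20000.0    # end frequency in Hz (assumed)

# Growth rate of the instantaneous frequency in 1/s, chosen so that
# START_FREQ * exp(MULT_FACT * PERIOD_T) == END_FREQ.
MULT_FACT = math.log(END_FREQ / START_FREQ) / PERIOD_T


def exp_sweep(fs=FS, period_t=PERIOD_T, start_freq=START_FREQ, mult_fact=MULT_FACT):
    """Sample exp(k*0.5*(x-T)) * sin(2*pi*f0/k*exp(k*x)) at times x = i/fs."""
    n = int(round(period_t * fs))
    samples = []
    for i in range(n):
        x = i / fs
        envelope = math.exp(mult_fact * 0.5 * (x - period_t))  # rises towards 1 at x = T
        phase = 2 * math.pi * start_freq / mult_fact * math.exp(mult_fact * x)
        samples.append(envelope * math.sin(phase))
    return samples


def instantaneous_frequency(x, start_freq=START_FREQ, mult_fact=MULT_FACT):
    """Derivative of the phase divided by 2*pi, i.e. f0 * exp(k * x)."""
    return start_freq * math.exp(mult_fact * x)


sweep = exp_sweep()
print(len(sweep))  # prints: 44100
```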
The resulting initial sweep sounds like this:
Convolving the sweep with its inverted (time-reversed) version yields a pulse with a flat amplitude spectrum.
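Why this works can be checked numerically. Convolving a signal with its time-reversed copy computes its autocorrelation, whose spectrum is the squared magnitude |S(f)|² with zero phase, so the result peaks exactly in the middle (zero lag); with the square-root envelope of the Praat formula, |S(f)| is roughly flat across the swept band. The sketch below assumes "inverted" means time-reversed and uses small hypothetical sweep settings so that plain-Python convolution stays fast.

```python
import math

# Hypothetical, deliberately small settings (not the report's values) so that
# the plain-Python convolution below finishes quickly.
FS = 8000            # Hz
PERIOD_T = 0.1       # s
START_FREQ = 100.0   # Hz
END_FREQ = 3000.0    # Hz, kept below the Nyquist frequency FS / 2
MULT_FACT = math.log(END_FREQ / START_FREQ) / PERIOD_T


def exp_sweep():
    """Same formula as the Praat command, sampled at times x = i / FS."""
    n = int(round(PERIOD_T * FS))
    return [
        math.exp(MULT_FACT * 0.5 * (i / FS - PERIOD_T))
        * math.sin(2 * math.pi * START_FREQ / MULT_FACT * math.exp(MULT_FACT * i / FS))
        for i in range(n)
    ]


def convolve(a, b):
    """Direct-form linear convolution, O(len(a) * len(b))."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


sweep = exp_sweep()
pulse = convolve(sweep, sweep[::-1])  # convolution with the time-reversed sweep
peak_index = max(range(len(pulse)), key=lambda k: abs(pulse[k]))
print(peak_index, len(sweep) - 1)  # prints: 799 799
```

The peak value equals the energy of the sweep, which is why a recovered impulse response is scaled by that constant and delayed by one sweep length minus one sample.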
The bar is situated at a T-shaped crossroads, with cars, trams and pedestrians passing on two sides.
schematic floor plan and the three locations of measurement
schematic cross-section of the location
To compare different situations of acoustic communication, three measurements were performed. In two of them, loudspeaker and microphone served as dummies for human communication: the loudspeaker took the place of a person talking and the microphone mimicked the ears of a listener. The distances between the two were chosen to match real-life communication situations.
A third measurement was done to provide an overall account of the acoustics in the room and to compare it to the other results. It was performed at the longest possible distance between microphone and speaker.
(1) As the venue is situated at a crossroads on the way to Rotterdam main station, cars and trams frequently pass by. This makes the room literally vibrate, which could considerably bias the results. However, to the student's knowledge, measurements were taken only when the street was quiet.
(2) Cooling devices had to stay switched on during the measurement. As their white-noise-like sound was constant, it was disregarded as a factor that might impair the linearity of the system.
(3) Despite my tests prior to the measurement, the loudspeaker distorted during the experiment at 4.19 s into the initial sweep. This corresponds to the frequency band 110–220 Hz. Thus, the results for this band cannot be taken into account.
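In an exponential sweep, each moment corresponds to exactly one frequency, f(t) = startFreq·e^(multFact·t), so a stretch of time in the recording can be translated into a frequency band, and every octave (such as 110–220 Hz) takes the same amount of time. The sketch below shows this mapping under hypothetical sweep settings; since the report does not give periodT, startFreq or multFact, the times it prints will not reproduce the 4.19 s quoted above.

```python
import math

# Hypothetical sweep settings; the report does not give periodT, startFreq or
# multFact, so the times printed here will not match the 4.19 s quoted above.
PERIOD_T = 10.0      # s
START_FREQ = 20.0    # Hz
END_FREQ = 20000.0   # Hz
MULT_FACT = math.log(END_FREQ / START_FREQ) / PERIOD_T  # 1/s


def freq_at(t):
    """Instantaneous frequency (Hz) of the exponential sweep at time t (s)."""
    return START_FREQ * math.exp(MULT_FACT * t)


def time_at(f):
    """Time (s) at which the sweep passes frequency f (Hz)."""
    return math.log(f / START_FREQ) / MULT_FACT


def seconds_per_octave():
    """Every octave takes the same time in an exponential sweep."""
    return math.log(2.0) / MULT_FACT


# Time window occupied by the 110-220 Hz octave under these assumptions.
t_start, t_end = time_at(110.0), time_at(220.0)
print(f"110-220 Hz lasts from {t_start:.2f} s to {t_end:.2f} s")
```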
results and analysis
The exponential sine sweep recorded by the microphone was convolved with the time-reversed initial sweep. This resulted in an impulse-like signal, the impulse response, which contains the frequency characteristics of the room.
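The deconvolution step works because convolution is associative: (sweep ∗ room) ∗ reversed sweep = room ∗ (sweep ∗ reversed sweep), and the second factor is close to a pulse. The sketch below replaces the real bar with a hypothetical two-path "room" (direct sound plus one reflection at half amplitude, 12.5 ms later) and uses small hypothetical sweep settings; it checks that the reflection reappears at the right delay and at roughly the right relative level.

```python
import math

# Hypothetical, deliberately small settings (not the report's) so the
# plain-Python convolution stays fast.
FS = 8000
PERIOD_T = 0.1
START_FREQ = 100.0
END_FREQ = 3000.0
MULT_FACT = math.log(END_FREQ / START_FREQ) / PERIOD_T


def exp_sweep():
    """Same formula as the Praat command, sampled at times x = i / FS."""
    n = int(round(PERIOD_T * FS))
    return [
        math.exp(MULT_FACT * 0.5 * (i / FS - PERIOD_T))
        * math.sin(2 * math.pi * START_FREQ / MULT_FACT * math.exp(MULT_FACT * i / FS))
        for i in range(n)
    ]


def convolve(a, b):
    """Direct-form linear convolution; zero samples of `a` are skipped."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai == 0.0:
            continue
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


# Toy "room": direct sound plus one reflection at half amplitude, 100 samples
# (12.5 ms) later. It stands in for the real, unknown room response.
REFLECTION_DELAY = 100
room = [0.0] * (REFLECTION_DELAY + 1)
room[0], room[REFLECTION_DELAY] = 1.0, 0.5

sweep = exp_sweep()
recorded = convolve(room, sweep)            # what the microphone would pick up
estimate = convolve(recorded, sweep[::-1])  # convolve with the reversed sweep

direct = len(sweep) - 1                     # zero-lag position of the direct sound
tail = range(direct + 50, len(estimate))    # search for the reflection after it
reflection = max(tail, key=lambda k: abs(estimate[k]))
print(reflection - direct, round(estimate[reflection] / estimate[direct], 2))
```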
For the goals stated earlier, an analysis of the frequency range from 5 to 12000 Hz is sufficient to describe the system.
The recordings and spectra of the impulse responses sound and look as follows.
1 The “Do it now” scenario
This is a typical situation in which workers communicate. One of them stands on the left next to the cash box, turning his head towards his colleague in the kitchen (on the right in the picture). When the music is on, the person in the kitchen often can hardly hear what is being said.
1.1.1 sound file of recorded sweep
1.1.2 recorded sweep visualized
1.2 impulse response
1.2.1 sound file
1.2.2 visualization of the impulse response
The longest reverberation time is 40 ms at 3240 Hz, the second longest 35 ms at 3920 Hz.
1.2.3 spectral slice of impulse response
Notable in this plot is a slope that starts at around 4400 Hz and falls continuously.
Peaks in the lower frequency band, at 110 Hz and 238 Hz, have to be disregarded in the following, as they are very likely a result of the loudspeaker's distortion. A genuine peak can be found at 550 Hz. Between 2040 and 2900 Hz a dip is noticeable, followed by a marked emphasis up to 4400 Hz. This emphasis coincides with the longest reverberation time, at 3240 Hz.
2 The “Repeat or I’ll kneel down” scenario
This setup simulates communication between a waiter and a guest. In real-life situations (when the music is on), waiters every so often either ask clients to repeat their orders because they could not hear them, or immediately crouch in front of the table to listen to the order being placed.
2.1.1 sound file of recorded sweep
2.1.2 recorded sweep visualized
2.2 impulse response
2.2.1 sound file of impulse response
Again, a long reverberation time is noticeable between 2500 and 6900 Hz, reaching a peak of 0.54 seconds at about 3500 Hz.
2.2.3 spectral slice
At 84 and 282 Hz, peaks occur which correspond to the microphone's distances to the ceiling and the table. As the loudspeaker distorted, the peaks at 135 Hz and 166 Hz cannot be analyzed. A dip is noticeable between 1800 and 3080 Hz, with an estimated center frequency of 2600 Hz. Frequencies between 300 and 3600 Hz are transmitted very well. A high-frequency roll-off begins at 5055 Hz, with the level dropping continuously from 5.9 dB to −44 dB at 11900 Hz.
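The report does not say which relation links the 84 Hz and 282 Hz peaks to the distances to ceiling and table, nor what those distances are. One simple model, stated here as an assumption: a surface at distance d from the microphone adds a reflection whose extra path is 2d (sound arriving along the surface normal), and that reflection first reinforces the direct sound when 2d equals one wavelength, i.e. at f = c/(2d). The sketch inverts this relation with an assumed speed of sound of 343 m/s, so the resulting distances can be compared with the actual set-up.

```python
# Simplified model (an assumption, not the author's stated method): the extra
# path of a reflection from a surface at distance d is 2*d, and the first
# in-phase reinforcement occurs when 2*d equals one wavelength: f = c / (2*d).
SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 degrees C (assumed)


def distance_for_peak(freq_hz, c=SPEED_OF_SOUND):
    """Surface distance (m) whose first in-phase reflection falls at freq_hz."""
    return c / (2.0 * freq_hz)


def peak_for_distance(distance_m, c=SPEED_OF_SOUND):
    """Inverse relation: lowest reinforced frequency (Hz) for a given distance."""
    return c / (2.0 * distance_m)


for f in (84.0, 282.0):
    print(f"{f:.0f} Hz -> {distance_for_peak(f):.2f} m")
# prints:
# 84 Hz -> 2.04 m
# 282 Hz -> 0.61 m
```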
3 Centered position
For the centered set-up, the microphone was placed as nearly equidistant from all walls as possible.
3.1.1 sound file of recorded sweep
3.1.2 recorded sweep visualized
3.2 impulse response
3.2.1 sound file
The longest reverberation times occur in the band from 2800 Hz to 4150 Hz; the most prominent is at 3180 Hz, with a length of 70 ms.
3.2.2 spectral slice
High sound pressure levels occur at 156 and 176 Hz, but they are disregarded for the reasons mentioned above. True peaks are at 497, 1111 and 1146 Hz, with another one standing out at 3344 Hz. As before, a dip occurs between around 2000 and 3000 Hz. The band from 2900 to 4100 Hz is very present in the spectrum. A roll-off again begins at 5100 Hz, falling from 20 dB to −24 dB at 11000 Hz.
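The two high-frequency roll-offs quoted for measurements 2 and 3 start from different levels, which makes them hard to compare directly. Expressing each as an average slope in dB per octave helps; this sketch uses only the frequency and level values given in the text and assumes the drop is roughly straight on a logarithmic frequency axis, so that a single average slope is meaningful.

```python
import math


def slope_db_per_octave(f_start, level_start_db, f_end, level_end_db):
    """Average level change per octave between two points of a spectrum."""
    octaves = math.log2(f_end / f_start)
    return (level_end_db - level_start_db) / octaves


# Values read off the spectral slices, as quoted in the text.
slope_2 = slope_db_per_octave(5055.0, 5.9, 11900.0, -44.0)   # measurement 2
slope_3 = slope_db_per_octave(5100.0, 20.0, 11000.0, -24.0)  # measurement 3
print(f"{slope_2:.1f} dB/octave, {slope_3:.1f} dB/octave")
# prints: -40.4 dB/octave, -39.7 dB/octave
```

Both roll-offs come out at roughly 40 dB per octave.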
No significant differences were found between the three measurements. The only notable one is the short reverberation time in the first example.
None of the frequency responses obtained is flat, and all show a clear low-pass characteristic. Because the loudspeaker distorted exactly in that range, while the response is rather flat for the remaining low frequencies, it is difficult to tell which architectural feature causes this effect most prominently. Measurement N° 2 at least suggests that the ceiling and tables act as reflecting surfaces. One peculiarity can hardly be accounted for: it is still inexplicable to the author why the longest reverberation times of all occur at 3240 Hz (wavelength 10.6 cm), 3500 Hz (9.8 cm) and 3344 Hz (10.2 cm).*
* Remark (22 February 2012): The “inexplicable … reverberation time[s]” might stem from about 60 wine glasses of four different sizes, which could act as resonators. They hang openly behind the bar, as can partly be seen in the photograph of the first measurement.
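The wavelengths given in parentheses follow from λ = c/f. The report does not state which speed of sound was used (the author cites the sengpielaudio wavelength calculator), so c = 343 m/s below is an assumption; small differences from the figures in the text come from rounding and from the choice of c.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed (air at about 20 degrees C)


def wavelength_cm(freq_hz, c=SPEED_OF_SOUND):
    """Wavelength lambda = c / f, converted from metres to centimetres."""
    return 100.0 * c / freq_hz


for f in (3240.0, 3500.0, 3344.0):
    print(f"{f:.0f} Hz -> {wavelength_cm(f):.1f} cm")
# prints:
# 3240 Hz -> 10.6 cm
# 3500 Hz -> 9.8 cm
# 3344 Hz -> 10.3 cm
```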
Apart from emphasizing low frequencies, in all three measurements the features of the room suppress the band 2000–3000 Hz and favor a band between approximately 3000 and 4000 Hz.
With total reverberation times of 41 ms (N° 1), 41 ms (N° 2) and 63 ms (N° 3), the bar appears to have a rather long reverberation time, the first two values lying just above the intimacy level. The third value implies that conversation becomes difficult.
This is good evidence for retaining the first part of the hypothesis, although the measurement should be repeated with a better loudspeaker.
implications for speech
Even though a filtering effect was shown for low frequencies, they are not emphasized as much as expected. This might be due to the flawed hardware used for the recording. Alternatively, kitchen staff may perceive low frequencies as louder than guests do, because they work directly underneath one of the bar's loudspeakers emitting music.
A proper measurement of the speech transmission index (STI) would have led to more reliable results, and the signal-to-noise ratio would have been an important measure for drawing valid conclusions. Still, the band 200–3600 Hz is not transferred equally at all frequencies. Generally speaking, the results suggest that the fundamental frequencies of speech are relatively well represented by the room, whereas frequency content around 2000–3000 Hz is suppressed. The latter damps the upper formant structure of certain vowels, as shown below.
source: http://www.phonetik.uni-muenchen.de/studium/skripten/SGL/V_FTab.jpg on February 21, 2012. Emphasis mine.
Additionally, regarding the transfer of high frequencies, the steep slopes that appear at ~5000 Hz are certain to affect speech transmission negatively. Even though the common telephony band considered here works fine for its purpose, it is not flawless: in telephony, plosives in particular, with their large high-frequency content, are not represented well and cannot always be distinguished. This is probably also the case for my system.
Hence, the impression that speech becomes difficult to understand can only hold for certain frequency content, as the room suppresses only the band between 2000 and 3000 Hz and the range above 5000 Hz.
Frequencies of approximately 3000–4000 Hz are significantly emphasized in all three cases. Nevertheless, it cannot be guaranteed that this is an actual characteristic of the room, as it might also have been caused by the loudspeaker's membrane when it distorted.
In room acoustics, the measure of Deutlichkeit (Brüderlin, 1995), usually referred to as Definition, describes the proportion of the total signal energy that arrives within the first 50 ms after excitation of the system. Any reflection occurring during that time is perceived as part of the first wavefront and therefore reinforces the signal rather than being heard as an echo. The results for my system seem satisfactory: for the first two measurements the Definition is 100 %, and for the third it is estimated at more than 95 %.
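Definition is commonly computed from an impulse response p(t) as D50 = ∫₀^50ms p²(t) dt / ∫₀^∞ p²(t) dt. The sketch below implements that ratio on sampled data and applies it to synthetic, noise-free exponential decays, which are an assumption for illustration and not the measured responses; with real data, time zero is the arrival of the direct sound and the tail is cut where it reaches the noise floor.

```python
import math


def definition_d50(impulse_response, fs):
    """Share of the energy arriving in the first 50 ms (time 0 = direct sound)."""
    early_samples = int(0.050 * fs)
    energy = [v * v for v in impulse_response]
    return sum(energy[:early_samples]) / sum(energy)


def synthetic_decay(t60_s, fs, length_s):
    """Hypothetical test signal: an amplitude envelope that falls by 60 dB
    (a factor of 1000) in t60_s seconds, without noise or discrete reflections."""
    n = int(length_s * fs)
    return [math.exp(-math.log(1000.0) * (i / fs) / t60_s) for i in range(n)]


FS = 8000
for t60 in (0.063, 0.5):
    d50 = definition_d50(synthetic_decay(t60, FS, 2.0), FS)
    print(f"T60 = {t60} s -> D50 = {100 * d50:.1f} %")
# prints:
# T60 = 0.063 s -> D50 = 100.0 %
# T60 = 0.5 s -> D50 = 74.9 %
```

If the decay times quoted in the report are read as T60 values, this idealised model gives a Definition very close to 100 %, whereas a 0.5 s decay would give about 75 %.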
In conclusion, speech transmission is less severely impaired than expected, and the second part of the hypothesis therefore has to be at least partly rejected. Only further research can bring clarity.
- Brüderlin, R.: Akustik für Musiker. 3rd ed. Kassel: Gustav Bosse Verlag, 1995.
- Meyer, J.: Akustik und musikalische Aufführungspraxis. 5th ed. 2004.
- Tempelaars, S.: Signals and Systems. Garland Pub., 1996.
- http://www.sengpielaudio.com/calculator-wavelength.htm, last accessed 21 February 2012.
- http://kc.koncon.nl/staff/pabon/IRM/IRMeasurementInstruction/assignment_IR_ExpSweep_MainPage.htm and all linked pages, last accessed 21 February 2012.
- http://www.phonetik.uni-muenchen.de/studium/skripten/SGL/V_FTab.jpg, last accessed 21 February 2012.
- http://maps.google.com/maps?q=Van+Oldenbarneveltstraat+139&hl=de&ll=51.918915,4.472637&spn=0.000931,0.001725&sll=37.0625,-95.677068&sspn=39.099308,56.513672&hq=Van+Oldenbarneveltstraat+139&radius=15000&t=h&z=19, last accessed 21 February 2012.