About silence and talk over events

NOTE This feature is supported in an Avaya Aura Contact Center with MLS environment. However, it is not supported in an Avaya Aura Contact Manager environment due to a limitation with Avaya Recording.

A recorded call contains two streams of audio that represent the two sides of a call. In the Media Player, the Audio panel displays the inbound stream in blue and the outbound stream in red. In a normal conversation, the energy alternates between the outbound call and the inbound call.

When the inbound call and the outbound call spike simultaneously, that is a talk over event. The Audio panel displays a talk over icon in the energy bar where a talk over event occurs. When both parties are silent during a call, that is a silence event. During a silence event, the line in the energy bar is flat. The Audio panel displays a Silence icon in the energy bar where a silence event occurs.

Normally, each stream contains the voice of a single person: either the agent or the customer. Occasionally, a stream includes multiple voices. For example, a conference call contains the agent stream, where you hear the agent’s voice, and a second stream, where you hear the voices of all other parties in the conference call.

Calls can include non-speech noises (for example, wind, typing, background conversations, or barking dogs). Calabrio ONE processes these noises in addition to speech when searching for silence and talk over events in a call. Brief background noises might display as audio energy, but Calabrio ONE still considers those silence.

Calabrio ONE uses a Voice Activity Detection (VAD) module to classify audio as silence or speech. VAD is designed to analyze phone calls where you expect to hear two or more people talking to each other. VAD analyzes separate blocks of audio data and calculates an average sound volume for each block. The blocks are called frames. (A frame size is measured in milliseconds of audio. VAD uses the same frame size when processing all audio in a file.) VAD uses its decision threshold to determine if each frame indicates silence or speech. If the average volume for the frame falls below the VAD decision threshold, VAD marks the frame as silent. VAD processes each frame of each stream, compares the frames from stream 1 and stream 2, and assigns an audio type to each pair of frames. The audio types are as follows:

Mutual Silence (MS): Both frames are silent.
Normal (N): One frame contains speech, and the other frame is silent. This indicates normal conversation.
Talk Over (TO): Both frames contain speech.
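
The frame pairing described above can be sketched in a few lines of Python. This is only an illustration, not Calabrio ONE's implementation: the function names, the use of mean absolute amplitude as the "average sound volume", the frame length, and the fixed threshold are all assumptions made for the example.

```python
from typing import List


def frame_volumes(samples: List[float], frame_len: int) -> List[float]:
    """Average absolute amplitude of each fixed-size frame.

    A trailing partial frame is dropped (an assumption for this sketch).
    """
    return [
        sum(abs(s) for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def classify_pairs(inbound: List[float], outbound: List[float],
                   frame_len: int, threshold: float) -> List[str]:
    """Label each pair of frames as MS, N, or TO."""
    labels = []
    pairs = zip(frame_volumes(inbound, frame_len),
                frame_volumes(outbound, frame_len))
    for vol_in, vol_out in pairs:
        # A frame whose average volume falls below the threshold is silent.
        in_speech = vol_in >= threshold
        out_speech = vol_out >= threshold
        if in_speech and out_speech:
            labels.append("TO")   # both frames contain speech
        elif in_speech or out_speech:
            labels.append("N")    # exactly one frame contains speech
        else:
            labels.append("MS")   # both frames are silent
    return labels


# Hypothetical sample data: three frames of two samples each.
inbound = [0.0, 0.0, 0.5, 0.5, 0.5, 0.5]
outbound = [0.0, 0.0, 0.0, 0.0, 0.6, 0.6]
print(classify_pairs(inbound, outbound, frame_len=2, threshold=0.1))
```

The sketch uses one fixed threshold for both streams to keep the pairing logic visible; the real module adapts its threshold to the audio.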

VAD uses a heuristic algorithm that adapts based on the quality of the audio data. In a noisy environment, the VAD decision threshold rises to mark only the loudest noises as speech. Otherwise, the entire phone conversation would be marked as constant speech, even if the noise is caused by a car engine or another form of non-speech background noise. In a quiet environment where the person is not speaking loudly, the VAD decision threshold falls so that it can correctly identify speech at a low volume. This allows the entire call to be marked as normal speech instead of silence.
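
One common way to get a threshold that rises in noise and falls in quiet is to track a running estimate of the background noise level and place the threshold a fixed multiple above it. The sketch below shows that general idea only. The actual heuristic used by the VAD module is not documented here, and every name and constant (margin, rise, fall, initial_floor) is a hypothetical choice for illustration.

```python
from typing import List


def adaptive_speech_flags(volumes: List[float], margin: float = 2.0,
                          rise: float = 0.1, fall: float = 0.5,
                          initial_floor: float = 0.01) -> List[bool]:
    """Flag each frame as speech (True) or silence (False).

    The decision threshold is margin * noise_floor. The noise floor
    follows the frame volumes: slowly upward, quickly downward.
    """
    floor = initial_floor
    flags = []
    for volume in volumes:
        flags.append(volume >= floor * margin)
        # Move the noise floor toward the current frame volume.
        rate = rise if volume > floor else fall
        floor += rate * (volume - floor)
    return flags


# Quiet line, then steady loud background noise (hypothetical values).
noisy = [0.01] * 5 + [0.2] * 30
flags = adaptive_speech_flags(noisy)
print(flags[5], flags[-1])
```

In this sketch the first frames after the noise starts are flagged as speech, and later frames are flagged as silence once the threshold has risen above the noise level. The `rise` constant controls how many frames that adaptation takes.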

This adaptability allows VAD to be more accurate when detecting speech or silence, but it is not always 100% accurate. Because VAD uses average sound volume to tell the difference between speech and silence, there will always be instances where it incorrectly identifies normal speech or mutual silence in a frame of audio. When background noise levels change, VAD needs a few seconds to adapt. During this time, it might mark audio as normal speech when no one is speaking, or it might mark mutual silence when someone is speaking. During mutual silence, for example, a sudden noise like typing on the keyboard or coughing might be loud enough to cause VAD to identify a frame as talking even though no one is speaking. Essentially, VAD does not know the difference between human speech and the sound of a car engine.

It is also possible that VAD might not identify a talk over or silence event. For example, it might miss a talk over event even when two people are clearly talking to each other on a call at the same time. If one of the speakers during the talk over event pauses to think or take a breath for at least a quarter of a second, VAD could mark the frame as an instance of silence. From the speaker’s perspective, they were constantly talking; you would expect VAD to indicate a talk over event. From VAD’s perspective, however, there was a period of silence during the conversation, so it cannot be considered a talk over event.

On the Application Management > QM > QM Configuration > Global Settings page, you can establish the minimum duration of silence or talk over to be considered an event. For each event, Calabrio ONE saves the type (silence or talk over), the duration of the event in milliseconds, and the start of the event as an offset from the beginning of the audio.
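
As a rough illustration of how per-frame audio types could become saved events, the sketch below groups consecutive MS or TO frames into runs and keeps only the runs that reach a minimum duration. The Event fields mirror the three values listed above. The function name, the 100 ms frame size, the separate minimums for silence and talk over, and the sample labels are assumptions, not the product's actual behavior or defaults.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass
class Event:
    kind: str          # "silence" or "talk over"
    start_ms: int      # offset from the beginning of the audio
    duration_ms: int


def find_events(labels: List[str], frame_ms: int,
                min_silence_ms: int, min_talk_over_ms: int) -> List[Event]:
    """Turn a sequence of MS/N/TO frame labels into events."""
    kinds = {"MS": ("silence", min_silence_ms),
             "TO": ("talk over", min_talk_over_ms)}
    events = []
    index = 0  # index of the first frame in the current run
    for label, run in groupby(labels):
        run_len = len(list(run))
        if label in kinds:
            kind, min_ms = kinds[label]
            duration = run_len * frame_ms
            if duration >= min_ms:
                events.append(Event(kind, index * frame_ms, duration))
        index += run_len
    return events


# A one-frame pause splits the overlapping speech into two short runs.
labels = ["TO"] * 3 + ["N"] + ["TO"] * 3 + ["MS"] * 5
for event in find_events(labels, frame_ms=100,
                         min_silence_ms=500, min_talk_over_ms=400):
    print(event)
```

With these hypothetical settings, neither 300 ms talk over run reaches the 400 ms minimum, so only the 500 ms silence starting at offset 700 ms is reported. This is the same effect as a speaker's brief pause preventing a talk over event.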