Microsoft Portable Speaker

Mohammed Huberty

Aug 5, 2024, 11:19:43 AM

to inagcoovi

You provide audio training data for a single speaker, which creates an enrollment profile based on the unique characteristics of the speaker's voice. You can then cross-check audio voice samples against this profile to verify that the speaker is the same person (speaker verification). You can also cross-check audio voice samples against a group of enrolled speaker profiles to see if they match any profile in the group (speaker identification).

Speaker verification streamlines the process of verifying an enrolled speaker identity with either passphrases or free-form voice input. For example, you can use it for customer identity verification in call centers or contactless facility access.

Speaker verification can be either text-dependent or text-independent. Text-dependent verification means that speakers must use the same passphrase during both the enrollment and verification phases. Text-independent verification means that speakers can speak in everyday language in both the enrollment and verification phases.

For text-dependent verification, the speaker enrolls by saying a passphrase chosen from a set of predefined phrases. Voice features are extracted from the audio recording to form a unique voice signature, and the chosen passphrase is also recognized. Together, the voice signature and the passphrase are used to verify the speaker.

Text-independent verification has no restrictions on what the speaker says during enrollment, besides the initial activation phrase when active enrollment is enabled. It doesn't have any restrictions on the audio sample to be verified, because it only extracts voice features to score similarity.
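To make "extracts voice features to score similarity" a bit more concrete, here is a minimal sketch of threshold-based verification. It is not the service's actual algorithm: the feature vectors, the use of cosine similarity, the function names, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def verify_speaker(enrolled_signature, sample_features, threshold=0.8):
    """Accept the sample if it is similar enough to the enrolled signature.

    Both arguments are hypothetical voice-feature vectors; the threshold
    is an assumed value, not one documented by any real service.
    """
    score = cosine_similarity(enrolled_signature, sample_features)
    return score >= threshold, score


# Toy vectors standing in for extracted voice features.
enrolled = [0.9, 0.1, 0.4]
same_voice = [0.85, 0.15, 0.38]
other_voice = [0.1, 0.9, 0.2]

accepted, score = verify_speaker(enrolled, same_voice)
rejected, other_score = verify_speaker(enrolled, other_voice)
print(accepted, rejected)
```

Because only the features are compared, what the speaker actually says never enters the decision, which is the point of the text-independent mode.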

Enrollment for speaker identification is text-independent. There are no restrictions on what the speaker says in the audio, besides the initial activation phrase when active enrollment is enabled. Similar to speaker verification, the speaker's voice is recorded in the enrollment phase, and the voice features are extracted to form a unique voice signature. In the identification phase, the input voice sample is compared to a specified list of enrolled voices (up to 50 in each request).
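The identification phase can be sketched as a best-match search over the enrolled profiles. This is only an illustration under assumptions: the profile dictionary, the cosine-similarity scoring, and the 0.8 threshold are hypothetical, and only the limit of 50 candidates per request comes from the description above.

```python
import math

MAX_CANDIDATES = 50  # per-request limit mentioned in the text


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(profiles, sample_features, threshold=0.8):
    """Return (profile_id, score) for the best match, or (None, score).

    `profiles` maps a hypothetical profile ID to its voice-signature
    vector. The threshold is an assumed value for illustration.
    """
    if len(profiles) > MAX_CANDIDATES:
        raise ValueError("at most 50 enrolled profiles per request")
    best_id, best_score = None, -1.0
    for profile_id, signature in profiles.items():
        score = cosine_similarity(signature, sample_features)
        if score > best_score:
            best_id, best_score = profile_id, score
    if best_score < threshold:
        return None, best_score  # nobody in the group matched well enough
    return best_id, best_score


profiles = {"alice": [0.9, 0.1, 0.4], "bob": [0.1, 0.9, 0.2]}
print(identify_speaker(profiles, [0.85, 0.15, 0.38])[0])
```

Returning `None` when no score clears the threshold matters here: without it, an unenrolled voice would always be assigned to whichever profile happened to be closest.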

Speaker enrollment data is stored in a secured system, including the speech audio for enrollment and the voice signature features. The speech audio for enrollment is only used when the algorithm is upgraded, and the features need to be extracted again. The service doesn't retain the speech recording or the extracted voice features that are sent to the service during the recognition phase.

You control how long data should be retained. You can create, update, and delete enrollment data for individual speakers through API calls. When the subscription is deleted, all the speaker enrollment data associated with the subscription is also deleted.

An AI system includes not only the technology, but also the people who use it, the people who are affected by it, and the environment in which it's deployed. Read the transparency notes to learn about responsible AI use and deployment in your systems.

I use a set of Bose QuietComfort 35 II headphones for work and have been using them for a couple of years. I recently got a new HP laptop, and everything worked fine for about a week, but now my headphones no longer work as a speaker when I am in a Teams meeting. I can set it up so that my headphones are the mic and my laptop speakers are the output, but if I change the speaker to the headphones there is no sound. The headphones work fine as long as I am not in a Teams meeting. I was also able to get them to work on a test call, but then I called one of my co-workers and couldn't get them to work again. I've tried rebooting the computer, restarting the headphones, and even re-pairing the headphones with the computer, with no change. I did have a strange message pop up asking if this was a personal conference call, but it went away before I could even tell which application was asking (I think it was something from HP, but I am not 100% sure).

I want to do a speech-to-text analysis project that covers 1) speaker recognition, 2) speaker diarization, and 3) speech-to-text. Right now I am testing the APIs provided by various companies such as Microsoft, Google, AWS, and IBM. I found that Microsoft offers user enrollment and speaker recognition ( -api.net/docs/services/563309b6778daf02acc0a508/operations/5645c3271984551c84ec6797). However, all the other platforms offer speaker diarization but not speaker recognition. If I understand correctly, speaker diarization can "distinguish" between users, but how can it recognize them unless I enroll them first? The only enrollment option I could find is in Azure.

I want to be sure, though, so I am checking here: maybe I am not looking at the right documents, or maybe there is some other way to achieve this in Google Cloud, Watson, and AWS Transcribe. If that is the case, can you folks please help me with it?

Diarization is the process of separating speakers in a piece of audio. Our Batch pipeline supports diarization and is capable of recognizing two speakers on mono channel recordings. When you use the batch transcription API and enable diarization, all transcription output contains a SpeakerId. We support two voices for diarization, so the speakers are identified as "1" or "2". If diarization is not used, the JSON output shows "SpeakerId": null. -docs/blob/master/articles/cognitive-services/Speech-Service/batch-transcription.md
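As a rough sketch of how diarized output could be consumed, the snippet below groups transcribed segments by their SpeakerId. Only the SpeakerId field and its values ("1", "2", or null) come from the description above; the flat list layout, the "Text" field, and the sample sentences are assumptions made for illustration, so check the real output schema before relying on this.

```python
import json
from collections import defaultdict

# Hypothetical sample: a real batch transcription result has a richer
# structure; only the "SpeakerId" key is taken from the text above.
SAMPLE_OUTPUT = """
[
  {"SpeakerId": "1", "Text": "Thanks for calling, how can I help?"},
  {"SpeakerId": "2", "Text": "My headset stopped working."},
  {"SpeakerId": "1", "Text": "Let's check your audio settings."}
]
"""


def group_by_speaker(segments):
    """Collect segment texts per speaker label.

    Segments with "SpeakerId": null (diarization disabled) are grouped
    under the key "unknown".
    """
    grouped = defaultdict(list)
    for segment in segments:
        speaker = segment.get("SpeakerId")
        key = "unknown" if speaker is None else str(speaker)
        grouped[key].append(segment["Text"])
    return dict(grouped)


segments = json.loads(SAMPLE_OUTPUT)
by_speaker = group_by_speaker(segments)
for speaker, lines in by_speaker.items():
    print(speaker, lines)
```

Note that the labels only say "first voice" and "second voice"; mapping them to named people would still require enrollment and speaker recognition.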

Ex: In a call center scenario the customer does not need to identify who is speaking, and cannot train the model beforehand with speaker voices since a new user calls in every time. Rather they only need to identify different voices when converting voice to text.

Video Indexer (VI) supports transcription, speaker diarization (enumeration), and emotion recognition from both the text and the tone of the voice. Additional insights are available as well, e.g. topic inference, language identification, brand detection, and translation. You can consume it via the video or audio-only APIs for COGS optimization. When you get the insights JSON, you can find speaker IDs both under Insights.transcript[0].speakerId and under Insights.Speakers. When dealing with audio files where each speaker is recorded on a different channel, VI identifies that and applies the transcription and diarization accordingly.
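For illustration, this is roughly how the two locations mentioned above could be joined to label each transcript line with a speaker. The paths Insights.transcript[...].speakerId and Insights.Speakers are taken from the answer; the "text", "id", and "name" fields and the sample values are assumptions, so the real insights JSON may differ.

```python
# Hypothetical insights document shaped after the paths named above.
SAMPLE_INSIGHTS = {
    "Insights": {
        "transcript": [
            {"speakerId": 1, "text": "Welcome to the meeting."},
            {"speakerId": 2, "text": "Thanks, can everyone hear me?"},
            {"speakerId": 3, "text": "Yes, loud and clear."},
        ],
        "Speakers": [
            {"id": 1, "name": "Speaker #1"},
            {"id": 2, "name": "Speaker #2"},
        ],
    }
}


def label_transcript(insights_doc):
    """Return (speaker_name, text) pairs for every transcript line.

    Speaker IDs missing from Insights.Speakers fall back to a generic
    "Speaker <id>" label instead of raising an error.
    """
    insights = insights_doc["Insights"]
    names = {s["id"]: s["name"] for s in insights.get("Speakers", [])}
    labeled = []
    for line in insights.get("transcript", []):
        speaker_id = line.get("speakerId")
        name = names.get(speaker_id, f"Speaker {speaker_id}")
        labeled.append((name, line["text"]))
    return labeled


labeled_lines = label_transcript(SAMPLE_INSIGHTS)
for name, text in labeled_lines:
    print(f"{name}: {text}")
```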

Microsoft is now shipping a handful of "modern" accessories designed to enhance your work life, whether that be at home or in the office. We've already reviewed Microsoft's Modern Headsets and Modern Webcam, and now we're reviewing Microsoft's Modern USB-C Speaker designed for Teams conferencing.

The Modern USB-C Speaker is a small, portable conferencing device designed to enhance the Teams calling experience with better audio and dedicated controls for answering calls and controlling volume. It's small enough that it comes with a carrying case included, protecting it from the elements when travelling.

I've been using the Modern USB-C Speaker for the last month in many of my daily work meetings, testing all its functions and capabilities as well as testing how good the microphones are. Here is my review!

Bottom line: The Modern USB-C Speaker from Microsoft is a simple, stylish, and portable conferencing device built for Teams, but is missing some extra functionality such as Bluetooth.

Straight out of the box, the first thing you'll notice when setting it up is the compact nature of this conferencing speaker. It's small enough to be palm-able, and its included carrying case means it's super easy to throw into a bag to bring with you on a busy meeting day. In fact, the included carrying case is really nice. It's strong, feels good, and will protect the speaker from sharp objects when on the go.

The speaker itself is well designed, featuring mesh fabric around the outside, a rubberized bottom, and a small control panel with buttons on the top. The rubberized bottom has a lip that houses a USB-C cable that unwinds when you need to plug the speaker into a computer. When the cable isn't in use, it's hidden from view. Pretty great!

The buttons on the top feature a dedicated Teams quick-launch button, a button to answer incoming calls, a volume up and down button, and a mute/unmute button. The Teams and mute/unmute buttons light up when pressed, with the mute button turning red when mute is enabled. The buttons are easy to press with satisfying clicky feedback.

Now, onto the meat and potatoes. How is audio quality? Let's start with the listening experience, which I would call pretty great for its size and purpose. It features a 50mm speaker that honestly surprised me with its bass output. Don't get me wrong, it's not going to rock your world, but for a conferencing speaker this thing has a bit of punch to it.

As a result, it delivers a great listening experience that sounds surprisingly rich compared to many other conferencing devices, which often skimp on audio quality because most conference calls use a low bitrate or a poor mic setup anyway. Because of how good the speaker is, you can get away with listening to music on this thing, though I wouldn't buy it just for that.

On the flip side of the audio experience, how are the mics? Microsoft says the Modern USB-C Speaker has two omnidirectional microphones, and they captured sound from all around pretty well in the medium to large meeting room setups we tested. Additionally, the mics have built-in noise reduction, which should help filter out noise from air conditioners, projectors, and other meeting room appliances.

The accompanying accessory app allows for a couple of settings to be tweaked. You can configure the mute button to act as a toggle switch for unmuting your mic, good for people who prefer a "push to talk" setup. You can also turn on and off the prompt tone, which is a sound that plays when you mute and unmute the mic.