Watson Speech Recognition - Custom Language Models


Jean Alves

Oct 22, 2019, 11:48:50 AM
to UniMRCP
Hello everyone,

I'm testing the Watson Speech Recognition plugin. According to IBM's documentation, it's possible to add an ABNF grammar to a custom language model in their cloud (ref: https://cloud.ibm.com/docs/services/speech-to-text?topic=speech-to-text-grammarAdd).

I followed the instructions in the documentation and was able to get recognition results with the grammar through their REST API.

However, I haven't found a way to do the same with the UniMRCP plugin.

This page (https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-grammarUse) explains how to do this through the WebSocket interface. It seems I need to add the 'language_customization_id' attribute to the URI and the 'grammar_name' attribute to the body of the start message.
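As a rough sketch of what that page describes, the snippet below builds the two pieces separately: the URI carrying 'language_customization_id' and the start message carrying 'grammar_name'. The host and path are taken from the log further down; the wss scheme and the helper names (build_recognize_url, build_start_message) are assumptions for illustration only. Authentication and the actual WebSocket connection are left out.

```python
import json
from urllib.parse import urlencode

# Host and path as they appear in the UniMRCP log; the wss scheme is an assumption.
WS_ENDPOINT = "wss://stream.watsonplatform.net/speech-to-text/api/v1/recognize"


def build_recognize_url(customization_id: str) -> str:
    """Return the WebSocket URI with the custom language model ID as a query parameter."""
    query = urlencode({"language_customization_id": customization_id})
    return WS_ENDPOINT + "?" + query


def build_start_message(grammar_name: str,
                        content_type: str = "audio/l16;rate=8000") -> str:
    """Return the JSON text of the start message, with grammar_name in the body."""
    return json.dumps({
        "action": "start",
        "content-type": content_type,
        "grammar_name": grammar_name,
    })


if __name__ == "__main__":
    # The ID and grammar name are the ones used in the RECOGNIZE request in this thread.
    print(build_recognize_url("ff4c83d7-405d-4be5-bbc5-74e57b9701f8"))
    print(build_start_message("fabio"))
```

The point of the split is that 'grammar_name' travels inside the JSON start message rather than in the query string.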

I've tried adding these attributes as query parameters when defining the grammar, but without success. The log follows:

2019-10-22 12:31:19:726959 [INFO]   Receive SIP Event [nua_i_invite] Status 100 Trying [SIP-Agent-Watson-1]
2019-10-22 12:31:19:726995 [INFO]   Receive SIP Event [nua_i_state] Status 100 Trying [SIP-Agent-Watson-1]
2019-10-22 12:31:19:727009 [NOTICE] SIP Call State  [received]
2019-10-22 12:31:19:727031 [INFO]   Create Session 0x7f421000d988 <new> [uni2-watson]
2019-10-22 12:31:19:727039 [INFO]   Remote SDP 0x7f421000d988 <new>
v=0
o=Asterisk 2744579004049171715 2712244270994762382 IN IP4 10.11.2.99
s=-
c=IN IP4 10.11.2.99
t=0 0
m=application 9 TCP/MRCPv2 1
a=setup:active
a=connection:new
a=resource:speechrecog
a=cmid:1
m=audio 28418 RTP/AVP 8 0 96 101
a=rtpmap:8 PCMA/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:96 L16/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=sendonly
a=ptime:20
a=mid:1

2019-10-22 12:31:19:727915 [NOTICE] Add Session <27b04c5ec8d84e8e>
2019-10-22 12:31:19:727931 [INFO]   Receive Offer 0x7f421000d988 <27b04c5ec8d84e8e> [c:1 a:1 v:0]
2019-10-22 12:31:19:727945 [INFO]   Found MRCP Engine [WSR-1] for Resource [speechrecog] 0x7f421000d988 <27b04c5ec8d84e8e>
2019-10-22 12:31:19:729801 [INFO]   Enable RTP Session 10.11.2.99:5040
2019-10-22 12:31:19:729824 [INFO]   Open RTP Receiver 10.11.2.99:5040 <- 10.11.2.99:28418 playout [50 ms] bounds [0 - 600 ms] adaptive [1] skew detection [1]
2019-10-22 12:31:19:729833 [INFO]   Media Path 0x7f421000d988 Source->[PCMA/8000/1]->Decoder->[LPCM/8000/1]->Bridge->[LPCM/8000/1]->Sink
2019-10-22 12:31:19:729861 [INFO]   Add Pending Control Channel <27b04c5ec8d84e8e@speechrecog> [1]
2019-10-22 12:31:19:730368 [INFO]   Open <27b04c5ec8d84e8e@watsonsr>
2019-10-22 12:31:19:730387 [NOTICE] WSR Usage: 1/1/2
2019-10-22 12:31:19:730424 [INFO]   Send Answer 0x7f421000d988 <27b04c5ec8d84e8e> [c:1 a:1 v:0] Status OK
2019-10-22 12:31:19:730440 [INFO]   Local SDP 0x7f421000d988 <27b04c5ec8d84e8e>
v=0
o=UniMRCPServer 0 0 IN IP4 10.11.2.99
s=-
c=IN IP4 10.11.2.99
t=0 0
m=application 1544 TCP/MRCPv2 1
a=setup:passive
a=connection:new
a=channel:27b04c5ec8d84e8e@speechrecog
a=cmid:1
m=audio 5040 RTP/AVP 8 101
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=recvonly
a=ptime:20
a=mid:1

2019-10-22 12:31:19:733008 [INFO]   Receive SIP Event [nua_i_state] Status 200 OK [SIP-Agent-Watson-1]
2019-10-22 12:31:19:733026 [NOTICE] SIP Call State 0x7f421000d988 [completed]
2019-10-22 12:31:19:735635 [NOTICE] Accepted TCP/MRCPv2 Connection 10.11.2.99:1544 <-> 10.11.2.99:54425
2019-10-22 12:31:19:737593 [INFO]   Receive SIP Event [nua_i_ack] Status 200 OK [SIP-Agent-Watson-1]
2019-10-22 12:31:19:737609 [INFO]   Receive SIP Event [nua_i_state] Status 200 OK [SIP-Agent-Watson-1]
2019-10-22 12:31:19:737616 [NOTICE] SIP Call State 0x7f421000d988 [ready]
2019-10-22 12:31:19:737621 [INFO]   Receive SIP Event [nua_i_active] Status 200 Call active [SIP-Agent-Watson-1]
2019-10-22 12:31:20:164236 [INFO]   Receive MRCPv2 Data 10.11.2.99:1544 <-> 10.11.2.99:54425 [385 bytes]
MRCP/2.0 385 RECOGNIZE 1
Channel-Identifier: 27b04c5ec8d84e8e@speechrecog
Content-Type: text/uri-list
Cancel-If-Queue: false
Start-Input-Timers: true
Recognition-Timeout: 10000
Speech-Language: en-US
No-Input-Timeout: 5000
Sensitivity-Level: 0.7
Content-Length: 107

builtin:speech/transcribe?language-customization-id=ff4c83d7-405d-4be5-bbc5-74e57b9701f8&grammar_name=fabio
2019-10-22 12:31:20:164299 [INFO]   Assign Control Channel <27b04c5ec8d84e8e@speechrecog> to Connection 10.11.2.99:1544 <-> 10.11.2.99:54425 [0] -> [1]
2019-10-22 12:31:20:164329 [INFO]   Process RECOGNIZE Request <27b04c5ec8d84e8e@speechrecog> [1]
2019-10-22 12:31:20:164400 [INFO]   Init Speech Detector: frame-size=160, max-frame-count=350, output-frame-count=20, vad-mode=2, noinput-timeout=5000 ms, input-timeout=10000 ms, start-timeout=50 ms, complete-timeout=1000 ms, incomplete-timeout=3000 ms, leading-silence=300 ms, trailing-silence=300 ms, interim-results=1, start-of-input=external <27b04c5ec8d84e8e>
2019-10-22 12:31:20:164446 [INFO]   Start No-Input Timer [5000 ms] <27b04c5ec8d84e8e>
2019-10-22 12:31:20:164460 [INFO]   Open Waveform File for Writing /opt/unimrcp/var/umswatsonsr-27b04c5ec8d84e8e-1-8-kHz.wav, sampling-rate [8000]
2019-10-22 12:31:20:165533 [INFO]   Initiate WS connection <27b04c5ec8d84e8e> [https://stream.watsonplatform.net/speech-to-text/api/v1/recognize]
2019-10-22 12:31:20:834727 [INFO]   WS connected <27b04c5ec8d84e8e>
2019-10-22 12:31:20:834779 [INFO]   WS upgrade <27b04c5ec8d84e8e>
2019-10-22 12:31:22:078142 [INFO]   WS upgraded <27b04c5ec8d84e8e>
HTTP/1.1 101 Switching Protocols
Date: Tue, 22 Oct 2019 15:30:08 GMT
Content-Type: application/octet-stream
Connection: upgrade
upgrade: websocket
sec-websocket-accept: 6SeKQKyQSdPoXKyFj6qZvJRwbzo=
x-global-transaction-id: b47960e4882e28fa126200121628f05a
X-DP-Watson-Tran-ID: b47960e4882e28fa126200121628f05a


2019-10-22 12:31:22:078273 [INFO]   Process RECOGNIZE Response <27b04c5ec8d84e8e@speechrecog> [1]
2019-10-22 12:31:22:078284 [INFO]   State Transition IDLE -> RECOGNIZING <27b04c5ec8d84e8e@speechrecog>
2019-10-22 12:31:22:078327 [INFO]   Send MRCPv2 Data 10.11.2.99:1544 <-> 10.11.2.99:54425 [83 bytes]
MRCP/2.0 83 1 200 IN-PROGRESS
Channel-Identifier: 27b04c5ec8d84e8e@speechrecog


2019-10-22 12:31:22:208478 [INFO]   Speech Detector State Transition NO-INPUT -> IN-PROGRESS [2050 ms] <27b04c5ec8d84e8e>
2019-10-22 12:31:22:208519 [INFO]   Start Input Timer [10000 ms] <27b04c5ec8d84e8e>
2019-10-22 12:31:22:208624 [INFO]   Send WS msg [132 bytes] <27b04c5ec8d84e8e>
{"action": "start", "content-type": "audio/l16;rate=8000", "interim_results": true, "max_alternatives": 1, "smart_formatting": true}
2019-10-22 12:31:22:208653 [INFO]   Send WS bin msg [4640 bytes] <27b04c5ec8d84e8e>
2019-10-22 12:31:22:338518 [INFO]   Send WS bin msg [3200 bytes] <27b04c5ec8d84e8e>
2019-10-22 12:31:22:538627 [INFO]   Send WS bin msg [3200 bytes] <27b04c5ec8d84e8e>
2019-10-22 12:31:22:738635 [INFO]   Send WS bin msg [3200 bytes] <27b04c5ec8d84e8e>
2019-10-22 12:31:22:873417 [INFO]   Received WS msg [27 bytes] <27b04c5ec8d84e8e>
{
   "state": "listening"
}
2019-10-22 12:31:22:938585 [INFO]   Send WS bin msg [3200 bytes] <27b04c5ec8d84e8e>
2019-10-22 12:31:23:138564 [INFO]   Send WS bin msg [3200 bytes] <27b04c5ec8d84e8e>
2019-10-22 12:31:23:338616 [INFO]   Send WS bin msg [3200 bytes] <27b04c5ec8d84e8e>
2019-10-22 12:31:23:538620 [INFO]   Send WS bin msg [3200 bytes] <27b04c5ec8d84e8e>
2019-10-22 12:31:23:738593 [INFO]   Send WS bin msg [3200 bytes] <27b04c5ec8d84e8e>
2019-10-22 12:31:23:938644 [INFO]   Send WS bin msg [3200 bytes] <27b04c5ec8d84e8e>
2019-10-22 12:31:24:138592 [INFO]   Send WS bin msg [3200 bytes] <27b04c5ec8d84e8e>
2019-10-22 12:31:24:338641 [INFO]   Send WS bin msg [3200 bytes] <27b04c5ec8d84e8e>
2019-10-22 12:31:24:538659 [INFO]   Send WS bin msg [3200 bytes] <27b04c5ec8d84e8e>
2019-10-22 12:31:24:738619 [INFO]   Send WS bin msg [3200 bytes] <27b04c5ec8d84e8e>
2019-10-22 12:31:24:938599 [INFO]   Send WS bin msg [3200 bytes] <27b04c5ec8d84e8e>
2019-10-22 12:31:25:138622 [INFO]   Send WS bin msg [3200 bytes] <27b04c5ec8d84e8e>
2019-10-22 12:31:25:158572 [INFO]   Input Complete [success] size=0 bytes, dur=0 ms <27b04c5ec8d84e8e@watsonsr>
2019-10-22 12:31:25:158665 [INFO]   Send WS msg [18 bytes] <27b04c5ec8d84e8e>
{"action": "stop"}
2019-10-22 12:31:26:254168 [INFO]   Received WS msg [360 bytes] <27b04c5ec8d84e8e>
{
   "results": [
      {
         "alternatives": [
            {
               "transcript": "five B. you'll kaffir sin "
            }
         ], 
         "final": false
      }
   ], 
   "result_index": 0, 
   "warnings": [
      "Unknown URL query params: grammar_name. Websockets requests should have the parameters in WS messages, not in URL."
   ]
}
2019-10-22 12:31:26:254302 [INFO]   Set Result Flag [1000 ms] <27b04c5ec8d84e8e>
2019-10-22 12:31:26:254363 [INFO]   Process START-OF-INPUT Event <27b04c5ec8d84e8e@speechrecog> [1]
2019-10-22 12:31:26:254410 [INFO]   Send MRCPv2 Data 10.11.2.99:1544 <-> 10.11.2.99:54425 [115 bytes]
MRCP/2.0 115 START-OF-INPUT 1 IN-PROGRESS
Channel-Identifier: 27b04c5ec8d84e8e@speechrecog
Input-Type: speech


2019-10-22 12:31:26:428670 [INFO]   Send WS bin msg [3200 bytes] <27b04c5ec8d84e8e>
2019-10-22 12:31:26:602788 [INFO]   Received WS msg [241 bytes] <27b04c5ec8d84e8e>
{
   "results": [
      {
         "alternatives": [
            {
               "confidence": 0.42, 
               "transcript": "5 B. you'll kafirs "
            }
         ], 
         "final": true
      }
   ], 
   "result_index": 0
}
2019-10-22 12:31:26:602873 [INFO]   Received WS msg [27 bytes] <27b04c5ec8d84e8e>
{
   "state": "listening"
}
2019-10-22 12:31:26:602981 [INFO]   Process RECOGNITION-COMPLETE Event <27b04c5ec8d84e8e@speechrecog> [1]
2019-10-22 12:31:26:602992 [INFO]   State Transition RECOGNIZING -> RECOGNIZED <27b04c5ec8d84e8e@speechrecog>
2019-10-22 12:31:26:603031 [INFO]   Send MRCPv2 Data 10.11.2.99:1544 <-> 10.11.2.99:54425 [415 bytes]
MRCP/2.0 415 RECOGNITION-COMPLETE 1 COMPLETE
Channel-Identifier: 27b04c5ec8d84e8e@speechrecog
Completion-Cause: 000 success
Content-Type: application/x-nlsml
Content-Length: 230

<?xml version="1.0"?>
<result>
  <interpretation grammar="builtin:speech/transcribe" confidence="0.42">
    <instance>5 B. you'll kafirs</instance>
    <input mode="speech">5 B. you'll kafirs</input>
  </interpretation>
</result>

2019-10-22 12:31:26:620904 [INFO]   Receive SIP Event [nua_i_bye] Status 200 Session Terminated [SIP-Agent-Watson-1]
2019-10-22 12:31:26:620923 [INFO]   Receive SIP Event [nua_i_state] Status 200 Session Terminated [SIP-Agent-Watson-1]
2019-10-22 12:31:26:620931 [NOTICE] SIP Call State 0x7f421000d988 [terminated]
2019-10-22 12:31:26:620942 [INFO]   Receive SIP Event [nua_i_terminated] Status 200 Session Terminated [SIP-Agent-Watson-1]
2019-10-22 12:31:26:620953 [INFO]   Deactivate Session 0x7f421000d988 <27b04c5ec8d84e8e>
2019-10-22 12:31:26:620959 [INFO]   Terminate Session 0x7f421000d988 <27b04c5ec8d84e8e>
2019-10-22 12:31:26:620980 [INFO]   Remove Control Channel <27b04c5ec8d84e8e@speechrecog> [0]
2019-10-22 12:31:26:621002 [INFO]   Close <27b04c5ec8d84e8e@watsonsr>
2019-10-22 12:31:26:621640 [INFO]   TCP/MRCPv2 Peer Disconnected 10.11.2.99:1544 <-> 10.11.2.99:54425
2019-10-22 12:31:26:621704 [NOTICE] Destroy TCP/MRCPv2 Connection 10.11.2.99:1544 <-> 10.11.2.99:54425
2019-10-22 12:31:26:622224 [INFO]   Close WS connection <27b04c5ec8d84e8e>
2019-10-22 12:31:26:622350 [INFO]   Clean up <27b04c5ec8d84e8e>
2019-10-22 12:31:26:622365 [NOTICE] WSR Usage: 0/1/2
2019-10-22 12:31:26:631523 [INFO]   Close RTP Receiver 10.11.2.99:5040 <- 10.11.2.99:28418 [r:226 l:0 j:45 p:50 d:0 i:0]
2019-10-22 12:31:26:631593 [INFO]   Remove RTP Session 10.11.2.99:5040
2019-10-22 12:31:26:631659 [NOTICE] Remove Session <27b04c5ec8d84e8e>
2019-10-22 12:31:26:631668 [INFO]   Session Terminated 0x7f421000d988 <27b04c5ec8d84e8e>
2019-10-22 12:31:26:631718 [NOTICE] Destroy Session <27b04c5ec8d84e8e>
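
In the log above, the start message sent by the plugin does not contain grammar_name, and Watson reports it as an unknown URL query parameter in the "warnings" field of the interim result. A minimal sketch for pulling such warnings out of a Watson WS text message is shown below; the function name extract_warnings is hypothetical, and the sample payload is abridged from the log.

```python
import json


def extract_warnings(ws_text: str) -> list:
    """Return the 'warnings' list from a Watson JSON message, or [] if there is none."""
    try:
        payload = json.loads(ws_text)
    except json.JSONDecodeError:
        return []  # not JSON at all
    if not isinstance(payload, dict):
        return []
    return list(payload.get("warnings", []))


# Abridged copy of the interim result received in the log above.
SAMPLE = """
{
   "results": [
      {"alternatives": [{"transcript": "five B. you'll kaffir sin "}], "final": false}
   ],
   "result_index": 0,
   "warnings": [
      "Unknown URL query params: grammar_name. Websockets requests should have the parameters in WS messages, not in URL."
   ]
}
"""

if __name__ == "__main__":
    for warning in extract_warnings(SAMPLE):
        print("Watson warning:", warning)
```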




Is it possible to do this using the UniMRCP plugin? Is there any way to add attributes to the WS messages?

Thanks!

Arsen Chaloyan

Oct 26, 2019, 4:44:48 PM
to UniMRCP
Hello Jean,

Support for custom language models and many other related parameters has been added in the latest release, WSR 1.5.0. You can find more info in Sections 3.2 and 4.7 of the Usage Guide.

The problem is that the grammar-name parameter is not supported at this time.



--
Arsen Chaloyan
Author of UniMRCP
http://www.unimrcp.org

Jean Alves de Almeida

Oct 30, 2019, 2:03:45 PM
to uni...@googlegroups.com
Hello Arsen,

Thanks for the reply.

Regards
