Notation for non-standard words in vocab files

Let’s say you want Mycroft to recognize some words that would not appear in a natural language dictionary. How would you notate those non-standard words in the vocab files? Example scenario:
My company uses Atlassian JIRA Service Desk, and we want to ask Mycroft for the “JIRA status report”. What do I write in the vocab file so that the STT engine will return “JIRA” (or “jira”) to Mycroft?

Sometimes he does get it, but often the word is not recognized at all, and sometimes it comes back as “jiro”. I can imagine other cases that might be harder to recognize, or harder to distinguish from “real” words.

Based on the normalization of possessive apostrophes ( Error parsing word with possessive apostrophe ) I wonder if a similar mechanism exists for special words (something like “JEE-RUH”). I think I understand why in general IPA or similar would be a bad thing. But I think some exceptions such as some well-known proper nouns might exist.

Or maybe I’m thinking about this all wrong. I would appreciate both technical guidance and pragmatic approaches that maybe walk around the issue.

Combine two source words phonemes perhaps?
http://www.speech.cs.cmu.edu/cgi-bin/cmudict?in=jeer
http://www.speech.cs.cmu.edu/cgi-bin/cmudict?in=rah
JH IH R + R AA = JH IH R AA
Some variation of that maybe?
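
A rough sketch of that idea in Python, in case it helps. `join_phonemes` is a made-up helper (not part of Mycroft or any CMUdict tooling), and the phoneme strings are just the ones above with stress markers dropped. Whether the STT engine accepts a custom pronunciation at all is a separate question this doesn't answer.

```python
def join_phonemes(first, second):
    """Concatenate two ARPAbet phoneme lists.

    If the last phoneme of `first` equals the first phoneme of `second`,
    keep only one copy at the seam (R + R becomes a single R).
    """
    if first and second and first[-1] == second[0]:
        second = second[1:]
    return first + second


# Phonemes for "jeer" and "rah" as quoted above (hypothetical inputs).
jeer = "JH IH R".split()
rah = "R AA".split()

print(" ".join(join_phonemes(jeer, rah)))  # JH IH R AA
```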

So firstly, big ups to @steve.penrod who talked me through this one this afternoon.

The way to determine this is to spin up the command line interface (CLI), speak to Mycroft and understand how the STT translates the spoken phrase into text.

I took the liberty of doing this, and surprisingly, the phrase is jira.

20:38:04.964 - requests.packages.urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): api.mycroft.ai
~~~~ata": {"lang": "en-US", "utterance": "jira"}, "context": {"client_name": "mycroft_listener", "ident": "1517909881.8-2338034935122241024"}}
20:38:04.965 - mycroft.skills.padatious_service:handle_fallback:101 - DEBUG - Padatious fallback attempt: jira
20:38:04.966 - mycroft.skills.core:handler:1101 - WARNING - No fallback could handle intent.
20:38:04.967 - requests.packages.urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): api.mycroft.ai
20:38:04.968 - SKILLS - DEBUG - {"type": "mycroft.skill.handler.start", "data": {"handler": "fallback"}, "context": null}
20:38:04.968 - SKILLS - DEBUG - {"type": "complete_intent_failure", "data": {}, "context": null}
20:38:04.954 - mycroft.client.speech.listener:transcribe:166 - DEBUG - STT: jira
20:38:04.955 - __main__:handle_utterance:61 - INFO - Utterance: [u'jira']
20:38:04.957 - requests.packages.urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): api.mycroft.ai
20:38:04.968 - __main__:handle_complete_intent_failure:81 - INFO - Failed to find intent.
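
If scanning the log by eye gets tedious, something like the following could pull out just the STT results. This is a minimal sketch that assumes the lines look like the `DEBUG - STT: ...` line above; `extract_stt` and the regex are my own illustration, not a Mycroft API, and the format may differ between versions.

```python
import re

# Matches the tail of a listener log line such as:
#   "... - DEBUG - STT: jira"
# The pattern is an assumption based on the log excerpt above.
STT_LINE = re.compile(r" - DEBUG - STT: (?P<text>.*)$")


def extract_stt(lines):
    """Return the transcribed text from every STT log line, in order."""
    results = []
    for line in lines:
        match = STT_LINE.search(line.rstrip("\n"))
        if match:
            results.append(match.group("text"))
    return results


sample = [
    "20:38:04.954 - mycroft.client.speech.listener:transcribe:166 - DEBUG - STT: jira",
    "20:38:04.955 - __main__:handle_utterance:61 - INFO - Utterance: [u'jira']",
]
print(extract_stt(sample))  # ['jira']
```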

I decided to try a few others:

  • confluence = confluence
  • github = github
  • ssh = ssh
  • https = is recognized as http

Let me know how you go 🙂

Thanks for the reply, @KathyReid. After further experimentation, I agree that I can just leave it as “jira”, which is pretty cool and impressive. Some interesting side notes: if I pause too long between “jira” and “status report”, Mycroft ignores or chops off the “jira” (i.e., …“data”: {“lang”: “en-US”, “session”: “45b43886-df3f-4885-90bd-1c9d6c2620d1”, “utterances”: [“status report”]}, “context”: {“client_name”: “mycroft_listener”, “ident”: “1517938560.561406417937”}} ), so don’t enunciate “too clearly”! It also helps to follow the Atlassian pronunciation guide. Normally, I pronounce “JIRA” a little bit like the first syllable of “jury” (not quite, but not quite “JEE” either) followed by a schwa.
As long as I give a pretty quick and clear JEE-rah, without overly long pauses for articulation, it works. Which is great, just as you would want it to be.

And for the cases I assumed would be more difficult, I think I see the prescribed approach: back into a modified spelling (kind of like @baconator's suggestion) by repeatedly uttering the word and varying it while watching the utterance log.
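
For anyone trying the same thing, here is a small sketch of how I'd keep score while repeating the word. The list of attempts is invented purely for illustration (only “jira” and “jiro” are spellings I actually saw), and `best_spelling` is a hypothetical helper, not anything Mycroft provides.

```python
from collections import Counter


def best_spelling(transcriptions):
    """Return the spelling the STT engine produced most often, or None."""
    counts = Counter(text.strip().lower() for text in transcriptions if text.strip())
    if not counts:
        return None
    spelling, _count = counts.most_common(1)[0]
    return spelling


# Made-up tally of what came back over several attempts.
attempts = ["jira", "jiro", "jira", "jira"]
print(best_spelling(attempts))  # jira
```

Whatever comes out on top is the spelling I'd put in the vocab file.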

Thanks for all the help.