Mimic Pronounce feature and complementary skill worth exploring - Word pipeline for queuing mispronunciations and thematic review

wolfgang8741 · September 11, 2022, 6:17pm

Has anyone considered developing a pipeline from a Mycroft Mark I, or a Mark II with display or other display configuration, to queue up words for pronunciation correction?

The problem: Mimic Pronounce relies upon having previously identified mispronounced words or trying different guesses to find words to correct.

Ideas to explore: Adding a queue of word(s), identified by the community as needing to be fixed, so they can be reviewed and/or given pronunciations in Mimic Pronounce.

Scenario 1 - Existing interaction reports - Reporting poorly pronounced words in the results and phrases of skills, or in passages returned by a skill, e.g. a Wikipedia article result where a word in the sentence was mispronounced or didn’t sound right. The interaction could be: if the user interrupts the response with “that doesn’t sound correct” or a similar phrase, then on devices with a screen the text of the recently spoken sentence(s) could be displayed with a number below each word. The user could say “word four was wrong” and then be asked whether they would like to submit it to Mimic Pronounce for review and correction.
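To make the “word four was wrong” step concrete, here is a minimal Python sketch of numbering the words of a spoken sentence and resolving the user’s reply back to a word. The function names, the `NUMBER_WORDS` table, and the exact reply pattern are all assumptions made for illustration; this is not an existing Mycroft or Mimic API.

```python
import re

# Hypothetical table of spoken numbers; a real skill would need a fuller
# (and localized) number parser.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

# Runs of letters, optionally joined by an apostrophe or hyphen (e.g. "didn't").
WORD_PATTERN = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def number_words(sentence):
    """Return [(1, 'first'), (2, 'second'), ...] for display under the text."""
    return list(enumerate(WORD_PATTERN.findall(sentence), start=1))


def flagged_word(sentence, utterance):
    """Resolve a reply such as 'word four was wrong' to the word it refers to.

    Returns None when the reply has no recognizable word number or the
    number is out of range.
    """
    match = re.search(r"\bword (\w+)\b", utterance.lower())
    if not match:
        return None
    token = match.group(1)
    index = int(token) if token.isdigit() else NUMBER_WORDS.get(token)
    return dict(number_words(sentence)).get(index)
```

For example, with the sentence “The Louvre is in Paris France”, the reply “word two was wrong” resolves to “Louvre”, which could then be offered for submission to the review queue.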

I for one would like to have Mycroft read any Wikipedia article or ebook to me, but if Mimic mispronounces a word, I’d like to flag it, have it fixed for future reads, and work with the community for a better experience. I don’t often want to stop in the middle of listening and interacting with Mycroft, go to the Mimic Pronounce website, and add both the word and a suggested pronunciation. There should be the ability to queue words to fix in Mimic Pronounce when they are encountered.

Scenario 2 - Dedicated word and concept pronunciation skill and community focus for review - A new skill would need to be designed to ingest words and phrases for review. Using Mimic for TTS, it would read a word from a review queue so the user can verify it and, if it sounds wrong, send it to a queue in Mimic Pronounce for fixing. This differs from Common Voice in that phrases would be pulled from a knowledge graph such as Wikidata (CC0) and would focus on specific concepts, e.g. proper nouns and acronyms for surnames, organizations, or other concepts represented.

The interaction could be similar to scenario 1, except that instead of being presented with a sentence, the Wikidata QID, the label and its language, and the item description could be displayed, along with important additional structured details to help identify the pronunciation, e.g. “live vs live” in “live together” vs “live on stage”. Where there is ambiguity, information could be contributed back to Wikidata to help clarify a concept.
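As a rough illustration of pulling review candidates from Wikidata, the sketch below builds a SPARQL query for items that are instances of a given class and turns the standard SPARQL JSON result format into the QID, label, and description to display. It assumes the public endpoint at https://query.wikidata.org/sparql and its label service; the function names are hypothetical, the class to sample from is left as a parameter, and actually sending the request (with any HTTP client) is left out.

```python
import re

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(class_qid, language="en", limit=10):
    """Build a query for items that are an instance of (P31) class_qid."""
    # Guard against malformed input being spliced into the query text.
    if not re.fullmatch(r"Q\d+", class_qid):
        raise ValueError(f"not a Wikidata QID: {class_qid!r}")
    if not re.fullmatch(r"[a-z]{2,3}(-[a-z0-9]+)*", language):
        raise ValueError(f"unexpected language code: {language!r}")
    return f"""
SELECT ?item ?itemLabel ?itemDescription WHERE {{
  ?item wdt:P31 wd:{class_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}" . }}
}}
LIMIT {int(limit)}
""".strip()


def parse_bindings(response):
    """Convert SPARQL JSON results into dicts ready to show on a display."""
    rows = []
    for binding in response["results"]["bindings"]:
        rows.append({
            # Entity URIs end in the QID, e.g. .../entity/Q42
            "qid": binding["item"]["value"].rsplit("/", 1)[-1],
            "label": binding.get("itemLabel", {}).get("value", ""),
            "description": binding.get("itemDescription", {}).get("value", ""),
        })
    return rows
```

Each returned row could become one entry in the review queue, with the label handed to Mimic for TTS and the description shown to help the reviewer tell similar-looking words apart.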

There may be other scenarios, but these are the two I’ve selected to highlight why a queue feature should be considered for Mimic Pronounce. A queue would improve the crowdsourcing task of reviewing Mimic TTS, allow the community to be proactive before a mispronunciation is encountered, and broaden the set of words the community has reviewed. It could complement the work of Common Voice and identify words and sentences that would be good to queue to Common Voice readers for pronunciation variation by region.

My interest lies more in scenario 2, as a regular user of Mimic Pronounce who has been frustrated when trying to come up with words to review and fix. A structured way to queue words, aligned with current interactions and with the ability to focus word review, avoids having to stumble upon a mispronunciation (i.e. already having a poor user experience) or test random words until one doesn’t sound correct. This would give interacting with Mycroft a broader appeal. Tying into a knowledge graph like Wikidata (CC0 public domain licensed content that is constantly expanding) would also make it possible to provide content to review by concept, especially if domains such as surnames or geographical names are identified as particularly problematic, and to compare multiple languages and spellings for the word or concept represented by the Wikidata QID. Interactions with an open knowledge graph like Wikidata could both improve Mycroft and be designed to reciprocate value to the Wikidata community with fixes and TTS, and may attract new adoption of Mycroft with Wikimedia properties.

Functionally, the queue and review features could be implemented on the current Mimic Pronounce webpage by adding a “review” box. Instead of the user filling in the “word to fix”, a word from the queue would be presented, and the user would be asked to add the “Pronounce as” value and save, or could select “sounds right” to double check and dismiss the word. I’d suggest a two-out-of-three agreement check by different accounts, or another consensus check, before dismissing. There may also be a need to flag sensitive data, depending on how words are queued. A second “review a word” box could be presented, similar to reviewing in Common Voice: if “sounds right” is selected the queued word is dismissed, otherwise the user is asked for the “Pronounce as” value to fix it, or can add it to the queue, since it is sometimes easy to identify an incorrect word but not easy to get the correct sound.
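The agreement check could look something like the following Python sketch. It is only an assumption about how such a rule might work, not how Mimic Pronounce is implemented: the `ReviewQueue` class, its method names, and the status strings are invented for illustration. Each account gets one vote per word, and a word is only dismissed or marked for fixing once two different accounts agree.

```python
from collections import defaultdict


class ReviewQueue:
    """Hypothetical in-memory queue with a simple agreement rule."""

    def __init__(self, votes_needed=2):
        self.votes_needed = votes_needed
        # word -> {account: True if "sounds right", False otherwise}
        self._votes = defaultdict(dict)

    def vote(self, word, account, sounds_right):
        """Record one account's verdict; a repeat vote replaces the old one."""
        self._votes[word][account] = bool(sounds_right)
        return self.status(word)

    def status(self, word):
        """Return 'dismissed', 'needs_fix', or 'pending' for a word."""
        votes = self._votes.get(word, {})
        right = sum(votes.values())
        wrong = len(votes) - right
        if right >= self.votes_needed:
            return "dismissed"
        if wrong >= self.votes_needed:
            return "needs_fix"
        return "pending"
```

With the default of two matching votes, any three reviews from different accounts always settle a word one way or the other, which matches the two-out-of-three idea. Keying votes by account means the same person cannot dismiss a word by voting twice.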

Possible links of interest:

https://www.wikidata.org/wiki/Wikidata:Database_reports/EntitySchema_directory
https://www.wikidata.org/wiki/Wikidata:List_of_properties
https://www.wikidata.org/wiki/Wikidata:WikiProject_Names - often includes phonetic spelling and recording (check licenses)
Category:Geographical WikiProjects (Wikidata) - geographically focused categories and projects
https://query.wikidata.org/ - Querying of Wikidata items via SPARQL language
https://www.wikidata.org/wiki/Wikidata:Schemas
Wikidata:WikiProject every politician (Wikidata) - Every Politician project
Wikidata:SPARQL tutorial (Wikidata) - Wikidata SPARQL tutorial

There is a lot of text here to digest, but if anyone wants to run with this, let’s work out how to add a word queue to Mimic Pronounce in the code repository (I haven’t found it) so we can scope out the feature below.