Dialog rendering and grammar (NLP)

SGee · September 1, 2022, 11:04am

I want to kick things off here instead of dumping it as an issue.

There are a bunch of Lingua Franca helper functions to parse and format parts of speech, trying to hash out the intricacies of different languages. Yet the dialog renderer is a big blocker to being grammatically precise.

On the skill level you normally compute the data (possibly leveraging common parsing/formatting procedures from LF or custom code), which is then sent to the mustache renderer. The renderer picks one line and constructs the sentence to be voiced.

This poses a challenge for certain languages, since to be accurate the context (of the dialog line) is needed beforehand. As it is, the skill dev is pressed to form all lines of dialog in the nominative case. In return you get a pretty stale-sounding phrase, sometimes impractical or outright impossible. I’ve seen some helper functions that recognize this problem and offer a contextual parameter, yet core falls short of facilitating that with the given pipeline.

The “easy” way would be to prerender a dialog, tokenize it and pass it as context to the helpers, but I would prefer a more robust “formatting pipeline” as a part of the dialog rendering process in core (note that only a fraction of a fraction is getting the LF treatment). The ultimate goal should be, in my book, to abstract this away from the skill level, since I don’t think this should be a skill dev’s concern.
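To make the “easy” way concrete, here is a minimal sketch of prerendering a dialog line, tokenizing it and handing the neighbouring tokens to a helper. Everything in it is hypothetical: `render_with_context` and `format_article` are illustration names, not part of mycroft-core or Lingua Franca, and the German article rule is a toy that only covers neuter nouns.

```python
import re

# Matches mustache-style slots such as {{thing}}.
SLOT = re.compile(r"\{\{\s*(\w+)\s*\}\}")

# Toy assumption: a few German prepositions that always take the dative.
DATIVE_PREPOSITIONS = {"mit", "von", "zu", "bei", "nach", "aus"}


def format_article(left_tokens, right_tokens):
    """Hypothetical helper: indefinite article for a neuter German noun.

    It looks at the token directly before the slot to guess the case.
    """
    if left_tokens and left_tokens[-1].lower() in DATIVE_PREPOSITIONS:
        return "einem"  # dative
    return "ein"  # nominative/accusative (neuter)


def render_with_context(template, helpers, data):
    """Fill slots left to right, giving helpers the surrounding tokens."""
    out = []
    pos = 0
    for match in SLOT.finditer(template):
        out.append(template[pos:match.start()])
        name = match.group(1)
        left = "".join(out).split()            # already rendered text
        right = template[match.end():].split()  # still unrendered text
        if name in helpers:
            out.append(helpers[name](left, right))
        else:
            out.append(str(data[name]))
        pos = match.end()
    out.append(template[pos:])
    return "".join(out)


helpers = {"article": format_article}
print(render_with_context("Das ist {{article}} {{thing}}", helpers, {"thing": "Radio"}))
print(render_with_context("Musik mit {{article}} {{thing}}", helpers, {"thing": "Radio"}))
```

Whitespace tokens are a very thin context, which is exactly the limitation argued above: a real helper would want POS tags and dependencies rather than a preposition list.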

The context should be at least POS-tagged, dependency parsed and optimally have some morphological analysis attached. I always come up with the tool spaCy in that regard, which is, unfortunately, dubbed as too chunky. On the other hand, it has a pretty accessible, flexible and extensible pipeline, which should be beneficial to LF. Its purpose is foremost analytical NLP (the part core/LF is lacking), but I can see a win-win scenario.

Thoughts on this?

Not directly in the scope of What is wrong with mycroft-core right now? but certainly an addendum

synesthesiam · September 1, 2022, 11:38pm

I see this as a natural language generation problem, specifically multi-language. Ideally, you would be able to specify the semantics of the dialog only (e.g., abstract meaning representation), and the syntax would be generated for you.

How do you imagine people would specify dialogs at the semantic level? Or were you thinking of something more syntactic, like “use dialog X if the subject has case A, and dialog Y otherwise”?

JarbasAl · September 2, 2022, 10:30am

This subject has been brought up a few times on Mattermost and GitHub. Here are some relevant discussions from GitHub:

github.com/MycroftAI/mycroft-core

It should be possible to set the (grammatical) gender of Mycroft

opened 01:13PM - 06 Nov 18 UTC

ftyers

Type: Enhancement - proposed Status: For discussion

For some languages (Portuguese, Spanish) it will be necessary to set the gender of Mycroft according to the voice (female, male) and allow this to be referenced in the dialogues. For example, in Portuguese the word for thank you is *obrigado* (male speaker) and *obrigada* (female speaker). Or "Qué tal Mycroft?" ... "Estoy cansado" (male speaker) / "Estoy cansada" (female speaker). For other languages, it would be convenient to know the gender of the person speaking to Mycroft to be able to choose the correct greeting. E.g. Icelandic *sæll* for addressing a male, and *sæl* for addressing a female. This could be either estimated at utterance time using the speech signal, or alternatively set on a per-user basis.
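A minimal sketch of what referencing the gender in dialogs could look like, assuming the voice gender and the listener gender are available as plain strings. The function `pick_variant` and the variant tables are hypothetical; mycroft-core has no such API. The word forms are the ones given in the issue above.

```python
# Variants keyed by the gender of the speaker (the configured voice).
THANKS_PT = {"male": "obrigado", "female": "obrigada"}
TIRED_ES = {"male": "Estoy cansado", "female": "Estoy cansada"}

# Variants keyed by the gender of the person being addressed.
GREETING_IS = {"male": "sæll", "female": "sæl"}


def pick_variant(variants, gender, default="male"):
    """Return the variant for `gender`, falling back to `default`.

    The fallback covers an unset or unknown gender; which default
    makes sense is language dependent and is only assumed here.
    """
    return variants.get(gender, variants[default])


voice_gender = "female"   # would come from configuration
listener_gender = "male"  # per-user setting or estimated from speech

print(pick_variant(THANKS_PT, voice_gender))
print(pick_variant(GREETING_IS, listener_gender))
```

Note that the two lookups use different genders: the Portuguese and Spanish forms agree with the speaker, while the Icelandic greeting agrees with the listener.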

github.com/MycroftAI/mycroft-core

Consolidated proposal for adding metadata to dialog files

opened 10:58AM - 22 Dec 19 UTC

ChanceNCounter

Type: Enhancement - proposed

Addresses #1865 and #2293

**What**: The proposal would add whitespace-delimited metadata to the *ends* of dialog lines, separated from the dialog line itself by caret `^`: `this is a line of dialog^attribute1=value attribute2=value`

**Why**: To quote Jarbas in chat,

> if i'm not missing anything we are discussing 3 features enabled by this metadata scheme
>
> * gender value for grammatical correctness
> * attitude value, weights for this would be global (i really wanna say increase sarcasm by 20%)
> * individual line weights (after attitude selection)

**How**: [`MustacheDialogRenderer`](https://github.com/MycroftAI/mycroft-core/blob/dev/mycroft/dialog/__init__.py)'s existing logic would be extended to parse the file for lines matching the desired attributes. I've looked at a few different ways of doing this, and I think recursive, broader searches, dropping the "lowest value" attribute each time, are sadly the easiest route.

In this case, that would likely mean that Mycroft would first look for a line matching the device's grammatical gender and its attitude, plucking random lines until it runs out. After that, it would look for a line matching the device's grammatical gender and default/no attitude. Failing that, it would revert to the existing logic and pull random lines with no regard for metadata.

I worry about the computational consequences if, and *only* if, several such attributes are added *and* dialog files start cropping up with hundreds of lines. I stress tested the existing renderer when `recent_phrases` was added, and it's lightning for any reasonable values of `max_recent_phrases`. Still, this system does run the risk of parsing entire dialog files `n + 1` times for `n` attributes, and there's further multiplicative complexity within the loop. Right now, we're talking about kilobytes, so who cares, but... yeah...

---

Regarding "individual line weights," that was in reference to easter eggs that get obnoxious if they fire too often. There'd be a few ways to do it, but I think the most straightforward would be to shove them in a dict that works like recent_phrases, phrases being the keys, and their values representing a countdown until the phrase gets "unlocked." Each time a line is rendered, the values in that dict would decrement, and any keys with the value 0 would be removed from the dict, allowing the parser to choose them for rendering again. In that case, 'weight' might be the wrong term. The attribute would instead reflect a "cooldown" before the phrase is spoken again.

Also, I should point out that, because the dict I describe would be per dialog renderer, the cooldown would not guarantee that any other phrases from the same .dialog file are spoken in between instances of the weighted line. It would only guarantee a minimal number of *interactions with Mycroft* before the line is spoken a second time. It also lives in memory, so it remains possible that a user hears your easter egg on their last skill invocation before shutdown, and again on their first invocation after startup.

----

I propose, and will likely prototype during the break, adding the metadata immediately, and simultaneously adding attitude weights to mycroft.conf, along with an intent for tweaking those values verbally. Increase sarcasm by 20%, indeed! That's at least three of us now who are tickled by that reference 👨‍🚀

Lingua Franca has a part to play in handling grammatical gender, so I think that should probably be added to core at the time of the switch. That way, Lingua Franca can report whether the configured language is gendered, what (if any) default gender it uses, and any other information which might be relevant to the problem.
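The proposal can be sketched roughly as follows. This is not the `MustacheDialogRenderer` code and not the prototype the issue mentions; `parse_dialog_line`, `select_line` and `tick_cooldowns` are hypothetical names, and the attribute priority order is an assumption passed in by the caller.

```python
import random


def parse_dialog_line(line):
    """Split 'text^key=value key=value' into (text, attributes)."""
    if "^" not in line:
        return line.strip(), {}
    text, _, meta = line.rpartition("^")  # metadata sits at the end
    attrs = {}
    for pair in meta.split():
        key, eq, value = pair.partition("=")
        if eq:
            attrs[key] = value
    return text.strip(), attrs


def select_line(lines, wanted, priority, rng=random):
    """Pick a line matching `wanted`, broadening the search step by step.

    `priority` lists attribute names, most important first. Each failed
    pass drops the least important attribute; with no attributes left,
    any line qualifies (the existing metadata-free behaviour).
    """
    parsed = [parse_dialog_line(line) for line in lines]
    keys = [k for k in priority if k in wanted]
    while True:
        candidates = [text for text, attrs in parsed
                      if all(attrs.get(k) == wanted[k] for k in keys)]
        if candidates:
            return rng.choice(candidates)
        if not keys:
            return None  # the file had no lines at all
        keys.pop()  # drop the "lowest value" attribute


def tick_cooldowns(cooldowns):
    """Decrement every countdown; unlock phrases that reach 0."""
    for phrase in list(cooldowns):
        cooldowns[phrase] -= 1
        if cooldowns[phrase] <= 0:
            del cooldowns[phrase]


lines = ["obrigado^gender=male", "obrigada^gender=female", "valeu"]
wanted = {"gender": "female", "attitude": "sarcastic"}
print(select_line(lines, wanted, priority=["gender", "attitude"]))
```

Parsing once and filtering the parsed list keeps the cost at up to `n + 1` passes over the lines for `n` attributes, which matches the worry raised in the issue. The cooldown dict is kept separate here; wiring it into `select_line` (skipping phrases that are still counting down) is left out of the sketch.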

mike99mac · September 2, 2022, 10:36am

My first thought is “Wait what?” Your post looks like English, but it was a bit over my head.

How would this apply to a specific issue? I posted here:
First prototype for better music playing skills
on better music playing skills. There is hard-coded English such as album/artist/band/by/song/track, etc. How should that code be internationalized based on your suggestions?

Thanks.

-Mike Mac

P.S. Does Am%zon Al%xa play music by requests in languages other than English? I’m guessing so…

SGee · September 2, 2022, 3:57pm

Michael seems to be 10 steps ahead. Sentences being free form would be next level. I wasn’t thinking that far.

To borrow a comment/example from another discussion (from NeonMaria):

In Ukrainian and Russian, numerals change their form depending not only on the case or plural/singular form of the dependent word, but also on the numeral itself. There are several numeral groups (7 in Ukrainian, 4 in Russian), each with its own rules for taking flexions. That’s why putting numerals in the correct form in Ukrainian, Russian and some other synthetic languages requires dependency parsing of the sentence and POS tagging. Below are some examples:
* Group I:


1. **один** (one) - changes flexions by gender, case and plural/singular form of the following noun: **одне, одна, одні, одного, однієї (1)**
e.g. Ми бачили одного(1) хлопця. (We saw one guy.)
The numeral (1) depends on a singular noun in the genitive case (хлопця). Accordingly, we put the numeral (один - 1) in its genitive form (одного).
* Group II:


1. **Actually quantitative numerals** (власне кількісні: два, три, чотири), (2, 3, 4), (two, three, four).
   They change their flexions according to the case of the plural noun on which they depend.
e.g. По вулиці йдуть два(2) хлопці. (Two boys are walking on the street.)
The numeral (два - 2) depends on a plural noun in the nominative case (хлопці). That’s why we put the quantitative numeral (2 - два).
2. **Collective numerals** (збірні числівники: двоє, троє, четверо, п’ятеро, …) (2 to 20, and 30) (two, three, four, ..., twenty, thirty).
   They change their flexions according to the case of the plural noun on which they depend. They cannot depend on nouns in the nominative case.
e.g. По вулиці йшли двоє(2) хлопців. (Two boys were walking on the street.)
The numeral (двоє - 2) depends on a plural noun in the genitive case (хлопців). That’s why we put the collective numeral (2 - двоє).
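As a rough illustration of why a helper needs the case and gender of the governing noun, here is a small lookup sketch for the two groups quoted above. It is not Lingua Franca code: `inflect_one` and `render_two` are hypothetical names, the table covers only a handful of forms of "один", and the case labels follow the quoted examples.

```python
# A few forms of "один" (one), keyed by (case, gender-or-plural).
# Deliberately incomplete: just enough to mirror the examples above.
ODYN_FORMS = {
    ("nom", "masc"): "один",
    ("nom", "fem"): "одна",
    ("nom", "neut"): "одне",
    ("nom", "plur"): "одні",
    ("gen", "masc"): "одного",
    ("gen", "neut"): "одного",
    ("gen", "fem"): "однієї",
}


def inflect_one(case, gender):
    """Return the form of 'один' that agrees with the noun."""
    try:
        return ODYN_FORMS[(case, gender)]
    except KeyError:
        raise ValueError(f"no form for case={case!r}, gender={gender!r}")


def render_two(noun_forms, collective=False):
    """Render 'two <masculine noun>' for a subject position.

    'два' goes with the nominative plural noun, while the collective
    'двоє' requires the genitive plural, as in the examples above.
    """
    if collective:
        return f"двоє {noun_forms['gen_pl']}"
    return f"два {noun_forms['nom_pl']}"


boy = {"nom_pl": "хлопці", "gen_pl": "хлопців"}
print(f"Ми бачили {inflect_one('gen', 'masc')} хлопця.")
print(f"По вулиці йдуть {render_two(boy)}.")
print(f"По вулиці йшли {render_two(boy, collective=True)}.")
```

The skill would have to know `case` and `gender` before calling either function, which is the information the current render pipeline does not provide.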