Using a socketio connection in a Mycroft skill

I am trying to build a skill that connects Mycroft to a Rasa chatbot via a socketio connection.

It is not clear to me where to start implementing this. My experiments always get stuck in a “turn-taking” model (rather than a real-time message exchange): the intent handler can pass one user utterance to the chatbot (and display the answer), but I fail to keep the conversation going.

I also don’t know whether the approach is even doable with intent handlers and the converse method (which would pass follow-up user utterances along, emit the corresponding events, and speak the responses in Mycroft).

Any thoughts on whether my approach is doable or reasonable?

Hi, I would try the converse() method. I have not used it personally, but from the docs it seems to fit your need.

I would also add a stop utterance to converse() that removes the skill from the active ones, so you can stop chatting and use Mycroft in the usual way instead of waiting for the 5-minute timeout.

The docs mention a function to make a skill active by calling self.make_active(), but I do not see an inverse of it.
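Something like this rough, untested sketch is what I mean - since there is no inverse of make_active(), a flag of your own can make converse() step aside (the intent file name and stop phrases here are just made up):

```python
from mycroft import MycroftSkill, intent_handler

class ChatSkill(MycroftSkill):
    def __init__(self):
        super().__init__()
        self.chatting = False  # our own flag, since there is no make_inactive()

    @intent_handler('start.chat.intent')  # made-up intent file
    def handle_start_chat(self, message):
        self.chatting = True
        self.make_active()  # ensures converse() gets called for this skill
        self.speak("I'm listening", expect_response=True)

    def converse(self, utterances, lang=None):
        if not self.chatting:
            return False  # let Mycroft handle the utterance as usual
        utt = utterances[0] if utterances else ''
        if utt in ('stop chatting', 'goodbye'):  # made-up stop phrases
            self.chatting = False
            self.speak('Okay, ending the chat.')
            return True  # consume the stop phrase; later calls return False
        # ... forward utt to the chatbot and speak the reply here ...
        return True
```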

The second option I would try is get_response() with a higher number of retries (waiting for a response), running in a loop that is stopped by a stop utterance or a timeout; see the sketch below. But I am not sure whether there is a limit on how long a skill handler can block.
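For that second option, roughly (untested; ask_bot() is a placeholder for however you query the chatbot, and the stop phrases and turn limit are made up):

```python
def handle_chat_loop(self):
    # Keep re-opening the mic with get_response(); num_retries controls
    # how often Mycroft re-prompts while waiting for the user.
    for _ in range(20):  # arbitrary cap on the number of turns
        utt = self.get_response(num_retries=2)
        if utt is None or utt in ('stop chatting', 'goodbye'):
            break  # timeout / failure, or the user asked to stop
        reply = self.ask_bot(utt)  # hypothetical helper that queries Rasa
        self.speak(reply)
```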

Hey,

TLDR - based on my assumptions of what you’re trying to do, I don’t think we have the right answer yet, but there are ways to achieve similar functionality in the interim.

Can you describe in a little more detail what you want the interaction to look like?

When you say it gets stuck in turn-taking rather than realtime, do you mean that the user needs to trigger the device each time they want to take a turn, rather than just responding to what the chatbot said, or something else?

I don’t think converse() fully provides what we need for immersive experiences and ends up being overused. I’m interested in exploring what a conversational system might look like for specific use cases. Things like chat bots, games, or other direct and ongoing interactions where the user would know and expect that anything they said would be listened to by Mycroft for a given period of time.

We’re just about to look in detail at Skill interactions - improving the way that different Skills operate on the same system, behaviours when interacting with multiple Skills at a time, and how Skills interact with each other. Whilst this question is about a single Skill, I think it still fits into this discussion, as it impacts how the user can interact with other Skills on the device. E.g. if you’re in an immersive experience such as chatting with a Rasa chatbot and ask “what’s the weather” - is that a question for the chatbot? Should it fall through to the Weather Skill? If it drops to a Skill outside the “immersive experience”, does that end the immersive session, or does the session continue?

Thanks for the quick response. I managed to get the socket working, but I also think that some piece is missing for a good user experience. The case is this:

We use a Mycroft skill to create a connection to a Rasa chatbot via socketio. So far we have used Rasa’s REST API, but the code became quite complex, and I thought real-time message exchange would be a more reliable way to exchange messages. Our conversations (same session) can have 20+ turns. We use a customized Mycroft App (websocket) + a customized Mycroft core + our own Mycroft skill + one Rasa chatbot with the socketio endpoint.

Our skill has an intent handler that handles the first user utterance, the one that activates the skill. The handler:

  1. connects the user to the socket endpoint,
  2. uses get_response to a) say “I’m listening” and b) wait for the user’s response, and
  3. “calls” the socket, sending the second user message (the one after activation).

The intent handler code also defines how to respond to incoming events - it speaks the Rasa bot’s response.

The skill also has a converse method to handle all utterances after the first two (the activation utterance and the first question). It does not use get_response; instead, it calls the socket endpoint, sending the user’s utterance. The converse method also contains the code that defines how to respond to incoming events, i.e. it speaks the Rasa bot’s response. A stripped-down sketch of this structure is below.
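Roughly, the structure looks like this (simplified and untested; the intent file name and stop phrases are placeholders, and 'user_uttered'/'bot_uttered' are the default event names of Rasa’s socketio channel - check your channel configuration):

```python
import socketio  # python-socketio client
from mycroft import MycroftSkill, intent_handler

RASA_URL = 'http://localhost:5005'  # assumption: default Rasa server address

class RasaChatSkill(MycroftSkill):
    def __init__(self):
        super().__init__()
        self.sio = socketio.Client()
        self.chatting = False

    def initialize(self):
        # Rasa's socketio channel emits bot replies as 'bot_uttered' events;
        # the handler runs on the socketio client thread.
        self.sio.on('bot_uttered', self.handle_bot_uttered)

    def handle_bot_uttered(self, data):
        if 'text' in data:
            self.speak(data['text'], expect_response=True)

    @intent_handler('talk.to.bot.intent')  # placeholder intent file
    def handle_talk_to_bot(self, message):
        self.sio.connect(RASA_URL)
        self.chatting = True
        self.make_active()
        # say "I'm listening" and wait for the user's first question
        utt = self.get_response("I'm listening")
        if utt:
            self.sio.emit('user_uttered', {'message': utt})

    def converse(self, utterances, lang=None):
        if not self.chatting:
            return False
        utt = utterances[0] if utterances else ''
        if utt in ('stop chatting', 'goodbye'):  # placeholder stop phrases
            self.chatting = False
            self.sio.disconnect()
            self.speak('Okay, ending the chat.')
            return True
        self.sio.emit('user_uttered', {'message': utt})
        return True

def create_skill():
    return RasaChatSkill()
```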

I can have conversations with the bot like in the attached image (there is a bug in my code though: one message is not properly consumed - see the last turn, where Mycroft answers my question twice).

The big problem is that the converse method for our skill must handle any kind of user utterance (because users could direct any question to the Rasa chatbot). The only way I see to get out of an infinite converse mode with our skill is to deactivate the skill after a stop word is said. I could not get this working so far, i.e. I could not make the skill inactive.

A timeout is one option too, but it cannot be the only way. In our use case, users may need several minutes before continuing the same conversation: they follow instructions given by the assistant and report back when they are done. Without the converse method, users would always need to re-activate the skill manually.

Conceptually, we may well run into a problem that humans have too when they participate in group conversations - how does a participant know that they are being addressed :slight_smile: . In this case, the other participants are skills. Some of them take simple commands (e.g. timer) while others perform multiple or complex tasks that require lots of context information (longer conversations).

Maybe it is possible to weight all skills, i.e. a skill with command-and-control behavior (e.g. timer or send email) is ranked higher, while “conversational interface skills” are ranked lower. The higher-ranked skills always process messages before the lower-ranked ones.
The downside is that my Rasa bot might have a timer skill too, which would never be activated this way :slight_smile: That means conversational interface skills should communicate (with metadata) what they can do to Mycroft. If there is a conflict, Mycroft should ask which skill you want to use.
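Just to illustrate the weighting idea (a purely hypothetical sketch - nothing like this router exists in Mycroft today, and all names and fields are invented):

```python
# Each skill advertises what it handles plus a priority; command-and-control
# skills outrank conversational ones.
skills = [
    {'name': 'timer',     'priority': 10, 'handles': lambda utt: 'timer' in utt},
    {'name': 'rasa-chat', 'priority': 1,  'handles': lambda utt: True},  # chatbot claims everything
]

def route(utterance):
    candidates = [s for s in skills if s['handles'](utterance)]
    candidates.sort(key=lambda s: s['priority'], reverse=True)
    if len(candidates) > 1 and candidates[0]['priority'] == candidates[1]['priority']:
        return 'ask-user'  # conflict: Mycroft asks which skill to use
    return candidates[0]['name'] if candidates else None

print(route('set a timer for five minutes'))  # -> 'timer' (outranks the chatbot)
print(route('how do I fix this?'))            # -> 'rasa-chat'
```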

Fluid transitions between skills are indeed a complex issue, especially if we want super-smooth interactions.
