Is browser integration on the roadmap?

Question for the Mycroft team: are there plans to have skills integrated with a web browser? Currently I am experimenting with this awesome fork https://github.com/JarbasAI/JarbasAI, but I am curious whether there is a plan for some browser integration so a skill can open a new or existing browser tab.

Is there a roadmap somewhere that I can look at?

@oren Just wanted to chime in here. We have been discussing a browser option, but I am not positive that I fully understand your needs versus what I was thinking about. Can you elaborate on what you want to do other than opening a new or existing browser window?

Is this what you want:

Hello Mycroft – Open New Firefox Window > opens a new browser window
Hello Mycroft – Show me Firefox Window > brings up an existing browser session?

Nate

Hi Mycroft, translate ‘how are you doing’ to Chinese.
The output would be an audio reply, a text reply in the browser, and a new or existing tab showing a relevant webpage.

There are some plans to allow opening webpages and other things in the web browser, or using Chromecast, similarly to JarbasAI’s implementation.

If one wants to do something like that today, I’d recommend using the xdg-open Linux command, which will do pretty much what you ask. (It takes a URI and tries to open it with the default program: web URLs with Firefox, PDFs with Evince, etc.)
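For example, a skill could simply shell out to it from Python (a minimal sketch; the URL is just an example):

```python
import subprocess

def open_in_default_browser(uri):
    """Ask the desktop to open the URI with its default handler:
    web URLs go to the browser, PDFs to the PDF viewer, etc."""
    subprocess.Popen(["xdg-open", uri])

# Example: open a page in a new or existing browser window.
open_in_default_browser("https://mycroft.ai")
```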

Interesting.

Maybe what I am looking for is an HTTP web service that receives a stream of audio and returns a stream of audio. This service would do the speech-to-text, intent parsing, skill handling, and text-to-speech. I believe this is similar to Lex, Amazon’s service. Having something like this would allow us to build audio conversations into any device/platform that can make HTTP calls (web, native apps, IoT, etc.).
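To make it concrete, the shape I’m imagining is something like this (a minimal Flask sketch of my own; the stt, handle_intent, and tts helpers are placeholders for whatever engines would actually be used):

```python
from flask import Flask, request, Response

app = Flask(__name__)

def stt(wav_bytes):
    """Placeholder: run speech-to-text on the uploaded audio."""
    raise NotImplementedError

def handle_intent(text):
    """Placeholder: parse the intent and run the matching skill."""
    raise NotImplementedError

def tts(text):
    """Placeholder: synthesize the reply text back to WAV bytes."""
    raise NotImplementedError

@app.route("/converse", methods=["POST"])
def converse():
    # Audio in: the client POSTs a WAV recording of the utterance.
    utterance = stt(request.data)
    reply_text = handle_intent(utterance)
    # Audio out: the reply comes back as synthesized speech.
    return Response(tts(reply_text), mimetype="audio/wav")
```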

Is this possible with Mycroft or any other open source platform?

Currently it’s not easy to get this working well for multiple users, since the skills and intent service retain internal state.

I’ve been thinking about this and trying to come up with an architecture where state is stored at the client instead of in the skill service.

For single users this can be set up with little or no changes.
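To illustrate what I mean by client-held state, roughly (just a sketch of the idea, not anything that exists in mycroft-core today):

```python
import json

def handle_request(audio, state_blob):
    """Stateless handler: all conversation state travels with the
    request instead of living in the skill service."""
    state = json.loads(state_blob) if state_blob else {}
    # ... run STT / intent handling here, reading and updating `state` ...
    state["turns"] = state.get("turns", 0) + 1
    reply = "placeholder reply"
    # Return the reply plus the new state for the client to keep.
    return reply, json.dumps(state)
```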


I think this would be a good start: Host Mycroft Securely In Cloud


Thank you. I just want to point out that this video shows how to create an HTTP endpoint that accepts text and outputs text. This can indeed be a great starting point for me. Maybe the next step is to figure out how to do an audio-in, audio-out endpoint. That way I’ll have a way to interact from the web using text or audio.

If anyone has suggestions for learning about audio-to-audio endpoints, please share them!

What you could do is use Google’s speech-to-text to turn the audio into text, then send that text to a Mycroft endpoint for processing. For the reply, you can set up Mimic to do the text-to-speech, or use Google’s text-to-speech API.
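Roughly, the flow would look like this (a sketch only; the helper names and the Mycroft URL are placeholders, not real APIs):

```python
import requests

MYCROFT_URL = "http://localhost:8080/ask"  # hypothetical endpoint

def google_stt(wav_bytes):
    """Stand-in for a real STT call (e.g. Google's Speech API)."""
    raise NotImplementedError

def google_tts(text):
    """Stand-in for TTS (Mimic locally, or Google's TTS API)."""
    raise NotImplementedError

def converse(wav_bytes):
    # 1. Turn the recorded audio into text.
    text = google_stt(wav_bytes)
    # 2. Hand the text to a Mycroft endpoint, read back the reply.
    #    The "reply" field is hypothetical; match your endpoint.
    reply = requests.post(MYCROFT_URL, json={"utterance": text})
    # 3. Synthesize the reply so the client gets audio back.
    return google_tts(reply.json()["reply"])
```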

Nice! I assume you are talking about the Web Speech API.

The problem is that IE, Safari, and Firefox don’t support speech-to-text at the moment. Text-to-speech has better support; only IE is lagging behind. Maybe that’s why Amazon Lex does the STT and the TTS on the server?

Doing this from the web is a bit tricky. If you can find a way to record voice cross-browser without using the Web Speech API, you can send the voice file to the server to process.

@oren Check this out! https://github.com/MycroftAI/mycroft-core/pull/1028. This PR is close to what you are asking for: it makes it possible for Mycroft to receive speech-to-text input via a WAV file. So basically, if you can record your voice in the browser, you should be able to query Mycroft directly by giving it the WAV file.
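For example, once you’ve recorded a WAV in the browser, the query could look something like this (the endpoint URL here is my guess; check the PR for the actual interface):

```python
import requests

# Hypothetical endpoint; see the PR for the real path and port.
STT_URL = "http://localhost:8181/stt"

# Post the recorded utterance as raw WAV bytes.
with open("utterance.wav", "rb") as f:
    resp = requests.post(STT_URL, data=f.read(),
                         headers={"Content-Type": "audio/wav"})

print(resp.text)  # the transcription / Mycroft's response
```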

I’ll take a look over the weekend. Thank you!