[Content understanding] Transcription service

Skill name: transcription-skill

User story:

  • As a keynote speaker, I want Mycroft to live transcribe the words I am speaking, so that the conference I am speaking at does not need to employ a stenographer.
  • As a keynote speaker, I want Mycroft to live transcribe the words I am speaking, so that any hearing impaired audience members can have a better experience, and so that I better support accessibility and universal design.
  • After I have spoken, I want to be able to read the transcription in a text file so that I can save it for future reference.
  • As a writer, I want to be able to play back a recording of an interview, and have the spoken audio transcribed into written text so that I can make the writing process quicker and easier.
  • As someone who is on a low income, I want to be able to use a free and open source transcription service so that I don’t have to pay the transcription fees (up to 20c per minute) that commercial services charge.
  • As a web developer, I want to be able to access the transcription via an API or a web service to integrate into other products or offerings.

What third party services, data sets or platforms will the Skill interact with?

None.

Are there similar Mycroft Skills already?

What will the user Speak to trigger the Skill?

Hey Mycroft, start transcription {{transcription_name}}
Hey Mycroft, stop transcription
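
As a rough illustration of how the two trigger phrases could be told apart and the transcription name pulled out, here is a minimal sketch in plain Python. The regular expressions and the `parse_command` helper are hypothetical; a real Skill would rely on Mycroft's own intent handling rather than hand-rolled patterns.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns for the two phrases proposed above.
START_RE = re.compile(r"^start transcription(?:\s+(?P<name>.+))?$", re.IGNORECASE)
STOP_RE = re.compile(r"^stop transcription$", re.IGNORECASE)


def parse_command(utterance: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (action, transcription_name), or (None, None) if nothing matches."""
    text = utterance.strip().rstrip(".").strip()
    # Drop the wake word in case the recognised text still contains it.
    text = re.sub(r"^hey mycroft[,\s]*", "", text, flags=re.IGNORECASE)
    match = START_RE.match(text)
    if match:
        return "start", match.group("name")
    if STOP_RE.match(text):
        return "stop", None
    return None, None
```

For example, `parse_command("Hey Mycroft, start transcription keynote")` yields the action `"start"` with the name `"keynote"`, and the name is `None` when the speaker does not give one.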

What phrases will Mycroft Speak?

Starting transcription {{transcription_name}} {{live transcription}}
Stopping transcription {{transcription_name}} {{live transcription}}
The transcription {{transcription_name}} has been stored at {{transcription path}}
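
The spoken phrases above are templates with `{{...}}` placeholders. A minimal sketch of filling them in is below; the `render` helper is hypothetical, and the placeholder is written as `transcription_path` (with an underscore) as an assumption so that it is a single word.

```python
import re

# Matches {{name}} placeholders, allowing spaces just inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, **values: str) -> str:
    """Fill {{name}} placeholders; a missing value raises KeyError."""
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


STORED = (
    "The transcription {{transcription_name}} "
    "has been stored at {{transcription_path}}"
)
message = render(
    STORED,
    transcription_name="keynote",
    transcription_path="/tmp/keynote.txt",
)
```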

What Skill Settings will this Skill need to store?

  • Transcription directory path
  • Any custom words or phrases that are not mainstream, such as medical terminology
  • (optional) a custom transcription stop and start phrase
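
The settings listed above could be grouped roughly as follows. This is only a sketch: the class name, field names and default values are all assumptions, and a real Skill would keep these in Mycroft's Skill settings rather than in a standalone dataclass.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class TranscriptionSettings:
    # All defaults here are illustrative assumptions.
    transcription_dir: Path = Path.home() / "transcriptions"
    custom_phrases: List[str] = field(default_factory=list)
    start_phrase: str = "start transcription"
    stop_phrase: str = "stop transcription"

    def path_for(self, name: str) -> Path:
        """Build the text file path for a named transcription."""
        # Replace anything that is not alphanumeric, '-' or '_' so a spoken
        # name cannot produce an awkward or unsafe file name.
        safe = "".join(
            c if c.isalnum() or c in "-_" else "_" for c in name.strip()
        )
        return self.transcription_dir / f"{safe}.txt"
```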

Other comments?

There are several commercial services that offer low-cost or marginal-cost subscriptions with quick turnaround, but there are none that I’m aware of that offer live transcription as a service, or from a device like this.

One of the barriers I see to this Skill is the memory and storage that may be required.
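
For a rough sense of scale on the storage question, here is a back-of-the-envelope estimate. Every constant is an assumption (150 words per minute, about 6 bytes per word, 16 kHz 16-bit mono audio), not a measurement from a Mycroft device.

```python
WORDS_PER_MINUTE = 150    # assumed speaking rate
BYTES_PER_WORD = 6        # assumed average, including the trailing space
SAMPLE_RATE_HZ = 16_000   # assumed recording sample rate
BYTES_PER_SAMPLE = 2      # 16-bit mono


def text_bytes(minutes: float) -> int:
    """Approximate size of the plain-text transcript."""
    return int(minutes * WORDS_PER_MINUTE * BYTES_PER_WORD)


def audio_bytes(minutes: float) -> int:
    """Approximate size of the uncompressed audio for the same duration."""
    return int(minutes * 60 * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
```

Under these assumptions an hour of speech comes to about 54 kB as text and about 115 MB as raw audio, so storage would matter mainly if the audio itself is kept.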

saves everything you say to text, and sends by email

the trouble is continuous listening: if Mycroft misses an utterance you will need a “hey mycroft” to keep going
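
To make the continuous listening problem concrete, here is a toy model of a transcription session. It is not how Mycroft's listener works internally; the `TranscriptionSession` class and the 10 second timeout are assumptions used only to show where an utterance gets lost.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TranscriptionSession:
    timeout_s: float = 10.0   # assumed gap after which listening lapses
    active: bool = True
    last_heard: float = 0.0   # timestamp of the last accepted utterance
    lines: List[str] = field(default_factory=list)

    def hear(self, text: str, now: float, wake_word: bool = False) -> bool:
        """Record an utterance; return False if it was dropped."""
        if not self.active:
            return False
        # After a long gap the listener is idle again, so anything said
        # without the wake word is lost.
        if now - self.last_heard > self.timeout_s and not wake_word:
            return False
        self.last_heard = now
        if text.strip().rstrip(".").lower() == "stop transcription":
            self.active = False
        else:
            self.lines.append(text.strip())
        return True
```

In this model an utterance that arrives after the gap is dropped unless it is preceded by the wake word, which is the behaviour described above.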

for the “play back a recording of an interview, and have the spoken audio transcribed into written text” you have the wav client https://github.com/forslund/mycroft-wave-client

Additional user story: As a meeting attendee, I want to pay close attention to all speakers, take a few notes, but be able to later on review a detailed transcription to improve my notes and recollection of the specifics of what was communicated in the meeting.

As a totally cool, but probably not really plausible at this time, bonus: speaker tagging (kind of like a stage or film script). Clearly, actual identification would be way too much to tackle, but the transcript could still distinguish between speakers based on differences in voice.

TRANSCRIPT BEGIN
Speaker1: thanks for coming to the meeting everyone. today’s topic is our new web application.
Speaker2: I’m glad you brought that up, i am still stuck on my current project. i won’t be able to help.
Speaker1: ok, thanks for letting us know. anyone else too busy to help?
Speaker3: yeah, me too, i’m out
Speaker1: ok, on to the technicals. We’ll need a dedicated server, Kasi, what server can we use?
Speaker4: webserver32 is available and also webserver48.
Speaker1: ok great, let’s use webserver32.
Speaker5: you want me to install a web server on there?
Speaker4: sure please do that. but don’t do the latest version, its not tested yet. go with version 2.5.2
…
Speaker1: ok, everyone thanks for your time, see you next week, one hour later than normal meeting time.
Speaker1: Hey Mycroft, stop transcription.
END TRANSCRIPT
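
A sketch of how a tagged transcript in the layout above could be produced, assuming some earlier step has already grouped the audio into segments by voice. That grouping step is the hard part and is not shown; the `format_transcript` helper and the voice labels are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def format_transcript(segments: Iterable[Tuple[str, str]]) -> str:
    """Render (voice_label, text) pairs in the Speaker1/Speaker2 layout."""
    numbers: Dict[str, int] = {}
    lines: List[str] = ["TRANSCRIPT BEGIN"]
    for voice_label, text in segments:
        # Speakers are numbered in order of first appearance, so no real
        # identification is needed, only a consistent label per voice.
        number = numbers.setdefault(voice_label, len(numbers) + 1)
        lines.append(f"Speaker{number}: {text}")
    lines.append("END TRANSCRIPT")
    return "\n".join(lines)
```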

Brilliant user stories, @jrwarwick, thank you

This seems highly practical and useful, and the basic functionality is perhaps simple to implement. I’ve added it to our community development list, which I’ll share publicly soon. :slight_smile:

this is here :slight_smile:

and the audio equivalent