Hello. I work with digital collections for a large special collections library. We use a technology stack called Islandora to host digital images, books, films, audio files, etc. Below is a diagram of our massive toolbox that I found with a quick search.
You can see in that middle level there's a tool called Tesseract. It's an open-source OCR engine. When we ingest images of printed material into our digital library, Tesseract runs and creates really rough transcriptions. This allows us to perform full-text searches of the books we ingest. If one day Tesseract gets updated and becomes a better tool, we can simply regenerate better OCR from the same images by running the tool again. It's a modular open-source thing we just stuck in there, and it performs a great service.
It seems like you guys need to build a speech-to-text tool as a major component of the Mycroft project. Is it possible to build this in such a way that it can be pulled out as a modular tool for use anywhere? For instance, could I put the Mycroft speech-to-text tool into the Islandora stack in a way similar to Tesseract? So I ingest an oral history interview, the tool looks at it, and a text file is generated. It's fine if it's a terrible text file so long as the tool improves and I can always rerun the process later. A rough transcript is better than no transcript.
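To illustrate the kind of modularity I mean, here is a minimal sketch (names like `transcribe_to_sidecar` are hypothetical, not part of any existing Mycroft or Islandora API): the repository hands an audio file to a pluggable speech-to-text function and stores the result as a plain-text sidecar, so the transcript can always be regenerated later with a better engine, just like rerunning OCR over the same images.

```python
from pathlib import Path
from typing import Callable

def transcribe_to_sidecar(audio_path: str, transcribe: Callable[[str], str]) -> Path:
    """Run a pluggable speech-to-text engine over one audio file and
    write the result next to it as a plain-text sidecar file.

    Re-running this with an improved engine simply overwrites the old
    transcript, mirroring how OCR can be regenerated from the same
    source images.
    """
    source = Path(audio_path)
    # Sidecar convention: "interview.wav" -> "interview.wav.txt"
    sidecar = source.parent / (source.name + ".txt")
    sidecar.write_text(transcribe(str(source)), encoding="utf-8")
    return sidecar
```

The point is the shape of the interface, not the engine: any speech-to-text backend that maps an audio path to a string could be dropped in as the `transcribe` argument, and the repository never needs to know which one it was.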
If it worked like that, there could potentially be a very large group of grant-funded developers out there who could work on just that piece of the larger Mycroft ecosystem.