Development - Suggestion for fuzzy string matching feature

Hi there!

I need a little dev help/guidance to add a fuzzy string matching feature to my skill.

I am developing a simple skill that lets users create lists of things to remember. The interface is very simple (e.g. “add the father to my film list”), and the data is stored in a local JSON file.

A user opened an issue on GitHub pointing out that my skill only works when the utterance data exactly matches an item in the database: spelling variations (e.g. “to-do” vs. “to do”) and other approximations (e.g. ‘film’ vs. ‘films’) are not accepted.

I have given this a little thought and the solution I came up with is pretty drastic, so I would like to ask for a second opinion before I (effectively) refactor a big chunk of the skill!

My idea is to add some preliminary checks on the data parsed from the utterance, trying to match the incoming data against what is already in the database. This would, in effect, correct wrong transcriptions (e.g. “to-do” vs. “to do”) and user slips (e.g. ‘film’ vs. ‘films’).

I would do this by modifying the database module so that:

  • the data is retrieved from the utterance;
  • the data is checked for a close match against what is already stored in the database, using the fuzzywuzzy Python library;
  • if there is a match, the match is used instead of the original utterance data; otherwise the utterance data is used as-is.
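
To make the three steps above concrete, here is a rough sketch of what I have in mind. To keep it self-contained it uses `difflib.get_close_matches` from the standard library rather than fuzzywuzzy; a `process.extractOne` call from fuzzywuzzy (or rapidfuzz) could presumably fill the same role. The function name `resolve_name`, the `stored_names` argument and the 0.8 cutoff are placeholders I made up for illustration, not code from the skill.

```python
from difflib import get_close_matches


def resolve_name(heard, stored_names, cutoff=0.8):
    """Return the stored name closest to `heard`, or `heard` itself if nothing is close enough.

    Hypothetical helper: `stored_names` stands in for whatever the database
    module returns, and the cutoff is an arbitrary starting value.
    """
    def normalise(text):
        # Lower-case, turn hyphens into spaces and collapse repeated whitespace,
        # so "to-do" and "to do" compare as identical.
        return " ".join(text.lower().replace("-", " ").split())

    # Map each normalised form back to the name as it is actually stored.
    lookup = {normalise(name): name for name in stored_names}

    # Ask for the single best candidate whose similarity ratio reaches the cutoff.
    matches = get_close_matches(normalise(heard), list(lookup), n=1, cutoff=cutoff)

    # Use the stored name on a match; otherwise fall back to the utterance data.
    return lookup[matches[0]] if matches else heard
```

With stored lists `["to-do", "films"]`, this would map “to do” to “to-do” and ‘film’ to ‘films’, while an unrelated name such as “groceries” would pass through unchanged. The cutoff would need tuning, since a value that is too low could silently merge two lists that are meant to be distinct.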

Does this sound reasonable? Are there better ways to solve this problem?

Many thanks!!

you mean rapidfuzz
