Mycroft Community Forum

Regex Experts Wanted!

I am looking for some regex help for some of my skills.
If the user says “play the song blue christmas”, the regex should return the song name as blue christmas. If the user says “play the song blue christmas by elvis presley”, the regex should return the song name as blue christmas and the artist as elvis presley. I can get these working individually, but not as a combined regex.
Examples:
“play the song blue christmas”

(the |)(song|single) (?P<title>.+)

“play the artist elvis presley”

(the |)(artist|group|band|(something|anything|stuff|music|songs) (by|from)|(some|by)) (?P<artist>.+)
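
For reference, here is a minimal Python sketch of how the two patterns behave on their own, and what happens when the title pattern sees an utterance that also names an artist. The `extract` helper and the use of `re.search` are assumptions made for illustration; Mycroft may apply the patterns differently.

```python
import re

# The two patterns from the post, unchanged.
TITLE_RX = re.compile(r"(the |)(song|single) (?P<title>.+)")
ARTIST_RX = re.compile(
    r"(the |)(artist|group|band|(something|anything|stuff|music|songs) (by|from)|(some|by)) (?P<artist>.+)"
)

def extract(pattern, utterance):
    """Return the named groups of the first match, or None if nothing matches."""
    match = pattern.search(utterance)
    return match.groupdict() if match else None

print(extract(TITLE_RX, "play the song blue christmas"))
print(extract(ARTIST_RX, "play the artist elvis presley"))
# The greedy .+ in the title pattern swallows the artist as well:
print(extract(TITLE_RX, "play the song blue christmas by elvis presley"))
```

The last call returns the whole tail, "blue christmas by elvis presley", as the title, which is the behaviour a combined pattern has to avoid.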

I don’t think there’s a good regex for this, sadly. The problem is identifying the break between capture groups.

Even if you try to hinge on important words, they could all be part of a song title.
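
To make the ambiguity concrete, here is a small Python sketch that lists every place a phrase could be split at a signal word. The `candidate_splits` helper and the choice of "by" and "from" as signal words are assumptions for illustration only.

```python
def candidate_splits(phrase, signal_words=("by", "from")):
    """List every (title, artist) split of `phrase` at a signal word."""
    words = phrase.split()
    splits = []
    for i, word in enumerate(words):
        # A split needs at least one word on each side of the signal word.
        if word in signal_words and 0 < i < len(words) - 1:
            splits.append((" ".join(words[:i]), " ".join(words[i + 1:])))
    return splits

for title, artist in candidate_splits("stand by me by ben e king"):
    print(f"title={title!r} artist={artist!r}")
```

Both splits are syntactically valid, and nothing in the text itself says which "by" is the separator; only knowledge of actual titles and artists can decide.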

This is a brain-racking problem. I think the only good solution is to feed it into some kind of search algorithm that returns a tuple or JSON or something.

Indeed a difficult one if you don’t want to risk a match that cuts the song title apart. Since regex is greedy by default, the safest approach is probably to match a very generic regex against the music library first and, only if nothing is found, to try the more specific regex that separates song and artist.
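
A rough Python sketch of that two-step lookup, under stated assumptions: `LIBRARY` is a made-up in-memory stand-in for a real music library, and `resolve`, `GENERIC_RX` and `SIGNAL_RX` are hypothetical names, not part of Mycroft.

```python
import re

# Hypothetical library of (title, artist) pairs.
LIBRARY = {("blue christmas", "elvis presley"), ("stand by me", "ben e king")}
TITLES = {title for title, _ in LIBRARY}

GENERIC_RX = re.compile(r"play (?:the )?(?:song|single) (?P<title>.+)$")
# A signal word with a space on both sides, so text exists before and after it.
SIGNAL_RX = re.compile(r"(?<= )(?:by|from)(?= )")

def resolve(utterance):
    """Look up the whole phrase as a title first, then try title/artist splits."""
    match = GENERIC_RX.match(utterance)
    if not match:
        return None
    phrase = match.group("title")
    # Step 1: generic match, the whole phrase is treated as the title.
    if phrase in TITLES:
        return {"title": phrase, "artist": None}
    # Step 2: only if that fails, try every split at a signal word.
    for signal in SIGNAL_RX.finditer(phrase):
        title = phrase[:signal.start() - 1]
        artist = phrase[signal.end() + 1:]
        if (title, artist) in LIBRARY:
            return {"title": title, "artist": artist}
    return None

print(resolve("play the song stand by me"))
print(resolve("play the song stand by me by ben e king"))
```

Because the library decides which split is accepted, a title that contains "by" is not cut apart as long as the full title is known.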

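Otherwise it makes sense to be strict. Python’s re module rejects a pattern that defines the same group name twice, so the artist-only form gets its own pattern; the title is matched lazily and both patterns end in $, so that the title stops before “ from the artist”:

(the song (?P<title>.+?)|something)( from the artist (?P<artist>.+))?$
the artist (?P<artist>.+)$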

Matches:

  1. the song abc
  2. the song abc from the artist xyz
  3. something
    • If playing a random song from an artist is allowed, playing any random song probably should be as well?
  4. something from the artist xyz
  5. the artist xyz
    • Short form of 4., so it could be removed, but matches the example you gave.
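
The five cases listed above can be checked with a short Python sketch. It keeps the artist-only form as a separate pattern, since Python’s re module does not accept a repeated group name, and it relies on a lazy title group plus `fullmatch`; the `parse` helper and `STRICT_PATTERNS` list are hypothetical names for this example.

```python
import re

# Patterns are tried in order; the first full match wins.
STRICT_PATTERNS = [
    re.compile(r"(?:the song (?P<title>.+?)|something)(?: from the artist (?P<artist>.+))?"),
    re.compile(r"the artist (?P<artist>.+)"),
]

def parse(phrase):
    """Return title and artist (None when absent), or None if no pattern matches."""
    for pattern in STRICT_PATTERNS:
        match = pattern.fullmatch(phrase)
        if match:
            groups = match.groupdict()
            return {"title": groups.get("title"), "artist": groups.get("artist")}
    return None

for phrase in [
    "the song abc",
    "the song abc from the artist xyz",
    "something",
    "something from the artist xyz",
    "the artist xyz",
]:
    print(phrase, "->", parse(phrase))
```

With a greedy title group instead of `.+?`, the second phrase would still match, but the whole tail "abc from the artist xyz" would end up in the title.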

The shorter and more common the signal words are, and the more alternatives you allow for them (song/single, from/by, something/anything/stuff..., artist/group/band), the more false matches occur whenever a song title or artist name contains one of those words. So either several regexes have to be tried in a loop, to avoid a wrong match or a missed entry in the music library, or one has to live with a trade-off: missing results when only full matches are allowed, or a large number of results when a title or artist only needs to contain the string matched by the regex.
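
The trade-off between full matches and "contains" matches can be sketched in a few lines of Python. The title list and both helper functions are made up for illustration; the query "stand" is what a loose pattern would extract from "the song stand by me" if it split at "by".

```python
# Hypothetical library titles.
TITLES = ["stand by me", "stand and deliver", "i'm still standing", "blue christmas"]

def exact_lookup(query, titles):
    """Only titles equal to the query count as a hit."""
    return [title for title in titles if title == query]

def contains_lookup(query, titles):
    """Any title containing the query counts as a hit."""
    return [title for title in titles if query in title]

query = "stand"  # the title was cut apart at the signal word "by"
print(exact_lookup(query, TITLES))     # no result at all
print(contains_lookup(query, TITLES))  # three candidates, only one intended
```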
