Privacy = Good
Tracking = Bad
We are only pushing data through the feedback loop from users who explicitly opt in, either through home.mycroft.ai or by enabling the "Learn" feature of their Mark I device. Other users' data will not be tracked and, if we do for some reason collect it in a log file or something, it will be purged just as soon as I become aware of it.
On the Speech-To-Text side of things we've put in place (or are putting in place) mechanisms to allow users to have their voice data removed from the online repository. Consumers of the data will be required, as part of their license, to refresh it periodically, so that removed recordings drop out of their copies as well. Unfortunately, to allow removal of the data, by definition, we need to know which user goes with which utterance.
We set it up this way because we were forced to choose between two options: securely anonymizing the data and having it live forever, or giving users control of their data so they can delete it. Given our certainty that speaker identification software will be ubiquitous in 10 years (allowing companies to unmask speakers based on a small voice sample), we decided it is better to allow users to delete their data.
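To make that trade-off concrete, here is a rough sketch of why deletion needs a user-to-utterance link. This is purely illustrative and not our actual storage code; `UtteranceStore`, `add`, `delete_user` and `snapshot` are hypothetical names, and the user ID is assumed to be an opaque account identifier.

```python
from collections import defaultdict


class UtteranceStore:
    """Hypothetical in-memory store; illustrates the idea only."""

    def __init__(self):
        self._audio = {}                   # utterance_id -> audio bytes
        self._by_user = defaultdict(set)   # user_id -> set of utterance ids

    def add(self, user_id, utterance_id, audio):
        # The user -> utterance link is kept only so it can be deleted later.
        self._audio[utterance_id] = audio
        self._by_user[user_id].add(utterance_id)

    def delete_user(self, user_id):
        # Remove every utterance tied to this user; return how many were removed.
        ids = self._by_user.pop(user_id, set())
        for utterance_id in ids:
            self._audio.pop(utterance_id, None)
        return len(ids)

    def snapshot(self):
        # What a licensed consumer would pull on each periodic refresh.
        return dict(self._audio)
```

Without the `_by_user` index there would be no way to find a given person's recordings, which is exactly the cost of anonymizing everything up front. A consumer that re-pulls `snapshot()` on each refresh would stop seeing a user's audio after `delete_user` runs.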
On the natural language understanding side of the house, we don't plan to tie the data to the user in any way. We will also put in place reasonable safeguards to prevent data like credit card numbers, social security numbers and phone numbers from entering the public data set. As much as possible, data will be scrubbed of IP addresses and other personally identifiable information.
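For a sense of what that kind of safeguard could look like, here is a minimal sketch using regular expressions. The patterns, the placeholder tags and the `scrub` function are assumptions made up for illustration (US-style formats only); they are not the filters we run, and a real scrubber would need to be far more thorough.

```python
import re

# Hypothetical patterns, US-style formats only; illustrative, not exhaustive.
# Order matters: more specific patterns run before broader ones.
_PATTERNS = [
    ("[IP]", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("[SSN]", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    # 13 to 19 digits, optionally separated by spaces or hyphens.
    ("[CARD]", re.compile(r"\b(?:\d[ -]?){12,18}\d\b")),
    ("[PHONE]", re.compile(
        r"(?<!\d)(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}(?!\d)")),
]


def scrub(text):
    """Replace anything that looks like sensitive data with a placeholder tag."""
    for placeholder, pattern in _PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

An utterance such as "call me at 555-867-5309" would come out as "call me at [PHONE]", while ordinary requests like "set a timer for 10 minutes" pass through untouched.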
We don't want your data. We are not interested in making money off of your data. We are not interested in tracking you or your behavior online. We don't want to sell you paper towels. We are not in the advertising game and have zero interest in unloading billions of dollars' worth of overpriced computers with a fruit stenciled on them.
Our sole purpose is to build an AI that runs anywhere and interacts exactly like a person. Period.
To do that we need to build a data set, but we intend to do it while collecting as little information as possible.
Does that answer your question?