Language standard

Discussion of a standard for marking the language of a post so that frontends can filter appropriately.

I will start off the standards discussion for this category with my initial ideas for how we can mark up the languages of a post so that a frontend like steemit.com can filter out posts that we aren't interested in because they are in a different language.

This is what I posted in the #proposals channel of the Steem slack:

A post/comment may have a "language" field in its JSON whose value is an array of strings, each being the code for a language (a code cannot contain hyphens). The array includes all the languages present in the post. The first language code in the array is the predominant language of the post, and it determines how the post will be filtered.
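As a rough illustration of the field described above, here is a minimal Python sketch. The helper names (`build_language_field`, `predominant_language`) and the example codes "es" and "en" are my own assumptions for illustration. The proposal only fixes the field name, the array-of-strings shape, the no-hyphen rule, and the meaning of the first entry.

```python
import json


def build_language_field(codes):
    """Build the proposed {"language": [...]} structure.

    Hypothetical helper: the first code is treated as the predominant
    language, and codes containing a hyphen are rejected.
    """
    if not codes:
        raise ValueError("at least one language code is required")
    for code in codes:
        if not code or "-" in code:
            raise ValueError(f"invalid language code: {code!r}")
    return {"language": list(codes)}


def predominant_language(metadata):
    """Return the first language code in the field, or None if absent."""
    languages = metadata.get("language") or []
    return languages[0] if languages else None


# A post written mostly in Spanish that also contains some English.
metadata = build_language_field(["es", "en"])
print(json.dumps(metadata))            # {"language": ["es", "en"]}
print(predominant_language(metadata))  # es
```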

When submitting a post or comment, there would be a dropdown to select the predominant language of the post. Normally, the dropdown defaults to the default language set in the client (English, if no default has been set yet), but there are two (possibly three) cases where it would default to another language. First, if this is a comment, the dropdown defaults to the parent's predominant language. Second, if this is a top-level post in a category whose name is of the form "<language_code>-*", where <language_code> is a valid language code (which cannot contain hyphens), the dropdown defaults to that language. A possible third case is when the client is able to detect with a high degree of certainty that the post is in a different language, in which case the dropdown defaults to that language.

Ultimately, the poster chooses the predominant language of the post/comment by making the appropriate selection in the language dropdown before posting. If the selected language differs from the predominant language of the parent post/comment (if one exists), or from the language code indicated by the category (if the category name is prefixed by a valid language code), then the client should warn the user that they are posting in a different language than what is expected given the context of the post, and that it might therefore be downvoted.
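The default-selection and warning rules above could be sketched roughly as follows. All function and parameter names are hypothetical, and the set of valid codes is a stand-in. Letting a confident detection result override the other defaults is also my assumption, since the proposal does not say which case wins.

```python
def category_language(category, valid_codes):
    """Return the code prefixing a category like 'es-cooking', else None."""
    prefix, sep, _rest = category.partition("-")
    if sep and prefix in valid_codes:
        return prefix
    return None


def default_language(valid_codes, client_default=None, parent_language=None,
                     category=None, detected=None):
    """Pick the language the dropdown is preset to."""
    if detected is not None:
        # Possible third case; giving it priority is an assumption.
        return detected
    if parent_language is not None:
        # Comment: follow the parent's predominant language.
        return parent_language
    if category is not None:
        # Top-level post: follow a language-prefixed category name.
        from_category = category_language(category, valid_codes)
        if from_category is not None:
            return from_category
    # Client default, falling back to English if none has been set.
    return client_default or "en"


def should_warn(selected, valid_codes, parent_language=None, category=None):
    """True if the chosen language differs from what the context suggests."""
    if parent_language is not None and selected != parent_language:
        return True
    expected = category_language(category, valid_codes) if category else None
    return expected is not None and selected != expected


VALID = {"en", "es", "de"}  # illustrative subset of valid codes
print(default_language(VALID, category="es-cooking"))       # es
print(default_language(VALID, category="cooking"))          # en
print(should_warn("en", VALID, category="es-cooking"))      # True
```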

Then the client can simply have a whitelist of languages and filter out all top-level posts whose predominant language is not on that whitelist.
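A minimal sketch of that client-side filter, assuming each post is a dict carrying the proposed "language" array. The function name, the "title" key, and the sample posts are made up for illustration. Keeping posts that have no language set is my assumption, since the proposal only covers posts whose language is set.

```python
def filter_posts(posts, whitelist):
    """Keep top-level posts whose predominant language is whitelisted."""
    allowed = set(whitelist)
    kept = []
    for post in posts:
        languages = post.get("language") or []
        # Assumption: untagged posts are kept rather than hidden.
        if not languages or languages[0] in allowed:
            kept.append(post)
    return kept


posts = [
    {"title": "Hello", "language": ["en"]},
    {"title": "Hola", "language": ["es", "en"]},  # predominantly Spanish
    {"title": "Untagged"},
]
visible = filter_posts(posts, whitelist=["en"])
print([p["title"] for p in visible])  # ['Hello', 'Untagged']
```

Note that the second post is filtered out even though it lists "en", because only the first (predominant) code is checked.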

Hopefully we can use these initial ideas as a starting point for a discussion to iron out a formal standard that we can follow.