Contexto.me is a Brazilian game inspired by Semantle.com, in which the objective is to find a secret word based on the distance between the answer and the words you input. You may now be asking yourself: "distance between words? How does that work?" No worries! In this article, I'll explain everything about it.
This kind of game explores a concept from Natural Language Processing (NLP) called Word Embeddings – a way to represent words as vectors while keeping the words' context. There are many ways to do this. The simplest one is One-hot encoding, which uses the size of your vocabulary (a vocabulary is the set of all the distinct words in your text) as the dimension of the vector representation, creating a unique vector for each term (e.g., figure 1).
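To make the idea concrete, here is a minimal sketch of One-hot encoding over a toy vocabulary (the words and corpus here are made up for illustration):

```python
# One-hot encoding sketch: each word becomes a vector whose dimension equals
# the vocabulary size, with a single 1 at that word's index and 0 everywhere else.
vocabulary = sorted({"o", "carro", "azul", "e", "a", "batata"})  # toy vocabulary

def one_hot(word, vocab):
    vector = [0] * len(vocab)        # one dimension per vocabulary word
    vector[vocab.index(word)] = 1    # mark this word's position
    return vector

print(one_hot("carro", vocabulary))
```

Notice the core problem already: with a real vocabulary of hundreds of thousands of words, each vector would have hundreds of thousands of dimensions, almost all zeros.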
But the One-hot method has its problems: the vectors are huge (one dimension per vocabulary word) and extremely sparse, and they carry no semantics – every pair of distinct words is equally different, so "car" is as far from "truck" as it is from "potato".
To solve these issues, other methods of representing words as vectors were created, and I intend to go deeper into this subject in another article. Here, we will focus on the method that Contexto.me uses: Global Vectors (GloVe).
Global Vectors is a more complex method for creating vector representations of words. It consists of training an unsupervised model to learn the relations between the words in the corpus based on their co-occurrence. This way, the vector representation of each term carries more context and semantics, and a dimensionality reduction performed with Matrix Factorization reduces processing time.
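The statistics GloVe trains on are co-occurrence counts. A rough sketch of how such counts could be collected, using a symmetric context window over a toy corpus (the sentences, window size, and 1/distance weighting GloVe uses are shown here purely for illustration):

```python
from collections import defaultdict

# Toy corpus; a real GloVe model is trained on billions of tokens.
corpus = [
    "o carro azul anda na estrada",
    "a batata e o tomate crescem na terra",
]
window = 2  # hypothetical context-window size

# GloVe weights each co-occurrence by 1/distance between the two words.
cooccurrence = defaultdict(float)
for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                cooccurrence[(word, tokens[j])] += 1.0 / abs(i - j)

print(cooccurrence[("carro", "azul")])  # adjacent words: weight 1.0
```

GloVe then factorizes (the log of) this matrix to produce dense, low-dimensional vectors for every word.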
After training a GloVe model (or another word embedding model), or downloading a pre-trained one, we can deal with words in vector form. That means we can perform vector operations on them, such as calculating the cosine of the angle between two vectors (e.g., figure 2), which in this case gives the similarity between two words.
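A minimal sketch of that cosine calculation, using made-up toy vectors (real GloVe embeddings typically have 50 to 300 dimensions; these values are not from any trained model):

```python
import numpy as np

# Hypothetical 4-dimensional "embeddings", invented purely for illustration.
embeddings = {
    "carro":  np.array([0.9, 0.1, 0.3, 0.0]),
    "batata": np.array([0.1, 0.8, 0.2, 0.5]),
    "tomate": np.array([0.2, 0.7, 0.3, 0.6]),
}

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|); ranges from -1 (opposite) to 1 (identical direction)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["batata"], embeddings["tomate"]))  # high: related words
print(cosine_similarity(embeddings["batata"], embeddings["carro"]))   # low: unrelated words
```

The closer the cosine is to 1, the more similar the two words are in the embedding space.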
This similarity is what makes the game work: the main objective is to find the target word based on how similar your guesses are to it. The similarity is shown to you as a distance between the input word and the target word – the rank of your input when the whole vocabulary is sorted by similarity to the target word, with the target itself ranked first.
As you can see in figure 3, if you input the word "carro" (car) when the target word is "batata" (potato), you get a distance of 9922 (in this example). That tells you your input is far from the target word, and you should try a better one – smaller distances mean better guesses, meaning you are getting closer to the target word in terms of context. You keep guessing until you find the target word and win the game for the day – the target word is updated daily.
I hope that now you can play and explore the game in a different and better way, knowing how it works. Keep in mind that this article covers only a small part of the full potential of word embeddings within NLP; the objective here was to explain how the Contexto.me game works, which is pretty fun. It is also worth saying that this kind of method has great potential when combined with other machine learning models, solving problems such as sentiment analysis, entity recognition, and text classification.
If you enjoyed this read, feel free to check out a Google Colab notebook I made about this topic, which takes a more technical approach, working directly through code examples.