Researchers from Spain’s Polytechnic University of Madrid (UPM), together with colleagues from the Carlos III University of Madrid (UC3M) and the University of Valladolid (UVa), have developed an application called ChatWords to evaluate the lexical knowledge that artificial intelligence systems have of different languages. An initial study covering the more than 90,000 words in the Royal Spanish Academy’s dictionary shows that the ChatGPT-3.5-turbo model does not know approximately 20% of them. Of the remaining 80%, it gives the wrong meaning for up to 5%.
To put the results in context, a Spanish speaker recognizes 30,000 words on average, almost a third of the whole Spanish lexicon. That may seem like a poor score compared with the machine, but for artificial intelligence systems the lexicon is the basic building block, and when the meanings ChatGPT gives for the words are analyzed, there is a non-negligible percentage in which the sense it gives is wrong, says Javier Conde, assistant professor at the Higher Technical School of Telecommunications Engineers (ETSIT) of UPM and one of the researchers. “Maybe ChatGPT isn’t as clever as it looks,” he adds.
It is reasonable to presume that large language models (LLMs), artificial intelligence systems designed to process and understand natural language on a huge scale, will not use words they do not know. This raises another concern: text generated by these models may draw on a narrower vocabulary. Pedro Reviriego, a professor at ETSIT who also took part in the research, points out that it is essential to guarantee lexical richness in text created by artificial intelligence.
The ChatWords app is open source and is designed to be easy to use and extend. The researchers’ next step is to evaluate other languages and LLMs to better understand the lexical knowledge that artificial intelligence tools have and how it may evolve as new versions and tools appear. Their work is part of the project Networks of the Future for Data Processing and Operator Centers, funded by the State Research Agency, and is supported by OpenAI, the US laboratory responsible for ChatGPT, through its researcher access program.
Martínez, G., Conde, J., Reviriego, P., Merino-Gómez, M., Hernández, J. A., and Lombardi, F. “How Many Words Does ChatGPT Know? The answer is ChatWords.” arXiv:2309.16777