That title is a bit misleading. The reality is that any language can lead to this problem. Full disclosure, I am French Canadian. Born in Quebec but raised in America. So what am I talking about here?
When an API streams data, a character set (ASCII, for example) has to be declared. Many of these charsets cover only a subset of all possible characters. The receiver uses the declared charset to decode the payload. Most of the time nobody notices, because the characters used across Western countries largely overlap. The problem arises when an unexpected character shows up, as happened recently in an API we were working with.
To be specific, what occurs when “François” (the name of one of my uncles) is the data but the declared charset is unable to decode that unexpected character (ç)? Well, the ç can be replaced with a variety of weird characters, depending on the system. It can even get translated to “Fran?ois,” and when that happens in a link… well, you get the picture. That seemingly insignificant issue can, and has, taken down entire systems.
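To make that concrete, here is a minimal Python sketch of how the same name can come out differently depending on which charset each side assumes. It assumes the sender encodes the text as UTF-8, which is my illustration and not necessarily what the API in question did, and the variable names are made up for the example.

```python
name = "François"

# Assumption: the sender encodes as UTF-8, so 'ç' becomes two bytes (0xC3 0xA7).
payload = name.encode("utf-8")

# A receiver that decodes as ASCII and substitutes anything it cannot read
# gets one replacement character (U+FFFD) per undecodable byte.
as_ascii = payload.decode("ascii", errors="replace")

# A receiver that decodes as Latin-1 never fails, because every byte maps to
# some character, but the two bytes of 'ç' turn into two unrelated characters.
as_latin1 = payload.decode("latin-1")

# A sender forced to encode as ASCII with substitution produces the
# question-mark version of the name.
question_mark = name.encode("ascii", errors="replace").decode("ascii")

# A strict ASCII decode refuses the payload outright.
try:
    payload.decode("ascii")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Only a receiver using the same charset as the sender recovers the name.
round_trip = payload.decode("utf-8")

print(as_ascii, as_latin1, question_mark, strict_failed, round_trip)
```

The strict decode is the loud failure, and it is arguably the kinder one. The substituting and Latin-1 paths fail silently, which is how a mangled name ends up in a link without anyone noticing.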
The moral of the story is that APIs are not magic. They are susceptible to numerous pitfalls that can cost a company users, revenue, and quality. What makes this particularly insidious is that if you do not hear about it, you assume it is not affecting you. Well, how can you know without creating proper payload monitors? Etsy did not realize they were losing money. This can be summarized with a quote from one of my favorite movies.
“The greatest trick the devil ever pulled was convincing the world he didn’t exist.”
— Usual Suspects (1995) by way of “The Generous Gambler” by Charles Baudelaire (1864)