Japanese character transcoding: stop using Shift-JIS for the source

So, the other day I got an encoding error. The data was in what I thought was Shift-JIS, but it could not be converted to UTF-8 because U+0087 does not exist in Shift-JIS.

It is used multiple times in titles, like so:

https://www.htmlsymbols.xyz/unicode/U+0087

Solution:

On Windows, apparently, instead of strict Shift-JIS, they use CP932, which is a superset of Shift-JIS (meaning it's Shift-JIS with additional characters).
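To make the difference concrete, here is a minimal sketch that decodes the same two bytes with both encodings. The bytes 0x87 0x40 and the helper name `to_utf8` are my own choices for illustration, not taken from the original data: in CP932 that pair should map to the circled digit one (U+2460), while strict Shift-JIS has no character assigned there.

```ruby
# Assumption for illustration: 0x87 0x40 is a CP932-only byte pair
# (circled digit one, U+2460). It is not from the original data.
bytes = "\x87\x40".b

# Hypothetical helper: interpret raw bytes as source_encoding and convert
# them to UTF-8. Returns the error class instead of raising on failure.
def to_utf8(bytes, source_encoding)
  bytes.dup.force_encoding(source_encoding).encode("UTF-8")
rescue EncodingError => e
  e.class
end

strict = to_utf8(bytes, "Shift_JIS") # expected to fail with an EncodingError subclass
cp932  = to_utf8(bytes, "CP932")     # expected to succeed

puts "Shift_JIS: #{strict.inspect}"
puts "CP932:     #{cp932.inspect}"
```

If your own data fails on different bytes, swapping them into `bytes` should show quickly whether CP932 covers them.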

Please use CP932 encoding when dealing with Japanese characters.

For the Rubyists out there, it will be:

require "csv"
CSV.read("csvfile.csv", encoding: "CP932:UTF-8")
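In the `encoding:` option, the part before the colon is the encoding of the file on disk and the part after it is the encoding you want your strings in. Here is a small round-trip sketch to check that; the sample rows and the temp file are made up for the example, and the file is written as CP932 on purpose to imitate a CSV exported on Windows.

```ruby
require "csv"
require "tempfile"

# Hypothetical sample data: a header row plus one row whose title is the
# circled digit one (U+2460), a character CP932 has and strict Shift-JIS lacks.
sample = "title,count\n\u2460,1\n"

rows = Tempfile.create(["sample", ".csv"]) do |file|
  file.binmode                      # write the bytes as-is, no transcoding
  file.write(sample.encode("CP932"))
  file.flush

  # External encoding CP932 (on disk), internal encoding UTF-8 (in memory).
  CSV.read(file.path, encoding: "CP932:UTF-8")
end

puts rows.inspect
puts rows[1][0].encoding
```

The same `"CP932:UTF-8"` string should also work with `CSV.foreach` if the file is too large to read in one go.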

Written by

Thomas Brennetot

10+ years of experience, mostly with Ruby on Rails and JavaScript frontend, plus some Go. My first language was C, for 3 years at university.