Problem with special chars(ñ) in Python of SQL Questions

Re: Problem with special chars(ñ) in Python of SQL Questions

by carl hyde -
Number of replies: 0

On Windows, many editors assume the default ANSI encoding (CP1252 on US Windows) instead of UTF-8 if there is no byte order mark (BOM) character at the start of the file. Files store bytes, which means all unicode have to be encoded into bytes before they can be stored in a file. read_csv takes an encoding option to deal with files in different formats. So, you have to specify an encoding, such as utf-8.

 df.to_csv('D:\panda.csv',sep='\t',encoding='utf-8')

If you don't specify an encoding, then the encoding used by df.tocsv defaults to ascii in Python2, or utf-8 in Python3.

Also, you can encode a problematic series first then decode it back to utf-8.

df['column-name'] = df['column-name'].map(lambda x: x.encode('unicode-escape').decode('utf-8'))

This will also rectify the problem.