Question Authors' Forum

Problem with special chars(ñ) in Python of SQL Questions

by Sergi García -
Number of replies: 7

Hi mates! 

I have a problem with special characters in SQL questions, with values like PAIS="España".

SQLite works fine, but the Python code of the questions tries to encode the text as 7-bit ASCII, and that doesn't work.

I made several changes to the prototype. I changed the file commands so that reading and writing use UTF-8, and that works fine, but subprocess.check_output has problems with it (I think it can't accept UTF-8).

Does anybody know a solution? 
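In case it helps anyone who lands here: a minimal Python 3 sketch (untested inside CodeRunner's sandbox) of pushing UTF-8 through subprocess.check_output is to set PYTHONIOENCODING for the child process and decode the returned bytes yourself, instead of relying on the locale:

```python
import os
import subprocess
import sys

# Force the child Python's stdout to UTF-8, regardless of the
# locale it would otherwise inherit (which may be ASCII/C).
env = {**os.environ, "PYTHONIOENCODING": "utf-8"}

# check_output returns raw bytes; decode them explicitly as UTF-8.
raw = subprocess.check_output(
    [sys.executable, "-c", "print('Espa\\u00f1a')"],
    env=env,
)
print(raw.decode("utf-8"))  # → España
```

This only fixes the caller's side, of course; if the sandbox itself runs in the C locale, the child program may still fail before producing any output.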

In reply to Sergi García

Re: Problem with special chars(ñ) in Python of SQL Questions

by Richard Lobb -

Unfortunately there's no solution at present. CodeRunner itself is all UTF-8 compatible but Jobe uses the Runguard sandbox from the DomJudge programming contest server. That enforces 8-bit ASCII.

However, I have been meaning to look into this, so thanks for the reminder. I'll put it on the TODO list for this summer (southern hemisphere). 

Richard

In reply to Richard Lobb

Re: Problem with special chars(ñ) in Python of SQL Questions

by Sergi García -
Hi Richard, in the end I adapted it to use only "English chars".

It would be a good feature for the future :)

Regards, Sergi.
In reply to Sergi García

Re: Problem with special chars(ñ) in Python of SQL Questions

by Martin Zwerschke -

I have encountered similar problems in a question for C# using the template suggested by Richard.

In the question author's answer I used the German characters "ä" and "ü", which are quite common in German.

The template script gives an error although the C#-program works with these chars.

The question author can avoid ä, ü, Ä, Ü and ß if they know about the issue, but students would still use them in their answers.

They would then get a "not correct" grade without knowing the reason.
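Until the sandbox handles these characters, one workaround is a template-side check (a hypothetical sketch, not part of CodeRunner itself) that scans the student's answer for characters outside printable ASCII and reports them explicitly, so students at least learn why their answer failed:

```python
# Hypothetical pre-check in a grading template: find characters the
# sandbox cannot handle and name them, instead of a bare "not correct".
answer = 'Console.WriteLine("Müller");'  # example student submission

bad = [(i, c) for i, c in enumerate(answer) if ord(c) > 0x7E]
for pos, ch in bad:
    print(f"Unsupported character {ch!r} (U+{ord(ch):04X}) at position {pos}")
```

The message points the student at the exact character to replace (here the "ü").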

In reply to Martin Zwerschke

Re: Problem with special chars(ñ) in Python of SQL Questions

by Martin Zwerschke -

I have to add that it seems to be even worse:

UnicodeEncodeError: 'ascii' codec can't encode character '\xb2' in position 349: ordinal not in range(128)

So we must not even use 8-bit ASCII, but only 7-bit ASCII.

The error above appeared when I used x² in a question about a solver program for a quadratic equation.
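The failure is easy to reproduce in plain Python: the ascii codec rejects every code point above U+007F, while an explicit UTF-8 encode of the same text succeeds.

```python
text = "x²"  # contains U+00B2, SUPERSCRIPT TWO

try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    # Same class of error as in the template run above.
    print(exc)

# Encoding explicitly as UTF-8 works fine:
print(text.encode("utf-8"))  # → b'x\xc2\xb2'
```

So the error is not in the C# program at all; it happens wherever the template pipeline encodes text with the default ascii codec.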

In reply to Martin Zwerschke

Re: Problem with special chars(ñ) in Python of SQL Questions

by Richard Lobb -

Jobe's character set limitations are documented in the Jobe readme. Specifically:

Programs may write binary output but the results are returned to the caller JSON-encoded, which requires UTF-8 strings. To avoid crashing the json-encoder, the standard output and standard error output from the program are taken as 8-bit character streams; characters below '\x20' (the space character) and above '\x7E' are replaced by C-style hexadecimal encodings (e.g. '\x8E') except for newlines which are passed through directly, and tabs and returns which are replaced with '\t' and '\r' respectively. Also, the Runguard sandbox currently runs programs in the default C locale. As a consequence of these two constraints, programs that generate utf-8 output cannot currently be run on Jobe. It is hoped to improve on this in the future.
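As an illustration only, the substitution described there can be sketched in a few lines of Python (my own reading of the readme, not Jobe's actual code); it shows why UTF-8 output comes back as hex escapes:

```python
def sanitise(byte_stream: bytes) -> str:
    # Mimic the filtering the readme describes: pass printable ASCII
    # and newlines through, map tab/return to \t and \r, and replace
    # everything else with a C-style hex encoding.
    out = []
    for b in byte_stream:
        ch = chr(b)
        if ch == "\n":
            out.append("\n")
        elif ch == "\t":
            out.append("\\t")
        elif ch == "\r":
            out.append("\\r")
        elif "\x20" <= ch <= "\x7e":
            out.append(ch)
        else:
            out.append(f"\\x{b:02X}")
    return "".join(out)

# The two UTF-8 bytes of 'ñ' (0xC3 0xB1) each get hex-escaped:
print(sanitise("España\n".encode("utf-8")))  # → Espa\xC3\xB1a
```

That is exactly the Espa\xC3\xB1a-style mangling people were seeing in their test output.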

Trying to improve on this is on my todo list, but I'm on holiday at present and nothing is likely to happen before about February next year (if I can find a fix).

Of course, if anyone else can find a fix that works for all the different languages while I'm away, I'll be happy to receive a pull request :-)

Richard

In reply to Richard Lobb

Re: Problem with special chars(ñ) in Python of SQL Questions

by Richard Lobb -

Versions of Jobe since 14 Jan 2018 should handle UTF-8 code submissions correctly, provided Jobe is configured correctly. See the Jobe installation instructions.

Richard

In reply to Sergi García

Re: Problem with special chars(ñ) in Python of SQL Questions

by carl hyde -

On Windows, many editors assume the default ANSI encoding (CP1252 on US Windows) instead of UTF-8 if there is no byte order mark (BOM) at the start of the file. Files store bytes, which means all Unicode text has to be encoded into bytes before it can be stored in a file. pandas' read_csv and to_csv take an encoding option to deal with files in different encodings, so you have to specify an encoding explicitly, such as utf-8.

df.to_csv(r'D:\panda.csv', sep='\t', encoding='utf-8')

If you don't specify an encoding, then the encoding used by df.to_csv defaults to ascii in Python 2 and utf-8 in Python 3.

Also, you can encode a problematic series first then decode it back to utf-8.

df['column-name'] = df['column-name'].map(lambda x: x.encode('unicode-escape').decode('utf-8'))

This will also rectify the problem.
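The escape/decode trick in that last line can be seen in miniature with a plain string: unicode-escape turns the text into a pure-ASCII representation, and the reverse mapping restores the original (a sketch of the idea, independent of pandas):

```python
s = "España"

# Escape to a pure-ASCII representation (safe for ASCII-only pipelines)...
escaped = s.encode("unicode-escape").decode("ascii")
print(escaped)  # → Espa\xf1a

# ...and map it back to the original text when needed.
restored = escaped.encode("ascii").decode("unicode-escape")
print(restored == s)  # → True
```

Note this changes what is stored: the file then contains the literal characters \xf1, not "ñ", so it only helps if whatever reads the data undoes the escaping.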