Coderunner on Docker fails to perform testsubmit.py

by Fabrizio Fioravanti -
Number of replies: 11

I have installed the Docker version of CodeRunner on RHEL8 (actually running under podman), but when I execute testsubmit.py I get several errors. Do you have any idea how to correct them?

[root@coderunner-rh8 system]# podman exec -t jobe /usr/bin/python3 /var/www/html/jobe/testsubmit.py c

Supported languages:

    c: 7.5.0

    cpp: 7.5.0

    java: 11.0.6

    nodejs: 8.10.0

    octave: 4.2.2

    pascal: 3.0.4

    php: 7.2.24

    python3: 3.6.9


Test good C hello world OK

Test compile error C hello world OK

Test use of compileargs with C OK

Test runtime error C hello world OK

Test timelimit on C OK

Test outputlimit on C OK

Memory limit exceeded in C (seg faults) OK

Infinite recursion (stack error) on C OK


***************** FAILED TEST ******************


{'run_id': None, 'outcome': 15, 'cmpinfo': '', 'stdout': '3 forks succeeded, 997 failed\n', 'stderr': ''}

C program controlled forking

Jobe result: Successful run


Output:

3 forks succeeded, 997 failed



************************************************


A C program with ASCII non-UTF-8-compatible output OK

Test compile error C++ hello world OK


11 tests, 10 passed, 1 failed, 0 exceptions


Checking parallel submissions

Doing child 0

Doing child 1

Doing child 2

Doing child 3

Doing child 4

Doing child 5

Doing child 6

Doing child 7

Doing child 8

Doing child 9


***************** FAILED TEST ******************


{'run_id': None, 'outcome': 11, 'cmpinfo': "/var/www/html/jobe/application/libraries/../../runguard/runguard: cannot start `sh': Resource temporarily unavailable\nTry `/var/www/html/jobe/application/libraries/../../runguard/runguard --help' for more information.\n", 'stdout': '', 'stderr': ''}

C program to check parallel submissions

Jobe result: Compile error


Compiler output:

/var/www/html/jobe/application/libraries/../../runguard/runguard: cannot start `sh': Resource temporarily unavailable

Try `/var/www/html/jobe/application/libraries/../../runguard/runguard --help' for more information.




************************************************


C program to check parallel submissions OK

C program to check parallel submissions OK

C program to check parallel submissions OK

C program to check parallel submissions OK

C program to check parallel submissions OK

C program to check parallel submissions OK

C program to check parallel submissions OK

C program to check parallel submissions OK


***************** FAILED TEST ******************


{'run_id': None, 'outcome': 11, 'cmpinfo': '/var/www/html/jobe/application/libraries/../../runguard/runguard: warning: timelimit exceeded (wall time): aborting command\n/var/www/html/jobe/application/libraries/../../runguard/runguard: warning: command terminated with signal 15\n', 'stdout': '', 'stderr': ''}

C program to check parallel submissions

Jobe result: Compile error


Compiler output:

/var/www/html/jobe/application/libraries/../../runguard/runguard: warning: timelimit exceeded (wall time): aborting command

/var/www/html/jobe/application/libraries/../../runguard/runguard: warning: command terminated with signal 15




************************************************


All done


Testing a submission with an excessive cputime parameter

OK

Error: non zero exit code: 1: OCI runtime error



In reply to Fabrizio Fioravanti

Re: Coderunner on Docker fails to perform testsubmit.py

by Richard Lobb -

I don't have an RHEL system to test with and I've never used podman. So all I can offer is the observation that it seems like your container is limiting the number of forks or the number of processes that can be created inside the container in a different way from docker.

In the fork-bomb test ("C program controlled forking"), the program tries to fork 1000 times. The parameter numprocs = 10 is passed to the runguard sandbox, so it is expected that 9 forks will succeed and 991 will fail. On your system only 3 succeeded and 997 failed. That implies that the container is itself limiting the number of processes to 4.
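As a rough illustration of that arithmetic, here is a minimal Python sketch. The function name is made up for this example and is not part of Jobe or testsubmit.py; it only assumes the output line has the form shown in the test report above, and that the forking program itself counts as one process.

```python
import re


def implied_process_limit(fork_report: str) -> int:
    """Return the process limit implied by a fork-test output line.

    The forking program itself occupies one process slot, so the
    effective limit is the number of successful forks plus one.
    """
    match = re.search(r"(\d+) forks succeeded, (\d+) failed", fork_report)
    if match is None:
        raise ValueError(f"unrecognised fork report: {fork_report!r}")
    succeeded = int(match.group(1))
    return succeeded + 1


# Expected result with numprocs = 10, then the result reported in this thread.
print(implied_process_limit("9 forks succeeded, 991 failed"))
print(implied_process_limit("3 forks succeeded, 997 failed"))
```

The first call gives the limit matching numprocs = 10; the second gives the limit of 4 inferred above.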

In a similar way, the parallel submissions test failed because the test tries to fork off 10 different processes, all of which immediately throw a job at Jobe. But your test failed during the forking with the message "cannot start `sh': Resource temporarily unavailable" - this is probably the same problem. I don't fully understand the rest of the errors but my guess is that they all have the same root cause.

You probably have a working Jobe server, but with a greatly reduced load-handling capability compared to the usual docker jobeinabox.

In reply to Richard Lobb

Re: Coderunner on Docker fails to perform testsubmit.py

by Fabrizio Fioravanti -

Thanks for your reply. I'll investigate podman along those lines (it is the Red Hat supported alternative to Docker) and report any useful findings in this thread, in the hope that other users can benefit from them.


In reply to Richard Lobb

Re: Coderunner on Docker fails to perform testsubmit.py

by Fabrizio Fioravanti -

I have inspected the container configuration and I have huge limits for files and processes.

            "PidsLimit": 4096,
            "Ulimits": [
                {
                    "Name": "RLIMIT_NOFILE",
                    "Soft": 1048576,
                    "Hard": 1048576
                },
                {
                    "Name": "RLIMIT_NPROC",
                    "Soft": 1048576,
                    "Hard": 1048576
                }
            ],


In reply to Fabrizio Fioravanti

Re: Coderunner on Docker fails to perform testsubmit.py

by Richard Lobb -

I installed podman on Ubuntu 20.04 and ran up both rootful and rootless jobeinabox containers. Both ran testsubmit.py with no errors.

Are you sure you are not yourself restricted in how many processes you can run under RHEL8? Podman containers run as children of your own user process so are subject to its ulimits, whereas docker containers run in a daemon process.
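One way to check this, sketched in Python using only the standard resource module: the script below prints the soft and hard limits on the number of processes for the user running it, which should correspond to what `ulimit -u` reports in bash. The helper names are just for this example. Running it both on the host (as the user that starts podman) and inside the container would show whether the two differ.

```python
import resource
from typing import Tuple


def format_limit(value: int) -> str:
    """Render an rlimit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def nproc_limits() -> Tuple[str, str]:
    """Return the (soft, hard) RLIMIT_NPROC values for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return format_limit(soft), format_limit(hard)


soft, hard = nproc_limits()
print(f"max user processes: soft={soft} hard={hard}")
```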

In reply to Richard Lobb

Re: Coderunner on Docker fails to perform testsubmit.py

by John Paul Posada -
Hi all,

Any updates on this? I'm seeing similar errors running the jobeinabox Docker container on RHEL7.

Supported languages:

    c: 9.4.0

    cpp: 9.4.0

    java: 16.0.1

    nodejs: 10.19.0

    octave: 5.2.0

    pascal: 3.0.4

    php: 7.4.3

    python3: 3.8.10

 

Valid Python3 OK

Python3 with stdin OK

Syntactically invalid Python3 OK

Python3 runtime error OK

Python3 file I/O OK

Testing use of interpreter args with Python3 OK

Testing use of runargs args with Python3 OK

Python3 program with customised timeout OK

Python3 program with support files OK

Valid Python3/pylint program OK

Invalid Python3/pylint program OK

UTF-8 output from Python3 (will fail unless Jobe set up for UTF-8) OK

Test good C hello world OK

Test compile error C hello world OK

Test use of compileargs with C OK

Test runtime error C hello world OK

Test timelimit on C OK

Test outputlimit on C OK

Memory limit exceeded in C (seg faults) OK

Infinite recursion (stack error) on C OK

C program controlled forking OK

A C program with ASCII non-UTF-8-compatible output OK

Valid Octave OK

octave with stdin OK

Syntactically invalid Octave (treated as runtime error) OK

Syntactically valid Nodejs hello world OK

Syntactically invalid Nodejs OK

Correct Php program  OK

Syntactically incorrect Php program  OK

Syntactically incorrect Php program  OK

Correct Java program  OK

Correct Java program without supplied sourcefilename  OK

Syntactically incorrect Java program  OK

Java program with a support class (.java) OK

Java program with Unicode output (will fail unless Jobe set up for UTF-8)  OK

Test good C++ hello world OK

Test compile error C++ hello world OK

Good Hello world Pascal test OK

Fail Hello world Pascal test OK

 

39 tests, 39 passed, 0 failed, 0 exceptions

 

Checking parallel submissions

Doing child 0

Doing child 1

Doing child 2

Doing child 3

Doing child 4

Doing child 5

Doing child 6

Doing child 7

Doing child 8

Doing child 9

 

***************** FAILED TEST ******************

 

{'run_id': None, 'outcome': 11, 'cmpinfo': "/var/www/html/jobe/application/libraries/../../runguard/runguard: cannot start `sh': Resource temporarily unavailable\nTry `/var/www/html/jobe/application/libraries/../../runguard/runguard --help' for more information.\n", 'stdout': '', 'stderr': ''}

C program to check parallel submissions

Jobe result: Compile error

 

Compiler output:

/var/www/html/jobe/application/libraries/../../runguard/runguard: cannot start `sh': Resource temporarily unavailable

Try `/var/www/html/jobe/application/libraries/../../runguard/runguard --help' for more information.

 

************************************************

 

C program to check parallel submissions OK

C program to check parallel submissions OK

C program to check parallel submissions OK

C program to check parallel submissions OK

C program to check parallel submissions OK

C program to check parallel submissions OK

C program to check parallel submissions OK

C program to check parallel submissions OK

C program to check parallel submissions OK

All done

 

Testing a submission with an excessive cputime parameter

OK

Any help would be great, thanks.


In reply to John Paul Posada

Re: Coderunner on Docker fails to perform testsubmit.py

by Richard Lobb -
I can't replicate this, so I'm not sure what to suggest. Can you confirm, please, that you're using the standard ready-built JobeInABox from Docker Hub?

It still looks like a process limit is being reached here. I see your controlled forking is now working, so you must have been able to fork 10 times. However, the parallel submissions check will require at least 20 forks (one for each shell, which then forks again to run the compile) and possibly several times that.

Can you confirm, please, that if you type the command ulimit on the machine on which you run docker, you get the response unlimited? Also please confirm you get the same response if you run the ulimit command inside the container.

How many cores/CPUs does the machine you're running on have? [If you exec bash in your container, run top, and hit the 1 key, it will show you the number of CPUs available.] I always run with 8 cores and it's realistic to attempt to run 10 parallel tasks on such a machine. The HTTP requests to Jobe are fielded by Apache, which sets up a number of worker threads depending on the number of cores. So, if you're trying to run 10 tasks on a machine with 2 cores, say, I would expect problems, though not that particular error message.


In reply to Richard Lobb

Re: Coderunner on Docker fails to perform testsubmit.py

by John Paul Posada -
Hi Richard,

Thanks for the quick reply. The ulimit command both on the machine and the container returned the expected unlimited response.

As for CPU Cores this server only has 2 unfortunately. That could explain why a course with 170 students was having issues with sporadic error messages a couple of days ago. The hosting of JOBE here was only meant to be for a proof of concept a few years ago.

In your opinion how many users do you think I could have hitting the server answering CodeRunner questions from another server hosting Moodle at any one time with these specs:

lscpu

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          2
On-line CPU(s) list:             0,1
Thread(s) per core:              2
Core(s) per socket:              1
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           79
Model name:                      Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:                        1
CPU MHz:                         2300.092
BogoMIPS:                        4600.18
Hypervisor vendor:               Xen
Virtualization type:             full
L1d cache:                       32 KiB
L1i cache:                       32 KiB
L2 cache:                        256 KiB
L3 cache:                        45 MiB
NUMA node0 CPU(s):               0,1
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; Load fences, __user pointer sanitization
Vulnerability Spectre v2:        Vulnerable: Retpoline without IBPB
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtr
                                 r pge mca cmov pat pse36 clflush mmx fxsr sse s
                                 se2 ht syscall nx pdpe1gb rdtscp lm constant_ts
                                 c rep_good nopl xtopology eagerfpu pni pclmulqd
                                 q ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movb
                                 e popcnt tsc_deadline_timer aes xsave avx f16c 
                                 rdrand hypervisor lahf_lm abm fsgsbase bmi1 avx
                                 2 smep bmi2 erms invpcid xsaveopt



In reply to John Paul Posada

Re: Coderunner on Docker fails to perform testsubmit.py

by Richard Lobb -
Ah, with only 2 cores the situation becomes a bit clearer. I'm not seeing a memory size in the above specs - are you perhaps light on memory as well? My guess at what has been happening in your testsubmit runs is that 10 simultaneous jobs (each with a 2 second wait inserted to ensure they don't finish quickly) have overloaded your 2-core server. I've just read that the "resource temporarily unavailable" error can be issued when a fork fails due to lack of memory as well as due to hitting the 'numprocs' limit.

The first thing you should do is edit <jobehome>/application/config/config.php. Change the value of 'jobe_max_users' from its current value (the default is 10) to 2, i.e. edit the line to

        $config['jobe_max_users'] = 2;

Re-run the installer (<jobehome>/install).
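
To double-check that the edit took effect, something like the following Python sketch could be used. It is only an illustration: the function name is hypothetical, and it assumes the setting appears in config.php as a plain assignment of the form shown above.

```python
import re
from typing import Optional


def read_jobe_max_users(config_text: str) -> Optional[int]:
    """Extract the jobe_max_users value from the text of a config.php file.

    Returns None if no uncommented assignment of the expected form is found.
    """
    pattern = r"^\s*\$config\['jobe_max_users'\]\s*=\s*(\d+)"
    match = re.search(pattern, config_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


# A stand-in for the file contents; in practice, read the real config.php.
sample = "<?php\n$config['jobe_max_users'] = 2;\n"
print(read_jobe_max_users(sample))
```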

Now rerun the testsubmit program. What happens? 

I've just changed the Jobe code to set the default value of the 'jobe_max_users' config parameter to the number of CPUs rather than the current default of 10. The change will go out on the next push.

You say your course of 170 students had sporadic error messages. What were the error messages? If a free Jobe user can't be allocated to run the job within 10 seconds, you should get a quite explicit message that the Jobe server has overloaded. However, if the server overloads prior to the jobe_max_users limit being reached - which seems to be what is happening with your testsubmit runs - you'll probably get more inexplicable errors.

You ask how many students a 2-core Jobe server can handle. I really can't answer that question as it depends on how much memory and CPU time a typical student run takes and the rate at which jobs are being submitted to the server. You can do a back-of-the-envelope calculation like the following (after measuring how long a typical student job takes):

total_cpu_time_required = num_students * time_per_run * num_coderunner_questions_in_quiz * num_tries_per_question

Compare that with:

total_cpu_time_available = quiz_duration * num_cores

The latter should be many times larger than the former in order to allow for the bursty nature of student submissions (usually the start of a timed quiz is the busiest) and for the Apache and communication overheads.
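
The two formulas above can be sketched in Python as follows. Only the 170 students and the 2 cores come from this thread; the per-run time, question count, tries per question and quiz duration are made-up placeholder values that you would replace with your own measurements.

```python
def cpu_headroom(num_students: int,
                 time_per_run_s: float,
                 num_questions: int,
                 tries_per_question: int,
                 quiz_duration_s: float,
                 num_cores: int) -> float:
    """Return available CPU time divided by required CPU time.

    A value well above 1 is needed to absorb bursts of submissions
    and the Apache and communication overheads.
    """
    required = (num_students * time_per_run_s
                * num_questions * tries_per_question)
    available = quiz_duration_s * num_cores
    return available / required


# Placeholder inputs: 0.12 s per run, 10 questions, 5 tries each, 1-hour quiz.
ratio = cpu_headroom(num_students=170, time_per_run_s=0.12,
                     num_questions=10, tries_per_question=5,
                     quiz_duration_s=3600, num_cores=2)
print(f"available / required CPU time: {ratio:.1f}")
```

This ratio is an average over the whole quiz, so it says nothing about the peak load at the start.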

But that's still a very crude calculation and can be complicated by things like excess regradings at the end of a timed quiz (if all students finish together). You really need to measure what's happening. I usually run top on our Jobe servers during a test or exam: the instantaneous load factor should stay below about half the number of CPUs for comfort.
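
If you'd rather log this than watch top, a minimal Python sketch of the same rule of thumb is below. It assumes the 1-minute load average is an acceptable stand-in for the load figure that top shows, and the function name is just for illustration. Note that inside a container os.cpu_count() may report the host's CPU count rather than any quota applied to the container.

```python
import os


def load_is_comfortable(load_1min: float, num_cpus: int) -> bool:
    """True if the load is below half the number of CPUs."""
    return load_1min < num_cpus / 2


load_1min, _, _ = os.getloadavg()   # 1, 5 and 15 minute load averages
cpus = os.cpu_count() or 1          # cpu_count() can return None
status = "OK" if load_is_comfortable(load_1min, cpus) else "busy"
print(f"load {load_1min:.2f} on {cpus} CPUs: {status}")
```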

I'd recommend you upgrade to an 8-core Jobe server as soon as possible, regardless of the above calculations.




In reply to Richard Lobb

Re: Coderunner on Docker fails to perform testsubmit.py

by John Paul Posada -

Hi again Richard.

First, thanks so much for your support.
Since my last post I've upgraded the server. It now has 8 cores and 32GB of RAM:

top - 16:49:20 up 1:18, 0 users, load average: 0.14, 0.36, 0.37

Tasks: 13 total, 1 running, 12 sleeping, 0 stopped, 0 zombie

%Cpu0 : 8.1 us, 3.0 sy, 0.0 ni, 88.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st

%Cpu1 : 0.3 us, 1.0 sy, 0.0 ni, 98.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st

%Cpu2 : 12.8 us, 2.0 sy, 0.0 ni, 85.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st

%Cpu3 : 0.7 us, 0.7 sy, 0.0 ni, 98.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st

%Cpu4 : 8.0 us, 2.7 sy, 0.0 ni, 89.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st

%Cpu5 : 1.7 us, 0.7 sy, 0.0 ni, 97.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st

%Cpu6 : 9.4 us, 2.0 sy, 0.0 ni, 88.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st

%Cpu7 : 9.3 us, 1.0 sy, 0.0 ni, 89.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.3 st

MiB Mem : 32011.0 total, 28353.7 free, 1200.6 used, 2456.7 buff/cache

MiB Swap: 8064.0 total, 8064.0 free, 0.0 used. 30162.0 avail Mem

 

lscpu

Architecture: x86_64

CPU op-mode(s): 32-bit, 64-bit

Byte Order: Little Endian

Address sizes: 46 bits physical, 48 bits virtual

CPU(s): 8

On-line CPU(s) list: 0-7

Thread(s) per core: 2

Core(s) per socket: 4

Socket(s): 1

NUMA node(s): 1

Vendor ID: GenuineIntel

CPU family: 6

Model: 79

Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz

Stepping: 1

CPU MHz: 2300.004

BogoMIPS: 4600.12

Hypervisor vendor: Xen

Virtualization type: full

L1d cache: 128 KiB

L1i cache: 128 KiB

L3 cache: 45 MiB

NUMA node0 CPU(s): 0-7

I edited the max users setting to 8 now, but noticed it had been originally set to 10.

 $config['jobe_max_users'] = 8;

I restarted jobe in docker and ran the testsubmit.py script, but am seeing the same errors:


Supported languages:
c: 9.4.0
cpp: 9.4.0
java: 16.0.1
nodejs: 10.19.0
octave: 5.2.0
pascal: 3.0.4
php: 7.4.3
python3: 3.8.10



Valid Python3 OK
Python3 with stdin OK
Syntactically invalid Python3 OK
Python3 runtime error OK
Python3 file I/O OK
Testing use of interpreter args with Python3 OK
Testing use of runargs args with Python3 OK
Python3 program with customised timeout OK
Python3 program with support files OK
Valid Python3/pylint program OK
Invalid Python3/pylint program OK
UTF-8 output from Python3 (will fail unless Jobe set up for UTF-8) OK
Test good C hello world OK
Test compile error C hello world OK
Test use of compileargs with C OK
Test runtime error C hello world OK
Test timelimit on C OK
Test outputlimit on C OK
Memory limit exceeded in C (seg faults) OK
Infinite recursion (stack error) on C OK
C program controlled forking OK
A C program with ASCII non-UTF-8-compatible output OK
Valid Octave OK
octave with stdin OK
Syntactically invalid Octave (treated as runtime error) OK
Syntactically valid Nodejs hello world OK
Syntactically invalid Nodejs OK
Correct Php program OK
Syntactically incorrect Php program OK
Syntactically incorrect Php program OK
Correct Java program OK
Correct Java program without supplied sourcefilename OK
Syntactically incorrect Java program OK
Java program with a support class (.java) OK
Java program with Unicode output (will fail unless Jobe set up for UTF-8) OK
Test good C++ hello world OK
Test compile error C++ hello world OK
Good Hello world Pascal test OK
Fail Hello world Pascal test OK



39 tests, 39 passed, 0 failed, 0 exceptions



Checking parallel submissions
Doing child 0
Doing child 1
Doing child 2
Doing child 3
Doing child 4
Doing child 5
Doing child 6
Doing child 7
Doing child 8
Doing child 9



***************** FAILED TEST ******************

{'run_id': None, 'outcome': 11, 'cmpinfo': "/var/www/html/jobe/application/libraries/../../runguard/runguard: cannot start `sh': Resource temporarily unavailable\nTry `/var/www/html/jobe/application/libraries/../../runguard/runguard --help' for more information.\n", 'stdout': '', 'stderr': ''}
C program to check parallel submissions
Jobe result: Compile error

Compiler output:
/var/www/html/jobe/application/libraries/../../runguard/runguard: cannot start `sh': Resource temporarily unavailable
Try `/var/www/html/jobe/application/libraries/../../runguard/runguard --help' for more information.

************************************************


***************** FAILED TEST ******************

{'run_id': None, 'outcome': 11, 'cmpinfo': "/var/www/html/jobe/application/libraries/../../runguard/runguard: cannot start `sh': Resource temporarily unavailable\nTry `/var/www/html/jobe/application/libraries/../../runguard/runguard --help' for more information.\n", 'stdout': '', 'stderr': ''}
C program to check parallel submissions
Jobe result: Compile error


Compiler output:
/var/www/html/jobe/application/libraries/../../runguard/runguard: cannot start `sh': Resource temporarily unavailable
Try `/var/www/html/jobe/application/libraries/../../runguard/runguard --help' for more information.

************************************************


***************** FAILED TEST ******************
{'run_id': None, 'outcome': 11, 'cmpinfo': "/var/www/html/jobe/application/libraries/../../runguard/runguard: cannot start `sh': Resource temporarily unavailable\nTry `/var/www/html/jobe/application/libraries/../../runguard/runguard --help' for more information.\n", 'stdout': '', 'stderr': ''}
C program to check parallel submissions
Jobe result: Compile error


Compiler output:
/var/www/html/jobe/application/libraries/../../runguard/runguard: cannot start `sh': Resource temporarily unavailable
Try `/var/www/html/jobe/application/libraries/../../runguard/runguard --help' for more information.


************************************************

***************** FAILED TEST ******************

{'run_id': None, 'outcome': 11, 'cmpinfo': "/var/www/html/jobe/application/libraries/../../runguard/runguard: cannot start `sh': Resource temporarily unavailable\nTry `/var/www/html/jobe/application/libraries/../../runguard/runguard --help' for more information.\n", 'stdout': '', 'stderr': ''}
C program to check parallel submissions
Jobe result: Compile error

Compiler output:
/var/www/html/jobe/application/libraries/../../runguard/runguard: cannot start `sh': Resource temporarily unavailable
Try `/var/www/html/jobe/application/libraries/../../runguard/runguard --help' for more information.

************************************************


C program to check parallel submissions OK
C program to check parallel submissions OK
C program to check parallel submissions OK
C program to check parallel submissions OK
C program to check parallel submissions OK
C program to check parallel submissions OK
All done

Testing a submission with an excessive cputime parameter
OK

 Any ideas? Just a bit concerned for the upcoming tests. 

In reply to John Paul Posada

Re: Coderunner on Docker fails to perform testsubmit.py

by Richard Lobb -
Ouch. That is a bit concerning. And also baffling. I've fired up lots of Jobe servers myself, and there are literally thousands of sites running CodeRunner. I've never seen this problem myself and no-one else seems to be reporting it in recent years. Though I do recall issues with Jobe on CentOS years ago and CentOS is/was some sort of RHEL clone, as I recall. Unfortunately I don't recall exactly what the issues were.

Are you always running your containers on RHEL? I know it has lots of extra security features, so perhaps it is applying some resource limits, though I don't understand exactly what they might be to cause those symptoms. Are you able to try Jobe on any other Linux version?

On the plus side, if you have an 8-core server with lots of memory and only 170 students in the class, I'd say you were unlikely to find the server trying to run 8 jobs at once, unless they're slow jobs. A typical C or Python run takes around 120 msecs but if you're running Java I'd be more nervous. What language are you using and how many CodeRunner questions do you have?

I'd like to get to the heart of this but I suggest we take it offline until we figure out what the problem is. I'll email you, and send you a recent version of testsubmit.py that measures throughput.

In reply to Richard Lobb

Re: Coderunner on Docker fails to perform testsubmit.py

by John Paul Posada -
Sounds great. Looking forward to solving this.