Issues caused by attachments

by Hongchuan Liu -
Number of replies: 14

I have been using CodeRunner for four years. It runs very stably with 400 students taking an exam simultaneously. Recently, I encountered the following problem while trying to use attachments in test questions.


This happens in every question: students have to click multiple times before their answer is graded properly.

My Moodle is 3.6, CodeRunner is v4.0.0, and Jobe is v1.6.6.

I also installed Moodle 4.3 pointing at the same Jobe server, and there the issue does not occur.

In reply to Hongchuan Liu

Re: Issues caused by attachments

by Richard Lobb -
The obvious explanation would appear to be that the Jobe server is overloaded. Do you have some reason to believe that that isn't the case?

A test with 400 students is getting into the danger zone where overloads of various sorts are possible, depending on both peak and average rates of submission, language (Java and C++ are both bad), number of test cases, use of a combinator or not, compute time per question, frequency of timeouts (endless loops in student code), ...

Having attachments adds to the load on Jobe. Jobe maintains a file cache that maps from a file ID (the MD5 hash of its contents) to the actual file. Files are uploaded to the cache by HTTP PUT requests, and jobs are run with HTTP POST requests. To avoid unnecessary repeated file uploads, CodeRunner first tries to run a job (a POST) with just the required file-ids in the job specification. If the request fails with a 404 (because Jobe finds it does not have all the files it needs to run the job), CodeRunner then PUTs all the files and retries the POST. This strategy is a huge win with author supplied files, which only ever get uploaded once. It's also a win if students reuse the same file or if there are multiple test cases in the question and all run separately (i.e. no combinator or it's a "write a program" question type), using the same file(s). However, in a worst case scenario (a student provides just a single small file, never resubmits or retries the question, and the question runs all tests in a single run) it results in 3 round-trips to the Jobe server. This might push your Jobe server into overload if it's close to it on questions without attachments.
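
A minimal sketch of the try-first, upload-on-404 strategy described above, written in Python against an in-memory stand-in rather than a real Jobe server. The class `FakeJobe`, the function names and the 204 status are assumptions made for illustration; only the MD5 file ID, the PUT/POST split and the retry after a 404 come from the description above.

```python
import hashlib


class FakeJobe:
    """In-memory stand-in for a Jobe file cache (hypothetical, not the real API)."""

    def __init__(self):
        self.cache = {}     # file ID -> file contents
        self.requests = 0   # number of round trips received

    def put_file(self, file_id, contents):
        self.requests += 1
        self.cache[file_id] = contents
        return 204          # status code chosen for illustration

    def post_run(self, file_ids):
        self.requests += 1
        if any(fid not in self.cache for fid in file_ids):
            return 404      # at least one required file is not cached
        return 200


def file_id(contents):
    # The file ID is the MD5 hash of the file contents.
    return hashlib.md5(contents).hexdigest()


def submit(server, files):
    """Try the run with file IDs only; upload and retry only after a 404."""
    ids = [file_id(c) for c in files]
    status = server.post_run(ids)
    if status == 404:
        for fid, contents in zip(ids, files):
            server.put_file(fid, contents)
        status = server.post_run(ids)
    return status
```

With an empty cache, `submit(server, [b"data"])` costs three round trips (POST, PUT, POST); submitting the same file again costs one.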

With tests of this size I recommend multiple Jobe servers - did you realise that in the CodeRunner settings you can list multiple Jobe servers separated by a semicolon? One gets chosen at random for each run.

I agree it's a bit surprising that you're not seeing the same issues with Moodle 4.3 and the same Jobe server. The protocol is unchanged. Were the class sizes in the two scenarios the same? One possible explanation could be that Moodle 4.3 is a bit slower, so can't throw jobs at Jobe quite as fast.
In reply to Richard Lobb

Re: Issues caused by attachments

by Hongchuan Liu -
I'm sure it's not due to the load, because the problem shows up only when a question has attachments, even when only one user is accessing Moodle.
Your answer may have pointed me to the reason. So that hundreds of people can take a test containing CodeRunner questions at the same time, I put a reverse proxy in front of 5 Jobe servers. When a question has attachments, grading takes several related requests, but the proxy sends those related requests to different servers.
When I tested Moodle 4.3 I did not use a reverse proxy, so the issue did not occur.
I saw that CodeRunner V5 allows multiple Jobe servers to be configured, but I am still on V4.
I am currently unable to upgrade to Moodle V4, and upgrading would raise another issue. My 5 Jobe servers differ significantly in performance. A reverse proxy can load-balance so that every server is fully used, whereas CodeRunner picks a server at random, which cannot achieve that.
So the question now is: when using a proxy server, how can a set of related requests for the same question be forwarded to the same Jobe server?
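
The effect of a proxy that routes each request independently can be reproduced with a small Python simulation. This is only a sketch under stated assumptions: strict round-robin routing, a single client, and hypothetical `Backend` and `RoundRobinProxy` classes. A real proxy under concurrent load will interleave requests differently.

```python
import itertools


class Backend:
    """One Jobe-like server with its own private file cache (hypothetical)."""

    def __init__(self):
        self.cache = set()

    def put(self, fid):
        self.cache.add(fid)

    def run(self, fid):
        return 200 if fid in self.cache else 404


class RoundRobinProxy:
    """Sends every request to the next backend, ignoring which job it belongs to."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def next_backend(self):
        return next(self._cycle)


def submit(proxy, fid):
    # POST, then on a 404: PUT and POST again. Each request is routed separately.
    if proxy.next_backend().run(fid) == 200:
        return 200
    proxy.next_backend().put(fid)
    return proxy.next_backend().run(fid)


def clicks_until_success(n_backends, fid="abc", max_clicks=50):
    """Return how many submissions it takes before one succeeds, or None."""
    proxy = RoundRobinProxy([Backend() for _ in range(n_backends)])
    for click in range(1, max_clicks + 1):
        if submit(proxy, fid) == 200:
            return click
    return None
```

Under these assumptions `clicks_until_success(5)` returns 3, so the third click is the first one graded, while `clicks_until_success(1)` returns 1.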
In reply to Hongchuan Liu

Re: Issues caused by attachments

by Hongchuan Liu -
I tested CodeRunner V5 and found that the same issue occurs behind a proxy server, but not when multiple Jobe servers are configured directly in CodeRunner.
In reply to Hongchuan Liu

Re: Issues caused by attachments

by Tim Hunt -
I must say (possibly to Richard's horror) that we have been using Jobe behind a reverse proxy (we would call it a load balancer) for ages, and, seemingly, getting away with it.

That is possibly because a) we only have two servers behind the load-balancer, so b) by the time the author has finished creating and testing the questions, all the files are already cached on both back-ends.

I can certainly see why the current design of file handling, where they are cached, and with separate PUT requests, is good for efficiency. However, certainly for us, it would be nice if there was a way this could be compatible with use of a load-balancer in front of Jobe.

Oh, here is a thought. Where in the file system on the Jobe server are the files cached? Could this be made configurable, so that the folder can be an NFS mount and the same storage is shared between all the Jobe servers?
In reply to Tim Hunt

Re: Issues caused by attachments

by Hongchuan Liu -
That's it. When the Submit button is clicked multiple times, the attachment ends up saved on all the backends.

Configuring all attachments to be saved on the same server may be a solution.
In reply to Hongchuan Liu

Re: Issues caused by attachments

by Richard Lobb -
@Hongchuan Liu Good to know you have an explanation.

The solution to your problem is to configure your reverse proxy to use the special HTTP header "X-CodeRunner-Job-Id", which CodeRunner inserts into its HTTP requests. It's a random 8-digit hexadecimal integer and the same value is used for all HTTP requests to Jobe for a given job. I haven't ever configured a reverse proxy but I believe it's usually possible to key the target off a specified HTTP header.

This is the mechanism that CodeRunner uses to ensure that the same Jobe server is used for all requests for a particular job when multiple semi-colon-separated Jobe servers are available.
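
As a rough client-side illustration of that mechanism (not CodeRunner's actual code), the Python sketch below picks one server from a semicolon-separated setting and attaches the same job ID to every request for the job. The host names, the `Job` class and the request paths are hypothetical.

```python
import random
import secrets


def new_job_id():
    # A random 8-digit hexadecimal string, e.g. "9f3a01bc".
    return secrets.token_hex(4)


def parse_servers(setting):
    """Split a semicolon-separated server setting into a list of host names."""
    return [s.strip() for s in setting.split(";") if s.strip()]


class Job:
    """Fixes the server and the job ID once, then reuses both for every request."""

    def __init__(self, server_setting):
        self.job_id = new_job_id()
        self.server = random.choice(parse_servers(server_setting))

    def request(self, method, path):
        # Returns a description of the request; nothing is sent over the network.
        return {
            "method": method,
            "url": f"http://{self.server}{path}",
            "headers": {"X-CodeRunner-Job-Id": self.job_id},
        }


job = Job("jobe1.example; jobe2.example")     # hypothetical hosts
first_post = job.request("POST", "/runs")     # illustrative paths only
upload = job.request("PUT", "/files/abc")
retry_post = job.request("POST", "/runs")
```

All three requests carry the same header value and the same host, which is the property a reverse proxy can key on.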

@Tim Hunt I think your OU proxy server uses this mechanism? In fact, I thought the request to add this header came from you or your team, many years ago?!
In reply to Richard Lobb

Re: Issues caused by attachments

by Tim Hunt -
LOL. I had completely forgotten that. That is an even better explanation for why it works for us!
In reply to Tim Hunt

Re: Issues caused by attachments

by Hongchuan Liu -
@Tim Hunt
I tried my best but could not get this to work. Can you tell me how to set it up to maintain a persistent connection? I use Apache HTTP Server as the reverse proxy.
In reply to Hongchuan Liu

Re: Issues caused by attachments

by Tim Hunt -
I don't know about using Apache for this. A quick Google search found this: https://cwiki.apache.org/confluence/display/OFBIZ/Sticky+session+load+balancing+with+Apache+and+mod_balancer

That suggests that Apache can only do sticky sessions (session affinity) using a Cookie.

And, we are thinking about moving our Jobe back-end into AWS, and it seems that AWS load-balancing can also only do load-balancing based on a cookie set by the load balancer.

Richard, if we find it necessary to implement support for load-balancing cookies, in parallel with support for X-CodeRunner-Job-Id; and if we find a way to do this that does not make a terrible mess in the CodeRunner Jobe sandbox code; then I assume you would be happy to accept that into the code?
In reply to Tim Hunt

Re: Issues caused by attachments

by Hongchuan Liu -
@Tim Hunt
I have read a lot of material online, including the page you suggested. The result is that when I access "http://myip/job/index.php/restapi/" from a browser, the requests are forwarded to the same backend, but CodeRunner's requests are still sent to the different servers in turn.
In reply to Hongchuan Liu

Re: Issues caused by attachments

by Richard Lobb -
I invited ChatGPT4 to tell me how to proxy-pass requests to different Jobe servers depending on the X-CodeRunner-Job-Id header, and it gave me an excellent answer. I've checked it and it worked right first time. In my case, I set up two Docker containers running Jobe, one on port 4000 and the other on port 5000. I wanted to forward HTTPS requests to one of those two containers depending on the last character of the Job ID (which is essentially a random hex digit in the range 0-F). I added the code below to the SSL sites-enabled file. The code also logs each request to the access log in a format that tells you which container received the request - you'll probably want to delete the log stuff once you've convinced yourself it works.

I should mention that I'm only forwarding URLs that begin with /jobe, as the server does other stuff and I was just using it to check if the proxying worked.

# Configuration for Jobe requests: send each one to either of the two containers.
    # Log format that records which container received the request.
    LogFormat "%h %l %u %t \"%r\" %>s %b \"Container: %{TARGET_CONTAINER}e\" %{Referer}i \"%{User-Agent}i\"" dynamic_container_log

    # Capture the last character of the job ID header (a hex digit).
    RewriteEngine On
    RewriteCond %{HTTP:X-CodeRunner-Job-Id} .(.)$
    RewriteRule . - [E=LAST_CHAR:%1]

    # Job IDs ending in 0-7 go to the container on port 4000.
    RewriteCond %{ENV:LAST_CHAR} [0-7]
    RewriteRule ^/jobe/(.*)$ http://localhost:4000/jobe/$1 [P,E=TARGET_CONTAINER:4000]

    # Job IDs ending in 8-9 or a-f go to the container on port 5000.
    RewriteCond %{ENV:LAST_CHAR} [8-9a-fA-F]
    RewriteRule ^/jobe/(.*)$ http://localhost:5000/jobe/$1 [P,E=TARGET_CONTAINER:5000]

    CustomLog ${APACHE_LOG_DIR}/access.log dynamic_container_log

Before doing this I followed ChatGPT's wonderfully clear instructions to enable the required modules:

sudo a2enmod proxy
sudo a2enmod proxy_http
sudo a2enmod headers
sudo a2enmod rewrite
sudo systemctl restart apache2

If you want all this explained to you in detail, just ask ChatGPT4 :-)
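
The last-character split used in the rewrite rules above extends to more than two back-ends, including back-ends of unequal capacity. The Python sketch below shares the 16 hex digits out in proportion to assumed weights. The function names, server names and weights are hypothetical, and the resulting table would still have to be translated into proxy rules by hand.

```python
HEX_DIGITS = "0123456789abcdef"


def build_routing_table(weights):
    """Map each hex digit to a backend, roughly in proportion to its weight.

    weights: dict of backend name -> positive relative capacity.
    Uses the largest-remainder method so that exactly 16 digits are assigned.
    """
    total = sum(weights.values())
    shares = {name: 16 * w / total for name, w in weights.items()}
    counts = {name: int(share) for name, share in shares.items()}
    leftover = 16 - sum(counts.values())
    # Hand the remaining digits to the backends with the largest fractions.
    by_remainder = sorted(
        weights, key=lambda name: shares[name] - counts[name], reverse=True
    )
    for name in by_remainder[:leftover]:
        counts[name] += 1
    table = {}
    digits = iter(HEX_DIGITS)
    for name in weights:
        for _ in range(counts[name]):
            table[next(digits)] = name
    return table


def route(job_id, table):
    """Pick the backend from the last character of the job ID."""
    return table[job_id[-1].lower()]


# Two equal servers reproduce the 0-7 / 8-f split; 3:1 gives 12 and 4 digits.
equal = build_routing_table({"port4000": 1, "port5000": 1})
uneven = build_routing_table({"fast": 3, "slow": 1})
```

Because the job ID is random, each backend receives a share of jobs close to its share of digits, and every request of one job still lands on the same backend.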

In reply to Tim Hunt

Re: Issues caused by attachments

by Richard Lobb -
@Tim Hunt
I'm of course happy to add stuff you suggest, provided it doesn't make a terrible mess of the code :-) I'm just not sure why it's necessary. I thought one of the great things about Amazon AWS was that it could supply you with elastic servers that expanded as required to handle the load. So I'm not sure why you'd want to set up multiple servers with your own load-balancer.
In reply to Richard Lobb

Re: Issues caused by attachments

by Tim Hunt -
Yes, the compute resource is all virtual and easy to scale - but the way that works involves lots of different virtual components all plugged together. A typical AWS architecture diagram looks like this: https://docs.aws.amazon.com/whitepapers/latest/web-application-hosting-best-practices/an-aws-cloud-architecture-for-web-hosting.html.

Jobe won't look exactly like that, but there will be some number of instances of the JobeInABox container running, and there will be a single public IP/DNS name, and so there needs to be some sort of Load Balancer to route some requests to each server. That needs to know the stickiness rules, and it seems to only offer stickiness using a cookie it sets.
In reply to Tim Hunt

Re: Issues caused by attachments

by Richard Lobb -
@Tim Hunt
Wow - complex! Thanks for clarifying.
I don't think it should be too difficult to catch the cookie in the 404 response and ensure it gets used on all subsequent requests for that job, if that's all that's required.
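
A sketch of that idea in Python, using a fake load balancer: the client keeps whatever stickiness cookie comes back with the first response and sends it on the follow-up PUT and POST. The `StickyBalancer` class and the cookie format are invented for illustration and do not model any particular load balancer.

```python
class StickyBalancer:
    """Fake load balancer that pins a client to a backend via a cookie it sets."""

    def __init__(self, n_backends):
        self.caches = [set() for _ in range(n_backends)]  # one cache per backend
        self.next_index = 0

    def handle(self, method, fid, cookie=None):
        if cookie is None:
            # No cookie yet: choose the next backend in turn.
            index = self.next_index
            self.next_index = (self.next_index + 1) % len(self.caches)
        else:
            # Cookie present: honour the earlier choice.
            index = int(cookie)
        cache = self.caches[index]
        if method == "PUT":
            cache.add(fid)
            status = 204
        else:
            status = 200 if fid in cache else 404
        return status, str(index)   # every response carries the cookie


def submit(balancer, fid):
    """POST; on a 404, reuse the returned cookie for the PUT and the retry."""
    status, cookie = balancer.handle("POST", fid)
    if status == 404:
        balancer.handle("PUT", fid, cookie)
        status, _ = balancer.handle("POST", fid, cookie)
    return status
```

With five fake backends, each new job succeeds on its first submission because the upload and the retry follow the cookie to the backend that returned the 404.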