SQL .db Size Limits

SQL .db Size Limits

by Constantine Zakkaroff -
Number of replies: 3

Hello Richard & ALL,

I'm playing with larger number of rows (500K) in a table in a .db SQLite files, and, naturally, larger support file sizes. In this instance the .db file size is 97.5 MB which gets compressed to 21.2 MB in a .zip file. Then, the template is customised to extract the .zip file.

Then I set these advanced customisation options:


So far so good — the question runs OK, but it seems these advanced customisations don't have much effect on the Jobe running environment, because I print these lines in the customised Python template:

Available memory: 15.63 Gib
Total disk space: 47 GiB
Used disk space: 14 GiB
Free disk space: 30 GiB
Question files:
prog.in: 0.0MB
nyc311_500K.15C.db.zip: 21.287MB
prog.err: 0.0MB
nyc311_500K.15C.db: 97.586MB
prog.out: 0.0MB
__tester__.python3: 0.003MB
commands: 0.0MB
prog.cmd: 0.0MB
nyc311_500K.15C: 97.586MB
Total: 216.462MB

So, my question is this: if I were to use a larger file, say, twice a big (1M rows) which is 195 MB uncompressed and 41.8 MB compressed, then, pretty much no matter what advanced customisation options I use, the question always crashes, resulting in this error message:



Where do you think it may fail, even though requiring about 500 MB on disk shouldn't be a big deal it seems?

What else can I try to work this out?

Kind regards,
Constantine

Tags:
In reply to Constantine Zakkaroff

Re: SQL .db Size Limits

by Richard Lobb -
Well, I don't really agree that a 500 MB disk file isn't that big a deal. In the context of web servers, that's huge. As discussed in earlier emails (which I forgot to post here as promised, sorry), php.ini parameters like post_max_size, upload_max_filesize and memory_limit are all likely to be getting in your way, preventing the file from being uploaded.

The disklimit parameter is relevant only to the runguard sandbox within Jobe - it's one of the various resource-limiting parameters that prevents jobs from breaking Jobe. It's relevant only once the job starts executing, which is after all the environment has been set up (which involves setting up a working directory containing all the required code and data files etc; the uploading of the file to the Jobe file cache takes place in a separate HTTP operation before the main job request is even sent).

It seems you've been able to print all the uploaded files by modifying the Python template. Am I right in assuming that when you increase the file size, it no longer appears in the list? It's probably the php.ini settings on the Moodle server.




In reply to Richard Lobb

Re: SQL .db Size Limits

by Constantine Zakkaroff -
Hi Richard,

Thank you for your response. Ah, no, I didn't say 500 MB disk file, which is quite something, as you say, in the context of web servers.

The support file I'm working with is about 21 MB (zipped, 100 MB unzipped) with 500,000 rows in one table, plus a couple smaller tables.

It is fun to play with this dataset, which is a small part of this project, What a Hundred Million Calls to 311 Reveal About New York (https://www.wired.com/2010/11/ff_311_new_york/) — there are quite a few very interesting/advanced queries to run with that 500,000 row table and there's a lot of engagement in the class learning SQL among other things.

Then, if I use larger files, the Python template doesn't even get its turn to execute (I'm suspecting) — and the message is this: Jobe server request failed., this is where it most likely just fails to fetch the support file.

Kind regards,
Constantine



In reply to Constantine Zakkaroff

Re: SQL .db Size Limits

by Richard Lobb -
When running a job with support files, CodeRunner first tries to send the job request to Jobe assuming the file is already in the Jobe file cache (which is usually the case as most files are question authors' support files, used repeatedly). If Jobe responds with a 404, an HTTP PUT is used to send the file to Jobe and the job is re-tried. However, if the PUT fails you get the error message above:

Jobe server request failed. One or more of the specified files is missing/unavailable.
In your case the PUT has almost certainly failed because of the file size rather than because there's no such file: the error message isn't very helpful in this case, I admit.

You're into a very messy space here because there are just so many different parameters that you might be running foul of, on either the Moodle server or the Jobe server.

On my development environment, I tried running a CodeRunner question with a 95 MB file and it failed with the error message

Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 127291392 bytes) in /var/www/html/moodle/question/type/coderunner/classes/jobesandbox.php on line 296

On a production server you wouldn't see such errors because they're hidden from users. Note that although the file is only 95 MB, it's needing 268MB of memory to process it (it has to be Base64 encoded for one thing). This error is on the Moodle server, not the Jobe server.

Pinning down exactly what limit you're running into would take some time. But then, having done so, you'd have to persuade the appropriate system administrator to increase the limit just for your question. The larger one makes these limits, the greater the risk of resource depletion or denial of service attacks. So you might have trouble convincing the sysadmin.

My recommendation is that you find a way to achieve the same learning outcome with smaller files :)

Richard