Programming statistics in R

by Chris Sangwin -
Number of replies: 12

Is there any support for programming statistics in R?  I've had some interest from statisticians here about this possibility.

If not, how much work would it be to add?

Thanks,

Chris


In reply to Chris Sangwin

Re: Programming statistics in R

by Jenny Harlow -

I asked about this and we did a little investigation about a year ago, but then I ran out of free time and got distracted ...  We started by looking at a basic R install to see what configuration options would be useful.  I do remember noting down the --silent option.  There did seem to be potential for creating resources that our stats students, who need to use R but are not programmers, could use to at least start finding their feet with it.

Jenny



In reply to Jenny Harlow

Re: Programming statistics in R

by Richard Lobb -

Just to give you a bit of background to Jenny's answer ...

When Jenny raised the question of R a year or two ago, my first response was that it would be trivial to ask R questions of the simple "write-an-R-function" or "write-an-R-program" variety. I installed R-base on our Jobe server (5 minutes' work) and we used a Python3 question (with R as the Ace language, for syntax highlighting in the answer box) to run R. That was easy enough. Here's a possible template:

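import subprocess

# Combine the student's answer and this test's code into a single R script.
r_prog = """{{ STUDENT_ANSWER | e('py') }}"""
r_prog += "\n" + """{{TEST.testcode | e('py')}}"""
with open('prog.r', 'w') as fout:
    fout.write(r_prog)

# Run R non-interactively, feeding it the script on standard input.
# --slave makes R run as quietly as possible (no banner, no echoed commands);
# --vanilla skips site/user start-up files and does not restore or save a workspace.
cmd = "R --slave --vanilla"
with open('prog.r') as fin:
    subprocess.call(cmd.split(), stdin=fin, universal_newlines=True)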
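For trying this out directly on the Jobe server, outside CodeRunner, the same idea can be sketched as a plain Python function. This is a minimal sketch under stated assumptions, not the template itself: the helper names build_r_program and run_r are hypothetical, the script is piped to R's standard input instead of being written to prog.r, and run_r simply returns None if no R executable is on the PATH.

```python
import shutil
import subprocess


def build_r_program(student_answer: str, test_code: str) -> str:
    """Hypothetical helper: join the student's answer and the test code
    into one R script, as the template does."""
    return student_answer + "\n" + test_code


def run_r(program: str, timeout: float = 30.0):
    """Hypothetical helper: run an R script non-interactively and return
    its standard output, or None if R is not installed."""
    if shutil.which("R") is None:
        return None
    result = subprocess.run(
        ["R", "--slave", "--vanilla"],  # same flags as the template
        input=program,                  # feed the script on stdin
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout
```

For example, run_r(build_r_program("sq <- function(x) x * x", "print(sq(3))")) should return "[1] 9" followed by a newline on a machine with R installed.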

With that question type you can ask questions like the following:

[Screenshot of a simple R question]

However, as I recall, that wasn't the sort of question that Jenny thought the stats lecturers would want to ask their students, who aren't really programmers. So then you get into the much harder issue of "What sort of question do you really want to ask?".

Richard

In reply to Richard Lobb

Re: Programming statistics in R

by Chris Sangwin -

Thank you both for such swift and helpful replies.

I'm very reassured that CodeRunner can, at least at a technical level, accept R code.  I'm not sure exactly what my colleagues have in mind. We will be running on a server that also has my own STACK question type installed (https://github.com/maths/moodle-qtype_stack), and I think a mix of standard Moodle questions, STACK and CodeRunner will give students an interesting set of tasks that combine mathematical and programming elements.

The $10^6 question is always "What sort of question do you really want to ask?".  I'll talk with colleagues about that one...

Chris


In reply to Chris Sangwin

Re: Programming statistics in R

by Chris Sangwin -

I'm in the process of setting up CodeRunner to call "R" so that we can develop some introduction to R/stats courses here in Edinburgh.  I'm getting a strange setup problem, which I suspect is nothing to do with CodeRunner.

I've set up the latest version of Jobe, and CodeRunner ($plugin->version = 2017082200;) etc.

CodeRunner works just fine with Python, Java, C, and also with Octave.

I've created the attached very simple Python code, which calls R, and it executes with the following result.

python3 r.py 
[1] 3
[1] 1.581139

So I think I have R on my Jobe server, and Python can call R in the way Jobe would expect it to (permissions notwithstanding...).

I'm using the sample R question referred to above, but I get the error shown in the screenshot.

***Error***
Fatal error: couldn't allocate node stack

Having done a little digging, I think this error may be related to node.js, but I'm not sure.  Does anyone know what is causing this error and how I can fix it, please?

Thanks,

Chris  



Attachment: Screenshot - 111517 - 154850.png
In reply to Chris Sangwin

Re: Programming statistics in R

by Richard Lobb -

This is probably just an out-of-memory error. I'd suggest using Customise > Advanced Customisation and setting MemLimit (MB) for this question to something like 500. The default is 200 MB, which is probably not enough for Python and R together. Alternatively, you could set it to 0 to turn off memory-limit checking altogether.

Richard

In reply to Richard Lobb

Re: Programming statistics in R

by Chris Sangwin -

Thanks Richard,

I've tried this.  Even with 0, to remove limit checking, this error persists.

Chris

In reply to Chris Sangwin

Re: Programming statistics in R

by Richard Lobb -

I've downloaded the R source and found where the error you're getting comes from. It's in r-source/src/main/memory.c:

    R_BCNodeStackBase =
        (R_bcstack_t *) malloc(R_BCNODESTACKSIZE * sizeof(R_bcstack_t));
    if (R_BCNodeStackBase == NULL)
        R_Suicide("couldn't allocate node stack");

So it's certainly a memory error: a failed call to malloc.
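To see how a memory cap turns into a failed allocation like R's node-stack malloc, here is a small sketch that runs a child process under an address-space limit set with Python's resource.setrlimit(RLIMIT_AS, ...). It is an illustration under assumptions: it needs a Unix-like system, the helper name run_with_mem_limit is hypothetical, and treating Jobe's MemLimit as if it were an RLIMIT_AS cap is our assumption, not something established in this thread. As with MemLimit, a limit of 0 here means no cap is applied.

```python
import resource
import subprocess
import sys


def run_with_mem_limit(cmd, limit_mb):
    """Hypothetical helper: run cmd with its virtual address space capped
    at limit_mb megabytes; 0 means no cap is applied."""
    def set_limit():
        if limit_mb > 0:
            soft, hard = resource.getrlimit(resource.RLIMIT_AS)
            cap = limit_mb * 1024 * 1024
            if hard != resource.RLIM_INFINITY:
                cap = min(cap, hard)  # cannot raise above the hard limit
            resource.setrlimit(resource.RLIMIT_AS, (cap, hard))

    return subprocess.run(cmd, preexec_fn=set_limit,
                          capture_output=True, text=True)


# A child that reserves about 400 MB of address space in one go,
# much as R reserves its stacks at start-up.
GREEDY = [sys.executable, "-c",
          "import mmap; m = mmap.mmap(-1, 400 * 1024 * 1024); print('ok')"]
```

Running GREEDY with a 200 MB limit should fail with a non-zero exit code (the mapping is refused), while running it with 0 should print ok. The cap applies to reserved address space rather than memory actually touched, which is why a program can hit it while using far less real memory.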

I tried making a simple R test question using exactly your prototype and of course it worked fine for me :)  Here's the proof:

[Screenshot: proof it works]

I attach the exported Moodle XML question; please try importing and running that first. If it works on your system, then the problem lies in how the memory limit was set. But I suspect it will fail on your system too, in which case tell me a bit about your Jobe server: what version of Linux is it running on? Were any non-standard steps taken during the install?

Richard


In reply to Richard Lobb

Re: Programming statistics in R

by Chris Sangwin -

Thank you Richard,

This is now working.  I think setting the memory limit explicitly to 0 has fixed it.

Your help is much appreciated.  We can take this from here.

Chris


In reply to Chris Sangwin

Re: Programming statistics in R

by Richard Lobb -

Good to know the problem is fixed. However, I'd advise against using 0 for the memory limit in a production question, because a memory-gobbling submission might then be able to cripple Jobe. It's safer to find a value at which a typical submission runs OK and then, say, double it.
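The "find a value that works, then double it" rule can be automated with a binary search. This is a sketch under assumptions: the function name suggest_mem_limit is hypothetical, and runs_ok is a hypothetical callback that would run a representative submission under a given limit in MB and report success. The search is only valid if runs_ok is monotonic, i.e. if a submission that runs under some limit also runs under any larger one.

```python
from typing import Callable


def suggest_mem_limit(runs_ok: Callable[[int], bool],
                      lo: int = 1, hi: int = 4096) -> int:
    """Return twice the smallest limit (MB) in [lo, hi] for which
    runs_ok(limit) is True, assuming runs_ok is monotonic."""
    if not runs_ok(hi):
        raise ValueError(f"submission fails even with {hi} MB")
    # Invariant: runs_ok(hi) is True and the smallest passing limit is in [lo, hi].
    while lo < hi:
        mid = (lo + hi) // 2
        if runs_ok(mid):
            hi = mid      # mid is enough; try something smaller
        else:
            lo = mid + 1  # mid is too small
    return 2 * lo         # headroom: double the minimum that worked
```

For example, if a typical submission needs at least 237 MB, suggest_mem_limit(lambda mb: mb >= 237) returns 474. Each probe means one run of the submission, so the search needs only about a dozen runs for the default range.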

I'm also curious as to why your Jobe server seems more prone to memory limit problems than ours. Are you perhaps running it on a 32-bit OS? 

Richard

In reply to Richard Lobb

Re: Programming statistics in R

by Richard Lobb -

I've just been asked for more information about R questions, so I'm posting an XML export of two versions of a very simple R question. One version is like the one above, using a customised python3 question. The other is split into two: a prototype that defines a new question type, Rtest, and the actual question that uses that prototype.

Obviously R needs to be installed on Jobe for either of these questions to run.

Please realise I'm not an R programmer and cannot offer any R-specific support.

Richard

In reply to Richard Lobb

Re: Programming statistics in R

by Tim Hunt -

Here is some strange behaviour in the demo question that I have not yet had time to debug.

I was trying to get the question wrong with minimal editing, but swapping the order of the two print statements in the correct answer did not break it!

Like Richard, I am not an R expert, so I have no idea why this happens. It is not important, but it is curious.

In reply to Tim Hunt

Re: Programming statistics in R

by Richard Lobb -

Bizarre indeed. I can't replicate it myself: with both of the questions in the XML export I posted earlier, the output lines change order if I swap the print statements. If you Check an unchanged answer you get the same grading output as before, but I've never seen that mechanism fail to notice a changed answer. The comparison uses a full string-equality test (via arrays_same_at_key_missing_is_blank).
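To show why swapped print statements ought to be caught, here is a hypothetical Python analogue of the behaviour described. It is not CodeRunner's PHP arrays_same_at_key_missing_is_blank, whose exact signature is not shown here: it looks up the same key in two dictionaries, treats a missing key as an empty string, and compares with exact string equality. The sample outputs are illustrative only.

```python
def same_at_key_missing_is_blank(key, expected: dict, got: dict) -> bool:
    """Hypothetical analogue: exact string equality of expected[key] and
    got[key], where a missing key counts as an empty string."""
    return expected.get(key, "") == got.get(key, "")


# Illustrative outputs: the same two lines, in the original and swapped order.
expected = {"output": "[1] 3\n[1] 1.581139\n"}
swapped = {"output": "[1] 1.581139\n[1] 3\n"}
```

Under this rule, same_at_key_missing_is_blank("output", expected, swapped) is False, since the strings differ even though they contain the same lines. Any grading that reports a pass for swapped output therefore points elsewhere, for example to a cached result being redisplayed.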

Richard