Thursday, July 05, 2007

CS180: Writing easy-to-grade labs

Maybe I'm having delusions of grandeur, but I think this should be possible. I'm currently envisioning a method of writing labs that allows a script to perform automatic grading (yes, that's right, I've gone mad) for the ease of the lab TA. A little adjustment to how comments in the skeleton are written can go a long way. I'm kind of lazy, so I'll be reusing the super-trivial code example that I ranted about some time ago, with two TODOs added:

public boolean foo() {
    boolean b = bar();
    if (b == true) {
        /* TODO 1:  Return true */
        /* END TODO 1 */
    }
    /* TODO 2:  Write an else statement that returns false. */
    /* END TODO 2 */
}

Usually the skeleton code consists of a bunch of blanks you need to fill in, so if we mark an ending line for each blank, a script can tell exactly what the student typed in for that TODO. This is a pretty simple approach, and it allows you to swap TODO implementations in and out (say, a student's implementation for the solution's), which a grading script can do. This method would also take away the problem of TODOs that don't compile by default, which is an issue for students who can't finish their labs: you can test one TODO at a time, using the solution for the rest of the program.
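
To make the splicing concrete, here's a rough sketch of the idea. None of these class or method names come from any real tool, and it glosses over file I/O entirely; it's just the marker-matching spelled out:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/* Hypothetical splicer: copy whatever the student wrote between the
 * TODO n markers into the same region of the solution source, so that
 * one TODO can be tested with the solution around it. */
public class TodoSplicer {

    /* Matches the open marker, the region body, and the close marker
     * for TODO n as three groups; DOTALL lets the body span lines. */
    private static Pattern region(int n) {
        return Pattern.compile(
            "(/\\* TODO " + n + ":[^*]*\\*/)(.*?)(/\\* END TODO " + n + " \\*/)",
            Pattern.DOTALL);
    }

    /* Returns what the student typed inside TODO n, or "" if the
     * markers were mangled or deleted. */
    static String studentBody(String studentSource, int n) {
        Matcher m = region(n).matcher(studentSource);
        return m.find() ? m.group(2) : "";
    }

    /* Replaces TODO n's body in the solution with the student's body,
     * leaving the markers themselves intact. */
    static String splice(String solutionSource, String studentSource, int n) {
        String body = studentBody(studentSource, n);
        return region(n).matcher(solutionSource)
            .replaceFirst("$1" + Matcher.quoteReplacement(body) + "$3");
    }
}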

Edit: This doesn't really take away the problem of skeletons that don't compile out of the box, since students can't test their programs until they do. Duh. More on this later, maybe. Suggestions welcome.

Theoretically, this idea could be extended to make a script generate the skeleton code as well, since it seems trivial to strip out the text between the comments and call what's left a skeleton:

public boolean foo() {
    boolean b = bar();
    if (b == true) {
        /* TODO 1:  Return true */
        return true;
        /* END TODO 1 */
    }
    /* TODO 2:  Write an else statement that returns false. */
    else {
        return false;
    }
    /* END TODO 2 */
}
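
A rough sketch of that extraction, with the same caveats as before (made-up names, and the marker format is assumed to be exactly as above):

import java.util.regex.Pattern;

/* Hypothetical skeleton generator: keep the TODO comments, drop
 * everything between each pair. The \2 backreference pairs each open
 * marker with the END marker carrying the same number; regions are
 * assumed not to nest. */
public class SkeletonGenerator {

    private static final Pattern REGION = Pattern.compile(
        "(/\\* TODO (\\d+):[^*]*\\*/)(.*?)(/\\* END TODO \\2 \\*/)",
        Pattern.DOTALL);

    /* Blanks out every TODO body in the solution source. */
    static String toSkeleton(String solutionSource) {
        return REGION.matcher(solutionSource).replaceAll("$1\n$4");
    }
}

Whitespace handling is hand-waved here; the regenerated skeleton would need its indentation tidied up before being handed to students.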

I can imagine a directory structure for authoring labs (kind of like Rails... except not):

solution/
doc/

From this, the lab spec would be written inside doc/ (how, exactly, is yet to be decided), and the solution inside solution/. Running a script after the solution is finished would:

  1. Extract all of the solution text from the TODOs to create the skeleton in a skeleton/ folder
  2. Create a tests/ directory that has a subdirectory for each TODO, where the author can insert Java programs to test that TODO
  3. Generate the lab spec in HTML format in a www/ folder

This entire structure would be mailed out to all of the TAs, while only the skeleton/ and www/ folders would be exposed to the students. When it came time for grading, you could have a generic script (I hope) that takes a tests directory and a solution directory, runs the tests for each TODO against every student's files, and generates text to be mailed back to the students.
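
I imagine that generic script boiling down to something like this sketch. Every name here is hypothetical; it assumes the spliced sources already sit in a working directory, and that each TODO's subdirectory of tests/ holds a single Test.java whose main exits nonzero on failure:

import java.io.File;

/* Hypothetical grading driver: for each TODO's test directory, compile
 * the test against the spliced sources and run it, treating the exit
 * code as pass/fail. */
public class GradeRunner {

    /* Runs a command in dir with inherited stdout/stderr; true on exit 0. */
    private static boolean run(File dir, String... cmd) throws Exception {
        return new ProcessBuilder(cmd).directory(dir).inheritIO()
            .start().waitFor() == 0;
    }

    public static void main(String[] args) throws Exception {
        File testsRoot = new File(args[0]); // e.g. tests/
        File workDir = new File(args[1]);   // spliced student + solution sources
        File[] todoDirs = testsRoot.listFiles(File::isDirectory);
        if (todoDirs == null) return;
        for (File todoDir : todoDirs) {
            String test = new File(todoDir, "Test.java").getAbsolutePath();
            boolean ok = run(workDir, "javac", "-d", ".", test)
                && run(workDir, "java", "Test");
            System.out.println(todoDir.getName() + ": " + (ok ? "PASS" : "FAIL"));
        }
    }
}

The PASS/FAIL lines are the sort of thing that would get collected into the text mailed back to students.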

While automation would probably speed up grading by a lot, I still think there's room for grading things by hand. Style (such as proper indentation and non-terrible horizontal whitespace) is much more difficult to grade automatically (if someone wants to write a script to do it, be my guest). Granting partial credit and giving advice for writing better code also have to be done manually, and I think it would be good if TAs at least spent time on this once every few labs (perhaps this should be designated; every third lab, maybe?). This process also doesn't help if we create write-from-scratch labs.

I think I've rambled enough for now; comments welcome, etc.

6 comments:

Unknown said...

My first thought on seeing this is to wonder how you would deal with the students who failed to follow the directions and put their code... outside of those spots. Automated testing is unlikely to catch things like that; at the same time, they would be blatantly disregarding the directions, so... perhaps they should be punished for that.

More later (when I'm not at work).

saiyr said...

Yeah, I'm supposed to draft a document on how to complete labs. I think as long as it's explicitly stated that everything has to go between the begin/end comments, then maybe they don't deserve points for not being able to follow simple instructions. On the other hand, I doubt there would be so many rogue students that the grading TA couldn't adjust the comment positions to allow the automation to continue.

Luke said...

Sorry to burst your bubble, but this is already the idea behind unit testing. Take a look at C#'s metadata and http://www.nunit.org/. It'd be reinventing the wheel to write a script to do this. We could use JUnit to test their methods, so students don't get screwed with a zero for a 99% complete program that happens to output incorrectly.

Of course, you'd also need to do black-box testing to ensure the program as a whole is correct as well but it should be a smaller chunk of the grade (less than 100%).

The downside to this is that we'd have to give them the framework to fill in. At least for the first half of the semester this method could be used. Once they start writing code from scratch (which I agree is really important), then black-box and manual testing is really the only way.

Unknown said...

API v. Scratch
I would go with a mix of both. The API method would definitely be easier, but as you have stated, they don't learn how to design APIs if they are only implementing them. Although that is very tough to learn to do well, this would be a good way to start learning.

As far as telling them they must implement everything between the lines... we must realize that these are CS180 students, some of whom may never have programmed before. They may not see the solution that the lab writer sees. This can become very frustrating and can lead to a lot of confusion for them. Trivial examples like the ones given are pretty straightforward, but I remember when I was taking 180, I'm sure I wanted to take paths that the TODO didn't want me to take. Was it always the right path? No, probably not... but I wouldn't say it was necessarily _wrong_ either.

saiyr said...

The testing part could be unit tested, but you'd still need a script to swap chunks of code in and out individually, because unit testing will fail if your code doesn't compile. Unit testing is meant to test the correctness of compiling code, not whether or not code compiles; that's what the compiler is for.

saiyr said...

Actually, I take part of that back. Unit testing isn't fine-grained enough, as far as I can tell, to test parts of a method. If you have three TODOs in a method, how would you grade them separately? I've done a fair share of unit testing with whatever ships with VSTS, and I'm pretty sure it's not possible, but maybe I'm wrong.

In addition, a script that ran the tests wouldn't really be hard to write at all. If each test had its own main, all you would have to do is compile and run each one, which is a fairly simple script to write.
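
For example, a per-TODO test could be as small as this (FooLab is a made-up stand-in for whatever class foo() lives in, not something from the lab):

/* Hypothetical test for TODO 1: its own main, nonzero exit on failure,
 * so the grading script only has to check the exit code. */
public class Test {
    public static void main(String[] args) {
        if (!new FooLab().foo()) {
            System.err.println("TODO 1 failed: foo() returned false");
            System.exit(1);
        }
        System.out.println("TODO 1 passed");
    }
}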

Logan: CS180 will never be all scratch labs; the mix was my idea, prompted by someone in CS240 telling me they didn't like having to write from scratch all of the time (probably not the type of response they expected, either). I think about a quarter of the labs will be written from scratch.

Concerning writing between the lines: if you write your TODOs to be fairly specific, then there shouldn't really be many solutions to each one. If you leave the method blank, then they can do whatever they want, as long as it works. Maybe you could provide a more concrete example of where this would happen. I think in many cases, filling in the blank describes what type of "problem" they're solving, which kind of implies that there aren't many solutions. Making them fill in an entire method, however, allows more creativity. They should not be trying to alter the API, if that's what you're implying.