Thursday, April 05, 2007

Java as a first programming language

A thread on the Something Awful forums came up recently about a high school, second-year computer science/programming course. The author was requesting help to convince his teacher to switch said course from using VB6 to C#. There were a number of suggestions and alternatives given (Java, VB.NET, Python, Scheme), but those aren't as relevant to the CS environment at Purdue (thread is here if interested). I got into a couple of discussions about C# and Java, and it got me imagining, once again, what Purdue would be like if we didn't teach Java as a first programming language.

The issue I brought up was an argument against teaching students how to write procedural code in a purely object-oriented language. In particular, someone suggested that teaching procedural code in C# was easy, since you could simply add the static keyword to class methods to write non-object-oriented code. I argued that teaching the static keyword before telling students what it meant was a particularly bad idea. I've never been a fan of teaching stuff out of order (who is?), but it seems impossible to teach things in order in Java. This is plainly visible in Java's hello world program:

class Main {
    public static void main(String[] args) {
        System.out.println("Hello world!");
    }
}

This is incredibly ugly, and C# is guilty of the exact same thing. For beginners, I don't think this is acceptable. Unfortunately, all compiled languages that I know of are guilty of this to some degree; on the other hand, scripting languages can do hello world programs in one line. When you introduce this Java program to a student, here is what new programmers tend to think:

  1. What is public?
  2. What is static?
  3. What is "main"?

The list can continue (what is System, out, println, etc.), but the point is, there are a lot of questions to be answered that can't be answered without covering material that they aren't ready for. In particular, I believe the static keyword is a real killer, even for some people who've programmed before (obviously not in depth). Instead of reiterating everything I said in the thread, I'll just copy and paste like the resourceful (read: lazy) person I am:

I think the static keyword versus other language syntax is slightly different. Why should you have to teach the static keyword first in an object-oriented language? In terms of thinking about objects, static "breaks" the OOP paradigm. One of the C# compiler devs even argues that the static keyword shouldn't exist. Obviously you need some way to differentiate between instance and static methods, but I never really liked the use of the word "static" to do so. Maybe it's just me, but when I hear the word in a general context, I take it to mean either the static you see on TVs, or static as in unchanging, the latter of which is due to static/DHCP IPs. Are people really supposed to be able to tell what "static" means, just offhand?

OOP languages like C# and Java are actually backwards in terms of expressing methods and their arguments. If you were to write OOP code in C, the "instance" methods would be the ones with the extra typing (the first parameter being the object type), and the static methods would simply do the opposite. In Python, methods are declared the exact same way. All instance methods in Python take "self" as their first parameter, whereas static methods simply don't have a self. You can also call instance methods with static syntax, which gives some insight into just how OOP methods are actually defined.

As far as learning is concerned, I think Python is better for teaching the difference between static/instance methods (other topics are debatable). I really don't like telling my students, "oh, don't worry about this huge, gigantic header you have to write for your main method--or what a main method actually is. You'll learn that later!" {}, (), [], and <> are introduced in a more appropriate order (though funny enough, I do have students that still don't get the paren), so I don't think it's [as] relevant.
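To put the C comparison above in Java terms, here is a rough sketch. The Counter class and its method names are made up purely for illustration; the point is only that an instance method behaves like a static method with the object passed in as an explicit first parameter.

```java
// Hypothetical Counter class, invented for this illustration.
class Counter {
    private int count = 0;

    // Instance method: the receiver is implicit and available as `this`.
    int increment() {
        this.count += 1;
        return this.count;
    }

    // The same operation written "C style": a static method that takes
    // the object as an explicit first parameter, much like Python's self.
    static int incrementExplicit(Counter self) {
        self.count += 1;
        return self.count;
    }

    // A static method with no receiver at all: it cannot touch count,
    // because there is no object for it to work on.
    static String describe() {
        return "counts things";
    }

    public static void main(String[] args) {
        Counter c = new Counter();
        System.out.println(c.increment());                // instance call syntax, prints 1
        System.out.println(Counter.incrementExplicit(c)); // static call syntax, same object, prints 2
        System.out.println(Counter.describe());           // prints "counts things"
    }
}
```

Both increment methods do the same work on the same object; the only difference is whether the receiver is written out.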

A post from another user probably summarizes very well:

introducing students to computer science through languages which introduce syntax issues long before they introduce computer science concepts is a recipe for failure, and I completely agree. Aside from the utter boneheadedness of an "objects early" or "objects first" approach, you're giving students a gigantic chunk of boilerplate code and not explaining what any of it does. Seriously, just think about it: a student is not going to be thinking in object-oriented terms right off the bat, so how do you explain what this "public class" and "public static void main" business is? The Javaschool answer is "we don't, we just tell them that it's not important and they'll understand it later." This is not an acceptable approach.

From what I've observed, Java is taught at Purdue (and other schools) for a few reasons:

  • Java is the number one language used, at the moment
  • Java is heavily involved in other courses (for Purdue, often data structures and compilers), and not teaching Java would have a severe impact on the curriculum
  • It's a good gateway to other statically typed languages, unlike Python and company, because of syntax similarities

There was, however, a comment by the same user above, regarding Java in computer science programs:

I've been doing a lot of research into computer science education recently (I'm more or less rewriting a computer science curriculum for my college), and there have been numerous studies which show poor computer science retention rates in Java-based computer science programs (equivalent to ACM CS1 core) that have improved substantially when the course syllabus was switched to a simpler language like Python or Scheme which allows students to focus on computer science concepts rather than fighting with language syntax. Pummel them with Java later.

While the idea of teaching a language that isn't statically typed bugs me, maybe it's just my upbringing in C. As far as static methods in classes go, I would certainly agree that Python does a better job of getting the idea across, as said in my quote above. However, I also think Python offers a lot of freedoms that aren't available in C# and Java (as do most scripting languages), which might increase the difficulty of learning statically typed OOP; never mind the syntax differences between Python and "C-family" languages.

On the other hand, Python's enforced block indentation also brings something refreshing to the table. I've seen some extremely bad indentation in Java (literally different levels on each line in a block!), and I think Python has taken a good step by enforcing uniform indentation. Unfortunately, we don't see this in most languages, so this is a nice plus to using Python as a learning language.

Unfortunately, I'm still not sure teaching Python as a first language is a good idea. I sure don't like Java (for anything), but then again, I don't know of any languages that I would rather teach students that have advantages that are substantial enough to merit a change in a class syllabus. I don't think functional languages will ever fly here, regardless of how pretty or ugly they look; the idea of CS180 is to prepare students for CS240, which transitions from Java to C. I don't think the Pythonic way of doing things will prepare students for C; it may teach you how to program, but as much as it makes no sense, I'm guessing it would just pass along the problem to the CS240 instructors. And we all know what kind of monster that would awaken.

2 comments:

Luke said...

You bring up some interesting ideas here. Fortunately, Purdue teaches programming in other ways to non-CS majors and we can treat these as case studies. For instance, engineers first learn Matlab and then move to C. Physics majors learn Python but I don't think they ever learn anything more than that. Students definitely picked up Python the fastest but they learned no OOP in the process. So I guess the point of your post is that you'd like to see CS taught in a more in-order fashion (func-decomp -> OOP) in terms of paradigms and to do this, what language best allows you to first ignore OOP and then intuitively move into it?

This brings me to my real question. Why change the way it's done? Because our retention rate is low and it may be too hard for people? Since when is the university responsible for rewarding you for your work or making sure that you do all your work? The way CS is set up now seems to ensure that only the people that really love CS or the really hard workers are getting through. I didn't struggle with CS freshman year because I came in with the mindset that college, in general, was going to be much harder than high school. I'd heard they don't "spoon feed" anymore and I adjusted for that. Also I know quite a few current juniors that had no previous programming but made it through 180 just fine.

So maybe the retention problem isn't with the introduction language or the way the intro course is taught at all. Maybe it's just that productivity and self-teaching expectations are higher than what some high schools prepare for. I believe the real problem is with high schools in the US. I don't think college should be made any easier to accommodate this and in fact, I wish it was harder.

saiyr said...

I guess what I really mean is, OOP is a great paradigm, but in a pure form, it doesn't really make much sense. You and I can easily make sense of what public static void main is, but introducing this to a student who hasn't programmed before is something I disagree with. Python allows for a pretty intuitive learning path, but I don't like dynamic typing very much, so bleh.

I think something should be done (not that I'm expecting anything to be) because I feel like Java is kind of a crappy language to be taught first; unfortunately, I have no better language suggestions (though C# is better in terms of features...still guilty of svm though), which is why nothing can really be done.

As a TA, I would like to see the retention rate go up, but also as a TA, I see a lot of students confused about certain things in CS180 that are taught oddly. One of the common confusions is related to the static keyword, since they aren't taught that for several weeks; I've seen more than one student try to solve "non-static blah blah cannot be referenced from a static context" by sticking static in everywhere. I think something like this implies that the pedagogy behind CS180, as it stands, is really messed up.
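For anyone who hasn't run into that error, here's roughly the situation, as a minimal sketch. The Greeter class is made up for illustration; it isn't from any CS180 assignment.

```java
// Hypothetical Greeter class, made up for this illustration.
class Greeter {
    private String name;

    Greeter(String name) {
        this.name = name;
    }

    // Instance method: needs a Greeter object to run.
    String greet() {
        return "Hello, " + name + "!";
    }

    public static void main(String[] args) {
        // String s = greet();
        // The line above does not compile: main is static, so there is
        // no Greeter object here for greet() to be called on.

        // The fix is to create an object first, not to mark greet()
        // and name as static.
        Greeter g = new Greeter("world");
        System.out.println(g.greet()); // prints "Hello, world!"
    }
}
```

Sticking static on greet() and name would also make the compiler happy, but then every Greeter would share one name, which is almost never what the student meant.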

I know you're more of a gung-ho person than I am, and you get really excited about CS, which is great; but most of my friends that I talk to every day are pretty dispassionate about CS now (I guess we fall into "hard workers that complain a lot"). We all started out excited to be in CS, but it seems like the way things were taught really blew us off course.

I think this is another problem, but one that's essentially unsolvable. None of us have really struggled in CS, but I think that the retention rate could be higher if things were taught better, even without a language switch. A lot of people that drop out are by no means stupid. In fact, one of my students from a previous semester has already started feeling apathetic towards CS, for various reasons; he was by far one of the smartest in my class.

My sister, who came to Purdue CS before me, felt the same thing, as a lot of her friends did. Maybe we're just a whiny group of people, but it seems like turning off a group of smart people means something in CS is wrong. And now I'm getting kind of tangential, since this has nothing to do with Java.

I definitely agree that some people just aren't prepared for the new workload, like you were, but I'm not sure I would agree with it being the sole reason for the low retention. Even so, I think the low retention rate is just part of the problem. We're clearly not doing something right when students don't seem to retain things they learn in previous classes. Hell, when we took compilers last semester, somebody still didn't know what the static keyword was. What's up with that? And we all should have learned about the stack and heap in CS250, but apparently some people didn't. Admittedly, these occurrences have been rare (the only two I can think of), but what if there are people thinking the same thing but don't want to embarrass themselves?

I also agree that I would like CS to be harder, in the sense that I would like to learn as much as possible, while I'm here. For me. There are some obvious tiers of students in our department, though, and I think it would be unfair to make it harder for more than just the most talented students (ego? :\). Enrollment and retention in CS at more than just Purdue seems to be on the decline; I don't think we need to help drive it down more.

And wtf is up with all these terrible commenting systems :\ hey guys like 30 columns of editing space lol ugh.