Good question. I imagine there are many opinions on this topic. Some initial thoughts:
How are opponent-adjustments applied? Is it a season-long performance adjusted against the composite strength of the opposition faced? Is it applied game by game? Is it applied possession by possession or play by play?
I'm of the opinion, like Ed, that the specific results do matter and that beating a good team and losing to a poor one is not the same as beating the poor team and losing to the good one. Not precisely the same, anyway. As Matthew Smith pointed out, and as I have discussed at FO, my FEI rating system includes a 'relevance' factor as part of the opponent-adjustment. Generally speaking, this means that games against teams of similar strength receive more weight in my formula, but I also add more weight to bad losses.
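To make the 'relevance' idea concrete, here is a minimal sketch. It is not the actual FEI formula, which isn't spelled out here. The function name `relevance_weight`, the exponential decay, and the `scale` and `bad_loss_bonus` parameters are all assumptions chosen only to illustrate the two properties described above: closer matchups count more, and bad losses count extra.

```python
import math

def relevance_weight(team_rating, opp_rating, won, scale=10.0, bad_loss_bonus=0.5):
    """Hypothetical per-game weight (illustrative only, not FEI's formula).

    - Evenly matched teams get a weight of 1.0.
    - The weight decays as the rating gap between the teams grows.
    - A loss to a lower-rated opponent ("bad loss") gets its weight bumped up.
    """
    gap = abs(team_rating - opp_rating)
    weight = math.exp(-gap / scale)  # assumed decay shape; 1.0 when gap == 0

    bad_loss = (not won) and opp_rating < team_rating
    if bad_loss:
        # Move the weight part of the way back toward 1.0.
        weight += bad_loss_bonus * (1.0 - weight)
    return weight

# A 20-point-better team in three situations:
close_win = relevance_weight(20.0, 18.0, won=True)    # near-peer opponent
blowout_win = relevance_weight(20.0, 0.0, won=True)   # much weaker opponent
bad_loss = relevance_weight(20.0, 0.0, won=False)     # loss to that weaker opponent
```

With these assumed settings, the near-peer game carries the most weight, the win over the much weaker team carries the least, and the loss to that same weak team lands in between.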
In the current issue of ESPN the Magazine, Peter Keating wrote a college football piece about the SRS (Simple Rating System), trumpeting it as the best rating system in existence. It's absurd to consider any individual system the best without a lot of data to back it up, and Keating's column is pretty thin in that regard. Aside from the hyperbole, the article did get me thinking about a key aspect of SRS that Keating valued: the output is represented as each team's "points better than average." For example, Alabama is 30 points better than an average team, Oregon is 28 points better, Notre Dame is only 22 points better, etc.
Most systems represent their data this way. Maybe not in terms of points, but in terms of relationship to average. FEI is represented this way. The thing that jumped out at me from Keating's article was that we take the transitive property for granted when we represent our data this way. Alabama is 30 points better than average, Notre Dame is 22 points better than average, ergo, Alabama is 8 points better than Notre Dame. It makes intuitive sense to draw that conclusion, but I wonder if that's actually what the numbers themselves mean. Does the gap between a great team and an average team tell us, in a linear way, how that team would fare against another great team?
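For reference, here is a minimal sketch of a simple-rating-system style calculation, assuming the common formulation in which a team's rating is its average scoring margin plus the average rating of its opponents. The four teams and their scores are made up, and `simple_ratings` is a hypothetical helper name. The point of the sketch is that the "A minus B" head-to-head inference comes from the additive structure of the model, not from anything the data demonstrates.

```python
def simple_ratings(games, iterations=200):
    """Iteratively solve rating = average margin + average opponent rating.

    games: list of (team_a, team_b, margin), margin = team_a points - team_b points.
    Returns a dict of ratings centered on zero ("points better than average").
    Assumes a connected schedule that is not perfectly two-sided (bipartite),
    otherwise this simple iteration may not settle.
    """
    teams = sorted({t for a, b, _ in games for t in (a, b)})
    results = {t: [] for t in teams}  # team -> list of (opponent, margin)
    for a, b, margin in games:
        results[a].append((b, margin))
        results[b].append((a, -margin))

    ratings = {t: 0.0 for t in teams}
    for _ in range(iterations):
        new = {}
        for t in teams:
            n = len(results[t])
            avg_margin = sum(m for _, m in results[t]) / n
            avg_opp = sum(ratings[o] for o, _ in results[t]) / n
            new[t] = avg_margin + avg_opp
        mean = sum(new.values()) / len(new)
        ratings = {t: r - mean for t, r in new.items()}  # keep average at zero
    return ratings

# Made-up round robin among four fictional teams.
games = [
    ("A", "B", 7), ("A", "C", 14), ("A", "D", 21),
    ("B", "C", 3), ("B", "D", 10), ("C", "D", 7),
]
ratings = simple_ratings(games)
implied_a_over_b = ratings["A"] - ratings["B"]  # the "transitive" inference
```

In this toy schedule, A comes out about 10.5 points better than average and B about 1.5, so the model implies A is 9 points better than B, even though A actually beat B by 7. The subtraction is valid inside the model by construction; whether it describes real matchups between two strong teams is exactly the open question.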
I represent strength of schedule as an elite team's likelihood of going undefeated against the schedule. That's quite different from measuring its ability to play against a schedule full of average teams. Should the rating system be reconstructed to measure each team against an elite team rather than an average one?
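As an illustration of why the two views of schedule strength can diverge, here is a minimal sketch. It assumes a logistic mapping from rating gap to win probability and treats games as independent. Both are simplifying assumptions, and the `scale` value, the ratings, and the function names are hypothetical rather than taken from FEI.

```python
import math

def win_probability(team_rating, opp_rating, scale=7.0):
    """Assumed logistic link between rating gap (in points) and win chance."""
    return 1.0 / (1.0 + math.exp(-(team_rating - opp_rating) / scale))

def undefeated_probability(elite_rating, schedule_ratings, scale=7.0):
    """Chance an elite team wins every game, assuming independent games."""
    prob = 1.0
    for opp_rating in schedule_ratings:
        prob *= win_probability(elite_rating, opp_rating, scale)
    return prob

elite = 25.0  # hypothetical elite team, 25 points better than average

# Two schedules with the same average opponent rating (zero):
all_average = [0.0, 0.0, 0.0, 0.0]
top_heavy = [20.0, -20.0, 20.0, -20.0]

p_all_average = undefeated_probability(elite, all_average)
p_top_heavy = undefeated_probability(elite, top_heavy)
```

Under these assumptions, the top-heavy schedule is much harder to run undefeated than the all-average one, even though an average-opponent measure would rate the two schedules as equal. A lower undefeated probability indicates a tougher schedule.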