Hey all,
Pretty pumped about finding this discussion board. I'm new to programming, but have been dabbling in SQL and R for a little over a month now. Here is my best attempt to keep this question short:
Last week, I created a script for the Lahman Database to examine when players peaked offensively (using Bill James' "technical" runs created formula and OPS) in relation to their age, the position they played, their total plate appearances and games they've played up until that year of their career, the nth season of their career in which they peaked, ect. Some more parameters I included were only selecting players after 1955 (when they started recording SF) and all players born before 1980 - meaning any player still playing today would be at least 35 years old and hypothetically has probably already reached and surpassed their career peak. Also, each player had to of received at least 500 plate appearances in that season.
The problem I have run into: My current query brings back the same player multiple times and I'm trying to only return their best offensive season (ex: if you run the script, Barry Bonds' 4 best seasons will all be at the top of the list and I would only like it to return his single greatest season - according to runs created per plate appearance. I've tried numerous SELECT DISTINCT's, HAVING/GROUP BY clauses, and nested SELECT's, but haven't been able to work it out. If anyone could assist me in any way possible, I would appreciate it so much! Thanks!
Here is the current query and I'll attach the current exported results (DataFrame_Wduplicates.csv):
"
SELECT CONCAT(m.nameFirst, ' ',
m.nameLast) AS Player,
b.teamID AS Team,
b.yearID - m.birthYear AS Age,
f.POS AS Pos,
b.yearID AS Current_Year,
b.AB+b.BB+b.HBP+b.SF+b.SH AS PA,
((b.H+b.BB+b.HBP)/(b.AB+b.BB+b.HBP+b.SF)+((b.H+b.2B+2*b.3B+3*b.HR)/b.AB))
AS OPS,
((b.H+b.BB+b.HBP-b.CS-b.GIDP)*((b.H+b.2B+2*b.3B+3*b.HR)+.26*(b.H+b.HBP-b.IBB)+.52*(b.SH+b.SF+b.SB)))/(b.AB+b.BB+b.HBP+b.SH+b.SF)
AS RC,
((b.H+b.BB+b.HBP-b.CS-b.GIDP)*((b.H+b.2B+2*b.3B+3*b.HR)+.26*(b.H+b.HBP-b.IBB)+.52*(b.SH+b.SF+b.SB)))/(b.AB+b.BB+b.HBP+b.SH+b.SF)
/ (b.AB + b.BB + b.HBP + b.SF+b.SH) as RC_per_PA,
(SELECT SUM(b2.g) FROM Batting
b2
WHERE b2.playerID = b.playerID
AND b2.yearID <= b.yearID) AS
Total_Games,
(SELECT
SUM(b3.ab+b3.bb+b3.sf+b3.hbp+b3.sh) FROM Batting b3
WHERE b3.playerID = b.playerID
AND b3.yearID <= b.yearID) AS
Total_PA,
(SELECT b.yearID-MIN(b4.yearID)
FROM Batting b4
WHERE b4.playerID = b.playerID
AND b4.yearID <=b.yearID) AS
nth_Season
FROM Batting b,
Master m,
Fielding f
WHERE b.playerID = m.playerID
AND b.playerID = f.playerID
AND b.yearID = f.yearID AND
f.POS != 'OF'
AND b.yearID >= 1955 AND f.GS
>= 100
AND m.birthYear < 1980
HAVING PA > 500
ORDER BY RC_per_PA DESC;
"