Baseball, Analytics, and Interpretation
THURSDAY- Analytics. It's taken over sports. It has revolutionized each sport in several areas, such as scouting, development, play style, roster construction, progress and player evaluation, front offices, and pretty much anything else associated with professional sports. Any time a team elects to go for it on fourth down in a football game, the network will show the analytical percentage of converting and the consequences of each outcome. What started out as a highly debated and not openly embraced approach has become a focal point for every major sports organization.
So what are analytics, and where did they come from? By definition, analytics is the science of analysis. In other words, it's using data to better determine what leads to specific outcomes. Baseball is given a significant chunk of the credit for bringing analytics into the mainstream, the roots of which can be traced all the way back to 1964, with the release of Earnshaw Cook's book, Percentage Baseball. The publication is one of the first works of sports analytics to receive national attention. Another big pioneer was Bill James, who brought the Society for American Baseball Research (SABR) into national prominence.
The Oakland Athletics are regarded by many as the team that brought analytics into roster construction. This was in response to a change of philosophy from majority ownership, who wanted to cut payroll following the death of former owner Walter A. Haas Jr. in 1995. The A's put together several good teams in the early 2000s utilizing the approach, with the 2002 version of the team being documented in the book and film Moneyball, but they were never able to win a championship.
Regardless, the A's ability to consistently have high-performing teams, despite being near the bottom of MLB in total payroll, forced other franchises to change their approach. A simple example of this change is teams shifting their focus from batting average to on-base plus slugging percentage (OPS) to evaluate hitters. Batting average had been the standard for many years for determining who the best hitters were. It's a simple calculation done by taking the total number of hits a player has and dividing it by their total number of at-bats. One problem with this is that when a player walks, is hit by a pitch, hits a sacrifice, or reaches on catcher's interference, it doesn't count as an at-bat. On-base percentage (OBP) gives a fuller measure of how often a hitter gets on base because it is calculated by taking the number of times a player reaches base by hit, walk, or hit by pitch and dividing it by their plate appearances (officially, at-bats plus walks, hit by pitches, and sacrifice flies).
The other issue with simply using batting average as a means of evaluation is that it doesn't take into account how many bases a player is getting on each hit. Slugging percentage (SLG%) solves this issue by assigning a number to each type of hit (1 for a single, 2 for a double, 3 for a triple and 4 for a home run), adding those values up across all of a player's hits, and then dividing that total by the player's at-bats.
So for example, let's say a player has 23 hits in 100 at-bats. Batting average is simple to find here: 23/100 = .230. That's not a great batting average when compared to the 2023 MLB average of .248. But let's say that alongside those 100 at-bats, they've actually gone to the plate 118 times because they've walked or been hit by a pitch on 18 different occasions, and within those 23 hits they have six home runs, a triple, seven doubles and nine singles. For on-base percentage, we add those 18 walks and hit by pitches to both the player's hits and their at-bats. This gives us 41/118, for an OBP of .347, which indicates the player gets on base at an above-average rate compared to the 2023 MLB average of .320.
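For anyone who wants to check that math, here is a minimal Python sketch of the two calculations. The variable names are my own, and the OBP line uses the simplified version from this example, which assumes the player has no sacrifice flies or other plate appearances beyond at-bats, walks and hit by pitches.

```python
# Stat line from the example above; variable names are illustrative.
hits = 23
at_bats = 100
walks_and_hbp = 18  # walks plus hit by pitches, combined as in the text

# Batting average: hits divided by at-bats.
batting_average = hits / at_bats

# Simplified OBP: times on base divided by plate appearances.
# Assumption: no sacrifice flies, so plate appearances = at-bats + walks/HBP.
plate_appearances = at_bats + walks_and_hbp
on_base_percentage = (hits + walks_and_hbp) / plate_appearances

print(f"AVG: {batting_average:.3f}")     # AVG: 0.230
print(f"OBP: {on_base_percentage:.3f}")  # OBP: 0.347
```

Notice that the same 23 hits look very different once the 18 extra trips to first base are counted.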
Now for slugging. We take their total home runs (6) and multiply by four, giving us 24. Next we take their one triple, multiply by three, and add that to the 24, giving us 27. Then we take their total doubles (7), multiply by two, and add that to the 27, giving us 41. Finally, we add nine for their singles, giving us a final value of 50. We now take 50 and divide it by the at-bats (not plate appearances), which gives us a SLG% of .500. By adding this to our previously found OBP of .347, we find this player has an OPS of .847. If we compare this player's SLG% and OPS to the 2023 MLB averages of .414 and .734, respectively, we once again find that they are an above-average hitter.
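The slugging and OPS steps can be sketched the same way. This is only an illustration of the arithmetic in this example, not anyone's official stat tool. The dictionary names are made up for the sketch, the OBP is carried over as 41/118, and the league averages are the 2023 figures quoted in this column.

```python
# Hit breakdown from the example; names are illustrative.
hit_counts = {"single": 9, "double": 7, "triple": 1, "home_run": 6}
bases_per_hit = {"single": 1, "double": 2, "triple": 3, "home_run": 4}
at_bats = 100
on_base_percentage = 41 / 118  # OBP from the example (about .347)

# Total bases: each hit weighted by the number of bases it is worth.
total_bases = sum(bases_per_hit[kind] * count for kind, count in hit_counts.items())

# Slugging divides by at-bats, not plate appearances.
slugging = total_bases / at_bats
ops = on_base_percentage + slugging

# 2023 MLB averages as quoted in the column.
LEAGUE_SLG, LEAGUE_OPS = 0.414, 0.734

print(f"Total bases: {total_bases}")  # Total bases: 50
print(f"SLG: {slugging:.3f} (above league average: {slugging > LEAGUE_SLG})")
print(f"OPS: {ops:.3f} (above league average: {ops > LEAGUE_OPS})")
```

Run as written, this should land on the same .500 SLG% and .847 OPS worked out by hand.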
With this quick breakdown, it's hard to deny that analytics tell a better overall story about the value of players and teams. But despite these formulas and equations that can calculate things so accurately, baseball is still unpredictable. Every year there's a team that falls short of expectations, like last season's San Diego Padres, who analytically speaking were one of the best and most well-rounded teams in baseball. So are these equations flawed?
No, but perhaps the interpretation of how analytics should be applied is. I don't remember much from my AP Psychology class junior year of high school, but the one thing that has been ingrained in my brain all this time is "correlation does not equal causation". Baseball front office gurus must not have had Mr. Etheridge at La Costa Canyon High School, because baseball has embraced the opposite.
Former big league catcher and World Series champion A.J. Pierzynski mentioned on his podcast, "Foul Territory", that he had interacted with people within the New York Yankees organization who said the organization trains every player the same way, making exit velocity (how fast the ball comes off a player's bat) and launch angle (the angle at which the ball comes off the bat) the focal point of their player development. This is because analytics show a high correlation between a good batting average and balls that are hit hard at a specific launch angle. A great example that was shared was a game the club played during spring training, where players took at-bats versus a live pitcher and could only get on base by either drawing a walk or hitting the ball 95 MPH or harder.
This reveals a dramatic oversimplification of analytics. It highlights the biggest problem with simply looking at data to evaluate players: it removes the human element from games played by humans. While hitting the ball hard and at the right trajectory is fantastic, the process of doing it consistently, especially against big league caliber pitching, is much more complex. Players differ in their physical build and playing abilities, and therefore need their own development plans tailored to them. That shouldn't be overly difficult given how crowded MLB front offices are now and the data available.
Analytics should have baseball heads salivating at what they can do in terms of development, because it should give them the opportunity to be more personal and intentional with each player. I'm sure there are teams that use analytics in an extremely detailed way. I also don't mean to go after the Yankees; I'm not in their organization, so I don't know how factual all of that is. Nevertheless, if you dig, you can find several stories about other organizations that share a similar sentiment.
No matter what tools come along, baseball will always be a heavily situational game. Going back once more to the Padres last season, they were great in almost every significant category besides hitting with runners in scoring position, where the team finished 23rd out of 30 teams, and 19th in runners left in scoring position. Baseball requires the ability to adjust approach and strategic execution to consistently beat high-level teams, and yes, when that is combined with power and guys who take disciplined at-bats, it is the most dangerous.