WAR, what is it good for?

An essay about advanced analytics and Wins Above Replacement which will be way more interesting than you think...

Feb 24, 2023

The first version of this piece was written back in 2014 and has been sitting around on my laptop ever since, mostly because back then nobody wanted to hear what I had to say, but now that I can afford to not give a damn, I finally decided to update it and post it.

It was written about the original version of the baseball metric WAR. Now there’s fWAR and rWAR and bWAR and WARP (which suggests people still disagree about how to calculate this metric), but just last night I saw a TV graphic about a player that included his WAR, so the metric is still being used by the media and that’s worth talking about.

And if you hate all this stuff about baseball, you should know some readers complain I need to write more of it. Either way, I’ll write about another subject soon, although if you think about it – and I have – most of the time I’m writing and drawing about dumb stuff people do (which often includes me) and today is no exception, so keep reading and give it a chance.

Wins Above Replacement and the media’s use of advanced metrics

If terrorists ever take over a major league press box and threaten to execute every reporter who ever quoted the Wins Above Replacement metric in a story, but can’t recite the Wins Above Replacement formula, I’m pretty sure nobody would get out alive.

So why do reporters quote a metric we don’t fully understand?

It’s convenient.

A reporter doesn’t have to learn why it’s important to be able to throw an off-speed pitch in a 2-1 count, what a good route to a fly ball looks like or which play reveals an infielder’s arm strength. Instead of hanging out for hours and talking to players and coaches – which if you don’t really love baseball is a pain in the ass – just go to a website, look up a number and without much effort you can still sound like an expert.

And editors (who are generally just as mystified by advanced analytics as the rest of us) will publish those numbers because they sound scientific. We know that a big WAR number is better than a small WAR number, but are pretty much clueless as to how those numbers are put together.

So today, let’s open the hood and take a look.

A quote about WAR from the baseball website FanGraphs:

“Wins Above Replacement (WAR) is an attempt by the sabermetric baseball community to summarize a player’s total contributions to their team in one statistic. You should always use more than one metric at a time when evaluating players, but WAR is all-inclusive and provides a useful reference point for comparing players. WAR offers an estimate to answer the question, “If this player got injured and their team had to replace them with a freely available minor leaguer or a AAAA player from their bench, how much value would the team be losing?” This value is expressed in a wins format, so we could say that Player X is worth +6.3 wins to their team while Player Y is only worth +3.5 wins, which means it is highly likely that Player X has been more valuable than Player Y.

WAR is not meant to be a perfectly precise indicator of a player’s contribution, but rather an estimate of their value to date. Given the imperfections of some of the available data and the assumptions made to calculate other components, WAR works best as an approximation.”

OK, now take a deep breath and think

That explanation raises as many questions as it answers. Pay attention to words like “estimate” and “approximation” and “assumption” and the phrase “highly likely.”

The sabermetric community will offer vague conclusions expressed in specific numbers: Player X is worth +6.3 wins while Player Y is worth +3.5 wins, which sounds pretty damn specific and gives a scientific appearance to what FanGraphs then admits is a guesstimate.

Slog through the fine print of advanced analytics explanations and you often find a disclaimer about the accuracy of the numbers presented, but then those disclaimers are ignored or forgotten and the numbers are treated like Moses carved them on stone tablets.

Now let’s move on to the dubious claim that when estimating a player’s value “WAR is all-inclusive.”

Unless WAR manages to put a number on guys who show up hungover for day games, skip early work, or make teammates worse by encouraging them to go out and party (and players have been traded for doing just that), WAR is not “all-inclusive.”

One of the complaints ballplayers have about analytics is the tendency to ignore or discount anything it doesn’t know how to measure, and analytics advocates have yet to develop a metric that deducts points for banging a teammate’s wife, although now that I think about it, there clearly ought to be one.

There are things that matter that can’t be measured.

Now back to FanGraphs:

The WAR formula

“While WAR is not as complicated as some might think, it does require a good bit of information to calculate and understand.

Calculating WAR, especially for position players, is simpler than you’d think.

To calculate WAR for position players you want to take their Batting Runs, Base Running Runs, and Fielding Runs above average and then add in a positional adjustment, a small adjustment for their league, and then add in replacement runs so that we are comparing their performance to replacement level rather than the average player. After that, you simply take that sum and divide it by the runs per win value of that season to find WAR. The simple equation looks something like this:

WAR = (Batting Runs + Base Running Runs + Fielding Runs + Positional Adjustment + League Adjustment + Replacement Runs) / (Runs Per Win)”
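For anyone who wants to see the arithmetic in that quote laid out, here is a minimal Python sketch of the final step only. The function name, the parameter names and the input numbers are all made up for illustration. It is not FanGraphs’ code, and it assumes somebody has already handed you each of the run components.

```python
def position_player_war(batting_runs, base_running_runs, fielding_runs,
                        positional_adjustment, league_adjustment,
                        replacement_runs, runs_per_win):
    """Final step of the quoted equation: add the run components,
    then divide by that season's runs-per-win value."""
    if runs_per_win <= 0:
        raise ValueError("runs_per_win must be positive")
    total_runs = (batting_runs + base_running_runs + fielding_runs
                  + positional_adjustment + league_adjustment
                  + replacement_runs)
    return total_runs / runs_per_win


# Hypothetical inputs, invented for this example; not a real player.
example_war = position_player_war(
    batting_runs=30.0,
    base_running_runs=2.0,
    fielding_runs=5.0,
    positional_adjustment=-3.0,
    league_adjustment=1.0,
    replacement_runs=20.0,
    runs_per_win=10.0,
)
print(round(example_war, 1))  # 55 total runs / 10 runs per win = 5.5
```

The division really is the easy part. Every one of those seven inputs has its own formula behind it, which is where the trouble starts.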

OK, right about here let’s take a moment and come up for air.

FanGraphs might believe WAR is not as complicated as some might think, but I have to pop a Xanax and get the advice of a Certified Public Accountant whenever I balance my checkbook, so I’m guessing I’m not part of their target audience, and if you don’t think a pocket protector makes a swell Christmas gift, it’s a pretty good bet you aren’t either.

Remember: this is what analytics advocates consider simple, so now let’s see what it looks like when things get a bit more complicated.

Batting Runs: a look at how just one part of WAR is calculated

“To calculate Batting Runs Above Average you only need to know three things about a player and several things about the league in general. You need the player’s wOBA, PA, and home park factor and you need League Average wOBA (lgwOBA), the wOBA Scale, MLB R/PA (lgR/PA), and the specific league (AL or NL) wRC and PA for non-pitchers.

The first step is to find Weighted Runs Above Average (wRAA) from the player’s wOBA, or you may simply find their wRAA on FanGraphs. To calculate their wRAA, do the following:

wRAA = ((wOBA – lgwOBA)/wOBA Scale) * PA

wRAA is simply a non-adjusted Batting Runs. To adjust wRAA for park and league, you do the following with the park factor expressed as a decimal (i.e. 0.95 for 95):

Batting Runs = wRAA + (lgR/PA – (PF*lgR/PA))*PA + (lgR/PA – (AL or NL non-pitcher wRC/PA))*PA”
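Written as code, the two quoted formulas come out something like the sketch below. The function and parameter names are mine, the inputs are invented, and the sketch assumes the park term uses league runs per plate appearance (lgR/PA), since that is the only league runs rate the quote lists among its ingredients.

```python
def weighted_runs_above_average(woba, lg_woba, woba_scale, pa):
    """wRAA = ((wOBA - lgwOBA) / wOBA Scale) * PA"""
    return ((woba - lg_woba) / woba_scale) * pa


def batting_runs(wraa, pa, park_factor, lg_r_per_pa, league_wrc_per_pa):
    """Adjust wRAA for park and league, following the quoted formula.

    park_factor is a decimal (0.95 for a park factor of 95).
    league_wrc_per_pa is the AL or NL non-pitcher wRC/PA.
    """
    park_adjustment = (lg_r_per_pa - park_factor * lg_r_per_pa) * pa
    league_adjustment = (lg_r_per_pa - league_wrc_per_pa) * pa
    return wraa + park_adjustment + league_adjustment


# Hypothetical numbers, invented for this example; not a real player or season.
example_wraa = weighted_runs_above_average(
    woba=0.400, lg_woba=0.320, woba_scale=1.25, pa=600)
example_batting_runs = batting_runs(
    wraa=example_wraa, pa=600, park_factor=0.95,
    lg_r_per_pa=0.120, league_wrc_per_pa=0.121)
print(round(example_wraa, 1), round(example_batting_runs, 1))
```

Notice that the code only works if someone supplies the league wOBA, the wOBA scale, the park factor and the league run rates, and each of those is itself an estimate.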

After reading that formula I’d expect to be able to make an atomic bomb in my basement or distill rocket fuel from household cleaners, not hazard what amounts to a wild-ass guess about a ballplayer’s worth.

But as they say in the Ginsu knife commercials: wait…there’s more.

That’s just the beginning

That’s just the formula for batting runs; we still have Base Running Runs, Fielding Runs, Positional Adjustment, League Adjustment, Replacement Runs and Runs Per Win to factor in.

And according to FanGraphs, calculating WAR for pitchers is even more complicated.

Get even one factor wrong – include something that shouldn’t be included or fail to include something that should – and the formula won’t be accurate. And to top it off, analytics advocates don’t agree on how Wins Above Replacement should be calculated.

This next quote is from Baseball Reference:

“There is no one way to determine WAR. There are hundreds of steps to make this calculation, and dozens of places where reasonable people can disagree on the best way to implement a particular part of the framework. We have taken the utmost care and study at each step in the process, and believe all of our choices are well reasoned and defensible. But WAR is necessarily an approximation and will never be as precise or accurate as one would like.”

After jumping through all the mathematical hoops, you still wind up with a number that’s an “estimate” or an “approximation.” Something any big league coach could offer without all the smoke and mirrors.

Undue complication

In the Big Leagues, player skills are rated from 20-to-80 with 50 being major league average (or, depending on the team, 2-to-8 with 5 being average), so when a coach says a right fielder has a 65 arm everybody knows what he means: that guy’s arm is better than average, but not top of the line.
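By way of contrast, the whole scouting scale fits in a few lines of Python. This is a rough sketch with made-up function names, and it assumes the 2-to-8 version is simply the 20-to-80 grade divided by ten, which is how the two scales line up at average (50 and 5).

```python
def to_short_scale(grade):
    """Convert a 20-to-80 grade to the 2-to-8 scale (assumed: divide by 10)."""
    if not 20 <= grade <= 80:
        raise ValueError("grade must be between 20 and 80")
    return grade / 10


def describe_grade(grade):
    """Say where a 20-to-80 grade sits relative to major league average (50)."""
    if not 20 <= grade <= 80:
        raise ValueError("grade must be between 20 and 80")
    if grade > 50:
        return "above major league average"
    if grade < 50:
        return "below major league average"
    return "major league average"


print(to_short_scale(65), describe_grade(65))
```

No league constants, no park factors and nothing to look up on a website.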

Everybody understands each other.

You can’t say the same thing about advanced analytics.

Some people get paid for knowing things and some people get paid for keeping other people from knowing things. Doctors and lawyers and people trying to get you to invest in cryptocurrency use language designed to obscure, not reveal, because that’s part of their mystique and why they get paid: you don’t know what they’re talking about and you’re not part of their club, so sit down, shut the fuck up and let them do their thing.

The people who investigate financial hijinks say “undue complication” is often a sign of fraud: make it so complicated nobody understands or questions what you’re doing.

Go back to the explanation of WAR and FanGraphs describes it as a “useful reference point for comparing players.”

But instead of comparing an existing player to other existing players – like how does your second baseman stack up to every other second baseman in the American League, which might actually be useful – they compare existing players to a mythical “replacement” player who performs at an imaginary AAAA level and then express the existing player’s worth in a “wins format” that takes a team of squabbling physicists to calculate.

Examine how it’s calculated and it turns out WAR isn’t quite as scientific or useful as its advocates would like us to think, but coming up with advanced metrics is how these guys get a seat at the table and they’re not going to stop doing it.

So the main problem here is the media quoting metrics we don’t fully understand, which gives those metrics credibility with the public. Maybe we ought to do some research and figure out which metrics are worthwhile and which ones aren’t.

Wins Above Replacement is a flawed metric that tells us less than the people who quote it want us to think. It’s time someone said so, and I just did.

Although it took most of a decade for me to do it.