Revisiting the Kirby Index

By Michael Rosen

Right after FanGraphs published my piece on the Kirby Index, the metric's namesake lost his touch. George Kirby's trademark command -- so reliable that I felt comfortable naming a statistic after him -- fell off a cliff. While the walk rate remained under control, the home run rate spiked; he allowed seven home runs in May, all on pitches where he missed his target by a significant margin.

Watching the namesake of my new metric turn mediocre immediately following publication was among the many humbling experiences of publishing this story. Nevertheless, I wanted to revisit the piece. For one, it's December. And writing the story led me down a fascinating rabbit hole: While I learned that the Kirby Index has its flaws, I also learned a ton about contemporary efforts to quantify pitcher command.

But first, what is the Kirby Index? I found that release angles, in concert with release height and width, almost perfectly predicted the location of a pitch. If those variables told you almost everything about where a pitch would end up, then measuring their pitch-to-pitch variation for individual pitchers could theoretically provide novel information about pitcher command.

This got a few people mad on Twitter, including baseball's eminent physicist Alan Nathan and Greg Rybarczyk, the creator of the "Hit Tracker" and a former member of the Red Sox front office. These two -- particularly Rybarczyk -- took issue with my use of machine learning to make these predictions, arguing that my use of machine learning suggested I didn't understand the actual mechanics of why a pitch goes where it goes.

"You're spot on, Alan," wrote Rybarczyk. "The amazement that trajectory and launch parameters are strongly associated with where the ball ends up can only come from people who see tracking data as columns of digits rather than measurements of reality that reflect the underlying physics."

While the tone was a bit much, Rybarczyk had a point. My "amazement" would have been tempered with a more thorough understanding of how Statcast calculates the location where a pitch crosses home plate. After publication, I learned that the nine-parameter fit explains why pitch location could be so powerfully predicted by release angles.

The location of a pitch is derived from the initial velocity, initial release point, and initial acceleration of the pitch in three dimensions. (These are the nine parameters.) Release angles are calculated using initial velocity and initial release point. Because the location of the pitch and the release angle are both derived from the 9P fit, it makes sense that they'd be almost perfectly correlated.
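
To make this concrete, here's a minimal Python sketch of the derivation, using Statcast's conventions (initial conditions reported at y = 50 feet, location measured where the ball crosses the front of home plate at y = 17/12 feet). The function names are mine, and the physics is nothing more than constant-acceleration kinematics:

```python
import numpy as np

def plate_location(pos0, vel0, acc):
    """Where the pitch crosses the front of home plate, given the nine
    parameters: initial position, velocity, and acceleration in three
    dimensions (Statcast reports these at y = 50 ft)."""
    (x0, y0, z0), (vx0, vy0, vz0), (ax, ay, az) = pos0, vel0, acc
    # Solve 0.5*ay*t^2 + vy0*t + (y0 - 17/12) = 0 for the flight time t.
    # vy0 is negative (the ball travels toward the plate), so this root
    # is the small positive one.
    t = (-vy0 - np.sqrt(vy0**2 - 2.0 * ay * (y0 - 17.0 / 12.0))) / ay
    plate_x = x0 + vx0 * t + 0.5 * ax * t**2
    plate_z = z0 + vz0 * t + 0.5 * az * t**2
    return plate_x, plate_z

def release_angles(vel0):
    """The release angles fall out of the same fit: they're just the
    direction of the initial velocity vector."""
    vx0, vy0, vz0 = vel0
    vertical = np.degrees(np.arctan2(vz0, -vy0))
    horizontal = np.degrees(np.arctan2(vx0, -vy0))
    return vertical, horizontal
```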

This led to a reasonable critique: If release angles are location information in a different form, why not just apply the same technique of measuring variation on the pitch locations themselves? This is a fair question. But using locations would have undermined the conclusion of that Kirby Index piece -- that biomechanical data like release angles could improve the precision of command measurements.

Teams, with their access to KinaTrax data, could create their own version of the Kirby Index, not with implied release angles derived from the nine-parameter fit, but with the position of wrists and arms captured at the moment of release. The Kirby Index piece wasn't just about creating a new way to measure command; I wanted it to point toward one specific way that the new data revolution in baseball would unfold.

But enough about that. It's time for the leaderboards. I removed all pitchers with fewer than 500 fastballs. Here are the top 20 in the Kirby Index for the 2024 season:

And here are the bottom 20:

A few takeaways for me: First, I am so grateful Kirby got it together and finished in the top three. Death, taxes, and George Kirby throwing fastballs where he wants. Second, the top and bottom of the leaderboards are satisfying. Cody Bradford throws 89 and lives off his elite command, and Joe Boyle -- well, there's a reason the A's threw him in as a piece in the Jeffrey Springs trade despite his otherworldly stuff. Third, there are guys on the laggard list -- Seth Lugo and Miles Mikolas, in particular -- who look out of place.

Mikolas lingered around the bottom of the leaderboards all year, which I found curious. Mikolas, after all, averages just 93 mph on his four-seam fastball; one would imagine such a guy would need elite command to remain a viable major league starter, and that league-worst command would effectively be a death sentence. Confusing things further, Mikolas avoided walks better than almost anyone.

Why Mikolas ranked so poorly in the Kirby Index while walking so few hitters could probably be the subject of its own article, but for the purposes of this story, it's probably enough to say that the Kirby Index misses some things.

An example: Mikolas ranked second among all pitchers in arm angle variation on four-seam fastballs, suggesting that Mikolas is intentionally altering his arm angle from pitch to pitch, likely depending on whether the hitter is left-handed or right-handed. This is just one reason why someone might rank low in the Kirby Index. Another, as I mentioned in the original article, is that a pitcher like Lugo might be aiming at so many different targets that it fools a metric like the Kirby Index.

So: The Kirby Index was a fun exercise, but there are some flaws. What are the alternatives to measuring pitcher command?

Location+ is the industry standard. The FanGraphs Sabermetric Library (an incredible resource, it must be said) does a great job of describing the metric, so I'd encourage you to read the full description there. The short version: run values are assigned to each location for every pitch type and count, and each pitch is then graded on its location alone, neutral of stuff.

Nobody seems particularly satisfied with Location+, including its creators. Because each count state and each pitch type uses its own run value map to distribute grades, the statistic takes a long time to stabilize -- upward of hundreds of pitches. It also isn't particularly sticky from year to year.

The newest version of Location+, which will debut sometime in the near future, will use a similar logic to PitchProfiler's command model. Essentially, PitchProfiler calculates a Stuff+ and a Pitching+ for each pitcher, which are set on a run value scale. By subtracting the Stuff+ run value from the Pitching+ run value, the model backs into the value a pitcher gets from their command alone.
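
In pseudo-numbers (mine, purely illustrative), the subtraction looks like this:

```python
# Purely illustrative numbers, on a runs-above-average scale.
stuff_rv = -0.35     # run value implied by Stuff+ (physical pitch characteristics only)
pitching_rv = -0.80  # run value implied by Pitching+ (characteristics plus location)

# Whatever Pitching+ captures beyond Stuff+ gets attributed to command.
command_rv = pitching_rv - stuff_rv  # -0.45 runs: the location/command contribution
```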

Whether it's measuring the standard deviation of release angle proxies or of the actual pitch locations themselves, these approaches can be grouped under the "blob" method: each assesses the cluster tightness of the chosen variable.
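
A minimal sketch of that common core, assuming per-pitch arrays of whichever two variables you pick (release angles for the Kirby Index, plate coordinates for the location-based version):

```python
import numpy as np

def blob_tightness(u, v):
    """Generic 'blob' score: how tightly pitches cluster in some 2-D
    variable. Pass vertical/horizontal release angles for a Kirby
    Index-style measure, or plate_x/plate_z for the location version.
    Lower = tighter cluster = better implied command."""
    return float(np.hypot(np.std(u), np.std(v)))
```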

Max Bay, now a senior quantitative analyst with the Dodgers, advanced the Kirby Index method by measuring release angle "confidence ellipses," allowing for a more elegant unification of the vertical and horizontal release angle components.
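
Bay's exact implementation isn't reproduced here, but the standard construction is straightforward: fit the 2x2 covariance of the two angle components, then take the area of the resulting confidence ellipse, which (unlike two separate standard deviations) accounts for correlated vertical and horizontal misses. A sketch:

```python
import numpy as np
from scipy.stats import chi2

def release_angle_ellipse_area(vert_angles, horiz_angles, conf=0.95):
    """Area of the joint release-angle confidence ellipse; a smaller
    area means a tighter release and better implied command."""
    cov = np.cov(np.vstack([vert_angles, horiz_angles]))
    return np.pi * chi2.ppf(conf, df=2) * np.sqrt(np.linalg.det(cov))
```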

The central concern with the Kirby Index and all the blob methods, as I stated at the time, is the single-target assumption. Ideally, instead of looking at how tightly all of a pitcher's pitches are clustered around a single point, each pitch would be evaluated based on how close it finished to its actual target.

But targets are hard to come by. Sportvision started tracking them in the mid-2010s, as Eno Sarris outlined in his piece on the state of command research in 2018. These days, Driveline Baseball measures them in partnership with Inside Edge, which deploys human beings to manually tag the target location for every single pitch. With these data in hand, Driveline can do a couple of things. First, they created a Command+ model, adjusting mean miss distances for the difficulty of the target and the shape of the pitch.
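
The base quantity is simple once targets exist; the difficulty and pitch-shape adjustments are Driveline's modeling on top of it. A sketch, assuming parallel arrays of actual and tagged-target locations in feet:

```python
import numpy as np

def mean_miss_distance(plate_x, plate_z, target_x, target_z):
    """Average Euclidean miss from the tagged target, in feet -- the raw
    input that a Command+ style model then adjusts for target difficulty
    and pitch shape."""
    return float(np.mean(np.hypot(plate_x - target_x, plate_z - target_z)))
```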

Using intended zone data, Driveline also shows pitchers where exactly they should aim to account for their miss tendencies. I'm told they will be publishing this methodology in a public post soon.

In a perfect world, computers would replace human beings -- wait, let me try that sentence again. It is expensive and time-intensive to manually track targets through video, and so for good reason, miss target data belong to those who are willing to pay the price. Computer vision techniques present the potential to produce the data cheaply and (therefore) democratically.

Carlos Marcano and Dylan Drummey introduced their BaseballCV project in September. (Drummey was hired by the Cubs shortly thereafter.) Joseph Dattoli, the director of player development at the University of Missouri, offered a contribution to the project by demonstrating how computer vision could be used to tag catcher targets. The only limitation, Joseph pointed out, is the computing power required to comb through video of every single pitch.

There are some potential problems with any command measurement that depends on target tracking. Targets aren't always real targets; sometimes they're more like cues for the pitcher to throw in that general direction. But Joseph gets around this concern by tracking the catcher's center of mass as well as his glove, since the former is less susceptible to these sorts of dekes. Still, there's a way to go before this method scales into a form where daily leaderboards are accessible.

Absent a raft of public information about actual pitcher targets, there instead can be an effort to simulate them. In his 2023 presentation, "Pitch trajectory density estimation for predicting future outcomes," Rice professor Scott Powers proposed a method to account for the random variation in pitch trajectories, in the process offering a framework for simulating something like a target. (I will likely butcher his methods if I try to summarize them, so I'd encourage you to watch the full presentation if you're interested.)

The Powers method was modified by Stephen Sutton-Brown at Baseball Prospectus, who used Blake Snell as an example of how these targeting models can be applied at scale to assess individual pitchers. First, Sutton-Brown fit a model that created a global target for each pitch type, adjusting for the count and the handedness of the batter. Then, for each pitcher, that global target was tweaked to account for the pitcher's tendencies. Using these simulated targets, he calculated each pitcher's average miss distance, allowing him to separate the run value of a pitcher's targets from the run value of his command ability.
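
As a rough sketch of the idea -- not Sutton-Brown's actual model; the grouping variables and the shrinkage constant below are my simplifications -- in pandas:

```python
import numpy as np
import pandas as pd

def add_simulated_targets(df, shrink=50.0):
    """Simulated-target sketch: the global target is the average location
    for each pitch type / count / batter hand, and each pitcher's target
    moves from that global target toward his own average location as his
    sample grows. `shrink` is a hypothetical pseudo-count.

    Expects columns: pitcher, pitch_type, balls, strikes, batter_hand,
    plate_x, plate_z."""
    ctx = ["pitch_type", "balls", "strikes", "batter_hand"]
    for c in ("plate_x", "plate_z"):
        league = df.groupby(ctx)[c].transform("mean")            # global target
        own = df.groupby(["pitcher"] + ctx)[c].transform("mean")
        n = df.groupby(["pitcher"] + ctx)[c].transform("size")
        w = n / (n + shrink)   # trust the pitcher's own mean more as pitches accumulate
        df[f"target_{c}"] = w * own + (1.0 - w) * league
    df["miss_ft"] = np.hypot(df["plate_x"] - df["target_plate_x"],
                             df["plate_z"] - df["target_plate_z"])
    return df

# Per-pitcher average miss distance from the simulated targets:
# add_simulated_targets(pitches).groupby("pitcher")["miss_ft"].mean()
```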

On Twitter, I asked Lance Brozdowski what he saw as the gold standard command metric. He answered "Nothing," which sums up the problem well. This is a challenging question, and all the existing methods have their flaws.

There are ways that the Kirby Index could be improved, but as far as I can tell, the best way forward for public command metrics is some sort of combination of the final two methods, with active monitoring of the computer vision advancements to see if consistent targets can be established.

But one would imagine the story is completely different on the team side. By marrying the KinaTrax data with miss distance information, these methods could potentially be combined to make some sort of super metric, one that I imagine gets pretty close to measuring the true command ability of major league pitchers. (In a video from Wednesday, Brozdowski reported on some of the potential of these data for measuring and improving command, as well as their limitations.) The public might not be quite there, but as far as I can tell, we're not that far off.
