Modeling Plate Appearance Outcomes in Baseball
A baseball game is a sequence of plate appearances. Every pitch, every swing, every called strike funnels toward a discrete outcome: a single, a double, a walk, a strikeout, a ground ball to shortstop. Prediction models work by estimating the probability of each possible outcome for every batter-pitcher confrontation, then aggregating those probabilities upward into inning-level and game-level forecasts. The plate appearance is where prediction begins.
The challenge is that the outcome space is not simple. A plate appearance can end in more than a dozen distinct ways, and the probability of each depends on the batter's profile, the pitcher's arsenal, the count, the handedness matchup, the park, and increasingly, the measurable quality of contact. Decomposing all of this into a coherent probability distribution is the core task of the plate appearance model.
The Outcome Space
Every plate appearance resolves into one of the following categories: single, double, triple, home run, walk (including intentional), strikeout (swinging or looking), hit by pitch, ground out, fly out, line out, pop out, fielder's choice, sacrifice bunt, sacrifice fly, double play, or error. Some of these are grouped together for modeling purposes. Sacrifice situations, for instance, are often folded into the broader ground out and fly out categories because their frequency is low and the circumstances that produce them are highly context-dependent.
The practical modeling question is how to assign a probability to each of these outcomes for a specific batter-pitcher matchup. The simplest approach uses historical rates: over the past three seasons, this batter has hit a single 16% of the time, walked 9% of the time, struck out 22% of the time, and so on. But raw rates ignore the matchup. The same batter's outcome distribution changes depending on whether he faces a left-handed sinker-slider pitcher or a right-handed four-seam-changeup pitcher. The model needs to account for this.
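One common baseline for folding the pitcher into the batter's historical rates is a multinomial odds-ratio (log5-style) rule: multiply the batter's rate by the pitcher's rate, divide by the league rate, and renormalize. The sketch below is an illustration of that general technique, not necessarily the method any particular model uses. The function name is hypothetical, and so are the pitcher and league rates; only the batter's single, walk, and strikeout rates echo the example above.

```python
def matchup_distribution(batter, pitcher, league):
    """Combine batter, pitcher, and league outcome rates with a
    multinomial odds-ratio (log5-style) rule, then renormalize."""
    raw = {o: batter[o] * pitcher[o] / league[o] for o in league}
    total = sum(raw.values())
    return {o: v / total for o, v in raw.items()}


# Hypothetical rates; "other" lumps every remaining outcome together.
batter = {"single": 0.16, "walk": 0.09, "strikeout": 0.22, "other": 0.53}
pitcher = {"single": 0.14, "walk": 0.07, "strikeout": 0.28, "other": 0.51}
league = {"single": 0.15, "walk": 0.08, "strikeout": 0.22, "other": 0.55}

dist = matchup_distribution(batter, pitcher, league)
for outcome, prob in dist.items():
    print(f"{outcome:10s} {prob:.3f}")
```

Against a league-average pitcher the rule returns the batter's own rates unchanged, which makes it a convenient sanity check. It still treats batter and pitcher as separable inputs, so it is a starting point rather than a full matchup model.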
Pitch Mix and Outcome Probabilities
A pitcher's arsenal shapes the outcome distribution before a single pitch is thrown. A pitcher who relies heavily on a four-seam fastball and a slider will induce a different mix of contact types than one who throws primarily sinkers and changeups. Fly-ball pitchers produce more home runs and fly outs. Ground-ball pitchers produce more ground outs and double plays. The pitch mix determines the texture of the outcome distribution.
Models incorporate pitch-level data by building pitcher profiles that capture the usage rate, velocity, movement, and effectiveness of each pitch type. A sweeping slider with well above-average horizontal break and a 38% whiff rate is a different weapon than a tighter slider with modest break and a 22% whiff rate. These characteristics translate into outcome-level effects. More whiffs mean more strikeouts. More weak contact means more ground outs. Better pitch tunneling means fewer hard-hit outcomes overall.
How Pitch Type Influences Outcome Distribution
| Pitch Type | Primary Outcomes Affected | Key Metric |
|---|---|---|
| Four-Seam Fastball | Strikeout, fly out, home run | Whiff rate, induced vertical break |
| Sinker | Ground out, double play, single | Ground ball rate, induced arm-side run |
| Slider | Strikeout, weak contact, fly out | Whiff rate, horizontal break |
| Changeup | Ground out, weak fly out, strikeout | Velocity differential, drop |
| Curveball | Strikeout (looking), pop out | Called strike rate, vertical drop |
| Cutter | Weak contact, jam shots, ground out | Horizontal movement, velocity |
The interaction between pitch mix and batter tendencies creates the matchup-specific probability distribution. A batter who chases sliders at a 40% rate will have an elevated strikeout probability against a slider-heavy pitcher, even if his overall strikeout rate is below average. The model needs to capture these cross-interactions, not just treat batter rates and pitcher rates as independent inputs.
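A minimal sketch of one such cross-interaction: weight the batter's whiff rate against each pitch type by how often the pitcher throws it. The function name and every number here are hypothetical, and whiff rate is expressed per pitch thrown purely for illustration.

```python
def expected_whiff_rate(pitch_usage, batter_whiff_by_pitch):
    """Usage-weighted whiff rate per pitch: the sum over pitch types of
    P(pitch type) * P(batter whiffs | pitch type)."""
    if abs(sum(pitch_usage.values()) - 1.0) > 1e-6:
        raise ValueError("pitch usage must sum to 1")
    return sum(share * batter_whiff_by_pitch[pitch]
               for pitch, share in pitch_usage.items())


# Hypothetical batter: handles fastballs well, struggles against sliders.
batter_whiff = {"four_seam": 0.08, "slider": 0.20, "changeup": 0.12}

# Two hypothetical pitchers with different pitch mixes.
slider_heavy = {"four_seam": 0.40, "slider": 0.50, "changeup": 0.10}
fastball_heavy = {"four_seam": 0.70, "slider": 0.15, "changeup": 0.15}

print(expected_whiff_rate(slider_heavy, batter_whiff))
print(expected_whiff_rate(fastball_heavy, batter_whiff))
```

The same batter projects to noticeably more whiffs per pitch against the slider-heavy mix, which is the kind of matchup effect that overall batter and pitcher rates alone would miss.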
Platoon Effects and Handedness
Handedness is one of the strongest and most stable predictors of plate appearance outcomes. Batters generally fare better against opposite-handed pitchers than against same-handed pitchers, and this holds for right-handed and left-handed hitters alike. The platoon advantage manifests primarily in power numbers and contact quality. A left-handed batter facing a left-handed pitcher will typically see a reduction in home run probability, an increase in ground ball probability, and a shift in the singles distribution from line drives to weaker ground-ball singles.
The magnitude of the platoon effect varies by batter and by pitch type. A batter with a pronounced platoon split might see his home run probability drop by 50% or more against same-side pitching. A batter with minimal platoon splits might show only a 10% reduction. Models encode these splits at the individual level rather than applying a blanket league-average adjustment, because the variance across hitters is substantial.
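One simple way to encode an individual split, sketched here under stated assumptions, is to scale each outcome probability by a batter-specific multiplier and renormalize. The function name, the base distribution, and the multipliers are all hypothetical; a real model would estimate them from data.

```python
def apply_platoon_split(base_dist, multipliers):
    """Scale each outcome probability by a batter-specific multiplier
    (1.0 when none is given), then renormalize so the result sums to 1."""
    scaled = {o: p * multipliers.get(o, 1.0) for o, p in base_dist.items()}
    total = sum(scaled.values())
    return {o: p / total for o, p in scaled.items()}


# Hypothetical baseline distribution for one batter.
base = {"home_run": 0.04, "ground_out": 0.20, "strikeout": 0.22, "other": 0.54}

# Hypothetical same-side multipliers for a pronounced and a minimal split.
strong_split = {"home_run": 0.50, "ground_out": 1.15}
mild_split = {"home_run": 0.90, "ground_out": 1.02}

print(apply_platoon_split(base, strong_split)["home_run"])
print(apply_platoon_split(base, mild_split)["home_run"])
```

Because of the renormalization step, the final home run probability does not fall by exactly the multiplier; the other outcomes absorb or release probability mass as well.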
Platoon effects also interact with pitch usage. Pitchers change their arsenal based on the batter's handedness. A right-handed pitcher might throw his slider 35% of the time to right-handed batters but only 20% to lefties, substituting additional changeups instead. This means the matchup-specific outcome distribution reflects not just the batter's performance split but also the pitcher's likely pitch selection adjustment.
Statcast Data and Contact Quality
The introduction of Statcast tracking in 2015 revolutionized plate appearance modeling by providing measurable data on the quality of contact. Before Statcast, a ground ball was a ground ball. Now, a model can differentiate between a 72 mph ground ball pulled to the second baseman and a 102 mph ground ball smoked through the hole at shortstop. Exit velocity and launch angle together define the quality of batted-ball events with remarkable precision.
Exit Velocity Distributions
Each batter generates a characteristic exit velocity distribution. Some batters produce a tight cluster of hard contact in the 95-105 mph range. Others show a wider spread with more soft contact below 85 mph. The shape of this distribution directly affects outcome probabilities. Higher average exit velocity correlates with higher BABIP, higher ISO (isolated power), and more extra-base hits. Models use the batter's exit velocity profile to weight the probability of each batted-ball outcome conditional on contact being made.
Launch Angle Distributions
Launch angle determines whether contact becomes a ground ball, line drive, fly ball, or pop-up. The boundaries are roughly: below 10 degrees for ground balls, 10 to 25 degrees for line drives, 25 to 50 degrees for fly balls, and above 50 degrees for pop-ups. The combination of high exit velocity and a launch angle in the upper line drive to lower fly ball range defines the "barrel zone," the sweet spot that produces extra-base hits and home runs at the highest rates. Statcast's barrel classification requires an exit velocity of at least 98 mph; at that speed the qualifying launch angles run from 26 to 30 degrees, and the window widens as exit velocity climbs.
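The launch angle boundaries above translate directly into a small classifier. This is a sketch: the function names and the sample angles are hypothetical, and the handling of the exact boundary values (25 degrees counts as a fly ball, 50 degrees as a fly ball) is an assumption, since the text gives only rough cutoffs.

```python
from collections import Counter


def classify_launch_angle(launch_angle_deg):
    """Map a launch angle in degrees to a batted-ball type using the
    rough boundaries from the text (edge handling is an assumption)."""
    if launch_angle_deg < 10:
        return "ground_ball"
    if launch_angle_deg < 25:
        return "line_drive"
    if launch_angle_deg <= 50:
        return "fly_ball"
    return "pop_up"


def batted_ball_mix(launch_angles):
    """Share of each batted-ball type in a sample of launch angles."""
    if not launch_angles:
        raise ValueError("need at least one batted ball")
    counts = Counter(classify_launch_angle(a) for a in launch_angles)
    types = ("ground_ball", "line_drive", "fly_ball", "pop_up")
    return {t: counts[t] / len(launch_angles) for t in types}


# Hypothetical sample of ten launch angles for one batter.
sample = [-12, 3, 8, 14, 19, 27, 33, 41, 55, 62]
print(batted_ball_mix(sample))
```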
Barrel rate has emerged as one of the most predictive contact quality metrics. A batter with a 12% barrel rate produces a fundamentally different outcome distribution than one with a 4% barrel rate, even if their raw batting averages are similar. The high-barrel batter converts more fly balls into home runs and extra-base hits, which shifts the entire probability landscape of the plate appearance.
Count Leverage: How the Count Changes Everything
The count is the single most important in-plate-appearance variable. The difference between a 3-1 count and an 0-2 count is enormous, affecting not just the probability of a walk or strikeout but the quality and type of contact.
In hitter-friendly counts (2-0, 3-1, 3-0), pitchers must throw strikes, often with fastballs. Batters know this. They sit on pitches in their zone and swing aggressively. The result is a dramatic shift in the outcome distribution toward hard contact, extra-base hits, and home runs. Exit velocity in 3-1 counts is typically 3 to 5 mph higher than in 0-2 counts, because batters are timing fastballs and swinging with conviction.
In pitcher-friendly counts (0-2, 1-2), the dynamic inverts. Pitchers expand the zone with breaking balls and offspeed pitches. Batters become defensive, protecting the plate with abbreviated swings. Strikeout probability spikes. Contact quality degrades. The probability of an extra-base hit drops precipitously. Measured in run value, a batter's expected production in an 0-2 count is roughly 40% lower than in a 2-0 count.
Outcome Probability Shifts by Count
| Count | K Rate | BB Rate | HR Rate | Avg EV (mph) |
|---|---|---|---|---|
| 3-0 | ~0% | ~28% | ~6% | 93+ |
| 2-0 | ~3% | ~12% | ~5% | 92+ |
| 3-1 | ~2% | ~22% | ~5.5% | 93+ |
| 1-1 | ~12% | ~7% | ~3% | 89 |
| 0-2 | ~32% | ~3% | ~1.5% | 87 |
| 1-2 | ~28% | ~5% | ~2% | 88 |
Values are approximate league averages. Individual batter-pitcher matchups vary significantly.
Prediction models that operate at the pitch-by-pitch level simulate the count progression and apply count-specific outcome distributions at each step. Models that operate at the plate appearance level use composite distributions that blend count-specific probabilities based on the expected distribution of count states for the given batter-pitcher matchup. A batter who works deep counts will spend more plate appearances in 3-1 and 2-2 counts, which affects his aggregate outcome profile differently than a free-swinging hitter who resolves most plate appearances in fewer pitches.
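The plate-appearance-level blend can be sketched as a weighted mixture of count-specific distributions. The strikeout, walk, and home run rates below are taken from the table above for three counts, with "other" as the remainder. The weights, the restriction to three counts, and the function name are hypothetical simplifications, and treating the table rates as the outcome shares for each count state is itself an assumption.

```python
def composite_distribution(count_weights, count_dists):
    """Blend count-specific outcome distributions, weighting each count
    by the share of plate appearances expected to pass through it."""
    total_weight = sum(count_weights.values())
    outcomes = next(iter(count_dists.values())).keys()
    return {
        o: sum(w * count_dists[c][o] for c, w in count_weights.items())
        / total_weight
        for o in outcomes
    }


# Rates from the table above; "other" is whatever probability remains.
count_dists = {
    "3-1": {"strikeout": 0.02, "walk": 0.22, "home_run": 0.055, "other": 0.705},
    "1-1": {"strikeout": 0.12, "walk": 0.07, "home_run": 0.030, "other": 0.780},
    "0-2": {"strikeout": 0.32, "walk": 0.03, "home_run": 0.015, "other": 0.635},
}

# Hypothetical count-state weights for two batter types.
patient = {"3-1": 0.40, "1-1": 0.40, "0-2": 0.20}
free_swinger = {"3-1": 0.15, "1-1": 0.45, "0-2": 0.40}

print(composite_distribution(patient, count_dists))
print(composite_distribution(free_swinger, count_dists))
```

With identical count-specific distributions, the patient hitter still ends up with a higher composite walk rate and a lower strikeout rate purely because of where his plate appearances tend to go.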
From Outcomes to Runs
The plate appearance model produces a probability distribution over discrete outcomes for each batter-pitcher confrontation. But a probability distribution alone does not score runs. To translate plate appearance outcomes into run production, the model must connect each outcome to the base-out state transition it produces. A single with a runner on first advances the runner to second or third, depending on base running. A double usually scores a runner from second. A strikeout typically changes nothing but the out count.
These transitions feed directly into the run expectancy framework. Each plate appearance either increases or decreases the expected runs for the remainder of the inning based on the outcome it produces and the resulting base-out state change. Over the course of a full lineup cycle, the cumulative effect of thousands of possible outcome sequences produces the inning-level and game-level run distribution that the simulation engine uses to generate win probabilities.
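As a rough sketch of how a single plate appearance connects to run expectancy: the run value of an outcome is the run expectancy of the new base-out state, minus that of the old state, plus any runs that scored. Everything specific below is hypothetical, including the function names, the transition probabilities, and the run expectancy values, which are illustrative placeholders rather than measured figures.

```python
# Placeholder run expectancy by (runners, outs); NOT measured values.
# Runner strings mark first, second, third base: "1-3" = corners.
RUN_EXP = {
    ("---", 0): 0.50,
    ("1--", 0): 0.90,
    ("1--", 1): 0.53,
    ("12-", 0): 1.45,
    ("1-3", 0): 1.80,
}


def run_value(before, after, runs_scored):
    """Change in expected runs caused by one state transition."""
    return RUN_EXP[after] - RUN_EXP[before] + runs_scored


def expected_run_value(before, transitions):
    """Probability-weighted run value over (prob, after_state, runs)."""
    return sum(p * run_value(before, after, runs)
               for p, after, runs in transitions)


# Hypothetical outcome distribution with a runner on first and no outs.
start = ("1--", 0)
transitions = [
    (0.68, ("1--", 1), 0),  # out, runner holds
    (0.09, ("12-", 0), 0),  # walk
    (0.12, ("12-", 0), 0),  # single, runner stops at second
    (0.06, ("1-3", 0), 0),  # single, runner takes third
    (0.05, ("---", 0), 2),  # home run, two runs score
]

print(round(expected_run_value(start, transitions), 4))
```

A full simulation would repeat this step for every batter and every reachable base-out state; the sketch covers only one state and a collapsed outcome set.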
The accuracy of the plate appearance model is the foundation of everything downstream. If the outcome probabilities for individual plate appearances are poorly calibrated, the run expectancy calculations inherit that error, and the game-level predictions suffer accordingly. This is why so much modeling effort focuses on getting the plate appearance right, incorporating every available signal from pitch tracking, contact quality, count leverage, and matchup history to produce the most accurate possible outcome distribution for each confrontation.