Saturday, August 27, 2011

Two record streaks in women's sports came to an end last night.

Penn State's women's volleyball team, which has won the last four NCAA national titles, had its 94-match home winning streak ended by Oregon, 3 games to 1. For the Ducks, who've long been in the shadow of Pac 10 (now 12) rivals Stanford, Cal, USC, UCLA, and Washington, this is quite a stunning win.

Meanwhile, out on the left coast, the Tulsa Shock of the WNBA snapped its 20-game losing streak by edging the Los Angeles Sparks 77-75. The veteran Sheryl Swoopes, whose illustrious career includes leading the Texas Tech Lady Raiders to the 1993 NCAA women's basketball title, winning three Olympic gold medals, and capturing four WNBA rings with the now-defunct Houston Comets, hit a buzzer beater for the win.

Friday, August 26, 2011

Jeremy Arkes and Jose Martinez have an article in the latest issue of the Journal of Quantitative Analysis in Sports, purporting to show evidence for momentum in the National Basketball Association. Access to JQAS articles requires a subscription, but guest privileges to look at an individual article are available.

Using data from three recent seasons, the authors find "greater success in the past few games leads to a higher probability of winning the next game" (p. 13). Key to these results are statistical controls for focal teams' and opponents' long-term strength or ability levels (excluding the recent games), home/away status for a given team, and teams' number of days' rest between games. Some of the measures appear conceptually similar to an RPI ranking system, which accounts for teams' strength of schedule.

The study uses fairly complex econometric modeling and presents extensive results in tables. However, the authors distill the findings into easily graspable descriptions. For example, for each additional win a team has in its last 5 games, its probability of winning the next game goes up by roughly 2 to 4 percentage points.

I'm not sure, however, that these findings fit what the average fan would think of as "momentum." To some, momentum would suggest looking at teams that have won 5 in a row (or lost 5 in a row) and seeing how they do in their next game. Saying that a team with 1 win (vs. 0) or 5 wins (vs. 4) in its past 5 games has an increased probability of winning its next game (controlling for all of the aforementioned factors) is much more incremental in nature.

Wednesday, August 24, 2011

The Tulsa Shock of the Women's National Basketball Association (WNBA) has now lost 19 straight games, with Tuesday's defeat against the Minnesota Lynx being the latest. Tulsa's 18th consecutive setback, which came last Sunday, set the league record for longest losing streak. The Shock is now 1-24, with nine games left on its schedule.

Friday, August 19, 2011

Amidst the hubbub over allegations of improper benefits given to players by a booster at the University of Miami, Noel Nash of ESPN's Stats & Information Group has notified me of an unusual streak by former Hurricane football players at the pro level. For more than eight years now, a player who attended college at Miami "has scored a touchdown in every regular season week in the NFL... a span of 139 game weeks."

Sunday, August 14, 2011

Dan Uggla's hitting streak ended at 33 games this afternoon, as his Atlanta Braves fell to the Chicago Cubs, 6-5. Much was made of Uggla's low batting average prior to the streak and how unlikely it seemingly made the streak. In my view, judging the likelihood of Uggla's hitting streak is not so simple.

Let's start with a refresher on some principles of probability. Batting average represents a player's probability of getting a hit in any given official at-bat. Where consecutive-game hitting streaks are concerned, we're interested in the probability of a player getting at least one hit in a game. The latter will generally be higher than the batting average because the player usually has multiple official at-bats in a game.

To estimate the probability of a player getting at least one hit in a game, statisticians typically assume a number of official at-bats per game for the player and further assume independence of outcomes (i.e., that what happened on one at-bat has no effect on a later at-bat). As of the conclusion of yesterday's play, Uggla was getting 3.76 official at-bats (AB) per game (448/119). Looking at Baseball Reference's wonderful game-by-game log for Uggla this season, he had a few games (mostly prior to the streak) with 0 or 1 plate appearances, suggesting he appeared as either a late-inning defensive replacement or pinch hitter in a few games. Assuming regular starts, which would be the case well into a hitting streak, we could estimate he'd have 4 AB per game.

Whereas batting average (BA) is the probability of a success (hit) in a particular official at-bat, the probability of failure in that at-bat is F = 1 - BA. The probability of an all-failure (no-hit) game with 4 AB is simply F raised to the 4th power, F^4. Getting at least one hit means avoiding an all-failure game, so the probability of getting at least one hit is 1 - F^4. To know F, we need to know BA, and that is where the difficulty arises with Uggla.
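
The formula can be written as a few lines of Python. This is a minimal sketch that assumes, as above, a fixed number of official at-bats per game and independence between at-bats; the function name `p_at_least_one_hit` is my own label, not something from a library.

```python
def p_at_least_one_hit(ba: float, ab: int = 4) -> float:
    """Probability of at least one hit in a game with `ab` official at-bats,
    assuming each at-bat is an independent success with probability `ba`."""
    f = 1.0 - ba          # F: probability of failure in one at-bat
    return 1.0 - f ** ab  # 1 - F^ab: avoid an all-failure (hitless) game


# Uggla's official at-bats per game as of the close of play on August 13: 448 / 119
print(round(448 / 119, 2))                  # 3.76
print(round(p_at_least_one_hit(0.263), 3))  # 0.705, using his lifetime BA and 4 AB
```

Note that `ab` must be at least 1 for the formula to make sense; a game with no official at-bats (all walks, say) does not end a streak, which this sketch does not model.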

The day Uggla began his hitting streak (July 5), he woke up with a .173 BA. During the streak, he hit .377 (49/130). Upon completion of his last game during the streak (i.e., yesterday's), his season-to-date average sat at .232. And, while we're at it, his lifetime BA (excluding 2011) is .263. The question is which batting average best captures his batting ability, say, midway through the hitting streak. Another way to frame the problem: setting aside Uggla's hitless game today, which BA should we have used to predict his chances of getting at least one hit in each of his next 23 games, enough to tie Joe DiMaggio's record of 56 straight games?

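The following table runs through the steps of transforming an Uggla batting average into his estimated probability of getting at least one hit in each of his next 23 games. The final column treats games as independent, so it is the probability of at least one hit in 4 AB raised to the 23rd power.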

| p(Hit in 1 AB) [Batting Avg] | p(Failure in 1 AB) | p(Failure in All 4 AB) | p(≥ 1 Hit in 4 AB) | p(Hit in All of Next 23 Games) |
|---|---|---|---|---|
| .173 | .827 | .468 | .532 | .0000005 |
| .377 | .623 | .151 | .849 | .023 |
| .232 | .768 | .348 | .652 | .00005 |
| .263 | .737 | .295 | .705 | .0003 |
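
The table's arithmetic can be reproduced with a short Python sketch. It assumes, as the table does, 4 official AB in every game and independence both between at-bats and between games; the function names `p_at_least_one_hit` and `p_streak` are my own illustrative labels.

```python
def p_at_least_one_hit(ba: float, ab: int = 4) -> float:
    """P(at least one hit in `ab` independent official at-bats)."""
    return 1.0 - (1.0 - ba) ** ab


def p_streak(ba: float, games: int = 23, ab: int = 4) -> float:
    """P(at least one hit in each of `games` games), treating games as independent."""
    return p_at_least_one_hit(ba, ab) ** games


# The four candidate batting averages discussed in the post
candidates = [
    (0.173, "BA on July 5, start of streak"),
    (0.377, "BA during the streak"),
    (0.232, "season-to-date BA"),
    (0.263, "lifetime BA, excluding 2011"),
]

for ba, label in candidates:
    fail_one = 1.0 - ba       # p(failure in 1 AB)
    fail_all = fail_one ** 4  # p(failure in all 4 AB)
    print(f"{ba:.3f} {fail_one:.3f} {fail_all:.3f} "
          f"{p_at_least_one_hit(ba):.3f} {p_streak(ba):.2g}  ({label})")
```

Printed to three decimals, the first four columns match the table; the last column, shown to two significant figures, rounds to the table's values (about 0.023 for the .377 average).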

Even under the most advantageous assumption for Uggla -- namely taking his batting average exclusively from his recent streak -- the chances of tying DiMaggio would be only about two percent. Still, which batting average should we use?

As shown in the book Scorecasting by Moskowitz and Wertheim, a baseball player's batting average over the past two seasons is a better predictor of success in the next at-bat than is batting average over the last five plate appearances, last five games, the last month, or season-to-date. Thus, going by the principle that large sample size trumps recency, Uggla's lifetime batting average would appear to be the best of the above options in predicting his future hitting streaks.

Another factor that helped Uggla put together the 33-game hitting streak was his low walk rate. At the close of yesterday's play, he had only 39 bases on balls, so his number of official AB (448) was not much lower than his total plate appearances (494). A tendency to draw a lot of walks can really short-circuit a hitting streak because a player may get only 1 or 2 official AB in a game, giving him few opportunities for a hit (if a player walks in all of his plate appearances in a game, however, a hitting streak continues). As Joe D’Aniello noted in the Baseball Research Journal (Vol. 32, 2003) in his examination of DiMaggio’s hitting streak, a key reason why Ted Williams never contended for a long hitting streak was his propensity to draw walks.
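
To see how much fewer official at-bats hurt, here is a small Python sketch using Uggla's lifetime .263 average from above. The game scenarios (4, 3, 2, or 1 official AB after walks) are hypothetical, and the sketch keeps the independence assumption used throughout; it only covers games with at least one official AB.

```python
def p_at_least_one_hit(ba: float, ab: int) -> float:
    """P(at least one hit in `ab` independent official at-bats), for ab >= 1."""
    return 1.0 - (1.0 - ba) ** ab


ba = 0.263  # Uggla's lifetime BA (excluding 2011), from the post

# Hypothetical games in which walks leave 4, 3, 2, or 1 official at-bats
for ab in (4, 3, 2, 1):
    print(ab, round(p_at_least_one_hit(ba, ab), 3))

# Share of Uggla's plate appearances that were official at-bats (448 of 494)
print(round(448 / 494, 3))
```

Under these assumptions, the chance of at least one hit falls from about .705 with 4 AB to about .457 with 2 AB, while roughly .907 of Uggla's plate appearances were official at-bats.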

David Rockoff and Phil Yates, writing in the Journal of Quantitative Analysis in Sports, identified a flaw in statistical formulations of hitting streaks: the assumption that a player gets the same number of at-bats in every game (as I assumed above in basing my calculations on 4 AB per game for Uggla). In real life, as noted above, a player may get only 1 or 2 AB in some games, which hurts his chances of extending a hitting streak. In Uggla's case, however, his rate of walks (and other plate appearances not resulting in official at-bats) is low enough, in my view, to largely avoid the problem Rockoff and Yates identify.
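
One simple way to relax the fixed-AB assumption is to average the per-game probability over a distribution of at-bats per game. This Python sketch is not necessarily Rockoff and Yates' own method; the distribution `ab_dist` below is purely hypothetical, chosen so its mean is the same 4 AB used above, and games are still treated as independent.

```python
def p_at_least_one_hit(ba: float, ab: int) -> float:
    """P(at least one hit in `ab` independent official at-bats)."""
    return 1.0 - (1.0 - ba) ** ab


def p_game_mixed(ba: float, ab_dist: dict[int, float]) -> float:
    """Per-game hit probability averaged over a distribution of official at-bats."""
    assert abs(sum(ab_dist.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(w * p_at_least_one_hit(ba, ab) for ab, w in ab_dist.items())


# Hypothetical AB distribution with the same mean (4 AB) as the fixed assumption
ab_dist = {3: 0.25, 4: 0.50, 5: 0.25}
ba = 0.263  # lifetime BA from the post

fixed = p_at_least_one_hit(ba, 4)
mixed = p_game_mixed(ba, ab_dist)
print(f"per game: fixed {fixed:.3f}, mixed {mixed:.3f}")
print(f"23 games: fixed {fixed ** 23:.5f}, mixed {mixed ** 23:.5f}")
```

Because 1 - F^ab flattens out as ab grows, spreading at-bats around the same average lowers the per-game probability slightly (here from about .705 to about .698), and that small gap compounds over 23 games.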

Tuesday, August 09, 2011

Sunday, August 07, 2011

Here are some brief streakiness-related items, all from baseball.

*With today's win over Seattle, the Angels have now won 13 of their last 15 series. The two exceptions are a 2-2 split at Detroit (July 28-31) and the loss of 3 of 4 games at Oakland (July 15-17, including a doubleheader).

*Today against the New York Mets, one member of the Atlanta Braves extended a hitting streak -- Dan Uggla, to 28 games -- whereas another, Freddie Freeman, saw his 20-game hitting streak end.

*A little over a week ago, two hitting streaks in the mid-20s ended: Emilio Bonifacio's (Marlins) at 26, and Dustin Pedroia's (Red Sox) at 25.

*Last Friday night, Milwaukee's Craig Counsell finally got a hit after 45 official at-bats without one. According to one news report, "Some claimed Counsell tied the modern baseball record (since 1900) for a position player when he popped out to second base Monday against the St. Louis Cardinals. Others claimed the actual record was 0 for 46, set by Brooklyn catcher Bill Bergen in 1909."

*In getting swept by the Yankees in a four-game series (August 1-4), the Chicago White Sox did not draw a single walk. It had been 43 years since the White Sox last went four straight games without a walk. New York's pitchers may have had a hot hand when it came to throwing strikes, or maybe the Chicago batters just had impatient hands.