helper that loads multiple seasons from the data repo either into memory or writes it into a db using some forwarded arguments in the dots
Usage
load_wnba_pbp(
seasons = most_recent_wnba_season(),
...,
dbConnection = NULL,
tablename = NULL
)
Arguments
- seasons
A vector of 4-digit years associated with given WNBA seasons. (Min: 2002)
- ...
Additional arguments passed to an underlying function that writes the season data into a database (used by
update_wnba_db()
).- dbConnection
A
DBIConnection
object, as returned byDBI::dbConnect()
- tablename
The name of the play by play data table within the database
Value
A dataframe with 42 columns
- shooting_play
Logical value (TRUE/FALSE) indicating whether the play was a shooting play
- sequence_number
Sequence number is supposed to represent a shot-possession, examine the last two numbers to see if there are multiple events that occur within the same shot-possession. A shot-possession is basically any sequence of plays until there is a shot, change in possession, and probably things like technical fouls and the like. So as soon as a shot goes up, a new sequence starts regardless, even if the shooting team retains possession via offensive or deadball rebound. The first portion of the number is usually time related (i.e. the numeric representation of when the sequence started, from a seconds remaining in the period perspective or so)
- period_display_value
Long form of period (1st quarter, 2nd Quarter, OT, etc.)
- period_number
The numeric period of play in the game
- home_score
Home score at the time of the play
- coordinate_x
The entire scale is a rectangle of size 25x47, intended as a half-court representation of the basketball court (i.e. on the side of the offense), with each coordinate unit representing a foot. It appears that the basket is roughly represented as the (25, 0) point. This is a nonsensical definition when considering that the basket overhangs the court, with the backboard aligned 48 inches from the baseline, then the center of the hoop being roughly 11 inches from there. This is an idiosyncracy of either sensor placement or software and data entry. Use your best judgement in making your charts, I think you will find that making some translations will be helpful.
- coordinate_y
- scoring_play
Logical value (TRUE/FALSE) indicating whether the play was a play on which the offense scored
- clock_display_value
Time left within the period
- team_id
Unique team identification number for the offensive team
- type_id
Unique play type identifcation number
- type_text
Play type text description
- away_score
Away score at the time of the play
- id
Unique play identifcation number
- text
Text description of the play
- score_value
The points value of the shot taken
- participants_0_athlete_id
Unique player identification number
- participants_1_athlete_id
Unique player identification number
- participants_2_athlete_id
Unique player identification number
- type_abbreviation
Play type abbreviation
- season
Season of the game
- season_type
Season type of the game, 1 is pre-season, 2 is regular season, 3 is post-season, 4 is off-season
- away_team_id
Unique away team identification number
- away_team_name
Away team name
- away_team_mascot
Away team mascot
- away_team_abbrev
Text abbreviation for the away team
- away_team_name_alt
Alternate versions of the away team abbreviation
- home_team_id
Unique home team identification number
- home_team_name
home team name
- home_team_mascot
home team mascot
- home_team_abbrev
Text abbreviation for the home team
- home_team_name_alt
Alternate versions of the home team abbreviation
- home_team_spread
The game spread with respect to the home team
- game_spread
Game spread in (-X Team) format. There are almost none, I would recommend not trusting any of these three columns
- home_favorite
Logical (TRUE/FALSE) indicating whether the home team is favored
- clock_minutes
Clock minutes split from seconds for developer convenience
- clock_seconds
Clock seconds split from minutes for developer convenience
- half
Half of the game
- lag_half
A lag column on the half
- lead_half
A lead column on the half
- game_play_number
Game play number
- game_id
Unique identifier for the game event
Examples
# \donttest{
try(load_wnba_pbp())
#> ──────────────────────────────────────────────────────────────── wehoop 2.1.0 ──
#> # A tibble: 58,334 × 62
#> game_play_number id sequence_number type_id type_text text away_score
#> <int> <dbl> <int> <int> <chr> <chr> <int>
#> 1 1 4.02e 9 4 615 Jumpball A'ja… 0
#> 2 2 4.02e 9 7 92 Jump Shot Dian… 3
#> 3 3 4.02e 9 9 144 Driving … Jonq… 3
#> 4 4 4.02e10 10 155 Defensiv… Brea… 3
#> 5 5 4.02e10 11 131 Pullup J… Jewe… 3
#> 6 6 4.02e10 12 155 Defensiv… Nnek… 3
#> 7 7 4.02e10 13 131 Pullup J… Cait… 3
#> 8 8 4.02e10 14 155 Defensiv… Jewe… 3
#> 9 9 4.02e10 15 114 Turnarou… A'ja… 3
#> 10 10 4.02e10 16 155 Defensiv… DeWa… 3
#> # ℹ 58,324 more rows
#> # ℹ 55 more variables: home_score <int>, period_number <int>,
#> # period_display_value <chr>, clock_display_value <chr>, scoring_play <lgl>,
#> # score_value <int>, team_id <int>, athlete_id_1 <int>, athlete_id_2 <int>,
#> # athlete_id_3 <int>, wallclock <chr>, shooting_play <lgl>,
#> # coordinate_x_raw <dbl>, coordinate_y_raw <dbl>, game_id <int>,
#> # season <int>, season_type <int>, home_team_id <int>, …
# }