Getting Started with wehoop
Saiem Gilani
Source: vignettes/getting-started-wehoop.Rmd
Welcome folks,
I’m Saiem Gilani, one of the authors of wehoop,
and I hope to give the community a high-quality resource for accessing
women’s basketball data for statistical analysis, basketball research,
and more. I am excited to show you some of what you can do with this
edition of the package.
Installing R and RStudio
- Head to https://cran.r-project.org
- Select the appropriate link for your operating system (Windows, Mac OS X, or Linux)
  - Windows - Select base and download the most recent version
  - Mac OS X - Select Latest Release, but check to make sure your OS is the correct version. Look through Binaries for Legacy OS X Systems if you are on an older release
  - Linux - Select the appropriate distro and follow the installation instructions
- Head to Posit.co
- Follow the associated download and installation instructions for RStudio.
- Start looking over the RStudio IDE Cheatsheet. (IDE stands for integrated development environment.)
- For Windows users: I recommend you install Rtools. This is not an R package! It is “a collection of resources for building packages for R under Microsoft Windows, or for building R itself”. Go to https://cran.r-project.org/bin/windows/Rtools/ and follow the directions for installation.
Install wehoop
# Install pacman if it is missing, then use it to install (if needed) and load the packages:
if (!requireNamespace('pacman', quietly = TRUE)){
  install.packages('pacman')
}
pacman::p_load(wehoop, dplyr, glue, tictoc, progressr)
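If you would rather not use pacman, a quick way to see what is already in place is to check each package by name. This is a minimal sketch in base R; it installs nothing, and the suggested `install.packages()` call assumes every package listed is available from your configured CRAN mirror.

```r
# Packages used in this vignette.
pkgs <- c("wehoop", "dplyr", "glue", "tictoc", "progressr")

# TRUE where the package can be loaded, FALSE otherwise (nothing is installed here).
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Suggest an install command for anything missing.
missing_pkgs <- names(installed)[!installed]
if (length(missing_pkgs) > 0) {
  message(
    "Install with: install.packages(c(",
    paste0('"', missing_pkgs, '"', collapse = ", "),
    "))"
  )
}
```

Once everything reports `TRUE`, the `pkg::function()` style used in the rest of this vignette works without attaching the packages.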
Quick Start
WNBA full play-by-play seasons (2002-2024) ~ 30-60 seconds
tictoc::tic()
progressr::with_progress({
  wnba_pbp <- wehoop::load_wnba_pbp()
})
tictoc::toc()
## 0.94 sec elapsed
## 13.91 sec elapsed
glue::glue("{nrow(wnba_pbp)} rows of WNBA play-by-play data from {length(unique(wnba_pbp$game_id))} games.")
## 58334 rows of WNBA play-by-play data from 150 games.
## 1782985 rows of WNBA play-by-play data from 4674 games.
dplyr::glimpse(wnba_pbp)
## Rows: 58,334
## Columns: 62
## $ game_play_number <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,…
## $ id <dbl> 4016204584, 4016204587, 4016204589, 40…
## $ sequence_number <int> 4, 7, 9, 10, 11, 12, 13, 14, 15, 16, 1…
## $ type_id <int> 615, 92, 144, 155, 131, 155, 131, 155,…
## $ type_text <chr> "Jumpball", "Jump Shot", "Driving Floa…
## $ text <chr> "A'ja Wilson vs. Jonquel Jones (Diana …
## $ away_score <int> 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 6,…
## $ home_score <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ period_number <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ period_display_value <chr> "1st Quarter", "1st Quarter", "1st Qua…
## $ clock_display_value <chr> "10:00", "9:48", "9:31", "9:29", "9:07…
## $ scoring_play <lgl> FALSE, TRUE, FALSE, FALSE, FALSE, FALS…
## $ score_value <int> 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3,…
## $ team_id <int> 97, 97, 96, 97, 97, 96, 96, 97, 97, 96…
## $ athlete_id_1 <int> 3149391, 585, 2999101, 2998928, 298786…
## $ athlete_id_2 <int> 2999101, 2529122, NA, NA, NA, NA, NA, …
## $ athlete_id_3 <int> 585, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ wallclock <chr> "2024-07-21T00:44:59Z", "2024-07-21T00…
## $ shooting_play <lgl> FALSE, TRUE, TRUE, FALSE, TRUE, FALSE,…
## $ coordinate_x_raw <dbl> -214748340, 5, 41, 41, 18, 18, 5, 5, 1…
## $ coordinate_y_raw <dbl> -214748365, 15, 10, 10, 12, 12, 13, 13…
## $ game_id <int> 401620458, 401620458, 401620458, 40162…
## $ season <int> 2024, 2024, 2024, 2024, 2024, 2024, 20…
## $ season_type <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ home_team_id <int> 96, 96, 96, 96, 96, 96, 96, 96, 96, 96…
## $ home_team_name <chr> "Team WNBA", "Team WNBA", "Team WNBA",…
## $ home_team_mascot <chr> "Team WNBA", "Team WNBA", "Team WNBA",…
## $ home_team_abbrev <chr> "WNBASTARS", "WNBASTARS", "WNBASTARS",…
## $ home_team_name_alt <chr> "Team WNBA", "Team WNBA", "Team WNBA",…
## $ away_team_id <int> 97, 97, 97, 97, 97, 97, 97, 97, 97, 97…
## $ away_team_name <chr> "Team USA", "Team USA", "Team USA", "T…
## $ away_team_mascot <chr> "Team USA", "Team USA", "Team USA", "T…
## $ away_team_abbrev <chr> "USA", "USA", "USA", "USA", "USA", "US…
## $ away_team_name_alt <chr> "Team USA", "Team USA", "Team USA", "T…
## $ game_spread <dbl> 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5…
## $ home_favorite <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TR…
## $ game_spread_available <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
## $ home_team_spread <dbl> 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5…
## $ qtr <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ time <chr> "10:00", "9:48", "9:31", "9:29", "9:07…
## $ clock_minutes <int> 10, 9, 9, 9, 9, 9, 8, 8, 8, 8, 8, 8, 8…
## $ clock_seconds <dbl> 0, 48, 31, 29, 7, 1, 49, 44, 34, 32, 2…
## $ home_timeout_called <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
## $ away_timeout_called <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
## $ half <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ game_half <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ lead_qtr <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ lead_half <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ start_quarter_seconds_remaining <dbl> 600, 588, 571, 569, 547, 541, 529, 524…
## $ start_half_seconds_remaining <dbl> 1200, 1188, 1171, 1169, 1147, 1141, 11…
## $ start_game_seconds_remaining <dbl> 2400, 2388, 2371, 2369, 2347, 2341, 23…
## $ end_quarter_seconds_remaining <dbl> 600, 571, 569, 547, 541, 529, 524, 514…
## $ end_half_seconds_remaining <dbl> 1200, 1171, 1169, 1147, 1141, 1129, 11…
## $ end_game_seconds_remaining <dbl> 2400, 2371, 2369, 2347, 2341, 2329, 23…
## $ period <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ lag_qtr <int> NA, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ lag_half <int> NA, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ coordinate_x <dbl> -214748406.75, -26.75, 31.75, -31.75, …
## $ coordinate_y <dbl> -214748365, -20, -16, 16, -7, 7, 20, -…
## $ game_date <date> 2024-07-20, 2024-07-20, 2024-07-20, 2…
## $ game_date_time <dttm> 2024-07-20 20:30:00, 2024-07-20 20:30…
## $ type_abbreviation <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
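Two things in the glimpse above are worth handling before analysis. The first row carries coordinate values around -214748365, which look like placeholders rather than court locations (that reading is an assumption, as is the cutoff used below), and `scoring_play` with `score_value` marks the plays that put points on the board. Here is a minimal sketch on a made-up data frame that borrows the column names from the glimpse; the rows themselves are invented for illustration.

```r
library(dplyr)

# Toy rows shaped like the play-by-play glimpse; all values are made up.
toy_pbp <- data.frame(
  game_id      = c(1L, 1L, 1L, 1L, 1L),
  team_id      = c(97L, 97L, 96L, 96L, 97L),
  scoring_play = c(FALSE, TRUE, FALSE, TRUE, TRUE),
  score_value  = c(0L, 3L, 0L, 2L, 2L),
  coordinate_x = c(-214748406.75, -26.75, 31.75, 10, -5),
  coordinate_y = c(-214748365, -20, -16, 4, 3)
)

# Assumed cutoff: anything more than 1e6 units from the origin is treated
# as a placeholder and replaced with NA.
cleaned <- toy_pbp |>
  dplyr::mutate(
    coordinate_x = dplyr::if_else(abs(coordinate_x) > 1e6, NA_real_, coordinate_x),
    coordinate_y = dplyr::if_else(abs(coordinate_y) > 1e6, NA_real_, coordinate_y)
  )

# Points per team per game, using only the scoring plays.
points_by_team <- cleaned |>
  dplyr::filter(scoring_play) |>
  dplyr::group_by(game_id, team_id) |>
  dplyr::summarise(points = sum(score_value), .groups = "drop")

print(points_by_team)
```

The same pipeline should apply to `wnba_pbp` once it is loaded, though you may want to inspect the real coordinate ranges before settling on a cutoff.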
WNBA full team box score seasons (2003-2024) ~ 5-30 seconds
tictoc::tic()
progressr::with_progress({
  wnba_team_box <- wehoop::load_wnba_team_box()
})
tictoc::toc()
## 0.495 sec elapsed
glue::glue("{nrow(wnba_team_box)} rows of WNBA team boxscore data from {length(unique(wnba_team_box$game_id))} games.")
## 300 rows of WNBA team boxscore data from 150 games.
dplyr::glimpse(wnba_team_box)
## Rows: 300
## Columns: 57
## $ game_id <int> 401620458, 401620458, 401620365, 401…
## $ season <int> 2024, 2024, 2024, 2024, 2024, 2024, …
## $ season_type <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
## $ game_date <date> 2024-07-20, 2024-07-20, 2024-07-17,…
## $ game_date_time <dttm> 2024-07-20 20:30:00, 2024-07-20 20:…
## $ team_id <int> 97, 96, 5, 3, 20, 8, 19, 17, 18, 9, …
## $ team_uid <chr> "s:40~l:59~t:97", "s:40~l:59~t:96", …
## $ team_slug <chr> "team-usa", "team-wnba", "indiana-fe…
## $ team_location <chr> "Team USA", "Team WNBA", "Indiana", …
## $ team_name <chr> "Team USA", "Team WNBA", "Fever", "W…
## $ team_abbreviation <chr> "USA", "WNBASTARS", "IND", "DAL", "A…
## $ team_display_name <chr> "Team USA", "Team WNBA", "Indiana Fe…
## $ team_short_display_name <chr> "Team USA", "Team WNBA", "Fever", "W…
## $ team_logo <chr> "https://a.espncdn.com/i/teamlogos/w…
## $ team_home_away <chr> "away", "home", "away", "home", "awa…
## $ team_score <int> 109, 117, 93, 101, 79, 86, 93, 85, 7…
## $ team_winner <lgl> FALSE, TRUE, FALSE, TRUE, FALSE, TRU…
## $ assists <int> 34, 31, 26, 28, 18, 25, 20, 17, 17, …
## $ blocks <int> 3, 3, 7, 4, 4, 2, 2, 7, 4, 4, 6, 3, …
## $ defensive_rebounds <int> 34, 29, 26, 23, 28, 21, 28, 23, 23, …
## $ fast_break_points <chr> "10", "16", "14", "22", "11", "18", …
## $ field_goal_pct <dbl> 50.6, 50.0, 53.5, 51.3, 49.2, 38.7, …
## $ field_goals_made <int> 41, 44, 38, 39, 29, 29, 35, 31, 28, …
## $ field_goals_attempted <int> 81, 88, 71, 76, 59, 75, 72, 66, 66, …
## $ flagrant_fouls <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ fouls <int> 13, 13, 15, 18, 19, 16, 16, 14, 15, …
## $ free_throw_pct <dbl> 93.8, 85.0, 64.7, 77.8, 76.2, 73.9, …
## $ free_throws_made <int> 15, 17, 11, 14, 16, 17, 19, 18, 16, …
## $ free_throws_attempted <int> 16, 20, 17, 18, 21, 23, 22, 22, 22, …
## $ largest_lead <chr> "6", "20", "6", "16", "7", "7", "21"…
## $ offensive_rebounds <int> 9, 9, 9, 6, 5, 11, 7, 3, 8, 6, 12, 1…
## $ points_in_paint <chr> "50", "52", "62", "50", "38", "26", …
## $ steals <int> 8, 14, 2, 10, 5, 10, 9, 11, 11, 10, …
## $ team_turnovers <int> 0, 0, 1, 1, 1, 2, 1, 1, 1, 3, 1, 1, …
## $ technical_fouls <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, …
## $ three_point_field_goal_pct <dbl> 36.4, 33.3, 24.0, 45.0, 33.3, 36.7, …
## $ three_point_field_goals_made <int> 12, 12, 6, 9, 5, 11, 4, 5, 2, 11, 7,…
## $ three_point_field_goals_attempted <int> 33, 36, 25, 20, 15, 30, 13, 18, 18, …
## $ total_rebounds <int> 43, 38, 35, 29, 33, 32, 35, 26, 31, …
## $ total_technical_fouls <int> 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, …
## $ total_turnovers <int> 14, 8, 18, 11, 17, 9, 14, 13, 14, 21…
## $ turnover_points <chr> "15", "10", "28", "17", "15", "8", "…
## $ turnovers <int> 14, 8, 17, 10, 16, 7, 13, 12, 13, 18…
## $ opponent_team_id <int> 96, 97, 3, 5, 8, 20, 17, 19, 9, 18, …
## $ opponent_team_uid <chr> "s:40~l:59~t:96", "s:40~l:59~t:97", …
## $ opponent_team_slug <chr> "team-wnba", "team-usa", "dallas-win…
## $ opponent_team_location <chr> "Team WNBA", "Team USA", "Dallas", "…
## $ opponent_team_name <chr> "Team WNBA", "Team USA", "Wings", "F…
## $ opponent_team_abbreviation <chr> "WNBASTARS", "USA", "DAL", "IND", "M…
## $ opponent_team_display_name <chr> "Team WNBA", "Team USA", "Dallas Win…
## $ opponent_team_short_display_name <chr> "Team WNBA", "Team USA", "Wings", "F…
## $ opponent_team_logo <chr> "https://a.espncdn.com/i/teamlogos/w…
## $ opponent_team_score <int> 117, 109, 101, 93, 86, 79, 85, 93, 8…
## $ team_color <chr> NA, NA, "002d62", "002b5c", "e31837"…
## $ team_alternate_color <chr> NA, NA, "e03a3e", "c4d600", "5091cc"…
## $ opponent_team_color <chr> NA, NA, "002b5c", "002d62", "266092"…
## $ opponent_team_alternate_color <chr> NA, NA, "c4d600", "e03a3e", "79bc43"…
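Note that the glimpse above lists `fast_break_points`, `largest_lead`, `points_in_paint`, and `turnover_points` as `<chr>`, so they need converting before any arithmetic. A minimal sketch follows, using the first two rows shown in the glimpse as a small stand-in data frame; whether every value in the full data converts cleanly is an assumption you should check (non-numeric strings would become `NA` with a warning).

```r
library(dplyr)

# Stand-in data: the first two rows of selected columns from the glimpse above.
toy_team_box <- data.frame(
  team_abbreviation = c("USA", "WNBASTARS"),
  fast_break_points = c("10", "16"),
  largest_lead      = c("6", "20"),
  points_in_paint   = c("50", "52"),
  turnover_points   = c("15", "10")
)

# Columns that arrive as character but hold whole numbers.
chr_stats <- c("fast_break_points", "largest_lead",
               "points_in_paint", "turnover_points")

# Convert them all to integer in one step.
team_box_num <- toy_team_box |>
  dplyr::mutate(dplyr::across(dplyr::all_of(chr_stats), as.integer))

str(team_box_num)
```

Replacing `toy_team_box` with `wnba_team_box` should apply the same conversion to the loaded data.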
WNBA full player box score seasons (2002-2024) ~ 5-30 seconds
tictoc::tic()
progressr::with_progress({
  wnba_player_box <- wehoop::load_wnba_player_box()
})
tictoc::toc()
## 0.548 sec elapsed
length(unique(wnba_player_box$game_id))
## [1] 150
nrow(wnba_player_box)
## [1] 3352
Women’s college basketball full play-by-play seasons (2004-2024) ~ 45-90 seconds
tictoc::tic()
progressr::with_progress({
  wbb_pbp <- wehoop::load_wbb_pbp()
})
tictoc::toc()
## 10.732 sec elapsed
length(unique(wbb_pbp$game_id))
## [1] 5779
nrow(wbb_pbp)
## [1] 1908679
Women’s college basketball full team box score seasons (2006-2024) ~ 5-30 seconds
tictoc::tic()
progressr::with_progress({
  wbb_team_box <- wehoop::load_wbb_team_box()
})
tictoc::toc()
## 0.614 sec elapsed
length(unique(wbb_team_box$game_id))
## [1] 5898
nrow(wbb_team_box)
## [1] 11796
Women’s college basketball full player box score seasons (2006-2024) ~ 5-30 seconds
tictoc::tic()
progressr::with_progress({
  wbb_player_box <- wehoop::load_wbb_player_box()
})
tictoc::toc()
## 1.498 sec elapsed
length(unique(wbb_player_box$game_id))
## [1] 5898
nrow(wbb_player_box)
## [1] 167412
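Each loader above is called without arguments, and counts such as 58,334 WNBA play-by-play rows from 150 games suggest that this returns a single season by default. To pull a span of seasons, the sketch below assumes the loaders accept a `seasons` argument (check the function reference, for example `?wehoop::load_wnba_pbp`, before relying on it). `timed_load()` is a hypothetical helper, not part of wehoop, it uses base R's `system.time()` in place of tictoc, and `2022:2024` is an arbitrary span.

```r
# Hypothetical helper (not part of wehoop): call one loader for a chosen
# span of seasons, report size and elapsed time, and return the data.
timed_load <- function(loader, seasons) {
  # Assumption: `loader` accepts a `seasons` vector and returns a data
  # frame with a `season` column.
  elapsed <- system.time(
    data <- loader(seasons = seasons)
  )[["elapsed"]]

  message(sprintf(
    "%d rows across %d season(s) in %.2f sec",
    nrow(data), length(unique(data$season)), elapsed
  ))
  invisible(data)
}

# Requires wehoop and a network connection, so it is left commented out:
# wnba_pbp_multi <- timed_load(wehoop::load_wnba_pbp, seasons = 2022:2024)
```

Passing the loader as a function keeps the helper reusable for the team and player box score loaders as well.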