
Getting Started with wehoop
Saiem Gilani
Source: vignettes/getting-started-wehoop.Rmd
Welcome folks,
I’m Saiem Gilani, one of the authors of wehoop, and I hope to give the community a high-quality resource for accessing
women’s basketball data for statistical analysis, basketball research,
and more. I am excited to show you some of what you can do with this
edition of the package.
Installing R and RStudio
- Head to https://cran.r-project.org
- Select the appropriate link for your operating system (Windows, Mac OS X, or Linux)
  - Windows - Select base and download the most recent version
  - Mac OS X - Select Latest Release, but check to make sure your OS is the correct version. Look through Binaries for Legacy OS X Systems if you are on an older release
  - Linux - Select the appropriate distro and follow the installation instructions
- Head to Posit.co
- Follow the associated download and installation instructions for RStudio.
- Start poring over the RStudio IDE Cheatsheet. An IDE is an integrated development environment.
- For Windows users: I recommend you install Rtools. This is not an R package! It is “a collection of resources for building packages for R under Microsoft Windows, or for building R itself”. Go to https://cran.r-project.org/bin/windows/Rtools/ and follow the directions for installation.
Install wehoop
# Install pacman if it is missing, then use it to install (if needed) and load wehoop and the helper packages:
if (!requireNamespace('pacman', quietly = TRUE)){
  install.packages('pacman')
}
pacman::p_load(wehoop, dplyr, glue, tictoc, progressr)
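If you would rather check what is already installed before letting pacman install anything, a minimal base R sketch follows. The names `pkgs`, `installed`, and `missing_pkgs` are just illustrative; nothing here is part of wehoop or pacman.

```r
# Packages used in this guide (same list that is passed to pacman::p_load)
pkgs <- c("wehoop", "dplyr", "glue", "tictoc", "progressr")

# requireNamespace() returns TRUE/FALSE without attaching or installing anything
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Collect whatever is missing so you can decide how to install it
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages are available.")
}
```

Anything reported as missing can then be installed with `install.packages()` or with the `pacman::p_load()` call shown earlier.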
Quick Start
WNBA full play-by-play seasons (2002-2022) ~ 30-60 seconds
tictoc::tic()
progressr::with_progress({
  wnba_pbp <- wehoop::load_wnba_pbp()
})
tictoc::toc()
## 1.359 sec elapsed
## 13.91 sec elapsed
glue::glue("{nrow(wnba_pbp)} rows of WNBA play-by-play data from {length(unique(wnba_pbp$game_id))} games.")
## 71881 rows of WNBA play-by-play data from 183 games.
## 1782985 rows of WNBA play-by-play data from 4674 games.
dplyr::glimpse(wnba_pbp)
## Rows: 71,881
## Columns: 60
## $ id <dbl> 4015073764, 4015073767, 4015073768, 40…
## $ sequence_number <chr> "4", "7", "8", "9", "10", "11", "12", …
## $ type_id <int> 615, 64, 92, 155, 92, 156, 92, 155, 13…
## $ type_text <chr> "Jumpball", "Traveling", "Jump Shot", …
## $ text <chr> "Jonquel Jones vs. A'ja Wilson (Sabrin…
## $ away_score <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ home_score <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ period_number <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ period_display_value <chr> "1st Quarter", "1st Quarter", "1st Qua…
## $ clock_display_value <chr> "10:00", "9:43", "9:20", "9:12", "9:03…
## $ scoring_play <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
## $ score_value <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ team_id <int> 9, 9, 17, 9, 9, 9, 9, 17, 17, 9, 9, 17…
## $ athlete_id_1 <int> 2999101, 2593770, 4065870, 2999101, 25…
## $ athlete_id_2 <int> 3149391, NA, NA, NA, NA, NA, NA, NA, N…
## $ athlete_id_3 <int> 4066533, NA, NA, NA, NA, NA, NA, NA, N…
## $ wallclock <chr> "2023-08-16T01:07:56Z", "2023-08-16T01…
## $ shooting_play <lgl> FALSE, FALSE, TRUE, FALSE, TRUE, FALSE…
## $ coordinate_x_raw <dbl> -214748340, 4, 8, 8, 49, 49, 40, 40, 1…
## $ coordinate_y_raw <dbl> -214748365.00, 2.00, 18.00, 18.00, -2.…
## $ season <int> 2023, 2023, 2023, 2023, 2023, 2023, 20…
## $ season_type <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ away_team_id <int> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,…
## $ away_team_name <chr> "New York", "New York", "New York", "N…
## $ away_team_mascot <chr> "Liberty", "Liberty", "Liberty", "Libe…
## $ away_team_abbrev <chr> "NY", "NY", "NY", "NY", "NY", "NY", "N…
## $ away_team_name_alt <chr> "New York", "New York", "New York", "N…
## $ home_team_id <int> 17, 17, 17, 17, 17, 17, 17, 17, 17, 17…
## $ home_team_name <chr> "Las Vegas", "Las Vegas", "Las Vegas",…
## $ home_team_mascot <chr> "Aces", "Aces", "Aces", "Aces", "Aces"…
## $ home_team_abbrev <chr> "LV", "LV", "LV", "LV", "LV", "LV", "L…
## $ home_team_name_alt <chr> "Las Vegas", "Las Vegas", "Las Vegas",…
## $ home_team_spread <dbl> 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5…
## $ game_spread <dbl> 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5…
## $ home_favorite <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TR…
## $ game_spread_available <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
## $ game_id <int> 401507376, 401507376, 401507376, 40150…
## $ qtr <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ time <chr> "10:00", "9:43", "9:20", "9:12", "9:03…
## $ clock_minutes <int> 10, 9, 9, 9, 9, 9, 8, 8, 8, 8, 8, 8, 8…
## $ clock_seconds <dbl> 0, 43, 20, 12, 3, 1, 59, 56, 46, 43, 2…
## $ half <chr> "1", "1", "1", "1", "1", "1", "1", "1"…
## $ game_half <chr> "1", "1", "1", "1", "1", "1", "1", "1"…
## $ lead_qtr <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ lead_game_half <chr> "1", "1", "1", "1", "1", "1", "1", "1"…
## $ start_quarter_seconds_remaining <int> 600, 583, 560, 552, 543, 541, 539, 536…
## $ start_half_seconds_remaining <int> 1200, 1183, 1160, 1152, 1143, 1141, 11…
## $ start_game_seconds_remaining <int> 2400, 2383, 2360, 2352, 2343, 2341, 23…
## $ game_play_number <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,…
## $ end_quarter_seconds_remaining <int> 600, 600, 583, 560, 552, 543, 541, 539…
## $ end_half_seconds_remaining <int> 1200, 1200, 1183, 1160, 1152, 1143, 11…
## $ end_game_seconds_remaining <int> 2400, 2400, 2383, 2360, 2352, 2343, 23…
## $ period <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ lag_qtr <int> NA, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ lag_game_half <chr> NA, "1", "1", "1", "1", "1", "1", "1",…
## $ coordinate_x <dbl> -214748406.75, -39.75, 23.75, -23.75, …
## $ coordinate_y <dbl> -214748365, -21, 17, -17, 24, 24, 15, …
## $ game_date <date> 2023-08-15, 2023-08-15, 2023-08-15, 2…
## $ game_date_time <dttm> 2023-08-15 21:00:00, 2023-08-15 21:00…
## $ type_abbreviation <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
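As a rough sketch of what the play-by-play columns in the glimpse allow, the example below counts shots, makes, and points by team. It runs on a small made-up tibble (`toy_pbp`) that borrows only column names shown in the glimpse, so it works without downloading anything; swap in `wnba_pbp` to try it on real data. The huge negative values in `coordinate_x` and `coordinate_y` look like placeholders for plays with no court location. Treating anything beyond ±1000 as a placeholder is my assumption, not a documented rule.

```r
library(dplyr)

# Made-up stand-in for `wnba_pbp`; values are illustrative only
toy_pbp <- tibble(
  game_id       = c(1L, 1L, 1L, 1L, 1L, 1L),
  team_id       = c(9L, 17L, 9L, 17L, 9L, 17L),
  shooting_play = c(FALSE, TRUE, TRUE, TRUE, TRUE, FALSE),
  scoring_play  = c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE),
  score_value   = c(0L, 2L, 0L, 3L, 2L, 0L),
  coordinate_x  = c(-214748406.75, 23.75, -23.75, 10, -5, 0),
  coordinate_y  = c(-214748365, 17, -17, 24, 3, 0)
)

# Assumption: coordinates far outside the court are placeholders, so drop them
valid_pbp <- toy_pbp %>%
  filter(abs(coordinate_x) < 1000, abs(coordinate_y) < 1000)

# Shots attempted, shots made, and points scored per team per game
shot_summary <- valid_pbp %>%
  filter(shooting_play) %>%
  group_by(game_id, team_id) %>%
  summarise(
    shots  = n(),
    made   = sum(scoring_play),
    points = sum(score_value[scoring_play]),
    .groups = "drop"
  )

print(shot_summary)
```

Summing `score_value` only over rows where `scoring_play` is `TRUE` keeps the point totals correct even if missed shots carry a non-zero value.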
WNBA full team box score seasons (2003-2022) ~ 5-30 seconds
tictoc::tic()
progressr::with_progress({
  wnba_team_box <- wehoop::load_wnba_team_box()
})
tictoc::toc()
## 0.458 sec elapsed
glue::glue("{nrow(wnba_team_box)} rows of WNBA team boxscore data from {length(unique(wnba_team_box$game_id))} games.")
## 366 rows of WNBA team boxscore data from 183 games.
dplyr::glimpse(wnba_team_box)
## Rows: 366
## Columns: 57
## $ game_id <int> 401507376, 401507376, 401507310, 401…
## $ season <int> 2023, 2023, 2023, 2023, 2023, 2023, …
## $ season_type <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
## $ game_date <date> 2023-08-15, 2023-08-15, 2023-08-13,…
## $ game_date_time <dttm> 2023-08-15 21:00:00, 2023-08-15 21:…
## $ team_id <int> 9, 17, 20, 17, 11, 14, 19, 16, 9, 5,…
## $ team_uid <chr> "s:40~l:59~t:9", "s:40~l:59~t:17", "…
## $ team_slug <chr> "new-york-liberty", "las-vegas-aces"…
## $ team_location <chr> "New York", "Las Vegas", "Atlanta", …
## $ team_name <chr> "Liberty", "Aces", "Dream", "Aces", …
## $ team_abbreviation <chr> "NY", "LV", "ATL", "LV", "PHX", "SEA…
## $ team_display_name <chr> "New York Liberty", "Las Vegas Aces"…
## $ team_short_display_name <chr> "Liberty", "Aces", "Dream", "Aces", …
## $ team_color <chr> "0d79b9", "000000", "d02030", "00000…
## $ team_alternate_color <chr> "89c9bf", "c7cfd4", "383a37", "c7cfd…
## $ team_logo <chr> "https://a.espncdn.com/i/teamlogos/w…
## $ team_home_away <chr> "away", "home", "away", "home", "awa…
## $ team_score <int> 82, 63, 65, 86, 71, 81, 76, 83, 100,…
## $ team_winner <lgl> TRUE, FALSE, FALSE, TRUE, FALSE, TRU…
## $ assists <int> 20, 14, 13, 16, 19, 16, 17, 21, 32, …
## $ blocks <int> 6, 3, 2, 6, 4, 5, 3, 2, 5, 4, 2, 5, …
## $ defensive_rebounds <int> 37, 24, 38, 30, 20, 31, 22, 24, 25, …
## $ fast_break_points <chr> "6", "8", "5", "9", "10", "14", "9",…
## $ field_goal_pct <dbl> 40.6, 33.3, 37.1, 41.6, 41.3, 51.6, …
## $ field_goals_made <int> 28, 23, 23, 32, 31, 32, 28, 30, 37, …
## $ field_goals_attempted <int> 69, 69, 62, 77, 75, 62, 67, 61, 72, …
## $ flagrant_fouls <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, …
## $ fouls <int> 17, 16, 19, 19, 14, 10, 18, 14, 17, …
## $ free_throw_pct <dbl> 64.7, 85.7, 78.9, 78.9, 57.1, 90.9, …
## $ free_throws_made <int> 11, 12, 15, 15, 4, 10, 12, 20, 14, 1…
## $ free_throws_attempted <int> 17, 14, 19, 19, 7, 11, 16, 24, 19, 1…
## $ largest_lead <chr> "20", "7", "7", "23", "7", "16", "2"…
## $ offensive_rebounds <int> 12, 4, 5, 8, 11, 7, 12, 8, 10, 8, 9,…
## $ points_in_paint <chr> "18", "28", "28", "30", "38", "44", …
## $ steals <int> 3, 9, 3, 13, 10, 4, 8, 14, 7, 2, 7, …
## $ team_turnovers <int> 2, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, …
## $ technical_fouls <int> 1, 0, 0, 0, 0, 0, 2, 2, 0, 0, 2, 1, …
## $ three_point_field_goal_pct <dbl> 42.9, 19.2, 25.0, 30.4, 21.7, 46.7, …
## $ three_point_field_goals_made <int> 15, 5, 4, 7, 5, 7, 8, 3, 12, 7, 8, 6…
## $ three_point_field_goals_attempted <int> 35, 26, 16, 23, 23, 15, 25, 18, 29, …
## $ total_rebounds <int> 49, 28, 43, 38, 31, 38, 34, 32, 35, …
## $ total_technical_fouls <int> 1, 0, 0, 0, 0, 0, 2, 2, 0, 0, 2, 1, …
## $ total_turnovers <int> 18, 9, 17, 9, 8, 16, 17, 17, 9, 11, …
## $ turnover_points <chr> "17", "5", "13", "7", "7", "21", "20…
## $ turnovers <int> 16, 9, 16, 9, 8, 16, 16, 17, 8, 11, …
## $ opponent_team_id <int> 17, 9, 17, 20, 14, 11, 16, 19, 5, 9,…
## $ opponent_team_uid <chr> "s:40~l:59~t:17", "s:40~l:59~t:9", "…
## $ opponent_team_slug <chr> "las-vegas-aces", "new-york-liberty"…
## $ opponent_team_location <chr> "Las Vegas", "New York", "Las Vegas"…
## $ opponent_team_name <chr> "Aces", "Liberty", "Aces", "Dream", …
## $ opponent_team_abbreviation <chr> "LV", "NY", "LV", "ATL", "SEA", "PHX…
## $ opponent_team_display_name <chr> "Las Vegas Aces", "New York Liberty"…
## $ opponent_team_short_display_name <chr> "Aces", "Liberty", "Aces", "Dream", …
## $ opponent_team_color <chr> "000000", "0d79b9", "000000", "d0203…
## $ opponent_team_alternate_color <chr> "c7cfd4", "89c9bf", "c7cfd4", "383a3…
## $ opponent_team_logo <chr> "https://a.espncdn.com/i/teamlogos/w…
## $ opponent_team_score <int> 63, 82, 86, 65, 81, 71, 83, 76, 89, …
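The team box score columns above are enough for a quick standings-style summary. The sketch below uses a made-up tibble (`toy_team_box`) with a handful of the column names from the glimpse; replace it with `wnba_team_box` for real data. Note that the glimpse shows a few counting stats stored as character (`fast_break_points`, `largest_lead`, `points_in_paint`, `turnover_points`), so the example converts one of them to integer before averaging.

```r
library(dplyr)

# Made-up stand-in for `wnba_team_box`; values are illustrative only
toy_team_box <- tibble(
  game_id             = c(1L, 1L, 2L, 2L, 3L, 3L),
  team_display_name   = c("New York Liberty", "Las Vegas Aces",
                          "Atlanta Dream", "Las Vegas Aces",
                          "New York Liberty", "Atlanta Dream"),
  team_score          = c(82L, 63L, 65L, 86L, 90L, 80L),
  opponent_team_score = c(63L, 82L, 86L, 65L, 80L, 90L),
  team_winner         = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE),
  points_in_paint     = c("18", "28", "28", "30", "40", "26")  # chr, as in the glimpse
)

team_summary <- toy_team_box %>%
  # Convert the character stat to integer so it can be averaged
  mutate(points_in_paint = as.integer(points_in_paint)) %>%
  group_by(team_display_name) %>%
  summarise(
    games      = n(),
    wins       = sum(team_winner),
    win_pct    = mean(team_winner),
    avg_margin = mean(team_score - opponent_team_score),
    avg_paint  = mean(points_in_paint),
    .groups    = "drop"
  ) %>%
  arrange(desc(win_pct), desc(avg_margin))

print(team_summary)
```

Because every game contributes one row per team, grouping by `team_display_name` gives per-team totals without any reshaping.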
WNBA full player box score seasons (2002-2022) ~ 5-30 seconds
tictoc::tic()
progressr::with_progress({
  wnba_player_box <- wehoop::load_wnba_player_box()
})
tictoc::toc()
## 0.451 sec elapsed
length(unique(wnba_player_box$game_id))
## [1] 183
nrow(wnba_player_box)
## [1] 4061
Women’s college basketball full play-by-play seasons (2004-2022) ~ 45-90 seconds
tictoc::tic()
progressr::with_progress({
  wbb_pbp <- wehoop::load_wbb_pbp()
})
tictoc::toc()
## 15.024 sec elapsed
length(unique(wbb_pbp$game_id))
## [1] 5702
nrow(wbb_pbp)
## [1] 1883731
Women’s college basketball full team box score seasons (2006-2022) ~ 5-30 seconds
tictoc::tic()
progressr::with_progress({
  wbb_team_box <- wehoop::load_wbb_team_box()
})
tictoc::toc()
## 0.757 sec elapsed
length(unique(wbb_team_box$game_id))
## [1] 5815
nrow(wbb_team_box)
## [1] 11630
Women’s college basketball full player box score seasons (2006-2022) ~ 5-30 seconds
tictoc::tic()
progressr::with_progress({
  wbb_player_box <- wehoop::load_wbb_player_box()
})
tictoc::toc()
## 1.787 sec elapsed
length(unique(wbb_player_box$game_id))
## [1] 5814
nrow(wbb_player_box)
## [1] 164768
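Every section above follows the same pattern: time a loader, then count rows and unique games. If you find yourself repeating it, a small wrapper can help. The sketch below is hypothetical and not part of wehoop: `load_and_report()` and `stub_loader()` are names I made up, and the stub exists only so the example runs offline. The commented-out call at the end assumes the wehoop loaders accept a `seasons` argument; check the package reference for the exact signature before relying on it.

```r
# Hypothetical helper (not part of wehoop): time a loader call,
# then count rows and unique games in the result.
load_and_report <- function(loader, ...) {
  elapsed <- system.time(data <- loader(...))[["elapsed"]]
  list(
    data    = data,
    rows    = nrow(data),
    games   = length(unique(data$game_id)),
    seconds = elapsed
  )
}

# Stub loader so the sketch runs without a network connection.
# It returns two games of two rows each for every requested season.
stub_loader <- function(seasons) {
  data.frame(
    season  = rep(seasons, each = 4),
    game_id = rep(seq_along(seasons) * 100L, each = 4) +
      rep(c(1L, 1L, 2L, 2L), times = length(seasons))
  )
}

report <- load_and_report(stub_loader, seasons = 2021:2022)
message(report$rows, " rows from ", report$games, " games")

# With wehoop installed, and assuming the loader takes `seasons`:
# report <- load_and_report(wehoop::load_wnba_pbp, seasons = 2021:2022)
```

Passing the loader as a function keeps the helper independent of any one dataset, so the same wrapper can serve the WNBA and women's college basketball loaders alike.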