
Getting Started with wehoop
Saiem Gilani
Source: vignettes/getting-started-wehoop.Rmd
Welcome folks,
I’m Saiem Gilani, one of the authors of wehoop, and I hope to give the community a high-quality resource for accessing women’s basketball data for statistical analysis, basketball research, and more. I am excited to show you some of what you can do with this edition of the package.
Installing R and RStudio
- Head to https://cran.r-project.org
- Select the appropriate link for your operating system (Windows, Mac OS X, or Linux):
  - Windows: select base and download the most recent version.
  - Mac OS X: select Latest Release, but first check that it supports your OS version. If you are on an older release, look through Binaries for Legacy OS X Systems.
  - Linux: select the appropriate distro and follow the installation instructions.
- Head to Posit.co
- Follow the associated download and installation instructions for RStudio.
- Start peering over the RStudio IDE Cheatsheet. An IDE is an integrated development environment.
- For Windows users: I recommend you install Rtools. This is not an R package! It is “a collection of resources for building packages for R under Microsoft Windows, or for building R itself”. Go to https://cran.r-project.org/bin/windows/Rtools/ and follow the directions for installation.
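Once everything is installed, a quick check from the R console can confirm the setup. This is a minimal sketch and not part of wehoop: it prints the R version and operating system, and on Windows it asks whether R can find Rtools. The Rtools check is an assumption on my part that uses the pkgbuild package (not mentioned above), whose `has_build_tools()` function reports whether a build toolchain is available.

```r
# Confirm which version of R is installed
print(R.version.string)

# Report the operating system R is running on ("Windows", "Darwin" for macOS, or "Linux")
os <- Sys.info()[["sysname"]]
print(os)

# On Windows, check that Rtools is visible to R.
# Assumption: uses the pkgbuild package, installed here if it is missing.
if (os == "Windows") {
  if (!requireNamespace("pkgbuild", quietly = TRUE)) {
    install.packages("pkgbuild")
  }
  # TRUE means a working build toolchain (Rtools) was found
  print(pkgbuild::has_build_tools(debug = TRUE))
}
```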
Install wehoop
# Install pacman if needed, then use it to install (if missing) and load wehoop and the helper packages used below
if (!requireNamespace('pacman', quietly = TRUE)){
  install.packages('pacman')
}
pacman::p_load(wehoop, dplyr, glue, tictoc, progressr)
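If you would rather not add pacman as a dependency, the same setup can be done with base R. This is a minimal sketch, assuming each package is installed from your default CRAN mirror:

```r
# Packages used throughout this vignette
pkgs <- c("wehoop", "dplyr", "glue", "tictoc", "progressr")

# Install only the packages that are not already available
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
if (any(!installed)) {
  install.packages(pkgs[!installed])
}

# Attach every package, as pacman::p_load() does
invisible(lapply(pkgs, library, character.only = TRUE))
```

Keeping the install and attach steps separate makes it easier to see which package failed if an installation goes wrong.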
Quick Start
WNBA full play-by-play seasons (2002-2022) ~ 30-60 seconds
tictoc::tic()
progressr::with_progress({
  wnba_pbp <- wehoop::load_wnba_pbp()
})
tictoc::toc()
## 1.388 sec elapsed
glue::glue("{nrow(wnba_pbp)} rows of WNBA play-by-play data from {length(unique(wnba_pbp$game_id))} games.")
## 93166 rows of WNBA play-by-play data from 241 games.
dplyr::glimpse(wnba_pbp)
## Rows: 93,166
## Columns: 56
## $ id <dbl> 4013916504, 4013916507, 4013916509, 40…
## $ sequence_number <chr> "4", "7", "9", "12", "14", "15", "16",…
## $ type_id <chr> "615", "45", "615", "92", "156", "70",…
## $ type_text <chr> "Jumpball", "Personal Foul", "Jumpball…
## $ text <chr> "Queen Egbo vs. Myisha Hines-Allen (De…
## $ away_score <int> 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 5, 5, 7,…
## $ home_score <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 4, 4,…
## $ period_number <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ period_display_value <chr> "1st Quarter", "1st Quarter", "1st Qua…
## $ clock_display_value <chr> "10:00", "9:47", "9:33", "9:31", "9:31…
## $ scoring_play <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
## $ score_value <int> 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 3, 2, 2,…
## $ team_id <chr> "5", "16", "5", "5", "5", "5", "16", "…
## $ participants_0_athlete_id <chr> "4398779", "3142055", "4398776", "4398…
## $ participants_1_athlete_id <chr> "3142055", NA, "3146151", "3058893", N…
## $ participants_2_athlete_id <chr> "4398751", NA, "4398751", NA, NA, NA, …
## $ wallclock <chr> "2022-05-06T23:10:02Z", "2022-05-06T23…
## $ shooting_play <lgl> FALSE, FALSE, FALSE, TRUE, FALSE, FALS…
## $ coordinate_x <int> -214748340, 38, -214748340, 37, 37, 32…
## $ coordinate_y <int> -214748365, 24, -214748365, 9, 9, 3, 0…
## $ season <int> 2022, 2022, 2022, 2022, 2022, 2022, 20…
## $ season_type <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ away_team_id <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,…
## $ away_team_name <chr> "Indiana", "Indiana", "Indiana", "Indi…
## $ away_team_mascot <chr> "Fever", "Fever", "Fever", "Fever", "F…
## $ away_team_abbrev <chr> "IND", "IND", "IND", "IND", "IND", "IN…
## $ away_team_name_alt <chr> "Indiana", "Indiana", "Indiana", "Indi…
## $ home_team_id <int> 16, 16, 16, 16, 16, 16, 16, 16, 16, 16…
## $ home_team_name <chr> "Washington", "Washington", "Washingto…
## $ home_team_mascot <chr> "Mystics", "Mystics", "Mystics", "Myst…
## $ home_team_abbrev <chr> "WSH", "WSH", "WSH", "WSH", "WSH", "WS…
## $ home_team_name_alt <chr> "Washington", "Washington", "Washingto…
## $ home_team_spread <dbl> 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5…
## $ game_spread <dbl> 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5…
## $ home_favorite <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TR…
## $ game_spread_available <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
## $ game_id <int> 401391650, 401391650, 401391650, 40139…
## $ qtr <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ time <chr> "10:00", "9:47", "9:33", "9:31", "9:31…
## $ clock_minutes <int> 10, 9, 9, 9, 9, 9, 9, 9, 8, 8, 8, 8, 7…
## $ clock_seconds <dbl> 0, 47, 33, 31, 31, 31, 11, 7, 58, 44, …
## $ half <chr> "1", "1", "1", "1", "1", "1", "1", "1"…
## $ game_half <chr> "1", "1", "1", "1", "1", "1", "1", "1"…
## $ lag_qtr <dbl> NA, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ lead_qtr <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ lag_game_half <chr> NA, "1", "1", "1", "1", "1", "1", "1",…
## $ lead_game_half <chr> "1", "1", "1", "1", "1", "1", "1", "1"…
## $ start_quarter_seconds_remaining <int> 600, 587, 573, 571, 571, 571, 551, 547…
## $ start_half_seconds_remaining <int> 1200, 1187, 1173, 1171, 1171, 1171, 11…
## $ start_game_seconds_remaining <int> 2400, 2387, 2373, 2371, 2371, 2371, 23…
## $ game_play_number <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,…
## $ end_quarter_seconds_remaining <dbl> 600, 600, 587, 573, 571, 571, 571, 551…
## $ end_half_seconds_remaining <dbl> 1200, 1200, 1187, 1173, 1171, 1171, 11…
## $ end_game_seconds_remaining <dbl> 2400, 2400, 2387, 2373, 2371, 2371, 23…
## $ period <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ type_abbreviation <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
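The clock columns in the glimpse above look like simple arithmetic on the `time` column. For a first-quarter play, `start_quarter_seconds_remaining` is `clock_minutes * 60 + clock_seconds`, the half and game versions add the full 600-second quarters still to come, and each `end_*` value matches the previous play's `start_*` value. The sketch below reproduces the first four values shown above from the `time` strings alone. It is inferred from the printed output rather than taken from wehoop's source, and it assumes four 10-minute quarters and no overtime.

```r
# First four `time` values and their quarter, copied from the glimpse above
clock <- c("10:00", "9:47", "9:33", "9:31")
qtr <- c(1L, 1L, 1L, 1L)

# Split "MM:SS" into minutes and seconds
parts <- do.call(rbind, strsplit(clock, ":", fixed = TRUE))
minutes <- as.integer(parts[, 1])
seconds <- as.integer(parts[, 2])

quarter_len <- 600L  # assumption: 10-minute WNBA quarters, no overtime

start_quarter <- minutes * 60L + seconds
# Quarters 1 and 3 still have one full quarter left in their half
start_half <- start_quarter + ifelse(qtr %% 2L == 1L, quarter_len, 0L)
# Add every full quarter still to come in the game
start_game <- start_quarter + (4L - qtr) * quarter_len

# A play's "end" clock appears to be the previous play's "start" clock;
# the first play of the quarter ends at the full quarter length
end_quarter <- c(quarter_len, head(start_quarter, -1))

print(start_quarter)  # 600 587 573 571
print(start_half)     # 1200 1187 1173 1171
print(start_game)     # 2400 2387 2373 2371
print(end_quarter)    # 600 600 587 573
```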
WNBA full team box score seasons (2003-2022) ~ 5-30 seconds
tictoc::tic()
progressr::with_progress({
  wnba_team_box <- wehoop::load_wnba_team_box()
})
tictoc::toc()
## 0.114 sec elapsed
glue::glue("{nrow(wnba_team_box)} rows of WNBA team boxscore data from {length(unique(wnba_team_box$game_id))} games.")
## 482 rows of WNBA team boxscore data from 241 games.
dplyr::glimpse(wnba_team_box)
## Rows: 482
## Columns: 52
## $ game_id <int> 401455684, 401455684, 401455683, 401…
## $ season <int> 2022, 2022, 2022, 2022, 2022, 2022, …
## $ season_type <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, …
## $ game_date <date> 2022-09-18, 2022-09-18, 2022-09-16,…
## $ team_id <int> 17, 18, 17, 18, 18, 17, 18, 17, 18, …
## $ team_uid <chr> "s:40~l:59~t:17", "s:40~l:59~t:18", …
## $ team_slug <chr> "las-vegas-aces", "connecticut-sun",…
## $ team_location <chr> "Las Vegas", "Connecticut", "Las Veg…
## $ team_name <chr> "Aces", "Sun", "Aces", "Sun", "Sun",…
## $ team_abbreviation <chr> "LV", "CONN", "LV", "CONN", "CONN", …
## $ team_display_name <chr> "Las Vegas Aces", "Connecticut Sun",…
## $ team_short_display_name <chr> "Aces", "Sun", "Aces", "Sun", "Sun",…
## $ team_color <chr> "000000", "de6b36", "000000", "de6b3…
## $ team_alternate_color <chr> "c7cfd4", "002d62", "c7cfd4", "002d6…
## $ team_logo <chr> "https://a.espncdn.com/i/teamlogos/w…
## $ team_home_away <chr> "away", "home", "away", "home", "awa…
## $ team_score <int> 78, 71, 76, 105, 71, 85, 64, 67, 72,…
## $ team_winner <lgl> TRUE, FALSE, FALSE, TRUE, FALSE, TRU…
## $ assists <int> 18, 19, 19, 32, 19, 21, 18, 8, 25, 1…
## $ blocks <int> 3, 5, 7, 2, 2, 4, 4, 4, 7, 10, 3, 5,…
## $ defensive_rebounds <int> 28, 25, 21, 28, 28, 32, 25, 27, 37, …
## $ field_goal_pct <dbl> 45.3, 41.3, 45.0, 55.8, 42.2, 51.6, …
## $ field_goals_made <int> 29, 26, 27, 43, 27, 33, 28, 23, 27, …
## $ field_goals_attempted <int> 64, 63, 60, 77, 64, 64, 74, 58, 59, …
## $ flagrant_fouls <int> 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, …
## $ fouls <int> 14, 18, 12, 18, 16, 14, 14, 10, 19, …
## $ free_throw_pct <dbl> 71.4, 72.2, 73.3, 84.6, 66.7, 76.5, …
## $ free_throws_made <int> 10, 13, 11, 11, 14, 13, 3, 16, 16, 7…
## $ free_throws_attempted <int> 14, 18, 15, 13, 21, 17, 5, 19, 16, 1…
## $ largest_lead <int> 10, 4, 8, 29, 2, 20, 8, 12, 9, 11, 8…
## $ offensive_rebounds <int> 4, 9, 3, 10, 6, 2, 13, 6, 6, 7, 5, 1…
## $ steals <int> 11, 7, 3, 10, 6, 5, 8, 3, 5, 13, 7, …
## $ team_turnovers <int> 1, 1, 2, 0, 1, 1, 3, 2, 0, 0, 2, 2, …
## $ technical_fouls <int> 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ three_point_field_goal_pct <dbl> 38.5, 46.2, 42.3, 53.3, 21.4, 23.1, …
## $ three_point_field_goals_made <int> 10, 6, 11, 8, 3, 6, 5, 5, 2, 8, 10, …
## $ three_point_field_goals_attempted <int> 26, 13, 26, 15, 14, 26, 16, 24, 8, 2…
## $ total_rebounds <int> 32, 34, 24, 38, 34, 34, 38, 33, 43, …
## $ total_technical_fouls <int> 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, …
## $ total_turnovers <int> 13, 16, 16, 9, 9, 8, 10, 12, 23, 13,…
## $ turnovers <int> 12, 15, 14, 9, 8, 7, 7, 10, 23, 13, …
## $ opponent_team_id <int> 18, 17, 18, 17, 17, 18, 17, 18, 19, …
## $ opponent_team_uid <chr> "s:40~l:59~t:18", "s:40~l:59~t:17", …
## $ opponent_team_slug <chr> "connecticut-sun", "las-vegas-aces",…
## $ opponent_team_location <chr> "Connecticut", "Las Vegas", "Connect…
## $ opponent_team_name <chr> "Sun", "Aces", "Sun", "Aces", "Aces"…
## $ opponent_team_abbreviation <chr> "CONN", "LV", "CONN", "LV", "LV", "C…
## $ opponent_team_display_name <chr> "Connecticut Sun", "Las Vegas Aces",…
## $ opponent_team_short_display_name <chr> "Sun", "Aces", "Sun", "Aces", "Aces"…
## $ opponent_team_color <chr> "de6b36", "000000", "de6b36", "00000…
## $ opponent_team_alternate_color <chr> "002d62", "c7cfd4", "002d62", "c7cfd…
## $ opponent_team_logo <chr> "https://a.espncdn.com/i/teamlogos/w…
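Several team box score columns can be cross-checked against one another. In the first four rows of the glimpse above, the percentage columns equal `100 * made / attempted` rounded to one decimal, and `total_rebounds` equals `offensive_rebounds + defensive_rebounds`. This small sketch works on values copied from that output. It is a sanity check you might run before computing your own rates, not a claim that these identities hold for every row.

```r
# First four rows of selected columns, copied from the glimpse above
fg_made <- c(29, 26, 27, 43)
fg_att <- c(64, 63, 60, 77)
ft_made <- c(10, 13, 11, 11)
ft_att <- c(14, 18, 15, 13)
tp_made <- c(10, 6, 11, 8)
tp_att <- c(26, 13, 26, 15)
oreb <- c(4, 9, 3, 10)
dreb <- c(28, 25, 21, 28)

# Shooting percentage on a 0-100 scale, rounded to one decimal place
pct <- function(made, att) round(100 * made / att, 1)

fg_pct <- pct(fg_made, fg_att)  # 45.3 41.3 45.0 55.8 (field_goal_pct)
ft_pct <- pct(ft_made, ft_att)  # 71.4 72.2 73.3 84.6 (free_throw_pct)
tp_pct <- pct(tp_made, tp_att)  # 38.5 46.2 42.3 53.3 (three_point_field_goal_pct)
reb <- oreb + dreb              # 32 34 24 38 (total_rebounds)

print(fg_pct)
print(ft_pct)
print(tp_pct)
print(reb)
```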
WNBA full player box score seasons (2002-2022) ~ 5-30 seconds
tictoc::tic()
progressr::with_progress({
  wnba_player_box <- wehoop::load_wnba_player_box()
})
tictoc::toc()
## 0.135 sec elapsed
length(unique(wnba_player_box$game_id))
## [1] 241
nrow(wnba_player_box)
## [1] 4559
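The loaders above are called without arguments. In current wehoop releases, that default loads only the most recent season, which is likely why the WNBA play-by-play, team box scores, and player box scores above all cover the same 241 games. To pull several seasons, pass the `seasons` argument. The sketch below loads two WNBA seasons and compares game counts between the player and team box scores. It assumes the player box score table has `season` and `game_id` columns, like the team box score table shown above.

```r
library(dplyr)

# Load two seasons instead of the default (most recent) season
wnba_player_box <- wehoop::load_wnba_player_box(seasons = 2021:2022)
wnba_team_box <- wehoop::load_wnba_team_box(seasons = 2021:2022)

# Distinct games per season in each table
# (assumption: the player box table has `season` and `game_id` columns)
player_games <- wnba_player_box %>%
  distinct(season, game_id) %>%
  count(season, name = "games")
team_games <- wnba_team_box %>%
  distinct(season, game_id) %>%
  count(season, name = "games")
print(player_games)
print(team_games)

# Games present in the player box scores but missing from the team box scores
print(setdiff(unique(wnba_player_box$game_id), unique(wnba_team_box$game_id)))
```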
Women’s college basketball full play-by-play seasons (2004-2022) ~ 45-90 seconds
tictoc::tic()
progressr::with_progress({
  wbb_pbp <- wehoop::load_wbb_pbp()
})
tictoc::toc()
## 17.675 sec elapsed
length(unique(wbb_pbp$game_id))
## [1] 5456
nrow(wbb_pbp)
## [1] 1802443
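Like the WNBA loaders, `wehoop::load_wbb_pbp()` accepts a `seasons` argument. Called without it, as above, it loads only the most recent season. Here is a minimal sketch that loads the two most recent seasons and summarizes rows and games per season. It assumes that `wehoop::most_recent_wbb_season()` returns the loaders' default season and that the college play-by-play table has a `season` column, as the WNBA play-by-play does.

```r
library(dplyr)

# Default season used by the women's college basketball loaders
latest <- wehoop::most_recent_wbb_season()

# Load the two most recent seasons of play-by-play
wbb_pbp_recent <- wehoop::load_wbb_pbp(seasons = (latest - 1):latest)

# Rows and distinct games per season (assumption: a `season` column exists)
pbp_summary <- wbb_pbp_recent %>%
  group_by(season) %>%
  summarise(rows = n(), games = n_distinct(game_id))
print(pbp_summary)
```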
Women’s college basketball full team box score seasons (2006-2022) ~ 5-30 seconds
tictoc::tic()
progressr::with_progress({
  wbb_team_box <- wehoop::load_wbb_team_box()
})
tictoc::toc()
## 0.203 sec elapsed
length(unique(wbb_team_box$game_id))
## [1] 5565
nrow(wbb_team_box)
## [1] 11130
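The counts above work out to exactly two rows per game (11,130 rows for 5,565 games), presumably one for each team. The sketch below checks that pattern game by game and lists any game that breaks it. It relies only on the `game_id` column, which is shown in the WNBA team box glimpse and assumed to exist in the college table as well.

```r
library(dplyr)

wbb_team_box <- wehoop::load_wbb_team_box()

# Number of team rows recorded for each game
rows_per_game <- wbb_team_box %>%
  count(game_id, name = "rows")

# How many games have 1, 2, 3, ... team rows
print(table(rows_per_game$rows))

# Games without exactly one row per team (empty if the pattern holds everywhere)
print(filter(rows_per_game, rows != 2))
```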
Women’s college basketball full player box score seasons (2006-2022) ~ 5-30 seconds
tictoc::tic()
progressr::with_progress({
  wbb_player_box <- wehoop::load_wbb_player_box()
})
tictoc::toc()
## 0.708 sec elapsed
length(unique(wbb_player_box$game_id))
## [1] 5567
nrow(wbb_player_box)
## [1] 112708
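Dividing the two numbers above gives roughly 20 player rows per game (112,708 / 5,567 ≈ 20.2). The sketch below breaks that figure down by team and game. It assumes the college player box table has `game_id` and `team_id` columns, as the team box scores do.

```r
library(dplyr)

wbb_player_box <- wehoop::load_wbb_player_box()

# Player rows per team per game (assumption: `game_id` and `team_id` columns exist)
per_team <- wbb_player_box %>%
  count(game_id, team_id, name = "players")

# Distribution of player rows per team per game
print(summary(per_team$players))

# Average player rows per game, comparable to nrow() divided by the game count above
print(nrow(wbb_player_box) / n_distinct(wbb_player_box$game_id))
```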