
Getting Started with wehoop
Saiem Gilani
Source: vignettes/getting-started-wehoop.Rmd
Welcome folks,
I’m Saiem Gilani, one of the authors of wehoop, and I hope to give the community a high-quality resource for accessing women’s basketball data for statistical analysis, basketball research, and more. I am excited to show you some of what you can do with this edition of the package.
Installing R and RStudio
- Head to https://cran.r-project.org
- Select the appropriate link for your operating system (Windows, Mac OS X, or Linux)
  - Windows - Select base and download the most recent version
  - Mac OS X - Select Latest Release, but first check that your OS version is supported. Look through Binaries for Legacy OS X Systems if you are on an older release
  - Linux - Select the appropriate distro and follow the installation instructions
- Head to RStudio.com
- Follow the associated download and installation instructions for RStudio.
- Start poring over the RStudio IDE Cheatsheet. An IDE is an integrated development environment.
- For Windows users: I recommend you install Rtools. This is not an R package! It is “a collection of resources for building packages for R under Microsoft Windows, or for building R itself”. Go to https://cran.r-project.org/bin/windows/Rtools/ and follow the directions for installation.
Install wehoop
# Install pacman if it is missing, then use it to install (if needed) and load the packages:
if (!requireNamespace('pacman', quietly = TRUE)){
  install.packages('pacman')
}
pacman::p_load(wehoop, dplyr, glue, tictoc, progressr)
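If you would rather not depend on pacman, the same check can be sketched in base R. This is a minimal sketch, not the package's recommended route: `missing_packages()` is a hypothetical helper introduced here for illustration, and the install call is left commented out because it assumes every package is available from your configured repository.

```r
# Packages used in this vignette (taken from the p_load() call above).
pkgs <- c("wehoop", "dplyr", "glue", "tictoc", "progressr")

# Hypothetical helper: return the subset of `pkgs` that is not installed.
# requireNamespace() returns FALSE (without erroring) for missing packages.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

to_install <- missing_packages(pkgs)
if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from your repository
}
```

Once everything is installed, `library(wehoop)` (and likewise for the other packages) attaches each one, which is the step `p_load()` performs for you.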
Quick Start
WNBA full play-by-play seasons (2002-2022) ~ 30-60 seconds
tictoc::tic()
progressr::with_progress({
  wnba_pbp <- wehoop::load_wnba_pbp()
})
tictoc::toc()
## 0.519 sec elapsed
## 13.91 sec elapsed
glue::glue("{nrow(wnba_pbp)} rows of WNBA play-by-play data from {length(unique(wnba_pbp$game_id))} games.")
## 13846 rows of WNBA play-by-play data from 35 games.
## 1782985 rows of WNBA play-by-play data from 4674 games.
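The call above uses the loader's defaults. To request a specific set of seasons, a wrapper along the following lines may help. It is a sketch under two assumptions: that `load_wnba_pbp()` accepts a `seasons` argument (check `?wehoop::load_wnba_pbp` to confirm), and that returning `NULL` on failure suits your workflow. `load_wnba_pbp_seasons()` is a hypothetical name, not a wehoop function.

```r
# Hypothetical wrapper (not part of wehoop): load several WNBA seasons of
# play-by-play, returning NULL instead of stopping when the load fails.
load_wnba_pbp_seasons <- function(seasons) {
  # Validate input first so mistakes surface before any download starts.
  stopifnot(is.numeric(seasons), length(seasons) > 0, !anyNA(seasons))

  if (!requireNamespace("wehoop", quietly = TRUE)) {
    message("wehoop is not installed; returning NULL.")
    return(NULL)
  }

  tryCatch(
    # Assumption: `seasons` is the argument that selects which seasons to load.
    wehoop::load_wnba_pbp(seasons = seasons),
    error = function(e) {
      message("Could not load play-by-play: ", conditionMessage(e))
      NULL
    }
  )
}
```

For example, `load_wnba_pbp_seasons(2020:2022)` would request three seasons, and `2002:2022` would cover the full range named in the heading. Expect the larger request to take noticeably longer.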
dplyr::glimpse(wnba_pbp)
## Rows: 13,846
## Columns: 42
## $ shooting_play <lgl> FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRU…
## $ sequence_number <chr> "4", "7", "9", "12", "14", "15", "16", "17",…
## $ period_display_value <chr> "1st Quarter", "1st Quarter", "1st Quarter",…
## $ period_number <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ home_score <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 4, 4, 4, 4,…
## $ coordinate_x <int> -214748340, 38, -214748340, 37, 37, 32, 12, …
## $ coordinate_y <int> -214748365, 24, -214748365, 9, 9, 3, 0, 0, 2…
## $ scoring_play <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
## $ clock_display_value <chr> "10:00", "9:47", "9:33", "9:31", "9:31", "9:…
## $ team_id <chr> "5", "16", "5", "5", "5", "5", "16", "5", "5…
## $ type_id <chr> "615", "45", "615", "92", "156", "70", "136"…
## $ type_text <chr> "Jumpball", "Personal Foul", "Jumpball", "Ju…
## $ away_score <int> 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 5, 5, 7, 7, 8,…
## $ id <dbl> 4013916504, 4013916507, 4013916509, 40139165…
## $ text <chr> "Queen Egbo vs. Myisha Hines-Allen (Destanni…
## $ score_value <int> 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 3, 2, 2, 0, 1,…
## $ participants_0_athlete_id <chr> "4398779", "3142055", "4398776", "4398751", …
## $ participants_1_athlete_id <chr> "3142055", NA, "3146151", "3058893", NA, NA,…
## $ participants_2_athlete_id <chr> "4398751", NA, "4398751", NA, NA, NA, NA, NA…
## $ season <int> 2022, 2022, 2022, 2022, 2022, 2022, 2022, 20…
## $ season_type <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ game_id <int> 401391650, 401391650, 401391650, 401391650, …
## $ away_team_id <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,…
## $ away_team_name <chr> "Indiana", "Indiana", "Indiana", "Indiana", …
## $ away_team_mascot <chr> "Fever", "Fever", "Fever", "Fever", "Fever",…
## $ away_team_abbrev <chr> "IND", "IND", "IND", "IND", "IND", "IND", "I…
## $ away_team_name_alt <chr> "Indiana", "Indiana", "Indiana", "Indiana", …
## $ home_team_id <int> 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, …
## $ home_team_name <chr> "Washington", "Washington", "Washington", "W…
## $ home_team_mascot <chr> "Mystics", "Mystics", "Mystics", "Mystics", …
## $ home_team_abbrev <chr> "WSH", "WSH", "WSH", "WSH", "WSH", "WSH", "W…
## $ home_team_name_alt <chr> "Washington", "Washington", "Washington", "W…
## $ home_team_spread <dbl> 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5,…
## $ game_spread <dbl> 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5,…
## $ home_favorite <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TR…
## $ clock_minutes <chr> "10", "9", "9", "9", "9", "9", "9", "9", "8"…
## $ clock_seconds <chr> "00", "47", "33", "31", "31", "31", "11", "0…
## $ half <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1",…
## $ lag_half <chr> NA, "1", "1", "1", "1", "1", "1", "1", "1", …
## $ lead_half <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1",…
## $ game_play_number <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ type_abbreviation <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
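Two things in the output above are worth handling before analysis: the very large negative values in `coordinate_x` and `coordinate_y`, and the clock stored as character columns. The sketch below works on a few toy rows copied from the glimpse so it runs without downloading anything. Treating the large negative coordinates as "no location recorded" is an assumption, not something the package documents here, and `replace_placeholder()` and `seconds_left_in_period` are names made up for this example.

```r
# Toy rows copied from the first four plays in the glimpse() output above.
pbp <- data.frame(
  coordinate_x  = c(-214748340L, 38L, -214748340L, 37L),
  coordinate_y  = c(-214748365L, 24L, -214748365L, 9L),
  clock_minutes = c("10", "9", "9", "9"),
  clock_seconds = c("00", "47", "33", "31"),
  stringsAsFactors = FALSE
)

# Assumption: coordinates far below zero are placeholders for a missing
# shot location. Hypothetical helper that turns them into NA.
replace_placeholder <- function(x, cutoff = -1e6) {
  x[which(x < cutoff)] <- NA  # which() skips any values that are already NA
  x
}

pbp$coordinate_x <- replace_placeholder(pbp$coordinate_x)
pbp$coordinate_y <- replace_placeholder(pbp$coordinate_y)

# Convert the character clock columns to seconds remaining in the period.
pbp$seconds_left_in_period <-
  as.numeric(pbp$clock_minutes) * 60 + as.numeric(pbp$clock_seconds)

print(pbp)
```

The same two steps apply unchanged to the full `wnba_pbp` data frame, since they only rely on the column names shown in the glimpse.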
WNBA full team box score seasons (2003-2022) ~ 5-30 seconds
tictoc::tic()
progressr::with_progress({
  wnba_team_box <- wehoop::load_wnba_team_box()
})
tictoc::toc()
## 0.021 sec elapsed
glue::glue("{nrow(wnba_team_box)} rows of WNBA team boxscore data from {length(unique(wnba_team_box$game_id))} games.")
## 70 rows of WNBA team boxscore data from 35 games.
dplyr::glimpse(wnba_team_box)
## Rows: 70
## Columns: 40
## $ team_short_display_name <chr> "Fever"…
## $ team_uid <chr> "s:40~l…
## $ team_alternate_color <chr> "08265b…
## $ team_color <chr> "e2382f…
## $ team_display_name <chr> "Indian…
## $ team_name <chr> "Fever"…
## $ team_logo <chr> "https:…
## $ team_location <chr> "Indian…
## $ team_id <chr> "5", "1…
## $ team_abbreviation <chr> "IND", …
## $ team_slug <chr> "indian…
## $ field_goals_made_field_goals_attempted <chr> "30-72"…
## $ field_goal_pct <chr> "41.7",…
## $ three_point_field_goals_made_three_point_field_goals_attempted <chr> "5-22",…
## $ three_point_field_goal_pct <chr> "22.7",…
## $ free_throws_made_free_throws_attempted <chr> "5-8", …
## $ free_throw_pct <chr> "62.5",…
## $ total_rebounds <chr> "42", "…
## $ offensive_rebounds <chr> "12", "…
## $ defensive_rebounds <chr> "30", "…
## $ assists <chr> "17", "…
## $ steals <chr> "3", "4…
## $ blocks <chr> "4", "5…
## $ turnovers <chr> "14", "…
## $ team_turnovers <chr> "4", "1…
## $ total_turnovers <chr> "18", "…
## $ technical_fouls <chr> "0", "0…
## $ total_technical_fouls <chr> "0", "0…
## $ flagrant_fouls <chr> "0", "0…
## $ fouls <chr> "20", "…
## $ largest_lead <chr> "4", "2…
## $ home_away <chr> "HOME",…
## $ opponent_id <int> 16, 5, …
## $ opponent_name <chr> "Washin…
## $ opponent_mascot <chr> "Mystic…
## $ opponent_abbrev <chr> "WSH", …
## $ game_id <int> 4013916…
## $ season <int> 2022, 2…
## $ season_type <int> 2, 2, 2…
## $ game_date <date> 2022-0…
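Note that most statistics in the team box score arrive as character columns, and the shooting totals are combined "made-attempted" strings such as "30-72". A minimal base R sketch of splitting them into numbers follows. It uses one toy row built from the first values in the glimpse above, and `split_made_attempted()` is a hypothetical helper, not a wehoop function.

```r
# One toy row built from the first values shown in the glimpse() output above.
team_box <- data.frame(
  team_abbreviation = "IND",
  field_goals_made_field_goals_attempted = "30-72",
  field_goal_pct = "41.7",
  total_rebounds = "42",
  stringsAsFactors = FALSE
)

# Hypothetical helper: split "made-attempted" strings into numeric columns.
split_made_attempted <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)
  data.frame(
    made      = vapply(parts, function(p) as.numeric(p[1]), numeric(1)),
    attempted = vapply(parts, function(p) as.numeric(p[2]), numeric(1))
  )
}

fg <- split_made_attempted(team_box$field_goals_made_field_goals_attempted)
team_box$field_goals_made      <- fg$made
team_box$field_goals_attempted <- fg$attempted

# Plain counting stats only need a type conversion.
team_box$total_rebounds <- as.numeric(team_box$total_rebounds)

# Sanity check: the recomputed percentage should match field_goal_pct.
team_box$fg_pct_check <- round(100 * fg$made / fg$attempted, 1)
print(team_box)
```

Recomputing the percentage from the split columns is a cheap way to confirm the parse worked before applying it to the three-point and free-throw columns.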
WNBA full player box score seasons (2002-2022) ~ 5-30 seconds
tictoc::tic()
progressr::with_progress({
  wnba_player_box <- wehoop::load_wnba_player_box()
})
tictoc::toc()
## 0.029 sec elapsed
length(unique(wnba_player_box$game_id))
## [1] 35
nrow(wnba_player_box)
## [1] 663
Women’s college basketball full play-by-play seasons (2004-2022) ~ 45-90 seconds
tictoc::tic()
progressr::with_progress({
  wbb_pbp <- wehoop::load_wbb_pbp()
})
tictoc::toc()
## 17.715 sec elapsed
length(unique(wbb_pbp$game_id))
## [1] 5347
nrow(wbb_pbp)
## [1] 1760447
Women’s college basketball full team box score seasons (2006-2022) ~ 5-30 seconds
tictoc::tic()
progressr::with_progress({
  wbb_team_box <- wehoop::load_wbb_team_box()
})
tictoc::toc()
## 0.168 sec elapsed
length(unique(wbb_team_box$game_id))
## [1] 5509
nrow(wbb_team_box)
## [1] 11018
Women’s college basketball full player box score seasons (2006-2022) ~ 5-30 seconds
tictoc::tic()
progressr::with_progress({
  wbb_player_box <- wehoop::load_wbb_player_box()
})
tictoc::toc()
## 1.056 sec elapsed
length(unique(wbb_player_box$game_id))
## [1] 5508
nrow(wbb_player_box)
## [1] 111857