Introduction
The bouncer package provides comprehensive cricket analytics featuring:
- Ball-by-ball skill tracking: player, team, and venue skill indices updated on every delivery
- Residual-based skill indices: skills measured as deviation from context-expected performance
- Match simulation: ball-by-ball simulation using learned skill profiles
- Win probability: live in-match win probability predictions
- Player attribution: quantifying individual contributions via zero-ablation
This vignette walks you through installation, data setup, and basic usage.
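Zero-ablation, mentioned in the list above, can be pictured with a toy model. You compute an outcome using a player's skill index as estimated, compute it again with that index set to 0 ("average"), and treat the difference as the player's contribution. The sketch below is a minimal illustration under a made-up linear model (expected runs = (baseline + batter index) × balls). The function name and all numbers are hypothetical, and bouncer's actual attribution method may differ.

```r
# Toy zero-ablation: contribution = outcome with the player's skill minus
# outcome with that skill replaced by 0 ("average").
# ASSUMPTION: a made-up linear model, expected runs = (baseline + index) * balls.
# This is NOT bouncer's model; names and numbers are hypothetical.
expected_innings_runs <- function(baseline_rpb, batter_index, balls) {
  (baseline_rpb + batter_index) * balls
}

baseline_rpb <- 1.30   # hypothetical context-expected runs per ball
batter_index <- 0.05   # hypothetical batter_scoring_index
balls_faced  <- 40

with_player <- expected_innings_runs(baseline_rpb, batter_index, balls_faced)
ablated     <- expected_innings_runs(baseline_rpb, 0, balls_faced)

# Runs attributed to the batter's above-average skill (0.05 * 40 = 2)
contribution <- with_player - ablated
print(contribution)
```

Because the toy model is linear, the contribution reduces to index × balls. With a full ball-by-ball simulation, the same "set to 0 and compare" idea still applies, but the difference has to be estimated by re-running the simulation.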
Installation
Install the package from GitHub:
# install.packages("devtools")
devtools::install_github("peteowen1/bouncerverse", subdir = "bouncer")
Accessing Data
There are three ways to access bouncer data, from easiest to most powerful:
Option 1: Query Local Database
This option needs a local DuckDB database (see Option 3 for building one). Connect to it and run SQL queries:
library(bouncer)
# Connect to local database
conn <- connect_to_bouncer()
# Run SQL queries
recent_matches <- DBI::dbGetQuery(conn, "
SELECT match_id, team1, team2, match_date, match_type
FROM cricsheet.matches
WHERE match_type = 'T20'
ORDER BY match_date DESC
LIMIT 20
")
# Get top T20 batters by skill index
top_batters <- DBI::dbGetQuery(conn, "
SELECT player_id, batter_scoring_index, batter_survival_rate
FROM t20_player_skill
ORDER BY batter_scoring_index DESC
LIMIT 10
")
# Always disconnect when done
disconnect_bouncer(conn)
Or use the helper functions for common queries:
# Query matches with filters
matches <- query_matches(match_type = "T20", limit = 20)
# Query deliveries with filters
deliveries <- query_deliveries(batter_id = "V Kohli", match_type = "T20")
Option 2: Download Parquet Files
For faster repeated queries, download the parquet files locally:
# Download all parquet files (~500MB total)
install_parquets_from_release()
# Or just specific tables
install_parquets_from_release(tables = c("matches", "deliveries_short_form"))
Option 3: Build Full Local Database
For the complete experience with all features:
# Download from Cricsheet and build local DuckDB (30-60 min first time)
install_all_bouncer_data()
# Or install from GitHub Releases (faster)
install_bouncerdata_from_release()
# Check what data was loaded
get_data_info()
Available Tables
| Table | Description |
|---|---|
| cricsheet.matches | Match metadata (teams, venue, date, result) |
| cricsheet.players | Player registry |
| cricsheet.deliveries_long_form | Ball-by-ball data for Tests and first-class matches |
| cricsheet.deliveries_short_form | Ball-by-ball data for T20s and ODIs |
| t20_player_skill | T20 player skill indices |
| odi_player_skill | ODI player skill indices |
| test_player_skill | Test player skill indices |
| team_elo | Team Elo ratings |
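Ball-by-ball data is split across two tables by format, so a raw SQL query has to target the right one. A small helper like the hypothetical deliveries_table_for() below (not a bouncer function) keeps that choice in one place. It assumes Cricsheet's match_type codes (Test, MDM, ODI, ODM, T20, IT20). Beyond what the table above states, it also assumes that multi-day (MDM) matches live in the long-form table and one-day (ODM) matches in the short-form table.

```r
# Hypothetical helper (not a bouncer function): choose the deliveries table
# that holds ball-by-ball data for a given Cricsheet match_type.
# ASSUMPTION: match_type uses Cricsheet codes, with multi-day matches (MDM)
# in the long-form table and one-day matches (ODM) in the short-form table.
deliveries_table_for <- function(match_type) {
  long_form  <- c("Test", "MDM")
  short_form <- c("T20", "IT20", "ODI", "ODM")
  if (match_type %in% long_form) {
    "cricsheet.deliveries_long_form"
  } else if (match_type %in% short_form) {
    "cricsheet.deliveries_short_form"
  } else {
    stop("Unknown match_type: ", match_type)
  }
}

# Build a query string; pass it to DBI::dbGetQuery(conn, sql) once connected
sql <- paste("SELECT * FROM", deliveries_table_for("Test"), "LIMIT 10")
print(sql)
```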
Looking Up Players
Use get_player() to look up any player by name:
# Look up a player (partial name match works)
kohli <- get_player("Virat Kohli")
print(kohli)
# Get player with T20 skill indices
kohli_t20 <- get_player("Virat Kohli", format = "t20")
print(kohli_t20)
# Search for players
search_players("Smith")
Looking Up Teams
Use get_team() to get a team’s current Elo rating:
# Get team info
india <- get_team("India", format = "t20")
print(india)
# Search for teams
search_teams("Mumbai")
Analyzing Players
Get detailed player analysis with analyze_player():
# Get comprehensive player analysis
analysis <- analyze_player("Jasprit Bumrah", format = "t20")
print(analysis)
# Compare two players
comparison <- compare_players("Virat Kohli", "Steve Smith", format = "test")
print(comparison)
Player Statistics
Get detailed batting and bowling statistics:
# Get all players' batting stats (min 100 balls faced)
all_batters <- player_batting_stats()
# Get T20 batting stats for all players
t20_batters <- player_batting_stats(match_type = "T20")
# Get stats for a specific player
kohli_batting <- player_batting_stats("V Kohli")
# T20-specific batting stats for a player
kohli_t20 <- player_batting_stats("V Kohli", match_type = "T20")
# Get all players' bowling stats
all_bowlers <- player_bowling_stats()
# Bowling stats for a specific player
bumrah_bowling <- player_bowling_stats("JJ Bumrah")
# Test bowling stats for a specific player
anderson_tests <- player_bowling_stats("JM Anderson", match_type = "Test")
Team Statistics
Get aggregated team statistics:
# Get all teams' batting stats
all_teams_batting <- team_batting_stats()
# Get T20 batting stats for all teams
t20_teams <- team_batting_stats(match_type = "T20")
# Get stats for a specific team
india_batting <- team_batting_stats("India")
india_bowling <- team_bowling_stats("India")
# Head-to-head record
h2h <- head_to_head("India", "Australia")
h2h_tests <- head_to_head("England", "Australia", match_type = "Test")
Venue Statistics
Get venue-level statistics:
# Get all venues
all_venues <- venue_stats()
# Get T20 venue stats
t20_venues <- venue_stats(match_type = "T20")
# Get stats for a specific venue
mcg <- venue_stats("MCG")
Analyzing Matches
Analyze a completed match with analyze_match():
# Replace "1234567" with a real match ID (find IDs in cricsheet.matches or on Cricsheet)
match <- analyze_match("1234567")
print(match)
Predicting Match Outcomes
Use predict_match() to get the win probability for an upcoming match:
# Predict a match outcome
prediction <- predict_match("India", "Australia", format = "t20")
print(prediction)
# Compare two teams
matchup <- compare_teams("India", "Australia", format = "t20")
print(matchup)
In-Match Win Probability
Calculate live win probability during a match:
# First innings: Team at 85/2 after 10 overs
wp <- predict_win_probability(
current_score = 85,
wickets = 2,
overs = 10.0,
innings = 1,
format = "t20"
)
print(wp)
# Second innings: Chasing 180, currently 100/3 after 12.4 overs
wp <- predict_win_probability(
current_score = 100,
wickets = 3,
overs = 12.4,
innings = 2,
target = 180,
format = "t20"
)
print(wp)
Score Projection
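Both this section and the previous one describe game state in scoreboard overs, where the digit after the point counts balls. In that notation, 10.0 overs is 60 balls and 12.4 overs is 12 overs plus 4 balls (76 balls), not 12.4 × 6. The sketch below converts that notation and works out what the second-innings chase in the previous section requires. It assumes six-ball overs, a 20-over innings, and that predict_win_probability() and calculate_projected_score() read overs as scoreboard notation, which is not confirmed here. The overs_to_balls() function is a hypothetical helper, not part of bouncer.

```r
# Hypothetical helper: convert scoreboard overs (12.4 = 12 overs + 4 balls)
# into balls bowled, assuming six-ball overs.
overs_to_balls <- function(overs) {
  completed   <- floor(overs)
  extra_balls <- round((overs - completed) * 10)  # digit after the point
  if (any(extra_balls > 5)) stop("ball count after the point must be 0-5")
  as.integer(completed * 6 + extra_balls)
}

# Second-innings example: target 180, currently 100/3 after 12.4 overs
balls_bowled <- overs_to_balls(12.4)               # 76
balls_left   <- 20 * 6 - balls_bowled              # 44 (assumes a 20-over innings)
runs_needed  <- 180 - 100                          # 80
required_run_rate <- runs_needed / balls_left * 6  # runs per over
round(required_run_rate, 2)                        # 10.91
```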
Project the final innings score from any game state:
# Scoreboard shows: "India 80/3 (10.0 overs)"
projected <- calculate_projected_score(
current_score = 80,
wickets = 3, # wickets fallen
overs = 10.0, # overs bowled
format = "t20"
)
print(projected)
Visualization
The package includes ggplot2-based visualization functions:
library(ggplot2)
# Plot score progression for a match
plot_score_progression("1234567")
# Plot win probability over a match
plot_win_probability("1234567", format = "t20")
# Plot a player's skill progression over time
plot_skill_progression("Virat Kohli", format = "t20")
# Plot team Elo history
plot_elo_history("India", format = "t20")
# Compare two players visually
plot_player_comparison("Virat Kohli", "Steve Smith", format = "test")
Understanding Skill Indices
Bouncer uses residual-based skill indices that measure how a player performs relative to what’s expected given the match context. Here’s how to interpret them:
Player Skills
| Skill | Meaning | Interpretation |
|---|---|---|
| batter_scoring_index | Runs scored vs expected | +0.05 = scores ~0.05 more runs per ball than expected |
| batter_survival_rate | Survival probability per ball | 0.98 = gets out on ~2% of balls faced |
| bowler_economy_index | Runs conceded vs expected | -0.05 = concedes ~0.05 fewer runs per ball than expected |
| bowler_strike_rate | Wicket probability per ball | 0.05 = takes a wicket on ~5% of balls bowled (about one every 20 balls) |
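To make "runs scored vs expected" concrete: a residual is the runs actually scored off a ball minus the runs a context model expected off that ball, and a residual-based index summarizes those residuals. The toy sketch below uses made-up numbers, with the expected values standing in for a real context model. It shows a plain average of residuals and, because the indices are updated on every delivery, an exponentially weighted running update. The running update and its learning rate alpha are an illustrative choice, not necessarily bouncer's update rule.

```r
# Made-up runs off eight balls and the runs a context model "expected" on each
actual_runs   <- c(1, 0, 4, 1, 6, 0, 2, 1)
expected_runs <- c(1.1, 1.0, 1.2, 1.1, 1.3, 1.0, 1.2, 1.1)

# Residual = actual - expected; positive means better than the context predicts
ball_residuals <- actual_runs - expected_runs

# Batch view: average residual per ball (1.875 - 1.125 = 0.75 here)
batch_index <- mean(ball_residuals)

# Ball-by-ball view: exponentially weighted moving average of residuals.
# alpha is a hypothetical learning rate; every player starts at 0 (average).
alpha <- 0.1
running_index <- 0
for (r in ball_residuals) {
  running_index <- (1 - alpha) * running_index + alpha * r
}

print(c(batch = batch_index, running = running_index))
```

The running index moves only a fraction alpha toward each new residual, so it stays close to 0 for players with few deliveries and settles as evidence accumulates.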
Team Skills
| Skill | Meaning |
|---|---|
| batting_team_runs_skill | Team’s deviation from expected runs when batting |
| batting_team_wicket_skill | Team’s deviation from expected wicket rate when batting |
| bowling_team_runs_skill | Team’s deviation from expected runs when bowling |
| bowling_team_wicket_skill | Team’s deviation from expected wicket rate when bowling |
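Key insight: for the residual-based indices (batter_scoring_index, bowler_economy_index, and the team skills), 0 means “average”: the player or team performs exactly as the match context predicts. For batter_scoring_index, positive is better than average; for bowler_economy_index, negative is better (conceding fewer runs). The rate columns (batter_survival_rate, bowler_strike_rate) are per-ball probabilities, so read them against the examples in the table rather than against 0.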
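The rate columns map onto figures cricket followers already use. Treating each ball as an independent chance of dismissal (a simplifying assumption), a batter who gets out with probability p per ball lasts 1/p balls on average. Similarly, a bowler who takes a wicket with probability q per ball has a traditional strike rate (balls per wicket) of 1/q. The per-ball economy index scales to runs per over when multiplied by 6. The sketch below applies this to the example values from the Player Skills table.

```r
# Example values from the Player Skills table
batter_survival_rate <- 0.98
bowler_strike_rate   <- 0.05   # per-ball wicket probability
bowler_economy_index <- -0.05  # runs per ball vs expected

# Expected balls faced per dismissal: mean of a geometric distribution, 1 / p
balls_per_dismissal <- 1 / (1 - batter_survival_rate)   # about 50

# Traditional bowling strike rate: balls bowled per wicket
balls_per_wicket <- 1 / bowler_strike_rate              # about 20

# Economy index expressed per over (six-ball overs)
runs_per_over_vs_expected <- bowler_economy_index * 6   # about -0.3

print(round(c(balls_per_dismissal, balls_per_wicket, runs_per_over_vs_expected), 2))
```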
Querying the Database Directly
For advanced analysis, you can query the database directly using SQL:
# Connect to local database
conn <- connect_to_bouncer()
# Run any SQL query
ipl_matches <- DBI::dbGetQuery(conn, "
SELECT match_id, team1, team2, match_date, venue
FROM cricsheet.matches
WHERE event_name LIKE '%Indian Premier League%'
ORDER BY match_date DESC
LIMIT 50
")
# Get ball-by-ball data for a specific match
# (use cricsheet.deliveries_long_form instead for Tests and first-class matches)
deliveries <- DBI::dbGetQuery(conn, "
SELECT *
FROM cricsheet.deliveries_short_form
WHERE match_id = '1234567'
ORDER BY innings, over, ball
")
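# Aggregate T20 batting stats per batter
# (COUNT(*) counts every delivery row, so any wides stored as rows count toward balls_faced)
top_scorers <- DBI::dbGetQuery(conn, "
SELECT
batter_id,
SUM(runs_batter) as total_runs,
COUNT(*) as balls_faced,
ROUND(SUM(runs_batter) * 100.0 / COUNT(*), 2) as strike_rate
FROM cricsheet.deliveries_short_form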
WHERE match_type = 'T20'
GROUP BY batter_id
HAVING COUNT(*) > 500
ORDER BY total_runs DESC
LIMIT 20
")
# Don't forget to disconnect!
disconnect_bouncer(conn)
Or use the helper functions:
# Query specific deliveries with filters
deliveries <- query_deliveries(
batter_id = "V Kohli",
match_type = "t20",
event = "Indian Premier League",
limit = 1000
)
# Get batting stats
batting <- query_batter_stats(
batter_id = "V Kohli",
match_type = "t20"
)
# Get head-to-head stats
h2h <- query_batter_stats(
batter_id = "V Kohli",
bowler_id = "JJ Bumrah"
)
Next Steps
- Match Analysis: see vignette("match-analysis") for detailed match analysis workflows
- Predictions: see vignette("predictions") for pre-match and in-match prediction models
- Player Analysis: see vignette("player-analysis") for deep dives into player skill tracking
- Simulation: see vignette("simulation") for ball-by-ball match simulation
Getting Help
- File issues at: https://github.com/peteowen1/bouncerverse/issues
- View function documentation: ?function_name
- View all package functions: help(package = "bouncer")
