API Reference¶
Comprehensive reference for all scrapernhl functions, parameters, and return values.
Table of Contents¶
- HockeyScraper
- Constructor
- play_by_play()
- player_stats()
- schedule()
- roster()
- standings()
- teams_by_season()
- seasons()
- player_profile()
- bootstrap()
- scrape_multiple_games()
- url_for()
- fetch_raw()
- Convenience Aliases
- Bootstrap Properties and Accessors
- NHL-Only Methods
- scrape_game()
- scrape_plays()
- html_pbp()
- shifts()
- get_game_data()
- scrape_teams()
- team_stats()
- standings_by_date()
- draft()
- draft_records()
- team_draft_history()
- goal_replay()
- NHL Analytics Pipeline
- build_shifts_events()
- build_on_ice_long()
- build_on_ice_wide()
- seconds_matrix()
- strengths_by_second()
- toi_by_strength_all()
- shared_toi_teammates()
- shared_toi_opponents()
- on_ice_stats()
- combo_on_ice_stats()
- team_strength_aggregates()
- Functional API — scrape()
- CLI Reference
- Legacy NHL Scraper — scraper_legacy.py
- Importing legacy functions
- Core PBP functions
- Shift & strength pipeline
- On-ice stats & TOI
- Combination stats
- Team aggregates & expected goals
- Strength Notation
HockeyScraper¶
The unified entry point for all six leagues. Import it from the top-level package with from scrapernhl import HockeyScraper.
Constructor¶
Creates a scraper for the specified league. For non-NHL leagues, bootstrap/configuration data is fetched lazily on first access and cached in bootstrap_data.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| league | str | Yes | League code. One of: 'nhl', 'ahl', 'pwhl', 'ohl', 'whl', 'qmjhl'. Case-insensitive. |
Public attributes after init:
| Attribute | Type | Description |
|---|---|---|
| league | str | Normalised (lowercased) league code |
| config | LeagueConfig | League settings: api_key, league_id, site_id, base_url, pbp_style, canvas_size, rate limits |
| bootstrap_data | dict \| None | Pre-fetched league metadata (non-NHL only). Contains teams, seasons, divisions, conferences. None for NHL |
Raises: KeyError if the league is not supported.
Example:
from scrapernhl import HockeyScraper
nhl = HockeyScraper('nhl')
ahl = HockeyScraper('ahl')
pwhl = HockeyScraper('PWHL') # case-insensitive
play_by_play()¶
play_by_play(
game_id: int,
nhlify: bool = True,
raw: bool = False,
season_id: int | None = None,
is_playoff: bool = False,
) -> pd.DataFrame | dict
Fetches and parses play-by-play events for a single game. Works across all six leagues. For NHL, the output comes from the JSON API; for non-NHL leagues, events are sourced from the HockeyTech feed and normalized into a shared schema.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| game_id | int | — | League-specific game ID (e.g. 2023020001 for NHL, 1027781 for AHL) |
| nhlify | bool | True | For HockeyTech leagues (AHL/PWHL/OHL/WHL/QMJHL): merge separate shot and goal rows that occur at the same timestamp into a single goal row, matching NHL PBP structure. Set False to keep raw rows |
| raw | bool | False | If True, return the unprocessed API response dict instead of a DataFrame |
| season_id | int \| None | None | League-specific season ID (e.g. 90 for AHL 2024-25). Used to resolve the correct overtime period length for leagues that have changed OT formats over time. Defaults to the league's default_season |
| is_playoff | bool | False | Set True for playoff games so overtime game_seconds are computed with 20-minute periods instead of the regular-season OT length |
Returns: pd.DataFrame with one row per event, or dict if raw=True.
Key output columns (all leagues):
| Column | Description |
|---|---|
| event | Event type string (e.g. 'shot', 'goal', 'penalty', 'faceoff') |
| period | Period number (4+ = overtime) |
| time | Clock time in the period (MM:SS) |
| elapsedTime | Elapsed time since period start (MM:SS) |
| time_seconds | Period clock as seconds from period start |
| game_seconds | Total elapsed game seconds |
| x, y | Ice coordinates in feet, centered at the neutral zone dot |
| shot_distance_ft | Distance from net in feet (shot/goal events) |
| shot_angle_deg | Shot angle in degrees from goal line (shot/goal events) |
| score_home, score_away | Cumulative score at the time of the event |
| game_id | Game ID |
| league | League code |
| scraped_at | UTC timestamp when data was fetched |
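The relationship between period, time_seconds, and game_seconds, including the effect of is_playoff, can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name, the 20-minute regulation period, and the 5-minute regular-season overtime default are assumptions made for the example.

```python
REGULATION_PERIOD_SECONDS = 20 * 60


def game_seconds(period, time_seconds, ot_period_seconds=5 * 60, is_playoff=False):
    """Elapsed game seconds for one event (illustrative helper, not library code)."""
    if is_playoff:
        # Playoff overtime is played in full 20-minute periods.
        ot_period_seconds = REGULATION_PERIOD_SECONDS
    regulation_periods = min(period - 1, 3)   # completed regulation periods
    overtime_periods = max(period - 4, 0)     # completed overtime periods
    return (regulation_periods * REGULATION_PERIOD_SECONDS
            + overtime_periods * ot_period_seconds
            + time_seconds)


print(game_seconds(2, 90))                   # 1290
print(game_seconds(4, 30))                   # 3630
print(game_seconds(5, 30, is_playoff=True))  # 4830
```

The ot_period_seconds argument stands in for whatever overtime length season_id resolves to for a given league.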
Example:
from scrapernhl import HockeyScraper
ahl = HockeyScraper('ahl')
pbp = ahl.play_by_play(1027781)
print(pbp.shape) # (61, 67)
print(pbp['event'].value_counts())
# shot 22
# faceoff 14
# penalty 7
# goalie_change 4
# ...
# First few events
print(pbp[['event', 'period', 'time', 'x', 'y']].head())
# event period time x y
# 0 goalie_change 1 0:00 NaN NaN
# 1 goalie_change 1 0:00 NaN NaN
# 2 penalty 1 1:38 NaN NaN
# 3 shot 1 1:46 -53.65 -3.19
# 4 shot 1 2:03 9.88 -29.32
Aliases: scrape_pbp(), scrape_game_pbp()
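The nhlify merge described above (a shot row and a goal row at the same timestamp collapsing into one goal row) can be sketched with plain dicts. This is an assumption-laden illustration: the helper name, the matching key (period and time only), and the rule that shot fields fill gaps in the goal row are choices made for the example, not the library's documented behaviour.

```python
def merge_shot_goal_rows(events):
    """Fold a shot row into the goal row recorded at the same period and time."""
    goal_keys = {(e["period"], e["time"]) for e in events if e["event"] == "goal"}
    shots_at_goal = {
        (e["period"], e["time"]): e
        for e in events
        if e["event"] == "shot" and (e["period"], e["time"]) in goal_keys
    }
    merged = []
    for e in events:
        key = (e["period"], e["time"])
        if e["event"] == "shot" and key in shots_at_goal:
            continue  # this shot is absorbed into the goal row
        if e["event"] == "goal" and key in shots_at_goal:
            # Shot fields fill the gaps; non-null goal fields win on conflict.
            goal_fields = {k: v for k, v in e.items() if v is not None}
            e = {**shots_at_goal[key], **goal_fields}
        merged.append(e)
    return merged


events = [
    {"event": "shot", "period": 1, "time": "1:46", "x": -53.65, "y": -3.19},
    {"event": "shot", "period": 1, "time": "2:03", "x": 9.88, "y": -29.32},
    {"event": "goal", "period": 1, "time": "2:03", "x": None, "y": None},
]
merged = merge_shot_goal_rows(events)
print([e["event"] for e in merged])  # ['shot', 'goal']
print(merged[1]["x"], merged[1]["y"])  # 9.88 -29.32
```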
player_stats()¶
player_stats(
season: int | None = None,
team: str = 'all',
position: Literal['skaters', 'goalies'] = 'skaters',
raw: bool = False,
**filters,
) -> pd.DataFrame | dict
Fetches player statistics for a season. Supports skaters and goalies for all six leagues.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| season | int \| None | None | League-specific season ID (e.g. 20232024 for NHL, 90 for AHL). Defaults to config.default_season |
| team | str | 'all' | Filter by team. Use 'all' for every team, or a team abbreviation / numeric ID (e.g. 'MTL' for NHL, '5' for AHL) |
| position | str | 'skaters' | 'skaters' for forward/defence stats, 'goalies' for goaltender stats |
| raw | bool | False | If True, return the unprocessed API response dict |
| **filters | | | Additional query parameters forwarded to the API (league-dependent) |
Returns: pd.DataFrame or dict if raw=True.
Key output columns (non-NHL skaters):
| Column | Description |
|---|---|
| player_id | HockeyTech player ID |
| name | Full player name |
| position | Position code |
| teamCode | Team abbreviation |
| GP | Games played |
| G | Goals |
| A | Assists |
| PTS | Points |
| PTS/GP | Points per game |
| +/- | Plus/minus |
| PIM | Penalty minutes |
| PPG, SHG | Power-play / shorthanded goals |
| season, league | Season ID and league code |
| seasonName | Human-readable season name (e.g. '2024-25') |
Notes:
- For NHL, team must be a valid tricode (e.g. 'MTL'); the NHL stats endpoint is per-team only. To get league-wide skater stats, iterate over all teams.
- Non-NHL results are league-wide when team='all'.
Example:
from scrapernhl import HockeyScraper
ahl = HockeyScraper('ahl')
# All AHL skaters
skaters = ahl.player_stats(season=90, position='skaters')
print(skaters.shape) # (1089, 32)
print(skaters[['name', 'GP', 'G', 'A', 'PTS']].head(3))
# name GP G A PTS
# 0 Laurent Dauphin 49 15 41 56
# 1 Jakob Pelletier 45 20 33 53
# 2 Alex Barré-Boulet 49 17 35 52
# AHL goalies
goalies = ahl.player_stats(season=90, position='goalies')
# NHL — per-team only
nhl = HockeyScraper('nhl')
mtl_skaters = nhl.player_stats(team='MTL', season=20232024, position='skaters')
Aliases: scrape_skaters() (calls with position='skaters'), scrape_goalies() (calls with position='goalies')
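Because the NHL endpoint is per-team, league-wide NHL stats mean one call per tricode. A minimal sketch of that loop follows, using a stand-in fetch function so it runs offline. The helper name league_wide_stats and the fetch_team callable are hypothetical; with the real scraper, fetch_team would presumably wrap nhl.player_stats(team=code, ...) and convert the result to a list of row dicts.

```python
def league_wide_stats(fetch_team, tricodes):
    """Collect per-team rows into one list, tagging each row with its team."""
    rows = []
    for code in tricodes:
        for row in fetch_team(code):
            rows.append({**row, "team": code})
    return rows


def fake_fetch(code):
    # Stand-in for a per-team stats request; returns one made-up row per team.
    return [{"name": f"{code} skater", "PTS": 10}]


rows = league_wide_stats(fake_fetch, ["MTL", "TOR"])
print(len(rows))                   # 2
print([r["team"] for r in rows])   # ['MTL', 'TOR']
```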
schedule()¶
schedule(
team: str = 'all',
season: int | None = None,
raw: bool = False,
**filters,
) -> pd.DataFrame | dict
Fetches the game schedule. Returns every game in the season for the given team, or all teams when team='all'.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| team | str | 'all' | Team filter. Use 'all' for the full league schedule, or a team abbreviation / numeric ID |
| season | int \| None | None | Season ID. Defaults to config.default_season |
| raw | bool | False | If True, return the unprocessed API response dict |
| **filters | | | Additional query parameters forwarded to the API |
Returns: pd.DataFrame or dict if raw=True.
Key output columns (non-NHL):
| Column | Description |
|---|---|
| gameId | Game ID |
| date | Game date string |
| homeCity, awayCity | City names for home and away teams |
| homeScore, awayScore | Final scores (blank if not yet played) |
| gameStatus | Game status (e.g. 'Final', 'Scheduled') |
| venue | Arena name |
| homeTeamName, awayTeamName | Full team names (added by enrichment) |
| season, league | Season ID and league code |
Example:
from scrapernhl import HockeyScraper
ahl = HockeyScraper('ahl')
# Full AHL schedule
schedule = ahl.schedule(season=90)
print(len(schedule)) # ~1400 games
# Specific team
# First, find team ID from bootstrap
teams = ahl.get_teams()
print([(t['id'], t['name']) for t in teams[:3]])
# NHL schedule
nhl = HockeyScraper('nhl')
mtl = nhl.schedule(team='MTL', season=20232024)
print(mtl.shape) # (82, ...)
Alias: scrape_schedule()
roster()¶
Fetches the player roster for a specific team.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| team | str | — | Required. Team identifier. For NHL: tricode (e.g. 'MTL'). For non-NHL: numeric team ID string (e.g. '390'). Non-numeric codes are automatically resolved to numeric IDs via bootstrap data |
| season | int \| None | None | Season ID. Defaults to config.default_season |
| raw | bool | False | If True, return the unprocessed API response dict |
Returns: pd.DataFrame or dict if raw=True.
Key output columns (non-NHL):
| Column | Description |
|---|---|
| player_id | HockeyTech player ID |
| name | Full player name |
| firstName, lastName | Split name components |
| position | Position code |
| tp_jersey_number | Jersey number |
| birthdate | Date of birth |
| h, w | Height (cm) and weight (lbs) |
| height_cm | Parsed height in centimeters |
| birthCity, birthCountry | Birthplace |
| shoots / catches | Shooting/catching hand (skater/goalie) |
| teamId, teamName, teamCode | Team metadata |
| season, league | Season and league context |
Notes:
- For non-NHL leagues, team is expected to be a numeric ID string. Retrieve valid IDs from scraper.get_teams().
- Team code abbreviations (e.g. 'Rou' for Rouyn-Noranda in QMJHL) are automatically resolved to numeric IDs when passed to this method.
Example:
from scrapernhl import HockeyScraper
# OHL roster
ohl = HockeyScraper('ohl')
teams = ohl.get_teams()
print(teams[0]) # {'id': '7', 'name': 'Barrie Colts', ...}
roster = ohl.roster(team='7')
print(roster[['name', 'position', 'tp_jersey_number']].head(5))
# name position tp_jersey_number
# 0 Nicholas Desiderio LW 11
# 1 William Schneid RW 29
# ...
# NHL roster
nhl = HockeyScraper('nhl')
mtl_roster = nhl.roster(team='MTL', season=20232024)
Alias: scrape_roster()
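The automatic resolution of team codes to numeric IDs noted above amounts to a case-insensitive lookup in the teams list. The following is a rough sketch under stated assumptions: the helper name is hypothetical, and the 'code' key on each team dict is assumed for illustration (only 'id' and 'name' appear in the documented get_teams() output).

```python
def resolve_team_id(team, teams):
    """Return a numeric team ID string for either an ID or an abbreviation."""
    if team.isdigit():
        return team  # already a numeric ID string
    wanted = team.casefold()
    for entry in teams:
        # 'code' is an assumed key name for the team abbreviation.
        if str(entry.get("code", "")).casefold() == wanted:
            return str(entry["id"])
    raise KeyError(f"unknown team code: {team!r}")


teams = [
    {"id": "7", "name": "Barrie Colts", "code": "BAR"},
    {"id": "390", "name": "San Diego Gulls", "code": "SD"},
]
print(resolve_team_id("bar", teams))  # 7
print(resolve_team_id("390", teams))  # 390
```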
standings()¶
Fetches current or historical league standings.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| season | int \| None | None | Season ID. Defaults to config.default_season |
| raw | bool | False | If True, return the unprocessed API response dict |
| **filters | | | Additional query parameters (e.g. context='home' for home standings on HockeyTech leagues) |
Returns: pd.DataFrame or dict if raw=True.
Key output columns (non-NHL):
| Column | Description |
|---|---|
| teamName | Full team name |
| teamId | Numeric team ID |
| team_code | Team abbreviation |
| wins | Wins |
| losses | Losses |
| ot_losses | Overtime losses |
| points | Points |
| games_played | Games played |
| goals_for, goals_against | Goals for / against |
| percentage | Points percentage |
| regulation_wins | Regulation wins (ROW/ROW equivalent) |
| streak | Current streak |
| past_10 | Last 10 games record |
| rank | Division rank |
| overall_rank | League-wide rank |
| season, league, seasonName | Season and league metadata |
Example:
from scrapernhl import HockeyScraper
ahl = HockeyScraper('ahl')
standings = ahl.standings()
print(standings.shape) # (32, 28)
print(standings[['teamName', 'wins', 'losses', 'points']].head(3))
# teamName wins losses points
# 0 Providence Bruins 38 10 77
# 1 Wilkes-Barre/Scranton Penguins 35 13 75
# 2 Charlotte Checkers 29 17 61
# Filter by context (HockeyTech leagues)
home_standings = ahl.standings(context='home')
away_standings = ahl.standings(context='away')
Alias: scrape_standings()
teams_by_season()¶
Returns a DataFrame of teams that participated in a specific season. For NHL, pulls from the standings endpoint. For non-NHL, reads from bootstrap data.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| season | int \| None | None | Season ID. Defaults to config.default_season |
| raw | bool | False | If True, return the unprocessed API response (NHL: dict, non-NHL: list) |
Returns: pd.DataFrame with columns season and league always present, or raw dict/list if raw=True.
Example:
from scrapernhl import HockeyScraper
nhl = HockeyScraper('nhl')
teams = nhl.teams_by_season(season=20232024)
print(teams.shape) # (32, ...)
ahl = HockeyScraper('ahl')
teams = ahl.teams_by_season(season=90)
print(teams[['name', 'id', 'season']].head(3))
seasons()¶
seasons(season_type: Literal['all', 'regular', 'playoff'] = 'all', raw: bool = False) -> pd.DataFrame | dict | list
Returns available seasons for the league. For NHL, fetches from the NHL seasons endpoint. For non-NHL, reads from bootstrap data (and discovers hidden playoff/preseason IDs for QMJHL and WHL by probing ID gaps).
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| season_type | str | 'all' | Which seasons to return: 'all', 'regular', or 'playoff' |
| raw | bool | False | If True, return the unprocessed API response dict or list |
Returns: pd.DataFrame with league column always present, or raw dict/list if raw=True.
Example:
from scrapernhl import HockeyScraper
ahl = HockeyScraper('ahl')
all_seasons = ahl.seasons('all')
regular = ahl.seasons('regular')
playoffs = ahl.seasons('playoff')
print(all_seasons[['id', 'name']].tail(5))
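The "probing ID gaps" step mentioned for QMJHL and WHL can be pictured as: list the integer IDs missing between known regular-season IDs, then keep those for which the API returns data. The sketch below is hypothetical throughout; the helper names and the has_data callable (a stand-in for a real request) are assumptions, and the library's actual discovery logic may differ.

```python
def gap_candidates(known_ids):
    """IDs that fall strictly between consecutive known season IDs."""
    known = sorted(set(known_ids))
    candidates = []
    for low, high in zip(known, known[1:]):
        candidates.extend(range(low + 1, high))
    return candidates


def discover_hidden_ids(known_ids, has_data):
    """Keep only the gap IDs for which the probe reports data."""
    return [sid for sid in gap_candidates(known_ids) if has_data(sid)]


print(gap_candidates([85, 88, 90]))  # [86, 87, 89]
# The lambda stands in for an HTTP probe against the league API.
print(discover_hidden_ids([85, 88, 90], lambda sid: sid in {86, 89}))  # [86, 89]
```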
player_profile()¶
player_profile(
player_id: int,
season: int | None = None,
stats_type: str = 'standard',
raw: bool = False,
) -> dict
Fetches a player's full profile page from the HockeyTech API. Non-NHL leagues only. For NHL player data, use url_for('player_profile', player_id=...) or fetch_raw('player_profile', player_id=...).
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| player_id | int | — | HockeyTech player ID (obtain from player_stats() player_id column) |
| season | int \| None | None | Season ID. Defaults to config.default_season |
| stats_type | str | 'standard' | 'standard' for full stats profile, 'bio' for biographical data only |
| raw | bool | False | If True, return the unprocessed API response dict |
Returns: dict with keys:
| Key | Description |
|---|---|
| info | Bio data (name, position, birthdate, height, weight, etc.) |
| careerStats | Career stats aggregated by season |
| seasonStats | Stats for the requested season only |
| gameByGame | Game-by-game log for the season |
| shotLocations | Shot location data (if available for the league) |
Raises: NotImplementedError if called on an NHL scraper.
Example:
from scrapernhl import HockeyScraper
ahl = HockeyScraper('ahl')
# Get player ID first
stats = ahl.player_stats(season=90)
player_id = int(stats.iloc[0]['player_id'])
# Fetch full profile
profile = ahl.player_profile(player_id, season=90)
print(profile['info']) # bio dict
print(profile['careerStats']) # list of season dicts
bootstrap()¶
bootstrap(
game_id: int | None = None,
season: str = 'latest',
page_name: str = 'scorebar',
**filters,
) -> dict
Explicitly fetches raw bootstrap/configuration data from the HockeyTech API. Non-NHL only. Useful when you need game-specific configuration data or want to inspect the full raw metadata structure.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| game_id | int \| None | None | Optional game ID for game-specific configuration context |
| season | str | 'latest' | Season to fetch metadata for. 'latest' returns current season data |
| page_name | str | 'scorebar' | Page context for the bootstrap request. Common values: 'scorebar', 'gamecenter', 'roster' |
| **filters | | | Additional query parameters forwarded to the API |
Returns: dict — the full raw bootstrap payload, which includes the league's configuration subtree.
Raises: NotImplementedError if called on an NHL scraper.
Notes:
Bootstrap data is automatically fetched on init for non-NHL leagues and stored in scraper.bootstrap_data. Only call this method explicitly when you need a different season or page context.
Example:
from scrapernhl import HockeyScraper
ahl = HockeyScraper('ahl')
# Same data as auto-fetched on init
bootstrap = ahl.bootstrap()
# Game-specific context
bootstrap = ahl.bootstrap(game_id=1027781, page_name='gamecenter')
# Historical season
bootstrap = ahl.bootstrap(season='88')
print(bootstrap['current_season_id']) # '90'
print(len(bootstrap['teamsNoAll'])) # 32
scrape_multiple_games()¶
Scrapes PBP data for multiple games and concatenates the results into a single DataFrame. Failed individual games are logged to stdout and skipped.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| game_ids | list[int] | — | List of game IDs to scrape |
| **kwargs | | | Additional keyword arguments forwarded to play_by_play() (e.g. nhlify, season_id, is_playoff) |
Returns: pd.DataFrame — concatenated PBP rows from all successful games. Empty DataFrame if all games fail.
Example:
from scrapernhl import HockeyScraper
ahl = HockeyScraper('ahl')
pbp = ahl.scrape_multiple_games([1027779, 1027781, 1027785])
print(pbp['game_id'].value_counts())
nhl = HockeyScraper('nhl')
pbp = nhl.scrape_multiple_games([2023020001, 2023020002])
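The log-and-skip behaviour described for failed games follows a common pattern, sketched here with a stand-in fetch function so it runs offline. The helper name scrape_many, the fetch_game callable, and the returned list of failed IDs are all hypothetical additions for illustration; the documented method returns only the concatenated DataFrame.

```python
def scrape_many(fetch_game, game_ids):
    """Fetch each game, skipping and reporting the ones that raise."""
    rows, failed = [], []
    for game_id in game_ids:
        try:
            rows.extend(fetch_game(game_id))
        except Exception as exc:  # broad on purpose: one bad game must not stop the batch
            print(f"Skipping game {game_id}: {exc}")
            failed.append(game_id)
    return rows, failed


def fake_fetch(game_id):
    # Stand-in for a play-by-play request; one ID is made to fail.
    if game_id == 1027781:
        raise ValueError("no data")
    return [{"game_id": game_id, "event": "faceoff"}]


rows, failed = scrape_many(fake_fetch, [1027779, 1027781, 1027785])
print([r["game_id"] for r in rows])  # [1027779, 1027785]
print(failed)                        # [1027781]
```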
url_for()¶
Returns the URL that would be fetched for a given data type and parameters, without making any HTTP request. Use this for debugging, inspecting raw endpoints, or building curl commands.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| data_type | str | Endpoint type. One of: 'pbp', 'stats', 'schedule', 'roster', 'standings', 'bootstrap', 'scorebar', 'player_profile', 'player_game_log' (NHL only) |
| **kwargs | | Same keyword arguments as the corresponding scraper method |
Returns: str — the fully-constructed URL.
Example:
from scrapernhl import HockeyScraper
nhl = HockeyScraper('nhl')
print(nhl.url_for('pbp', game_id=2023020001))
# https://api-web.nhle.com/v1/gamecenter/2023020001/play-by-play
ahl = HockeyScraper('ahl')
print(ahl.url_for('standings', season=90))
print(ahl.url_for('player_profile', player_id=12345, season=90))
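For the curl use case, a returned URL can be shell-quoted before it is pasted into a command. The sketch below reproduces only the NHL play-by-play URL shape shown in the example output above; the template constant and both helper names are assumptions for illustration, and HockeyTech URLs are not reconstructed here.

```python
import shlex

# URL shape taken from the url_for('pbp', ...) example output for NHL.
NHL_PBP_TEMPLATE = "https://api-web.nhle.com/v1/gamecenter/{game_id}/play-by-play"


def nhl_pbp_url(game_id):
    """Build the NHL play-by-play URL for a game ID (int or numeric string)."""
    return NHL_PBP_TEMPLATE.format(game_id=int(game_id))


def curl_command(url):
    """Return a curl invocation with the URL safely quoted for a POSIX shell."""
    return f"curl -sS {shlex.quote(url)}"


print(nhl_pbp_url(2023020001))
# https://api-web.nhle.com/v1/gamecenter/2023020001/play-by-play
print(curl_command(nhl_pbp_url(2023020001)))
```

Quoting matters most for URLs with query strings, where characters such as & would otherwise be interpreted by the shell.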
fetch_raw()¶
Fetches the unprocessed API response for a given data type, bypassing all parsing and transformation. Caching and rate limiting still apply.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| data_type | str | Endpoint type. Same options as url_for() |
| **kwargs | | Same keyword arguments as the corresponding scraper method |
Returns: dict — raw JSON as returned by the API.
Example:
from scrapernhl import HockeyScraper
ahl = HockeyScraper('ahl')
raw = ahl.fetch_raw('standings', season=90)
print(raw.keys())
raw_pbp = ahl.fetch_raw('pbp', game_id=1027781)
Convenience Aliases¶
The following methods are direct aliases for the primary methods above:
| Alias | Equivalent call |
|---|---|
| scrape_pbp(game_id, **kwargs) | play_by_play(game_id, **kwargs) |
| scrape_game_pbp(game_id, **kwargs) | play_by_play(game_id, **kwargs) |
| scrape_schedule(team, season, **kwargs) | schedule(team, season, **kwargs) |
| scrape_roster(team, season) | roster(team, season) |
| scrape_standings(season, **kwargs) | standings(season, **kwargs) |
| scrape_skaters(season, team, **kwargs) | player_stats(season, team, position='skaters', **kwargs) |
| scrape_goalies(season, team, **kwargs) | player_stats(season, team, position='goalies', **kwargs) |
Bootstrap Properties and Accessors¶
Available on all non-NHL HockeyScraper instances. NHL calls raise NotImplementedError.
Properties¶
| Property | Type | Description |
|---|---|---|
| teams | list[dict] | All teams, excluding the "All Teams" aggregate entry |
| current_season_id | str \| None | Current season ID as a string |
| current_league_id | str \| None | Current league ID as a string |
Accessor Methods¶
Returns the current season ID string.
Returns the current league ID string.
Returns the teams list. When include_all=True, includes the synthetic "All Teams" entry used in API filter parameters.
Looks up a team by its numeric ID. Returns None if not found.
Looks up a team by abbreviation code (case-insensitive). Returns None if not found.
Returns the list of seasons for the given type. For QMJHL and WHL, playoff and preseason IDs are discovered by probing ID gaps between regular seasons.
Returns the current season metadata dict, or None if not found in the seasons list.
Returns conferences. When include_all=True, includes the "All Conferences" aggregate entry.
Returns divisions. When include_all=True, includes the "All Divisions" aggregate entry.
Returns position list. When normalize=True, normalises PWHL "Defenders" → "Defencemen".
Returns the goalie qualification filter options (e.g. 'qualified', 'all').
Returns the league's inaugural season year as a string.
Returns True if the league supports French ('fr' is in svfLanguages). Currently True for QMJHL.
Returns the league's metadata dict containing id, name, short_name, code, logo_image.
Reads a boolean or string flag from the league's svfConfig block.
Returns the URL for the default player image when no headshot is available.
Returns True if the current season has active playoff data in bootstrap.
Returns True if the league config enables the expanded goalie stats option.
Example:
from scrapernhl import HockeyScraper
ahl = HockeyScraper('ahl')
# Quick lookups
print(ahl.current_season_id) # '90'
print(ahl.is_bilingual()) # False
# Teams
for team in ahl.teams:
print(team['id'], team['name'])
# Find a specific team
team = ahl.get_team_by_id('390')
print(team) # {'id': '390', 'name': 'San Diego Gulls', ...}
# Seasons
regular_seasons = ahl.get_seasons('regular')
current = ahl.get_current_season()
print(current['name']) # '2024-25'
NHL-Only Methods¶
These methods raise NotImplementedError when called on a non-NHL scraper.
scrape_game()¶
scrape_game(
game_id: int | str,
add_goal_replay: bool = False,
include_tuple: bool = False,
) -> pd.DataFrame
Full NHL game pipeline. Combines data from three sources:
- HTML play-by-play report — event descriptions, on-ice player names
- HTML shift reports (home + visitor) — player shift times, on-ice player IDs
- JSON API — event coordinates, shot types, player IDs
The three sources are merged on game time to produce a single enriched DataFrame with on-ice players, strength states, and zone start qualifiers.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| game_id | int \| str | — | NHL game ID (e.g. 2023020001) |
| add_goal_replay | bool | False | If True, fetch goal replay video URLs from PPT. Adds extra network requests |
| include_tuple | bool | False | If True, return a named tuple (pbp, shifts, html_pbp, home_team, away_team) instead of just the PBP DataFrame |
Returns: pd.DataFrame (or 5-tuple if include_tuple=True).
Key output columns:
| Column | Description |
|---|---|
| Event | HTML event type string |
| Description | Full HTML event description |
| Per | Period |
| Time:Elapsed Game | Elapsed game time |
| strength, gameStrength | Strength state (e.g. '5v5', '5v4') |
| detailedGameStrength | Detailed strength with empty-net notation (e.g. '5v6*') |
| home_on_ice, away_on_ice | Lists of on-ice player names |
| home_on_id, away_on_id | Lists of on-ice player IDs |
| n_home_skaters, n_away_skaters | Skater counts |
| pulled_home, pulled_away | Boolean: goalie pulled |
| xCoord, yCoord | API shot coordinates |
| homeTeam, awayTeam | Team tricodes |
| gameId, gameDate | Game metadata |
| start_zone | Zone where the event started (from shift data) |
Example:
from scrapernhl import HockeyScraper
nhl = HockeyScraper('nhl')
# Full enriched PBP
pbp = nhl.scrape_game(2023020001)
print(pbp.shape) # (1817, ~100 columns)
print(pbp.columns.tolist())
# Include metadata tuple
result = nhl.scrape_game(2023020001, include_tuple=True)
pbp, shifts, html_pbp, home_team, away_team = result
print(f"{away_team} @ {home_team}")
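The "merged on game time" step in the pipeline description can be illustrated with a small join between HTML-report events and JSON API events. This is a sketch under assumptions, not the library's merge: the key fields (period, seconds, type) and the helper name are hypothetical, and only the xCoord/yCoord column names come from the output table above.

```python
def merge_on_game_time(html_events, api_events):
    """Attach API coordinates to HTML events that share period, time, and type."""
    api_by_key = {}
    for ev in api_events:
        key = (ev["period"], ev["seconds"], ev["type"])
        api_by_key.setdefault(key, []).append(ev)

    merged = []
    for ev in html_events:
        matches = api_by_key.get((ev["period"], ev["seconds"], ev["type"]), [])
        # Consume matches in order so simultaneous events pair up one-to-one.
        api = matches.pop(0) if matches else {}
        merged.append({**ev, "xCoord": api.get("xCoord"), "yCoord": api.get("yCoord")})
    return merged


html_events = [
    {"period": 1, "seconds": 106, "type": "SHOT", "Description": "wrist shot"},
    {"period": 1, "seconds": 140, "type": "HIT", "Description": "hit along the boards"},
]
api_events = [{"period": 1, "seconds": 106, "type": "SHOT", "xCoord": -53, "yCoord": -3}]

merged = merge_on_game_time(html_events, api_events)
print(merged[0]["xCoord"], merged[0]["yCoord"])  # -53 -3
print(merged[1]["xCoord"])                       # None
```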
scrape_plays()¶
Fetches NHL play-by-play from the JSON API only (no HTML report). Lighter-weight than scrape_game() but lacks on-ice player lists and strength states.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| game_id | int \| str | — | NHL game ID |
| add_goal_replay | bool | False | If True, enrich goal rows with replay video URLs |
Returns: pd.DataFrame.
Example:
html_pbp()¶
html_pbp(
game_id: int | str,
raw: bool = False,
return_raw: bool = False,
) -> pd.DataFrame | dict | tuple
Scrapes and parses the NHL HTML play-by-play report from nhl.com. Returns the event table with on-ice player names from the official game report.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| game_id | int \| str | — | NHL game ID |
| raw | bool | False | If True, return the unprocessed bronze-layer dict ({'data': html_str, 'urls', 'game_id', 'scraped_on', 'source'}) |
| return_raw | bool | False | If True, return (DataFrame, raw_dict) tuple where raw_dict contains the raw parsed HTML structure |
Returns: pd.DataFrame, or raw dict if raw=True, or (pd.DataFrame, dict) tuple if return_raw=True.
Example:
nhl = HockeyScraper('nhl')
df = nhl.html_pbp(2023020001)
# With raw HTML parse tree
df, raw = nhl.html_pbp(2023020001, return_raw=True)
shifts()¶
Scrapes NHL HTML shift reports for both the home and visitor team from nhl.com. Returns all individual player shifts with start/end times.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| game_id | int \| str | — | NHL game ID |
| raw | bool | False | If True, return the unprocessed bronze-layer dict ({'home': html_str, 'away': html_str, 'urls', 'game_id', 'scraped_on', 'source'}) |
Returns: pd.DataFrame with one row per player shift, or raw dict if raw=True.
Key output columns:
| Column | Description |
|---|---|
| player_id | NHL player ID |
| player_name | Player full name |
| team | Team tricode |
| period | Period number |
| shift_start, shift_end | Shift start/end time (MM:SS) |
| shift_start_sec, shift_end_sec | Shift times in seconds |
| duration | Shift duration (MM:SS) |
Example:
nhl = HockeyScraper('nhl')
shifts = nhl.shifts(2023020001)
print(shifts.shape) # (~500 shifts per game)
print(shifts[['player_name', 'team', 'period', 'shift_start', 'shift_end']].head())
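The MM:SS and seconds columns listed above are related by a simple conversion, sketched here for one shift. The helper names are hypothetical and the shift values are made up for the example.

```python
def mmss_to_seconds(clock):
    """Convert an 'M:SS' or 'MM:SS' string to whole seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def seconds_to_mmss(total):
    """Convert whole seconds back to an 'M:SS' string."""
    return f"{total // 60}:{total % 60:02d}"


shift = {"shift_start": "1:38", "shift_end": "2:21"}
start_sec = mmss_to_seconds(shift["shift_start"])
end_sec = mmss_to_seconds(shift["shift_end"])
print(start_sec, end_sec)                    # 98 141
print(seconds_to_mmss(end_sec - start_sec))  # 0:43
```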
get_game_data()¶
Returns the raw NHL JSON API response for a game, with plays already extracted into a list. Useful for inspection or custom parsing.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| game_id | int \| str | — | NHL game ID |
| add_goal_replay | bool | False | If True, enrich goal plays with replay URLs |
Returns: dict — full API response with plays list.
scrape_teams()¶
Returns NHL team data from one of three public NHL endpoints. NHL only.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| source | str | 'calendar' | Data source. One of: 'calendar' (active teams from the schedule), 'franchise' (franchise list with first/last season from stats REST API), 'records' (full details with logos from records API) |
| raw | bool | False | If True, return the unprocessed API response dict |
Returns: pd.DataFrame with league and source columns always present, or raw dict if raw=True.
Raises: ValueError for invalid source. NotImplementedError for non-NHL leagues.
Example:
nhl = HockeyScraper('nhl')
# Active teams (default)
teams = nhl.scrape_teams()
# Full franchise history
franchises = nhl.scrape_teams(source='franchise')
# Records endpoint (includes logos)
records = nhl.scrape_teams(source='records')
team_stats()¶
team_stats(
team: str,
season: int | str | None = None,
session: int | str = 2,
goalies: bool = False,
raw: bool = False,
) -> pd.DataFrame | list
Returns per-player statistics for one NHL team from the NHL stats REST API.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| team | str | — | Team tricode (e.g. 'MTL') |
| season | int \| str \| None | None | Season ID (e.g. 20232024). Defaults to current season |
| session | int \| str | 2 | Session type: 1 = pre-season, 2 = regular season, 3 = playoffs |
| goalies | bool | False | If True, return goalie stats; otherwise skater stats |
| raw | bool | False | If True, return the unprocessed API response list |
Returns: pd.DataFrame, or raw list if raw=True.
Example:
nhl = HockeyScraper('nhl')
# MTL skaters — regular season
skaters = nhl.team_stats(team='MTL', season=20232024, session=2)
# MTL goalies — playoffs
goalies = nhl.team_stats(team='MTL', season=20232024, session=3, goalies=True)
standings_by_date()¶
Returns NHL standings for a specific calendar date.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| date | str \| None | None | Date in 'YYYY-MM-DD' format. Defaults to January 1st of the previous year |
| raw | bool | False | If True, return the unprocessed API response list |
Returns: pd.DataFrame, or raw list if raw=True.
Example:
draft()¶
draft(
year: int | str = 2024,
round: int | str = 'all',
raw: bool = False,
) -> pd.DataFrame | list
Returns NHL draft picks from the draft picks API.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| year | int \| str | 2024 | Draft year |
| round | int \| str | 'all' | Round number (1–7), or 'all' for every round |
| raw | bool | False | If True, return the unprocessed API response list |
Returns: pd.DataFrame with one row per draft pick, or raw list if raw=True.
Example:
nhl = HockeyScraper('nhl')
# All picks
draft = nhl.draft(year=2024, round='all')
# First round only
first_round = nhl.draft(year=2023, round=1)
draft_records()¶
Returns draft records from the NHL Records API. Includes more detailed player and team information than draft().
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| year | int \| str | 2025 | Draft year |
| raw | bool | False | If True, return the unprocessed API response list |
Returns: pd.DataFrame, or raw list if raw=True.
team_draft_history()¶
Returns the complete draft history for a specific franchise.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| franchise | int \| str | 1 | NHL franchise ID (e.g. 1 for New Jersey Devils, 20 for Montreal Canadiens) |
| raw | bool | False | If True, return the unprocessed API response list |
Returns: pd.DataFrame with every draft pick ever made by the franchise, or raw list if raw=True.
Example:
goal_replay()¶
Fetches goal replay video data from a PPT replay URL. The URL comes from the pptReplayUrl column in PBP DataFrames returned by scrape_plays() or scrape_game().
Parameters:
| Parameter | Type | Description |
|---|---|---|
| json_url | str | The pptReplayUrl value from a goal play row |
Returns: list[dict] — goal replay data objects.
NHL Analytics Pipeline¶
All analytics methods are NHL-only and raise NotImplementedError for other leagues. The recommended pipeline is:
For TOI analysis:
scrape_game() + shifts() → seconds_matrix() → strengths_by_second()
→ toi_by_strength_all() / shared_toi_teammates() / shared_toi_opponents()
build_shifts_events()¶
Converts shift data from shifts() into a tidy ON/OFF event table. Each shift becomes two rows: one SHIFT_START and one SHIFT_END. Used internally by the analytics pipeline.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| shifts | pd.DataFrame | Shift DataFrame from shifts() |
Returns: pd.DataFrame with one row per shift boundary event.
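The two-rows-per-shift transformation can be sketched with plain dicts. This is an illustration rather than the library's code: the helper name and output keys are hypothetical, the input keys follow the shifts() column table, and the choice to sort SHIFT_END before SHIFT_START at the same second is an assumption.

```python
def shifts_to_events(shifts):
    """Expand each shift into a SHIFT_START row and a SHIFT_END row."""
    events = []
    for s in shifts:
        base = {"player_id": s["player_id"], "team": s["team"], "period": s["period"]}
        events.append({**base, "event": "SHIFT_START", "seconds": s["shift_start_sec"]})
        events.append({**base, "event": "SHIFT_END", "seconds": s["shift_end_sec"]})
    # At equal times, ends sort first so a line change reads off-then-on.
    events.sort(key=lambda e: (e["period"], e["seconds"], e["event"] != "SHIFT_END"))
    return events


shifts = [
    {"player_id": 2, "team": "MTL", "period": 1, "shift_start_sec": 45, "shift_end_sec": 90},
    {"player_id": 1, "team": "MTL", "period": 1, "shift_start_sec": 0, "shift_end_sec": 45},
]
events = shifts_to_events(shifts)
print([(e["player_id"], e["event"], e["seconds"]) for e in events])
# [(1, 'SHIFT_START', 0), (1, 'SHIFT_END', 45), (2, 'SHIFT_START', 45), (2, 'SHIFT_END', 90)]
```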
build_on_ice_long()¶
Converts the list-based home_on_ice / away_on_ice columns in a PBP DataFrame into a tidy long-format table where each row represents one player at one event.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| df | pd.DataFrame | PBP DataFrame from scrape_game() |
Returns: pd.DataFrame — long-format on-ice table (no numbered wide columns).
Example:
nhl = HockeyScraper('nhl')
pbp = nhl.scrape_game(2023020001)
long = nhl.build_on_ice_long(pbp)
print(long.columns.tolist())
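The long format itself is easiest to see on a tiny input: each event row with list-valued on-ice columns becomes one output row per player. The sketch below uses plain dicts; the helper name and the output keys (event_index, side, player) are hypothetical, and the real method's column names may differ.

```python
def on_ice_long(rows):
    """Explode home_on_ice / away_on_ice lists into one row per player per event."""
    long_rows = []
    for event_index, row in enumerate(rows):
        for side in ("home", "away"):
            for player in row[f"{side}_on_ice"]:
                long_rows.append({"event_index": event_index, "side": side, "player": player})
    return long_rows


pbp_rows = [
    {"Event": "SHOT", "home_on_ice": ["A", "B"], "away_on_ice": ["X"]},
    {"Event": "GOAL", "home_on_ice": ["A"], "away_on_ice": ["X", "Y"]},
]
long_rows = on_ice_long(pbp_rows)
print(len(long_rows))  # 6
print(long_rows[0])    # {'event_index': 0, 'side': 'home', 'player': 'A'}
```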
build_on_ice_wide()¶
build_on_ice_wide(
df: pd.DataFrame,
max_skaters: int = 6,
include_goalie: bool = True,
drop_list_cols: bool = False,
) -> pd.DataFrame
Expands the list-based on-ice columns into named wide columns (home_skater_1 … home_skater_N, home_goalie, and equivalent for away).
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| df | pd.DataFrame | — | PBP DataFrame from scrape_game() |
| max_skaters | int | 6 | Maximum number of skater columns per team |
| include_goalie | bool | True | Whether to add goalie columns |
| drop_list_cols | bool | False | If True, drop the original list columns after expansion |
Returns: pd.DataFrame — input with additional home_skater_1..N, away_skater_1..N, and goalie columns.
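The wide expansion can be pictured for a single event as follows. This is a hedged sketch, not the library's code: the home_goalie_on_ice / away_goalie_on_ice input keys are hypothetical, and padding missing skater slots with None is an assumption.

```python
# Illustrative sketch only: the *_goalie_on_ice input keys are hypothetical.

def on_ice_wide(event, max_skaters=6, include_goalie=True, drop_list_cols=False):
    """Expand list-valued on-ice fields of one event into numbered fields."""
    wide = dict(event)
    for side in ("home", "away"):
        skaters = event[f"{side}_on_ice"]
        for slot in range(1, max_skaters + 1):
            # Pad with None when fewer than max_skaters are on the ice.
            wide[f"{side}_skater_{slot}"] = (
                skaters[slot - 1] if slot <= len(skaters) else None
            )
        if include_goalie:
            wide[f"{side}_goalie"] = event.get(f"{side}_goalie_on_ice")
        if drop_list_cols:
            wide.pop(f"{side}_on_ice")
    return wide


demo_event = {
    "event_id": 1,
    "home_on_ice": ["H1", "H2", "H3"],
    "away_on_ice": ["A1", "A2"],
    "home_goalie_on_ice": "HG",
    "away_goalie_on_ice": "AG",
}
wide_row = on_ice_wide(demo_event, max_skaters=3, drop_list_cols=True)
print(wide_row)
```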
seconds_matrix()¶
Creates a boolean matrix where rows are players and columns are game seconds. A True value indicates that player was on the ice during that second. Used as the base for all TOI computations.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| df | pd.DataFrame | PBP DataFrame from scrape_game() |
| shifts | pd.DataFrame | Shifts DataFrame from shifts() |
Returns: pd.DataFrame — matrix indexed by player with game-second columns.
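As a rough mental model, the presence matrix can be built from shift intervals like this. The sketch below uses plain lists instead of a DataFrame, and it assumes half-open intervals (a shift from 0 to 3 covers seconds 0, 1, and 2); both the field names and that convention are assumptions, not confirmed library behaviour.

```python
# Illustrative sketch only: field names and the half-open interval convention
# are assumptions.

def presence_matrix(shifts, game_seconds):
    """Map each player to a list of booleans, one per game second."""
    matrix = {}
    for shift in shifts:
        row = matrix.setdefault(shift["player"], [False] * game_seconds)
        last = min(shift["end_sec"], game_seconds)
        for second in range(shift["start_sec"], last):
            row[second] = True
    return matrix


demo_shifts = [
    {"player": "H1", "start_sec": 0, "end_sec": 3},
    {"player": "H1", "start_sec": 5, "end_sec": 6},
    {"player": "A1", "start_sec": 2, "end_sec": 6},
]
matrix = presence_matrix(demo_shifts, game_seconds=6)
print(matrix["H1"])
```

Summing a player's row then gives that player's total time on ice in seconds.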
strengths_by_second()¶
Derives the strength state (skater count for each team) for every second of the game from the on-ice matrix.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| matrix_df | pd.DataFrame | Output from seconds_matrix() |
Returns: pd.DataFrame — per-second table with home/away skater counts and strength label.
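Deriving a strength label amounts to counting who is on the ice in each column of the matrix. A minimal sketch follows, assuming a skaters-only matrix, a hypothetical team_of lookup, and a home-first label such as "5v4"; the library's actual label orientation and goalie handling may differ.

```python
# Illustrative sketch only: team_of and the home-first label format are assumptions.

def strength_labels(matrix, team_of):
    """Return one 'HvA' label per second from a skaters-only presence matrix."""
    n_seconds = len(next(iter(matrix.values())))
    labels = []
    for second in range(n_seconds):
        home = sum(1 for player, row in matrix.items()
                   if row[second] and team_of[player] == "home")
        away = sum(1 for player, row in matrix.items()
                   if row[second] and team_of[player] == "away")
        labels.append(f"{home}v{away}")
    return labels


# A tiny two-skaters-per-side example keeps the arithmetic easy to follow.
demo_matrix = {
    "H1": [True, True, True],
    "H2": [True, True, False],
    "A1": [True, True, True],
    "A2": [True, False, False],
}
demo_teams = {"H1": "home", "H2": "home", "A1": "away", "A2": "away"}
labels = strength_labels(demo_matrix, demo_teams)
print(labels)
```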
toi_by_strength_all()¶
toi_by_strength_all(
matrix_df: pd.DataFrame,
strengths_df: pd.DataFrame,
in_seconds: bool = False,
) -> pd.DataFrame
Computes total time-on-ice for every player broken down by strength state.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| matrix_df | pd.DataFrame | — | Output from seconds_matrix() |
| strengths_df | pd.DataFrame | — | Output from strengths_by_second() |
| in_seconds | bool | False | If True, return TOI in seconds; default is minutes |
Returns: pd.DataFrame — per-player, per-strength-state TOI.
Example:
nhl = HockeyScraper('nhl')
pbp = nhl.scrape_game(2023020001)
shifts = nhl.shifts(2023020001)
matrix = nhl.seconds_matrix(pbp, shifts)
strengths = nhl.strengths_by_second(matrix)
toi = nhl.toi_by_strength_all(matrix, strengths)
print(toi.head())
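Conceptually, TOI by strength is a count of the seconds where a player's matrix row is True, grouped by that second's strength label. The sketch below shows the arithmetic with plain dictionaries; the data structures are assumptions for illustration and are not the DataFrames the method works with.

```python
from collections import Counter

# Illustrative sketch only: plain dict/list stand-ins for the real DataFrames.

def toi_by_strength_sketch(matrix, labels, in_seconds=False):
    """Sum on-ice seconds per (player, strength); convert to minutes by default."""
    seconds = Counter()
    for player, row in matrix.items():
        for on_ice, label in zip(row, labels):
            if on_ice:
                seconds[(player, label)] += 1
    if in_seconds:
        return dict(seconds)
    return {key: value / 60 for key, value in seconds.items()}


demo_matrix = {
    "H1": [True, True, True],
    "H2": [True, True, False],
}
demo_labels = ["5v5", "5v4", "5v4"]
toi_seconds = toi_by_strength_sketch(demo_matrix, demo_labels, in_seconds=True)
toi_minutes = toi_by_strength_sketch(demo_matrix, demo_labels)
print(toi_seconds)
```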
shared_toi_teammates()¶
shared_toi_teammates(
matrix_df: pd.DataFrame,
strengths_df: pd.DataFrame,
in_seconds: bool = False,
) -> pd.DataFrame
Computes pairwise shared time-on-ice between teammates by strength state.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| matrix_df | pd.DataFrame | — | Output from seconds_matrix() |
| strengths_df | pd.DataFrame | — | Output from strengths_by_second() |
| in_seconds | bool | False | Return TOI in seconds instead of minutes |
Returns: pd.DataFrame — every (player_a, player_b) teammate pair with shared TOI per strength.
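Shared TOI for a pair is the number of seconds where both players' rows are True. A hedged sketch of that pairwise count, using plain dictionaries for one team's players (the structures and names are illustrative assumptions):

```python
from collections import Counter
from itertools import combinations

# Illustrative sketch only: `matrix` holds one team's players.

def shared_toi_teammates_sketch(matrix, labels):
    """Count seconds each teammate pair spends on the ice together, per strength."""
    shared = Counter()
    for a, b in combinations(sorted(matrix), 2):
        for on_a, on_b, label in zip(matrix[a], matrix[b], labels):
            if on_a and on_b:
                shared[(a, b, label)] += 1
    return dict(shared)


demo_matrix = {
    "H1": [True, True, True, False],
    "H2": [True, True, False, False],
    "H3": [False, True, True, True],
}
demo_labels = ["5v5", "5v5", "5v4", "5v4"]
pairs = shared_toi_teammates_sketch(demo_matrix, demo_labels)
print(pairs)
```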
shared_toi_opponents()¶
shared_toi_opponents(
matrix_df: pd.DataFrame,
strengths_df: pd.DataFrame,
in_seconds: bool = False,
) -> pd.DataFrame
Computes pairwise shared time-on-ice between players on opposing teams by strength state.
Parameters: Same as shared_toi_teammates().
Returns: pd.DataFrame — every (home_player, away_player) opponent pair with shared TOI per strength.
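The opponent version differs only in how pairs are formed: every home player is crossed with every away player. A minimal sketch under the same illustrative assumptions (plain dictionaries, seconds as the unit):

```python
from collections import Counter
from itertools import product

# Illustrative sketch only: home and away matrices are passed separately here.

def shared_toi_opponents_sketch(home_matrix, away_matrix, labels):
    """Count seconds each (home, away) pair is on the ice at once, per strength."""
    shared = Counter()
    for (home_player, home_row), (away_player, away_row) in product(
        home_matrix.items(), away_matrix.items()
    ):
        for on_home, on_away, label in zip(home_row, away_row, labels):
            if on_home and on_away:
                shared[(home_player, away_player, label)] += 1
    return dict(shared)


demo_home = {"H1": [True, True, False], "H2": [False, True, True]}
demo_away = {"A1": [True, False, True]}
demo_labels = ["5v5", "5v4", "5v4"]
matchups = shared_toi_opponents_sketch(demo_home, demo_away, demo_labels)
print(matchups)
```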
on_ice_stats()¶
on_ice_stats(
pbp: pd.DataFrame,
include_goalies: bool = False,
rates: bool = False,
) -> pd.DataFrame
Computes per-player, per-strength on-ice stats from a PBP DataFrame. Includes Corsi (CF/CA), Fenwick (FF/FA), shots (SF/SA), goals (GF/GA), and penalties (PF/PA).
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| pbp | pd.DataFrame | — | PBP DataFrame from scrape_game() |
| include_goalies | bool | False | Whether to include goalies in the output |
| rates | bool | False | Whether to add per-60-minute rate columns (e.g. CF60, CA60) |
Returns: pd.DataFrame — per-player, per-strength aggregated stats.
Example:
nhl = HockeyScraper('nhl')
pbp = nhl.scrape_game(2023020001)
stats = nhl.on_ice_stats(pbp, rates=True)
print(stats.columns.tolist())
print(stats.sort_values('CF', ascending=False).head(5))
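For readers new to these metrics: Corsi counts all shot attempts (goals, shots on goal, missed shots, blocked shots), Fenwick counts unblocked attempts only, and a per-60 rate scales a count by 3600 / TOI in seconds. The sketch below shows that arithmetic on hand-made events. The event type strings and the shooting_team / on_ice keys are assumptions for illustration, not the PBP schema.

```python
# Illustrative sketch only: event type names and keys are hypothetical.
CORSI_EVENTS = {"goal", "shot-on-goal", "missed-shot", "blocked-shot"}
FENWICK_EVENTS = CORSI_EVENTS - {"blocked-shot"}  # unblocked attempts only


def on_ice_counts(events, player, team):
    """Count attempts for (F) and against (A) while `player` is on the ice."""
    stats = {"CF": 0, "CA": 0, "FF": 0, "FA": 0, "GF": 0, "GA": 0}
    for event in events:
        if player not in event["on_ice"]:
            continue
        suffix = "F" if event["shooting_team"] == team else "A"
        if event["type"] in CORSI_EVENTS:
            stats["C" + suffix] += 1
        if event["type"] in FENWICK_EVENTS:
            stats["F" + suffix] += 1
        if event["type"] == "goal":
            stats["G" + suffix] += 1
    return stats


def per_60(count, toi_seconds):
    """Scale a raw count to a per-60-minute rate."""
    return count * 3600 / toi_seconds if toi_seconds else 0.0


demo_events = [
    {"type": "shot-on-goal", "shooting_team": "MTL", "on_ice": {"P1", "P2"}},
    {"type": "blocked-shot", "shooting_team": "TOR", "on_ice": {"P1"}},
    {"type": "goal", "shooting_team": "MTL", "on_ice": {"P1"}},
    {"type": "missed-shot", "shooting_team": "MTL", "on_ice": {"P2"}},
]
p1_stats = on_ice_counts(demo_events, player="P1", team="MTL")
p1_cf60 = per_60(p1_stats["CF"], toi_seconds=600)
print(p1_stats, p1_cf60)
```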
combo_on_ice_stats()¶
combo_on_ice_stats(
pbp: pd.DataFrame,
focus_team: str,
n_team: int = 2,
m_opp: int = 0,
min_toi: int = 15,
include_goalies: bool = False,
rates: bool = False,
) -> pd.DataFrame
Computes on-ice stats for all n-player combinations on a focus team, optionally crossed against m-player combinations on the opposing team.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| pbp | pd.DataFrame | — | PBP DataFrame from scrape_game() |
| focus_team | str | — | Team tricode for the team whose player combinations to compute |
| n_team | int | 2 | Size of player combinations on the focus team |
| m_opp | int | 0 | Size of opponent player combinations to cross against. 0 means opponent combinations are not computed |
| min_toi | int | 15 | Minimum shared TOI in seconds to include a combination in the output |
| include_goalies | bool | False | Whether to include goalies in combinations |
| rates | bool | False | Whether to add per-60 rate columns |
Returns: pd.DataFrame — on-ice stats for each qualifying player combination.
Example:
nhl = HockeyScraper('nhl')
pbp = nhl.scrape_game(2023020001)
# home team combos
home_team = pbp['homeTeam'].iloc[0]
combos = nhl.combo_on_ice_stats(pbp, focus_team=home_team, n_team=2)
print(combos.head())
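The roles of n_team and min_toi can be seen in a small stand-alone sketch: enumerate every n-player combination, measure the seconds all members are on the ice together, and drop combinations below the threshold. This illustrates the idea only; the presence-matrix input used here is an assumption, since the real method works from the PBP DataFrame.

```python
from itertools import combinations

# Illustrative sketch only: uses a presence matrix as input for simplicity.

def combo_shared_seconds(matrix, n_team=2, min_toi=15):
    """Return shared on-ice seconds for each n-player combination >= min_toi."""
    players = sorted(matrix)
    n_seconds = len(matrix[players[0]])
    result = {}
    for combo in combinations(players, n_team):
        shared = sum(
            all(matrix[player][second] for player in combo)
            for second in range(n_seconds)
        )
        if shared >= min_toi:
            result[combo] = shared
    return result


demo_matrix = {
    "A": [True] * 30,
    "B": [True] * 20 + [False] * 10,
    "C": [False] * 20 + [True] * 10,
}
qualifying = combo_shared_seconds(demo_matrix, n_team=2, min_toi=15)
print(qualifying)
```

The number of combinations grows quickly with n_team, so a minimum-TOI filter like this keeps the output manageable.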
team_strength_aggregates()¶
Aggregates on-ice shot and goal stats by team and strength state. Provides a team-level view of shot share.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| pbp | pd.DataFrame | — | PBP DataFrame from scrape_game() |
| rates | bool | False | Whether to add per-60 rate columns |
Returns: pd.DataFrame — per-team, per-strength aggregated stats.
Example:
nhl = HockeyScraper('nhl')
pbp = nhl.scrape_game(2023020001)
agg = nhl.team_strength_aggregates(pbp, rates=True)
print(agg)
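Shot share at team level is usually expressed as CF% = 100 × CF / (CF + CA). The sketch below aggregates hand-made shot attempts by team and strength, labelling each strength from the team's own perspective (so one side's 5v4 is the other side's 4v5). The input keys are assumptions for illustration, not the PBP schema.

```python
from collections import defaultdict

# Illustrative sketch only: input keys are hypothetical.

def team_strength_counts(shot_attempts):
    """Aggregate shot attempts for/against by (side, own-perspective strength)."""
    table = defaultdict(lambda: {"CF": 0, "CA": 0})
    for attempt in shot_attempts:
        home, away = attempt["home_skaters"], attempt["away_skaters"]
        labels = {"home": f"{home}v{away}", "away": f"{away}v{home}"}
        for side, label in labels.items():
            key = "CF" if side == attempt["shooting_side"] else "CA"
            table[(side, label)][key] += 1
    return dict(table)


def cf_percent(row):
    """Shot-attempt share; None when no attempts were recorded."""
    total = row["CF"] + row["CA"]
    return 100 * row["CF"] / total if total else None


demo_attempts = [
    {"shooting_side": "home", "home_skaters": 5, "away_skaters": 4},
    {"shooting_side": "home", "home_skaters": 5, "away_skaters": 4},
    {"shooting_side": "away", "home_skaters": 5, "away_skaters": 4},
    {"shooting_side": "home", "home_skaters": 5, "away_skaters": 5},
]
team_table = team_strength_counts(demo_attempts)
home_pp_share = cf_percent(team_table[("home", "5v4")])
print(team_table, home_pp_share)
```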
Functional API — scrape()¶
A thin wrapper around HockeyScraper for quick one-liner usage. Creates a scraper instance internally and delegates to the appropriate method.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| league | str | — | League code. Same options as HockeyScraper(league) |
| data_type | str | 'pbp' | Data type to fetch. See table below |
| **kwargs | | | Forwarded to the underlying method |
Supported data_type values:
| data_type | Delegates to | Key kwargs |
|---|---|---|
| 'pbp' | play_by_play() | game_id |
| 'stats' | player_stats() | season, team, position |
| 'schedule' | schedule() | team, season |
| 'roster' | roster() | team, season |
| 'standings' | standings() | season |
| 'teams' | scrape_teams() for NHL, teams_by_season() for others | season |
| 'teams_by_season' | teams_by_season() | season |
| 'scrape_teams' | scrape_teams() (NHL only) | source |
| 'seasons' | seasons() | season_type |
Raises: ValueError for unrecognised data_type.
Example:
from scrapernhl import scrape
# PBP
pbp = scrape('ahl', 'pbp', game_id=1027781)
# Stats
skaters = scrape('ahl', 'stats', season=90, position='skaters')
nhl_mtl = scrape('nhl', 'stats', team='MTL', season=20232024, position='skaters')
# Schedule
schedule = scrape('nhl', 'schedule', team='TOR', season=20232024)
# Standings
standings = scrape('nhl', 'standings', season=20232024)
standings = scrape('ahl', 'standings')
# Roster
roster = scrape('nhl', 'roster', team='MTL', season=20232024)
# Teams
nhl_teams = scrape('nhl', 'teams')
nhl_teams = scrape('nhl', 'scrape_teams', source='records')
ahl_teams = scrape('ahl', 'teams', season=90)
# Seasons
seasons = scrape('nhl', 'seasons')
seasons = scrape('ahl', 'seasons', season_type='regular')
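The delegation described in this section can be pictured as a lookup from data_type to a method name. The sketch below is a hypothetical stand-in, not the library's source: FakeScraper and scrape_sketch are invented for illustration so the example runs without network access, and only two data types are wired up.

```python
# Illustrative sketch only: FakeScraper and scrape_sketch are hypothetical.

class FakeScraper:
    """Stand-in for HockeyScraper that returns tuples instead of DataFrames."""

    def __init__(self, league):
        self.league = league.lower()

    def play_by_play(self, game_id):
        return ("pbp", self.league, game_id)

    def standings(self, season=None):
        return ("standings", self.league, season)


# Maps each supported data_type to the method it delegates to.
_DISPATCH = {"pbp": "play_by_play", "standings": "standings"}


def scrape_sketch(league, data_type="pbp", **kwargs):
    """Create a scraper, look up the method for data_type, forward kwargs."""
    try:
        method_name = _DISPATCH[data_type]
    except KeyError:
        raise ValueError(f"Unrecognised data_type: {data_type!r}") from None
    return getattr(FakeScraper(league), method_name)(**kwargs)


pbp_result = scrape_sketch("AHL", "pbp", game_id=1027781)
print(pbp_result)
```

A table-driven dispatch like this keeps the supported values in one place, which is one plausible reason an unknown data_type surfaces as a ValueError.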
CLI Reference¶
The CLI is available as a Python module:
NHL Commands (top-level)¶
# Teams
python -m scrapernhl teams [-o OUTPUT] [-f FORMAT]
# Schedule (requires TEAM and SEASON)
python -m scrapernhl schedule TEAM SEASON [-o OUTPUT] [-f FORMAT]
# Standings (optional DATE in YYYY-MM-DD format)
python -m scrapernhl standings [DATE] [-o OUTPUT] [-f FORMAT]
# Roster (requires TEAM and SEASON)
python -m scrapernhl roster TEAM SEASON [-o OUTPUT] [-f FORMAT]
# Player stats (requires TEAM and SEASON)
python -m scrapernhl stats TEAM SEASON [--goalies] [--session INT] [-o OUTPUT] [-f FORMAT]
# Play-by-play (requires GAME_ID)
python -m scrapernhl game GAME_ID [-o OUTPUT] [-f FORMAT]
# Draft (requires YEAR, optional ROUND)
python -m scrapernhl draft YEAR [ROUND] [-o OUTPUT] [-f FORMAT]
Non-NHL League Commands¶
Non-NHL leagues are accessed as subcommands: ahl, pwhl, ohl, whl, qmjhl.
Available subcommands per league:
| Subcommand | Description | Key Options |
|---|---|---|
| teams | List teams | [-o] [-f] |
| schedule | Season schedule | [--season INT] [-o] [-f] |
| standings | League standings | [--season INT] [-o] [-f] |
| roster | Team roster | --team-id INT [--season INT] [-o] [-f] |
| stats | Player stats | [--season INT] [--player-type skater\|goalie] [--limit INT] [-o] [-f] |
| game | Play-by-play | GAME_ID [-o] [-f] |
| bootstrap | Raw config data | [--season TEXT] [--page-name TEXT] [--game-id INT] [-o] |
Output Formats¶
All commands support -f / --format with these options:
| Format | Description |
|---|---|
| csv | Comma-separated values (default for most commands) |
| json | JSON array |
| parquet | Apache Parquet (columnar, efficient for large data) |
| excel | Excel .xlsx |
Examples¶
# NHL examples
python -m scrapernhl standings -o nhl_standings.csv
python -m scrapernhl schedule MTL 20252026 -f json -o mtl_schedule.json
python -m scrapernhl game 2024020001 -o game_pbp.parquet -f parquet
python -m scrapernhl draft 2024 -o draft_2024.csv
# AHL examples
python -m scrapernhl ahl standings
python -m scrapernhl ahl standings -o ahl_standings.csv
python -m scrapernhl ahl stats --season 90 -f parquet -o ahl_skaters.parquet
python -m scrapernhl ahl stats --player-type goalie --season 90
python -m scrapernhl ahl game 1027781 -o pbp.csv
python -m scrapernhl ahl bootstrap --season 90
# PWHL / OHL / WHL / QMJHL
python -m scrapernhl pwhl standings
python -m scrapernhl ohl stats --player-type skater
python -m scrapernhl whl game 1022126
python -m scrapernhl qmjhl standings -f json
Legacy NHL Scraper — scraper_legacy.py¶
scraper_legacy.py is the original NHL analytics engine. It remains fully functional in 0.3.x and contains all heavy per-player and per-combination stat functions that have not yet been ported to the new module structure.
All functions are accessible via lazy import through scrapernhl.nhl.scraper (no heavy dependencies are loaded until you actually call a legacy function) or by importing scrapernhl.nhl.scraper_legacy directly.
Note
These functions are NHL-only. They operate on DataFrames produced by scrape_game(). For non-NHL leagues use HockeyScraper('<league>').play_by_play().
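For readers unfamiliar with lazy imports, one common way to defer a heavy import until first use is a module-level `__getattr__` (PEP 562, Python 3.7+). The sketch below demonstrates that general technique on a throwaway module object with standard-library targets; it is an assumption about the kind of mechanism involved, not a copy of how scrapernhl.nhl.scraper is implemented, and the names facade and _LAZY are hypothetical.

```python
import importlib
import types

# Illustrative sketch only: `facade` and `_LAZY` are hypothetical names.
# Stdlib targets are used so the example runs without scrapernhl installed.
_LAZY = {
    "mean": ("statistics", "mean"),
    "dumps": ("json", "dumps"),
}

facade = types.ModuleType("facade_sketch")


def _lazy_getattr(name):
    """Import the backing module only when `name` is first requested."""
    try:
        module_name, attr = _LAZY[name]
    except KeyError:
        raise AttributeError(
            f"module 'facade_sketch' has no attribute {name!r}"
        ) from None
    value = getattr(importlib.import_module(module_name), attr)
    setattr(facade, name, value)  # cache so later lookups skip __getattr__
    return value


# PEP 562: a __getattr__ entry in a module's namespace handles missing attributes.
facade.__getattr__ = _lazy_getattr

lazy_mean = facade.mean([1, 2, 3])
print(lazy_mean)
```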
Importing legacy functions¶
# Option A — via the nhl scraper module (lazy, preferred)
from scrapernhl.nhl.scraper import (
scrape_game,
on_ice_stats_by_player_strength,
toi_by_player_and_strength,
combo_on_ice_stats,
combo_on_ice_stats_both_teams,
team_strength_aggregates,
)
# Option B — direct import (loads xgboost / polars / numpy immediately)
from scrapernhl.nhl.scraper_legacy import scrape_game
# Option C — via HockeyScraper convenience wrappers (recommended for new code)
from scrapernhl import HockeyScraper
nhl = HockeyScraper('nhl')
pbp = nhl.scrape_game(2024020001)
Core PBP functions¶
| Function | Signature | Description |
|---|---|---|
| scrape_game() | scrape_game(game_id, *, shifts=True, html_pbp=True, output_format='pandas') -> pd.DataFrame | Full enriched PBP with shifts, strengths, and on-ice players. The main entry point for all analytics. |
| scrape_game_async() | scrape_game_async(game_id, ...) -> pd.DataFrame | Async version of scrape_game(). |
| scrapePlays() | scrapePlays(game, output_format='pandas') -> pd.DataFrame | JSON PBP only (no HTML enrichment). Faster but less complete. |
| getGameData() | getGameData(game, addGoalReplayData=False) -> dict | Raw game JSON from the NHL API. |
| scrape_html_pbp() | scrape_html_pbp(game_id, return_raw=False) -> pd.DataFrame | Parse HTML play-by-play sheet. Used internally by scrape_game(). |
| scrape_shifts() | scrape_shifts(game_id) -> pd.DataFrame | Parse HTML shift chart into a tidy shift DataFrame. |
| getGoalReplayData() | getGoalReplayData(json_url) -> dict | Fetch goal-replay sprite frame data. |
Shift & strength pipeline¶
These functions form the low-level strength/TOI backbone used by the higher-level stat functions.
| Function | Description |
|---|---|
| build_shifts_events(shifts) | Convert shift DataFrame into ON/OFF boundary events table. |
| add_strengths_to_shifts_events(shifts_events, strengths_df) | Annotate each ON/OFF event with the strength state at that second. |
| build_strength_segments_from_shifts(shifts) | Build contiguous strength segments from the shift chart. |
| strengths_by_second_from_segments(segments) | Turn segments into a per-second strength lookup Series. |
| seconds_matrix(df, shifts) | Build a player × second presence matrix (input to TOI math). |
| strengths_by_second(matrix_df) | Map each column (second) of the presence matrix to a strength label. |
| toi_by_strength_all(matrix_df, strengths_df) | Total TOI for every player × strength combination. |
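The step from contiguous segments to a per-second lookup is a simple expansion. A minimal sketch, assuming each segment is a (start, end, label) tuple with a half-open range; the real segment representation is a DataFrame and may differ.

```python
# Illustrative sketch only: the (start, end, label) tuple shape is an assumption.

def per_second_from_segments(segments):
    """Expand (start, end, label) segments into a {second: label} lookup."""
    lookup = {}
    for start, end, label in segments:
        for second in range(start, end):  # half-open: end is excluded
            lookup[second] = label
    return lookup


demo_segments = [(0, 3, "5v5"), (3, 5, "5v4")]
strength_at = per_second_from_segments(demo_segments)
print(strength_at)
```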
On-ice stats & TOI¶
| Function | Signature | Description |
|---|---|---|
| toi_by_strength(pbp_change_events) | toi_by_strength(df) -> pd.DataFrame | Per-team TOI broken out by EV / PP / PK / other. |
| toi_by_player_and_strength(pbp_change_events) | toi_by_player_and_strength(df) -> pd.DataFrame | Per-player TOI split by strength state from the player's team perspective (fixed in 0.3.2 for the alphabetically second team). |
| on_ice_stats_by_player_strength(pbp, team) | on_ice_stats_by_player_strength(pbp, team) -> pd.DataFrame | Corsi / Fenwick / goals for every player × strength, from team's perspective. |
| build_on_ice_long(df) | build_on_ice_long(df) -> pd.DataFrame | Expand the compact on-ice player columns to long format (one row per player per event). |
| build_on_ice_wide(df, ...) | build_on_ice_wide(df, ...) -> pd.DataFrame | Pivot long on-ice data back to wide format for downstream modeling. |
| shared_toi_teammates_by_strength(...) | — | Pairwise shared TOI for all teammate pairs at each strength. |
| shared_toi_opponents_by_strength(...) | — | Pairwise shared TOI for all opponent pairs at each strength. |
Combination stats¶
| Function | Description |
|---|---|
| combo_on_ice_stats(pbp, focus_team) | Corsi / Fenwick / goals for every N-player combination from focus_team's perspective. Strength labels use focus_team's perspective (fixed in 0.3.2). |
| combo_on_ice_stats_both_teams(pbp) | Runs combo_on_ice_stats() for home and away and concatenates results. Both teams now get correct perspective labels (fixed in 0.3.2). |
| combos_teammates_by_strength(...) | Pairwise teammate combination TOI by strength. |
| combos_opponents_by_strength(...) | Pairwise opponent combination TOI by strength. |
| combo_toi_by_strength(...) | Aggregate TOI for multi-player combinations at each strength. |
| combo_shot_metrics_by_strength(...) | Corsi / Fenwick / xG for multi-player combinations. |
Team aggregates & expected goals¶
| Function | Description |
|---|---|
| team_strength_aggregates(pbp) | Per-team Corsi, Fenwick, goals, TOI, and xG split by strength state. Returns one row per team per strength. Strength labels from each team's perspective (fixed in 0.3.2). |
| build_shots_design_matrix(pbp_df) | Build the feature matrix used by the xG model. |
| predict_xg_for_pbp(pbp_df) | Add an xG column using the bundled XGBoost model. Requires pip install scrapernhl[analytics]. |
| engineer_xg_features(pbp_df) | (deprecated) Feature engineering step — use build_shots_design_matrix() + predict_xg_for_pbp() instead. |
| pipeline(game_id) | Convenience wrapper: scrape → parse → compute on-ice stats in one call. |
Strength Notation¶
To distinguish empty-net situations from non-empty-net situations in on-ice stats, the scraper uses a * suffix to mark the team with the empty net.
| Notation | Meaning |
|---|---|
| 5v5 | Five skaters vs. five skaters, both goalies in net |
| 5v4 | Five skaters (power play) vs. four skaters (penalty kill), both goalies in net |
| 5v6* | Five skaters vs. six skaters with the net empty (opposing goalie pulled) |
| 5*v4 | Five skaters with their net empty vs. four skaters with a goalie in net |
| 4v4 | Four-on-four (for example, coincidental minor penalties) |
The * always marks the team whose goalie is not in the net. This notation appears in the detailedGameStrength column of scrape_game() output and in the on_ice_stats() / combo_on_ice_stats() strength columns.
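When filtering on these labels it can help to split them into components. The helper below is a hypothetical convenience, not part of scrapernhl; it assumes labels follow the `<n>[*]v<m>[*]` shape shown in the table and uses neutral "left" / "right" names, since which team each side refers to depends on the perspective of the column being read.

```python
import re

# Illustrative sketch only: parse_strength is a hypothetical helper.
_STRENGTH_RE = re.compile(r"^(\d)(\*?)v(\d)(\*?)$")


def parse_strength(label):
    """Split a label such as '5v6*' into skater counts and empty-net flags."""
    match = _STRENGTH_RE.match(label)
    if match is None:
        raise ValueError(f"Unrecognised strength label: {label!r}")
    left, left_star, right, right_star = match.groups()
    return {
        "left_skaters": int(left),
        "left_net_empty": left_star == "*",
        "right_skaters": int(right),
        "right_net_empty": right_star == "*",
    }


parsed_pulled = parse_strength("5v6*")
parsed_even = parse_strength("5v5")
print(parsed_pulled)
```

For example, rows where either flag is True could be excluded to restrict an analysis to situations with both goalies in net.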