API Reference¶
ScraperNHL API documentation. The scraper is split into modules for easier use and faster imports.
Quick Start¶
Python API¶
Modular Imports (Recommended)¶
from scrapernhl.scrapers.teams import scrapeTeams
from scrapernhl.scrapers.schedule import scrapeSchedule
teams = scrapeTeams()
schedule = scrapeSchedule("MTL", "20252026")
Legacy Imports (Still Supported)¶
from scrapernhl import scrapeTeams, scrapeSchedule
teams = scrapeTeams()
schedule = scrapeSchedule("MTL", "20252026")
Command-Line Interface¶
All scraping functions are also available via CLI for quick data exports:
# View available commands
python scrapernhl/cli.py --help
# Scrape teams
python scrapernhl/cli.py teams --output teams.csv
# Scrape schedule
python scrapernhl/cli.py schedule MTL 20252026 --format json
# Scrape with different output formats
python scrapernhl/cli.py standings --format parquet
See the CLI Examples page for comprehensive CLI usage.
IMPORTANT NOTE ABOUT GAME STRENGTHS¶
In order to differenciate empty-net situtations to non-empty net situations, there the scraper incorporates a unique format. It uses the star symbol to indentify the team with an empty net.
If you ever see 5v6*, it means that the player or team is facing a team that has 6 skaters and the net empty.
If you see 5*v4, it means that the player or team has 5 skaters and the net empty while facing a team with 4 skaters with a goalie.
Scrapers Module (scrapernhl.scrapers)¶
Teams (scrapernhl.scrapers.teams)¶
getTeamsData(source: str = "default") -> List[Dict]¶
Scrapes raw NHL team data from various public endpoints.
Parameters:
- source (str): Data source to use. Options: "default", "calendar", "records"
Returns:
- List[Dict]: Raw team data with metadata
Example:
scrapeTeams(source: str = "default", output_format: str = "pandas") -> DataFrame¶
Scrapes NHL team data and returns as DataFrame.
Parameters:
- source (str): Data source. Options: "default", "calendar", "records"
- output_format (str): Output format. Options: "pandas", "polars"
Returns:
- pd.DataFrame or pl.DataFrame: Team data with metadata
Example:
from scrapernhl.scrapers.teams import scrapeTeams
# Get as pandas DataFrame
teams = scrapeTeams()
# Get as polars DataFrame
teams_pl = scrapeTeams(output_format="polars")
Schedule (scrapernhl.scrapers.schedule)¶
getScheduleData(team: str = "MTL", season: Union[str, int] = "20252026") -> List[Dict]¶
Scrapes raw NHL schedule data for a team and season.
Parameters:
- team (str): Team abbreviation (e.g., "MTL", "TOR", "BOS")
- season (str | int): Season ID (e.g., "20252026" or 20252026)
Returns:
- List[Dict]: Raw schedule records
Example:
from scrapernhl.scrapers.schedule import getScheduleData
raw_schedule = getScheduleData("MTL", "20252026")
scrapeSchedule(team: str = "MTL", season: Union[str, int] = "20252026", output_format: str = "pandas") -> DataFrame¶
Scrapes NHL schedule data for a team and season.
Parameters:
- team (str): Team abbreviation
- season (str | int): Season ID
- output_format (str): Output format. Options: "pandas", "polars"
Returns:
- pd.DataFrame or pl.DataFrame: Schedule data
Example:
from scrapernhl.scrapers.schedule import scrapeSchedule
# Montreal Canadiens 2025-26 season
schedule = scrapeSchedule("MTL", "20252026")
Standings (scrapernhl.scrapers.standings)¶
getStandingsData(date: str = None) -> List[Dict]¶
Scrapes raw NHL standings data for a specific date.
Parameters:
- date (str, optional): Date in 'YYYY-MM-DD' format. Defaults to previous year's January 1st
Returns:
- List[Dict]: Raw standings records
Example:
from scrapernhl.scrapers.standings import getStandingsData
raw_standings = getStandingsData("2025-01-01")
scrapeStandings(date: str = None, output_format: str = "pandas") -> DataFrame¶
Scrapes NHL standings data for a specific date.
Parameters:
- date (str, optional): Date in 'YYYY-MM-DD' format
- output_format (str): Output format. Options: "pandas", "polars"
Returns:
- pd.DataFrame or pl.DataFrame: Standings data
Example:
Roster (scrapernhl.scrapers.roster)¶
getRosterData(team: str = "MTL", season: Union[str, int] = "20242025") -> List[Dict]¶
Scrapes raw NHL roster data for a team and season.
Parameters:
- team (str): Team abbreviation
- season (str | int): Season ID
Returns:
- List[Dict]: Raw roster records (forwards, defensemen, goalies)
Example:
scrapeRoster(team: str = "MTL", season: Union[str, int] = "20242025", output_format: str = "pandas") -> DataFrame¶
Scrapes NHL roster data for a team and season.
Parameters:
- team (str): Team abbreviation
- season (str | int): Season ID
- output_format (str): Output format
Returns:
- pd.DataFrame or pl.DataFrame: Roster data
Example:
Stats (scrapernhl.scrapers.stats)¶
getTeamStatsData(team: str = "MTL", season: Union[str, int] = "20252026", session: Union[str, int] = 2, goalies: bool = False) -> List[Dict]¶
Scrapes raw NHL team statistics.
Parameters:
- team (str): Team abbreviation
- season (str | int): Season ID
- session (str | int): Session type - 1: pre-season, 2: regular season, 3: playoffs
- goalies (bool): If True, fetch goalie stats; if False, fetch skater stats
Returns:
- List[Dict]: Raw statistics records
Example:
from scrapernhl.scrapers.stats import getTeamStatsData
# Get skater stats
skater_stats = getTeamStatsData("MTL", "20252026", session=2, goalies=False)
# Get goalie stats
goalie_stats = getTeamStatsData("MTL", "20252026", session=2, goalies=True)
scrapeTeamStats(team: str = "MTL", season: Union[str, int] = "20252026", session: Union[str, int] = 2, goalies: bool = False, output_format: str = "pandas") -> DataFrame¶
Scrapes NHL team statistics.
Parameters:
- team (str): Team abbreviation
- season (str | int): Season ID
- session (str | int): Session type
- goalies (bool): Fetch goalie stats vs skater stats
- output_format (str): Output format
Returns:
- pd.DataFrame or pl.DataFrame: Team statistics
Example:
from scrapernhl.scrapers.stats import scrapeTeamStats
stats = scrapeTeamStats("MTL", "20252026", session=2)
Draft (scrapernhl.scrapers.draft)¶
getDraftDataData(year: Union[str, int] = "2024", round: Union[str, int] = "all") -> List[Dict]¶
Scrapes raw NHL draft data.
Parameters:
- year (str | int): Draft year (e.g., "2024")
- round (str | int): Round number or "all" for all rounds
Returns:
- List[Dict]: Raw draft records
scrapeDraftData(year: Union[str, int] = "2024", round: Union[str, int] = "all", output_format: str = "pandas") -> DataFrame¶
Scrapes NHL draft data.
Parameters:
- year (str | int): Draft year
- round (str | int): Round number or "all"
- output_format (str): Output format
Returns:
- pd.DataFrame or pl.DataFrame: Draft data
Example:
from scrapernhl.scrapers.draft import scrapeDraftData
# All rounds
draft = scrapeDraftData("2024", "all")
# First round only
first_round = scrapeDraftData("2024", 1)
getRecordsDraftData(year: Union[str, int] = "2025") -> List[Dict]¶
Scrapes draft records from NHL Records API.
scrapeDraftRecords(year: Union[str, int] = "2025", output_format: str = "pandas") -> DataFrame¶
Scrapes draft records from NHL Records API.
getRecordsTeamDraftHistoryData(franchise: Union[str, int] = 1) -> List[Dict]¶
Scrapes team draft history for a franchise.
Parameters:
- franchise (str | int): Franchise ID
scrapeTeamDraftHistory(franchise: Union[str, int] = 1, output_format: str = "pandas") -> DataFrame¶
Scrapes team draft history.
Example:
from scrapernhl.scrapers.draft import scrapeTeamDraftHistory
# Montreal Canadiens (franchise ID 1)
mtl_draft_history = scrapeTeamDraftHistory(1)
Games (scrapernhl.scrapers.games)¶
getGameData(game: Union[str, int], addGoalReplayData: bool = False) -> Dict¶
Scrapes NHL play-by-play data for a game.
Parameters:
- game (str | int): Game ID (e.g., 2024020001)
- addGoalReplayData (bool): If True, fetch goal replay data
Returns:
- Dict: Complete game data with enriched plays
Example:
from scrapernhl.scrapers.games import getGameData
game_data = getGameData(2024020001, addGoalReplayData=True)
scrapePlays(game: Union[str, int], addGoalReplayData: bool = False, output_format: str = "pandas") -> DataFrame¶
Scrapes play-by-play data for a game.
Parameters:
- game (str | int): Game ID
- addGoalReplayData (bool): Include goal replay data
- output_format (str): Output format
Returns:
- pd.DataFrame or pl.DataFrame: Play-by-play data
Example:
getGoalReplayData(json_url: str) -> List[Dict]¶
Fetches goal replay data from a JSON URL.
Parameters:
- json_url (str): URL to goal replay JSON
Returns:
- List[Dict]: Goal replay data
convert_json_to_goal_url(json_url: str) -> str¶
Converts a JSON URL to NHL goal replay URL.
Parameters:
- json_url (str): JSON URL
Returns:
- str: Goal replay URL
Core Utilities (scrapernhl.core)¶
HTTP (scrapernhl.core.http)¶
fetch_json(url: str, timeout: int = 10) -> dict¶
Fetches JSON data from a URL with retry logic.
Parameters:
- url (str): URL to fetch
- timeout (int): Request timeout in seconds
Returns:
- dict: Parsed JSON response
Raises:
- requests.exceptions.RequestException: If request fails
Example:
fetch_html(url: str, timeout: int = 10) -> Optional[str]¶
Fetches HTML content from a URL.
Parameters:
- url (str): URL to fetch
- timeout (int): Request timeout in seconds
Returns:
- str or None: HTML content, or None if request fails
fetch_html_async(url: str, timeout: int = 10) -> Optional[str]¶
Async wrapper around fetch_html.
fetch_json_async(url: str, timeout: int = 10) -> dict¶
Async wrapper around fetch_json.
Utils (scrapernhl.core.utils)¶
time_str_to_seconds(time_str: Optional[str]) -> Optional[int]¶
Converts time string in 'MM:SS' format to total seconds.
Parameters:
- time_str (str): Time string (e.g., "05:30")
Returns:
- int or None: Total seconds
Example:
from scrapernhl.core.utils import time_str_to_seconds
seconds = time_str_to_seconds("05:30") # Returns 330
json_normalize(data: List[Dict], output_format: str = "pandas") -> DataFrame¶
Normalizes nested JSON data to flat table.
Parameters:
- data (List[Dict]): List of dictionaries
- output_format (str): "pandas" or "polars"
Returns:
- pd.DataFrame or pl.DataFrame: Normalized data
Configuration (scrapernhl.config)¶
Constants¶
from scrapernhl.config import (
DEFAULT_TEAM, # "MTL"
DEFAULT_SEASON, # "20252026"
DEFAULT_DATE, # "2025-11-11"
DEFAULT_HEADERS, # HTTP headers dict
DEFAULT_TIMEOUT, # 10 seconds
)
API Endpoints¶
from scrapernhl.config import (
NHL_API_BASE_URL,
NHL_API_BASE_URL_V1,
STANDINGS_ENDPOINT,
TEAM_SCHEDULE_ENDPOINT,
FRANCHISES_ENDPOINT,
)
Legacy Functions¶
Advanced functions (PBP parsing, feature engineering, analytics) are available via lazy loading from scraper_legacy.py. These load only when accessed:
from scrapernhl import (
scrape_game,
engineer_xg_features,
build_on_ice_wide,
toi_by_strength,
combo_on_ice_stats,
)
Note: These functions require additional dependencies (xgboost, etc.) and are being modularized in Phase 2.
Complete Example¶
from scrapernhl.scrapers.teams import scrapeTeams
from scrapernhl.scrapers.schedule import scrapeSchedule
from scrapernhl.scrapers.standings import scrapeStandings
from scrapernhl.scrapers.games import scrapePlays
# Get all teams
teams = scrapeTeams()
print(f"Found {len(teams)} teams")
# Get Montreal Canadiens schedule
mtl_schedule = scrapeSchedule("MTL", "20252026")
print(f"MTL has {len(mtl_schedule)} games")
# Get current standings
standings = scrapeStandings("2025-01-01")
print(f"Standings for {len(standings)} teams")
# Get play-by-play for a specific game
pbp = scrapePlays(2024020001)
print(f"Game has {len(pbp)} events")
Performance Tips¶
-
Use modular imports for faster loading:
-
Choose output format based on your needs:
-
Cache results when scraping multiple times:
For more information, see: - Getting Started Guide - Examples - Modularization Guide