California Weather and Fire Prediction Dataset (1984–2025) with Engineered Features

  • Cemil Emre Yavas (Creator)
  • Christopher Kadlec (Creator)
  • Jongyeop Kim (Creator)
  • Lei Chen (Creator)

Dataset

Description

This dataset provides a comprehensive compilation of weather observations and wildfire data in California from 1984 to 2025. Designed for researchers and practitioners, it integrates meteorological data from NOAA Climate Data Online with fire incident data from CAL FIRE. The dataset includes engineered features that enhance predictive modeling capabilities, making it suitable for wildfire prediction and analysis tasks.

Dataset Contents: The dataset consists of daily records with the following fields:

  • DATE: The date of the observation.
  • PRECIPITATION: Daily precipitation in inches.
  • MAX_TEMP: Maximum daily temperature in degrees Fahrenheit.
  • MIN_TEMP: Minimum daily temperature in degrees Fahrenheit.
  • AVG_WIND_SPEED: Average daily wind speed in mph.
  • FIRE_START_DAY: A binary indicator (True/False) showing whether a wildfire started on that date.
  • YEAR: The year of the observation.
  • TEMP_RANGE: The difference between maximum and minimum temperatures, indicating daily temperature variability.
  • WIND_TEMP_RATIO: The ratio of average wind speed to maximum temperature, capturing wind-temperature dynamics.
  • MONTH: The calendar month of the observation (1–12).
  • SEASON: The season of the observation (Winter, Spring, Summer, Fall).
  • LAGGED_PRECIPITATION: Cumulative precipitation over the preceding 7 days, reflecting recent moisture conditions.
  • LAGGED_AVG_WIND_SPEED: Average wind speed over the preceding 7 days, indicating sustained wind patterns.
  • DAY_OF_YEAR: The numeric day within the year (1–365/366).

Potential Applications:

  • Wildfire Prediction: The dataset supports machine learning models for predicting fire start days, enabling proactive wildfire management strategies.
  • Environmental Analysis: Researchers can study the relationship between meteorological variables and wildfire dynamics.
  • Seasonal Trends Analysis: Temporal features allow for insights into seasonal patterns of wildfires.
  • Model Benchmarking: The dataset is ideal for testing and benchmarking predictive algorithms, including Random Forest, XGBoost, and other machine learning methods.
  • Climate Impact Studies: It can be used to analyze how climate variability influences fire risk over time.

Target Audience: This dataset is suitable for data scientists, environmental researchers, wildfire management professionals, and machine learning practitioners seeking to explore the interplay between weather conditions and wildfire occurrences.

Usage: The dataset is in CSV format and is ready for use in Python, R, MATLAB, or other data analysis tools. It requires no additional preprocessing and is accompanied by clear, descriptive variable names.
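The engineered fields described above can be reproduced from the raw daily columns. The following is a minimal sketch of one plausible derivation, not the creators' actual pipeline. It rests on several assumptions that the description does not confirm: the 7-day lag window excludes the current day, rows are consecutive days with no gaps, SEASON follows the meteorological convention (December to February is Winter), and lagged values are left empty until a full window exists. The function names and the sample values are hypothetical and exist only for illustration.

```python
from datetime import date, timedelta


def season_of(month: int) -> str:
    """Map a calendar month to a season (assumed meteorological convention)."""
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    return "Fall"


def add_engineered_features(rows, window=7):
    """Return copies of daily rows with the derived columns added.

    `rows` must be sorted by DATE and contain one entry per consecutive day.
    Assumption: lagged values cover the `window` days before the current day,
    excluding the current day, and are None until a full window is available.
    """
    out = []
    for i, row in enumerate(rows):
        day = row["DATE"]
        previous = rows[max(0, i - window):i]
        has_full_window = len(previous) == window
        new_row = dict(row)
        new_row["YEAR"] = day.year
        new_row["MONTH"] = day.month
        new_row["DAY_OF_YEAR"] = day.timetuple().tm_yday
        new_row["SEASON"] = season_of(day.month)
        new_row["TEMP_RANGE"] = row["MAX_TEMP"] - row["MIN_TEMP"]
        # Guard against division by zero on a 0 °F maximum temperature.
        new_row["WIND_TEMP_RATIO"] = (
            row["AVG_WIND_SPEED"] / row["MAX_TEMP"] if row["MAX_TEMP"] else None
        )
        new_row["LAGGED_PRECIPITATION"] = (
            sum(r["PRECIPITATION"] for r in previous) if has_full_window else None
        )
        new_row["LAGGED_AVG_WIND_SPEED"] = (
            sum(r["AVG_WIND_SPEED"] for r in previous) / window
            if has_full_window
            else None
        )
        out.append(new_row)
    return out


# Made-up sample values for illustration only; these are not taken from the dataset.
start = date(2024, 1, 1)
sample = [
    {
        "DATE": start + timedelta(days=i),
        "PRECIPITATION": 0.5 * (i % 2),
        "MAX_TEMP": 70.0 + i,
        "MIN_TEMP": 50.0,
        "AVG_WIND_SPEED": 5.0 + i,
    }
    for i in range(9)
]
features = add_engineered_features(sample)
print(features[7])
```

Comparing values computed this way against the published columns is a quick way to check which window convention the dataset actually uses before relying on the lagged features in a model.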
Date made available: Jan 21 2025
Publisher: ZENODO
