This Markdown document details the architecture, configuration, and maintenance procedures for the newpower2.py script.
# Power Outage Scraper (newpower2.py) Specification
## 1. Overview
The newpower2.py script is a modular, extensible Python application designed to scrape, normalize, and store high-resolution power outage data from multiple utility providers. It supports distinct scraping strategies (Kubra Storm Center, Simple JSON) and persists standardized data into a PostGIS-enabled PostgreSQL database.
## 2. Architecture
The script utilizes a Strategy Pattern to handle different utility API formats while maintaining a unified execution pipeline.
[Image of strategy design pattern class diagram]
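The pattern described above can be sketched roughly as follows. The names `BaseProvider`, `KubraProvider`, `SimpleJsonProvider`, and `PROVIDER_REGISTRY` come from this document; the constructor signature, the stubbed `fetch()` bodies, and the `run_all` helper are illustrative assumptions, not the script's actual code.

```python
from abc import ABC, abstractmethod


class BaseProvider(ABC):
    """Blueprint shared by all scrapers (sketch)."""

    def __init__(self, config):
        # config is assumed to be one entry of the PROVIDERS list.
        self.config = config
        self.name = config['name']

    @abstractmethod
    def fetch(self):
        """Return a list of outage dicts in the standard format."""


class KubraProvider(BaseProvider):
    def fetch(self):
        # Placeholder: the real class drills Kubra quadkey tiles.
        return []


class SimpleJsonProvider(BaseProvider):
    def fetch(self):
        # Placeholder: the real class parses a flat JSON list.
        return []


# Maps the 'type' field of a provider entry to its strategy class.
PROVIDER_REGISTRY = {
    'kubra': KubraProvider,
    'simple_json': SimpleJsonProvider,
}


def run_all(providers):
    """Hypothetical main loop: pick a class by 'type', then fetch."""
    results = {}
    for config in providers:
        provider_cls = PROVIDER_REGISTRY.get(config['type'])
        if provider_cls is None:
            # Unknown types are skipped rather than aborting the run.
            continue
        results[config['name']] = provider_cls(config).fetch()
    return results
```

Because the loop only depends on `fetch()`, supporting a new API format means adding one class and one registry entry, without touching the pipeline itself.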
### Core Components

- Provider Registry: A configuration list (`PROVIDERS`) that defines every utility to be scraped.
- Provider Classes:
  - `BaseProvider` (Abstract): Defines the blueprint for all scrapers.
  - `KubraProvider`: Implements recursive quadkey drilling for Kubra maps.
  - `SimpleJsonProvider`: Implements flat list parsing for standard JSON APIs.
- Database Handler (`PowerDB`): Manages connection pooling, upsert operations, and post-processing SQL tasks.
- Main Loop: Iterates through the registry, instantiates the correct provider class based on the `type` field, and executes the fetch.
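
The recursive quadkey drilling attributed to `KubraProvider` could look something like the sketch below. It rests on several assumptions not confirmed by this document: a tile response mixes individual outages with clusters, clusters are resolved by zooming in, and each quadkey has four children formed by appending a digit from 0 to 3 (the standard quadkey convention). The `fetch_tile` callable, the `is_cluster` and `id` keys, and the `max_zoom` limit are all hypothetical.

```python
def drill(quadkey, fetch_tile, max_zoom=14, found=None):
    """Recursively collect outages starting from one quadkey tile.

    fetch_tile(quadkey) is assumed to return a list of dicts. A dict
    with a truthy 'is_cluster' key stands for several outages merged
    at the current zoom level.
    """
    if found is None:
        found = {}

    # The length of a quadkey equals its zoom level.
    can_zoom = len(quadkey) < max_zoom
    needs_zoom = False

    for item in fetch_tile(quadkey):
        if item.get('is_cluster') and can_zoom:
            needs_zoom = True  # resolve the cluster in the child tiles
        else:
            # Keyed by id, so an outage seen at two zoom levels
            # is stored only once.
            found[item['id']] = item

    if needs_zoom:
        for digit in '0123':
            drill(quadkey + digit, fetch_tile, max_zoom, found)
    return found
```

Passing the fetch function in as an argument keeps the recursion testable without network access; a dictionary of canned tiles is enough to exercise it.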
## 3. Database Schema (`newpower`)
The script writes to a table named `newpower`. Ensure PostGIS is enabled (`CREATE EXTENSION postgis;`).

| Column | Type | Description |
|---|---|---|
| `id` | `SERIAL PRIMARY KEY` | Auto-incrementing unique ID. |
| `incidentid` | `TEXT` | Utility-assigned ID (or synthetic composite). |
| `utility` | `TEXT` | Name of the utility (e.g., 'AEP-WV'). |
| `lat` / `lon` | `FLOAT` | Coordinates of the outage. |
| `pointgeom` | `TEXT UNIQUE` | Encoded polyline or coordinate string. Used for deduplication. |
| `geom` | `GEOMETRY(Point, 4326)` | PostGIS Point object for spatial queries. |
| `areageom` | `TEXT` | Encoded polyline for outage areas (if available). |
| `realareageom` | `GEOMETRY(LineString)` | PostGIS LineString decoded from `areageom`. |
| `outagen` | `INTEGER` | Current number of customers affected. |
| `peakoutage` | `INTEGER` | Max customers affected (tracked over time). |
| `start_time` | `TIMESTAMPTZ` | Reported start time. |
| `etr` | `TIMESTAMPTZ` | Estimated Time of Restoration. |
| `active` | `BOOLEAN` | `TRUE` if currently active, `FALSE` if restored. |
| `fetch_time` | `TIMESTAMPTZ` | Timestamp of the last successful scrape. |
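
The `UNIQUE` constraint on `pointgeom` and the running maximum in `peakoutage` suggest an upsert along the following lines. This is only a sketch: the statement actually issued by `PowerDB` is not shown in this document, the `build_upsert_params` helper and the outage dict keys are hypothetical, and the `%(name)s` placeholders assume a psycopg2-style driver.

```python
from datetime import datetime, timezone

# Sketch of an upsert keyed on pointgeom; not the script's actual SQL.
UPSERT_SQL = """
INSERT INTO newpower
    (incidentid, utility, lat, lon, pointgeom, geom,
     areageom, outagen, peakoutage, start_time, etr, active, fetch_time)
VALUES
    (%(incidentid)s, %(utility)s, %(lat)s, %(lon)s, %(pointgeom)s,
     ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326),
     %(areageom)s, %(outagen)s, %(outagen)s, %(start_time)s, %(etr)s,
     TRUE, %(fetch_time)s)
ON CONFLICT (pointgeom) DO UPDATE SET
    outagen    = EXCLUDED.outagen,
    peakoutage = GREATEST(newpower.peakoutage, EXCLUDED.outagen),
    etr        = EXCLUDED.etr,
    active     = TRUE,
    fetch_time = EXCLUDED.fetch_time;
"""

# Keys assumed to be present (or absent, meaning NULL) in an outage dict.
OUTAGE_KEYS = (
    'incidentid', 'utility', 'lat', 'lon', 'pointgeom',
    'areageom', 'outagen', 'start_time', 'etr',
)


def build_upsert_params(outage, fetch_time=None):
    """Fill in every named placeholder, defaulting missing keys to None."""
    params = {key: outage.get(key) for key in OUTAGE_KEYS}
    params['fetch_time'] = fetch_time or datetime.now(timezone.utc)
    return params
```

With a psycopg2 cursor, the call would be `cursor.execute(UPSERT_SQL, build_upsert_params(outage))`. Note that `ST_MakePoint` takes longitude first, then latitude.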
### Post-Processing
After every run, the script executes SQL to:

- Enrich Data: Spatial joins with `county` and `fzone` tables to populate `county`, `state`, and `cwa` columns.
- Decode Geometry: Converts `areageom` strings into `realareageom` PostGIS objects.
- Update Status: Sets `active = FALSE` for records not seen in the last 30 minutes.
- Cleanup: Deletes records older than 365 days.
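
The Decode Geometry step runs as SQL inside the script, and that SQL is not reproduced here. Purely as an illustration of the transformation, the sketch below decodes a polyline in Python, assuming `areageom` uses the Google Encoded Polyline Algorithm Format at precision 5 (this document does not state the encoding). The function names `decode_polyline` and `to_linestring_wkt` are hypothetical.

```python
def decode_polyline(encoded, precision=5):
    """Decode an encoded polyline into a list of (lat, lon) tuples."""
    factor = 10 ** precision
    coords = []
    index = lat = lon = 0

    while index < len(encoded):
        deltas = []
        for _ in range(2):  # latitude delta first, then longitude delta
            shift = result = 0
            while True:
                byte = ord(encoded[index]) - 63
                index += 1
                result |= (byte & 0x1F) << shift  # low 5 bits carry data
                shift += 5
                if byte < 0x20:  # continuation bit not set: last chunk
                    break
            # Undo the zig-zag encoding of signed values.
            deltas.append(~(result >> 1) if result & 1 else result >> 1)

        lat += deltas[0]
        lon += deltas[1]
        coords.append((lat / factor, lon / factor))
    return coords


def to_linestring_wkt(coords):
    """Build LineString WKT; WKT expects x y order, i.e. lon lat."""
    points = ', '.join(f'{lon} {lat}' for lat, lon in coords)
    return f'LINESTRING({points})'
```

The resulting WKT could then be handed to PostGIS via `ST_GeomFromText`, which is one plausible way to populate `realareageom`.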
## 4. Configuration
All utility configurations are centralized in the `PROVIDERS` list at the top of the script.
### Adding a "Simple JSON" Provider
Used for utilities that return a flat list of outages (e.g., South Central Power).

```python
{
    'name': 'Utility Name',
    'type': 'simple_json',
    'url': 'https://example.com/data/outages.json'
}
```

### Adding a "Kubra" Provider
Used for utilities utilizing Kubra Storm Center maps (e.g., AEP, FirstEnergy).

```python
{
    'name': 'AEP-WV',
    'type': 'kubra',
    'meta_url': 'https://kubra.io/.../currentState?preview=false',  # The 'Current State' URL
    'layer': 'cluster-2',  # Found via get_kubra_config.py
    'quadkeys': ['0320001', '0320003']  # Generated via generate_keys.py
}
```

-----
## 5. Helper Scripts
Two utility scripts assist in configuring new Kubra providers:
### A. get_kubra_config.py
Purpose: Finds the hidden Cluster Layer ID and Hex Keys.

- Input: The "Current State" URL found in the browser Network tab.
- Output: The `layer` ID (e.g., `cluster-1`) and the constructed base URL.
### B. generate_keys.py
Purpose: Generates the list of starting map tiles (quadkeys) for a specific region.

- Input: A WKT (Well-Known Text) geometry string from your database.
  - SQL: `SELECT ST_AsText(geom) FROM county WHERE ...`
- Output: A Python list of strings, e.g. `['032...', '032...']`.
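
`generate_keys.py` itself is not reproduced here. As a rough sketch of the underlying idea, the snippet below covers a bounding box with map tiles and converts each tile to a quadkey using the standard Web Mercator tile math. It takes a bounding box rather than parsing full WKT, and the function names and the default zoom of 7 (chosen to match the seven-digit keys in the Kubra configuration example) are illustrative assumptions.

```python
import math


def latlon_to_tile(lat, lon, zoom):
    """Convert WGS84 coordinates to Web Mercator tile x/y at a zoom level."""
    n = 2 ** zoom
    lat = max(min(lat, 85.05112878), -85.05112878)  # Mercator latitude limit
    x = int((lon + 180.0) / 360.0 * n)
    sin_lat = math.sin(math.radians(lat))
    y = int((0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)) * n)
    # Clamp so that edge coordinates stay inside the tile grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_to_quadkey(x, y, zoom):
    """Interleave the bits of x and y into a base-4 quadkey string."""
    digits = []
    for i in range(zoom, 0, -1):
        mask = 1 << (i - 1)
        digit = (1 if x & mask else 0) + (2 if y & mask else 0)
        digits.append(str(digit))
    return ''.join(digits)


def quadkeys_for_bbox(min_lon, min_lat, max_lon, max_lat, zoom=7):
    """List the quadkeys of every tile touching the bounding box."""
    x_min, y_min = latlon_to_tile(max_lat, min_lon, zoom)  # north-west corner
    x_max, y_max = latlon_to_tile(min_lat, max_lon, zoom)  # south-east corner
    return [
        tile_to_quadkey(x, y, zoom)
        for y in range(y_min, y_max + 1)
        for x in range(x_min, x_max + 1)
    ]
```

A bounding box over-covers irregular shapes such as counties, so a real implementation would likely filter tiles against the actual geometry. The bounding box of a geometry can be obtained in PostGIS with `ST_Extent` or `ST_Envelope`.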
## 6. Extensibility
To add support for a new API format (e.g., XML):

- Define Class: Create a class inheriting from `BaseProvider`.
- Implement `fetch()`: Write logic to download and normalize data into the standard dictionary format.
- Register: Add the class to `PROVIDER_REGISTRY`.

```python
class XmlProvider(BaseProvider):
    def fetch(self):
        outages = []  # outage dicts in the standard format
        # ... XML parsing logic that appends to outages ...
        return outages


# Register it
PROVIDER_REGISTRY = {
    'kubra': KubraProvider,
    'simple_json': SimpleJsonProvider,
    'xml': XmlProvider,
}
```