test/power2.MD
2025-12-07 02:07:12 +00:00

This Markdown document details the architecture, configuration, and maintenance procedures for the newpower2.py script.

Power Outage Scraper (newpower2.py) Specification

1. Overview

The newpower2.py script is a modular, extensible Python application designed to scrape, normalize, and store high-resolution power outage data from multiple utility providers. It supports distinct scraping strategies (Kubra Storm Center, Simple JSON) and persists standardized data into a PostGIS-enabled PostgreSQL database.


2. Architecture

The script utilizes a Strategy Pattern to handle different utility API formats while maintaining a unified execution pipeline.

[Image of strategy design pattern class diagram]

Core Components

  1. Provider Registry: A configuration list (PROVIDERS) that defines every utility to be scraped.
  2. Provider Classes:
    • BaseProvider (Abstract): Defines the blueprint for all scrapers.
    • KubraProvider: Implements recursive quadkey drilling for Kubra maps.
    • SimpleJsonProvider: Implements flat list parsing for standard JSON APIs.
  3. Database Handler (PowerDB): Manages connection pooling, upsert operations, and post-processing SQL tasks.
  4. Main Loop: Iterates through PROVIDERS, looks up each entry's type field in PROVIDER_REGISTRY to instantiate the matching provider class, and calls its fetch() method.
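
The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code from newpower2.py: the constructor signature, the run_all helper, and the placeholder fetch() bodies are assumptions, and no network or database access is performed.

```python
from abc import ABC, abstractmethod


class BaseProvider(ABC):
    """Blueprint shared by every scraper (constructor signature is assumed)."""

    def __init__(self, config: dict):
        self.config = config
        self.name = config["name"]

    @abstractmethod
    def fetch(self) -> list[dict]:
        """Return outages as a list of standardized dictionaries."""


class SimpleJsonProvider(BaseProvider):
    def fetch(self) -> list[dict]:
        # Placeholder: a real implementation would download self.config["url"].
        return [{"utility": self.name, "incidentid": "demo-1", "outagen": 12}]


class KubraProvider(BaseProvider):
    def fetch(self) -> list[dict]:
        # Placeholder: a real implementation would drill into each quadkey.
        return [
            {"utility": self.name, "incidentid": f"demo-{qk}", "outagen": 1}
            for qk in self.config.get("quadkeys", [])
        ]


# Maps the 'type' field of a config entry to the class that handles it.
PROVIDER_REGISTRY = {"kubra": KubraProvider, "simple_json": SimpleJsonProvider}

PROVIDERS = [
    {"name": "Utility Name", "type": "simple_json",
     "url": "https://example.com/data/outages.json"},
    {"name": "AEP-WV", "type": "kubra", "quadkeys": ["0320001", "0320003"]},
]


def run_all(providers: list[dict]) -> list[dict]:
    """Hypothetical main loop: pick a strategy per entry and collect results."""
    outages = []
    for cfg in providers:
        provider_cls = PROVIDER_REGISTRY.get(cfg["type"])
        if provider_cls is None:
            print(f"Skipping {cfg['name']}: unknown type {cfg['type']!r}")
            continue
        outages.extend(provider_cls(cfg).fetch())
    return outages


if __name__ == "__main__":
    for outage in run_all(PROVIDERS):
        print(outage)
```

Keeping the type-to-class mapping in one dictionary means the loop itself never changes when a new format is added.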

3. Database Schema (newpower)

The script writes to a table named newpower. Ensure PostGIS is enabled (CREATE EXTENSION postgis;).

| Column | Type | Description |
| --- | --- | --- |
| id | SERIAL PRIMARY KEY | Auto-incrementing unique ID. |
| incidentid | TEXT | Utility-assigned ID (or synthetic composite). |
| utility | TEXT | Name of the utility (e.g., 'AEP-WV'). |
| lat / lon | FLOAT | Coordinates of the outage. |
| pointgeom | TEXT UNIQUE | Encoded polyline or coordinate string. Used for deduplication. |
| geom | GEOMETRY(Point, 4326) | PostGIS Point object for spatial queries. |
| areageom | TEXT | Encoded polyline for outage areas (if available). |
| realareageom | GEOMETRY(LineString) | PostGIS LineString decoded from areageom. |
| outagen | INTEGER | Current number of customers affected. |
| peakoutage | INTEGER | Max customers affected (tracked over time). |
| start_time | TIMESTAMPTZ | Reported start time. |
| etr | TIMESTAMPTZ | Estimated Time of Restoration. |
| active | BOOLEAN | TRUE if currently active, FALSE if restored. |
| fetch_time | TIMESTAMPTZ | Timestamp of the last successful scrape. |
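
To illustrate how the pointgeom uniqueness constraint and the peakoutage column could interact, here is a small in-memory sketch of the upsert semantics. It is an assumption about the intended behavior, not the SQL used by PowerDB; in PostgreSQL the same effect would typically be achieved with INSERT ... ON CONFLICT (pointgeom) DO UPDATE and GREATEST(). The upsert_outage function and the dictionary standing in for the table are hypothetical.

```python
from datetime import datetime, timezone


def upsert_outage(table: dict, record: dict, fetch_time: datetime) -> dict:
    """Insert or update one outage, keyed by its unique pointgeom string."""
    key = record["pointgeom"]
    existing = table.get(key)
    if existing is None:
        # First sighting: the current count is also the peak so far.
        row = dict(record, peakoutage=record["outagen"],
                   active=True, fetch_time=fetch_time)
    else:
        # Repeat sighting: refresh live fields, keep the highest count seen.
        row = dict(existing)
        row.update(record)
        row["peakoutage"] = max(existing["peakoutage"], record["outagen"])
        row["active"] = True
        row["fetch_time"] = fetch_time
    table[key] = row
    return row


if __name__ == "__main__":
    table = {}
    first = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    later = datetime(2025, 1, 1, 12, 15, tzinfo=timezone.utc)
    upsert_outage(table, {"pointgeom": "abc", "utility": "AEP-WV", "outagen": 40}, first)
    upsert_outage(table, {"pointgeom": "abc", "utility": "AEP-WV", "outagen": 25}, later)
    print(table["abc"]["outagen"], table["abc"]["peakoutage"])
```

After the second call the row reports the current count (25) while peakoutage still holds the earlier maximum (40).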

Post-Processing

After every run, the script executes SQL to:

  • Enrich Data: Spatial joins with county and fzone tables to populate county, state, and cwa columns.
  • Decode Geometry: Converts areageom strings into realareageom PostGIS objects.
  • Update Status: Sets active = FALSE for records not seen in the last 30 minutes.
  • Cleanup: Deletes records older than 365 days.
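
As an illustration of the Decode Geometry step, the sketch below decodes an encoded polyline in pure Python and renders it as WKT. It assumes areageom uses Google's Encoded Polyline Algorithm Format at precision 5, which this document does not confirm; the function names are hypothetical. PostGIS also offers ST_LineFromEncodedPolyline for doing the conversion in SQL, though whether the script relies on it is not stated here.

```python
def decode_polyline(encoded: str, precision: int = 5) -> list[tuple[float, float]]:
    """Decode an encoded polyline into a list of (lat, lon) pairs."""
    factor = 10 ** precision
    coords = []
    index = lat = lon = 0
    while index < len(encoded):
        for is_lon in (False, True):
            shift = result = 0
            while True:
                # Each character carries 5 payload bits plus a continuation bit.
                byte = ord(encoded[index]) - 63
                index += 1
                result |= (byte & 0x1F) << shift
                shift += 5
                if byte < 0x20:
                    break
            # The lowest bit flags a negative delta (zigzag-style encoding).
            delta = ~(result >> 1) if result & 1 else result >> 1
            if is_lon:
                lon += delta
            else:
                lat += delta
        coords.append((lat / factor, lon / factor))
    return coords


def to_linestring_wkt(coords: list[tuple[float, float]]) -> str:
    """Render (lat, lon) pairs as WKT, which expects 'lon lat' order."""
    return "LINESTRING(" + ", ".join(f"{lon} {lat}" for lat, lon in coords) + ")"


if __name__ == "__main__":
    # Sample string from Google's published description of the format.
    points = decode_polyline("_p~iF~ps|U_ulLnnqC_mqNvxq`@")
    print(points)  # [(38.5, -120.2), (40.7, -120.95), (43.252, -126.453)]
    print(to_linestring_wkt(points))
```

Note the axis swap: encoded polylines store latitude first, while WKT and PostGIS expect longitude first.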

4. Configuration

All utility configurations are centralized in the PROVIDERS list at the top of the script.

Adding a "Simple JSON" Provider

Used for utilities that return a flat list of outages (e.g., South Central Power).

    {
        'name': 'Utility Name',
        'type': 'simple_json',
        'url': 'https://example.com/data/outages.json'
    }

Adding a "Kubra" Provider

Used for utilities utilizing Kubra Storm Center maps (e.g., AEP, FirstEnergy).

    {
        'name': 'AEP-WV',
        'type': 'kubra',
        'meta_url': 'https://kubra.io/.../currentState?preview=false',  # The 'Current State' URL
        'layer': 'cluster-2',  # Found via get_kubra_config.py
        'quadkeys': ['0320001', '0320003']  # Generated via generate_keys.py
    }
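
Because every entry is a plain dictionary, a small validation pass can catch typos before any scraping starts. The sketch below is hypothetical: validate_provider and REQUIRED_KEYS are not part of the script described here, and the required keys are inferred only from the two examples above.

```python
# Keys each provider type is assumed to need, based on the examples above.
REQUIRED_KEYS = {
    "simple_json": {"name", "type", "url"},
    "kubra": {"name", "type", "meta_url", "layer", "quadkeys"},
}


def validate_provider(cfg: dict) -> list[str]:
    """Return a list of problems found in one PROVIDERS entry (empty if none)."""
    ptype = cfg.get("type")
    if ptype not in REQUIRED_KEYS:
        return [f"unknown type: {ptype!r}"]
    problems = []
    missing = REQUIRED_KEYS[ptype] - set(cfg)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if ptype == "kubra":
        for qk in cfg.get("quadkeys", []):
            # Quadkeys are non-empty strings made only of the digits 0-3.
            if not isinstance(qk, str) or not qk or set(qk) - set("0123"):
                problems.append(f"invalid quadkey: {qk!r}")
    return problems


if __name__ == "__main__":
    entry = {
        "name": "AEP-WV",
        "type": "kubra",
        "meta_url": "https://kubra.io/.../currentState?preview=false",
        "layer": "cluster-2",
        "quadkeys": ["0320001", "0320003"],
    }
    print(validate_provider(entry))  # []
```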

5. Helper Scripts

Two utility scripts assist in configuring new Kubra providers:

A. get_kubra_config.py

Purpose: Finds the hidden Cluster Layer ID and Hex Keys.

  • Input: The "Current State" URL found in the browser Network tab.
  • Output: The layer ID (e.g., cluster-1) and constructed base URL.

B. generate_keys.py

Purpose: Generates the list of starting map tiles (quadkeys) for a specific region.

  • Input: A WKT (Well-Known Text) geometry string from your database.
    • SQL: SELECT ST_AsText(geom) FROM county WHERE ...
  • Output: A Python list of strings ['032...', '032...'].
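
For readers unfamiliar with quadkeys, the sketch below shows how a latitude/longitude pair maps to a quadkey string and how a tile is split into its four children. It assumes Kubra quadkeys follow the Bing Maps tile system convention (Web Mercator, one base-4 digit per zoom level); it is not the code of generate_keys.py, which takes a WKT geometry rather than a single point, and the function names are hypothetical.

```python
import math

MAX_LAT = 85.05112878  # Web Mercator latitude limit


def tile_to_quadkey(tile_x: int, tile_y: int, zoom: int) -> str:
    """Interleave the bits of tile X/Y into one base-4 digit per zoom level."""
    digits = []
    for i in range(zoom, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def latlon_to_quadkey(lat: float, lon: float, zoom: int) -> str:
    """Find the quadkey of the tile containing a point at the given zoom."""
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    n = 1 << zoom  # number of tiles along each axis
    sin_lat = math.sin(math.radians(lat))
    x = (lon + 180.0) / 360.0
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    tile_x = min(n - 1, max(0, int(x * n)))
    tile_y = min(n - 1, max(0, int(y * n)))
    return tile_to_quadkey(tile_x, tile_y, zoom)


def child_quadkeys(quadkey: str) -> list[str]:
    """Each tile splits into four children, one zoom level deeper."""
    return [quadkey + digit for digit in "0123"]


if __name__ == "__main__":
    # A point in West Virginia (coordinates chosen only for illustration).
    print(latlon_to_quadkey(38.35, -81.63, 7))
    print(child_quadkeys("0320"))
```

The child_quadkeys helper hints at how recursive drilling can work: a tile's key is a prefix of all of its descendants, so descending one level only appends a digit.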

6. Extensibility

To add support for a new API format (e.g., XML):

  1. Define Class: Create a class inheriting from BaseProvider.
  2. Implement fetch(): Write logic to download and normalize data into the standard dictionary format.
  3. Register: Add the class to PROVIDER_REGISTRY.

    class XmlProvider(BaseProvider):
        def fetch(self):
            # ... XML parsing logic ...
            return [standardized_outages]

    # Register it
    PROVIDER_REGISTRY = {
        'kubra': KubraProvider,
        'simple_json': SimpleJsonProvider,
        'xml': XmlProvider
    }
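
A slightly fuller sketch of what the XML parsing logic might look like is given below. Everything about the feed is hypothetical: the element names (outage, id, lat, lon, customers), the url config key, and the split into parse() and fetch() are assumptions for illustration, and the BaseProvider here is only a stand-in so the example runs on its own.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from urllib.request import urlopen


class BaseProvider(ABC):
    """Stand-in for the script's BaseProvider so this example is runnable."""

    def __init__(self, config: dict):
        self.config = config

    @abstractmethod
    def fetch(self) -> list[dict]:
        ...


class XmlProvider(BaseProvider):
    def parse(self, xml_text: str) -> list[dict]:
        """Normalize a hypothetical <outage> feed into standard dictionaries."""
        root = ET.fromstring(xml_text)
        outages = []
        for node in root.iter("outage"):
            outages.append({
                "incidentid": node.findtext("id"),
                "utility": self.config["name"],
                "lat": float(node.findtext("lat")),
                "lon": float(node.findtext("lon")),
                "outagen": int(node.findtext("customers", default="0")),
            })
        return outages

    def fetch(self) -> list[dict]:
        # Download the feed, then reuse the parsing logic above.
        with urlopen(self.config["url"], timeout=30) as resp:
            return self.parse(resp.read().decode("utf-8"))


if __name__ == "__main__":
    sample = """<outages>
      <outage><id>A1</id><lat>38.35</lat><lon>-81.63</lon><customers>42</customers></outage>
    </outages>"""
    provider = XmlProvider({"name": "Example XML Utility", "url": "https://example.com/outages.xml"})
    print(provider.parse(sample))
```

Separating parse() from fetch() keeps the normalization logic testable without network access.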