stoat/test


John Peck a6a69f47f8 initial powe2 change

2025-12-07 02:07:12 +00:00

4.9 KiB



This Markdown document details the architecture, configuration, and maintenance procedures for the newpower2.py script.

Power Outage Scraper (`newpower2.py`) Specification

1. Overview

The newpower2.py script is a modular, extensible Python application designed to scrape, normalize, and store high-resolution power outage data from multiple utility providers. It supports distinct scraping strategies (Kubra Storm Center, Simple JSON) and persists standardized data into a PostGIS-enabled PostgreSQL database.

2. Architecture

The script utilizes a Strategy Pattern to handle different utility API formats while maintaining a unified execution pipeline.

[Image of strategy design pattern class diagram]

Core Components

- Provider Registry: A configuration list (`PROVIDERS`) that defines every utility to be scraped.
- Provider Classes:
  - `BaseProvider` (Abstract): Defines the blueprint for all scrapers.
  - `KubraProvider`: Implements recursive quadkey drilling for Kubra maps.
  - `SimpleJsonProvider`: Implements flat list parsing for standard JSON APIs.
- Database Handler (`PowerDB`): Manages connection pooling, upsert operations, and post-processing SQL tasks.
- Main Loop: Iterates through the registry, instantiates the correct provider class based on the `type` field, and executes the fetch.
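
The dispatch described above can be sketched roughly as follows. This is a minimal illustration rather than the script's actual code: the constructor signature, the `run` helper, and the stubbed return values are assumptions. Only the class names, `PROVIDERS`, `PROVIDER_REGISTRY`, the `type` field, and the `fetch()` method come from this specification.

```python
from abc import ABC, abstractmethod


class BaseProvider(ABC):
    # Blueprint for all scrapers (constructor signature is an assumption).
    def __init__(self, config):
        self.config = config
        self.name = config['name']

    @abstractmethod
    def fetch(self):
        # Subclasses return a list of outage dicts in the standard format.
        ...


class KubraProvider(BaseProvider):
    def fetch(self):
        # The real class drills recursively through quadkeys; stubbed here.
        return []


class SimpleJsonProvider(BaseProvider):
    def fetch(self):
        # The real class downloads and parses a flat JSON list; stubbed here.
        return []


PROVIDER_REGISTRY = {'kubra': KubraProvider, 'simple_json': SimpleJsonProvider}

PROVIDERS = [
    {'name': 'Utility Name', 'type': 'simple_json',
     'url': 'https://example.com/data/outages.json'},
]


def run(providers, registry):
    # Hypothetical main loop: pick the class matching each entry's 'type'
    # and collect whatever its fetch() returns, keyed by provider name.
    results = {}
    for config in providers:
        provider_cls = registry.get(config['type'])
        if provider_cls is None:
            print(f"Unknown provider type: {config['type']}")
            continue
        results[config['name']] = provider_cls(config).fetch()
    return results
```

Keeping the type-to-class mapping in a dictionary means the loop itself never changes when a new API format is added.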

3. Database Schema (`newpower`)

The script writes to a table named newpower. Ensure PostGIS is enabled (CREATE EXTENSION postgis;).

| Column | Type | Description |
| --- | --- | --- |
| `id` | `SERIAL PRIMARY KEY` | Auto-incrementing unique ID. |
| `incidentid` | `TEXT` | Utility-assigned ID (or synthetic composite). |
| `utility` | `TEXT` | Name of the utility (e.g., 'AEP-WV'). |
| `lat` / `lon` | `FLOAT` | Coordinates of the outage. |
| `pointgeom` | `TEXT UNIQUE` | Encoded polyline or coordinate string. Used for deduplication. |
| `geom` | `GEOMETRY(Point, 4326)` | PostGIS Point object for spatial queries. |
| `areageom` | `TEXT` | Encoded polyline for outage areas (if available). |
| `realareageom` | `GEOMETRY(LineString)` | PostGIS LineString decoded from `areageom`. |
| `outagen` | `INTEGER` | Current number of customers affected. |
| `peakoutage` | `INTEGER` | Max customers affected (tracked over time). |
| `start_time` | `TIMESTAMPTZ` | Reported start time. |
| `etr` | `TIMESTAMPTZ` | Estimated Time of Restoration. |
| `active` | `BOOLEAN` | `TRUE` if currently active, `FALSE` if restored. |
| `fetch_time` | `TIMESTAMPTZ` | Timestamp of the last successful scrape. |
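
One plausible way to write a row into this table is an upsert keyed on the unique `pointgeom` column. The sketch below is an assumption, since this specification does not show the actual SQL: `build_upsert`, `UPSERT_COLUMNS`, and the `%(name)s` placeholder style (as used by psycopg2) are illustrative choices. `GREATEST` is used here as one way to keep `peakoutage` at its historical maximum.

```python
UPSERT_COLUMNS = ('incidentid', 'utility', 'lat', 'lon', 'pointgeom',
                  'areageom', 'outagen', 'start_time', 'etr')


def build_upsert(table='newpower'):
    # Build an INSERT ... ON CONFLICT statement with named placeholders.
    cols = ', '.join(UPSERT_COLUMNS)
    vals = ', '.join(f'%({c})s' for c in UPSERT_COLUMNS)
    return ' '.join([
        f"INSERT INTO {table} ({cols}, geom, peakoutage, active, fetch_time)",
        f"VALUES ({vals},",
        # PostGIS points are (x, y), i.e. (lon, lat).
        "ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326),",
        "%(outagen)s, TRUE, NOW())",
        "ON CONFLICT (pointgeom) DO UPDATE SET",
        "outagen = EXCLUDED.outagen,",
        # Never let the stored peak drop below a previously seen value.
        f"peakoutage = GREATEST({table}.peakoutage, EXCLUDED.outagen),",
        "etr = EXCLUDED.etr,",
        "active = TRUE,",
        "fetch_time = NOW()",
    ])
```

With a psycopg2 cursor, the statement could be run as `cursor.execute(build_upsert(), outage)`, where `outage` is a dictionary holding one value per name in `UPSERT_COLUMNS`.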

Post-Processing

After every run, the script executes SQL to:

- Enrich Data: Performs spatial joins with the `county` and `fzone` tables to populate the `county`, `state`, and `cwa` columns.
- Decode Geometry: Converts `areageom` strings into `realareageom` PostGIS objects.
- Update Status: Sets `active = FALSE` for records not seen in the last 30 minutes.
- Cleanup: Deletes records older than 365 days.
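
Three of these steps can be sketched as plain SQL strings run through a DB-API cursor. The statements are assumptions, not the script's own queries: in particular, measuring record age by `fetch_time` is a guess, and `run_post_processing` is a hypothetical helper. The enrichment joins are left out because the column names of the `county` and `fzone` tables are not given here. `ST_LineFromEncodedPolyline` is the PostGIS function for decoding encoded polylines.

```python
STALE_AFTER_MINUTES = 30
RETENTION_DAYS = 365

POST_PROCESS_SQL = [
    # Decode encoded polylines that have not been converted yet.
    "UPDATE newpower SET realareageom = ST_LineFromEncodedPolyline(areageom) "
    "WHERE areageom IS NOT NULL AND realareageom IS NULL",
    # Mark outages not seen recently as restored.
    f"UPDATE newpower SET active = FALSE "
    f"WHERE active AND fetch_time < NOW() - INTERVAL '{STALE_AFTER_MINUTES} minutes'",
    # Drop old rows (assumes age is measured by fetch_time).
    f"DELETE FROM newpower "
    f"WHERE fetch_time < NOW() - INTERVAL '{RETENTION_DAYS} days'",
]


def run_post_processing(cursor, statements=POST_PROCESS_SQL):
    # Execute each statement in order; returns how many were run.
    for sql in statements:
        cursor.execute(sql)
    return len(statements)
```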

4. Configuration

All utility configurations are centralized in the PROVIDERS list at the top of the script.

Adding a "Simple JSON" Provider

Used for utilities that return a flat list of outages (e.g., South Central Power). """

{ 'name': 'Utility Name', 'type': 'simple_json', 'url': 'https://example.com/data/outages.json' }

"""### Adding a "Kubra" Provider

Used for utilities utilizing Kubra Storm Center maps (e.g., AEP, FirstEnergy). """

{
    'name': 'AEP-WV',
    'type': 'kubra',
    'meta_url': 'https://kubra.io/.../currentState?preview=false',  # The 'Current State' URL
    'layer': 'cluster-2',  # Found via get_kubra_config.py
    'quadkeys': ['0320001', '0320003'],  # Generated via generate_keys.py
}

"""-----

5. Helper Scripts

Two utility scripts assist in configuring new Kubra providers:

A. `get_kubra_config.py`

Purpose: Finds the hidden Cluster Layer ID and Hex Keys.

Input: The "Current State" URL found in the browser Network tab.
Output: The layer ID (e.g., cluster-1) and constructed base URL.

B. `generate_keys.py`

Purpose: Generates the list of starting map tiles (quadkeys) for a specific region.

Input: A WKT (Well-Known Text) geometry string from your database.
- SQL: SELECT ST_AsText(geom) FROM county WHERE ...
Output: A Python list of strings ['032...', '032...'].
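
To illustrate what a quadkey list represents, the sketch below derives quadkeys for a bounding box. It assumes Kubra follows the standard Bing Maps tile quadkey scheme, and it takes a plain bounding box rather than WKT so that it needs no geometry library. The function names are hypothetical and are not taken from `generate_keys.py`.

```python
import math


def tile_to_quadkey(x, y, zoom):
    # Interleave the bits of tile X and Y into a base-4 digit string.
    digits = []
    for i in range(zoom, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if x & mask:
            digit += 1
        if y & mask:
            digit += 2
        digits.append(str(digit))
    return ''.join(digits)


def latlon_to_tile(lat, lon, zoom):
    # Web Mercator tile coordinates containing a WGS84 point.
    lat = max(min(lat, 85.05112878), -85.05112878)
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    sin_lat = math.sin(math.radians(lat))
    y = int((0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)) * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def quadkeys_for_bbox(min_lon, min_lat, max_lon, max_lat, zoom):
    # All quadkeys at the given zoom whose tiles intersect the bounding box.
    x0, y0 = latlon_to_tile(max_lat, min_lon, zoom)  # north-west corner
    x1, y1 = latlon_to_tile(min_lat, max_lon, zoom)  # south-east corner
    return [tile_to_quadkey(x, y, zoom)
            for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]
```

Each additional digit splits a tile into four children, so a seven-digit key such as `'0320001'` names a zoom-level-7 tile under this scheme.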

6. Extensibility

To add support for a new API format (e.g., XML):

Define Class: Create a class inheriting from BaseProvider.
Implement fetch(): Write logic to download and normalize data into the standard dictionary format.
Register: Add the class to PROVIDER_REGISTRY.

"""

class XmlProvider(BaseProvider):
    def fetch(self):
        # ... XML parsing logic ...
        standardized_outages = []  # placeholder: one standard dict per outage
        return standardized_outages

# Register it
PROVIDER_REGISTRY = {
    'kubra': KubraProvider,
    'simple_json': SimpleJsonProvider,
    'xml': XmlProvider,
}
