# Power Outage Scraper (`newpower2.py`) Specification

This Markdown document details the architecture, configuration, and maintenance procedures for the `newpower2.py` script.

## 1. Overview

The `newpower2.py` script is a modular, extensible Python application that scrapes, normalizes, and stores high-resolution power outage data from multiple utility providers. It supports two scraping strategies (Kubra Storm Center and Simple JSON) and persists the standardized data to a PostGIS-enabled PostgreSQL database.

-----

## 2. Architecture

The script uses the **Strategy Pattern** to handle different utility API formats while keeping a single, unified execution pipeline.

[Image of strategy design pattern class diagram]

### Core Components

1. **Provider Registry:** A configuration list (`PROVIDERS`) that defines every utility to be scraped.
2. **Provider Classes:**
   * `BaseProvider` (Abstract): Defines the blueprint for all scrapers.
   * `KubraProvider`: Implements recursive quadkey drilling for Kubra maps.
   * `SimpleJsonProvider`: Implements flat list parsing for standard JSON APIs.
3. **Database Handler (`PowerDB`):** Manages connection pooling, upsert operations, and post-processing SQL tasks.
4. **Main Loop:** Iterates through the registry, instantiates the correct provider class based on the `type` field, and executes the fetch.

-----

## 3. Database Schema (`newpower`)

The script writes to a table named `newpower`. Ensure PostGIS is enabled (`CREATE EXTENSION postgis;`).

| Column | Type | Description |
| :--- | :--- | :--- |
| `id` | `SERIAL PRIMARY KEY` | Auto-incrementing unique ID. |
| `incidentid` | `TEXT` | Utility-assigned ID (or synthetic composite). |
| `utility` | `TEXT` | Name of the utility (e.g., 'AEP-WV'). |
| `lat` / `lon` | `FLOAT` | Coordinates of the outage. |
| `pointgeom` | `TEXT UNIQUE` | Encoded polyline or coordinate string. **Used for deduplication.** |
| `geom` | `GEOMETRY(Point, 4326)` | PostGIS Point object for spatial queries. |
| `areageom` | `TEXT` | Encoded polyline for outage areas (if available). |
| `realareageom` | `GEOMETRY(LineString)` | PostGIS LineString decoded from `areageom`. |
| `outagen` | `INTEGER` | Current number of customers affected. |
| `peakoutage` | `INTEGER` | Max customers affected (tracked over time). |
| `start_time` | `TIMESTAMPTZ` | Reported start time. |
| `etr` | `TIMESTAMPTZ` | Estimated Time of Restoration. |
| `active` | `BOOLEAN` | `TRUE` if currently active, `FALSE` if restored. |
| `fetch_time` | `TIMESTAMPTZ` | Timestamp of the last successful scrape. |

### Post-Processing

After every run, the script executes SQL to:

* **Enrich Data:** Runs spatial joins against the `county` and `fzone` tables to populate the `county`, `state`, and `cwa` columns.
* **Decode Geometry:** Converts `areageom` strings into `realareageom` PostGIS objects.
* **Update Status:** Sets `active = FALSE` for records not seen in the last 30 minutes.
* **Cleanup:** Deletes records older than 365 days.

-----

## 4. Configuration

All utility configurations are centralized in the `PROVIDERS` list at the top of the script.

### Adding a "Simple JSON" Provider

Used for utilities that return a flat list of outages (e.g., South Central Power).

```python
{
    'name': 'Utility Name',
    'type': 'simple_json',
    'url': 'https://example.com/data/outages.json'
}
```

### Adding a "Kubra" Provider

Used for utilities that publish Kubra Storm Center maps (e.g., AEP, FirstEnergy).

```python
{
    'name': 'AEP-WV',
    'type': 'kubra',
    'meta_url': 'https://kubra.io/.../currentState?preview=false',  # The 'Current State' URL
    'layer': 'cluster-2',                 # Found via get_kubra_config.py
    'quadkeys': ['0320001', '0320003']    # Generated via generate_keys.py
}
```

-----

## 5. Helper Scripts

Two utility scripts assist in configuring new Kubra providers:

### A. `get_kubra_config.py`

**Purpose:** Finds the hidden Cluster Layer ID and Hex Keys.

* **Input:** The "Current State" URL found in the browser Network tab.
* **Output:** The `layer` ID (e.g., `cluster-1`) and the constructed base URL.

### B. `generate_keys.py`

**Purpose:** Generates the list of starting map tiles (`quadkeys`) for a specific region.

* **Input:** A WKT (Well-Known Text) geometry string from your database.
  * *SQL:* `SELECT ST_AsText(geom) FROM county WHERE ...`
* **Output:** A Python list of strings `['032...', '032...']`.

-----

## 6. Extensibility

To add support for a new API format (e.g., XML):

1. **Define Class:** Create a class inheriting from `BaseProvider`.
2. **Implement `fetch()`:** Write logic to download and normalize data into the standard dictionary format.
3. **Register:** Add the class to `PROVIDER_REGISTRY`.

```python
class XmlProvider(BaseProvider):
    def fetch(self):
        # ... XML parsing logic ...
        standardized_outages = []  # placeholder: one standard dict per outage
        return standardized_outages


# Register it
PROVIDER_REGISTRY = {
    'kubra': KubraProvider,
    'simple_json': SimpleJsonProvider,
    'xml': XmlProvider,
}
```
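
The three extension steps above can also be sketched end to end. This is only an illustration under stated assumptions, not the code in `newpower2.py`: the `BaseProvider` constructor signature, the XML element and attribute names (`outage`, `id`, `lat`, `lon`, `customers`), the `sample` config key, the `run()` helper, and the exact keys of the output dictionaries (borrowed here from the `newpower` column names) are all hypothetical.

```python
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class BaseProvider(ABC):
    # Stand-in for the script's abstract base class; the constructor
    # signature is an assumption made for this sketch.
    def __init__(self, config):
        self.config = config
        self.name = config['name']

    @abstractmethod
    def fetch(self):
        # Must return a list of standardized outage dictionaries.
        ...


class XmlProvider(BaseProvider):
    # Hypothetical provider for a feed shaped like
    # <outages><outage id=".." lat=".." lon=".." customers=".."/></outages>
    def parse(self, xml_text):
        root = ET.fromstring(xml_text)
        outages = []
        for node in root.iter('outage'):
            outages.append({
                'incidentid': node.get('id'),
                'utility': self.name,
                'lat': float(node.get('lat')),
                'lon': float(node.get('lon')),
                'outagen': int(node.get('customers')),
            })
        return outages

    def fetch(self):
        # A real provider would download self.config['url'] here; this
        # sketch reads an inline 'sample' payload so it runs offline.
        return self.parse(self.config['sample'])


PROVIDER_REGISTRY = {'xml': XmlProvider}

SAMPLE_XML = (
    '<outages>'
    '<outage id="A1" lat="38.35" lon="-81.63" customers="120"/>'
    '<outage id="B2" lat="38.41" lon="-81.70" customers="8"/>'
    '</outages>'
)

PROVIDERS = [
    {'name': 'Example-XML', 'type': 'xml', 'sample': SAMPLE_XML},
]


def run(providers, registry):
    # Look up each provider class by its 'type' field and collect results.
    results = []
    for config in providers:
        provider_cls = registry.get(config['type'])
        if provider_cls is None:
            continue  # unknown type: skip rather than abort the whole run
        results.extend(provider_cls(config).fetch())
    return results


outages = run(PROVIDERS, PROVIDER_REGISTRY)
```

Splitting `parse()` from `fetch()` is a design choice for this sketch: it lets the normalization logic be exercised against a saved payload without touching the network, while the main loop stays unaware of which format each provider speaks.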