This Markdown document details the architecture, configuration, and maintenance procedures for the `newpower2.py` script.

# Power Outage Scraper (`newpower2.py`) Specification

## 1\. Overview

The `newpower2.py` script is a modular, extensible Python application that scrapes, normalizes, and stores high-resolution power outage data from multiple utility providers. It supports distinct scraping strategies (Kubra Storm Center and Simple JSON) and persists the standardized data in a PostGIS-enabled PostgreSQL database.

-----

## 2\. Architecture

The script uses the **Strategy Pattern** to handle different utility API formats while keeping a single, unified execution pipeline.

### Core Components

1. **Provider Configuration (`PROVIDERS`):** A configuration list with one entry for every utility to be scraped.
2. **Provider Classes:**
    * `BaseProvider` (abstract): Defines the interface that every scraper implements.
    * `KubraProvider`: Implements recursive quadkey drilling for Kubra maps.
    * `SimpleJsonProvider`: Implements flat-list parsing for standard JSON APIs.
3. **Database Handler (`PowerDB`):** Manages connection pooling, upsert operations, and post-processing SQL tasks.
4. **Main Loop:** Iterates through `PROVIDERS`, looks up each entry's `type` field in `PROVIDER_REGISTRY`, instantiates that provider class, and executes its fetch.
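The flow from a configuration entry to a provider class can be sketched as below. This is a minimal illustration under stated assumptions, not the script's code: the constructor that takes the raw config dict, the `StaticProvider` stand-in, and the `run_all` helper are hypothetical. Only `BaseProvider`, `fetch()`, and the dispatch on `type` come from this specification, and database writes are left out.

```python
from abc import ABC, abstractmethod


class BaseProvider(ABC):
    """Blueprint for every scraper: one instance per PROVIDERS entry."""

    def __init__(self, config):
        # Assumption: providers receive their raw configuration dict,
        # e.g. {'name': ..., 'type': ..., 'url': ...}.
        self.config = config

    @abstractmethod
    def fetch(self):
        """Return a list of standardized outage dicts."""


class StaticProvider(BaseProvider):
    """Hypothetical stand-in that returns canned data instead of calling an API."""

    def fetch(self):
        return [{'utility': self.config['name'], 'outagen': 12}]


def run_all(providers, registry):
    """Dispatch each configured utility to its provider class by 'type'."""
    results = {}
    for config in providers:
        provider_cls = registry.get(config['type'])
        if provider_cls is None:
            # Unknown types are skipped rather than aborting the whole run.
            print(f"Skipping {config['name']}: unknown type {config['type']!r}")
            continue
        results[config['name']] = provider_cls(config).fetch()
    return results


if __name__ == '__main__':
    registry = {'static': StaticProvider}
    providers = [
        {'name': 'Demo Utility', 'type': 'static'},
        {'name': 'Mystery Utility', 'type': 'xml'},
    ]
    print(run_all(providers, registry))
```

Skipping an unknown `type` instead of raising keeps one misconfigured utility from blocking the others; whether `newpower2.py` behaves this way is not specified here.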

-----

## 3\. Database Schema (`newpower`)

The script writes to a table named `newpower`. Ensure PostGIS is enabled (`CREATE EXTENSION postgis;`).

| Column | Type | Description |
| :--- | :--- | :--- |
| `id` | `SERIAL PRIMARY KEY` | Auto-incrementing unique ID. |
| `incidentid` | `TEXT` | Utility-assigned ID (or synthetic composite). |
| `utility` | `TEXT` | Name of the utility (e.g., 'AEP-WV'). |
| `lat` / `lon` | `FLOAT` | Coordinates of the outage. |
| `pointgeom` | `TEXT UNIQUE` | Encoded polyline or coordinate string. **Used for deduplication.** |
| `geom` | `GEOMETRY(Point, 4326)` | PostGIS Point object for spatial queries. |
| `areageom` | `TEXT` | Encoded polyline for outage areas (if available). |
| `realareageom` | `GEOMETRY(LineString)` | PostGIS LineString decoded from `areageom`. |
| `outagen` | `INTEGER` | Current number of customers affected. |
| `peakoutage` | `INTEGER` | Max customers affected (tracked over time). |
| `start_time` | `TIMESTAMPTZ` | Reported start time. |
| `etr` | `TIMESTAMPTZ` | Estimated Time of Restoration. |
| `active` | `BOOLEAN` | `TRUE` if currently active, `FALSE` if restored. |
| `fetch_time` | `TIMESTAMPTZ` | Timestamp of the last successful scrape. |
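The `UNIQUE` constraint on `pointgeom` and the running `peakoutage` fit together naturally as an `INSERT ... ON CONFLICT` upsert. The sketch below uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs anywhere; the column subset, the named parameters, and the `upsert` helper are assumptions, not `PowerDB`'s actual SQL. In PostgreSQL the `MAX(...)` call would be `GREATEST(newpower.peakoutage, EXCLUDED.outagen)`.

```python
import sqlite3

# SQLite stands in for PostgreSQL/PostGIS here; geometry and time columns are simplified.
SCHEMA = """
CREATE TABLE newpower (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    incidentid TEXT,
    utility TEXT,
    pointgeom TEXT UNIQUE,
    outagen INTEGER,
    peakoutage INTEGER,
    active BOOLEAN,
    fetch_time TEXT
)
"""

# Deduplicate on pointgeom; keep the highest customer count ever seen as peakoutage.
# Unqualified column names in DO UPDATE refer to the existing row.
# PostgreSQL equivalent: peakoutage = GREATEST(newpower.peakoutage, EXCLUDED.outagen)
UPSERT = """
INSERT INTO newpower (incidentid, utility, pointgeom, outagen, peakoutage, active, fetch_time)
VALUES (:incidentid, :utility, :pointgeom, :outagen, :outagen, 1, :fetch_time)
ON CONFLICT (pointgeom) DO UPDATE SET
    outagen = excluded.outagen,
    peakoutage = MAX(peakoutage, excluded.outagen),
    active = 1,
    fetch_time = excluded.fetch_time
"""


def upsert(conn, outage):
    """Insert a new outage or refresh the existing row with the same pointgeom."""
    conn.execute(UPSERT, outage)


if __name__ == '__main__':
    conn = sqlite3.connect(':memory:')
    conn.execute(SCHEMA)
    base = {'incidentid': 'X1', 'utility': 'AEP-WV', 'pointgeom': 'abc'}
    upsert(conn, {**base, 'outagen': 120, 'fetch_time': '2024-01-01T00:00Z'})
    upsert(conn, {**base, 'outagen': 45, 'fetch_time': '2024-01-01T00:15Z'})
    print(conn.execute('SELECT outagen, peakoutage FROM newpower').fetchall())
```

The second scrape of the same location updates the current count but leaves the peak untouched, so the table keeps one row per outage point.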

### Post-Processing

After every run, the script executes SQL to:

* **Enrich Data:** Spatially joins outages with the `county` and `fzone` tables to populate the `county`, `state`, and `cwa` columns.
* **Decode Geometry:** Converts `areageom` strings into `realareageom` PostGIS objects.
* **Update Status:** Sets `active = FALSE` for records not seen in the last 30 minutes.
* **Cleanup:** Deletes records older than 365 days.
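The geometry-decoding step is done in SQL, but it can help to inspect a stored `areageom` or `pointgeom` value directly. Assuming these strings use Google's Encoded Polyline Algorithm Format at the default precision of 5 (the format PostGIS's `ST_LineFromEncodedPolyline` reads), a plain-Python decoder looks like this; the function name is hypothetical.

```python
def decode_polyline(encoded, precision=5):
    """Decode an encoded polyline string into a list of (lat, lon) tuples."""
    coords = []
    index = lat = lon = 0
    factor = 10 ** precision
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # each point stores a latitude delta, then a longitude delta
            result = shift = 0
            while True:
                b = ord(encoded[index]) - 63  # undo the ASCII offset
                index += 1
                result |= (b & 0x1F) << shift  # the low 5 bits carry data
                shift += 5
                if b < 0x20:  # continuation bit clear: last chunk of this value
                    break
            # Undo the sign encoding: a set low bit means the value is negative.
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lon += deltas[1]
        coords.append((lat / factor, lon / factor))
    return coords


if __name__ == '__main__':
    # Example string from Google's polyline algorithm documentation.
    print(decode_polyline('_p~iF~ps|U_ulLnnqC_mqNvxq`@'))
```

Coordinates are accumulated as integers and divided only at the end, so rounding errors do not build up along long outage boundaries.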

-----

## 4\. Configuration

All utility configurations are centralized in the `PROVIDERS` list at the top of the script.

### Adding a "Simple JSON" Provider

Used for utilities that return a flat list of outages (e.g., South Central Power). Add an entry of this form to `PROVIDERS`:

```python
{
    'name': 'Utility Name',
    'type': 'simple_json',
    'url': 'https://example.com/data/outages.json'
}
```
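A `simple_json` provider's `fetch()` mostly comes down to mapping each element of the flat list onto the standard outage dictionary. The sketch below separates that mapping from the download so it can be tested offline. The input field names (`id`, `latitude`, `longitude`, `customersAffected`, `startTime`) and the `normalize_simple_json` helper are hypothetical, since every utility names its fields differently; only the output keys come from the `newpower` schema.

```python
import json


def normalize_simple_json(payload, utility):
    """Map a flat JSON list of outages onto newpower-style dicts.

    The input field names used here are hypothetical placeholders.
    """
    outages = []
    for item in json.loads(payload):
        lat = float(item['latitude'])
        lon = float(item['longitude'])
        raw_id = item.get('id')
        # Fall back to a synthetic composite ID when the utility provides none.
        incident_id = str(raw_id) if raw_id is not None else f'{utility}-{lat:.5f}-{lon:.5f}'
        outages.append({
            'incidentid': incident_id,
            'utility': utility,
            'lat': lat,
            'lon': lon,
            # Plain "lat,lon" coordinate string used as the deduplication key (assumption).
            'pointgeom': f'{lat:.5f},{lon:.5f}',
            'outagen': int(item.get('customersAffected', 0)),
            'start_time': item.get('startTime'),
        })
    return outages


if __name__ == '__main__':
    sample = '[{"id": 101, "latitude": 39.96, "longitude": -82.99, "customersAffected": 37}]'
    print(normalize_simple_json(sample, 'South Central Power'))
```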

### Adding a "Kubra" Provider

Used for utilities that publish outages through Kubra Storm Center maps (e.g., AEP, FirstEnergy).

```python
{
    'name': 'AEP-WV',
    'type': 'kubra',
    'meta_url': 'https://kubra.io/.../currentState?preview=false',  # The 'Current State' URL
    'layer': 'cluster-2',  # Found via get_kubra_config.py
    'quadkeys': ['0320001', '0320003'],  # Generated via generate_keys.py
}
```
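The recursive quadkey drilling that `KubraProvider` performs can be pictured as a depth-first walk over map tiles: fetch a tile, keep any individual outages it contains, and descend into its four child tiles (the quadkey with `0`, `1`, `2`, or `3` appended) wherever the tile still reports aggregated clusters. The sketch below captures only that control flow. The `fetch_tile` callable, the tile shape `{'outages': [...], 'clusters': bool}`, the `max_zoom` cutoff, and both function names are assumptions; Kubra's real tile format and URL scheme are not documented here.

```python
def drill(quadkey, fetch_tile, max_zoom=14):
    """Depth-first walk over Kubra-style tiles, collecting individual outages.

    fetch_tile(quadkey) is a hypothetical callable returning a dict such as
    {'outages': [...], 'clusters': True}, or None for an empty tile.
    """
    tile = fetch_tile(quadkey)
    if tile is None:  # empty tile: nothing to collect below it
        return []
    outages = list(tile.get('outages', []))
    # A quadkey's length equals its zoom level. Clusters still present at
    # max_zoom are not expanded any further.
    if tile.get('clusters') and len(quadkey) < max_zoom:
        for digit in '0123':  # the four child tiles one zoom level down
            outages.extend(drill(quadkey + digit, fetch_tile, max_zoom))
    return outages


def drill_all(start_keys, fetch_tile, max_zoom=14):
    """Drill from every configured starting quadkey (the 'quadkeys' list)."""
    results = []
    for key in start_keys:
        results.extend(drill(key, fetch_tile, max_zoom))
    return results


if __name__ == '__main__':
    # Fake tile server: '0320001' holds a cluster that resolves one level down.
    fake_tiles = {
        '0320001': {'outages': [], 'clusters': True},
        '03200012': {'outages': ['outage-A', 'outage-B'], 'clusters': False},
    }
    print(drill_all(['0320001', '0320003'], fake_tiles.get, max_zoom=8))
```

Passing the tile fetcher in as a function keeps the traversal testable without network access; a real provider would supply a function that downloads and parses the tile for a given quadkey and layer.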

-----

## 5\. Helper Scripts

Two helper scripts assist in configuring new Kubra providers:

### A. `get_kubra_config.py`

**Purpose:** Finds the hidden cluster layer ID and hex keys.

* **Input:** The "Current State" URL found in the browser's Network tab.
* **Output:** The `layer` ID (e.g., `cluster-1`) and the constructed base URL.

### B. `generate_keys.py`

**Purpose:** Generates the list of starting map tiles (`quadkeys`) for a specific region.

* **Input:** A WKT (Well-Known Text) geometry string from your database.
    * *SQL:* `SELECT ST_AsText(geom) FROM county WHERE ...`
* **Output:** A Python list of quadkey strings, e.g. `['032...', '032...']`.
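For reference, this is one way to turn a region into starting quadkeys, assuming Kubra tiles follow the standard Web Mercator (Bing Maps) quadkey scheme, in which a key's length equals its zoom level. It works on a bounding box rather than the full WKT shape, so it may include tiles that only touch the box; extracting the box from the geometry (for example with PostGIS `ST_XMin`/`ST_YMin`/`ST_XMax`/`ST_YMax`) is left out. The function names are hypothetical and are not `generate_keys.py`'s internals.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Web Mercator (slippy map) tile X/Y containing a lon/lat point."""
    lat = max(min(lat, 85.05112878), -85.05112878)  # Mercator latitude limit
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp points on the far east/south edge into the last tile.
    return min(x, n - 1), min(y, n - 1)


def tile_to_quadkey(x, y, zoom):
    """Interleave tile X/Y bits into a quadkey string (one digit per zoom level)."""
    digits = []
    for i in range(zoom, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if x & mask:
            digit += 1
        if y & mask:
            digit += 2
        digits.append(str(digit))
    return ''.join(digits)


def bbox_quadkeys(min_lon, min_lat, max_lon, max_lat, zoom):
    """All quadkeys at `zoom` whose tiles intersect the bounding box."""
    x0, y0 = lonlat_to_tile(min_lon, max_lat, zoom)  # north-west corner
    x1, y1 = lonlat_to_tile(max_lon, min_lat, zoom)  # south-east corner
    return [tile_to_quadkey(x, y, zoom)
            for y in range(y0, y1 + 1)
            for x in range(x0, x1 + 1)]


if __name__ == '__main__':
    # Illustrative bounding box only; use the real region's extent in practice.
    print(bbox_quadkeys(-82.7, 37.2, -77.7, 40.7, 7))
```

A lower zoom yields fewer, larger starting tiles and leaves more work to the recursive drilling; a higher zoom does the opposite.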

-----

## 6\. Extensibility

To add support for a new API format (e.g., XML):

1. **Define Class:** Create a class inheriting from `BaseProvider`.
2. **Implement `fetch()`:** Write logic to download the data and normalize it into the standard dictionary format.
3. **Register:** Add the class to `PROVIDER_REGISTRY` under the `type` value your `PROVIDERS` entries will use.

```python
class XmlProvider(BaseProvider):
    def fetch(self):
        # ... XML parsing logic ...
        # Download the feed, parse it, and append one standardized
        # outage dictionary per incident.
        standardized_outages = []
        return standardized_outages  # a flat list of outage dicts


# Register it: the key is the 'type' value used in PROVIDERS entries
PROVIDER_REGISTRY = {
    'kubra': KubraProvider,
    'simple_json': SimpleJsonProvider,
    'xml': XmlProvider,
}
```