rpotter6298 c8e3016fd6 Add new daemons and debug scripts for Sigenergy and Oracle functionalities

- Implement `sigen_daemon.py` to poll Sigenergy plant metrics and store snapshots.
- Create `web_daemon.py` for serving a web interface with various endpoints.
- Add debug scripts:
  - `debug_duplicates.py` to find duplicate target times in forecast data.
  - `debug_energy_forecast.py` to print baseline energy forecast curves.
  - `debug_oracle_evaluations.py` to run the oracle evaluator.
  - `debug_sigen.py` to inspect stored Sigenergy plant snapshots.
  - `debug_weather.py` to trace resolved truth data.
  - `modbus_test.py` for exploring Sigenergy plants or inverters over Modbus TCP.
- Introduce `oracle_evaluator.py` for evaluating stored oracle predictions against actuals.
- Add TCN training scripts in `tcn` directory for training usage sequence models.

2026-04-28 08:14:00 +02:00

Ingestion & Storage

Purpose

Astrape needs a reliable way to collect energy-related data, normalize it, store it, and give Gibil a clean view of the current system state. The first version should favor boring, inspectable data flows over cleverness.

Gibil should not need to know whether a value came from Modbus, Home Assistant, a weather API, a price API, or a manual override. It should receive timestamped observations and snapshots with enough metadata to decide whether the data is fresh and trustworthy.

Initial Sources

Sigen Inverter

Protocol: Modbus TCP
Polling target: every 5-10 seconds for fast-changing electrical state
Initial metrics:
- solar_power_w
- battery_soc_pct
- battery_charge_w
- battery_discharge_w
- grid_import_w
- grid_export_w
- daily_yield_kwh
Risk: the register map must be confirmed before this collector can read real values

Home Assistant / Ganymede

Preferred integration: MQTT
Direction: HASS/Ganymede should publish selected state to Astrape where possible
Initial metrics:
- home_power_w
- indoor_temp_c
- selected device states
- selected sensor values needed for water/heating logic
Reasoning: MQTT keeps Astrape loosely coupled and avoids making HASS a synchronous dependency for every decision tick

Weather

Preferred first source: OpenMeteo
Polling target: hourly forecast refresh
Initial metrics:
- outdoor_temp_c
- cloud_cover_pct
- ghi_w_m2
- wind_speed_m_s
Use: build an external forecast history to feed generation and heating models
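A minimal sketch of the weather request, assuming the Open-Meteo `/v1/forecast` endpoint and its hourly variables `temperature_2m`, `cloud_cover`, `shortwave_radiation`, and `wind_speed_10m`. The mapping to Astrape metric names and the helper names `build_forecast_url` and `to_metric_rows` are hypothetical. The sketch only builds the URL and flattens an already-parsed response, so it makes no network call.

```python
from urllib.parse import urlencode

FORECAST_ENDPOINT = "https://api.open-meteo.com/v1/forecast"

# Hypothetical mapping from Open-Meteo hourly variables to Astrape metric names.
OPEN_METEO_HOURLY = {
    "temperature_2m": "outdoor_temp_c",
    "cloud_cover": "cloud_cover_pct",
    "shortwave_radiation": "ghi_w_m2",
    "wind_speed_10m": "wind_speed_m_s",
}


def build_forecast_url(latitude: float, longitude: float) -> str:
    """Build an hourly forecast request URL for one location."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "hourly": ",".join(OPEN_METEO_HOURLY),
        "wind_speed_unit": "ms",  # ask for m/s so values match wind_speed_m_s
    }
    return f"{FORECAST_ENDPOINT}?{urlencode(params)}"


def to_metric_rows(response: dict) -> list[dict]:
    """Flatten the 'hourly' block of a parsed response into one row per value."""
    hourly = response["hourly"]
    times = hourly["time"]
    rows = []
    for api_name, metric in OPEN_METEO_HOURLY.items():
        series = hourly.get(api_name) or [None] * len(times)
        for target_at, value in zip(times, series):
            if value is not None:  # skip missing (null) values
                rows.append({"target_at": target_at, "metric": metric, "value": value})
    return rows
```

Requesting `wind_speed_unit=ms` stores the value in the unit the metric name promises, instead of converting from Open-Meteo's default km/h later.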

Grid Pricing

First implementation: static time-of-use config
Later implementation: spot pricing API if needed
Initial metrics:
- grid_price_per_kwh
- price_stage
- cheap_window_active
Reasoning: static config lets Gibil produce useful behavior before price API work is settled
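The static time-of-use config could be as small as a list of daily windows. This sketch assumes hypothetical windows, prices, and stage names (`cheap`, `normal`, `peak`); none of them come from a real tariff. It shows how a single lookup yields all three initial price metrics.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class TariffWindow:
    start: time           # inclusive, local wall-clock time
    end: time             # exclusive; may be earlier than start to wrap midnight
    price_per_kwh: float  # SEK/kWh
    stage: str            # hypothetical stage label


# Hypothetical example tariff; real windows and prices are not defined yet.
TARIFF = (
    TariffWindow(time(22, 0), time(6, 0), 0.60, "cheap"),
    TariffWindow(time(6, 0), time(17, 0), 1.10, "normal"),
    TariffWindow(time(17, 0), time(22, 0), 1.80, "peak"),
)


def in_window(window: TariffWindow, t: time) -> bool:
    if window.start <= window.end:
        return window.start <= t < window.end
    return t >= window.start or t < window.end  # window wraps past midnight


def price_metrics(now: datetime, tariff: tuple = TARIFF) -> dict:
    """Return the three initial price metrics for a local timestamp."""
    t = now.time()
    for window in tariff:
        if in_window(window, t):
            return {
                "grid_price_per_kwh": window.price_per_kwh,
                "price_stage": window.stage,
                "cheap_window_active": window.stage == "cheap",
            }
    raise ValueError(f"no tariff window covers {t}")
```

Because `in_window` handles windows that wrap past midnight, a night tariff can be written as one entry instead of two.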

Manual Inputs

Purpose: allow operator-supplied values when a real integration is not available yet
Inputs may come from local config or a small authenticated admin path
Manual data should be marked clearly with source = manual

Observation Shape

Every collector should produce normalized observations.

observed_at: timestamp when the measurement was true
received_at: timestamp when Astrape received it
source: sigen | hass | weather | price | manual
metric: stable metric name
value: number, string, or boolean
unit: W | kWh | pct | C | SEK/kWh | state | none
quality: ok | stale | estimated | missing | error
metadata: source-specific context

Guidelines:

- observed_at and received_at are both needed because pushed data may arrive late
- metric names should be stable and boring
- raw source names/registers/entities belong in metadata, not in the metric name
- Gibil should be able to ignore stale or low-quality observations
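One way to express the observation shape in Python is a frozen dataclass. The field names follow the shape above; the `delivery_lag` and `is_usable` helpers, and the choice to treat `estimated` values as usable, are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Any, Literal, Union

Source = Literal["sigen", "hass", "weather", "price", "manual"]
Quality = Literal["ok", "stale", "estimated", "missing", "error"]


@dataclass(frozen=True)
class Observation:
    observed_at: datetime          # when the measurement was true
    received_at: datetime          # when Astrape received it
    source: Source
    metric: str                    # stable, boring name, e.g. "battery_soc_pct"
    value: Union[float, str, bool]
    unit: str
    quality: Quality = "ok"
    metadata: dict[str, Any] = field(default_factory=dict)  # raw registers/entities

    def delivery_lag(self) -> timedelta:
        """How late the sample arrived; large for delayed pushed data."""
        return self.received_at - self.observed_at

    def is_usable(self, now: datetime, max_age: timedelta) -> bool:
        """Assumption: fresh 'ok' or 'estimated' samples are usable by Gibil."""
        fresh = now - self.observed_at <= max_age
        return fresh and self.quality in ("ok", "estimated")


t0 = datetime(2026, 4, 28, 12, 0, tzinfo=timezone.utc)
soc = Observation(t0, t0 + timedelta(seconds=2), "sigen", "battery_soc_pct", 62.0, "pct",
                  metadata={"raw_register": None})  # register map not confirmed yet
print(soc.delivery_lag())  # 0:00:02
```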

Derived Snapshots

Gibil should reason from snapshots, not directly from loose individual observations.

A snapshot is the best-known whole-system state at a decision tick. It can include:

- current solar generation
- current home consumption
- battery SoC
- battery charge/discharge power
- grid import/export
- current price stage
- active forecast window
- stale/missing input flags

Snapshots should be persisted because they explain what Gibil knew when it made a decision.
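A sketch of snapshot assembly from the latest observation per metric, assuming observations arrive as dicts with `metric`, `observed_at`, `value`, and `quality` keys. The `REQUIRED` metrics, their freshness limits, and the `ok`/`degraded` summary for `input_quality` are hypothetical. `created_at` is kept as an ISO string so the result fits a JSON snapshot column.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical: metrics a snapshot needs, each with its own freshness limit.
REQUIRED = {
    "solar_power_w": timedelta(seconds=30),
    "home_power_w": timedelta(seconds=60),
    "battery_soc_pct": timedelta(seconds=60),
    "grid_import_w": timedelta(seconds=30),
    "price_stage": timedelta(hours=1),
}


def build_snapshot(observations: list[dict], now: Optional[datetime] = None) -> dict:
    """Pick the latest observation per metric and flag stale or missing inputs."""
    now = now or datetime.now(timezone.utc)
    latest: dict[str, dict] = {}
    for obs in observations:
        current = latest.get(obs["metric"])
        if current is None or obs["observed_at"] > current["observed_at"]:
            latest[obs["metric"]] = obs

    values, stale, missing = {}, [], []
    for metric, max_age in REQUIRED.items():
        obs = latest.get(metric)
        if obs is None or obs["quality"] in ("missing", "error"):
            missing.append(metric)
            continue
        values[metric] = obs["value"]
        if now - obs["observed_at"] > max_age or obs["quality"] == "stale":
            stale.append(metric)

    return {
        "created_at": now.isoformat(),
        "values": values,
        "stale_inputs": stale,
        "missing_inputs": missing,
        # Hypothetical two-level summary for the snapshots.input_quality column.
        "input_quality": "ok" if not (stale or missing) else "degraded",
    }
```

In this sketch a newer failed read shadows an older good one on purpose: if the latest poll errored, the current value is treated as unknown rather than silently falling back to old data.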

Storage Choice

Use TimescaleDB as the first primary store.

Reasons:

- It is Postgres, so querying and joining data stays straightforward
- It handles time-series retention and aggregation well
- It works for raw observations, derived snapshots, decisions, forecasts, and events
- It leaves room for later model training without needing a second historical store immediately

InfluxDB remains a reasonable alternative, but TimescaleDB is the better default if we want relational joins, auditability, and forecast training queries.

The runtime expects ASTRAPE_DATABASE_URL to point at TimescaleDB. Weather ingest also expects ASTRAPE_LATITUDE and ASTRAPE_LONGITUDE.
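A minimal fail-fast settings loader, assuming the three variables above are plain strings in the environment. `Settings`, `load_settings`, and the `need_location` switch (so components other than weather ingest can skip the coordinates) are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    latitude: Optional[float]   # needed by weather ingest
    longitude: Optional[float]  # needed by weather ingest


def _coordinate(env: Mapping[str, str], name: str, limit: float) -> Optional[float]:
    """Parse an optional coordinate and reject out-of-range values."""
    raw = env.get(name)
    if not raw:
        return None
    value = float(raw)
    if not -limit <= value <= limit:
        raise ValueError(f"{name}={raw} is out of range")
    return value


def load_settings(env: Mapping[str, str] = os.environ, need_location: bool = True) -> Settings:
    """Read runtime settings, failing fast with a clear message if any are missing."""
    required = ["ASTRAPE_DATABASE_URL"]
    if need_location:
        required += ["ASTRAPE_LATITUDE", "ASTRAPE_LONGITUDE"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        database_url=env["ASTRAPE_DATABASE_URL"],
        latitude=_coordinate(env, "ASTRAPE_LATITUDE", 90.0),
        longitude=_coordinate(env, "ASTRAPE_LONGITUDE", 180.0),
    )
```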

Initial Tables

`observations`

Raw normalized metric samples from all collectors.

Core fields:

- id
- observed_at
- received_at
- source
- metric
- value_num
- value_text
- value_bool
- unit
- quality
- metadata

Notes:

- populate exactly one of value_num, value_text, or value_bool, depending on the metric's value type
- keep metadata as JSON for source-specific details
- make this a hypertable on observed_at
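Populating exactly one typed value column can be done with a small dispatch on the Python type. The helper name `split_value` is hypothetical. The one subtlety it shows is real: `bool` is a subclass of `int` in Python, so it has to be checked before the numeric case.

```python
from typing import Optional, Union


def split_value(value: Union[float, int, str, bool]) -> dict[str, Optional[object]]:
    """Map one observation value onto the three typed columns; exactly one is set."""
    columns: dict[str, Optional[object]] = {"value_num": None, "value_text": None, "value_bool": None}
    # Check bool first: isinstance(True, int) is True in Python.
    if isinstance(value, bool):
        columns["value_bool"] = value
    elif isinstance(value, (int, float)):
        columns["value_num"] = float(value)
    elif isinstance(value, str):
        columns["value_text"] = value
    else:
        raise TypeError(f"unsupported observation value type: {type(value).__name__}")
    return columns
```

When the table is converted with `SELECT create_hypertable('observations', 'observed_at');`, note that TimescaleDB requires every primary key or unique index on a hypertable to include the partitioning column, so `id` alone cannot be the primary key; `(id, observed_at)` would work.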

`snapshots`

Periodic whole-system state used by Gibil.

Core fields:

- id
- created_at
- snapshot
- input_quality

Notes:

- store the snapshot as JSON initially
- this can be normalized later if query patterns demand it

`decisions`

Gibil outputs and reasoning.

Core fields:

id
created_at
snapshot_id
stage
recommendations
reasons
confidence

Notes:

- decisions should be explainable enough to debug after the fact
- this table becomes the audit trail for HASS-facing behavior

`weather_forecast_points`

Cleaned external forecast points from weather sources.

Core fields:

- id
- issued_at
- target_at
- horizon_hours
- source
- temperature_c
- shortwave_radiation_w_m2
- cloud_cover_pct

Notes:

- this stores external forecasts, not internal predictions
- make this a hypertable on target_at
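`horizon_hours` can be derived rather than supplied, which keeps it consistent with the two timestamps. This sketch assumes `issued_at` is the time the forecast was issued or fetched and that both timestamps are timezone-aware; `forecast_point` and the `openmeteo` source label are hypothetical.

```python
from datetime import datetime, timezone


def forecast_point(issued_at: datetime, target_at: datetime, source: str, **values) -> dict:
    """Build a weather_forecast_points row, deriving horizon_hours from the timestamps."""
    if issued_at.tzinfo is None or target_at.tzinfo is None:
        raise ValueError("issued_at and target_at must be timezone-aware")
    horizon_hours = (target_at - issued_at).total_seconds() / 3600
    return {
        "issued_at": issued_at,
        "target_at": target_at,
        "horizon_hours": horizon_hours,
        "source": source,
        **values,  # e.g. temperature_c, shortwave_radiation_w_m2, cloud_cover_pct
    }


row = forecast_point(
    datetime(2026, 4, 28, 6, 0, tzinfo=timezone.utc),
    datetime(2026, 4, 28, 18, 0, tzinfo=timezone.utc),
    "openmeteo",  # hypothetical source label
    temperature_c=12.5,
)
print(row["horizon_hours"])  # 12.0
```

Points whose `target_at` precedes `issued_at` get a negative horizon, which makes them easy to filter out before storage.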

`weather_resolved_truth`

Observed weather for target hours that have already happened.

Core fields:

- id
- resolved_at
- source
- temperature_c
- shortwave_radiation_w_m2

Notes:

- future prediction modules can join this to weather_forecast_points
- make this a hypertable on resolved_at
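A sketch of that join, scoring temperature forecasts by horizon. It assumes one truth row per hour and that `resolved_at` names the same hour as a forecast's `target_at`; `forecast_errors` and the mean-absolute-error metric are illustrative choices, not an existing module.

```python
from collections import defaultdict
from datetime import datetime


def forecast_errors(forecasts: list[dict], truth: list[dict]) -> dict[int, float]:
    """Mean absolute temperature error per (rounded) forecast horizon in hours."""
    observed: dict[datetime, float] = {t["resolved_at"]: t["temperature_c"] for t in truth}
    errors: dict[int, list[float]] = defaultdict(list)
    for f in forecasts:
        actual = observed.get(f["target_at"])
        if actual is None or f["temperature_c"] is None:
            continue  # target hour not resolved yet, or no forecast value
        errors[round(f["horizon_hours"])].append(abs(f["temperature_c"] - actual))
    return {h: sum(v) / len(v) for h, v in sorted(errors.items())}
```

In SQL the same pairing is a join on `weather_forecast_points.target_at = weather_resolved_truth.resolved_at`, grouped by horizon.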

`sigen_plant_snapshots`

High-resolution Sigenergy plant state from Modbus TCP.

Core fields:

- observed_at
- received_at
- source
- solar_power_w
- battery_soc_pct
- battery_soh_pct
- battery_power_w
- grid_power_w
- grid_import_w
- grid_export_w
- load_power_w
- plant_active_power_w
- accumulated_pv_energy_kwh
- daily_consumed_energy_kwh
- accumulated_consumed_energy_kwh
- status fields for EMS, running state, and grid sensor state
- raw_values

Notes:

- raw polling target is SIGEN_POLL_SECONDS=5
- make this a hypertable on observed_at
- keep raw JSON during integration so unsupported or surprising registers can be debugged
- rollup views should preserve averages, min/max spikes, and sample counts so short-duration usage signatures are not erased completely

Initial rollups:

- sigen_plant_snapshots_1m
- sigen_plant_snapshots_15m
- sigen_plant_snapshots_1h
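In TimescaleDB these rollups would most likely be continuous aggregates built on `time_bucket()`; the Python sketch below only illustrates which statistics they should keep. It assumes timezone-aware UTC timestamps and a single numeric column, and the `rollup` helper is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import fmean

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(t: datetime, width: timedelta) -> datetime:
    """Floor a timezone-aware timestamp to the start of its bucket."""
    return EPOCH + ((t - EPOCH) // width) * width


def rollup(samples: list[tuple[datetime, float]], width: timedelta) -> list[dict]:
    """Aggregate (observed_at, value) samples into fixed-width buckets.

    Keeps avg, min, max, and sample count so short spikes survive downsampling.
    """
    buckets: dict[datetime, list[float]] = defaultdict(list)
    for observed_at, value in samples:
        buckets[bucket_start(observed_at, width)].append(value)
    return [
        {
            "bucket": start,
            "avg": fmean(values),
            "min": min(values),
            "max": max(values),
            "samples": len(values),
        }
        for start, values in sorted(buckets.items())
    ]
```

A single 25-second kettle spike inside an otherwise flat minute barely moves `avg`, but it stays visible in `max`, and `samples` shows whether the bucket was fully polled.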

`system_events`

Operational events from collectors, storage, Gibil, and publishers.

Core fields:

id
created_at
component
severity
event_type
message
metadata

Notes:

- this should capture stale data, auth failures, bad Modbus reads, publish failures, and degraded-mode decisions
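A sketch of one of those cases, a stale-data event built in the table's shape. The severity scale, the `collector.<source>` component naming, and the `stale_data` event type are hypothetical conventions; `id` is assumed to be assigned by the database.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

SEVERITIES = ("debug", "info", "warning", "error", "critical")  # hypothetical scale


def make_event(component: str, severity: str, event_type: str, message: str,
               metadata: Optional[dict] = None, now: Optional[datetime] = None) -> dict:
    """Build a system_events row (without id)."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    return {
        "created_at": now or datetime.now(timezone.utc),
        "component": component,
        "severity": severity,
        "event_type": event_type,
        "message": message,
        "metadata": metadata or {},
    }


def stale_source_event(source: str, last_seen: datetime, max_age: timedelta,
                       now: datetime) -> Optional[dict]:
    """Return a stale_data event if a source has been silent longer than max_age."""
    age = now - last_seen
    if age <= max_age:
        return None
    return make_event(
        component=f"collector.{source}",
        severity="warning",
        event_type="stale_data",
        message=f"{source} silent for {age.total_seconds():.0f} s "
                f"(limit {max_age.total_seconds():.0f} s)",
        metadata={"last_seen": last_seen.isoformat(), "age_s": age.total_seconds()},
        now=now,
    )
```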

Retention

Initial retention targets:

- raw 5-10 second observations: 7-30 days
- 1-minute aggregates: 6-12 months
- 15-minute/hourly aggregates: keep indefinitely unless storage becomes a problem
- decisions: keep indefinitely
- system events: keep indefinitely or archive after a year

Retention should be revisited after real sample rates and database size are known.
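If TimescaleDB's built-in retention policies are used, the targets above could be rendered as `add_retention_policy` calls. This sketch takes the upper bound of each range, omits tables kept indefinitely, and assumes `sigen_plant_snapshots_1m` is a continuous aggregate; the `RETENTION` table itself is hypothetical.

```python
# Hypothetical retention table: upper bound of each range above; relations kept
# indefinitely are omitted so no policy is created for them.
RETENTION = {
    "observations": "30 days",
    "sigen_plant_snapshots": "30 days",
    "sigen_plant_snapshots_1m": "365 days",
}


def retention_sql(retention: dict[str, str] = RETENTION) -> list[str]:
    """Render one add_retention_policy call per relation.

    Names and intervals are trusted constants, so plain string formatting is
    acceptable here; it would not be for user-supplied input.
    """
    return [
        f"SELECT add_retention_policy('{relation}', INTERVAL '{interval}', if_not_exists => true);"
        for relation, interval in sorted(retention.items())
    ]
```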

First Slice

The first implementation slice should prove the shape before touching real hardware.

- Define the observation and snapshot models.
- Add a manual collector only if needed for operator-supplied values.
- Store observations in TimescaleDB or a local development substitute.
- Build one snapshot from the latest observations.
- Let Gibil make a simple stage decision from that snapshot.
- Persist the decision with reasons.

This gives us the whole loop:

collector -> observations -> snapshot -> Gibil decision -> stored audit trail
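The loop can be exercised entirely in memory before any hardware or database exists. Everything here is hypothetical: `MemoryStore` stands in for TimescaleDB, the manual collector returns fixed values, and the `absorb_surplus`/`normal` stages, the 1000 W threshold, and the fixed confidence are invented only to show reasons being recorded with each decision.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MemoryStore:
    """Hypothetical stand-in for TimescaleDB during the first slice."""
    observations: list = field(default_factory=list)
    snapshots: list = field(default_factory=list)
    decisions: list = field(default_factory=list)


def manual_collector(now: datetime) -> list[dict]:
    """Hypothetical operator-supplied values, marked with source='manual'."""
    return [
        {"observed_at": now, "source": "manual", "metric": "battery_soc_pct", "value": 35.0},
        {"observed_at": now, "source": "manual", "metric": "solar_power_w", "value": 4200.0},
        {"observed_at": now, "source": "manual", "metric": "home_power_w", "value": 900.0},
    ]


def build_latest_snapshot(store: MemoryStore, now: datetime) -> dict:
    """Take the newest value per metric and persist the resulting snapshot."""
    values = {}
    for obs in sorted(store.observations, key=lambda o: o["observed_at"]):
        values[obs["metric"]] = obs["value"]  # later samples overwrite earlier ones
    snapshot = {"id": len(store.snapshots) + 1, "created_at": now, "values": values}
    store.snapshots.append(snapshot)
    return snapshot


def decide(snapshot: dict) -> dict:
    """Toy stage rule; stage names and thresholds are invented for illustration."""
    v = snapshot["values"]
    surplus_w = v.get("solar_power_w", 0.0) - v.get("home_power_w", 0.0)
    reasons = [f"solar surplus is {surplus_w:.0f} W"]
    if surplus_w > 1000 and v.get("battery_soc_pct", 100.0) < 90:
        stage = "absorb_surplus"
        reasons.append("battery has headroom")
    else:
        stage = "normal"
    # Fixed placeholder confidence; a real rule might derive it from input quality.
    return {"snapshot_id": snapshot["id"], "stage": stage, "reasons": reasons, "confidence": 0.5}


def run_tick(store: MemoryStore, now: datetime) -> dict:
    """collector -> observations -> snapshot -> Gibil decision -> stored audit trail"""
    store.observations.extend(manual_collector(now))
    decision = decide(build_latest_snapshot(store, now))
    decision["created_at"] = now
    store.decisions.append(decision)
    return decision


store = MemoryStore()
print(run_tick(store, datetime.now(timezone.utc))["stage"])  # absorb_surplus
```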

MQTT publishing can come immediately after this loop exists.

Open Questions

- Should development use real TimescaleDB from day one, or SQLite/Postgres first?
- What is the exact MQTT topic namespace for HASS/Ganymede integration?
- Which HASS entities should be included in the first read-only state feed?
- How should the gibil IPA identity authenticate to MQTT and HASS?
- What high-resolution retention target is acceptable on the Astrape VM?
- Should snapshots be created on a fixed schedule, on new data, or both?
