Ingestion & Storage
Purpose
Astrape needs a reliable way to collect energy-related data, normalize it, store it, and give Gibil a clean view of the current system state. The first version should favor boring, inspectable data flows over cleverness.
Gibil should not need to know whether a value came from Modbus, Home Assistant, a weather API, a price API, or a manual override. It should receive timestamped observations and snapshots with enough metadata to decide whether the data is fresh and trustworthy.
Initial Sources
Sigen Inverter
- Protocol: Modbus TCP
- Polling target: every 5-10 seconds for fast-changing electrical state
- Initial metrics:
  - solar_power_w
  - battery_soc_pct
  - battery_charge_w
  - battery_discharge_w
  - grid_import_w
  - grid_export_w
  - daily_yield_kwh
- Risk: the register map must be confirmed before this collector can run against real hardware
Home Assistant / Ganymede
- Preferred integration: MQTT
- Direction: HASS/Ganymede should publish selected state to Astrape where possible
- Initial metrics:
  - home_power_w
  - indoor_temp_c
  - selected device states
  - selected sensor values needed for water/heating logic
- Reasoning: MQTT keeps Astrape loosely coupled and avoids making HASS a synchronous dependency for every decision tick
Weather
- Preferred first source: OpenMeteo
- Polling target: hourly forecast refresh
- Initial metrics:
  - outdoor_temp_c
  - cloud_cover_pct
  - ghi_w_m2
  - wind_speed_m_s
- Use: external forecast history for generation and heating models
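A minimal sketch of the hourly forecast request, assuming the public Open-Meteo forecast endpoint and its current hourly variable names. The mapping from Open-Meteo variables to Astrape metric names and the function name `build_forecast_url` are assumptions, and the variable names should be checked against the Open-Meteo docs before use. The sketch only builds the URL; it does not make a network call.

```python
from urllib.parse import urlencode

# Open-Meteo hourly variable -> Astrape metric name.
# The mapping is an assumption, not a confirmed part of the design.
HOURLY_VARIABLES = {
    "temperature_2m": "outdoor_temp_c",
    "cloud_cover": "cloud_cover_pct",
    "shortwave_radiation": "ghi_w_m2",
    "wind_speed_10m": "wind_speed_m_s",
}


def build_forecast_url(latitude: float, longitude: float) -> str:
    """Build the hourly forecast request URL for one location."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "hourly": ",".join(HOURLY_VARIABLES),
        # Open-Meteo reports wind in km/h by default; ask for m/s instead.
        "wind_speed_unit": "ms",
    }
    # Keep commas literal so the variable list stays readable in logs.
    return "https://api.open-meteo.com/v1/forecast?" + urlencode(params, safe=",")
```

Keeping the variable mapping in one table means the collector can rename fields on the way in, so downstream code only ever sees the Astrape metric names.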
Grid Pricing
- First implementation: static time-of-use config
- Later implementation: spot pricing API if needed
- Initial metrics:
  - grid_price_per_kwh
  - price_stage
  - cheap_window_active
- Reasoning: static config lets Gibil produce useful behavior before price API work is settled
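A static time-of-use config could be as small as a list of windows. The sketch below is one possible shape, not a settled format: the `TariffWindow` type, the stage names, the window boundaries, and the prices are all placeholders.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class TariffWindow:
    start: time           # inclusive
    end: time             # exclusive; a window may wrap past midnight
    stage: str
    price_per_kwh: float  # SEK/kWh


# Hypothetical tariff: stage names, hours, and prices are placeholders.
TARIFF = [
    TariffWindow(time(22, 0), time(6, 0), "cheap", 0.50),
    TariffWindow(time(6, 0), time(17, 0), "normal", 1.20),
    TariffWindow(time(17, 0), time(22, 0), "peak", 2.40),
]


def window_contains(window: TariffWindow, t: time) -> bool:
    """True if local time t falls inside the window."""
    if window.start <= window.end:
        return window.start <= t < window.end
    # The window wraps past midnight, e.g. 22:00 to 06:00.
    return t >= window.start or t < window.end


def price_metrics(local_time: time) -> dict:
    """Return the three pricing metrics for a local wall-clock time."""
    for window in TARIFF:
        if window_contains(window, local_time):
            return {
                "grid_price_per_kwh": window.price_per_kwh,
                "price_stage": window.stage,
                "cheap_window_active": window.stage == "cheap",
            }
    raise ValueError(f"no tariff window covers {local_time}")
```

Raising when no window matches makes a gap in the config fail loudly instead of silently producing a missing price.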
Manual Inputs
- Purpose: allow operator-supplied values when a real integration is not available yet
- Inputs may come from local config or a small authenticated admin path
- Manual data should be marked clearly with source = manual
Observation Shape
Every collector should produce normalized observations.
observed_at: timestamp when the measurement was true
received_at: timestamp when Astrape received it
source: sigen | hass | weather | price | manual
metric: stable metric name
value: number, string, or boolean
unit: W | kWh | pct | C | SEK/kWh | state | none
quality: ok | stale | estimated | missing | error
metadata: source-specific context
Guidelines:
- observed_at and received_at are both needed because pushed data may arrive late
- metric names should be stable and boring
- raw source names/registers/entities belong in metadata, not in the metric name
- Gibil should be able to ignore stale or low-quality observations
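The observation shape above could be modelled roughly as follows. This is a sketch under assumptions: the class layout, the validation in `__post_init__`, and the `is_usable` helper with its `max_age` parameter are illustrative, not an agreed interface.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Union

SOURCES = {"sigen", "hass", "weather", "price", "manual"}
QUALITIES = {"ok", "stale", "estimated", "missing", "error"}


@dataclass(frozen=True)
class Observation:
    observed_at: datetime   # when the measurement was true
    received_at: datetime   # when Astrape received it
    source: str
    metric: str
    value: Union[float, str, bool]
    unit: str = "none"
    quality: str = "ok"
    metadata: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject unknown enum values early so bad collectors fail loudly.
        if self.source not in SOURCES:
            raise ValueError(f"unknown source: {self.source}")
        if self.quality not in QUALITIES:
            raise ValueError(f"unknown quality: {self.quality}")


def is_usable(obs: Observation, now: datetime, max_age: timedelta) -> bool:
    """Hypothetical freshness check: quality is ok and the value is recent.

    Age is measured from observed_at, not received_at, so data that was
    pushed late is judged by when it was true.
    """
    return obs.quality == "ok" and now - obs.observed_at <= max_age
```

Measuring age from observed_at is the reason both timestamps are kept: received_at alone would make a delayed message look fresh.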
Derived Snapshots
Gibil should reason from snapshots, not directly from loose individual observations.
A snapshot is the best-known whole-system state at a decision tick. It can include:
- current solar generation
- current home consumption
- battery SoC
- battery charge/discharge power
- grid import/export
- current price stage
- active forecast window
- stale/missing input flags
Snapshots should be persisted because they explain what Gibil knew when it made a decision.
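A sketch of how a snapshot might be assembled from loose observations. Observations are plain dicts here to keep the example self-contained. The required metric list, the per-metric maximum ages, and the three flag names (`missing`, `stale`, `low_quality`) are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Metrics a snapshot needs and how old each may be. Both the list and
# the ages are assumptions to be tuned against real sample rates.
MAX_AGE = {
    "solar_power_w": timedelta(seconds=30),
    "home_power_w": timedelta(seconds=60),
    "battery_soc_pct": timedelta(seconds=60),
    "price_stage": timedelta(hours=1),
}


def build_snapshot(observations, now):
    """Pick the newest observation per metric and flag weak inputs."""
    latest = {}
    for obs in observations:
        current = latest.get(obs["metric"])
        if current is None or obs["observed_at"] > current["observed_at"]:
            latest[obs["metric"]] = obs

    values = {}
    flags = {"missing": [], "stale": [], "low_quality": []}
    for metric, max_age in MAX_AGE.items():
        obs = latest.get(metric)
        if obs is None:
            flags["missing"].append(metric)
            continue
        # Keep the best-known value even when it is flagged; the flags
        # tell Gibil how much to trust it.
        values[metric] = obs["value"]
        if obs["quality"] != "ok":
            flags["low_quality"].append(metric)
        elif now - obs["observed_at"] > max_age:
            flags["stale"].append(metric)

    return {"created_at": now, "values": values, "input_quality": flags}
```

The returned dict maps directly onto a JSON snapshot column plus an input-quality column, so persisting it needs no further transformation.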
Storage Choice
Use TimescaleDB as the first primary store.
Reasons:
- It is Postgres, so querying and joining data stays straightforward
- It handles time-series retention and aggregation well
- It works for raw observations, derived snapshots, decisions, forecasts, and events
- It leaves room for later model training without needing a second historical store immediately
InfluxDB remains a reasonable alternative, but TimescaleDB is the better default if we want relational joins, auditability, and forecast training queries.
The runtime expects ASTRAPE_DATABASE_URL to point at TimescaleDB. Weather ingest also expects ASTRAPE_LATITUDE and ASTRAPE_LONGITUDE.
Initial Tables
observations
Raw normalized metric samples from all collectors.
Core fields:
- id
- observed_at
- received_at
- source
- metric
- value_num
- value_text
- value_bool
- unit
- quality
- metadata
Notes:
- populate exactly one of value_num, value_text, or value_bool, chosen by the metric's type
- keep metadata as JSON for source-specific details
- make this a hypertable on observed_at
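The value-column note above can be sketched as a small helper that maps one Python value to the three typed columns. The helper name `value_columns` is hypothetical, and it dispatches on the runtime type of the value rather than on a metric registry, which is a simplification.

```python
def value_columns(value):
    """Return the three typed value columns with exactly one filled in."""
    columns = {"value_num": None, "value_text": None, "value_bool": None}
    # bool is a subclass of int in Python, so it must be checked first,
    # otherwise True would be stored as the number 1.0.
    if isinstance(value, bool):
        columns["value_bool"] = value
    elif isinstance(value, (int, float)):
        columns["value_num"] = float(value)
    elif isinstance(value, str):
        columns["value_text"] = value
    else:
        raise TypeError(f"unsupported value type: {type(value).__name__}")
    return columns
```

For example, `value_columns(True)` fills only value_bool, and `value_columns(42)` fills only value_num.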
snapshots
Periodic whole-system state used by Gibil.
Core fields:
- id
- created_at
- snapshot
- input_quality
Notes:
- store the snapshot as JSON initially
- this can be normalized later if query patterns demand it
decisions
Gibil outputs and reasoning.
Core fields:
- id
- created_at
- snapshot_id
- stage
- recommendations
- reasons
- confidence
Notes:
- decisions should be explainable enough to debug after the fact
- this table becomes the audit trail for HASS-facing behavior
weather_forecast_points
Clean external weather forecast points from weather sources.
Core fields:
- id
- issued_at
- target_at
- horizon_hours
- source
- temperature_c
- shortwave_radiation_w_m2
- cloud_cover_pct
Notes:
- this stores external forecasts, not internal predictions
- make this a hypertable on target_at
weather_resolved_truth
Observed weather for target hours that have already happened.
Core fields:
- id
- resolved_at
- source
- temperature_c
- shortwave_radiation_w_m2
Notes:
- future prediction modules can join this to weather_forecast_points
- make this a hypertable on resolved_at
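The join between the two weather tables could look like the sketch below, written over plain dict rows instead of SQL. It assumes that resolved_at identifies the hour the observation describes, so it can be matched against target_at; if resolved_at turns out to mean the time the truth row was written, the join key would need to change. The function name `forecast_errors` is hypothetical.

```python
from datetime import datetime, timezone


def forecast_errors(forecast_points, resolved_truth, column="temperature_c"):
    """Pair each forecast point with the observed value for its target hour.

    Returns one row per forecast point whose target hour has been
    resolved, with error = forecast value - observed value.
    """
    truth_by_hour = {row["resolved_at"]: row[column] for row in resolved_truth}
    errors = []
    for point in forecast_points:
        actual = truth_by_hour.get(point["target_at"])
        if actual is None:
            continue  # target hour has not been resolved yet
        errors.append(
            {
                "target_at": point["target_at"],
                "horizon_hours": point["horizon_hours"],
                "error": point[column] - actual,
            }
        )
    return errors
```

Keeping horizon_hours in the output lets a later model compare how forecast error grows with lead time.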
system_events
Operational events from collectors, storage, Gibil, and publishers.
Core fields:
- id
- created_at
- component
- severity
- event_type
- message
- metadata
Notes:
- this should capture stale data, auth failures, bad Modbus reads, publish failures, and degraded-mode decisions
Retention
Initial retention targets:
- raw 5-10 second observations: 7-30 days
- 1-minute aggregates: 6-12 months
- 15-minute/hourly aggregates: keep indefinitely unless storage becomes a problem
- decisions: keep indefinitely
- system events: keep indefinitely or archive after a year
Retention should be revisited after real sample rates and database size are known.
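To make the 1-minute aggregate tier concrete, the sketch below shows the bucketing arithmetic in plain Python for a single metric. In the real store this would more likely be done inside TimescaleDB (for example with time_bucket and a continuous aggregate) than in application code. Choosing the mean as the aggregate, and the function name `one_minute_means`, are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import fmean


def one_minute_means(samples):
    """Average (observed_at, value) samples of one metric per minute.

    Returns a dict keyed by the start of each minute, in time order.
    """
    buckets = defaultdict(list)
    for observed_at, value in samples:
        # Truncate the timestamp to the start of its minute.
        bucket_start = observed_at.replace(second=0, microsecond=0)
        buckets[bucket_start].append(value)
    return {start: fmean(values) for start, values in sorted(buckets.items())}
```

At a 5-10 second polling interval each bucket holds roughly 6 to 12 samples, which is the reduction that makes the longer retention window affordable.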
First Slice
The first implementation slice should prove the shape before touching real hardware.
- Define the observation and snapshot models.
- Add a manual collector only if needed for operator-supplied values.
- Store observations in TimescaleDB or a local development substitute.
- Build one snapshot from the latest observations.
- Let Gibil make a simple stage decision from that snapshot.
- Persist the decision with reasons.
This gives us the whole loop:
collector -> observations -> snapshot -> Gibil decision -> stored audit trail
MQTT publishing can come immediately after this loop exists.
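The last two steps of the loop above, deciding and recording why, might look like this in their simplest form. Everything here is a placeholder for illustration: the stage names, the surplus rule, the confidence values, and the assumed snapshot shape (a dict with "values" and an "input_quality" dict holding "missing" and "stale" lists).

```python
def decide(snapshot_id, snapshot):
    """Return a decision row from a snapshot.

    The rule is deliberately trivial; the point is that every decision
    carries its reasons and a reference to the snapshot it was based on.
    """
    quality = snapshot.get("input_quality", {})
    flagged = quality.get("missing", []) + quality.get("stale", [])
    if flagged:
        # Degraded mode: do not reason from inputs that are not trusted.
        stage = "degraded"
        reasons = ["untrusted inputs: " + ", ".join(flagged)]
        confidence = 0.2
    else:
        values = snapshot["values"]
        surplus_w = values["solar_power_w"] - values["home_power_w"]
        stage = "surplus" if surplus_w > 0 else "deficit"
        reasons = [f"solar minus home load is {surplus_w} W"]
        confidence = 0.8
    return {
        "snapshot_id": snapshot_id,
        "stage": stage,
        "recommendations": [],
        "reasons": reasons,
        "confidence": confidence,
    }
```

Appending each returned row to a list is enough to stand in for the audit table until storage is wired up.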
Open Questions
- Should development use real TimescaleDB from day one, or SQLite/Postgres first?
- What is the exact MQTT topic namespace for HASS/Ganymede integration?
- Which HASS entities should be included in the first read-only state feed?
- How should the gibil IPA identity authenticate to MQTT and HASS?
- What high-resolution retention target is acceptable on the Astrape VM?
- Should snapshots be created on a fixed schedule, on new data, or both?