Storage Architecture
Storage is split into two tiers: a hot metadata tier in PostgreSQL and a cold bulk-data tier in Hetzner Object Storage. On-prem local disks hold only the OS, VM images, and ephemeral scratch data.
| Data type | Storage | Format | Rationale |
|---|---|---|---|
| Simulation outputs | Hetzner Object Storage (S3) | Parquet | Write-once/read-many. Parquet compression suits sparse simulation data. Cloud-hosted = available even when on-prem is down. |
| Job metadata, user accounts | PostgreSQL (on app-1) | Relational rows | Lightweight. Only pointers to S3 objects live here, not bulk data. |
| Job queue | Redis (on app-1) | In-memory lists | Fast enqueue/dequeue. Workers pop jobs and claim work atomically. |
| VM images, OS | Local SSD (Proxmox host) | qcow2 / ext4 | No shared storage needed. Workers are stateless; images can be regenerated. |
| Input files (per scenario) | Hetzner Object Storage (S3) | Shapefile, EPW, CSV | Uploaded by user. InputLocator resolves S3 paths at job start. |
Why not Ceph?
CEA workers are stateless compute processes, so Ceph's distributed block storage adds operational overhead without a matching benefit: outputs go directly to object storage anyway. Local SSD for scratch data plus S3 for durable data delivers the durability that matters with far less operational complexity.
```text
# Scenario output path convention in S3
s3://<bucket>/projects/<project_id>/scenarios/<scenario_id>/outputs/
    demand_results.parquet
    costs_summary.parquet
    network_layout/topology.parquet

# Scenario input path convention
s3://<bucket>/projects/<project_id>/scenarios/<scenario_id>/inputs/
    building_geometry.zip        # shapefiles
    weather.epw
    building_properties.csv
```
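The path convention above can be captured in a few small helpers. This is a sketch of the key layout only; the function names are hypothetical and not CEA's actual InputLocator API:

```python
# Illustrative helpers mirroring the S3 key convention above.
# scenario_prefix / output_key / input_key are hypothetical names,
# not the real InputLocator interface.

def scenario_prefix(project_id: str, scenario_id: str) -> str:
    """Common key prefix shared by a scenario's inputs and outputs."""
    return f"projects/{project_id}/scenarios/{scenario_id}"


def output_key(project_id: str, scenario_id: str, filename: str) -> str:
    """Key for a write-once result file, e.g. a Parquet output."""
    return f"{scenario_prefix(project_id, scenario_id)}/outputs/{filename}"


def input_key(project_id: str, scenario_id: str, filename: str) -> str:
    """Key for a user-uploaded input resolved at job start."""
    return f"{scenario_prefix(project_id, scenario_id)}/inputs/{filename}"


key = output_key("p42", "baseline", "demand_results.parquet")
# -> "projects/p42/scenarios/baseline/outputs/demand_results.parquet"
```

Keeping the prefix logic in one place means the PostgreSQL metadata rows only need to store `(project_id, scenario_id, filename)` pointers, never full URLs.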