Storage Architecture

Storage is split into two tiers: a hot metadata tier in PostgreSQL and a cold bulk-data tier in Hetzner Object Storage. On-prem local disks hold only the OS, VM images, and ephemeral scratch data.
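The two-tier split means the relational side stores only a pointer (an S3 object key) to bulk data, never the data itself. A minimal sketch of this idea, using SQLite as an in-process stand-in for PostgreSQL; the `jobs` table and its column names are illustrative, not the actual schema:

```python
import sqlite3

# Hot metadata tier stand-in: the row holds an S3 key, not the bulk output.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE jobs (
        job_id     TEXT PRIMARY KEY,
        status     TEXT NOT NULL,
        result_key TEXT  -- pointer into the cold tier (S3 object key)
    )"""
)
conn.execute(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    ("job-42", "done",
     "projects/p1/scenarios/s1/outputs/demand_results.parquet"),
)

# Reading the metadata returns a pointer; fetching the bulk data is a
# separate S3 GET against that key.
row = conn.execute(
    "SELECT result_key FROM jobs WHERE job_id = 'job-42'"
).fetchone()
print(row[0])
```

Keeping PostgreSQL rows this small is what lets the metadata tier stay "lightweight" while outputs of any size live in object storage.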

| Data type | Storage | Format | Rationale |
| --- | --- | --- | --- |
| Simulation outputs | Hetzner Object Storage (S3) | Parquet | Write-once/read-many. Parquet compression suits sparse simulation data. Cloud-hosted, so outputs stay available even when on-prem is down. |
| Job metadata, user accounts | PostgreSQL (on app-1) | Relational rows | Lightweight. Only pointers to S3 objects live here, not bulk data. |
| Job queue | Redis (on app-1) | In-memory lists | Fast enqueue/dequeue. Workers pop jobs and claim work atomically. |
| VM images, OS | Local SSD (Proxmox host) | qcow2 / ext4 | No shared storage needed. Workers are stateless; images can be regenerated. |
| Input files (per scenario) | Hetzner Object Storage (S3) | Shapefile, EPW, CSV | Uploaded by the user. InputLocator resolves S3 paths at job start. |

Why not Ceph?

CEA workers are stateless compute processes. The operational overhead of Ceph's distributed block storage is not justified when outputs go directly to object storage. Local SSD plus S3 achieves the same durability with far less complexity.

# Scenario output path convention in S3
s3://<bucket>/projects/<project_id>/scenarios/<scenario_id>/outputs/
    demand_results.parquet
    costs_summary.parquet
    network_layout/topology.parquet
# Scenario input path convention
s3://<bucket>/projects/<project_id>/scenarios/<scenario_id>/inputs/
    building_geometry.zip   # shapefiles
    weather.epw
    building_properties.csv
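The conventions above can be captured in a small path builder, similar in spirit to how an InputLocator-style resolver derives S3 paths at job start. `build_output_key` and `build_input_key` are hypothetical helper names, not the actual API, and the bucket prefix is left to configuration:

```python
def build_output_key(project_id: str, scenario_id: str, filename: str) -> str:
    """S3 object key for a scenario output, per the convention above."""
    return f"projects/{project_id}/scenarios/{scenario_id}/outputs/{filename}"

def build_input_key(project_id: str, scenario_id: str, filename: str) -> str:
    """S3 object key for a user-uploaded scenario input."""
    return f"projects/{project_id}/scenarios/{scenario_id}/inputs/{filename}"

print(build_output_key("p1", "s1", "demand_results.parquet"))
# → projects/p1/scenarios/s1/outputs/demand_results.parquet
print(build_input_key("p1", "s1", "weather.epw"))
# → projects/p1/scenarios/s1/inputs/weather.epw
```

Centralizing the convention in one place keeps workers and the metadata tier agreeing on keys without hard-coded paths scattered through the codebase.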