Storage Architecture

Storage is split into two tiers: a hot metadata tier in PostgreSQL and a cold bulk-data tier in Hetzner Object Storage. On-prem local disks hold only the OS, VM images, and ephemeral scratch data.
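The two-tier split means the relational side stores only a pointer (an S3 object key) to bulk data, never the data itself. A minimal sketch of this idea, using SQLite as an in-process stand-in for PostgreSQL; the `jobs` table and its column names are illustrative, not the actual schema:

```python
import sqlite3

# Hot metadata tier stand-in: the row holds an S3 key, not the bulk output.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE jobs (
        job_id     TEXT PRIMARY KEY,
        status     TEXT NOT NULL,
        result_key TEXT  -- pointer into the cold tier (S3 object key)
    )"""
)
conn.execute(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    ("job-42", "done",
     "projects/p1/scenarios/s1/outputs/demand_results.parquet"),
)

# Reading the metadata returns a pointer; fetching the bulk data is a
# separate S3 GET against that key.
row = conn.execute(
    "SELECT result_key FROM jobs WHERE job_id = 'job-42'"
).fetchone()
print(row[0])
```

Keeping PostgreSQL rows this small is what lets the metadata tier stay "lightweight" while outputs of any size live in object storage.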

| Data type | Storage | Format | Rationale |
| --- | --- | --- | --- |
| Simulation outputs | Hetzner Object Storage (S3) | Parquet | Write-once/read-many. Parquet compression suits sparse simulation data. Cloud-hosted, so outputs stay available even when on-prem is down. |
| Job metadata, user accounts | PostgreSQL (on app-1) | Relational rows | Lightweight. Only pointers to S3 objects live here, not bulk data. |
| Job queue | Redis (on app-1) | In-memory lists | Fast enqueue/dequeue. Workers pop jobs and claim work atomically. |
| VM images, OS | Local SSD (Proxmox host) | qcow2 / ext4 | No shared storage needed. Workers are stateless; images can be regenerated. |
| Input files (per scenario) | Hetzner Object Storage (S3) | Shapefile, EPW, CSV | Uploaded by the user. InputLocator resolves S3 paths at job start. |

Why not Ceph?

CEA workers are stateless compute processes. The operational overhead of Ceph's distributed block storage is not justified when outputs go directly to object storage. Local SSD plus S3 achieves the same durability with far less complexity.

# Scenario output path convention in S3
s3://<bucket>/projects/<project_id>/scenarios/<scenario_id>/outputs/
    demand_results.parquet
    costs_summary.parquet
    network_layout/topology.parquet
# Scenario input path convention
s3://<bucket>/projects/<project_id>/scenarios/<scenario_id>/inputs/
    building_geometry.zip   # shapefiles
    weather.epw
    building_properties.csv
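The conventions above can be captured in a small path builder, similar in spirit to how an InputLocator-style resolver derives S3 paths at job start. `build_output_key` and `build_input_key` are hypothetical helper names, not the actual API, and the bucket prefix is left to configuration:

```python
def build_output_key(project_id: str, scenario_id: str, filename: str) -> str:
    """S3 object key for a scenario output, per the convention above."""
    return f"projects/{project_id}/scenarios/{scenario_id}/outputs/{filename}"

def build_input_key(project_id: str, scenario_id: str, filename: str) -> str:
    """S3 object key for a user-uploaded scenario input."""
    return f"projects/{project_id}/scenarios/{scenario_id}/inputs/{filename}"

print(build_output_key("p1", "s1", "demand_results.parquet"))
# → projects/p1/scenarios/s1/outputs/demand_results.parquet
print(build_input_key("p1", "s1", "weather.epw"))
# → projects/p1/scenarios/s1/inputs/weather.epw
```

Centralizing the convention in one place keeps workers and the metadata tier agreeing on keys without hard-coded paths scattered through the codebase.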