Storage Architecture for Blockchain: SSD, NVMe, and IOPS Planning

Storage is the most underestimated infrastructure decision in blockchain deployments. A validator node running on HDD instead of NVMe will bottleneck at 50 transactions per second regardless of how…

This article walks through storage architecture decisions for a real deployment: VaultChain Logistics, a European freight consortium running Hyperledger Fabric 2.5 across 6 peer nodes and 5 orderers, processing 500 transactions per second for customs documentation. Their storage requirements span NVMe for Raft WAL, SSD for ledger blocks, and object storage for backups, with a total monthly spend of under $500 through intelligent tiering.

Storage Tier Architecture for Blockchain Components

Not every blockchain component needs the fastest storage available. The Raft WAL (write-ahead log) demands NVMe for sub-100 microsecond write latency, but ledger block files are append-only and work well on SATA SSD. Backups are write-once, read-rarely workloads that belong on HDD or object storage. The key is matching each component’s I/O pattern to the right storage tier.

Figure: Three-tier storage architecture mapping NVMe, SATA SSD, and HDD/object storage to specific blockchain components, with IOPS requirements and monthly cost estimates.

The diagram above shows how VaultChain maps each blockchain component to the appropriate storage tier. Tier 1 (NVMe) hosts the Raft WAL, CouchDB indexes, LevelDB/RocksDB state tries, and the transaction pool cache where sub-millisecond latency is critical. Tier 2 (SATA SSD) handles block storage, CouchDB data files, chaincode images, and Prometheus time-series data where consistent throughput matters more than peak IOPS. Tier 3 (HDD/Object) stores ledger backups, log archives, and encrypted crypto material backups where cost per gigabyte is the primary concern. This tiered approach costs $485 per month versus $2,125 for an all-NVMe deployment.
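The mapping and cost comparison described above can be sketched in a few lines of shell. This is a minimal illustration, not part of VaultChain's tooling: the component labels (raft-wal, ledger-blocks, and so on) and the function names are hypothetical, while the two dollar figures are taken from the text.

```shell
# Map a component label to the tier described above.
# The labels are illustrative names, not identifiers used by Fabric.
tier_for() {
  case "$1" in
    raft-wal|couchdb-index|state-db|tx-pool-cache) echo "nvme" ;;
    ledger-blocks|couchdb-data|chaincode-images|prometheus-tsdb) echo "sata-ssd" ;;
    ledger-backup|log-archive|crypto-backup) echo "hdd-object" ;;
    *) echo "unknown"; return 1 ;;
  esac
}

# Percentage saved by the tiered layout versus an all-NVMe layout,
# truncated to a whole percent. Args: all_nvme_cost tiered_cost
savings_percent() {
  echo $(( ($1 - $2) * 100 / $1 ))
}

tier_for raft-wal          # prints: nvme
tier_for ledger-blocks     # prints: sata-ssd
savings_percent 2125 485   # prints: 77
```

The integer arithmetic truncates 77.2% to 77, which matches the rounded savings figure quoted for the tiered deployment.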

IOPS Benchmarking for Storage Selection

Before committing to a storage tier, benchmark the actual drives under blockchain-representative workloads. Generic vendor IOPS numbers are measured under ideal conditions (queue depth 256, 4K random reads) that do not reflect real blockchain access patterns. The following benchmarks use fio profiles that simulate orderer WAL sequential writes, CouchDB random reads, peer ledger mixed I/O, backup sequential reads, and durable commit writes with fsync.

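# Benchmark suite: simulate blockchain I/O patterns with fio
# Run on each candidate drive before purchase decisions
# WARNING: the write tests overwrite whatever is on /dev/nvme0n1p4. Point
# --filename at a scratch partition or a test file, never at a device holding data.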

# Test 1: Sequential Write 4K (simulates orderer Raft WAL)
fio --name=orderer-wal --ioengine=libaio --direct=1 \
  --rw=write --bs=4k --iodepth=64 --numjobs=4 \
  --size=10G --runtime=60 --time_based \
  --filename=/dev/nvme0n1p4 --group_reporting

# Test 2: Random Read 4K (simulates CouchDB state queries)
fio --name=couchdb-query --ioengine=libaio --direct=1 \
  --rw=randread --bs=4k --iodepth=64 --numjobs=4 \
  --size=10G --runtime=60 --time_based \
  --filename=/dev/nvme0n1p4 --group_reporting

# Test 3: Mixed Read/Write 70/30 (simulates peer ledger operations)
fio --name=peer-ledger --ioengine=libaio --direct=1 \
  --rw=randrw --rwmixread=70 --bs=4k --iodepth=32 --numjobs=4 \
  --size=10G --runtime=60 --time_based \
  --filename=/dev/nvme0n1p4 --group_reporting

# Test 4: Sequential Read 256K (simulates backup/restore)
fio --name=backup-restore --ioengine=libaio --direct=1 \
  --rw=read --bs=256k --iodepth=16 --numjobs=2 \
  --size=10G --runtime=60 --time_based \
  --filename=/dev/nvme0n1p4 --group_reporting

# Test 5: Durable Write 4K with fsync (simulates commit path)
fio --name=commit-durable --ioengine=libaio --direct=1 \
  --rw=write --bs=4k --iodepth=1 --numjobs=1 \
  --fsync=1 --size=2G --runtime=60 --time_based \
  --filename=/dev/nvme0n1p4 --group_reporting

# Print a header for the results summary (the fio output above still has to be
# collected into the comparison table separately)
echo "=== Storage Benchmark Summary ==="
echo "Run on: $(hostname) | $(date -u)"
echo "System: $(lscpu | grep 'Model name' | awk -F: '{print $2}' | xargs)"
echo "Kernel: $(uname -r)"

Figure: Measured IOPS for NVMe, SATA SSD, and HDD across five blockchain workload profiles, including WAL writes, state queries, and backup operations.

The benchmark results reveal dramatic differences between storage types. NVMe delivers 480K IOPS for sequential 4K writes (orderer WAL) versus just 52K on SATA SSD and 180 on HDD. For the critical fsync write path that ensures Raft commit durability, NVMe achieves 210K IOPS while HDD manages only 80. Because every Raft commit waits on that durable write, an HDD-based orderer would struggle to sustain even 50 TPS, while NVMe comfortably handles 5,000+ TPS. For CouchDB random reads (state queries), NVMe at 520K IOPS provides roughly 6x the headroom of SATA SSD at 82K, making it the right choice for networks with complex rich queries.
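To see why the fsync figures matter so much, it helps to turn them into a per-write latency and a serial commit ceiling. The sketch below is a rough back-of-the-envelope model, not a description of how the Fabric orderer schedules writes: the function names are hypothetical, and the assumption that each commit waits for a fixed number of durable writes (two in the example) is mine. Batching many transactions into one block raises the real ceiling considerably.

```shell
# Mean time per durable write in microseconds at queue depth 1.
# Args: fsync_iops
durable_write_latency_us() {
  echo $(( 1000000 / $1 ))
}

# Upper bound on commits per second when every commit must wait for a
# fixed number of durable writes, one after another.
# Args: fsync_iops writes_per_commit (the second value is an assumption)
max_serial_commits_per_sec() {
  echo $(( $1 / $2 ))
}

durable_write_latency_us 80        # HDD:  prints 12500 (12.5 ms per write)
durable_write_latency_us 210000    # NVMe: prints 4 (truncated from ~4.8)
max_serial_commits_per_sec 80 2    # HDD with 2 durable writes per commit: prints 40
```

Under these assumptions an HDD tops out at around 40 serial commits per second, which is consistent with the observation that it cannot hold 50 TPS without aggressive batching.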

Disk Partition Layout for Blockchain Nodes

Each blockchain node needs separate mount points for the operating system, blockchain data, state database, Docker runtime, and logs. Mixing these on a single partition creates contention during compaction, backup, or high transaction periods. Separate partitions also enable independent LVM snapshots, filesystem tuning per workload, and targeted monitoring of storage consumption.

# Orderer node disk setup (lon-bc-01)
# Root SSD (/dev/sda) - 250 GB SATA SSD
parted /dev/sda mklabel gpt
parted /dev/sda mkpart primary 1MiB 1GiB      # /boot
parted /dev/sda mkpart primary 1GiB 51GiB     # /
parted /dev/sda mkpart primary 51GiB 101GiB   # /var/log
parted /dev/sda mkpart primary 101GiB 121GiB  # /tmp

mkfs.ext4 -L boot /dev/sda1
mkfs.ext4 -L root /dev/sda2
mkfs.ext4 -L varlog /dev/sda3
mkfs.ext4 -L tmp /dev/sda4

# NVMe 0 (/dev/nvme0n1) - 500 GB for Raft WAL + snapshots
pvcreate /dev/nvme0n1
vgcreate vg_raft /dev/nvme0n1
lvcreate -n lv_wal -L 100G vg_raft
lvcreate -n lv_snap -L 50G vg_raft
lvcreate -n lv_orderer -l 100%FREE vg_raft

mkfs.xfs -f /dev/vg_raft/lv_wal
mkfs.xfs -f /dev/vg_raft/lv_snap
mkfs.xfs -f /dev/vg_raft/lv_orderer

# NVMe 1 (/dev/nvme1n1) - 1 TB for orderer ledger blocks
pvcreate /dev/nvme1n1
vgcreate vg_ledger /dev/nvme1n1
lvcreate -n lv_chains -l 100%FREE vg_ledger
mkfs.xfs -f /dev/vg_ledger/lv_chains

# Docker SSD (/dev/sdb) - 200 GB
parted -s /dev/sdb mklabel gpt
parted -s /dev/sdb mkpart primary 1MiB 100%
mkfs.ext4 -L docker /dev/sdb1

# Mount configuration (/etc/fstab)
cat <<EOF >> /etc/fstab
# Root and system
/dev/sda1  /boot         ext4  defaults,noatime                0 2
/dev/sda2  /             ext4  defaults,noatime                0 1
/dev/sda3  /var/log      ext4  defaults,noatime,commit=60      0 2
/dev/sda4  /tmp          ext4  defaults,noatime,nosuid,nodev   0 2

# Blockchain NVMe (Raft WAL - highest priority)
# The parent orderer volume is listed first: mount -a follows fstab order, and
# the WAL and snapshot mount points live inside it
/dev/vg_raft/lv_orderer  /var/hyperledger/production/orderer  xfs  defaults,noatime,nodiratime  0 2
/dev/vg_raft/lv_wal      /var/hyperledger/production/orderer/etcdraft/wal  xfs  defaults,noatime,nodiratime,logbufs=8,logbsize=256k  0 2
/dev/vg_raft/lv_snap     /var/hyperledger/production/orderer/etcdraft/snapshot  xfs  defaults,noatime,nodiratime  0 2

# Blockchain NVMe (ledger blocks)
/dev/vg_ledger/lv_chains  /var/hyperledger/production/orderer/chains  xfs  defaults,noatime,nodiratime,allocsize=64m  0 2

# Docker
/dev/sdb1  /var/lib/docker  ext4  defaults,noatime,discard  0 2
EOF

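# Create mount points. The parent orderer volume is mounted first so that the
# nested WAL, snapshot, and chains directories are created inside it
mkdir -p /var/hyperledger/production/orderer /var/lib/docker
mount /var/hyperledger/production/orderer
mkdir -p /var/hyperledger/production/orderer/etcdraft/wal \
         /var/hyperledger/production/orderer/etcdraft/snapshot \
         /var/hyperledger/production/orderer/chains
mount -a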

# Set I/O scheduler to none for NVMe (bypass kernel scheduler)
echo none > /sys/block/nvme0n1/queue/scheduler
echo none > /sys/block/nvme1n1/queue/scheduler
echo mq-deadline > /sys/block/sda/queue/scheduler

# Make persistent via udev rule
cat <<EOF > /etc/udev/rules.d/60-io-scheduler.rules
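ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="nvme[0-9]*n[0-9]*", ENV{DEVTYPE}=="disk", ATTR{queue/scheduler}="none"
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd[a-z]*", ENV{DEVTYPE}=="disk", ATTR{queue/scheduler}="mq-deadline"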
EOF

Figure: Partition layouts for orderer and peer nodes showing NVMe, SSD, and HDD allocations, with a filesystem selection and mount option reference table.

This layout separates each workload onto its own volume. The orderer node uses a dedicated NVMe for the Raft WAL (with XFS and aggressive log buffering for write performance) and a separate NVMe for ledger blocks. The peer node adds a third NVMe volume for CouchDB state data, since state queries generate random read patterns that benefit from NVMe’s low latency. The key mount options are noatime (eliminates the access-time metadata write that otherwise accompanies reads), nodiratime (the same for directories; on Linux noatime already implies it, so it is redundant but harmless), and logbufs=8,logbsize=256k (enlarges the in-memory XFS log buffers for sequential write throughput). The I/O scheduler is set to none for NVMe drives, which skips request reordering in the kernel since NVMe devices manage their own internal queues.
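After applying the scheduler settings it is worth confirming that they took effect. The kernel marks the active scheduler with square brackets in /sys/block/DEVICE/queue/scheduler, for example "[none] mq-deadline kyber". The following is a small verification sketch; the function names are hypothetical and the device names in the comments are the ones used in the layout above.

```shell
# Extract the active scheduler (the bracketed entry) from a scheduler line
# such as "[none] mq-deadline kyber".
active_scheduler() {
  echo "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Compare a device's active scheduler with the expected one.
# Args: device expected_scheduler
check_scheduler() {
  sched_file="/sys/block/$1/queue/scheduler"
  if [ ! -r "$sched_file" ]; then
    echo "$1: no scheduler file (device absent?)"
    return 2
  fi
  current=$(active_scheduler "$(cat "$sched_file")")
  if [ "$current" = "$2" ]; then
    echo "$1: OK ($current)"
  else
    echo "$1: expected $2, found $current"
    return 1
  fi
}

active_scheduler "[none] mq-deadline kyber"   # prints: none

# On a node provisioned as above:
#   check_scheduler nvme0n1 none
#   check_scheduler sda mq-deadline
```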

Storage Growth Projection and Capacity Planning

Blockchain storage only grows. Unlike traditional databases where old records can be archived or deleted, every transaction remains in the ledger permanently. Capacity planning requires projecting storage consumption over the network’s expected lifetime, accounting for block storage growth, state database expansion, WAL management, and backup accumulation.

#!/bin/bash
# storage-forecast.sh - Storage growth calculator: projects storage needs for N months

AVG_TPS=${1:-500}           # Average transactions per second
AVG_TX_SIZE_KB=${2:-1}      # Average transaction size in KB
MONTHS=${3:-12}             # Forecast period
PEER_COUNT=${4:-6}          # Number of peer nodes
BACKUP_RETENTION_MONTHS=${5:-6}  # Backup retention period

SECONDS_PER_DAY=86400
DAYS_PER_MONTH=30

# Calculate raw ledger growth per month (per peer)
DAILY_TX=$((AVG_TPS * SECONDS_PER_DAY))
MONTHLY_TX=$((DAILY_TX * DAYS_PER_MONTH))
RAW_MONTHLY_GB=$(echo "scale=2; $MONTHLY_TX * $AVG_TX_SIZE_KB / 1048576" | bc)
# Block compression ratio ~10:1 for repetitive structured data
COMPRESSED_MONTHLY_GB=$(echo "scale=2; $RAW_MONTHLY_GB / 10" | bc)

echo "=== Storage Growth Forecast ==="
echo "Parameters: ${AVG_TPS} TPS, ${AVG_TX_SIZE_KB} KB/tx, ${MONTHS} months"
echo "Monthly transactions: ${MONTHLY_TX}"
echo "Raw monthly growth: ${RAW_MONTHLY_GB} GB"
echo "Compressed monthly growth: ${COMPRESSED_MONTHLY_GB} GB (per peer)"
echo ""

for month in $(seq 1 $MONTHS); do
  LEDGER_GB=$(echo "scale=0; $COMPRESSED_MONTHLY_GB * $month" | bc)
  # CouchDB state grows at ~60% of ledger rate (key-value pairs, not full blocks)
  STATE_GB=$(echo "scale=0; $COMPRESSED_MONTHLY_GB * 0.6 * $month" | bc)
  # WAL stays constant (auto-pruned by snapshots)
  WAL_GB=50
  # Backups: cumulative with retention window
  if [ $month -le $BACKUP_RETENTION_MONTHS ]; then
    BACKUP_GB=$(echo "scale=0; ($COMPRESSED_MONTHLY_GB + $COMPRESSED_MONTHLY_GB * 0.6) * $month" | bc)
  else
    BACKUP_GB=$(echo "scale=0; ($COMPRESSED_MONTHLY_GB + $COMPRESSED_MONTHLY_GB * 0.6) * $BACKUP_RETENTION_MONTHS" | bc)
  fi
  TOTAL_PER_PEER=$(echo "scale=0; $LEDGER_GB + $STATE_GB + $WAL_GB" | bc)
  TOTAL_ALL_PEERS=$(echo "scale=0; $TOTAL_PER_PEER * $PEER_COUNT + $BACKUP_GB" | bc)

  echo "Month ${month}: Ledger=${LEDGER_GB}GB State=${STATE_GB}GB WAL=${WAL_GB}GB Backup=${BACKUP_GB}GB | Per peer=${TOTAL_PER_PEER}GB | Total cluster=${TOTAL_ALL_PEERS}GB"
done

echo ""
echo "=== Capacity Alerts ==="
echo "Set Prometheus alert when disk usage > 80% of provisioned capacity"
echo "Order new storage 3 months before projected 80% threshold"

Figure: 12-month storage growth projection at 500 TPS showing ledger blocks, CouchDB state, Raft WAL, and backup archive trajectories with the capacity alert threshold.

The projection shows that at 500 TPS with a 1 KB average transaction size, each peer node’s ledger grows by approximately 130 GB per month (about 1.3 billion transactions at 1 KB each, reduced by the assumed 10:1 compression ratio), and CouchDB state adds roughly 80 GB per month (60% of the ledger rate). Over 12 months, a single peer accumulates roughly 1.56 TB of ledger data and 960 GB of state data. The Raft WAL remains stable at approximately 50 GB due to automatic pruning after snapshots. Backup archives grow fastest at 170 GB per month (cumulative). For VaultChain’s 6-peer network, the first-year total storage requirement is approximately 15 TB (6 peers at roughly 2.57 TB each, before backups). The 80% capacity alert line marks the point to plan around: the goal is to order additional storage three months before usage is projected to reach it.
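The same figures can be turned around to answer the planning question directly: given a provisioned volume, in which month does a peer cross the 80% line? The sketch below uses the per-peer growth from the projection (130 GB ledger plus 80 GB state, so 210 GB per month, on top of a 50 GB WAL). The 4,000 GB provisioned capacity and the function name are assumptions made for the example.

```shell
# First month in which usage reaches the alert threshold.
# Args: capacity_gb threshold_percent fixed_gb monthly_growth_gb
months_until_threshold() {
  limit=$(( $1 * $2 / 100 ))     # usable space before the alert fires
  room=$(( limit - $3 ))         # space left after fixed usage (WAL)
  if [ "$room" -le 0 ]; then
    echo 0
    return
  fi
  echo $(( (room + $4 - 1) / $4 ))   # ceiling division
}

threshold_month=$(months_until_threshold 4000 80 50 210)
echo "80% threshold reached in month: ${threshold_month}"          # prints 15
echo "Order storage by month: $(( threshold_month - 3 ))"          # prints 12
```

With these inputs the peer crosses 3,200 GB in month 15, so the three-month ordering lead time puts the purchase decision at month 12.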

Ceph Storage Cluster for Kubernetes Deployments

When running blockchain nodes on Kubernetes, persistent volumes need a distributed storage backend that provides replication, snapshots, and dynamic provisioning. Ceph with the Rook operator is the most common choice for on-premises Kubernetes blockchain deployments. Ceph’s RADOS Block Device (RBD) provides block-level access with server-side copy-on-write snapshots, exactly matching the snapshot-friendly architecture blockchain nodes require.

# Deploy Rook-Ceph operator for blockchain storage
# Prerequisites: 3 nodes with raw NVMe + SSD drives

# Install Rook operator (pinned to a release tag; kubectl create avoids the
# annotation size limit that client-side apply can hit on the large CRD manifest)
kubectl create -f https://raw.githubusercontent.com/rook/rook/v1.14.0/deploy/examples/crds.yaml
kubectl create -f https://raw.githubusercontent.com/rook/rook/v1.14.0/deploy/examples/common.yaml
kubectl create -f https://raw.githubusercontent.com/rook/rook/v1.14.0/deploy/examples/operator.yaml

# Wait for operator
kubectl -n rook-ceph wait --for=condition=ready pod -l app=rook-ceph-operator --timeout=300s

# Create Ceph cluster with device class separation
cat <<EOF | kubectl apply -f -
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: blockchain-storage
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18.2
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
    allowMultiplePerNode: false
  mgr:
    count: 2
    modules:
      - name: pg_autoscaler
        enabled: true
      - name: prometheus
        enabled: true
  dashboard:
    enabled: true
    ssl: true
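  storage:
    # useAllNodes must be false when individual nodes are listed below; no
    # deviceFilter is set because each node names its devices explicitly
    useAllNodes: false
    useAllDevices: false
    config:
      osdsPerDevice: "1"
      encryptedDevice: "true"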
    nodes:
      - name: ceph-node-01
        devices:
          - name: nvme0n1
            config:
              deviceClass: nvme
          - name: nvme1n1
            config:
              deviceClass: nvme
          - name: sda
            config:
              deviceClass: ssd
      - name: ceph-node-02
        devices:
          - name: nvme0n1
            config:
              deviceClass: nvme
          - name: nvme1n1
            config:
              deviceClass: nvme
          - name: sda
            config:
              deviceClass: ssd
      - name: ceph-node-03
        devices:
          - name: nvme0n1
            config:
              deviceClass: nvme
          - name: nvme1n1
            config:
              deviceClass: nvme
          - name: sda
            config:
              deviceClass: ssd
  placement:
    mon:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: role
                  operator: In
                  values:
                    - storage
    osd:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: role
                  operator: In
                  values:
                    - storage
  resources:
    osd:
      requests:
        cpu: "4"
        memory: 8Gi
      limits:
        cpu: "8"
        memory: 16Gi
EOF

# Create storage classes for different blockchain tiers
cat <<EOF | kubectl apply -f -
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: nvme-replicated
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
  deviceClass: nvme
  parameters:
    compression_mode: none
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd-nvme
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: nvme-replicated
  imageFormat: "2"
  imageFeatures: layering,exclusive-lock,object-map,fast-diff
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
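  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  # Needed for allowVolumeExpansion to work
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph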
  csi.storage.k8s.io/fstype: xfs
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: Immediate
---
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: ssd-replicated
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
  deviceClass: ssd
  parameters:
    compression_mode: aggressive
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd-ssd
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: ssd-replicated
  imageFormat: "2"
  imageFeatures: layering,exclusive-lock,object-map,fast-diff
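  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: xfs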
reclaimPolicy: Retain
allowVolumeExpansion: true
EOF

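# Verify cluster health (requires the rook-ceph-tools toolbox deployment,
# deploy/examples/toolbox.yaml in the Rook repository)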
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd tree

Figure: Rook-Ceph cluster architecture showing 3 nodes with NVMe and SSD OSDs, MON and MGR daemons, the Rook operator, and Kubernetes PVC mappings for orderer, peer, and CouchDB workloads.

The Ceph cluster above uses three storage nodes, each with two NVMe drives and one SSD, providing 46 TB raw capacity and 15.3 TB usable capacity with 3x replication. The Rook operator manages the lifecycle of OSDs, MONs, and MGRs, while the CSI driver provisions persistent volumes dynamically. Two storage classes separate NVMe-backed pools (for orderer WAL and CouchDB) from SSD-backed pools (for peer ledger blocks). The reclaimPolicy: Retain setting keeps the underlying volume and its RBD image when a PVC is deleted, so an accidental deletion does not destroy ledger data, and allowVolumeExpansion: true enables online volume growth as blockchain data accumulates.
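The usable-capacity figure follows directly from the replication factor, and the same arithmetic shows how much data the pools can hold before a fill-level alert fires. The helper below is a rough sketch with a hypothetical function name; it ignores Ceph's own overhead and full-ratio safety margins, so treat the output as an upper bound.

```shell
# Usable capacity in TB (one decimal, truncated) for a replicated pool.
# Args: raw_tb replicas [fill_percent, default 100]
usable_tb() {
  pct=${3:-100}
  tenths=$(( $1 * 10 * pct / 100 / $2 ))
  echo "$(( tenths / 10 )).$(( tenths % 10 ))"
}

usable_tb 46 3        # prints: 15.3 (matches the usable capacity quoted above)
usable_tb 46 3 80     # prints: 12.2 (data stored when the cluster is 80% full)
```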

Storage Snapshot Pipeline for Blockchain Backup

Blockchain backup requires crash-consistent snapshots that capture the ledger and state database at the same point in time. The most common approach is freeze-snap-unfreeze: briefly pause filesystem I/O (50ms), create a copy-on-write snapshot (10ms), then resume operations. The snapshot can then be transferred to backup storage at leisure without impacting the running node.

#!/bin/bash
# peer-snapshot.sh - Creates crash-consistent snapshots of peer data + CouchDB

set -euo pipefail

PEER_MOUNT="/var/hyperledger/production"
COUCH_MOUNT="/opt/couchdb/data"
SNAP_DATE=$(date +%Y%m%d_%H%M%S)
RETAIN_COUNT=3

echo "[$(date -u)] Starting blockchain snapshot..."

# Step 1: Freeze both filesystems (one after the other, not atomically)
# The trap guarantees an unfreeze even if a snapshot command fails under set -e
unfreeze_all() {
  fsfreeze --unfreeze "$COUCH_MOUNT" 2>/dev/null || true
  fsfreeze --unfreeze "$PEER_MOUNT" 2>/dev/null || true
}
trap unfreeze_all EXIT

echo "[$(date -u)] Freezing filesystems..."
START_MS=$(($(date +%s%N)/1000000))
fsfreeze --freeze "$PEER_MOUNT"
fsfreeze --freeze "$COUCH_MOUNT"

# Step 2: Create LVM snapshots (COW, instant)
echo "[$(date -u)] Creating LVM snapshots..."
lvcreate -s -n peer-snap-${SNAP_DATE} -L 50G /dev/vg_peer/lv_ledger
lvcreate -s -n couch-snap-${SNAP_DATE} -L 30G /dev/vg_couch/lv_data

# Step 3: Unfreeze immediately
echo "[$(date -u)] Unfreezing filesystems..."
unfreeze_all
trap - EXIT

FREEZE_DURATION_MS=$(($(date +%s%N)/1000000 - START_MS))
echo "[$(date -u)] I/O freeze duration: ${FREEZE_DURATION_MS}ms"

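# Step 4: Mount snapshots read-only and transfer to S3 (runs in the foreground,
# after I/O on the live volumes has resumed)
mkdir -p /mnt/snap-peer /mnt/snap-couch
# nouuid is needed when the volumes are XFS, because a snapshot shares the
# filesystem UUID of its mounted origin; drop it for ext4
mount -o ro,nouuid /dev/vg_peer/peer-snap-${SNAP_DATE} /mnt/snap-peer
mount -o ro,nouuid /dev/vg_couch/couch-snap-${SNAP_DATE} /mnt/snap-couch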

echo "[$(date -u)] Transferring to S3..."
tar cf - -C /mnt/snap-peer . | pigz -6 | \
  aws s3 cp - s3://vaultchain-backup/peer/${SNAP_DATE}/ledger.tar.gz \
  --expected-size $((200*1024*1024*1024))

tar cf - -C /mnt/snap-couch . | pigz -6 | \
  aws s3 cp - s3://vaultchain-backup/peer/${SNAP_DATE}/couchdb.tar.gz \
  --expected-size $((100*1024*1024*1024))

# Step 5: Cleanup
umount /mnt/snap-peer /mnt/snap-couch
echo "[$(date -u)] Cleaning old snapshots (keeping last ${RETAIN_COUNT})..."
lvs --noheadings -o lv_name vg_peer | grep "peer-snap" | sort | head -n -${RETAIN_COUNT} | \
  xargs -I{} lvremove -f vg_peer/{}
lvs --noheadings -o lv_name vg_couch | grep "couch-snap" | sort | head -n -${RETAIN_COUNT} | \
  xargs -I{} lvremove -f vg_couch/{}

echo "[$(date -u)] Snapshot complete. Uploaded to S3."

Figure: Storage snapshot pipeline showing the freeze-snap-unfreeze sequence and comparing three approaches, LVM (bare-metal), Ceph RBD (distributed), and Velero CSI (Kubernetes native), with performance metrics for each.

The diagram compares three snapshot approaches. LVM snapshots work well for bare-metal deployments with 10ms creation time but 5-15% I/O overhead from copy-on-write. Ceph RBD snapshots are the best choice for Kubernetes deployments, creating cluster-side COW snapshots in 5ms with less than 1% I/O impact. Velero CSI snapshots add a Kubernetes-native backup layer that can restore entire namespaces (pods + PVCs + secrets) from S3, making it ideal for full disaster recovery. VaultChain uses Ceph RBD snapshots for hourly operational backups and Velero for daily off-site disaster recovery.
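One practical consequence of the copy-on-write design is that an LVM snapshot has a fixed amount of space for changed blocks (50G and 30G in the script above), and a snapshot that fills up becomes invalid. Since the script keeps the last three snapshots on the volume group, their fill level is worth watching. The check below is a minimal sketch: the function name and the 80% threshold are assumptions, and the commented loop relies on the data_percent field that lvs reports for snapshots.

```shell
# Succeeds when a snapshot's COW usage has reached the threshold.
# Args: data_percent (e.g. "42.17") threshold_percent (whole number)
snapshot_over_threshold() {
  whole=${1%.*}                     # drop the fractional part
  [ "${whole:-0}" -ge "$2" ]
}

if snapshot_over_threshold 85.02 80; then
  echo "snapshot above 80%: extend it with lvextend or remove it"
fi

# On a live node (requires root and LVM):
#   lvs --noheadings -o lv_name,data_percent vg_peer | while read -r name pct; do
#     snapshot_over_threshold "$pct" 80 && echo "WARNING: $name at ${pct}%"
#   done
```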

Database Compaction and Pruning Lifecycle

Without regular maintenance, blockchain storage databases accumulate fragmentation and tombstones that degrade query performance and waste disk space. CouchDB stores deleted document tombstones until compaction, RocksDB accumulates stale SSTable files across compaction levels, and Besu’s state trie retains historical state revisions that are no longer needed.

#!/bin/bash
# couchdb-compact.sh - CouchDB compaction automation: compacts all blockchain databases

COUCH_URL="http://admin:${COUCHDB_PASSWORD}@localhost:5984"

# List all databases (channels + system DBs). Names are URI-encoded because
# Fabric database names can contain characters such as "$"
DBS=$(curl -s "${COUCH_URL}/_all_dbs" | jq -r '.[] | @uri')

for db in $DBS; do
  echo "[$(date -u)] Compacting database: ${db}"

  # Check current on-disk size (CouchDB 3.x reports it as sizes.file; the
  # older disk_size field was removed)
  SIZE_BEFORE=$(curl -s "${COUCH_URL}/${db}" | jq '.sizes.file')
  echo "  Size before: $(echo "scale=2; ${SIZE_BEFORE}/1073741824" | bc) GB"

  # Trigger compaction
  curl -s -X POST "${COUCH_URL}/${db}/_compact" \
    -H "Content-Type: application/json"

  # Compact views
  VIEWS=$(curl -s "${COUCH_URL}/${db}/_all_docs?startkey=%22_design%2F%22&endkey=%22_design0%22" | \
    jq -r '.rows[].id' | sed 's/_design\///')

  for view in $VIEWS; do
    curl -s -X POST "${COUCH_URL}/${db}/_compact/${view}" \
      -H "Content-Type: application/json"
    echo "  Compacted view: ${view}"
  done

  # Wait for compaction to finish
  while true; do
    COMPACTING=$(curl -s "${COUCH_URL}/${db}" | jq '.compact_running')
    if [ "$COMPACTING" = "false" ]; then break; fi
    sleep 10
  done

  SIZE_AFTER=$(curl -s "${COUCH_URL}/${db}" | jq '.sizes.file')
  SAVED=$(echo "scale=2; ($SIZE_BEFORE - $SIZE_AFTER)/1073741824" | bc)
  echo "  Size after: $(echo "scale=2; ${SIZE_AFTER}/1073741824" | bc) GB (saved ${SAVED} GB)"
done

# Schedule via cron (weekly, off-peak)
# 0 2 * * 0 /opt/scripts/couchdb-compact.sh >> /var/log/couchdb-compact.log 2>&1

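# Besu trie-log pruning (requires maintenance window)
# WARNING: Node must be stopped during trie-log prune
# NOTE: trie logs belong to the Bonsai storage format, and the prune options
# differ between Besu releases; confirm the flags below against the CLI
# reference for your version before running them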
echo "=== Besu Trie-Log Prune ==="
echo "1. Stop Besu: systemctl stop besu"
echo "2. Run: besu --data-path=/var/lib/besu/data storage trie-log prune --blocks=512"
echo "3. Start Besu: systemctl start besu"
echo "Alternative: Use --data-storage-format=BONSAI for live pruning (Besu 23.1+)"

Figure: Database compaction and pruning lifecycle covering CouchDB compaction, Raft WAL pruning, Besu state trie pruning, and LevelDB/RocksDB automatic compaction.

As shown above, each database has different compaction characteristics. CouchDB compaction runs online (no downtime) but requires 1.5x the current database size as temporary free space, and it reclaims approximately 30% of disk space from tombstones and fragmentation. Raft WAL pruning is fully automatic, triggered by the SnapshotIntervalSize configuration. Besu state trie pruning reclaims the most space (up to 80%) but requires the node to be stopped during the process. RocksDB handles compaction automatically in background threads, with tunable parameters for write buffer sizes and compression algorithms at different levels.
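Because CouchDB compaction writes a new copy of the database before discarding the old one, a pre-flight check on free space avoids filling the volume halfway through. The sketch below applies the 1.5x rule of thumb from the paragraph above using integer arithmetic; the function name is hypothetical, and the commented df invocation assumes GNU coreutils.

```shell
# Succeeds when free space is at least 1.5x the database size.
# Args: db_bytes free_bytes
enough_space_to_compact() {
  [ $(( $2 * 2 )) -ge $(( $1 * 3 )) ]
}

db_bytes=$(( 100 * 1024 * 1024 * 1024 ))     # example: 100 GiB database
free_bytes=$(( 160 * 1024 * 1024 * 1024 ))   # example: 160 GiB free

if enough_space_to_compact "$db_bytes" "$free_bytes"; then
  echo "ok to compact"
else
  echo "skip: not enough free space"
fi

# On a live node the free space could be read with:
#   free_bytes=$(df --output=avail -B1 /opt/couchdb/data | tail -n 1)
```

In the compaction script, such a check would sit just before the POST to _compact, skipping any database that fails it.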

Storage Monitoring and Alerting

Proactive storage monitoring prevents the worst-case scenario: a node running out of disk space during peak transaction processing. Set up Prometheus alerts for disk usage thresholds, IOPS saturation, and write latency spikes that indicate storage degradation.

# Prometheus alert rules for blockchain storage
cat <<EOF > /etc/prometheus/rules/storage-alerts.yml
groups:
  - name: blockchain_storage
    rules:
      - alert: DiskSpaceCritical
        expr: |
          (node_filesystem_avail_bytes{mountpoint=~"/var/hyperledger.*|/opt/couchdb.*"}
          / node_filesystem_size_bytes{mountpoint=~"/var/hyperledger.*|/opt/couchdb.*"}) < 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Blockchain storage < 10% free on {{ \$labels.instance }}:{{ \$labels.mountpoint }}"

      - alert: DiskSpaceWarning
        expr: |
          (node_filesystem_avail_bytes{mountpoint=~"/var/hyperledger.*|/opt/couchdb.*"}
          / node_filesystem_size_bytes{mountpoint=~"/var/hyperledger.*|/opt/couchdb.*"}) < 0.2
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Blockchain storage < 20% free on {{ \$labels.instance }}:{{ \$labels.mountpoint }}"

      - alert: NVMeWriteLatencyHigh
        expr: |
          rate(node_disk_write_time_seconds_total{device=~"nvme.*"}[5m])
          / rate(node_disk_writes_completed_total{device=~"nvme.*"}[5m]) > 0.001
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "NVMe write latency > 1ms on {{ \$labels.instance }} ({{ \$labels.device }})"

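      # Despite the name, this rule measures device busy time (utilization), which
      # can overstate saturation on NVMe drives that service requests in parallel
      - alert: IOPSSaturation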
        expr: |
          rate(node_disk_io_time_seconds_total{device=~"nvme.*|sd.*"}[5m]) > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Disk I/O utilization > 90% on {{ \$labels.instance }} ({{ \$labels.device }})"

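      - alert: CephPoolNearFull
        # max_avail is the space still available, so the fill ratio is
        # stored / (stored + max_avail)
        expr: ceph_pool_stored / (ceph_pool_stored + ceph_pool_max_avail) > 0.8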
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Ceph pool {{ \$labels.pool_id }} > 80% capacity"

      - alert: CephOSDDown
        expr: ceph_osd_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Ceph OSD {{ \$labels.ceph_daemon }} is down"
EOF

promtool check rules /etc/prometheus/rules/storage-alerts.yml
systemctl reload prometheus

These alerts cover the critical storage failure modes: disk space exhaustion below 10% (critical) and 20% (warning), NVMe write latency exceeding 1ms (indicating drive degradation or overload), I/O utilization above 90% (indicating the need to scale or tier), Ceph pool capacity above 80%, and OSD failures. The latency alert is particularly important for orderer nodes where NVMe write performance directly impacts Raft consensus speed and transaction throughput.
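The latency rule divides the rate of time spent writing by the rate of completed writes, which yields the mean time per write over the window. The same calculation can be reproduced by hand from two samples of the kernel's per-device counters (writes completed and milliseconds spent writing, as exposed in /proc/diskstats). The sketch below takes the two deltas as arguments; the function names and the sample numbers are illustrative assumptions.

```shell
# Mean write latency in microseconds over a sampling window.
# Args: delta_write_ms delta_writes_completed
mean_write_latency_us() {
  if [ "$2" -eq 0 ]; then
    echo 0
    return
  fi
  echo $(( $1 * 1000 / $2 ))
}

# Succeeds when the mean latency exceeds the threshold.
# Args: delta_write_ms delta_writes_completed threshold_us
latency_alert() {
  [ "$(mean_write_latency_us "$1" "$2")" -gt "$3" ]
}

mean_write_latency_us 3000 2000    # prints 1500 (1.5 ms per write)

if latency_alert 3000 2000 1000; then
  echo "write latency above 1 ms"
fi
```

A device that spent 3,000 ms completing 2,000 writes averages 1.5 ms per write, which is above the 1 ms threshold used by the NVMeWriteLatencyHigh rule.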

Summary

Storage architecture for production blockchain networks requires matching each component’s I/O characteristics to the right storage tier. NVMe is essential for consensus write-ahead logs and state database indexes where sub-millisecond latency directly impacts transaction throughput. SATA SSD provides the right balance of cost and performance for append-only ledger blocks and monitoring data. HDD and object storage handle backup and archival workloads at minimal cost.

VaultChain Logistics demonstrates that intelligent storage tiering reduces monthly costs by 77% (from $2,125 to $485) while maintaining the performance needed for 500 TPS processing. Combined with proper partition layout (separate mount points per workload), automated snapshot pipelines (LVM or Ceph RBD), and scheduled compaction (CouchDB weekly, RocksDB automatic), this architecture is designed to keep blockchain nodes from running out of disk space, losing data, or bottlenecking on storage I/O.