Show HN: Streambed – Stream Postgres to Iceberg on S3, Supports Postgres Wire

TL;DR

Streambed is a new tool that streams Postgres write-ahead log changes directly to Iceberg tables on S3, so analytical queries can run without changes to the application. It serves those tables over the Postgres wire protocol, so standard Postgres clients can query them. The aim is to move analytics off production databases.

Streambed is an open-source tool that streams Postgres write-ahead log (WAL) changes into Iceberg tables stored on S3 and serves those tables for querying over the Postgres wire protocol. The goal is to let teams move analytical workloads off their production Postgres databases without modifying existing applications.

Streambed connects to a Postgres database as a logical replication subscriber, decoding WAL messages such as inserts, updates, and deletes. It buffers these changes and periodically writes them as Parquet files to an S3 bucket, simultaneously updating Iceberg metadata to reflect the latest state.

The system handles updates and deletes through copy-on-write merging, rewriting affected data so the Iceberg tables reflect the current state of each row. It also includes a built-in query server that exposes the Iceberg tables via the Postgres wire protocol, so users can query the data with standard Postgres clients such as psql, without additional tools or ETL processes.

Setup involves running Postgres and MinIO in Docker, building the Go-based Streambed binary, and configuring the synchronization commands. The pipeline has three stages (WAL decoding, buffering, and Parquet writes), and the query server lets existing Postgres workflows read the results.

Why It Matters

Offloading analytical workloads reduces load on the production database and lowers the risk that analytical queries degrade it. Because Streambed speaks the Postgres wire protocol, users can query the replicated data with familiar tools, which narrows the gap between operational databases and data lakes. It also avoids a separate ETL pipeline or Spark-based processing, which could lower cost and complexity for data teams.

Background

Traditional data lakes often rely on batch ETL or Spark-based streaming to move data from transactional databases to analytical platforms, which can introduce latency and complexity. Recent efforts have focused on real-time CDC solutions, but many require significant setup or change to existing systems. Streambed builds on these trends by providing a lightweight, open-source alternative that integrates directly with Postgres, leveraging logical replication and Iceberg for scalable, queryable storage on S3. Its introduction follows ongoing industry interest in simplifying real-time analytics and reducing infrastructure overhead.

“Streambed streams WAL changes via logical replication, writes Parquet files to S3, and commits Iceberg metadata, supporting Postgres wire protocol for querying.”

— Viggy28 (Hacker News user)

“It allows offloading analytical queries from production databases without changing your application, using just Postgres + S3.”

— Streambed developer

What Remains Unclear

Details about performance at scale, handling of complex transactions, and long-term stability are still emerging. It is not yet clear how well Streambed performs under heavy workloads or how it manages schema changes over time. Additionally, adoption and real-world testing are ongoing, so some features may evolve.

What’s Next

Next steps include broader testing in production environments, performance benchmarking, and potential feature additions such as support for more complex schema evolution or enhanced query capabilities. The developers also plan to improve documentation and ease of deployment for wider adoption.

Key Questions

How does Streambed compare to existing CDC tools?

Streambed offers a lightweight, open-source alternative that streams WAL directly to Iceberg on S3, supporting the Postgres wire protocol, which many CDC tools do not provide natively.

Can I query the data immediately after streaming?

Yes, the built-in query server exposes Iceberg tables over the Postgres wire protocol, allowing immediate querying with standard Postgres clients.

Does using Streambed impact my production database performance?

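Because it streams WAL changes asynchronously through logical replication, it is designed to minimize impact on the primary database. Actual overhead depends on the workload. As with any logical replication consumer, a subscriber that falls behind causes the primary to retain WAL until it catches up.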

Is Streambed suitable for all workloads?

It is best suited for scenarios where real-time analytics and offloading are priorities. Complex transactional systems may require additional testing to ensure compatibility and performance.

What are the prerequisites for deploying Streambed?

It requires Go 1.22+, CGO support, Docker for testing, and a Postgres database with logical replication enabled.

Source: Hacker News
