🤖 AI Summary
This post describes a practical technique for making selective, atomic deletes inside Parquet files stored on S3 without downloading and rewriting unchanged bytes. Instead of tombstones, visibility is controlled by a tiny CAS-updated manifest that points to immutable Parquet objects. To delete rows inside a single row group, the client range-reads only the footer and the target row group, decodes and re-encodes the group (updating stats and row counts), then uses S3 Multipart Upload with UploadPartCopy to assemble a new object: server-side copy(prefix) → upload(edited row group) → copy(suffix) → upload(new footer+PAR1), then CompleteMultipartUpload. Readers pin to the manifest/ETag so scans see a coherent snapshot; the manifest CAS flips visibility atomically.
Key technical implications:
- It localizes data movement: server-side copies incur no egress for unchanged bytes.
- It keeps the Parquet layout intact by using footer offsets (rg_start, rg_end, footer_start) and shifting subsequent row-group offsets by delta = new_rg_size - old_rg_size.
- It preserves scan performance by packing files to ~128-256 MiB with ~8-16 MiB row groups.

Operational caveats:
- S3 MPU rules: non-last parts must be ≥ 5 MiB, and CopySourceRange is inclusive on both ends.
- Treat MPU ETags as opaque version tokens, not MD5 digests.
- Assemble to a new key and abort failed MPUs.
- Regenerate the footer, indexes, and column statistics correctly.
- Garbage-collect orphaned versions and batch edits per row group to bound latency and cost.