Flattening JSON(b) in Postgres (2022) (ellisvalentiner.com)

0 points 2 hours ago | visit original

🤖 AI Summary

Postgres provides built-in JSON functions that make it straightforward to normalize nested JSON/JSONB right in the database, which is useful for data analysts, data scientists, and ML engineers who often encounter denormalized JSON columns. json_to_record/jsonb_to_record turn a single JSON object into a relational record (one row of typed columns), and json_to_recordset/jsonb_to_recordset expand a JSON array of objects into multiple rows. This avoids heavy external ETL, keeps processing close to the data, and simplifies reproducible feature engineering and querying for downstream modeling.

Key technical points: these functions can be applied directly to table columns (not just string literals), and each call requires an AS clause with a column definition list that maps JSON keys (case-sensitive) to Postgres data types (e.g., coord(lat numeric, lng numeric)). Extra keys in the JSON are ignored; keys declared in the AS clause but missing from the JSON become NULL. Use jsonb_to_record for objects and jsonb_to_recordset for arrays, and compose them (recordset, then record) to flatten nested structures. For example, expand a countries table's cities array into one row per city, then extract each city's nested coordinates into lat/lng columns. This pattern yields clean, queryable tables for exploration and model input without leaving the database.
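A minimal sketch of the countries/cities pattern described above. The table layout, column names, and sample rows are assumptions made up for illustration and are not taken from the original article; only jsonb_to_recordset, jsonb_to_record, and the coord(lat numeric, lng numeric) definition come from the summary.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMP TABLE countries (
    name   text,
    cities jsonb   -- assumed shape: array of {"name": ..., "coord": {"lat": ..., "lng": ...}}
);

INSERT INTO countries (name, cities) VALUES
    ('Exampleland',
     '[{"name": "Alpha", "coord": {"lat": 1.5, "lng": 2.5}, "population": 100},
       {"name": "Beta",  "coord": {"lat": 3.0}}]');

SELECT
    c.name    AS country,
    city.name AS city,
    coord.lat,
    coord.lng
FROM countries AS c
-- Step 1: expand the JSON array into one row per city.
-- "coord" is kept as jsonb so it can be unpacked in the next step.
CROSS JOIN LATERAL jsonb_to_recordset(c.cities)
    AS city(name text, coord jsonb)
-- Step 2: turn each nested coord object into typed columns.
CROSS JOIN LATERAL jsonb_to_record(city.coord)
    AS coord(lat numeric, lng numeric);
```

In this sample, the "population" key is not listed in the AS clause, so it is ignored, and the second city has no "lng" key, so that column should come back as NULL for its row.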
