Updating Data in a Lake

There are plans to eventually support in-place updates, captured in GitHub Issue #4024. For now, we’ll have to fudge it.

All we can do for now is run separate load and delete actions. It’s safer to load then delete: if the delete fails, we at least end up with duplicated data, whereas a failure during a delete-then-load can leave us with no data at all.
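To make the ordering argument concrete, here is a minimal Python sketch that models a pool as an in-memory list and simulates a failure in the second step of each ordering. Everything in it (`load`, `delete_where`, `StepFailed`, the `fail_*` flags) is a hypothetical stand-in for illustration, not the lake's actual API.

```python
# Toy model only: a "pool" is a plain list of dicts, and failures are simulated.
# None of these names correspond to a real lake command or library call.

class StepFailed(Exception):
    """Raised to simulate the second step of an update failing."""


def load(pool, record):
    pool.append(record)


def delete_where(pool, predicate):
    # Remove every record matching the predicate, in place.
    pool[:] = [r for r in pool if not predicate(r)]


def load_then_delete(pool, new, fail_delete=False):
    load(pool, new)
    if fail_delete:
        raise StepFailed("simulated delete failure")
    # ISO-8601 strings of the same format compare correctly as plain strings.
    delete_where(pool, lambda r: r["id"] == new["id"] and r["ts"] < new["ts"])


def delete_then_load(pool, new, fail_load=False):
    delete_where(pool, lambda r: r["id"] == new["id"])
    if fail_load:
        raise StepFailed("simulated load failure")
    load(pool, new)


old = {"id": 4, "foo": 1, "ts": "2025-02-18T01:00:00"}
new = {"id": 4, "foo": 2, "ts": "2025-02-18T02:00:00"}

pool_a = [dict(old)]
try:
    load_then_delete(pool_a, new, fail_delete=True)
except StepFailed:
    pass  # pool_a now holds both versions: duplicated, but nothing lost

pool_b = [dict(old)]
try:
    delete_then_load(pool_b, new, fail_load=True)
except StepFailed:
    pass  # pool_b is now empty: the old version is gone and the new one never arrived
```

In this toy model the failed load-then-delete can be repaired later by re-running the delete, while the failed delete-then-load has nothing left to repair from.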

Since we have unstructured data, we can attempt to track the …

load: {id:4,foo:1,ts:time('2025-02-18T01:00:00')}

load: {id:4,foo:2,ts:time('2025-02-18T02:00:00')}

delete: -where 'id==4 and ts < 2025-02-18T02:00:00'

If it’s typed data:

delete: -where 'is() ...'

If we need to double-check for duplicate data:

'count() by is()'
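As a rough illustration of the same kind of check outside the lake, the Python sketch below counts records per `id` after they have been exported and reports any `id` that appears more than once. It groups by the `id` field rather than by a type test, which is an assumption made for the example, and the `duplicate_ids` helper is hypothetical.

```python
from collections import Counter


def duplicate_ids(records):
    """Return {id: count} for every id that appears more than once."""
    counts = Counter(r["id"] for r in records)
    return {key: n for key, n in counts.items() if n > 1}


# Sample data mirroring a load-then-delete where the delete has not run yet.
records = [
    {"id": 4, "foo": 1, "ts": "2025-02-18T01:00:00"},
    {"id": 4, "foo": 2, "ts": "2025-02-18T02:00:00"},
    {"id": 5, "foo": 9, "ts": "2025-02-18T01:30:00"},
]

dupes = duplicate_ids(records)
```

An empty result would suggest the delete step completed; a non-empty one points at the ids that still need their older versions removed.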