Working in SQL with date ranges

Setup

There is two tables with following columns: id, start_at and end_at. The first table contains 11M rows, the second one almost 400M.

Solution #1 — Naive

Theoretically the problem can be reduced to discreet values and the grouping may be applied. If we know all the timestamps when the value may change (when it starts or ends) we can build a timeline to use for evaluation using between operator. The performance depends on number of rows and the actual concurrency level.

Solution #2 — Windowing

The next iteration was to create kind of stream of events with a balance (1 if started and -1 if stopped) and apply accumulative sum to aggregate it. It worked quite performant, which is confirmed in the query plan.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Adrian Macal

Adrian Macal

Software Developer, Data Engineer with solid knowledge of Business Intelligence. Passionate about programming.