Subqueries
While there are many different types of subqueries, this document so far is just highlighting some common scenarios that may not have obvious implementations in superdb.
Correlated Subqueries
Let’s start with this simple dataset:
{"id":1, "date":"2025-02-27", "foo": 3}
{"id":2, "date":"2025-02-27", "foo": 2}
{"id":3, "date":"2025-02-28", "foo": 5}
{"id":4, "date":"2025-02-28", "foo": 9}
And we want to select the entries with the largest foo for each date.
One way to do this in SQL looks like this:
select id, date, foo
from data
where (date, foo) in
(select date, max(foo) as max_foo
from data
group by date);
Another way, by joining a derived table:
select *
from data d
join
(select date, max(foo) as max_foo
from data
group by date) max_foo
on d.date = max_foo.date and
d.foo = max_foo.max_foo;
In super this can be done with piped operators and a Lateral Subquery:
{"id":1, "date":"2025-02-27", "foo": 3}
{"id":2, "date":"2025-02-27", "foo": 2}
{"id":3, "date":"2025-02-28", "foo": 5}
{"id":4, "date":"2025-02-28", "foo": 9}
Here’s an example using the piped where operator with unnest ... into:
super -s -c '
collect(this)
| {data: this}
| maxes:=[unnest this.data | foo:=max(foo) by date | values {date,foo}]
| unnest {this.maxes, this.data} into (
where {this.data.date, this.data.foo} in this.maxes
| values this.data
)' data.json
{id:1,date:"2025-02-27",foo:3}
{id:4,date:"2025-02-28",foo:9}
And a simpler example using the piped join operator with from:
super -s -c '
from data.json
| inner join (from data.json
| foo:=max(foo) by date
| values {date,foo})
on {left.date,left.foo}={right.date,right.foo}
| values left
| sort id'
{id:1,date:"2025-02-27",foo:3}
{id:4,date:"2025-02-28",foo:9}
super also supports SQL syntax, and these subqueries work1:
super -s -c '
select *
from "data.json"
where foo in (select max(foo), date
from "data.json"
group by date) '
{id:1,date:"2025-02-27",foo:3}
{id:4,date:"2025-02-28",foo:9}
If we save off the max data to a file first, then we can start to see how this could look:
super -s -c '
select max(foo) as max_foo
from "data.json"
group by date' > max.sup
super -s -c '
select l.id, l.date, l.foo
from "data.json" l
join "max.sup" r
on l.foo==r.max_foo'
{id:1,date:"2025-02-27",foo:3}
{id:4,date:"2025-02-28",foo:9}
Subquery with Related Data Join
A more realistic scenario: find the records with the top score per date, and also pull in user information from a related table.
{"id":1, "date":"2025-02-27", "score": 3, "user_id": 101}
{"id":2, "date":"2025-02-27", "score": 2, "user_id": 102}
{"id":3, "date":"2025-02-28", "score": 5, "user_id": 101}
{"id":4, "date":"2025-02-28", "score": 9, "user_id": 103}
{"user_id": 101, "name": "Moxie"}
{"user_id": 102, "name": "Ziggy"}
{"user_id": 103, "name": "Sprocket"}
First, the basic join returns all records with user names:
super -s -c '
select s.id, s.date, s.score, s.user_id, u.name
from "scores.json" s
join "users.json" u on s.user_id = u.user_id
order by s.id'
{id:1,date:"2025-02-27",score:3,user_id:101,name:"Moxie"}
{id:2,date:"2025-02-27",score:2,user_id:102,name:"Ziggy"}
{id:3,date:"2025-02-28",score:5,user_id:101,name:"Moxie"}
{id:4,date:"2025-02-28",score:9,user_id:103,name:"Sprocket"}
Filtering to top scores per date using a subquery:
super -s -c '
select *
from "scores.json"
where score in (select max(score), date
from "scores.json"
group by date)'
{id:1,date:"2025-02-27",score:3,user_id:101}
{id:4,date:"2025-02-28",score:9,user_id:103}
The “obvious” SQL approach with tuple comparison returns empty — this is a known issue (#6326):
super -s -c '
select s.id, s.date, s.score, s.user_id, u.name
from "scores.json" s
join "users.json" u on s.user_id = u.user_id
where (s.date, s.score) in (
select date, max(score)
from "scores.json"
group by date)'
A derived table approach (subquery in FROM) does work:
super -s -c '
select s.id, s.date, s.score, s.user_id, u.name
from "scores.json" s
join (
select date, max(score) as max_score
from "scores.json"
group by date
) m on s.date = m.date and s.score = m.max_score
join "users.json" u on s.user_id = u.user_id
order by s.id'
{id:1,date:"2025-02-27",score:3,user_id:101,name:"Moxie"}
{id:4,date:"2025-02-28",score:9,user_id:103,name:"Sprocket"}
The piped approach also works — filter first, then join to get usernames:
super -s -c '
from "scores.json"
| where score in (select max(score), date from "scores.json" group by date)
| inner join (from "users.json") on left.user_id=right.user_id
| select left.id, left.date, left.score, right.name
| sort id'
{id:1,date:"2025-02-27",score:3,name:"Moxie"}
{id:4,date:"2025-02-28",score:9,name:"Sprocket"}
as of versions
super --version
Version: v0.2.0
-
SQL subqueries that reference files re-read the file for each subquery, which increases CPU usage and wall time compared to the piped approach. ↩