Best Practices for structuring Query Abstractions with SQL / jOOQ

38 views
Skip to first unread message

Bernd Huber

unread,
Oct 3, 2025, 11:07:38 AMOct 3
to jOOQ User Group
Hello guys,

jOOQ provides great power, and with great power comes great responsibility :)

I happily implemented a Repository Abstraction for my Team to use,  but it currently shows it's limits that result in bad performance, which is mainly resulting from my inability to understand specific performance implications, when writing SQL for my target database (mariadb).

I don't have a specific question but seek a bit for a guide / cookbook for writing performant SQL (via jOOQ DSL) for the business use case of a Repository with remote-paginated query-functionality (filter, sort, keyset pagination, ...).

My initial abstraction expected the users to write everything in one big query, that uses multisets and joins. The joins are needed for filtering etc, and the multisets to get the complex results. For example:

Repository Query Abstraction - Part 1 (Performance Problems)

@Dependent public class TopicRepository extends AbstractRepository<TopicDTO, UUID> { @Inject TopicMapper topicMapper; @Inject TopicLangMapper topicLangMapper; @Inject TopicTagLangMapper topicTagLangMapper; @Inject LabelMapper labelMapper; @Inject LabelLangMapper labelLangMapper; @Inject UserTopicAnswerMapper userTopicAnswerMapper; @Inject UserTopicAnswerLangMapper userTopicAnswerLangMapper; private boolean onlyUntagged = false; // Default constructor for CDI public TopicRepository() { super(TOPIC.TOPICID); } public TopicRepository onlyUntagged(boolean onlyUntagged) { this.onlyUntagged = onlyUntagged; return this; } @Override protected SelectFinalStep<Record1<TopicDTO>> prepareQuery(XQuery query) throws InvalidDataException { final QueryJooqMapper queryJooqMapper = new QueryJooqMapper(query, TOPIC) .addMappableFields(TOPIC) .addMappableFields(TOPICLANG) .addMappableFields(TOPICLABEL) .addMappableFields(LABEL) .addMappableFields(LABELLANG) .addMappableFields(USER) .addMappableFields(TOPICTAGLANG) .addMappableFields(USERTOPICANSWER) .addMappableFields(USERTOPICANSWERLANG); UUID userId = null; if (this.request() instanceof UserRequestContext) { userId = ((UserRequestContext) this.request()).getUserId(); } final Condition onlyUntaggedCondition = TOPIC.TAGSGENERATEDAT.isNull(); return dsl() .select( row( TOPIC, multiset( select( LABEL, multiset( select(LABELLANG) .from(LABELLANG) .where(LABELLANG.LABELID.eq(LABEL.LABELID)) ).convertFrom(r -> r.map(rec -> labelLangMapper.map(rec))) ).from(TOPICLABEL) .join(LABEL).on(LABEL.LABELID.eq(TOPICLABEL.LABELID)) .where(TOPICLABEL.TOPICID.eq(TOPIC.TOPICID)) ).as("multisetLabels") .convertFrom(r -> r.map(rec -> labelMapper.map(rec))), multiset( select(TOPICTAGLANG) .from(TOPICTAGLANG) .where(TOPICTAGLANG.TOPICID.eq(TOPIC.TOPICID)) ).as("multisetTags") .convertFrom(r -> r.map(rec -> topicTagLangMapper.map(rec))), multiset( select(TOPICLANG) .from(TOPICLANG) .where(TOPICLANG.TOPICID.eq(TOPIC.TOPICID)) ).as("multisetLangs") .convertFrom(r -> r.map(rec -> topicLangMapper.map(rec))), row( USERTOPICANSWER, multiset( select(USERTOPICANSWERLANG) .from(USERTOPICANSWERLANG) .where(USERTOPICANSWERLANG.ANSWERID.eq(USERTOPICANSWER.ANSWERID)) ).convertFrom(r -> r.map(rec -> userTopicAnswerLangMapper.map(rec))) ).convertFrom(r -> userTopicAnswerMapper.mapOrNull(r)) ).convertFrom(rec -> topicMapper.map(rec)) ) .from(TOPIC) .join(TOPICLANG).on(TOPICLANG.TOPICID.eq(TOPIC.TOPICID)) .leftJoin(USERTOPICANSWER).on(USERTOPICANSWER.TOPICID.eq(TOPIC.TOPICID).and(USERTOPICANSWER.USERID.eq(userId))) .where(queryJooqMapper.getFilters()) .and(onlyUntagged ? onlyUntaggedCondition : DSL.trueCondition()) .groupBy(TOPIC.TOPICID) .orderBy(queryJooqMapper.getSorter()) .offset(queryJooqMapper.getOffset()) .limit(queryJooqMapper.getLimit()); } }

After testing a bit, i found that following two points are mainly resulting in performance problems:
- many unbound joins (meaning that a join is made without further filtering on the joined table)
- many multisets that mariadb does not seem to apply at the very end on the already filtered results (unclear)

Repository Query Abstraction - Part 2 (Splitting the Repository in Two)

I thought about how to solve this, and it seems that a split in Two results in a overall better splitting that my development team could work with without needing to be an expert. For example:

First Part: Only execute query to get a list of Ids (Page)
@Dependent @Data @Accessors(chain = true) @EqualsAndHashCode(callSuper = true) public class TopicSelector extends AbstractSelector<TopicRecord, UUID> { boolean onlyUntagged; UUID userId; /** * Create a selector for topic. * * @param dsl dsl * @param query query * @param onlyUntagged onlyUntagged * @param userId userId * @return selector */ public static TopicSelector create(DSLContext dsl, XQuery query, boolean onlyUntagged, UUID userId) { return SelectorFactory.create(TopicSelector.class, dsl, query, TOPIC, TOPIC.TOPICID, DOR.getTables()) .setOnlyUntagged(onlyUntagged) .setUserId(userId); } @Override protected Map<String, Field<?>> configQueryFields() { return Map.ofEntries( entry("title", TOPICLANG.TITLE), entry("langCode", TOPICLANG.LANGCODE), entry("labels", TOPICLABEL.LABELID), entry("answered", USERTOPICANSWER.ANSWERID) ); } @Override protected List<Join> configJoins() { return List.of( join(TOPICLANG).on(TOPICLANG.TOPICID.eq(TOPIC.TOPICID)), leftJoin(TOPICLABEL).on(TOPICLABEL.TOPICID.eq(TOPIC.TOPICID)), leftJoin(USERTOPICANSWER).on(USERTOPICANSWER.TOPICID.eq(TOPIC.TOPICID) .and(USERTOPICANSWER.USERID.eq(userId))) ); } @Override protected List<Condition> configConditions() { return List.of(onlyUntagged ? TOPIC.TAGSGENERATEDAT.isNull() : DSL.trueCondition()); } }

The result of this Selector will be a List of Ids of the Main-Table that the Selector works on (here: TOPIC), filtered and sorted by a given "query" object that contains query-fields which are mapped to the corresponding fields (TOPICLANG.TITLE, TOPICLANG.LANGCODE, ...), and are applied on the configured joins with the configured additional conditions.

Second Part: Getting hydrated/extended DTOs for the given List of Ids (Result of Part1)
@Dependent @Data @Accessors(chain = true) @EqualsAndHashCode(callSuper = true) public class TopicRepository extends AbstractRepositoryV2<TopicDTO, TopicRecord, UUID> { @Inject TopicLabelListLoader topicLabelListLoader; @Inject TopicLangListLoader topicLangListLoader; @Inject TopicTagLangListLoader topicTagLangListLoader; @Inject UserTopicAnswerLoader userTopicAnswerLoader; /** * Create a repository for topic. * * @param dsl dsl * @return repository */ public static TopicRepository create(DSLContext dsl) { return RepositoryV2Factory.create(TopicRepository.class, dsl, TOPIC, TOPIC.TOPICID, TopicMapper.class); } @Override protected List<TopicDTO> expandResult(List<TopicDTO> results, List<UUID> ids) { var topicLabels = topicLabelListLoader.load(dsl, ids); var topicTagLangs = topicTagLangListLoader.load(dsl, ids); var topicLangs = topicLangListLoader.load(dsl, ids); var topicAnswers = userTopicAnswerLoader.load(dsl, ids, this.userId); return results.stream().map(topic -> topic .setLabels(list(topicLabels, topic.getTopicId())) .setTagLangs(list(topicTagLangs, topic.getTopicId())) .setLangs(list(topicLangs, topic.getTopicId())) .setAnswer(one(topicAnswers, topic.getTopicId()))).toList(); } } @ApplicationScoped public class TopicLabelListLoader implements ListLoader<UUID, LabelDTO> { @Inject LabelMapper labelMapper; @Inject LabelLangMapper labelLangMapper; @Override public Map<UUID, List<LabelDTO>> load(DSLContext dsl, List<UUID> ids) { return dsl.select( LABEL.LABELID, LABEL, multiset( select(LABELLANG) .from(LABELLANG) .where(LABELLANG.LABELID.eq(LABEL.LABELID)) ).convertFrom(r -> r.map(rec -> labelLangMapper.map(rec))) ).from(TOPICLABEL) .join(LABEL).on(LABEL.LABELID.eq(TOPICLABEL.LABELID)) .where(TOPICLABEL.TOPICID.in(ids)) .fetchGroups(LABEL.LABELID, labelMapper::map); } } @ApplicationScoped public class TopicLangListLoader implements ListLoader<UUID, TopicLangDTO> { @Inject TopicLangMapper topicLangMapper; @Override public Map<UUID, List<TopicLangDTO>> load(DSLContext dsl, List<UUID> ids) { return dsl.select(TOPICLANG) .from(TOPICLANG) .where(TOPICLANG.TOPICID.in(ids)) .fetchGroups(TOPICLANG.TOPICID, topicLangMapper::map); } } @ApplicationScoped public class TopicTagLangListLoader implements ListLoader<UUID, TopicTagLangDTO> { @Inject TopicTagLangMapper topicTagLangMapper; @Override public Map<UUID, List<TopicTagLangDTO>> load(DSLContext dsl, List<UUID> ids) { return dsl.select(TOPICTAGLANG) .from(TOPICTAGLANG) .where(TOPICTAGLANG.TOPICID.in(ids)) .fetchGroups(TOPICTAGLANG.TOPICID, topicTagLangMapper::map); } } @ApplicationScoped public class UserTopicAnswerLoader { @Inject UserTopicAnswerMapper mapper; @Inject UserTopicAnswerLangMapper userTopicAnswerLangMapper; public Map<UUID, UserTopicAnswerDTO> load(DSLContext dsl, List<UUID> ids, UUID userId) { return dsl.select( USERTOPICANSWER.TOPICID, USERTOPICANSWER, multiset( select(USERTOPICANSWERLANG) .from(USERTOPICANSWERLANG) .where(USERTOPICANSWERLANG.ANSWERID.eq(USERTOPICANSWER.ANSWERID)) ).convertFrom(r -> r.map(rec -> userTopicAnswerLangMapper.map(rec))) ).from(USERTOPICANSWER) .where(USERTOPICANSWER.TOPICID.in(ids)) .and(USERTOPICANSWER.USERID.eq(userId)) .fetchMap(USERTOPICANSWER.TOPICID, mapper::mapOrNull); } }

The Repository now is only responsible for the Fetching of nested DTOs by the given list of Ids that the Selector has found. Each multiset has been split into a separate call to the database via a separate "Loader"-class that encapsulates it for reusability.

As you guys can see, while Part1 had it's performance problems it was really nice that everything was in one big code-block and everything! the filtering, sorting and result-generation was outsourced to the database, and only one call to the database was necessary to execute it.

Part2 now is much more focused on splitting the code into separate fragments and needs to interact more times with the database. First: Loading Ids from the database, Second: Calling the database for each involved multiset seperately with additional calls.

Just wanted to share this story of my architecture roundtrips in the group. Maybe someone has an idea or a tip if there is something i have completely missed.

But yeah, i love that jOOQ provides the ability to build own abstractions, but it is often not easy to forsee if an architecture that you write on your own is performance-safe when the number of joins and multisets are growing over time. The example i presented here would be no problem. But in my job we join much more tables, and also need much more results. I also thought about trying out Views but somehow never gave it a chance until now.

Lukas Eder

unread,
Oct 3, 2025, 11:34:44 AMOct 3
to jooq...@googlegroups.com
Perhaps, you should have asked, before refactoring ;) This isn't a software design question, but just a simple mistake

On Fri, Oct 3, 2025 at 5:07 PM 'Bernd Huber' via jOOQ User Group <jooq...@googlegroups.com> wrote:
Hello guys,

jOOQ provides great power, and with great power comes great responsibility :)

I happily implemented a Repository Abstraction for my Team to use,  but it currently shows it's limits that result in bad performance, which is mainly resulting from my inability to understand specific performance implications, when writing SQL for my target database (mariadb).

I don't have a specific question but seek a bit for a guide / cookbook for writing performant SQL (via jOOQ DSL) for the business use case of a Repository with remote-paginated query-functionality (filter, sort, keyset pagination, ...).

My initial abstraction expected the users to write everything in one big query, that uses multisets and joins. The joins are needed for filtering etc, and the multisets to get the complex results. For example:

Repository Query Abstraction - Part 1 (Performance Problems)

@Dependent public class TopicRepository extends AbstractRepository<TopicDTO, UUID> { @Inject TopicMapper topicMapper; @Inject TopicLangMapper topicLangMapper; @Inject TopicTagLangMapper topicTagLangMapper; @Inject LabelMapper labelMapper; @Inject LabelLangMapper labelLangMapper; @Inject UserTopicAnswerMapper userTopicAnswerMapper; @Inject UserTopicAnswerLangMapper userTopicAnswerLangMapper; private boolean onlyUntagged = false; // Default constructor for CDI public TopicRepository() { super(TOPIC.TOPICID); } public TopicRepository onlyUntagged(boolean onlyUntagged) { this.onlyUntagged = onlyUntagged; return this; } @Override protected SelectFinalStep<Record1<TopicDTO>> prepareQuery(XQuery query) throws InvalidDataException { final QueryJooqMapper queryJooqMapper = new QueryJooqMapper(query, TOPIC) .addMappableFields(TOPIC) .addMappableFields(TOPICLANG) .addMappableFields(TOPICLABEL) .addMappableFields(LABEL) .addMappableFields(LABELLANG) .addMappableFields(USER) .addMappableFields(TOPICTAGLANG) .addMappableFields(USERTOPICANSWER) .addMappableFields(USERTOPICANSWERLANG); UUID userId = null; if (this.request() instanceof UserRequestContext) { userId = ((UserRequestContext) this.request()).getUserId(); } final Condition onlyUntaggedCondition = TOPIC.TAGSGENERATEDAT.isNull(); return dsl() .select( row( TOPIC, multiset( select( LABEL, multiset( select(LABELLANG) .from(LABELLANG) .where(LABELLANG.LABELID.eq(LABEL.LABELID)) ).convertFrom(r -> r.map(rec -> labelLangMapper.map(rec))) ).from(TOPICLABEL) .join(LABEL).on(LABEL.LABELID.eq(TOPICLABEL.LABELID)) .where(TOPICLABEL.TOPICID.eq(TOPIC.TOPICID)) ).as("multisetLabels") .convertFrom(r -> r.map(rec -> labelMapper.map(rec))), multiset( select(TOPICTAGLANG) .from(TOPICTAGLANG) .where(TOPICTAGLANG.TOPICID.eq(TOPIC.TOPICID)) ).as("multisetTags") .convertFrom(r -> r.map(rec -> topicTagLangMapper.map(rec))), multiset( select(TOPICLANG) .from(TOPICLANG) .where(TOPICLANG.TOPICID.eq(TOPIC.TOPICID)) ).as("multisetLangs") .convertFrom(r -> r.map(rec -> topicLangMapper.map(rec))), row( USERTOPICANSWER, multiset( select(USERTOPICANSWERLANG) .from(USERTOPICANSWERLANG) .where(USERTOPICANSWERLANG.ANSWERID.eq(USERTOPICANSWER.ANSWERID)) ).convertFrom(r -> r.map(rec -> userTopicAnswerLangMapper.map(rec))) ).convertFrom(r -> userTopicAnswerMapper.mapOrNull(r)) ).convertFrom(rec -> topicMapper.map(rec)) ) .from(TOPIC) .join(TOPICLANG).on(TOPICLANG.TOPICID.eq(TOPIC.TOPICID)) .leftJoin(USERTOPICANSWER).on(USERTOPICANSWER.TOPICID.eq(TOPIC.TOPICID).and(USERTOPICANSWER.USERID.eq(userId))) .where(queryJooqMapper.getFilters()) .and(onlyUntagged ? onlyUntaggedCondition : DSL.trueCondition()) .groupBy(TOPIC.TOPICID) .orderBy(queryJooqMapper.getSorter()) .offset(queryJooqMapper.getOffset()) .limit(queryJooqMapper.getLimit()); } }

After testing a bit, i found that following two points are mainly resulting in performance problems:
- many unbound joins (meaning that a join is made without further filtering on the joined table)
- many multisets that mariadb does not seem to apply at the very end on the already filtered results (unclear)

Why are you joining *and* collecting the child table in a subquery? Do either one thing or the other thing, but not both. When you look at any nested collection examples from jOOQ, you'll always find either one of these three approaches:

1. multiset() subqueries and (usually) no joins in the outermost query. If there are joins, they're typically to-one joins, not to-many joins
2. multsetAgg() on queries with only one to-many join path, not several branching paths
3. Collector based deduplicating and nesting of results on queries with only one to-many join path, not several branching paths

In general (not in this case), if you're joining only one path of child tables, joins and multiset are indeed alternatives, and I've explored performance characteristics here:

But you shouldn't really join two child tables to one parent table in SQL (independently of jOOQ), or you're getting a cartesian product between TOPICLANG and USERTOPICANSER. Try it with a simple example, assuming PostgreSQL:

with

topic (id) as (values (1),(2),(3)),

topic_lang (id, lang) as (values (1, 'en'), (1, 'de'), (2, 'en'), (3, 'de')),

topic_user (id, name) as (values (1, 'Jon'), (1, 'Doe'), (2, 'Jon'), (2, 'Doe'))

select *

from topic as t

join topic_lang as tl on t.id = tl.id

left join topic_user as tu on t.id = tu.id


This results in:

|id |id |lang|id |name|
|---|---|----|---|----|
|1  |1  |de  |1  |Jon |
|1  |1  |en  |1  |Jon |
|1  |1  |de  |1  |Doe |
|1  |1  |en  |1  |Doe |
|2  |2  |en  |2  |Jon |
|2  |2  |en  |2  |Doe |
|3  |3  |de  |   |    |


Can you see the cartesian product for topic 1? There are a total of 2x2 rows. 2 for the languages x 2 for the users. This can result in a ton of rows, which are all unnecessary, because you're correlating each of these relations again in multiset derived tables *on each row*!

I hope this helps,
Lukas

Bernd Huber

unread,
Oct 3, 2025, 2:21:32 PMOct 3
to jOOQ User Group
Hello Lukas, 

thank you for the throughout answer!

yes it was a mistake, i think i get the point now.
Thanks for the great performance blog entry as reference.

i will go with following plan for collecting data (i think i now got this)
- manually running multiple queries per nest level and matching results in the client (avoid duplicating results)
- possible to use multiset carefully per nest level to get results, where no large-ish data sets are used
- avoid joins that create cartesian product

i will go with following plan for filtering, sorting, paginating data on a main table with conditions on nested sub-tables (can be multiple nesting-levels)
- always group by main-table to avoid returning cartesian product
- never join tables that are not needed for filtering/sorting (only join those tables really necessary)
- always use indexes on fields that are critical for filtering
- return only a list of ids of the main-table or a stream of ids to avoid the situation you described ("Can you see the cartersian product for topic 1...")

For example (i still need to invest into seek-pagination instead of offset/limit):
// only select the id of the main-table SelectJoinStep<Record1<T>> selectFromStep = dsl() .select(idField) .from(table); // apply all joins that the concrete implementation needs for applying filters/sorters on sub-tables SelectJoinStep<Record1<T>> selectJoinedStep = Joins.apply(selectFromStep, configJoins()); // apply all filters from the given query and additional conditions of the concrete implementation List<Condition> conditions = new ArrayList<>(queryJooqMapper.getFilters()); conditions.addAll(configConditions()); SelectConditionStep<Record1<T>> selectConditionStep = selectJoinedStep.where(conditions); // group the result by the id of the main-table and apply order/offset/limit return selectConditionStep .groupBy(idField) .orderBy(queryJooqMapper.getSorter()) .offset(queryJooqMapper.getOffset()) .limit(queryJooqMapper.getLimit()) .fetch(idField);

thanks again for the input! 

Bernd Huber

unread,
Oct 4, 2025, 4:05:38 PMOct 4
to jOOQ User Group
i still think i need to learn / relearn many SQL concepts, because i seem to be missing too much knowledge to write a filter/sort/paginate abstraction.

For example:
- i always used joins for filtering the main-table based on conditions on a sub-table, even though i read about "EXISTS" today, which i can use directly in conditions, and which will not lead to a cartesian product like joins do, and will just do what i need (filter -> at least one rec in subtable matches condition). This also helps to totally avoid groupBy 
- for sorting the main-table based on a field in a sub-table, i may still need to join this sub-table, but there is also potential to avoid cartesian product, because i may find a way to sort without joining.

for example:

DSL.exists(
  DSL.selectOne().from(TOPICLANG)
     .where(TOPICLANG.TOPICID.eq(TOPIC.TOPICID))
     .and(TOPICLANG.LANGCODE.eq("en"))
     .and(TOPICLANG.TITLE.likeIgnoreCase("%" + text + "%"))
);

i think i will try to get myself into understanding how to filter/sort without joining tables explicitly. 
That should be a first good step for a better design.

Lukas Eder

unread,
Oct 5, 2025, 10:31:23 AMOct 5
to jooq...@googlegroups.com

--
You received this message because you are subscribed to the Google Groups "jOOQ User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jooq-user+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/jooq-user/4aa6dc70-1536-40e1-867b-44691d8e265bn%40googlegroups.com.

Bernd Huber

unread,
Oct 5, 2025, 10:45:43 AMOct 5
to jOOQ User Group
thank you Lukas! that's really exactly relevant now for me to read. 
I know i have read it sometimes on your blog in the past, but often things are read more eagerly when the problems are at hand :)

i currently plan to group my search-conditions in semi-joins per default.
At least the semi-joins are for now the holy grail to solve my performance problems, until i find some edge cases ;)

Great weekend to you, 
i may get back to the group, when i have some design that is not error-prone again :)
Reply all
Reply to author
Forward
0 new messages