Hello guys,
jOOQ provides great power, and with great power comes great responsibility :)
I happily implemented a Repository Abstraction for my Team to use, but it currently shows it's limits that result in bad performance, which is mainly resulting from my inability to understand specific performance implications, when writing SQL for my target database (mariadb).
I don't have a specific question but seek a bit for a guide / cookbook for writing performant SQL (via jOOQ DSL) for the business use case of a Repository with remote-paginated query-functionality (filter, sort, keyset pagination, ...).
My initial abstraction expected the users to write everything in one big query, that uses multisets and joins. The joins are needed for filtering etc, and the multisets to get the complex results. For example:
Repository Query Abstraction - Part 1 (Performance Problems)
@Dependent
public class TopicRepository extends AbstractRepository<TopicDTO, UUID> {
@Inject
TopicMapper topicMapper;
@Inject
TopicLangMapper topicLangMapper;
@Inject
TopicTagLangMapper topicTagLangMapper;
@Inject
LabelMapper labelMapper;
@Inject
LabelLangMapper labelLangMapper;
@Inject
UserTopicAnswerMapper userTopicAnswerMapper;
@Inject
UserTopicAnswerLangMapper userTopicAnswerLangMapper;
private boolean onlyUntagged = false;
// Default constructor for CDI
public TopicRepository() {
super(TOPIC.TOPICID);
}
public TopicRepository onlyUntagged(boolean onlyUntagged) {
this.onlyUntagged = onlyUntagged;
return this;
}
@Override
protected SelectFinalStep<Record1<TopicDTO>> prepareQuery(XQuery query) throws InvalidDataException {
final QueryJooqMapper queryJooqMapper = new QueryJooqMapper(query, TOPIC)
.addMappableFields(TOPIC)
.addMappableFields(TOPICLANG)
.addMappableFields(TOPICLABEL)
.addMappableFields(LABEL)
.addMappableFields(LABELLANG)
.addMappableFields(USER)
.addMappableFields(TOPICTAGLANG)
.addMappableFields(USERTOPICANSWER)
.addMappableFields(USERTOPICANSWERLANG);
UUID userId = null;
if (this.request() instanceof UserRequestContext) {
userId = ((UserRequestContext) this.request()).getUserId();
}
final Condition onlyUntaggedCondition = TOPIC.TAGSGENERATEDAT.isNull();
return dsl()
.select(
row(
TOPIC,
multiset(
select(
LABEL,
multiset(
select(LABELLANG)
.from(LABELLANG)
.where(LABELLANG.LABELID.eq(LABEL.LABELID))
).convertFrom(r -> r.map(rec -> labelLangMapper.map(rec)))
).from(TOPICLABEL)
.join(LABEL).on(LABEL.LABELID.eq(TOPICLABEL.LABELID))
.where(TOPICLABEL.TOPICID.eq(TOPIC.TOPICID))
).as("multisetLabels")
.convertFrom(r -> r.map(rec -> labelMapper.map(rec))),
multiset(
select(TOPICTAGLANG)
.from(TOPICTAGLANG)
.where(TOPICTAGLANG.TOPICID.eq(TOPIC.TOPICID))
).as("multisetTags")
.convertFrom(r -> r.map(rec -> topicTagLangMapper.map(rec))),
multiset(
select(TOPICLANG)
.from(TOPICLANG)
.where(TOPICLANG.TOPICID.eq(TOPIC.TOPICID))
).as("multisetLangs")
.convertFrom(r -> r.map(rec -> topicLangMapper.map(rec))),
row(
USERTOPICANSWER,
multiset(
select(USERTOPICANSWERLANG)
.from(USERTOPICANSWERLANG)
.where(USERTOPICANSWERLANG.ANSWERID.eq(USERTOPICANSWER.ANSWERID))
).convertFrom(r -> r.map(rec -> userTopicAnswerLangMapper.map(rec)))
).convertFrom(r -> userTopicAnswerMapper.mapOrNull(r))
).convertFrom(rec -> topicMapper.map(rec))
)
.from(TOPIC)
.join(TOPICLANG).on(TOPICLANG.TOPICID.eq(TOPIC.TOPICID))
.leftJoin(USERTOPICANSWER).on(USERTOPICANSWER.TOPICID.eq(TOPIC.TOPICID).and(USERTOPICANSWER.USERID.eq(userId)))
.where(queryJooqMapper.getFilters())
.and(onlyUntagged ? onlyUntaggedCondition : DSL.trueCondition())
.groupBy(TOPIC.TOPICID)
.orderBy(queryJooqMapper.getSorter())
.offset(queryJooqMapper.getOffset())
.limit(queryJooqMapper.getLimit());
}
}
After testing a bit, i found that following two points are mainly resulting in performance problems:
- many unbound joins (meaning that a join is made without further filtering on the joined table)
- many multisets that mariadb does not seem to apply at the very end on the already filtered results (unclear)
Repository Query Abstraction - Part 2 (Splitting the Repository in Two)
I thought about how to solve this, and it seems that a split in Two results in a overall better splitting that my development team could work with without needing to be an expert. For example:
First Part: Only execute query to get a list of Ids (Page)
@Dependent
@Data
@Accessors(chain = true)
@EqualsAndHashCode(callSuper = true)
public class TopicSelector extends AbstractSelector<TopicRecord, UUID> {
boolean onlyUntagged;
UUID userId;
/**
* Create a selector for topic.
*
* @param dsl dsl
* @param query query
* @param onlyUntagged onlyUntagged
* @param userId userId
* @return selector
*/
public static TopicSelector create(DSLContext dsl, XQuery query, boolean onlyUntagged, UUID userId) {
return SelectorFactory.create(TopicSelector.class, dsl, query, TOPIC, TOPIC.TOPICID, DOR.getTables())
.setOnlyUntagged(onlyUntagged)
.setUserId(userId);
}
@Override
protected Map<String, Field<?>> configQueryFields() {
return Map.ofEntries(
entry("title", TOPICLANG.TITLE),
entry("langCode", TOPICLANG.LANGCODE),
entry("labels", TOPICLABEL.LABELID),
entry("answered", USERTOPICANSWER.ANSWERID)
);
}
@Override
protected List<Join> configJoins() {
return List.of(
join(TOPICLANG).on(TOPICLANG.TOPICID.eq(TOPIC.TOPICID)),
leftJoin(TOPICLABEL).on(TOPICLABEL.TOPICID.eq(TOPIC.TOPICID)),
leftJoin(USERTOPICANSWER).on(USERTOPICANSWER.TOPICID.eq(TOPIC.TOPICID)
.and(USERTOPICANSWER.USERID.eq(userId)))
);
}
@Override
protected List<Condition> configConditions() {
return List.of(onlyUntagged ? TOPIC.TAGSGENERATEDAT.isNull() : DSL.trueCondition());
}
}
The result of this Selector will be a List of Ids of the Main-Table that the Selector works on (here: TOPIC), filtered and sorted by a given "query" object that contains query-fields which are mapped to the corresponding fields (TOPICLANG.TITLE, TOPICLANG.LANGCODE, ...), and are applied on the configured joins with the configured additional conditions.
Second Part: Getting hydrated/extended DTOs for the given List of Ids (Result of Part1)
@Dependent
@Data
@Accessors(chain = true)
@EqualsAndHashCode(callSuper = true)
public class TopicRepository extends AbstractRepositoryV2<TopicDTO, TopicRecord, UUID> {
@Inject
TopicLabelListLoader topicLabelListLoader;
@Inject
TopicLangListLoader topicLangListLoader;
@Inject
TopicTagLangListLoader topicTagLangListLoader;
@Inject
UserTopicAnswerLoader userTopicAnswerLoader;
/**
* Create a repository for topic.
*
* @param dsl dsl
* @return repository
*/
public static TopicRepository create(DSLContext dsl) {
return RepositoryV2Factory.create(TopicRepository.class, dsl, TOPIC, TOPIC.TOPICID, TopicMapper.class);
}
@Override
protected List<TopicDTO> expandResult(List<TopicDTO> results, List<UUID> ids) {
var topicLabels = topicLabelListLoader.load(dsl, ids);
var topicTagLangs = topicTagLangListLoader.load(dsl, ids);
var topicLangs = topicLangListLoader.load(dsl, ids);
var topicAnswers = userTopicAnswerLoader.load(dsl, ids, this.userId);
return results.stream().map(topic -> topic
.setLabels(list(topicLabels, topic.getTopicId()))
.setTagLangs(list(topicTagLangs, topic.getTopicId()))
.setLangs(list(topicLangs, topic.getTopicId()))
.setAnswer(one(topicAnswers, topic.getTopicId()))).toList();
}
}
@ApplicationScoped
public class TopicLabelListLoader implements ListLoader<UUID, LabelDTO> {
@Inject
LabelMapper labelMapper;
@Inject
LabelLangMapper labelLangMapper;
@Override
public Map<UUID, List<LabelDTO>> load(DSLContext dsl, List<UUID> ids) {
return dsl.select(
LABEL.LABELID,
LABEL,
multiset(
select(LABELLANG)
.from(LABELLANG)
.where(LABELLANG.LABELID.eq(LABEL.LABELID))
).convertFrom(r -> r.map(rec -> labelLangMapper.map(rec)))
).from(TOPICLABEL)
.join(LABEL).on(LABEL.LABELID.eq(TOPICLABEL.LABELID))
.where(TOPICLABEL.TOPICID.in(ids))
.fetchGroups(LABEL.LABELID, labelMapper::map);
}
}
@ApplicationScoped
public class TopicLangListLoader implements ListLoader<UUID, TopicLangDTO> {
@Inject
TopicLangMapper topicLangMapper;
@Override
public Map<UUID, List<TopicLangDTO>> load(DSLContext dsl, List<UUID> ids) {
return dsl.select(TOPICLANG)
.from(TOPICLANG)
.where(TOPICLANG.TOPICID.in(ids))
.fetchGroups(TOPICLANG.TOPICID, topicLangMapper::map);
}
}
@ApplicationScoped
public class TopicTagLangListLoader implements ListLoader<UUID, TopicTagLangDTO> {
@Inject
TopicTagLangMapper topicTagLangMapper;
@Override
public Map<UUID, List<TopicTagLangDTO>> load(DSLContext dsl, List<UUID> ids) {
return dsl.select(TOPICTAGLANG)
.from(TOPICTAGLANG)
.where(TOPICTAGLANG.TOPICID.in(ids))
.fetchGroups(TOPICTAGLANG.TOPICID, topicTagLangMapper::map);
}
}
@ApplicationScoped
public class UserTopicAnswerLoader {
@Inject
UserTopicAnswerMapper mapper;
@Inject
UserTopicAnswerLangMapper userTopicAnswerLangMapper;
public Map<UUID, UserTopicAnswerDTO> load(DSLContext dsl, List<UUID> ids, UUID userId) {
return dsl.select(
USERTOPICANSWER.TOPICID,
USERTOPICANSWER,
multiset(
select(USERTOPICANSWERLANG)
.from(USERTOPICANSWERLANG)
.where(USERTOPICANSWERLANG.ANSWERID.eq(USERTOPICANSWER.ANSWERID))
).convertFrom(r -> r.map(rec -> userTopicAnswerLangMapper.map(rec)))
).from(USERTOPICANSWER)
.where(USERTOPICANSWER.TOPICID.in(ids))
.and(USERTOPICANSWER.USERID.eq(userId))
.fetchMap(USERTOPICANSWER.TOPICID, mapper::mapOrNull);
}
}
The Repository now is only responsible for the Fetching of nested DTOs by the given list of Ids that the Selector has found. Each multiset has been split into a separate call to the database via a separate "Loader"-class that encapsulates it for reusability.
As you guys can see, while Part1 had it's performance problems it was really nice that everything was in one big code-block and everything! the filtering, sorting and result-generation was outsourced to the database, and only one call to the database was necessary to execute it.
Part2 now is much more focused on splitting the code into separate fragments and needs to interact more times with the database. First: Loading Ids from the database, Second: Calling the database for each involved multiset seperately with additional calls.
Just wanted to share this story of my architecture roundtrips in the group. Maybe someone has an idea or a tip if there is something i have completely missed.
But yeah, i love that jOOQ provides the ability to build own abstractions, but it is often not easy to forsee if an architecture that you write on your own is performance-safe when the number of joins and multisets are growing over time. The example i presented here would be no problem. But in my job we join much more tables, and also need much more results. I also thought about trying out Views but somehow never gave it a chance until now.