jOOQ: Stream processing PostgreSQL query results

When Java 8 came out, it proved a great effort to cope with the great demand of stream processing. Streams and lambdas allow you to process the data in a list asynchronously, so that you don’t have to wait until the last element of step 1 is calculated in order to start step 2 (apart from the evident code cleanness).

With great amounts of data, this reduces a very painful bottleneck, so why wouldn’t we want this huge data-related improvement in our ORMs? Of course, the query planner needs to do all the calculations, but as soon as it starts to pop data out, we want to start processing them.

This is one of the advantages jOOQ provides. So what’s jOOQ? In a world in which a great share of the ORM market is taken by magical annotation-based solutions, jOOQ takes a totally different approach: it provides a strongly typed way to access your data by autogenerating the necessary classes.

It’s not among my intentions to decide which approach is best, but in this post I’ll try to convince you that, although it’s not the most famous solution, jOOQ can be a face-to-face contender with any other ORM in the world.

Understanding the basics

The most important part of jOOQ’s functioning is that it autogenerates classes that represent your database objects. This way, it makes it very easy for you to write expressive SQL-like java code. This task of generating classes is a separate process, that is not at a code-generation stage. As such, it should be run in a Maven (mvn) task.

So, first of all we need to import jOOQ. Typically, something like this is enough:

<dependency>
   <groupId>org.jooq</groupId>
   <artifactId>jooq</artifactId>
   <version>${jooq.version}</version>
</dependency>

And then, create the task for code generation. Here we’ll post a simple example based on our own experience, but jOOQ’s doc has much more information on the topic, for example:

  • Including and excluding elements based on name (via RegEx) or on the kind of object they are (tables, functions, sequences…)
  • Including data from several schemas
  • Using custom java types

But as a simple example, this one may be good:

<plugin>
   <groupId>org.jooq</groupId>
   <artifactId>jooq-codegen-maven</artifactId>
   <version>3.11.4</version>

   <dependencies>
       <dependency>
           <groupId>org.postgresql</groupId>
           <artifactId>postgresql</artifactId>
           <version>42.2.4</version>
       </dependency>

       <dependency>
           <groupId>org.jooq</groupId>
           <artifactId>jooq</artifactId>
           <version>3.11.4</version>
       </dependency>
   </dependencies>

   <executions>
       <execution>
           <id>jooq-codegen-offline</id>
           <phase>generate-sources</phase>
           <goals>
               <goal>generate</goal>
           </goals>
           <configuration>
               <jdbc>
                   <url>jdbc:postgresql://url_to_your_database</url>
               </jdbc>
               <generator>
                   <database>
                       <schemata>
                           <schema>
                               <inputSchema>params</inputSchema>
                           </schema>
                       </schemata>
                   </database>
                   <target>
                       <packageName>com.ongres.persistence.jooq.generated.offline</packageName>
                   </target>
               </generator>
           </configuration>
       </execution>
   </executions>

   <configuration>

       <jdbc>
           <driver>org.postgresql.Driver</driver>
           <user>your username</user>
           <password>your password</password>
       </jdbc>

       <generator>
           <database>
               <name>org.jooq.util.postgres.PostgresDatabase</name>
               <includes>.*</includes>

               <forcedTypes>
                   <forcedType>
                       <name>UUID</name>
                       <expression>.*</expression>
                       <types>(?i:UUID)</types>
                   </forcedType>
               </forcedTypes>
           </database>
           <target>
               <directory>target/generated-sources/jooq</directory>
           </target>
       </generator>
   </configuration>
</plugin>

As you can see, in this example we specify:

  • The jdbc connection to the DB
  • The schema (or schemata) from where to pull the objects
  • The access credentials
  • The PostgreSQL driver, given that we specify this is a PostgreSQL database
  • The folder in which the generated classes will be created
  • Their base package

Also, we can see how easy it is to tell jOOQ to map all fields that have UUID as database type with the Java class UUID. What happens if you add new tables or modify the DDL of the existing ones? Nothing, just run the maven task to reimport all the classes, and you’re done with the problem.

Tables and Records

Among all the classes that jOOQ generates, we’ll focus on 2 types:

  • Tables: as you might think, they represent DB tables. Their names are CamelCased versions of the tables scraped from your DB. As we will see, they’re useful for selecting data from them, or for accessing the fields.
  • Records: this one is also obvious, they represent records, that is, the tuples returned from a query. The class TableNameRecord represents a tuple returned from SELECT * FROM table_name.

Basic example

Let’s start with a first example, taken from the source code used for the development of postgresqlCO.NF, a tool that makes PostgreSQL configuration easy for human beings.

In this case, we wanted to obtain the info of an exact postgresql.conf parameter, in the given language. The query is simple enough to see the way a basic jOOQ query typically looks:

dslContext
   .selectFrom(ProcessedFields.PROCESSED_FIELDS)
   .where(ProcessedFields.PROCESSED_FIELDS.NAME.eq(parameter))
   .and(ProcessedFields.PROCESSED_FIELDS.LANG.eq(language.getAbbrev()))
   .fetchOne());

The first thing we can see is that everything is typed. Needless to say, the less queries or parts of a query that you’re forced to write as a String, the better. It becomes more battle proof, more DB-independent, easier to migrate… Never trust Strings in this context.

In jOOQ, fields in the select clause are typed, tables in the from clause are typed, conditions in the where clause are typed… or more accurately speaking, all of them can be typed (you can also write them as Strings, if you want).

Also, there are several methods to return the result. The one used here, fetchOne(), is good for returning a single result, but we’ll see more in other examples. Of course, in this example the fetchOne() method returns an instance on ProcessedFieldsRecord, as it is the record associated with the queried table.

DslContext

In order to run a query, we need an instance of DslContext. This class contains all the necessary information to access a database, and serves as the starting point for our queries (it’s the class that contains the select/delete/update/insertInto methods).

In case you need to connect alternatively to different databases, or to use different connections(read-only connections vs write connection), I’d recommend you to wrap them in a simple DslContextSupplier.

For example, in a part of postgresqlCO.NF, we have two databases, one in a context we call online and another one in a context called offline. Well, the solution looks pretty simple:

public enum Context {
 OFFLINE,
 ONLINE
}
public interface DslContextSupplier {
 <T> T executeFunction(Context context, Function<DSLContext, T> function);
 void executeConsumer(Context context, Consumer<DSLContext> consumer);
 <T, IT> T executeFunctionWithStream(Context context, Function<DSLContext, Stream<IT>> producer, Function<Stream<IT>, T> consumer);
 <IT> void executeConsumerWithStream(Context context, Function<DSLContext, Stream<IT>> producer, Consumer<Stream<IT>> consumer);
}

Taking care of closing resources in jOOQ is as important as it always is in DB access. That’s the reason why we have so many methods in our DslContextSupplier:

  • executeConsumer/executeFunction: we obtain the DSLContext from the Context in a try-with-resources block and then run the desired consumer inside the block.
  • executeConsumerWithStream/executeFunctionWithStream: similar to the previous methods, but these are intended to be used when jOOQ is used for producing a Stream (via stream() or fetchStream()). These cases need a special treatment, because the Stream produced by them contains a reference to the ResultSet, so it needs to be treated in a special way: the producer is called in a second try-with-resources block and the consumer is called inside it. This has an important corollary: as Streams are closed when these methods are called, your DAO methods shouldn’t return Streams if their implementation calls these methods, because the classes that rely on those methods will receive a closed Stream. You’d better apply the IoC principle and include a callback in your DAOs. Let’s see an example of this.

Advanced queries: jOOQ magic

One of the goals in postgresqlCO.NF is to allow you to upload your config files, update parameter values and comments as many times as you want, and download them in whatever format you want. In order to do this, we need to calculate which is the last version of each parameter. Yes, we’re going to use typed window functions in jOOQ. More concretely, we’ll use a row_number() and then filter those with row_number=1. The SQL code would be:

SELECT [all the fields but row_number]
FROM (
    SELECT *,
row_number() OVER(PARTITION BY param ORDER BY ts) row_number
FROM postgresql_conf_edit
WHERE row_number=1
AND configuration = ?
) a;

The whole point of this article is: can we do THAT in Java with jOOQ? Yes, we can:

public <T> T getAggregatedFileChanges(UUID configurationId,
   Function<Stream<PostgresqlConfFileChange>, T> callback) {

 Function<DSLContext, Stream<Record>> producer = (DSLContext dslContext) -> {
   List<SelectField<?>> nestedFields =
       new ArrayList<>(Arrays.asList(PostgresqlconfEdit.POSTGRESQLCONF_EDIT.fields()));
   Field<Integer> rowNumber = DSL.rowNumber().over()
       .partitionBy(PostgresqlconfEdit.POSTGRESQLCONF_EDIT.PARAM)
       .orderBy(PostgresqlconfEdit.POSTGRESQLCONF_EDIT.TS.desc())
       .as("row_number");
   nestedFields.add(rowNumber);

   return dslContext
       .select()
       .from(dslContext.select(nestedFields)
           .from(PostgresqlconfEdit.POSTGRESQLCONF_EDIT)
           .where(PostgresqlconfEdit.POSTGRESQLCONF_EDIT.CONFIGURATION.eq(configurationId))
       )
       .where(rowNumber.eq(1))
       .fetchStream();
 };

 Function<Stream<Record>, T> consumer = stream -> callback.apply(stream
     .map(record -> new PostgresqlConfFileChangeImpl(
         record.into(PostgresqlconfEdit.POSTGRESQLCONF_EDIT)))
 );

 return supplier.executeFunctionWithStream(Context.ONLINE, producer, consumer);
}

In order to understand what this does, please keep in mind that the producer function creates the jOOQ-generated Stream<IntermediateType> in a try-with-resources block, and that the consumer turns it into whatever type T we want to return from the DAO method. As you see, the method receives a callback that is applied to the obtained data, in order to return this wildcard type T. That said, let’s get down to the great advantages jOOQ gives in this case:

  • Natural window functions: the first thing we do is calculate the fields that will go in the inner query, solving a tricky problem (the SELECT *, extra_field issue). Of course, it includes creating the window function, which is (surprisingly?) easy. With jOOQ, in order for your code to work, you need to write it the way you would expect to have to write it. Now, that’s what I call an understandable and easy-to-use framework!!
  • How easy it is to use a subquery: jOOQ automatically detects you’re intending to use the inner SELECT as a subquery, and gives a random alias to the subquery that it generates in SQL. In fact, by default all the SQL translations that jOOQ does include fields like “schema”.”table”.”field”, while the fields it creates for the outer select are translated as “random_alias”.”field”.
  • Fetching as Stream: Ok, in this case there aren’t that many postgresql.conf parameters for this to be such a great advantage, but think about the possibility of querying and obtaining huge amounts of data in a stream. I don’t have to explain the advantages this renders in terms of being able to process the results bit by bit, instead of having to wait for all of them to be gathered in order to start the processing. Using a callback allows us to keep this advantage while ensuring that the Stream will finally be closed.
  • Adapting records to the desired record: moving to the consumer part, the PostgresqlconfEditRecord is a record that would suit the n-1 first fields of each returned tuple, but the query returns the row_number extra field as well (please remember that the outer select() clause takes no fields in jOOQ in this example, so it’s like doing a SELECT *, inheriting the fields from the inner query, which does contain the row_number).
    So we need the magic of this record.into(Table). Thanks to the jOOQ’s strongly typed nature, it knows that records generated by the PostgresqlconfEdit class are of type PostgresqlconfEditRecord, so it returns a strongly typed PostgresqlconfEditRecord object, so that we can use it in the constructor for our internal postgresqlCO.NF PostgresqlConfFileChangeImpl class, which is what we finally pass to the callback. Don’t worry if there’s an extra field!
    Believe me, if you don’t think this point is great, just think about the readability provided by not having to write down the 15 fields that this PostgresqlconfEditRecord turns out to have! But if you’re still not convinced about jOOQ’s magic metadata, please take a look at how jOOQ decouples fields, coming from two joined, tables in this example. Sheer elegance!

Not only queries

You’re not constrained to querying, jOOQ also allows you to do any kind of DML (create, insert, update). Tip: don’t forget to call the easy-to-forget execute() method as the last step of your statement! Otherwise, no operation will be actually executed (by the way, in this example we needed the id of the inserted tuple, so we don’t have the need of running execute() but returning() and fetchOne():

RawFieldsRecord rawFieldsRecord = dslContext.insertInto(RawFields.RAW_FIELDS)
   .columns(RawFields.RAW_FIELDS.CATEGORY_ID,      RawFields.RAW_FIELDS.DEFAULT_VALUE,
       RawFields.RAW_FIELDS.CONTEXT, RawFields.RAW_FIELDS.ENUMVALS,
       RawFields.RAW_FIELDS.MAX_VAL, RawFields.RAW_FIELDS.MIN_VAL,
       RawFields.RAW_FIELDS.NAME, RawFields.RAW_FIELDS.NEEDS_RESTART,
       RawFields.RAW_FIELDS.UNIT, RawFields.RAW_FIELDS.VARTYPE,
       RawFields.RAW_FIELDS.VERSION)
   .values(categoriesRecord.getId(), pgSettingsRecord.getBootVal(),
       pgSettingsRecord.getContext(), pgSettingsRecord.getEnumvals(),
       pgSettingsRecord.getMaxVal(), pgSettingsRecord.getMinVal(),
       pgSettingsRecord.getName(), needsRestart, pgSettingsRecord.getUnit(),
       pgSettingsRecord.getVartype(), pgVersion.getDotted())
   .returning(RawFields.RAW_FIELDS.ID).fetchOne();

And that’s all for now, I hope this post helped you understand how jOOQ can make querying PostgreSQL from Java natural, easier and safer, while allowing you to create Stream of the results which you can process with regular Java8+ or reactive APIs. Could it be better?