Having lunch with PostgreSQL, MongoDB and JSON

· 12 min read
Álvaro Hernández   postgresql, mongodb, json

On a post titled “Postgres JSON, Developer Productivity, and The MongoDB Advantage”, Buzz Moschetti discussed about PostgreSQL’s handling of JSON and how (inconvenient) it is for developers, specially when compared to MongoDB. While the post is almost 18 months old, the principles described there have not changed, and I (mostly) respectfully disagree. Here is my opinion on the topic.

Let’s see what there is on today’s menu.

Cartoon: PostgreSQL and MongoDB entering into the Database Steakhouse, to eat some JSON

Small bites

SQL syntax and, indeed, the relational model as a whole are designed to work with single, scalar values which carry the same type from row to row, not rich shapes like JSON that can contain substructures and arrays and different elements from row to row.

If anything, SQL is about set operations on tuples, not scalar values. But, I get Buzz’s point, he probably meant “columns”. Yet still not correct. The SQL standard has had support for arrays as a column type since 1999! Including functions to access, construct or create arrays. PostgreSQL is actually more advanced, supporting multidimensional arrays, and even a set of key-value pairs with the hstore datatype (again: all that within a single column). On top of that, PostgreSQL also supports custom data types (which can also be row types or data structures) and combinations of all that. So not simple scalar values. And it obviously supports JSON (with the jsonb data type), which will be further discussed here.

And the extensions to SQL utilized by Postgres to manipulate JSON are proprietary, unknown to most SQL developers, and not supported or even recognized by most 3rd party SQL tools.

I’m not aware of MongoDB’s language being part of any standard, so we should assume Buzz’s comment about proprietary language applies to both MongoDB and PostgreSQL equally. Being that true, there are some important catches:

You cannot simply set aside the fact JSON does not support dates as a native type; you must do something, somewhere in your stack to accomodate for this so that a real date e.g. java.util.Date is used by your application. Letting the application itself handle the problem is a completely non-scalable architecture and dooms the system to bugs and reconciliation errors as individual applications make decisions about how to deal with the data.

I also prefer a richer data serialization format than JSON. Yet most people deal with JSONs directly, even when using MongoDB –rather than MongoDB’s BSON. In any case, a data type conversion like the one used as an example on Buzz’s post can be very easily done at query time:

select (content->>'cd')::timestamp from foo;
┌─────────────────────────┐
│        timestamp        │
├─────────────────────────┤
│ 2017-11-22 15:20:34.326 │
└─────────────────────────┘

Note: the timestamp contained in the JSON string above contains a timezone indication. It would have been better to cast it to a timestamptz . But since PostgreSQL would have converted that to a timezone based on your server’s local timezone, just for representational purposes, it may have caused some confusion –for the non versed reader in PostgreSQL advanced date and time management capabilities.

Moreover, JSON is typed. And actually, PostgreSQL provides support via the jsonb_typeof function to return the resolved datatypes:

select jsonb_typeof(content->'props') typeof_props, jsonb_typeof(content->'props'->'a') typeof_a from foo;
┌──────────────┬──────────┐
│ typeof_props │ typeof_a │
├──────────────┼──────────┤
│ object       │ number   │
└──────────────┴──────────┘

I believe Buzz’s statement quoted above is overly exaggerated. It is not doomsday to do some application-level data type enrichment. It is, at least, the very same problem the application needs to deal with loose schemas: managing absent keys, different versions of documents or different data types for the same document key. Even if they would come as strongly typed properties in BSON! So neither PostgreSQL nor MongoDB avoid this problem –when working with unstructured data.

Repository with all the source code relevant to this blog post, data used and README.md

Main course

Nearly all traditional Postgres interaction with applications – written in almost every language – works via O/JDBC drivers and these drivers do not have the capability to properly and precisely convert the JSON into a useful native type (such as a Map or List in Java or a Dictionary or List in Python).

Buzz goes on to say that “we have to manually parse the JSON in our application”. And I agree. But I don’t see it being a problem. Let’s show how simple it is to do it:

<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.8.5</version>
</dependency>
class JsonContent {
    private int a;
    private int[] fn;

    @Override
    public String toString() {
        return "JsonContent{ a=" + a + ", fn=" + Arrays.toString(fn) + " }";
    }
}
String json = rs.getString(1);
Gson gson = new GsonBuilder().create();
JsonContent jsonContent = gson.fromJson(json, JsonContent.class);
System.out.println(jsonContent);

The output is the expected one:

JsonContent{ a=12, fn=[10, 20, 30] }

It doesn’t look to me like the end of the world. Buzz considers also as a problem the availability of different JSON parsers and their interoperability. About the former I see it more as an advantage; and about the latter, it’s a non-issue: after all, JSON is a spec!

Source code for the example above

Side order: polymorphism

Libraries like Gson also have very good support to parse arbitrary, polymorphic JSONs. But even so, it’s a rare case that your data shape is changing so dramatically from document to document. Because even when parsed correctly, your application still needs to deal with that polymorphism! Otherwise, just treating it as a simple string would be so much easier.

So how does a parser like Gson deal with unexpected or heavily changing JSON documents anyway? Just a few lines of code, not much dissimilar from Buzz’s parsing code for a polymorphic BSON document:

StringBuffer sb = new StringBuffer();

while(rs.next()) {
    String json = rs.getString(1);
    JsonParser parser = new JsonParser();
    JsonObject object = parser.parse(json).getAsJsonObject();

    for(Map.Entry<String, JsonElement> entry : object.entrySet()) {
        walkMap(sb, entry.getKey(), entry.getValue());
    }
}

System.out.println(sb.toString());

The output it produces is:

a: JsonNumber{value=12}
fn: JsonArray
	0: JsonNumber{value=10}
	1: JsonNumber{value=20}
	2: JsonNumber{value=30}

a: JsonNumber{value=5}
fn: JsonArray
	0: JsonString{value="mix"}
	1: JsonNumber{value=7.0}
	2: JsonString{value="2017-11-22"}
	3: JsonBoolean{value=true}
	4: JsonDocument
		x: JsonNumber{value=3.0}

Source code for the example above

What is noticeable here is that the example above, which produces an output quite similar to that of Buzz’s post, did not require to construct specific BSON constructs, and instead relied on plain, “old” JSON. While we might argue, again, that BSON provides a richer set of datatypes, it is of questionable applicability due to the verboseness of constructing BSON documents and the need to interact with pervasive, existing JSON documents.

Indeed, compare the insert we did on PostgreSQL to get the above output:

insert into foo values (
        '{"props": {"a": 12, "fn": [10,20,30]}, "cd":"2017-11-22T15:20:34.326Z"}'
),(
        '{"props": {"a":5, "fn":["mix", 7.0, "2017-11-22", true, {"x":3.0} ]}}'
);

with the one proposed by Buzz, for MongoDB:

db.foo.insert([ {"props": {"a": NumberInt("12"),
                             "fn": [NumberDecimal("10"),NumberInt("20"),NumberLong("30")] },
                             "cd": new ISODate("2017-11-22T15:20:34.326Z") } },
                   {"props": {"a": NumberInt("5"),
                              "fn": ["mix", 7.0, new ISODate("2017-11-22"), true, {x:3.0}] } }
]);

I’d personally stick with the first one. It is clearly less verbose, and is interoperable JSON, not MongoDB’s proprietary BSON.

Second side order: versioned documents

Having solved also the polymorphic case, I would like to come back to more real use cases. What you will probably need to deal is not arbitrary JSON documents, but rather an evolving schema. Not only adding new, optional fields; but also even changing the type of existing ones. Would PostgreSQL and the JSON library be able to cope with that? Short answer: no problem.

They key here is to exploit another of the advantages of the relational schema: a mix of “standard” columns, with fixed data types, with the variable JSON. We may encode all the variability in the JSON, while reserving (at least) one column on the regular table to indicate the version of the accompanying document. For example, by creating the table foo2 as:

create table foo2 (version integer, content jsonb);

With this help from PostgreSQL, it is possible to store versioned “schemas” of the (otherwise variable) JSON documents, like in:

insert into foo2 values (
        1, '{"cd": "2017-11-22T15:20:34.326Z", "props": {"a": 12, "fn": [10, 20, 30]}}'
), (
        2, '{"cd": "2017-11-22T15:20:34.326Z", "props": {"a": "twelve", "fn": [10, 20, 30], "j": false}}'
);

Using Gson’s library ability to map a given document to a class, we just need to create a 2nd version of the class object that would support the second document shape:

public class JsonContentV2 {
    private String a;
    private int[] fn;
    private boolean j;

    @Override
    public String toString() {
        return "JsonContent{ a=" + a + ", fn=" + Arrays.toString(fn) + ", j=" + j + " }";
    }
}

and parse instantiating one or the other class with a simple switch statement (or a visitor, or Java 12’s new switch syntax):

Gson gson = new GsonBuilder().create();

while (rs.next()) {
    int version = rs.getInt(1);
    String json = rs.getString(2);

    Object jsonContent = null;
    switch (version) {
        case 1: jsonContent = gson.fromJson(json, JsonContent.class); break;
        case 2: jsonContent = gson.fromJson(json, JsonContentV2.class); break;
    }

    System.out.println(jsonContent);
}

The result:

JsonContent{ a=12, fn=[10, 20, 30] }
JsonContent{ a=twelve, fn=[10, 20, 30], j=false }

Source code for the example above

Dessert: BSON inside PostgreSQL

So far, we have seen how to do strict typed JSON, polymorphic JSON and then versioned JSON to support several different evolutions of the document’s schemas, with JSON schema type inference. We have also seen how JSON has indeed data types, and PostgreSQL also enables to expose them via the jsonb_typeof function.

It is noteworthy that standard JSON types are a subset of BSON’s types. Mongo shell uses the SpiderMonkey JavaScript engine that supports only standard JSON types, the same used by PostgreSQL. To make mongo shell work with BSON format, BSON’s types are wrapped inside objects (like in NumberDecimal("10")). Taking that into account, the same mechanism could be applied in PostgreSQL by wrapping a value inside an object to indicate an extended type like { "a": { "type": "int", "value": "12" } }. With this we address Buzz’s concern that “this still does not answer the fundamental type fidelity problem: fn is an array of what, exactly?”. The whole JSON document would look like:

{ "a": { "type": "int", "value": 12 }, "fn": [ { "type": "decimal", "value": 10 }, { "type": "int", "value": 20 }, { "type": "long", "value": 30 } ] }

This is a bit verbose, but we could wrap all the logic inside PostgreSQL or Java using some helper functions or classes.

But… wait a minute. If MongoDB does this by wrapping BSON datatypes into standard JSON, PostgreSQL could surely do the same! Certainly, PostgreSQL could use BSON too, by storing it into standard PostgreSQL’s jsonb type! It would look like this:

select content from foo3;
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                      content                                                      │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ {"cd": {"$date": 1511364034326}, "props": {"a": 12, "fn": [{"$numberDecimal": "10"}, 20, {"$numberLong": "30"}]}} │
│ {"props": {"a": 5, "fn": ["mix", 7.0, {"$date": 1511308800000}, true, {"x": 3.0}]}}                               │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

But how do we query this? Very easily too. We can just add MongoDB’s driver as a dependency to our pom.xml and then use a code practically identical to the one used before, but parsing to BSON:

while (rs.next()) {
    String json = rs.getString(1);

    BsonDocument bson = RawBsonDocument.parse(json);

    for (Map.Entry<String, BsonValue> entry : bson.entrySet()) {
        walkMap(sb, entry.getKey(), entry.getValue());
    }
}

which yields identical result to what Buzz did with MongoDB:

a: BsonInt32{value=BsonInt32{value=12}}
fn: BsonArray
	0: BsonDecimal128{value=BsonDecimal128{value=10}}
	1: BsonInt32{value=BsonInt32{value=20}}
	2: BsonInt64{value=BsonInt64{value=30}}

a: BsonInt32{value=BsonInt32{value=5}}
fn: BsonArray
	0: BsonString{value=BsonString{value='mix'}}
	1: BsonDouble{value=BsonDouble{value=7.0}}
	2: BsonDateTime{value=BsonDateTime{value=1511308800000}}
	3: BsonBoolean{value=BsonBoolean{value=true}}
	4: BsonDocument
		x: BsonDouble{value=BsonDouble{value=3.0}}

Source code for the example above

A Patxarán shot

In Spain, it is not unfrequent to end a good, long meal with a digestivo (“digestif”). One of my favorite ones and very typical Spanish is Patxarán. Let’s have a shot of Patxarán!

If you would like to have a Patxarán shot after a delicious Mediterranean meal, while at the same time enojoying great PostgreSQL conversations with PostgreSQL experts and peers, join us on the PostgreSQL Ibiza Conference. OnGres is a proud Cluster Level / Platinum sponsor of the conference.

PostgreSQL has the best of both worlds: strongly typed columns, with support for advanced schema validation, triggers and foreign keys; and loosely typed schemas with columns of JSON (in PostgreSQL land: jsonb) datatype. MongoDB has only got the latter (strictly speaking MongoDB has schema validation; but it is a very poor version of that, where the schema validation is not enforced on old data if validation changes, and does not support foreign keys or triggers).

What’s even more interesting is the discussion that Buzz goes into with regards to productivity. If anything, the more functionality the database provides, the more productive you will be. Otherwise, you end up re-inventing the wheel on the application side. And this is exactly what you need to end up doing with MongoDB in many cases:

All these factors, and the more robust and trustable PostgreSQL operations that it exhibits in production, make PostgreSQL, I truly believe, a much more productive database than MongoDB.

Hope you enjoyed your meal today.

comments powered by Disqus