666 - Hell not found

September 20th, 2013

Yesterday evening, a few tweets made it into my Tweetbot column listening to tweets related to Spring Data. The one raising my attention was pointing to a blog post creatively entitled “Spring Data MongoDB - A Mismatch Made In Hell”. As the title already suggests, it contains a rather rigid critique of the features and design approaches we chose for the MongoDB module in the Spring Data project. The post has a very harsh tone and is equipped with a whole bunch of either deep misconceptions or deliberate refusal to see facts, which I found quite surprising. Let me go through it bit by bit and clear the dust it created.

But let’s start with the blog post:

The Spring guys love to tout their support for MongoDb as an advantage over frameworks like JEE. However, using it in a real life project shows how severely lacking and fundamentally mis-designed the framework is. Spring Data MongoDB tries to shoehorn an ORM style strictness to a fundamentally non-relational database, resulting in it being rendered useless in real life.

Quite an intro, right? Rest assured, the tone doesn’t get any less aggressive throughout the post. Having that said, what’s more important to me here is that it would’ve been tremendously helpful to what this ORM-style strictness is, that we impose onto users. Just to give you an impression, here’s what it looks like to persist a raw String or MongoDB DBObject using our MongoTemplate:

String document = "{'firstname' : 'Dave', 'lastname' : 'Matthews'}";
template.save(document, "myCollection");

DBObject document = new BasicDBObject();
document.put("firstname", "Dave");
document.put("lastname", "Matthews");
template.save(document, "myCollection");

template.findAll(Map.class, "myCollection")

Not very ORMish, right? You don’t even need types to persist data and get it out of the store again. “What is MongoTemplate then even good for?”, you might ask. First, you can make use of the extensive object-to-store mapping we provide if you like to. However, even if you opt out of that, what you definitely get is appropriate resource mapping and exception translation from MongoDB specific exceptions into Spring’s DataAccessExceptions. Just the way Spring developers expect it when using a template provided by us. In general the automatic object-to-store mapping is one of the key reasons developers use Spring Data MongoDB over the raw driver, but let’s go on…

Can’t retrieve fields - Full document only

This is the biggest indicator of how flawed Spring Mongo’s design is. It tries to model documents like rows in a SQL database and wants you to make ‘entity’ classes like an ORM. Guess what, the two are not the same. A document can be far more complex and larger than a typical row in sql database.

So first of all - no we don’t. As indicated above it’s perfectly fine to use Spring Data MongoDB with plain Maps or DBObjects. However, writing Java application, developers are used to interact with types to write type safe code. Hence it’s an extremely valuable feature to be able to get the documents retrieved from the store mapped onto domain classes automatically. A friendly reminder: an entity is not a concept tied to ORM. Feel free to read up on this in Eric Evans’ “Domain Driven Design” - a highly recommended book anyway.

With a row in a sql db, it is assumed that you will pull out all of it on most queries, otherwise the data is usually split in multiple tables. But documents are a different beast. Documents are nested within each other, and most of the time you only want to pull out a small subset of it from your database.

But with Spring Mongo’s ‘entity’ modelling, you are forced to pull out your entire document on each query.

I am terribly sorry, but this is wrong again. There are a few ways to skin the cat here: first of all, on the reading side, you’re not forced to use the same types as you might have used when writing the data. So it’s perfectly fine to persist a Person, but trigger a query for PersonSummary and which only consists of firstname, lastname and email address.

The second approach is just staying with Person but equip the Query instance you use with a field spec to in- or exclude the fields you’d like:

Query query = new Query().fields().include("firstname").include("lastname");
template.find(query, Person.class);

This would essentially populate only the firstname and lastname field of the type handed into the template.

No DBRef lazy loading

This one is pure insane.

At this point I got used to the cursing :).

If you use a DBRefs to refer to another document, Spring Mongo will pull out the entire document, instead of just the reference. So if you have a bunch of documents connected to each other via dbrefs, a small query where you really only want to access a couple fields will end up pulling out the entire document graph!

If you have a lot of documents related to each other, there are two things to consider here: first, I’d raise the question if MongoDB is the right store for this kind of data. The argument feels a bit like a “It’s frickin’ complicated to store a tree in a relational database. JDBC is awful!”. Maybe have a look into Neo4j for highly interconnected data.

The fact that the bug report on this is almost 2 years old and has been assigned a ‘minor’ status with no resolution in sight is mindboggling, and shows how misaligned Spring Mongo’s visions are with its usage in the real world.

Admittedly DATAMONGO-348 has been open for quite a while. Still, we’ve seen tons of customer projects working perfectly fine with eagerly loaded DBRefs or - if that was a problem - simply using manually maintained references. In fact, the MongoDB reference documentation actually recommends not to use DBRefs but manual references. So what constitutes the “real world” is a matter of perspective.

No cursor support

Want to use a cursor to stream and iterate through a collection. No dice. Either pull out the entire collection or fall back to the native Mongo Java driver.

Exactly! However, what’s so utterly wrong with that? A few lines ago we’ve got complaints about abstracting too much, now we’re back to too little. Here’s how it goes:

template.execute(new CollectionCallback<Void>() {
    Void doInCollection(DBCollection collection) {

        DBObject query = new BasicDBObject("lastname", "Matthews");
        DBCurson cursor = collection.find(query);

        while (cursor.hasNext()) {
            // … do something
        }

        return null;
    }
})

It’s not really “falling back” but rather using the template mechanisms of resource management and exception translation but still getting access to the driver API to perform operations closer to the store. For plain iterating we have DATAMONGO-203. It hasn’t got many votes so far but we’re of course happy to implement it.

Incomplete aggregation framework support

Spring Mongo only very recently started supporting the Mongo Aggregation framework, and like the rest of Spring Mongo, it is a half hearted effort.

I’d like to see a list of other Java-based MongoDB libraries that already support the aggregation framework.

The framework documentation is sparse and confusing and most of your real world aggregation queries wont end up fitting with Spring Mongo’s implementation (not without nonsensical workarounds anyway); forcing you yet again to fall back to the native driver.

This is the “sparse” documentation about what’s currently implemented. The design of the API has been closely aligned with the examples in the MongoDB reference documentation, made a review round through the MongoDB Java driver development team and has been pre-release applied to a hand full of real world projects where it had worked just fine.

I am not saying the support is perfect, but trying to create the impression it’s the dumbest thing since JPA for NoSQL is a bit off, I think.

Incomplete index support

I have to admit, I couldn’t make too much sense of that section. Maybe an example would have been helpful here as there seems to be some obvious confusion.

Spring also has an ensureIndex() method to manually create indexes on fields, without using annotations, but there are no guidelines whatsoever as to when and how frequently it should be called and what the performance implications of calling it are.

Spring doesn’t have methods. MongoTemplate has, the said one in particular. No guidelines? What are the performance implications? We assume basic knowledge of MongoDB indexing but generally speaking the usage of ensureIndex() is recommended over the use of annotations mostly as you’ll probably like to create the indexes in your app explicitly rather than the mapping sub-system create the them on the fly when you bootstrap the app.

Can’t switch database

A lot of times you will want to store data in different databases on a single instance of Mongo for things like keeping data of individual customers separate.

With the native Mongo driver, switching between the databases is a simple matter of calling getDb(dbName). With Spring however, this is close to impossible unless you are willing to write a chunk of the framework yourself (which may change underneath you across releases).

What is described as “close to impossible” here is essentially a MongoTemplate instance per database you want to interact with. I even described an approach to multi-tenancy with Spring Data MongoDB using our MongoDbFactory abstraction in a post on StackOverflow recently. Not sure you want to get more dynamic than that. By the way, the approach is close to what you get with AbstractRoutingDataSource in the Spring JDBC world. I’d also argue it’s good practice not to actively reach out for resources from within application code, so hiding this behind configuration and using dependency injection to access such components has been agreed on to be the right way to go for almost a decade.

Regarding the changing APIs: first, I fail to see something utterly hacky is necessary to achieve the flexibility and second we stick to semantic versioning and don’t deliberately break APIs.

Surprise Logging Framework

I have already written about Spring’s documentation issues. The Spring documentation says it uses Jakarta Commons Logging, but Spring Data apparently uses SLF4J. However, the Spring Data documentation makes absolutely no mention of this fact. This means if you are starting out with Spring Data, your very first experience will consist of a bunch of undocumented runtime errors, which you will figure out only after banging your head for a few hours on Stack Overflow.

The reason Spring is still on Commons Logging is backwards compatibility and has been outlined in 2009 already. If we started the core framework from scratch we’d use Slf4j. If you write Java software today and still you’re still on Commons Logging, you have quite a different problem anyway. The usage of Slf4j is consistent amongst all the Spring Data modules in the release train.

We don’t explicitly include this in the reference documentation as we strongly recommend to use tools like Maven or Gradle for dependency management. Using those it’s a matter of e.g. mvn dependency:tree to see what get’s pulled in from where. Explicitly listing dependencies in the reference documentation creates redundancy and increases the risk of inconsistencies which we want to avoid.

I haven’t really heard of people being heavily plagued by runtime errors related to logging when starting with Spring Data modules. Especially if you follow the sample projects or even use Spring Boot to get started finding dependencies is either really simple or even being taken care of for you.

No support for dynamic nature of documents

The biggest reason to use a document-oriented database like MongoDb is for its dynamic nature. Things like the fact that one document can be different from the other, so I can have multiple versions in the same collection and progressively upgrade my users. Documents can be nested. Key names don’t have to be known in advance, so I can insert a map of properties directly into my document.

I am not sure where this comes from, and would be highly interested what triggered the impression. You can have totally diverse documents in a collection with Spring Data, you can of course nest documents, you can also persist Maps or nested Maps easily. All of these original claims are so utterly and obviously wrong, they really make me wonder how you use Spring Data MongoDB, play with the smallest examples and still come with statements like these. It’s a bit like stating: “The bad thing about the Spring framework is that it doesn’t support transactions.”

Conclusion

In general the entire Spring Data team is very open to feedback from the community - we do our best to regularly answer questions in our forums and on stack overflow, usually have a look at incoming JIRA issues quickly. I can’t remember the blog author to have reached out for support or help in any of those - if I’ve overseen something here, I am sorry. Beyond that, we’ve seen an increasing amount of pull requests being submitted against Spring Data MongoDB recently and we happily go ahead and integrate them. So if there’s really something incredibly missing but important to you, there’s always the option to roll up your sleeves and give the feature a spike.

On a final note: if you’re a software developer ever being so frustrated about a piece of technology, that it makes you want to write a blogpost like the just discussed one - you know you’ve done something wrong. You could have reached out for help, raised your voice way before you reached this level of frustration. I encourage everyone to go ahead and ask questions, file bugs and get in touch with us. We’re eager to get better in any way we can and do our best to continuously improve - the software we write and the processes we work in. Feedback helps, rage posts do not.