05. ORMs

Falsehoods programmers believe about using ORMs

I’ll caveat this with the fact that this is opinionated and I make no apologies for it. Complicated ORMs can and do work, but I clearly have my views on them.

Quick terminology intro!

Quick explainer:

Predicate - this describes a condition that we are enforcing upon the data. table.column = 5 would be a predicate.
Hydration - this is when we take some data (say, id and name) from a database row and turn it into an easy-to-use object for our application.
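To make those two terms concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The users table, the User class and the sample rows are all made up for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str


# Throwaway in-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(5, "Ada"), (6, "Grace")],
)

# Predicate: "id = 5" is the condition we enforce upon the data.
row = conn.execute("SELECT id, name FROM users WHERE id = ?", (5,)).fetchone()

# Hydration: turn the raw (id, name) tuple into an easy-to-use object.
user = User(*row)
print(user)  # prints User(id=5, name='Ada')
```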

On queries and hydration

Developers on the whole don’t like to do the same work twice. They like to abstract problems to an interface and then not have to worry about what is happening underneath.

The first thing people do when they come across a database table where they want a single row is implement a class for their table with a ->find($id) function. Then they make it return an object that represents their database row. This kind of querying is excellent (with a few caveats) for bread and butter programming, but it starts to fall down in two areas:

efficient queries - what if I have 100,000 rows, but I only care about two of the columns? Why am I fetching all that extra data?
representing more complex data - things like GROUP BY, JOINs and subqueries

This is where your abstraction starts to fall apart at the seams:

if I’m just asking for 2 columns, how do I represent that in my model? What happens if I hydrate an object then someone tries to access a column that doesn’t exist?
if I’m aggregating data, then this no longer corresponds to a model at all! What am I returning?
if I’m joining an extra table, how can I represent that in my result set?
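A rough sketch of the find-style lookup described above, and of what goes wrong once only two columns are fetched. It uses Python and the built-in sqlite3 module purely for illustration; UserTable, User and the users table are hypothetical names, not taken from any particular ORM.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    id: int
    name: str
    email: str


class UserTable:
    """Hypothetical table class with a find-by-id lookup."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def find(self, user_id: int) -> Optional[User]:
        # Full-row lookup: every column maps onto a field, so hydration is unambiguous.
        row = self.conn.execute(
            "SELECT id, name, email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return User(*row) if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada', 'ada@example.com')")

full = UserTable(conn).find(1)

# Ask for only two columns and the row no longer fits the model.
partial_row = conn.execute("SELECT id, name FROM users WHERE id = 1").fetchone()
try:
    User(*partial_row)
except TypeError as exc:
    error = str(exc)  # complains that the 'email' argument is missing
```

This toy model fails loudly. A real ORM has to pick between failing, lazily loading the missing column, or handing back a half-filled object, and none of those options is obviously right.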

Well done! You’ve made the leap into a key realisation about database queries - meaningful, clear hydration can only really be achieved on queries that do nothing but filter rows in some form or another.

Once you’ve done a JOIN, your ORM model no longer makes any sense. Once you’ve aggregated, it no longer makes any sense. Once you’ve filtered down from pulling back all columns, your ORM model no longer makes any sense.

There are caveats to all of these in that Doctrine can kind of pull it off through a central entity manager, but suddenly the complexity of the solution has jumped through the roof.

One tiny JOIN caveat

…there is one more caveat on this, in that a JOIN can be used for two things:

to retrieve data from an additional related dataset and
to add a constraint to the data - with an INNER JOIN, it essentially adds the constraints of the ON to the WHERE clause

If we’re just using the JOIN to reduce the dataset, then it doesn’t really matter: the rows that come back still map onto a single model, so hydration still makes sense.
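As a sketch of that second use, the query below joins purely to narrow the result down to users who have at least one order, and still selects only users columns. The tables and data are invented, and the example uses Python’s built-in sqlite3 module. Note the DISTINCT: an INNER JOIN can repeat a user once per matching order, so this is an assumption about how you would want to de-duplicate.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0);
""")

# The JOIN only constrains which users come back; no orders columns are selected.
rows = conn.execute("""
    SELECT DISTINCT u.id, u.name
    FROM users u
    INNER JOIN orders o ON o.user_id = u.id
    ORDER BY u.id
""").fetchall()

# Every row is still a complete users row, so hydration works as before.
users = [User(*row) for row in rows]
print(users)  # prints [User(id=1, name='Ada')]
```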

Scalar queries

So, we can have simple ORM queries that end with hydration. What if I want data I can only obtain from GROUP BYs or JOINs? That, my friend, is what we refer to as a scalar query.

The data we can represent when we’re not constrained by the columns of a single table can be magnificent. This is indeed what SQL was designed to do! Use the power of the relationships between your data to get exactly what you’re after!
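Here is a small sketch of what this post calls a scalar query: a JOIN plus a GROUP BY whose rows don’t correspond to any one table. It uses Python’s built-in sqlite3 module with made-up users and orders tables, and returns plain dictionaries rather than trying to hydrate a model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0);
""")

# Aggregated, joined data: there is no single model this could be hydrated into.
rows = conn.execute("""
    SELECT u.name AS name,
           COUNT(o.id) AS order_count,
           COALESCE(SUM(o.total), 0) AS total_spent
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    GROUP BY u.id, u.name
    ORDER BY u.name
""").fetchall()

# Keep the result as plain data shaped by the query, not by a table.
report = [dict(row) for row in rows]
```

The shape of report is dictated entirely by the SELECT list, which is exactly why it doesn’t fit a per-table model.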

Hang on, if all I can meaningfully do to hydrate my models is filter my data, aren’t the queries I’ll write to query for hydration and for complex data different?

I mean, there are definitely arguments to be had, but yes, they are. Which nicely leads us into the next section…

Why are we ORMing anyway?

ORM = Object-Relational Mapper. The name is entirely about simple querying and hydration. If we took the name at face value, there would be no place for scalar queries at all inside an ORM system. In reality, the ORM tends to end up taking responsibility for all the code that communicates with the database, which means that an ORM will potentially be doing:

Migrations
Simple Queries (that can easily be hydrated)
Complex Queries (where hydration doesn’t make sense)

This is where the difference between scalar and non-scalar queries comes in. They feel similar in principle, but in reality, you’re far better thinking of them as separate beasts.

I think the TL;DR here is that if you have a complicated query to run to return a complicated set of data from a database, just write the SQL (or DQL if that floats your boat). Most ORMs should give you an escape hatch - don’t be afraid to use it.
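One way to picture the split, as a sketch rather than a recommendation for any particular library: a data-access class with a hydrating find for the simple case, and a separate raw method as the escape hatch. ProductStore, Product and the products table are hypothetical, and the example uses Python’s built-in sqlite3 module.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence, Tuple


@dataclass
class Product:
    id: int
    name: str
    price: float


class ProductStore:
    """Hypothetical data-access class with two deliberately separate paths."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def find(self, product_id: int) -> Optional[Product]:
        # Simple query: full row in, hydrated object out.
        row = self.conn.execute(
            "SELECT id, name, price FROM products WHERE id = ?", (product_id,)
        ).fetchone()
        return Product(*row) if row else None

    def raw(self, sql: str, params: Sequence[Any] = ()) -> List[Tuple[Any, ...]]:
        # Escape hatch: hand-written SQL in, plain tuples out, no hydration.
        return self.conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "Tea", 3.5), (2, "Coffee", 4.5)],
)

store = ProductStore(conn)
tea = store.find(1)
stats = store.raw("SELECT COUNT(*), AVG(price) FROM products")
```

Keeping the two paths apart means find never has to pretend that an aggregate is a Product, and raw never has to guess which model its rows belong to.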

Why are you so passionate about this?

I had a former employer who wrote an ORM that solved our specific migration issues and was fine for simple queries but had no logical underpinnings for what a complex query should look like, creating a horrific mash that hurts me inside to this day.

At some point I’m clearly going to end up building an ORM based on what I think, but hopefully I can distract myself with other projects rather than adding another ORM to the world, no matter how beautiful and clean I think it’d be.