I've been updating some training materials recently, and thinking about better ways of teaching and talking about JPA. One of the things I've been thinking about is how we have typically used JPA, and how that should change given the pains I've experienced (and observed).
JPA is often seen as a set of annotations (or XML files) that provide O/R (object-relational) mapping information. And most developers think that the more mapping annotations they know and use, the more benefits they get. But the past few years of wrestling with small to medium monoliths/systems (with about 200 tables/entities) have taught me something else.
TL;DR
- Reference entities by ID (only map entity relationships within an aggregate)
- Don't let JPA steal your Identity (avoid `@GeneratedValue` when you can)
- Use ad-hoc joins to join unrelated entities
Reference Entities by Identifier
Only map entity relationships within an aggregate.
Tutorials (and training courses) typically try to cover every possible relationship mapping. After basic mappings, many start with a simple uni-directional `@ManyToOne` mapping, then proceed to bi-directional `@OneToMany` and `@ManyToOne`. Unfortunately, more often than not, they fail to point out explicitly that it is perfectly fine to not map the relationship. So beginners often complete the training thinking that it would be a mistake to not map a related entity. They mistakenly think that a foreign key field must be mapped as a related entity.
In fact, it is not an error to change the `@ManyToOne` mapping below…
```java
@Entity
public class SomeEntity {
    // ...
    @ManyToOne
    private Country country;
    // ...
}

@Entity
public class Country {
    @Id
    private String id; // e.g. US, JP, CN, CA, GB, PH
    // ...
}
```
…into a basic field that contains the primary key value of the related entity.
```java
@Entity
public class SomeEntity {
    // ...
    @Column
    private String countryId;
    // ...
}

@Entity
public class Country {
    @Id
    private String id; // e.g. US, JP, CN, CA, GB, PH
    // ...
}
```
Why is this a problem?
Mapping all entity relationships increases the chances of unwanted traversals that usually lead to unnecessary memory consumption. This also leads to an unwanted cascade of `EntityManager` operations.
This may not be much if you're dealing with just a handful of entities/tables. But it becomes a maintenance nightmare when working with dozens (if not hundreds) of entities.
When do you map a related entity?
Map related entities only when they are within an Aggregate (in DDD).
Aggregate is a pattern in Domain-Driven Design. A DDD aggregate is a cluster of domain objects that can be treated as a single unit. An example is an order and its line items: these are separate objects, but it's useful to treat the order (together with its line items) as a single aggregate.
```java
@Entity
public class Order {
    // ...
    @OneToMany(mappedBy = "order", ...)
    private List<OrderItem> items;
    // ...
}

@Entity
public class OrderItem {
    // ...
    @ManyToOne(optional = false)
    private Order order;
    // ...
}
```
More modern approaches to aggregate design (see Vaughn Vernon's Implementing Domain-Driven Design) advocate a cleaner separation between aggregates. It is a good practice to refer to an aggregate root by storing its ID (unique identifier), not a full reference.
If we expand the simple order example above, the line-item (`OrderItem` class) should not have a `@ManyToOne` mapping to the product (since it is another aggregate in this example). Instead, it should just have the ID of the product.
```java
@Entity
public class Order {
    // ...
    @OneToMany(mappedBy = "order", ...)
    private List<OrderItem> items;
    // ...
}

@Entity
public class OrderItem {
    // ...
    @ManyToOne(optional = false)
    private Order order;

    // @ManyToOne
    // private Product product; // <-- Avoid this!
    @Column
    private ... productId;
    // ...
}
```
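To make the idea concrete, here is a minimal plain-Java sketch of the same model with the JPA annotations stripped out so that it runs on its own. It only illustrates the shape: the line item carries the product's ID, and the product is looked up explicitly where it is needed. The `String` type for `productId`, the `priceInCents` field, and the `ProductCatalog` and `OrderPricing` classes are assumptions made for this illustration (the catalog is a hypothetical in-memory stand-in for a repository).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A separate aggregate root; other aggregates refer to it by ID only.
class Product {
    private final String id;
    private final long priceInCents;

    Product(String id, long priceInCents) {
        this.id = id;
        this.priceInCents = priceInCents;
    }

    String getId() { return id; }
    long getPriceInCents() { return priceInCents; }
}

// Part of the Order aggregate: holds the product's ID, not a Product reference.
class OrderItem {
    private final String productId; // ID type is an assumption for this sketch
    private final int quantity;

    OrderItem(String productId, int quantity) {
        this.productId = productId;
        this.quantity = quantity;
    }

    String getProductId() { return productId; }
    int getQuantity() { return quantity; }
}

// Aggregate root that owns its line items.
class Order {
    private final List<OrderItem> items = new ArrayList<>();

    void addItem(String productId, int quantity) {
        items.add(new OrderItem(productId, quantity));
    }

    List<OrderItem> getItems() { return items; }
}

// Hypothetical stand-in for a repository; a real one would query the database.
class ProductCatalog {
    private final Map<String, Product> byId = new HashMap<>();

    void add(Product product) { byId.put(product.getId(), product); }
    Product findById(String id) { return byId.get(id); }
}

// The product is resolved explicitly, only where it is actually needed.
class OrderPricing {
    static long totalInCents(Order order, ProductCatalog catalog) {
        long total = 0;
        for (OrderItem item : order.getItems()) {
            Product product = catalog.findById(item.getProductId());
            total += product.getPriceInCents() * item.getQuantity();
        }
        return total;
    }
}
```

Nothing in `Order` or `OrderItem` can accidentally traverse into `Product`; any code that needs the product has to ask for it by ID.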
But… what if the `Product` (aggregate root entity) has its `@Id` field mapped as `@GeneratedValue`? Are we forced to persist/flush first and then use the generated ID value?
And, what about joins? Can we still join those entities in JPA?
Don't Let JPA Steal Your Identity
Using `@GeneratedValue` may initially make the mapping simple and easy to use. But when you start referencing other entities by ID (and not by mapping a relationship), it becomes a challenge.
If the `Product` (aggregate root entity) has its `@Id` field mapped as `@GeneratedValue`, then calling `getId()` on an instance that has not been persisted yet may return `null`. When it returns `null`, the line-item (`OrderItem` class) will not be able to reference it!
In an environment where all entities always have a non-`null` `Id` field, referencing any entity by ID becomes easier. Furthermore, having non-`null` `Id` fields all the time makes `equals(Object)` and `hashCode()` easier to implement.
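As a rough illustration of that last point, here is a plain-Java sketch (annotations omitted so it runs on its own) of an entity whose identity is fixed at construction time. It reuses the `Country` entity from earlier; the `name` field is an assumption added for the example. Because the ID can never be `null` and never changes, `equals` and `hashCode` can rely on it alone.

```java
import java.util.Objects;

class Country {
    private final String id; // e.g. US, JP; assigned once, never null
    private String name;     // assumed extra field, not part of identity

    Country(String id, String name) {
        this.id = Objects.requireNonNull(id, "id must not be null");
        this.name = name;
    }

    String getId() { return id; }
    String getName() { return name; }

    // Identity-based equality: two instances with the same ID are the same entity.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Country)) return false;
        return id.equals(((Country) other).id);
    }

    // Stable for the lifetime of the object, since the ID never changes.
    @Override
    public int hashCode() {
        return id.hashCode();
    }
}
```

With a generated ID, the hash code would either have to ignore the ID or change once the entity is persisted, which is exactly the awkwardness a pre-assigned ID avoids.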
And because all `Id` fields become explicitly initialized, all (aggregate root) entities have a `public` constructor that accepts the `Id` field value. And, as I've posted long ago, a `protected` no-args constructor can be added to keep JPA happy.
```java
@Entity
public class Order {

    @Id
    private Long id;
    // ...

    public Order(Long id) {
        // ...
        this.id = id;
    }

    public Long getId() {
        return id;
    }
    // ...

    protected Order() { /* as required by ORM/JPA */ }
}
```
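Where does the ID value come from if JPA does not generate it? One option is to obtain it from some ID source before constructing the entity. The sketch below is plain Java (no annotations) and entirely hypothetical: `InMemoryIdSequence` is an in-memory stand-in for something like a database sequence, and `OrderFactory` is a name I made up for the illustration.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Plain-Java version of the Order entity: the ID is required up front.
class Order {
    private final Long id;

    Order(Long id) {
        if (id == null) {
            throw new IllegalArgumentException("id is required");
        }
        this.id = id;
    }

    Long getId() { return id; }
}

// Hypothetical ID source; stands in for e.g. a database sequence.
class InMemoryIdSequence implements Supplier<Long> {
    private final AtomicLong next;

    InMemoryIdSequence(long start) {
        this.next = new AtomicLong(start);
    }

    @Override
    public Long get() {
        return next.getAndIncrement();
    }
}

// Creates orders that already have their identity before being persisted.
class OrderFactory {
    private final Supplier<Long> ids;

    OrderFactory(Supplier<Long> ids) {
        this.ids = ids;
    }

    Order newOrder() {
        return new Order(ids.get());
    }
}
```

An order created this way can be referenced by ID immediately, without persisting or flushing anything first.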
But beware! When using Spring Data JPA to `save()` an entity that does not use `@GeneratedValue` on its `@Id` field, an unnecessary SQL `SELECT` is issued before the expected `INSERT`. This is due to `SimpleJpaRepository`'s `save()` method (shown below). It relies on the presence of the `@Id` field (non-`null` value) to determine whether to call `persist(Object)` or `merge(Object)`.
```java
public class SimpleJpaRepository /* ... */ {
    // ...
    @Override
    public <S extends T> S save(S entity) {
        // ...
        if (entityInformation.isNew(entity)) {
            em.persist(entity);
            return entity;
        } else {
            return em.merge(entity);
        }
    }
}
```
The astute reader will notice that, if the `@Id` field is never `null`, the `save()` method will always call `merge()`. This causes the unnecessary SQL `SELECT` (before the expected `INSERT`).
Fortunately, the work-around is simple: implement `Persistable<ID>`.
```java
@MappedSuperclass
public abstract class BaseEntity<ID> implements Persistable<ID> {

    @Transient
    private boolean persisted = false;

    @Override
    public boolean isNew() {
        return !persisted;
    }

    @PostPersist
    @PostLoad
    protected void setPersisted() {
        this.persisted = true;
    }
}
```
The above also implies that all updates to entities must be done by loading the existing entity into the persistence context first, and applying changes to the managed entity.
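To see why the flag makes a difference, here is a small simulation of the persist-or-merge decision in plain Java. It is a deliberately simplified model, not Spring Data's actual code: `Newness` is a hypothetical stand-in for `Persistable`, `TrackedEntity` mirrors the `BaseEntity` above without annotations, and `markPersisted()` plays the role of the `@PostPersist`/`@PostLoad` callback.

```java
// Hypothetical stand-in for Spring Data's Persistable; only isNew() is modelled.
interface Newness {
    boolean isNew();
}

// Mirrors BaseEntity without JPA: the flag flips once the entity has been
// persisted or loaded.
abstract class TrackedEntity implements Newness {
    private boolean persisted = false;

    @Override
    public boolean isNew() {
        return !persisted;
    }

    // In the real entity this is triggered by @PostPersist / @PostLoad.
    void markPersisted() {
        this.persisted = true;
    }
}

class Order extends TrackedEntity {
    private final Long id;

    Order(Long id) { this.id = id; }

    Long getId() { return id; }
}

class SaveDecision {
    // Simplified default: an entity counts as new only while its ID is null.
    static String byNullId(Long id) {
        return id == null ? "persist" : "merge";
    }

    // When the entity reports its own state, the ID value no longer matters.
    static String byFlag(Newness entity) {
        return entity.isNew() ? "persist" : "merge";
    }
}
```

A freshly constructed `Order` with a pre-assigned ID takes the merge path under the null-ID check, but the persist path under the flag. Once `markPersisted()` has run, the flag routes it to merge. That is also why updates have to go through a loaded (managed) instance: a brand-new object carrying an existing ID would still report itself as new.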
Use Ad-hoc Joins to Join Unrelated Entities
And, what about joins? Now that we reference other entities by ID, how can we join unrelated entities in JPA?
In JPA version 2.2, unrelated entities cannot be joined with an explicit `JOIN` clause, because a join has to follow a mapped relationship. I cannot confirm whether this has been standardized in version 3.0, where all `javax.persistence` references were renamed to `jakarta.persistence`.
Given the `OrderItem` entity below, the absence of a `@ManyToOne` mapping prevents it from being joined with the `Product` entity.
```java
@Entity
public class Order {
    // ...
}

@Entity
public class OrderItem {
    // ...
    @ManyToOne(optional = false)
    private Order order;

    @Column
    private ... productId;
    // ...
}
```
Thankfully 😊, Hibernate 5.1.0+ (released back in 2016) and EclipseLink 2.4.0+ (released back in 2012) support joins of unrelated entities. These joins are also referred to as ad-hoc joins.
```sql
SELECT o
FROM Order o
JOIN o.items oi
JOIN Product p ON (p.id = oi.productId) -- supported in Hibernate and EclipseLink
```
Also, this has been raised as an API issue (Support JOIN/ON for two root entities). I really hope that it will become a standard soon.
In Closing
What do you think about the above changes? Are you already using similar approaches? Do you use native SQL to explicitly retrieve a generated value (e.g. sequence object) to create an entity with a non-`null` `Id` field? Do you use entity-specific ID types to differentiate ID values? Let me know in the comments below.