Efficiently delete data with JPA and Hibernate

You may come to the situation where you have to perform a bulk deletion on a huge amount of datasets stored in a relational database. If you use JPA with Hibernate as underlying OR mapper, you might try to call the remove() method of the EntityManager in a way like the following:

public void removeById(long id) {
    RootEntity rootEntity = entityManager.getReference(RootEntity.class, id);
    entityManager.remove(rootEntity);
}

First of all, we load a reference representation of the entity we want to delete and then pass this reference to the EntityManager. Let’s assume the RootEntity from above has a child relation to a class called ChildEntity:

@OneToMany(mappedBy = "rootEntity", fetch = FetchType.EAGER, cascade = CascadeType.ALL)
private Set childEntities = new HashSet(0);

If we now turn on the property show_sql of hibernate, we will wonder what SQL statements are issued:

    select
        rootentity0_.id as id5_1_,
        rootentity0_.field1 as field2_5_1_,
        rootentity0_.field2 as field3_5_1_,
        childentit1_.PARENT as PARENT5_3_,
        childentit1_.id as id3_,
        childentit1_.id as id4_0_,
        childentit1_.field1 as field2_4_0_,
        childentit1_.field2 as field3_4_0_,
        childentit1_.PARENT as PARENT4_0_
    from
        ROOT_ENTITY rootentity0_
    left outer join
        CHILD_ENTITY childentit1_
            on rootentity0_.id=childentit1_.PARENT
    where
        rootentity0_.id=?

    delete
    from
        CHILD_ENTITY
    where
        id=?

   delete
   from
       ROOT_ENTITY
   where
       id=?

Why does Hibernate first load all data into memory in order to delete this data immediately afterwards? The reason is that JPA’s lifecycle requires that the object is in “managed” state, before it can be deleted. Only in this state all lifecycle functionality like interceptors is available (see here). Therefore Hibernate issues a SELECT query before the deletion in order to transfer both RootEntity and ChildEntity to the “managed” state.
But what can we do, if we just want to delete RootEntity and ChildEntity, if we know the id of RootEntity? The answer is to use a simple DELETE query like the following one. But due to the integrity constraint on the child table, we first have to delete all depending child entities. The following code demonstrates how:

List childIds = entityManager.createQuery("select c.id from ChildEntity c where c.rootEntity.id = :pid").setParameter("pid", id).getResultList();
for(Long childId : childIds) {
    entityManager.createQuery("delete from ChildEntity c where c.id = :id").setParameter("id", childId).executeUpdate();
}
entityManager.createQuery("delete from RootEntity r where r.id = :id").setParameter("id", id).executeUpdate();

The above code results in the three SQL statements we would have expected by calling remove(). Now you may argue, that this way of deletion is more complicated than just calling the EntityManager’s remove() method. It also ignores annotations like @OneToMany and @ManyToOne we have placed in the two entity classes.
So why not write some code that uses the knowledge about the two entities that already exists in the two class files? First of all, we look for @OneToMany annotations using reflection in the RootEntity class, extract the type of the child entity and then look for its back relation field annotated with @ManyToOne. Having done this, we can easily write the three SQL statements in a more generic way:

public void delete(EntityManager entityManager, Class parentClass, Object parentId) {
    Field idField = getIdField(parentClass);
    if (idField != null) {
        List oneToManyFields = getOneToManyFields(parentClass);
        for (Field field : oneToManyFields) {
            Class childClass = getFirstActualTypeArgument(field);
            if (childClass != null) {
                Field manyToOneField = getManyToOneField(childClass, parentClass);
                Field childClassIdField = getIdField(childClass);
                if (manyToOneField != null && childClassIdField != null) {
                    List childIds = entityManager.createQuery(String.format("select c.%s from %s c where c.%s.%s = :pid", childClassIdField.getName(), childClass.getSimpleName(), manyToOneField.getName(), idField.getName())).setParameter("pid", parentId).getResultList();
                    for (Long childId : childIds) {
                        entityManager.createQuery(String.format("delete from %s c where c.%s = :id", childClass.getSimpleName(), childClassIdField.getName())).setParameter("id", childId).executeUpdate();
                    }
                }
            }
        }
        entityManager.createQuery(String.format("delete from %s e where e.%s = :id", parentClass.getSimpleName(), idField.getName())).setParameter("id", parentId).executeUpdate();
    }
}

The methods getFirstActualTypeArgument(), getManyToOneField(), getIdField() and getOneToManyFields() in the code above are not depicted here, but do what their name sounds like. Once implemented we can easily delete all entities beginning with the root of the tree.

A simple example application that can be used to examine the behavior and solution described above, can be found on github.

Tags: , , , , ,

Leave a comment