Load a tree with JPA and Hibernate

Every now and again I visit a Tikal customer who needs to model a hierarchical data structure in the domain model. It can be a department hierarchy, a category hierarchy, a geographic hierarchy, or any other hierarchy the customer might need. Usually such a hierarchy is modeled as a tree data structure.
There are a few implementation variants of the tree data structure in Java. Let's look at a very basic implementation using a class named “Node”. The “Node” class can easily be mapped to a table named “tree” in the DB, which holds all the nodes of the tree. Each node record has a foreign key column, “parent_id”, that references the id (the table's primary key) of its parent node. The “Node” entity has a property called “children”: a persistent ordered list of child nodes (assuming the order of the children is meaningful), which is a new feature in JPA 2. We map the join column “parent_id” for the children association. Here is a sample of the “Node” entity Java class:
 


@Entity 
@Table(name = "tree") 
public class Node { 

	@Id 
	@GeneratedValue 
	private Integer id; 

	@NotNull 
	private String name; 

	@OneToMany 
	@JoinColumn(name = "parent_id") 
	@OrderColumn
	private List<Node> children = new LinkedList<Node>(); 

	... 
} 

 

The Problem

What is the best way to load the tree per end-user request?

Let's assume, for our discussion, that the tree is not huge (fewer than a few hundred nodes) and that you really need to load all of it (whether you do is itself a question worth asking). A naive approach is to load the tree's root node and then traverse its descendants.
The problem with this approach is that it issues N queries for a tree with N nodes: for each and every node in the tree, Hibernate issues an SQL select that fetches the node's children.
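The cost of the naive traversal can be illustrated without a database. The following is only a plain-Java sketch, not Hibernate code: the class NaiveTraversalDemo and its in-memory "table" are hypothetical stand-ins, and every call to selectChildren() plays the role of one lazy-loading select.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java stand-in for lazy loading: every children lookup counts as one "query". */
public class NaiveTraversalDemo {

    /** Hypothetical in-memory "table": parent id -> ids of its children. */
    private final Map<Integer, List<Integer>> childrenByParent = new HashMap<>();
    private int queryCount = 0;

    /** Registers a node row; parentId is null for the root. */
    public void addRow(int id, Integer parentId) {
        childrenByParent.computeIfAbsent(id, k -> new ArrayList<>());
        if (parentId != null) {
            childrenByParent.computeIfAbsent(parentId, k -> new ArrayList<>()).add(id);
        }
    }

    /** Simulates "select ... where parent_id = ?" and counts the round trip. */
    private List<Integer> selectChildren(int parentId) {
        queryCount++;
        return childrenByParent.getOrDefault(parentId, Collections.emptyList());
    }

    /** Depth-first walk that asks for the children of each visited node. */
    public int traverse(int nodeId) {
        int visited = 1;
        for (int childId : selectChildren(nodeId)) {
            visited += traverse(childId);
        }
        return visited;
    }

    public int getQueryCount() {
        return queryCount;
    }

    public static void main(String[] args) {
        NaiveTraversalDemo db = new NaiveTraversalDemo();
        db.addRow(1, null);
        db.addRow(2, 1);
        db.addRow(3, 1);
        db.addRow(4, 2);
        int visited = db.traverse(1);
        System.out.println(visited + " nodes visited, " + db.getQueryCount() + " child lookups");
    }
}
```

For this four-node tree the walk performs four child lookups, one per node, which is the N-queries behaviour described above.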

So is there a better way to load the whole tree and avoid these N queries?

 

The Solution

Luckily, there is a better way...
We can take advantage of the persistence context (part of the JPA spec). When we issue a query that loads all nodes with their children, all nodes are loaded into the persistence context. The same node may appear in the result as a parent, as a child, or as both. However, the JPA spec requires that each entity instance exist only once in a given persistence context, so all those references point to the same instance. The result is that the whole hierarchy comes out fully wired together. The last thing we need to do is load the root node, which is served from the persistence context rather than the DB, since it was already loaded by the query.
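The "one instance per id" guarantee is what does the wiring. Here is a minimal plain-Java sketch of that idea, assuming a hand-rolled identity map rather than a real persistence context: the class IdentityMapDemo, its nested Node class and the {parentId, childId} row format are all hypothetical and only mimic what a provider does internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of an identity map, loosely mimicking a persistence context. */
public class IdentityMapDemo {

    static class Node {
        final int id;
        final List<Node> children = new ArrayList<>();

        Node(int id) {
            this.id = id;
        }
    }

    /** At most one instance per id, like entities managed by a persistence context. */
    private final Map<Integer, Node> managed = new HashMap<>();

    /** Returns the already-managed instance for the id, creating it on first sight. */
    private Node resolve(int id) {
        return managed.computeIfAbsent(id, Node::new);
    }

    /**
     * Consumes rows shaped like the result of a parent-to-children left join:
     * {parentId, childId}, with childId == null for a node without children.
     */
    public void loadJoinRows(Integer[][] rows) {
        for (Integer[] row : rows) {
            Node parent = resolve(row[0]);
            if (row[1] != null) {
                parent.children.add(resolve(row[1]));
            }
        }
    }

    /** Stand-in for find(): answered from the map, with no further loading. */
    public Node find(int id) {
        return managed.get(id);
    }

    public static void main(String[] args) {
        IdentityMapDemo ctx = new IdentityMapDemo();
        ctx.loadJoinRows(new Integer[][] {
            {1, 2}, {1, 3}, {2, 4}, {3, null}, {4, null}
        });
        Node root = ctx.find(1);
        Node two = root.children.get(0);
        System.out.println(two == ctx.find(2));      // same instance as child and as parent
        System.out.println(two.children.get(0).id);  // the grandchild is reachable from the root
    }
}
```

Node 2 shows up once as a child of node 1 and once as the parent of node 4, yet both rows resolve to the same object, so the grandchild is reachable from the root after a single pass over the rows.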

So our data access method for loading the whole tree remains simple. Let's have a look at it:

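public Node getTree() {
    // Loads every node, with its children, into the persistence context.
    entityManager.createNamedQuery("findAllNodesWithTheirChildren").getResultList();
    // Served from the persistence context, so no extra select is issued.
    Node root = entityManager.find(Node.class, ROOT_ID);
    return root;
}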

 

Here is the explanation for the code above:
The injected entityManager is used in our DAO class. ROOT_ID is a constant that holds the id of the root node, kept in an application cache. As we can see, the method needs only two statements to get the whole tree.
The first statement runs a named query called “findAllNodesWithTheirChildren”. This query loads all nodes from the DB with a left join fetch of their children.
Here is the named query implementation for our needs:
select n from Node n left join fetch n.children

Please note that we do not make explicit use of the result list in the method above. It is used implicitly: all the nodes are automatically loaded into the persistence context.

The second statement just loads the root node. Since the root node has already entered the persistence context (due to the first statement), there is no extra select for the root.
 

Working With Bidirectional Tree

Another use case is a tree built from nodes with a bi-directional parent-child association. In this case we have both an association from the parent node to its children and a reverse association from the child to its parent. Since the bi-directional association has an index column (to preserve the order of the children in the list), we have two options. One is to keep the ManyToOne side as the owning side and map the index column as a property of the child entity. The other is to make the OneToMany side the owning side by removing the "mappedBy" element and setting the @JoinColumn on the ManyToOne side to "insertable=false" and "updatable=false".

I chose the second option. Here is the code with the added association named “parent”:

@Entity
@Table(name = "tree")
public class Node {

	@Id
	@GeneratedValue
	private Integer id;

	@NotNull
	private String name;

	@OneToMany
	@OrderColumn
	@JoinColumn(name = "parent_id")
	private List<Node> children = new LinkedList<Node>();
	
	@ManyToOne(fetch=FetchType.LAZY)
	@JoinColumn(name = "parent_id",insertable=false,updatable=false)
	private Node parent;

	…
}

 

Assuming we want to use the same trick of loading all nodes, should we change our query by adding a fetch join for the “parent” association? The answer is … NO!
The reason we should NOT add a fetch join on the parent (even though it is declared as lazy) is that the “parent” association is initialized automatically, without extra selects: all nodes are loaded by the query, and as they are loaded, the “parent” association of each child node is initialized. For this reason the query should remain as is, with a fetch join on the children only.

 

Summary

We are done now. We ended up with a single select that loads the whole tree (all N nodes) instead of N selects. This holds for both the unidirectional and the bidirectional tree; when we changed the mapping to be bidirectional, we did not change the query that loads the tree.
Loading all the nodes with a single query replaces N round trips to the DB with one, so it can be expected to run much faster, potentially by an order of magnitude.
Of course, if the tree is read-mostly, it is also strongly recommended to use the 2nd level cache, which can spare even the single select to the DB per end-user request.

Comments

Please forgive me, I'm a newbie to Hibernate. Could you please provide an example of how to write the query? I'm missing something and can't seem to figure it out.

The query can be as follows:

select n from Node n
left join fetch n.children
left join fetch n.parent

 

How would you do this when you have a many-to-many relationship, in which a node can have more than one parent and multiple children?

You can fetch join more than one relationship, as follows:

select n from Node n
left join fetch n.childrenOfTypeA
left join fetch n.parentOfTypeA
left join fetch n.childrenOfTypeB
left join fetch n.parentOfTypeB

 

Hi, I am trying to display menus dynamically (tree structure) so that users can access the links they have access to. I tried a similar approach but could not achieve that. Can you advise me on this? It would be great if you could give a sample of the above approach, if possible.

Assuming you want to load the menu tree in advance, you can define your entity as follows:

@Entity 
@Table(name = "menus") 
@NamedQueries({
@NamedQuery(name="findRootMenusWithDescentants", 
			query="select m from Menu m left join fetch m.submenus")})
public class Menu { 

	@Id 
	@GeneratedValue 
	private Integer id; 

	@NotNull 
	private String name; 

	@OneToMany 
	@JoinColumn(name = "parent_menu_id") 
	@OrderColumn
	private List<Menu> submenus = new LinkedList<Menu>(); 

	... 
} 

 

 

And then, you can write your DAO implementation:

public class MenuDaoJPAImpl implements MenuDao {
	@PersistenceContext
	private EntityManager entityManager;
	
	public Menu findRootMenusWithDescentants() {
		// Load all menus with their submenus into the persistence context.
		entityManager.createNamedQuery("findRootMenusWithDescentants").getResultList();
		// The root menu is now served from the persistence context.
		return entityManager.find(Menu.class, ROOT_MENU_ID);
	}
	...
}

 

 

 Hi Yanai, 

 

After reading your article, I tried this approach with JPA 2.0 + EclipseLink:

 



public class Node {	
    public  Node(){};
   
    @Id
    @Column(name="ID")    
    @Setter 
    @Getter 
    String oid;
	
    
    @Setter 
    @Getter 
    @Column(name="data")
	String data;
    
    @Getter 
    @OneToMany(fetch=FetchType.EAGER)
    @JoinColumn(name="PARENTID")
    public List<Node > children = new LinkedList<Node >(); 
   
 
}

 

It works fine, but it generates one select per row. Depending on the FetchType, it either loads all children when you do a find of the root (FetchType.EAGER) or lazy-loads them when you explicitly access the children, but it is always one select per row.

 

If you have this structure, for example:

 

  • Node0
  • Node1, parent Node0
  • Node2, parent Node0

 

When I do a "find" of the root node, it makes 3 selects. Depending on the FetchType, it makes these 3 selects at different times, but I have checked that it always does 3, 4, or ... "n", depending on the depth of the tree...

 

I am doing something wrong, but I don't know what! :-(

Any suggestion or idea?

Thank you very much.

 

Azim,

 

Hi Azuimutz,
 
Please note that EAGER defines whether the relationship should be loaded, not how it should be fetched from the database. I suggest you remove the EAGER, since in other cases you may want to load only a particular node. When you want to load the whole tree, please use a query that applies the "fetch" semantics, as explained in the article.
 
Thank you,
Yanai

 

 Hi Yanai!

 

I have removed the EAGER fetch type and created a named query:



@Entity

@NamedQueries({
@NamedQuery(name="findRootMenusWithDescentants", query="select a  from NodeTable a left join fetch a.children")})
@Table(name = "TABLE")
public class NodeTable {	
 public   NodeTable (){};
   
    @Id
    @Column(name="OID")    
    @Setter 
    @Getter 
    String oid;
	
    
    @Setter 
    @Getter 
    @Column(name="DATA")
    String DATA;
    
  
    @OneToMany
    @JoinColumn(name="PARENTID")
    public List<NodeTable > children = new LinkedList<NodeTable >(); 

 

Just executing the named query: List<NodeTable> node = getJpaTemplate().findByNamedQuery("findRootMenusWithDescentants");

I see the resulting queries in the logs:

 

ion(31172322)--Thread(Thread[http-8080-1,5,main])--SELECT t1.OID, t1.DATA, t0.OID, t0.DATA FROM TABLE t1 LEFT OUTER JOIN TABLE t0 ON (t0.PARENTOID = t1.OID)
[EL Fine]: 2012-03-03 19:07:00.828--ServerSession(5951541)--Connection(16645359)--Thread(Thread[http-8080-1,5,main])--SELECT OID, DATA FROM TABLE WHERE (PARENTOID = ?)
	bind => [1]
[EL Fine]: 2012-03-03 19:07:01.0--ServerSession(5951541)--Connection(21343203)--Thread(Thread[http-8080-1,5,main])--SELECT OID, DATA FROM TABLE WHERE (PARENTOID = ?)
	bind => [2]
[EL Fine]: 2012-03-03 19:07:01.171--ServerSession(5951541)--Connection(16501193)--Thread(Thread[http-8080-1,5,main])--SELECT OID, DATA FROM TABLE WHERE (PARENTOID = ?)
	bind => [21]
[EL Fine]: 2012-03-03 19:07:01.281--ServerSession(5951541)--Connection(29651212)--Thread(Thread[http-8080-1,5,main])--SELECT OID, DATA FROM TABLE WHERE (PARENTOID = ?)
	bind => [12]
[EL Fine]: 2012-03-03 19:07:01.343--ServerSession(5951541)--Connection(31055936)--Thread(Thread[http-8080-1,5,main])--SELECT OID, DATA FROM TABLE WHERE (PARENTOID = ?)
	bind => [211]
[EL Fine]: 2012-03-03 19:07:01.421--ServerSession(5951541)--Connection(10168117)--Thread(Thread[http-8080-1,5,main])--SELECT OID, DATA FROM TABLE WHERE (PARENTOID = ?)
	bind => [22]
[EL Fine]: 2012-03-03 19:07:01.484--ServerSession(5951541)--Connection(12193236)--Thread(Thread[http-8080-1,5,main])--SELECT OID, DATA FROM TABLE WHERE (PARENTOID = ?)
	bind => [11]

 

As you can see, the named query executes a left join but also a query per node... I think I am doing something wrong, but I don't know what...

I supposed that this solution would issue one global query (the left join fetch), but I don't know why it then issues one query per node.

Any idea or suggestion?

Thank you very much,

 

 Hi,

 

Sorry for the late response. Anyway, I haven't tried it with EclipseLink, but I tried your code with JPA 2 + Hibernate, and it produces only one query. In this sense Hibernate is more efficient, since it caches all the children associations better, without the need for additional queries.

 

Thank you,

Yanai

Yanai's approach is great! But maybe there is a little defect. I wonder whether the entity with the NamedQuery should be:

 

@Entity

@NamedQueries({
@NamedQuery(name="findRootMenusWithDescentants", query="select a  from NodeTable a left join fetch a.parent")})
@Table(name = "TABLE")
public class NodeTable { 
 public   NodeTable (){};
  
    @Id
    @Column(name="OID")   
    @Setter
    @Getter
    String oid;
 
   
    @Setter
    @Getter
    @Column(name="DATA")
    String DATA;
    
  

    @ManyToOne
    @JoinColumn(name="PARENTID")
    private NodeTable parent;

    @OneToMany(mappedBy="parent")
    public List<NodeTable > children = new LinkedList<NodeTable >();

 

 

I tested this with EclipseLink. Only one query was executed, but I haven't tested it with your original named query.

  

 

Please note that Azuimuts has a uni-directional mapping (so there is no parent in his example). With Hibernate this works fine with both uni-directional and bi-directional mappings.

 

In addition, please note that your query is missing the "fetch" join to the children. This means that the "children" association of each and every node will NOT be loaded by your query. To check this, try to traverse your tree from the root to the leaves. In my opinion, you will get N+1 queries.

To prevent this you must also add the fetch join to the children, as follows:

 

@NamedQuery(name="findRootMenusWithDescentants", query="select a from NodeTable a left join fetch a.parent left join fetch a.children")

Yeah, Yanai, you're almost entirely right. But in fact, with EclipseLink, I found in the end that this approach does not work at all.

At first, I ran the app, in which the named query joins only the parent, in debug mode. Just before returning the root node, I found that only one join query and one id lookup query had been invoked, so I thought that was what we expected. But later I found that when I visit an element of the list of children, each visit results in a query for the children of that element, even when viewing the child in the debugger's watch window. So what I said above is not right.

In fact, I guess, EclipseLink doesn't check whether the object's parent is of the same type; it treats the children as if they were of a different class. So when we invoke a named query that joins the child list, even while the query is executing and before we find the root, EclipseLink creates an additional n-1 queries to get the children of every object in the list returned by the first query. It doesn't know that all the children have in fact already been retrieved and appear later in the returned list. It only registers the root object and its direct child objects in the cache. Hibernate has maybe optimized its caching for the situation where the parent and the children are of the same type.

Does JPA define a standard strategy for object caching in this situation?

I have now thought of a more general approach to the parent-child mapping problem. The approach is to map only the parent, and to add each child to a non-persisted list property in a @PostLoad method. The code is as follows:

 

 

 

package com.bioroad.app.web.manage.model;

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import javax.persistence.*;
import javax.validation.constraints.NotNull;
import javax.validation.constraints.Size;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlTransient;

/**
 *
 * @author hermit
 */
@Entity
@Table(name = "privilege_point")
@XmlRootElement
@NamedQueries({
    @NamedQuery(name = "PrivilegePoint.findAll", query = "SELECT p FROM PrivilegePoint p left join fetch p.parent"),
    @NamedQuery(name = "PrivilegePoint.findBySn", query = "SELECT p FROM PrivilegePoint p WHERE p.sn = :sn"),
    @NamedQuery(name = "PrivilegePoint.findByName", query = "SELECT p FROM PrivilegePoint p WHERE p.name = :name"),
    @NamedQuery(name = "PrivilegePoint.findByDescription", query = "SELECT p FROM PrivilegePoint p WHERE p.description = :description")})
public class PrivilegePoint implements Serializable {
    private static final long serialVersionUID = 1L;
    
    @Id
    @Basic(optional = false)
    @NotNull
    @Column(name = "sn")
    @SequenceGenerator(allocationSize=1, name="privilege_point_sn_seq", sequenceName="privilege_point_sn_seq")
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "privilege_point_sn_seq")
    private Integer sn;
    @Size(max = 1000)
    @Column(name = "name")
    private String name;
    @Size(max = 500)
    @Column(name = "description")
    private String description;
//    @OneToMany(mappedBy="parent", fetch = FetchType.LAZY)
    @Transient
    private List<PrivilegePoint> privilegePointList = new ArrayList<>();
    @JoinColumn(name = "parent", referencedColumnName = "sn")
    @ManyToOne(fetch = FetchType.EAGER)
    private PrivilegePoint parent;

    public PrivilegePoint() {
    }

    public PrivilegePoint(Integer sn) {
        this.sn = sn;
    }

    public Integer getSn() {
        return sn;
    }

    public void setSn(Integer sn) {
        this.sn = sn;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getDescription() {
        return description;
    }

    public void setDescription(String description) {
        this.description = description;
    }

    @XmlTransient
    public List<PrivilegePoint> getPrivilegePointList() {
        return privilegePointList;
    }

    public void setPrivilegePointList(List<PrivilegePoint> privilegePointList) {
        this.privilegePointList = privilegePointList;
    }

    public PrivilegePoint getParent() {
        return parent;
    }

    public void setParent(PrivilegePoint parent) {
        if (this.parent != null) {
            this.parent.getPrivilegePointList().remove(this);
        }
        this.parent = parent;
        if (this.parent != null) {
            this.parent.getPrivilegePointList().add(this);
        }
    }

    @PostLoad
    public void addToParent() {
        if (this.parent != null) {
            int pos = this.parent.getPrivilegePointList().indexOf(this);
            if (pos == -1) {
                this.parent.getPrivilegePointList().add(this);
            } else {
                // If the node and its parent are reloaded,
                // the parent is merged with the cached instance
                // that has the same id, so the parent holds a
                // child with the same id as this one that
                // refers to an old object. Replace the old
                // object with this one.
                this.parent.getPrivilegePointList().set(pos, this);
            }
        }
    }
    
    @Override
    public int hashCode() {
        int hash = 0;
        hash += (sn != null ? sn.hashCode() : 0);
        return hash;
    }

    @Override
    public boolean equals(Object object) {
        if (!(object instanceof PrivilegePoint)) {
            return false;
        }
        PrivilegePoint other = (PrivilegePoint) object;
        if ((this.sn == null && other.sn != null) || (this.sn != null && !this.sn.equals(other.sn))) {
            return false;
        }
        return true;
    }

    @Override
    public String toString() {
        return "com.bioroad.app.web.manage.model.PrivilegePoint[ sn=" + sn + " ]";
    }
    
}




@Stateless
public class PrivilegePointFacade {
    @PersistenceContext(unitName = "BiotechWebAppPU")
    private EntityManager em;

    protected EntityManager getEntityManager() {
        return em;
    }
    
    public PrivilegePoint getRoot() {
        // The result list is not used directly; the query loads every
        // PrivilegePoint, which triggers addToParent() on each of them.
        em.createNamedQuery("PrivilegePoint.findAll", PrivilegePoint.class).getResultList();
        // The root is looked up by its id, which is 1 here.
        PrivilegePoint root = em.find(PrivilegePoint.class, Integer.valueOf(1));
        return root;
    }
}

 

 

Invoke PrivilegePointFacade's getRoot method to get the root node and all of its descendants. After the named query executes, because only the parent property is persisted, EclipseLink (or rather, JPA) considers the objects in the returned list completely loaded, with no unresolved properties, so it caches all of them. @PostLoad is a standard JPA annotation that marks a method to be called back after the entity is loaded. Here the addToParent method is called back and adds the entity itself to its parent's child list.
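Stripped of JPA, the idea in this comment is to build the children lists from the parent pointers. The following is only a plain-Java sketch under that reading: the class ParentLinkDemo, its Item class and the {id, parentId} row format are hypothetical, and addToParent() is called by hand here where a JPA provider would invoke the @PostLoad callback.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch: derive children lists from parent links, one callback per item. */
public class ParentLinkDemo {

    static class Item {
        final int id;
        Item parent;
        // Plays the role of the @Transient list: filled in code, never persisted.
        final List<Item> children = new ArrayList<>();

        Item(int id) {
            this.id = id;
        }

        /** Mirrors the addToParent() idea: register with the parent exactly once. */
        void addToParent() {
            if (parent != null && !parent.children.contains(this)) {
                parent.children.add(this);
            }
        }
    }

    /** rows: {id, parentId}, with parentId == null for the root. */
    public static Map<Integer, Item> load(Integer[][] rows) {
        Map<Integer, Item> byId = new LinkedHashMap<>();
        // First pass: one instance per id.
        for (Integer[] row : rows) {
            byId.put(row[0], new Item(row[0]));
        }
        // Second pass: set the parent link, then run the "post load" step.
        for (Integer[] row : rows) {
            Item item = byId.get(row[0]);
            if (row[1] != null) {
                item.parent = byId.get(row[1]);
            }
            item.addToParent();
        }
        return byId;
    }

    public static void main(String[] args) {
        Map<Integer, Item> byId = load(new Integer[][] {
            {1, null}, {2, 1}, {3, 1}, {4, 2}
        });
        System.out.println(byId.get(1).children.size() + " children under the root");
    }
}
```

The trade-off is that the children list is maintained by application code rather than by the provider, so its order and consistency are the application's responsibility.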
 

Hi yanai,

I was wondering if you could help me in my use case, which is slightly different. I can have several trees in the table, and I only want to load a single tree at a time in the query. I am using Hibernate as the JPA provider, and I have a uni-directional relationship in my Node entity. I modified the query to the following and it seems to be working based on my limited test cases, but I wanted to verify it with you.

 

select distinct n from Node n left join fetch n.children where n.id = :rootId

 

I had to use the "distinct" keyword, otherwise I see duplicates for the root node. 

 

**Edit** I added some debug statements and realized that when I use this query, only the root node seems to be returned. However when I traverse the children nodes, I don't see output from Hibernate going out to fetch data. I guess my console must not be set up correctly? There's no way my result set size = 1 but when I traverse its 2 children, I'm not seeing Hibernate's additional queries. Anyway, if you can help me with my query that will be great. I want to limit the result set to only the nodes in the tree I'm querying for.

 

Thanks.

 

Found the problem in my tests: entityManager was returning the same instance of the root node that I used to create the tree in my queries. The create call was just a line up from the query call:

Node tree = createTree();
dao.persist(tree);
Node result = dao.loadTree(ROOT_ID);
// tree and result here are the same instance!
tree == result // this comes out to true

The bad news is that once I updated my test to detach the "tree" node after the call to persist(), none of my queries work. The tree nodes are returned in the query result list, but the children relationship is not set, i.e. result.getChildren() returns an empty collection. My mappings are as follows:

    
    @ManyToOne(fetch=FetchType.LAZY)
    @JoinColumn(name = "parent_Id",insertable=false,updatable=false)
    private Node parent;

    @OneToMany(cascade = CascadeType.ALL, orphanRemoval = true)
    @OrderColumn
    @JoinColumn(name= "parent_Id")
    private List<Node> children = new LinkedList<Node>();

 

This is the query I'm using:

select n from Node n left join fetch n.children

 

Do you know why the children List is not initialized, even though the children nodes are in the results returned?

Please replace the "detach" with tx.commit(). This will commit all your tree nodes to the DB and also make your entities detached.

Then create yet another new transaction and load the tree within it. If it still doesn't work, please make sure your createTree() works correctly.

Just out of curiosity: why aren't you mapping the Node like this?:

 

	@ManyToOne
	@JoinColumn(name = "parent_id", referencedColumnName = "id")
	protected Node parent;

	@OneToMany(mappedBy = "parent")
	@OrderColumn(name = "ordinal_nbr")
	protected List<Node> children;
    

 

This saves the double @JoinColumn + the read-only attributes (insertable = false, updatable = false) on the children...

 Hi,

 

As said in the article, there are two options for mapping the bi-directional relationship (depending on the "owner" of the relationship), and I chose the second one, since I guess people are less familiar with it. The idiom I presented for loading the tree works with both.

 

Thank you,

Yanai