Constraints in Neo4j

Constraints in Neo4j

A constraint is implemented internally as an index. It is used to constrain what is added to the graph. There are three types of constraints you can define:

  • Uniqueness for a single node property value or a set of node property values.

  • Existence for a property of a node or relationship.

  • Existence and uniqueness for a node property value or a set of node property values (Node key).

Uniqueness constraints

A uniqueness constraint can be defined for a property of a node with a given label.

For example, in the Movie graph, we uniquely identify every Person node. To do this, we identify a property whose value will be unique for all nodes with the Person label.

Execute this query that returns all Person nodes for a person named Austin Green:

cypher
MATCH (p:Person)
WHERE p.name = 'Austin Green'
RETURN p

This query returns two Person nodes for two different people. Their names are the same, but their tmdbId values (and other property values) are different. In our Movie graph, the tmdbId value for every Person node is unique.

Importing CSV Data

If you took the GraphAcademy course Importing CSV Data into Neo4j, you have already used uniqueness constraints to load the data.

If we define a uniqueness constraint on a property for a labeled node, an error is raised if we attempt to create another labeled node with the same property value or if we set the value of an existing node property to a value that already exists in the graph.

Best Practices

A graph data modeling best practice is to always uniquely identify a node with a given label in the graph where a node will typically represent the business entities in your application.

Uniqueness constraint is an index

Neo4j implements a uniqueness constraint as an index. It is used to quickly look up a node by a property value to determine if it is unique for the graph.

For this reason, this Cypher code will execute very quickly when a uniqueness constraint is defined. You can execute this code:

cypher
MERGE (p:Person {tmdbId: '135067'})
ON CREATE
SET p.name = 'Austin Green',
    p.imdbId = '0337619',
    p.url = 'https://themoviedb.org/person/135067'
ON MATCH
SET p.updated = date()
RETURN p

For this code, graph engine first attempts to retrieve the node with the tmdbId value of '135067'. It can quickly look up the node because it has a uniqueness constraint on the tmdbId property. If there were not a uniqueness constraint (index) on this property, the graph engine would need to inspect all Person nodes and test whether they have the same tmdbId value. This is very expensive in a large graph.

If the node is not found, then the ON CREATE code is executed, otherwise the ON MATCH code is executed.

The MERGE Clause

MERGE is the best practice for creating nodes and relationships in the graph

In our graph, this node exists. The uniqueness constraint ensures that you cannot create a node with the same property value. For example, this code will return an error. Try executing this code:

cypher
CREATE (p:Person {tmdbId: '135067'})
SET p.name = 'Austin Green',
    p.imdbId = '0337619',
    p.url = 'https://themoviedb.org/person/135067'
RETURN p

The uniqueness constraint will catch the error and prevent the node from being created.

Existence constraints

An existence constraint for a node label or relationship type property means that a property must exist. Even though, by default, you need not create all nodes or relationships with the same property keys, your graph data model may benefit from requiring that a particular property key exists.

For example, in the Movie graph, we want to enforce that all Person nodes have a name property with a value. Remember that a node with a property with a null value is the same as that node not having that property.

You can create an existence constraint for a node label or relationship type property. When the node or relationship is created or updated, it must have that property with a value, otherwise and error is raised.

Suppose we had an existence constraint for the updated property for the Person nodes. If we were to execute this code. It first looks for the Person node and does not find it. Then it executes the ON CREATE clause. It will fail because the Person node has an existence constraint whereby any Person node must have the updated property, else an error is returned.

cypher
// example - do not execute
MERGE (p:Person {tmdbId: '999999'})
ON CREATE
SET p.name = 'Humpty Dumpty',
    p.imdbId = '999999',
    p.url = 'https://themoviedb.org/person/999999'
ON MATCH
SET p.updated = date()
RETURN p

You will learn how to create and test an existence constraint later in this module.

Node key constraint

A node key is a specialized type of constraint in the graph that enables you to define a set of properties for a node label that must:

  1. Exist for all nodes with that label.

  2. Be unique for all values.

For example, suppose you want to ensure that all Person nodes in the graph have unique values for Person.name and Person.tmdbId. In the earlier Austin Green example, we have two nodes with the same Person.name value, but we have unique nodes for the Person.name and Person.tmdbId value. The constraint would prevent more than one Person node to exist in the graph with this combination of values. In addition to enforcing uniqueness and existence, this type of constraint is very efficient for looking up data where multiple property values are tested.

Check your understanding

1. What type of constraint can you define to ensure a set of property values is unique?

  • ❏ Uniqueness on a node property.

  • ❏ Uniqueness on a set of relationship type properties.

  • ✓ Node key for a set of properties of a node label.

  • ❏ Uniqueness key for a set of properties of a node label.

Hint

There is one construct used to uniquely identify property values for a set of node labels.

Solution

A Node key is used to ensure that a set of property values for all nodes of that label type are unique in the graph.

2. What type of constraint can be created for node properties or relationship properties?

  • ❏ Node key

  • ❏ Uniqueness

  • ✓ Existence

  • ❏ IS NULL

Hint

There is only one type of constraint that you can define on node properties or relationship properties.

Solution

The correct answer is Existence constraint.

Summary

In this lesson, you learned about the types of constraints that Neo4j supports. In the next lesson, you will learn about creating a uniqueness constraint for a node property.