Identifying What Constraints and Indexes to Create

Data model for this course

This course uses the movie recommendations dataset as a starting point for your learning.

This is the same dataset that is used in the application development courses in GraphAcademy.

Here is the graph data model:

Movie Data Model

The node labels for the graph include:

  • Person

  • Actor

  • Director

  • Movie

  • Genre

  • User

The relationships for the graph include:

  • ACTED_IN (with an optional role property)

  • DIRECTED (with an optional role property)

  • RATED (with rating and timestamp properties)

  • IN_GENRE

Also notice that the nodes have a number of properties, along with the type of data that will be used for each property.

Step 1: Identify constraints

You use constraints to:

  • Uniquely identify a node.

  • Ensure a property exists for a node or relationship.

  • Ensure a set of properties is unique and exists for a node (Node key).

Uniqueness constraints

You analyze the data requirements for the application and determine how each node will be uniquely identified. In our Movie graph, we will define uniqueness constraints for these node labels:

  • Movie nodes use movieId.

  • Person nodes use tmdbId.

  • User nodes use userId.

  • Genre nodes use name.

Existence constraints

Depending on how data is loaded or updated in the graph, you may want to further constrain that specific properties must exist for nodes or relationships. These constraints are separate from the uniqueness constraints.

For example, you may want to enforce that every role property of the ACTED_IN relationship must have a value. Or that a Person node must have a value for the name property.

Node key constraints

In addition, there may be a combination of property values for a node that you want to ensure exist and are unique for every node with that label.

For example, there cannot be two Movie nodes in the graph that have the same title and year.

Step 2: Create constraints

Next you create the constraints per your analysis.

Step 3: Load the data

You typically load the data for your application and ensure that all data loaded correctly adhering to the constraints defined. If a constraint is violated, the Cypher load will fail.

A best practice is to always use MERGE for creating nodes and relationships. MERGE first does a lookup (using the uniqueness constraint which is an index), then creates the node if it does not exist.

You can use LOAD CSV to load data or you can use the Neo4j Data Importer App. The Neo4j Data Importer App actually creates the uniqueness constraints for you.

Step 4: Identify indexes

Identifying the indexes for your graph depends on the most important use cases (queries) of your application.

For example, if this is an important query in your application:

cypher
// Find all movies for this actor
// aName is a parameter with a string value for an actor
MATCH (p:Person)-[:ACTED_IN]->(m)
WHERE p.name = $aName
RETURN m.title

The anchor of the query is the Person node with a specific name value. This query can benefit from a RANGE index on the name property.

But if your important query is the following:

cypher
// Find all actors for a movie with this in the title
// titleSubString is a portion of the title as a string
MATCH (p)-[:ACTED_IN]->(m:Movie)
WHERE m.title CONTAINS $titleSubString
RETURN p.name, m.title

The anchor of the query is the title property of Movie nodes. The test is CONTAINS. A RANGE index will help this query, but a TEXT index will perform better.

Through the remainder of this course, you will have an opportunity to create and use constraints and indexes.

Step 5: Create indexes

After you have loaded the data and identified the indexes you will need, you create the indexes.

As you test your application, an important part is testing the performance of the queries. Use cases for the application may change so the identifying and creating indexes to improve query performance will be an ongoing process during the lifecycle of your application.

Check your understanding

1. What are the uniqueness constraints?

Refer to the data model shown at the beginning of this lesson.

Before we load the data into the graph, we need to add constraints for some node labels in the graph per our data model:

  1. The person data has a unique value for the tmdbId field.

  2. The movie data has a unique value for the movieId field.

  3. The user data has a field to uniquely identify itself. There may be reviewers with the same name.

  4. The genre data has a field to uniquely identify itself.

What properties do we define uniqueness constraints for? (select all that apply)

  • ✓ Person.tmdbId

  • ✓ Movie.movieId

  • ❏ Actor.name

  • ❏ Director.name

  • ✓ User.userId

  • ❏ Genre.genreId

  • ✓ Genre.name

Hint

You do not define uniqueness constraints for the Actor and Director labels as these nodes are created as Person nodes and a node only needs one unique identifier.

This data model requires four constraints.

Solution

The correct answers are:

  • Person.tmdbId

  • Movie.movieId

  • User.userId

  • Genre.name

2. When to create indexes and constraints?

Suppose you are starting with an empty graph and you need to load millions of nodes and relationships into the graph. What is the best practice for when to create indexes and constraints in your graph? (select all that apply)

  • ❏ Create all constraints and indexes before you load the data into the graph.

  • ❏ Create all constraints and indexes after you load the data into the graph.

  • ✓ Create the constraints before you load the data into the graph.

  • ✓ Create the indexes after you load the data into the graph.

Hint

You want some checking of data and fast lookups during the loading of the data to prevent duplication of data. Write performance is diminished when indexes need to be maintained.

Solution

The correct answers are:

  • Create the constraints before you load the data into the graph.

  • Create the indexes after you load the data into the graph.

Summary

In this lesson, you learned how to identify the constraints and indexes you will need in your graph. In the next module, you will learn about creating and using constraints.