Using a TEXT index

Step 1: Create a new property

The database used for this course does not have the type of data that will benefit from a TEXT index. A TEXT index performs better than a RANGE index when the property being indexed contains a lot of duplicate values.

Let’s temporarily create some string properties in the graph that will contain many duplicates. You will add a ratingValue property to all RATED relationships where the value is the concatenation of the user name, rating, and year of the movie.

Run this code that adds ratingInfo property to all RATING relationships (Note: It will take ~20 seconds to execute.)

cypher
CALL apoc.periodic.iterate(
  "MATCH (u:User)-[r:RATED]->(m:Movie) RETURN  left(toString(m.year),2) AS rY, id(r) AS rId",
  "MATCH ()-[r:RATED]->()  WHERE id(r) = rId SET r.ratingY = rY",
  {batchSize:1000})

Step 2: Query the graph

After having added this property to all RATING relationships in the graph, we have this query that you can run:

cypher
PROFILE MATCH ()-[r:RATED]->(m:Movie)
// movies in the 1900's
WHERE r.ratingY CONTAINS "9"
RETURN m.title,m.year, r.rating

Repeat this query to get the true elapsed time and total db hits. The second query is required because the first query adds the query to the query cache because it needs to parse the query and plan it.

Step 3: Add RANGE index

Add a regular RANGE index to the graph using:

  • index_name: RATED_ratingY

  • relationship_type: RATED

  • property_key: ratingY

Step 4: Repeat the query

After having added the RANGE index. Execute this query twice:

cypher
PROFILE MATCH ()-[r:RATED]->(m:Movie)
// movies in the 1900's
WHERE r.ratingY CONTAINS "9"
RETURN m.title,m.year, r.rating

Do you see that the RANGE index is used and the total db hits and elapsed time is lower?

Step 5: Add TEXT index

Add a TEXT index to the graph using:

  • index_name: RATED_ratingY_text

  • relationship_type: RATED

  • property_key: ratingY

Step 6: Repeat the query

After having added the TEXT index. Execute this query twice:

cypher
PROFILE MATCH ()-[r:RATED]->(m:Movie)
// movies in the 1900's
WHERE r.ratingY CONTAINS "9"
RETURN m.title,m.year, r.rating

Do you see that the TEXT index is used? This is because the query planner will use a TEXT index if one exists when the predicate uses CONTAINS or ENDS WITH. The performance of the query in this scenario does not benefit from the TEXT index.

Validate Results

Once you have completed the six steps of this Challenge, click the Check Indexes button and we will check the database for you.

Hint

Index names, relationship type names, and property key names are all case-sensitive.

Those properties should be used to create the composite index. You can type SHOW INDEXES after you have created the index to confirm that it created the RANGE index and TEXT indexes.

If you mess up, you can reload this Challenge page and you should be where you need to be at the beginning of this challenge.

Then you would need to perform the steps of this Challenge again.

Solution

Here are the statements to create the RANGE and TEXT indexes:

cypher
CALL apoc.periodic.iterate(
  "MATCH (u:User)-[r:RATED]->(m:Movie) RETURN  left(toString(m.year),2) AS rY, id(r) AS rId",
  "MATCH ()-[r:RATED]->()  WHERE id(r) = rId SET r.ratingY = rY",
  {batchSize:1000});

 CREATE  INDEX RATED_ratingY IF NOT EXISTS FOR ()-[x:RATED]-() ON (x.ratingY);

 CREATE  TEXT INDEX RATED_ratingY_text IF NOT EXISTS FOR ()-[x:RATED]-() ON (x.ratingY)

Summary

In this Challenge, you demonstrated that you can create a TEXT index in the graph in an effort to improve the performance of a query. A TEXT index is not always the best solution. In the next module, you will learn about full-text indexes.