Creating TEXT Indexes

TEXT Indexes in Neo4j

A TEXT index is designed for single properties of nodes or relationships that are strings.

A TEXT index can be used for:

  • Equality checks =

  • String comparisons ENDS WITH, CONTAINS

  • List membership x.prop IN ["a","b","c"]

TEXT indexes cannot be used for existence checks like RANGE indexes can.

In addition, you should only create a TEXT index if you want to optimize ENDS WITH or CONTAINS tests which are faster with TEXT indexes. In all other cases, you should use a RANGE index.

Syntax for creating a TEXT index for node property

Here is the syntax for creating a TEXT index for a single property of a node:

cypher
CREATE TEXT INDEX <index_name> IF NOT EXISTS
FOR (x:<node_label>)
ON x.<property_key>

You specify the name of the index, the node label it will be associated with, and the name of the property.

  • If an index already exists in the graph with the same name, no index is created.

  • If an index does not exist in the graph with the same name:

    • No index is created if there already is a TEXT index for that node label and property key.

    • Otherwise, the TEXT index is created.

RANGE vs. TEXT for CONTAINS or ENDS WITH predicates

In some cases, TEXT indexes will outperform RANGE indexes. Here is a query to find all Person nodes with a name that contains "Robert".

cypher
PROFILE MATCH (p:Person)
WHERE p.name CONTAINS 'Robert'
RETURN p.name

Your graph should already have a RANGE index on the name property of Person nodes.

If you run this query, you will see it uses the RANGE index. Rerun this query to get the true execution time. The first run of the query includes parsing the query and creating the execution plan. The second run of the query will always yield shorter execution times which is what you should always do to measure query performance.

Now, let’s create a TEXT index for this property. Execute this code:

cypher
CREATE TEXT INDEX Person_name_text IF NOT EXISTS
FOR (x:Person)
ON (x.name)

You can list the indexes to see that there are now two indexes on this Person node property:

  • Person_name (RANGE)

  • Person_name_text (TEXT)

Repeat the above query for the Person nodes that contain the string "Robert". Notice that it uses the TEXT index.

You should execute this query twice to get its true performance statistics. The TEXT index performs better than the RANGE index. In the next Challenge, you will create duplicate data in the graph so you can see the benefit of a TEXT index. It is important to ensure that whatever indexes you create for your graph will truly benefit the queries.

Execution time vs DB Hits

In many queries, the elapsed ms times are the same for two queries so we use the lower total db hits as a way of choosing which query is better. Total db hits are generally a good measure to compare queries, but with an index such as a TEXT index, the db hits measurement may not be adequate for testing the query performance. TEXT index usage may yield higher total db hits, but lower elapsed times.

TEXT index for list inclusion

Suppose we had created a TEXT index on the Person.bornIn property. Execute this code:

cypher
CREATE TEXT INDEX Person_bornIn_text IF NOT EXISTS
FOR (x:Person)
ON (x.bornIn)

This query would use the index:

cypher
PROFILE match (p:Person)
WHERE p.bornIn IN ["France", "Los Angeles, California, USA","Berlin, Germany" ]
RETURN p.name, p.bornIn

Why use a TEXT index?

Here are some reasons why you may choose a TEXT index over a RANGE index:

  • TEXT indexes perform better than RANGE indexes if there are many duplicate properties.

  • TEXT indexes take up less space in the graph

Syntax for creating a TEXT index for a relationship property

Here is the syntax for creating a TEXT index for a property of a relationship:

cypher
CREATE TEXT INDEX <index_name> IF NOT EXISTS
FOR ()-[x:<RELATIONSHIP_TYPE>]-()
ON (x.<property_key>)

You specify the name of the index, the relationship type it will be associated with, and the name of the property.

  • If an index already exists in the graph with the same name, the index is not created.

  • If an index does not exist in the graph with the same name:

    • No index is created if there already is TEXT index for that relationship type and property key.

    • Otherwise, the TEXT index is created.

Check your understanding

1. Creating a TEXT index

Suppose we have a graph that contains Company nodes. One of the properties of a Company node is name. We want to be able to optimize queries that test whether the Company name contains a string. What is the correct statement to create this TEXT index?

  • CREATE TEXT-INDEX Company_name_text IF NOT EXISTS ON (Company.name)

  • CREATE TEXT INDEX Company_name_text IF NOT EXISTS FOR (x:Company) ON (x.name)

  • CREATE TEXT_INDEX Company_name_text IF NOT EXISTS FOR (x:Company) ON (x.name)

  • CREATE TEXT INDEX Company_name_text IF NOT EXISTS ON (Company.name)

Hint

You are creating a TEXT index on the name property of a Company node.

Solution

The correct code for creating the TEXT index is:

CREATE TEXT INDEX Company_name_text IF NOT EXISTS FOR (x:Company) ON (x.name)

2. Advantage of TEXT indexes

What are some advantages of using a TEXT index? (Select all that apply.)

  • ✓ Better performance for CONTAINS or ENDS WITH predicates for large graphs.

  • ❏ Better performance for any string test predicate.

  • ✓ TEXT indexes perform better when there is a lot of duplicate data in the graph.

  • ✓ TEXT indexes use less space in the graph.

Hint

There three reasons why you would use a TEXT index.

Solution

The correct answers for the advantages of using a TEXT index:

  1. Better performance for CONTAINS or ENDS WITH predicates for large graphs.

  2. TEXT indexes perform better when there is a lot of duplicate data in the graph.

  3. TEXT indexes use less space in the graph.

Summary

In this lesson, you learned how to create a TEXT index for a property of a node or relationship. In the next Challenge, you will create duplicate properties in the graph to see that the TEXT index can be a benefit.