Creating and Using Full-text Indexes

Full-text indexes in Neo4j

A full-text index is designed for queries that need to search string properties using possibly complex search predicates. It can also be created to index multiple properties in multiple node labels or relationship types.

In many cases, a full-text index will help improve the performance of queries where the predicates test complex patterns in string properties. For example:

  • All movie titles that contain "Night" followed by "Sky" later in the title.

  • All movie plots that have "murder" but not "drugs" in them.

  • All movie titles that contain "Night" and "Dead", with a plot that contains "French".

Some of these types of queries can be very expensive using standard RANGE or TEXT indexes, and a full-text index may perform better. You can create a full-text index on node properties or on relationship properties.

You create a full-text index with a CREATE FULLTEXT INDEX statement. What makes full-text indexes different from other index types (RANGE, TEXT) is that they are not used automatically in your Cypher queries. You must call a special procedure, db.index.fulltext.queryNodes(), to query node properties using the index, and db.index.fulltext.queryRelationships() to query relationship properties using the index.

Syntax for creating a full-text index for a node property

Here is the syntax for creating a full-text index for a property of a node:

cypher
CREATE FULLTEXT INDEX <index_name> IF NOT EXISTS
FOR (x:<node_label>)
ON EACH [x.<property_key>]

You specify the name of the index, the node label it will be associated with, and the name of the property.

  • If an index already exists in the graph with the same name, no index is created.

  • If an index does not exist in the graph with the same name:

    • No full-text index is created if there already is a full-text index for that node label and property key.

    • Otherwise, the full-text index is created.

Creating the full-text index for the plot property of a Movie node

Execute this Cypher to create your first full-text index in the graph:

cypher
CREATE FULLTEXT INDEX Movie_plot_ft IF NOT EXISTS
FOR (x:Movie)
ON EACH [x.plot]

You can use SHOW INDEXES to view your indexes in the graph.
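To narrow that listing to full-text indexes only, a filtered form of the command can help. The following is a minimal sketch; the columns selected with YIELD are just one reasonable choice, not a required set:

```cypher
// List only the full-text indexes, with a few descriptive columns
SHOW FULLTEXT INDEXES
YIELD name, state, labelsOrTypes, properties
RETURN name, state, labelsOrTypes, properties
```

An index can be queried once its state is reported as ONLINE.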

Querying with the full-text index

Unlike RANGE and TEXT indexes, which are used automatically at runtime, a full-text index is only used when you explicitly call the procedure that queries it. When you call the procedure, you pass a Lucene query string that selects the nodes to retrieve. Execute this Cypher to retrieve all Movie nodes that have both "murder" and "drugs" in their plot:

cypher
CALL db.index.fulltext.queryNodes
("Movie_plot_ft", "murder AND drugs")
YIELD node
RETURN node.title, node.plot

When you profile this query, you will observe an elapsed time of ~4 ms (on the second execution). The total db hits reported do not reflect the true db hits because PROFILE does not expose the details of a procedure. All that you can measure for queries that call procedures is elapsed time.
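As a variation, one of the examples from the start of this lesson (plots that have "murder" but not "drugs" in them) could be written with Lucene's NOT operator. This is a sketch that reuses the Movie_plot_ft index created above; only the Lucene query string changes:

```cypher
// Plots that mention "murder" and do not mention "drugs"
CALL db.index.fulltext.queryNodes
("Movie_plot_ft", "murder AND NOT drugs")
YIELD node
RETURN node.title, node.plot
```

Expressing the same test with CONTAINS would need a NOT predicate on the second string, which is the kind of pattern where it is worth profiling both approaches.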

Compare TEXT index with full-text index

You should always profile your queries to determine if they might benefit from a TEXT index versus a full-text index. Let’s compare the performance of each type of index.

Create a TEXT index on the plot property:

cypher
CREATE TEXT INDEX Movie_plot_text IF NOT EXISTS
FOR (x:Movie)
ON (x.plot)

Now profile the query using the TEXT index (at least twice):

cypher
PROFILE MATCH (m:Movie)
WHERE m.plot CONTAINS "murder"
AND m.plot CONTAINS "drugs"
RETURN m.title, m.plot

The query using the TEXT index performs similarly to the full-text index for this particular query and data in the graph.

Syntax for creating a full-text index for a relationship property

Here is the syntax for creating a full-text index for a property of a relationship:

cypher
CREATE FULLTEXT INDEX <index_name> IF NOT EXISTS
FOR ()-[x:<RELATIONSHIP_TYPE>]-()
ON EACH [x.<property_key>]

You specify the name of the index, the relationship type it will be associated with, and the name of the property.

  • If an index already exists in the graph with the same name, no index is created.

  • If an index does not exist in the graph with the same name:

    • No full-text index is created if there already is a full-text index for that relationship type and property key.

    • Otherwise, the full-text index is created.

Syntax for querying relationships with the full-text index

cypher
CALL db.index.fulltext.queryRelationships
("<index_name>", "<lucene_query>")
YIELD relationship
RETURN relationship.<property>
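As a concrete illustration of both steps, suppose a graph had REVIEWED relationships that carry a free-text summary property. The relationship type, the property, and the index name Reviewed_summary_ft are all hypothetical and are used here only to show the shape of the statements:

```cypher
// Hypothetical: index the summary text stored on REVIEWED relationships
CREATE FULLTEXT INDEX Reviewed_summary_ft IF NOT EXISTS
FOR ()-[x:REVIEWED]-()
ON EACH [x.summary];

// Hypothetical: find relationships whose summary mentions "boring"
CALL db.index.fulltext.queryRelationships
("Reviewed_summary_ft", "boring")
YIELD relationship, score
RETURN relationship.summary, score
```

Run the two statements separately, and query the index only after it has finished populating.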

In the next Challenge, you will have an opportunity to create and use a full-text index on a relationship property.

Syntax for creating a full-text index for multiple node labels and properties

The advantage of a full-text index is that it can span multiple node labels and multiple properties to support complex string queries.

Here is the syntax for creating a full-text index for multiple properties of multiple node labels:

cypher
CREATE FULLTEXT INDEX <index_name> IF NOT EXISTS
FOR (x:<node_label1> | <node_label2> | ...)
ON EACH [x.<property_key1>, x.<property_key2>,...]
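The same pattern extends to relationships. A sketch of the corresponding syntax for multiple properties of multiple relationship types, written with the same placeholder style used in this lesson:

```cypher
CREATE FULLTEXT INDEX <index_name> IF NOT EXISTS
FOR ()-[x:<RELATIONSHIP_TYPE1> | <RELATIONSHIP_TYPE2> | ...]-()
ON EACH [x.<property_key1>, x.<property_key2>,...]
```

In both forms, a single index covers either nodes or relationships, never a mix of the two.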

Creating a full-text index for multiple properties and node labels

Suppose we want to optimize a query that tests properties of nodes with two different labels, Actor and Movie. We want to return the actors or movies in the graph where:

  • The movie has "british" in its plot and "death" in its title

  • The actor has "british" and "actress" in its bio

To optimize these CONTAINS queries, create these TEXT indexes:

cypher
CREATE TEXT INDEX Movie_plot_text
IF NOT EXISTS
FOR (x:Movie)
ON (x.plot);

CREATE TEXT INDEX Movie_title_text IF NOT EXISTS
FOR (x:Movie)
ON (x.title);

CREATE TEXT INDEX Actor_bio_text IF NOT EXISTS
FOR (x:Actor)
ON (x.bio)

Next, execute this query:

cypher
PROFILE MATCH (m:Movie)
WHERE (m.plot CONTAINS "british"
OR m.plot CONTAINS "British")
AND (m.title CONTAINS "death"
OR m.title CONTAINS "Death")
RETURN m.name AS Name, m.bio AS Bio,
m.title AS Title,  m.plot AS Plot

UNION ALL

MATCH (a:Actor)
WHERE (a.bio CONTAINS "British"
OR a.bio CONTAINS "british")
AND (a.bio CONTAINS "actress"
OR a.bio CONTAINS "Actress")
RETURN a.name AS Name, a.bio AS Bio,
a.title AS Title,  a.plot AS Plot

For this query, we specify both the lowercase and uppercase initial letters of the test strings so that the TEXT index can be used. If we were to use toLower() or toUpper() in our predicates, the TEXT index would not be used.

The best elapsed time for this query is ~8 ms and it returns 166 rows.

Let’s see if we can do better with a full-text index. Execute this code to create the full-text index:

cypher
CREATE FULLTEXT INDEX Actor_bio_Movie_plot_title_ft IF NOT EXISTS
FOR (x:Actor | Movie)
ON EACH [x.bio, x.plot, x.title]

This index is used for Actor and Movie nodes where the properties indexed are bio, plot, and title.

Now let’s do the query using the full-text index:

cypher
PROFILE CALL db.index.fulltext.queryNodes
("Actor_bio_Movie_plot_title_ft",
"(plot: british AND title: death)
OR (bio: british  AND bio: actress)")
YIELD node
RETURN node.name AS Name, node.bio AS Bio,
node.title AS Title,  node.plot AS Plot

This query returns both movie and actor nodes that match the properties specified in the second argument to queryNodes().

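This query returns in ~13 ms. Elapsed times vary between runs and environments, so profile both versions several times and compare this result against the time you measured for the previous query that used the TEXT indexes.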
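Notice that the full-text query did not need both upper and lower case spellings of each term. Assuming the index was created with Neo4j's default analyzer, which lowercases terms both when indexing and when parsing the query string, a differently cased query should match the same nodes. A minimal sketch of that check:

```cypher
// Same search with mixed-case terms (assumes the default analyzer)
CALL db.index.fulltext.queryNodes
("Actor_bio_Movie_plot_title_ft",
"(plot: British AND title: Death) OR (bio: BRITISH AND bio: Actress)")
YIELD node
RETURN node.name AS Name, node.title AS Title
```

The Lucene operators AND and OR must stay in upper case; only the search terms are case-insensitive here.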

When a full-text index is used, it calculates a "hit score" that represents the closeness of the values in the graph to the query string.

Here is an example where our search is for any node that contains matrix or reloaded in its title:

cypher
CALL db.index.fulltext.queryNodes(
     'Actor_bio_Movie_plot_title_ft',
     'title: matrix reloaded')
     YIELD node, score
WITH score, node
WHERE node:Movie
RETURN node.title, score

Each node is returned with a Lucene score that reflects how well it matches matrix and reloaded; a higher score means a closer match.
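In Lucene query syntax, a field prefix such as title: applies only to the term directly after it, so in the query above reloaded is matched against every indexed property rather than against title alone. A sketch that scopes both terms to title and requires both of them, using parentheses to group the terms:

```cypher
// Both terms must appear in the title property
CALL db.index.fulltext.queryNodes(
     'Actor_bio_Movie_plot_title_ft',
     'title:(matrix AND reloaded)')
YIELD node, score
RETURN node.title, score
```

Because only Movie nodes have a title in this index, the WHERE node:Movie filter is left out of this sketch.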

You can explore some ways that full-text indexes can be used in the Cypher Reference Manual.

To get the most out of queries that use full-text indexes, you should also refer to the Lucene Query Language.

Dropping full-text indexes

Because every full-text index has a name, you drop it by name. Execute this code to drop the full-text index:

cypher
DROP INDEX Actor_bio_Movie_plot_title_ft

You can use SHOW INDEXES to confirm that the index is no longer in the graph.
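If a script might run when the index has already been removed, the IF EXISTS form avoids an error. A minimal sketch using the index name from this lesson:

```cypher
// Drop the index only if it is present; nothing happens otherwise
DROP INDEX Actor_bio_Movie_plot_title_ft IF EXISTS
```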

Check your understanding

1. Creating a full-text index

Suppose we have a graph that contains Company nodes. One of the properties of a Company node is description. We want to optimize queries that test whether the Company description contains a string pattern that has a combination of phrases. The Cypher to perform this query would not be efficient, so we want to create a full-text index for these queries. Which statement creates this index?

  • CREATE FULLTEXT INDEX Company_description_ft IF NOT EXISTS FOR (x:Company) ON EACH [x.description]

  • CREATE FULLTEXT INDEX Company_description_ft IF NOT EXISTS FOR (x:Company) ON (x.description)

  • CREATE FULL-TEXT INDEX Company_description_ft IF NOT EXISTS FOR (x:Company) ON (x.description)

  • CREATE FULL-TEXT INDEX Company_description_ft IF NOT EXISTS ON (Company.description)

Hint

You are creating a FULLTEXT index on the description property of every Company node.

Solution

The correct code for creating the full-text index is:

CREATE FULLTEXT INDEX Company_description_ft IF NOT EXISTS FOR (x:Company) ON EACH [x.description]

2. What can you index with full-text indexes

What can you index with a full-text index? (Select all that apply.)

  • ✓ A string property in each node with a given label.

  • ✓ Multiple string properties across multiple node labels.

  • ✓ A string property in each relationship with a given type.

  • ✓ Multiple string properties across multiple relationship types.

  • ❏ Multiple string properties across multiple node labels and relationship types.

Hint

A full-text index is created either for nodes or for relationships, but not both.

Solution

The correct answers for what you can index with a full-text index:

  1. A string property in each node with a given label.

  2. Multiple string properties across multiple node labels.

  3. A string property in each relationship with a given type.

  4. Multiple string properties across multiple relationship types.

Summary

In this lesson, you learned how to create and use a full-text index. In the next Challenge, you will create and use a full-text index on a relationship property.