Full-text Indexes in Neo4j

Full-text indexes

Full-text indexes are useful in applications that must parse property values for evaluating whether the property satisfies the criteria. Full-text indexes rely on Apache Lucene for their implementation which makes their parsing capabilities very powerful.

With a full-text index, you can use Lucene’s full-text query language to express how the values will be matched in a query. A full-text index can be defined for multiple labels and/or properties, or for multiple relationship types and/or properties.

Unlike RANGE and TEXT indexes, you must call a procedure to use a full-text index at runtime. That is, the query planner will not automatically use a full-text index unless you specify it in your Cypher code.

Why use a full-text index?

Suppose you want to find all Movies that have certain phrases in their plots.

And suppose we added a TEXT index for the plot property:

cypher
CREATE TEXT INDEX Movie_plot_text IF NOT EXISTS FOR (x:Movie) ON (x.plot)

Performing this type of retrieval using RANGE or TEXT indexes could be very expensive. For example:

cypher
PROFILE MATCH (m:Movie)
WHERE m.plot CONTAINS "murder"
AND m.plot CONTAINS "drugs"
RETURN m.title,m.plot

The default behavior in Neo4j is to use only one index for a query. A subquery can use an additional index. But for this query, the query engine needs to determine which predicate will be more efficient. The first predicate returns all "murder" rows. The second predicate returns all "drugs" rows. For our dataset, the graph engine uses the index to select all properties that contain "drugs". It can do so because it can determine using the data in the index, that the number of the rows that contain "drugs" is smaller than the number of rows that contain "murder". For those rows, the query engine, then tests the properties for "murder". That is, the index can only be used once for this query.

When using a full-text index, you can specify an expression that will find all properties that contain both strings, anywhere in them. For example, if we had a full-text index on the Movie plot property named Movie_plot_ft, we could return the nodes that have both "murder" and "drugs" in them with this code:

cypher
CALL db.index.fulltext.queryNodes
('Movie_plot_ft', 'murder AND drugs')
YIELD node

This query uses Lucene’s full-text query language to retrieve the nodes.

Another benefit of creating a full-text index is that you can specify an index on multiple properties associated with multiple labels.

Check your understanding

1. What are the advantages of full-text indexes? (Select all that apply.)

  • ✓ You can define an index on multiple node labels and multiple properties for those labels.

  • ✓ You can define an index on multiple relationship types and multiple properties for those types.

  • ❏ The data pointed to by the index can reside outside the graph.

  • ✓ You can use regular expressions for your query predicates that are quite complex.

Hint

These three advantages make full-text indexes very useful for an application.

Solution

The advantages of using full-text indexes are:

  1. You can define an index on multiple node labels and multiple properties for those labels.

  2. You can define an index on multiple relationship types and multiple properties for those types.

  3. You can use regular expressions for your query predicates that are quite complex.

2. Full-text index implementation

What is the default underlying implementation of a full-text index in Neo4j?

  • ❏ Apache SOLR

  • ❏ Elasticsearch

  • ❏ Typesense

  • ✓ Apache Lucene

Hint

One of these search libraries is an open source project by Apache.

Solution

The correct answer is Apache Lucene

Summary

In this lesson, you learned what a full-text index is in Neo4j. In the next lesson, you will learn how to create a full-text index.