Limiting or Counting Results Returned

Limiting results

Filtering a query reduces the number of results that match, but you may also want to cap the number of results returned. This is useful if you have very large result sets and you only need to see the beginning or end of a set of ordered results. You can use the LIMIT keyword to specify the maximum number of results returned.

cypher
MATCH (m:Movie)
WHERE m.released IS NOT NULL
RETURN m.title AS title,
       m.released AS releaseDate
ORDER BY m.released DESC
LIMIT 100

In this query we filter out movie nodes that have no value for the released property. Then we return the 100 most-recently released movies in our graph.

Or, we may want to determine the youngest person in the graph:

cypher
MATCH (p:Person)
WHERE p.born IS NOT NULL
RETURN p.name AS name,
       p.born AS birthDate
ORDER BY p.born DESC
LIMIT 1
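
The same pattern can read from the other end of the ordered results. As a minimal sketch, assuming the same Person nodes and born property used above, sorting in ascending order with LIMIT 1 would return the oldest person instead:

```cypher
// Sketch: same filter as above, but sorted ascending so the
// earliest birth date comes first.
MATCH (p:Person)
WHERE p.born IS NOT NULL
RETURN p.name AS name,
       p.born AS birthDate
ORDER BY p.born ASC
LIMIT 1
```

Cypher sorts null values last in ascending order and first in descending order, which is why the IS NOT NULL filter matters most in the descending queries: without it, a person with no born value could take the single row that LIMIT 1 keeps.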

Skipping some results

In an ordered result set, you may want to control which slice of the results is returned. This is useful in an application that displays results one page at a time.

In this query we are returning the names of people born in 1980 ordered by their birth date.

cypher
MATCH (p:Person)
WHERE p.born.year = 1980
RETURN p.name AS name,
       p.born AS birthDate
ORDER BY p.born

You can add the SKIP and LIMIT keywords to control which page of results is returned.

cypher
MATCH (p:Person)
WHERE p.born.year = 1980
RETURN p.name AS name,
       p.born AS birthDate
ORDER BY p.born
SKIP 40
LIMIT 10

This query skips the first 40 rows (pages 1 through 4) and returns the next 10 rows, which is page 5 when each page contains 10 rows.
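
The page arithmetic can be checked without any graph data. The following is a self-contained sketch in which the numbers 1 to 100 stand in for 100 ordered rows; the name rowNumber is made up for illustration and is not part of the course dataset:

```cypher
// Stand-in data: the numbers 1..100 play the role of 100 ordered rows.
UNWIND range(1, 100) AS rowNumber
RETURN rowNumber
ORDER BY rowNumber
// Page 5 with 10 rows per page: skip (5 - 1) * 10 = 40 rows, then keep 10.
SKIP 40
LIMIT 10
```

This returns the values 41 through 50. In general, the number of rows to skip for a given page is (page number - 1) multiplied by the page size. The ORDER BY clause matters here: without a defined ordering, there is no guarantee that consecutive pages line up.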

Eliminating duplicate records

You have seen a number of query results where there is duplication in the results returned. In some cases, you may want to eliminate duplicated results. You do so by using the DISTINCT keyword.

Here is a simple example where duplicate data is returned.

cypher
MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
RETURN m.title, m.released
ORDER BY m.title

Tom Hanks both acted in and directed the movie Larry Crowne, so the movie is returned twice in the result stream: once for each relationship that matches the pattern.

We can eliminate the duplication by specifying the DISTINCT keyword as follows:

cypher
MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
RETURN DISTINCT m.title, m.released
ORDER BY m.title

Using DISTINCT in the RETURN clause here means that rows with identical values are collapsed into a single row, so each combination of title and release date appears only once.
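
DISTINCT compares whole rows, not individual columns. The following self-contained sketch uses made-up inline rows (the titles and years are hypothetical, not taken from the course dataset) to show which rows are collapsed:

```cypher
// Hypothetical inline rows standing in for query results.
UNWIND [
  {title: 'Movie A', released: 2011},
  {title: 'Movie A', released: 2011},  // exact duplicate of the row above
  {title: 'Movie A', released: 2012},  // same title, different year
  {title: 'Movie B', released: 2000}
] AS row
RETURN DISTINCT row.title AS title, row.released AS released
ORDER BY title, released
```

This returns three rows. The exact duplicate is removed, but the second 'Movie A' row is kept because its released value differs.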

Uses of DISTINCT

You can use DISTINCT to eliminate duplication of:

  • rows returned (you have just learned this)

  • property values

  • nodes

Here is an example where we eliminate duplicate property values:

cypher
MATCH (m:Movie)
RETURN DISTINCT m.year
ORDER BY m.year

In the above query, each distinct Movie year is returned only once. Without DISTINCT, the query would return one year value per Movie node, including repeats.

And here is an example where we eliminate duplicate nodes:

cypher
MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
RETURN DISTINCT m

If we do not specify DISTINCT in the above query, the query returns a duplicate movie node.

Check your understanding

1. What movies have reviews?

We want to return the movies that have been reviewed.

How would you complete this query so that duplicate movie titles are not returned?


cypher
MATCH (m:Movie)<-[:RATED]-()
/* choose the RETURN clause from the options below */

  • RETURN UNIQUE m.title

  • RETURN DISTINCT m.title

  • RETURN WITH DISTINCT m.title

  • RETURN WITH UNIQUE m.title

Hint

Another way of describing a list without duplicates is a distinct list.

Solution

Including RETURN DISTINCT m.title returns each distinct value of the title property for the matched Movie nodes.

2. Reducing data returned

Why would you want to use LIMIT in a RETURN clause?

  • ✓ To reduce the amount of data returned to the client.

  • ❏ To return only some properties of nodes.

  • ❏ To determine the highest value for a property in a query.

  • ❏ To determine the lowest value for a property in a query.

Hint

You would use LIMIT with RETURN to control how much data is returned from your Neo4j database.

Solution

You could use LIMIT to reduce the amount of data returned to the client.

Summary

In this lesson, you learned how you can limit results returned, count results returned, and eliminate duplication in the rows returned.

In the next challenges, you will write queries to limit, count, or reduce duplication.