Using Aggregation to Count

Using count() to aggregate data

With the Cypher count() function, you can perform a count of properties, nodes, relationships, paths, or rows during query processing. When you aggregate in a Cypher statement, the query must process all patterns in the MATCH clause to complete the aggregation to either return results or perform the next part of the query.

Here is an example:

cypher
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WHERE a.name = 'Tom Hanks'
RETURN a.name AS ActorName,
count(*) AS NumMovies

This query returns the number of movies Tom Hanks acted in. count(*) returns the number of rows retrieved in the query. What we return to the client is the name of the actor and the count of the rows.

Here is another example:

cypher
PROFILE MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)
RETURN a.name AS ActorName,
d.name AS DirectorName,
count(*) AS NumMovies
ORDER BY NumMovies DESC

Here we are returning the actor name and director name pairs and aggregating on the number of rows for each pair. That is, the number of movies for each actor/director pair.

And if we wanted to collect the titles of the movies, it would use the same type of eager aggregation:

cypher
PROFILE MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)
RETURN a.name AS ActorName,
d.name AS DirectorName,
count(*) AS NumMovies, collect(m.title) AS Movies
ORDER BY NumMovies DESC

Using count() on property values

You can count the number of properties to see if there are nodes that do not have a value. Here is an example:

cypher
MATCH (p:Person)
RETURN count(p) , count(p.born)

In this query, we return the number of Person nodes and the number of born properties in these nodes. We see that these numbers differ so we know that there are Person nodes that do not have the born property in the grapn.

Counting for the next part of the query

Here is an example where we start the query processing by retrieving all actors and their movies. During the query processing, we want to only return actors that have 2 movies. We use count() to further filter the query.

cypher
PROFILE MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH  a, count(*) AS NumMovies
// can add Movies to also create the list of movies
//, collect(m.title) AS Movies
WHERE NumMovies = 2
RETURN a.name AS Actor
//, Movies

Notice that when this query executes, is does an EagerAggregation for the count() before the filter step.

Counting specific nodes

In addition to counting the number of rows returned, you can also count specific nodes retrieved in the query. In this query:

cypher
MATCH (p:Person {name: 'Elvis Presley'})-[]-(m:Movie)-[]-(a:Actor)
RETURN count(*), count(m), count (a)

All count values are the same. The total number of rows is 34. The total number of Actor nodes and Movie nodes is also 34 because the nodes are repeated in each row.

You specify DISTINCT so that the exact number of movies and actors are counted with no duplicates:

cypher
MATCH (p:Person {name: 'Elvis Presley'})-[]-(m:Movie)-[]-(a:Actor)
RETURN count(*), count(DISTINCT m), count (DISTINCT a)

This query returns the total count of rows returned (34), the number of distinct movies (11) and the number of distinct actors (33).

Check your understanding

1. Counting rows

Suppose you have a query that retrieves Movie and Person nodes:

Partial
MATCH (p:Person)-[]-(m:Movie)

Which Cypher clause do you use to return the number of rows retrieved?

  • RETURN total(*)

  • RETURN count(*)

  • RETURN totalRows(*)

  • RETURN numRows(*)

Hint

This Cypher function counts rows retrieved in the query step.

Solution

You use the count(*) function to return the number of rows retrieved in this step of the query.

2. Movies that are rated

We want a count of the movie titles in the graph that have been rated.

Use the dropdown below to select the correct RETURN clause.

Once you have selected your option, click the Check Results query button to continue.

cypher
MATCH (u:User)-[:RATED]->(m:Movie)
/*select:RETURN  count (DISTINCT m.title)*/
  • RETURN count (*)

  • RETURN count (DISTINCT *)

  • RETURN count (m.title)

  • RETURN count (DISTINCT m.title)

Hint

The query returns all user/movie node pairs with the RATED relationship. Users have rated multiple movies. How can you just count the number of unique movie titles that have been rated?

Solution

The correct answer is RETURN count (DISTINCT m.title)

Summary

In this lesson, you reviewed and learned more about aggregating data using count() in your Cypher queries. In the next Challenge, you will create a query to aggregate using count().