Cypher Evaluation Chain

Have you ever heard of Cunningham’s Law? Cunningham’s Law states:

The best way to get the correct answer on the internet is not to ask a question; it’s to post the wrong answer.

It also seems that this is true when it comes to LLMs!

While LLMs are good at generating Cypher statements, they really excel at correcting the statements they have written.

To complete this challenge, you must modify the initCypherEvaluationChain() function in modules/agent/tools/cypher/cypher-evaluation.chain.ts so that it returns a chain that validates the provided Cypher statement for accuracy and corrects it where necessary.

  1. Create a prompt instructing the LLM to analyze a Cypher statement and return a list of errors.

  2. Create a chain that replaces placeholders in the prompt for schema, question, cypher, and errors.

  3. Pass the formatted prompt to the LLM.

  4. Parse the output as a JSON object.

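Each call to this chain performs a single evaluation pass. Its output (a corrected statement plus a list of errors) can be fed back in as input, so the chain can be used to recursively correct the Cypher statement generated by the LLM.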
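Here is a minimal sketch of what recursive correction can look like. It assumes the evaluation step is exposed as an async function that returns the cypher and errors keys described in the prompt. The names correctCypher, Evaluate and maxAttempts are hypothetical and are not part of the lesson's code. The code that calls this chain in the course project may structure the loop differently.

```typescript
// Assumed result shape, mirroring the "cypher" and "errors" keys in the prompt.
interface EvaluationResult {
  cypher: string;
  errors: string[];
}

// Hypothetical signature for one evaluation pass (e.g. a wrapper around chain.invoke()).
type Evaluate = (input: {
  question: string;
  schema: string;
  cypher: string;
  errors: string[];
}) => Promise<EvaluationResult>;

// Hypothetical helper: feed each pass's output back in until the evaluator
// reports no errors or the attempt budget is used up.
async function correctCypher(
  evaluate: Evaluate,
  question: string,
  schema: string,
  cypher: string,
  maxAttempts = 3
): Promise<EvaluationResult> {
  let current: EvaluationResult = { cypher, errors: [] };
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    current = await evaluate({
      question,
      schema,
      cypher: current.cypher,
      errors: current.errors,
    });
    if (current.errors.length === 0) {
      break; // nothing left to fix
    }
  }
  return current;
}
```

The attempt budget matters because the prompt asks the LLM to list errors it could not fix. Some statements may therefore never come back with an empty error list.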

Open cypher-evaluation.chain.ts

Prompt Template

In the initCypherEvaluationChain() function, use the PromptTemplate.fromTemplate() method to create a new prompt template with the following prompt.

Prompt
You are an expert Neo4j Developer evaluating a Cypher statement written by an AI.

Check the cypher statement provided below against the database schema to make sure that
the statement will answer the user's question.
Fix any errors where possible.

The query must:
* Only use the nodes, relationships and properties mentioned in the schema.
* Assign a variable to nodes or relationships when intending to access their properties.
* Use `IS NOT NULL` to check for property existence.
* Use the `elementId()` function to return the unique identifier for a node or relationship as `_id`.
* For movies, use the tmdbId property to return a source URL.
  For example: `'https://www.themoviedb.org/movie/'+ m.tmdbId AS source`.
* For movie titles that begin with "The", move "the" to the end.
  For example "The 39 Steps" becomes "39 Steps, The" or "the matrix" becomes "Matrix, The".
* For the role a person played in a movie, use the role property on the ACTED_IN relationship.
* Limit the maximum number of results to 10.
* Respond with only a Cypher statement.  No preamble.

Respond with a JSON object with "cypher" and "errors" keys.
  * "cypher" - the corrected cypher statement
  * "corrected" - a boolean
  * "errors" - A list of uncorrectable errors.  For example, if a label,
      relationship type or property does not exist in the schema.
      Provide a hint to the correct element where possible.

Fixable Example #1:
* cypher:
    MATCH (a:Actor {{name: 'Emil Eifrem'}})-[:ACTED_IN]->(m:Movie)
    RETURN a.name AS Actor, m.title AS Movie, m.tmdbId AS source,
    elementId(m) AS _id, m.released AS ReleaseDate, r.role AS Role LIMIT 10
* errors: ["Variable `r` not defined (line 1, column 172 (offset: 171))"]
* response:
    MATCH (a:Actor {{name: 'Emil Eifrem'}})-[r:ACTED_IN]->(m:Movie)
    RETURN a.name AS Actor, m.title AS Movie, m.tmdbId AS source,
    elementId(m) AS _id, m.released AS ReleaseDate, r.role AS Role LIMIT 10


Schema:
{schema}

Question:
{question}

Cypher Statement:
{cypher}

{errors}
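Two rules in this prompt describe simple, deterministic string transformations. The sketch below restates them in TypeScript, which may help when checking the LLM's output by eye or in a test. The helper names moveLeadingThe and tmdbSourceUrl are hypothetical. The chain itself leaves these rules to the LLM and does not call any such functions.

```typescript
// Hypothetical helper mirroring the prompt's title rule:
// "The 39 Steps" -> "39 Steps, The", "the matrix" -> "Matrix, The".
function moveLeadingThe(title: string): string {
  // Only a standalone leading "the" counts, so "Theory of Everything" is untouched.
  const match = /^the\s+/i.exec(title);
  if (!match) return title;
  const rest = title.slice(match[0].length);
  // Capitalise the first character of the remainder, as in "Matrix, The".
  return `${rest.charAt(0).toUpperCase()}${rest.slice(1)}, The`;
}

// Hypothetical helper mirroring the prompt's source-URL rule.
function tmdbSourceUrl(tmdbId: string): string {
  return "https://www.themoviedb.org/movie/" + tmdbId;
}
```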

Output Instructions

This prompt instructs the LLM to output a JSON object containing keys for cypher and errors.

This differs from the chains you have built before, which all return a string. Instead of returning the raw string, you will use the JsonOutputParser class to parse the response and coerce it into an object.
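The parsing step has to turn the LLM's text into an object and reject anything that does not have the expected shape. The sketch below is a simplified stand-in written for illustration. It is not the JsonOutputParser implementation. It assumes the model may wrap its JSON in a markdown code fence, and parseEvaluationOutput is a hypothetical name.

```typescript
// Assumed shape of the evaluation result, mirroring the prompt's keys.
interface EvaluationOutput {
  cypher: string;
  errors: string[];
}

// Simplified stand-in for a JSON output parser (illustration only, not
// LangChain's implementation). Accepts bare JSON or JSON inside a markdown
// code fence, then checks that the expected keys are present.
function parseEvaluationOutput(text: string): EvaluationOutput {
  // `{3} matches three backticks, optionally followed by "json".
  const fenced = /`{3}(?:json)?\s*([\s\S]*?)`{3}/.exec(text);
  const body = fenced?.[1] ?? text;
  const parsed: unknown = JSON.parse(body.trim()); // throws on invalid JSON

  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Expected a JSON object");
  }
  const { cypher, errors } = parsed as { cypher?: unknown; errors?: unknown };
  if (typeof cypher !== "string" || !Array.isArray(errors)) {
    throw new Error('Expected "cypher" (string) and "errors" (array) keys');
  }
  return { cypher, errors: errors.map(String) };
}
```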

Your code should resemble the following:

typescript
Prompt Template
// Prompt template
const prompt = PromptTemplate.fromTemplate(`
  You are an expert Neo4j Developer evaluating a Cypher statement written by an AI.

  Check the cypher statement provided below against the database schema to make sure that
  the statement will answer the user's question.
  Fix any errors where possible.

  The query must:
  * Only use the nodes, relationships and properties mentioned in the schema.
  * Assign a variable to nodes or relationships when intending to access their properties.
  * Use \`IS NOT NULL\` to check for property existence.
  * Use the \`elementId()\` function to return the unique identifier for a node or relationship as \`_id\`.
  * For movies, use the tmdbId property to return a source URL.
    For example: \`'https://www.themoviedb.org/movie/'+ m.tmdbId AS source\`.
  * For movie titles that begin with "The", move "the" to the end.
    For example "The 39 Steps" becomes "39 Steps, The" or "the matrix" becomes "Matrix, The".
  * For the role a person played in a movie, use the role property on the ACTED_IN relationship.
  * Limit the maximum number of results to 10.
  * Respond with only a Cypher statement.  No preamble.

  Respond with a JSON object with "cypher" and "errors" keys.
    * "cypher" - the corrected cypher statement
    * "corrected" - a boolean
    * "errors" - A list of uncorrectable errors.  For example, if a label,
        relationship type or property does not exist in the schema.
        Provide a hint to the correct element where possible.

  Fixable Example #1:
  * cypher:
      MATCH (a:Actor {{name: 'Emil Eifrem'}})-[:ACTED_IN]->(m:Movie)
      RETURN a.name AS Actor, m.title AS Movie, m.tmdbId AS source,
      elementId(m) AS _id, m.released AS ReleaseDate, r.role AS Role LIMIT 10
  * errors: ["Variable \`r\` not defined (line 1, column 172 (offset: 171))"]
  * response:
      MATCH (a:Actor {{name: 'Emil Eifrem'}})-[r:ACTED_IN]->(m:Movie)
      RETURN a.name AS Actor, m.title AS Movie, m.tmdbId AS source,
      elementId(m) AS _id, m.released AS ReleaseDate, r.role AS Role LIMIT 10


  Schema:
  {schema}

  Question:
  {question}

  Cypher Statement:
  {cypher}

  {errors}
`);

Braces in prompts

Use double braces ({{ and }}) to escape braces that are not text placeholders.
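PromptTemplate.fromTemplate() uses f-string style placeholders by default. The simplified formatter below is not LangChain's code. It only illustrates the rule that {name} is a placeholder while {{ and }} produce literal braces. That rule is why the Cypher map {{name: 'Emil Eifrem'}} in the prompt reaches the LLM as {name: 'Emil Eifrem'}. The name formatFString is hypothetical.

```typescript
// Simplified f-string style formatter (illustration only, not LangChain's code).
// "{{" and "}}" become literal braces; "{name}" is replaced from `values`.
function formatFString(template: string, values: Record<string, string>): string {
  return template.replace(
    /\{\{|\}\}|\{(\w+)\}/g,
    (match: string, key: string | undefined) => {
      if (match === "{{") return "{";
      if (match === "}}") return "}";
      // Only word-character placeholder names are recognised in this sketch.
      if (key === undefined || !Object.prototype.hasOwnProperty.call(values, key)) {
        throw new Error(`Missing value for placeholder ${match}`);
      }
      return values[key];
    }
  );
}
```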

Return a Runnable Sequence

Use the RunnableSequence.from() method to create a new chain.

typescript
Return a RunnableSequence
return RunnableSequence.from([
  // ...
]);
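A sequence runs its steps in order and passes each step's output to the next step as input. The helper below is a hypothetical, dependency-free illustration of that idea only. The real RunnableSequence also supports features such as streaming and batching, which this sketch leaves out. The name runSequence is an assumption.

```typescript
// Each step receives the previous step's output; it may be sync or async.
type Step = (input: unknown) => unknown;

// Hypothetical helper illustrating the idea behind RunnableSequence.from():
// run the steps in order and pass each result to the next step.
async function runSequence(steps: Step[], input: unknown): Promise<unknown> {
  let value = input;
  for (const step of steps) {
    value = await step(value); // await works for plain values and promises alike
  }
  return value;
}
```

For example, runSequence([(x) => (x as number) + 1, async (x) => (x as number) * 2], 3) resolves to 8.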

Initial Inputs

The chain is designed to be called recursively. Its output, as described in the prompt, includes an array of errors, and that array is passed back in as input on the next pass.

The prompt will need these in string format, so as the first step, use the RunnablePassthrough.assign() method to convert the array of errors into a single string.

typescript
return RunnableSequence.from<
  CypherEvaluationChainInput,
  CypherEvaluationChainOutput
>([
  RunnablePassthrough.assign({
    // Convert errors into an LLM-friendly list
    errors: ({ errors }) => {
      if (
        errors === undefined ||
        (Array.isArray(errors) && errors.length === 0)
      ) {
        return "";
      }

      return `Errors: * ${
        Array.isArray(errors) ? errors?.join("\n* ") : errors
      }`;
    },
  }),
  // ...
]);
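To check the conversion in isolation, you can pull the same logic out into a plain function. The sketch below also shows, in simplified form, the effect of RunnablePassthrough.assign(): every input key is kept and errors is replaced with the computed string. The names formatErrors, assignErrors and EvaluationInput are hypothetical and used only here.

```typescript
// Assumed input shape, based on the keys the prompt expects.
interface EvaluationInput {
  question: string;
  schema: string;
  cypher: string;
  errors?: string[] | string;
}

// Same logic as the errors function in the chain, extracted for testing.
function formatErrors(errors?: string[] | string): string {
  if (errors === undefined || (Array.isArray(errors) && errors.length === 0)) {
    return "";
  }
  return `Errors: * ${Array.isArray(errors) ? errors.join("\n* ") : errors}`;
}

// Rough equivalent of RunnablePassthrough.assign({ errors: ... }):
// keep every input key and overwrite "errors" with the formatted string.
function assignErrors(
  input: EvaluationInput
): Omit<EvaluationInput, "errors"> & { errors: string } {
  return { ...input, errors: formatErrors(input.errors) };
}
```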

Format Prompt and Process

Now that you have the inputs that the prompt expects, update the chain to format the prompt, pass it to the LLM to process and parse the output.

typescript
return RunnableSequence.from<
  CypherEvaluationChainInput,
  CypherEvaluationChainOutput
>([
  RunnablePassthrough.assign({
    // Convert errors into an LLM-friendly list
    errors: ({ errors }) => {
      if (
        errors === undefined ||
        (Array.isArray(errors) && errors.length === 0)
      ) {
        return "";
      }

      return `Errors: * ${
        Array.isArray(errors) ? errors?.join("\n* ") : errors
      }`;
    },
  }),
  prompt,
  llm,
  new JsonOutputParser<CypherEvaluationChainOutput>(),
]);

Completed Sequence

If you have followed the steps correctly, your code should resemble the following:

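typescript
import { BaseLanguageModel } from "@langchain/core/language_models/base";
import { JsonOutputParser } from "@langchain/core/output_parsers";
import { PromptTemplate } from "@langchain/core/prompts";
import {
  RunnablePassthrough,
  RunnableSequence,
} from "@langchain/core/runnables";

// CypherEvaluationChainInput and CypherEvaluationChainOutput are assumed to be
// declared in the starter file.
export default async function initCypherEvaluationChain(
  llm: BaseLanguageModel
) {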
  // Prompt template
  const prompt = PromptTemplate.fromTemplate(`
    You are an expert Neo4j Developer evaluating a Cypher statement written by an AI.

    Check the cypher statement provided below against the database schema to make sure that
    the statement will answer the user's question.
    Fix any errors where possible.

    The query must:
    * Only use the nodes, relationships and properties mentioned in the schema.
    * Assign a variable to nodes or relationships when intending to access their properties.
    * Use \`IS NOT NULL\` to check for property existence.
    * Use the \`elementId()\` function to return the unique identifier for a node or relationship as \`_id\`.
    * For movies, use the tmdbId property to return a source URL.
      For example: \`'https://www.themoviedb.org/movie/'+ m.tmdbId AS source\`.
    * For movie titles that begin with "The", move "the" to the end.
      For example "The 39 Steps" becomes "39 Steps, The" or "the matrix" becomes "Matrix, The".
    * For the role a person played in a movie, use the role property on the ACTED_IN relationship.
    * Limit the maximum number of results to 10.
    * Respond with only a Cypher statement.  No preamble.

    Respond with a JSON object with "cypher" and "errors" keys.
      * "cypher" - the corrected cypher statement
      * "corrected" - a boolean
      * "errors" - A list of uncorrectable errors.  For example, if a label,
          relationship type or property does not exist in the schema.
          Provide a hint to the correct element where possible.

    Fixable Example #1:
    * cypher:
        MATCH (a:Actor {{name: 'Emil Eifrem'}})-[:ACTED_IN]->(m:Movie)
        RETURN a.name AS Actor, m.title AS Movie, m.tmdbId AS source,
        elementId(m) AS _id, m.released AS ReleaseDate, r.role AS Role LIMIT 10
    * errors: ["Variable \`r\` not defined (line 1, column 172 (offset: 171))"]
    * response:
        MATCH (a:Actor {{name: 'Emil Eifrem'}})-[r:ACTED_IN]->(m:Movie)
        RETURN a.name AS Actor, m.title AS Movie, m.tmdbId AS source,
        elementId(m) AS _id, m.released AS ReleaseDate, r.role AS Role LIMIT 10


    Schema:
    {schema}

    Question:
    {question}

    Cypher Statement:
    {cypher}

    {errors}
  `);

  return RunnableSequence.from<
    CypherEvaluationChainInput,
    CypherEvaluationChainOutput
  >([
    RunnablePassthrough.assign({
      // Convert errors into an LLM-friendly list
      errors: ({ errors }) => {
        if (
          errors === undefined ||
          (Array.isArray(errors) && errors.length === 0)
        ) {
          return "";
        }

        return `Errors: * ${
          Array.isArray(errors) ? errors?.join("\n* ") : errors
        }`;
      },
    }),
    prompt,
    llm,
    new JsonOutputParser<CypherEvaluationChainOutput>(),
  ]);
}
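The completed code refers to CypherEvaluationChainInput and CypherEvaluationChainOutput but does not show their definitions. Based on how the chain and its unit test use them, they may look roughly like the types below. Treat these as assumptions and check the actual declarations in your starter file. The example values are illustrative, not real chain output.

```typescript
// Assumed input shape, inferred from the unit test and from the errors
// formatter, which handles an array, a single string, or undefined.
type CypherEvaluationChainInput = {
  question: string;
  schema: string;
  cypher: string;
  errors?: string[] | string;
};

// Assumed output shape: the "cypher" and "errors" keys requested by the prompt.
// The prompt also lists a "corrected" boolean, so it is optional here.
type CypherEvaluationChainOutput = {
  cypher: string;
  corrected?: boolean;
  errors: string[];
};

// Illustrative values that type-check against the assumed shapes.
const exampleInput: CypherEvaluationChainInput = {
  question: "How many movies are in the database?",
  schema: "<output of graph.getSchema()>",
  cypher: "MATCH (m:Muvee) RETURN count(m) AS count",
  errors: ["Label Muvee does not exist"],
};

const exampleOutput: CypherEvaluationChainOutput = {
  cypher: "MATCH (m:Movie) RETURN count(m) AS count",
  corrected: true,
  errors: [],
};
```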

Testing your changes

If you have followed the instructions, you should be able to verify the chain by running the following unit test with the npm run test command.

sh
Running the Test
npm run test cypher-evaluation.chain.test.ts
typescript
cypher-evaluation.chain.test.ts
import { ChatOpenAI } from "@langchain/openai";
import { config } from "dotenv";
import { BaseChatModel } from "@langchain/core/language_models/chat_models";
import { RunnableSequence } from "@langchain/core/runnables";
import { Neo4jGraph } from "@langchain/community/graphs/neo4j_graph";
import initCypherEvaluationChain from "./cypher-evaluation.chain";

describe("Cypher Evaluation Chain", () => {
  let graph: Neo4jGraph;
  let llm: BaseChatModel;
  let chain: RunnableSequence;

  beforeAll(async () => {
    config({ path: ".env.local" });

    graph = await Neo4jGraph.initialize({
      url: process.env.NEO4J_URI as string,
      username: process.env.NEO4J_USERNAME as string,
      password: process.env.NEO4J_PASSWORD as string,
      database: process.env.NEO4J_DATABASE as string | undefined,
    });

    llm = new ChatOpenAI({
      openAIApiKey: process.env.OPENAI_API_KEY,
      modelName: "gpt-3.5-turbo",
      temperature: 0,
      configuration: {
        baseURL: process.env.OPENAI_API_BASE,
      },
    });

    chain = await initCypherEvaluationChain(llm);
  });

  afterAll(async () => {
    await graph.close();
  });

  it("should fix a non-existent label", async () => {
    const input = {
      question: "How many movies are in the database?",
      cypher: "MATCH (m:Muvee) RETURN count(m) AS count",
      schema: graph.getSchema(),
      errors: ["Label Muvee does not exist"],
    };

    const { cypher, errors } = await chain.invoke(input);

    expect(cypher).toContain("MATCH (m:Movie) RETURN count(m) AS count");

    expect(errors.length).toBe(1);

    let found = false;

    for (const error of errors) {
      if (error.toLowerCase().includes("label muvee does not exist")) {
        found = true;
      }
    }

    expect(found).toBe(true);
  });

  it("should fix a non-existent relationship", async () => {
    const input = {
      question: "Who acted in the matrix?",
      cypher:
        'MATCH (m:Muvee)-[:ACTS_IN]->(a:Person) WHERE m.name = "The Matrix" RETURN a.name AS actor',
      schema: graph.getSchema(),
      errors: [
        "Label Muvee does not exist",
        "Relationship type ACTS_IN does not exist",
      ],
    };

    const { cypher, errors } = await chain.invoke(input);

    expect(cypher).toContain("MATCH (m:Movie");
    expect(cypher).toContain(":ACTED_IN");

    expect(errors.length).toBeGreaterThanOrEqual(2);

    let found = false;

    for (const error of errors) {
      if (error.includes("ACTS_IN")) {
        found = true;
      }
    }

    expect(found).toBe(true);
  });

  it("should return no errors if the query is fine", async () => {
    const cypher = "MATCH (m:Movie) RETURN count(m) AS count";
    const input = {
      question: "How many movies are in the database?",
      cypher,
      schema: graph.getSchema(),
      errors: ["Label Muvee does not exist"],
    };

    const { cypher: updatedCypher, errors } = await chain.invoke(input);

    expect(updatedCypher).toContain(cypher);
    expect(errors.length).toBe(0);
  });

  it("should keep variables in relationship", async () => {
    const cypher =
      "MATCH (a:Actor {name: 'Emil Eifrem'})-[r:ACTED_IN]->" +
      "(m:Movie {title: 'Neo4j - Into the Graph'}) RETURN r.role AS Role";
    const input = {
      question: "What role did Emil Eifrem play in Neo4j - Into the Graph",
      cypher,
      schema: graph.getSchema(),
      errors: [],
    };

    const { cypher: updatedCypher, errors } = await chain.invoke(input);

    expect(updatedCypher).toContain(cypher);
    expect(errors.length).toBe(0);
  });
});

It works!

If all the tests have passed, you will have a chain that evaluates a Cypher statement and provides hints if any errors are detected.


Summary

In this lesson, you built a chain that evaluates the Cypher statement generated in the Cypher Generation chain.

In the next lesson, you will create a chain that will generate an authoritative answer to a question based on the context provided.