Cypher Retrieval Chain

Now you have all the components needed to retrieve data from Neo4j with Cypher based on a user input. It is time to combine them.

To complete this challenge, you must create a Runnable instance that:

  1. Generates and evaluates a Cypher statement

  2. Uses the Cypher statement to retrieve data from the database

  3. Extracts the element IDs and converts the results to a string for use in the context prompt

  4. Generates an answer using the answer generation chain

  5. Saves the response to the database along with the Cypher statement

  6. Returns the LLM response

Open modules/agent/tools/cypher/cypher-retrieval.chain.ts

Cypher Generation and Evaluation

To generate and evaluate a new Cypher statement, you’ll need a function that generates a statement and then passes it through the evaluation chain.

The modules/agent/tools/cypher/cypher-retrieval.chain.ts file already has a placeholder function called recursivelyEvaluate() to perform this task.

typescript
/**
 * Use the database schema to generate and subsequently validate
 * a Cypher statement based on the user question
 *
 * @param {Neo4jGraph}        graph     The graph
 * @param {BaseLanguageModel} llm       An LLM to generate the Cypher
 * @param {string}            question  The rephrased question
 * @returns {string}
 */
export async function recursivelyEvaluate(
  graph: Neo4jGraph,
  llm: BaseLanguageModel,
  question: string
): Promise<string> {
  // TODO: Create Cypher Generation Chain
  // const generationChain = ...
  // TODO: Create Cypher Evaluation Chain
  // const evaluatorChain = ...
  // TODO: Generate Initial cypher
  // let cypher = ...
  // TODO: Recursively evaluate the cypher until there are no errors
  // Bug fix: GPT-4 is adamant that it should use id() regardless of
  // the instructions in the prompt.  As a quick fix, replace it here
  // cypher = cypher.replace(/\sid\(([^)]+)\)/g, " elementId($1)");
  // return cypher;
}

In this function, first use the initCypherGenerationChain() function from the Cypher Generation Chain lesson and the initCypherEvaluationChain() function from the Cypher Evaluation Chain lesson to create the generation and evaluation chains.

typescript
Cypher Chains
// Initiate chains
const generationChain = await initCypherGenerationChain(graph, llm);
const evaluatorChain = await initCypherEvaluationChain(llm);

Next, invoke the generationChain to generate an initial Cypher statement.

typescript
Generate Initial Cypher
// Generate Initial Cypher
let cypher = await generationChain.invoke(question);

Now, use a while loop to re-evaluate the Cypher statement up to five times, stopping as soon as the evaluation chain returns no errors.

typescript
Evaluate Cypher
let errors = ["N/A"];
let tries = 0;

while (tries < 5 && errors.length > 0) {
  tries++;

  try {
    // Evaluate Cypher
    const evaluation = await evaluatorChain.invoke({
      question,
      schema: graph.getSchema(),
      cypher,
      errors,
    });

    errors = evaluation.errors;
    cypher = evaluation.cypher;
  } catch (e: unknown) {}
}

Finally, return the cypher statement.

typescript
Return Cypher
// Bug fix: GPT-4 is adamant that it should use id() regardless of
// the instructions in the prompt.  As a quick fix, replace it here
cypher = cypher.replace(/\sid\(([^)]+)\)/g, " elementId($1)");

return cypher;

id() to elementId() replacement

The first line of this code converts id({variable}) to elementId({variable}). No matter what we try in the prompt, the GPT-3.5 Turbo and GPT-4 models prefer the deprecated id() function over elementId().

Eventually, the models may learn that id() is deprecated. Until then, this problem suggests that training a model specifically to generate valid Cypher statements might be necessary.
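
To make the effect of the replacement concrete, here is a minimal sketch that applies the same regular expression to a few hand-written statements. The replaceIdCalls() helper and the sample statements are illustrative assumptions, not part of the course code.

```typescript
// Same pattern as in recursivelyEvaluate(): one whitespace character, then id(...)
const ID_CALL = /\sid\(([^)]+)\)/g;

// Hypothetical helper wrapping the replace call for reuse in this sketch
function replaceIdCalls(cypher: string): string {
  return cypher.replace(ID_CALL, " elementId($1)");
}

// A deprecated call preceded by whitespace is rewritten
const rewritten = replaceIdCalls("MATCH (m:Movie) RETURN id(m) AS _id");
// rewritten === "MATCH (m:Movie) RETURN elementId(m) AS _id"

// A statement that already uses elementId() is left unchanged
const alreadyCorrect = replaceIdCalls(
  "MATCH (m:Movie) RETURN elementId(m) AS _id"
);

// Not matched: here "id(" follows "(" rather than whitespace
const missed = replaceIdCalls("MATCH (m:Movie) RETURN collect(id(m)) AS ids");
// missed === "MATCH (m:Movie) RETURN collect(id(m)) AS ids"
```

Because the pattern requires whitespace before id(, a call nested directly inside another function is not rewritten, so the quick fix reduces the problem rather than removing it.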

View full recursivelyEvaluate function
typescript
/**
 * Use the database schema to generate and subsequently validate
 * a Cypher statement based on the user question
 *
 * @param {Neo4jGraph}        graph     The graph
 * @param {BaseLanguageModel} llm       An LLM to generate the Cypher
 * @param {string}            question  The rephrased question
 * @returns {string}
 */
export async function recursivelyEvaluate(
  graph: Neo4jGraph,
  llm: BaseLanguageModel,
  question: string
): Promise<string> {
  // Initiate chains
  const generationChain = await initCypherGenerationChain(graph, llm);
  const evaluatorChain = await initCypherEvaluationChain(llm);

  // Generate Initial Cypher
  let cypher = await generationChain.invoke(question);

  let errors = ["N/A"];
  let tries = 0;

  while (tries < 5 && errors.length > 0) {
    tries++;

    try {
      // Evaluate Cypher
      const evaluation = await evaluatorChain.invoke({
        question,
        schema: graph.getSchema(),
        cypher,
        errors,
      });

      errors = evaluation.errors;
      cypher = evaluation.cypher;
    } catch (e: unknown) {}
  }

  // Bug fix: GPT-4 is adamant that it should use id() regardless of
  // the instructions in the prompt.  As a quick fix, replace it here
  cypher = cypher.replace(/\sid\(([^)]+)\)/g, " elementId($1)");

  return cypher;
}
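
The termination behavior of the loop can be tried out without an LLM. The sketch below reimplements only the loop, with a stubbed evaluator in place of evaluatorChain.invoke(). The names evaluateUntilClean(), stubEvaluator, and alwaysFailing, and the error messages, are assumptions made for illustration.

```typescript
// The two fields the loop reads from the evaluation output
interface Evaluation {
  cypher: string;
  errors: string[];
}

// Hypothetical stand-in for evaluatorChain.invoke(); no LLM is called
type Evaluator = (input: {
  cypher: string;
  errors: string[];
}) => Promise<Evaluation>;

async function evaluateUntilClean(
  evaluate: Evaluator,
  initialCypher: string,
  maxTries = 5
): Promise<{ cypher: string; tries: number; errors: string[] }> {
  let cypher = initialCypher;
  let errors = ["N/A"]; // non-empty, so the loop body runs at least once
  let tries = 0;

  while (tries < maxTries && errors.length > 0) {
    tries++;
    try {
      const evaluation = await evaluate({ cypher, errors });
      errors = evaluation.errors;
      cypher = evaluation.cypher;
    } catch {
      // A failed evaluation keeps the previous cypher and errors
      // and still uses up one try
    }
  }

  return { cypher, tries, errors };
}

// Stub: reports an error until the relationship variable is bound
const stubEvaluator: Evaluator = async ({ cypher }) =>
  cypher.includes("[r:ACTED_IN]")
    ? { cypher, errors: [] }
    : {
        cypher: cypher.replace("[:ACTED_IN]", "[r:ACTED_IN]"),
        errors: ["Variable `r` not defined"],
      };

// Stub that never reports success, to show the cap on tries
const alwaysFailing: Evaluator = async ({ cypher }) => ({
  cypher,
  errors: ["still broken"],
});
```

With stubEvaluator, a statement that reads r.role without binding r is corrected on the first pass and confirmed on the second, so the loop stops after two tries. With alwaysFailing, the loop gives up after five tries and returns the last statement it was given, errors included.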

Handling errors

The LLM will generate a correct Cypher statement most of the time. But, as we’ve found in testing, depending on the instructions provided to the prompt, the loop of Cypher generation and evaluation can be flaky.

You can execute your Cypher statement with an additional evaluation loop to make the application more robust. If the database throws an error, you can analyze the error message using the same evaluation chain and rewrite the statement accordingly.

Find the getResults() function in modules/agent/tools/cypher/cypher-retrieval.chain.ts.

typescript
/**
 * Attempt to get the results, and if there is a syntax error in the Cypher statement,
 * attempt to correct the errors.
 *
 * @param {Neo4jGraph}        graph  The graph instance to get the results from
 * @param {BaseLanguageModel} llm    The LLM to evaluate the Cypher statement if anything goes wrong
 * @param {string}            input  The input built up by the Cypher Retrieval Chain
 * @returns {Promise<Record<string, any>[]>}
 */
export async function getResults(
  graph: Neo4jGraph,
  llm: BaseLanguageModel,
  input: { question: string; cypher: string }
): Promise<any | undefined> {
  // TODO: catch Cypher errors and pass to the Cypher evaluation chain
}

Replace the // TODO comment with code that will attempt to execute the Cypher statement and retry if the graph.query() method throws an error.

Start by defining a results variable and a retries variable to count the number of failed attempts. Define a mutable cypher variable to hold the Cypher statement. Then, call the initCypherEvaluationChain() function to create an instance of the evaluation chain.

typescript
let results;
let retries = 0;
let cypher = input.cypher;

// Evaluation chain if an error is thrown by Neo4j
const evaluationChain = await initCypherEvaluationChain(llm);

Next, create a while loop that will iterate a maximum of five times. Inside the loop, use try/catch to attempt to execute the Cypher statement.

If an error is thrown, pass the .message property along with the Cypher statement, question, and schema to the evaluation chain.

Assign the cypher property of the evaluation chain’s output to the cypher variable so that the next iteration runs the corrected statement.

typescript
  while (results === undefined && retries < 5) {
    try {
      results = await graph.query(cypher);
      return results;
    } catch (e: any) {
      retries++;

      const evaluation = await evaluationChain.invoke({
        cypher,
        question: input.question,
        schema: graph.getSchema(),
        errors: [e.message],
      });

      cypher = evaluation.cypher;
    }
  }

  return results;

Finally, return the results.
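
The retry behavior described above can be exercised with fakes instead of a live database and LLM. This is a sketch under stated assumptions: QueryRunner, Fixer, queryWithRetries(), fakeGraph, and fakeFix are hypothetical names, and the fake database only imitates one kind of Neo4j error.

```typescript
// Hypothetical minimal shapes; the real Neo4jGraph and evaluation chain are not used
interface QueryRunner {
  query(cypher: string): Promise<Record<string, unknown>[]>;
}
type Fixer = (cypher: string, errorMessage: string) => Promise<string>;

async function queryWithRetries(
  graph: QueryRunner,
  fix: Fixer,
  initialCypher: string,
  maxRetries = 5
): Promise<{
  results: Record<string, unknown>[] | undefined;
  attempts: string[];
}> {
  let cypher = initialCypher;
  let retries = 0;
  const attempts: string[] = []; // every statement sent to the database

  while (retries < maxRetries) {
    attempts.push(cypher);
    try {
      return { results: await graph.query(cypher), attempts };
    } catch (e: unknown) {
      retries++;
      const message = e instanceof Error ? e.message : String(e);
      // Ask the (fake) evaluator for a corrected statement
      cypher = await fix(cypher, message);
    }
  }

  // Every attempt failed
  return { results: undefined, attempts };
}

// Fake database: rejects statements that read r.role without binding r
const fakeGraph: QueryRunner = {
  async query(cypher) {
    if (cypher.includes("r.role") && !cypher.includes("[r:")) {
      throw new Error("Variable `r` not defined");
    }
    return [{ Role: "The Chief" }];
  },
};

// Fake evaluator: binds the relationship variable when told it is missing
const fakeFix: Fixer = async (cypher, errorMessage) =>
  errorMessage.includes("`r`")
    ? cypher.replace("[:ACTED_IN]", "[r:ACTED_IN]")
    : cypher;
```

Calling queryWithRetries(fakeGraph, fakeFix, ...) with a statement that uses r.role but never binds r fails once, is rewritten by the fake evaluator, and succeeds on the second attempt.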

Building the Chain

This section will take place in the initCypherRetrievalChain() function.

typescript
export default async function initCypherRetrievalChain(
  llm: BaseLanguageModel,
  graph: Neo4jGraph
) {
  // TODO: initiate answer chain
  // const answerGeneration = ...
  // TODO: return RunnablePassthrough
}

Since an agent will call this chain, it will receive a structured input containing both an input and a rephrasedQuestion.

typescript
Agent to Tool Input
export interface AgentToolInput {
  input: string;
  rephrasedQuestion: string;
}

Initialize Chains

You must use the Generate Authoritative Answer Chain from the previous lesson to generate an answer. Use the initGenerateAuthoritativeAnswerChain() function to create it.

typescript
Generate Answer Chain
const answerGeneration = await initGenerateAuthoritativeAnswerChain(llm);

Generate a Cypher Statement

Now, define the output. As with the Vector Retrieval tool, you can return a Runnable using RunnablePassthrough.assign().

The first step is to call the recursivelyEvaluate() function, assigning the output to the cypher key.

typescript
Generate Initial Cypher
return (
  RunnablePassthrough
    // Generate and evaluate the Cypher statement
    .assign({
      cypher: (input: { rephrasedQuestion: string }) =>
        recursivelyEvaluate(graph, llm, input.rephrasedQuestion),
    })

Get Results

Use the getResults() function to get the results from the database.

typescript
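// Get results from database
.assign({
  // The agent input has no `question` key, so pass rephrasedQuestion as the question
  results: (input: { cypher: string; rephrasedQuestion: string }) =>
    getResults(graph, llm, {
      question: input.rephrasedQuestion,
      cypher: input.cypher,
    }),
})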

Manipulate Results

You will need to extract any element IDs from the results to save the context to the database. The utils.ts file exports an extractIds() function that recursively iterates through the results to find any objects with a key of _id.

View the extractIds() function
typescript
export function extractIds(input: any): string[] {
  let output: string[] = [];

  // Function to handle an object
  const handleObject = (item: any) => {
    for (const key in item) {
      if (key === "_id") {
        if (!output.includes(item[key])) {
          output.push(item[key]);
        }
      } else if (typeof item[key] === "object" && item[key] !== null) {
        // Recurse into the object if it is not null
        output = output.concat(extractIds(item[key]));
      }
    }
  };

  if (Array.isArray(input)) {
    // If the input is an array, iterate over each element
    input.forEach((item) => {
      if (typeof item === "object" && item !== null) {
        handleObject(item);
      }
    });
  } else if (typeof input === "object" && input !== null) {
    // If the input is an object, handle it directly
    handleObject(input);
  }

  return output;
}
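
As an illustration of what extractIds() returns, the sketch below runs it on a hand-written result set. The function body is copied from the listing above so that the snippet runs on its own; the rows and the element ID strings are made up for this example.

```typescript
// Copied from utils.ts (shown above) so that this sketch is self-contained
function extractIds(input: any): string[] {
  let output: string[] = [];

  const handleObject = (item: any) => {
    for (const key in item) {
      if (key === "_id") {
        if (!output.includes(item[key])) {
          output.push(item[key]);
        }
      } else if (typeof item[key] === "object" && item[key] !== null) {
        output = output.concat(extractIds(item[key]));
      }
    }
  };

  if (Array.isArray(input)) {
    input.forEach((item) => {
      if (typeof item === "object" && item !== null) {
        handleObject(item);
      }
    });
  } else if (typeof input === "object" && input !== null) {
    handleObject(input);
  }

  return output;
}

// Made-up rows: the same movie appears in both rows
const rows = [
  {
    movie: { title: "Neo4j - Into the Graph", _id: "4:abc:1" },
    person: { name: "Emil Eifrem", _id: "4:abc:2" },
  },
  {
    movie: { title: "Neo4j - Into the Graph", _id: "4:abc:1" },
    person: { name: "Another Actor", _id: "4:abc:3" },
  },
];

const ids = extractIds(rows);
// ids: ["4:abc:1", "4:abc:2", "4:abc:1", "4:abc:3"]

// Keep only the first occurrence of each ID
const uniqueIds = ids.filter((id, index) => ids.indexOf(id) === index);
// uniqueIds: ["4:abc:1", "4:abc:2", "4:abc:3"]
```

The includes() check only guards against repeats among _id keys at the same level, so IDs found inside nested objects can appear more than once. If duplicates matter to the caller, a filter like the one above removes them.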

The results obtained in the previous step must also be converted to a string. If there is only one result, use JSON.stringify() to convert the first object to a JSON string; otherwise, return a string representing the entire array.

typescript
// Extract information
.assign({
  // Extract _id fields
  ids: (input: Omit<CypherRetrievalThroughput, "ids">) =>
    extractIds(input.results),
  // Convert results to JSON output
  context: ({ results }: Omit<CypherRetrievalThroughput, "ids">) =>
    Array.isArray(results) && results.length == 1
      ? JSON.stringify(results[0])
      : JSON.stringify(results),
})
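
The context conversion can be checked in isolation. In this sketch, toContext() is a hypothetical helper that repeats the expression used above, and the sample rows are invented.

```typescript
// Hypothetical helper repeating the conversion used for the context key
function toContext(results: unknown): string {
  return Array.isArray(results) && results.length === 1
    ? JSON.stringify(results[0])
    : JSON.stringify(results);
}

// A single row is unwrapped from the array
const single = toContext([{ count: 9083 }]);
// single === '{"count":9083}'

// Several rows are kept as a JSON array
const many = toContext([{ title: "A" }, { title: "B" }]);
// many === '[{"title":"A"},{"title":"B"}]'

// No rows still produces a string
const none = toContext([]);
// none === '[]'
```

Unwrapping the single-row case keeps the context prompt shorter for aggregate queries such as counts. One edge case to keep in mind: if getResults() runs out of retries and returns undefined, JSON.stringify(undefined) evaluates to undefined rather than a string.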

Generate Output

The input and context can then be passed to the Authoritative Answer Generation chain to generate an answer.

typescript
// Generate Output
.assign({
  output: (input: CypherRetrievalThroughput) =>
    answerGeneration.invoke({
      question: input.rephrasedQuestion,
      context: input.context,
    }),
})

Save response to database

Next, use the saveHistory() function built in module 3 to save the details of the response to the database.

typescript
// Save response to database
.assign({
  responseId: async (input: CypherRetrievalThroughput, options) => {
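    // Return the promise so the step waits for the save to finish
    // and responseId receives its result
    return saveHistory(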
      options?.config.configurable.sessionId,
      "cypher",
      input.input,
      input.rephrasedQuestion,
      input.output,
      input.ids,
      input.cypher
    );
  },
})

Return the output

Finally, the pick() function returns the output key.

typescript
    // Return the output
    .pick("output")
);
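
How the keys accumulate across these steps can be sketched with a simplified, synchronous stand-in for RunnablePassthrough.assign() and pick(). The assign() and pick() helpers below are hypothetical and are not LangChain code, and the Cypher statement, results, and output are hard-coded placeholders rather than LLM or database output.

```typescript
type State = Record<string, unknown>;
type Mappers = Record<string, (state: State) => unknown>;

// Stand-in for assign(): run each mapper on the current state, then merge
// the new keys in. Every mapper in one call sees the state from before the call.
function assign(state: State, mappers: Mappers): State {
  const added: State = {};
  for (const [key, mapper] of Object.entries(mappers)) {
    added[key] = mapper(state);
  }
  return { ...state, ...added };
}

// Stand-in for pick(): keep a single key's value
function pick(state: State, key: string): unknown {
  return state[key];
}

// The structured input sent by the agent
let state: State = {
  input: "how many are there?",
  rephrasedQuestion: "How many Movies are in the database?",
};

// Placeholder values replace the calls made by the real chain
state = assign(state, {
  cypher: () => "MATCH (m:Movie) RETURN count(m) AS count",
});
state = assign(state, { results: () => [{ count: 3 }] });
state = assign(state, {
  ids: () => [],
  context: (s) => JSON.stringify((s.results as unknown[])[0]),
});
state = assign(state, { output: (s) => "Context: " + String(s.context) });

const keysBeforePick = Object.keys(state);
// input, rephrasedQuestion, cypher, results, ids, context, output

const finalOutput = pick(state, "output");
// 'Context: {"count":3}'
```

Each step only adds keys, so later steps can read everything produced earlier, and only the final pick() narrows the result to the answer. This is also why ids and context can share one assign() call: both depend on results but not on each other.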

Final Function

If you have followed the instructions correctly, your code should resemble the following:

typescript
Full Function
export default async function initCypherRetrievalChain(
  llm: BaseLanguageModel,
  graph: Neo4jGraph
) {
  const answerGeneration = await initGenerateAuthoritativeAnswerChain(llm);

  return (
    RunnablePassthrough
      // Generate and evaluate the Cypher statement
      .assign({
        cypher: (input: { rephrasedQuestion: string }) =>
          recursivelyEvaluate(graph, llm, input.rephrasedQuestion),
      })

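      // Get results from database
      .assign({
        // The agent input has no `question` key, so pass rephrasedQuestion as the question
        results: (input: { cypher: string; rephrasedQuestion: string }) =>
          getResults(graph, llm, {
            question: input.rephrasedQuestion,
            cypher: input.cypher,
          }),
      })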

      // Extract information
      .assign({
        // Extract _id fields
        ids: (input: Omit<CypherRetrievalThroughput, "ids">) =>
          extractIds(input.results),
        // Convert results to JSON output
        context: ({ results }: Omit<CypherRetrievalThroughput, "ids">) =>
          Array.isArray(results) && results.length == 1
            ? JSON.stringify(results[0])
            : JSON.stringify(results),
      })

      // Generate Output
      .assign({
        output: (input: CypherRetrievalThroughput) =>
          answerGeneration.invoke({
            question: input.rephrasedQuestion,
            context: input.context,
          }),
      })

      // Save response to database
      .assign({
        responseId: async (input: CypherRetrievalThroughput, options) => {
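          // Return the promise so the step waits for the save to finish
          // and responseId receives its result
          return saveHistory(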
            options?.config.configurable.sessionId,
            "cypher",
            input.input,
            input.rephrasedQuestion,
            input.output,
            input.ids,
            input.cypher
          );
        },
      })
      // Return the output
      .pick("output")
  );
}

Testing your changes

If you have followed the instructions, you should be able to run the following unit test to verify the response using the npm run test command.

sh
Running the Test
npm run test cypher-retrieval.chain.test.ts

View Unit Test
typescript
cypher-retrieval.chain.test.ts
// TODO: Remove code
import { ChatOpenAI } from "@langchain/openai";
import { config } from "dotenv";
import { BaseChatModel } from "langchain/chat_models/base";
import { Runnable } from "@langchain/core/runnables";
import { Neo4jGraph } from "@langchain/community/graphs/neo4j_graph";
import initCypherRetrievalChain, {
  recursivelyEvaluate,
  getResults,
} from "./cypher-retrieval.chain";
import { close } from "../../../graph";

describe("Cypher QA Chain", () => {
  let graph: Neo4jGraph;
  let llm: BaseChatModel;
  let chain: Runnable;

  beforeAll(async () => {
    config({ path: ".env.local" });

    graph = await Neo4jGraph.initialize({
      url: process.env.NEO4J_URI as string,
      username: process.env.NEO4J_USERNAME as string,
      password: process.env.NEO4J_PASSWORD as string,
      database: process.env.NEO4J_DATABASE as string | undefined,
    });

    llm = new ChatOpenAI({
      openAIApiKey: process.env.OPENAI_API_KEY,
      modelName: "gpt-3.5-turbo",
      temperature: 0,
      configuration: {
        baseURL: process.env.OPENAI_API_BASE,
      },
    });

    chain = await initCypherRetrievalChain(llm, graph);
  });

  afterAll(async () => {
    await graph.close();
    await close();
  });

  it("should answer a simple question", async () => {
    const sessionId = "cypher-retrieval-1";

    const res = (await graph.query(
      `MATCH (n:Movie) RETURN count(n) AS count`
    )) as { count: number }[];

    expect(res).toBeDefined();

    const output = await chain.invoke(
      {
        input: "how many are there?",
        rephrasedQuestion: "How many Movies are in the database?",
      },
      { configurable: { sessionId } }
    );

    expect(output).toContain(String(res[0].count));
  });

  it("should answer a random question", async () => {
    const sessionId = "cypher-retrieval-2";

    const person = "Emil Eifrem";
    const role = "The Chief";
    const movie = "Neo4j - Into the Graph";

    // Save a fake movie to the database
    await graph.query(
      `
        MERGE (m:Movie {title: $movie})
        MERGE (p:Person {name: $person}) SET p:Actor
        MERGE (p)-[r:ACTED_IN]->(m)
        SET r.role = $role, r.roles = $role
        RETURN
          m { .title, _id: elementId(m) } AS movie,
          p { .name, _id: elementId(p) } AS person
      `,
      { movie, person, role }
    );

    const input = "what did they play?";
    const rephrasedQuestion = `What role did ${person} play in ${movie}`;

    const output = await chain.invoke(
      {
        input,
        rephrasedQuestion,
      },
      { configurable: { sessionId } }
    );

    expect(output).toContain(role);

    // Check persistence
    const contextRes = await graph.query(
      `
      MATCH (s:Session {id: $sessionId})-[:LAST_RESPONSE]->(r)
      RETURN
        r.input AS input,
        r.rephrasedQuestion as rephrasedQuestion,
        r.output AS output,
        [ (r)-[:CONTEXT]->(c) | elementId(c) ] AS ids
    `,
      { sessionId }
    );

    expect(contextRes).toBeDefined();
    if (contextRes) {
      const [first] = contextRes;
      expect(contextRes.length).toBe(1);

      expect(first.input).toEqual(input);
      expect(first.rephrasedQuestion).toEqual(rephrasedQuestion);
      expect(first.output).toEqual(output);
    }
  });

  it("should use elementId() to return a node ID", async () => {
    const sessionId = "cypher-retrieval-3";
    const person = "Emil Eifrem";
    const role = "The Chief";
    const movie = "Neo4j - Into the Graph";

    // Save a fake movie to the database
    const seed = await graph.query(
      `
        MERGE (m:Movie {title: $movie})
        MERGE (p:Person {name: $person}) SET p:Actor
        MERGE (p)-[r:ACTED_IN]->(m)
        SET r.role = $role, r.roles = $role
        RETURN
          m { .title, _id: elementId(m) } AS movie,
          p { .name, _id: elementId(p) } AS person
      `,
      { movie, person, role }
    );

    const output = await chain.invoke(
      {
        input: "what did they play?",
        rephrasedQuestion: `What movies has ${person} acted in?`,
      },
      { configurable: { sessionId } }
    );
    expect(output).toContain(person);
    expect(output).toContain(movie);

    // check context
    const contextRes = await graph.query(
      `
      MATCH (s:Session {id: $sessionId})-[:LAST_RESPONSE]->(r)
      RETURN
        r.input AS input,
        r.rephrasedQuestion as rephrasedQuestion,
        r.output AS output,
        [ (r)-[:CONTEXT]->(c) | elementId(c) ] AS ids
    `,
      { sessionId }
    );

    expect(contextRes).toBeDefined();
    if (contextRes) {
      expect(contextRes.length).toBe(1);

      const contextIds = contextRes[0].ids.join(",");
      const seedIds = seed?.map((el) => el.movie._id) ?? [];

      // for...of iterates the ID values (for...in would yield array indexes)
      for (const id of seedIds) {
        expect(contextIds).toContain(id);
      }
    }
  });

  describe("recursivelyEvaluate", () => {
    it("should correct a query with a missing variable", async () => {
      const res = await recursivelyEvaluate(
        graph,
        llm,
        "What movies has Emil Eifrem acted in?"
      );

      expect(res).toBeDefined();
    });
  });

  describe("getResults", () => {
    it("should fix a broken Cypher statement on the fly", async () => {
      const res = await getResults(graph, llm, {
        question: "What role did Emil Eifrem play in Neo4j - Into the Graph?",
        cypher:
          "MATCH (a:Actor {name: 'Emil Eifrem'})-[:ACTED_IN]->(m:Movie) " +
          "RETURN a.name AS Actor, m.title AS Movie, m.tmdbId AS source, " +
          "elementId(m) AS _id, m.released AS ReleaseDate, r.role AS Role LIMIT 10",
      });

      expect(res).toBeDefined();
      expect(JSON.stringify(res)).toContain("The Chief");
    });
  });
});

Randomized responses

LLMs are probabilistic models, meaning they generate different responses with each call.

Given this variability, not every test may pass on every run. You may need to run the test suite several times before all of the tests pass.

Verifying the Test

If every test in the test suite has passed, a new (:Session) node with a .id property of cypher-retrieval-3 will have been created in your database.

The session should have at least one (:Response) node, linked with a :CONTEXT relationship to a movie with the title Neo4j - Into the Graph.

Click the Check Database button below to verify the tests have succeeded.

Solution

You can compare your code with the solution in src/solutions/modules/agent/tools/cypher/cypher-retrieval.chain.ts and double-check that the conditions have been met in the test suite.

You can also run the following Cypher statement to double-check that the session, its responses, and their context have been created in your database.

cypher
Session, response and context
MATCH (s:Session {id: 'cypher-retrieval-3'})
RETURN s, [
  (s)-[:HAS_RESPONSE]->(r) | [r,
    [ (r) -[:CONTEXT]->(c) | c ]
  ]
]

Once you have verified your code and rerun the tests, click Try again… to complete the challenge.

Summary

In this lesson, you combined the components built during this module to create a chain that will generate a Cypher statement that answers the user’s question, execute the Cypher statement, and generate a response.

In the next module, you will combine this chain with the Vector Retrieval Chain to build an agent that uses an LLM to choose the correct tool to answer the user’s question.