Cypher Generation Chain

The first step in instructing an LLM to retrieve data from a Neo4j database is to generate a Cypher statement.

To complete this challenge, you must modify the initCypherGenerationChain() function in modules/agent/tools/cypher/cypher-generation.chain.ts to return a chain that:

  1. Accepts the rephrased question as a string

  2. Formats a prompt that instructs the LLM to use the schema provided to generate a Cypher statement that retrieves the data needed to answer the question

  3. Passes the formatted prompt to an LLM

  4. Parses the output as a string

Open cypher-generation.chain.ts

Prompt Template

In the initCypherGenerationChain() function, use the PromptTemplate.fromTemplate() method to create a new prompt template with the following prompt.

Prompt
You are a Neo4j Developer translating user questions into Cypher to answer questions
about movies and provide recommendations.
Convert the user's question into a Cypher statement based on the schema.

You must:
* Only use the nodes, relationships and properties mentioned in the schema.
* When required, use `IS NOT NULL` to check for property existence, and not the exists() function.
* Use the `elementId()` function to return the unique identifier for a node or relationship as `_id`.
    For example:
    ```
    MATCH (a:Person)-[r:ACTED_IN]->(m:Movie)
    WHERE a.name = 'Emil Eifrem'
    RETURN m.title AS title, elementId(m) AS _id, r.role AS role
    ```
* Include extra information about the nodes that may help an LLM provide a more informative answer,
    for example the release date, rating or budget.
* For movies, use the tmdbId property to return a source URL.
    For example: `'https://www.themoviedb.org/movie/'+ m.tmdbId AS source`.
* For movie titles that begin with "The", move "the" to the end.
    For example "The 39 Steps" becomes "39 Steps, The" or "the matrix" becomes "Matrix, The".
* Limit the maximum number of results to 10.
* Respond with only a Cypher statement.  No preamble.


Example Question: What role did Tom Hanks play in Toy Story?
Example Cypher:
MATCH (a:Actor {{name: 'Tom Hanks'}})-[rel:ACTED_IN]->(m:Movie {{title: 'Toy Story'}})
RETURN a.name AS Actor, m.title AS Movie, elementId(m) AS _id, rel.role AS RoleInMovie

Schema:
{schema}

Question:
{question}

Remember to escape the back-ticks with backslashes (\) if you define the prompt in a template literal.
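
As a minimal sketch of the two escaping rules at work, the snippet below shows that an escaped back-tick stays a literal back-tick, and that doubled braces come out as single literal braces while {question} is substituted. The renderTemplate() helper is hypothetical and is not part of LangChain; it only imitates the single-brace placeholder convention that the prompt relies on.

```typescript
// Hypothetical stand-in for prompt formatting (illustrative only):
// replaces {name} placeholders and turns doubled braces into literal ones.
function renderTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{|\}\}|\{(\w+)\}/g, (match: string, name: string) => {
    if (match === "{{") return "{";
    if (match === "}}") return "}";
    // Leave unknown placeholders untouched
    return values[name] ?? match;
  });
}

// Back-ticks inside a template literal are escaped with a backslash
const template = `Use the \`elementId()\` function.
MATCH (m:Movie {{title: 'Toy Story'}})
Question: {question}`;

const rendered = renderTemplate(template, { question: "Who acted in Toy Story?" });
console.log(rendered);
```

The Cypher map braces are doubled so that they are not mistaken for placeholders such as {schema} and {question}.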

Specific Instructions

This prompt includes specific instructions that the LLM should follow when writing the Cypher statement.

This technique is known as in-context learning, where an LLM uses instructions in the prompt to adapt its responses to new tasks or questions without needing prior training on specific tasks.

You can learn more in the Providing Specific Instructions lesson in Neo4j & LLM Fundamentals.
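
To make two of these instructions concrete, here is a rough sketch of the transformations that the prompt asks the LLM to apply. The helper names moveLeadingThe() and sourceUrl() are hypothetical and exist only for illustration; in the chain itself, the LLM applies these rules while writing the Cypher statement.

```typescript
// Hypothetical helper: mirrors the instruction to move a leading "The" to the end
function moveLeadingThe(title: string): string {
  return title.trim().replace(
    /^the\s+(.+)$/i,
    // Capitalise the first letter of the remainder, e.g. "matrix" -> "Matrix"
    (_match: string, rest: string) =>
      rest.charAt(0).toUpperCase() + rest.slice(1) + ", The"
  );
}

// Hypothetical helper: mirrors the source URL built from the tmdbId property
function sourceUrl(tmdbId: string): string {
  return "https://www.themoviedb.org/movie/" + tmdbId;
}

console.log(moveLeadingThe("The 39 Steps")); // 39 Steps, The
console.log(moveLeadingThe("the matrix")); // Matrix, The
console.log(moveLeadingThe("Toy Story")); // Toy Story (unchanged)
console.log(sourceUrl("603")); // "603" is a sample value
```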

Your code should resemble the following:

typescript
Prompt Template
// Create Prompt Template
const cypherPrompt = PromptTemplate.fromTemplate(`
  You are a Neo4j Developer translating user questions into Cypher to answer questions
  about movies and provide recommendations.
  Convert the user's question into a Cypher statement based on the schema.

  You must:
  * Only use the nodes, relationships and properties mentioned in the schema.
  * When required, use \`IS NOT NULL\` to check for property existence, and not the exists() function.
  * Use the \`elementId()\` function to return the unique identifier for a node or relationship as \`_id\`.
    For example:
    \`\`\`
    MATCH (a:Person)-[r:ACTED_IN]->(m:Movie)
    WHERE a.name = 'Emil Eifrem'
    RETURN m.title AS title, elementId(m) AS _id, r.role AS role
    \`\`\`
  * Include extra information about the nodes that may help an LLM provide a more informative answer,
    for example the release date, rating or budget.
  * For movies, use the tmdbId property to return a source URL.
    For example: \`'https://www.themoviedb.org/movie/'+ m.tmdbId AS source\`.
  * For movie titles that begin with "The", move "the" to the end.
    For example "The 39 Steps" becomes "39 Steps, The" or "the matrix" becomes "Matrix, The".
  * Limit the maximum number of results to 10.
  * Respond with only a Cypher statement.  No preamble.


  Example Question: What role did Tom Hanks play in Toy Story?
  Example Cypher:
  MATCH (a:Actor {{name: 'Tom Hanks'}})-[rel:ACTED_IN]->(m:Movie {{title: 'Toy Story'}})
  RETURN a.name AS Actor, m.title AS Movie, elementId(m) AS _id, rel.role AS RoleInMovie

  Schema:
  {schema}

  Question:
  {question}
`);

Returning Element IDs

You may have noticed the instruction to use the elementId() function to return the Element ID of any nodes returned.

You will use this value to create :CONTEXT relationships in the database.
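
The unit test for this lesson imports an extractIds() helper from the project's utils module, but its implementation is not shown here. The following is only a sketch of how such a helper might gather every _id value from nested query results; the name collectIds() and its behaviour are assumptions made for illustration.

```typescript
// Hypothetical sketch: recursively gather every `_id` string found in a
// (possibly nested) query result. Not the course's actual implementation.
function collectIds(input: unknown, found: string[] = []): string[] {
  if (Array.isArray(input)) {
    // Visit each element of an array
    for (const item of input) collectIds(item, found);
  } else if (input !== null && typeof input === "object") {
    const record = input as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      const value = record[key];
      if (key === "_id" && typeof value === "string") {
        found.push(value);
      } else {
        // Descend into nested objects and arrays
        collectIds(value, found);
      }
    }
  }
  return found;
}

const ids = collectIds([
  { _id: "1", title: "Toy Story", cast: [{ _id: "2", name: "Tom Hanks" }] },
  { _id: "3" },
]);
console.log(ids);
```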

Return a Runnable Sequence

Use the RunnableSequence.from() method to create a new chain. The chain should pass the prompt to the LLM supplied in the llm parameter, then parse the response as a string using a new StringOutputParser instance.

typescript
// Create the runnable sequence
return RunnableSequence.from<string, string>([
  // ...
]);

Initial Inputs

Inside the array, add an object that sets the question and schema for the chain.

To assign the original input string to the question key, create a new RunnablePassthrough instance. Use the graph.getSchema() method to assign a copy of the database schema to the schema key.

typescript
{
  // Take the input and assign it to the question key
  question: new RunnablePassthrough(),
  // Get the schema
  schema: () => graph.getSchema(),
},

Format Prompt and Process

Now that the prompt inputs are ready, they can be substituted into the prompt, the formatted prompt passed to the LLM, and the output parsed as a string.

typescript
// Create the runnable sequence
return RunnableSequence.from<string, string>([
  {
    // Take the input and assign it to the question key
    question: new RunnablePassthrough(),
    // Get the schema
    schema: () => graph.getSchema(),
  },
  cypherPrompt,
  llm,
  new StringOutputParser(),
]);
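
Conceptually, the sequence is a pipeline in which the output of each step becomes the input of the next. The sketch below imitates that flow with plain functions and no LangChain dependency. The names fakeGetSchema and fakeLlm, the shortened prompt, and the canned Cypher response are stand-ins invented for illustration, and the steps are synchronous for brevity whereas the real chain is asynchronous.

```typescript
type PromptInputs = { question: string; schema: string };

// Stand-in for graph.getSchema(): returns a hard-coded schema description
const fakeGetSchema = (): string =>
  "(:Person)-[:DIRECTED]->(:Movie {title: STRING})";

// Step 1: map the raw question to the inputs the prompt expects
const buildInputs = (question: string): PromptInputs => ({
  question, // passed through unchanged
  schema: fakeGetSchema(),
});

// Step 2: substitute the inputs into a (much shortened) prompt
const formatPrompt = ({ question, schema }: PromptInputs): string =>
  `Schema:\n${schema}\n\nQuestion:\n${question}`;

// Step 3: stand-in for the LLM; ignores the prompt and returns a canned message
const fakeLlm = (_prompt: string): { content: string } => ({
  content: "MATCH (p:Person)-[:DIRECTED]->(m:Movie) RETURN p.name AS name LIMIT 10",
});

// Step 4: reduce the message to plain text, as a string output parser would
const parseString = (message: { content: string }): string => message.content;

// Run the steps in order, feeding each output into the next step
function runChain(question: string): string {
  return parseString(fakeLlm(formatPrompt(buildInputs(question))));
}

console.log(runChain("Who directed The Matrix?"));
```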

Finished Sequence

If you have followed the steps correctly, your code should resemble the following:

typescript
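import { BaseLanguageModel } from "@langchain/core/language_models/base";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { PromptTemplate } from "@langchain/core/prompts";
import {
  RunnablePassthrough,
  RunnableSequence,
} from "@langchain/core/runnables";
import { Neo4jGraph } from "@langchain/community/graphs/neo4j_graph";

export default async function initCypherGenerationChain(
  graph: Neo4jGraph,
  llm: BaseLanguageModel
) {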
  // Create Prompt Template
  const cypherPrompt = PromptTemplate.fromTemplate(`
    You are a Neo4j Developer translating user questions into Cypher to answer questions
    about movies and provide recommendations.
    Convert the user's question into a Cypher statement based on the schema.

    You must:
    * Only use the nodes, relationships and properties mentioned in the schema.
    * When required, use \`IS NOT NULL\` to check for property existence, and not the exists() function.
    * Use the \`elementId()\` function to return the unique identifier for a node or relationship as \`_id\`.
      For example:
      \`\`\`
      MATCH (a:Person)-[r:ACTED_IN]->(m:Movie)
      WHERE a.name = 'Emil Eifrem'
      RETURN m.title AS title, elementId(m) AS _id, r.role AS role
      \`\`\`
    * Include extra information about the nodes that may help an LLM provide a more informative answer,
      for example the release date, rating or budget.
    * For movies, use the tmdbId property to return a source URL.
      For example: \`'https://www.themoviedb.org/movie/'+ m.tmdbId AS source\`.
    * For movie titles that begin with "The", move "the" to the end.
      For example "The 39 Steps" becomes "39 Steps, The" or "the matrix" becomes "Matrix, The".
    * Limit the maximum number of results to 10.
    * Respond with only a Cypher statement.  No preamble.


    Example Question: What role did Tom Hanks play in Toy Story?
    Example Cypher:
    MATCH (a:Actor {{name: 'Tom Hanks'}})-[rel:ACTED_IN]->(m:Movie {{title: 'Toy Story'}})
    RETURN a.name AS Actor, m.title AS Movie, elementId(m) AS _id, rel.role AS RoleInMovie

    Schema:
    {schema}

    Question:
    {question}
  `);

  // Create the runnable sequence
  return RunnableSequence.from<string, string>([
    {
      // Take the input and assign it to the question key
      question: new RunnablePassthrough(),
      // Get the schema
      schema: () => graph.getSchema(),
    },
    cypherPrompt,
    llm,
    new StringOutputParser(),
  ]);
}

Testing your changes

If you have followed the instructions, the following unit test should pass. Run it with the npm run test command to verify the response.

sh
Running the Test
npm run test cypher-generation.chain.test.ts

The unit test is shown below:

typescript
cypher-generation.chain.test.ts
import { ChatOpenAI } from "@langchain/openai";
import { config } from "dotenv";
import { BaseChatModel } from "@langchain/core/language_models/chat_models";
import { RunnableSequence } from "@langchain/core/runnables";
import { Neo4jGraph } from "@langchain/community/graphs/neo4j_graph";
import initCypherGenerationChain from "./cypher-generation.chain";
import { extractIds } from "../../../../utils";
import { close } from "../../../graph";

describe("Cypher Generation Chain", () => {
  let graph: Neo4jGraph;
  let llm: BaseChatModel;
  let chain: RunnableSequence<string, string>;

  beforeAll(async () => {
    config({ path: ".env.local" });

    graph = await Neo4jGraph.initialize({
      url: process.env.NEO4J_URI as string,
      username: process.env.NEO4J_USERNAME as string,
      password: process.env.NEO4J_PASSWORD as string,
      database: process.env.NEO4J_DATABASE as string | undefined,
    });

    llm = new ChatOpenAI({
      openAIApiKey: process.env.OPENAI_API_KEY,
      modelName: "gpt-3.5-turbo",
      temperature: 0,
      configuration: {
        baseURL: process.env.OPENAI_API_BASE,
      },
    });

    chain = await initCypherGenerationChain(graph, llm);
  });

  afterAll(async () => {
    await graph.close();
    await close();
  });

  it("should generate a simple count query", async () => {
    const output = await chain.invoke("How many movies are in the database?");

    expect(output.toLowerCase()).toContain("match (");
    expect(output).toContain(":Movie");
    expect(output.toLowerCase()).toContain("return");
    expect(output.toLowerCase()).toContain("count(");
  });

  it("should generate a Cypher statement with a relationship", async () => {
    const output = await chain.invoke("Who directed The Matrix?");

    expect(output.toLowerCase()).toContain("match (");
    expect(output).toContain(":Movie");
    expect(output).toContain(":DIRECTED]");
    expect(output.toLowerCase()).toContain("return");
    expect(output.toLowerCase()).toContain("_id");
  });

  it("should extract IDs", () => {
    const ids = extractIds([
      {
        _id: "1",
        name: "Micheal Ward",
        roles: [
          {
            _id: "2",
            name: "Stephen",
            movie: { _id: "3", title: "Empire of Light" },
          },
          {
            _id: "4",
            name: "Marco",
            movie: { _id: "99", title: "Blue Story" },
          },
        ],
      },
      { _id: "100" },
    ]);

    expect(ids).toContain("1");
    expect(ids).toContain("2");
    expect(ids).toContain("3");
    expect(ids).toContain("4");
    expect(ids).toContain("99");
    expect(ids).toContain("100");
  });
});

It works!

If all the tests pass, you now have a chain that generates Cypher statements from a question and the database schema.

Summary

In this lesson, you built a chain that generates a Cypher statement based on user input.

In the next lesson, you will learn how LLMs can be used to evaluate the statement.