Harnessing the Power of LangChain for NLP: A Software Engineer’s Guide

Document Loaders: Your Data Access Point

One of the first steps in any NLP project is acquiring and loading data. LangChain simplifies this step with Document Loaders, which read text from many kinds of sources, from local files to remote web pages. Whether you’re working with articles, books, or web pages, every loader exposes the same interface: calling load() returns a list of Document objects, each holding the text (page_content) and a metadata dictionary describing where it came from.


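from langchain.document_loaders import TextLoader

# Load a local text file; load() returns a list of Document objects.
loader = TextLoader("./index.md")
docs = loader.load()
print(docs[0].metadata)  # the file path is stored under the "source" key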
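To make the "unified interface" concrete, here is a minimal, standard-library-only sketch of what a text-file loader does. SimpleDocument and load_text_file are hypothetical illustrations, not LangChain’s own classes; they only mimic the shape of the output described above (text plus a metadata dictionary recording the source).

```python
import os
import tempfile
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def load_text_file(path: str, encoding: str = "utf-8") -> List[SimpleDocument]:
    """Hypothetical loader: read a whole text file into a single document."""
    with open(path, encoding=encoding) as f:
        text = f.read()
    # Record where the text came from so later steps can cite the source.
    return [SimpleDocument(page_content=text, metadata={"source": path})]


# Usage: write a small temporary file and load it back.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "index.md")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("# Hello\nLangChain loaders return documents.\n")
    docs = load_text_file(sample_path)

print(docs[0].page_content)
```

Loaders for PDFs, web pages, and other sources differ in how they read the data, but returning the same document shape is what lets the later steps (splitting, embedding, retrieval) stay unchanged.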

Document Loaders Documentation

Text Splitter: Segmenting Text

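Once you have your data, you usually need to break it into smaller pieces, because language models and embedding models can only process a limited amount of text at once. LangChain’s Text Splitters handle this. CharacterTextSplitter, for example, splits a document on a separator such as a blank line (that is, at paragraph boundaries) and then merges the pieces back into chunks of roughly chunk_size characters, with up to chunk_overlap characters shared between neighbouring chunks so that context is not lost at chunk boundaries.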


from langchain.text_splitter import CharacterTextSplitter

# Read a long document that we want to split into chunks.
with open("./state_of_the_union.txt") as f:
    state_of_the_union = f.read()

# Split on blank lines (paragraphs), then merge the pieces into chunks of about
# 1000 characters, with up to 200 characters of overlap between neighbouring chunks.
text_splitter = CharacterTextSplitter(
    separator="\n\n",
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    is_separator_regex=False,
)

texts = text_splitter.create_documents([state_of_the_union])
print(texts[0])
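The split-then-merge idea can be sketched in plain Python. split_into_chunks is a hypothetical, simplified illustration of the behaviour described above (split on a separator, greedily merge pieces up to chunk_size, carry trailing pieces forward as an overlap of at most chunk_overlap characters); it is not LangChain’s implementation. In this sketch a single piece longer than chunk_size simply becomes its own oversized chunk.

```python
from typing import List


def split_into_chunks(text: str, separator: str = "\n\n",
                      chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Hypothetical splitter: split on `separator`, then greedily merge pieces."""

    def joined_len(parts: List[str]) -> int:
        return len(separator.join(parts))

    pieces = [p for p in text.split(separator) if p]
    chunks: List[str] = []
    current: List[str] = []
    for piece in pieces:
        if current and joined_len(current + [piece]) > chunk_size:
            # The current chunk is full: emit it.
            chunks.append(separator.join(current))
            # Keep trailing pieces as overlap, dropping from the front until the
            # overlap fits in chunk_overlap and leaves room for the new piece.
            while current and (joined_len(current) > chunk_overlap
                               or joined_len(current + [piece]) > chunk_size):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


# Usage: four 40-character paragraphs, chunks of at most 90 characters.
sample = "\n\n".join(["a" * 40, "b" * 40, "c" * 40, "d" * 40])
chunks = split_into_chunks(sample, chunk_size=90, chunk_overlap=45)
for chunk in chunks:
    print(repr(chunk))
```

Each chunk here holds two paragraphs, and each paragraph except the first and last appears in two consecutive chunks; that repetition is the overlap.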

Text Splitters Documentation

Vector Stores: Storing and Retrieving Vectors

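Many NLP tasks work with vector representations of text (embeddings) produced by pre-trained embedding models. LangChain’s Vector Stores keep these embeddings alongside the documents they came from and support similarity search: the query is embedded with the same model, and the store returns the documents whose vectors are closest to it. The example below loads a document, splits it into chunks, embeds each chunk with OpenAI embeddings, stores the result in Chroma, and then queries it.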


from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

# Requires the openai and chromadb packages and an OPENAI_API_KEY environment variable.

# Load the document, split it into chunks, embed each chunk and load it into the vector store.
raw_documents = TextLoader("./state_of_the_union.txt").load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
documents = text_splitter.split_documents(raw_documents)
db = Chroma.from_documents(documents, OpenAIEmbeddings())

# Embed the query and return the most similar chunks (4 by default).
query = "What did the president say about Ketanji Brown Jackson"
docs = db.similarity_search(query)
print(docs[0].page_content)
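The store-and-search cycle can be sketched without any external service. Everything below is hypothetical: toy_embed is a bag-of-words stand-in for a real embedding model such as OpenAIEmbeddings, and ToyVectorStore only borrows the method names add_texts and similarity_search for readability; it is not Chroma. Cosine similarity is used as the closeness measure, which is a common choice but not the only one stores use.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def toy_embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (text, vector) pairs, ranks by cosine."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, Counter]] = []

    def add_texts(self, texts: List[str]) -> None:
        for text in texts:
            self._entries.append((text, toy_embed(text)))

    def similarity_search(self, query: str, k: int = 4) -> List[str]:
        # Embed the query with the same function, then return the k closest texts.
        query_vec = toy_embed(query)
        ranked = sorted(self._entries, key=lambda e: cosine(query_vec, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add_texts([
    "Cats are small domesticated mammals.",
    "Vector stores index embeddings for similarity search.",
    "Python is a programming language.",
])
results = store.similarity_search("how do vector stores search embeddings", k=1)
print(results)
```

Real vector stores use dense vectors from a learned model rather than word counts, and typically replace this brute-force sort with an index designed for fast nearest-neighbour search.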

Vector Stores Documentation

Memory: Building Conversational Context

LangChain’s Memory module maintains conversational context. It records the history of a conversation so that earlier turns can be fed back to the language model, letting it respond coherently as the discussion continues. You write messages to Memory as the conversation progresses and read the accumulated history back when building the next prompt.


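from langchain.memory import ConversationBufferMemory

# Buffer memory keeps every message of the conversation verbatim.
memory = ConversationBufferMemory()
memory.chat_memory.add_user_message("hi!")
memory.chat_memory.add_ai_message("what's up?")

# Returns a dict with the formatted history under the "history" key.
memory.load_memory_variables({})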

Language models are stateless: each call sees only the prompt it is given. By preserving the conversation and injecting it into each new prompt, the Memory module makes it possible to build conversational agents and chatbots that give relevant, context-aware responses over a multi-turn dialogue.
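The mechanism can be sketched without LangChain: keep a list of (speaker, text) pairs and render them into the next prompt. BufferMemory and build_prompt are hypothetical names, and the "Human:"/"AI:" line format is an assumed formatting choice for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BufferMemory:
    """Hypothetical buffer memory: keeps every turn verbatim."""
    human_prefix: str = "Human"
    ai_prefix: str = "AI"
    turns: List[Tuple[str, str]] = field(default_factory=list)

    def add_user_message(self, text: str) -> None:
        self.turns.append((self.human_prefix, text))

    def add_ai_message(self, text: str) -> None:
        self.turns.append((self.ai_prefix, text))

    def render(self) -> str:
        # One "Speaker: text" line per turn, oldest first.
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)


def build_prompt(memory: BufferMemory, user_input: str) -> str:
    """Prepend the stored history so a stateless model sees earlier turns."""
    history = memory.render()
    new_turn = f"{memory.human_prefix}: {user_input}\n{memory.ai_prefix}:"
    return f"{history}\n{new_turn}" if history else new_turn


memory = BufferMemory()
memory.add_user_message("hi!")
memory.add_ai_message("what's up?")
prompt = build_prompt(memory, "what did I just say?")
print(prompt)
```

Because the whole buffer is resent on every turn, prompts grow with the conversation; LangChain also provides memory classes that keep only a window of recent turns or a running summary for long dialogues.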

Memory Documentation

Tools: Interfaces for Interaction

In LangChain, Tools are the interfaces an Agent uses to interact with the outside world, such as search engines, APIs, or databases. Each tool pairs a function with a name and a description; the agent’s language model reads the descriptions to decide which tool to call and with what input.


from langchain.agents import load_tools

# Names of LangChain's built-in tools to load; some tools also need an llm
# argument or API keys.
tool_names = [...]
tools = load_tools(tool_names)

LangChain’s Tools let Agents connect to external databases and other services. By giving an Agent the right set of tools, you extend what it can do beyond generating text, for example looking up current information or performing calculations.
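At its core, a tool is a name, a description the model can read, and a function to call. The sketch below is a hypothetical illustration (SimpleTool, render_descriptions, and run_tool are not LangChain APIs), and the "name: description" text format it produces for the prompt is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SimpleTool:
    """Hypothetical tool: a named function plus a description for the model."""
    name: str
    description: str
    func: Callable[[str], str]


def render_descriptions(tools: List[SimpleTool]) -> str:
    # One "name: description" line per tool, to be placed in the agent's prompt.
    return "\n".join(f"{t.name}: {t.description}" for t in tools)


def run_tool(tools: List[SimpleTool], name: str, tool_input: str) -> str:
    # Look up the tool the model asked for by name and call it with the model's input.
    registry = {t.name: t for t in tools}
    if name not in registry:
        raise KeyError(f"unknown tool: {name}")
    return registry[name].func(tool_input)


tools = [
    SimpleTool("word_count", "counts the words in the input text",
               lambda s: str(len(s.split()))),
    SimpleTool("upper", "converts the input text to upper case", str.upper),
]
print(render_descriptions(tools))
print(run_tool(tools, "word_count", "tools extend what agents can do"))
```

Tools take and return strings here because the model both writes the tool input and reads the result as text.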

Tools Documentation

Agents: Language-Powered Decision Makers

In LangChain, Agents are decision-makers powered by language models. In a chain, the sequence of actions is hardcoded; an Agent instead uses the language model as a reasoning engine to decide at run time which actions to take and in what order. The example below builds a ReAct-style conversational agent that can call a web-search tool and remembers the chat history.


from langchain import hub
from langchain.agents import AgentExecutor, Tool
from langchain.agents.format_scratchpad import format_log_to_str
from langchain.agents.output_parsers import ReActSingleInputOutputParser
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory
from langchain.tools.render import render_text_description
from langchain.utilities import SerpAPIWrapper

# Set up a search tool for the agent to use
# (requires the google-search-results package and a SERPAPI_API_KEY environment variable).
search = SerpAPIWrapper()
tools = [
    Tool(
        name="Current Search",
        func=search.run,
        description="useful for when you need to answer questions about current events or the current state of the world",
    ),
]

# Create a connection to the OpenAI LLM; temperature=0 makes its output (near-)deterministic.
llm = OpenAI(temperature=0)

# Pull the ReAct chat prompt from the LangChain Hub (requires the langchainhub package)
# and fill in the tool descriptions.
prompt = hub.pull("hwchase17/react-chat")
prompt = prompt.partial(
    tools=render_text_description(tools),
    tool_names=", ".join([t.name for t in tools]),
)

# Stop generation before the model writes its own "Observation" line.
llm_with_stop = llm.bind(stop=["\nObservation"])

# Map the executor's inputs onto the prompt variables, then pipe through the LLM
# and parse its text into the next action or the final answer.
agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_log_to_str(x["intermediate_steps"]),
        "chat_history": lambda x: x["chat_history"],
    }
    | prompt
    | llm_with_stop
    | ReActSingleInputOutputParser()
)

# The executor runs the agent loop and stores each turn in memory under "chat_history".
memory = ConversationBufferMemory(memory_key="chat_history")
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, memory=memory)

# Query the agent
print(agent_executor.invoke({"input": "hi, i am bob"})["output"])

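At each step, the agent’s language model reads the user input, the chat history, and the scratchpad of previous tool calls and observations, then either chooses a tool and its input or produces a final answer. The AgentExecutor runs the chosen tool, appends the observation to the scratchpad, and repeats until the model returns a final answer. Because the sequence of actions is decided at run time rather than fixed in code, the same agent can handle a wide range of requests.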
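The loop described above can be sketched in plain Python. Everything here is hypothetical: scripted_llm is a fake model that returns canned ReAct-formatted text, lookup is a toy tool, and parse_step and run_agent are simplified stand-ins for ReActSingleInputOutputParser and AgentExecutor, not their actual implementations. The scripted model never writes its own "Observation:" line, which is the behaviour the stop sequence enforces for a real model.

```python
import re
from typing import Callable, Dict, Tuple, Union

Step = Tuple[str, Union[str, Tuple[str, str]]]


def parse_step(text: str) -> Step:
    """Return ("final", answer) or ("action", (tool_name, tool_input))."""
    final = re.search(r"Final Answer:\s*(.*)", text, re.DOTALL)
    if final:
        return "final", final.group(1).strip()
    action = re.search(r"Action:\s*(.*?)\s*\nAction Input:\s*(.*)", text, re.DOTALL)
    if action:
        return "action", (action.group(1).strip(), action.group(2).strip())
    raise ValueError(f"could not parse LLM output: {text!r}")


def run_agent(llm: Callable[[str], str], tools: Dict[str, Callable[[str], str]],
              question: str, max_steps: int = 5) -> str:
    scratchpad = ""
    for _ in range(max_steps):
        # The model sees the question plus every earlier action and observation.
        output = llm(f"Question: {question}\n{scratchpad}")
        kind, payload = parse_step(output)
        if kind == "final":
            return payload
        tool_name, tool_input = payload
        observation = tools[tool_name](tool_input)
        scratchpad += f"{output}\nObservation: {observation}\n"
    raise RuntimeError("agent did not finish within max_steps")


def scripted_llm(prompt: str) -> str:
    # Hypothetical fake model: act first, answer once an observation is present.
    if "Observation:" not in prompt:
        return "Thought: I should look this up.\nAction: lookup\nAction Input: capital of France"
    return "Thought: I have the answer.\nFinal Answer: Paris"


tools = {"lookup": lambda q: "Paris" if "France" in q else "unknown"}
answer = run_agent(scripted_llm, tools, "What is the capital of France?")
print(answer)
```

A production executor also has to deal with unparseable model output and failing tools; max_steps here is only a simple guard against a model that never produces a final answer.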

Agents Documentation

In Conclusion: LangChain - Tailoring Agents with Precision

LangChain provides an abstraction layer for building custom agents from modular components: document loaders, text splitters, vector stores, memory, tools, and agents. Because each component is independent, you can assemble an agent from only the pieces your application needs, keeping it simple and tailored to your specific requirements.