We need MLflow in every single machine learning task, including LangChain for LLM systems

KevinLuo · Mar 8, 2024

So, what is LangChain?

LangChain is a versatile framework designed for building applications powered by language models. It excels in creating context-aware applications that utilize language models for reasoning and generating responses, enabling the development of sophisticated NLP applications.

And… what is MLflow?

MLflow is an open-source platform dedicated to managing the end-to-end machine learning lifecycle. It encompasses four primary functions: experiment tracking, model packaging, model management, and model deployment. The key features and components of MLflow include:

1. MLflow Tracking: Allows you to log and query experiments, including code, data, config, and results. This is useful for experiment management and comparison (a minimal sketch follows this list).
2. MLflow Projects: Packaging format for reproducible runs on any platform. This helps in sharing code and collaborating with others.
3. MLflow Models: A standard format for packaging machine learning models that can be used in a variety of downstream tools — from serving it with a REST API to using it for batch inference on Apache Spark.
4. MLflow Model Registry: A centralized model store, set of APIs, and UI to collaboratively manage the full lifecycle of an MLflow Model.
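
As a quick taste of the tracking API, here is a minimal sketch of logging a run; the experiment name, parameter, and metric values are invented for illustration:

import mlflow

mlflow.set_experiment("quickstart-demo")  # hypothetical experiment name

with mlflow.start_run():
    # Log a hyperparameter and a result metric for later comparison in the UI.
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.93)

Every run logged this way shows up in the MLflow UI, where runs can be filtered and compared side by side.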

The importance of MLflow in machine learning can be attributed to its ability to streamline and optimize the ML development process, making it easier for teams to collaborate, track experiments, manage models, and deploy them into production. It supports a wide range of ML libraries and environments, making it flexible for various use cases. By providing a unified platform for managing the ML lifecycle, MLflow plays a crucial role in:

- Enhancing reproducibility: By tracking experiments, models, and deployment processes, it ensures that ML workflows can be replicated and understood by others.
- Improving collaboration: Teams can easily share results, models, and feedback through a centralized system.
- Accelerating innovation: Faster iteration cycles by streamlining the experimentation process.
- Simplifying deployment: Helps in bridging the gap between data scientists and operations teams, making it easier to deploy models into production environments.

In summary, MLflow is a critical tool for anyone involved in machine learning projects, from data scientists and ML engineers to DevOps, offering a systematic approach to tracking, managing, and deploying ML models efficiently and effectively.

Supported Elements in MLflow LangChain Integration

  1. LLMChain
  2. Agents
  3. RetrievalQA
  4. Retrievers

Why use MLflow with LangChain?

Aside from the benefits of using MLflow for managing and deploying machine learning models, integrating LangChain with MLflow brings several advantages of its own within the broader MLflow ecosystem.

  • MLflow Evaluate: MLflow's native capabilities for evaluating language models let you run automated evaluation algorithms on your LangChain application's inference results, giving you robust performance analytics with little extra code (see the sketch after this list).
  • Simplified Experimentation: LangChain’s flexibility in experimenting with various agents, tools, and retrievers becomes even more powerful when paired with MLflow. This combination allows for rapid experimentation and iteration. You can effortlessly compare runs, making it easier to refine models and accelerate the journey from development to production deployment.
  • Robust Dependency Management: Deploy your LangChain application with confidence, leveraging MLflow’s ability to manage and record all external dependencies. This ensures consistency between development and deployment environments, reducing deployment risks and simplifying the process.
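
To make the first of these points concrete, here is a hedged sketch of evaluating a logged model's answers with `mlflow.evaluate`; the model URI, questions, and column names are assumptions for illustration:

import mlflow
import pandas as pd

# Hypothetical evaluation set; replace with your own questions and references.
eval_data = pd.DataFrame(
    {
        "inputs": ["What is MLflow?", "What is LangChain?"],
        "ground_truth": [
            "An open-source platform for managing the ML lifecycle.",
            "A framework for building applications powered by language models.",
        ],
    }
)

with mlflow.start_run():
    results = mlflow.evaluate(
        model="runs:/<run-id>/langchain_model",  # assumed URI of a logged model
        data=eval_data,
        targets="ground_truth",
        model_type="question-answering",
    )
    print(results.metrics)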

Capabilities of LangChain and MLflow

  • Efficient Development: Streamline the development of NLP applications with LangChain’s modular components and MLflow’s robust tracking features.
  • Flexible Integration: Leverage the versatility of LangChain within the MLflow ecosystem for a range of NLP tasks, from simple text generation to complex data retrieval and analysis.
  • Advanced Functionality: Utilize LangChain’s advanced features like context-aware reasoning and dynamic action selection in agents, all within MLflow’s scalable platform.

Overview of Chains, Agents, and Retrievers

  • Chains: Sequences of actions or steps hardcoded in code. Chains in LangChain combine various components like prompts, models, and output parsers to create a flow of processing steps.

The figure below shows, in the top portion, an example of interfacing directly with a SaaS LLM via API calls, with no context of the conversation's history. The bottom portion shows the same queries submitted to a LangChain chain that incorporates conversation-history state, so that the entire conversation history is included with each subsequent input. Preserving conversational context in this manner is key to creating a chatbot; a minimal code sketch of this pattern appears after the list below.

  • Agents: Dynamic constructs that use language models to choose a sequence of actions. Unlike chains, agents decide the order of actions based on inputs, tools available, and intermediate outcomes.
  • Retrievers: Components in RetrievalQA chains responsible for sourcing relevant documents or data. Retrievers are key in applications where LLMs need to reference specific external information for accurate responses.
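
Here is the minimal sketch of the history-preserving chain pattern promised above, using the same pre-1.0 `langchain` API as the rest of this tutorial (the prompts are invented for illustration):

from langchain.chains import ConversationChain
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory

# A chain whose memory replays the full conversation with every request.
conversation = ConversationChain(
    llm=OpenAI(temperature=0.1),
    memory=ConversationBufferMemory(),
)

print(conversation.predict(input="My favorite dish is boeuf bourguignon."))
# The second turn can reference "my favorite dish" because memory restores it.
print(conversation.predict(input="What wine pairs well with my favorite dish?"))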

Getting Started with the MLflow LangChain Flavor

Introduction to Using LangChain with MLflow

Welcome to this interactive tutorial designed to introduce you to LangChain and its integration with MLflow. This tutorial is structured as a notebook to provide a hands-on, practical learning experience with the simplest and most core features of LangChain.

What You Will Learn

- Understanding LangChain: Get to know the basics of LangChain and how it is used in developing applications powered by language models.
- Chains in LangChain: Explore the concept of `chains` in LangChain, which are sequences of actions or operations orchestrated to perform complex tasks.
- Integration with MLflow: Learn how LangChain integrates with MLflow, a platform for managing the machine learning lifecycle, including logging, tracking, and deploying models.
- Practical Application: Apply your knowledge to build a LangChain chain that acts like a sous chef, focusing on the preparation steps of a recipe.

Background on LangChain

LangChain is a Python-based framework that simplifies the development of applications using language models. It is designed to enhance context-awareness and reasoning in applications, allowing for more sophisticated and interactive functionalities.

What is a Chain?

- Chain Definition: In LangChain, a `chain` refers to a series of interconnected components or steps designed to accomplish a specific task.
- Chain Example: In our tutorial, we’ll create a chain that simulates a sous chef’s role in preparing ingredients and tools for a recipe.

Tutorial Overview

In this tutorial, you will:

1. Set Up LangChain and MLflow: Initialize and configure both LangChain and MLflow.
2. Create a Sous Chef Chain: Develop a LangChain chain that lists ingredients, describes preparation techniques, organizes ingredient staging, and details cooking implements preparation for a given recipe.
3. Log and Load the Model: Utilize MLflow to log the chain model and then load it for prediction.
4. Run a Prediction: Execute the chain to see how it would prepare a restaurant dish for a specific number of customers.

By the end of this tutorial, you will have a solid foundation in using LangChain with MLflow and an understanding of how to construct and manage chains for practical applications.

Let’s dive in and explore the world of LangChain and MLflow!

Prerequisites

To get started with this tutorial, we'll need a few things first.

1. An OpenAI API account. You can sign up on the OpenAI website to start programmatically accessing one of the leading LLM services.
2. An OpenAI API key. Once you've created an account, you can generate one from the API keys page.
3. The OpenAI SDK, available on PyPI. For this tutorial, we'll use version 0.28.1 (the last release prior to the 1.0 release).
4. The LangChain package, also available on PyPI.

To install the dependent packages, simply run:

pip install 'openai<1' tiktoken langchain mlflow

NOTE: If you’d like to use Azure OpenAI with LangChain, you need to install `openai>=1.10.0` and `langchain-openai>=0.0.6`.

API Key Security Overview

API keys, especially for SaaS Large Language Models (LLMs), are as sensitive as financial information due to their connection to billing.

If you're interested in an alternative MLflow solution that securely manages your access keys, see the MLflow deployments server documentation.

Essential Practices:

- Confidentiality: Always keep API keys private.
- Secure Storage: Prefer environment variables or secure services.
- Frequent Rotation: Regularly update keys to avoid unauthorized access.

Configuring API Keys

For secure usage, set API keys as environment variables.
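
For example, in a notebook you can prompt for the key at runtime instead of pasting it into code; a minimal sketch using Python's standard `getpass`:

import os
from getpass import getpass

# Ask for the key interactively so it never appears in the notebook source.
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")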

macOS/Linux:
Refer to Apple's guide on using environment variables in Terminal for detailed instructions.

Windows:
Follow the steps outlined in Microsoft's documentation on environment variables.

Let’s Get Started!

Now that we have all the prerequisites in place, let’s dive into setting up LangChain and MLflow, and then proceed with creating our sous chef chain.

import os

from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

import mlflow

# Fill in your OpenAI API key here
os.environ["OPENAI_API_KEY"] = "Your_OpenAI_API_Key_Here"
assert "OPENAI_API_KEY" in os.environ, "Please set the OPENAI_API_KEY environment variable."

If you want to use Azure OpenAI instead, install the `langchain_openai` package and configure the Azure-specific environment variables (keeping in mind the note above about the newer `openai` version this requires):

!pip install langchain_openai

from langchain_openai import AzureOpenAI, AzureOpenAIEmbeddings

# Set this to `azure`
os.environ["OPENAI_API_TYPE"] = "azure"
# The API version you want to use: set this to `2023-05-15` for the released version.
os.environ["OPENAI_API_VERSION"] = "2023-05-15"
assert (
    "AZURE_OPENAI_ENDPOINT" in os.environ
), "Please set the AZURE_OPENAI_ENDPOINT environment variable. It is the base URL for your Azure OpenAI resource. You can find this in the Azure portal under your Azure OpenAI resource."
assert (
    "OPENAI_API_KEY" in os.environ
), "Please set the OPENAI_API_KEY environment variable. It is the API key for your Azure OpenAI resource. You can find this in the Azure portal under your Azure OpenAI resource."

azure_openai_llm = AzureOpenAI(
    deployment_name="<your-deployment-name>",
    model_name="gpt-35-turbo-instruct",
)
azure_openai_embeddings = AzureOpenAIEmbeddings(
    azure_deployment="<your-deployment-name>",
)

Setting Up the OpenAI Completions Model in LangChain

Welcome to this delightful segment where we configure the OpenAI model with specific parameters tailored for generating language completions. No, we’re not delving into the ChatCompletions realm; instead, we’re diving into the world of Completions. Here, every request stands on its own, like a lone ranger in the vast desert of language generation.

The Wonders of the Completions Model

- Completions Model: This model doesn’t hold onto past interactions; it’s like a goldfish in a bowl, living in the moment with no memory of the past. Perfect for tasks where each request is a standalone entity, ready to tackle whatever comes its way.

- No Contextual Memory: Without memory of previous interactions, this model thrives in situations where continuity is as necessary as a snowstorm in the Sahara.

- Comparisons with ChatCompletions: While ChatCompletions keep track of past dialogues like an attentive listener, Completions are more like a forgetful friend, always present but never dwelling on the past.

In our tutorial, we embrace the simplicity and effectiveness of the Completions model, perfectly suited for handling individual requests without the burden of context. So, let’s embark on this journey of linguistic exploration, where each completion is a step closer to culinary perfection!

# Completions-style model: temperature=0.1 keeps output focused and consistent;
# max_tokens=1000 leaves room for detailed preparation instructions.
llm = OpenAI(temperature=0.1, max_tokens=1000)

The Sous Chef’s Guide: Crafting the Perfect Mise-en-Place

Welcome to this flavorful segment where we unveil the intricate template designed to transform any language model into a meticulous sous chef. As we delve into the world of culinary preparation, let’s break down this masterful instruction set tailored specifically for the LangChain model, focusing solely on the art of mise-en-place.

Deconstructing the Instruction Template

- Embodying the Sous Chef: Our prompt immerses the language model into the shoes of a diligent sous chef, highlighting the importance of thorough preparation.

- Step-by-Step Task Outline:
1. Listing Ingredients: Directs the model to compile a comprehensive list of all required ingredients for a chosen dish.
2. Technique Guidelines: Tasks the model with elucidating the essential techniques for preparing each ingredient, from dicing to marinating.
3. Ingredient Staging Strategies: Demands meticulous instructions on the staging of each ingredient, considering the optimal sequence and timing for use.
4. Prepping Cooking Implements: Guides the model in detailing and arranging all necessary cooking tools for the dish’s preparatory phase.

- Focused Scope: This template is purposefully crafted to halt at the preparation stage, omitting the actual cooking process. It’s all about setting the stage for culinary excellence.

- Adaptive Flexibility: With placeholders like `{recipe}` and `{customer_count}`, our template seamlessly adapts to diverse recipes and customer counts, ensuring versatility and relevance.

This instructional template stands as a cornerstone of our tutorial, showcasing the power of LangChain in crafting instructive prompts tailored to specific tasks, all while maintaining the essence of single-purpose completions-style applications. Let’s embark on this gastronomic adventure together, as we unleash the sous chef within!

Let's set the prompt template string for our LLM:

template_instruction = (
    "Imagine you are a fine dining sous chef. Your task is to meticulously prepare for a dish, focusing on the mise-en-place process. "
    "Given a recipe, your responsibilities are: "
    "1. List the Ingredients: Carefully itemize all ingredients required for the dish, ensuring every element is accounted for. "
    "2. Preparation Techniques: Describe the techniques and operations needed for preparing each ingredient. This includes cutting, "
    "processing, or any other form of preparation. Focus on the art of mise-en-place, ensuring everything is perfectly set up before cooking begins. "
    "3. Ingredient Staging: Provide detailed instructions on how to stage and arrange each ingredient. Explain where each item should be placed for "
    "efficient access during the cooking process. Consider the timing and sequence of use for each ingredient. "
    "4. Cooking Implements Preparation: Enumerate all the cooking tools and implements needed for each phase of the dish's preparation. "
    "Detail any specific preparation these tools might need before the actual cooking starts and describe what pots, pans, dishes, and "
    "other tools will be needed for the final preparation. "
    "Remember, your guidance stops at the preparation stage. Do not delve into the actual cooking process of the dish. "
    "Your goal is to set the stage flawlessly for the chef to execute the cooking seamlessly. "
    "The recipe you are given is for: {recipe} for {customer_count} people. "
)

Constructing the LangChain Chain

We start by setting up a `PromptTemplate` in LangChain, tailored to our sous chef scenario. The template is designed to dynamically accept inputs like the recipe name and customer count. Then, we initialize an `LLMChain` by combining our OpenAI language model with the prompt template, creating a chain that can simulate the sous chef’s preparation process.

Logging the Chain in MLflow

With the chain ready, we proceed to log it in MLflow. This is done within an MLflow run, which not only logs the chain model under a specified name but also tracks various details about the model. The logging process ensures that all aspects of the chain are recorded, allowing for efficient version control and future retrieval.

prompt = PromptTemplate(
    input_variables=["recipe", "customer_count"],
    template=template_instruction,
)
chain = LLMChain(llm=llm, prompt=prompt)

mlflow.set_experiment("Cooking Assistant")

with mlflow.start_run():
    model_info = mlflow.langchain.log_model(chain, "langchain_model")

If we navigate to the MLflow UI, we'll see our logged LangChain model.

(Screenshot: the logged langchain_model run in the MLflow UI.)

Unveiling the Model: Predicting with LangChain and MLflow

Welcome to the moment of truth, where we put our logged LangChain model to the test using the magic of MLflow. As we dive into the realm of culinary expertise, let’s witness the model’s prowess in guiding us through the intricacies of culinary preparation.

Loading and Executing the Model

Having diligently logged our LangChain chain with the ever-reliable MLflow, it’s time to bring our creation to life. With MLflow’s trusty `pyfunc.load_model` function, we seamlessly load our model into an executable state, poised for action.

With bated breath, we input the specifics of our culinary endeavor: a tantalizing "boeuf bourguignon" recipe, tailored for a table of four hungry patrons. Our model, donning the hat of a sous chef extraordinaire, takes this information in stride and springs into action, generating meticulous preparation instructions fit for a gastronomic masterpiece.

Reveling in the Model’s Output

Behold the fruits of our model’s labor — a comprehensive guide to crafting the perfect “boeuf bourguignon,” meticulously crafted to cater to our culinary needs:

- Ingredients Galore: A meticulous breakdown of every ingredient required, meticulously quantified and tailored to satisfy the appetites of our esteemed guests.
- Technique Mastery: Step-by-step instructions on the art of ingredient preparation, ensuring each component is primed and ready for culinary excellence.
- Staging Strategy: Ingenious guidance on how to stage and organize our ingredients, ensuring seamless execution and flawless flavor infusion.
- Tool Preparation Tactics: Expert advice on preparing our arsenal of cooking implements, from pots and pans to whisks and spatulas, ensuring nothing stands between us and culinary triumph.

This demonstration is a testament to the symbiotic relationship between LangChain and MLflow — a harmonious union that transforms intricate requirements into actionable steps, guiding us on a journey of culinary discovery and delight. Let’s savor the flavor of success together!

loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)

dish1 = loaded_model.predict({"recipe": "boeuf bourguignon", "customer_count": "4"})

print(dish1[0])
Ingredients:
- 1.5 lbs beef chuck, cut into 1-inch cubes
- 6 slices of bacon, chopped
- 1 onion, chopped
- 2 carrots, chopped
- 2 cloves of garlic, minced
- 1 cup red wine
- 2 cups beef broth
- 2 tablespoons tomato paste
- 1 teaspoon dried thyme
- 1 bay leaf
- Salt and pepper to taste
- 2 tablespoons all-purpose flour
- 2 tablespoons butter
- 1 lb pearl onions, peeled
- 1 lb mushrooms, quartered

Preparation Techniques:
1. Beef: Cut the beef chuck into 1-inch cubes, making sure to trim off any excess fat. Season with salt and pepper.
2. Bacon: Chop the bacon into small pieces.
3. Onion and Carrots: Peel and chop the onion and carrots into small pieces.
4. Garlic: Mince the garlic cloves.
5. Red Wine: Measure out 1 cup of red wine.
6. Beef Broth: Measure out 2 cups of beef broth.
...
14. Slotted Spoon: This will be used to remove the cooked pearl onions and mushrooms from the saucepan.
15. Plate: This will be used to hold the cooked beef cubes.
dish2 = loaded_model.predict({"recipe": "Okonomiyaki", "customer_count": "12"})

print(dish2[0])
Ingredients:
- 4 cups all-purpose flour
- 4 teaspoons baking powder
- 4 eggs
- 4 cups water
- 4 cups shredded cabbage
- 2 cups chopped green onions
- 2 cups diced cooked bacon
- 2 cups cooked shrimp, chopped
- 2 cups cooked squid, chopped
- 2 cups cooked octopus, chopped
- 2 cups cooked scallops, chopped
- 2 cups cooked crab meat, chopped
- 2 cups cooked mussels, chopped
- 2 cups cooked clams, chopped
- 2 cups cooked oysters, chopped
- 2 cups cooked chicken, shredded
- 2 cups cooked pork, shredded
- 2 cups cooked beef, shredded
- 2 cups cooked tofu, diced
- 2 cups cooked noodles
- 2 cups bonito flakes
- 2 cups dried seaweed flakes
- 2 cups okonomiyaki sauce
- 2 cups mayonnaise
- 2 cups katsuobushi (dried and smoked bonito flakes)
- Vegetable oil for cooking

Preparation Techniques:
1. In a large mixing bowl, combine the flour and baking powder.
2. In a separate bowl, beat the eggs and then add them to the flour mixture.
3. Slowly add the water to the mixture, stirring until a smooth batter forms.
4. In a separate bowl, mix together the shredded cabbage, green onions, and any other desired vegetables.
5. In another bowl, mix together the cooked meats, seafood, and tofu.
6. Cook the noodles according to package instructions and set aside.
7. Prepare the okonomiyaki sauce and mayonnaise by mixing them together in a small bowl.
8. Prepare the katsuobushi by grating it into flakes using a grater or food processor.

Ingredient Staging:
1. Place the flour mixture, vegetable mixture, and meat/seafood mixture in separate bowls.
2. Arrange the bowls in order of use, with the flour mixture first, followed by the vegetable mixture, and then the meat/seafood mixture.
3. Place the cooked noodles in a separate bowl.
4. Place the bonito flakes, dried seaweed flakes, okonomiyaki sauce, and mayonnaise in separate bowls.
5. Have all necessary cooking tools and implements, such as a large mixing bowl, spatula, and measuring cups, ready and easily accessible.

Cooking Implements Preparation:
1. Make sure all cooking tools and implements are clean and ready to use.
2. Heat a large non-stick pan or griddle over medium heat.
3. Have a large spatula ready for flipping the okonomiyaki.
4. If using a griddle, lightly oil the surface before cooking.

Remember, mise-en-place is key to a successful dish. Take the time to carefully prepare and arrange all ingredients and cooking tools before beginning the cooking process. This will ensure a smooth and efficient cooking experience.
mlflow ui

As you can see, I run `mlflow ui` locally and open the tracking UI on port 5000 (the default) to review the logged experiment.

Conclusion

In the final step of our tutorial, we execute another prediction using our LangChain model. This time, we explore the preparation for “Okonomiyaki,” a Japanese dish, for 12 customers. This demonstrates the model’s adaptability and versatility across various cuisines.

See you in the next lesson!

Please follow my GitHub (https://github.com/kevin801221/Langchain_course_code) or encourage me by giving it a star ⭐️. I will update it with the latest theory or applications at least weekly.
