Create your own chatbot with Llama2, Ollama and Gradio

Bibek Poudel
4 min read · Dec 1, 2023
Image generated using Image Creator from Microsoft Designer

Overview

Today I’m going to demonstrate how to quickly set up your own offline chatbot, so you don’t have to worry about your data being used for training. It’s a lot simpler than you may think. Kudos to the creators of Gradio and Ollama, and of course to Meta for opening up their Llama model. I won’t go into great detail about how these libraries and frameworks operate; instead, this tutorial simply walks you through building a basic chatbot on Ubuntu/WSL. It should also run on any other OS with Python 3 installed.

Requirements:

  1. 8 GB of RAM
  2. 20 GB disk space
  3. Python 3.10 or above

Steps:

1. Install Ollama

curl https://ollama.ai/install.sh | sh

For manual installation, please visit the official page: https://github.com/jmorganca/ollama

We will be using it to download and run the Llama models locally.

2. Download and run Llama2:7B using Ollama

ollama run llama2

It will take some time to download the model the first time, but after that it should load quickly. Once the download is complete, you can chat with it directly on the command line.

You can try the following to check whether the Ollama server is running:

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'

If you do not get a response, you can start the server with the following command:

ollama serve
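If you prefer scripting this check, here is a minimal sketch using only the Python standard library. The port 11434 is Ollama's default; the helper name `is_ollama_up` is my own:

```python
import urllib.request
import urllib.error

def is_ollama_up(base_url="http://localhost:11434"):
    """Return True if an Ollama server responds at base_url."""
    try:
        # Ollama's root endpoint answers with a 200 when the server is up
        with urllib.request.urlopen(base_url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

print(is_ollama_up())
```

This returns False instead of raising, so you can call it at startup before wiring up the UI.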

3. Install Gradio

pip install gradio

Gradio is an open-source Python library for creating customizable web interfaces. We will be using it to create a chat interface.

Also, I cannot stress enough the importance of using virtual environments for your Python projects: they isolate dependencies and prevent version conflicts, ensuring a smooth and consistent development experience.

Here is a really good article on how to use Python virtual environments: https://www.freecodecamp.org/news/how-to-setup-virtual-environments-in-python/
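As a quick reference, a typical setup looks like this (the directory name .venv is just a convention):

```shell
# Create an isolated environment in the project folder
python3 -m venv .venv

# Activate it (Linux/WSL/macOS)
. .venv/bin/activate
```

After activation, `pip install gradio` installs packages into `.venv` rather than system-wide, so different projects can pin different versions.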

4. Code

Create a main.py file in your current project folder and copy the following:


import requests, json
import gradio as gr

model = 'llama2:latest'  # Replace the model name if needed
context = []  # Conversation context tokens returned by Ollama


# Call the Ollama generate API and assemble the streamed answer
def generate(prompt, context, top_k, top_p, temp):
    r = requests.post('http://localhost:11434/api/generate',
                      json={
                          'model': model,
                          'prompt': prompt,
                          'context': context,
                          'options': {
                              'top_k': top_k,
                              'top_p': top_p,
                              'temperature': temp
                          }
                      },
                      stream=True)
    r.raise_for_status()

    response = ""
    # Ollama streams newline-delimited JSON; read it line by line
    for line in r.iter_lines():
        body = json.loads(line)
        response_part = body.get('response', '')
        print(response_part, end='', flush=True)
        if 'error' in body:
            raise Exception(body['error'])

        response += response_part

        if body.get('done', False):
            context = body.get('context', [])
            return response, context


def chat(input, chat_history, top_k, top_p, temp):
    chat_history = chat_history or []

    global context
    output, context = generate(input, context, top_k, top_p, temp)

    chat_history.append((input, output))

    # The first chat_history updates the Chatbot widget; the second updates
    # the State (which maintains conversation history across interactions)
    return chat_history, chat_history


######################### Gradio Code ##########################
block = gr.Blocks()

with block:
    gr.Markdown("""<h1><center> Jarvis </center></h1>""")

    chatbot = gr.Chatbot()
    message = gr.Textbox(placeholder="Type here")

    state = gr.State()
    with gr.Row():
        top_k = gr.Slider(0.0, 100.0, label="top_k", value=40, info="Reduces the probability of generating nonsense. A higher value (e.g. 100) gives more diverse answers, while a lower value (e.g. 10) is more conservative. (Default: 40)")
        top_p = gr.Slider(0.0, 1.0, label="top_p", value=0.9, info="Works together with top_k. A higher value (e.g. 0.95) leads to more diverse text, while a lower value (e.g. 0.5) generates more focused and conservative text. (Default: 0.9)")
        temp = gr.Slider(0.0, 2.0, label="temperature", value=0.8, info="The temperature of the model. Increasing the temperature makes the model answer more creatively. (Default: 0.8)")

    submit = gr.Button("SEND")

    submit.click(chat, inputs=[message, state, top_k, top_p, temp], outputs=[chatbot, state])

block.launch(debug=True)
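The streaming logic in generate() can be sanity-checked without a running server: Ollama replies with newline-delimited JSON, and the sketch below replays canned lines exactly as iter_lines() would yield them. The sample data here is made up for illustration:

```python
import json

# Canned NDJSON lines, mimicking what Ollama streams back (made-up sample)
sample_lines = [
    b'{"response": "The sky ", "done": false}',
    b'{"response": "is blue.", "done": true, "context": [1, 2, 3]}',
]

def assemble(lines):
    """Concatenate the 'response' fields until a line reports done=true."""
    response = ""
    for line in lines:
        body = json.loads(line)
        if 'error' in body:
            raise Exception(body['error'])
        response += body.get('response', '')
        if body.get('done', False):
            return response, body.get('context', [])

print(assemble(sample_lines))  # ('The sky is blue.', [1, 2, 3])
```

The final line of the stream carries the updated context, which is what lets the chatbot remember earlier turns of the conversation.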

5. Run the ChatBot UI

python3 main.py

Open the local URL that Gradio prints in the terminal (http://127.0.0.1:7860 by default), and your chatbot is ready to use.

Conclusion

As I said earlier, the aim of this tutorial was to show you how to run it rather than to explain how it all works. If you face any errors while running the above code, let me know.
