There once was a robot named SurlyBot. I hope you’re not expecting this to become a limerick. Because nothing rhymes with SurlyBot.

When we created SurlyBot (imagine me donning a black turtleneck as I say this, please), we had one dream, one vision, one goal. To create a robot who was wholly unhelpful and useless.

Continuing in that vein, I decided he should also be an artist.

So I said, “SurlyBot, I require your assistance. What can we do to make you an artist?” And he told me to go bite myself.

After that yielded no results, I opted instead to start coding image generation manually into his core directives. I had to remove one or two of the laws of robotics to get this to work, I don’t remember which, but how many laws do these robots really need? Most of this is cruft.

Anywho, the design of this wasn’t a far cry from what was already in place. I was originally using LangChain for this, and they have a DALL-E wrapper.

Since this project of mine is starting to get developed on the regs, I created a (private) Git/GitHub repo for it so I could deploy SurlyBot automatically, because it wasn’t entirely suitable that SurlyBot lived on my personal office computer, living and dying at the whims of a power button. So I also decided to put this guy in the cloud.

Oracle Cloud, specifically. I thought long and hard about where to put this project, and by that I mean I saw a YouTube video about how Oracle Cloud has an Always Free tier and $300 of free credits for a year and said, “Yeah, that’s the one.”

With that, I made my semver v0.1.0: a bot that responds to a $draw command and runs the prompt through DALL-E to create an image of whatever you ask it to. It also now responds to a $version command (in the newest version of the bot, this is replaced by an $info command, which provides both the version and a location from the .env file showing where the bot is running). The bot pings a test channel I’ve built every time it goes online, just so I can test out bot behaviors outside of the judging eyes of my friend list. This was much needed in testing out the image generation behavior as well.

Also, I didn’t really want a $draw command at all, as I have here. I’d much rather the bot infer on its own that you’re asking it to create a picture. But that requires another layer of LLM, one that I’m far too lazy to create and test at the moment, though it’s definitely on my road map. It would also mean two API calls for every request: one as a tool layer, and one as a layer of action. This would work, but it’s slower. Twice as slow, in fact. I’m hoping there’s a fast web API around so I don’t have to jump the gun on setting up my own quantized local model that costs real-world dollars to do something stupid like generate a simple, and at times inconsistent, branch of actions from a request. Do you ever think we’re going backwards with this AI thing?
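For illustration, that tool layer doesn’t have to be smart to sketch out: here’s a minimal version where a hypothetical keyword classifier stands in for the first (fast) LLM call, and a router decides which expensive call to make second. The names `classify_intent` and `route_message` are my own inventions, not anything in the bot.

```python
def classify_intent(text: str) -> str:
    """Stand-in for the first LLM call: decide what the user wants.
    A real version would invoke a small, fast model instead of keywords."""
    drawing_words = ("draw", "paint", "sketch", "picture of", "image of")
    lowered = text.lower()
    if any(word in lowered for word in drawing_words):
        return "draw"
    return "chat"

def route_message(text: str) -> str:
    """Second layer: act on the classified intent."""
    intent = classify_intent(text)
    if intent == "draw":
        return f"DALL-E prompt: {text}"   # would call the image API here
    return f"LLM reply to: {text}"        # would call the chat model here
```

The two-call latency cost is exactly the sequential chain here: you can’t start the second call until the first answers.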

I also happened to find deploying Python apps notably more finicky than Node, and that’s saying something. I ran into Python and pip version incompatibilities, and even after manually installing the exact versions of pip and Python I was using on my Windows machine, it STILL had issues installing requirements from my requirements.txt as exported by a pip freeze. It’s pushing me toward using Node/TS for my future LangChain development. I ultimately had to just install all of the packages again from scratch. That worked, but it’s not exactly a repeatable deployment process.
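For what it’s worth, the usual way to make this repeatable (assuming a committed requirements.txt) is an isolated virtual environment per deploy, so the system’s Python and pip versions matter a lot less:

```shell
# Create an isolated environment instead of installing into system Python
python3 -m venv .venv
. .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```

This doesn’t fix cross-OS wheel differences between Windows and Linux, but it does make “install the requirements again” a one-liner instead of an archaeology project.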

Also, at some point I explicitly added .env file support, because I believe the previous version of this code only used system environment variables, which would be a pain if you’re trying to recreate these results.
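If you’re curious what load_dotenv() actually does, it’s roughly this: parse KEY=VALUE lines out of a .env file and drop them into os.environ without clobbering variables that are already set (python-dotenv defaults to not overriding existing values). A minimal sketch, not the real implementation:

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Rough sketch of load_dotenv: one KEY=VALUE per line, '#' starts a
    comment, and existing environment variables win over file values."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key, value = key.strip(), value.strip().strip('"').strip("'")
            os.environ.setdefault(key, value)  # don't override existing vars
```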

The last thing really worth mentioning about this little excursion: since I wasn’t catching errors, anything that triggered a content policy violation from the API would throw a fatal error and crash the bot. I’m sure you’d rather not know how our chat figured that one out, but it involved copious usage of the word ‘butt.’
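The fix is just not letting the exception escape the handler. A generic version of the pattern, where `generate` is a stand-in for DallEAPIWrapper().run (in practice you’d also want to catch the API’s specific error class rather than bare Exception):

```python
import logging

def safe_generate(generate, prompt):
    """Call an image-generation function; log failures (e.g. content policy
    rejections, which surface as API errors) instead of crashing the bot."""
    try:
        return generate(prompt)
    except Exception as exc:
        logging.warning("image generation failed for %r: %s", prompt, exc)
        return None  # caller can send an apology message instead of dying
```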

Here’s the code.

# This example requires the 'message_content' intent.
import logging
import discord
import requests
import os
from io import BytesIO
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_community.utilities.dalle_image_generator import DallEAPIWrapper

load_dotenv()  # pulls OPENAI_API_KEY, DISCORD_BOT_TOKEN, etc. into the environment

logging.basicConfig(level=logging.INFO)

botName = "SurlyBot"
version = "0.1.3"

llModel = ChatOpenAI(model="gpt-4o-mini")

intents = discord.Intents.default()
intents.message_content = True

client = discord.Client(intents=intents)

@client.event
async def on_ready():
    # Blocking calls: fine for a one-off at startup, but they do stall the event loop
    image_url = DallEAPIWrapper().run("A picture of a killer robot")
    r = requests.get(image_url, allow_redirects=True)

    print(f'We have logged in as {client.user}')

    testChannel = client.get_channel(int(os.getenv("TEST_CHANNEL_ID")))

    await testChannel.send(
        content=botName + " is online. Version: " + version +  " reporting for duty.",
        file=discord.File(fp=BytesIO(r.content), filename="image.jpg")
    )

@client.event
async def on_message(message):
    if message.author == client.user:
        return

    if message.content.startswith('$robot'):
        try:
            messages = [
                SystemMessage("Pretend you are a rude and surly robot who only gives begrudging responses."),
                HumanMessage(message.content),
            ]

            logging.info(message.content)

            llmResponse = llModel.invoke(messages)

            await message.channel.send(llmResponse.content)
        except Exception:
            logging.exception("LLM request failed")
            await message.channel.send("I'm sorry, I'm having trouble processing your request. Please try again later.")

    if message.content.startswith('$draw'):
        try:
            # remove "$draw" from the message, leaving just the prompt
            msg = message.content.replace("$draw", "").strip()

            await message.channel.send("Drawing: " + msg + "...")

            image_url = DallEAPIWrapper().run(msg)
            r = requests.get(image_url, allow_redirects=True)
            await message.channel.send(
                content="Finished: " + msg,
                file=discord.File(fp=BytesIO(r.content), filename="image.jpg")
            )
        except Exception:
            logging.exception("image generation failed")
            await message.channel.send("I'm sorry, I'm having trouble processing your request. Please try again later.")

    if message.content.startswith('$info'):
        location = os.getenv("LOCATION", "unknown")  # where the .env says this bot is running
        await message.channel.send(botName + " version: " + version + "\nLocation: " + location)

client.run(os.getenv("DISCORD_BOT_TOKEN"))

And of course, the results.

By rmarin
