Had a chance to attend a meetup on Orchestrating LLMs and Agents. The following are some of the points I noted across the three sessions that made up the two-hour presentation and demos:

What Agentic Frameworks should you use to get results fast? - Sam Witteveen

  • About a year ago, agents hit the scene
  • Autonomous Agent:
    • Something that produces a result by itself
    • Acting on the world - via tools, search
    • Uses an LLM for decision making and reasoning
    • Modular
    • Supposedly act like humans
    • More than just a linear chain
  • Agents vs chains
    • Put a loop onto chains
    • Agents
      • Do I have enough information to do what I want to do?
    • Chains have a clear starting point and end point and probably do not have loops
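
The "chain plus a loop" distinction can be sketched in a few lines. Here `fake_llm` is a scripted stand-in for a real model call, so the example runs on its own:

```python
# Minimal sketch: a chain is one linear pass; an agent loops until the
# model decides it has enough information. `fake_llm` is a stub.

def fake_llm(prompt: str) -> str:
    # Pretend the model asks for a lookup once, then answers.
    if "search_result" in prompt:
        return "FINAL: 42"
    return "ACTION: search"

def run_chain(query: str) -> str:
    """A chain: fixed start, fixed end, no loop."""
    return fake_llm(query)

def run_agent(query: str, max_steps: int = 5) -> str:
    """An agent: loop until the model says it has enough information."""
    context = query
    for _ in range(max_steps):
        out = fake_llm(context)
        if out.startswith("FINAL:"):
            return out.removeprefix("FINAL:").strip()
        # Otherwise act on the world (here, a stubbed search tool).
        context += " search_result=42"
    return "gave up"

print(run_agent("what is the answer?"))  # -> 42
```
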
  • Agents equation
    • Output = function( context + instruction/query)
      • Can be a call to a tool
      • Can be a reasoning step
      • Can be a final answer
    • How to deal with the different outputs from the function?
    • output -> controller -> tools
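
The `output -> controller -> tools` routing can be sketched with three output kinds (tool call, reasoning step, final answer); the model turns are scripted JSON strings and the tool registry is a toy assumption:

```python
# Hedged sketch of the "output -> controller -> tools" loop: each model
# output is a tool call, a reasoning step, or a final answer, and a small
# controller routes it. The LLM turns are scripted stand-ins.
import json

TOOLS = {"add": lambda a, b: a + b}  # toy tool registry

def controller(output, scratchpad):
    """Route one model output; return the final answer or None to continue."""
    msg = json.loads(output)
    if msg["type"] == "tool":
        result = TOOLS[msg["name"]](*msg["args"])
        scratchpad.append(f"tool {msg['name']} -> {result}")
        return None
    if msg["type"] == "reason":
        scratchpad.append(f"thought: {msg['text']}")
        return None
    return msg["text"]  # type == "final"

scratchpad = []
outputs = [                       # scripted stand-in for LLM turns
    '{"type": "reason", "text": "need to add 2 and 3"}',
    '{"type": "tool", "name": "add", "args": [2, 3]}',
    '{"type": "final", "text": "5"}',
]
answer = None
for out in outputs:
    answer = controller(out, scratchpad)
    if answer is not None:
        break
print(answer)  # -> 5
```
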
  • What do you need for agents to function?
    • You need a good LLM to create a good agent
    • Last year at this time, LLMs were not good enough for agents
    • Mistral AI, Gemini, Cohere, OpenAI, Groq and many more now offer models that can act as reasoning machines for agents
      • Ask a model 10 times and then average it out
      • Ask multiple models and average it out
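
The "ask a model N times and average it out" idea amounts to majority voting over sampled answers (often called self-consistency). A minimal sketch with scripted samples in place of real model calls:

```python
# Self-consistency sketch: collect several sampled answers (here a
# scripted list standing in for repeated model calls) and majority-vote.
from collections import Counter

def self_consistent_answer(samples):
    votes = Counter(samples)
    return votes.most_common(1)[0][0]

samples = ["Paris", "Paris", "Lyon", "Paris", "Lyon"]
print(self_consistent_answer(samples))  # -> Paris
```

The same voting works across different models: pool one answer from each and take the consensus.
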
    • Need a bunch of good tools
    • Often need to make custom tools for specific use case
    • Finance tools
    • Social media tools
    • Drug/Chemical look ups
    • Other NN as tools
      • Agent calls a finetuned BERT rather than use the LLM for sentiment analysis
  • Tool Quality
    • It is all about data quality from the tools
    • How does the tool return the data it gets in a form that helps the LLM?
    • How does it handle failures
    • The tool should be able to highlight what is important
    • Most tools do not fail safely. This needs massive improvement
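
A fail-safe tool can be sketched as a wrapper that always returns structured data, highlights what matters, and degrades gracefully instead of crashing the agent; `fetch_price` is a hypothetical backend call invented for the example:

```python
# Sketch of a fail-safe tool: structured output on success AND on
# failure, with a summary field that highlights what is important.
# `fetch_price` is a hypothetical backend, stubbed with fixed data.
import json

def fetch_price(ticker):
    prices = {"AAPL": 170.0}
    if ticker not in prices:
        raise KeyError(ticker)
    return prices[ticker]

def price_tool(ticker):
    """Return a JSON string the LLM can always parse, even on failure."""
    try:
        price = fetch_price(ticker)
        return json.dumps({"ok": True, "ticker": ticker, "price": price,
                           "summary": f"{ticker} trades at {price:.2f}"})
    except KeyError:
        return json.dumps({"ok": False, "ticker": ticker,
                           "error": "unknown ticker",
                           "hint": "check the symbol and retry"})

print(price_tool("AAPL"))
print(price_tool("XXXX"))
```
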
  • A good agent framework
    • Handle low level calls for you
    • Have good prompts that match the LLM: the way you prompt OpenAI and the way you prompt Gemini is different
      • How do we deal with this? Maybe translate prompts across LLMs, e.g. rewrite a DALL·E prompt for another LLM
      • Do a translation - text to text
      • Meta prompt that will write a prompt based on your requirements
      • Everyone is used to OpenAI, and hence most LLMs accept prompts similar to OpenAI's
    • Can use great tools
    • Easy to use: Some frameworks are powerful but pain to use
    • Flexibility
      • Raptor paper agent framework
    • Tracing
      • Most agent frameworks suck in this area
    • Version control
      • Almost no one gives this out of the box
  • Agent Components
    • Task decomposition
    • Plan and execute
    • Self Critique
    • Sequential vs Hierarchy vs Graph
    • Manager and workers
    • LLM programs
    • ReACT
    • Function Calling
  • Task Decomposition
    • How will you and the LLM break the task down to reach the result?
    • How do you turn it into a series of steps that can be executed?
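
One common decomposition pattern is to ask the model for a numbered plan and parse it into a list of executable steps; the model reply below is a scripted stub:

```python
# Task decomposition sketch: ask the model for numbered steps, then
# parse the reply into a list. `fake_llm` is a scripted stand-in.
import re

def fake_llm(prompt):
    return "1. Search for the paper\n2. Summarise it\n3. Draft the post"

def decompose(task):
    reply = fake_llm(f"Break this task into numbered steps: {task}")
    return [m.group(1).strip()
            for m in re.finditer(r"^\d+\.\s*(.+)$", reply, re.MULTILINE)]

steps = decompose("write a blog post about the Raptor paper")
print(steps)  # -> ['Search for the paper', 'Summarise it', 'Draft the post']
```
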
  • Function Calling
    • Structured responses
    • Tool use
    • Heavily use things like Pydantic, JSON and XML (Anthropic)
    • Mostly available on proprietary models, but that is changing: it is coming to open-source models too
    • Can finetune some open models
    • Gemma is built for responsible AI development from the same research and technology used to create Gemini models
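
The core of function calling is asking the model to emit JSON matching a schema and validating it on the way back. A minimal sketch (a stdlib dataclass stands in for the Pydantic model mentioned above, and the model reply is stubbed):

```python
# Function-calling sketch: the model returns JSON for a declared schema
# and the caller validates it. Dataclass used in place of Pydantic so
# the example has no dependencies; `fake_llm` is a stub.
import json
from dataclasses import dataclass

@dataclass
class WeatherCall:
    city: str
    unit: str  # "C" or "F"

def fake_llm(prompt):
    return '{"city": "Singapore", "unit": "C"}'

raw = fake_llm('Call get_weather. Reply with JSON {"city": ..., "unit": ...}')
call = WeatherCall(**json.loads(raw))
assert call.unit in ("C", "F"), "schema violation"
print(call)  # -> WeatherCall(city='Singapore', unit='C')
```
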
  • Planning an agent
    • Paper, Pen/Pencil are your friends
    • Map out the steps of what you will want
    • Map out the decisions that need to be made
    • How could you constrain the decision
    • Constrain the decision as much as possible
  • Prompting for Agents
    • Break things down
    • Make it simple
    • Ask them to check
    • Smart Prompts - Prompts that you parse out
    • Prompt post processing
  • 3 types of memory
    • Conversational memory
    • Persona memory - self consistency
    • Long term memory - Look ups
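
The three memory types can be sketched as simple structures: a rolling buffer for conversation, a fixed self-description for persona consistency, and a keyword-keyed store for long-term look-ups. All names here are illustrative:

```python
# Sketch of the three memory types: conversational (rolling window),
# persona (fixed self-description injected every turn), and long-term
# (keyword look-ups merged into the prompt when relevant).
from collections import deque

class AgentMemory:
    def __init__(self, persona, window=4):
        self.persona = persona                    # persona memory
        self.conversation = deque(maxlen=window)  # conversational memory
        self.long_term = {}                       # long-term look-ups

    def remember(self, key, fact):
        self.long_term[key] = fact

    def build_prompt(self, user_msg):
        self.conversation.append(f"user: {user_msg}")
        lookups = [v for k, v in self.long_term.items()
                   if k in user_msg.lower()]
        return "\n".join([self.persona, *lookups, *self.conversation])

mem = AgentMemory("You are a helpful support agent.")
mem.remember("router", "Customer owns a WiFi-6 router.")
print(mem.build_prompt("my router keeps dropping"))
```
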
  • Release dates
    • Auto-GPT, BabyAGI, Camel, Generative Agents, Voyager
    • Auto-GPT
      • Does not have flexibility
    • BabyAGI
      • Task Creation
      • Task Prioritization
    • Minecraft Voyager
      • Mini programs
    • Engineer-GPT
      • Breaking things down
      • Preprompts
      • Logical steps
  • Auto Gen
    • Made by MSFT
    • Sept 2023 release
    • Conversable agent
    • Multi agent
    • Conversation driven
    • User Proxy agents - Can be a human or not a human
    • Able to use more tools
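
AutoGen's conversation-driven pattern can be sketched without the real `autogen` API: two conversable agents (one possibly a user proxy) exchange messages until one signals termination. Replies are scripted stand-ins for LLM calls:

```python
# Hedged sketch of AutoGen's conversation-driven pattern (NOT the real
# autogen API): two conversable agents pass messages back and forth
# until one emits a termination marker.

class ConversableAgent:
    def __init__(self, name, replies):
        self.name = name
        self.replies = iter(replies)  # scripted stand-in for an LLM

    def reply(self, message):
        return next(self.replies)

def initiate_chat(a, b, opening, max_turns=6):
    transcript = [f"{a.name}: {opening}"]
    msg, speaker, other = opening, b, a
    for _ in range(max_turns):
        msg = speaker.reply(msg)
        transcript.append(f"{speaker.name}: {msg}")
        if "TERMINATE" in msg:
            break
        speaker, other = other, speaker
    return transcript

user_proxy = ConversableAgent("user_proxy", ["Looks good. TERMINATE"])
assistant = ConversableAgent("assistant", ["Here is the plan: ..."])
for line in initiate_chat(user_proxy, assistant, "Plan a demo"):
    print(line)
```
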
  • Auto Gen Studio
    • No code version of Auto Gen
  • Instructor
    • Generating structure
    • Not an agent framework
    • Guides the LLM response back into the structure you want
    • Function calling on steroids
    • Pydantic and Schema generation
    • JSON
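
The core Instructor idea ("function calling on steroids") is to validate the model's output against a schema and, on failure, re-ask with the error appended. A hedged sketch with a stubbed model and a dataclass in place of Pydantic:

```python
# Sketch of the Instructor pattern (not the real instructor API):
# parse output into a schema; on validation failure, retry with the
# error message folded into the prompt. Model replies are scripted.
import json
from dataclasses import dataclass

@dataclass
class User:
    name: str
    age: int

REPLIES = iter(['{"name": "Ada", "age": "unknown"}',   # invalid first try
                '{"name": "Ada", "age": 36}'])          # valid retry

def fake_llm(prompt):
    return next(REPLIES)

def extract_user(text, max_retries=2):
    prompt = f"Extract a user as JSON from: {text}"
    for _ in range(max_retries + 1):
        data = json.loads(fake_llm(prompt))
        if isinstance(data.get("age"), int):
            return User(**data)
        prompt += " (age must be an integer; try again)"
    raise ValueError("could not get valid structure")

user = extract_user("Ada, 36")
print(user)  # -> User(name='Ada', age=36)
```
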
  • Crew AI
    • Built on LangChain
    • Multi Agent
    • Mimics a org chart
    • Can be sequential or hierarchical
    • Tools from Langchain + CrewAI tools
    • Heavily anthropomorphic
    • role, goal and backstory are the key inputs to a CrewAI agent
    • Each agent has a set of tools that it can call. It can delegate
    • Define the agents, Define the tasks
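
The CrewAI shape can be sketched without the real `crewai` package: each agent carries a role, goal, and backstory (the anthropomorphic bits), and a crew runs tasks sequentially through its agents:

```python
# Hedged sketch of the CrewAI shape (NOT the real crewai API): agents
# defined by role/goal/backstory, tasks assigned to agents, run in order.
from dataclasses import dataclass, field

@dataclass
class Agent:
    role: str
    goal: str
    backstory: str
    tools: list = field(default_factory=list)

    def perform(self, task):
        # A real agent would call an LLM with role/goal/backstory
        # baked into the prompt, plus its tools.
        return f"[{self.role}] did: {task}"

@dataclass
class Crew:
    agents: list
    tasks: list  # (description, agent) pairs, run sequentially

    def kickoff(self):
        return [agent.perform(desc) for desc, agent in self.tasks]

researcher = Agent("Researcher", "find sources", "A meticulous analyst")
writer = Agent("Writer", "draft the post", "A concise technical writer")
crew = Crew([researcher, writer],
            [("gather papers", researcher), ("write summary", writer)])
for line in crew.kickoff():
    print(line)
```
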
  • Lang Graph
    • Inverse of CrewAI
    • Crash course and example on Sam’s YouTube channel
    • Built using LCEL
    • Stateful - Build a state machine
    • Multi actor
    • Persist that state and pass it around the graph
    • Nodes - building blocks each one represents a function or a computation step
    • Edges - Join the steps
    • https://python.langchain.com/docs/langgraph
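
The LangGraph idea — nodes as computation steps over a shared state, edges joining them, state persisted as it moves around the graph — can be sketched in plain Python (this is the pattern, not the real `langgraph` API):

```python
# Hedged sketch of the LangGraph pattern (NOT the real langgraph API):
# nodes are functions over a shared state dict, edges pick the next
# node, and a conditional edge loops until the plan is exhausted.

def plan(state):
    state["steps"] = ["lookup", "answer"]
    return state

def act(state):
    state.setdefault("done", []).append(state["steps"].pop(0))
    return state

def route(state):  # conditional edge: loop until steps are exhausted
    return "act" if state["steps"] else "END"

NODES = {"plan": plan, "act": act}
EDGES = {"plan": lambda s: "act", "act": route}

def run_graph(entry, state):
    node = entry
    while node != "END":
        state = NODES[node](state)   # run the node
        node = EDGES[node](state)    # follow the edge
    return state

print(run_graph("plan", {}))  # -> {'steps': [], 'done': ['lookup', 'answer']}
```
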
  • Work on CrewAI and then work on Lang Graph
  • CrewAI is rigid
  • Task Weaver
  • CrewAI, Lang Graph and Auto Gen
  • Think about patterns of agents
  • At every step we check the oracle and then move forward
  • Is there a possibility that, if a tool is missing, the LLM writes code that mimics the tool?

LLM Agents - B2C use cases & evaluation - Praveen Govindaraj

  • B2C world
  • L1 technical support
  • L1 Auditor that audits the prompts
  • Singtel team using Lang Graph to create an agent in the context of customer service
  • L2 Technical support
  • Using Mistral AI’s Mixtral 8x7B-Instruct
  • L2 Auditor
  • Q&A
    • Is the system in production?
      • No. It is at a POC stage
    • What vector database and what graph database are you using?
      • chromadb
    • What is the $ value savings or time savings per agent?
      • 200k to 300k SGD per month
    • How many people are working on this?
      • 3 member team
  • LLM Data Agents
  • Evaluation criteria for agents
    • Query translation
    • Context
    • Groundedness
  • Use TruLens to track metrics
    • evaluate the agents
    • evaluate the costs
  • Track all the prompts

Using DSPy with Gemini and Gemma - Martin Andrews