Had a chance to attend a meetup on Orchestrating LLMs and Agents. The following are some of the points I noted across the three sessions that made up the two-hour presentation and demos:

What Agentic Frameworks should you use to get results fast? - Sam Witteveen

  • About a year ago, agents hit the scene
  • Autonomous Agent:
    • Something that produces a result by itself
    • Acting on the world - via tools, search
    • Uses an LLM for decision making and reasoning
    • Modular
    • Supposedly act like humans
    • More than just a linear chain
  • Agents vs chains
    • Put a loop onto chains
    • Agents
      • Do I have enough information to do what I want to do?
    • Chains have a clear starting point and end point and probably do not have loops
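
The "chain plus a loop" distinction can be sketched in a few lines. Here `fake_llm` is a scripted stand-in for a real model call, so the example runs on its own:

```python
# Minimal sketch: a chain is one linear pass; an agent loops until the
# model decides it has enough information. `fake_llm` is a stub.

def fake_llm(prompt: str) -> str:
    # Pretend the model asks for a lookup once, then answers.
    if "search_result" in prompt:
        return "FINAL: 42"
    return "ACTION: search"

def run_chain(query: str) -> str:
    """A chain: fixed start, fixed end, no loop."""
    return fake_llm(query)

def run_agent(query: str, max_steps: int = 5) -> str:
    """An agent: loop until the model says it has enough information."""
    context = query
    for _ in range(max_steps):
        out = fake_llm(context)
        if out.startswith("FINAL:"):
            return out.removeprefix("FINAL:").strip()
        # Otherwise act on the world (here, a stubbed search tool).
        context += " search_result=42"
    return "gave up"

print(run_agent("what is the answer?"))  # -> 42
```
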
  • Agents equation
    • Output = function( context + instruction/query)
      • Can be a call to a tool
      • Can be a reasoning step
      • Can be a final answer
    • How to deal with the different outputs from the function?
    • output -> controller -> tools
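
The `output -> controller -> tools` routing can be sketched with three output kinds (tool call, reasoning step, final answer); the model turns are scripted JSON strings and the tool registry is a toy assumption:

```python
# Hedged sketch of the "output -> controller -> tools" loop: each model
# output is a tool call, a reasoning step, or a final answer, and a small
# controller routes it. The LLM turns are scripted stand-ins.
import json

TOOLS = {"add": lambda a, b: a + b}  # toy tool registry

def controller(output, scratchpad):
    """Route one model output; return the final answer or None to continue."""
    msg = json.loads(output)
    if msg["type"] == "tool":
        result = TOOLS[msg["name"]](*msg["args"])
        scratchpad.append(f"tool {msg['name']} -> {result}")
        return None
    if msg["type"] == "reason":
        scratchpad.append(f"thought: {msg['text']}")
        return None
    return msg["text"]  # type == "final"

scratchpad = []
outputs = [                       # scripted stand-in for LLM turns
    '{"type": "reason", "text": "need to add 2 and 3"}',
    '{"type": "tool", "name": "add", "args": [2, 3]}',
    '{"type": "final", "text": "5"}',
]
answer = None
for out in outputs:
    answer = controller(out, scratchpad)
    if answer is not None:
        break
print(answer)  # -> 5
```
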
  • What do you need for agents to function?
    • You need a good LLM to create a good agent
    • Last year at this time, LLMs were not good enough for agents
    • Mistral AI, Gemini, Cohere, OpenAI, Groq and many more now offer models that can act as reasoning machines for agents
      • Ask a model 10 times and then average it out
      • Ask multiple models and average it out
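
The "ask a model N times and average it out" idea amounts to majority voting over sampled answers (often called self-consistency). A minimal sketch with scripted samples in place of real model calls:

```python
# Self-consistency sketch: collect several sampled answers (here a
# scripted list standing in for repeated model calls) and majority-vote.
from collections import Counter

def self_consistent_answer(samples):
    votes = Counter(samples)
    return votes.most_common(1)[0][0]

samples = ["Paris", "Paris", "Lyon", "Paris", "Lyon"]
print(self_consistent_answer(samples))  # -> Paris
```

The same voting works across different models: pool one answer from each and take the consensus.
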
    • Need a bunch of good tools
    • Often need to make custom tools for specific use case
    • Finance tools
    • Social media tools
    • Drug/Chemical look ups
    • Other NN as tools
      • Agent calls a finetuned BERT rather than use the LLM for sentiment analysis
  • Tool Quality
    • It is all about data quality from the tools
    • How does the tool return the data it gets in a form that helps the LLM?
    • How does it handle failures
    • The tool should be able to highlight what is important
    • Most tools do not fail safely. This needs massive improvement
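
A fail-safe tool can be sketched as a wrapper that always returns structured data, highlights what matters, and degrades gracefully instead of crashing the agent; `fetch_price` is a hypothetical backend call invented for the example:

```python
# Sketch of a fail-safe tool: structured output on success AND on
# failure, with a summary field that highlights what is important.
# `fetch_price` is a hypothetical backend, stubbed with fixed data.
import json

def fetch_price(ticker):
    prices = {"AAPL": 170.0}
    if ticker not in prices:
        raise KeyError(ticker)
    return prices[ticker]

def price_tool(ticker):
    """Return a JSON string the LLM can always parse, even on failure."""
    try:
        price = fetch_price(ticker)
        return json.dumps({"ok": True, "ticker": ticker, "price": price,
                           "summary": f"{ticker} trades at {price:.2f}"})
    except KeyError:
        return json.dumps({"ok": False, "ticker": ticker,
                           "error": "unknown ticker",
                           "hint": "check the symbol and retry"})

print(price_tool("AAPL"))
print(price_tool("XXXX"))
```
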
  • A good agent framework
    • Handle low level calls for you
    • Have good prompts that match the LLM: the way you prompt OpenAI and the way you prompt Gemini is different
      • How do we deal with this? Maybe translate prompts across LLMs, e.g. rewrite a DALL·E prompt for another LLM
      • Do a translation - text to text
      • Meta prompt that will write a prompt based on your requirements
      • Everyone is used to OpenAI, and hence most LLMs accept prompts similar to OpenAI's
    • Can use great tools
    • Easy to use: Some frameworks are powerful but pain to use
    • Flexibility
      • Raptor paper agent framework
    • Tracing
      • Most agent frameworks suck in this area
    • Version control
      • Almost no one gives this out of the box
  • Agent Components
    • Task decomposition
    • Plan and execute
    • Self Critique
    • Sequential vs Hierarchy vs Graph
    • Manager and workers
    • LLM programs
    • ReACT
    • Function Calling
  • Task Decomposition
    • How will you and the LLM break the task down to reach the result?
    • How do you turn it into a series of steps that can be executed?
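
One common decomposition pattern is to ask the model for a numbered plan and parse it into a list of executable steps; the model reply below is a scripted stub:

```python
# Task decomposition sketch: ask the model for numbered steps, then
# parse the reply into a list. `fake_llm` is a scripted stand-in.
import re

def fake_llm(prompt):
    return "1. Search for the paper\n2. Summarise it\n3. Draft the post"

def decompose(task):
    reply = fake_llm(f"Break this task into numbered steps: {task}")
    return [m.group(1).strip()
            for m in re.finditer(r"^\d+\.\s*(.+)$", reply, re.MULTILINE)]

steps = decompose("write a blog post about the Raptor paper")
print(steps)  # -> ['Search for the paper', 'Summarise it', 'Draft the post']
```
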
  • Function Calling
    • Structured responses
    • Tool use
    • Heavily use things like Pydantic, JSON and XML (Anthropic)
    • Mostly available on proprietary models, but that is changing: it is coming to open-source models too
    • Can finetune some open models
    • Gemma is built for responsible AI development from the same research and technology used to create Gemini models
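
The core of function calling is asking the model to emit JSON matching a schema and validating it on the way back. A minimal sketch (a stdlib dataclass stands in for the Pydantic model mentioned above, and the model reply is stubbed):

```python
# Function-calling sketch: the model returns JSON for a declared schema
# and the caller validates it. Dataclass used in place of Pydantic so
# the example has no dependencies; `fake_llm` is a stub.
import json
from dataclasses import dataclass

@dataclass
class WeatherCall:
    city: str
    unit: str  # "C" or "F"

def fake_llm(prompt):
    return '{"city": "Singapore", "unit": "C"}'

raw = fake_llm('Call get_weather. Reply with JSON {"city": ..., "unit": ...}')
call = WeatherCall(**json.loads(raw))
assert call.unit in ("C", "F"), "schema violation"
print(call)  # -> WeatherCall(city='Singapore', unit='C')
```
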
  • Planning an agent
    • Paper, Pen/Pencil are your friends
    • Map out the steps of what you will want
    • Map out the decisions that need to be made
    • How could you constrain the decision
    • Constrain the decision as much as possible
  • Prompting for Agents
    • Break things down
    • Make it simple
    • Ask them to check
    • Smart Prompts - Prompts that you parse out
    • Prompt post processing
  • 3 types of memory
    • Conversational memory
    • Persona memory - self consistency
    • Long term memory - Look ups
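
The three memory types can be sketched as simple structures: a rolling buffer for conversation, a fixed self-description for persona consistency, and a keyword-keyed store for long-term look-ups. All names here are illustrative:

```python
# Sketch of the three memory types: conversational (rolling window),
# persona (fixed self-description injected every turn), and long-term
# (keyword look-ups merged into the prompt when relevant).
from collections import deque

class AgentMemory:
    def __init__(self, persona, window=4):
        self.persona = persona                    # persona memory
        self.conversation = deque(maxlen=window)  # conversational memory
        self.long_term = {}                       # long-term look-ups

    def remember(self, key, fact):
        self.long_term[key] = fact

    def build_prompt(self, user_msg):
        self.conversation.append(f"user: {user_msg}")
        lookups = [v for k, v in self.long_term.items()
                   if k in user_msg.lower()]
        return "\n".join([self.persona, *lookups, *self.conversation])

mem = AgentMemory("You are a helpful support agent.")
mem.remember("router", "Customer owns a WiFi-6 router.")
print(mem.build_prompt("my router keeps dropping"))
```
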
  • Release dates
    • Auto-GPT, BabyAGI, Camel, Generative Agents, Voyager
    • Auto-GPT
      • Does not have flexibility
    • BabyAGI
      • Task Creation
      • Task Prioritization
    • Minecraft Voyager
      • Mini programs
    • Engineer-GPT
      • Breaking things down
      • Preprompts
      • Logical steps
  • Auto Gen
    • Made by MSFT
    • Sept 2023 release
    • Conversable agent
    • Multi agent
    • Conversation driven
    • User Proxy agents - Can be a human or not a human
    • Able to use more tools
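
AutoGen's conversation-driven pattern can be sketched without the real `autogen` API: two conversable agents (one possibly a user proxy) exchange messages until one signals termination. Replies are scripted stand-ins for LLM calls:

```python
# Hedged sketch of AutoGen's conversation-driven pattern (NOT the real
# autogen API): two conversable agents pass messages back and forth
# until one emits a termination marker.

class ConversableAgent:
    def __init__(self, name, replies):
        self.name = name
        self.replies = iter(replies)  # scripted stand-in for an LLM

    def reply(self, message):
        return next(self.replies)

def initiate_chat(a, b, opening, max_turns=6):
    transcript = [f"{a.name}: {opening}"]
    msg, speaker, other = opening, b, a
    for _ in range(max_turns):
        msg = speaker.reply(msg)
        transcript.append(f"{speaker.name}: {msg}")
        if "TERMINATE" in msg:
            break
        speaker, other = other, speaker
    return transcript

user_proxy = ConversableAgent("user_proxy", ["Looks good. TERMINATE"])
assistant = ConversableAgent("assistant", ["Here is the plan: ..."])
for line in initiate_chat(user_proxy, assistant, "Plan a demo"):
    print(line)
```
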
  • Auto Gen Studio
    • No code version of Auto Gen
  • Instructor
    • Generating structure
    • Not an agent framework
    • Guides the LLM response back into the structure you want
    • Function calling on steroids
    • Pydantic and Schema generation
    • JSON
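
The core Instructor idea ("function calling on steroids") is to validate the model's output against a schema and, on failure, re-ask with the error appended. A hedged sketch with a stubbed model and a dataclass in place of Pydantic:

```python
# Sketch of the Instructor pattern (not the real instructor API):
# parse output into a schema; on validation failure, retry with the
# error message folded into the prompt. Model replies are scripted.
import json
from dataclasses import dataclass

@dataclass
class User:
    name: str
    age: int

REPLIES = iter(['{"name": "Ada", "age": "unknown"}',   # invalid first try
                '{"name": "Ada", "age": 36}'])          # valid retry

def fake_llm(prompt):
    return next(REPLIES)

def extract_user(text, max_retries=2):
    prompt = f"Extract a user as JSON from: {text}"
    for _ in range(max_retries + 1):
        data = json.loads(fake_llm(prompt))
        if isinstance(data.get("age"), int):
            return User(**data)
        prompt += " (age must be an integer; try again)"
    raise ValueError("could not get valid structure")

user = extract_user("Ada, 36")
print(user)  # -> User(name='Ada', age=36)
```
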
  • Crew AI
    • Built on LangChain
    • Multi Agent
    • Mimics a org chart
    • Can be sequential or hierarchical
    • Tools from Langchain + CrewAI tools
    • Heavily anthropomorphic
    • role, goal and backstory are the key inputs to a CrewAI agent
    • Each agent has a set of tools that it can call. It can delegate
    • Define the agents, Define the tasks
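
The CrewAI shape can be sketched without the real `crewai` package: each agent carries a role, goal, and backstory (the anthropomorphic bits), and a crew runs tasks sequentially through its agents:

```python
# Hedged sketch of the CrewAI shape (NOT the real crewai API): agents
# defined by role/goal/backstory, tasks assigned to agents, run in order.
from dataclasses import dataclass, field

@dataclass
class Agent:
    role: str
    goal: str
    backstory: str
    tools: list = field(default_factory=list)

    def perform(self, task):
        # A real agent would call an LLM with role/goal/backstory
        # baked into the prompt, plus its tools.
        return f"[{self.role}] did: {task}"

@dataclass
class Crew:
    agents: list
    tasks: list  # (description, agent) pairs, run sequentially

    def kickoff(self):
        return [agent.perform(desc) for desc, agent in self.tasks]

researcher = Agent("Researcher", "find sources", "A meticulous analyst")
writer = Agent("Writer", "draft the post", "A concise technical writer")
crew = Crew([researcher, writer],
            [("gather papers", researcher), ("write summary", writer)])
for line in crew.kickoff():
    print(line)
```
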
  • Lang Graph
    • Inverse of CrewAI
    • Crash course and example on Sam’s YouTube channel
    • Built using LCEL
    • Stateful - Build a state machine
    • Multi actor
    • Persist that state and pass it around the graph
    • Nodes - building blocks each one represents a function or a computation step
    • Edges - Join the steps
    • https://python.langchain.com/docs/langgraph
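
The LangGraph idea — nodes as computation steps over a shared state, edges joining them, state persisted as it moves around the graph — can be sketched in plain Python (this is the pattern, not the real `langgraph` API):

```python
# Hedged sketch of the LangGraph pattern (NOT the real langgraph API):
# nodes are functions over a shared state dict, edges pick the next
# node, and a conditional edge loops until the plan is exhausted.

def plan(state):
    state["steps"] = ["lookup", "answer"]
    return state

def act(state):
    state.setdefault("done", []).append(state["steps"].pop(0))
    return state

def route(state):  # conditional edge: loop until steps are exhausted
    return "act" if state["steps"] else "END"

NODES = {"plan": plan, "act": act}
EDGES = {"plan": lambda s: "act", "act": route}

def run_graph(entry, state):
    node = entry
    while node != "END":
        state = NODES[node](state)   # run the node
        node = EDGES[node](state)    # follow the edge
    return state

print(run_graph("plan", {}))  # -> {'steps': [], 'done': ['lookup', 'answer']}
```
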
  • Work on CrewAI and then work on Lang Graph
  • CrewAI is rigid
  • Task Weaver
  • CrewAI, Lang Graph and Auto Gen
  • Think about patterns of agents
  • At every step we check the oracle and then move forward
  • Is there a possibility that, if a tool is missing, the LLM writes code that mimics the tool?

LLM Agents - B2C use cases & evaluation - Praveen Govindaraj

  • B2C world
  • L1 technical support
  • L1 Auditor that audits the prompts
  • Singtel team using Lang Graph to create an agent in the context of customer service
  • L2 Technical support
  • Using Mistral AI’s Mixtral 8x7B-Instruct
  • L2 Auditor
  • Q&A
    • Is the system in production?
      • No. It is at a POC stage
    • What vector database and what graph database are you using?
      • chromadb
    • What is the $ value savings or time savings per agent?
      • 200k to 300k SGD per month
    • How many people are working on this?
      • 3 member team
  • LLM Data Agents
  • Evaluation criteria for agents
    • Query translation
    • Context
    • Groundedness
  • Use TruLens to track metrics
    • evaluate the agents
    • evaluate the costs
  • Track all the prompts

Using DSPy with Gemini and Gemma - Martin Andrews