A Note on Large Language Models

An ongoing attempt to make sense of how LLMs work, aimed at a general audience.
Tags: note, resources, llms
Author: Wim Louw
Published: July 2, 2024
Modified: December 3, 2024

In my role as a Data Scientist at the City of Cape Town, I’ve had the opportunity to work on a number of exploratory text-processing projects: simple topic-modelling, custom text-classification models, word-embeddings, audio-data, semantic search and retrieval augmented generation, prompt evals, and more. Adding LLMs to my NLP tool-kit has been fun and frustrating, and I’ve learned a lot about the quirks and challenges of working with the technology. I’ve gained some hands-on experience in building products with both “open” models (Llama, Gemma, Mistral, Wizard, &c.) and commercial models (ChatGPT, Claude, &c.). The team is often asked to talk to people in other departments about “Generative AI” and LLMs, and to explain a bit about how they work. Here are some rough notes in that direction.

Some definitions

  • The field of “Artificial Intelligence” (AI) develops technologies to emulate human performance in solving a variety of tasks
  • “Machine Learning” (ML) is a sub-field of AI that develops statistical models that use relationships established in example data to make predictions with new data
  • “Generative AI” is a type of ML that uses relationships established in example data to generate synthetic data that closely resembles the example data
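To make the ML definition a little more concrete, here is a minimal sketch of "use relationships in example data to make predictions with new data". It fits a straight line to four made-up points and then predicts a value for an input it has not seen. The numbers and the function names `fit_line` and `predict` are illustrative assumptions, not part of any real project.

```python
# Example data (made up for illustration): four inputs and their observed outputs.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

def fit_line(xs, ys):
    """Ordinary least squares with one input: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# "Training": establish the relationship between x and y from the examples.
slope, intercept = fit_line(xs, ys)

def predict(x):
    """Apply the learned relationship to a new input."""
    return slope * x + intercept

# "Prediction": an input that was not in the example data.
print(round(predict(5.0), 2))
```

An LLM is vastly bigger and works on text rather than a single number, but the basic pattern is the same: learn from examples, then apply what was learned to new input.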

What is a Large Language Model (LLM)?

  • Large Language Models are a type of Generative AI
  • Think of LLMs as “synthetic” text generators
  • An LLM represents numerical relationships between billions of word fragments, combinations, and sequences, obtained from massive amounts of (semi-curated) text from the internet, books, news sources, forums, &c., and then augmented with additional feedback and examples to influence the way content is generated
  • The better the quality and volume of data to draw relationships from, the better the ability of the model to create realistic and potentially useful content in response to — or “in continuation of” — new bits of text it is given, such as a question or a task from a user
  • This is rather cool, because for quite a few things, these continuations will tend towards correctness (in the mundane sense). The flip-side is that they can also tend towards the kind of common biases and misconceptions we see on the internet (despite efforts to avoid that), so watch out
  • You interact with an LLM through a “prompt”: the text you give it, plus the history of the conversation so far (your earlier inputs and the text the model generated in response)
  • bit of text (prompt) -> model -> continuation
  • “What has this model likely seen a lot of?” and “How can I get this LLM to generate something in the right vein?” are useful questions to ask when interacting with these models
  • The implication is that you can get better or worse answers depending on how you pose a question, i.e. what you put in the prompt and conversation history

[Diagram: three blocks labeled 'lots and lots of semi-curated text', 'model parameters', and 'augmentation'. An arrow from the curated text box points to the model (representing training), and another arrow points to and from augmentation (representing fine-tuning).]

[Diagram: three boxes labeled 'Input', 'Model', and 'Output'. Arrows flow from input to model, from model to output, and back from output to input. The input is 'prompt' and the output is 'continuation'.]
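
The prompt -> model -> continuation idea can be sketched with a toy model that is nothing like a real LLM in scale, but similar in spirit. It counts which word follows which in a tiny made-up corpus (those counts stand in for "model parameters"), then continues a prompt by repeatedly appending the most frequent next word. The corpus and the names `train` and `continue_text` are assumptions for illustration; real LLMs use neural networks over word fragments, not simple word counts.

```python
from collections import Counter, defaultdict

# Tiny stand-in for "lots and lots of semi-curated text" (made up for illustration).
CORPUS = (
    "the cat sat on the mat . "
    "the cat sat on the sofa . "
    "the cat chased the mouse . "
    "the dog sat on the mat ."
)

def train(text):
    """Count which word follows which; these counts are the toy model's 'parameters'."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def continue_text(model, prompt, max_words=4):
    """Append the most frequent next word, feeding each output back in as input."""
    words = prompt.split()
    for _ in range(max_words):
        candidates = model.get(words[-1])
        if not candidates:
            break  # the model has never seen this word, so it has nothing to add
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

model = train(CORPUS)
print(continue_text(model, "the dog"))  # prints "the dog sat on the cat"
```

Note that "the dog sat on the cat" never appears in the corpus. The continuation is fluent and plausible given what the model has seen, but nothing in the process checks whether it is true.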

Quick clarification

  • Vanilla LLM: Trained on text, generates more text in continuation of the user’s prompt, e.g. in answering a question (think OpenAI’s GPT-3.5, or Meta’s Llama 3)
  • LLM “compound” model: Trained on text, has the ability to trigger other (external) processes like searching the web or a database to find bits of additional text to concatenate to the prompt. This can result in a more relevant or useful continuation in response to a user’s text (think Google’s Gemini or Microsoft’s Copilot). This is a form of “retrieval augmented generation” (RAG) where you enrich the prompt with potentially useful information the model was not trained on
  • Multi-modal model: Trained on a mix of data types, such as images or audio as well as text (think OpenAI’s GPT-4o model and beyond)
  • This post focuses on LLMs and how they work, but the general principle remains the same, i.e. they are synthetic content generators
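
As a rough sketch of the RAG idea described above: find the most relevant bit of text for a question, then concatenate it to the prompt before the model generates anything. Everything here is an assumption for illustration. The documents are made up, the word-overlap scoring is a crude stand-in for the embedding-based search a real system would more likely use, and `retrieve` and `build_prompt` are hypothetical helper names.

```python
import re

# Hypothetical mini "knowledge base"; a real system would search the web or a database.
DOCUMENTS = [
    "The library opens at 09:00 on weekdays.",
    "Parking permits are renewed every 12 months.",
    "The swimming pool is closed on public holidays.",
]

def tokenize(text):
    """Lowercase the text and split it into a set of words and numbers."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question, documents):
    """Concatenate the retrieved text to the user's question ahead of generation."""
    context = "\n".join(retrieve(question, documents))
    return (
        "Use the context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

print(build_prompt("When does the library open?", DOCUMENTS))
```

The enriched prompt is then passed to the model as usual. The model is still just continuing text, but it now has potentially useful information in front of it that it was not trained on.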

Gotchas

  • LLMs can be unreliable — they mimic human language, as encoded in their parameters, and are only “concerned” with generating a coherent sequence of words. A lot of the time that will tend towards the answer you’re looking for, but not always
  • They are not a search-engine or a database of facts; they just generate text! But as mentioned before, there are approaches like RAG, in compound models, where a prompt can be enriched by an external search process, i.e. “relevant” bits of additional text are concatenated to your prompt ahead of generation
  • They are not “intelligent” in the sense of being intentional, or able to “reason” holistically, though efforts are being made to approximate “reasoning” better, and one can goad a kind of rudimentary reasoning pattern out of them with structured prompting techniques
  • Nonetheless, they are immensely powerful, and can often be very useful if you know what you want to do
  • Handle their output with skepticism, while having a clear idea in mind of what you want to achieve — i.e. “how do I direct this stream of words in a useful direction?”

Best use-cases

  • Mundane, low-stakes tasks — boilerplate content, code, &c.
  • Things you can check for correctness and monitor
  • First passes, first drafts, &c.
  • A complement to other Natural Language Processing (NLP) tasks — low-stakes ones you can check or improve over time, like text extraction, categorization, tagging, &c.
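
For "things you can check for correctness", here is a minimal sketch of what a check might look like when an LLM is used for categorization. It assumes the model was asked to reply with exactly one label from a fixed list; the labels and the `check_label` helper are hypothetical. Anything that does not match is flagged for a person to look at rather than trusted.

```python
# Hypothetical set of categories the model was asked to choose from.
ALLOWED_LABELS = {"billing", "water", "roads", "other"}

def check_label(raw_output, allowed=ALLOWED_LABELS):
    """Return (label, needs_review).

    The raw model output is tidied up (whitespace, case, trailing full stop)
    and accepted only if it is one of the allowed labels. Everything else is
    flagged for human review.
    """
    label = raw_output.strip().lower().rstrip(".")
    if label in allowed:
        return label, False
    return None, True

# Simulated model outputs (made up): one well-behaved, one not.
print(check_label(" Water.\n"))
print(check_label("I think this is about potholes"))
```

A check like this does not tell you whether the chosen label is right, only that the output has the expected shape, so it works best alongside spot-checks and monitoring over time.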

Conclusion

  • You can’t outsource your good sense
  • When trying to solve problems, break them down, think about what you are trying to achieve, the stakes, and the best tool for the job — sometimes an LLM is not the tool for the job!
  • GenAI and LLMs can be extremely useful tools and time-savers, but are fundamentally unreliable, so consider how much that matters for the task you have in mind
  • If you plan on doing something very ambitious, like automating some types of tasks with LLMs, I’d highly recommend you keep a human-in-the-loop, to check, approve, and monitor how the tool is doing
  • Cultivate the idea of “LLM-as-tool” — a weird unwieldy tool — instead of “LLM-as-expert”