What is AI doing...and why does it work (for lawyers) - Part 1

Introduction

If you have ever sat in a firm presentation telling you to use Artificial Intelligence (AI) tools in your legal work - without anyone explaining what AI is actually doing, or why it should work for you - you are not alone. During my time in the AI and Innovation teams of two major law firms, running sessions for lawyers on exactly this topic, I have noticed a consistent pattern. At the start of every session, there is a sense of mystery and confusion about what AI is, with reactions ranging from extreme scepticism ("I cannot trust anything it says") to extreme optimism about its capabilities mixed with existential dread ("Will AI automate my job as a lawyer?").

This lack of understanding about what AI is and does has real consequences - if you underestimate what it can do, you may not identify the best use cases in legal practice, which you are best placed to do as the subject matter expert. On the other hand, if you overestimate what it can do, you come away disillusioned when you get a poor outcome, and worse, may fail to use AI responsibly in line with your professional obligations. It can also be challenging to go down the self-learning route. Go too deep, and you are lost in mathematics and model architecture. Go too shallow, and you are left with prompting tips - "just use these keywords in your prompt" - without a framework to understand what is actually happening.

What I found, running those sessions, is that there is a level of abstraction in between that resonates particularly well with lawyers: accurate enough to be genuinely useful, grounded in logic rather than mathematics, and illustrated with examples drawn from legal practice rather than computer science. The lawyer mindset, it turns out, is very well suited to understanding the logical way in which AI works, once you strip away the jargon. This post (like others in this series) is an attempt to arrive at that sweet spot of explaining AI, primarily for a legal audience. By the end, you will understand how AI actually works, the ways in which it differs from the "human" way of learning or reasoning, and why, despite its limitations, it can work for you as a lawyer.

The three kinds of AI

A common source of confusion is that AI, Machine Learning (ML), Large Language Models (LLMs), and Generative AI (GenAI) are often used interchangeably - they are related concepts, but they do not mean the same thing. Our natural lawyer's instinct here is to reach for the Definitions Clause, and such definitions do exist - see Article 3 of the EU AI Act. However, because they try to be comprehensive enough for a fast-moving field like AI, these definitions can get quite verbose, and they are not particularly illuminating for our purposes.

A good starting point is how the International Organization for Standardization (ISO) defines AI - a computer system's ability to perform tasks that would typically require human intelligence. This is a broad definition, and includes systems that execute a certain action when a pre-condition is met (i.e. if "X", then "Y"), also called Rules-based AI. In that sense, the first system that fits this definition may be older than you think - the Logic Theorist, built in 1955-1956, used rules of logic to prove mathematical theorems. No learning. No language models. Just rules. With that starting point, concepts like ML and LLMs become much easier to map - not as competing terms, but as concentric circles, each sitting inside the one before it.

Rules-based AI

Have you ever typed "omw" to a friend, while rushing out to meet them, only for it to change automatically to "On my way!"? If yes, you have used Rules-based AI. Behind the scenes, the logic is exactly as straightforward as it appears - if someone types a shortcut, replace it with the full phrase. If "X" happens, then do "Y".
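
To make the if "X", then "Y" idea concrete, here is a minimal sketch of a substitution rule in Python. The shortcuts and phrases are my own made-up examples, not how any particular phone actually implements it:

```python
# A minimal sketch of Rules-based AI: every behaviour is an explicit,
# human-written rule. The shortcuts and phrases below are made up.
SUBSTITUTION_RULES = {
    "omw": "On my way!",
    "brb": "Be right back.",
}

def apply_rules(message: str) -> str:
    # If a word matches a rule ("X"), substitute the phrase ("Y");
    # anything not covered by a rule passes through unchanged.
    return " ".join(SUBSTITUTION_RULES.get(word, word) for word in message.split())

print(apply_rules("omw see you at court"))  # -> "On my way! see you at court"
```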

Some might argue that this seems so simple, it is unfair to actually call it AI. Yet this same principle - if X, then Y - underpins some of the most widely-used legal technology (LegalTech) solutions, even today.

  • Most contract automation solutions, like Contract Express, Avvoka, Clarilis and Document Drafter, use conditional logic at their core, which allows users to say things like: if the governing law is New York, include this clause; if not, omit it (see the sketch after this list).

  • Contract Companion is a proofreading tool that applies a consistent set of human-defined rules (check for defined terms, cross-reference accuracy, numbering consistency) across every document it reviews.

  • Office & Dragons makes mass edits across entire suites of documents (such as changing party names, financial figures, or entire clauses), and creates redline mark-ups of those suites simultaneously, using rules-driven operations.

  • Neota Logic, Bryter and Josef allow lawyers to build client-facing applications - intake forms, decision trees, and triage workflows - entirely from rules-based logic.
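
As flagged above, here is a minimal sketch of that conditional logic. The clause wording and the variable name are invented for illustration - each platform expresses the same idea in its own template language:

```python
# A simplified sketch of conditional document assembly: include or omit
# a clause depending on an answer the user gives. The clause text and
# the governing_law variable are hypothetical, not from any real template.
def assemble_clauses(governing_law: str) -> str:
    clauses = ["1. The parties agree to the terms set out below."]
    if governing_law == "New York":
        clauses.append("2. This Agreement is governed by the laws of the State of New York.")
    else:
        clauses.append("2. This Agreement is governed by the laws of England and Wales.")
    return "\n".join(clauses)

print(assemble_clauses("New York"))
```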

The great thing about Rules-based AI is that you have complete control over the output - if you have specified a rule, it will be executed that same way every time. But what happens if you encounter a situation that is not covered by your rules? Rules-based AI fails in such edge cases. More importantly, if you need to make a change to keep the rules up to date, you need to understand how the rules are set up - which is like learning a new language. Look at the screenshot below of a Contract Express template catering for a situation where you have a new versus an existing customer - changing one condition requires understanding the entire markup language.

This is not to say that Rules-based AI is not useful - the tools above are proof of that, and there are use cases where it is exactly what you want. But it leaves a significant gap, rooted in a fundamental limitation of how Rules-based AI works - rules only work when you can anticipate every situation in advance and write explicit instructions for each. The real world is full of situations that nobody thought to write rules for.

There is also an important distinction worth highlighting here - Rules-based AI is deterministic: the same input will always produce the same output, every single time. Every tool in the inner circles - ML-based and LLM-based AI - is probabilistic, meaning the same input can produce different outputs at different times. This has practical implications for how much you can rely on them, and in what contexts. However, this is not a bug but a feature of how these systems work - one that makes it possible to solve problems too complex to reduce to a set of rules.
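
A toy illustration of that difference, with made-up words and probabilities rather than anything taken from a real tool:

```python
import random

# Deterministic: the same input always produces the same output.
def expand_shortcut(shortcut: str) -> str:
    return {"omw": "On my way!"}.get(shortcut, shortcut)

# Probabilistic: the output is sampled from a distribution, so the same
# input can produce different outputs on different runs. The candidate
# words and their likelihoods below are invented for illustration.
def predict_next_word(prompt: str) -> str:
    candidates = ["harmless", "liable", "responsible"]
    weights = [0.90, 0.07, 0.03]
    return random.choices(candidates, weights=weights)[0]

print(expand_shortcut("omw"))                      # always "On my way!"
print(predict_next_word("...indemnify and hold"))  # usually "harmless", but not always
```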

Machine Learning ("ML") based AI

That is where ML-based AI comes in, and things get more interesting - instead of spelling out the rules the AI must follow, you show it solved examples and the AI "learns" how to solve the problem. You might have heard of the infamous Zoom Cat Lawyer, where a US attorney showed up in a hearing with the cat filter on. Let us imagine we want to avoid such embarrassments by creating a Zoom Cat filter, which recognises pictures of cats (and prevents them from showing up in court!).

The Rules-based AI approach - defining every rule for what makes a cat a cat - runs into trouble. Not because the rules are wrong, but because the real world is messier than any rulebook can handle. You need to define what a cat is across all breeds, with rules about ear shape, size, tail shape and so on. However, doing this exhaustively for all varieties of cats can get... well, exhausting. And you will certainly hit edge cases that the rules do not capture (on a lighter note, see this brilliant collection of humans who look like cats, who could throw off your Rules-based system!).

The ML-based AI approach is to show the AI 1,000 labelled pictures of cats and let it "learn" the distinguishing patterns. I keep putting "learn" in quotation marks because it is not how we humans learn a concept. Without going into too much maths, there is a three-step process involved in this "learning" (sketched in code after the list below):

  1. The AI predicts whether a certain picture is a cat, taking into account patterns in the pixel data (features) and giving different levels of importance to each (weights - the model's adjustable parameters).

  2. It receives feedback based on whether it got the answer correct - if it was incorrect, it adjusts the weights to produce a better prediction, and repeats until it gets the right answer.

  3. With enough examples, it builds up a sophisticated concept of a cat, also known as a model, that can accurately classify pictures of cats not seen before.
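
Here is the promised sketch of those three steps. The "pictures" are reduced to two made-up feature scores so the numbers stay readable - a real model works on raw pixel data and has millions of weights - but the predict-feedback-adjust loop is the same idea:

```python
# A toy version of the three steps: predict, get feedback, adjust the weights.
# Each "picture" is just two invented feature scores (whisker-likeness,
# ear-pointiness) with a label: 1 for "cat", 0 for "not cat".
examples = [
    ([0.9, 0.8], 1),
    ([0.8, 0.9], 1),
    ([0.1, 0.2], 0),
    ([0.2, 0.1], 0),
]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.1

for _ in range(20):                                   # show the examples repeatedly
    for features, label in examples:
        score = sum(w * x for w, x in zip(weights, features)) + bias
        prediction = 1 if score > 0 else 0            # step 1: predict
        error = label - prediction                    # step 2: feedback
        if error != 0:                                # step 3: refine the weights
            weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
            bias += learning_rate * error

print(weights, bias)   # the learned "model" is just numbers, not a concept of a cat
```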

ML-based AI is more prevalent around us than we might think. A common example is the spam filter in your email, which has been trained to recognise spam emails, "learns" every time you flag an email as spam, and starts categorising similar emails as spam in future.

In LegalTech, ML-based AI powers a large number of tools in use:

  • Document review tools like Kira, Luminance and Eigen Technologies can identify and extract clauses (like change of control, indemnity clauses etc.) from contracts, after being shown sufficient examples of such clauses.

  • With eDiscovery tools like Relativity, Everlaw and Nuix, the more documents lawyers tag as relevant or not relevant, the better the model gets at predicting relevance across the remaining unreviewed documents.

  • Analytics tools like Lex Machina, Predicta, Premonition and BlueJ learn patterns from millions of historical cases to make predictions about how particular judges rule, how long cases take, and how a given case is likely to be decided.

ML-based AI opens up more possibilities for legal work but suffers from two key limitations - first, training a model by showing it solved examples is a time-consuming task, which must be done carefully to avoid statistical errors like overfitting and underfitting. Overfitting happens when the model learns the examples too literally and fails to generalise; underfitting happens when it has not seen enough examples to learn the pattern reliably. Both mean that the model performs poorly on real-world data. Second, the training you do for a certain task cannot be transferred to other tasks. For instance, if you train a model to recognise "change of control" clauses, you cannot use it to recognise "indemnity" clauses without repeating the training from scratch. For lawyers, that often means the time and effort spent on training may not be worth the benefit, depending on the use case. ML-based AI is therefore powerful within the boundaries of what it was trained on, but brittle outside them. The next circle addresses this limitation - but that increase in capability comes with trade-offs, as we will see.

LLM-based AI

Large language models or LLMs are the engines behind chatbots like ChatGPT, Claude, Gemini and DeepSeek. They use the same principle as ML-based AI - learning patterns from examples - but apply it to vast amounts of text drawn from across the internet. And instead of learning to classify or predict, they learn to generate text - hence the name Generative AI (GenAI). To understand how LLM-based AI works, we will walk through how OpenAI created ChatGPT, and explain some unfamiliar terms - tokens and embeddings - along the way.

Tokens

When OpenAI set out to map a large part of the internet, it needed to break language down into manageable units, called tokens. Instead of treating each whole word as a token, it can be more efficient to treat common parts of words (like "ify" in indemnify or ratify) as tokens, and reuse them as needed.

Therefore, an LLM does not read words the way you and I do - it reads them as tokens, which are roughly equivalent to words or parts of words. Tokens are the currency of LLMs - they determine how much text the model can process at once, how much a query costs, and why LLMs sometimes cut off mid-answer.
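
If you want to see this for yourself, OpenAI publishes a small open-source library called tiktoken that applies the same tokenisation its models use. Here is a minimal sketch - the exact split and token count depend on which encoding you pick:

```python
import tiktoken  # pip install tiktoken

# Split a sentence into the tokens an OpenAI model would actually see.
enc = tiktoken.get_encoding("cl100k_base")
token_ids = enc.encode("The parties agree to indemnify and hold harmless")

print(len(token_ids))                        # how many tokens the sentence "costs"
print([enc.decode([t]) for t in token_ids])  # the word fragments themselves
```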

Embeddings

Once text is broken into tokens, how does the model know that "indemnify" and "hold harmless" are related concepts that should follow each other in a clause? The answer is context. As the model processes vast amounts of text from the internet (including contractual clauses), words that consistently appear near each other, or in similar sentences, get treated as related.

This is also where LLM-based AI differs from ML-based AI. The latter, when trained to recognise an indemnity clause, does one thing and nothing else - it has seen enough labelled examples of indemnity clauses to recognise that pattern. LLM-based AI tries to do something far more ambitious - given part of a sentence, it tries to predict the next word. But to do that well across the breadth of human language, it needs to build up an extraordinarily rich map of how words, ideas and concepts relate to each other - across billions of examples.

The technical term for this map is embeddings: mathematical representations that place each word or word-fragment as a point in a vast, multi-dimensional space, where related terms end up close together and unrelated ones far apart. You can see below a visualisation of embeddings, showing the words related to "contract". The AI model has learned these relationships not because anyone told it that "contract", "rent" and "lawsuit" are related, but because those words consistently appear in similar contexts across millions of documents. Once the model has mapped all these relationships, it can look at a sequence of words and predict what comes next. Not because it knows what is correct, but because it has learned what is statistically likely given everything it has seen.
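
To give a feel for what "close together" means, here is a toy sketch. The three-number "embeddings" below are invented purely for illustration - real models use hundreds or thousands of dimensions - but the distance measure (cosine similarity) is the standard one:

```python
import math

# Hypothetical, hand-written "embeddings" with made-up numbers.
embeddings = {
    "contract":  [0.90, 0.80, 0.10],
    "lawsuit":   [0.85, 0.70, 0.20],
    "croissant": [0.05, 0.10, 0.95],
}

def similarity(a, b):
    # Cosine similarity: close to 1.0 means related, close to 0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

print(similarity(embeddings["contract"], embeddings["lawsuit"]))    # high - related
print(similarity(embeddings["contract"], embeddings["croissant"]))  # low - unrelated
```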

GenAI is nothing but "fancy auto-complete"

At this point, you might be thinking: this sounds like the autocomplete on my phone, just bigger. And at a mechanical level, you are right. The model is predicting the next token based on what came before. That is, at its core, what autocomplete does.

Give the model the phrase “The parties agree to indemnify and hold” – and it will predict that the next token is very likely to be “harmless.” It has seen this pattern thousands of times in its training data. Then it predicts the next token after that, and the next, and the next. Sentence by sentence, paragraph by paragraph, it builds an entire response this way: one token at a time.
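
Here is that mechanic as a toy loop. A real LLM scores tens of thousands of possible tokens with a neural network at every step; in this sketch the "model" is just a hand-written lookup table with invented continuations, which is enough to show the one-token-at-a-time idea:

```python
# A toy next-token generator: repeatedly look at the recent context and
# append the likeliest continuation. The table and probabilities are made up.
next_token_table = {
    "indemnify and": [("hold", 0.9), ("defend", 0.1)],
    "and hold":      [("harmless", 0.95), ("liable", 0.05)],
}

def generate(prompt: str, steps: int = 2) -> str:
    words = prompt.split()
    for _ in range(steps):
        context = " ".join(words[-2:])                    # look at the last two words
        candidates = next_token_table.get(context)
        if not candidates:
            break
        best = max(candidates, key=lambda pair: pair[1])  # pick the likeliest next token
        words.append(best[0])
    return " ".join(words)

print(generate("The parties agree to indemnify and"))
# -> "The parties agree to indemnify and hold harmless"
```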

This is worth sitting with for a moment - because it explains something that confuses many lawyers about GenAI. Even though LLM-powered chatbots might seem like they are intelligently understanding and responding like a human, all they are doing is next-token prediction - but doing it so well that it sounds like human speech. It is helpful to consider a thought experiment from the 1980s called the "Chinese Room" - imagine a person locked in a room who receives slips of paper with Chinese characters through a slot. They have a giant rulebook that tells them: when you see this sequence of symbols, write that sequence in response. They follow the rules, pass responses back out through the slot, and from the outside, a native Chinese speaker is convinced they are having a fluent conversation. But the person inside understands not a single word of Chinese. They are just manipulating symbols according to rules.

LLM-based AI is similar. Therefore, when it hallucinates and produces a wrong answer, it is not lying, or confused. It is doing exactly what it does: pattern-matching symbols, with no awareness that the output is factually wrong.

So why does it work (for lawyers)?

At this point, you might feel underwhelmed by the thought that GenAI is basically like the autocomplete on your phone. But the difference between the two is one of scale, similar to how a pocket calculator and a supercomputer are both doing a form of calculation. Your phone predicts the next word based on simple patterns and your personal usage history. An LLM predicts the next token based on statistical patterns learned from a vast corpus of human knowledge, across billions of parameters, with the ability to maintain coherence across thousands of words. The principle is the same. The scale transforms what is possible.

Whether what LLMs are doing can be considered "intelligence" is an interesting (and contested) question, but one we do not need to resolve to make use of them. Consider a practical example: if you want to extract rental amounts across a large dataset of leases, the LLM does not need to understand that rent is a monetary payment from the tenant to the landlord in exchange for being allowed to stay in a flat. It just needs to have learnt, from statistical patterns across millions of documents, which part of a lease contains the rental amount, and to return that value to you. That is still an effective way of extracting rental amounts from leases.
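
For the curious, here is a minimal sketch of what such an extraction might look like using OpenAI's Python library. The model name, prompt wording and lease snippet are illustrative assumptions rather than a recommendation, and the sketch assumes you have an API key configured:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()

def extract_rent(lease_text: str) -> str:
    # Ask the model to pull out one specific value from the lease text.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": "You extract facts from leases. Answer with the value only."},
            {"role": "user", "content": f"What is the annual rent in this lease?\n\n{lease_text}"},
        ],
    )
    return response.choices[0].message.content

print(extract_rent("...The Tenant shall pay the Landlord rent of £24,000 per annum..."))
```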

At sufficient scale, these models start doing things that surprised even their creators. Nobody programmed GPT-4 to pass a bar exam. Nobody wrote code that says “if asked a legal hypothetical, reason through it step by step.” It is why it is sometimes said that LLMs are grown, not made. These capabilities appeared as a byproduct of training at enormous scale. The mechanism is simple: predict the next token. The outcome is not.

LLM-based AI is extremely powerful, and several LegalTech solutions have emerged leveraging it:

  • Legal AI tools like Harvey, Legora, CoCounsel and Vincent use LLMs to extract and synthesise information from unstructured data, whether in contracts, emails, legal submissions, decisions or regulations.

  • Legal research tools like Lexis+ AI and Westlaw AI build a GenAI layer on top of their authoritative legal databases, enabling search using plain-English queries, with results grounded in those databases.

  • Relativity aiR uses LLM-based AI to simulate and accelerate the actions of a human reviewer, finding and describing relevant documents according to plain-English prompts you provide.

  • Wexler uses LLM-based AI to extract, analyse, and verify key factual information across massive case datasets, transforming raw documents into detailed chronologies.

  • The Claude Cowork Legal Plugin is a set of high-quality system prompts and workflow maps layered on top of Claude's language model, providing a structured pathway for processing legal requests with consistent output format.

LegalQuants is a network of tech-savvy lawyers who build and evaluate LegalTech tools. Many of the tools and prototypes (created as part of LegalQuants hackathons) use a new form of programming called vibe-coding. This lets you write computer code using instructions in plain English, which is again made possible by LLM-based AI. Moreover, many of these tools themselves use LLM-based AI in how they work (you can find out more about the hackathons and the tools here).

All this represents a genuine step change in what AI can do for legal practice. But that increase in capability comes with a different risk profile - one that flows directly from the probabilistic nature of these systems. The same quality that makes LLMs flexible and powerful also makes them capable of confidently producing answers that are entirely wrong. Understanding that is as important as understanding the capability - that is where we go next.

What you now know – and what comes next...

Let us take stock of where we are. You now understand that AI is not one thing – it is three concentric circles, each building on the last. The outermost circle, artificial intelligence, includes the rules-based tools you have been using for years. Machine learning, the middle circle, is where systems learn from data rather than following hand-written rules. And GenAI, at the centre, is where systems learn to create new content rather than just classify or predict.

One thing worth clarifying - these three circles are not a timeline for the evolution of LegalTech tools. Rules-based AI did not become obsolete with the advent of ML-based AI, which in turn was not made redundant by LLM-based AI. All three kinds of AI are actively used in legal practice today - often alongside each other in the same platform. Tools like Kira and Luminance, which started with ML-based AI extraction, have incorporated GenAI features alongside their trained models. The question is not which type of AI is better - it is understanding which type of AI is doing what in the tools you are using, and what that means for how much you trust the output.

That question becomes especially relevant for LegalTech tools using GenAI, once you understand that at its core, it is a form of autocomplete – but operating at a scale that produces capabilities nobody explicitly programmed. There is more to explore, especially on how AI can be made more suitable and capable for legal work - how can these models be grounded in authoritative legal context? How good are these models at legal reasoning? What techniques can help mitigate the impact of hallucinations? These are subjects that will be covered by future posts.

However, for now, whether what AI is doing constitutes "intelligence" is a question for philosophers. Whether it works for lawyers is a question for us lawyers - and if you have used any of the tools mentioned in this post, you already know the answer: with important caveats, it is an emphatic yes. The secret sauce is pattern-matching at scale. The practical impact is real, and it is already here. Those caveats are what Part 2 is about.

A Note for Readers

If you have made it this far, thank you for sticking with me.

This post was heavily inspired by Stephen Wolfram's What Is ChatGPT Doing... and Why Does It Work? - a remarkably lucid and simple piece that opened my eyes to understanding something I was using every day, and sparked a curiosity to learn more. It reminded me of something my mother used to tell me, paraphrasing Feynman: if you cannot explain something simply, you do not understand it well enough. I wanted to apply that standard here, finding a level of abstraction that is accurate without being alienating, especially for my lawyer friends, who may not be willing to go quite as deep into AI as I have.

I welcome your feedback - particularly from anyone who feels that in the process of simplifying, I have oversimplified, or misrepresented what is technically happening. I would rather be corrected than be usefully wrong.

If this has sparked your curiosity and you want to go further, here are the resources I would recommend:

And finally - for those who share my love for etymology and wordplay - the origin story of the word "spam" is more entertaining than you might expect.