LMQL: Programming Large Language Models

SRIlab @ ETH Zürich, Switzerland

LMQL is a query language for programming (large) language models.

<%SAMPLES_LIST%>
<%SAMPLES%>
Learn more about LMQL by hovering over underlined parts of the code.

Abstract

Large language models have demonstrated outstanding performance on a wide range of tasks such as question answering and code generation. At a high level, given an input, a language model can be used to automatically complete the sequence in a statistically likely way. Based on this, users prompt these models with language instructions or examples to implement a variety of downstream tasks. Advanced prompting methods can even involve interaction between the language model, a user, and external tools such as calculators. However, to obtain state-of-the-art performance or to adapt language models to specific tasks, complex, task- and model-specific programs have to be implemented, which may still require ad-hoc interaction.

Based on this, we present the novel idea of Language Model Programming (LMP). LMP generalizes language model prompting from pure text prompts to an intuitive combination of text prompting and scripting. Additionally, LMP allows constraints to be specified over the language model output. This enables easy adaptation to many tasks, while abstracting away language model internals and providing high-level semantics.
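As a sketch of this combination, a minimal LMQL query might look as follows (the model name and constraints are illustrative, not from the paper):

```
argmax
   "Q: What is the sentiment of the review '{review}'?\n"
   "A: [ANSWER]"
from
   "gpt2-medium"
where
   ANSWER in ["positive", "negative", "neutral"]
```

Here, plain text is the prompt, `[ANSWER]` is a hole to be filled by the model, and the `where` clause constrains the admissible completions.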

To enable LMP, we implement LMQL (short for Language Model Query Language), which leverages the constraints and control flow from an LMP prompt to generate an efficient inference procedure that minimizes the number of expensive calls to the underlying language model.

We show that LMQL can capture a wide range of state-of-the-art prompting methods in an intuitive way, especially facilitating interactive flows that are challenging to implement with existing high-level APIs. Our evaluation shows that we retain or increase accuracy on several downstream tasks, while also significantly reducing the required amount of computation or, in the case of pay-to-use APIs, cost.

Experimental Results

Compared to standard decoding using 🤗 Transformers' generate() function, LMQL allows for high-level control and requires fewer tokens to be processed.
Chain-of-Thought reasoning with LMQL vs. standard decoding.
Query statistics of using LMQL for interactive language model querying vs. standard decoding.
*We estimate cost savings based on the current token price of $0.02/1K tokens of the GPT-3 davinci model.

Visual Debugger

LMQL also includes a Playground IDE for query development. This enables users to inspect the interpreter state, validation results, and model state at any point during generation, e.g. to examine the different hypotheses explored during beam search.

Validation and Token Masks

To enable fast validation and constrained decoding, LMQL implements a novel partial-evaluation semantics. Given a set of high-level constraints, the language runtime automatically derives token-level prediction masks and validates the produced sequence eagerly, i.e. as soon as the provided validation condition is definitively violated, decoding stops or is redirected to a different branch. This framework can easily be extended by external tools such as parsers.
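To illustrate the idea (this is a simplified stand-in, not LMQL's actual implementation), the following Python sketch derives a per-step token mask from a high-level membership constraint and validates eagerly during greedy decoding; the vocabulary and constraint are toy examples:

```python
# Toy vocabulary; a real tokenizer would provide this.
VOCAB = ["positive", "negative", "neutral", "pos", "neg", "<eos>"]

def allowed_mask(prefix, allowed_values):
    """Partially evaluate the constraint `text in allowed_values`:
    mask in every token that keeps the decoded text a prefix of
    some allowed value."""
    mask = []
    for tok in VOCAB:
        if tok == "<eos>":
            # Only allow stopping once a full allowed value is produced.
            mask.append(prefix in allowed_values)
        else:
            cand = prefix + tok
            mask.append(any(v.startswith(cand) for v in allowed_values))
    return mask

def decode(scores, allowed_values):
    """Greedy decoding under the derived masks. Decoding stops as soon
    as no token can still satisfy the constraint."""
    text = ""
    for step_scores in scores:
        mask = allowed_mask(text, allowed_values)
        valid = [i for i, ok in enumerate(mask) if ok]
        if not valid:
            break  # constraint definitively violated: stop decoding
        best = max(valid, key=lambda i: step_scores[i])
        if VOCAB[best] == "<eos>":
            return text
        text += VOCAB[best]
    return text
```

Note that even if the raw model scores prefer a disallowed token (e.g. "neutral" when only "positive" and "negative" are allowed), the mask redirects decoding to a valid continuation without any retries against the model.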

Frontend/Backend Separation

LMQL provides a high-level frontend to interact with language models, making query code portable and model-agnostic. This is achieved by abstracting over model-specific implementation details like batching, decoding and tokenization.

The actual language model runs out-of-process or even remotely, allowing for easy development and quick prototyping.
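A minimal Python sketch of this separation (illustrative only, not LMQL's actual API): query code targets an abstract backend interface, while the concrete backend owns tokenization and decoding and could equally wrap an in-process model or a remote inference server.

```python
from abc import ABC, abstractmethod

class ModelBackend(ABC):
    """Model-specific details (tokenization, batching, decoding) live here."""
    @abstractmethod
    def complete(self, prompt: str, max_tokens: int) -> str: ...

class EchoBackend(ModelBackend):
    """Stand-in backend for local development; a real backend would call
    an actual model, possibly out-of-process or over the network."""
    def complete(self, prompt: str, max_tokens: int) -> str:
        return prompt.split()[-1][:max_tokens]

def run_query(backend: ModelBackend, question: str) -> str:
    # Query code is portable: it never touches model-specific details,
    # so swapping backends requires no changes here.
    return backend.complete(f"Q: {question}\nA:", max_tokens=16)
```

Swapping models then amounts to passing a different `ModelBackend` instance, leaving the query itself unchanged.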

BibTeX

@article{beurer2022prompting,
  title={Prompting Is Programming: A Query Language For Large Language Models},
  author={Beurer-Kellner, Luca and Fischer, Marc and Vechev, Martin},
  journal={arXiv preprint arXiv:2212.06094},
  year={2022}
}