Glossary

| Term | Definition |
|------|------------|
| API | Application Programming Interface: a way for software to communicate with other software. In the context of AI, APIs allow you to send data to a model and receive results back without running the model yourself. |
| Batch processing | Running an AI model on many items (images, documents, records) automatically in sequence, rather than processing them one at a time manually. |
| Evaluation | The systematic process of measuring how well an AI model performs on a task, typically by comparing its outputs against ground truth. Evaluation helps you understand accuracy, failure modes, and whether a model is good enough for your use case. |
| Embedding | A numerical representation of text, images, or other data as a list of numbers (a vector). Items with similar meanings have similar embeddings, enabling search and clustering. |
| Fine-tuning | Adapting a pre-trained AI model to perform better on a specific task by training it on additional domain-specific data. For example, fine-tuning an OCR model on a particular typeface or handwriting style. |
| GLAM | Galleries, Libraries, Archives and Museums. A collective term for cultural heritage institutions that collect, preserve, and provide access to collections. |
| Ground truth | The known correct answers used to evaluate an AI model's performance. For example, if you have 50 index cards where a human has already recorded the correct metadata, those human-verified records are the ground truth you compare the model's outputs against. |
| Hugging Face | An open platform for sharing and running AI models, datasets, and applications. Used throughout this book as a source of models and datasets, and for running inference via its API. |
| Inference | The process of using a trained AI model to make predictions or generate outputs on new data. For example, running an OCR model on a page image to extract text. |
| JSON | JavaScript Object Notation: a structured data format using key-value pairs. Widely used for exchanging data between systems and for defining schemas that AI models should follow when generating structured output. |
| LLM | Large Language Model: an AI model trained on large amounts of text that can understand and generate natural language. Examples include GPT-4, Claude, and Llama. |
| OCR | Optical Character Recognition: the process of converting images of text (printed or handwritten) into machine-readable text. Modern OCR increasingly uses AI models rather than traditional rule-based approaches. |
| Prompt | The instructions or input text given to an AI model to guide its output. Prompt design (or "prompt engineering") involves crafting instructions that reliably produce the desired results. |
| Pydantic | A Python library for data validation using type annotations. In the context of AI, Pydantic models define the expected structure (schema) of outputs, ensuring the AI returns data in the correct format. |
| Quantization | A technique for reducing the size of AI models by using lower-precision numbers to represent model weights. This makes models faster and cheaper to run with minimal impact on quality. Common formats include Q4 (4-bit) and Q8 (8-bit). |
| Schema | A formal definition of the structure and types of data. In structured generation, a schema specifies the fields, types, and constraints that an AI model's output must conform to. |
| Structured generation | A technique for constraining AI model outputs to follow a specific format or schema, such as JSON with defined fields. This ensures reliable, machine-readable outputs rather than free-form text. |
| Token | The basic unit that language models process. A token is roughly a word or word fragment. Model speed is often measured in tokens per second, and cost in price per token. |
| VLM | Vision Language Model: an AI model that can process both images and text together. VLMs can describe images, answer questions about visual content, and extract structured information from documents. Examples include Qwen-VL and GPT-4V. |
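Several of the terms above (JSON, schema, structured generation) fit together in practice. The sketch below, using only Python's standard library, parses a JSON record and checks it against an expected set of fields; the record and field names are hypothetical examples, not from any real catalogue.

```python
import json

# A hypothetical catalogue record expressed as JSON key-value pairs.
record_json = '{"title": "Letter, 1912", "creator": "Unknown", "language": "en"}'

# Parsing turns the JSON text into a Python dictionary.
record = json.loads(record_json)

# A minimal "schema" check: the fields and types we expect a model to return.
# Libraries like Pydantic do this validation far more thoroughly.
expected_fields = {"title": str, "creator": str, "language": str}
valid = all(isinstance(record.get(name), ftype) for name, ftype in expected_fields.items())
print(valid)  # → True
```

Structured generation takes this one step further: instead of validating output after the fact, the schema constrains what the model is allowed to produce in the first place.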
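The claim in the embedding entry, that similar meanings have similar embeddings, is usually measured with cosine similarity. A toy sketch with made-up three-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": two related words and one unrelated one.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Related items score higher than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))  # → True
```

Semantic search over a collection works the same way at scale: embed the query, embed every item, and rank items by similarity to the query.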
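The size savings from quantization can be estimated with back-of-envelope arithmetic: each weight stored in 16 bits takes 2 bytes, while a 4-bit (Q4) weight takes half a byte. This ignores the small overhead real quantization formats add for scaling factors.

```python
params = 7_000_000_000          # a 7-billion-parameter model

bytes_fp16 = params * 2         # 16 bits = 2 bytes per weight
bytes_q4 = params * 0.5         # 4 bits = 0.5 bytes per weight

# Approximate model sizes in gigabytes.
print(bytes_fp16 / 1e9, bytes_q4 / 1e9)  # → 14.0 3.5
```

A 4x reduction like this is often the difference between a model that needs a data-centre GPU and one that runs on a laptop.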