A Primer on AI for Architects with Anthony Alford
Machine Learning Concepts
- Architects need to understand AI and machine learning concepts to have intelligent conversations with their co-workers.
- When most people talk about AI, they are referring to machine learning, specifically deep learning with neural networks.
Machine Learning Models
- Software developers can think of machine learning models as functions that take complex inputs, such as images or audio, and produce complex outputs, such as transcripts or summaries.
- Tensors are multi-dimensional arrays used in machine learning models.
- A common way to train machine learning models is supervised learning: the model is given inputs along with expected outputs, much as unit tests in software development pair inputs with expected results.
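The function-plus-unit-tests analogy can be sketched in a few lines of Python. This is a deliberately tiny illustration, not a real training system: the "model" is a single parameter w in y = w * x, and training nudges w to shrink the error on each (input, expected output) pair.

```python
# Minimal supervised-learning sketch: fit y = w * x to (input, expected-output)
# pairs by nudging w to reduce the error, the way training nudges model weights.

training_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # inputs with expected outputs

w = 0.0             # the model's single "parameter" (real models have billions)
learning_rate = 0.05

for epoch in range(200):
    for x, expected in training_data:
        predicted = w * x               # the "model" is just a function
        error = predicted - expected
        w -= learning_rate * error * x  # gradient step for squared error

print(round(w, 2))  # converges to 2.0, the slope hidden in the data
```

The loop is the whole idea in miniature: compare the model's output to the expected output, then adjust the parameters in the direction that reduces the difference.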
Language Models
- Language models, such as ChatGPT, are trained on vast amounts of text data to predict the probability of a word occurring next in a sequence.
- Large language models (LLMs) are characterized by having tens or hundreds of billions of parameters.
- Hugging Face is a platform similar to GitHub, hosting and providing access to LLMs, including smaller models that can run on personal laptops.
- LLMs utilize "tokens," which are units of text smaller than words, allowing them to generate novel words and phrases not found in a standard vocabulary.
- Tokenization is a process that breaks down text into smaller units, typically larger than a character but smaller than a word.
- Commercial LLM services, such as ChatGPT and OpenAI's API, meter usage and billing in tokens.
- The "T" in GPT stands for Transformer, a neural network architecture that utilizes an "attention" mechanism to process and generate text.
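A toy greedy tokenizer makes the subword idea concrete. The vocabulary below is hand-picked for the example (real tokenizers, such as byte-pair encoding, learn theirs from data), but the effect is the same: units bigger than a character and smaller than a word, so even unseen words still tokenize.

```python
# Toy subword tokenizer: longest match against a fixed vocabulary,
# falling back to single characters when nothing matches.

VOCAB = {"trans", "form", "er", "token", "ize", "un", "believ", "able"}

def tokenize(word):
    tokens, i = [], 0
    while i < len(word):
        # take the longest vocabulary entry that matches at position i
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # fall back to a single character
            i += 1
    return tokens

print(tokenize("transformer"))   # ['trans', 'form', 'er']
print(tokenize("unbelievable"))  # ['un', 'believ', 'able']
```

Note that "unbelievable" tokenizes cleanly even though the whole word is not in the vocabulary, which is how LLMs handle novel words and phrases.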
Large Language Model Access
- There are publicly available, commercial large language models (LLMs) such as GPT-4, ChatGPT, Claude, Google's Gemini, and offerings from AWS. These can be accessed through web-based APIs and integrated using SDKs.
- Commercial LLMs can be cost-effective in the short term thanks to pay-per-token pricing, but long-term costs grow with usage, and sending data to a third party raises privacy concerns.
- Open-source LLMs offer an alternative to commercial options, allowing for in-house implementation and greater control over data privacy.
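The pay-per-token trade-off is easy to see with back-of-the-envelope arithmetic. The per-token prices below are hypothetical placeholders, not any vendor's actual rates; the point is that cost scales linearly with request volume.

```python
# Back-of-the-envelope cost sketch for a pay-per-token commercial LLM.
# Prices are hypothetical placeholders, not real vendor rates.

PRICE_PER_1K_INPUT = 0.01   # hypothetical $ per 1,000 input tokens
PRICE_PER_1K_OUTPUT = 0.03  # hypothetical $ per 1,000 output tokens

def monthly_cost(requests, input_tokens, output_tokens):
    per_request = (input_tokens / 1000) * PRICE_PER_1K_INPUT \
                + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT
    return requests * per_request

# Modest at low volume, significant at scale:
print(f"${monthly_cost(1_000, 500, 200):.2f}")      # $11.00
print(f"${monthly_cost(1_000_000, 500, 200):.2f}")  # $11000.00
```

At low volume the commercial API is hard to beat; at the higher volume, hosting an open-source model in-house may start to look attractive on both cost and privacy grounds.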
Large Language Model Characteristics
- Large language models (LLMs) are non-deterministic: the same input can produce different outputs from one run to the next.
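The source of the non-determinism is that the model produces a probability distribution over possible next tokens, and the decoder samples from it. The sketch below uses made-up token scores; a "temperature" parameter rescales the distribution, with 0 meaning always pick the top token (deterministic) and higher values spreading probability across more tokens.

```python
import math
import random

def sample_next(logits, temperature, rng):
    """Pick a next token from raw scores; toy version of LLM decoding."""
    if temperature == 0:                     # greedy decoding: deterministic
        return max(logits, key=logits.get)
    scaled = {t: v / temperature for t, v in logits.items()}
    m = max(scaled.values())
    weights = {t: math.exp(v - m) for t, v in scaled.items()}  # softmax weights
    r = rng.random() * sum(weights.values())
    for token, weight in weights.items():    # roulette-wheel selection
        r -= weight
        if r <= 0:
            return token
    return token

logits = {"blue": 2.0, "grey": 1.0, "green": 0.5}  # made-up next-token scores
rng = random.Random(0)
print(sample_next(logits, 0, rng))  # blue, every time
print(sorted({sample_next(logits, 1.0, rng) for _ in range(200)}))  # a mix of tokens
```

This is why the same prompt can yield different answers, and why setting temperature to 0 makes output more repeatable (though commercial APIs still do not guarantee bit-for-bit determinism).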
Retrieval Augmented Generation (RAG)
- Retrieval augmented generation (RAG) is a technique that can improve the quality of LLM results by providing the model with relevant context from a knowledge base.
- RAG works by converting documents and user queries into vectors, finding the document vectors closest to the query vector, and supplying the matching documents to the LLM as context.
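The RAG pipeline can be sketched end to end in miniature. Here a bag-of-words count stands in for a real learned embedding model, and the documents and query are invented for illustration; the shape of the pipeline (embed, retrieve by similarity, prepend as context) is the real technique.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': word counts. Real RAG uses a learned embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

documents = [
    "The deployment pipeline runs nightly and pushes to staging first.",
    "Vacation requests must be filed two weeks in advance.",
]

query = "When does the deployment pipeline run?"
best = max(documents, key=lambda d: cosine(embed(query), embed(d)))

# The retrieved document becomes context in the prompt sent to the LLM.
prompt = f"Context: {best}\n\nQuestion: {query}\nAnswer using only the context."
print(prompt)
```

The LLM never needs to have seen your internal documents during training; retrieval puts the relevant text in front of it at query time.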
Transfer Learning and Fine-tuning
- Transfer learning is a machine learning technique used to pre-train a model for general purposes and then fine-tune it for specific tasks.
- Fine-tuning continues training from the pre-trained model's weights using a smaller dataset specific to the desired task, adjusting the model's responses without training from scratch.
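A one-parameter model y = w * x is enough to show the shape of transfer learning. All numbers here are invented for illustration: pre-training on a large general dataset gets the weight near a good value, and fine-tuning resumes from that weight on a small task-specific dataset rather than starting over.

```python
# Transfer-learning sketch: pre-train on a large general dataset, then
# fine-tune the SAME weight on a small task-specific dataset.

def train(w, data, learning_rate, epochs):
    for _ in range(epochs):
        for x, expected in data:
            w -= learning_rate * (w * x - expected) * x  # squared-error step
    return w

general_data = [(x / 10, 2.0 * x / 10) for x in range(1, 101)]  # large: y = 2x
task_data = [(1.0, 2.5), (2.0, 5.0)]                            # small: y = 2.5x

w = train(0.0, general_data, 0.01, 20)  # pre-training
print(round(w, 2))                      # 2.0
w = train(w, task_data, 0.01, 50)       # fine-tuning resumes from w
print(round(w, 2))                      # pulled toward 2.5 by the task data
```

Fine-tuning needed only two examples to shift the model, because it started from the pre-trained weight instead of from zero; that economy is the whole appeal of transfer learning.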
Vector Databases
- Vector databases, often used in semantic or neural search, employ nearest neighbor search algorithms to efficiently find vectors similar to a given input vector, enabling the retrieval of related content based on meaning.
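Conceptually, the core query of a vector database is nearest-neighbor search, shown below as a brute-force scan over an invented in-memory store. Production vector databases use approximate indexes (e.g. HNSW) to avoid scanning every vector, but they return the same kind of result: the ids whose stored vectors lie closest to the query vector.

```python
import heapq

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical store mapping document ids to (tiny) embedding vectors.
store = {
    "doc-cats":  (0.9, 0.1, 0.0),
    "doc-dogs":  (0.8, 0.2, 0.1),
    "doc-taxes": (0.0, 0.1, 0.9),
}

def nearest(query, k=2):
    """Brute-force k-nearest-neighbor search over the stored vectors."""
    return heapq.nsmallest(k, store, key=lambda doc_id: squared_distance(store[doc_id], query))

print(nearest((1.0, 0.0, 0.0)))  # ['doc-cats', 'doc-dogs']
```

Because the documents about cats and dogs sit close together in vector space, a query vector near either retrieves both, which is what "retrieval by meaning" amounts to.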
Large Language Model Applications
- LLMs can be understood as tools for natural language processing tasks, such as named entity recognition and part-of-speech tagging.
- LLMs are versatile: they can be fine-tuned for specific use cases, or a model can be chosen based on how closely it already fits the desired application, with trade-offs among cost, quality, and speed.
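One way an LLM becomes an NLP tool is simply by phrasing the task as a prompt. In the sketch below, `call_llm` is a hypothetical stand-in for whatever client you use; it is stubbed with a canned response so the example runs, and a real model call would go in its place.

```python
def build_ner_prompt(text):
    """Frame named entity recognition as a text prompt for an LLM."""
    return (
        "Extract the named entities (people, organizations, places) from the "
        "text below. Return one entity per line in the form TYPE: entity.\n\n"
        f"Text: {text}"
    )

def call_llm(prompt):
    # Stub standing in for a real LLM client call; the canned response
    # mimics what a model following the prompt's format would return.
    return "PERSON: Anthony Alford\nORG: InfoQ"

prompt = build_ner_prompt("Anthony Alford spoke on the InfoQ podcast.")
entities = dict(line.split(": ", 1) for line in call_llm(prompt).splitlines())
print(entities)  # {'PERSON': 'Anthony Alford', 'ORG': 'InfoQ'}
```

The same pattern, a task description plus the input text, covers summarization, classification, and the other NLP tasks mentioned above, which is what makes a single model so versatile.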