Glossary
This resource is designed to help you understand the world of AI, chatbots, and large language models. It provides clear explanations of key terms and concepts used in these fields.
1. Artificial Intelligence
Artificial Intelligence (AI) refers to the simulation of human intelligence processes by machines, especially computer systems. These processes include learning (the acquisition of information and rules for using the information), reasoning (using rules to reach approximate or definite conclusions), and self-correction. AI is used in various applications, including expert systems, natural language processing, speech recognition, and machine vision.
2. Large Language Models (LLM)
LLMs are advanced AI systems designed to understand and generate human-like text based on vast amounts of data. Blockbrain is a model-agnostic AI tool, ensuring we always offer the best AI models available within a GDPR-compliant environment in the EU, irrespective of the model provider. You can select from the leading models by OpenAI, Anthropic, Meta, Mistral, etc.
Training a Large Language Model (LLM) involves using large datasets to teach the model the probabilities of words occurring together. During this phase, the model learns to predict the next word in a sequence by understanding semantic connections between words. Once training is complete, the model does not learn new information on its own; the point up to which its training data extends is known as the "knowledge cutoff date".
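To make the idea of learning word co-occurrence probabilities concrete, here is a toy next-word predictor based on bigram counts. It is only an illustration of the principle; real LLMs use neural networks over subword tokens, and the small corpus below is invented for the example.

```python
from collections import Counter, defaultdict

# Toy illustration of next-word prediction using bigram counts.
# Real LLMs learn far richer statistics with neural networks; this sketch
# only shows the underlying idea of learning which words tend to follow which.

corpus = "the weather is sunny . the weather is cold . the forecast is sunny".split()

# "Training": count which word follows which.
follows: dict[str, Counter] = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most frequent next word seen during training."""
    return follows[word].most_common(1)[0][0]

print(predict_next("weather"))  # 'is'
print(predict_next("is"))       # 'sunny' (seen twice, vs. 'cold' once)
```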
3. Chatbot
A chatbot is a program that uses artificial intelligence (AI) to conduct human-like conversations. It can answer questions, perform tasks, and provide information. These chatbots leverage natural language processing (NLP) to understand and respond to user inputs in a coherent and contextually appropriate manner. They are commonly used in customer service, virtual assistants, and various online platforms to enhance user interaction and automate routine tasks.
4. Natural Language Processing (NLP)
NLP is the technology that enables chatbots to understand and process human language. It helps the chatbot recognize the meaning behind your words, allowing it to respond accurately and contextually. NLP involves various techniques such as tokenization, sentiment analysis, and entity recognition, which collectively enhance the chatbot's ability to interpret and generate human-like responses.
5. Machine Learning
Machine Learning is a method by which computers learn from data and improve their performance without being explicitly programmed. Chatbots utilize this technique to enhance their responses based on previous interactions. This involves algorithms that identify patterns and make predictions, allowing chatbots to become more accurate and efficient over time. By continuously analyzing user inputs and feedback, machine learning enables chatbots to adapt and provide more relevant and personalized responses.
6. Intent and Entity
Intent: The purpose or goal behind a user input. Intents represent what the user wants to achieve or inquire about. For example, in the query "What is the weather?", the intent is to obtain a weather forecast. Identifying the intent helps the chatbot understand the user's request and provide a relevant response.
Entity: Specific pieces of information extracted from a user input that provide context and details necessary to fulfill the intent. Entities are often nouns or proper nouns that give additional information about the user's request. For instance, in the query "What is the weather in Berlin?", "Berlin" is the entity. Recognizing entities allows the chatbot to tailor its response more precisely to the user's needs.
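The following minimal, rule-based sketch shows how an intent and an entity might be pulled out of a user message. The intent patterns and the city list are invented for illustration; production chatbots rely on trained NLP models rather than hand-written rules.

```python
import re

# Rule-based sketch of intent and entity detection (illustrative only).

INTENT_PATTERNS = {
    "get_weather": re.compile(r"\bweather\b", re.IGNORECASE),
    "get_time": re.compile(r"\btime\b", re.IGNORECASE),
}

KNOWN_CITIES = {"Berlin", "Munich", "Hamburg"}  # toy entity list

def parse(user_input: str) -> dict:
    """Return the detected intent and any city entities found in the input."""
    intent = next(
        (name for name, pattern in INTENT_PATTERNS.items() if pattern.search(user_input)),
        "unknown",
    )
    entities = [city for city in KNOWN_CITIES if city in user_input]
    return {"intent": intent, "entities": entities}

print(parse("What is the weather in Berlin?"))
# {'intent': 'get_weather', 'entities': ['Berlin']}
```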
7. Hallucination
In the context of Large Language Models (LLMs), hallucination refers to the generation of information or responses that are not based on the input data or reality. This occurs when the model predicts text that seems plausible but is not factually accurate or relevant to the given context. Hallucinations can lead to misleading or incorrect outputs, highlighting the importance of verifying AI-generated content for accuracy and reliability.
8. Tokenization
Definition: Tokenization is the process of breaking down a string of text into smaller units called tokens. These tokens can be words, phrases, or even characters, depending on the level of granularity required.
Purpose in Chatbots:
Text Processing: Tokenization is a fundamental step in natural language processing (NLP) that allows chatbots to understand and analyze user inputs.
Context Understanding: By breaking down sentences into tokens, chatbots can better understand the context and meaning of each word or phrase.
Feature Extraction: Tokens serve as features that chatbots use to identify intents and entities within a conversation.
Example:
Input: "What's the weather in Berlin?"
Tokens: ["What", "'s", "the", "weather", "in", "Berlin", "?"]
Benefits:
Improved Accuracy: Helps in accurately interpreting user queries by focusing on individual components.
Enhanced Performance: Facilitates more efficient processing and response generation.
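The word-level split shown in the example above can be approximated with a short regular expression, as in the sketch below. This is only a conceptual illustration: modern LLMs use subword tokenizers such as byte-pair encoding, so their actual tokens differ.

```python
import re

# Word-level tokenization sketch. Production tokenizers are subword-based;
# this only reproduces the conceptual example from the glossary entry.

def tokenize(text: str) -> list[str]:
    """Split text into words, contraction suffixes, and punctuation marks."""
    return re.findall(r"\w+|'\w+|[^\w\s]", text)

print(tokenize("What's the weather in Berlin?"))
# ['What', "'s", 'the', 'weather', 'in', 'Berlin', '?']
```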
9. Context Window
The context window refers to the maximum amount of text that a language model can process in a single request. It includes the user's prompt, previous chat history, attached documents, and any assistant instructions. This window determines how much information the model can consider when generating a response, and therefore how much of the given context can inform a relevant and coherent answer.
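As a rough illustration of how a context window constrains a conversation, the sketch below trims older chat history until the request fits an assumed token budget. The window size, the reserved answer budget, and the word-count approximation of tokens are all illustrative assumptions, not values used by any specific model.

```python
# Simplified sketch of fitting a conversation into a context window.

CONTEXT_WINDOW = 8000        # assumed maximum tokens the model can process
RESERVED_FOR_ANSWER = 1000   # tokens kept free for the model's response

def count_tokens(text: str) -> int:
    return len(text.split())  # crude approximation; real systems use the model's tokenizer

def fit_history(system_prompt: str, history: list[str], new_message: str) -> list[str]:
    """Drop the oldest messages until everything fits into the window."""
    budget = CONTEXT_WINDOW - RESERVED_FOR_ANSWER
    budget -= count_tokens(system_prompt) + count_tokens(new_message)
    kept: list[str] = []
    for message in reversed(history):  # keep the most recent messages first
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.insert(0, message)
        budget -= cost
    return [system_prompt, *kept, new_message]
```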
10. Sentiment Analysis
Sentiment Analysis is a technique used in natural language processing (NLP) to determine the emotional tone behind a body of text. It involves analyzing text data to identify and categorize opinions expressed as positive, negative, or neutral. This process helps in understanding the sentiment of the user, which can be crucial for applications like customer feedback analysis, social media monitoring, and enhancing user interactions in chatbots.
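A minimal lexicon-based sketch of sentiment analysis is shown below. The word lists are invented for illustration; practical systems typically use trained classifiers or established NLP libraries.

```python
# Lexicon-based sentiment sketch (illustrative word lists only).

POSITIVE = {"great", "good", "love", "excellent", "helpful"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}

def sentiment(text: str) -> str:
    """Classify text as positive, negative, or neutral by counting cue words."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("The support was great and very helpful"))  # positive
print(sentiment("The response time was terrible"))          # negative
```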
11. Prompting
Prompting refers to the method of providing input or instructions to a language model to elicit a desired response. It involves crafting specific queries or statements that guide the model in generating relevant and accurate outputs. Effective prompting is crucial for maximizing the utility of AI models, as it helps in obtaining precise and contextually appropriate responses.
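As an illustration, the sketch below builds a prompt from a reusable template that combines role instructions, supplied context, and the user's question. The template wording is an example, not a prescribed format.

```python
# Prompt-template sketch: structure instructions, context, and question
# into a single prompt string (the wording here is only an example).

PROMPT_TEMPLATE = """You are a support assistant for an internal knowledge base.
Answer using only the context below. If the answer is not in the context, say so.

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(context: str, question: str) -> str:
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_prompt(
    context="Chunk size controls how many characters the AI reads per chunk.",
    question="What does chunk size control?",
)
print(prompt)
```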
12. AI Agents
An AI Agent is a software entity designed to autonomously perform tasks, often repetitive, by following predefined rules or learned patterns. These agents can process information, make decisions, and execute actions, making them ideal for automating routine tasks and enhancing efficiency in various applications, such as virtual assistants and automated customer service.
13. Workflows
A workflow is a series of tasks organized to achieve a specific goal, often involving automated systems or agents performing tasks in sequence. Each agent's output becomes the input for the next, ensuring efficiency and consistency. This setup automates repetitive tasks, manages dependencies, and allows for scalable operations.
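The sketch below illustrates a workflow as a sequence of steps in which each step's output becomes the next step's input. The individual step functions are hypothetical placeholders for agents or automated tasks.

```python
# Workflow sketch: run steps in sequence, passing each output to the next step.
# The step functions are placeholders; real steps might call LLMs or external systems.

def extract_text(document: str) -> str:
    return document.strip()

def summarize(text: str) -> str:
    return text[:100]  # placeholder: a real agent would generate a summary here

def translate(summary: str) -> str:
    return f"[translated] {summary}"  # placeholder translation step

def run_workflow(document: str) -> str:
    """Chain the steps so each output becomes the next input."""
    result = document
    for step in (extract_text, summarize, translate):
        result = step(result)
    return result

print(run_workflow("  Quarterly report: revenue grew 12% year over year.  "))
```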
14. Chunk Size
This parameter determines the amount of text, in characters, that the AI processes in a single chunk from the uploaded documents. A smaller chunk size focuses the AI's analysis on a narrower segment of text, leading to more precise and directly relevant responses. Conversely, a larger chunk size allows the AI to consider a broader range of information, potentially capturing more context but also possibly diluting the precision of insights with less directly relevant information.
Recommended Usage: For inquiries that demand high precision and direct answers, opt for smaller sizes (e.g., 1000-2000 characters). For questions that benefit from a wider exploration of the text or when the context is crucial for understanding, larger sizes (e.g., 3000-4000 characters) may be more effective. It's essential to balance the need for detailed information against the risk of including too much peripheral content.
15. Chunk Overlap
This parameter controls the amount of text, in characters, that overlaps between consecutive chunks. An increased overlap ensures better continuity and context retention across chunks, improving the cohesiveness of insights generated by the AI. A lower overlap may result in more disjointed analysis but can increase processing efficiency by reducing redundancy.
Recommended Usage: Smaller overlap values (e.g., 100-200 characters) are typically sufficient for general queries and help to maintain processing speed. For complex analyses where context is critical (especially in nuanced or technical documents), higher overlap values (e.g., 300-500 characters) may provide more accurate and contextually rich responses. Adjust this setting based on the need for context continuity versus processing speed.
Important: Chunk Size should be greater than Chunk Overlap
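The sketch below shows how chunk size and chunk overlap interact when splitting a document into character-based chunks. It illustrates the two parameters only; it is not the platform's actual chunking implementation.

```python
# Character-based chunking sketch with overlap between consecutive chunks.

def chunk_text(text: str, chunk_size: int = 2000, chunk_overlap: int = 200) -> list[str]:
    """Split text into chunks of `chunk_size` characters that overlap by `chunk_overlap`."""
    if chunk_overlap >= chunk_size:
        raise ValueError("Chunk Size should be greater than Chunk Overlap")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

document = "A" * 5000
chunks = chunk_text(document, chunk_size=2000, chunk_overlap=200)
print(len(chunks), [len(c) for c in chunks])  # 3 chunks of 2000, 2000, and 1400 characters
```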
16. Smart Table Processing
Enable this to automatically detect and convert tables in uploaded PDFs into a structured text format that is readable for LLMs. Note: This will incur additional costs and processing time.
17. Smart Image Processing
Enable this to automatically detect and convert images in uploaded PDFs into a structured text format that is readable for LLMs. Note: This will incur additional costs and processing time.
18. Image Extraction
Enable this feature to extract images from uploaded PDFs. This will allow the bot to retrieve and display the images, enhancing the responses.
19. Search Method - Index Search
Definition: Traditional full-text search that provides precise matching for straightforward queries.
How it works:
Creates an index of all words in a document or database
Searches for exact word matches
Advantages:
Fast and efficient for simple search queries
Very precise when searching for specific terms or phrases
Disadvantages:
May miss synonyms or related concepts
Less effective for complex or ambiguous queries
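For illustration, the sketch below builds a small inverted index and answers queries by exact word matching, which is the core idea behind index search. The sample documents are invented, and real search engines add stemming, stop-word handling, and ranking on top of this.

```python
from collections import defaultdict

# Inverted-index sketch: map each word to the documents that contain it,
# then answer queries with exact word matches only.

documents = {
    1: "Invoice processing guidelines for the finance team",
    2: "Travel expense policy and reimbursement process",
    3: "Guidelines for onboarding new finance employees",
}

index: dict[str, set[int]] = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.lower().split():
        index[word].add(doc_id)

def search(query: str) -> set[int]:
    """Return documents containing every word of the query (exact matches only)."""
    results = [index.get(word, set()) for word in query.lower().split()]
    return set.intersection(*results) if results else set()

print(search("finance guidelines"))  # {1, 3}
print(search("reimbursement"))       # {2}
```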
20. Search Method - AI Search
Definition: A semantic search capability that understands context and meaning beyond exact word matches.
How it works:
Utilizes artificial intelligence and machine learning
Analyzes the context and intent behind a search query
Considers synonyms, related concepts, and nuances
Advantages:
Particularly effective for complex and nuanced queries
Can better understand and interpret user intent
Often delivers more relevant results for unclear search queries
Disadvantages:
May be less precise for very specific or technical searches
Requires more computational power and can be slower than Index Search
21. Search Method - Hybrid Search
Definition: A sophisticated combination of full-text and semantic search, offering the best of both worlds.
How it works:
Combines the precision of index search with the contextual understanding of AI search
Uses algorithms to decide which search method is best suited for a particular query
Advantages:
Provides a balanced mix of precise matching and contextual understanding
Ideal for diverse use cases and different types of search queries
Can deliver good results for both simple and complex searches
Application areas:
E-commerce platforms
Digital libraries and archival services
Enterprise search systems with diverse content
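As a simplified illustration, hybrid search can be thought of as blending a keyword-match score (index search) with a semantic similarity score (AI search). The sketch below combines the two with a weighted sum; the weighting scheme and the toy embeddings are illustrative assumptions, not the platform's actual ranking algorithm.

```python
from math import sqrt

# Hybrid scoring sketch: weighted blend of exact keyword matching and
# embedding-based semantic similarity.

def keyword_score(query: str, document: str) -> float:
    """Fraction of query words that appear literally in the document."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    return len(query_words & doc_words) / max(len(query_words), 1)

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_score(query: str, document: str,
                 query_embedding: list[float], doc_embedding: list[float],
                 weight: float = 0.5) -> float:
    """Blend exact matching and semantic similarity into one ranking score."""
    return (weight * keyword_score(query, document)
            + (1 - weight) * cosine_similarity(query_embedding, doc_embedding))

# Example with toy 3-dimensional embeddings (real embeddings come from an
# embedding model and have hundreds or thousands of dimensions).
score = hybrid_score(
    "travel reimbursement",
    "Policy for reimbursing travel expenses",
    query_embedding=[0.9, 0.1, 0.3],
    doc_embedding=[0.8, 0.2, 0.4],
)
print(round(score, 2))  # ≈ 0.74
```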