AI Factory Glossary

244 technical terms and definitions

vision alignment, manufacturing

Optical positioning system.

vision language,vlm,image text

Vision-language models (VLMs) understand both images and text. Use them for image captioning, visual Q&A, and diagram analysis.

vision mamba, computer vision

State space model for vision.

vision state space models, computer vision

Apply state space models (SSMs) to visual tasks.

vision transformer variants,computer vision

Different ViT architectures.

vision transformers scaling, computer vision

Scaling ViT to billions of parameters and massive datasets.

vision-and-language navigation,robotics

Navigate environments using language instructions.

vision-language generation,multimodal ai

Generate text describing images.

vision-language models advanced, multimodal ai

Deep integration of vision and language understanding.

vision-language models,multimodal ai

Models trained to understand both images and text together.

vision-language planning,robotics

Plan actions using vision and language.

vision-language pre-training objectives, multimodal ai

Training tasks for VL models.

vision-language pre-training objectives,multimodal ai

Tasks for learning multimodal representations.

vision-language-action models,robotics

Models that perceive and act.

visual commonsense reasoning (vcr),visual commonsense reasoning,vcr,evaluation

Understand why things happen in images.

visual commonsense reasoning, multimodal ai

Apply commonsense to visual scenes.

visual controls, manufacturing operations

Visual controls make status, abnormalities, and standards immediately apparent to anyone.

visual entailment, multimodal ai

Determine if image entails text.

visual entailment,evaluation

Determine if image entails text.

visual grounding, multimodal ai

Localize objects from language.

visual instruction tuning,multimodal ai

Train models to follow visual instructions.

visual management, quality & reliability

Visual management makes status, standards, and abnormalities immediately obvious.

visual navigation,robotics

Navigate using visual input.

visual odometry, 3d vision

Estimate camera motion from video.

visual prompting, prompting techniques

Visual prompting includes images or diagrams to supplement textual instructions.

visual prompting,multimodal ai

Use visual elements as prompts.

visual question answering (vqa),visual question answering,vqa,multimodal ai

Answer questions about image content.

visual question answering advanced, multimodal ai

Answer complex questions about images.

visual reasoning benchmarks,evaluation

Test visual understanding and reasoning.

visual reasoning, multimodal ai

Reason about visual content.

visual slam, robotics

SLAM using camera input.

visual speech recognition, audio & speech

Visual speech recognition transcribes speech from video of lip movements alone, using 3D convolutional or transformer architectures.

visual speech synthesis, audio & speech

Visual speech synthesis generates realistic lip movements synchronized with audio for talking faces.

visual storytelling, multimodal ai

Generate stories from image sequences.

visual work instruction, quality & reliability

Visual work instructions use photos, diagrams, and minimal text for clarity.

vit feature maps, computer vision

Interpret attention as spatial features.

vit-22b, computer vision

22 billion parameter vision transformer.

vit-giant, computer vision

Extremely large vision transformer variant.

vit,vision transformer,patch

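Vision Transformer (ViT) applies transformers to images. The image is split into fixed-size patches, each patch is embedded as a token, and self-attention is applied across the resulting token sequence.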
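
The three steps in the ViT entry (split into patches, embed, self-attention) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `patchify`, `embed`, and `self_attention` are hypothetical, the image is a single-channel grid of numbers, the embedding weights are supplied by the caller, and the attention step uses the tokens directly as queries, keys, and values (no learned projections, position embeddings, or class token).

```python
import math


def patchify(image, patch_size):
    """Split an H x W grid (list of rows) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([image[r][c]
                            for r in range(top, top + patch_size)
                            for c in range(left, left + patch_size)])
    return patches


def embed(patches, weights):
    """Linear projection. `weights` holds D columns, each of length patch_size**2."""
    return [[sum(p * w for p, w in zip(patch, column)) for column in weights]
            for patch in patches]


def self_attention(tokens):
    """Single-head scaled dot-product attention with Q = K = V = tokens (assumption)."""
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        attn = [e / total for e in exps]
        outputs.append([sum(a * value[i] for a, value in zip(attn, tokens))
                        for i in range(dim)])
    return outputs


# A 4x4 "image" with 2x2 patches gives 4 tokens of 4 pixel values each.
image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
tokens = embed(patchify(image, 2), weights=[[0.1] * 4, [0.01, 0.02, 0.03, 0.04]])
mixed = self_attention(tokens)  # 4 tokens, each of dimension 2
```

Each output token is a weighted average of all input tokens, which is how every patch gets to "see" every other patch.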

viterbi algorithm, structured prediction

Viterbi algorithm efficiently finds the most likely sequence of hidden states in sequence labeling using dynamic programming.
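
As a rough sketch of the dynamic program described above, the following assumes a hidden Markov model given as plain dictionaries. The function name `viterbi`, its parameter names, and the toy Healthy/Fever model are illustrative assumptions, not part of the glossary entry. Practical implementations usually work with log probabilities to avoid underflow on long sequences.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely hidden-state path, probability of that path)."""
    # best[t][s]: probability of the best path that ends in state s at step t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] * trans_p[p][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Pick the best final state, then follow the back-pointers to the start.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy model (hypothetical numbers): infer health state from reported symptoms.
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

path, prob = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path)  # ['Healthy', 'Healthy', 'Fever']
```

The table has one row per observation and one column per state, so the cost grows as sequence length times the square of the number of states, rather than exponentially with sequence length.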

vits, vits, audio & speech

VITS combines variational inference with adversarial training and normalizing flows for end-to-end text-to-speech in a single model.

vivit, video understanding

ViT for video understanding.

vllm serving system, inference

High-throughput LLM serving.

vllm,deployment

High-throughput LLM serving system using PagedAttention.
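
The idea behind PagedAttention is that the key/value cache is stored in fixed-size blocks that need not be contiguous, with a per-sequence block table mapping token positions to physical blocks. The toy allocator below illustrates only that bookkeeping. It is a sketch under stated assumptions: the class `PagedKVCache` and its methods are hypothetical names, and this is not vLLM's code or API.

```python
class PagedKVCache:
    """Toy block-table bookkeeping for a paged KV cache (illustration only)."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.lengths = {}       # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a slot for one new token; return (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length == len(table) * self.block_size:
            # All blocks owned by this sequence are full: grab a new one on demand.
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop(0))
        self.lengths[seq_id] = length + 1
        return table[length // self.block_size], length % self.block_size

    def free(self, seq_id):
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
slots_a = [cache.append_token("a") for _ in range(3)]  # fills block 0, starts block 1
slot_b = cache.append_token("b")                       # lands in block 2
```

Because blocks are allocated only when needed and returned as soon as a sequence finishes, memory is not reserved up front for each sequence's maximum length.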

vllm,tgi,inference engine

vLLM and TGI are fast LLM inference engines that use PagedAttention and continuous batching. Both are much faster than naive serving.

vmi, vmi, supply chain & logistics

Vendor-Managed Inventory delegates inventory monitoring and replenishment to suppliers, reducing the buyer's inventory management burden.

vna measurement, vna, signal & power integrity

Vector Network Analyzer measurements characterize frequency-dependent impedance, transmission, and reflection in high-speed interconnects.

voc abatement, voc, environmental & sustainability

Volatile organic compound abatement destroys or captures organic vapors from process exhaust.

vocabulary size selection, nlp

Choose the number of tokens in the tokenizer vocabulary.