vision alignment, manufacturing
Optical positioning system.
244 technical terms and definitions
Vision-language models (VLMs) understand both images and text. They are used for image captioning, visual Q&A, and diagram analysis.
State space model for vision.
Apply SSM to visual tasks.
Different ViT architectures.
Scaling ViT to billions of parameters and massive datasets.
Navigate environments using language instructions.
Generate text describing images.
Deep integration of vision and language understanding.
Models trained to understand both images and text together.
Plan actions using vision and language.
Training tasks for VL models.
Tasks for learning multimodal representations.
Models that perceive and act.
Understand why things happen in images.
Apply commonsense to visual scenes.
Visual controls make status, abnormalities, and standards immediately apparent to anyone.
Determine if image entails text.
Localize objects from language.
Train models to follow visual instructions.
Visual management makes status, standards, and abnormalities immediately obvious.
Navigate using visual input.
Estimate camera motion from video.
Visual prompting includes images or diagrams to supplement textual instructions.
Use visual elements as prompts.
Answer questions about image content.
Answer complex questions about images.
Test visual understanding and reasoning.
Reason about visual content.
SLAM using camera input.
Visual speech recognition transcribes speech from video of lip movements alone using 3D convolutional or transformer architectures.
Visual speech synthesis generates realistic lip movements synchronized with audio for talking faces.
Generate stories from image sequences.
Visual work instructions use photos, diagrams, and minimal text for clarity.
Interpret attention as spatial features.
22 billion parameter vision transformer.
Extremely large vision transformer variant.
Vision Transformer (ViT) applies transformers to images: the image is split into patches, each patch is embedded as a token, and self-attention is applied across the token sequence.
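The patch-splitting step can be sketched in a few lines. This is a minimal illustration only, assuming a single-channel image stored as a list of rows and a hypothetical helper name `split_into_patches`; the embedding and self-attention steps are omitted.

```python
def split_into_patches(image, patch_size):
    """Split a 2-D image (list of rows) into flattened, non-overlapping patches.

    Hypothetical helper for illustration; real ViT code works on tensors
    with a channel dimension and then applies a learned linear embedding.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    # Walk the image in row-major order, one patch_size x patch_size tile at a time.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches


# A 4x4 image with pixel values 0..15, split into four 2x2 patches.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, 2)
```

Each flattened patch plays the role of one token; a full model would project it to the embedding dimension and add a position embedding before attention.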
Viterbi algorithm efficiently finds the most likely sequence of hidden states in sequence labeling using dynamic programming.
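A minimal sketch of the dynamic program, assuming a hidden Markov model given as plain dictionaries. The two-state Healthy/Fever model and its probabilities are a made-up toy example, not data from this glossary.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its probability) for an HMM."""
    # best[t][s]: probability of the most likely path that ends in state s at step t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    # back[t][s]: predecessor of s on that most likely path.
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, prob = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = prob * emit_p[s][observations[t]]
            back[t][s] = prev
    # Pick the best final state, then follow the back-pointers to the start.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy model (illustrative numbers only).
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

path, prob = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
# path is ["Healthy", "Healthy", "Fever"], with probability about 0.01512
```

The table has one entry per (time step, state), so the cost grows linearly with sequence length and quadratically with the number of states, instead of enumerating every possible state sequence. Long sequences would normally use log-probabilities to avoid underflow.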
VITS combines variational inference with adversarial training and normalizing flows for end-to-end text-to-speech in a single model.
ViT for video understanding.
High-throughput LLM serving.
High-throughput LLM serving system using PagedAttention.
vLLM and TGI are fast LLM inference engines that use PagedAttention and continuous batching. Both are much faster than naive serving.
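The core idea behind PagedAttention, storing each sequence's KV cache in fixed-size blocks that need not be contiguous, can be illustrated with a toy block table. The class and method names below are hypothetical and are not the vLLM or TGI API; the sketch tracks only block bookkeeping, not the attention computation.

```python
class PagedKVCache:
    """Toy block table mapping each sequence's tokens to physical cache blocks."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # pool of physical block ids
        self.block_tables = {}  # sequence id -> list of physical block ids
        self.lengths = {}       # sequence id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a slot for one new token; return (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        # Allocate a new block only when the sequence has none yet
        # or its last block is full.
        if length % self.block_size == 0:
            if not self.free_blocks:
                raise MemoryError("no free KV-cache blocks")
            table.append(self.free_blocks.pop(0))
        self.lengths[seq_id] = length + 1
        return table[-1], length % self.block_size

    def free(self, seq_id):
        """Return a finished sequence's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
slots_a = [cache.append_token("a") for _ in range(3)]  # sequence "a" grows to 3 tokens
slot_b = cache.append_token("b")                        # sequence "b" starts in another block
```

Because blocks are allocated on demand, a sequence wastes at most one partially filled block, and blocks released by a finished request can be reused at once by requests joining the batch.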
Vendor-Managed Inventory delegates inventory monitoring and replenishment to suppliers, reducing the buyer's inventory management burden.
Vector Network Analyzer measurements characterize frequency-dependent impedance, transmission, and reflection in high-speed interconnects.
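As a small worked example of the reflection quantity, the sketch below computes the reflection coefficient of a load and the corresponding return loss. The 50-ohm reference impedance and the function names are assumptions for illustration; a real VNA measures these values across a frequency sweep.

```python
import math


def reflection_coefficient(z_load, z0=50.0):
    """Gamma = (Z_L - Z_0) / (Z_L + Z_0); z_load may be real or complex."""
    return (z_load - z0) / (z_load + z0)


def return_loss_db(gamma):
    """Return loss in dB; larger values mean less reflected power."""
    return -20.0 * math.log10(abs(gamma))


# A 100-ohm load on an assumed 50-ohm line reflects one third of the incident voltage.
gamma = reflection_coefficient(100.0)
rl = return_loss_db(gamma)  # about 9.54 dB
```

A perfectly matched load gives a reflection coefficient of zero, for which the return loss is unbounded, so `return_loss_db` should not be called with `gamma == 0`.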
Volatile organic compound abatement destroys or captures organic vapors from process exhaust.
Choose the number of tokens.