Vision-language pre-training objectives, multimodal AI
Training tasks for VL models.
77 technical terms and definitions
Tasks for learning multimodal representations.
Models that perceive and act.
Apply commonsense to visual scenes.
Determine if image entails text.
Localize objects from language.
Train models to follow visual instructions.
Use visual elements as prompts.
Answer questions about image content.
Answer complex questions about images.
Reason about visual content.
Generate stories from image sequences.
Vision Transformer (ViT) applies transformers to images: the image is split into fixed-size patches, each patch is linearly embedded, and the resulting token sequence is processed with self-attention.
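The patch-embedding step can be sketched in NumPy. This is a minimal illustration, not a full ViT; the function name `patchify`, the 32x32 toy image, and the 64-dim embedding are assumptions for the example.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened non-overlapping patches."""
    H, W, C = image.shape
    ph = pw = patch_size
    assert H % ph == 0 and W % pw == 0
    # (H//ph, ph, W//pw, pw, C) -> (H//ph, W//pw, ph, pw, C) -> (N, ph*pw*C)
    patches = image.reshape(H // ph, ph, W // pw, pw, C)
    return patches.transpose(0, 2, 1, 3, 4).reshape(-1, ph * pw * C)

# Toy 32x32 RGB image with 8x8 patches -> 16 patch tokens of dim 192.
rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32, 3))
tokens = patchify(img, 8)
print(tokens.shape)  # (16, 192)

# Linear embedding of each patch token into the model dimension (here 64).
W_embed = rng.standard_normal((192, 64))
embedded = tokens @ W_embed
print(embedded.shape)  # (16, 64)
```

In a real ViT the embedded tokens also receive positional encodings and a learnable class token before entering the transformer encoder.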
High-throughput LLM serving system using PagedAttention.
vLLM and TGI are fast LLM inference engines that use techniques such as PagedAttention and continuous batching to achieve much higher throughput than naive serving.
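The core idea behind PagedAttention can be illustrated with a toy block allocator: each sequence's KV cache is a list of fixed-size blocks drawn from a shared pool, so memory is claimed on demand rather than reserved for the maximum length. This is a conceptual sketch only; the class `PagedKVCache` and its methods are invented for illustration and do not mirror vLLM's actual API.

```python
class PagedKVCache:
    """Toy block allocator in the spirit of PagedAttention."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.lengths = {}       # seq_id -> number of tokens cached

    def append_token(self, seq_id):
        """Reserve cache space for one more token of this sequence."""
        table = self.block_tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:  # last block full (or first token)
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = n + 1

    def free(self, seq_id):
        """Return a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8, block_size=4)
for _ in range(6):
    cache.append_token("seq0")  # 6 tokens -> 2 blocks of size 4
print(len(cache.block_tables["seq0"]))  # 2
```

Because blocks are returned to the pool as soon as a sequence finishes, many requests can be batched continuously without pre-reserving worst-case memory.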
Vendor-Managed Inventory delegates inventory monitoring and replenishment to suppliers, reducing the buyer's inventory management burden.
Volatile organic compound abatement destroys or captures organic vapors from process exhaust.
Voltage contrast imaging in SEM reveals floating or improperly biased conductors through secondary electron emission differences.
Volume rendering integrates color and density along rays, producing realistic images from volumetric data.
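The standard emission-absorption model composites samples along a ray as C = sum_i T_i (1 - exp(-sigma_i * delta_i)) c_i, where T_i is the transmittance accumulated before sample i. A minimal sketch for a single ray, with made-up sample values:

```python
import numpy as np

def composite(colors, densities, deltas):
    """Emission-absorption volume rendering along one ray.

    colors: (N, 3) RGB per sample; densities: (N,) sigma_i;
    deltas: (N,) step sizes between samples.
    """
    alphas = 1.0 - np.exp(-densities * deltas)
    # Transmittance T_i = product of (1 - alpha_j) for all j < i.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)

# Three samples along a ray: pure red, green, blue with rising density.
colors = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
densities = np.array([0.5, 1.0, 10.0])
deltas = np.array([0.1, 0.1, 0.1])
print(composite(colors, densities, deltas))
```

The same weighting scheme underlies NeRF-style neural rendering, where a network predicts the per-sample colors and densities.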
VQ-Diffusion for audio generates discrete tokens through denoising diffusion in codebook space.
VQ-VAE-2 uses hierarchical vector quantization for high-fidelity image generation.
VQGAN combines vector quantization with adversarial training for high-quality image synthesis.
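The vector-quantization step shared by VQ-VAE-2, VQGAN, and VQ-Diffusion replaces each continuous latent with its nearest codebook entry. A minimal NumPy sketch (the function name `quantize` and the toy codebook are assumptions for the example):

```python
import numpy as np

def quantize(z, codebook):
    """Map each latent in z (N, D) to its nearest code in codebook (K, D).

    Returns discrete indices and the quantized vectors.
    """
    # Squared Euclidean distance between every latent and every code.
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)
    return idx, codebook[idx]

rng = np.random.default_rng(0)
codebook = rng.standard_normal((16, 4))  # 16 codes of dimension 4
# Latents placed near codes 3 and 7, with tiny perturbations.
z = codebook[[3, 7]] + 0.01 * rng.standard_normal((2, 4))
idx, zq = quantize(z, codebook)
print(idx)  # [3 7]
```

In training, the non-differentiable argmin is bypassed with a straight-through gradient estimator, and VQGAN adds an adversarial loss on the decoder output.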
Variational Recurrent Neural Network combines VAE with RNN for learning probabilistic latent dynamics.
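One VRNN time step conditions the latent prior on the previous RNN state, samples z_t by reparameterization, and feeds z_t back into the recurrence. The sketch below uses toy linear maps and assumed dimensions; `vrnn_step` and all parameter names are invented for illustration.

```python
import numpy as np

def vrnn_step(h, x, params, rng):
    """One toy VRNN step: prior p(z_t | h_{t-1}), sample z_t, update h."""
    W_mu, W_logvar, W_h = params
    mu, logvar = W_mu @ h, W_logvar @ h           # prior conditioned on h_{t-1}
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)  # reparam.
    h_new = np.tanh(W_h @ np.concatenate([h, x, z]))  # recurrence on (h, x, z)
    return h_new, z

rng = np.random.default_rng(0)
H, D, Z = 8, 4, 3  # hidden, input, and latent sizes (assumed)
params = (rng.standard_normal((Z, H)),
          rng.standard_normal((Z, H)) * 0.1,
          rng.standard_normal((H, H + D + Z)))
h = np.zeros(H)
for x in rng.standard_normal((5, D)):  # roll over a 5-step sequence
    h, z = vrnn_step(h, x, params, rng)
print(h.shape, z.shape)  # (8,) (3,)
```

A full VRNN also has an inference network q(z_t | x_t, h_{t-1}) and is trained with a per-timestep ELBO; only the generative rollout is shown here.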
Find security flaws.