- 1.71.1 (latest)
- 1.71.0
- 1.70.0
- 1.69.0
- 1.68.0
- 1.67.1
- 1.66.0
- 1.65.0
- 1.63.0
- 1.62.0
- 1.60.0
- 1.59.0
- 1.58.0
- 1.57.0
- 1.56.0
- 1.55.0
- 1.54.1
- 1.53.0
- 1.52.0
- 1.51.0
- 1.50.0
- 1.49.0
- 1.48.0
- 1.47.0
- 1.46.0
- 1.45.0
- 1.44.0
- 1.43.0
- 1.39.0
- 1.38.1
- 1.37.0
- 1.36.4
- 1.35.0
- 1.34.0
- 1.33.1
- 1.32.0
- 1.31.1
- 1.30.1
- 1.29.0
- 1.28.1
- 1.27.1
- 1.26.1
- 1.25.0
- 1.24.1
- 1.23.0
- 1.22.1
- 1.21.0
- 1.20.0
- 1.19.1
- 1.18.3
- 1.17.1
- 1.16.1
- 1.15.1
- 1.14.0
- 1.13.1
- 1.12.1
- 1.11.0
- 1.10.0
- 1.9.0
- 1.8.1
- 1.7.1
- 1.6.2
- 1.5.0
- 1.4.3
- 1.3.0
- 1.2.0
- 1.1.1
- 1.0.1
- 0.9.0
- 0.8.0
- 0.7.1
- 0.6.0
- 0.5.1
- 0.4.0
- 0.3.1
API documentation for vision_models
package.
Classes
Image
Image.
ImageCaptioningModel
Generates captions from image.
Examples::
model = ImageCaptioningModel.from_pretrained("imagetext@001")
image = Image.load_from_file("image.png")
captions = model.get_captions(
image=image,
# Optional:
number_of_results=1,
language="en",
)
ImageQnAModel
Answers questions about an image.
Examples::
model = ImageQnAModel.from_pretrained("imagetext@001")
image = Image.load_from_file("image.png")
answers = model.ask_question(
image=image,
question="What color is the car in this image?",
# Optional:
number_of_results=1,
)
ImageTextModel
Generates text from images.
Examples::
model = ImageTextModel.from_pretrained("imagetext@001")
image = Image.load_from_file("image.png")
captions = model.get_captions(
image=image,
# Optional:
number_of_results=1,
language="en",
)
answers = model.ask_question(
image=image,
question="What color is the car in this image?",
# Optional:
number_of_results=1,
)
MultiModalEmbeddingModel
Generates embedding vectors from images and videos.
Examples::
model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
image = Image.load_from_file("image.png")
video = Video.load_from_file("video.mp4")
embeddings = model.get_embeddings(
image=image,
video=video,
contextual_text="Hello world",
)
image_embedding = embeddings.image_embedding
video_embeddings = embeddings.video_embeddings
text_embedding = embeddings.text_embedding
MultiModalEmbeddingResponse
The multimodal embedding response.
Video
Video.
VideoEmbedding
Embeddings generated from video with offset times.
VideoSegmentConfig
The specific video segments (in seconds) the embeddings are generated for.