Class BlobAccessor (2.3.0)

BlobAccessor(*args, **kwargs)

Blob functions for Series and Index.

Properties

session

API documentation for session property.

Methods

authorizer

authorizer() -> bigframes.series.Series

Authorizers of the Blob.

Returns
Type Description
bigframes.series.Series Authorizers (connections) as strings.

content_type

content_type() -> bigframes.series.Series

Retrieve the content type of the Blob.

Returns
Type Description
bigframes.series.Series Content type as a string.

display

display(
    n: int = 3,
    *,
    content_type: str = "",
    width: typing.Optional[int] = None,
    height: typing.Optional[int] = None
)

Display the blob content in an IPython notebook environment. Currently only works for images.

Parameters
Name Description
n int, default 3

Number of sample blob objects to display.

content_type str, default ""

Content type of the blob. If unset, the content type from the blob's storage metadata is used. Possible values are "image", "audio" and "video".

width int or None, default None

Width in pixels to which the image or video is constrained. If unset, the global setting bigframes.options.experiments.blob_display_width is used; if that is also unset, the original size or aspect ratio of the image or video is used. No-op for other content types.

height int or None, default None

Height in pixels to which the image or video is constrained. If unset, the global setting bigframes.options.experiments.blob_display_height is used; if that is also unset, the original size or aspect ratio of the image or video is used. No-op for other content types.
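The fallback order for width and height can be sketched as a small pure-Python helper. The name resolve_display_size and its signature are hypothetical, written only to illustrate the precedence described above (explicit argument, then the global option, then the media's original size); it is not part of the bigframes API.

```python
from typing import Optional


def resolve_display_size(
    explicit: Optional[int], global_option: Optional[int]
) -> Optional[int]:
    """Hypothetical helper: pick the pixel constraint for one dimension.

    An explicit width/height argument wins; otherwise the global option
    (e.g. the value of bigframes.options.experiments.blob_display_width)
    is used. None means "no constraint": keep the original size or ratio.
    """
    if explicit is not None:
        return explicit
    return global_option


print(resolve_display_size(200, 400))    # 200: explicit argument wins
print(resolve_display_size(None, 400))   # 400: falls back to global option
print(resolve_display_size(None, None))  # None: original size/ratio kept
```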

get_runtime_json_str

get_runtime_json_str(
    mode: str = "R", *, with_metadata: bool = False
) -> bigframes.series.Series

Get the runtime object (which contains a signed URL to access GCS data) and apply the ToJSONString transformation.

Parameters
Name Description
mode str, default "R"

Mode for accessing the runtime. Possible values are "R" (read-only) and "RW" (read-write).

with_metadata bool, default False

Whether to include metadata in the JSON string.

Returns
Type Description
bigframes.series.Series The runtime object as a JSON string.

image_blur

image_blur(
    ksize: tuple[int, int],
    *,
    dst: typing.Optional[typing.Union[str, bigframes.series.Series]] = None,
    connection: typing.Optional[str] = None,
    max_batching_rows: int = 8192,
    container_cpu: typing.Union[float, int] = 0.33,
    container_memory: str = "512Mi"
) -> bigframes.series.Series

Blurs images.

Parameters
Name Description
ksize tuple(int, int)

Kernel size.

dst str or bigframes.series.Series or None, default None

Output destination. Can be one of:
str: a GCS folder. The output filenames are the same as the input filenames.
blob Series: the output file paths are determined by the URIs of the blob Series.
None: output to BigQuery as bytes.
Encoding is determined by the extension of the output filenames (or of the input filenames if there are no output filenames). If a filename has no extension, ".jpeg" is used for encoding.

connection str or None, default None

BigQuery connection used for the function's internet transactions, and for the output blob if dst is a str. If None, the session's default connection is used.

max_batching_rows int, default 8,192

Maximum number of rows per batch sent to Cloud Run to execute the function.

container_cpu int or float, default 0.33

Number of container CPUs. Valid values are in the range [0.33, 8]. Floats larger than 1 are cast to integers.

container_memory str, default "512Mi"

container memory size. String of the format

Returns
Type Description
bigframes.series.Series A blob Series if the destination is GCS, or a bytes Series if the destination is BigQuery.
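The encoding rule for dst can be sketched in pure Python. encoding_extension is a hypothetical helper, not a bigframes function, and the gs:// URIs below are made-up examples. It assumes "the filename" in the rule means whichever filename was selected (the output filename if present, otherwise the input filename).

```python
import posixpath
from typing import Optional


def encoding_extension(input_uri: str, output_uri: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the documented dst encoding rule.

    The output filename's extension decides the encoding. With no output
    filename (dst is None, or dst is a folder that reuses input names), the
    input filename is used. If that name has no extension, ".jpeg" is used.
    """
    name = output_uri if output_uri is not None else input_uri
    # splitext only inspects the last path component, so a dot in a folder
    # name (e.g. "gs://b/dir.v2/img") does not count as an extension.
    _, ext = posixpath.splitext(name)
    return ext if ext else ".jpeg"


print(encoding_extension("gs://my-bucket/in/cat.png"))                               # .png
print(encoding_extension("gs://my-bucket/in/cat.png", "gs://my-bucket/out/cat.webp"))  # .webp
print(encoding_extension("gs://my-bucket/in.v2/cat"))                                # .jpeg
```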

image_normalize

image_normalize(
    *,
    alpha: float = 1.0,
    beta: float = 0.0,
    norm_type: str = "l2",
    dst: typing.Optional[typing.Union[str, bigframes.series.Series]] = None,
    connection: typing.Optional[str] = None,
    max_batching_rows: int = 8192,
    container_cpu: typing.Union[float, int] = 0.33,
    container_memory: str = "512Mi"
) -> bigframes.series.Series

Normalize images.

Parameters
Name Description
alpha float, default 1.0

Norm value to normalize to, or the lower range boundary for range normalization.

beta float, default 0.0

Upper range boundary for range normalization; not used for norm normalization.

norm_type str, default "l2"

Normalization type. Accepted values are "inf", "l1", "l2" and "minmax".

dst str or bigframes.series.Series or None, default None

Output destination. Can be one of:
str: a GCS folder. The output filenames are the same as the input filenames.
blob Series: the output file paths are determined by the URIs of the blob Series.
None: output to BigQuery as bytes.
Encoding is determined by the extension of the output filenames (or of the input filenames if there are no output filenames). If a filename has no extension, ".jpeg" is used for encoding.

connection str or None, default None

BigQuery connection used for the function's internet transactions, and for the output blob if dst is a str. If None, the session's default connection is used.

max_batching_rows int, default 8,192

Maximum number of rows per batch sent to Cloud Run to execute the function.

container_cpu int or float, default 0.33

Number of container CPUs. Valid values are in the range [0.33, 8]. Floats larger than 1 are cast to integers.

container_memory str, default "512Mi"

container memory size. String of the format

Returns
Type Description
bigframes.series.Series A blob Series if the destination is GCS, or a bytes Series if the destination is BigQuery.
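To make alpha, beta and norm_type concrete, here is a pure-Python sketch on a flat list of pixel values. It assumes OpenCV cv2.normalize-style semantics, suggested by the matching parameter names but not confirmed by this reference: for "l1", "l2" and "inf" the values are scaled so the chosen norm equals alpha; for "minmax" the input range is mapped linearly onto the range between alpha and beta. The function name normalize is hypothetical.

```python
import math
from typing import List


def normalize(values: List[float], alpha: float = 1.0, beta: float = 0.0,
              norm_type: str = "l2") -> List[float]:
    """Sketch of cv2.normalize-style normalization (assumed semantics)."""
    if norm_type == "minmax":
        lo, hi = min(values), max(values)
        dmin, dmax = min(alpha, beta), max(alpha, beta)
        if hi == lo:
            # Constant input: every value maps to the lower boundary.
            return [dmin for _ in values]
        scale = (dmax - dmin) / (hi - lo)
        return [dmin + (v - lo) * scale for v in values]

    norms = {
        "l1": lambda xs: sum(abs(x) for x in xs),
        "l2": lambda xs: math.sqrt(sum(x * x for x in xs)),
        "inf": lambda xs: max(abs(x) for x in xs),
    }
    if norm_type not in norms:
        raise ValueError(f"unsupported norm_type: {norm_type!r}")
    norm = norms[norm_type](values)
    if norm == 0:
        return list(values)  # all-zero input stays all zero
    # Scale so that the chosen norm of the result equals alpha.
    return [v * alpha / norm for v in values]


print(normalize([3.0, 4.0]))                  # [0.6, 0.8]
print(normalize([1.0, 3.0], norm_type="l1"))  # [0.25, 0.75]
```

With the defaults alpha=1.0 and beta=0.0, "minmax" maps pixel values onto [0, 1].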

image_resize

image_resize(
    dsize: tuple[int, int] = (0, 0),
    *,
    fx: float = 0.0,
    fy: float = 0.0,
    dst: typing.Optional[typing.Union[str, bigframes.series.Series]] = None,
    connection: typing.Optional[str] = None,
    max_batching_rows: int = 8192,
    container_cpu: typing.Union[float, int] = 0.33,
    container_memory: str = "512Mi"
)

Resize images.

Parameters
Name Description
dsize tuple(int, int), default (0, 0)

Destination size. If set to (0, 0), the fx and fy parameters determine the output size.

fx float, default 0.0

Scale factor along the horizontal axis. If set to 0.0, the dsize parameter determines the output size.

fy float, default 0.0

Scale factor along the vertical axis. If set to 0.0, the dsize parameter determines the output size.

dst str or bigframes.series.Series or None, default None

Output destination. Can be one of:
str: a GCS folder. The output filenames are the same as the input filenames.
blob Series: the output file paths are determined by the URIs of the blob Series.
None: output to BigQuery as bytes.
Encoding is determined by the extension of the output filenames (or of the input filenames if there are no output filenames). If a filename has no extension, ".jpeg" is used for encoding.

connection str or None, default None

BigQuery connection used for the function's internet transactions, and for the output blob if dst is a str. If None, the session's default connection is used.

max_batching_rows int, default 8,192

Maximum number of rows per batch sent to Cloud Run to execute the function.

container_cpu int or float, default 0.33

Number of container CPUs. Valid values are in the range [0.33, 8]. Floats larger than 1 are cast to integers.

container_memory str, default "512Mi"

container memory size. String of the format

Returns
Type Description
bigframes.series.Series A blob Series if the destination is GCS, or a bytes Series if the destination is BigQuery.
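The interaction between dsize and fx/fy can be sketched as follows. This assumes OpenCV cv2.resize conventions (sizes given as (width, height); a non-zero dsize wins; otherwise the source size is scaled by fx and fy and rounded), which the parameter names suggest but this reference does not confirm. output_size is a hypothetical helper, and raising ValueError when neither option is set is an illustrative choice.

```python
from typing import Tuple


def output_size(src_size: Tuple[int, int], dsize: Tuple[int, int] = (0, 0),
                fx: float = 0.0, fy: float = 0.0) -> Tuple[int, int]:
    """Sketch: compute the (width, height) of a resized image."""
    if dsize != (0, 0):
        # An explicit destination size determines the output size.
        return dsize
    if fx <= 0 or fy <= 0:
        raise ValueError("set either dsize or both fx and fy")
    width, height = src_size
    # Scale each axis independently and round to the nearest pixel.
    return (round(width * fx), round(height * fy))


print(output_size((640, 480), dsize=(320, 240)))    # (320, 240)
print(output_size((640, 480), fx=0.5, fy=0.25))     # (320, 120)
```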

md5_hash

md5_hash() -> bigframes.series.Series

Retrieve the MD5 hash of the Blob.

Returns
Type Description
bigframes.series.Series MD5 hash as a string.

metadata

metadata() -> bigframes.series.Series

Retrieve the metadata of the Blob.

Returns
Type Description
bigframes.series.Series JSON metadata of the Blob, containing the fields content_type, md5_hash, size, and updated (time).

pdf_chunk

pdf_chunk(
    *,
    connection: typing.Optional[str] = None,
    chunk_size: int = 2000,
    overlap_size: int = 200,
    max_batching_rows: int = 1,
    container_cpu: typing.Union[float, int] = 2,
    container_memory: str = "1Gi",
    verbose: bool = False
) -> bigframes.series.Series

Extracts and chunks text from PDF URLs and saves the text as arrays of strings.

Parameters
Name Description
connection str or None, default None

BigQuery connection used for the function's internet transactions. If None, the session's default connection is used.

chunk_size int, default 2000

The desired size of each text chunk, in characters.

overlap_size int, default 200

The number of overlapping characters between consecutive chunks. This helps ensure that context is preserved across chunk boundaries.

max_batching_rows int, default 1

Maximum number of rows per batch sent to Cloud Run to execute the function.

container_cpu int or float, default 2

Number of container CPUs. Valid values are in the range [0.33, 8]. Floats larger than 1 are cast to integers.

container_memory str, default "1Gi"

Container memory size. String of the format

verbose bool, default False

Controls the verbosity of the output. When True, both error messages and the extracted content are included in the output. When False, only the extracted content is included and error messages are suppressed.

Returns
Type Description
bigframes.series.Series array[str] or struct[str, array[str]], depending on the verbose parameter, where each string is a chunk of text extracted from the PDF. Includes error messages if verbosity is enabled.
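This reference does not say how pdf_chunk splits text internally. As a rough illustration of how chunk_size and overlap_size interact, here is a plain fixed-width sliding window over characters. chunk_text is a hypothetical name, and the real implementation may split on word or sentence boundaries instead.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 2000, overlap_size: int = 200) -> List[str]:
    """Split text into fixed-size character chunks that overlap.

    Consecutive chunks share overlap_size characters, so each chunk starts
    chunk_size - overlap_size characters after the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_size < chunk_size:
        raise ValueError("overlap_size must be in [0, chunk_size)")
    step = chunk_size - overlap_size
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap_size=1))
# ['abcd', 'defg', 'ghij']
```

With the defaults, each new chunk starts 1,800 characters after the previous one, so neighboring chunks share 200 characters.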

pdf_extract

pdf_extract(
    *,
    connection: typing.Optional[str] = None,
    max_batching_rows: int = 1,
    container_cpu: typing.Union[float, int] = 2,
    container_memory: str = "1Gi",
    verbose: bool = False
) -> bigframes.series.Series

Extracts text from PDF URLs and saves the text as string.

Parameters
Name Description
connection str or None, default None

BigQuery connection used for the function's internet transactions. If None, the session's default connection is used.

max_batching_rows int, default 1

Maximum number of rows per batch sent to Cloud Run to execute the function.

container_cpu int or float, default 2

Number of container CPUs. Valid values are in the range [0.33, 8]. Floats larger than 1 are cast to integers.

container_memory str, default "1Gi"

Container memory size. String of the format

verbose bool, default False

Controls the verbosity of the output. When True, both error messages and the extracted content are included in the output. When False, only the extracted content is included and error messages are suppressed.

Returns
Type Description
bigframes.series.Series str or struct[str, str], depending on the verbose parameter. Contains the text extracted from the PDF file. Includes error messages if verbosity is enabled.
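The container_cpu rule shared by these methods ("Possible values are [0.33, 8]. Floats larger than 1 are cast to integers") can be sketched as a pure-Python check. effective_cpu is a hypothetical helper. Two details are assumptions: that the cast truncates like Python's int(), and that out-of-range values are rejected with ValueError.

```python
from typing import Union


def effective_cpu(container_cpu: Union[float, int]) -> Union[float, int]:
    """Hypothetical check mirroring the documented container_cpu rule."""
    if not 0.33 <= container_cpu <= 8:
        raise ValueError("container_cpu must be in [0.33, 8]")
    if container_cpu > 1:
        # Assumed truncation, e.g. 2.5 -> 2.
        return int(container_cpu)
    # Values up to 1 (e.g. the 0.33 default) are kept as-is.
    return container_cpu


print(effective_cpu(0.33))  # 0.33
print(effective_cpu(2.5))   # 2
```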

read_url

read_url() -> bigframes.series.Series

Retrieve the read URL of the Blob.

Returns
Type Description
bigframes.series.Series Read-only URLs.

size

size() -> bigframes.series.Series

Retrieve the file size of the Blob.

Returns
Type Description
bigframes.series.Series file size in bytes.

updated

updated() -> bigframes.series.Series

Retrieve the updated time of the Blob.

Returns
Type Description
bigframes.series.Series updated time as UTC datetime.

uri

uri() -> bigframes.series.Series

URIs of the Blob.

Returns
Type Description
bigframes.series.Series URIs as string.

version

version() -> bigframes.series.Series

Versions of the Blob.

Returns
Type Description
bigframes.series.Series Version as string.

write_url

write_url() -> bigframes.series.Series

Retrieve the write URL of the Blob.

Returns
Type Description
bigframes.series.Series Writable URLs.