Series(*args, **kwargs)
One-dimensional analogue of DataFrame. Stores one-dimensional data in a size-mutable, labeled data structure.
Properties
T
Return the transpose, which is by definition self.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(['Ant', 'Bear', 'Cow'])
>>> s
0 Ant
1 Bear
2 Cow
dtype: string
>>> s.T
0 Ant
1 Bear
2 Cow
dtype: string
at
Access a single value for a row/column label pair.
dt
Accessor object for datetime-like properties of the Series values.
Returns | |
---|---|
Type | Description |
bigframes.operations.datetimes.DatetimeMethods | An accessor containing datetime methods. |
dtype
Return the dtype object of the underlying data.
dtypes
Return the dtype object of the underlying data.
empty
Indicates whether Series/DataFrame is empty.
True if Series/DataFrame is entirely empty (no items), meaning any of the axes are of length 0.
Returns | |
---|---|
Type | Description |
bool | If Series/DataFrame is empty, return True, if not return False. |
iat
Access a single value for a row/column pair by integer position.
iloc
Purely integer-location based indexing for selection by position.
index
The index (axis labels) of the Series.
The index of a Series is used to label and identify each element of the underlying data. The index can be thought of as an immutable ordered set (technically a multi-set, as it may contain duplicate labels), and is used to index and align data.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can access the index of a Series via the index property.
>>> df = bpd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],
... 'Age': [25, 30, 35],
... 'Location': ['Seattle', 'New York', 'Kona']},
... index=([10, 20, 30]))
>>> s = df["Age"]
>>> s
10 25
20 30
30 35
Name: Age, dtype: Int64
>>> s.index # doctest: +ELLIPSIS
Index([10, 20, 30], dtype='Int64')
>>> s.index.values
array([10, 20, 30], dtype=object)
When a multi-index is set, the index property reflects every index level:
>>> df1 = df.set_index(["Name", "Location"])
>>> s1 = df1["Age"]
>>> s1
Name Location
Alice Seattle 25
Bob New York 30
Aritra Kona 35
Name: Age, dtype: Int64
>>> s1.index # doctest: +ELLIPSIS
MultiIndex([( 'Alice', 'Seattle'),
( 'Bob', 'New York'),
('Aritra', 'Kona')],
name='Name')
>>> s1.index.values
array([('Alice', 'Seattle'), ('Bob', 'New York'), ('Aritra', 'Kona')],
dtype=object)
Returns | |
---|---|
Type | Description |
Index | The index object of the Series. |
is_monotonic_decreasing
Return boolean if values in the object are monotonically decreasing.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([3, 2, 2, 1])
>>> s.is_monotonic_decreasing
True
>>> s = bpd.Series([1, 2, 3])
>>> s.is_monotonic_decreasing
False
Returns | |
---|---|
Type | Description |
bool | Boolean. |
is_monotonic_increasing
Return boolean if values in the object are monotonically increasing.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([1, 2, 2])
>>> s.is_monotonic_increasing
True
>>> s = bpd.Series([3, 2, 1])
>>> s.is_monotonic_increasing
False
Returns | |
---|---|
Type | Description |
bool | Boolean. |
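Both monotonicity checks are non-strict, as the [3, 2, 2, 1] and [1, 2, 2] examples show: equal neighbours do not break monotonicity. A minimal plain-Python sketch of that rule follows. The helper functions are hypothetical stand-ins for illustration and do not reflect how BigQuery DataFrames evaluates these properties.

```python
from typing import Sequence


def is_monotonic_increasing(values: Sequence[float]) -> bool:
    """True when every element is >= the one before it (ties allowed)."""
    return all(a <= b for a, b in zip(values, values[1:]))


def is_monotonic_decreasing(values: Sequence[float]) -> bool:
    """True when every element is <= the one before it (ties allowed)."""
    return all(a >= b for a, b in zip(values, values[1:]))


print(is_monotonic_increasing([1, 2, 2]))     # True
print(is_monotonic_decreasing([3, 2, 2, 1]))  # True
print(is_monotonic_increasing([3, 2, 1]))     # False
```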
loc
Access a group of rows and columns by label(s) or a boolean array.
.loc[] is primarily label based, but may also be used with a boolean array.
Allowed inputs are:
- A single label, e.g. 5 or 'a' (note that 5 is interpreted as a label of the index, and never as an integer position along the index).
- A list of labels, e.g. ['a', 'b', 'c'].
- A boolean series of the same length as the axis being sliced, e.g. [True, False, True].
- An alignable Index. The index of the returned selection will be the input.
- Not supported yet: An alignable boolean Series. The index of the key will be aligned before masking.
- Not supported yet: A slice object with labels, e.g. 'a':'f'. Note: contrary to usual Python slices, both the start and the stop are included.
- Not supported yet: A callable function with one argument (the calling Series or DataFrame) that returns valid output for indexing (one of the above).
Exceptions | |
---|---|
Type | Description |
NotImplementedError | If the inputs are not supported. |
name
Return the name of the Series.
The name of a Series becomes its index or column name if it is used to form a DataFrame. It is also used whenever displaying the Series using the interpreter.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
For a Series:
>>> s = bpd.Series([1, 2, 3], dtype="Int64", name='Numbers')
>>> s
0 1
1 2
2 3
Name: Numbers, dtype: Int64
>>> s.name
'Numbers'
If the Series is part of a DataFrame:
>>> df = bpd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df
col1 col2
0 1 3
1 2 4
<BLANKLINE>
[2 rows x 2 columns]
>>> s = df["col1"]
>>> s.name
'col1'
Returns | |
---|---|
Type | Description |
hashable object | The name of the Series, also the column name if part of a DataFrame. |
ndim
Return an int representing the number of axes / array dimensions.
Returns | |
---|---|
Type | Description |
int | Return 1 if Series. Otherwise return 2 if DataFrame. |
plot
Make plots of Series.
Returns | |
---|---|
Type | Description |
bigframes.operations.plotting.PlotAccessor | An accessor making plots. |
query_job
BigQuery job metadata for the most recent query.
shape
Return a tuple of the shape of the underlying data.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([1, 4, 9, 16])
>>> s.shape
(4,)
>>> s = bpd.Series(['Alice', 'Bob', bpd.NA])
>>> s.shape
(3,)
size
Return the number of elements in the underlying data.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
For Series:
>>> s = bpd.Series({'a': 1, 'b': 2, 'c': 3})
>>> s.size
3
For Index:
>>> idx = bpd.Index(bpd.Series([1, 2, 3]))
>>> idx.size
3
Returns | |
---|---|
Type | Description |
int | Return the number of elements in the underlying data. |
str
Vectorized string functions for Series and Index.
NAs stay NA unless handled otherwise by a particular method. Patterned after Python’s string methods, with some inspiration from R’s stringr package.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(["A_Str_Series"])
>>> s
0 A_Str_Series
dtype: string
>>> s.str.lower()
0 a_str_series
dtype: string
>>> s.str.replace("_", "")
0 AStrSeries
dtype: string
Returns | |
---|---|
Type | Description |
bigframes.operations.strings.StringMethods | An accessor containing string methods. |
struct
Accessor object for struct properties of the Series values.
Returns | |
---|---|
Type | Description |
bigframes.operations.structs.StructAccessor | An accessor containing struct methods. |
values
Return Series as ndarray or ndarray-like depending on the dtype.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> bpd.Series([1, 2, 3]).values
array([1, 2, 3], dtype=object)
>>> bpd.Series(list('aabc')).values
array(['a', 'a', 'b', 'c'], dtype=object)
Returns | |
---|---|
Type | Description |
numpy.ndarray or ndarray-like | Values in the Series. |
Methods
__array_ufunc__
__array_ufunc__(
ufunc: numpy.ufunc, method: str, *inputs, **kwargs
) -> bigframes.series.Series
Used to support numpy ufuncs. See: https://numpy.org/doc/stable/reference/ufuncs.html
__rmatmul__
__rmatmul__(other)
Matrix multiplication using the binary @ operator in Python >= 3.5.
abs
abs() -> bigframes.series.Series
Return a Series/DataFrame with absolute numeric value of each element.
This function only applies to elements that are all numeric.
add
add(other: float | int | bigframes.series.Series) -> bigframes.series.Series
Return addition of Series and other, element-wise (binary operator add).
Equivalent to series + other, but with support to substitute a fill_value for missing data in either one of the inputs.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> a = bpd.Series([1, 2, 3, bpd.NA])
>>> a
0 1.0
1 2.0
2 3.0
3 <NA>
dtype: Float64
>>> b = bpd.Series([10, 20, 30, 40])
>>> b
0 10
1 20
2 30
3 40
dtype: Int64
>>> a.add(b)
0 11.0
1 22.0
2 33.0
3 <NA>
dtype: Float64
You can also use the mathematical operator +:
>>> a + b
0 11.0
1 22.0
2 33.0
3 <NA>
dtype: Float64
Adding two Series with explicit indexes:
>>> a = bpd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
>>> b = bpd.Series([10, 20, 30, 40], index=['a', 'b', 'd', 'e'])
>>> a.add(b)
a 11
b 22
c <NA>
d 34
e <NA>
dtype: Int64
Returns | |
---|---|
Type | Description |
bigframes.series.Series | The result of the operation. |
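The explicit-index example above can be read as a label-wise addition over the union of both indexes. The sketch below models each Series as a plain dict from label to value and uses None in place of <NA>. The aligned_add helper is hypothetical and only illustrates the alignment rule visible in the example output.

```python
from typing import Dict, Hashable, Optional


def aligned_add(
    left: Dict[Hashable, Optional[int]],
    right: Dict[Hashable, Optional[int]],
) -> Dict[Hashable, Optional[int]]:
    """Add two label->value mappings over the union of their labels."""
    # Union of labels: left's labels first, then labels only found in right.
    labels = list(left) + [label for label in right if label not in left]
    result: Dict[Hashable, Optional[int]] = {}
    for label in labels:
        a, b = left.get(label), right.get(label)
        # Assumption: a label missing on either side yields None (<NA>).
        result[label] = None if a is None or b is None else a + b
    return result


a = {"a": 1, "b": 2, "c": 3, "d": 4}
b = {"a": 10, "b": 20, "d": 30, "e": 40}
print(aligned_add(a, b))
# {'a': 11, 'b': 22, 'c': None, 'd': 34, 'e': None}
```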
add_prefix
add_prefix(prefix: str, axis: int | str | None = None) -> bigframes.series.Series
Prefix labels with string prefix.
For Series, the row labels are prefixed. For DataFrame, the column labels are prefixed.
Parameters | |
---|---|
Name | Description |
prefix |
str
The string to add before each label. |
axis |
int or str or None, default None
|
add_suffix
add_suffix(suffix: str, axis: int | str | None = None) -> bigframes.series.Series
Suffix labels with string suffix.
For Series, the row labels are suffixed. For DataFrame, the column labels are suffixed.
agg
agg(
func: typing.Union[str, typing.Sequence[str]]
) -> typing.Union[typing.Any, bigframes.series.Series]
Aggregate using one or more operations over the specified axis.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([1, 2, 3, 4])
>>> s
0 1
1 2
2 3
3 4
dtype: Int64
>>> s.agg('min')
1
>>> s.agg(['min', 'max'])
min 1.0
max 4.0
dtype: Float64
Parameter | |
---|---|
Name | Description |
func |
function
Function to use for aggregating the data. Accepted combinations are: string function name, list of function names, e.g. |
Returns | |
---|---|
Type | Description |
scalar or Series | Aggregated results |
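The two call shapes shown in the example (a single function name returning a scalar, a list of names returning one labeled result per name) can be sketched in plain Python as below. The agg helper and its small _AGGREGATIONS table are hypothetical, cover only three operations, and do not model the Float64 result dtype shown in the example.

```python
from typing import Dict, Sequence, Union

# Assumption: a tiny lookup table standing in for the supported operations.
_AGGREGATIONS = {"min": min, "max": max, "sum": sum}


def agg(
    values: Sequence[float], func: Union[str, Sequence[str]]
) -> Union[float, Dict[str, float]]:
    """Return a scalar for one name, or a name->result mapping for a list."""
    if isinstance(func, str):
        return _AGGREGATIONS[func](values)
    return {name: _AGGREGATIONS[name](values) for name in func}


print(agg([1, 2, 3, 4], "min"))           # 1
print(agg([1, 2, 3, 4], ["min", "max"]))  # {'min': 1, 'max': 4}
```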
aggregate
aggregate(
func: typing.Union[str, typing.Sequence[str]]
) -> typing.Union[typing.Any, bigframes.series.Series]
API documentation for aggregate method.
all
all() -> bool
Return whether all elements are True, potentially over an axis.
Returns True unless there is at least one element within a Series or along a DataFrame axis that is False or equivalent (e.g. zero or empty).
Returns | |
---|---|
Type | Description |
scalar or Series | If level is specified, then, Series is returned; otherwise, scalar is returned. |
any
any() -> bool
Return whether any element is True, potentially over an axis.
Returns False unless there is at least one element within a Series or along a DataFrame axis that is True or equivalent (e.g. non-zero or non-empty).
Returns | |
---|---|
Type | Description |
scalar or Series | If level is specified, then, Series is returned; otherwise, scalar is returned. |
apply
apply(
func, by_row: typing.Union[typing.Literal["compat"], bool] = "compat"
) -> bigframes.series.Series
Invoke function on values of a Series.
Can be a ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values. If it is an arbitrary Python function, then converting it into a remote_function is recommended.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
For applying an arbitrary Python function, a remote_function is recommended. Let's use the reuse=False flag to make sure a new remote_function is created every time we run the following code, but you can skip it to potentially reuse a previously deployed remote_function from the same user defined function.
>>> @bpd.remote_function([int], float, reuse=False)
... def minutes_to_hours(x):
... return x/60
>>> minutes = bpd.Series([0, 30, 60, 90, 120])
>>> minutes
0 0
1 30
2 60
3 90
4 120
dtype: Int64
>>> hours = minutes.apply(minutes_to_hours)
>>> hours
0 0.0
1 0.5
2 1.0
3 1.5
4 2.0
dtype: Float64
To turn a user defined function with external package dependencies into a remote_function, you would provide the names of the packages via the packages param.
>>> @bpd.remote_function(
... [str],
... str,
... reuse=False,
... packages=["cryptography"],
... )
... def get_hash(input):
... from cryptography.fernet import Fernet
...
... # handle missing value
... if input is None:
... input = ""
...
... key = Fernet.generate_key()
... f = Fernet(key)
... return f.encrypt(input.encode()).decode()
>>> names = bpd.Series(["Alice", "Bob"])
>>> hashes = names.apply(get_hash)
Simple vectorized functions, lambdas or ufuncs can be applied directly with by_row=False.
>>> nums = bpd.Series([1, 2, 3, 4])
>>> nums
0 1
1 2
2 3
3 4
dtype: Int64
>>> nums.apply(lambda x: x*x + 2*x + 1, by_row=False)
0 4
1 9
2 16
3 25
dtype: Int64
>>> def is_odd(num):
... return num % 2 == 1
>>> nums.apply(is_odd, by_row=False)
0 True
1 False
2 True
3 False
dtype: boolean
>>> import numpy as np
>>> nums.apply(np.log, by_row=False)
0 0.0
1 0.693147
2 1.098612
3 1.386294
dtype: Float64
Parameters | |
---|---|
Name | Description |
func |
function
BigFrames DataFrames |
by_row |
False or "compat", default "compat"
If |
Returns | |
---|---|
Type | Description |
bigframes.series.Series | A new Series with values representing the return value of the func applied to each element of the original Series. |
argmax
argmax() -> int
Return int position of the largest value in the Series.
If the maximum is achieved in multiple locations, the first row position is returned.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Consider dataset containing cereal calories.
>>> s = bpd.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: Float64
>>> s.argmax()
2
>>> s.argmin()
0
The maximum cereal calories is the third element and the minimum cereal calories is the first element, since series is zero-indexed.
Returns | |
---|---|
Type | Description |
int | Row position of the maximum value. |
argmin
argmin() -> int
Return int position of the smallest value in the Series.
If the minimum is achieved in multiple locations, the first row position is returned.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Consider dataset containing cereal calories.
>>> s = bpd.Series({'Corn Flakes': 100.0, 'Almond Delight': 110.0,
... 'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0})
>>> s
Corn Flakes 100.0
Almond Delight 110.0
Cinnamon Toast Crunch 120.0
Cocoa Puff 110.0
dtype: Float64
>>> s.argmax()
2
>>> s.argmin()
0
The maximum cereal calories is the third element and the minimum cereal calories is the first element, since series is zero-indexed.
Returns | |
---|---|
Type | Description |
int | Row position of the minimum value. |
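The tie-breaking rule for argmax and argmin (the first row position wins) can be illustrated with a short plain-Python sketch. The functions below are hypothetical helpers that operate on a list of values in row order. They are not the BigQuery DataFrames implementation.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Position of the largest value; the earliest position wins on ties."""
    if not values:
        raise ValueError("argmax of an empty sequence")
    best = 0
    for position, value in enumerate(values):
        # Strict comparison keeps the earliest position when values tie.
        if value > values[best]:
            best = position
    return best


def argmin(values: Sequence[float]) -> int:
    """Position of the smallest value; the earliest position wins on ties."""
    if not values:
        raise ValueError("argmin of an empty sequence")
    best = 0
    for position, value in enumerate(values):
        if value < values[best]:
            best = position
    return best


calories = [100.0, 110.0, 120.0, 110.0]
print(argmax(calories))   # 2
print(argmin(calories))   # 0
print(argmax([1, 3, 3]))  # 1
```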
astype
astype(
dtype: typing.Union[
typing.Literal[
"boolean",
"Float64",
"Int64",
"string",
"string[pyarrow]",
"timestamp[us, tz=UTC][pyarrow]",
"timestamp[us][pyarrow]",
"date32[day][pyarrow]",
"time64[us][pyarrow]",
"decimal128(38, 9)[pyarrow]",
"decimal256(76, 38)[pyarrow]",
"binary[pyarrow]",
],
pandas.core.arrays.boolean.BooleanDtype,
pandas.core.arrays.floating.Float64Dtype,
pandas.core.arrays.integer.Int64Dtype,
pandas.core.arrays.string_.StringDtype,
pandas.core.dtypes.dtypes.ArrowDtype,
geopandas.array.GeometryDtype,
]
) -> bigframes.series.Series
Cast a pandas object to the specified dtype.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Create a DataFrame:
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = bpd.DataFrame(data=d)
>>> df.dtypes
col1 Int64
col2 Int64
dtype: object
Cast all columns to Float64:
>>> df.astype('Float64').dtypes
col1 Float64
col2 Float64
dtype: object
Create a series of type Int64:
>>> ser = bpd.Series([1, 2], dtype='Int64')
>>> ser
0 1
1 2
dtype: Int64
Convert to Float64 type:
>>> ser.astype('Float64')
0 1.0
1 2.0
dtype: Float64
Parameter | |
---|---|
Name | Description |
dtype |
str or pandas.ExtensionDtype
A dtype supported by BigQuery DataFrame include 'boolean','Float64','Int64', 'string', 'string[pyarrow]','timestamp[us, tz=UTC][pyarrow]', 'timestamp |
between
between(left, right, inclusive="both")
Return boolean Series equivalent to left <= series <= right.
This function returns a boolean vector containing True wherever the corresponding Series element is between the boundary values left and right. The pandas documentation treats NA values as False; in the examples below, NA inputs instead appear as <NA> in the result.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
Boundary values are included by default:
>>> s = bpd.Series([2, 0, 4, 8, np.nan])
>>> s.between(1, 4)
0 True
1 False
2 True
3 False
4 <NA>
dtype: boolean
With inclusive set to "neither" boundary values are excluded:
>>> s.between(1, 4, inclusive="neither")
0 True
1 False
2 False
3 False
4 <NA>
dtype: boolean
left and right can be any scalar value:
>>> s = bpd.Series(['Alice', 'Bob', 'Carol', 'Eve'])
>>> s.between('Anna', 'Daniel')
0 False
1 True
2 True
3 False
dtype: boolean
Parameters | |
---|---|
Name | Description |
left |
scalar or list-like
Left boundary. |
right |
scalar or list-like
Right boundary. |
inclusive |
{"both", "neither", "left", "right"}
Include boundaries. Whether to set each bound as closed or open. |
Returns | |
---|---|
Type | Description |
Series | Series representing whether each element is between left and right (inclusive). |
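The four inclusive options differ only in whether each boundary comparison is strict. A minimal per-element sketch follows. The between helper is hypothetical, and None stands in for <NA>, which is passed through to mirror the example output above.

```python
from typing import Optional


def between(value, left, right, inclusive: str = "both") -> Optional[bool]:
    """Check left <= value <= right, with each bound closed or open."""
    if inclusive not in ("both", "neither", "left", "right"):
        raise ValueError(f"unknown inclusive option: {inclusive!r}")
    if value is None:
        # Assumption: missing input stays missing, as in the examples.
        return None
    lower_ok = value >= left if inclusive in ("both", "left") else value > left
    upper_ok = value <= right if inclusive in ("both", "right") else value < right
    return lower_ok and upper_ok


values = [2, 0, 4, 8, None]
print([between(v, 1, 4) for v in values])
# [True, False, True, False, None]
print([between(v, 1, 4, inclusive="neither") for v in values])
# [True, False, False, False, None]
```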
bfill
bfill(*, limit: typing.Optional[int] = None) -> bigframes.series.Series
Fill NA/NaN values by using the next valid observation to fill the gap.
Returns | |
---|---|
Type | Description |
Series/DataFrame or None | Object with missing values filled. |
clip
clip(lower, upper)
Trim values at input threshold(s).
Assigns values outside the boundary to the boundary values. Thresholds can be single values or array-like; in the latter case the clipping is performed element-wise.
Parameters | |
---|---|
Name | Description |
lower |
float or array-like, default None
Minimum threshold value. All values below this threshold will be set to it. A missing threshold (e.g. NA) will not clip the value. |
upper |
float or array-like, default None
Maximum threshold value. All values above this threshold will be set to it. A missing threshold (e.g. NA) will not clip the value. |
Returns | |
---|---|
Type | Description |
Series | Series. |
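For scalar thresholds, clipping amounts to replacing each out-of-range value with the nearest bound, and a missing threshold leaves that side unbounded. A plain-Python sketch under those assumptions follows. The clip helper is hypothetical and handles scalar thresholds only, not array-like ones.

```python
from typing import List, Optional, Sequence


def clip(
    values: Sequence[float],
    lower: Optional[float] = None,
    upper: Optional[float] = None,
) -> List[float]:
    """Clamp each value into [lower, upper]; a None bound does not clip."""
    clipped = []
    for value in values:
        if lower is not None and value < lower:
            value = lower
        if upper is not None and value > upper:
            value = upper
        clipped.append(value)
    return clipped


print(clip([-5, 0, 5, 10], lower=0, upper=6))  # [0, 0, 5, 6]
print(clip([-5, 0, 5, 10], upper=6))           # [-5, 0, 5, 6]
```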
copy
copy() -> bigframes.series.Series
Make a copy of this object's indices and data.
A new object will be created with a copy of the calling object's data and indices. Modifications to the data or indices of the copy will not be reflected in the original object.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Modification in the original Series will not affect the copy Series:
>>> s = bpd.Series([1, 2], index=["a", "b"])
>>> s
a 1
b 2
dtype: Int64
>>> s_copy = s.copy()
>>> s_copy
a 1
b 2
dtype: Int64
>>> s.loc['b'] = 22
>>> s
a 1
b 22
dtype: Int64
>>> s_copy
a 1
b 2
dtype: Int64
Modification in the original DataFrame will not affect the copy DataFrame:
>>> df = bpd.DataFrame({'a': [1, 3], 'b': [2, 4]})
>>> df
a b
0 1 2
1 3 4
<BLANKLINE>
[2 rows x 2 columns]
>>> df_copy = df.copy()
>>> df_copy
a b
0 1 2
1 3 4
<BLANKLINE>
[2 rows x 2 columns]
>>> df.loc[df["b"] == 2, "b"] = 22
>>> df
a b
0 1 22.0
1 3 4.0
<BLANKLINE>
[2 rows x 2 columns]
>>> df_copy
a b
0 1 2
1 3 4
<BLANKLINE>
[2 rows x 2 columns]
corr
corr(other: bigframes.series.Series, method="pearson", min_periods=None) -> float
Compute the correlation with the other Series. Non-number values are ignored in the computation.
Uses the "Pearson" method of correlation. Numbers are converted to float before calculation, so the result may be unstable.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s1 = bpd.Series([.2, .0, .6, .2])
>>> s2 = bpd.Series([.3, .6, .0, .1])
>>> s1.corr(s2)
-0.8510644963469901
>>> s1 = bpd.Series([1, 2, 3], index=[0, 1, 2])
>>> s2 = bpd.Series([1, 2, 3], index=[2, 1, 0])
>>> s1.corr(s2)
-1.0
Parameters | |
---|---|
Name | Description |
other |
Series
The series with which this is to be correlated. |
method |
string, default "pearson"
Correlation method to use - currently only "pearson" is supported. |
min_periods |
int, default None
The minimum number of observations needed to return a result. Non-default values are not yet supported, so a result will be returned for at least two observations. |
Returns | |
---|---|
Type | Description |
float | Will return NaN if there are fewer than two numeric pairs, either series has a variance or covariance of zero, or any input value is infinite. |
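The Pearson coefficient reported in the first example can be reproduced by hand. The sketch below is a plain-Python illustration on already-aligned value lists. The pearson helper is hypothetical, and the NaN conditions it checks (fewer than two pairs, zero variance) are a simplified reading of the Returns description above.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equally long, already aligned sequences."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("inputs must have the same length")
    if n < 2:
        return math.nan
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and squares; the 1/n factors cancel out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


r = pearson([0.2, 0.0, 0.6, 0.2], [0.3, 0.6, 0.0, 0.1])
print(round(r, 6))  # -0.851064
```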
count
count() -> int
Return number of non-NA/null observations in the Series.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([0.0, 1.0, bpd.NA])
>>> s
0 0.0
1 1.0
2 <NA>
dtype: Float64
>>> s.count()
2
Returns | |
---|---|
Type | Description |
int or Series (if level specified) | Number of non-null values in the Series. |
cov
cov(other: bigframes.series.Series) -> float
Compute covariance with Series, excluding missing values.
The two Series objects are not required to be the same length and will be aligned internally before the covariance is calculated.
Parameter | |
---|---|
Name | Description |
other |
Series
Series with which to compute the covariance. |
Returns | |
---|---|
Type | Description |
float | Covariance between Series and other normalized by N-1 (unbiased estimator). |
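The N-1 normalization mentioned in the Returns table is the usual sample covariance. A small plain-Python sketch follows, assuming the two inputs are already aligned and free of missing values. The sample_cov helper is hypothetical.

```python
import math
from typing import Sequence


def sample_cov(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Unbiased sample covariance: centered products divided by N - 1."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("inputs must have the same length")
    if n < 2:
        # Assumption: a single pair has no defined sample covariance.
        return math.nan
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1)


print(sample_cov([1, 2, 3], [2, 4, 6]))  # 2.0
```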
cummax
cummax() -> bigframes.series.Series
Return cumulative maximum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative maximum.
Parameter | |
---|---|
Name | Description |
axis |
{0 or 'index', 1 or 'columns'}, default 0
The index or the name of the axis. 0 is equivalent to None or 'index'. For |
Returns | |
---|---|
Type | Description |
bigframes.series.Series | Return cumulative maximum of scalar or Series. |
cummin
cummin() -> bigframes.series.Series
Return cumulative minimum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative minimum.
Parameters | |
---|---|
Name | Description |
axis |
{0 or 'index', 1 or 'columns'}, default 0
The index or the name of the axis. 0 is equivalent to None or 'index'. For |
skipna |
bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA. |
Returns | |
---|---|
Type | Description |
bigframes.series.Series | Return cumulative minimum of scalar or Series. |
cumprod
cumprod() -> bigframes.series.Series
Return cumulative product over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative product.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([2, np.nan, 5, -1, 0])
>>> s
0 2.0
1 <NA>
2 5.0
3 -1.0
4 0.0
dtype: Float64
By default, NA values are ignored.
>>> s.cumprod()
0 2.0
1 <NA>
2 10.0
3 -10.0
4 0.0
dtype: Float64
Returns | |
---|---|
Type | Description |
bigframes.series.Series | Return cumulative product of scalar or Series. |
cumsum
cumsum() -> bigframes.series.Series
Return cumulative sum over a DataFrame or Series axis.
Returns a DataFrame or Series of the same size containing the cumulative sum.
Parameter | |
---|---|
Name | Description |
axis |
{0 or 'index', 1 or 'columns'}, default 0
The index or the name of the axis. 0 is equivalent to None or 'index'. For |
Returns | |
---|---|
Type | Description |
scalar or Series | Return cumulative sum of scalar or Series. |
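The four cumulative methods (cummax, cummin, cumprod, cumsum) share one pattern: carry a running value forward and combine it with each new element. The sketch below follows the NA handling visible in the cumprod example, where a missing element stays missing in the output but does not reset the running value. The cumulative helper is hypothetical and uses None for <NA>.

```python
import operator
from typing import Callable, List, Optional, Sequence


def cumulative(
    values: Sequence[Optional[float]],
    combine: Callable[[float, float], float],
) -> List[Optional[float]]:
    """Running reduction that skips None inputs but keeps their positions."""
    result: List[Optional[float]] = []
    running: Optional[float] = None
    for value in values:
        if value is None:
            # Missing input: emit missing, leave the running value untouched.
            result.append(None)
            continue
        running = value if running is None else combine(running, value)
        result.append(running)
    return result


data = [2, None, 5, -1, 0]
print(cumulative(data, operator.mul))  # [2, None, 10, -10, 0]
print(cumulative(data, operator.add))  # [2, None, 7, 6, 6]
print(cumulative(data, max))           # [2, None, 5, 5, 5]
print(cumulative(data, min))           # [2, None, 2, -1, -1]
```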
diff
diff(periods: int = 1) -> bigframes.series.Series
First discrete difference of element.
Calculates the difference of a Series element compared with another element in the Series (default is element in previous row).
Parameter | |
---|---|
Name | Description |
periods |
int, default 1
Periods to shift for calculating difference, accepts negative values. |
Returns | |
---|---|
Type | Description |
Series | First differences of the Series. |
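The effect of periods, including negative values, is easiest to see on a small list. In the sketch below, each output is the element minus the element periods rows earlier, and positions with no such partner are missing. The diff helper is hypothetical, uses None for <NA>, and follows the pandas convention that bigframes is assumed to mirror.

```python
from typing import List, Optional, Sequence


def diff(values: Sequence[Optional[float]], periods: int = 1) -> List[Optional[float]]:
    """values[i] - values[i - periods], or None when no partner exists."""
    n = len(values)
    result: List[Optional[float]] = []
    for i in range(n):
        j = i - periods
        if 0 <= j < n and values[i] is not None and values[j] is not None:
            result.append(values[i] - values[j])
        else:
            result.append(None)
    return result


fib = [1, 1, 2, 3, 5, 8]
print(diff(fib))              # [None, 0, 1, 1, 2, 3]
print(diff(fib, periods=3))   # [None, None, None, 2, 4, 6]
print(diff(fib, periods=-1))  # [0, -1, -1, -2, -3, None]
```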
div
div(other: float | int | bigframes.series.Series) -> bigframes.series.Series
API documentation for div method.
divide
divide(other: float | int | bigframes.series.Series) -> bigframes.series.Series
API documentation for divide method.
divmod
divmod(other) -> typing.Tuple[bigframes.series.Series, bigframes.series.Series]
Return integer division and modulo of Series and other, element-wise (binary operator divmod).
Equivalent to divmod(series, other).
Returns | |
---|---|
Type | Description |
2-Tuple of Series | The result of the operation. The result is always consistent with (floordiv, mod) (though pandas may not). |
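The "(floordiv, mod)" consistency mentioned above means the two returned Series satisfy quotient * other + remainder == value for each element. The sketch below illustrates this with Python's own floor-division semantics on a list. Treating BigQuery DataFrames as following the same sign convention is an assumption here, and series_divmod is a hypothetical helper that ignores missing values and division by zero.

```python
from typing import List, Sequence, Tuple


def series_divmod(values: Sequence[int], other: int) -> Tuple[List[int], List[int]]:
    """Element-wise (floordiv, mod) pair, computed with // and %."""
    quotients = [value // other for value in values]
    remainders = [value % other for value in values]
    return quotients, remainders


quotients, remainders = series_divmod([7, -7, 8], 3)
print(quotients)   # [2, -3, 2]
print(remainders)  # [1, 2, 2]
```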
dot
dot(other)
Compute the dot product between the Series and the columns of other.
This method computes the dot product between the Series and another one, or the Series and each column of a DataFrame, or the Series and each column of an array.
It can also be called using self @ other in Python >= 3.5.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([0, 1, 2, 3])
>>> other = bpd.Series([-1, 2, -3, 4])
>>> s.dot(other)
8
You can also use the operator @ for the dot product:
>>> s @ other
8
Parameter | |
---|---|
Name | Description |
other |
Series
The other object to compute the dot product with its columns. |
Returns | |
---|---|
Type | Description |
scalar, Series or numpy.ndarray | The dot product of the Series and other if other is a Series; a Series of the dot products of the Series with each column of other if other is a DataFrame; or a numpy.ndarray of the dot products of the Series with each column of the array if other is a numpy array. |
drop
drop(
labels: typing.Any = None,
*,
axis: typing.Union[int, str] = 0,
index: typing.Any = None,
columns: typing.Union[typing.Hashable, typing.Iterable[typing.Hashable]] = None,
level: typing.Optional[typing.Union[str, int]] = None
) -> bigframes.series.Series
Return Series with specified index labels removed.
Remove elements of a Series based on specifying the index labels. When using a multi-index, labels on different levels can be removed by specifying the level.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(data=np.arange(3), index=['A', 'B', 'C'])
>>> s
A 0
B 1
C 2
dtype: Int64
Drop labels B and C:
>>> s.drop(labels=['B', 'C'])
A 0
dtype: Int64
Drop 2nd level label in MultiIndex Series:
>>> import pandas as pd
>>> midx = pd.MultiIndex(levels=[['llama', 'cow', 'falcon'],
... ['speed', 'weight', 'length']],
... codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
... [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> s = bpd.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3],
... index=midx)
>>> s
llama speed 45.0
weight 200.0
length 1.2
cow speed 30.0
weight 250.0
length 1.5
falcon speed 320.0
weight 1.0
length 0.3
dtype: Float64
>>> s.drop(labels='weight', level=1)
llama speed 45.0
length 1.2
cow speed 30.0
length 1.5
falcon speed 320.0
length 0.3
dtype: Float64
Parameter | |
---|---|
Name | Description |
labels |
single label or list-like
Index labels to drop. |
Exceptions | |
---|---|
Type | Description |
KeyError | If none of the labels are found in the index. |
Returns | |
---|---|
Type | Description |
bigframes.series.Series | Series with specified index labels removed or None if inplace=True. |
drop_duplicates
drop_duplicates(*, keep: str = "first") -> bigframes.series.Series
Return Series with duplicate values removed.
Parameter | |
---|---|
Name | Description |
keep |
{'first', 'last',
Method to handle dropping duplicates: 'first' : Drop duplicates except for the first occurrence. 'last' : Drop duplicates except for the last occurrence. |
Returns | |
---|---|
Type | Description |
bigframes.series.Series | Series with duplicates dropped or None if inplace=True. |
droplevel
droplevel(
level: typing.Union[str, int, typing.Sequence[typing.Union[str, int]]],
axis: int | str = 0,
)
Return Series with requested index / column level(s) removed.
Parameters | |
---|---|
Name | Description |
level |
int, str, or list-like
If a string is given, must be the name of a level. If list-like, elements must be names or positional indexes of levels. |
axis |
{0 or 'index', 1 or 'columns'}, default 0
For |
dropna
dropna(
*,
axis: int = 0,
inplace: bool = False,
how: typing.Optional[str] = None,
ignore_index: bool = False
) -> bigframes.series.Series
Return a new Series with missing values removed.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
Drop NA values from a Series:
>>> ser = bpd.Series([1., 2., np.nan])
>>> ser
0 1.0
1 2.0
2 <NA>
dtype: Float64
>>> ser.dropna()
0 1.0
1 2.0
dtype: Float64
Empty strings are not considered NA values. None is considered an NA value.
>>> ser = bpd.Series(['2', bpd.NA, '', None, 'I stay'], dtype='object')
>>> ser
0 2
1 <NA>
2
3 <NA>
4 I stay
dtype: string
>>> ser.dropna()
0 2
2
4 I stay
dtype: string
Parameters | |
---|---|
Name | Description |
axis |
0 or 'index'
Unused. Parameter needed for compatibility with DataFrame. |
inplace |
bool, default False
Unsupported, do not set. |
how |
str, optional
Not in use. Kept for compatibility. |
Returns | |
---|---|
Type | Description |
Series | Series with NA entries dropped from it. |
duplicated
duplicated(keep: str = "first") -> bigframes.series.Series
Indicate duplicate Series values.
Duplicated values are indicated as True values in the resulting Series. Either all duplicates, all except the first or all except the last occurrence of duplicates can be indicated.
Parameter | |
---|---|
Name | Description |
keep |
{'first', 'last', False}, default 'first'
Method to handle dropping duplicates: 'first' : Mark duplicates as |
Returns | |
---|---|
Type | Description |
bigframes.series.Series | Series indicating whether each value has occurred in the preceding values. |
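The three keep options differ in which occurrence of a repeated value is left unmarked. The sketch below shows the flags each option produces for one input. The duplicated helper is hypothetical and follows the pandas meaning of 'first', 'last' and False, which BigQuery DataFrames is assumed to share.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Union


def duplicated(
    values: Sequence[Hashable], keep: Union[str, bool] = "first"
) -> List[bool]:
    """Flag repeated values, leaving the kept occurrence unflagged."""
    if keep is False:
        # Every value that occurs more than once is flagged.
        counts = Counter(values)
        return [counts[value] > 1 for value in values]
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first', 'last' or False")
    # For 'last', scan from the end so the final occurrence is seen first.
    ordered = list(values) if keep == "first" else list(reversed(values))
    seen = set()
    flags = []
    for value in ordered:
        flags.append(value in seen)
        seen.add(value)
    return flags if keep == "first" else flags[::-1]


animals = ["llama", "cow", "llama", "beetle", "llama"]
print(duplicated(animals))               # [False, False, True, False, True]
print(duplicated(animals, keep="last"))  # [True, False, True, False, False]
print(duplicated(animals, keep=False))   # [True, False, True, False, True]
```

Dropping the flagged rows gives the corresponding drop_duplicates result for the same keep value.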
eq
eq(other: object) -> bigframes.series.Series
Return equal of Series and other, element-wise (binary operator eq).
Equivalent to other == series, but with support to substitute a fill_value for missing data in either one of the inputs.
Returns | |
---|---|
Type | Description |
Series | The result of the operation. |
equals
equals(
other: typing.Union[bigframes.series.Series, bigframes.dataframe.DataFrame]
) -> bool
API documentation for equals method.
expanding
expanding(min_periods: int = 1) -> bigframes.core.window.Window
Provide expanding window calculations.
Parameter | |
---|---|
Name | Description |
min_periods |
int, default 1
Minimum number of observations in window required to have a value; otherwise, result is |
Returns | |
---|---|
Type | Description |
bigframes.core.window.Window | Expanding subclass. |
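An expanding window at row i covers every row from the start up to and including i, and min_periods controls how many observations are needed before a value is produced. A plain-Python sketch using a running sum as the window calculation follows. The expanding_sum helper is hypothetical, and returning None for too-small windows is an assumption modeled on pandas.

```python
from typing import List, Optional, Sequence


def expanding_sum(
    values: Sequence[Optional[float]], min_periods: int = 1
) -> List[Optional[float]]:
    """Sum of all non-missing values seen so far, once enough were seen."""
    result: List[Optional[float]] = []
    total = 0
    observed = 0
    for value in values:
        if value is not None:
            total += value
            observed += 1
        # Assumption: windows with too few observations yield None (<NA>).
        result.append(total if observed >= min_periods else None)
    return result


print(expanding_sum([1, 2, 3, 4]))                 # [1, 3, 6, 10]
print(expanding_sum([1, 2, 3, 4], min_periods=2))  # [None, 3, 6, 10]
```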
ffill
ffill(*, limit: typing.Optional[int] = None) -> bigframes.series.Series
Fill NA/NaN values by propagating the last valid observation to next valid.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame([[np.nan, 2, np.nan, 0],
... [3, 4, np.nan, 1],
... [np.nan, np.nan, np.nan, np.nan],
... [np.nan, 3, np.nan, 4]],
... columns=list("ABCD")).astype("Float64")
>>> df
A B C D
0 <NA> 2.0 <NA> 0.0
1 3.0 4.0 <NA> 1.0
2 <NA> <NA> <NA> <NA>
3 <NA> 3.0 <NA> 4.0
<BLANKLINE>
[4 rows x 4 columns]
Fill NA/NaN values in DataFrames:
>>> df.ffill()
A B C D
0 <NA> 2.0 <NA> 0.0
1 3.0 4.0 <NA> 1.0
2 3.0 4.0 <NA> 1.0
3 3.0 3.0 <NA> 4.0
<BLANKLINE>
[4 rows x 4 columns]
Fill NA/NaN values in Series:
>>> series = bpd.Series([1, np.nan, 2, 3])
>>> series.ffill()
0 1.0
1 1.0
2 2.0
3 3.0
dtype: Float64
Returns | |
---|---|
Type | Description |
Series/DataFrame or None | Object with missing values filled. |
fillna
fillna(value=None) -> bigframes.series.Series
Fill NA/NaN values using the specified method.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([np.nan, 2, np.nan, -1])
>>> s
0 <NA>
1 2.0
2 <NA>
3 -1.0
dtype: Float64
Replace all NA elements with 0s.
>>> s.fillna(0)
0 0.0
1 2.0
2 0.0
3 -1.0
dtype: Float64
You can use fill values from another Series:
>>> s_fill = bpd.Series([11, 22, 33])
>>> s.fillna(s_fill)
0 11.0
1 2.0
2 33.0
3 -1.0
dtype: Float64
Parameter | |
---|---|
Name | Description |
value |
scalar, dict, Series, or DataFrame, default None
Value to use to fill holes (e.g. 0). |
Returns | |
---|---|
Type | Description |
Series or None | Object with missing values filled or None. |
filter
filter(
items: typing.Optional[typing.Iterable] = None,
like: typing.Optional[str] = None,
regex: typing.Optional[str] = None,
axis: typing.Optional[typing.Union[str, int]] = None,
) -> bigframes.series.Series
Subset the dataframe rows or columns according to the specified index labels.
Note that this routine does not filter a dataframe on its contents. The filter is applied to the labels of the index.
Parameters | |
---|---|
Name | Description |
items |
list-like
Keep labels from axis which are in items. |
like |
str
Keep labels from axis for which "like in label == True". |
regex |
str (regular expression)
Keep labels from axis for which re.search(regex, label) == True. |
axis |
{0 or 'index', 1 or 'columns', None}, default None
The axis to filter on, expressed either as an index (int) or axis name (str). By default this is the info axis, 'columns' for DataFrame. For |
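The point that filtering acts on labels rather than contents can be sketched in plain Python. The helper below is hypothetical (`filter_labels` is not part of bigframes) and models a Series as a dict from label to value; it applies the three selection rules described above and never inspects the values.

```python
import re
from typing import Dict, Iterable, Optional


def filter_labels(
    data: Dict[str, int],
    items: Optional[Iterable[str]] = None,
    like: Optional[str] = None,
    regex: Optional[str] = None,
) -> Dict[str, int]:
    """Hypothetical helper: keep entries whose label matches the given rule."""
    if items is not None:
        wanted = set(items)
        return {k: v for k, v in data.items() if k in wanted}
    if like is not None:
        # "like in label == True"
        return {k: v for k, v in data.items() if like in k}
    if regex is not None:
        # "re.search(regex, label) == True"
        return {k: v for k, v in data.items() if re.search(regex, k)}
    # Simplification: with no rule given, return everything unchanged.
    return dict(data)


data = {"mouse": 1, "rabbit": 2, "bison": 3}
print(filter_labels(data, items=["mouse", "bison"]))  # {'mouse': 1, 'bison': 3}
print(filter_labels(data, like="bbi"))                # {'rabbit': 2}
print(filter_labels(data, regex="e$"))                # {'mouse': 1}
```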
floordiv
floordiv(other: float | int | bigframes.series.Series) -> bigframes.series.Series
Return integer division of Series and other, element-wise (binary operator floordiv).
Equivalent to series // other
, but with support to substitute a fill_value for
missing data in either one of the inputs.
Returns | |
---|---|
Type | Description |
bigframes.series.Series | The result of the operation. |
ge
ge(other) -> bigframes.series.Series
Get 'greater than or equal to' of Series and other, element-wise (binary operator >=
).
Equivalent to series >= other
, but with support to substitute a fill_value for
missing data in either one of the inputs.
Returns | |
---|---|
Type | Description |
bigframes.series.Series | The result of the operation. |
get
get(key, default=None)
Get item from object for given key (ex: DataFrame column).
Returns default value if not found.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame(
... [
... [24.3, 75.7, "high"],
... [31, 87.8, "high"],
... [22, 71.6, "medium"],
... [35, 95, "medium"],
... ],
... columns=["temp_celsius", "temp_fahrenheit", "windspeed"],
... index=["2014-02-12", "2014-02-13", "2014-02-14", "2014-02-15"],
... )
>>> df
temp_celsius temp_fahrenheit windspeed
2014-02-12 24.3 75.7 high
2014-02-13 31.0 87.8 high
2014-02-14 22.0 71.6 medium
2014-02-15 35.0 95.0 medium
<BLANKLINE>
[4 rows x 3 columns]
>>> df.get(["temp_celsius", "windspeed"])
temp_celsius windspeed
2014-02-12 24.3 high
2014-02-13 31.0 high
2014-02-14 22.0 medium
2014-02-15 35.0 medium
<BLANKLINE>
[4 rows x 2 columns]
>>> ser = df['windspeed']
>>> ser
2014-02-12 high
2014-02-13 high
2014-02-14 medium
2014-02-15 medium
Name: windspeed, dtype: string
>>> ser.get('2014-02-13')
'high'
If the key is not found, the default value will be used.
>>> df.get(["temp_celsius", "temp_kelvin"])
>>> df.get(["temp_celsius", "temp_kelvin"], default="default_value")
'default_value'
groupby
groupby(
by: typing.Union[
typing.Hashable,
bigframes.series.Series,
typing.Sequence[typing.Union[typing.Hashable, bigframes.series.Series]],
] = None,
axis=0,
level: typing.Optional[
typing.Union[int, str, typing.Sequence[int], typing.Sequence[str]]
] = None,
as_index: bool = True,
*,
dropna: bool = True
) -> bigframes.core.groupby.SeriesGroupBy
Group Series using a mapper or by a Series of columns.
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
You can group by a named index level.
>>> s = bpd.Series([380, 370., 24., 26.],
... index=["Falcon", "Falcon", "Parrot", "Parrot"],
... name="Max Speed")
>>> s.index.name="Animal"
>>> s
Animal
Falcon 380.0
Falcon 370.0
Parrot 24.0
Parrot 26.0
Name: Max Speed, dtype: Float64
>>> s.groupby("Animal").mean()
Animal
Falcon 375.0
Parrot 25.0
Name: Max Speed, dtype: Float64
You can also group by more than one index level.
>>> import pandas as pd
>>> s = bpd.Series([380, 370., 24., 26.],
... index=pd.MultiIndex.from_tuples(
... [("Falcon", "Clear"),
... ("Falcon", "Cloudy"),
... ("Parrot", "Clear"),
... ("Parrot", "Clear")],
... names=["Animal", "Sky"]),
... name="Max Speed")
>>> s
Animal Sky
Falcon Clear 380.0
Cloudy 370.0
Parrot Clear 24.0
Clear 26.0
Name: Max Speed, dtype: Float64
>>> s.groupby("Animal").mean()
Animal
Falcon 375.0
Parrot 25.0
Name: Max Speed, dtype: Float64
>>> s.groupby("Sky").mean()
Sky
Clear 143.333333
Cloudy 370.0
Name: Max Speed, dtype: Float64
>>> s.groupby(["Animal", "Sky"]).mean()
Animal Sky
Falcon Clear 380.0
Cloudy 370.0
Parrot Clear 25.0
Name: Max Speed, dtype: Float64
You can also group by values in a Series provided the index matches with the original series.
>>> df = bpd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
... 'Max Speed': [380., 370., 24., 26.],
... 'Age': [10., 20., 4., 6.]})
>>> df
Animal Max Speed Age
0 Falcon 380.0 10.0
1 Falcon 370.0 20.0
2 Parrot 24.0 4.0
3 Parrot 26.0 6.0
<BLANKLINE>
[4 rows x 3 columns]
>>> df['Max Speed'].groupby(df['Animal']).mean()
Animal
Falcon 375.0
Parrot 25.0
Name: Max Speed, dtype: Float64
>>> df['Age'].groupby(df['Animal']).max()
Animal
Falcon 20.0
Parrot 6.0
Name: Age, dtype: Float64
Parameters | |
---|---|
Name | Description |
by |
mapping, function, label, pd.Grouper or list of such, default None
Used to determine the groups for the groupby. If |
axis |
{0 or 'index', 1 or 'columns'}, default 0
Split along rows (0) or columns (1). For |
level |
int, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular level or levels. Do not specify both |
as_index |
bool, default True
Return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively "SQL-style" grouped output. This argument has no effect on filtrations (see the "filtrations in the user guide" |
Returns | |
---|---|
Type | Description |
bigframes.core.groupby.SeriesGroupBy | Returns a groupby object that contains information about the groups. |
gt
gt(other) -> bigframes.series.Series
Get 'greater than' of Series and other, element-wise (binary operator >
).
Equivalent to series > other
, but with support to substitute a fill_value for
missing data in either one of the inputs.
Returns | |
---|---|
Type | Description |
bigframes.series.Series | The result of the operation. |
head
head(n: int = 5) -> bigframes.series.Series
Return the first n
rows.
This function returns the first n
rows for the object based
on position. It is useful for quickly testing if your object
has the right type of data in it.
For negative values of n
, this function returns
all rows except the last |n|
rows, equivalent to df[:n]
.
If n is larger than the number of rows, this function returns all rows.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion',
... 'monkey', 'parrot', 'shark', 'whale', 'zebra']})
>>> df
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
5 parrot
6 shark
7 whale
8 zebra
<BLANKLINE>
[9 rows x 1 columns]
Viewing the first 5 lines:
>>> df.head()
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
<BLANKLINE>
[5 rows x 1 columns]
Viewing the first n
lines (three in this case):
>>> df.head(3)
animal
0 alligator
1 bee
2 falcon
<BLANKLINE>
[3 rows x 1 columns]
For negative values of n
:
>>> df.head(-3)
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
5 parrot
<BLANKLINE>
[6 rows x 1 columns]
Parameter | |
---|---|
Name | Description |
n |
int, default 5
Default 5. Number of rows to select. |
Returns | |
---|---|
Type | Description |
same type as caller | The first n rows of the caller object. |
idxmax
idxmax() -> typing.Hashable
Return the row label of the maximum value.
If multiple values equal the maximum, the first row label with that value is returned.
Returns | |
---|---|
Type | Description |
Index | Label of the maximum value. |
idxmin
idxmin() -> typing.Hashable
Return the row label of the minimum value.
If multiple values equal the minimum, the first row label with that value is returned.
Returns | |
---|---|
Type | Description |
Index | Label of the minimum value. |
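The tie-breaking rule for idxmax and idxmin (the first row label wins) can be illustrated with a small pure-Python sketch. The function name and the parallel `labels`/`values` lists are assumptions made for illustration; they are not bigframes APIs.

```python
from typing import Hashable, Sequence


def idx_of_extreme(
    labels: Sequence[Hashable], values: Sequence[float], largest: bool = True
) -> Hashable:
    """Hypothetical helper: label of the max (or min) value, first one on ties."""
    best = 0
    for i in range(1, len(values)):
        # Strict comparison, so an equal later value never replaces the first.
        better = values[i] > values[best] if largest else values[i] < values[best]
        if better:
            best = i
    return labels[best]


labels = ["a", "b", "c", "d"]
values = [3, 7, 7, 1]
print(idx_of_extreme(labels, values))                 # b
print(idx_of_extreme(labels, values, largest=False))  # d
```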
interpolate
interpolate(method: str = "linear") -> bigframes.series.Series
Fill NaN values using an interpolation method.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({
... 'A': [1, 2, 3, None, None, 6],
... 'B': [None, 6, None, 2, None, 3],
... }, index=[0, 0.1, 0.3, 0.7, 0.9, 1.0])
>>> df.interpolate()
A B
0.0 1.0 <NA>
0.1 2.0 6.0
0.3 3.0 4.0
0.7 4.0 2.0
0.9 5.0 2.5
1.0 6.0 3.0
<BLANKLINE>
[6 rows x 2 columns]
>>> df.interpolate(method="values")
A B
0.0 1.0 <NA>
0.1 2.0 6.0
0.3 3.0 4.666667
0.7 4.714286 2.0
0.9 5.571429 2.666667
1.0 6.0 3.0
<BLANKLINE>
[6 rows x 2 columns]
Parameter | |
---|---|
Name | Description |
method |
str, default 'linear'
Interpolation technique to use. Only 'linear' supported. 'linear': Ignore the index and treat the values as equally spaced. This is the only method supported on MultiIndexes. 'index', 'values': use the actual numerical values of the index. 'pad': Fill in NaNs using existing values. 'nearest', 'zero', 'slinear': Emulates |
Returns | |
---|---|
Type | Description |
Series | Returns the same object type as the caller, interpolated at some or all NaN values |
isin
isin(values) -> "Series" | None
Whether elements in Series are contained in values.
Return a boolean Series showing whether each element in the Series matches an element in the passed sequence of values exactly.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(['llama', 'cow', 'llama', 'beetle', 'llama',
... 'hippo'], name='animal')
>>> s
0 llama
1 cow
2 llama
3 beetle
4 llama
5 hippo
Name: animal, dtype: string
>>> s.isin(['cow', 'llama'])
0 True
1 True
2 True
3 False
4 True
5 False
Name: animal, dtype: boolean
Strings and integers are distinct and are therefore not comparable:
>>> bpd.Series([1]).isin(['1'])
0 False
dtype: boolean
>>> bpd.Series([1.1]).isin(['1.1'])
0 False
dtype: boolean
Parameter | |
---|---|
Name | Description |
values |
list-like
The sequence of values to test. Passing in a single string will raise a TypeError. Instead, turn a single string into a list of one element. |
Exceptions | |
---|---|
Type | Description |
TypeError | If input is not list-like. |
Returns | |
---|---|
Type | Description |
bigframes.series.Series | Series of booleans indicating if each element is in values. |
isna
isna() -> bigframes.series.Series
Detect missing values.
Return a boolean same-sized object indicating if the values are NA.
NA values get mapped to True values. Everything else gets mapped to
False values. Characters such as empty strings ''
or
numpy.inf
are not considered NA values.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> import numpy as np
>>> df = bpd.DataFrame(dict(
... age=[5, 6, np.nan],
... born=[bpd.NA, "1940-04-25", "1940-04-25"],
... name=['Alfred', 'Batman', ''],
... toy=[None, 'Batmobile', 'Joker'],
... ))
>>> df
age born name toy
0 5.0 <NA> Alfred <NA>
1 6.0 1940-04-25 Batman Batmobile
2 <NA> 1940-04-25 Joker
<BLANKLINE>
[3 rows x 4 columns]
Show which entries in a DataFrame are NA:
>>> df.isna()
age born name toy
0 False True False True
1 False False False False
2 True False False False
<BLANKLINE>
[3 rows x 4 columns]
>>> df.isnull()
age born name toy
0 False True False True
1 False False False False
2 True False False False
<BLANKLINE>
[3 rows x 4 columns]
Show which entries in a Series are NA:
>>> ser = bpd.Series([5, None, 6, np.nan, bpd.NA])
>>> ser
0 5.0
1 <NA>
2 6.0
3 <NA>
4 <NA>
dtype: Float64
>>> ser.isna()
0 False
1 True
2 False
3 True
4 True
dtype: boolean
>>> ser.isnull()
0 False
1 True
2 False
3 True
4 True
dtype: boolean
isnull
isnull() -> bigframes.series.Series
Detect missing values.
Return a boolean same-sized object indicating if the values are NA.
NA values get mapped to True values. Everything else gets mapped to
False values. Characters such as empty strings ''
or
numpy.inf
are not considered NA values.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> import numpy as np
>>> df = bpd.DataFrame(dict(
... age=[5, 6, np.nan],
... born=[bpd.NA, "1940-04-25", "1940-04-25"],
... name=['Alfred', 'Batman', ''],
... toy=[None, 'Batmobile', 'Joker'],
... ))
>>> df
age born name toy
0 5.0 <NA> Alfred <NA>
1 6.0 1940-04-25 Batman Batmobile
2 <NA> 1940-04-25 Joker
<BLANKLINE>
[3 rows x 4 columns]
Show which entries in a DataFrame are NA:
>>> df.isna()
age born name toy
0 False True False True
1 False False False False
2 True False False False
<BLANKLINE>
[3 rows x 4 columns]
>>> df.isnull()
age born name toy
0 False True False True
1 False False False False
2 True False False False
<BLANKLINE>
[3 rows x 4 columns]
Show which entries in a Series are NA:
>>> ser = bpd.Series([5, None, 6, np.nan, bpd.NA])
>>> ser
0 5.0
1 <NA>
2 6.0
3 <NA>
4 <NA>
dtype: Float64
>>> ser.isna()
0 False
1 True
2 False
3 True
4 True
dtype: boolean
>>> ser.isnull()
0 False
1 True
2 False
3 True
4 True
dtype: boolean
kurt
kurt()
Return unbiased kurtosis over requested axis.
Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.
Returns | |
---|---|
Type | Description |
scalar | Unbiased kurtosis over requested axis. |
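For readers who want to see the arithmetic, here is a minimal sketch of the usual bias-corrected sample excess kurtosis, which is what "unbiased" together with Fisher's definition conventionally denotes. The formula is an assumption about the definition, not something taken from the bigframes source, and `fisher_kurtosis` is a hypothetical name.

```python
from typing import Sequence


def fisher_kurtosis(values: Sequence[float]) -> float:
    """Hypothetical helper: bias-corrected excess kurtosis (normal -> 0.0)."""
    n = len(values)
    if n < 4:
        raise ValueError("at least 4 observations are required")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    m4 = sum((x - mean) ** 4 for x in values)  # sum of fourth-power deviations
    variance = m2 / (n - 1)                    # sample variance, normalized by N-1
    term = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * m4 / variance**2
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return term - correction


print(round(fisher_kurtosis([1, 2, 3, 4, 5]), 6))  # -1.2
```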
kurtosis
kurtosis()
API documentation for kurtosis
method.
le
le(other) -> bigframes.series.Series
Get 'less than or equal to' of Series and other, element-wise (binary operator <=
).
Equivalent to series <= other
, but with support to substitute a fill_value for
missing data in either one of the inputs.
Returns | |
---|---|
Type | Description |
bigframes.series.Series | The result of the comparison. |
lt
lt(other) -> bigframes.series.Series
Get 'less than' of Series and other, element-wise (binary operator <
).
Equivalent to series < other
, but with support to substitute a fill_value for
missing data in either one of the inputs.
Returns | |
---|---|
Type | Description |
bigframes.series.Series | The result of the operation. |
map
map(
arg: typing.Union[typing.Mapping, bigframes.series.Series],
na_action: typing.Optional[str] = None,
*,
verify_integrity: bool = False
) -> bigframes.series.Series
Map values of Series according to an input mapping or function.
Used for substituting each value in a Series with another value,
that may be derived from a remote function, dict
, or a Series
.
If arg is a remote function, the overhead for remote functions applies. If mapping with a dict, fully deferred computation is possible. If mapping with a Series, fully deferred computation is only possible if verify_integrity=False.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(['cat', 'dog', bpd.NA, 'rabbit'])
>>> s
0 cat
1 dog
2 <NA>
3 rabbit
dtype: string
map
can accept a dict
. Values that are not found in the dict
are
converted to NA
:
>>> s.map({'cat': 'kitten', 'dog': 'puppy'})
0 kitten
1 puppy
2 <NA>
3 <NA>
dtype: string
It also accepts a remote function:
>>> @bpd.remote_function([str], str)
... def my_mapper(val):
... vowels = ["a", "e", "i", "o", "u"]
... if val:
... return "".join([
... ch.upper() if ch in vowels else ch for ch in val
... ])
... return "N/A"
>>> s.map(my_mapper)
0 cAt
1 dOg
2 N/A
3 rAbbIt
dtype: string
Parameter | |
---|---|
Name | Description |
arg |
function, Mapping, Series
remote function, collections.abc.Mapping subclass or Series Mapping correspondence. |
Returns | |
---|---|
Type | Description |
Series | Same index as caller. |
mask
mask(cond, other=None) -> bigframes.series.Series
Replace values where the condition is True.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([10, 11, 12, 13, 14])
>>> s
0 10
1 11
2 12
3 13
4 14
dtype: Int64
You can mask the values in the Series based on a condition. Values matching the condition are masked. The condition can be provided in the form of a Series.
>>> s.mask(s % 2 == 0)
0 <NA>
1 11
2 <NA>
3 13
4 <NA>
dtype: Int64
You can specify a custom mask value.
>>> s.mask(s % 2 == 0, -1)
0 -1
1 11
2 -1
3 13
4 -1
dtype: Int64
>>> s.mask(s % 2 == 0, 100*s)
0 1000
1 11
2 1200
3 13
4 1400
dtype: Int64
You can also use a remote function to evaluate the mask condition. This is useful in situations such as the following, where the mask condition depends on complicated business logic that cannot be expressed in the form of a Series.
>>> @bpd.remote_function([str], bool, reuse=False)
... def should_mask(name):
... hash = 0
... for char_ in name:
... hash += ord(char_)
... return hash % 2 == 0
>>> s = bpd.Series(["Alice", "Bob", "Caroline"])
>>> s
0 Alice
1 Bob
2 Caroline
dtype: string
>>> s.mask(should_mask)
0 <NA>
1 Bob
2 Caroline
dtype: string
>>> s.mask(should_mask, "REDACTED")
0 REDACTED
1 Bob
2 Caroline
dtype: string
Simple vectorized lambdas or Python functions (i.e. those that only perform operations supported on a Series) can be used directly.
>>> nums = bpd.Series([1, 2, 3, 4], name="nums")
>>> nums
0 1
1 2
2 3
3 4
Name: nums, dtype: Int64
>>> nums.mask(lambda x: (x+1) % 2 == 1)
0 1
1 <NA>
2 3
3 <NA>
Name: nums, dtype: Int64
>>> def is_odd(num):
... return num % 2 == 1
>>> nums.mask(is_odd)
0 <NA>
1 2
2 <NA>
3 4
Name: nums, dtype: Int64
Parameters | |
---|---|
Name | Description |
cond |
bool Series/DataFrame, array-like, or callable
Where cond is False, keep the original value. Where True, replace with corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and should return boolean Series/DataFrame or array. The callable must not change input Series/DataFrame (though pandas doesn’t check it). |
other |
scalar, Series/DataFrame, or callable
Entries where cond is True are replaced with corresponding value from other. If other is callable, it is computed on the Series/DataFrame and should return scalar or Series/DataFrame. The callable must not change input Series/DataFrame (though pandas doesn’t check it). If not specified, entries will be filled with the corresponding NULL value (np.nan for numpy dtypes, pd.NA for extension dtypes). |
Returns | |
---|---|
Type | Description |
bigframes.series.Series | Series after the replacement. |
max
max() -> typing.Any
Return the maximum of the values over the requested axis.
If you want the index of the maximum, use idxmax
. This is the equivalent
of the numpy.ndarray
method argmax
.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Calculating the max of a Series:
>>> s = bpd.Series([1, 3])
>>> s
0 1
1 3
dtype: Int64
>>> s.max()
3
Calculating the max of a Series containing NA
values:
>>> s = bpd.Series([1, 3, bpd.NA])
>>> s
0 1.0
1 3.0
2 <NA>
dtype: Float64
>>> s.max()
3.0
Returns | |
---|---|
Type | Description |
scalar | Scalar. |
mean
mean() -> float
Return the mean of the values over the requested axis.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Calculating the mean of a Series:
>>> s = bpd.Series([1, 3])
>>> s
0 1
1 3
dtype: Int64
>>> s.mean()
2.0
Calculating the mean of a Series containing NA
values:
>>> s = bpd.Series([1, 3, bpd.NA])
>>> s
0 1.0
1 3.0
2 <NA>
dtype: Float64
>>> s.mean()
2.0
Returns | |
---|---|
Type | Description |
scalar | Scalar. |
median
median(*, exact: bool = False) -> float
Return the median of the values over the requested axis.
Parameter | |
---|---|
Name | Description |
exact |
bool, default False
Default False. Get the exact median instead of an approximate one. Note: |
Returns | |
---|---|
Type | Description |
scalar | Scalar. |
min
min() -> typing.Any
Return the minimum of the values over the requested axis.
If you want the index of the minimum, use idxmin
. This is the equivalent
of the numpy.ndarray
method argmin
.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Calculating the min of a Series:
>>> s = bpd.Series([1, 3])
>>> s
0 1
1 3
dtype: Int64
>>> s.min()
1
Calculating the min of a Series containing NA
values:
>>> s = bpd.Series([1, 3, bpd.NA])
>>> s
0 1.0
1 3.0
2 <NA>
dtype: Float64
>>> s.min()
1.0
Returns | |
---|---|
Type | Description |
scalar | Scalar. |
mod
mod(other) -> bigframes.series.Series
Return modulo of Series and other, element-wise (binary operator mod).
Equivalent to series % other
, but with support to substitute a fill_value for
missing data in either one of the inputs.
Returns | |
---|---|
Type | Description |
bigframes.series.Series | The result of the operation. |
mode
mode() -> bigframes.series.Series
Return the mode(s) of the Series.
The mode is the value that appears most often. There can be multiple modes.
Always returns Series even if only one value is returned.
Returns | |
---|---|
Type | Description |
bigframes.series.Series | Modes of the Series in sorted order. |
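The "multiple modes, sorted, always a collection" behaviour described above can be sketched with the standard library. This is an illustrative model only; `modes` is a hypothetical helper and it ignores missing values, which is an assumption.

```python
from collections import Counter
from typing import List, Optional, Sequence


def modes(values: Sequence[Optional[int]]) -> List[int]:
    """Hypothetical helper: every most-frequent value, in sorted order."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return []
    top = max(counts.values())
    # A list is returned even when there is a single mode.
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 4, 2, 4, 1]))  # [2, 4]
print(modes([3, 3, 1]))        # [3]
```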
mul
mul(other: float | int | bigframes.series.Series) -> bigframes.series.Series
Return multiplication of Series and other, element-wise (binary operator mul).
Equivalent to other * series
, but with support to substitute a fill_value for
missing data in either one of the inputs.
Returns | |
---|---|
Type | Description |
bigframes.series.Series | The result of the operation. |
multiply
multiply(other: float | int | bigframes.series.Series) -> bigframes.series.Series
API documentation for multiply
method.
ne
ne(other: object) -> bigframes.series.Series
Return not equal of Series and other, element-wise (binary operator ne).
Equivalent to other != series
, but with support to substitute a fill_value for
missing data in either one of the inputs.
Returns | |
---|---|
Type | Description |
bigframes.series.Series | The result of the operation. |
nlargest
nlargest(n: int = 5, keep: str = "first") -> bigframes.series.Series
Return the largest n
elements.
Parameters | |
---|---|
Name | Description |
n |
int, default 5
Return this many descending sorted values. |
keep |
{'first', 'last', 'all'}, default 'first'
When there are duplicate values that cannot all fit in a Series of |
Returns | |
---|---|
Type | Description |
bigframes.series.Series | The n largest values in the Series, sorted in decreasing order. |
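How `keep` interacts with ties can be sketched in plain Python. The helper below is hypothetical and modelled on the pandas semantics for `'first'` and `'all'` (an assumption about bigframes behaviour); it returns `(position, value)` pairs so that the surviving rows are visible.

```python
from typing import List, Sequence, Tuple


def n_largest(
    values: Sequence[float], n: int = 5, keep: str = "first"
) -> List[Tuple[int, float]]:
    """Hypothetical helper: the n largest values as (position, value) pairs."""
    # sorted() is stable, so equal values stay in order of appearance.
    ranked = sorted(enumerate(values), key=lambda pair: -pair[1])
    top = ranked[:n]
    if keep == "all" and top:
        cutoff = top[-1][1]
        # Also keep every remaining tie of the smallest selected value,
        # so the result may contain more than n rows.
        top += [pair for pair in ranked[n:] if pair[1] == cutoff]
    return top


values = [10, 30, 20, 30, 20]
print(n_largest(values, n=3))              # [(1, 30), (3, 30), (2, 20)]
print(n_largest(values, n=3, keep="all"))  # [(1, 30), (3, 30), (2, 20), (4, 20)]
```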
notna
notna() -> bigframes.series.Series
Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to True. Characters such as empty
strings ''
or numpy.inf
are not considered NA values.
NA values get mapped to False values.
Returns | |
---|---|
Type | Description |
NDFrame | Mask of bool values for each element that indicates whether an element is not an NA value. |
notnull
notnull() -> bigframes.series.Series
Detect existing (non-missing) values.
Return a boolean same-sized object indicating if the values are not NA.
Non-missing values get mapped to True. Characters such as empty
strings ''
or numpy.inf
are not considered NA values.
NA values get mapped to False values.
Returns | |
---|---|
Type | Description |
NDFrame | Mask of bool values for each element that indicates whether an element is not an NA value. |
nsmallest
nsmallest(n: int = 5, keep: str = "first") -> bigframes.series.Series
Return the smallest n
elements.
Parameters | |
---|---|
Name | Description |
n |
int, default 5
Return this many ascending sorted values. |
keep |
{'first', 'last', 'all'}, default 'first'
When there are duplicate values that cannot all fit in a Series of |
Returns | |
---|---|
Type | Description |
bigframes.series.Series | The n smallest values in the Series, sorted in increasing order. |
nunique
nunique() -> int
Return number of unique elements in the object.
Excludes NA values by default.
Returns | |
---|---|
Type | Description |
int | number of unique elements in the object. |
pad
pad(*, limit: typing.Optional[int] = None) -> bigframes.series.Series
API documentation for pad
method.
pct_change
pct_change(periods: int = 1) -> bigframes.series.Series
Fractional change between the current and a prior element.
Computes the fractional change from the immediately previous row by default. This is useful in comparing the fraction of change in a time series of elements.
Parameter | |
---|---|
Name | Description |
periods |
int, default 1
Periods to shift for forming percent change. |
Returns | |
---|---|
Type | Description |
Series or DataFrame | The same type as the calling object. |
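The computation can be written out as a short pure-Python sketch: each element is compared with the one `periods` rows earlier, and the first `periods` rows have no prior element. `pct_change` here is a standalone hypothetical function, and it assumes a positive `periods` and non-zero prior values.

```python
from typing import List, Optional, Sequence


def pct_change(values: Sequence[float], periods: int = 1) -> List[Optional[float]]:
    """Hypothetical helper: fractional change versus the value `periods` rows back."""
    out: List[Optional[float]] = []
    for i, current in enumerate(values):
        if i < periods:
            out.append(None)  # no prior element to compare against
        else:
            previous = values[i - periods]
            out.append((current - previous) / previous)
    return out


print(pct_change([100.0, 110.0, 99.0]))  # [None, 0.1, -0.1]
```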
pow
pow(other: float | int | bigframes.series.Series) -> bigframes.series.Series
Return exponential power of Series and other, element-wise (binary operator pow
).
Equivalent to series ** other
, but with support to substitute a fill_value for
missing data in either one of the inputs.
Returns | |
---|---|
Type | Description |
bigframes.series.Series | The result of the operation. |
prod
prod() -> float
Return the product of the values over the requested axis.
Returns | |
---|---|
Type | Description |
scalar | Scalar. |
product
product() -> float
API documentation for product
method.
radd
radd(other: float | int | bigframes.series.Series) -> bigframes.series.Series
Return addition of Series and other, element-wise (binary operator radd).
Equivalent to other + series
, but with support to substitute a fill_value for
missing data in either one of the inputs.
Returns | |
---|---|
Type | Description |
bigframes.series.Series | The result of the operation. |
rank
rank(
axis=0,
method: str = "average",
numeric_only=False,
na_option: str = "keep",
ascending: bool = True,
) -> bigframes.series.Series
Compute numerical data ranks (1 through n) along axis.
By default, equal values are assigned a rank that is the average of the ranks of those values.
Parameters | |
---|---|
Name | Description |
method |
{'average', 'min', 'max', 'first', 'dense'}, default 'average'
How to rank the group of records that have the same value (i.e. ties): |
numeric_only |
bool, default False
For DataFrame objects, rank only numeric columns if set to True. |
na_option |
{'keep', 'top', 'bottom'}, default 'keep'
How to rank NaN values: |
ascending |
bool, default True
Whether or not the elements should be ranked in ascending order. |
Returns | |
---|---|
Type | Description |
same type as caller | Return a Series or DataFrame with data ranks as values. |
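The tie-handling methods are easiest to compare side by side. The following sketch is a simplified pure-Python model, not the bigframes implementation: `rank_values` is a hypothetical helper and it omits NA handling (`na_option`) entirely.

```python
from typing import List, Sequence


def rank_values(
    values: Sequence[float], method: str = "average", ascending: bool = True
) -> List[float]:
    """Hypothetical helper: 1-based ranks with configurable tie handling."""
    # Stable sort of positions; ties keep their order of appearance.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=not ascending)
    ranks: List[float] = [0.0] * len(values)
    dense = 0
    pos = 0
    while pos < len(order):
        # Find the end of the run of equal values starting at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        dense += 1
        for offset, i in enumerate(order[pos : end + 1]):
            if method == "average":
                ranks[i] = (pos + 1 + end + 1) / 2
            elif method == "min":
                ranks[i] = pos + 1
            elif method == "max":
                ranks[i] = end + 1
            elif method == "first":
                ranks[i] = pos + 1 + offset
            elif method == "dense":
                ranks[i] = dense
            else:
                raise ValueError(f"unknown method: {method}")
        pos = end + 1
    return ranks


values = [7, 3, 7, 1]
for m in ("average", "min", "max", "first", "dense"):
    print(m, rank_values(values, method=m))
```

With the default `'average'` method the two tied values share rank 3.5, the mean of positions 3 and 4.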
rdiv
rdiv(other: float | int | bigframes.series.Series) -> bigframes.series.Series
API documentation for rdiv
method.
rdivmod
rdivmod(other) -> typing.Tuple[bigframes.series.Series, bigframes.series.Series]
Return integer division and modulo of Series and other, element-wise (binary operator rdivmod).
Equivalent to divmod(other, series).
Returns | |
---|---|
Type | Description |
2-Tuple of Series | The result of the operation. The result is always consistent with (rfloordiv, rmod) (though pandas may not). |
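The "reversed" operand order and the consistency with (rfloordiv, rmod) can be sketched with plain lists. The function below is a hypothetical standalone model that assumes integer inputs and non-zero Series values.

```python
from typing import List, Sequence, Tuple


def rdivmod(series_values: Sequence[int], other: int) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: element-wise divmod(other, value)."""
    # `other` is the dividend and each Series element is the divisor.
    quotients = [other // v for v in series_values]  # rfloordiv
    remainders = [other % v for v in series_values]  # rmod
    return quotients, remainders


print(rdivmod([2, 3, 4], 10))  # ([5, 3, 2], [0, 1, 2])
```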
reindex
reindex(index=None, *, validate: typing.Optional[bool] = None)
Conform Series to new index with optional filling logic.
Places NA/NaN in locations having no value in the previous index. A new object
is produced unless the new index is equivalent to the current one and
copy=False
.
Parameter | |
---|---|
Name | Description |
index |
array-like, optional
New labels for the index. Preferably an Index object to avoid duplicating data. |
Returns | |
---|---|
Type | Description |
Series | Series with changed index. |
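A minimal sketch of the conforming step, with a Series modelled as a dict from label to value: labels present in the old index keep their value, and labels that are new receive a missing value (`None` here, standing in for NA). `reindex` is a hypothetical standalone function, not the bigframes method.

```python
from typing import Dict, Hashable, Optional, Sequence


def reindex(
    data: Dict[Hashable, int], new_index: Sequence[Hashable]
) -> Dict[Hashable, Optional[int]]:
    """Hypothetical helper: conform `data` to `new_index`, filling gaps with None."""
    return {label: data.get(label) for label in new_index}


s = {"a": 1, "b": 2, "c": 3}
print(reindex(s, ["c", "a", "z"]))  # {'c': 3, 'a': 1, 'z': None}
```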
reindex_like
reindex_like(
other: bigframes.series.Series, *, validate: typing.Optional[bool] = None
)
Return an object with matching indices as other object.
Conform the object to the same index on all axes. Optional filling logic, placing Null in locations having no value in the previous index.
Parameter | |
---|---|
Name | Description |
other |
Object of the same data type
Its row and column indices are used to define the new indices of this object. |
Returns | |
---|---|
Type | Description |
Series or DataFrame | Same type as caller, but with changed indices on each axis. |
rename
rename(
index: typing.Union[typing.Hashable, typing.Mapping[typing.Any, typing.Any]] = None,
**kwargs
) -> bigframes.series.Series
Alter Series index labels or name.
Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is. Extra labels listed don't throw an error.
Alternatively, change Series.name
with a scalar value.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([1, 2, 3])
>>> s
0 1
1 2
2 3
dtype: Int64
You can change the Series name by specifying a string scalar:
>>> s.rename("my_name")
0 1
1 2
2 3
Name: my_name, dtype: Int64
You can change the labels by specifying a mapping:
>>> s.rename({1: 3, 2: 5})
0 1
3 2
5 3
dtype: Int64
Parameter | |
---|---|
Name | Description |
index |
scalar, hashable sequence, dict-like or function, optional
Functions or dict-like are transformations to apply to the index. Scalar or hashable sequence-like will alter the |
Returns | |
---|---|
Type | Description |
bigframes.series.Series | Series with index labels. |
rename_axis
rename_axis(
mapper: typing.Union[typing.Hashable, typing.Sequence[typing.Hashable]], **kwargs
) -> bigframes.series.Series
Set the name of the axis for the index or columns.
Parameter | |
---|---|
Name | Description |
mapper |
scalar, list-like, optional
Value to set the axis name attribute. |
Returns | |
---|---|
Type | Description |
bigframes.series.Series | Series with the name of the axis set. |
reorder_levels
reorder_levels(
order: typing.Union[str, int, typing.Sequence[typing.Union[str, int]]],
axis: int | str = 0,
)
Rearrange index levels using input order.
May not drop or duplicate levels.
Parameters | |
---|---|
Name | Description |
order |
list of int representing new level order
Reference level by number or key. |
axis |
{0 or 'index', 1 or 'columns'}, default 0
For |
replace
replace(to_replace: typing.Any, value: typing.Any = None, *, regex: bool = False)
Replace values given in to_replace
with value
.
Values of the Series/DataFrame are replaced with other values dynamically.
This differs from updating with .loc
or .iloc
, which require
you to specify a location to update with some value.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([1, 2, 3, 4, 5])
>>> s
0 1
1 2
2 3
3 4
4 5
dtype: Int64
>>> s.replace(1, 5)
0 5
1 2
2 3
3 4
4 5
dtype: Int64
You can replace a list of values:
>>> s.replace([1, 3, 5], -1)
0 -1
1 2
2 -1
3 4
4 -1
dtype: Int64
You can use a replacement mapping:
>>> s.replace({1: 5, 3: 10})
0 5
1 2
2 10
3 4
4 5
dtype: Int64
With a string Series you can use a simple string replacement or a regex replacement:
>>> s = bpd.Series(["Hello", "Another Hello"])
>>> s.replace("Hello", "Hi")
0 Hi
1 Another Hello
dtype: string
>>> s.replace("Hello", "Hi", regex=True)
0 Hi
1 Another Hi
dtype: string
>>> s.replace("^Hello", "Hi", regex=True)
0 Hi
1 Another Hello
dtype: string
>>> s.replace("Hello$", "Hi", regex=True)
0 Hi
1 Another Hi
dtype: string
>>> s.replace("[Hh]e", "__", regex=True)
0 __llo
1 Anot__r __llo
dtype: string
Parameters | |
---|---|
Name | Description |
to_replace |
str, regex, list, int, float or None
How to find the values that will be replaced. * numeric, str or regex: - numeric: numeric values equal to |
value |
scalar, default None
Value to replace any values matching |
regex |
bool, default False
Whether to interpret |
Exceptions | |
---|---|
Type | Description |
TypeError | * If to_replace is not a scalar, array-like, dict , or None * If to_replace is a dict and value is not a list , dict , ndarray , or Series * If to_replace is None and regex is not compilable into a regular expression or is a list, dict, ndarray, or Series. * When replacing multiple bool or datetime64 objects and the arguments to to_replace does not match the type of the value being replaced |
Returns | |
---|---|
Type | Description |
Series/DataFrame | Object after replacement. |
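The difference between literal and regex replacement shown in the string examples can be sketched in plain Python. This only illustrates the semantics for scalar string arguments; it is not how bigframes implements replace, and the helper name replace_strings is hypothetical.

```python
import re

def replace_strings(values, to_replace, value, regex=False):
    """Plain-Python sketch of scalar string replacement (hypothetical helper)."""
    if regex:
        # regex=True: substitute every match of the pattern inside each string.
        pattern = re.compile(to_replace)
        return [pattern.sub(value, v) for v in values]
    # regex=False: replace only elements that equal to_replace as a whole.
    return [value if v == to_replace else v for v in values]

data = ["Hello", "Another Hello"]
literal = replace_strings(data, "Hello", "Hi")
anywhere = replace_strings(data, "Hello", "Hi", regex=True)
anchored = replace_strings(data, "^Hello", "Hi", regex=True)
char_class = replace_strings(data, "[Hh]e", "__", regex=True)
```

With regex=False only whole-value matches change, which is why "Another Hello" is untouched in the first example above.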
reset_index
reset_index(
*, name: typing.Optional[str] = None, drop: bool = False
) -> bigframes.dataframe.DataFrame | bigframes.series.Series
Generate a new DataFrame or Series with the index reset.
This is useful when the index needs to be treated as a column, or when the index is meaningless and needs to be reset to the default before another operation.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([1, 2, 3, 4], name='foo',
... index=['a', 'b', 'c', 'd'])
>>> s.index.name = "idx"
>>> s
idx
a 1
b 2
c 3
d 4
Name: foo, dtype: Int64
Generate a DataFrame with default index.
>>> s.reset_index()
idx foo
0 a 1
1 b 2
2 c 3
3 d 4
<BLANKLINE>
[4 rows x 2 columns]
To specify the name of the new column, use the name parameter.
>>> s.reset_index(name="bar")
idx bar
0 a 1
1 b 2
2 c 3
3 d 4
<BLANKLINE>
[4 rows x 2 columns]
To generate a new Series with the default index, set drop=True.
>>> s.reset_index(drop=True)
0 1
1 2
2 3
3 4
Name: foo, dtype: Int64
Parameters | |
---|---|
Name | Description |
drop |
bool, default False
Just reset the index, without inserting it as a column in the new DataFrame. |
name |
object, optional
The name to use for the column containing the original Series values. Uses |
rfloordiv
rfloordiv(other: float | int | bigframes.series.Series) -> bigframes.series.Series
Return integer division of Series and other, element-wise (binary operator rfloordiv).
Equivalent to other // series, but with support to substitute a fill_value for missing data in either one of the inputs.
Returns | |
---|---|
Type | Description |
bigframes.series.Series | The result of the operation. |
rmod
rmod(other) -> bigframes.series.Series
Return modulo of Series and other, element-wise (binary operator rmod).
Equivalent to other % series, but with support to substitute a fill_value for missing data in either one of the inputs.
Returns | |
---|---|
Type | Description |
bigframes.series.Series | The result of the operation. |
rmul
rmul(other: float | int | bigframes.series.Series) -> bigframes.series.Series
Return multiplication of Series and other, element-wise (binary operator rmul).
Equivalent to other * series, but with support to substitute a fill_value for missing data in either one of the inputs.
Returns | |
---|---|
Type | Description |
Series | The result of the operation. |
rolling
rolling(window: int, min_periods=None) -> bigframes.core.window.Window
Provide rolling window calculations.
Parameters | |
---|---|
Name | Description |
window |
int, timedelta, str, offset, or BaseIndexer subclass
Size of the moving window. If an integer, the fixed number of observations used for each window. If a timedelta, str, or offset, the time period of each window. Each window will be a variable sized based on the observations included in the time-period. This is only valid for datetime-like indexes. To learn more about the offsets & frequency strings, please see |
min_periods |
int, default None
Minimum number of observations in window required to have a value; otherwise, result is |
Returns | |
---|---|
Type | Description |
bigframes.core.window.Window | Window subclass if a win_type is passed. Rolling subclass if win_type is not passed. |
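How window and min_periods interact can be sketched in plain Python with a rolling sum. This is an illustration only, not the bigframes implementation. It assumes that min_periods defaults to the window size for an integer window (the pandas convention) and uses None to stand for a missing result; the helper name rolling_sum is hypothetical.

```python
def rolling_sum(values, window, min_periods=None):
    """Plain-Python sketch of a fixed-size rolling sum (hypothetical helper)."""
    if min_periods is None:
        # Assumption: for an integer window the default equals the window size.
        min_periods = window
    out = []
    for i in range(len(values)):
        # The window covers the current row and up to window - 1 preceding rows.
        chunk = values[max(0, i - window + 1): i + 1]
        # Emit a value only when enough observations are available.
        out.append(sum(chunk) if len(chunk) >= min_periods else None)
    return out

strict = rolling_sum([1, 2, 3, 4], window=2)
lenient = rolling_sum([1, 2, 3, 4], window=2, min_periods=1)
```

Lowering min_periods fills in the leading rows, where fewer than window observations exist.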
round
round(decimals=0) -> bigframes.series.Series
Round each value in a Series to the given number of decimals.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([0.1, 1.3, 2.7])
>>> s.round()
0 0.0
1 1.0
2 3.0
dtype: Float64
>>> s = bpd.Series([0.123, 1.345, 2.789])
>>> s.round(decimals=2)
0 0.12
1 1.34
2 2.79
dtype: Float64
Parameter | |
---|---|
Name | Description |
decimals |
int, default 0
Number of decimal places to round to. If decimals is negative, it specifies the number of positions to the left of the decimal point. |
Returns | |
---|---|
Type | Description |
bigframes.series.Series | Rounded values of the Series. |
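The 1.34 result for 1.345 in the example above is consistent with how binary floating point stores that value. The sketch below uses Python's built-in round as a stand-in; how bigframes breaks exact ties is not stated here, so treat this as an illustration of float behaviour and of negative decimals only.

```python
from decimal import Decimal

# The literal 1.345 is stored as the nearest binary double,
# which lies slightly below the decimal value 1.345.
stored = Decimal(1.345)
below_tie = stored < Decimal("1.345")

# Because the stored value is below the midpoint, it rounds down.
rounded = round(1.345, 2)

# A negative number of decimals rounds to the left of the decimal point.
rounded_left = round(1234.5678, -2)
```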
rpow
rpow(other: float | int | bigframes.series.Series) -> bigframes.series.Series
Return exponential power of Series and other, element-wise (binary operator rpow).
Equivalent to other ** series, but with support to substitute a fill_value for missing data in either one of the inputs.
Returns | |
---|---|
Type | Description |
bigframes.series.Series | The result of the operation. |
rsub
rsub(other: float | int | bigframes.series.Series) -> bigframes.series.Series
Return subtraction of Series and other, element-wise (binary operator rsub).
Equivalent to other - series, but with support to substitute a fill_value for missing data in either one of the inputs.
Returns | |
---|---|
Type | Description |
bigframes.series.Series | The result of the operation. |
rtruediv
rtruediv(other: float | int | bigframes.series.Series) -> bigframes.series.Series
Return floating division of Series and other, element-wise (binary operator rtruediv).
Equivalent to other / series, but with support to substitute a fill_value for missing data in either one of the inputs.
Returns | |
---|---|
Type | Description |
bigframes.series.Series | The result of the operation. |
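The reflected ("r") operators such as rsub, rtruediv, rfloordiv and rpow place other on the left-hand side of the operator. A minimal plain-Python sketch of that operand order follows; it uses lists rather than bigframes objects, and the helper name reflected is hypothetical.

```python
import operator

def reflected(op, series_values, other):
    """Apply op(other, value) element-wise: other is the LEFT operand."""
    return [op(other, v) for v in series_values]

values = [1, 2, 4]
rsub_result = reflected(operator.sub, values, 10)           # 10 - value
rtruediv_result = reflected(operator.truediv, values, 8)    # 8 / value
rfloordiv_result = reflected(operator.floordiv, values, 9)  # 9 // value
rpow_result = reflected(operator.pow, values, 2)            # 2 ** value
```

For commutative operations such as multiplication, the reflected and non-reflected forms give the same values.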
sample
sample(
n: typing.Optional[int] = None,
frac: typing.Optional[float] = None,
*,
random_state: typing.Optional[int] = None
) -> bigframes.series.Series
Return a random sample of items from an axis of object.
You can use random_state for reproducibility.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'num_legs': [2, 4, 8, 0],
... 'num_wings': [2, 0, 0, 0],
... 'num_specimen_seen': [10, 2, 1, 8]},
... index=['falcon', 'dog', 'spider', 'fish'])
>>> df
num_legs num_wings num_specimen_seen
falcon 2 2 10
dog 4 0 2
spider 8 0 1
fish 0 0 8
<BLANKLINE>
[4 rows x 3 columns]
Fetch one random row from the DataFrame (note that we use random_state to ensure reproducibility of the examples):
>>> df.sample(random_state=1)
num_legs num_wings num_specimen_seen
dog 4 0 2
<BLANKLINE>
[1 rows x 3 columns]
A random 50% sample of the DataFrame:
>>> df.sample(frac=0.5, random_state=1)
num_legs num_wings num_specimen_seen
dog 4 0 2
fish 0 0 8
<BLANKLINE>
[2 rows x 3 columns]
Extract 3 random elements from the Series df['num_legs']:
>>> s = df['num_legs']
>>> s.sample(n=3, random_state=1)
dog 4
fish 0
spider 8
Name: num_legs, dtype: Int64
Parameters | |
---|---|
Name | Description |
n |
Optional[int], default None
Number of items from axis to return. Cannot be used with |
frac |
Optional[float], default None
Fraction of axis items to return. Cannot be used with |
random_state |
Optional[int], default None
Seed for random number generator. |
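The role of random_state can be sketched with Python's standard random module: the same seed yields the same sample. This is an illustration only. The helper sample_items is hypothetical, the selected elements will not match what bigframes returns for the same seed, and treating n and frac as mutually exclusive with a default of one item is an assumption based on the examples above.

```python
import random

def sample_items(items, n=None, frac=None, random_state=None):
    """Plain-Python sketch of seeded sampling (hypothetical helper)."""
    if n is not None and frac is not None:
        # Assumption: n and frac cannot be combined.
        raise ValueError("Pass either n or frac, not both.")
    if n is None:
        # Assumption: one item by default, otherwise a fraction of the length.
        n = 1 if frac is None else round(frac * len(items))
    # A dedicated generator seeded with random_state makes the draw repeatable.
    rng = random.Random(random_state)
    return rng.sample(list(items), n)

legs = [2, 4, 8, 0]
first = sample_items(legs, n=3, random_state=1)
second = sample_items(legs, n=3, random_state=1)
half = sample_items(legs, frac=0.5, random_state=1)
single = sample_items(legs, random_state=1)
```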
shift
shift(periods: int = 1) -> bigframes.series.Series
Shift index by desired number of periods.
Shifts the index without realigning the data.
Returns | |
---|---|
Type | Description |
NDFrame | Copy of input object, shifted. |
skew
skew()
Return unbiased skew over requested axis.
Normalized by N-1.
Returns | |
---|---|
Type | Description |
scalar | Scalar. |
sort_index
sort_index(
*, axis=0, ascending=True, na_position="last"
) -> bigframes.series.Series
Sort Series by index labels.
Returns a new Series sorted by label; the signature above takes no inplace argument.
Parameters | |
---|---|
Name | Description |
axis |
{0 or 'index'}
Unused. Parameter needed for compatibility with DataFrame. |
ascending |
bool or list-like of bools, default True
Sort ascending vs. descending. When the index is a MultiIndex the sort direction can be controlled for each level individually. |
na_position |
{'first', 'last'}, default 'last'
If 'first' puts NaNs at the beginning, 'last' puts NaNs at the end. Not implemented for MultiIndex. |
Returns | |
---|---|
Type | Description |
bigframes.series.Series | A new Series sorted by the index labels. |
sort_values
sort_values(
*, axis=0, ascending=True, kind: str = "quicksort", na_position="last"
) -> bigframes.series.Series
Sort by the values.
Sort a Series in ascending or descending order by some criterion.
Examples:
>>> import bigframes.pandas as bpd
>>> import numpy as np
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([np.nan, 1, 3, 10, 5])
>>> s
0 <NA>
1 1.0
2 3.0
3 10.0
4 5.0
dtype: Float64
Sort values in ascending order (the default behaviour):
>>> s.sort_values(ascending=True)
1 1.0
2 3.0
4 5.0
3 10.0
0 <NA>
dtype: Float64
Sort values in descending order:
>>> s.sort_values(ascending=False)
3 10.0
4 5.0
2 3.0
1 1.0
0 <NA>
dtype: Float64
Sort values putting NAs first:
>>> s.sort_values(na_position='first')
0 <NA>
1 1.0
2 3.0
4 5.0
3 10.0
dtype: Float64
Sort a series of strings:
>>> s = bpd.Series(['z', 'b', 'd', 'a', 'c'])
>>> s
0 z
1 b
2 d
3 a
4 c
dtype: string
>>> s.sort_values()
3 a
1 b
4 c
2 d
0 z
dtype: string
Parameters | |
---|---|
Name | Description |
axis |
0 or 'index'
Unused. Parameter needed for compatibility with DataFrame. |
ascending |
bool or list of bools, default True
If True, sort values in ascending order, otherwise descending. |
kind |
str, default to 'quicksort'
Choice of sorting algorithm. Accepts 'quicksort', 'mergesort', 'heapsort', 'stable'. Ignored except when determining whether to sort stably. 'mergesort' or 'stable' will result in a stable sort. |
na_position |
{'first' or 'last'}, default 'last'
Argument 'first' puts NaNs at the beginning, 'last' puts NaNs at the end. |
Returns | |
---|---|
Type | Description |
bigframes.series.Series | Series ordered by values. |
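The interaction of ascending and na_position can be sketched in plain Python: missing values are set aside, the remaining values are sorted, and the missing ones are attached at the requested end regardless of sort direction. This is an illustration rather than the bigframes implementation; None stands in for NA and the helper name sort_positions is hypothetical.

```python
def sort_positions(values, ascending=True, na_position="last"):
    """Return original positions in sorted order (hypothetical helper)."""
    present = [i for i, v in enumerate(values) if v is not None]
    missing = [i for i, v in enumerate(values) if v is None]
    # Only non-missing values take part in the comparison.
    present.sort(key=lambda i: values[i], reverse=not ascending)
    # Missing values go to one end, independent of ascending.
    return missing + present if na_position == "first" else present + missing

data = [None, 1, 3, 10, 5]
asc = sort_positions(data)
desc = sort_positions(data, ascending=False)
na_first = sort_positions(data, na_position="first")
```

The returned positions correspond to the index labels shown in the examples above.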
std
std() -> float
Return sample standard deviation over requested axis.
Normalized by N-1 by default.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'person_id': [0, 1, 2, 3],
... 'age': [21, 25, 62, 43],
... 'height': [1.61, 1.87, 1.49, 2.01]}
... ).set_index('person_id')
>>> df
age height
person_id
0 21 1.61
1 25 1.87
2 62 1.49
3 43 2.01
<BLANKLINE>
[4 rows x 2 columns]
>>> df.std()
age 18.786076
height 0.237417
dtype: Float64
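The N-1 normalization can be checked by hand with the standard library. The sketch below recomputes the values from the example above; it is a plain-Python illustration, not the bigframes code path.

```python
import math
import statistics

ages = [21, 25, 62, 43]
heights = [1.61, 1.87, 1.49, 2.01]

def sample_std(values):
    """Sample standard deviation: divide by N - 1, then take the square root."""
    mean = sum(values) / len(values)
    squared_dev = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_dev / (len(values) - 1))

age_std = sample_std(ages)
height_std = sample_std(heights)
# statistics.stdev uses the same N - 1 normalization.
age_std_check = statistics.stdev(ages)
```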
sub
sub(other: float | int | bigframes.series.Series) -> bigframes.series.Series
Return subtraction of Series and other, element-wise (binary operator sub).
Equivalent to series - other, but with support to substitute a fill_value for missing data in either one of the inputs.
Returns | |
---|---|
Type | Description |
bigframes.series.Series | The result of the operation. |
subtract
subtract(other: float | int | bigframes.series.Series) -> bigframes.series.Series
API documentation for subtract method.
sum
sum() -> float
Return the sum of the values over the requested axis.
This is equivalent to the method numpy.sum.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Calculating the sum of a Series:
>>> s = bpd.Series([1, 3])
>>> s
0 1
1 3
dtype: Int64
>>> s.sum()
4
Calculating the sum of a Series containing NA values:
>>> s = bpd.Series([1, 3, bpd.NA])
>>> s
0 1.0
1 3.0
2 <NA>
dtype: Float64
>>> s.sum()
4.0
Returns | |
---|---|
Type | Description |
scalar | Scalar. |
swaplevel
swaplevel(i: int = -2, j: int = -1)
Swap levels i and j in a MultiIndex.
Default is to swap the two innermost levels of the index.
Parameters | |
---|---|
Name | Description |
i |
int or str
Levels of the indices to be swapped. Can pass level name as string. |
j |
int or str
Levels of the indices to be swapped. Can pass level name as string. |
Returns | |
---|---|
Type | Description |
Series | Series with levels swapped in MultiIndex |
tail
tail(n: int = 5) -> bigframes.series.Series
Return the last n rows.
This function returns the last n rows from the object based on position. It is useful for quickly verifying data, for example, after sorting or appending rows.
For negative values of n, this function returns all rows except the first |n| rows, equivalent to df[|n|:].
If n is larger than the number of rows, this function returns all rows.
Parameter | |
---|---|
Name | Description |
n |
int, default 5
Number of rows to select. |
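The rules for positive, negative and oversized n can be sketched with list slicing. This is a plain-Python illustration of the described behaviour, not the bigframes implementation; the helper name tail_rows is hypothetical.

```python
def tail_rows(rows, n=5):
    """Plain-Python sketch of tail semantics (hypothetical helper)."""
    if n == 0:
        # rows[-0:] would return everything, so handle zero explicitly.
        return []
    if n < 0:
        # Negative n: drop the first |n| rows.
        return rows[abs(n):]
    # Positive n: keep the last n rows (all rows if n exceeds the length).
    return rows[-n:]

rows = [10, 20, 30, 40, 50, 60, 70]
last_five = tail_rows(rows)
skip_two = tail_rows(rows, -2)
everything = tail_rows(rows, 100)
```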
to_csv
to_csv(path_or_buf=None, **kwargs) -> typing.Optional[str]
Write object to a comma-separated values (csv) file.
Parameter | |
---|---|
Name | Description |
path_or_buf |
str, path object, file-like object, or None, default None
String, path object (implementing os.PathLike[str]), or file-like object implementing a write() function. If None, the result is returned as a string. If a non-binary file object is passed, it should be opened with |
Returns | |
---|---|
Type | Description |
None or str | If path_or_buf is None, returns the resulting csv format as a string. Otherwise returns None. |
to_dict
to_dict(into: type[dict] = <class 'dict'>) -> typing.Mapping
Convert Series to {label -> value} dict or dict-like object.
Parameter | |
---|---|
Name | Description |
into |
class, default dict
The collections.abc.Mapping subclass to use as the return object. Can be the actual class or an empty instance of the mapping type you want. If you want a collections.defaultdict, you must pass it initialized. |
Returns | |
---|---|
Type | Description |
collections.abc.Mapping | Key-value representation of Series. |
to_excel
to_excel(excel_writer, sheet_name="Sheet1", **kwargs) -> None
Write Series to an Excel sheet.
To write a single Series to an Excel .xlsx file it is only necessary to specify a target file name. To write to multiple sheets it is necessary to create an ExcelWriter object with a target file name, and specify a sheet in the file to write to.
Multiple sheets may be written to by specifying a unique sheet_name for each.
With all data written to the file it is necessary to save the changes.
Note that creating an ExcelWriter object with a file name that already exists will result in the contents of the existing file being erased.
Parameters | |
---|---|
Name | Description |
excel_writer |
path-like, file-like, or ExcelWriter object
File path or existing ExcelWriter. |
sheet_name |
str, default 'Sheet1'
Name of sheet to contain Series. |
to_frame
to_frame(name: typing.Hashable = None) -> bigframes.dataframe.DataFrame
Convert Series to DataFrame.
The column in the new dataframe will be named name (the keyword parameter) if the name parameter is provided and not None.
Returns | |
---|---|
Type | Description |
bigframes.dataframe.DataFrame | DataFrame representation of Series. |
to_json
to_json(
path_or_buf=None,
orient: typing.Literal[
"split", "records", "index", "columns", "values", "table"
] = "columns",
**kwargs
) -> typing.Optional[str]
Convert the object to a JSON string.
Note that NaN and None values are converted to null, and datetime objects are converted to UNIX timestamps.
Parameters | |
---|---|
Name | Description |
path_or_buf |
str, path object, file-like object, or None, default None
String, path object (implementing os.PathLike[str]), or file-like object implementing a write() function. If None, the result is returned as a string. |
orient |
{"split", "records", "index", "columns", "values", "table"}, default "columns"
Indication of expected JSON string format. 'split' : dict like {'index' -> [index], 'columns' -> [columns], 'data' -> [values]}; 'records' : list like [{column -> value}, ... , {column -> value}]; 'index' : dict like {index -> {column -> value}}; 'columns' : dict like {column -> {index -> value}}; 'values' : just the values array; 'table' : dict like {'schema': {schema}, 'data': {data}}. Describing the data, where data component is like |
Returns | |
---|---|
Type | Description |
None or str | If path_or_buf is None, returns the resulting json format as a string. Otherwise returns None. |
to_latex
to_latex(
buf=None, columns=None, header=True, index=True, **kwargs
) -> typing.Optional[str]
Render object to a LaTeX tabular, longtable, or nested table.
Parameters | |
---|---|
Name | Description |
buf |
str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string. |
columns |
list of label, optional
The subset of columns to write. Writes all columns by default. |
header |
bool or list of str, default True
Write out the column names. If a list of strings is given, it is assumed to be aliases for the column names. |
index |
bool, default True
Write row names (index). |
Returns | |
---|---|
Type | Description |
str or None | If buf is None, returns the result as a string. Otherwise returns None. |
to_list
to_list() -> list
Return a list of the values.
These are each a scalar type, which is a Python scalar (for str, int, float) or a pandas scalar (for Timestamp/Timedelta/Interval/Period).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([1, 2, 3])
>>> s
0 1
1 2
2 3
dtype: Int64
>>> s.to_list()
[1, 2, 3]
Returns | |
---|---|
Type | Description |
list | list of the values |
to_markdown
to_markdown(
buf: typing.Optional[typing.IO[str]] = None,
mode: str = "wt",
index: bool = True,
**kwargs
) -> typing.Optional[str]
Print Series in Markdown-friendly format.
Parameters | |
---|---|
Name | Description |
buf |
str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string. |
mode |
str, optional
Mode in which file is opened, "wt" by default. |
index |
bool, optional, default True
Add index (row) labels. |
Returns | |
---|---|
Type | Description |
str | Series in Markdown-friendly format. |
to_numpy
to_numpy(dtype=None, copy=False, na_value=None, **kwargs) -> numpy.ndarray
A NumPy ndarray representing the values in this Series or Index.
Parameters | |
---|---|
Name | Description |
dtype |
str or numpy.dtype, optional
The dtype to pass to |
copy |
bool, default False
Whether to ensure that the returned value is not a view on another array. Note that |
na_value |
Any, optional
The value to use for missing values. The default value depends on |
Returns | |
---|---|
Type | Description |
numpy.ndarray | A NumPy ndarray representing the values in this Series or Index. |
to_pandas
to_pandas(
max_download_size: typing.Optional[int] = None,
sampling_method: typing.Optional[str] = None,
random_state: typing.Optional[int] = None,
*,
ordered: bool = True
) -> pandas.core.series.Series
Writes Series to pandas Series.
Parameters | |
---|---|
Name | Description |
max_download_size |
int, default None
Download size threshold in MB. If max_download_size is exceeded when downloading data (e.g., to_pandas()), the data will be downsampled if bigframes.options.sampling.enable_downsampling is True, otherwise, an error will be raised. If set to a value other than None, this will supersede the global config. |
sampling_method |
str, default None
Downsampling algorithms to be chosen from, the choices are: "head": This algorithm returns a portion of the data from the beginning. It is fast and requires minimal computations to perform the downsampling; "uniform": This algorithm returns uniform random samples of the data. If set to a value other than None, this will supersede the global config. |
random_state |
int, default None
The seed for the uniform downsampling algorithm. If provided, the uniform method may take longer to execute and require more computation. If set to a value other than None, this will supersede the global config. |
ordered |
bool, default True
Determines whether the resulting pandas series will be deterministically ordered. In some cases, unordered may result in a faster-executing query. |
Returns | |
---|---|
Type | Description |
pandas.Series | A pandas Series with all rows of this Series if the data_sampling_threshold_mb is not exceeded; otherwise, a pandas Series with downsampled rows of the Series. |
to_pickle
to_pickle(path, **kwargs) -> None
Pickle (serialize) object to file.
Parameter | |
---|---|
Name | Description |
path |
str, path object, or file-like object
String, path object (implementing |
to_string
to_string(
buf=None,
na_rep="NaN",
float_format=None,
header=True,
index=True,
length=False,
dtype=False,
name=False,
max_rows=None,
min_rows=None,
) -> typing.Optional[str]
Render a string representation of the Series.
Parameters | |
---|---|
Name | Description |
buf |
StringIO-like, optional
Buffer to write to. |
na_rep |
str, optional
String representation of NaN to use, default 'NaN'. |
float_format |
one-parameter function, optional
Formatter function to apply to columns' elements if they are floats, default None. |
header |
bool, default True
Add the Series header (index name). |
index |
bool, optional
Add index (row) labels, default True. |
length |
bool, default False
Add the Series length. |
dtype |
bool, default False
Add the Series dtype. |
name |
bool, default False
Add the Series name if not None. |
max_rows |
int, optional
Maximum number of rows to show before truncating. If None, show all. |
min_rows |
int, optional
The number of rows to display in a truncated repr (when number of rows is above |
Returns | |
---|---|
Type | Description |
str or None | String representation of Series if buf=None , otherwise None. |
to_xarray
to_xarray()
Return an xarray object from the pandas object.
Returns | |
---|---|
Type | Description |
xarray.DataArray or xarray.Dataset | Data in the pandas structure converted to Dataset if the object is a DataFrame, or a DataArray if the object is a Series. |
tolist
tolist() -> list
Return a list of the values.
These are each a scalar type, which is a Python scalar (for str, int, float) or a pandas scalar (for Timestamp/Timedelta/Interval/Period).
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([1, 2, 3])
>>> s
0 1
1 2
2 3
dtype: Int64
>>> s.tolist()
[1, 2, 3]
Returns | |
---|---|
Type | Description |
list | list of the values |
transpose
transpose() -> bigframes.series.Series
Return the transpose, which is by definition self.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(['Ant', 'Bear', 'Cow'])
>>> s
0 Ant
1 Bear
2 Cow
dtype: string
>>> s.transpose()
0 Ant
1 Bear
2 Cow
dtype: string
Returns | |
---|---|
Type | Description |
Series | Series. |
truediv
truediv(other: float | int | bigframes.series.Series) -> bigframes.series.Series
Return floating division of Series and other, element-wise (binary operator truediv).
Equivalent to series / other, but with support to substitute a fill_value for missing data in either one of the inputs.
Returns | |
---|---|
Type | Description |
bigframes.series.Series | The result of the operation. |
unique
unique() -> bigframes.series.Series
Return unique values of Series object.
Uniques are returned in order of appearance. Hash table-based unique, therefore does NOT sort.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([2, 1, 3, 3], name='A')
>>> s
0 2
1 1
2 3
3 3
Name: A, dtype: Int64
>>> s.unique()
0 2
1 1
2 3
Name: A, dtype: Int64
Returns | |
---|---|
Type | Description |
Series | The unique values returned as a Series. |
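"Order of appearance" means the first occurrence of each value fixes its place in the result, and nothing is sorted. A plain-Python sketch of that behaviour follows; it is an illustration, not the bigframes implementation, and the helper name unique_in_order is hypothetical.

```python
def unique_in_order(values):
    """Keep the first occurrence of each value, preserving input order."""
    # dict keys are unique and remember insertion order.
    return list(dict.fromkeys(values))

uniques = unique_in_order([2, 1, 3, 3])
```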
unstack
unstack(
level: typing.Union[str, int, typing.Sequence[typing.Union[str, int]]] = -1
)
Unstack, also known as pivot, Series with MultiIndex to produce DataFrame.
Parameter | |
---|---|
Name | Description |
level |
int, str, or list of these, default last level
Level(s) to unstack, can pass level name. |
Returns | |
---|---|
Type | Description |
DataFrame | Unstacked Series. |
value_counts
value_counts(
normalize: bool = False,
sort: bool = True,
ascending: bool = False,
*,
dropna: bool = True
)
Return a Series containing counts of unique values.
The resulting object will be in descending order so that the first element is the most frequently-occurring element. Excludes NA values by default.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([3, 1, 2, 3, 4, bpd.NA], dtype="Int64")
>>> s
0 3
1 1
2 2
3 3
4 4
5 <NA>
dtype: Int64
value_counts sorts the result by counts in descending order by default:
>>> s.value_counts()
3 2
1 1
2 1
4 1
Name: count, dtype: Int64
You can normalize the counts to return relative frequencies by setting normalize=True:
>>> s.value_counts(normalize=True)
3 0.4
1 0.2
2 0.2
4 0.2
Name: proportion, dtype: Float64
You can get the values in ascending order of the counts by setting ascending=True:
>>> s.value_counts(ascending=True)
1 1
2 1
4 1
3 2
Name: count, dtype: Int64
You can include the counts of the NA values by setting dropna=False:
>>> s.value_counts(dropna=False)
3 2
1 1
2 1
4 1
<NA> 1
Name: count, dtype: Int64
Parameters | |
---|---|
Name | Description |
normalize |
bool, default False
If True then the object returned will contain the relative frequencies of the unique values. |
sort |
bool, default True
Sort by frequencies. |
ascending |
bool, default False
Sort in ascending order. |
dropna |
bool, default True
Don't include counts of NaN. |
Returns | |
---|---|
Type | Description |
Series | Series containing counts of unique values. |
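How normalize, ascending and dropna combine can be sketched with collections.Counter. This is a plain-Python illustration, not the bigframes implementation; None stands in for NA, the helper name count_values is hypothetical, and the order among values with equal counts is an artifact of this sketch.

```python
from collections import Counter

def count_values(values, normalize=False, ascending=False, dropna=True):
    """Plain-Python sketch of value_counts (hypothetical helper)."""
    # dropna=True removes missing values before counting.
    kept = [v for v in values if not (dropna and v is None)]
    counts = Counter(kept)
    # Sort by count; Python's sort is stable, so ties keep first-seen order.
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=not ascending)
    if normalize:
        # Relative frequency: count divided by the number of counted items.
        total = len(kept)
        return [(value, count / total) for value, count in ordered]
    return ordered

data = [3, 1, 2, 3, 4, None]
default = count_values(data)
relative = count_values(data, normalize=True)
with_na = count_values(data, dropna=False)
```

Note that with dropna=True the denominator for normalize excludes the missing value, which is why 3 has a proportion of 0.4 rather than one third.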
var
var() -> float
Return unbiased variance over requested axis.
Normalized by N-1 by default.
Returns | |
---|---|
Type | Description |
scalar or Series (if level specified) | Variance. |
where
where(cond, other=None)
Replace values where the condition is False.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series([10, 11, 12, 13, 14])
>>> s
0 10
1 11
2 12
3 13
4 14
dtype: Int64
You can filter the values in the Series based on a condition. The values matching the condition are kept, and those not matching are replaced. The default replacement value is NA.
>>> s.where(s % 2 == 0)
0 10
1 <NA>
2 12
3 <NA>
4 14
dtype: Int64
You can specify a custom replacement value for non-matching values.
>>> s.where(s % 2 == 0, -1)
0 10
1 -1
2 12
3 -1
4 14
dtype: Int64
>>> s.where(s % 2 == 0, 100*s)
0 10
1 1100
2 12
3 1300
4 14
dtype: Int64
Parameters | |
---|---|
Name | Description |
cond |
bool Series/DataFrame, array-like, or callable
Where cond is True, keep the original value. Where False, replace with corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and returns boolean Series/DataFrame or array. The callable must not change input Series/DataFrame (though pandas doesn’t check it). |
other |
scalar, Series/DataFrame, or callable
Entries where cond is False are replaced with corresponding value from other. If other is callable, it is computed on the Series/DataFrame and returns scalar or Series/DataFrame. The callable must not change input Series/DataFrame (though pandas doesn’t check it). If not specified, entries will be filled with the corresponding NULL value (np.nan for numpy dtypes, pd.NA for extension dtypes). |
Returns | |
---|---|
Type | Description |
bigframes.series.Series | Series after the replacement. |
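The keep-or-replace rule can be sketched element by element in plain Python. This is an illustration of the semantics for a scalar or list-like other, not the bigframes implementation; None stands in for NA and the helper name where_values is hypothetical.

```python
def where_values(values, cond, other=None):
    """Keep values where cond is True, otherwise take the replacement."""
    # Broadcast a scalar replacement so every position has a candidate.
    others = other if isinstance(other, list) else [other] * len(values)
    return [v if keep else o for v, keep, o in zip(values, cond, others)]

s = [10, 11, 12, 13, 14]
is_even = [v % 2 == 0 for v in s]

default_fill = where_values(s, is_even)
scalar_fill = where_values(s, is_even, -1)
aligned_fill = where_values(s, is_even, [100 * v for v in s])
```

When other is a sequence, the replacement at each position comes from the same position in other, which matches the 100*s example above.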