Changelog

1.11.0 (2024-07-01)

Features

  • Add .agg support for size (#792) (87e6018)

  • Add bigframes.bigquery.json_set (#782) (1b613e0)

  • Add bigframes.streaming.to_pubsub method to create continuous query that writes to Pub/Sub (#801) (b47f32d)

  • Add DataFrame.to_arrow to create Arrow Table from DataFrame (#807) (1e3feda)

  • Add PolynomialFeatures support to to_gbq and pipelines (#805) (57d98b9)

  • Add Series.peek to preview data efficiently (#727) (580e1b9)

  • Expose gcf memory param in remote_function (#803) (014765c)

  • More informative error when query plan too complex (#811) (136dc24)

Bug Fixes

  • Include internally required packages in remote_function hash (#799) (4b8fc15)

Documentation

  • Document dtype limitation on row processing remote_function (#800) (487dff6)

1.10.0 (2024-06-21)

Features

  • Add dataframe.insert (#770) (e8bab68)

  • Add groupby head API (#791) (44202bc)

  • Add ml.preprocessing.PolynomialFeatures class (#793) (b4fbb51)

  • bigframes.streaming module for continuous queries (#703) (0433a1c)

  • Include index columns in DataFrame.sql if they are named (#788) (c8d16c0)

Bug Fixes

  • Allow __repr__ to work with uninitialized DataFrame/Series/Index (#778) (e14c7a9)

  • df.loc with the 2nd input as bigframes boolean Series (#789) (a4ac82e)

  • Ensure numpy version matches in remote_function deployment (#798) (324d93c)

  • Fix temp table creation retries by now throwing if table already exists. (#787) (0e57d1f)

  • Self-join optimization doesn’t needlessly invalidate caching (#797) (1b96b80)

1.9.0 (2024-06-10)

Features

  • Allow functions returned from bpd.read_gbq_function to execute outside of apply (#706) (ad7d8ac)

  • Support bigquery.vector_search() (#736) (dad66fd)

  • Support score() in GeminiTextGenerator (#740) (b2c7d8b)

  • Support bytes type in remote_function (#761) (4915424)

  • Support fit() in GeminiTextGenerator (#758) (d751f5c)

Bug Fixes

  • ARIMAPlus loads auto_arima_min_order param (#752) (39d7013)

  • Improve to_pandas_batches for large results (#746) (61f18cb)

  • Resolve issue with unset thread-local options (#741) (d93dbaf)

Documentation

  • Fix ML.EVALUATE spelling (#749) (7899749)

  • Remove LogisticRegression normal_equation strategy (#753) (ea5d367)

1.8.0 (2024-05-31)

Features

  • merge only generates a default index if both inputs already have an index (#733) (25d049c)

  • Add +, - as unary ops, ^ binary op (#724) (968d825)

  • Add GroupBy.size() to get number of rows in each group (#479) (1fca588)

  • Add DataFrame ~ operator (#721) (354abc1)

  • Add GeminiText 1.5 Preview models (#737) (56cbd3b)

  • Add slot_millis and add stats to session object (#725) (72e9583)

  • Adds bigframes.bigquery.array_to_string to convert array elements to delimited strings (#731) (f12c906)

  • Allow functions decorated with bpd.remote_function() to execute locally (#704) (d850da6)

  • Ensure "bigframes-api" label is always set on jobs, even if the API is unknown (#722) (1832778)

  • Support ml.SimpleImputer in bigframes (#708) (4c4415f)

  • Support type annotations to supply input and output types to bpd.remote_function() decorator (#717) (4a12e3c)

  • Support type annotations with bpd.remote_function() and axis=1 (a preview feature) (#730) (e5a2992)

Bug Fixes

  • Correct index labels in multiple aggregations for DataFrameGroupBy (#723) (6a78c89)

  • Fix Null index assign series to column (#711) (ffb4b57)

  • Set the default of bpd.remote_function()'s input_types and output_types to None, so they can be omitted when type annotations are present (#729) (0e25a3b)

  • Warn and disable time travel for linked datasets (#712) (085fa9d)

Performance Improvements

  • Optimize dataframe-series alignment on axis=1 (#732) (3d39221)

Documentation

  • Add examples to DataFrameGroupBy and SeriesGroupBy (#701) (e7da0f0)

1.7.0 (2024-05-20)

Features

  • read_gbq_query supports filters (9386373)

  • read_gbq suggests a correct column name when one is not found (9386373)

  • Add DefaultIndexKind.NULL to use as index_col in read_gbq*, creating an indexless DataFrame/Series (#662) (29e4886)

  • bigframes.bigquery.array_agg(SeriesGroupBy|DataFrameGroupby) (#663) (412f28b)

  • to_datetime supports utc=False for string inputs (#579) (adf9889)

Bug Fixes

  • read_gbq_table respects primary keys even when filters are set (#689) (9386373)

  • Fix type error in test_cluster (#698) (14d81c1)

  • Improve escaping of literals and identifiers (#682) (da9b136)

  • Properly identify non-unique index in tables without primary keys (#699) (6e0f4d8)

  • Remove a usage of the resource package when not available, such as on Windows (#681) (96243f2)

  • The imported samples error and use peek() (#688) (1a0b744)

Performance Improvements

  • Don’t run query immediately from read_gbq_table if filters is set (9386373)

  • Use a LIMIT clause when max_results is set (9386373)

Documentation

  • Add code snippets for imported onnx tutorials (#684) (cb36e46)

  • Add code snippets for imported tensorflow model (#679) (b02c401)

  • Use class_weight="balanced" in the logistic regression prediction tutorial (#678) (b951549)

1.6.0 (2024-05-13)

Features

  • Add DataFrame.__delitem__ (#673) (2218c21)

  • Add Series.case_when() (#673) (2218c21)

  • Add strategy="quantile" in KBinsDiscretizer (#654) (c6c487f)

  • Add Series.combine (#680) (2fd1b81)

  • Series.str.split (#675) (6eb19a7)

  • Suggest correct options in bpd.options.bigquery.location (#666) (57ccabc)

  • Support axis=1 in df.apply for scalar outputs (#629) (f6bdc4a)

  • Support gcf vpc connector in remote_function (#677) (9ca92d0)

  • Warn with a more specific DefaultLocationWarning category when no location can be detected (#648) (e084e54)

Bug Fixes

  • Include index_col when selecting columns and filters in read_gbq_table (#648) (e084e54)

Dependencies

  • Add jellyfish as a dependency for spelling correction (57ccabc)

Documentation

  • Add code snippets for llm text generation (#669) (93416ed)

  • Add logistic regression samples (#673) (2218c21)

  • Address lint errors in code samples (#665) (4fc8964)

  • Document inlining of small data in read_* APIs (#670) (306953a)

1.5.0 (2024-05-07)

Features

  • bigframes.options and bigframes.option_context now use thread-local variables to prevent context managers in separate threads from affecting each other (#652) (651fd7d)

  • Add ARIMAPlus.coef_ property exposing ML.ARIMA_COEFFICIENTS functionality (#585) (81d1262)

  • Add a unique session_id to Session and allow cleaning up sessions (#553) (c8d4e23)

  • Add the bigframes.bigquery sub-package with a bigframes.bigquery.array_length function (#630) (9963f85)

  • Always do a query dry run when option.repr_mode == "deferred" (#652) (651fd7d)

  • Custom query labels for compute options (#638) (f561799)

  • Warn with DefaultIndexWarning from read_gbq on clustered/partitioned tables with no index_col or filters set (#631, #658) (2715d2b, 73064dd)

  • Support index_col=False in read_csv and engine="bigquery" (73064dd)

  • Support gcf max instance count in remote_function (#657) (36578ab)
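
The thread-local options entry in the list above can be pictured with a small standard-library sketch. This is not the bigframes implementation: the `get_option` and `option_context` helpers below are hypothetical stand-ins, and the option name and values are only illustrative. It shows why a context manager entered in one thread leaves other threads untouched when the state lives in `threading.local`.

```python
import threading
from contextlib import contextmanager

# Hypothetical stand-in for an options store; NOT the bigframes implementation.
_local = threading.local()


def get_option(name, default=None):
    """Read an option for the current thread only."""
    return getattr(_local, name, default)


@contextmanager
def option_context(name, value):
    """Temporarily set an option for the current thread, then restore it."""
    missing = object()
    previous = getattr(_local, name, missing)
    setattr(_local, name, value)
    try:
        yield
    finally:
        if previous is missing:
            delattr(_local, name)
        else:
            setattr(_local, name, previous)


seen = {}


def worker():
    # Runs while the main thread is still inside option_context below.
    seen["worker"] = get_option("repr_mode", "head")


with option_context("repr_mode", "deferred"):
    seen["main"] = get_option("repr_mode", "head")
    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()

seen["after"] = get_option("repr_mode", "head")
print(seen)
```

Only the main thread observes the temporary value; the worker thread and the code after the `with` block both fall back to the default.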

Bug Fixes

  • Don’t raise UnknownLocationWarning for US or EU multi-regions (#653) (8e4616b)

  • Fix bug with na in the column labels in stack (#659) (4a34293)

  • Use explicit session in PaLM2TextGenerator (#651) (e4f13c3)

Documentation

  • Add python code sample for multiple forecasting time series (#531) (16866d2)

  • Fix the Palm2TextGenerator output token size (#649) (c67e501)

1.4.0 (2024-04-29)

Features

  • Add .cache() method to persist intermediate dataframe (#626) (a5c94ec)

  • Add transpose support for small homogeneously typed DataFrames. (#621) (054075d)

  • Allow single input type in remote_function (#641) (3aa643f)

  • Expose gcf max timeout in remote_function (#639) (dfeaad0)

  • Series binary ops compatible with more types (#618) (518d315)

  • Support the score method for PaLM2TextGenerator (#634) (3ffc1d2)

Bug Fixes

Performance Improvements

  • Automatically condense internal expression representation (#516) (03c1b0d)

  • Cache transpose to allow performant retranspose (#635) (44b738d)

Documentation

  • Add supported pandas apis on the main page (#628) (8d2a51c)

  • Add the first sample for the Single time-series forecasting from Google Analytics data tutorial (#623) (2b84c4f)

  • Address more technical writers’ feedback (#640) (1e7793c)

1.3.0 (2024-04-22)

Features

  • Add Series.struct.dtypes property (#599) (d924ec2)

  • Add fine tuning fit() for Palm2TextGenerator (#616) (9c106bd)

  • Add quantile statistic (#613) (bc82804)

  • Expose max_batching_rows in remote_function (#622) (240a1ac)

  • Support primary key(s) in read_gbq by using as the index_col by default (#625) (75bb240)

  • Warn if location is set to unknown location (#609) (3706b4f)

Bug Fixes

  • Address technical writers' feedback (#611) (9f8f181)

  • Infer narrowest numeric type when combining numeric columns (#602) (8f9ece6)

  • Use exact median implementation by default (#619) (9d205ae)

Documentation

  • Fix rendering of examples for multiple apis (#620) (9665e39)

  • Set index_cols in read_gbq as a best practice (#624) (70015b7)

1.2.0 (2024-04-15)

Features

Bug Fixes

  • Address more technical writers' feedback (#581) (4b08d92)

  • Error for object dtype on read_pandas (#570) (8702dcf)

  • Inverting int now does bitwise inversion rather than sign flip (#574) (5f1db8b)

  • Loc setitem dtype issue. (#603) (b94bae9)

  • Toc menu missing plotting name (#591) (eed12c1)
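
The integer inversion fix in the list above is easier to picture with plain Python integers. The sketch below does not use BigQuery DataFrames, and the variable names are illustrative. It only contrasts bitwise inversion, which the entry describes as the new behavior, with a sign flip, which it describes as the old one.

```python
# Plain-Python illustration (not bigframes code): ~x is bitwise NOT, which
# for two's-complement integers equals -x - 1 rather than -x.
values = [0, 1, 5, -3]

bitwise_inverted = [~v for v in values]  # bitwise inversion
sign_flipped = [-v for v in values]      # sign flip, for comparison

print(bitwise_inverted)
print(sign_flipped)
```

The two results differ by exactly one for every input, so code that relied on the earlier sign-flip behavior would see different values after upgrading.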

Documentation

1.1.0 (2024-04-04)

Features

  • (Series|DataFrame).explode (#556) (9e32f57)

  • Add DataFrame.eval and DataFrame.query (#361) (5e28ebd)

  • Add ColumnTransformer save/load (#541) (9d8cf67)

  • Add ml.metrics.mean_squared_error (#559) (853c25e)

  • Add support for numpy expm1, log1p, floor, ceil, arctan2 ops (#505) (e8e66cf)

  • Add transformers save/load (#552) (d805241)

  • Allow DataFrame binary ops to align on either axis and with loc… (#544) (6d8f3af)

  • Expose DataFrame.bqclient to assist in integrations (#519) (0be8911)

  • read_pandas accepts pandas Series and Index objects (#573) (f8821fe)

  • Support ML.GENERATE_EMBEDDING in PaLM2TextEmbeddingGenerator (#539) (1156c1e)

  • Support max_columns in repr and make repr more efficient (#515) (54e49cf)

Bug Fixes

  • Assign NaN scalar to column error. (#513) (0a4153c)

  • Don’t download 100gb onto local python machine in load test (#537) (082c58b)

  • Exclude list-like s parameter in plot.scatter (#568) (1caac27)

  • Fix case where df.peek would fail to execute even with force=True (#511) (8eca99a)

  • Fix error in Series.drop(0) (#575) (75dd786)

  • Include all names in MultiIndex repr (#564) (b188146)

  • plot.scatter s parameter cannot accept float-like column (#563) (8d39187)

  • Product operation produces float result for all input types (#501) (6873b30)

  • Reloaded transformer .transform error (#569) (39fe474)

  • Rename PaLM2TextEmbeddingGenerator.predict output columns to be backward compatible (#561) (4995c00)

  • Respect hard stack size limit and swallow limit change exception. (#558) (4833908)

  • Restore string to date/time type coercion (#565) (4ae0262)

  • Sync the notebook with embedding changes (#550) (347f2dd)

  • Use bytes limit on frame inlining rather than element count (#576) (659a161)

Performance Improvements

  • Add multi-query execution capability for complex dataframes (#427) (d2d7e33)

Dependencies

Documentation

  • bigframes.options.bigquery.project and location are optional in some circumstances (#548) (90bcec5)

  • Add “Supported pandas APIs” reference to the documentation (#542) (74c3915)

  • Add General Availability banner to README (#507) (262ff59)

  • Add operations in API docs (#557) (ea95761)

  • Add progress_bar code sample (#508) (92a1af3)

  • Add the code samples for metrics{auc, roc_auc_score, roc_curve} (#520) (5f37b09)

  • Address more comments from technical writers to meet legal purposes (#571) (9084df3)

  • Fix docs of ARIMAPlus.predict (#512) (3b80f95)

  • Include Index in table-of-contents (#564) (b188146)

  • Mark Gemini model as Pre-GA (#543) (769868b)

  • Migrate the overview page to Bigframes official landing page (#536) (a0fb8bb)

1.0.0 (2024-03-25)

⚠ BREAKING CHANGES

  • rename model parameter min_rel_progress to tol

  • early_stop setting no longer supported, always uses True

  • rename model parameter n_parallell_trees to n_estimators

  • rename class_weights to class_weight

  • rename learn_rate to learning_rate

  • PCA n_components supports float value and None, default to None

  • rename various ml model parameters for consistency with sklearn (https://github.com/googleapis/python-bigquery-dataframes/pull/491)
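
One way to carry older keyword arguments across the renames listed above is a small translation table. The helper below is hypothetical and not part of bigframes. The old names are spelled exactly as in the entries above, and `max_iterations` in the usage line is only an example of a parameter that passes through unchanged.

```python
# Hypothetical migration helper (not part of bigframes): translate pre-1.0
# keyword arguments to the names listed in the breaking changes above.
RENAMED = {
    "min_rel_progress": "tol",
    "n_parallell_trees": "n_estimators",  # spelled as in the entry above
    "class_weights": "class_weight",
    "learn_rate": "learning_rate",
}
REMOVED = {"early_stop"}  # no longer supported; always behaves as True


def migrate_kwargs(old_kwargs):
    """Return a copy of old_kwargs using the 1.0.0 parameter names."""
    new_kwargs = {}
    for name, value in old_kwargs.items():
        if name in REMOVED:
            continue  # drop settings that 1.0.0 no longer accepts
        new_kwargs[RENAMED.get(name, name)] = value
    return new_kwargs


migrated = migrate_kwargs(
    {"learn_rate": 0.1, "early_stop": False, "max_iterations": 20}
)
print(migrated)
```

Dropping `early_stop` silently is a design choice of this sketch. Code that previously passed `early_stop=False` will behave differently after the upgrade, so a real migration might prefer to raise or warn instead.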

Features

Bug Fixes

  • early_stop setting no longer supported, always uses True (65c6f47)

  • Fix -1 offset lookups failing (#463) (2dfb9c2)

  • plot.scatter c argument functionalities (#494) (d6ee994)

  • Properly support format param for numerical input. (#486) (ae20c35)

  • Re-enable to_csv and to_json related tests (#468) (2b9a01d)

  • Sampling plot cannot preserve ordering if index is not ordered (#475) (a5345fe)

  • Use actual BigQuery types rather than ibis types in to_pandas (#500) (82b4f91)

Dependencies

Documentation

  • Add code samples for metrics.{accuracy_score, confusion_matrix} (#478) (3e3329a)

  • Add code samples for metrics.{recall_score, precision_score, f1_score} (#502) (370fe90)

  • Improve API documentation (#489) (751266e)

  • Update bigquery connection documentation (#499) (4bfe094)

  • Update LLM + K-means notebook to handle partial failures (#496) (97afad9)

0.26.0 (2024-03-20)

⚠ BREAKING CHANGES

  • exclude remote models for .register() (#465)

Features

  • (Series|DataFrame).plot (#438) (1c3e668)

  • read_gbq_table supports LIKE as an operator in filters (#454) (d2d425a)

  • Add DataFrame.pipe() method (#421) (95f5a6e)

  • Set force=True by default in DataFrame.peek() (#469) (4e8e97d)

  • Support datetime related casting in (Series|DataFrame|Index).astype (#442) (fde339b)

  • Support Series.dt.strftime (#453) (8f6e955)

Bug Fixes

  • Any() on empty set now correctly returns False (#471) (f55680c)

  • df.drop_na preserves columns dtype (#457) (3bab1a9)

  • Disable to_json and to_csv related tests (#462) (874026d)

  • Exclude remote models for .register() (#465) (73fe0f8)

  • Fix broken link in covid notebook (#450) (adadb06)

  • Fix broken multiindex loc cases (#467) (b519197)

  • Fix grouping series on multiple other series (#455) (3971bd2)

  • Groupby aggregates no longer check if grouping keys are numeric (#472) (4fbf938)

  • Raise ValueError when read_pandas() receives a bigframes DataFrame (#447) (b28f9fd)

  • Series.(to_csv|to_json) leverages bq export (#452) (718a00c)

  • Warn when read_gbq / read_gbq_table uses the snapshot time cache (#441) (e16a8c0)
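
The Any() fix in the list above follows the usual convention for reductions over no elements. Python's built-ins follow the same convention, so a plain-Python sketch (not bigframes code) is enough to show the expected results.

```python
# Plain-Python illustration: a logical OR over no elements is False,
# and a logical AND over no elements is True.
empty = []

any_of_empty = any(empty)
all_of_empty = all(empty)

print(any_of_empty)
print(all_of_empty)
```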

Documentation

  • Add code samples for ml.metrics.r2_score (#459) (85fefa2)

  • Add the docs for loc and iloc indexers (#446) (14ab8d8)

  • Add the pages for at and iat indexers (#456) (340f0b5)

  • Add version information to bug template (#437) (91bd39e)

  • Indicate that project and location are optional in example notebooks (#451) (1df0140)

0.25.0 (2024-03-14)

Features

  • (Series|DataFrame).plot.(line|area|scatter) (#431) (0772510)

  • Support CMEK for remote_function cloud functions (#430) (2fd69f4)

0.24.0 (2024-03-12)

⚠ BREAKING CHANGES

  • read_parquet uses a “pandas” engine to parse files by default. Use engine="bigquery" for the previous behavior

Features

Bug Fixes

  • Move third_party.bigframes_vendored to bigframes_vendored (#424) (763edeb)

  • Only do row identity based joins when joining by index (#356) (76b252f)

  • read_pandas inline respects location (#412) (ae0e3ea)

Documentation

  • Add predict sample to samples/snippets/bqml_getting_started_test.py (#388) (6a3b0cc)

  • Document minimum IAM requirement (#416) (36173b0)

  • Fix the note rendering for DataFrames methods: nlargest, nsmallest (#417) (38bd2ba)

0.23.0 (2024-03-05)

Features

  • Add ml.metrics.pairwise.euclidean_distance (#397) (1726588)

  • Add TextEmbedding model version support (#394) (e0f1ab0)

Bug Fixes

  • Code exception in remote_function now prevents retry and surfaces in the client (#387) (dd3643d)

  • Docs link for metrics.pairwise (#400) (a60aba7)

Dependencies

  • Update ibis to version 8.0.0 and refactor remote_function to use ibis UDF method (#277) (350499b)

Documentation

  • Update README to point to new summary pages (#402) (bfe2b23)

0.22.0 (2024-02-27)

⚠ BREAKING CHANGES

  • rename cosine_similarity to paired_cosine_distances (#393)

  • move model optional args to kwargs (#381)

Features

  • Add DataFrame.corr() method (#379) (67fd434)

  • Add ml.metrics.pairwise.manhattan_distance (#392) (9d31865)

  • Enable regional endpoints for me-central2 (#386) (469674d)

Bug Fixes

  • Avoid ibis warning for “database” table() method argument (#390) (a0490a4)

  • Correct the numeric literal dtype (#365) (93b02cd)

  • Rename cosine_similarity to paired_cosine_distances (#393) (81ece46)

Performance Improvements

Dependencies

  • Add minimum version constraint for sqlglot to 19.9.0 (#389) (8b62d77)

Documentation

  • Add a code sample for creating a kmeans model (#267) (4291d65)

  • Fix bigframes.pandas.concat documentation (#382) (234b61c)

Miscellaneous Chores

Code Refactoring

0.21.0 (2024-02-13)

Features

  • Add Series.cov method (#368) (443db22)

  • Add ml.llm.GeminiTextGenerator model (#370) (de1e0a4)

  • Add ml.metrics.pairwise.cosine_similarity function (#374) (126f566)

  • Add XGBoostModel (#363) (d5518b2)

  • Limited support of lambdas in Series.apply (#345) (208e081)

  • Support bigframes.pandas.to_datetime for scalars, iterables and series. (#372) (ffb0d15)

  • Support read_gbq wildcard table path (#377) (90caf86)

Bug Fixes

Documentation

  • Clarify ADC pre-auth in a non-interactive environment (#348) (99a9e6e)

0.20.1 (2024-02-06)

Performance Improvements

  • Make repr cache the block where appropriate (#350) (068879f)

Documentation

  • Add a sample to demonstrate the evaluation results (#364) (cff0919)

  • Fix the DataFrame.apply code sample (#366) (1866a26)

0.20.0 (2024-01-30)

Features

  • Add DataFrame.peek() as an efficient alternative to head() results preview (#318) (9c34d83)

  • Add ARIMA_EVALUATE options in forecasting models (#336) (73e997b)

  • Add Index constructor, repr, copy, get_level_values, to_series (#334) (e5d054e)

  • Improve error message for drive based BQ table reads (#344) (0794788)

  • Update cut to work without labels = False and show intervals as dict (#335) (4ff53db)

Bug Fixes

  • Change default connection name in getting_started.ipynb (#347) (677f014)

  • Series iteration correctly returns values instead of index (#339) (2c6af9b)

Documentation

  • Add code samples for Series.{between, cumprod} (#353) (09a52fd)

0.19.2 (2024-01-22)

Bug Fixes

  • read_gbq large response issue (#332) (b8178b9)

  • Use object dtype for ARRAY columns in to_pandas() with pandas 1.x (#329) (374ddb5)

Documentation

0.19.1 (2024-01-17)

Bug Fixes

  • Handle multi-level columns for df aggregates properly (#305) (5bb45ba)

  • Update max_output_token limitation. (#308) (5cccd36)

Documentation

0.19.0 (2024-01-09)

Features

  • Add ‘columns’ as an alias for ‘col_order’ (#298) (a01b271)

  • Add Series dt.tz and dt.unit properties (#303) (2e1a403)

  • Add to_gbq() method for LLM models (#299) (dafbc1b)

  • Allow manually set clustering_columns in dataframe.to_gbq (#302) (9c21323)

  • Support assigning to columns like a property (#304) (f645c56)

  • Support upcasting numeric columns in concat (#294) (e3a056a)

Bug Fixes

  • DF.drop tuple input as multi-index (#301) (21391a9)

  • Fix bug converting non-string labels to sql ids (#296) (a61c5fe)

Documentation

  • Add code samples for Series.ffill and DataFrame.ffill (#307) (1c63b45)

0.18.0 (2024-01-02)

Features

  • Add dataframe.to_html (#259) (2cd6489)

  • Add IntervalIndex support to bigframes.pandas.cut (#254) (6c1969a)

  • Add replace method to DataFrame (#261) (5092215)

  • Specific pyarrow mappings for decimal, bytes types (#283) (a1c0631)

Bug Fixes

  • Dataframes to_gbq now creates dataset if it doesn’t exist (#222) (bac62f7)

  • Exclude pandas 2.2.0rc0 to unblock prerelease tests (#292) (ac1a745)

  • Fix DataFrameGroupby.agg() issue with as_index=False (#273) (ab49350)

  • Make Series.str.replace work for simple strings (#285) (ad67465)

  • Update dataframe.to_gbq to dedup column names. (#286) (746115d)

  • Use setuptools.find_namespace_packages (#246) (9ec352a)

Dependencies

  • Migrate to ibis-framework >= "7.1.0" (#53) (9798a2b)

Documentation

  • Add code snippets for explore query result page (#278) (7cbbb7d)

  • Code samples for astype common to DataFrame and Series (#280) (95b673a)

  • Code samples for DataFrame.copy and Series.copy (#290) (7cbc2b0)

  • Code samples for drop and fillna (#284) (9c5012e)

  • Code samples for isna, isnull, dropna, isin (#289) (ad51035)

  • Code samples for rename, size (#293) (eb69f60)

  • Code samples for reset_index and sort_values (#282) (acc0eb7)

  • Code samples for sample, get, Series.round (#295) (c2b1892)

  • Code samples for Series.{add, replace, unique, T, transpose} (#287) (0e1bbfc)

  • Code samples for Series.{map, to_list, count} (#290) (7cbc2b0)

  • Code samples for Series.{name, std, agg} (#293) (eb69f60)

  • Code samples for Series.groupby and Series.{sum,mean,min,max} (#280) (95b673a)

  • Code samples for DataFrame set_index, items (#295) (c2b1892)

  • Fix the rendering for get_dummies (#291) (252f3a2)

0.17.0 (2023-12-14)

Features

  • Add filters argument to read_gbq for enhanced data querying (#198) (034f71f)

  • Add module/class level api tracking (#272) (4f3db3d)

  • Deprecate use_regional_endpoints (#199) (319a1f2)

Bug Fixes

  • Increase recursion limit, cache compilation tree hashes (#184) (b54791c)

  • Replaced raise NotImplementedError with return NotImplemented (#258) (a133822)

Documentation

  • Add code samples for values and value_counts (#249) (f247d95)

  • Add sample for getting started with BQML (#141) (fb14f54)

0.16.0 (2023-12-12)

Features

  • Add ARIMAPlus.predict parameters (#264) (99598c7)

  • Add DataFrame from_dict and from_records methods (#244) (8d81e24)

  • Add DataFrame.select_dtypes method (#242) (1737acc)

  • Add nunique method to Series/DataFrameGroupby (#256) (c8ec245)

  • Support dataframe.loc with conditional columns selection (#233) (3febea9)

Bug Fixes

  • Enforce pandas version requirement <2.1.4 (#265) (9dd63f6)

  • Exclude pandas 2.1.4 from prerelease tests to unblock e2e tests (b02fc2c)

  • Fix value_counts column label for normalize=True (#245) (d3fa6f2)

  • Migrate e2e tests to bigframes-load-testing project (8766ac6)

  • ml.sql logic (#262) (68c6fdf)

  • Update the llm_kmeans notebook (#247) (66d1839)

Documentation

  • Add code samples for shape and head (#257) (5bdcc65)

  • Add example for dataframe.melt, dataframe.pivot, dataframe.stac… (#252) (8c63697)

  • Add example to dataframe.nlargest, dataframe.nsmallest, datafra… (#234) (e735412)

  • Add examples for dataframe.cummin, dataframe.cummax, dataframe.cumsum, dataframe.cumprod (#243) (0523a31)

  • Add examples for dataframe.nunique, dataframe.diff, dataframe.a… (#251) (77074ec)

  • Correct the docs for option_context (#263) (d21c6dd)

  • Correct the params rendering for ml.remote and ml.ensemble modules (#248) (c2829e3)

  • Fix return annotation in API docstrings (#253) (89a1c67)

0.15.0 (2023-11-29)

⚠ BREAKING CHANGES

  • model.predict returns all the columns (#204)

Features

  • Add info and memory_usage methods to dataframe (#219) (9d6613d)

  • Add remote vertex model support (#237) (0bfc4fb)

  • Add the recent api method for ML component (#225) (ed8876d)

  • Model.predict returns all the columns (#204) (416171a)

  • Send warnings on LLM prediction partial failures (#216) (81125f9)

Bug Fixes

  • Add df snapshots lookup for read_gbq (#229) (d0d9b84)

  • Avoid unnecessary row_number() on sort key for io (#211) (a18d40e)

  • Dedup special character (#209) (dd78acb)

  • Invalid JSON type of the notebook (#215) (a729831)

  • Make to_pandas override enable_downsampling when sampling_method is manually set. (#200) (ae03756)

  • Polish the llm+kmeans notebook (#208) (e8532b1)

  • Update the llm+kmeans notebook with recent change (#236) (f8917ab)

  • Use anonymous dataset to create remote_function (#205) (69b016e)

Documentation

  • Add code samples for index and column properties (#212) (c88d38e)

  • Add code samples for df reshaping, function, merge, and join methods (#203) (010486c)

  • Add examples for dataframe.kurt, dataframe.std, dataframe.count (#232) (f9c6e72)

  • Add examples for dataframe.mean, dataframe.median, dataframe.va… (#228) (edd0522)

  • Add examples for dataframe.min, dataframe.max and dataframe.sum (#227) (3a375e8)

  • Code samples for Series.dot and DataFrame.dot (#226) (b62a07a)

  • Code samples for Series.where and Series.mask (#217) (52dfad2)

  • Code samples for dataframe.any, dataframe.all and dataframe.prod (#223) (d7957fa)

  • Make the code samples reflect default bq connection usage (#206) (71844b0)

Miscellaneous Chores

0.14.1 (2023-11-16)

Bug Fixes

  • Correctly handle null values when initializing fingerprint ordering (#210) (8324f13)

Documentation

  • Add an example notebook about line graphs (#197) (f957b27)

0.14.0 (2023-11-14)

Features

  • Add ‘cross’ join support (#176) (765446a)

  • Add ‘index’, ‘pad’, ‘nearest’ interpolate methods (#162) (6a28403)

  • Add series.sample (identical to existing dataframe.sample) (#187) (37914a4)

  • Add unordered sql compilation (#156) (58f420c)

  • Log most recent API calls as recent-bigframes-api-xx labels on BigQuery jobs (#145) (4ea33b7)

  • read_gbq creates order deterministically without table copy (#191) (8ab81de)

  • Support date_series.astype("string[pyarrow]") to cast DATE to STRING (#186) (aee0e8e)

  • Support series.at[row_label] = scalar (#173) (0c8bd33)

  • Temporary resources no longer use BigQuery Sessions (#194) (4a02cac)

Bug Fixes

  • All sort operations are now stable (#195) (3a2761f)

  • Default to 7 days expiration for read_csv, read_json, read_parquet (#193) (03606cd)

  • Deprecate the remote_service_type in llm model (#180) (a8a409a)

  • For reset_index on unnamed multiindex, always use level_[n] label (#182) (f95000d)

  • Match pandas behavior when assigning listlike to empty dfs (#172) (c1d1f42)

  • Use anonymous dataset instead of session dataset for temp tables (#181) (800d44e)

  • Use random table for read_pandas (#192) (741c75e)

  • Use random table when loading data for read_csv, read_json, read_parquet (#175) (9d2e6dc)
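
The stable-sort entry in the list above refers to a general property: rows that compare equal on the sort key keep their original relative order. Python's built-in `sorted` is also stable, so the property can be sketched without bigframes. The sample rows are made up for illustration.

```python
from operator import itemgetter

# Plain-Python illustration of a stable sort: ties on the key keep the
# order in which the rows originally appeared.
rows = [("b", 2), ("a", 1), ("c", 2), ("d", 1)]

by_value = sorted(rows, key=itemgetter(1))
print(by_value)
```

Within each group of equal keys, "a" still precedes "d" and "b" still precedes "c", matching their order in the input.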

Documentation

  • Add code samples for read_gbq_function using community UDFs (#188) (7506eab)

  • Add docstring code samples for Series.apply and DataFrame.map (#185) (c816d84)

  • Add llm kmeans notebook as an included example (#177) (d49ae42)

  • Use head() to get top n results, not to preview results (#190) (87f84c9)

0.13.0 (2023-11-07)

Features

  • to_gbq without a destination table writes to a temporary table (#158) (e1817c9)

  • Add DataFrame.__iter__, DataFrame.iterrows, DataFrame.itertuples, and DataFrame.keys methods (#164) (c065071)

  • Add Series.__iter__ method (#164) (c065071)

  • Add interpolate() to series and dataframe (#157) (b9cb55c)

  • Support 32k text-generation and multilingual embedding models (#161) (5f0ea37)

Bug Fixes

  • Update default temp table expiration to 7 days (#174) (4ff26cd)

0.12.0 (2023-11-01)

Features

  • Add DataFrame.melt (#113) (4e4409c)

  • Add DataFrame.to_pandas_batches() to download large DataFrame objects (#136) (3afd4a3)

  • Add bigframes.options.compute.maximum_bytes_billed option that sets maximum bytes billed on query jobs (#133) (63c7919)

  • Add pandas.qcut (#104) (8e44518)

  • Add pd.get_dummies (#149) (d8baad5)

  • Add unstack to series, add level param (#115) (5edcd19)

  • Implement operator @ for DataFrame.dot (#139) (79a638e)

  • Populate ibis version in user agent (#140) (c639a36)

Bug Fixes

  • Don’t override the global logging config (#138) (2ddbf74)

  • Fix bug with column names under repeated column assignment (#150) (29032d0)

  • Resolve plotly rendering issue by using ipython html for job pro… (#134) (39df43e)

  • Use indexee’s session for loc listlike cases (#152) (27c5725)

Documentation

  • Add arithmetic df sample code (#153) (ac44ccd)

  • Fix indentation on read_gbq_function code sample (#163) (0801d96)

  • Link to ML.EVALUATE BQML page for score() methods (#137) (45c617f)

0.11.0 (2023-10-26)

Features

  • Add back reset_session as an alias for close_session (#124) (694a85a)

  • Change query parameter to query_or_table in read_gbq (#127) (f9bb3c4)

Bug Fixes

  • Expose bigframes.pandas.reset_session as a public API (#128) (b17e1f4)

  • Use series’s own session in series.reindex listlike case (#135) (95bff3f)

Documentation

  • Add runnable code samples for DataFrames I/O methods and property (#129) (6fea8ef)

  • Add runnable code samples for reading methods (#125) (a669919)

0.10.0 (2023-10-19)

Features

  • Implement DataFrame.dot for matrix multiplication (#67) (29dd414)

0.9.0 (2023-10-18)

⚠ BREAKING CHANGES

  • rename bigframes.pandas.reset_session to close_session (#101)

Features

  • Add bigframes.options.bigquery.application_name for partner attribution (#117) (52d64ff)

  • Add AtIndexer getitems (#107) (752b01f)

  • Rename bigframes.pandas.reset_session to close_session (#101) (36693bf)

  • Send BigQuery cancel request when canceling bigframes process (#103) (e325fbb)

  • Support external packages in remote_function (#98) (ec10c4a)

  • Use ArrowDtype for STRUCT columns in to_pandas (#85) (9238fad)

Bug Fixes

  • Support multiindex for three loc getitem overloads (#113) (68e3cd3)

Performance Improvements

  • If primary keys are defined, read_gbq avoids copying table data (#112) (e6c0cd1)

Documentation

  • Add documentation for Series.struct.field and Series.struct.explode (#114) (a6dab9c)

  • Add open-source link in API doc (#106) (db51fe3)

  • Update ML overview API doc (#105) (1b3f3a5)

0.8.0 (2023-10-12)

⚠ BREAKING CHANGES

  • The default behavior of to_parquet is changing from no compression to 'snappy' compression.

Features

  • Support compression in to_parquet (a8c286f)

Bug Fixes

  • Create session dataset for remote functions only when needed (#94) (1d385be)

0.7.0 (2023-10-11)

Features

  • Add aliases for several series properties (#80) (c0efec8)

  • Add equals methods to series/dataframe (#76) (636a209)

  • Add iat and iloc accessing by tuples of integers (#90) (228aeba)

  • Add level param to DataFrame.stack (#88) (97b8bec)

  • Allow df.drop to take an index object (#68) (740c451)

  • Use default session connection (#87) (4ae4ef9)

Bug Fixes

Documentation

  • Add more preprocessing models into the docs menu. (#97) (1592315)

0.6.0 (2023-10-04)

Features

  • Add df.unstack (#63) (4a84714)

  • Add idxmin, idxmax to series, dataframe (#74) (781307e)

  • Add ml.preprocessing.KBinsDiscretizer (#81) (24c6256)

  • Add multi-column dataframe merge (#73) (c9fa85c)

  • Add update and align methods to dataframe (#57) (bf050cf)

  • Support STRUCT data type with Series.struct.field to extract child fields (#71) (17afac9)

Bug Fixes

  • Avoid 403 response too large to return error with read_gbq and large query results (#77) (8f3b5b2)

  • Change return type of Series.loc[scalar] (#40) (fff3d45)

  • Fix df/series.iloc by list with multiindex (#79) (971d091)

0.5.0 (2023-09-28)

Features

  • Add DataFrame.kurtosis / DF.kurt method (c1900c2)

  • Add DataFrame.rolling and DataFrame.expanding methods (c1900c2)

  • Add items, apply methods to DataFrame. (#43) (3adc1b3)

  • Add axis param to simple df aggregations (#52) (9cf9972)

  • Add index dtype, astype, drop, fillna, aggregate attributes. (#38) (1a254a4)

  • Add ml.preprocessing.LabelEncoder (#50) (2510461)

  • Add ml.preprocessing.MaxAbsScaler (#56) (14b262b)

  • Add ml.preprocessing.MinMaxScaler (#64) (392113b)

  • Add more index methods (#54) (a6e32aa)

  • Support calculate_p_values parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)

  • Support class_weights="balanced" in LogisticRegression model (c1900c2)

  • Support df[column_name] = df_only_one_column (c1900c2)

  • Support early_stop parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)

  • Support enable_global_explain parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)

  • Support l2_reg parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)

  • Support learn_rate_strategy parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)

  • Support ls_init_learn_rate parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)

  • Support max_iterations parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)

  • Support min_rel_progress parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)

  • Support optimize_strategy parameter in bigframes.ml.linear_model.LinearRegression (c1900c2)

  • Support casting string to integer or float (#59) (3502f83)

Bug Fixes

  • Fix header skipping logic in read_csv (#49) (d56258c)

  • Generate unique ids on join to avoid id collisions (#65) (7ab65e8)

  • LabelEncoder params consistent with Sklearn (#60) (632caec)

  • Loosen filter items tests to accommodate shifting pandas impl (#41) (edabdbb)

Performance Improvements

  • Add ability to cache dataframe and series to session table (#51) (416d7cb)

  • Inline small Series and DataFrames in query text (#45) (5e199ec)

  • Reimplement unpivot to use cross join rather than union (#47) (f9a93ce)

  • Simplify join order to use multiple order keys instead of string. (#36) (5056da6)

Documentation

  • Link to Remote Functions code samples from README and API reference (c1900c2)

0.4.0 (2023-09-16)

Features

  • Add axis parameter to droplevel and reorder_levels (7c6b0dd)

  • Add bfill and ffill to DataFrame and Series (7c6b0dd)

  • Add DataFrame.combine and DataFrame.combine_first (#27) (7c6b0dd)

  • Add DataFrame.nlargest, nsmallest (7c6b0dd)

  • Add DataFrame.pct_change and Series.pct_change (7c6b0dd)

  • Add DataFrame.skew and GroupBy.skew (7c6b0dd)

  • Add DataFrame.to_dict, to_excel, to_latex, to_records, to_string, to_markdown, to_pickle, to_orc (7c6b0dd)

  • Add diff method to DataFrame and GroupBy (7c6b0dd)

  • Add filter and reindex to Series and DataFrame (7c6b0dd)

  • Add reindex_like to DataFrame and Series (7c6b0dd)

  • Add swaplevel to DataFrame and Series (7c6b0dd)

  • Add partial support for Series.replace (7c6b0dd)

  • Support DataFrame.loc[bool_series, column] = scalar (7c6b0dd)

  • Support a persistent name in remote_function (7c6b0dd)

Bug Fixes

  • remote_function uses same credentials as other APIs (7c6b0dd)

  • Add type hints to models (7c6b0dd)

  • Raise error when ARIMAPlus is used with Pipeline (7c6b0dd)

  • Remove transforms parameter in model.fit (breaking change) (7c6b0dd)

  • Support column joins with “None indexer” (7c6b0dd)

  • Use Int64Dtype for literals in cut (7c6b0dd)

  • Use lowercase strings for parameter literals in bigframes.ml (breaking change) (7c6b0dd)

Performance Improvements

  • Add bigframes-api label to I/O query jobs (7c6b0dd)

Documentation

  • Document possible parameter values for PaLM2TextGenerator (7c6b0dd)

  • Document region logic in README (7c6b0dd)

  • Fix OneHotEncoder sample (7c6b0dd)

0.3.2 (2023-09-06)

Bug Fixes

  • Make release.sh script for PyPI upload executable (#20) (9951610)

0.3.1 (2023-09-05)

Bug Fixes

  • release: Use correct directory name for release build config (#17) (3dd25b3)

0.3.0 (2023-09-02)

Features

  • Add bigframes.get_global_session() and bigframes.reset_session() aliases (a32b747)

  • Add bigframes.pandas.read_pickle function (a32b747)

  • Add components_, explained_variance_, and explained_variance_ratio_ properties to bigframes.ml.decomposition.PCA (89b9503)

  • Add fit_transform to bigquery.ml transformers (a32b747)

  • Add Series.dropna and DataFrame.fillna (8fab755)

  • Add Series.str methods isalpha, isdigit, isdecimal, isalnum, isspace, islower, isupper, zfill, center (a32b747)

  • Support bigframes.pandas.merge() (8fab755)

  • Support DataFrame.isin with list and dict inputs (8fab755)

  • Support DataFrame.pivot (a32b747)

  • Support DataFrame.stack (89b9503)

  • Support DataFrame-DataFrame binary operations (8fab755)

  • Support df[my_column] = [a python list] (89b9503)

  • Support Index.is_monotonic (8fab755)

  • Support np.arcsin, np.arccos, np.arctan, np.sinh, np.cosh, np.tanh, np.arcsinh, np.arccosh, np.arctanh, np.exp with Series argument (89b9503)

  • Support np.sin, np.cos, np.tan, np.log, np.log10, np.sqrt, np.abs with Series argument (89b9503)

  • Support pow() and power operator in DataFrame and Series (8fab755)

  • Support read_json with engine=bigquery for newline-delimited JSON files (89b9503)

  • Support Series.corr (89b9503)

  • Support Series.map (8fab755)

  • Support for np.add, np.subtract, np.multiply, np.divide, np.power (8fab755)

  • Support MultiIndex for DataFrame columns (a32b747)

  • Use pandas.Index for column labels (a32b747)

  • Use default session and connection in ml.llm and ml.imported (8fab755)

Bug Fixes

  • Add error message to set_index (a32b747)

  • Align column names with pandas in DataFrame.agg results (89b9503)

  • Allow (but still not recommended) ORDER BY in read_gbq input when an index_col is defined (89b9503)

  • Check for IAM role on the BigQuery connection when initializing a remote_function (89b9503)

  • Check that types are specified in read_gbq_function (a32b747)

  • Don’t use query cache for Session construction (a32b747)

  • Include survey link in abstract NotImplementedError exception messages (89b9503)

  • Label temp table creation jobs with source=bigquery-dataframes-temp label (89b9503)

  • Make X_train argument names consistent across methods (8fab755)

  • Raise AttributeError for unimplemented pandas methods (89b9503)

  • Raise exception for invalid function in read_gbq_function (a32b747)

  • Support spaces in column names in DataFrame initializer (89b9503)

Performance Improvements

  • Add local cache for __repr_*__ methods (a32b747)

  • Lazily instantiate client library objects (89b9503)

  • Use row_number() filter for head / tail (8fab755)

Documentation

  • Add ML section under Overview (a32b747)

  • Add release status to table of contents (a32b747)

  • Add samples and best practices to read_gbq docs (a32b747)

  • Correct the return types of DataFrame and Series (a32b747)

  • Create subfolders for notebooks (a32b747)

  • Fix link to GitHub (89b9503)

  • Highlight bigframes is open-source (a32b747)

  • Sample ML Drug Name Generation notebook (a32b747)

  • Set options.bigquery.project in sample code (89b9503)

  • Transform remote function user guide into sample code (a32b747)

  • Update remote function notebook with read_gbq_function usage (8fab755)

0.2.0 (2023-08-17)

Features

  • Add KMeans.cluster_centers_.

  • Allow column labels to be any type handled by bq df; column labels can now be integers.

  • Add dataframegroupby.agg().

  • Add Series properties is_monotonic_increasing and is_monotonic_decreasing.

  • Add match, fullmatch, get, pad str methods.

  • Add series isin function.

Bug Fixes

  • Update ML package to use sessions for queries.

  • Optimize read_gbq with index_col set to cluster by index_col.

  • Raise ValueError if the location mismatched.

  • read_gbq no longer uses ‘time travel’ with query inputs.

Documentation

  • Add docstring to _uniform_sampling to discourage users from calling it.

0.1.1 (2023-08-14)

Documentation

  • Correct link to code repository in setup.py and use correct terminology for console.cloud.google.com links.

0.1.0 (2023-08-11)

Features

  • Add bigframes.pandas package with an API compatible with pandas. Supported data sources include: BigQuery SQL queries, BigQuery tables, CSV (local and GCS), Parquet (local and Cloud Storage), and more.

  • Add bigframes.ml package with an API inspired by scikit-learn. Train machine learning models and run batch prediction, powered by BigQuery ML.

0.0.0 (2023-02-22)

  • Empty package to reserve package name.