From 0458fc85a97fcd96b791ca2df3f9e2089deb7d04 Mon Sep 17 00:00:00 2001
From: itholic
Date: Mon, 10 Feb 2020 17:11:23 +0900
Subject: [PATCH 1/6] Add missing APIs to docs

---
 docs/source/user_guide/best_practices.rst | 41 +++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/docs/source/user_guide/best_practices.rst b/docs/source/user_guide/best_practices.rst
index 4ef94c1b91..0958277495 100644
--- a/docs/source/user_guide/best_practices.rst
+++ b/docs/source/user_guide/best_practices.rst
@@ -170,3 +170,44 @@ this operation should be avoided.
 
 See `Operations on different DataFrames `_ for more details.
 
+
+The list of unsupported APIs
+----------------------------
+
+Koalas does not support several APIs that may cause memory issues mostly due to the size of the data.
+
+- list of API not supported for DataFrame
+  - DataFrame.values
+  - DataFrame.to_pickle
+  - DataFrame.memory_usage
+  - DataFrame.to_xarray
+
+- list of API not supported for Series
+  - Series.values
+  - Series.to_pickle
+  - Series.memory_usage
+  - Series.to_xarray
+  - Series.array
+  - Series.duplicated
+  - Series.real
+  - Series.nbytes
+  - Series.__iter__
+  - Series.ravel
+
+- list of API not supported for Index & MultiIndex
+  - Index.values
+  - Index.memory_usage
+  - Index.array
+  - Index.duplicated
+  - Index.__iter__
+  - Index.to_list
+  - Index.tolist
+  - MultiIndex.values
+  - MultiIndex.memory_usage
+  - MultiIndex.array
+  - MultiIndex.duplicated
+  - MultiIndex.codes
+  - MultiIndex.levels
+  - MultiIndex.__iter__
+  - MultiIndex.to_list
+  - MultiIndex.tolist

From a7cb7dc222bd143e4530c7f75c987301281b3328 Mon Sep 17 00:00:00 2001
From: itholic
Date: Mon, 10 Feb 2020 22:23:21 +0900
Subject: [PATCH 2/6] Add missing APIs to docs

---
 docs/source/user_guide/best_practices.rst | 42 --------------------
 docs/source/user_guide/faq.rst            | 47 +++++++++++++++++++++++
 2 files changed, 47 insertions(+), 42 deletions(-)

diff --git a/docs/source/user_guide/best_practices.rst b/docs/source/user_guide/best_practices.rst
index 0958277495..f9eaa92fe2 100644
--- a/docs/source/user_guide/best_practices.rst
+++ b/docs/source/user_guide/best_practices.rst
@@ -169,45 +169,3 @@ It internally performs a join operation which can be expensive in general, which
 this operation should be avoided.
 
 See `Operations on different DataFrames `_ for more details.
-
-
-The list of unsupported APIs
-----------------------------
-
-Koalas does not support several APIs that may cause memory issues mostly due to the size of the data.
-
-- list of API not supported for DataFrame
-  - DataFrame.values
-  - DataFrame.to_pickle
-  - DataFrame.memory_usage
-  - DataFrame.to_xarray
-
-- list of API not supported for Series
-  - Series.values
-  - Series.to_pickle
-  - Series.memory_usage
-  - Series.to_xarray
-  - Series.array
-  - Series.duplicated
-  - Series.real
-  - Series.nbytes
-  - Series.__iter__
-  - Series.ravel
-
-- list of API not supported for Index & MultiIndex
-  - Index.values
-  - Index.memory_usage
-  - Index.array
-  - Index.duplicated
-  - Index.__iter__
-  - Index.to_list
-  - Index.tolist
-  - MultiIndex.values
-  - MultiIndex.memory_usage
-  - MultiIndex.array
-  - MultiIndex.duplicated
-  - MultiIndex.codes
-  - MultiIndex.levels
-  - MultiIndex.__iter__
-  - MultiIndex.to_list
-  - MultiIndex.tolist
diff --git a/docs/source/user_guide/faq.rst b/docs/source/user_guide/faq.rst
index ef3a9fc2c5..a45193d4f0 100644
--- a/docs/source/user_guide/faq.rst
+++ b/docs/source/user_guide/faq.rst
@@ -62,3 +62,50 @@ lot longer (in the order of days)
 2. Koalas takes a different approach that might contradict Spark's API design principles, and those
 principles cannot be changed lightly given the large user base of Spark. A new, separate project provides an
 opportunity for us to experiment with new design principles.
+
+What is the list of APIs that are not plan to support in Koalas?
+----------------------------------------------------------------
+
+Koalas doesn't support several APIs that may cause memory issues mostly due to the size of the data.
+
+They potentially
+
+The following is a list of APIs that Koalas doesn't plan to support.
+
+- DataFrame
+  - DataFrame.values
+  - DataFrame.to_pickle
+  - DataFrame.memory_usage
+  - DataFrame.to_xarray
+
+- Series
+  - Series.values
+  - Series.to_pickle
+  - Series.memory_usage
+  - Series.to_xarray
+  - Series.array
+  - Series.duplicated
+  - Series.real
+  - Series.nbytes
+  - Series.__iter__
+  - Series.ravel
+
+- Index
+  - Index.values
+  - Index.memory_usage
+  - Index.array
+  - Index.duplicated
+  - Index.__iter__
+  - Index.to_list
+  - Index.tolist
+
+- MultiIndex
+  - MultiIndex.values
+  - MultiIndex.memory_usage
+  - MultiIndex.array
+  - MultiIndex.duplicated
+  - MultiIndex.codes
+  - MultiIndex.levels
+  - MultiIndex.__iter__
+  - MultiIndex.to_list
+  - MultiIndex.tolist

From ae03177e27afff2ed98e56fc00f136520ac84b3a Mon Sep 17 00:00:00 2001
From: itholic
Date: Mon, 10 Feb 2020 22:30:54 +0900
Subject: [PATCH 3/6] Restore unnecessary change

---
 docs/source/user_guide/best_practices.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/source/user_guide/best_practices.rst b/docs/source/user_guide/best_practices.rst
index f9eaa92fe2..4ef94c1b91 100644
--- a/docs/source/user_guide/best_practices.rst
+++ b/docs/source/user_guide/best_practices.rst
@@ -169,3 +169,4 @@ It internally performs a join operation which can be expensive in general, which
 this operation should be avoided.
 
 See `Operations on different DataFrames `_ for more details.
+

From e371bb3486f26e30deb32854138556ffe256a195 Mon Sep 17 00:00:00 2001
From: itholic
Date: Mon, 10 Feb 2020 22:32:01 +0900
Subject: [PATCH 4/6] Restore unnecessary change

---
 docs/source/user_guide/faq.rst | 2 --
 1 file changed, 2 deletions(-)

diff --git a/docs/source/user_guide/faq.rst b/docs/source/user_guide/faq.rst
index a45193d4f0..f5907e8b94 100644
--- a/docs/source/user_guide/faq.rst
+++ b/docs/source/user_guide/faq.rst
@@ -68,8 +68,6 @@ What is the list of APIs that are not plan to support in Koalas?
 
 Koalas doesn't support several APIs that may cause memory issues mostly due to the size of the data.
 
-They potentially
-
 The following is a list of APIs that Koalas doesn't plan to support.
 
 - DataFrame

From 60aa5b73a67f53c21f730d118266a3bba726eadd Mon Sep 17 00:00:00 2001
From: itholic
Date: Mon, 10 Feb 2020 22:35:54 +0900
Subject: [PATCH 5/6] plan -> planned

---
 docs/source/user_guide/faq.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/user_guide/faq.rst b/docs/source/user_guide/faq.rst
index f5907e8b94..a84c92775d 100644
--- a/docs/source/user_guide/faq.rst
+++ b/docs/source/user_guide/faq.rst
@@ -63,8 +63,8 @@ lot longer (in the order of days)
 principles cannot be changed lightly given the large user base of Spark. A new, separate project provides an
 opportunity for us to experiment with new design principles.
 
-What is the list of APIs that are not plan to support in Koalas?
-----------------------------------------------------------------
+What is the list of APIs that are not planned to support in Koalas?
+-------------------------------------------------------------------
 
 Koalas doesn't support several APIs that may cause memory issues mostly due to the size of the data.
 

From 4c4ca9acd1a8381117e31c2d3f7df2a75900d45d Mon Sep 17 00:00:00 2001
From: itholic
Date: Mon, 10 Feb 2020 22:41:33 +0900
Subject: [PATCH 6/6] Add Example

---
 docs/source/user_guide/faq.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/source/user_guide/faq.rst b/docs/source/user_guide/faq.rst
index a84c92775d..977ef9dab2 100644
--- a/docs/source/user_guide/faq.rst
+++ b/docs/source/user_guide/faq.rst
@@ -68,6 +68,10 @@ What is the list of APIs that are not planned to support in Koalas?
 
 Koalas doesn't support several APIs that may cause memory issues mostly due to the size of the data.
 
+For example, implementing and using `DataFrame.values` in Koalas can cause all data belonging to the
+
+DataFrame to be loaded into the driver's memory, causing memory errors like OOM.
+
 The following is a list of APIs that Koalas doesn't plan to support.
 
 - DataFrame
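
Note on the `DataFrame.values` rationale in PATCH 6/6: the memory concern can be illustrated with a small sketch in plain Python. This is not Koalas or Spark code. The partitions, `make_partitions`, `collect_all`, and `take` are hypothetical names invented for this illustration, and the lazy ranges only stand in for rows that executors would send to the driver. The point is that a `values`-style accessor has to materialize every row in one driver-side container, while a bounded read stops after a fixed number of rows.

```python
from itertools import islice
from typing import Iterable, List


def make_partitions(num_partitions: int, rows_per_partition: int) -> List[Iterable[int]]:
    """Hypothetical stand-in for a distributed DataFrame: one lazy range per partition."""
    return [
        range(p * rows_per_partition, (p + 1) * rows_per_partition)
        for p in range(num_partitions)
    ]


def collect_all(partitions: List[Iterable[int]]) -> List[int]:
    """values-style access: every row of every partition ends up in one driver-side list."""
    return [row for part in partitions for row in part]


def take(partitions: List[Iterable[int]], n: int) -> List[int]:
    """Bounded access: stop reading as soon as n rows have arrived."""
    rows = (row for part in partitions for row in part)
    return list(islice(rows, n))


partitions = make_partitions(num_partitions=4, rows_per_partition=1000)

everything = collect_all(partitions)  # driver memory grows with the full dataset
preview = take(partitions, 5)         # driver memory is bounded by n

print(len(everything))  # 4000
print(preview)          # [0, 1, 2, 3, 4]
```

With 4 partitions of 1000 rows the full collection is harmless, but its cost scales with the total row count, which is the situation the FAQ text describes as leading to OOM on the driver.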