diff --git a/README.md b/README.md index 7277889bd..128bfe118 100644 --- a/README.md +++ b/README.md @@ -49,7 +49,7 @@ We recommend using Databricks Runtime versions 13.3 LTS with Photon enabled. > DEPRECATION ERROR: Mosaic v0.4.x series only supports Databricks Runtime 13. You can specify `%pip install 'databricks-mosaic<0.4,>=0.3'` for DBR < 13. -As of the 0.4.0 release, Mosaic issues the following ERROR when initialized on a cluster that is neither Photon Runtime nor Databricks Runtime ML [[ADB](https://learn.microsoft.com/en-us/azure/databricks/runtime/) | [AWS](https://docs.databricks.com/runtime/index.html) | [GCP](https://docs.gcp.databricks.com/runtime/index.html)]: +:warning: **Mosaic 0.4.x series issues the following ERROR on a standard, non-Photon cluster [[ADB](https://learn.microsoft.com/en-us/azure/databricks/runtime/) | [AWS](https://docs.databricks.com/runtime/index.html) | [GCP](https://docs.gcp.databricks.com/runtime/index.html)]:** > DEPRECATION ERROR: Please use a Databricks Photon-enabled Runtime for performance benefits or Runtime ML for spatial AI benefits; Mosaic 0.4.x series restricts executing this cluster. diff --git a/docs/source/api/raster-functions.rst b/docs/source/api/raster-functions.rst index 87af5c46e..bdad114e9 100644 --- a/docs/source/api/raster-functions.rst +++ b/docs/source/api/raster-functions.rst @@ -190,6 +190,7 @@ rst_combineavg The output raster will have the same pixel type as the input rasters. The output raster will have the same pixel size as the input rasters. The output raster will have the same coordinate reference system as the input rasters. + Also, see :doc:`rst_combineavg_agg ` function. :param tiles: A column containing an array of raster tiles. :type tiles: Column (ArrayType(RasterTileType)) @@ -229,58 +230,6 @@ rst_combineavg | {index_id: 593308294097928191, raster: [00 01 10 ... 
00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | +----------------------------------------------------------------------------------------------------------------+ -rst_combineavgagg -***************** - -.. function:: rst_combineavgagg(tile) - - Combines a group by statement over aggregated raster tiles by averaging the pixel values. - The rasters must have the same extent, number of bands, and pixel type. - The rasters must have the same pixel size and coordinate reference system. - The output raster will have the same extent as the input rasters. - The output raster will have the same number of bands as the input rasters. - The output raster will have the same pixel type as the input rasters. - The output raster will have the same pixel size as the input rasters. - The output raster will have the same coordinate reference system as the input rasters. - - :param tile: A grouped column containing raster tiles. - :type tile: Column (RasterTileType) - :rtype: Column: RasterTileType - - :example: - -.. tabs:: - .. code-tab:: py - - df.groupBy()\ - .agg(mos.rst_combineavgagg("tile").limit(1).display() - +----------------------------------------------------------------------------------------------------------------+ - | rst_combineavgagg(tile) | - +----------------------------------------------------------------------------------------------------------------+ - | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | - +----------------------------------------------------------------------------------------------------------------+ - - .. 
code-tab:: scala - - df.groupBy() - .agg(rst_combineavgagg(col("tile")).limit(1).show - +----------------------------------------------------------------------------------------------------------------+ - | rst_combineavgagg(tile) | - +----------------------------------------------------------------------------------------------------------------+ - | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | - +----------------------------------------------------------------------------------------------------------------+ - - .. code-tab:: sql - - SELECT rst_combineavgagg(tile) - FROM table - GROUP BY 1 - +----------------------------------------------------------------------------------------------------------------+ - | rst_combineavgagg(tile) | - +----------------------------------------------------------------------------------------------------------------+ - | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | - +----------------------------------------------------------------------------------------------------------------+ - rst_derivedband ************** @@ -295,6 +244,7 @@ rst_derivedband The output raster will have the same pixel type as the input rasters. The output raster will have the same pixel size as the input rasters. The output raster will have the same coordinate reference system as the input rasters. + Also, see :doc:`rst_derivedband_agg ` function. :param tiles: A column containing an array of raster tiles. :type tiles: Column (ArrayType(RasterTileType)) @@ -364,96 +314,6 @@ rst_derivedband +----------------------------------------------------------------------------------------------------------------+ -rst_derivedbandagg -***************** - -.. function:: rst_derivedbandagg(tile, python_func, func_name) - - Combines a group by statement over aggregated raster tiles by using the provided python function. 
- The rasters must have the same extent, number of bands, and pixel type. - The rasters must have the same pixel size and coordinate reference system. - The output raster will have the same extent as the input rasters. - The output raster will have the same number of bands as the input rasters. - The output raster will have the same pixel type as the input rasters. - The output raster will have the same pixel size as the input rasters. - The output raster will have the same coordinate reference system as the input rasters. - - :param tile: A grouped column containing raster tile(s). - :type tile: Column (RasterTileType) - :param python_func: A function to evaluate in python. - :type python_func: Column (StringType) - :param func_name: name of the function to evaluate in python. - :type func_name: Column (StringType) - :rtype: Column: RasterTileType - - :example: - -.. tabs:: - .. code-tab:: py - from textwrap import dedent - df\ - .select( - "date", "tile", - F.lit(dedent( - """ - import numpy as np - def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs): - out_ar[:] = np.sum(in_ar, axis=0) / len(in_ar) - """)).alias("py_func1"), - F.lit("average").alias("func1_name") - )\ - .groupBy("date", "py_func1", "func1_name")\ - .agg(mos.rst_derivedbandagg("tile","py_func1","func1_name")).limit(1).display() - +----------------------------------------------------------------------------------------------------------------+ - | rst_derivedbandagg(tile,py_func1,func1_name) | - +----------------------------------------------------------------------------------------------------------------+ - | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | - +----------------------------------------------------------------------------------------------------------------+ - - .. 
code-tab:: scala - - df - .select( - "date", "tile" - lit( - """ - |import numpy as np - |def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs): - | out_ar[:] = np.sum(in_ar, axis=0) / len(in_ar) - |""".stripMargin).as("py_func1"), - lit("average").as("func1_name") - ) - .groupBy("date", "py_func1", "func1_name") - .agg(mos.rst_derivedbandagg("tile","py_func1","func1_name")).limit(1).show - +----------------------------------------------------------------------------------------------------------------+ - | rst_derivedbandagg(tile,py_func1,func1_name) | - +----------------------------------------------------------------------------------------------------------------+ - | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | - +----------------------------------------------------------------------------------------------------------------+ - - .. code-tab:: sql - SELECT - date, py_func1, func1_name, - rst_derivedbandagg(tile, py_func1, func1_name) - FROM SELECT ( - date, tile, - """ - import numpy as np - def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs): - out_ar[:] = np.sum(in_ar, axis=0) / len(in_ar) - """ as py_func1, - "average" as func1_name - FROM table - ) - GROUP BY date, py_func1, func1_name - LIMIT 1 - +----------------------------------------------------------------------------------------------------------------+ - | rst_derivedbandagg(tile,py_func1,func1_name) | - +----------------------------------------------------------------------------------------------------------------+ - | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | - +----------------------------------------------------------------------------------------------------------------+ - - rst_frombands ************** @@ -527,6 +387,7 @@ rst_fromcontent .. 
tabs:: .. code-tab:: py + # binary is a Python bytearray data type df = spark.read.format("binaryFile")\ .load("dbfs:/FileStore/geospatial/mosaic/sample_raster_data/binary/netcdf-coral")\ @@ -538,6 +399,7 @@ rst_fromcontent +----------------------------------------------------------------------------------------------------------------+ .. code-tab:: scala + // binary is a Scala/Java Array[Byte] data type val df = spark.read .format("binaryFile") @@ -910,9 +772,12 @@ rst_mapalgebra Here are examples of the json_spec': (1) shows default indexing, (2) shows reusing an index, and (3) shows band indexing. - (1) '{"calc": "A+B/C"}' - (2) '{"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 1}' - (3) '{"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 2, "A_band": 1, "B_band": 1, "C_band": 1}' + + .. code-block:: text + + (1) '{"calc": "A+B/C"}' + (2) '{"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 1}' + (3) '{"calc": "A+B/C", "A_index": 0, "B_index": 1, "C_index": 2, "A_band": 1, "B_band": 1, "C_band": 1}' :param tile: A column containing the raster tile. :type tile: Column (RasterTileType) @@ -1011,6 +876,7 @@ rst_merge The output raster will have the same pixel type as the input rasters. The output raster will have the same pixel size as the highest resolution input rasters. The output raster will have the same coordinate reference system as the input rasters. + Also, see :doc:`rst_merge_agg ` function. :param tiles: A column containing an array of raster tiles. :type tiles: Column (ArrayType(RasterTileType)) @@ -1048,63 +914,6 @@ rst_merge | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | +----------------------------------------------------------------------------------------------------------------+ -rst_mergeagg -************ - -.. function:: rst_mergeagg(tile) - - Combines a grouped aggregate of raster tiles into a single raster. - The rasters do not need to have the same extent. 
- The rasters must have the same coordinate reference system. - The rasters are combined using gdalwarp. - The noData value needs to be initialised; if not, the non valid pixels may introduce artifacts in the output raster. - The rasters are stacked in the order they are provided. - This order is randomized since this is an aggregation function. - If the order of rasters is important please first collect rasters and sort them by metadata information and then use - rst_merge function. - The output raster will have the extent covering all input rasters. - The output raster will have the same number of bands as the input rasters. - The output raster will have the same pixel type as the input rasters. - The output raster will have the same pixel size as the highest resolution input rasters. - The output raster will have the same coordinate reference system as the input rasters. - - :param tile: A column containing raster tiles. - :type tile: Column (RasterTileType) - :rtype: Column: RasterTileType - - :example: - -.. tabs:: - .. code-tab:: py - - df.groupBy("date")\ - .agg(mos.rst_mergeagg("tile")).limit(1).display() - +----------------------------------------------------------------------------------------------------------------+ - | rst_mergeagg(tile) | - +----------------------------------------------------------------------------------------------------------------+ - | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | - +----------------------------------------------------------------------------------------------------------------+ - - .. 
code-tab:: scala - - df.groupBy("date") - .agg(rst_mergeagg(col("tile"))).limit(1).show - +----------------------------------------------------------------------------------------------------------------+ - | rst_mergeagg(tile) | - +----------------------------------------------------------------------------------------------------------------+ - | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | - +----------------------------------------------------------------------------------------------------------------+ - - .. code-tab:: sql - - SELECT rst_mergeagg(tile) - FROM table - GROUP BY date - +----------------------------------------------------------------------------------------------------------------+ - | rst_mergeagg(tile) | - +----------------------------------------------------------------------------------------------------------------+ - | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | - +----------------------------------------------------------------------------------------------------------------+ rst_metadata ************* diff --git a/docs/source/api/spatial-aggregations.rst b/docs/source/api/spatial-aggregations.rst index 4f8d3a0ca..9f806fec9 100644 --- a/docs/source/api/spatial-aggregations.rst +++ b/docs/source/api/spatial-aggregations.rst @@ -2,11 +2,214 @@ Spatial aggregation functions ============================= +rst_combineavg_agg +****************** + +.. function:: rst_combineavg_agg(tile) + + Combines the raster tiles in each group into a single raster by averaging their pixel values. + The rasters must have the same extent, number of bands, and pixel type. + The rasters must have the same pixel size and coordinate reference system. + The output raster will have the same extent as the input rasters. + The output raster will have the same number of bands as the input rasters. 
+ The output raster will have the same pixel type as the input rasters. + The output raster will have the same pixel size as the input rasters. + The output raster will have the same coordinate reference system as the input rasters. + + :param tile: A grouped column containing raster tiles. + :type tile: Column (RasterTileType) + :rtype: Column: RasterTileType + + :example: + +.. tabs:: + .. code-tab:: py + + df.groupBy()\ + .agg(mos.rst_combineavg_agg("tile")).limit(1).display() + +----------------------------------------------------------------------------------------------------------------+ + | rst_combineavg_agg(tile) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: scala + + df.groupBy() + .agg(rst_combineavg_agg(col("tile"))).limit(1).show + +----------------------------------------------------------------------------------------------------------------+ + | rst_combineavg_agg(tile) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: sql + + SELECT rst_combineavg_agg(tile) + FROM table + -- no GROUP BY clause: aggregates all rows into one tile + +----------------------------------------------------------------------------------------------------------------+ + | rst_combineavg_agg(tile) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 
00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + +rst_derivedband_agg +******************* + +.. function:: rst_derivedband_agg(tile, python_func, func_name) + + Combines the raster tiles in each group into a single raster by applying the provided Python function. + The rasters must have the same extent, number of bands, and pixel type. + The rasters must have the same pixel size and coordinate reference system. + The output raster will have the same extent as the input rasters. + The output raster will have the same number of bands as the input rasters. + The output raster will have the same pixel type as the input rasters. + The output raster will have the same pixel size as the input rasters. + The output raster will have the same coordinate reference system as the input rasters. + + :param tile: A grouped column containing raster tile(s). + :type tile: Column (RasterTileType) + :param python_func: Python source code defining the function to evaluate. + :type python_func: Column (StringType) + :param func_name: Name of the function, defined in python_func, to evaluate. + :type func_name: Column (StringType) + :rtype: Column: RasterTileType + + :example: + +.. tabs:: + .. 
code-tab:: py + + from textwrap import dedent + df\ + .select( + "date", "tile", + F.lit(dedent( + """ + import numpy as np + def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs): + out_ar[:] = np.sum(in_ar, axis=0) / len(in_ar) + """)).alias("py_func1"), + F.lit("average").alias("func1_name") + )\ + .groupBy("date", "py_func1", "func1_name")\ + .agg(mos.rst_derivedband_agg("tile","py_func1","func1_name")).limit(1).display() + +----------------------------------------------------------------------------------------------------------------+ + | rst_derivedband_agg(tile,py_func1,func1_name) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: scala + + df + .select( + col("date"), col("tile"), + lit( + """ + |import numpy as np + |def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs): + | out_ar[:] = np.sum(in_ar, axis=0) / len(in_ar) + |""".stripMargin).as("py_func1"), + lit("average").as("func1_name") + ) + .groupBy("date", "py_func1", "func1_name") + .agg(rst_derivedband_agg(col("tile"), col("py_func1"), col("func1_name"))).limit(1).show + +----------------------------------------------------------------------------------------------------------------+ + | rst_derivedband_agg(tile,py_func1,func1_name) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. 
code-tab:: sql + + SELECT + date, py_func1, func1_name, + rst_derivedband_agg(tile, py_func1, func1_name) + FROM (SELECT + date, tile, + """ + import numpy as np + def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs): + out_ar[:] = np.sum(in_ar, axis=0) / len(in_ar) + """ as py_func1, + "average" as func1_name + FROM table + ) + GROUP BY date, py_func1, func1_name + LIMIT 1 + +----------------------------------------------------------------------------------------------------------------+ + | rst_derivedband_agg(tile,py_func1,func1_name) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + +rst_merge_agg +************* + +.. function:: rst_merge_agg(tile) + + Combines a grouped aggregate of raster tiles into a single raster. + The rasters do not need to have the same extent. + The rasters must have the same coordinate reference system. + The rasters are combined using gdalwarp. + The noData value must be initialised; otherwise, invalid pixels may introduce artifacts in the output raster. + The rasters are stacked in the order they are provided. + Because this is an aggregate function, that order is not deterministic. + If the order of rasters matters, first collect the rasters, sort them by their metadata, and then use the + rst_merge function. + The output raster will have the extent covering all input rasters. + The output raster will have the same number of bands as the input rasters. + The output raster will have the same pixel type as the input rasters. + The output raster will have the same pixel size as the highest resolution input rasters. 
+ The output raster will have the same coordinate reference system as the input rasters. + + :param tile: A column containing raster tiles. + :type tile: Column (RasterTileType) + :rtype: Column: RasterTileType + + :example: + +.. tabs:: + .. code-tab:: py + + df.groupBy("date")\ + .agg(mos.rst_merge_agg("tile")).limit(1).display() + +----------------------------------------------------------------------------------------------------------------+ + | rst_merge_agg(tile) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: scala + + df.groupBy("date") + .agg(rst_merge_agg(col("tile"))).limit(1).show + +----------------------------------------------------------------------------------------------------------------+ + | rst_merge_agg(tile) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + + .. code-tab:: sql + + SELECT rst_merge_agg(tile) + FROM table + GROUP BY date + +----------------------------------------------------------------------------------------------------------------+ + | rst_merge_agg(tile) | + +----------------------------------------------------------------------------------------------------------------+ + | {index_id: 593308294097928191, raster: [00 01 10 ... 
00], parentPath: "dbfs:/path_to_file", driver: "NetCDF" } | + +----------------------------------------------------------------------------------------------------------------+ + -st_intersects_aggregate +st_intersects_agg *********************** -.. function:: st_intersects_aggregate(leftIndex, rightIndex) +.. function:: st_intersects_agg(leftIndex, rightIndex) Returns `true` if any of the `leftIndex` and `rightIndex` pairs intersect. @@ -33,10 +236,10 @@ st_intersects_aggregate left_df .join(right_df, col("left_index.index_id") == col("right_index.index_id")) .groupBy() - .agg(st_intersects_aggregate(col("left_index"), col("right_index"))) + .agg(st_intersects_agg(col("left_index"), col("right_index"))) ).show(1, False) +------------------------------------------------+ - |st_intersects_aggregate(left_index, right_index)| + |st_intersects_agg(left_index, right_index)| +------------------------------------------------+ |true | +------------------------------------------------+ @@ -50,10 +253,10 @@ st_intersects_aggregate leftDf .join(rightDf, $"left_index.index_id" === $"right_index.index_id") .groupBy() - .agg(st_intersects_aggregate($"left_index", $"right_index")) + .agg(st_intersects_agg($"left_index", $"right_index")) .show(false) +------------------------------------------------+ - |st_intersects_aggregate(left_index, right_index)| + |st_intersects_agg(left_index, right_index)| +------------------------------------------------+ |true | +------------------------------------------------+ @@ -62,10 +265,10 @@ st_intersects_aggregate WITH l AS (SELECT grid_tessellateexplode("POLYGON ((0 0, 0 3, 3 3, 3 0))", 1) AS left_index), r AS (SELECT grid_tessellateexplode("POLYGON ((2 2, 2 4, 4 4, 4 2))", 1) AS right_index) - SELECT st_intersects_aggregate(l.left_index, r.right_index) + SELECT st_intersects_agg(l.left_index, r.right_index) FROM l INNER JOIN r on l.left_index.index_id = r.right_index.index_id +------------------------------------------------+ - 
|st_intersects_aggregate(left_index, right_index)| + |st_intersects_agg(left_index, right_index)| +------------------------------------------------+ |true | +------------------------------------------------+ @@ -83,20 +286,20 @@ st_intersects_aggregate showDF( select( join(df.l, df.r, df.l$left_index.index_id == df.r$right_index.index_id), - st_intersects_aggregate(column("left_index"), column("right_index")) + st_intersects_agg(column("left_index"), column("right_index")) ), truncate=F ) +------------------------------------------------+ - |st_intersects_aggregate(left_index, right_index)| + |st_intersects_agg(left_index, right_index)| +------------------------------------------------+ |true | +------------------------------------------------+ -st_intersection_aggregate +st_intersection_agg ************************* -.. function:: st_intersection_aggregate(leftIndex, rightIndex) +.. function:: st_intersection_agg(leftIndex, rightIndex) Computes the intersections of `leftIndex` and `rightIndex` and returns the union of these intersections. 
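The rst_derivedband_agg examples in this file pass the pixel function as Python source text plus a function name. As a hedged, offline sanity check (not part of Mosaic, and not exercising GDAL at all), the sketch below compiles that same source string with `exec` and calls `average` on two small NumPy arrays standing in for co-registered tiles. The helper names `load_pixel_function` and `run_on_arrays` are hypothetical, and the offsets, buffer radius and geotransform passed in are placeholder assumptions, since `average` ignores them.

```python
from textwrap import dedent

import numpy as np

# Same source text as the documented py_func1 column.
PY_FUNC1 = dedent(
    """
    import numpy as np
    def average(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize, raster_ysize, buf_radius, gt, **kwargs):
        out_ar[:] = np.sum(in_ar, axis=0) / len(in_ar)
    """
)
FUNC1_NAME = "average"


def load_pixel_function(source: str, name: str):
    """Compile a pixel-function source string and return the function named `name`."""
    namespace: dict = {}
    exec(source, namespace)  # the source imports numpy into `namespace` itself
    return namespace[name]


def run_on_arrays(func, arrays):
    """Call `func` with the documented signature, writing into a fresh output array."""
    in_ar = [np.asarray(a, dtype=np.float64) for a in arrays]
    out_ar = np.zeros_like(in_ar[0])
    ysize, xsize = out_ar.shape
    # Offsets, buffer radius and geotransform are placeholders; `average` ignores them.
    func(in_ar, out_ar, 0, 0, xsize, ysize, xsize, ysize, 0, None)
    return out_ar


if __name__ == "__main__":
    average = load_pixel_function(PY_FUNC1, FUNC1_NAME)
    # Two 2x2 "tiles" with the same shape, standing in for co-registered rasters.
    print(run_on_arrays(average, [[[1, 2], [3, 4]], [[3, 4], [5, 6]]]))
```

If the string fails to compile, or `func_name` does not match a function defined in it, this check fails locally before any Spark job is submitted.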
@@ -123,10 +326,10 @@ st_intersection_aggregate left_df .join(right_df, col("left_index.index_id") == col("right_index.index_id")) .groupBy() - .agg(st_astext(st_intersection_aggregate(col("left_index"), col("right_index")))) + .agg(st_astext(st_intersection_agg(col("left_index"), col("right_index")))) ).show(1, False) +--------------------------------------------------------------+ - |convert_to(st_intersection_aggregate(left_index, right_index))| + |convert_to(st_intersection_agg(left_index, right_index))| +--------------------------------------------------------------+ |POLYGON ((2 2, 3 2, 3 3, 2 3, 2 2)) | +--------------------------------------------------------------+ @@ -140,10 +343,10 @@ st_intersection_aggregate leftDf .join(rightDf, $"left_index.index_id" === $"right_index.index_id") .groupBy() - .agg(st_astext(st_intersection_aggregate($"left_index", $"right_index"))) + .agg(st_astext(st_intersection_agg($"left_index", $"right_index"))) .show(false) +--------------------------------------------------------------+ - |convert_to(st_intersection_aggregate(left_index, right_index))| + |convert_to(st_intersection_agg(left_index, right_index))| +--------------------------------------------------------------+ |POLYGON ((2 2, 3 2, 3 3, 2 3, 2 2)) | +--------------------------------------------------------------+ @@ -152,10 +355,10 @@ st_intersection_aggregate WITH l AS (SELECT grid_tessellateexplode("POLYGON ((0 0, 0 3, 3 3, 3 0))", 1) AS left_index), r AS (SELECT grid_tessellateexplode("POLYGON ((2 2, 2 4, 4 4, 4 2))", 1) AS right_index) - SELECT st_astext(st_intersection_aggregate(l.left_index, r.right_index)) + SELECT st_astext(st_intersection_agg(l.left_index, r.right_index)) FROM l INNER JOIN r on l.left_index.index_id = r.right_index.index_id +--------------------------------------------------------------+ - |convert_to(st_intersection_aggregate(left_index, right_index))| + |convert_to(st_intersection_agg(left_index, right_index))| 
+--------------------------------------------------------------+ |POLYGON ((2 2, 3 2, 3 3, 2 3, 2 2)) | +--------------------------------------------------------------+ @@ -173,11 +376,11 @@ st_intersection_aggregate showDF( select( join(df.l, df.r, df.l$left_index.index_id == df.r$right_index.index_id), - st_astext(st_intersection_aggregate(column("left_index"), column("right_index"))) + st_astext(st_intersection_agg(column("left_index"), column("right_index"))) ), truncate=F ) +--------------------------------------------------------------+ - |convert_to(st_intersection_aggregate(left_index, right_index))| + |convert_to(st_intersection_agg(left_index, right_index))| +--------------------------------------------------------------+ |POLYGON ((2 2, 3 2, 3 3, 2 3, 2 2)) | +--------------------------------------------------------------+ diff --git a/docs/source/api/spatial-functions.rst b/docs/source/api/spatial-functions.rst index 2e02fb3d6..350756e48 100644 --- a/docs/source/api/spatial-functions.rst +++ b/docs/source/api/spatial-functions.rst @@ -929,6 +929,7 @@ st_intersection .. function:: st_intersection(geom1, geom2) Returns a geometry representing the intersection of `left_geom` and `right_geom`. + Also, see :doc:`st_intersection_agg ` function. :param geom1: Geometry :type geom1: Column @@ -1665,6 +1666,7 @@ st_union .. function:: st_union(left_geom, right_geom) Returns the point set union of the input geometries. + Also, see :doc:`st_union_agg ` function. :param left_geom: Geometry :type left_geom: Column diff --git a/docs/source/api/spatial-indexing.rst b/docs/source/api/spatial-indexing.rst index 0ea059cb4..ba0a91132 100644 --- a/docs/source/api/spatial-indexing.rst +++ b/docs/source/api/spatial-indexing.rst @@ -857,7 +857,8 @@ grid_cell_intersection .. 
function:: grid_cell_intersection(left_chip, right_chip) - Returns the chip representing the intersection of two chips based on the same grid cell + Returns the chip representing the intersection of two chips based on the same grid cell. + Also, see :doc:`grid_cell_intersection_agg ` function. :param left_chip: Chip :type left_chip: Column: ChipType(LongType) @@ -912,7 +913,8 @@ grid_cell_union .. function:: grid_cell_union(left_chip, right_chip) - Returns the chip representing the union of two chips based on the same grid cell + Returns the chip representing the union of two chips based on the same grid cell. + Also, see :doc:`grid_cell_union_agg ` function. :param left_chip: Chip :type left_chip: Column: ChipType(LongType) diff --git a/docs/source/api/spatial-predicates.rst b/docs/source/api/spatial-predicates.rst index 09fc6fa31..c1c3c8288 100644 --- a/docs/source/api/spatial-predicates.rst +++ b/docs/source/api/spatial-predicates.rst @@ -67,6 +67,7 @@ st_intersects .. function:: st_intersects(geom1, geom2) Returns true if the geometry `geom1` intersects `geom2`. + Also, see :doc:`st_intersects_agg ` function. :param geom1: Geometry :type geom1: Column diff --git a/docs/source/api/vector-format-readers.rst b/docs/source/api/vector-format-readers.rst index 8d9b420e2..6a58cb961 100644 --- a/docs/source/api/vector-format-readers.rst +++ b/docs/source/api/vector-format-readers.rst @@ -47,7 +47,7 @@ The reader supports the following options: * vsizip - if the vector files are zipped files, set this to true (BooleanType) * asWKB - if the geometry should be returned as WKB (BooleanType) - default is false * layerName - name of the layer to read (StringType) - * layerNumber - number of the layer to read (IntegerType) + * layerNumber - number of the layer to read (IntegerType), zero-indexed .. function:: read.format("ogr").load(path) @@ -105,14 +105,15 @@ Each feature will be provided as 2 columns: The fields of the feature will be provided as columns in the DataFrame. 
The types of the fields are coerced to most concrete type that can hold all the values. -The reader supports the following options: +All options should be passed as Strings, since they are provided as a Map +and parsed into the expected types at execution time. The reader supports the following options: * driverName - GDAL driver name (StringType) - * vsizip - if the vector files are zipped files, set this to true (BooleanType) - * asWKB - if the geometry should be returned as WKB (BooleanType) - default is false - * chunkSize - size of the chunk to read from the file per single task (IntegerType) - default is 5000 + * vsizip - if the vector files are zipped files, set this to true (BooleanType) [pass as String] + * asWKB - if the geometry should be returned as WKB (BooleanType) - default is false [pass as String] + * chunkSize - size of the chunk to read from the file per single task (IntegerType) - default is 5000 [pass as String] * layerName - name of the layer to read (StringType) - * layerNumber - number of the layer to read (IntegerType) + * layerNumber - number of the layer to read (IntegerType), zero-indexed [pass as String] .. function:: read.format("multi_read_ogr").load(path) @@ -157,7 +158,7 @@ The reader supports the following options: +--------------------+-------+-----+-----------------+-----------+ -spark.read().format("geo_db") +spark.read.format("geo_db") ***************************** Mosaic provides a reader for GeoDB files natively in Spark. The output of the reader is a DataFrame with inferred schema. @@ -166,7 +167,7 @@ The reader supports the following options: * asWKB - if the geometry should be returned as WKB (BooleanType) - default is false * layerName - name of the layer to read (StringType) - * layerNumber - number of the layer to read (IntegerType) + * layerNumber - number of the layer to read (IntegerType), zero-indexed * vsizip - if the vector files are zipped files, set this to true (BooleanType) .. 
function:: read.format("geo_db").load(path) @@ -210,7 +211,7 @@ The reader supports the following options: +--------------------+-------+-----+-----------------+-----------+ -spark.read().format("shapefile") +spark.read.format("shapefile") ******************************** Mosaic provides a reader for Shapefiles natively in Spark. The output of the reader is a DataFrame with inferred schema. @@ -219,7 +220,7 @@ The reader supports the following options: * asWKB - if the geometry should be returned as WKB (BooleanType) - default is false * layerName - name of the layer to read (StringType) - * layerNumber - number of the layer to read (IntegerType) + * layerNumber - number of the layer to read (IntegerType), zero-indexed * vsizip - if the vector files are zipped files, set this to true (BooleanType) .. function:: read.format("shapefile").load(path) diff --git a/docs/source/images/init_script.png b/docs/source/images/init_script.png index 335f19904..d141cd6c2 100644 Binary files a/docs/source/images/init_script.png and b/docs/source/images/init_script.png differ diff --git a/docs/source/usage/install-gdal.rst b/docs/source/usage/install-gdal.rst index 7e1b0c19b..f18b7eae8 100644 --- a/docs/source/usage/install-gdal.rst +++ b/docs/source/usage/install-gdal.rst @@ -31,8 +31,8 @@ the mos.setup_gdal() function. .. note:: (a) This is close in behavior to Mosaic < 0.4 series (prior to DBR 13), with new options to pip install Mosaic for either ubuntugis gdal (3.4.3) or jammy default (3.4.1). - (b) `to_fuse_dir` can be one of `/Volumes/..`, `/Workspace/..`, `/dbfs/..`; - however, you should consider `setup_fuse_install()` for Volume based installs as that + (b) 'to_fuse_dir' can be one of '/Volumes/..', '/Workspace/..', '/dbfs/..'; + however, you should consider 'setup_fuse_install()' for Volume-based installs as that exposes more options, to include copying JAR and JNI Shared Objects. .. 
function:: setup_gdal() @@ -100,5 +100,8 @@ code at the top of the notebook: mos.enable_mosaic(spark, dbutils) mos.enable_gdal(spark) + +.. code-block:: text + GDAL enabled. - GDAL 3.4.3, released 2022/04/22 \ No newline at end of file + GDAL 3.4.1, released 2021/12/27 \ No newline at end of file
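The `multi_read_ogr` change in `vector-format-readers.rst` states that every option is provided as a String in a Map and parsed into the expected type at execution. The sketch below shows one possible way to normalize typed values before handing them to the reader. It is an illustration under assumptions, not Mosaic API: the helper `to_string_options` is hypothetical, and lowercasing booleans to `"true"`/`"false"` is assumed to match what the option parser accepts.

```python
from typing import Any, Dict


def to_string_options(opts: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical helper (not part of Mosaic): convert typed reader options
    into the String values that the multi_read_ogr reader expects."""
    out: Dict[str, str] = {}
    for key, value in opts.items():
        if isinstance(value, bool):
            # Check bool before the generic case; assumes lowercase booleans parse.
            out[key] = "true" if value else "false"
        else:
            # Integers such as chunkSize or layerNumber become their decimal text.
            out[key] = str(value)
    return out


options = to_string_options({
    "driverName": "ESRI Shapefile",  # placeholder GDAL driver name
    "vsizip": True,
    "asWKB": False,
    "chunkSize": 5000,
    "layerNumber": 0,  # zero-indexed
})
print(options)
```

In PySpark, the resulting dict could then be passed as `spark.read.format("multi_read_ogr").options(**options).load(path)`, where `path` and the driver name are placeholders for your own data.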