pyspark.sql.functions.kll_merge_agg_double
- pyspark.sql.functions.kll_merge_agg_double(col, k=None)
Aggregate function: merges binary KllDoublesSketch representations and returns the merged sketch. The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch.
New in version 4.1.0.
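- Parameters
col: The column containing the binary KllDoublesSketch representations to merge.
k: Optional. Controls the size and accuracy of the merged sketch (range 8-65535). If omitted, the merged sketch adopts the k value from the first input sketch.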
- Returns
Column: The merged binary representation of the KllDoublesSketch.
Examples
>>> from pyspark.sql import functions as sf
>>> df1 = spark.createDataFrame([1.0,2.0,3.0], "DOUBLE")
>>> df2 = spark.createDataFrame([4.0,5.0,6.0], "DOUBLE")
>>> sketch1 = df1.agg(sf.kll_sketch_agg_double("value").alias("sketch"))
>>> sketch2 = df2.agg(sf.kll_sketch_agg_double("value").alias("sketch"))
>>> merged = sketch1.union(sketch2).agg(sf.kll_merge_agg_double("sketch").alias("merged"))
>>> n = merged.select(sf.kll_sketch_get_n_double("merged")).first()[0]
>>> n
6