• Calculating the percentile

    From Anton Shepelev@21:1/5 to All on Mon Mar 11 14:42:48 2024
    Hello, all (which is how many?)

    I am trying to solve a trivial problem, and my solution
    turns out unexpectedly cumbersome and likeky suboptimal.
    Given a table storing values (val) over dimension (dim), I
    want to query the total, mean and median of the value over
    the dimension. My solution is:

    SELECT
    dim ,
    SUM(val) tot,
    AVG(val) avg,
    med
    FROM
    ( SELECT
    dim,
    val,
    PERCENTILE_CONT( 0.5 )
    WITHIN GROUP (ORDER BY val)
    OVER (PARTITION BY dim) AS med
    FROM #test
    ) w_med
    GROUP BY dim, med

    I had to wrap the percentile calculation into a subquery
    (w_med) because PERCENTILE_CONT() does not seem to work like
    a typical aggregate function, accepting the dimension in a
    separate (and mandatory) OVER clause. The subquery then must
    be GROUPed not only by the dimension, but also by the
    calculated median value, which is redundant because there is
    only one median for each dimension. Is there a better way?

    --
    () ascii ribbon campaign -- against html e-mail
    /\ www.asciiribbon.org -- against proprietary attachments

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Erland Sommarskog@21:1/5 to Anton Shepelev on Thu Mar 14 14:55:22 2024
    Anton Shepelev (anton.txt@g{oogle}mail.com) writes:
    I am trying to solve a trivial problem, and my solution
    turns out unexpectedly cumbersome and likeky suboptimal.
    Given a table storing values (val) over dimension (dim), I
    want to query the total, mean and median of the value over
    the dimension. My solution is:

    SELECT
    dim ,
    SUM(val) tot,
    AVG(val) avg,
    med
    FROM
    ( SELECT
    dim,
    val,
    PERCENTILE_CONT( 0.5 )
    WITHIN GROUP (ORDER BY val)
    OVER (PARTITION BY dim) AS med
    FROM #test
    ) w_med
    GROUP BY dim, med


    If you are on SQL 2022, you can use the new function APPROXIMATIVE_PERCENTILE_CONT which is a regular aggregate function.
    But as the name says, it is approximative. It's intended for large
    data sets where exact calculation would take a lot of time. I don't
    know how well it works on smaller data sets.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)