《Pro Oracle SQL》Chapter 八 - 8.7 Other Analytic Functions 之三

《Pro Oracle SQL》Chapter 8 -- 8.7 Other Analytic Functions 之三

NTILE       (page 263)
    The  NTILE  function divides an ordered set of rows in a data partition, groups them in to buckets , and
assigns a unique group number to each group.
This function is useful in statistical analysis. For example,
if you want to remove the outliers (values that are outside the norm), you can group them in the top or
bottom buckets and eliminate those values from the statistical analysis. Oracle Database statistics
collection packages also use NTILE  functions to calculate histogram boundaries. In statistical
terminology, the  NTILE  function creates equi-width histograms. 

     NTILE函数在一数据分区中划分一有序行集,将他们分组成 ,再赋予每一组唯一的组号。该函数在统计分析中很有用。 例如,如果你想要去除极端值(超出正常范围的值),你能把它们归组到头或者末位桶,再从统计分析的数据中把它们清除掉。Oracle数据库统计集合包也使用NTILE函数计算柱状图边界。用统计术语说,NTILE函数创建等宽柱状图。
    The number of buckets is passed as the argument to this analytic function. For example, ntile(100)
will group the rows into 100 buckets, assigning an unique number for each bucket. This function does
not support windowing clauses, however.
    桶的数量作为参数传给该分析函数。例如,ntile(100)将把行集分成100桶,赋予每桶唯一的数。然而,该函数不支持开窗子句。
    In the Listing 8-18, you split a data partition into 10 buckets using the clause  ntile (10).  Rows are
sorted by the Sale column in the descending order. The NTILE  function groups rows into buckets with
each bucket containing equal number of rows. Since the rows are sorted by the Sale column values in
descending order, rows with lower group numbers have higher Sale column value. Outliers in the data
can be easily removed with this technique. 
    在列表8-18中,你使用子句ntile(10)划分数据区成10桶。行集按Sale列降序排列。NTILE函数将行集归组成桶,使得各桶包含相等数量的行。由于行集按Sale列降序排列,低组号的行集对应的Sale列值高。使用这种技术数据区中的极端值容易被去除。
    There may be a row count difference of at most 1 between the buckets if the rows can not be divided
equally.
In this example, rows for the year= 2001 is divided in to 10 buckets, each bucket having 5 rows,
but the last bucket 10 has only 4 rows.
    如果行集不能均匀划分可能使得桶之间的行数最多相差1。 在本例中,year=2001的行集划分成10桶,每桶有5行,但是最后第10桶只有4行。
Listing 8-18.  NTILE Function
 1  select  year, week,sale,
 2     ntile (10) over(
 3            partition by product, country, region , year
 4            order by sale desc
 5            ) group#
 6    from sales_fact
 7*   where country in ('Australia')  and product ='Xtend Memory'
 
 YEAR WEEK       SALE     GROUP#
----- ---- ---------- ----------
 2001   16     278.44          1
 2001    4     256.70          1
 2001   21     233.70          1
 2001   48     182.96          1
 2001   14     162.91          1
...
 2001   52      23.14          9
 2001   50      23.14         10
 2001    6      22.44         10
 2001   23      22.38         10
 2001   18      22.37         10
 
    The  NTILE  function is useful in real world applications such as dividing total work among N parallel
processes. Let’s say you have ten parallel processes; you can divide the total work into 10 buckets and
assign each bucket to a process.  
    NTILE 函数在现实应用中很有用,诸如划分整体工作成N个并发的进程。我们说你有十个并发的进程;你可划分整个工作成10个桶然后赋予每个桶一个进程。

Stddev
    The  stddev function can be used to calculate standard deviation among a set of rows in a data partition or in the result set if no partitioning clause is specified. This function calculates the standard deviation, defined as square root of variance, for a data partition specified using a partitioning clause. If
partitioning clause is not specified, this function calculates the  stddev for all rows in the result set.
    stddev能被用于 计算 在一数据分区,或者若没有指定分区子句则在结果集,的行集中的标准差。 对于一使用分区子句指定的数据分区,该函数计算标准差,定义平方根的变化 。如果分区子句没有指定,该函数计算结果集所有行的标准差。
    In the Listing 8-19, the clause  stddev (sale) is calculating the stddev on Sale column among the
rows in a data partition. Partitioning clause  partition by product, country, region, year specifies
the partitioning columns. The windowing clause  rows between unbounded preceding and unbounded
following  specifies the window as all rows in that data partition. Essentially, this SQL is calculating the
standard deviation on Sale column amongst all rows in a data partition. 
    在列表8-19中,子句stdev(sale)计算在一数据分区行集中Sale列的标准差。分区子句 partition by product, country, region, year 指定分区列。开窗子句 rows between unbounded preceding and unbounded following指定那个数据分区的所有行为窗口。本质上,这条SQL计算在数据分区上所有行间Sale列的标准差。
    Standard deviation can be calculated at coarser or granular level by specifying appropriate
partition-by  clause and windowing clause.
    能够通过指定适当的分区子句和开窗子句在粗粒度细粒度 层上计算标准差。

Listing 8-19.  STDDEV Function
1  select  year, week,sale,
2     stddev (sale) over(
3            partition by product, country, region , year
4            order by Sale desc
5            rows between unbounded preceding and unbounded following
6            ) stddv
7    from sales_fact
8    where country in ('Australia')  and product ='Xtend Memory'
9* order by year, week
 
 YEAR WEEK       SALE      STDDV
----- ---- ---------- ----------
 ...
 2000   50      21.19 49.8657423
 2000   52      67.45 49.8657423
 2001    1      92.26 59.1063592
 2001    2     118.38 59.1063592
 2001    3      47.24 59.1063592
 ...
    There are various other statistics functions that can be used to calculate statistical metrics; for
example, stddev_samp  calculates the cumulative sample standard deviation ,  stddev_pop calculates the
population standard deviation , etc. Detailed discussion about various statistics functions is out of the
scope of this book, however.
    有多种其他的统计函数能用来计算统计标准;例如stddev_samp计算 累积样本标准差 ,stddev_pop计算总体标准差 ,等等。然而,关于 各种统计函数的 详细讨论超出了本书的范围。
 

Listagg 
    Oracle Database version 11gR2 introduced another analytic function, the Listagg  function, which is very useful in string manipulation. This analytic function provides the ability to convert column values from
multiple rows in to a list format.
For example, if you want to concatenate all the employee names in a
department, then you can use this function to concatenate all names in to a list.
   
Oracle数据库版本11gR2 引入了另一个分析函数,Listagg函数,在字符串操作上十分有用。该分析函数提供把多行列值转换成一格式化列表的能力。 例如,如果你想连结某部门的所有员工名字,你就能用这个函数吧所有名字连成一列表。
Syntax for this function is of the format:     该函数的语法格式:
 
Listagg (string, separator ) within group (order-by-clause)
 Over (partition-by-clause )

 
    Syntax for the Listagg  function uses the clause  within group (order-by-clause)  to specify sorting
order. This clause is similar to  order-by  clause in other analytic functions. The first argument to this
function is the string or column name to concatenate. The second argument is the separator for the
values.
In the Listing 8-20, the partitioning clause is not specified and rows are ordered by the Country
column in the descending order. The output shows that country names are converted to a list separated
by comma.
   
Listagg  函数使用子句 within group (order-by-clause) 定义排序顺序。该子句类似于 其他分析函数中的order-by子句。该函数的第一个参数是要连结的字符串或者列名 。第二个参数是这些值的分隔符。 在列表8-20中,分区子句没有指定且行按照Country列降序排列。输出展示,国家名称被转换成由逗号分隔的列表了。
    Note that Listagg  function does not support windowing clauses.

    注意Listagg函数也不支持开窗子句。
Listing 8-20.  LISTAGG  Function
  1  select listagg (country, ',') 
  2    within group (order by country desc)
  3   from (
  4    select distinct country from sales_fact
  5    order by country
  6*  )
 
LISTAGG(COUNTRY,',')WITHINGROUP(ORDERBYCOUNTRYDESC)                             
--------------------------
United States of America,United Kingdom,Turkey,Spain,Singapore,
Saudi Arabia,Poland,New Zealand, Japan,Italy,Germany,France,
Denmark,China,Canada,Brazil,Australia,Argentina