When you pivot data, the values from the selected column or columns become new columns in the dataset, each of which contains a summary calculation that you specify. |
Tip: This transform is very useful in Preview mode for quick discovery and analysis. |
pivot col: Dates value:SUM(Sales) group: prodId |
Output: Reshapes your dataset to include a ProdId
column and new columns for each distinct value in the Dates
column. Each distinct ProdId
value is represented by a separate row in the reshaped dataset. Within each row, the new columns contain a sum of sales for the product for each date in the dataset.
pivot col:column_ref value: FUNCTION(arg1,arg2) [group: group_col] [limit:int_num] |
Token | Required? | Data Type | Description |
---|---|---|---|
pivot | Y | transform | Name of the transform |
col | N | string | Source column name or names |
value | N | string | Expression containing the aggregation function or functions used to pivot the source column data |
group | N | string | Column name or names containing the values by which to group for calculation |
limit | N | integer (positive) | Maximum number of unique values in a group. Default is 50. |
Identifies the column or columns to which to apply the transform.
Single-column example:
The specified column contains the values that become new columns in the dataset.
pivot col: autoBrand value:AVERAGE(autoPrice) group:State |
Output: Reshapes the dataset to contain the State
column followed by columns for each distinct value in the autoBrand
column. Each value in those columns is the average value in the autoPrice
column, as grouped by State
value.
Multi-column example:
For multiple input columns, the transform generates new columns for each combination of the inputs. See the example below.
Required? | Data Type |
---|---|
No | String (column name) |
For this transform, the value
parameter contains the aggregation function and its parameters, which define the set of rows to which the function is applied.
NOTE: For the |
Required? | Data Type |
---|---|
No | String containing a list of aggregation functions each containing one column reference |
For this transform, this parameter specifies the column whose values are used to group the dataset prior to applying the specified function. You can specify multiple columns as comma-separated values.
Required? | Data Type |
---|---|
No | String (column name) |
The limit
parameter defines the maximum number of unique values permitted in a group. If it is not specified, the default value for this parameter is 50
.
NOTE: Be careful setting this parameter too high. In some cases, the application can run out of memory generating the results, and your results can fail. |
Required? | Data Type |
---|---|
No. Default value is 50 . | Integer (positive) |
Source:
The following dataset contains information about sales across one weekend and two states for three different products.
Date | State | ProdId | Sales |
---|---|---|---|
3/9/16 | CA | Big Trike 9000 | 500 |
3/9/16 | NV | Fast GoKart 5000 | 200 |
3/9/16 | CA | SuperQuick Scooter | 700 |
3/9/16 | CA | Fast GoKart 5000 | 900 |
3/9/16 | NV | Big Trike 9000 | 300 |
3/9/16 | NV | SuperQuick Scooter | 250 |
3/10/16 | NV | Fast GoKart 5000 | 50 |
3/10/16 | NV | Big Trike 9000 | 400 |
3/10/16 | NV | SuperQuick Scooter | 150 |
3/10/16 | CA | Big Trike 9000 | 600 |
3/10/16 | CA | SuperQuick Scooter | 800 |
3/10/16 | CA | Fast GoKart 5000 | 1100 |
Transformation 1: Sum of sales by State for each Date
Apply this transformation in Preview only, just so you can see the results:
Cancel the transformation update.
State | sum_Sales_03/09/2016 | sum_Sales_03/10/2016 |
---|---|---|
CA | 2100 | 2500 |
NV | 750 | 600 |
Transform 2: Sum of Sales by Date for each State
Date | sum_Sales_CA | sum_Sales_NV |
---|---|---|
03/09/2016 | 2100 | 750 |
03/10/2016 | 2500 | 600 |
Cancel the transformation update again.
Transform 3: Sum of Sales by product ID for each State
ProdId | sum_Sales_CA | sum_Sales_NV |
---|---|---|
Big Trike 9000 | 1100 | 700 |
Fast GoKart 5000 | 2000 | 500 |
SuperQuick Scooter | 1500 | 300 |
The Pivot Columns transformation supports the ability to use the values in multiple columns to specify the columns that are generated. When two or more columns are used in the pivot, columns containing values of all possible combinations from the source columns are generated.
In the following source data, sales data on individual products is organized by brand, product name, and month:
Brand | Product | Month | Sales |
---|---|---|---|
AAA | Towels | January | 95 |
AAA | Napkins | January | 113 |
B | Towels | January | 99 |
B | Tissues | January | 88 |
AAA | Towels | February | 108 |
AAA | Napkins | February | 91 |
B | Towels | February | 85 |
B | Tissues | February | 105 |
AAA | Towels | March | 81 |
AAA | Napkins | March | 92 |
B | Towels | March | 112 |
B | Tissues | March | 104 |
Transformation:
If you wanted to create summary sales information for each product by month, you might choose to create a pivot
on the Product
column. However, if you look at the column values, you might notice that the Paper towels
product is available for both the AAA
and B
brands. In this case, you must perform a multi-column pivot on these two columns, like the following:
Results:
Month | avg_Sales_B_Towels | avg_Sales_B_Tissues | avg_Sales_AAA_Napkins | avg_Sales_AAA_Towels |
---|---|---|---|---|
January | 99 | 88 | 113 | 95 |
February | 85 | 105 | 91 | 108 |
March | 112 | 104 | 92 | 81 |
You can use the pivot
transform to perform aggregation calculations on values in a column.
Source:
In the following table, you can review test results from three different tests for 10 students:
StudentId | TestId | Score |
---|---|---|
s001 | t001 | 98 |
s001 | t002 | 98 |
s001 | t003 | 87 |
s002 | t001 | 92 |
s002 | t002 | 96 |
s002 | t003 | 79 |
s003 | t001 | 99 |
s003 | t002 | 76 |
s003 | t003 | 94 |
s004 | t001 | 93 |
s004 | t002 | 99 |
s004 | t003 | 80 |
s005 | t001 | 79 |
s005 | t002 | 80 |
s005 | t003 | 84 |
s006 | t001 | 93 |
s006 | t002 | 74 |
s006 | t003 | 89 |
s007 | t001 | 86 |
s007 | t002 | 81 |
s007 | t003 | 97 |
s008 | t001 | 82 |
s008 | t002 | 73 |
s008 | t003 | 79 |
s009 | t001 | 96 |
s009 | t002 | 97 |
s009 | t003 | 79 |
s010 | t001 | 80 |
s010 | t002 | 79 |
s010 | t003 | 75 |
Transformation:
For each student, you're interested in a student's average score and maximum score. You create the following pivot transform, which groups computations of average score and maximum score columns by studentId
value.
For the generated average column, you want results rounded to two decimal places:
Results:
StudentId | average_Score | max_Score |
---|---|---|
s001 | 94.33 | 98 |
s002 | 89 | 96 |
s003 | 89.67 | 99 |
s004 | 90.67 | 99 |
s005 | 81 | 84 |
s006 | 85.33 | 93 |
s007 | 88 | 97 |
s008 | 78 | 82 |
s009 | 90.67 | 97 |
s010 | 78 | 80 |