Step 5: Creating Analytical Questions

Now that you have provisioned datasets to the clean room, you are ready to create an analytical question to get some insights.

Many customers will build templated questions for their organizations and then provision them to several clean rooms across partners. However, this tutorial assumes we are creating a clean room question for the clean room we created in step 4. The difference is that we specify the clean room ID in which to create the question.

This tutorial also assumes we are using only the data we provisioned in step 4 because we have the power to assign those datasets. In real clean room use cases where you use both your datasets and partner datasets, you may need to ask partners to assign their datasets once before running questions programmatically.

This example draws from LiveRamp's Quick Start Insights product to create an analytics question for API testing by recreating the "What is the ROAS of my campaign based on first touch with custom attribution window?" question from these clean rooms. This example illustrates several concepts for authoring questions:

  • Dataset types: Macros that are stand-ins for datasets, which will be assigned. Their use means this type of question can be reused regardless of which datasets you will assign.
  • Runtime parameters: Macros that are used to stand-in for variable inputs and may change each time you execute the question
  • Dimensions: Outputs on which to group results by
  • Measures: Aggregations you perform on the datasets

The following example uses the questions endpoint.

For more information, see the POST /question call.

Define an Analytical Question

### Defining an Analytics Question

### Define the query string. Note: We are using real field names rather than macros because we know which datasets will be assigned.

query_str = """
with traffic AS (
  SELECT
    a.cid as pub_cid,
    b.rampid as pub_rampid,
    a.advertiser_name,
    a.campaign_name,
    event_type,
    a.server_timestamp AS traffic_dt,
    a.impression_id
  FROM @adlog as a
  join @pubmap as b on a.cid = b.cid
  where
   a.cid IS NOT NULL and b.cid IS NOT NULL
   and b.rampid != 'UNKNOWN' and b.rampid is not null and b.rampid != '0' and b.rampid != 'UNMATCHED'),
conversion AS (
  SELECT
    a.transaction_id,
    a.gross_amt,
    a.cid as rmn_cid,
    b.rampid as rmn_rampid,
    a.conversion_timestamp as conv_dt
  FROM
    (select transaction_id, cid, conversion_timestamp, sum(gross_amt) as gross_amt from @transaction group by 1,2,3) a
    join @rmnmap b on a.cid = b.cid
  where
   a.cid IS NOT NULL and b.cid IS NOT NULL
   and b.rampid != 'UNKNOWN' and b.rampid is not null and b.rampid != '0' and b.rampid != 'UNMATCHED'),
matched_latency AS (
  SELECT
    c.transaction_id,
    t.impression_id,
    t.pub_cid,
    t.advertiser_name,
    t.campaign_name,
    (case when t.event_type = 'click' then 1 else 2 end) AS match_type,
    (conv_dt - traffic_dt) / 86400 AS match_age,
    max(c.gross_amt) as gross_amt
  FROM
    conversion c
    JOIN traffic t ON c.rmn_rampid = t.pub_rampid
  group by 1,2,3,4,5,6,7),
first_ranked AS (
  SELECT
    advertiser_name,
    campaign_name,
    transaction_id,
    ROW_NUMBER() OVER(
      PARTITION BY transaction_id
      ORDER BY
        match_age DESC, -- first touch
        match_type ASC
    ) AS match_rank,
    match_age,
    match_type,
    gross_amt,
    pub_cid
  FROM
    matched_latency
  WHERE
    (match_type=1 and match_age BETWEEN 0 AND @click_attribution_window)
    OR (match_type=2 and match_age BETWEEN 0 AND @imp_attribution_window)
),
mc_summary as (
  SELECT 
    sum(gross_amt) as mc_gross_amt,
    count(distinct transaction_id) as mc_conversions
  from
    (SELECT
      a.transaction_id,
      a.cid as rmn_cid,
      max(a.gross_amt) as gross_amt
    FROM
      (select transaction_id, cid, conversion_timestamp, sum(gross_amt) as gross_amt from @transaction group by 1,2,3) a
      join @rmnmap b on a.cid = b.cid
      join @pubmap c on b.rampid = c.rampid
    where
      a.cid IS NOT NULL and b.cid IS NOT NULL
      and b.rampid != 'UNKNOWN' and b.rampid is not null and b.rampid != '0' and b.rampid != 'UNMATCHED'
    group by 1,2)
    ),
tc_summary as (
  SELECT 
    sum(gross_amt) as tc_gross_amt,
    count(distinct transaction_id) as tc_conversions
  from @transaction
  where 
    cid IS NOT NULL
),
scaling_factor as (
  SELECT 
    tc.tc_gross_amt / mc.mc_gross_amt AS gross_amt_ratio,
    tc.tc_conversions / mc.mc_conversions AS conversions_ratio
  FROM 
    tc_summary as tc, 
    mc_summary as mc
),
spend as (
  SELECT 
    advertiser_name,
    campaign_name,
    sum(cost) as cost
  FROM @adlog
  WHERE 
    cid IS NOT NULL
  group by 1,2
)
Select
    advertiser_name,
    campaign_name,
    'First Touch' as attribution_model,
    SUM(gross_amt) as attributed_gross_amt,
    SUM(gross_amt) * (select gross_amt_ratio from scaling_factor) as attributed_gross_amt_scaled,
    count(distinct transaction_id) as attributed_conversions,
    count(distinct transaction_id) * (select conversions_ratio from scaling_factor) as attributed_conversions_scaled,
    max(cost) as cost
from first_ranked
join spend using (advertiser_name, campaign_name)
WHERE match_rank = 1
GROUP BY 1,2,3
""" 

Define question details

# Define question details which will go into the body of the API call. Note: We do not need to define dataSetTypeFields because we referenced known schemas in the query.

question_details = {
    "questionDetails" : {
        "title" : "API Tutorial: What is the ROAS of my campaign based on first touch with custom attribution window?",
        "category" : "API Tutorial",
        "tags" : ["API Tutorial"],
        "description" : "API Tutorial ROAS question",
        "hasDateParameters" : "false",
        "hasDateFilter" : "false",
        "isAnalytics" : "true",
        "isUserListQuestion" : "false",
        "isCleanComputeQuestion" : "false",
        "outputFormat" : "REPORT",
        "visualization" : "NO_CHART",
        "hasMultipleOutputs" : "false",
        "queryDetails" : [
            {"query": query_str, 
             "cleanRoomType": "Hybrid", 
             "queryLanguage": "SQL"}
            ],
        "dataTypes" : [
            {
                "macro": "adlog",
                # Replace with the dataset ID for publisher adlogs in your own tutorial clean room.
                "datasetID": "<YOUR dataset id>",
                "importDataType": "Generic",
                "dataSetTypeFields": []
            },
            {
               "macro": "pubmap",
                # Replace with the dataset ID for the publisher synthetic rampid mapping in your own tutorial clean room.
                "datasetID": "<YOUR dataset id>",
                "importDataType": "Generic",
                "dataSetTypeFields": []
            },
            {
                "macro": "transaction",
                # Replace with the dataset ID for the advertiser transaction synthetic dataset in your own tutorial clean room.
                "datasetID": "<YOUR dataset id>",
                "importDataType": "Generic",
                "dataSetTypeFields": []
            },
            {
            "macro": "rmnmap",
                # Replace with the advertiser RampID mapping dataset ID for this synthetic dataset in your own tutorial clean room.
                "datasetID": "<YOUR dataset id>",
                "importDataType": "Generic",
                "dataSetTypeFields": []
            }
        ],
        "parameters" : [
            {
                "name" : "click_attribution_window",
                "type" : "INTEGER"
            },
            {
                "name" : "imp_attribution_window",
                "type" : "INTEGER"
            }
        ],
        "measures" : [
            {
                "name" : "ATTRIBUTED_GROSS_AMT_SCALED",
                "type" : "DOUBLE",
                "isIdentity" : "false",
                "noise" : "false"
            },
            {
                "name" : "ATTRIBUTED_GROSS_AMT",
                "type" : "DOUBLE",
                "isIdentity" : "false",
                "noise" : "false"
            },
            {
                "name" : "ATTRIBUTED_CONVERSIONS",
                "type" : "LONG",
                "isIdentity" : "false",
                "noise" : "false"
            },
            {
                "name" : "ATTRIBUTED_CONVERSIONS_SCALED",
                "type" : "DOUBLE",
                "isIdentity" : "false",
                "noise" : "false"
            },
            {
                "name" : "COST",
                "type" : "DOUBLE",
                "isIdentity" : "false",
                "noise" : "false"
            }
        ],
        "dimensions" : [
            {
                "name" : "ADVERTISER_NAME",
                "type" : "STRING"
            },
            {
                "name" : "CAMPAIGN_NAME",
                "type" : "STRING"
            },
            {
                "name" : "ATTRIBUTION_MODEL",
                "type" : "STRING"
            }
        ],
        "userListOutputs" : []
        
    },
    "id" : "",
    "cleanRoomID" : "<YOUR cleanRoomID>",
    "organizationID" : "<YOUR organizationID>",
    "settings" : {
        "queryValidation" : "false"
    },
    "cleanRoomID" : "<YOUR cleanRoomID>"
    
}

Create the Question

# Create the question

response = cleanroom_api.create_questions(question_details)
print(json.dumps(response, indent=4))

# Don't forget to capture the cleanroom question ID in the response.

Note: For list and Clean Compute questions, the body format is roughly the same. The main difference is whether the question uses isUserListQuestion or isCleanComputeQuestion.