Step 4: Provisioning Datasets

Now that you've created some data connections for this tutorial's synthetic dataset library, you need to configure them as datasets for use in the clean room that you created in step 3. To provision these data connections as datasets, you need the following IDs:

  • The clean room ID generated in step 3
    • This can be found via a GET call to the API or in the URL that is displayed when you enter the clean room in the UI.
  • The data connection IDs for the data connections you wish to configure
    • These can be found either by performing a GET operation that returns all data connections available to your organization or in the URL shown in the UI when viewing the job's details. For example, d1fb5f7b-5704-4b11-9e64-58b391ca3f84 is the data connection ID in the following URL: https://connect.liveramp.com/cleanroom/ingestion/jobs/d1fb5f7b-5704-4b11-9e64-58b391ca3f84
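
As a small illustration of the URL approach, the ID is the final path segment of the job details URL. The sketch below pulls it out with Python's standard library. The helper name `id_from_url` is hypothetical and is not part of any LiveRamp client. It assumes the ID is always the last segment, as in the example URL above.

```python
from urllib.parse import urlparse


def id_from_url(url: str) -> str:
    """Return the last non-empty path segment of a URL (hypothetical helper)."""
    path = urlparse(url).path
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        raise ValueError(f"No path segments found in URL: {url!r}")
    return segments[-1]


# Example URL taken from the text above
job_url = "https://connect.liveramp.com/cleanroom/ingestion/jobs/d1fb5f7b-5704-4b11-9e64-58b391ca3f84"
data_connection_id = id_from_url(job_url)
print(data_connection_id)
```

Copying the ID by hand from the browser works just as well; a helper like this is mainly useful when you have several URLs to process.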

This example uses the cleanrooms endpoint to provision eight of the synthetic data connections as datasets in the tutorial clean room.

For more information, see the following calls:


# Define the data connections you need to provision. Remember to replace these IDs with those found for the data connections for your organization.
data_connection_ids = [
    "<YOUR Synthetic Data Connection ID 1>",
    "<YOUR Synthetic Data Connection ID 2>",
    "<YOUR Synthetic Data Connection ID 3>",
    "<YOUR Synthetic Data Connection ID 4>",
    "<YOUR Synthetic Data Connection ID 5>",
    "<YOUR Synthetic Data Connection ID 6>",
    "<YOUR Synthetic Data Connection ID 7>",
    "<YOUR Synthetic Data Connection ID 8>"
]

# Give RampID consent

ramp_id_consent = {
    "rampIDConsent" : "true"
}

# Loop over the data connections and provision each one to the clean room. Replace the clean room ID with your own clean room ID.

responses = {}

for data_connection_id in data_connection_ids:
    response = cleanroom_api.create_cleanroom_datasets("<YOUR clean room id>", data_connection_id, ramp_id_consent)

    # Store each response, keyed by its data connection ID
    responses[data_connection_id] = response

    # Print formatted response (json is assumed to have been imported in an earlier step)
    print(json.dumps(response, indent=4))
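
Before running the provisioning loop, it can help to confirm that every placeholder in `data_connection_ids` has been replaced. The sketch below is one way to do that, under the assumption that data connection IDs are UUIDs like the example ID shown earlier in this step. The function name `find_invalid_ids` is hypothetical.

```python
import uuid


def find_invalid_ids(ids):
    """Return the entries that do not parse as UUIDs (hypothetical helper)."""
    invalid = []
    for candidate in ids:
        try:
            uuid.UUID(candidate)
        except (ValueError, AttributeError, TypeError):
            # Unreplaced placeholders and non-string values end up here
            invalid.append(candidate)
    return invalid


# Sample input: one real-looking ID and one unreplaced placeholder
sample_ids = [
    "d1fb5f7b-5704-4b11-9e64-58b391ca3f84",
    "<YOUR Synthetic Data Connection ID 2>",
]
invalid_ids = find_invalid_ids(sample_ids)
if invalid_ids:
    print("Replace these entries before provisioning:", invalid_ids)
```

If your organization's IDs use a different format, adjust or skip this check.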

(Optional) Further Dataset Filtering and Configuration

For some datasets, you may want to exclude certain fields or filter on certain values, depending on your goals with your partners. This example shows how to apply this type of configuration to one of the datasets provisioned above.

Important: Replace the placeholder IDs with your own IDs.

# Define the ID of the dataset you want to update. You can retrieve it with a GET call or from the URL of the dataset's Edit configuration page in the UI, where it is the last ID in the URL.

# dataset_id = "<YOUR dataset id>"

# Define dataset field mappings as an array
# cleanroom_dataset_details = {
#    "datasetFieldMappings" : [
#        {
#        "fieldIndex" : "1",
#        "fieldName" : "CID",
#        "fieldLabel" : "CID",
#        "isExcluded" : "false",
#        "rename" : ""
#        },
#        {
#        "fieldIndex" : "2",
#        "fieldName" : "age",
#        "fieldLabel" : "age",
#        "isExcluded" : "false",
#        "rename" : ""
#        },
#        {
#        "fieldIndex" : "3",
#        "fieldName" : "education",
#        "fieldLabel" : "education",
#        "isExcluded" : "false",
#        "rename" : ""
#        },
#        {
#        "fieldIndex" : "4",
#        "fieldName" : "income_range",
#        "fieldLabel" : "income_range",
#        "isExcluded" : "false",
#        "rename" : ""
#        },
#        {
#        "fieldIndex" : "5",
#        "fieldName" : "num_of_children",
#        "fieldLabel" : "num_of_children",
#        "isExcluded" : "false",
#        "rename" : ""
#        },
#        {
#        "fieldIndex": "6",
#        "fieldName" : "sex",
#        "fieldLabel": "sex",
#        "isExcluded": "true",
#        "rename" : ""
#        }
#    ],
#    "datasetFilters":[
#        {
#        "datasetId": "<YOUR dataset id>",
#        "datasetFieldName": "num_of_children",
#        "operator": "EQUALS",
#        "values": ["2"]
#        }
#    ]
#}

# Update dataset

# response = cleanroom_api.put_cleanroom_datasets("<YOUR cleanroom_id>",dataset_id,cleanroom_dataset_details)
# print(json.dumps(response, indent=4))
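
The field mappings above repeat the same structure for every field, so they can also be generated from a list of field names. The following is a minimal sketch that reproduces the keys and string values used in the example above. The helper name `build_field_mappings` is hypothetical, and it assumes that field indexes start at 1 and follow the order of the list.

```python
import json


def build_field_mappings(field_names, excluded=()):
    """Build a datasetFieldMappings list, excluding the named fields (hypothetical helper)."""
    excluded = set(excluded)
    return [
        {
            "fieldIndex": str(index),
            "fieldName": name,
            "fieldLabel": name,
            "isExcluded": "true" if name in excluded else "false",
            "rename": "",
        }
        for index, name in enumerate(field_names, start=1)
    ]


# Field names taken from the example above; "sex" is the excluded field
fields = ["CID", "age", "education", "income_range", "num_of_children", "sex"]
mappings = build_field_mappings(fields, excluded=["sex"])
print(json.dumps(mappings, indent=4))
```

The resulting list can be used as the value of "datasetFieldMappings" in cleanroom_dataset_details.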