Data Specification

Deprecation Notice

Enriched Events Export replaces Results Export and Raw Events Export as Optimizely's source-of-truth events dataset. Results Export and Raw Events Export will no longer be supported beginning November 15, 2020. Please refer to the Enriched Events Export documentation for how to get started.

Technical details

The Raw Events Export process generates multiple files containing the raw event data collected during the previous UTC day (00:00 - 23:59 UTC). The files are tab-delimited and compressed with gzip for faster download. The first file in each daily partition includes a header row.
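As an illustration of the layout described above, the following minimal sketch reads the gzip-compressed, tab-delimited parts of one daily partition. It assumes the files have already been downloaded locally and that `paths` is ordered so the part carrying the header row comes first; the function name `read_daily_partition` is hypothetical and not part of any Optimizely tooling.

```python
import csv
import gzip
from typing import Dict, Iterable, Iterator


def read_daily_partition(paths: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row across the gzip-compressed TSV parts.

    Assumption: the first path is the part that includes the header row,
    and every later part contains data rows only.
    """
    header = None
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
            # QUOTE_NONE keeps the double quotes inside JSON columns intact.
            reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
            for row in reader:
                if header is None:
                    header = row  # first row of the first file
                    continue
                yield dict(zip(header, row))
```

Very large user_features or event_features values may exceed the csv module's default field size limit; `csv.field_size_limit` can raise it if needed.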

AWS S3

This section describes the Raw Events Export files that you’ll retrieve from your Optimizely AWS S3 bucket. See the AWS documentation for more information about making requests to Amazon S3.

The S3 file location follows the format:

s3://optimizely-export-ng/{accountId}/{projectId}/2.0/yyyy/mm/dd/{experimentId}/{fileName}

Legend

  • optimizely-export-ng: S3 bucket name
  • accountId: Unique account identifier
  • projectId: Unique project identifier
  • 2.0: Export process version number
  • yyyy/mm/dd: Creation date of the export
  • experimentId: Unique experiment identifier
  • fileName: One or more file parts that store the results records in TSV format. The fileName follows the format experimentId-filepartnum-yyyy-mm-dd-r-reducernum.gz

Notes

  • If the events don't contain an experiment ID, Optimizely exports them in the following path: s3://optimizely-export-ng/{accountId}/{projectId}/2.0/yyyy/mm/dd/none/{fileName}
  • The unzipped files don't have a file extension but use the same TSV format.
  • As experiment record volume increases, the number of files generated may scale in a linear fashion.
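The path and file name formats above can be sketched in code as follows. This is an illustration only: the helper names are hypothetical, and the regular expression assumes that the part and reducer numbers are digits and that the experiment ID contains no hyphen, neither of which is stated explicitly in this specification.

```python
import re
from datetime import date
from typing import Optional

BUCKET = "optimizely-export-ng"
EXPORT_VERSION = "2.0"

# Assumed shape: experimentId-filepartnum-yyyy-mm-dd-r-reducernum.gz
FILE_NAME = re.compile(
    r"^(?P<experiment_id>[^-]+)-(?P<part>\d+)-"
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})-r-(?P<reducer>\d+)\.gz$"
)


def experiment_prefix(account_id, project_id, day: date,
                      experiment_id: Optional[str]) -> str:
    """Build the S3 folder for one experiment; events without an experiment ID use 'none'."""
    folder = experiment_id if experiment_id is not None else "none"
    return (f"s3://{BUCKET}/{account_id}/{project_id}/{EXPORT_VERSION}/"
            f"{day:%Y/%m/%d}/{folder}/")


def parse_file_name(file_name: str) -> Optional[dict]:
    """Split a file name into its parts, or return None if it does not match."""
    match = FILE_NAME.match(file_name)
    if match is None:
        return None
    return {
        "experiment_id": match["experiment_id"],
        "part": int(match["part"]),
        "date": date(int(match["year"]), int(match["month"]), int(match["day"])),
        "reducer": int(match["reducer"]),
    }
```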

Status file

The daily partition files are ready for import when the status.yaml file is available at

s3://optimizely-export-ng/{accountId}/{projectId}/2.0/yyyy/mm/dd/status.yaml

A status file (status.yaml) is included within each daily partition to track the success or failure of the Raw Events Export job. The status files contain the following information:

  • failed_exports: List of experiment IDs. Generally, this section will be empty.
  • successful_exports: List of experiment IDs.
  • timestamp: In UTC seconds since Unix epoch.

View a sample YAML file with and without failed export files.
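As a rough sketch of how an import process might use these three fields, the function below works on a status file that has already been parsed into a Python dict (for example with a YAML library of your choice). The function name and the returned keys are hypothetical.

```python
from datetime import datetime, timezone


def summarize_status(status: dict) -> dict:
    """Decide which experiments to import from a parsed status file.

    An empty YAML list may be parsed as None, so both None and [] are
    treated as "no entries".
    """
    failed = [str(x) for x in (status.get("failed_exports") or [])]
    succeeded = [str(x) for x in (status.get("successful_exports") or [])]
    return {
        # Import only experiments that succeeded and are not also listed as failed.
        "importable": [e for e in succeeded if e not in failed],
        "skipped": failed,
        # timestamp is in seconds since Unix epoch, UTC.
        "status_time": datetime.fromtimestamp(int(status["timestamp"]), tz=timezone.utc),
    }
```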

Export process notes

  • The Optimizely job kicks off at 03:00 UTC and processes the data received the previous day (00:00-23:59 UTC); process completion time depends on data volume.
  • If one project's folder for a given day is missing while folders for other projects are present for that day, the job succeeded and the missing project had no activity to export for that day.
  • If all folders for all projects for a single day are absent, that could mean:
    • There's no experiment activity across the account for that day.
    • The job is still running.
    • The job has failed (all files are transferred in one step).
  • If data files are present but the status.yaml file isn't, the status file may still be in progress, or the status file step itself may have failed (it's a separate step that runs after the data files are transferred).
  • Should a given day or project be reprocessed later, those experiment files will be deposited in the folder corresponding to the same date when the data was captured, not when the job was rerun.
  • Normally, if an experiment's file is listed as failed, the file should not be present. If a failed file is present, don't process the file. Optimizely will replace it once the job reruns successfully.
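The notes above amount to a small decision table, sketched below under some assumptions: `keys` is the list of object keys found under one project's day folder, `other_projects_have_data` says whether any other project in the account has a folder for the same day, and the state names are hypothetical labels rather than Optimizely terminology.

```python
from typing import Iterable


def day_partition_state(keys: Iterable[str], other_projects_have_data: bool) -> str:
    """Classify one project's daily partition from its object keys."""
    keys = list(keys)
    # Accept either spelling of the status file extension as a precaution.
    has_status = any(k.endswith(("status.yaml", "status.yml")) for k in keys)
    has_data = any(k.endswith(".gz") for k in keys)

    if has_status:
        # Ready for import; failed_exports in the status file still applies.
        return "ready"
    if has_data:
        # The status step runs after the data transfer and may be pending or failed.
        return "status_pending_or_failed"
    if other_projects_have_data:
        # The job succeeded; this project had nothing to export.
        return "no_activity"
    # No folders anywhere: no activity, job still running, or job failed.
    return "unknown_no_activity_running_or_failed"
```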

Schema field descriptions

As these files are TSVs, nulls will be empty tabs.
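For example, a line can be split on tabs and its empty fields mapped to None, as in this minimal sketch. The function name is hypothetical, and `columns` is assumed to come from the header row.

```python
from typing import Dict, Optional, Sequence


def split_event_line(line: str, columns: Sequence[str]) -> Dict[str, Optional[str]]:
    """Split one TSV line and turn empty fields into None."""
    # Strip only the line ending: stripping all whitespace would also
    # remove trailing tabs, which represent null values.
    values = line.rstrip("\r\n").split("\t")
    return {name: (value if value != "" else None)
            for name, value in zip(columns, values)}
```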

Field and Type

Description

timestamp
positive integer

The timestamp of when the event occurred in the browser or app. The format is a number representing the number of milliseconds since Unix epoch.

project_id
integer

Your Optimizely project ID on which the campaign and/or experiment lives.

campaign_id
integer

The campaign ID (also known as layer ID).
Web Experimentation and Personalization: Value can be found in the API Names tab.
Full Stack: Value can be found in the project's JSON data file.

experiment_id
integer

The experiment ID.
Web Experimentation and Personalization: Value can be found in the API Names tab.
Full Stack: Value can be found in the project's JSON data file.

variation_id
integer

The ID Optimizely uses to identify the variation the visitor saw.
Web Experimentation and Personalization: Value can be found in the API Names tab.
Full Stack: Value can be found in the project's JSON data file.

layer_holdback
string

Boolean value that indicates whether the visitor was placed in the campaign's or experiment's holdback group. Values are either true or false.

audience_names
alphanumeric string representing an array

An array containing the name of the audience for which the visitor qualified to be placed in the campaign and experiment.
Web Experimentation and Personalization: If your snippet masks descriptive names, this will be the audience ID (of the form [Aud 1234567890]). It can be mapped to Audience Name on the Campaign Overview screen, API Names tab.
Full Stack: This mapping is available in the project's JSON data file.

end_user_id
alphanumeric string

Alphanumeric string concatenated with a Unix timestamp. Example: oeu1460584472759r0.9885484367665214
Web Experimentation and Personalization: This is the anonymous optimizelyEndUserId value stored in a cookie and local storage. It represents a unique visitor.
Full Stack: This is the user ID provided by your app.

uuid
string

Ignore; null. uuid is not currently supported in Optimizely X.

session_id
alphanumeric string

A unique session identifier.
Web Experimentation and Personalization: Set to AUTO by default.
Full Stack: This is null and can be ignored.

snippet_revision
integer

Web Experimentation and Personalization: The revision number of the Optimizely snippet that was served in this visitor's browser.
Full Stack: The revision number of your datafile that was compiled into the SDK at the time of event firing.

user_ip
IPv4 address format

IP address of the visitor associated with this tracking call. If you employ IP anonymization, the last octet will be a 0 (zero) for all tracking calls made to Optimizely. The full IP address won't be stored anywhere and can't be retrieved later.

user_agent
alphanumeric string

Web Experimentation and Personalization: The userAgent header passed from the browser.
Full Stack: The package or code language that initiated this tracking call.

user_engine
alpha string

Language or stack in which the Optimizely snippet or SDK was served. For example, a value of js will be shown for the Web snippet.

referer
alphanumeric URL

Web Experimentation and Personalization: The referring URL in the browser.
Full Stack: This will be null and can be ignored.

global_holdback
alpha string

Ignore; will always be false. A global holdback isn't currently supported in Optimizely X.

event_type
alpha string

Web Experimentation and Personalization: The type of event recorded by Optimizely. Values are view_activated or other. view_activated indicates the activation of a page (view), and other could be a click or custom event. Refer to the event_name column for more details.
Full Stack: This will be null and can be ignored.
For all products, if the row represents a bucketing decision event, this field will be null.

event_name
alphanumeric string or integer

The API name of the click or custom event. event_name will be an alphanumeric string if event_type is other or an integer if event_type is view_activated.
Web Experimentation and Personalization: If event_type equals view_activated, this value will be the page ID.
For all products, if the row represents a bucketing decision event, this field will be null.

user_features
large alphanumeric string representing an array of JSON objects

Web Experimentation and Personalization: An array of JSON objects of Optimizely customer-defined behavioral attributes (if Personalization is enabled), custom dimensions and/or user attributes, and Optimizely standard segments. Each object will have a type, name, and value. These values are all optional.
Full Stack: This will be an array of JSON objects containing customer-defined attributes.
Optimizely default segments include:
  • first_session
  • browser_id
  • AdWords campaign value if source_type is campaign
  • device
  • source_type: traffic source
  • timestamp in seconds since Unix epoch
  • offset: number of minutes behind UTC, indicates the timezone in which the event was fired

active_views
string

Deprecated. For all products this field is null.

event_features
large alphanumeric string representing an array of JSON objects

Web Experimentation and Personalization: An array of JSON objects of any page or event tags or categories defined for this event.
Full Stack: An array of JSON objects containing customer-defined tags.
For all products, if the row represents a bucketing decision event, this field will be null.

event_metrics
alphanumeric string representing an array of JSON objects

If revenue is captured for this event, this is an array of JSON objects with revenue as the name and the amount in cents as the value. Note: Revenue will never be a decimal. Optimizely converts the value to cents and stores it as an integer.
For all products, if the row represents a bucketing decision event, this field will be null.

event_uuid
alphanumeric string

A unique identifier for this event. Clients usually set this value with any UUID-generating method. The field can be used to deduplicate events that are accidentally or erroneously replayed.
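The sketch below shows one way to convert a few of the columns described above into typed values. It is an illustration under assumptions: the helper names are hypothetical, and the exact JSON shape of event_metrics (objects with "name" and "value" keys) is inferred from the description rather than confirmed by a sample.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Any, List, Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def event_time(timestamp_ms: str) -> datetime:
    """Convert the timestamp column (milliseconds since Unix epoch) to UTC."""
    return EPOCH + timedelta(milliseconds=int(timestamp_ms))


def parse_json_array(raw: Optional[str]) -> List[Any]:
    """Parse a JSON-array column; null (empty) columns become an empty list."""
    return json.loads(raw) if raw else []


def revenue_cents(event_metrics: Optional[str]) -> Optional[int]:
    """Return revenue in cents, assuming objects like {"name": "revenue", "value": 1999}."""
    for metric in parse_json_array(event_metrics):
        if metric.get("name") == "revenue":
            return int(metric["value"])
    return None


def in_holdback(layer_holdback: Optional[str]) -> bool:
    """layer_holdback is a string column holding true or false."""
    return (layer_holdback or "").strip().lower() == "true"
```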

Usage notes

  • Raw event data contains events from users who may or may not count for an experiment.
  • Events may exist in the export outside of the time frame in which an experiment ran.
  • For Optimizely X Web Experimentation and Personalization Data Export files:
    • Two types of rows are represented: bucketing decisions and events. A bucketing decision row has event_type, event_name, event_features, and event_metrics all null. Event rows have these fields populated with the data described above, and you will see one event row for each campaign (layer) ID that is active in the snippet, not only for the campaign IDs into which the visitor was bucketed. To determine whether an event occurred while the visitor was bucketed, match every event row that is chronologically later than a given bucketing decision row to the campaign (or experiment or variation) in that decision row. You can then look back at your experiment setup to determine whether the event was attached as a metric to that campaign or experiment at the time the event fired.
    • Multiple instances of the same event may happen within the same session. For all files, the user_features column may help indicate uniqueness.
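The matching of event rows to earlier bucketing decisions described above can be sketched as follows. This is one possible approach, not Optimizely's own method: rows are assumed to be dicts keyed by column name with nulls already converted to None or empty strings, matching is done per end_user_id and campaign_id, and the output keys bucketed_experiment_id and bucketed_variation_id are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

EVENT_FIELDS = ("event_type", "event_name", "event_features", "event_metrics")


def is_bucketing_decision(row: dict) -> bool:
    """A bucketing decision row has all four event columns null."""
    return all(not row.get(field) for field in EVENT_FIELDS)


def attribute_events(rows: Iterable[dict]) -> List[dict]:
    """Attach the most recent earlier bucketing decision to each event row.

    Event rows with no earlier decision for the same visitor and campaign
    are left out, since the visitor was not bucketed at that point.
    """
    latest_decision: Dict[Tuple[str, str], dict] = {}
    attributed = []
    # sorted() is stable, so rows with equal timestamps keep their input order.
    for row in sorted(rows, key=lambda r: int(r["timestamp"])):
        key = (row["end_user_id"], row["campaign_id"])
        if is_bucketing_decision(row):
            latest_decision[key] = row
        elif key in latest_decision:
            decision = latest_decision[key]
            attributed.append({
                **row,
                "bucketed_experiment_id": decision["experiment_id"],
                "bucketed_variation_id": decision["variation_id"],
            })
    return attributed
```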

Retention Policy

To comply with GDPR requirements, Optimizely retains the files in your Data Export bucket for both export services for 30 days. Older data is automatically deleted. To retain the data for a longer period of time, ensure that your import process archives the files to your data warehouse at least once every 30 days.
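As a small worked example of the 30-day window, the sketch below computes how many days remain before a daily partition may be deleted. It assumes the retention clock starts on the partition's date, which this page does not state explicitly; the function name is hypothetical.

```python
from datetime import date

RETENTION_DAYS = 30


def days_until_deletion(partition_day: date, today: date) -> int:
    """Days left before the partition may be deleted (zero or less: may already be gone)."""
    # Assumption: retention is counted from the partition date.
    return RETENTION_DAYS - (today - partition_day).days
```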

Encryption

Your Data Export data is encrypted. To access the data, you need one of the following clients, at minimum:

  • Amazon S3 console
  • AWS CLI version 1.11.108 and later
  • AWS SDKs released after May 2016

Other Resources

For more information, see: