Create and manage data sets

A data set is a container that holds data and information about the data’s structure, layout, and format. It is a single store of related data added to the data library from an outside source. A data set cannot be empty: it must contain at least one row of data, not just column headers.

When a new, undefined set of data is initially added to your data library, Data Management creates a description of the data structure and characteristics, called a signature. This signature is part of the data set, and it allows Data Management to recognize and organize future contributions of data that have the same signature. The new data set then must be defined through data modeling (see How data modeling works).
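The idea of a signature can be illustrated with a minimal sketch. This is not how Data Management actually computes signatures; the function names and the rule used here (ordered column names plus a coarse type guess taken from the first data row) are assumptions made only for illustration.

```python
import csv
import io


def infer_type(value: str) -> str:
    """Classify a cell as 'number' or 'text' (a deliberately coarse guess)."""
    try:
        float(value)
        return "number"
    except ValueError:
        return "text"


def sketch_signature(csv_text: str) -> tuple:
    """Return a hypothetical signature: (column name, type) pairs in column order."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    if not data:
        raise ValueError("a data set needs at least one row of data")
    # Type each column from its first data row only; illustrative, not robust.
    return tuple((name, infer_type(cell)) for name, cell in zip(header, data[0]))


first = "region,units\nEast,12\nWest,7\n"
later = "region,units\nNorth,3\n"
other = "region,manager\nEast,Lee\n"

print(sketch_signature(first) == sketch_signature(later))  # True
print(sketch_signature(first) == sketch_signature(other))  # False
```

In this sketch, the second file would be treated as a contribution to the first file's data set, while the third would start a new data set.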

When you add new data to an existing data set, that is called adding a contribution (see Contribute data to a data set).

Create a new data set

When you add new data to your data library by uploading a data file from your computer, and that data’s structure and characteristics (its signature) do not match the signature of data already in the library, a new data set is created.

NOTE   The data file you upload must be in Excel (XLSX), comma separated values (CSV), bar separated values (BSV), or tab separated values (TSV) format. While BSV and TSV are not standard file extensions, Data Management uses them to remind you of the structure of the files you are uploading. You do not need to change the more common TXT file extension to BSV or TSV for Data Management to recognize the format when you select your file.
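Before uploading a text file, it can help to check its delimiter and confirm that it has rows of data. The sketch below is a hypothetical local check, not part of Data Management. It assumes that "bar" means the pipe character (|) and that the file has at least two columns, so a delimiter appears in the header line.

```python
import csv
import io

# Delimiters assumed for each text format; "bar" is taken to mean the pipe character.
DELIMITERS = {"CSV": ",", "BSV": "|", "TSV": "\t"}


def guess_format(text: str) -> str:
    """Guess the format from whichever delimiter appears most in the header line."""
    header = text.splitlines()[0] if text else ""
    counts = {name: header.count(d) for name, d in DELIMITERS.items()}
    best = max(counts, key=counts.get)
    if counts[best] == 0:
        raise ValueError("no known delimiter found in the header line")
    return best


def has_data_rows(text: str, file_format: str) -> bool:
    """Return True if at least one non-blank row follows the header."""
    reader = csv.reader(io.StringIO(text), delimiter=DELIMITERS[file_format])
    rows = [row for row in reader if any(cell.strip() for cell in row)]
    return len(rows) > 1


sample = "id|name\n1|Ada\n"
file_format = guess_format(sample)
print(file_format)                         # BSV
print(has_data_rows(sample, file_format))  # True
print(has_data_rows("id|name\n", "BSV"))   # False
```

A file that fails the second check has only column headers and could not become a data set.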

Create a new data set from a data file on your computer

  1. Select Data Explorer > Data Library to display the Data Library page.
  2. In the Data Sets panel, select any data set.
  3. Click the Create New Data Set button to display the New Data Set dialog box.
  4. In the Choose Data Task section, select File Uploaders in the left pane and Immediate File Uploader in the right pane.
  5. Click Next to view the Immediate File Uploader dialog box.
  6. Click Browse (or Choose File if you use Chrome) to navigate to the data file.
  7. Select the file type from the Choose file type drop-down field. Advanced Reporting compares the signature of your data file to those of existing data sets.

    • If the signature is unique, continue to Step 8.
    • If the signature matches an existing data set, you can either add the new data to that data set (Upload to this Data Set button) or create a new data set (Create New Data Set button). If you opt to create a new data set, continue to Step 8.
  8. Click Configure Data Set to display the contents of your file in the Design Data Set window.
  9. You can model the data in the new data set at this point, or click Done to save your work and add the new data set to your data library. You can come back later to model the data.

Work with data sets

Once a data set has been added to your data library, you will likely want to add more data to it and associate it with a collection. You can also edit information about the data set and model its data. You might need to troubleshoot issues with the data that a data set contains by downloading its contributions for examination. And finally, when a data set is no longer needed, you can delete it.

NOTE   You must be the owner of a data set to edit its information, delete it, or delete or download its contributions.

Upload additional data to a data set

Data added to an existing data set is called a data set contribution.

  • Select Upload Additional Data from the Actions drop-down menu, or click the Upload Additional Data button. See Contribute data to a data set for instructions on how to upload a data set contribution.

Add a data set to a collection

You can also add data sets to a collection when the collection is selected in the Collections panel. See Create and manage collections for more information about collections.

  1. In the Data Sets panel, select the data set you want to add to a collection.
  2. From the Actions drop-down menu, select Add to Collection. The Edit Data Set dialog box opens.
  3. From the Choose Collections pane, select the desired collections.
  4. Click Save.

Edit information about a data set

  1. In the Data Sets panel, select the data set whose information you want to edit.
  2. Open the Edit Data Set dialog box by doing one of the following:

    • From the Actions drop-down menu, select Edit Data Set Info.
    • From the Actions drop-down menu, select Add to Collection.
    • Click the Edit Data Set Info icon that appears when you hover over the data set name in the Data Sets panel.
  3. From here you can change the name of the data set, add a text description of it, add tags (see Add metadata tags to data sets and collections for information on how to format tags), and assign it to a collection.
  4. Click Save.

Design the data set (model the data)

  1. Open the Design Data Set page by doing one of the following:

    • From the Actions drop-down menu, select Design Data Set.
    • Click the Design Data Set icon that appears when you hover over the data set name in the Data Sets panel.
    • Double-click any contribution to the data set.
  2. From this page, define your data set. See How data modeling works for information on what data modeling is and how it is accomplished.
  3. When you are finished, click Done.

Download all the contributions in a data set

  1. In the Data Sets panel, select the data set whose contributions you want to download.
  2. From the Actions drop-down menu, select Download contribution(s). Where the download is saved depends on how your browser is configured. The contributions are saved in CSV format within a folder, which you can name anything you like. The date of the download is appended to the name of each contribution file in the folder.
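Once the contributions are downloaded, a short script can help you examine them when troubleshooting. The following is a minimal sketch that runs on your own computer, not a Data Management feature. It assumes the downloaded files end in .csv, are UTF-8 encoded, and each begin with a header row; the function name is hypothetical.

```python
import csv
from pathlib import Path


def summarize_contributions(folder: str) -> dict:
    """Map each CSV file name in the folder to its number of data rows."""
    summary = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
        # Subtract the header row; an empty file counts as zero data rows.
        summary[path.name] = max(len(rows) - 1, 0)
    return summary
```

Calling summarize_contributions with the path to your download folder returns a dictionary of file names and row counts, which makes an unexpectedly small or empty contribution easy to spot.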

Delete a data set

IMPORTANT   Deleting a data set deletes the data set’s signature and definition in the model and all contributions associated with the data set. Once deleted, none of this data can be recovered.

  1. In the Data Sets panel, select the data set you want to delete.
  2. From the Actions drop-down menu, select Delete Data Set, and then click Delete.
