Data Asset API

Prerequisites

  • Token with datasets scope

  • The data asset's ID to pass to the API call

You can find the data asset's ID below the title

All dates in API requests and responses are Unix (epoch) timestamps.
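As a minimal sketch of working with these timestamps, the helpers below convert between timezone-aware Python datetimes and epoch seconds. The function names are hypothetical and are not part of the Code Ocean API; the sketch assumes the timestamps are in seconds, as the field descriptions later on this page state.

```python
from datetime import datetime, timezone


def to_epoch(dt: datetime) -> int:
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return int(dt.timestamp())


def from_epoch(seconds: float) -> datetime:
    """Convert epoch seconds (as returned by the API) to a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# 1676246400 is the "dateField" value used in the custom metadata example.
print(to_epoch(datetime(2023, 2, 13, tzinfo=timezone.utc)))
print(from_epoch(1633277005).isoformat())
```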

Create a New Data Asset

POST https://{domain}.codeocean.com/api/v1/data_assets

Path Parameters

Headers

Request Body

{
    "created":"creation timestamp",
    "description":"the description provided",
    "files":"the number of files in the dataset",
    "id":"dataset's ID",
    "lastUsed":0,
    "name":"name of the dataset",
    "sizeInBytes":"size of the dataset",
    "state":"DATA_ASSET_STATE_DRAFT",
    "tags":"the tags provided",
    "custom_metadata":{k-v pairs},
    "type":"DATA_ASSET_TYPE_DATASET"
}

Example of creating a new data asset:

Create a data asset from a computation (capture result API)

CURL Example

curl -H "Content-Type: application/json" -u ${CUSTOM_KEY}: -X POST https://acmecorp.codeocean.com/api/v1/data_assets --data-raw '{
   "name": "Data asset From API",
   "description": "An example for creating data asset from CO API",
   "mount": "some-folder",
   "tags": [ "keyword1", "keyword2" ],
   "source": {
       "computation": {
           "id": "some-computation-id"
       }
   }
}'

To use this example, replace acmecorp with your VPC domain, ${CUSTOM_KEY} with your API token, and some-computation-id with the ID of the computation whose result you'd like to capture as a data asset.
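The same request can be sketched in Python. This is an illustration under stated assumptions, not an official client: the helper name build_capture_request is hypothetical, the token value is a placeholder, and the request is only built, not sent. It relies on the fact that curl's `-u TOKEN:` is HTTP Basic authentication with the token as the user name and an empty password.

```python
import base64
import json
import urllib.request


def build_capture_request(domain, token, computation_id, name,
                          description="", mount=None, tags=None):
    """Build (but do not send) the POST request that captures a result."""
    payload = {
        "name": name,
        "description": description,
        "tags": tags or [],
        "source": {"computation": {"id": computation_id}},
    }
    if mount is not None:
        payload["mount"] = mount

    # Equivalent of curl's `-u TOKEN:` (empty password).
    credentials = base64.b64encode(f"{token}:".encode()).decode()
    return urllib.request.Request(
        url=f"https://{domain}.codeocean.com/api/v1/data_assets",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


request = build_capture_request(
    domain="acmecorp",
    token="my-api-token",  # placeholder, not a real token
    computation_id="some-computation-id",
    name="Data asset From API",
    description="An example for creating data asset from CO API",
    mount="some-folder",
    tags=["keyword1", "keyword2"],
)
print(request.get_method(), request.full_url)
```

Sending is left to the caller, for example with urllib.request.urlopen(request).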

Create a data asset from a public AWS bucket - no credentials are provided

Request Body

{
   "name": "my dataset",
   "description": "a descriptive description",
   "mount": "mount",
   "tags": [ "t1", "t2" ],
   "source": {
       "aws": {
           "bucket": "MY_BUCKET",
           "prefix": "PREFIX"
       }
   }
}

CURL Example

curl --location --request POST 'https://acmecorp.codeocean.com/api/v1/data_assets' \
--header 'Content-Type: application/json' \
-u "${CUSTOM_KEY}:" \
--data-raw '{
    "name":"import public AWS bucket with dataset api",
    "description":"meaningful-c",
    "mount":"citations",
    "tags":["citations"],
    "source":{
        "aws":{
            "bucket":"ai2-public-datasets",
            "prefix":"meaningful-citations",
            "keep_on_external_storage":false,
            "index_data":false
        }
    }
}'

To use this example, replace acmecorp with your VPC domain and ${CUSTOM_KEY} with your API token.

Create an external data asset from a private bucket

Request Body

{
   "name": "external dataset",
   "description": "an external dataset description",
   "mount": "default_mount",
   "tags": [ "ex1", "ex2" ],
   "source": {
       "aws": {
           "access_key_id": "MY_ACCESS_KEY",
           "secret_access_key": "MY_SECRET",
           "bucket": "MY_BUCKET",
           "keep_on_external_storage": true,
           "index_data": true
       }
   }
}

CURL Example

curl --location --request POST 'https://acmecorp-edge.codeocean.com/api/v1/data_assets' \
--header 'Content-Type: application/json' \
-u "${CUSTOM_KEY}:" \
--data-raw '{
    "name": "My External Data Asset",
    "description": "External Indexed Dataset From API",
    "mount": "external-indexed",
    "tags": [ "t1", "t2" ],
    "source": {
        "aws": {
            "bucket": "codeocean-datasetapi-test-cs",
            "keep_on_external_storage": true,
            "index_data": true,
            "access_key_id": "'"$AWS_ACCESS_KEY_ID"'",
            "secret_access_key": "'"$AWS_SECRET_ACCESS_KEY"'"
        }
    }
}'

To use this example, replace acmecorp-edge with your VPC domain and ${CUSTOM_KEY} with your API token.

Because this example pulls a dataset from a private bucket, you also need to change the bucket name and credentials in the "aws" section to match your own bucket.
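The public and private bucket bodies above differ only in the "aws" section. A small helper, sketched here with a hypothetical name (aws_source), can build either variant. The field names come from the request bodies above; the False defaults for keep_on_external_storage and index_data mirror the public-bucket example and are an assumption, not documented API defaults.

```python
def aws_source(bucket, prefix=None, access_key_id=None, secret_access_key=None,
               keep_on_external_storage=False, index_data=False):
    """Return the value for the "source" field of a create request."""
    # Credentials only make sense as a pair: both for a private bucket,
    # neither for a public one.
    if (access_key_id is None) != (secret_access_key is None):
        raise ValueError("provide both AWS credentials or neither")

    aws = {
        "bucket": bucket,
        "keep_on_external_storage": keep_on_external_storage,
        "index_data": index_data,
    }
    if prefix is not None:
        aws["prefix"] = prefix
    if access_key_id is not None:
        aws["access_key_id"] = access_key_id
        aws["secret_access_key"] = secret_access_key
    return {"aws": aws}


public = aws_source("ai2-public-datasets", prefix="meaningful-citations")
private = aws_source(
    "MY_BUCKET",
    access_key_id="MY_ACCESS_KEY",   # placeholder
    secret_access_key="MY_SECRET",   # placeholder
    keep_on_external_storage=True,
    index_data=True,
)
print(sorted(public["aws"]))
print(sorted(private["aws"]))
```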

Create a data asset with custom fields

curl -H "Content-Type: application/json" -u USER_API_TOKEN: -X POST https://codeocean.com/api/v1/data_assets --data-raw '{
   "name": "Data asset From API",
   "description": "An example for creating data asset from CO API",
   "mount": "some-folder",
   "tags": [ "keyword1", "keyword2" ],
   "custom_metadata":{
      "some Field": "one",
      "another_field": 1,
      "dateField": 1676246400
   },
   "source": {
       "computation": {
           "id": "some-computation-id"
       }
   }
}'

The examples here are executed from a capsule that uses the secret management feature. The API token was added as a "Custom Key" secret and the AWS credentials were added as an "AWS Cloud Credential" secret.
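A script running in such a capsule could read the same secrets from its environment. The sketch below assumes the secrets are exposed as the environment variables used in the curl examples (CUSTOM_KEY, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY); the helper name load_secrets is hypothetical, and the demo uses a fake environment so it runs anywhere.

```python
import os

# Variable names taken from the curl examples; treat them as assumptions
# about how your capsule exposes its secrets.
REQUIRED = ("CUSTOM_KEY", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def load_secrets(env=None):
    """Return the required secrets, failing early if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing secrets: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}


# Demo with a fake environment instead of real credentials.
fake_env = {
    "CUSTOM_KEY": "my-api-token",
    "AWS_ACCESS_KEY_ID": "MY_ACCESS_KEY",
    "AWS_SECRET_ACCESS_KEY": "MY_SECRET",
}
secrets = load_secrets(fake_env)
print(sorted(secrets))
```

Failing early on a missing variable avoids sending a request with an empty token, which would otherwise surface only as an authentication error from the server.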

Get Data Asset Metadata

GET https://{domain}.codeocean.com/api/v1/data_assets/{data asset ID}

Path Parameters

Headers

{
   "created": 1633277005,
   "description": "a descriptive description",
   "files": 0,
   "id": "fea84ebf-b58b-4ad2-994d-7169dc3880fb",
   "lastUsed": 0,
   "name": "my dataset",
   "sizeInBytes": 0,
   "state": "DATA_ASSET_STATE_DRAFT",
   "tags": [ "t1", "t2" ],
   "type": "DATA_ASSET_TYPE_DATASET"
}

Example of getting data asset metadata:

Request Example

curl --location --request GET 'https://acmecorp.codeocean.com/api/v1/data_assets/37a93748-ce90-4980-913b-2de0908d5212' \
-u "${CUSTOM_KEY}:"

To use this example, replace acmecorp with your VPC domain and ${CUSTOM_KEY} with your API token.

Also replace the data asset ID in the URL with the ID of your own data asset.
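Once the response arrives, it can be parsed like any JSON document. The sketch below uses the sample metadata response shown above; the helper name summarize is hypothetical. Because this page shows both lastUsed/sizeInBytes and last_used/size spellings, the helper accepts either size key, which is a defensive assumption rather than documented behavior.

```python
import json
from datetime import datetime, timezone

SAMPLE_RESPONSE = """
{
   "created": 1633277005,
   "description": "a descriptive description",
   "files": 0,
   "id": "fea84ebf-b58b-4ad2-994d-7169dc3880fb",
   "lastUsed": 0,
   "name": "my dataset",
   "sizeInBytes": 0,
   "state": "DATA_ASSET_STATE_DRAFT",
   "tags": [ "t1", "t2" ],
   "type": "DATA_ASSET_TYPE_DATASET"
}
"""


def summarize(asset):
    """Return a one-line, human-readable summary of a data asset."""
    size = asset.get("sizeInBytes", asset.get("size", 0))
    created = datetime.fromtimestamp(asset["created"], tz=timezone.utc)
    return "{} ({}): {} files, {} bytes, created {}".format(
        asset["name"], asset["id"], asset["files"], size,
        created.strftime("%Y-%m-%d"),
    )


asset = json.loads(SAMPLE_RESPONSE)
summary = summarize(asset)
print(summary)
```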

Search Data Assets

GET https://{domain}.codeocean.com/api/v1/data_assets

Path Parameters

Query Parameters

Headers

{
    "has_more" - boolean: indicates whether there are more results
    "results" - array: the data assets found
}

Example of searching data assets:

Request Example

curl --location --request GET 'https://acmecorp.codeocean.com/api/v1/data_assets' \
-u "${CUSTOM_KEY}:" \
-d start=0 \
-d limit=10 \
-d sort_order=desc \
-d sort_field=name \
-d type=dataset \
-d ownership=owner \
-d favorite=false \
-d archived=false \
-d query=name:co+tag:x1 \
-G

To use this example, replace acmecorp with your VPC domain and ${CUSTOM_KEY} with your API token.
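The curl command above uses -G to turn each -d pair into a query-string parameter. A rough Python equivalent is sketched below; build_search_url is a hypothetical helper, and the parameter names are the ones from the example. Note that the `+` in `query=name:co+tag:x1` stands for a space in a query string, so the sketch passes the query with a literal space and lets urlencode encode it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(domain, **params):
    """Build the search URL, skipping unset parameters."""
    clean = {}
    for key, value in params.items():
        if value is None:
            continue
        # The example sends booleans as lowercase true/false.
        if isinstance(value, bool):
            value = "true" if value else "false"
        clean[key] = value
    base = f"https://{domain}.codeocean.com/api/v1/data_assets"
    return f"{base}?{urlencode(clean)}" if clean else base


url = build_search_url(
    "acmecorp",
    start=0,
    limit=10,
    sort_order="desc",
    sort_field="name",
    type="dataset",
    ownership="owner",
    favorite=False,
    archived=False,
    query="name:co tag:x1",
)
print(url)
```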

Response Example

{
  "has_more": false,
  "results": [
    {
      "created": 1641473110,
      "description": "commoncrawl",
      "files": 24212276,
      "id": "27b4d7ec-a2f8-4c8d-9be5-361d33e50bf0",
      "last_used": 0,
      "name": "commoncrawl",
      "size": 7184869799906702,
      "state": "ready",
      "tags": [
        "x1",
        "x2",
        "y1"
      ],
      "type": "dataset"
    },
    {
      "created": 1641473217,
      "description": "codeocean datasets",
      "files": 554879,
      "id": "f3259982-5d3b-42b2-9069-302df8cbe1b8",
      "last_used": 0,
      "name": "codeocean datasets",
      "size": 941697100208,
      "state": "ready",
      "tags": [
        "x1",
        "x2",
        "y1"
      ],
      "type": "dataset"
    }
  ],
  "total": 2
}

The default number of items returned from Search Data Assets is 10.
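Because results come back in pages, collecting every match means repeating the search while has_more is true. The loop below is a sketch under one assumption that this page does not confirm: that start is a zero-based item offset. The fetch_page callable is hypothetical and stands in for whatever performs the HTTP request; a fake implementation is used here so the example runs offline.

```python
from typing import Any, Callable, Dict, Iterator


def iter_data_assets(fetch_page: Callable[[int, int], Dict[str, Any]],
                     limit: int = 10) -> Iterator[Dict[str, Any]]:
    """Yield every data asset, requesting pages until has_more is false."""
    start = 0
    while True:
        page = fetch_page(start, limit)
        results = page.get("results", [])
        yield from results
        # Stop on the last page, or on an empty page to avoid looping forever.
        if not page.get("has_more") or not results:
            return
        start += len(results)


def fake_fetch(start, limit):
    """Stand-in for the real search call: serves 25 fake assets."""
    names = [f"asset-{i}" for i in range(25)]
    chunk = names[start:start + limit]
    return {
        "results": [{"name": name} for name in chunk],
        "has_more": start + limit < len(names),
    }


all_names = [asset["name"] for asset in iter_data_assets(fake_fetch)]
print(len(all_names))
```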

Update Data Asset Metadata

PUT https://{domain}.codeocean.com/api/v1/data_assets/{data_set_id}

Path Parameters

Headers

Request Body

{
  "created": float64 - data asset creation time in seconds since epoch,
  "description": string - data asset description,
  "files": int64 - total number of files in the data asset if available,
  "id": string - the data asset internal id,
  "lastUsed": float64 - the last time the data asset was used in seconds since epoch,
  "name": string - data asset name,
  "size": int64 - the total size in bytes of the data asset if available,
  "state": string - data asset state - draft / ready / failed,
  "tags": array of string tags,
  "custom_metadata": {
      k-v pairs
    },
  "type": string - dataset / result
}

Example of updating data asset metadata:

Request Example

curl -X PUT 'https://acmecorp.codeocean.com/api/v1/data_assets/{data_asset_id}' \
-u "${CUSTOM_KEY}:" \
-H 'Content-Type: application/json' \
--data-raw '{
    "name": "modified name",
    "description": "a new description",
    "tags": ["aaa","bbb","ccc"],
    "mount": "newmount"
}'

To use this example, replace acmecorp with your VPC domain, ${CUSTOM_KEY} with your API token, and the data asset ID in the URL with the ID of the data asset you want to update.
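The request body can be assembled from only the fields being changed. This is a sketch, not a statement of what the endpoint requires: the allowed field list is taken from the request example above, the helper name build_update_payload is hypothetical, and whether the server accepts a partial body is an assumption this page does not confirm.

```python
import json

# Fields that appear in the request example above (an assumption, not a
# complete list of what the endpoint accepts).
UPDATABLE_FIELDS = ("name", "description", "tags", "mount")


def build_update_payload(**changes):
    """Return the JSON body for an update, containing only the given fields."""
    unknown = sorted(set(changes) - set(UPDATABLE_FIELDS))
    if unknown:
        raise ValueError(f"unsupported fields: {unknown}")
    payload = {key: value for key, value in changes.items() if value is not None}
    if not payload:
        raise ValueError("nothing to update")
    return json.dumps(payload)


body = build_update_payload(
    name="modified name",
    description="a new description",
    tags=["aaa", "bbb", "ccc"],
    mount="newmount",
)
print(body)
```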

Response Example

{
  "created": 1633277005,
  "description": "a new description",
  "files": 0,
  "id": "fea84ebf-b58b-4ad2-994d-7169dc3880fb",
  "last_used": 0,
  "name": "modified name",
  "size": 0,
  "state": "ready",
  "tags": [
    "aaa",
    "bbb",
    "ccc"
  ],
  "type": "dataset"
}

Update Data Asset Permissions

PUT https://{domain}.codeocean.com/api/v1/data_assets/{data_set_id}/permission

Path Parameters

Headers

Request Body

None of the fields in the Update Data Asset Permissions request body is required; any combination of one, two, or three fields is valid.

Delete Data Asset

DELETE https://{domain}.codeocean.com/api/v1/data_assets/{data_set_id}

Path Parameters

Headers

{
    // Response
}

Archive Data Asset

PATCH https://{domain}.codeocean.com/api/v1/data_assets/{data_set_id}/archive?archive=true

Path Parameters

Headers

{
    // Response
}

Example of archiving and unarchiving a data asset:

Archive data asset

curl -H "Content-Type: application/json" -u ${CUSTOM_KEY}: -X PATCH "https://acmecorp.codeocean.com/api/v1/data_assets/{data-asset_id}/archive?archive=true"

Unarchive data asset

curl -H "Content-Type: application/json" -u ${CUSTOM_KEY}: -X PATCH "https://acmecorp.codeocean.com/api/v1/data_assets/{data-asset_id}/archive?archive=false"

To use this example, replace acmecorp with your VPC domain, ${CUSTOM_KEY} with your API token, and {data-asset_id} with the ID of the data asset you'd like to archive or unarchive.
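Archiving and unarchiving differ only in the value of the archive query parameter, so one helper can cover both. The sketch below builds the PATCH request without sending it; build_archive_request is a hypothetical name, and the authentication header is omitted for brevity.

```python
import urllib.request


def build_archive_request(domain, data_asset_id, archive=True):
    """Build (but do not send) the PATCH request that archives or unarchives."""
    flag = "true" if archive else "false"
    url = (
        f"https://{domain}.codeocean.com/api/v1/data_assets/"
        f"{data_asset_id}/archive?archive={flag}"
    )
    return urllib.request.Request(url, method="PATCH")


archive_req = build_archive_request("acmecorp", "some-data-asset-id")
unarchive_req = build_archive_request("acmecorp", "some-data-asset-id", archive=False)
print(archive_req.get_method(), archive_req.full_url)
print(unarchive_req.get_method(), unarchive_req.full_url)
```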
