System Overview
Learn the deployment model, system architecture, and AWS deployment resources of the Code Ocean VPC application.
Deployment Model
Code Ocean VPC is installed in your AWS account. It can be installed into a new dedicated AWS VPC which follows AWS Well-Architected guidelines, or into an existing AWS VPC so that you can leverage existing AWS resources and easily align with company VPC guidelines and standards.
Code Ocean VPC deployments are managed with the AWS CloudFormation infrastructure-as-code (IaC) service. A CloudFormation template is available for installation and upgrades; it provisions all AWS resources required to run Code Ocean in your AWS account. See our CloudFormation Deployment section.
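As an illustration only, a stack launch can be scripted with boto3. The template URL and parameter key below are placeholders rather than the actual names used by the Code Ocean template; take the real values from the CloudFormation Deployment section.

```python
# Sketch: launch the Code Ocean CloudFormation stack with boto3.
# The template URL and parameter key are placeholders, not the actual
# names used by the Code Ocean template.
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

cfn.create_stack(
    StackName="code-ocean-vpc",
    TemplateURL="https://example-bucket.s3.amazonaws.com/code-ocean.template.yaml",  # placeholder
    Parameters=[
        {"ParameterKey": "DomainName", "ParameterValue": "codeocean.example.com"},  # placeholder key
    ],
    Capabilities=["CAPABILITY_NAMED_IAM"],  # the template creates IAM roles
)

# Block until the stack finishes creating.
cfn.get_waiter("stack_create_complete").wait(StackName="code-ocean-vpc")
```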
System Architecture
Code Ocean is designed to enable secure cross-functional team collaboration while managing a large number of backend services, so you and your colleagues can focus on the research. The following diagram represents the most common system architecture for Code Ocean deployments on AWS.
VPC Network
If you choose to install Code Ocean into its own dedicated AWS VPC, the CloudFormation template provisions a new VPC across two availability zones with public and private subnets in your selected AWS region. The availability zones and the CIDR blocks for the VPC and subnets are configurable.
If you choose to install Code Ocean into an existing AWS VPC, you configure the CloudFormation template with the two availability zones and the private and public subnets to deploy to. Public subnets are required if you choose an internet-facing deployment.
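Once the stack is up, a quick way to confirm which subnets and availability zones the deployment uses is a boto3 query. The VPC ID below is a placeholder; take the real one from your CloudFormation stack outputs or the VPC console.

```python
# Sketch: list the subnets in the VPC used by the deployment.
import boto3

ec2 = boto3.client("ec2")

resp = ec2.describe_subnets(
    Filters=[{"Name": "vpc-id", "Values": ["vpc-0123456789abcdef0"]}]  # placeholder VPC ID
)
for subnet in resp["Subnets"]:
    print(subnet["SubnetId"], subnet["AvailabilityZone"], subnet["CidrBlock"])
```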
EC2
Code Ocean uses an HTTPS-only AWS Application Load Balancer (ALB) to expose the system to users and to allow access to its internal Git server and Docker registry. The ALB can be internet-facing, or internal for deployments behind a VPN.
The system manages two types of EC2 instances:
Services - A single machine in an auto-scaling group, attached to an internal load balancer, that runs all internal system services.
Workers - Two auto-scaling groups of worker machines where users' computations actually run: one group for GPU-based computations and the other for general, non-GPU computations. When you run a Compute Capsule within Code Ocean, it is scheduled onto one of the worker machines. The system automatically provisions more worker machines (scales out) as the load increases and deprovisions worker machines (scales in) as the load decreases (see the sketch below).
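As a minimal sketch of how to observe that scaling behavior, you can query the worker auto-scaling groups with boto3. The group names below are placeholders, not the actual names created by the template.

```python
# Sketch: inspect the worker auto-scaling groups' current capacity.
import boto3

autoscaling = boto3.client("autoscaling")

resp = autoscaling.describe_auto_scaling_groups(
    AutoScalingGroupNames=["codeocean-workers-general", "codeocean-workers-gpu"]  # placeholders
)
for group in resp["AutoScalingGroups"]:
    print(group["AutoScalingGroupName"],
          "desired:", group["DesiredCapacity"],
          "running:", len(group["Instances"]))
```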
The deployment is configured with security groups to control network flow between parts of the system and IAM instance roles for each instance type to limit access to AWS resources.
Shell access to the EC2 instances is available through AWS SSM Session Manager.
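For example, assuming the instances have registered with SSM, you can list the ones that are reachable and then open a shell with the AWS CLI:

```python
# Sketch: list EC2 instances that are reachable through SSM Session Manager.
import boto3

ssm = boto3.client("ssm")

resp = ssm.describe_instance_information(
    Filters=[{"Key": "PingStatus", "Values": ["Online"]}]
)
for info in resp["InstanceInformationList"]:
    print(info["InstanceId"], info["PlatformName"])

# Interactive shell access (run from a terminal):
#   aws ssm start-session --target <instance-id>
```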
DNS
The system also uses a Route 53 private hosted zone for internal service discovery.
Storage
The system uses three types of storage:
EBS data volume:
Where most of the internal persistent system data is stored. For example, each Compute Capsule in the system is backed by a Git repository that is persisted on this volume.
The volume is configured with encryption at rest and, by default, daily scheduled EBS snapshots with 14-day retention via AWS Backup.
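As a sketch, assuming you know the backup vault used by the deployment (the vault name below is a placeholder), you can list the recovery points created from the data volume:

```python
# Sketch: list EBS snapshot recovery points created by AWS Backup.
import boto3

backup = boto3.client("backup")

resp = backup.list_recovery_points_by_backup_vault(
    BackupVaultName="codeocean-backup-vault"  # placeholder vault name
)
for point in resp["RecoveryPoints"]:
    print(point["RecoveryPointArn"], point["CreationDate"], point["Status"])
```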
S3 buckets:
Used to store the input (datasets) and output (results) data of Compute Capsules, as well as the internal Docker registry storage and other persistent system data.
All S3 buckets are private, with server-side encryption and access logging enabled by default. S3 bucket versioning is enabled on buckets that store persistent data.
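For example, you can spot-check a bucket's versioning and default encryption settings with boto3; the bucket name below is a placeholder.

```python
# Sketch: verify versioning and default encryption on a deployment bucket.
import boto3

s3 = boto3.client("s3")
bucket = "codeocean-datasets-example"  # placeholder bucket name

print("versioning:", s3.get_bucket_versioning(Bucket=bucket).get("Status"))

rules = s3.get_bucket_encryption(Bucket=bucket)["ServerSideEncryptionConfiguration"]["Rules"]
print("encryption:", rules[0]["ApplyServerSideEncryptionByDefault"]["SSEAlgorithm"])
```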
EFS:
The Datasets Cache EFS gives a computation (Compute Capsule) running on a worker machine fast access to datasets. Once a dataset is available, it is cached on this EFS file system, which is mounted on the worker machines.
The Scratch EFS provides a dedicated folder per compute capsule for intermediate data that persists through the lifetime of the capsule.
Encryption at rest is enabled by default on all EFS storage.
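A quick way to confirm this is to list the EFS file systems in the account and check their Encrypted flag:

```python
# Sketch: confirm that the deployment's EFS file systems are encrypted at rest.
import boto3

efs = boto3.client("efs")

for fs in efs.describe_file_systems()["FileSystems"]:
    print(fs["FileSystemId"], fs.get("Name"), "encrypted:", fs["Encrypted"])
```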
Backup and Disaster Recovery
As described above, Code Ocean is automatically configured with scheduled EBS snapshots via AWS Backup and S3 bucket versioning. Together, these allow for point-in-time restore. Please contact our support team for help with data restore or recovery.
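As an illustration of what versioning makes possible, the sketch below lists the stored versions of objects under a prefix in a versioned bucket; the bucket name and prefix are placeholders, and actual restores should still go through support.

```python
# Sketch: list the versions of objects in a versioned bucket, which is
# what enables point-in-time restore of S3 data.
import boto3

s3 = boto3.client("s3")

resp = s3.list_object_versions(
    Bucket="codeocean-results-example",  # placeholder bucket name
    Prefix="capsule-1234/",              # placeholder prefix
)
for version in resp.get("Versions", []):
    print(version["Key"], version["VersionId"], version["LastModified"])
```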
Monitoring
Code Ocean natively reports all metrics to AWS CloudWatch. In addition to metrics from the AWS services that Code Ocean uses, such as CPU utilization from EC2 and disk I/O metrics from EBS, the system also reports custom metrics to a dedicated CodeOcean CloudWatch metrics namespace. These include, for example, Code Ocean worker machine utilization, and memory and disk utilization on all machines.
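For example, you can enumerate the metrics published to that namespace with boto3; the exact metric names and dimensions depend on your deployment.

```python
# Sketch: list the custom metrics Code Ocean publishes to the CodeOcean namespace.
import boto3

cloudwatch = boto3.client("cloudwatch")

paginator = cloudwatch.get_paginator("list_metrics")
for page in paginator.paginate(Namespace="CodeOcean"):
    for metric in page["Metrics"]:
        print(metric["MetricName"], metric["Dimensions"])
```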
The deployment provisions AWS CloudWatch alarms and an SNS topic so you can easily receive notifications about system issues.
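As a sketch, assuming you have the topic ARN from the CloudFormation stack outputs (the ARN below is a placeholder), you can subscribe an email address to it:

```python
# Sketch: subscribe an email address to the alarm notifications SNS topic.
import boto3

sns = boto3.client("sns")

sns.subscribe(
    TopicArn="arn:aws:sns:us-east-1:123456789012:codeocean-alarms",  # placeholder ARN
    Protocol="email",
    Endpoint="ops-team@example.com",  # placeholder address
)
```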
Code Ocean pushes all OS and application logs to CloudWatch Logs. This log data can help with troubleshooting system issues.
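For example, assuming the deployment's log groups share a common name prefix (the prefix below is a placeholder), you can list them and pull recent events with boto3:

```python
# Sketch: list log groups and read events from one of them.
import boto3

logs = boto3.client("logs")

# The log group name prefix is a placeholder; your deployment may use other names.
groups = logs.describe_log_groups(logGroupNamePrefix="codeocean")["logGroups"]
for group in groups:
    print(group["logGroupName"])

if groups:
    # Pull up to 20 events from the first group found.
    events = logs.filter_log_events(logGroupName=groups[0]["logGroupName"], limit=20)
    for event in events["events"]:
        print(event["timestamp"], event["message"])
```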