Alarms
Learn alarm features.
Code Ocean natively reports metrics to CloudWatch. These include system metrics such as disk space, CPU, and memory usage, as well as application metrics. As part of the deployment, the following alarms are also provisioned:
services-unhealthy-host - We need to check the status of the Code Ocean service EC2 instance:
First, check if the instance is running in the AWS EC2 console: https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#Instances:v=3;instanceState=running;search=:codeocean-services
Connect to the instance using Session Manager and run sudo systemctl restart codeocean
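The console check above can also be done from a terminal. The following is a minimal sketch, assuming the AWS CLI is installed and configured, that the instance carries a Name tag containing codeocean-services, and that the deployment is in us-east-1 (the region used in the console link). The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the instance ID and state of the Code Ocean services instance.
# Assumptions (not confirmed by this page): the instance has a Name tag that
# contains "codeocean-services", and the region defaults to us-east-1.
REGION="${AWS_REGION:-us-east-1}"

services_instance_state() {
  aws ec2 describe-instances \
    --region "$REGION" \
    --filters "Name=tag:Name,Values=*codeocean-services*" \
    --query 'Reservations[].Instances[].[InstanceId,State.Name]' \
    --output text
}
```

Calling `services_instance_state` prints one line per matching instance. If the state is running, a Session Manager shell can be opened with `aws ssm start-session --target <instance-id>` (this requires the Session Manager plugin for the AWS CLI), after which the restart command above can be run.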
services-data-volume-usage - This alarm indicates low disk space on the data EBS volume that is attached to the Code Ocean services instance.
First, we need to increase the data EBS volume size:
1. Open the AWS console, go to the EC2 service, select Instances on the left, and search for the codeocean-services instance. Here's a direct link: https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#Instances:v=3;instanceState=running;search=:codeocean-services
2. Select the instance and click the Storage tab
3. Click on the volume with Device Name /dev/sde
sdf
4. Right click on the volume and click Modify volume
5. Enter the new Size for the volume. Doubling the current size is normally recommended; for example, if it is currently 500 GiB, enter 1024 to make it 1 TiB (roughly double).
6. Click Modify
See AWS docs for reference
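The same resize can be sketched with the AWS CLI. This is an illustration only: the function names and the volume ID are hypothetical, the region is assumed to be us-east-1, and the size is doubled exactly (so 500 GiB becomes 1000 GiB rather than the rounded 1024 in the example above).

```shell
#!/usr/bin/env bash
# Sketch: double the size of an EBS volume with the AWS CLI.
# Assumptions: AWS CLI configured; region defaults to us-east-1;
# the caller supplies the ID of the data volume.
REGION="${AWS_REGION:-us-east-1}"

# Return twice the given size (GiB).
doubled_size() {
  echo $(( $1 * 2 ))
}

# Look up the current size of the volume, then request a volume of double that size.
grow_volume() {
  local volume_id="$1"
  local current new
  current="$(aws ec2 describe-volumes --region "$REGION" \
    --volume-ids "$volume_id" \
    --query 'Volumes[0].Size' --output text)"
  new="$(doubled_size "$current")"
  echo "Resizing $volume_id from ${current} GiB to ${new} GiB"
  aws ec2 modify-volume --region "$REGION" \
    --volume-id "$volume_id" --size "$new"
}
```

For example, `grow_volume vol-0123456789abcdef0` (a placeholder ID) would request the change. The file system still has to be extended afterwards, as described next.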
Next, we need to extend the file system for this volume on the Code Ocean services instance:
1. Go back to the codeocean-services instance in the AWS EC2 console
2. Right click the instance and click Connect
3. Select the Session Manager tab and click Connect
4. In the terminal enter the following commands:
sudo su - ec2-user
sudo xfs_growfs -d /data
See AWS docs for reference
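To confirm that the file system actually grew, disk usage on /data can be compared before and after running the commands above. The helper below is a minimal sketch with hypothetical function names; it reads the usage percentage from `df` and compares it with a threshold, defaulting to the 70% level used by the lower disk alarm.

```shell
#!/usr/bin/env bash
# Print the used-space percentage (digits only) of the file system holding a path.
usage_percent() {
  df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Report whether usage is above a threshold (default 70, the lower alarm level).
check_usage() {
  local path="$1" threshold="${2:-70}" used
  used="$(usage_percent "$path")"
  if [ "$used" -gt "$threshold" ]; then
    echo "WARN: $path is ${used}% full (above ${threshold}%)"
    return 1
  fi
  echo "OK: $path is ${used}% full (not above ${threshold}%)"
}
```

On the services instance, `check_usage /data 70` should report OK once the volume and file system have been extended far enough.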
critical-errors - In case of a critical error, you will need to generate a support bundle and send it to the Code Ocean support team for analysis and help with troubleshooting.
services-cpu-usage-high - Contact Code Ocean if the issue persists
You can get notified when alarms happen using AWS SNS.
All Code Ocean alarms are reported to the alarms-<id> SNS topic, which you can subscribe to.
Email subscription to an SNS topic
Open the Amazon SNS console at https://console.aws.amazon.com/sns/v3/home.
In the navigation pane, choose Topics. A list of topics should be visible. Find the predefined topic named alarms-<id> and copy its ARN value.
In the navigation pane, switch to Subscriptions, and select Create subscription.
In the Create subscription dialog box, for Topic ARN, paste the topic ARN that you copied in the previous step.
For Protocol, choose Email.
For Endpoint, enter an email address that you can use to receive the notification, and then choose Create subscription.
From your email application, open the message from AWS Notifications and confirm your subscription.
Your web browser displays a confirmation response from Amazon SNS.
See more on how to connect AWS SNS with Slack or Microsoft Teams.
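The email subscription steps above can also be sketched with the AWS CLI. The function names are hypothetical, and the region default of us-east-1 is an assumption; use the region of your deployment.

```shell
#!/usr/bin/env bash
# Sketch: find the alarms topic and subscribe an email address to it.
# Assumptions: AWS CLI configured; region defaults to us-east-1.
REGION="${AWS_REGION:-us-east-1}"

# List the ARNs of topics whose ARN contains "alarms-".
find_alarm_topics() {
  aws sns list-topics --region "$REGION" \
    --query "Topics[?contains(TopicArn, 'alarms-')].TopicArn" \
    --output text
}

# Subscribe an email address to the given topic ARN.
subscribe_email() {
  local topic_arn="$1" email="$2"
  aws sns subscribe --region "$REGION" \
    --topic-arn "$topic_arn" \
    --protocol email \
    --notification-endpoint "$email"
}
```

After `subscribe_email <topic-arn> <address>`, the subscription stays pending until the confirmation link in the email from AWS Notifications is opened, just as in the console flow.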
Alarm | Description | Threshold |
---|---|---|
services-unhealthy-host | The services machine is not responding to health checks | Whenever this happens |
services-data-volume-usage-70/-90 | The services machine disk space is getting low | Disk used is greater than 70%/90% |
critical-error | Critical errors returned by Code Ocean services | Whenever this happens |
services-cpu-usage-high | Services machine CPU usage is high | CPU utilization is greater than 70% 6 times within 30 minutes |
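
To see which of these alarms are currently firing without opening the console, the CloudWatch alarms can be queried from the AWS CLI. This is a minimal sketch with a hypothetical function name; it lists every alarm in the ALARM state in the region (assumed here to be us-east-1), not only the Code Ocean ones.

```shell
#!/usr/bin/env bash
# Sketch: list CloudWatch alarms currently in the ALARM state, with the reason.
# Assumptions: AWS CLI configured; region defaults to us-east-1.
REGION="${AWS_REGION:-us-east-1}"

alarms_in_alarm_state() {
  aws cloudwatch describe-alarms --region "$REGION" \
    --state-value ALARM \
    --query 'MetricAlarms[].[AlarmName,StateReason]' \
    --output text
}
```

Running `alarms_in_alarm_state` prints one line per active alarm, which can then be matched against the names in the table above.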