Usually, a metric identifies the node it was collected from by its hostname.
However, when something goes wrong with a node, it can be tricky to determine the underlying cloud instance because
the node name may not match the instance name in the cloud console.
Fortunately, this problem is not hard to solve, since most cloud providers allow instances to retrieve their own metadata without
any special permissions.
In this post, I'll explain how node-agent obtains instance metadata in different clouds.
In order to identify which cloud an instance is running in, the agent uses the following heuristics:
- AWS Xen instances: the UUID in /sys/hypervisor/uuid starts with "ec2" or "EC2"
- AWS Nitro instances: the content of /sys/class/dmi/id/board_vendor is "Amazon EC2"
- GCP: the content of /sys/class/dmi/id/board_vendor is "Google"
- Azure: the content of /sys/class/dmi/id/board_vendor is "Microsoft Corporation"
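The heuristics above can be sketched as a small bash function. This is an illustration, not node-agent's actual code: the function name detect_cloud_provider and its optional root-directory argument (which lets you point it at a fake /sys tree for testing) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the cloud-detection heuristics.
# $1: optional sysfs root (defaults to /sys); useful for testing with a fake tree.
# Prints aws, gcp, or azure; returns 1 if no known provider is detected.
detect_cloud_provider() {
  local root="${1:-/sys}"
  local uuid="" vendor=""

  # Xen-based EC2 instances expose a hypervisor UUID starting with "ec2" or "EC2".
  if [[ -r "$root/hypervisor/uuid" ]]; then
    uuid=$(<"$root/hypervisor/uuid")
  fi
  if [[ "$uuid" == ec2* || "$uuid" == EC2* ]]; then
    echo "aws"
    return 0
  fi

  # Nitro-based EC2, GCP, and Azure instances are recognized by the DMI board vendor.
  if [[ -r "$root/class/dmi/id/board_vendor" ]]; then
    vendor=$(<"$root/class/dmi/id/board_vendor")
  fi
  case "$vendor" in
    "Amazon EC2")            echo "aws" ;;
    "Google")                echo "gcp" ;;
    "Microsoft Corporation") echo "azure" ;;
    *)                       return 1 ;;  # unknown provider or bare metal
  esac
}
```

On a real instance, calling detect_cloud_provider with no arguments reads the live /sys files.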
If the cloud provider is detected, the agent retrieves the instance metadata using the provider-specific API.
Instance Metadata Services (IMDS) are usually available at http://169.254.169.254.
The IPv4 block 169.254.0.0/16 is reserved for link-local addressing, which means that packets to these addresses
are not routed beyond the local network link, so the metadata service can only be reached from the instance itself.
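A /16 prefix fixes the first 16 bits of the address, that is, the first two octets. A minimal sketch of that check (the function name is_link_local_ipv4 is hypothetical, and it assumes a well-formed dotted-quad address):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the IPv4 address lies in 169.254.0.0/16.
# Only the first two octets matter for a /16 prefix; input is not validated.
is_link_local_ipv4() {
  local a b _
  IFS=. read -r a b _ <<< "$1"
  [[ "$a" == 169 && "$b" == 254 ]]
}
```

For example, is_link_local_ipv4 169.254.169.254 succeeds, while is_link_local_ipv4 10.0.0.1 fails.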
As a result, the agent exports a metric with the following labels:
- account_id: the ID of the account the instance belongs to
- instance_id: the ID of the instance
- instance_type: the type of the instance
- instance_life_cycle: the purchasing option of the instance (on-demand/spot/reserved)
- region: the cloud region where the instance is running
- availability_zone: the Availability Zone where the instance is running
- availability_zone_id: the ID of the Availability Zone where the instance is running (AWS only)
- local_ipv4: the local IPv4 address
- public_ipv4: the public IPv4 address
Below are my notes on the nuances of different clouds.
Amazon Web Services (AWS)
The full list of metadata categories is available in the AWS documentation.
To list available categories from within an EC2 instance, you can use the following command:
$ curl http://169.254.169.254/latest/meta-data/
To get the value of a specific variable, add its name to the URL. For example:
$ curl http://169.254.169.254/latest/meta-data/instance-id
$ curl http://169.254.169.254/latest/meta-data/placement/availability-zone
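The commands above work when the instance accepts IMDSv1 requests. Instances configured to require IMDSv2 reject requests without a session token, which is obtained with a PUT request to /latest/api/token and then passed in the X-aws-ec2-metadata-token header. A minimal sketch; the helper names aws_imds_token and aws_imds_get are hypothetical:

```shell
#!/usr/bin/env bash
AWS_IMDS="http://169.254.169.254"

# Request an IMDSv2 session token; the TTL header (in seconds, max 21600) is required.
aws_imds_token() {
  curl -sf -X PUT "$AWS_IMDS/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"
}

# Fetch a metadata path (e.g. instance-id) using a fresh session token.
aws_imds_get() {
  local path="$1" token
  token=$(aws_imds_token) || return 1
  curl -sf -H "X-aws-ec2-metadata-token: $token" "$AWS_IMDS/latest/meta-data/$path"
}

# Usage on an EC2 instance:
#   aws_imds_get instance-id
#   aws_imds_get placement/availability-zone
```

Requesting a new token per call keeps the sketch simple; a long-running process would typically cache the token until its TTL expires.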
What's important to understand is that Availability Zone names don't map to the same physical locations across accounts.
For example, us-west-1a may correspond to the AZ with ID usw1-az3 in one account
and to the AZ with ID usw1-az1 in another.
Therefore, if you need to compare placement across accounts, use the availability_zone_id label of the
metric instead of availability_zone.
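To see how zone names map to zone IDs for a particular account, you can ask the EC2 API. A minimal sketch, assuming the AWS CLI is installed with credentials allowed to call ec2:DescribeAvailabilityZones; the wrapper name az_mapping is hypothetical:

```shell
#!/usr/bin/env bash
# Print "<zone name> <zone ID>" pairs for a region, as seen by the current account.
az_mapping() {
  local region="$1"
  aws ec2 describe-availability-zones --region "$region" \
    --query 'AvailabilityZones[].[ZoneName,ZoneId]' --output text
}

# Usage:
#   az_mapping us-west-1
```

Running it with credentials from two different accounts may show the same zone name paired with different zone IDs.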
Google Cloud Platform (GCP)
The GCP metadata server is described in the Compute Engine documentation.
The service works similarly to AWS, except that each request must contain the Metadata-Flavor: Google header:
$ curl -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/
Another difference from AWS is that GCP doesn't provide a separate variable for the region in which an instance is running.
The agent extracts the region name from the zone variable:
$ curl -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/zone
The value has the form projects/<project-number>/zones/<zone>; for example, for the zone us-central1-a, the region is us-central1.
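Deriving the region amounts to taking the last path segment and dropping the trailing zone suffix. A minimal sketch using bash parameter expansion; the function name gcp_region_from_zone is hypothetical, and node-agent's own parsing may differ:

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a GCP zone value into a region name.
gcp_region_from_zone() {
  local zone="${1##*/}"   # keep the text after the last "/", e.g. us-central1-a
  echo "${zone%-*}"       # drop the final "-<suffix>", e.g. us-central1
}

# Usage on a GCE instance:
#   zone=$(curl -s -H "Metadata-Flavor: Google" \
#     http://169.254.169.254/computeMetadata/v1/instance/zone)
#   gcp_region_from_zone "$zone"
```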
As for the instance purchasing options, GCP provides only the preempted flag.
Depending on its value, the agent sets the instance_life_cycle label to either on-demand or preemptible.
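Assuming the flag arrives as the string TRUE or FALSE (the form the GCP metadata server uses for booleans such as instance/scheduling/preemptible), the mapping can be sketched as follows; the function name gcp_life_cycle is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the GCP boolean flag to an instance_life_cycle value.
# Anything other than "TRUE" is treated as on-demand.
gcp_life_cycle() {
  if [[ "$1" == "TRUE" ]]; then
    echo "preemptible"
  else
    echo "on-demand"
  fi
}
```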
Microsoft Azure
The Azure Instance Metadata Service (IMDS) provides information about a VM in JSON format.
Each request must include the Metadata: true header and the api-version query parameter.
The list of supported API versions is available at http://169.254.169.254/metadata/versions.
Microsoft promises that API versions will be supported indefinitely, so we hardcoded the latest available version into node-agent.
Azure regions that support Availability Zones have at least three of them.
However, if a VM is launched in single-instance mode,
the metadata service doesn't report the Availability Zone in which it was launched.
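A minimal sketch of an Azure IMDS request. The API version 2021-02-01 is only an example (check /metadata/versions for what your VM supports), the helper name azure_imds_get is hypothetical, and format=text returns a leaf value as plain text instead of JSON:

```shell
#!/usr/bin/env bash
AZURE_IMDS="http://169.254.169.254/metadata"
AZURE_API_VERSION="2021-02-01"  # example version; see $AZURE_IMDS/versions

# Fetch a single leaf value, e.g. compute/location or compute/zone.
# --noproxy '*' makes sure the request never goes through an HTTP proxy.
azure_imds_get() {
  local path="$1"
  curl -sf --noproxy '*' -H "Metadata: true" \
    "$AZURE_IMDS/instance/$path?api-version=$AZURE_API_VERSION&format=text"
}

# Usage on an Azure VM:
#   azure_imds_get compute/location
#   azure_imds_get compute/zone   # may print nothing if no zone is reported
```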
Why it's important to understand where your instances run
An Availability Zone is an isolated location consisting of one or more data centers with independent power, cooling, and networking.
Cloud providers usually recommend running workloads in multiple AZs to tolerate the failure of an individual zone.
This does increase fault tolerance, but keep in mind that:
- Network latency between availability zones in the same region is typically higher than within a single zone.
- Data transfer between availability zones in the same region is usually billed, while data transfer within a zone is typically free.
In any case, it is important to have an overview of how your applications are distributed across availability zones and
communicate with each other.
Manually annotating nodes with placement metadata
If your infrastructure is built on bare metal servers or your cloud provider doesn't have a metadata service, you can
set provider, region, and availability_zone manually using command-line arguments of node-agent.
For example, for a node in the Equinix LA3 data center in Los Angeles, the arguments might be as follows:
coroot-node-agent --provider=equinix --region=us-west-la --availability-zone=us-west-la3