Usually, metrics identify the node they were collected from by its hostname. However, when something goes wrong with a node, it can be tricky to find the underlying cloud instance, because the node name may not match the instance name shown in the cloud console. Fortunately, this problem is not hard to solve: most cloud providers allow instances to retrieve their own metadata without any special permissions.
In this post, I’ll explain how node-agent obtains instance metadata in different clouds.
In order to identify which cloud an instance is running in, the agent uses the following heuristics:
Once the cloud provider is detected, the agent retrieves the instance metadata through the provider-specific API. Instance Metadata Services (IMDS) are usually available at http://169.254.169.254. The IPv4 block 169.254.0.0/16 is reserved for link-local addresses, which are never routed beyond the local network segment, so the metadata service at this address cannot be reached from outside the host.
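Independently of node-agent's own heuristics, one common way to guess the provider is to read the DMI (SMBIOS) strings that the hypervisor exposes under /sys/class/dmi/id. The sketch below is hypothetical and not necessarily what node-agent does. It assumes that EC2 reports "Amazon" in sys_vendor or product_name (or "amazon" in bios_version on older Xen-based instances), that GCE reports "Google", and that Azure VMs carry the fixed chassis asset tag 7783-7084-3265-9085-8269-3286-77. The directory is a parameter so the function can be tried against fake files.

```bash
#!/usr/bin/env bash
# detect_provider: guess the cloud from DMI strings (hypothetical heuristic).
# $1 is the DMI directory; it defaults to sysfs and can point to fake files.
detect_provider() {
  local dir="${1:-/sys/class/dmi/id}" vendor product bios tag
  vendor=$(cat "$dir/sys_vendor" 2>/dev/null)
  product=$(cat "$dir/product_name" 2>/dev/null)
  bios=$(cat "$dir/bios_version" 2>/dev/null)
  tag=$(cat "$dir/chassis_asset_tag" 2>/dev/null)

  case "$vendor $product $bios" in
    # Assumed: Nitro instances report "Amazon EC2"; older Xen ones have "amazon" in bios_version.
    *Amazon*|*amazon*) echo "AWS"; return ;;
    # Assumed: GCE reports "Google" / "Google Compute Engine".
    *Google*) echo "GCP"; return ;;
  esac
  # Assumed: Azure VMs expose a fixed chassis asset tag.
  if [ "$tag" = "7783-7084-3265-9085-8269-3286-77" ]; then
    echo "Azure"
    return
  fi
  echo "unknown"
}
```

Called without arguments on a VM, detect_provider reads the real sysfs files; anything it does not recognize is reported as unknown, which is where manual configuration (described at the end of this post) would take over.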
As a result, the agent exports the node_cloud_info metric:
```
node_cloud_info{
  provider="<AWS|GCP|Azure>",
  account_id="<the ID of the account which the instance belongs to>",
  instance_id="<the ID of the instance>",
  instance_type="<the type of instance>",
  instance_life_cycle="<the purchasing options of the instance (on-demand/spot/reserved)>",
  region="<the cloud region where the instance is running>",
  availability_zone="<the Availability Zone where the instance is running>",
  availability_zone_id="<the Availability Zone ID where the instance is running (AWS only)>",
  local_ipv4="<the local IPv4 address>",
  public_ipv4="<the public IPv4 address>",
}
```
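As an illustration of what such a sample looks like on the wire, here is a minimal bash sketch that renders node_cloud_info in the Prometheus text exposition format from label=value pairs. The function name format_cloud_info is hypothetical, not part of node-agent. It assumes the conventional value 1 for an info-style metric and that label values contain no double quotes, backslashes, or newlines, which would otherwise need escaping.

```bash
#!/usr/bin/env bash
# format_cloud_info: print one node_cloud_info sample in the Prometheus text format.
# Arguments are label=value pairs; values are NOT escaped (assumed to be plain IDs/names).
format_cloud_info() {
  local out="" pair key value
  for pair in "$@"; do
    key="${pair%%=*}"    # text before the first "="
    value="${pair#*=}"   # text after the first "="
    # Append ,key="value" (no leading comma for the first label).
    out="${out:+$out,}${key}=\"${value}\""
  done
  printf 'node_cloud_info{%s} 1\n' "$out"
}
```

For example, `format_cloud_info provider=AWS region=us-east-1` prints `node_cloud_info{provider="AWS",region="us-east-1"} 1`.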
Below are my notes on the nuances of different clouds.
On AWS, the full list of available metadata categories can be found in the EC2 documentation. To list the available categories from within an EC2 instance, use the following command:
```
$ curl http://169.254.169.254/latest/meta-data/
ami-id
ami-launch-index
ami-manifest-path
block-device-mapping/
events/
hostname
identity-credentials/
instance-action
instance-id
instance-life-cycle
instance-type
local-hostname
local-ipv4
mac
metrics/
network/
placement/
profile
public-hostname
public-ipv4
public-keys/
reservation-id
security-groups
services/
```
To get the value of a specific variable, add its name to the URL. For example:
```
$ curl http://169.254.169.254/latest/meta-data/instance-id
i-0c6ec76f046843cd4
$ curl http://169.254.169.254/latest/meta-data/placement/availability-zone
us-east-1a
```
It’s important to understand that Availability Zone names don’t map to the same physical locations across accounts. For example, us-west-1a may correspond to the zone with ID usw1-az3 in one account and to the zone with ID usw1-az1 in another.
Therefore, if you need to compare zones across accounts, use the availability_zone_id label of the node_cloud_info metric instead of availability_zone.
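A minimal bash sketch of fetching these values, including both the zone name and the account-independent zone ID (the placement/availability-zone-id path). It first tries to obtain an IMDSv2 session token, since EC2 instances can be configured to require one, and falls back to the plain IMDSv1 requests shown above. The function names and the IMDS_BASE override are hypothetical, not part of node-agent.

```bash
#!/usr/bin/env bash
# aws_meta: read one IMDS path on EC2, e.g. "instance-id" or
# "placement/availability-zone-id". IMDS_BASE is a hypothetical override.
aws_meta() {
  local base="${IMDS_BASE:-http://169.254.169.254}" token
  # IMDSv2: request a short-lived session token with a PUT request.
  token=$(curl -sf --max-time 2 -X PUT \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60" \
    "$base/latest/api/token") || token=""
  if [ -n "$token" ]; then
    curl -sf --max-time 2 -H "X-aws-ec2-metadata-token: $token" \
      "$base/latest/meta-data/$1"
  else
    # IMDSv1 fallback: a plain GET without a token.
    curl -sf --max-time 2 "$base/latest/meta-data/$1"
  fi
}

# aws_zone_labels: print the account-specific AZ name next to the AZ ID.
aws_zone_labels() {
  printf 'availability_zone="%s" availability_zone_id="%s"\n' \
    "$(aws_meta placement/availability-zone)" \
    "$(aws_meta placement/availability-zone-id)"
}
```

On an instance in us-west-1, aws_zone_labels would print something like `availability_zone="us-west-1a" availability_zone_id="usw1-az3"`, and only the second value is comparable across accounts.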
The GCP Instance Metadata Service is described in the Google Cloud documentation. It works similarly to the AWS one, except that each request must include the Metadata-Flavor: Google header:
```
$ curl -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/
attributes/
cpu-platform
description
disks/
guest-attributes/
hostname
id
image
legacy-endpoint-access/
licenses/
machine-type
maintenance-event
name
network-interfaces/
preempted
remaining-cpu-time
scheduling/
service-accounts/
tags
virtual-clock/
zone
```
Another difference from AWS is that GCP doesn’t have a separate variable for the region an instance is running in. Instead, the agent extracts the region name from the zone variable:
```
$ curl -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/zone
projects/<account-id>/zones/us-central1-a
```
In this example, the region is us-central1.
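A minimal bash sketch of the same steps: gcp_meta wraps the curl call above, and gcp_region_from_zone keeps the part after the last slash and then drops the trailing zone suffix. Both function names are hypothetical, and the sketch assumes zone names always end in a single "-<letter>" suffix, as us-central1-a does.

```bash
#!/usr/bash/env bash
# gcp_meta: read one instance metadata path; the Metadata-Flavor header is
# mandatory, and requests without it are rejected.
gcp_meta() {
  curl -sf --max-time 2 -H "Metadata-Flavor: Google" \
    "http://169.254.169.254/computeMetadata/v1/instance/$1"
}

# gcp_region_from_zone: "projects/<account-id>/zones/us-central1-a" -> "us-central1".
gcp_region_from_zone() {
  local zone="${1##*/}"  # part after the last "/": "us-central1-a"
  echo "${zone%-*}"      # drop the last "-" suffix: "us-central1"
}

# On a GCE instance:
#   gcp_region_from_zone "$(gcp_meta zone)"
```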
As for purchasing options, GCP provides only the preempted flag. Depending on its value, the agent sets the instance_life_cycle label to either on-demand or preemptible.
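The mapping itself is small. A minimal sketch, assuming the preempted key returns the string TRUE or FALSE; the function name gcp_life_cycle is hypothetical and simply follows the rule described above.

```bash
#!/usr/bin/env bash
# gcp_life_cycle: turn the value of instance/preempted into an instance_life_cycle label.
gcp_life_cycle() {
  case "$1" in
    TRUE|true) echo "preemptible" ;;
    *) echo "on-demand" ;;   # FALSE, or anything unexpected
  esac
}

# On a GCE instance:
#   gcp_life_cycle "$(curl -s -H 'Metadata-Flavor: Google' \
#     http://169.254.169.254/computeMetadata/v1/instance/preempted)"
```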
The Azure Instance Metadata Service (IMDS) returns information about a VM in JSON format. Each request must include the Metadata:true header and the api-version query parameter. The list of supported API versions is available at http://169.254.169.254/metadata/versions. Microsoft promises that versions will be supported indefinitely, so we hardcoded the latest version available at the time into node-agent.
Like other cloud providers, Azure has at least three Availability Zones in each region that supports them. However, if a VM is launched in single instance mode (without a zone assigned), the metadata service doesn’t report any Availability Zone for it.
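A minimal bash sketch of an Azure lookup. Instead of parsing the JSON document, it requests individual compute fields with format=text, which Azure IMDS supports for leaf values. The api-version value 2021-02-01 is only an example, not necessarily the version hardcoded in node-agent, and the function names are hypothetical. The --noproxy '*' option keeps an HTTP proxy configured on the host from intercepting the request.

```bash
#!/usr/bin/env bash
# Example API version; pick one listed at /metadata/versions.
AZURE_API_VERSION="2021-02-01"

# azure_url: IMDS URL for one compute field; format=text returns the bare value.
azure_url() {
  printf 'http://169.254.169.254/metadata/instance/compute/%s?api-version=%s&format=text\n' \
    "$1" "$AZURE_API_VERSION"
}

# azure_meta: every request must carry the "Metadata: true" header.
azure_meta() {
  curl -sf --max-time 2 --noproxy '*' -H "Metadata: true" "$(azure_url "$1")"
}

# On an Azure VM; "zone" prints an empty value when the VM has no zone assigned.
#   for field in subscriptionId vmId vmSize location zone; do
#     echo "$field=$(azure_meta "$field")"
#   done
```

Because zone can legitimately be empty, code built on this sketch should treat a blank value as "unknown" rather than as an error.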
An Availability Zone is an isolated part of the infrastructure (one or more data centers) with independent power, cooling, networking, etc. Cloud providers usually recommend that customers run their workloads in multiple AZs so they can tolerate the failure of an individual zone. This does increase fault tolerance, but don’t forget that:
In any case, it is important to have an overview of how your applications are distributed across Availability Zones and how they communicate with each other.
If your infrastructure is built on bare-metal servers or your cloud provider doesn’t have a metadata service, you can set provider, region, and availability_zone manually using node-agent’s command-line arguments. For instance, for a node in the Equinix LA3 data center in Los Angeles, the arguments might look like this:
```
coroot-node-agent --provider=equinix --region=us-west-la --availability-zone=us-west-la3
```