OCI : INFRASTRUCTURE OPERATIONS : oracle-cloud-infrastructure-operations :

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Monitoring / Notification / Logging

https://mylearn.oracle.com/component/oracle-cloud-infrastructure-operations-associate-workshop/35644/97528/132981

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

In this lesson we will discuss Monitoring, Notifications and Events. From an operations point of view, it is important to keep track of the resources and services you have provisioned in Oracle Cloud and to take the necessary action, either manually or in an automated manner.

We have multiple things to discuss: the Monitoring service with its metrics and alarms, the Notifications service, as well as how to integrate with Events from the OCI services.


1. Monitoring Service : 

As a cloud provider, Oracle not only offers various services but also gives you the ability to monitor your resources with the help of Metrics, which are captured automatically, and Alarms, which you can configure to notify you whenever something unexpected happens or a behavior needs attention.

Metrics : Metrics are captured constantly at regular intervals; the specific interval depends on the type of metric. They relay the health, the capacity utilization or the performance of the various cloud resources you might have. When you have resources provisioned in Oracle Cloud, the services keep emitting the metrics that are being captured. You can also capture metrics for your own custom application and have OCI emit them.

With metrics being captured, you can view them in an aggregated manner either from the console or from custom monitoring tools that you integrate, or you can use the CLI, API or REST integration to do the same.

Once you have identified the metrics you want to capture and have them aggregated across different dimensions (which we will talk about later), you can also configure alarms. With alarms you specify thresholds for these metrics. Whenever an alarm detects that a threshold has been reached, it can trigger an action in the form of a notification to you. The entire Monitoring service can be accessed from the console, REST APIs, SDKs, Terraform and other supported tools. This is an overview of the Monitoring service.

In the rest of the videos you will learn how to access these metrics and use them in your operations activities.




Metrics :

Let us understand the metrics that the Monitoring service offers. Before we go through the slides, let's see where these metrics are available.

If I go to any service in OCI, let's take Compute as the example, and click on one of the resources I have provisioned, here my compute instance, there are various metrics that are captured automatically.



The metrics are grouped into different namespaces. A namespace is basically a collection of metrics, and different namespaces get their data from different sources. Take the example of Compute: there is the infrastructure health of the instance, based on the hardware and on the status of the instance, which is captured from the cloud perspective.

What the instance itself is doing, on the other hand, is captured by an agent running inside the compute instance. For this you need the management agent (the Oracle Cloud Agent) enabled on the instance, which works like an OS-level monitoring tool and captures CPU, memory, disk I/O and network information. This is specific to Compute.

To find out what the management agent in the instance is doing, I have the "Oracle Cloud Agent" option available here, in which instance monitoring is one of the options. Only when it is enabled are the compute agent metrics captured, whereas the infrastructure health metrics are standard from the OCI infrastructure perspective.



Similarly, if I go to Block Volumes (Block Storage > Block Volumes), I get metrics about what an individual volume is undergoing in terms of read and write throughput, the I/O operations that are happening, and so on.




Similarly, if I go to a file system that I have created, I have various metrics about how the file system is being used. If I click on a given file system, I can see metrics such as read and write latency as well as the file system usage.


These are the service-specific metrics that are available.

Whereas if you go to the Menu and then Monitoring > Service Metrics, you get aggregate metrics at the service level. As the namespace I choose the OCI compute agent namespace.

Now I get the aggregate information for all the compute agents that are running.

Monitoring is a regional service.

You choose the compartment for which you want the service metrics and the resource-specific namespace that you want, and then you can add dimensions.

A metric is some information about a resource. As an example I have taken the CPU utilization of a compute instance.

A Metric is a combination of : Namespace,  Dimension & Metadata


Namespace :

When we look at a metric at the service level, you choose the namespace from which the data should be taken. If you take the compute agent namespace, CPU utilization is one of the metrics you see.

Dimension :

A dimension is basically a filter category that you apply to the metric aggregation. For example, you might be interested in a metric for all compute instances running in a particular availability domain, or you can add your own dimension to filter on specific information. In addition, metadata is provided for every metric, for example the unit in which the measurement is given. The metadata is specific to the metric being captured and provides additional meaningful information about it.
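To make the three parts concrete, here is a minimal sketch in plain Python of a metric data point carrying a namespace, dimensions and metadata, plus a dimension filter. The class, the field names and the sample values (such as "AD-1" and "instance-a") are assumptions made for illustration; this is not the OCI SDK or the exact OCI data schema.

```python
from dataclasses import dataclass, field


@dataclass
class MetricPoint:
    """One illustrative data point; field names are assumptions, not the OCI schema."""
    namespace: str
    name: str
    value: float
    dimensions: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)


def filter_by_dimensions(points, **wanted):
    """Keep only the points whose dimensions contain every wanted key/value pair."""
    return [
        p for p in points
        if all(p.dimensions.get(k) == v for k, v in wanted.items())
    ]


# Hypothetical sample data: three instances spread over two availability domains.
points = [
    MetricPoint("oci_computeagent", "CpuUtilization", 35.0,
                {"availabilityDomain": "AD-1", "resourceId": "instance-a"},
                {"unit": "percent"}),
    MetricPoint("oci_computeagent", "CpuUtilization", 80.0,
                {"availabilityDomain": "AD-2", "resourceId": "instance-b"},
                {"unit": "percent"}),
    MetricPoint("oci_computeagent", "CpuUtilization", 55.0,
                {"availabilityDomain": "AD-1", "resourceId": "instance-c"},
                {"unit": "percent"}),
]

ad1 = filter_by_dimensions(points, availabilityDomain="AD-1")
print([p.value for p in ad1])
```

The dimension acts purely as a filter here: the same metric name is kept, and only the set of resources contributing data changes.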

So if I go back to the OCI browser console, you can add dimensions here, which are specific to the namespace you are choosing.

Dimensions are just a means to filter the data from whatever service namespace you have chosen. The data that is shown is based on the dimensions that you choose.

And what you get in the chart is the information about that specific metric.

You can also go to the Metrics Explorer to choose and visualize your metric data.



In the metrics screen that I showed you, you can also customize what you want to see by writing a query in the Monitoring Query Language (MQL). Refer to the OCI documentation for the Monitoring service to see how to pick the dimension and metric you are interested in and how to write metric queries.

Syntax


  • Metric -- The metric you want to query, for example CPU utilization.
  • Interval -- The frequency at which you want the aggregation to happen, for example aggregate the metric at 5-minute intervals.
  • Group function -- The grouping function that you want to apply. For example, if you are interested in the maximum CPU utilization at one-minute intervals, the metric is CPU utilization, the interval is one minute and the function is max.
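The three parts listed above can be assembled into a query string. The sketch below is a small helper written for these notes, not part of any OCI library; the `metric[interval]{filters}.function()` shape follows my understanding of the MQL pattern, and the dimension value "AD-1" is a placeholder. Check the OCI documentation before relying on the exact syntax.

```python
def build_query(metric, interval, statistic, dimensions=None):
    """Assemble a query string of the form metric[interval]{filters}.statistic()."""
    filters = ""
    if dimensions:
        # Each dimension becomes a key = "value" pair inside curly braces.
        pairs = ", ".join(f'{key} = "{value}"' for key, value in dimensions.items())
        filters = "{" + pairs + "}"
    return f"{metric}[{interval}]{filters}.{statistic}()"


# Maximum CPU utilization at one-minute intervals.
print(build_query("CpuUtilization", "1m", "max"))

# Mean CPU utilization at five-minute intervals, filtered by a placeholder dimension value.
print(build_query("CpuUtilization", "5m", "mean", {"availabilityDomain": "AD-1"}))
```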


For example, if you are interested in one particular instance, you can filter on its OCID.

Or, if you want to aggregate across all instances in an availability domain, you can add your filter criteria or dimension accordingly. If I go back to the browser console, I can pick what information I want to see here. It can be written in the form of a query from the browser console after choosing a metric namespace.


oci_computeagent for this example.

And I get the information about what is happening in the chart. Once you have made your changes, click "Update Chart" and you get the data listed individually.
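What the interval and the function in a query do to the raw data can be sketched in a few lines of plain Python. This is only an illustration of windowed aggregation under simple assumptions (timestamps in seconds, fixed windows aligned to zero, made-up sample values); it does not reproduce how the Monitoring service computes its charts.

```python
from collections import defaultdict
from statistics import mean


def aggregate(samples, interval_seconds, statistic):
    """Group (timestamp_seconds, value) samples into fixed windows and reduce each window."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Every sample falls into the window that starts at a multiple of the interval.
        window_start = timestamp - (timestamp % interval_seconds)
        buckets[window_start].append(value)
    return {start: statistic(values) for start, values in sorted(buckets.items())}


# Hypothetical CPU utilization samples as (seconds since start, percent).
samples = [(0, 10.0), (20, 30.0), (59, 20.0), (60, 50.0), (110, 70.0), (125, 40.0)]

print(aggregate(samples, 60, max))   # maximum per one-minute window
print(aggregate(samples, 60, mean))  # mean per one-minute window
```

Changing the interval from 60 to 300 seconds collapses all of the samples into a single window, which is why a longer interval gives a smoother but coarser chart.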

Metrics Done . 


Alarms :

In this session you will learn about the Alarms feature of Monitoring and where it is available in the console.


If I go back to my monitoring console, within the Monitoring service we have "Alarm Statuses".


Alarm Definition --> Create Alarm 

First, the idea behind alarms needs to be understood. Based on the metrics you have, you can set thresholds, and whenever a threshold is breached the alarm is triggered.

The Monitoring Query Language can be used to define the thresholds for alarms: which metric or aggregation of a service you want to monitor, and based on what criteria you want the alarm to fire.


When you create an alarm you can choose the severity of your choice.

Then choose the metric namespace and the metric that is of interest. For example: in a one-minute interval, the average CPU utilization is less than or equal to, let's say, 20%, and the alarm should wait for a certain amount of time before it triggers. You can scope an alarm to specific dimensions in the same way as you do in a Monitoring Query Language query.
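The threshold plus waiting time can be sketched as follows. This is a simplified model written for these notes, assuming one aggregated value per interval and a wait expressed as a number of consecutive intervals; the state names and the function are illustrative and are not the Monitoring service's implementation.

```python
def evaluate_alarm(values, threshold, pending_intervals):
    """Return the state after each interval: 'OK' or 'FIRING'.

    The alarm fires only once the condition (value <= threshold) has held
    for `pending_intervals` consecutive intervals.
    """
    states = []
    breached_in_a_row = 0
    for value in values:
        if value <= threshold:
            breached_in_a_row += 1
        else:
            # Any interval above the threshold resets the waiting period.
            breached_in_a_row = 0
        states.append("FIRING" if breached_in_a_row >= pending_intervals else "OK")
    return states


# Hypothetical per-minute average CPU utilization in percent.
per_minute_mean_cpu = [45.0, 18.0, 15.0, 12.0, 30.0, 19.0]
print(evaluate_alarm(per_minute_mean_cpu, threshold=20.0, pending_intervals=3))
```

The waiting period is what keeps a single low reading from raising an alarm: in this sample the condition has to hold for three minutes in a row before the state changes.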

Then you set the destination through which the notification should be sent. You have not seen the Notifications service yet; we will see it later. The Notifications service is basically a messaging system to which you can publish messages and to which whoever is interested can subscribe.

If you want to write your own custom query, you can give the details and customize it; look into the documentation for how to write it. Otherwise, come back to the basic mode of the alarm configuration.

In this way you make OCI monitor for you passively and raise alarms. When an alarm is triggered and you get a notification, you can take the appropriate action. That's about the alarms feature.

Notification Overview :

In this session you will get an overview of the Notifications service. It is a built-in publish-subscribe mechanism that Oracle provides. Like a typical messaging system, Notifications lets you broadcast messages to various subscribers using a publish-subscribe pattern, with secure, reliable, low-latency delivery of messages that might come from OCI or even from external applications.

How does this work

You can use the Notifications service to set up a subscriber pattern for the messages that are published when event-based rules or alarms get triggered and you want to be notified. This brings us to a new component that has to be understood, called "Events".



We already know about the alarms from Monitoring. Whenever an alarm is triggered you need a mechanism to notify you, which is where the Notifications service comes into the picture.

With Notifications you create topics of interest and enable messages to be published to those topics. Subscribers then get notified, through email or other mechanisms, whenever a message is published to that particular topic.

The Notifications service enables you to set up communication channels for publishing messages using topics and subscriptions. When a message is published to a topic, everyone who is subscribed gets the notification sent to them based on their subscription.
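The topic and subscription relationship can be sketched with a tiny in-memory model. The `NotificationHub` class, the topic name "cpu-alarms" and the two list-based subscribers are all hypothetical stand-ins chosen for illustration; the real service delivers over protocols such as email rather than by calling Python functions.

```python
from collections import defaultdict


class NotificationHub:
    """In-memory stand-in for topics and subscriptions (hypothetical, not an OCI API)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable that receives every message published to the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver the message to each subscriber of the topic; return the delivery count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


hub = NotificationHub()
inbox, pager = [], []  # two subscribers, modelled as lists that collect messages
hub.subscribe("cpu-alarms", inbox.append)
hub.subscribe("cpu-alarms", pager.append)

delivered = hub.publish("cpu-alarms", "CPU utilization below 20%")
print(delivered, inbox, pager)
```

The publisher only knows the topic, never the subscribers, which is what lets you add or remove delivery channels without touching the alarm or rule that publishes.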

So let us understand how to do this.

Go to the Notifications section under Application Integration > Notifications


You can create topics of interest, and once you have a topic in place you can go ahead and create subscriptions for it. There are various subscription methods.


When you create a subscription, choose the topic for your notification and decide how a message published to that topic is delivered: you can get an email notification, have an OCI function called to do some automation, call a particular URL to take action, or integrate with PagerDuty, Slack or SMS.

All of these are subscribers to a particular topic of interest. You create a means through which messages are published to the topic, and whenever a message is published, each subscriber has its respective call made.

Before we get to scenarios where these could be used, we should also understand the Events service that we talked about.

EVENTS 

In this session we will learn about the Events service that is built into OCI. To access Events, go to the Menu:

Application Integration >  Event Service 

What are events in OCI? Anything that happens as part of your OCI services: creating a compute instance, attaching a block volume to your compute instance, creating a user in IAM. Anything that happens in your tenancy as part of the cloud infrastructure offering can be considered an event, which means Oracle emits these events whenever they occur.

It is up to you to create rules to capture those events and take action accordingly.

You create a rule, giving it a display name.




You can use tag-based or event-type-based filter criteria. For instance, if we take Compute as the service, there are various events that Oracle emits from time to time, for example moving an image from one compartment to another. All of these are examples of events that are captured automatically.

Whenever such an event happens, you can take action.
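Rule matching on event type and tags, followed by an action, can be sketched like this. The dictionary layout, the rule name and the tag values are assumptions made for these notes, and the event type string is only meant to show the naming style; the sketch does not reproduce the rule condition syntax of the Events service.

```python
def matches(rule, event):
    """A rule matches when the event type is listed and every tag filter is present on the event."""
    if event["eventType"] not in rule["eventTypes"]:
        return False
    tags = event.get("tags", {})
    return all(tags.get(key) == value for key, value in rule.get("tags", {}).items())


def dispatch(rules, event):
    """Run the action of every matching rule and return the display names that fired."""
    fired = []
    for rule in rules:
        if matches(rule, event):
            rule["action"](event)
            fired.append(rule["displayName"])
    return fired


audit_log = []  # the action here simply records the event
rules = [
    {
        "displayName": "notify-on-instance-launch",  # hypothetical rule name
        "eventTypes": ["com.oraclecloud.computeapi.launchinstance.end"],  # illustrative type
        "tags": {"env": "prod"},
        "action": audit_log.append,
    },
]

event = {
    "eventType": "com.oraclecloud.computeapi.launchinstance.end",
    "tags": {"env": "prod", "team": "ops"},
}
print(dispatch(rules, event))
```

In practice the action would be something like publishing to a notification topic or calling a function, so that the rule ties an emitted event to the automation you want.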









