FLUENTD
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
https://www.youtube.com/watch?v=Jl-F9azcyow
Fluent Bit is a lightweight alternative to Fluentd
Source code is available on GitHub : github.com/fluent
Installing Fluentd : https://docs.fluentd.org/v/0.12/articles/install-by-rpm
FAQ : https://www.fluentd.org/faqs
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Wikipedia definition :
Fluentd is a cross-platform, open-source data collection software project originally developed at Treasure Data. It is written primarily in the Ruby programming language.
There are many benefits to using Fluentd as your log aggregator and log analysis system.
Firstly, you have the ability to collect all types of logging information from multiple sources, such as databases, application servers, and network servers. The primary feature is its unification of the logging data collected from these input sources; after collection and processing, the data can be routed to multiple destinations such as a cloud service, a database, or another archival system. Fluentd is fully compatible with Kubernetes and Docker for deploying and managing logging events within a cloud platform solution. Another feature of Fluentd is its ecosystem: there are over 500 plugins available to connect Fluentd to all types of software.
There are many reasons to use Fluentd as your log aggregation, analysis, and archival system. Today Fluentd integrates logging with hundreds of systems because of the number of plugins available. For unified log data, Fluentd uses JSON, a popular and widely used machine-readable format, whether it is collecting or re-emitting events. Fluentd is able to scale to thousands of servers, and it can handle and manage various log types, from web servers and databases to applications. According to the Fluentd website, Fluentd is able to aggregate log files from 50,000 servers, which illustrates its use as an enterprise logging solution.
Fluentd has a life cycle for each log event; each lifecycle comprises five stages, as shown above.
When setting up Fluentd, the main configuration file is used to connect all of its components. Within the main configuration file, inputs are defined; these are also called listeners. Listeners are able to match specific input data as it is collected from input sources. For example, the first step in creating a match is to define a data source for the data, as on the slide:
- The source is a web server, and the listener is listening on port 8888
- The second step is the use of a Match element, which matches any input tagged test.example
If a match occurs, the input data is output to standard output.
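Based on that description, the slide's configuration would look something like this minimal sketch (the `http` input type is an assumption inferred from "web server on port 8888"):

```
# Listen for HTTP input on port 8888; a POST to
# http://localhost:8888/test.example creates an event tagged test.example
<source>
  @type http
  port 8888
</source>

# Match events tagged test.example and print them to standard output
<match test.example>
  @type stdout
</match>
```

You could exercise it with `curl -X POST -d 'json={"hello":"world"}' http://localhost:8888/test.example`, and the event would appear on Fluentd's stdout.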
Three components are involved in each Fluentd event; they are Tag, Time & Record:
- Tag : A Tag represents the Origin of an Event
- Time : The time represents the actual time of occurrence of the event
- Record : Record represents the content of the event log
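Putting those three together, a single event (here a hypothetical Apache access-log event; the tag and field names are illustrative) could look like:

```
tag:    apache.access                  # the origin of the event
time:   2015-08-10 18:00:00 +0900      # when the event occurred
record: {"user": "alice", "method": "GET", "path": "/index.html"}   # the event content
```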
The Match element is key to matching specific data from the data input within Fluentd. A match defines where output is sent to other systems when the input data matches its pattern.
The sixth component within the Fluentd life cycle of an event is the use of labels: labels allow for grouping filters and outputs.
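As a sketch of how a label groups a filter and an output into one named pipeline that a source routes into (the `@STAGING` name and the `grep` filter are illustrative choices, not from the slides):

```
<source>
  @type http
  port 8888
  @label @STAGING          # everything from this source goes to the @STAGING pipeline
</source>

<label @STAGING>
  <filter test.example>
    @type grep             # keep only records whose "level" field matches "error"
    <regexp>
      key level
      pattern /error/
    </regexp>
  </filter>
  <match test.example>
    @type stdout
  </match>
</label>
```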
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Fluentd : https://www.youtube.com/watch?v=aeGADcC-hUA
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Fluentd : Best kept secret
How can we run Fluentd on a host machine outside of a Docker container? Containers are ephemeral, and configuration will also disappear once the container is destroyed. However, there are scenarios where you may want to run Fluentd within your container, namely if you are only concerned with what is happening inside a specific container.
Docker released version 1.8 a couple of months ago, and since that version a Fluentd logging driver ships with Docker. Nonetheless, we have a guide available to configure Fluentd on Docker.
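As a sketch of how the two fit together: the Docker fluentd logging driver sends container logs to a Fluentd `forward` input (the `docker.**` tag pattern below is an illustrative choice):

```
# fluentd.conf — receive container logs from the Docker fluentd log driver
<source>
  @type forward
  port 24224
</source>

<match docker.**>
  @type stdout
</match>
```

A container would then be started with something like `docker run --log-driver=fluentd --log-opt tag=docker.{{.Name}} nginx`, so every line the container writes arrives in Fluentd as an event tagged with the container name.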
The first problem that we are trying to solve using Fluentd is Logging. Logging is really a mess.
What you see on the screen is very typical of many data engineering teams.
What we typically experience is a gaggle of information sources .
The other problem is that we are not logging enough .
Fluentd works on a simple mechanism of input plugins and output plugins that can listen to all of our data sources and route the data to the right places without introducing unnecessary complexity.
Fluentd is an extensible and reliable data collection tool; it has a core that is augmented by various plugins.
The core differentiator is the way it is designed: it has this core plus a thriving ecosystem of third-party plugins.
The core focuses on the common concerns of collecting and managing event data, while the plugins focus on specific use cases by augmenting the functionality of the core. In other words, the core and plugins divide and conquer.
Let's dig into its plugin architecture.
There are six types of plugins.
Input and Parser plugins take care of taking in data from various sources such as syslog, mobile apps, social networks, web servers, etc. Once the data is in Fluentd, it goes through an optional Filter plugin to modify or filter the data within the stream; for example, you might want to mask certain data, or you may want to send data to different places based on the value of certain fields. At this stage the data is sent out to different systems via Output plugins. Buffering is built in, so if output to a certain system fails, Fluentd retries based on the buffered data.
Formatting is going to be your last step: certain systems and stakeholders require their data to be formatted in a certain way, and those activities are the domain of Formatter plugins.
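A hedged sketch of a configuration that exercises all six plugin types (the file paths and tag are illustrative): Input and Parser live inside `<source>`, then a Filter, then Output with its Buffer and Formatter inside `<match>`:

```
<source>                       # Input plugin: tail a log file
  @type tail
  path /var/log/app.log
  pos_file /var/log/fluentd/app.log.pos
  tag app.access
  <parse>                      # Parser plugin: each line is JSON
    @type json
  </parse>
</source>

<filter app.access>            # Filter plugin: enrich every record
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>

<match app.access>             # Output plugin: write to local files
  @type file
  path /var/log/fluentd/app
  <buffer>                     # Buffer plugin: flush every 10 seconds
    @type file
    path /var/log/fluentd/buffer
    flush_interval 10s
  </buffer>
  <format>                     # Formatter plugin: emit records as JSON
    @type json
  </format>
</match>
```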
Let's take a deep dive a little bit more into the lifetime of an event.
As you can see in the diagram, a Fluentd event consists of three parts:
- Tag
- Timestamp
- Body (or Record)
The record is essentially the payload or the message that you want to move between systems.
The tag is a unique ID in Fluentd; essentially it is a piece of text that tells Fluentd where to send the data. A tag prefix such as s3, mongodb, treasuredata, and so forth lets Fluentd know where to route the data. We are going to see how tags are implemented in our configuration files; you can also use tags such as production or development so that data is routed to the right server and the right environment.
Time : Time is very important, as all log data comes with a timestamp; knowing exactly when the event occurred is key to the backend understanding the meaning and the context of the data. Once the data is input into Fluentd, it goes into the routing mechanism, and based on the value of the tag the routing happens: depending on what the tag says, the event might go to one system rather than another.
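A minimal sketch of that tag-based routing, assuming hypothetical production.* and development.* tag prefixes:

```
# Events tagged production.* go to a remote aggregator
<match production.**>
  @type forward
  <server>
    host logs.example.com
    port 24224
  </server>
</match>

# Events tagged development.* are just printed locally
<match development.**>
  @type stdout
</match>
```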
Each event, once routed to a different output, is buffered in its own way. Once buffering is done and the buffer is flushed, the data goes into a queue driven by the output plugin's write logic. You might be writing to an external system over the internet, to your own file system, or to a local or remote database. One key point is that this logic is abstracted away from the user: when you are using Fluentd you don't have to worry about these mechanisms; you simply write a declarative configuration file, and that is how you manage the data flow.
So let's go into some of those use cases.
This is the most common: you are basically collecting data from many different sources, routing it through Fluentd, and then outputting it to an external system such as MongoDB, PostgreSQL, and so on.
Today Fluentd supports more than 200 output systems, making it a very versatile choice from which to start structured logging within your organization. The configuration will look like this:
Look, we have our source; sources are where we are getting our data from.
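The slide's configuration is not reproduced in these notes, but a hedged sketch of this simple-forwarding pattern (tailing an Apache access log into MongoDB via the third-party fluent-plugin-mongo output; the paths and database names are illustrative) would be:

```
<source>
  @type tail
  path /var/log/httpd/access.log
  pos_file /var/log/fluentd/access.log.pos
  tag mongo.apache.access
  <parse>
    @type apache2          # parse the Apache combined log format
  </parse>
</source>

<match mongo.**>
  @type mongo              # requires the fluent-plugin-mongo gem
  host localhost
  port 27017
  database apache
  collection access
</match>
```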
Once you have simple data forwarding covered, you will go on to do more advanced forwarding. In this case you are using Fluentd to load-balance multiple Fluentd nodes: we are collecting data from many different servers and from many different data sources, such as mobile apps and the local file system.
Those leaf Fluentd instances forward the data to a big aggregator node in the middle. You can also handle overflow: when your aggregator node is full, you can fail over gracefully to another one. You can also do load balancing, so that you are not just sending data to one aggregator node but to several. Once the aggregator instance gets all the data, it periodically sends the data on to its backend systems.
Fluentd's support for more than 200 output plugins makes it even more appealing.
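A hedged sketch of what the leaf-node side of this topology could look like, with load balancing across two aggregators and a standby for graceful failover (the hostnames are hypothetical):

```
<match **>
  @type forward
  <server>                       # primary aggregator
    host aggregator1.example.com
    port 24224
    weight 60                    # receives roughly 60% of the traffic
  </server>
  <server>                       # second aggregator, for load balancing
    host aggregator2.example.com
    port 24224
    weight 40
  </server>
  <server>                       # standby: only used if the others are down
    host aggregator3.example.com
    port 24224
    standby
  </server>
</match>
```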
One of the more advanced use cases is that we are starting to use Lambda.