Mastering Filebeat


Filebeat is the program that crawls the logs in the folders or files you specify and ships them to the configured output (e.g. Logstash, or even Elasticsearch directly). Perhaps Filebeat's major value proposition is its at-least-once delivery guarantee: it promises to send each log at least once and to make sure it arrives.

Installing Filebeat is not complicated, but it is preferable to work with a docker image because it is easier and faster.

Filebeat with docker: the main idea

The main idea when using Filebeat with docker is that you must create a bind volume (or several bind volumes) targeting directly the folders in which you store the logs that you want to send.

In the first ELK stack I made, I used this code to create the Filebeat container. Pay close attention to the volume part.

$ docker run -p 80:80 -d \
    -v /home/c.matajira/mylogs:/usr/share/filebeat/my_logs \
    --name filebeat --network=elknet \
    docker.elastic.co/beats/filebeat:6.4.2

$ docker run -p <host_port>:<container_port> -d \
    -v <path_to_your_logs>:<path_inside_container> \
    --name <container_name> --network=<network_name> \
    <image>

It’s worth mentioning that “/home/c.matajira/mylogs” is where I store my logs. Inside it, I have a folder for each application that generates logs. It is also worth mentioning that we expose port 80 to receive incoming UDP or TCP transmissions from services like rsyslog. However, I haven’t used this port in Filebeat; instead I use rsyslog to grab incoming logs. Rsyslog is very lightweight and allows me to persist my logs very easily.
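As a reference, here is a minimal rsyslog sketch of that setup. The idea is simply that rsyslog listens for incoming syslog messages and writes them into a file under the folder that Filebeat crawls. The listening port (514) is an assumption; adapt it and the output path to your own machine.

# /etc/rsyslog.conf (sketch; port and path are assumptions)
# Load the UDP and TCP listeners
module(load="imudp")
input(type="imudp" port="514")
module(load="imtcp")
input(type="imtcp" port="514")
# Write everything that arrives into the file that Filebeat tracks
*.* /home/c.matajira/mylogs/rsyslog_log/messages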

Filebeat configuration: the main idea

The main idea behind the Filebeat configuration is that you must specify (1) the paths where Filebeat looks for the logs, and (2) where to send them afterwards. All this is specified in the filebeat.yml.
WARNING: YAML files are very grumpy about whitespace and tabs. When you finish writing your YAML file, do not forget to test it here: http://www.yamllint.com/

Below I will copy-paste an extract of my configuration file, to which I have added several comments to clarify specific topics. Nevertheless, be aware that my original configuration file does not have comments (they could break something; I prefer to play it safe, and you should, too).

filebeat.inputs: # Here you specify the inputs, in other words the sources of information that Filebeat will be tracking.
- type: log # We use 'log' for files written to disk
  paths:
    - '/usr/share/filebeat/my_logs/my_app_logs/*.log' # The path to the logs, from the Filebeat container's perspective.
  fields_under_root: true # This simply puts the added fields at the "first level" of the JSON message instead of nesting them.
  fields:
    attrs:
      my_company.app: my_app # With this we add some tags. This is very important for Logstash!
- type: log
  paths:
    - '/usr/share/filebeat/my_logs/my_app2_logs/*.log'
  fields_under_root: true
  scan.sort: filename # When Filebeat starts and finds a lot of files, it checks them in an order based on the filename. This is not necessary.
  scan.order: desc # This tells Filebeat to start with the newest logs and finish with the oldest. This is not necessary.
  fields:
    attrs:
      my_company.app: my_app2

- type: log
  paths:
    - '/usr/share/filebeat/my_logs/rsyslog_log/messages' # Here we just track a very, very long file.
  include_lines: ['.*keyword.*'] # We tell Filebeat that we only want the log lines that include "keyword"; the others it can ignore.
  fields_under_root: true
  fields:
    attrs:
      my_company.app: my_app3

output.logstash: # Here we specify where Filebeat should send the logs. In this case, Logstash.
  hosts: ['localhost:5044'] # Here I provide the IP address and port of Logstash.
  timeout: 30s
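Besides the online linter, you can also let Filebeat itself validate the configuration and the connection to the output. This is a sketch that assumes the container is named filebeat and that its working directory is /usr/share/filebeat, as in the official image:

# Check that filebeat.yml parses and is a valid configuration
$ docker exec filebeat ./filebeat test config -c filebeat.yml
# Check that the Logstash output defined in the configuration is reachable
$ docker exec filebeat ./filebeat test output -c filebeat.yml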

At this point, it is important to mention that you should specify a Filebeat input for each type of log. The reason is that you want to tag the logs so that Logstash knows how to grok them. To learn how to use Logstash’s grok filter, read this.
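Just to illustrate what the tag buys you on the Logstash side, here is a minimal sketch of a Logstash filter that selects a grok pattern based on the field we added in Filebeat above. The grok pattern itself is only a placeholder; replace it with one that matches your application's log format.

filter {
  if [attrs][my_company.app] == "my_app" {
    grok {
      # Placeholder pattern: a timestamp, a log level, then the rest of the line
      match => { "message" => "%{TIMESTAMP_ISO8601:log_timestamp} %{LOGLEVEL:level} %{GREEDYDATA:log_message}" }
    }
  }
}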

How to configure a running Filebeat, in practice

There are several ways of modifying the configuration file of a running container. Possibly the easiest is to put a bind volume on the configuration file itself. Why not the whole folder? Because the folder where the standard yml file is located also contains the Filebeat executable and other important contents, so binding all of it is not a good idea. When you bind volumes, the contents of the local machine overwrite the contents of the docker container.

If you bind the file you will end up doing something like this (see the docker-compose extract below):

filebeat:
  container_name: filebeat
  build:
    context: .
    dockerfile: ./filebeat/Dockerfile # I use my own modified version of Filebeat
  ports:
    - "80:80"
  volumes:
    - filebeat_data:/usr/share/filebeat/data/ # I persist Filebeat's registry in a named volume
    - /where/my/logs:/usr/share/filebeat/my_logs # Here I target the logs that I am interested in
    - /where/my/filebeat_conf/filebeat.yml:/usr/share/filebeat/filebeat.yml # Here I bind the Filebeat configuration file to one on my local machine.

With this configuration, you can edit the filebeat.yml from outside the container. After you finish, you should restart the service. The steps are below.
1. Edit the filebeat.yml
2. Restart the container:

$ docker restart filebeat
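To check that the new configuration was actually picked up, you can follow the container's output right after the restart (a quick sketch; filebeat is the container name used throughout this post):

# Follow Filebeat's own log output after the restart
$ docker logs --tail 50 -f filebeat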

Warnings

  • You can only specify one input per file. In other words, you cannot make Filebeat crawl the same file with two “inputs”.
  • When you have several log files and you start Filebeat for the first time, Filebeat might take some time to send everything. For me it takes about 90 minutes, and sometimes more. You have to be patient with Filebeat. Filebeat works well; just give it time to stabilize. After you restart the Filebeat service, Filebeat actually crawls everything again, checking whether there is something new in every file. This is why the results of a configuration change sometimes take time to show. To speed up the process read this.
  • Once a file is sent, Filebeat will not send it again; it keeps track of what it has already shipped in its registry (see the sketch after this list).
  • Because of the at-least-once guarantee, Filebeat can send events in duplicate, so you have to handle the duplicates in Logstash.
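The bookkeeping behind the last two warnings lives in Filebeat's registry. In the 6.x containers this is, by default, a single JSON file inside the data directory (the one persisted in the filebeat_data volume above); the exact path below is an assumption based on that default. Deleting it makes Filebeat forget what it has already shipped and resend everything from scratch.

# Peek at the registry to see which files Filebeat tracks and the byte offsets it has reached
$ docker exec filebeat cat /usr/share/filebeat/data/registry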

Most important commands = How to debug

# Launch Filebeat in debug mode
./filebeat -c filebeat.yml -e -d "*"
# Restart the Filebeat container
docker restart filebeat
