Filebeat Pipeline in Practice

Published on December 15, 2023

Filebeat ships the logs, and an Elasticsearch ingest pipeline parses them using the grok processor.
The pipeline can be created either in Kibana under Ingest Pipelines or via the ES API.

Here we parse Apache access logs; the ES pipeline is created with a curl command.

curl -XPUT "http://10.30.4.50:9200/_ingest/pipeline/apache2" -H 'Content-Type: application/json' -d'
{
  "description" : "This pipeline is the regular rule that Grok uses to match the access logs of Apache 2",
  "processors" : [
    {
    	"grok": {
        "field": "message",
        "patterns": [
          "%{COMBINEDAPACHELOG}"
        ]
      }
    }
  ]
}
'

Query the ES pipeline:

curl -XGET http://10.30.4.50:9200/_ingest/pipeline/apache2?pretty
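Before wiring Filebeat to this pipeline, the grok pattern can be sanity-checked with the _simulate API; the access-log line in the request below is only a made-up sample:

curl -XPOST "http://10.30.4.50:9200/_ingest/pipeline/apache2/_simulate?pretty" -H 'Content-Type: application/json' -d'
{
  "docs": [
    {
      "_source": {
        "message": "127.0.0.1 - - [15/Dec/2023:10:00:00 +0800] \"GET /index.html HTTP/1.1\" 200 1234 \"-\" \"curl/7.61.1\""
      }
    }
  ]
}
'

If the pattern matches, the response contains the fields extracted by grok (clientip, verb, response, agent, and so on) next to the original message.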

First create a Filebeat container and copy the entire /usr/share/filebeat directory out of it, for example as sketched below.
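A minimal sketch of the copy-out step, assuming the same image and host directory used in the docker run command further down:

# Create a throwaway container just to copy the default /usr/share/filebeat out of the image
docker create --name filebeat-tmp 10.30.4.50:8082/soimt/beats/soimt-filebeat:7.17.5
docker cp filebeat-tmp:/usr/share/filebeat /app/dockerdata/
docker rm filebeat-tmp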
Then remove that container and recreate it with the copied directory mounted back in, along with the log paths that need to be collected:

docker run -dit --name filebeat -u root -v /app/dockerdata/filebeat:/usr/share/filebeat -v /var/run/docker.sock:/var/run/docker.sock:ro -v /app/dockerdata/ldap_data/log:/ldap/log:ro 10.30.4.50:8082/soimt/beats/soimt-filebeat:7.17.5

Contents of filebeat.yml:

# ============================== Filebeat inputs ===============================
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /ldap/log/apache2/access.log
  pipeline: "apache2"
  scan_frequency: 30s
  tags: ["LDAP-Apache"]
setup.ilm.enabled: false
setup.template.name: "10.30.4.57"
setup.template.pattern: "10.30.4.57-*"
# ============================== Filebeat modules ==============================
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
# ======================= Elasticsearch template setting =======================
setup.template.settings:
  index.number_of_shards: 5
  index.number_of_replicas: 0
# =================================== Kibana ===================================
setup.kibana:
  host: "10.30.4.50:5601"
# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  hosts: ["10.30.4.50:9200"]
  #protocol: "https"
  #username: "elastic"
  #password: "admin1234"
  indices:
    - index: "10.30.4.57-LDAP_apache-%{+yyyy.MM.dd}"
      when.contains:
        tags: "LDAP-Apache"
# ================================= Processors =================================
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~

If ES is configured with HTTPS and a password, uncomment those lines.

After Filebeat starts successfully, an index template named 10.30.4.57 is created automatically in ES; the index pattern still has to be created manually in Kibana under Stack Management --> Index Patterns.
Once the index pattern is created, open Discover, select it, and look at the message field: the logs have already been parsed into separate fields.
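As a quick check, the _cat API shows whether the daily index has been created:

curl -XGET "http://10.30.4.50:9200/_cat/indices/10.30.4.57-*?v"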

Problems encountered:
If, after Filebeat has been started, its log reports an error similar to the following:

{"type":"mapper_parsing_exception","reason":"object mapping for [agent] tried to parse field [agent] as object, but found a concrete value"}, dropping event!

This is a mapping error on the agent field: Elasticsearch tries to parse the field as an object, but the document carries a concrete (string) value, which raises the mapping parse exception. The reason is that the legacy COMBINEDAPACHELOG grok pattern captures the HTTP user agent into a field named agent, which collides with the agent.* object metadata that Filebeat attaches to every event.

The fix is simply to drop the agent field when creating the ES pipeline, by adding a remove processor after grok:

curl -XPUT "http://10.30.4.50:9200/_ingest/pipeline/apache2" -H 'Content-Type: application/json' -d'
{
  "description" : "This pipeline is the regular rule that Grok uses to match the access logs of Apache 2",
  "processors" : [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{COMBINEDAPACHELOG}"
        ]
      }
    },
    {
      "remove": {
        "ignore_failure": true,
        "field": "agent"
      }
    }
  ]
}
'

Command to delete the indices: curl -X DELETE "http://10.30.4.50:9200/10.30.4.57*"

Uploading ES pipelines to another cluster
Download all pipelines and write them to pipelines.json:
curl -XGET 'http://localhost:9200/_ingest/pipeline?pretty' > pipelines.json
Then use a script to read pipelines.json and upload each pipeline to the new ES cluster.
Script contents:

#!/bin/bash
# Read pipelines.json (the output of GET _ingest/pipeline), split it into one
# file per pipeline, and PUT each pipeline into the target cluster.
for pipeline_id in $(cat pipelines.json | jq -r '. | keys[]'); do
  pipeline_file="${pipeline_id}.json"
  # Extract the body of this pipeline (description, processors, ...) into its own file
  cat pipelines.json | jq -r ".[\"${pipeline_id}\"]" > "${pipeline_file}"
  curl -XPUT "http://localhost:9200/_ingest/pipeline/${pipeline_id}" -H 'Content-Type: application/json' -d @"${pipeline_file}"
done

# If jq is not installed, install it first; alternatively, copy pipelines.json to a machine that can reach the new ES cluster and run the script there.
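For reference, jq is normally available from the distribution's package manager, for example:

# Debian/Ubuntu
apt-get install -y jq
# CentOS/RHEL (jq ships in the EPEL repository)
yum install -y epel-release && yum install -y jq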