hisune/log2ck

Real-time, high-performance reading of log files and writing to ClickHouse.

1.1.8 2023-06-06 08:24 UTC

This package is auto-updated.

Last update: 2024-09-06 11:20:55 UTC


README


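This tool writes standard Monolog logs to ClickHouse in real time over the TCP protocol. Other standardized log formats can also be supported, provided you can write a regular expression for them.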

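Features

  • Minimal code
  • High performance (CPU usage in production is only 1/20 that of filebeat)
  • No dependency on third-party services (queues and the like)
  • Configurable
  • Customizable (custom regular expressions and per-line callback functions)
  • Supports reading logs that are split by day
  • Supports automatic resumption of collection from the last position
  • Supports batch inserts
  • Supports graceful restart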
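
The batch-insert feature is controlled by the max_sent_count and max_sent_wait settings shown in the config.php example. The sketch below illustrates a count-or-timeout flush rule of that kind. It is not log2ck's internal code: the BatchBuffer class and its methods are hypothetical names, and the exact behavior of the library may differ.

```php
<?php
// Hypothetical sketch of a count-or-timeout batch buffer.
// BatchBuffer is an illustrative name, not a class provided by log2ck.
final class BatchBuffer
{
    /** @var array<int, array<string, mixed>> rows waiting to be inserted */
    private array $rows = [];
    private int $maxSentCount; // mirrors 'max_sent_count'
    private int $maxSentWait;  // mirrors 'max_sent_wait', in seconds
    private float $lastFlush;  // timestamp of the previous flush

    public function __construct(int $maxSentCount, int $maxSentWait, float $now)
    {
        $this->maxSentCount = $maxSentCount;
        $this->maxSentWait = $maxSentWait;
        $this->lastFlush = $now;
    }

    public function add(array $row): void
    {
        $this->rows[] = $row;
    }

    // Flush when the batch is full, or when rows have waited long enough.
    public function shouldFlush(float $now): bool
    {
        if ($this->rows === []) {
            return false;
        }
        return count($this->rows) >= $this->maxSentCount
            || ($now - $this->lastFlush) >= $this->maxSentWait;
    }

    // Return the pending rows and reset the buffer and the timer.
    public function flush(float $now): array
    {
        $batch = $this->rows;
        $this->rows = [];
        $this->lastFlush = $now;
        return $batch;
    }
}

// The clock is passed in explicitly so the rule is easy to follow and test.
$buffer = new BatchBuffer(3, 10, 0.0);
$buffer->add(['message' => 'a']);
$buffer->add(['message' => 'b']);
$notYet = $buffer->shouldFlush(1.0);    // 2 rows, 1 second elapsed
$timedOut = $buffer->shouldFlush(10.0); // 2 rows, 10 seconds elapsed
```

With these values, $notYet is false and $timedOut is true: a small batch is still sent once the wait limit is reached, so quiet logs do not stay buffered indefinitely.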

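Usage notes

  1. If you use the default regular expression, the log file to be read must be in the standard default Monolog log format, and the Monolog name and group names must not contain spaces.
  2. The log must be readable line by line, with one record per line. For example, the Monolog formatter needs to be set with: 'allowInlineLineBreaks' => false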
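
To make these two notes concrete, here is a small sketch that applies the pattern from the commented-out 'pattern' entry in the config.php example to a single log line. The sample line is made up for illustration, and the sketch only shows how the named groups split a line; it does not reproduce how log2ck itself reads files.

```php
<?php
// Pattern copied from the 'pattern' entry in the config.php example.
$pattern = '/\[(?P<created_at>.*)\] (?P<logger>\w+).(?P<level>\w+): (?P<message>.*[^ ]+) (?P<context>[^ ]+) (?P<extra>[^ ]+)/';

// Made-up sample line in Monolog's default layout (an assumption for illustration).
$line = '[2022-02-22 10:00:00] api.INFO: user login {"uid":1} []';

$fields = [];
if (preg_match($pattern, $line, $matches) === 1) {
    // preg_match returns both numeric and named keys; keep only the named groups.
    $fields = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
}

print_r($fields);
```

Each named group corresponds to a column of the same name in the table structure given at the end of this README. The logger group uses \w+, which cannot match a space, and the context and extra groups use [^ ]+, so a line only parses cleanly when those parts contain no spaces and the whole record sits on one line.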

How to use

# Install
composer require "hisune/log2ck"
# Modify config.php to the configuration you want 
cp vendor/hisune/log2ck/test.config.php config.php
# Create manager
vim manager.php

Example contents of manager.php

<?php
use Hisune\Log2Ck\Manager;
require_once 'vendor/autoload.php';
(new Manager(__DIR__ . DIRECTORY_SEPARATOR . 'config.php'))->run();

# Start the manager
php manager.php

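By default, the manager and worker execution logs can be found in the vendor/hisune/log2ck/logs/ directory. You can also change where these two logs are stored through the configuration file.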

config.php configuration example

return [
    'env' => [ // System environment variables
//        'bin' => [
//            'php' => '/usr/bin/php', // Optional; path to the php binary
//        ],
        'clickhouse' => [ // Required configuration
            'dsn' => 'tcp://192.168.37.205:9000',
            'username' => 'default',
            'password' => '',
            'options' => [
                'connect_timeout' => 3,
                'socket_timeout'  => 30,
                'tcp_nodelay'     => true,
                'persistent'      => true,
            ],
            'database' => 'logs', // Database name
            'table' => 'repo', // Table name
            'max_sent_count' => 100, // Insert once a single batch holds this many rows
            'max_sent_wait' => 10, // If a batch has not reached max_sent_count, insert anyway at least once every this many seconds
        ],
//        'worker' => [
//            'cache_path' => '/dev/shm/', // Optional configuration, worker cache directory
//        ],
//        'logger' => [
//            'enable' => true, // Optional configuration, whether to record logs
//            'path' => __DIR__ . DIRECTORY_SEPARATOR . 'logs' . DIRECTORY_SEPARATOR, // Optional; directory where the logs are written, must end with /
//        ],
    ],
    'tails' => [
        'access' => [ // The key is the log name; it maps to the name field in ClickHouse
            'repo' => 'api2', // Name of the project the log belongs to
            'path' => '/mnt/c/access.log', // Log path, e.g. a log with a fixed file name
//            'path' => '/mnt/c/access-{date}.log', // Log path, e.g. a log split by day; {date} is currently the only supported macro, with dates formatted like 2022-02-22
//            'host' => 'host1', // Custom host name; defaults to the server host name if not set, and maps to the host field in ClickHouse
//            'pattern' => '/\[(?P<created_at>.*)\] (?P<logger>\w+).(?P<level>\w+): (?P<message>.*[^ ]+) (?P<context>[^ ]+) (?P<extra>[^ ]+)/', // Optional; set to false if regex processing is not required
//            'callback' => function($data) { // Optional; each line's data is passed through this callback, which can implement any cleaning logic for the stream
//                $data['message'] = 'xxoo'; // For example, overwrite a field of this row
//                return $data; // Must return an array: keys are the field names of the ClickHouse table, values are the values to store
//            },
//            'clickhouse' => [...] // Optional; per-entry ClickHouse connection settings, in the same format as the clickhouse array under env
        ],
    ],
];
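
As an illustration of the 'callback' option above, the following sketch shows a callback that masks a sensitive value before the row is stored. It assumes the callback receives an associative array keyed by the named groups of the pattern (message, context, and so on); the password key and the masking rule are hypothetical and only serve as an example of custom cleaning logic.

```php
<?php
// Hypothetical callback for the 'callback' key of a tails entry.
// Assumption: $data holds the parsed fields of one log line as strings.
$callback = function (array $data): array {
    // Monolog writes context as JSON, so decode it to inspect its keys.
    $context = json_decode($data['context'] ?? '', true);

    // 'password' is an illustrative key name, not something log2ck defines.
    if (is_array($context) && array_key_exists('password', $context)) {
        $context['password'] = '***';
        $data['context'] = json_encode(
            $context,
            JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE
        );
    }

    // Keys of the returned array are ClickHouse field names.
    return $data;
};

// Stand-alone check with a made-up row.
$row = $callback([
    'message' => 'login',
    'context' => '{"uid":1,"password":"secret"}',
    'extra'   => '[]',
]);
echo $row['context'], PHP_EOL;
```

Rows whose context is not valid JSON, or has no password key, pass through unchanged, so the callback is safe to apply to every line.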

supervisord

It is recommended to use supervisord to manage your manager process.

[program:log2ck]
directory=/data/log2ck
command=php manager.php
user=root
autostart=true
autorestart=true
startretries=10
stderr_logfile=/data/logs/err.log
stdout_logfile=/data/logs/out.log

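ClickHouse log table structure

If you use Monolog with the default regular expression, you can use the following table structure directly. If you use a custom regular expression, define your own ClickHouse table structure to match the fields your expression captures.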

create table repo
(
    repo       LowCardinality(String) comment 'Project name',
    name       LowCardinality(String) comment 'Log name',
    host       LowCardinality(String) comment 'The machine where the log is generated',
    created_at DateTime,
    logger     LowCardinality(String),
    level      LowCardinality(String),
    message    String,
    context    String,
    extra      String
) engine = MergeTree()
      PARTITION BY toDate(created_at)
      ORDER BY (created_at, repo, host)
      TTL created_at + INTERVAL 10 DAY;

If your message or context content is JSON, see ClickHouse's JSON query functions: https://clickhouse.ac.cn/docs/en/sql-reference/functions/json-functions/