README

emilklindt/laravel-marker-clusterer 包允许您在发送到客户端之前在 Laravel 中对标记进行聚类。

在处理大量数据集时，可能无法或实际发送所有数据点到客户端。这可能是由于传输所有数据点所需的较大有效负载，或者为了避免在客户端上过度处理。本项目旨在通过 Laravel 的服务器端标记聚类来促进这些情况。

安装
- 发布配置文件
用法
- 添加标记
- 标记聚类
聚类器
- K-means 聚类器
- 基于密度的空间聚类器
测试
致谢
许可证

安装

您可以通过 composer 安装此软件包

composer require emilklindt/laravel-marker-clusterer

该软件包将自动注册自己。

发布配置文件

您可以选择使用以下命令发布配置文件：

php artisan vendor:publish --provider="EmilKlindt\MarkerClusterer\MarkerClustererServiceProvider" --tag="config"

这是将发布到 config/marker-clusterer.php 的文件内容

配置文件内容

return [

    /*
    |--------------------------------------------------------------------------
    | Default Clusterer
    |--------------------------------------------------------------------------
    |
    | The default clustering method used when using the DefaultClusterer class
    | included in this project. This allows for easily swapping of the clusterer
    | used throughout a project, through only editing the config file.
    |
    */

    'default_clusterer' => 'density-based-spatial-clusterer',

    /*
    |--------------------------------------------------------------------------
    | Default Distance Formula
    |--------------------------------------------------------------------------
    |
    | The default formula for calculating distance from one coordinate to
    | another. Possible values are constants of the DistanceFormula enum.
    |
    */

    'default_distance_formula' => \EmilKlindt\MarkerClusterer\Enums\DistanceFormula::MANHATTAN,

    /*
    |--------------------------------------------------------------------------
    | K-means Clustering
    |--------------------------------------------------------------------------
    |
    | K-means algorithm identifies k number of centroids, and then allocates
    | every data point to the nearest cluster.
    |
    */

    'k_means' => [

        /*
        |--------------------------------------------------------------------------
        | Default Maximum Iterations
        |--------------------------------------------------------------------------
        |
        | The default number of maximum iterations of clustering, for example used
        | in K-means clustering, where clustering is repeated untill either reaching
        | convergence (no further changes) or the maximum number of iterations.
        |
        */

        'default_maximum_iterations' => 10,

        /*
        |--------------------------------------------------------------------------
        | Default Maximum Convergence Distance
        |--------------------------------------------------------------------------
        |
        | The maximum distance between iterations to count a cluster as converged,
        | meaning that no further iteration is necessary. A higher value can provide
        | better performance, due to the need of doing less iterations. A lower value
        | will ensure that a cluster has actually converged.
        |
        */

        'default_convergence_maximum' => 100.0,

        /*
        |--------------------------------------------------------------------------
        | Default Maximum Samples
        |--------------------------------------------------------------------------
        |
        | The default number of maximum samples of clustering, for example used
        | in K-means clustering, where the specified number of samples are run
        | to achieve the lowest variance between the centroids.
        |
        | This differs from maximum iterations in that, iterations are executed
        | on the same set of initially random centroids. Each sample instantiates
        | a new set of centroids to iteratively optimize, untill maximum number
        | of iterations or convergence is reached.
        |
        */

        'default_maximum_samples' => 10,
    ],

    /*
    |--------------------------------------------------------------------------
    | Density Based Spatial Clusterer (DBSCAN)
    |--------------------------------------------------------------------------
    |
    | Finds core samples of high density and expands clusters from them.
    |
    */

    'dbscan' => [

        /*
        |--------------------------------------------------------------------------
        | Default Include Noise
        |--------------------------------------------------------------------------
        |
        | Whether to include markers not meeting the threshold of minSamples.
        | If true, markers not within epsilon distance of at least minSamples,
        | will be included anyways, in a solo cluster for that given point.
        */

        'default_include_noise' => true,

        /*
        |--------------------------------------------------------------------------
        | Default Use Geohash Neighboring
        |--------------------------------------------------------------------------
        |
        | When response time is critical and precision is not, it may sometimes
        | be beneficial to use geohashing for neighbor searching only. A geohash
        | is calculated for every marker when added to the clusterer. This is
        | used to limit the scope of distance calculations to only points that
        | fall within neighboring geohashes.
        |
        | Enabling this setting will remove the last step, which is calculating
        | exact distance to each marker in the neighboring geohashes, and then
        | comparing it against the epsilon value.
        |
        | The geohash precision is based on the epsilon value, so by specifycing
        | a larger epsilon value, more markers will be considered neighbors etc.
        */

        'default_use_geohash_neighboring' => false,
    ]
];

用法

为了在整个应用程序中管理使用的聚类算法，建议您使用 DefaultClusterer 类。它使用 marker-clusterer 配置中的 default_clusterer 值，可以通过发布配置简单地更改。

$clusters = DefaultClusterer::cluster($markers, $config);

添加您的标记

聚类器接受标记集合。然而，这些标记必须实现 Clusterable 接口，该接口有一个用于检索标记坐标的单个方法。

在您的 Eloquent 模型中的示例实现可能如下所示

use League\Geotools\Coordinate\Coordinate;
use EmilKlindt\MarkerClusterer\Interfaces\Clusterable;

class Car extends Model implements Clusterable
{
    public function getClusterableCoordinate(): Coordinate
    {
        return new Coordinate([
            $this->lat,
            $this->lng
        ]);
    }
}

聚类您的标记

实现接口后，一组汽车可以按如下方式聚类

$cars = Car::get();

$config = new Config([
    'epsilon' => 10.5,
    'minSamples' => 2
]);

$clusters = DefaultClusterer::cluster($cars, $config);

请参阅下面的聚类部分，了解不同的聚类方法和它们的参数。

聚类器

欢迎贡献或功能请求。请随意创建拉取请求或在其想法讨论主题中发布您的功能请求。

来源：NSHipster/DBSCAN 存储库

K-means 聚类

可能是最著名的聚类算法，k-means 聚类将 n 个元素划分为 k 个聚类。

如上图所示，K-means 将标记聚类到最近的均值所在的聚类中，即标记到聚类中心的最近距离。K-means 需要一个 k 值，即所需的聚类数量。根据您的数据集，有许多不同的方法来选择此值，但这个方法尚未在本存储库中介绍。

$config = new Config([
    'k' => 3
]);

$clusters = KMeansClusterer::cluster($cars, $config);

适用于 KMeansClusterer 的配置参数

k，整数（必需）
所需的聚类数量以及要随机初始化的初始质心（聚类点）数量。如果 k 大于标记数量，所有标记将单独聚类。
iterations，整数（可选）
重新计算质心（聚类点平均位置）并分配标记到最近聚类的最大迭代次数。在随机初始化聚类后，标记被分配到最近的聚类，然后重新计算聚类质心，重复此过程。
convergenceMaximum，float（可选）
两次连续迭代中聚类质心之间的最大距离，以此宣布收敛，它与iterations一起是生成聚类样本的停止标准。在更改distanceFormula时，可能需要调整此值，因为每个距离公式具有不同的准确度。
samples，int（可选）
初始化聚类随机质心的次数以及执行iterations次数的优化迭代。找到samples数量的聚类样本后，选择方差最低的聚类样本（示例，由Arif R，Medium提供）。更高的值更有可能得到最佳结果，但会以性能为代价。
distanceFormula，string（可选）
用于计算点之间距离的公式。距离用于确定标记最近的聚类。有效值是DistanceFormula枚举的常量。如果聚类任务对精度很重要，则首选Haversine距离。

所有标记为可选的属性都具有默认值，在配置中指定。请参阅上面的发布配置文件部分以操作这些。

基于密度的空间聚类

使用的算法是基于密度的空间聚类应用噪声（DBSCAN）。

这种方法克服了K-means聚类的一个常见问题，即需要指定聚类的数量。此参数可能依赖于数据集——更具体地说，是数据集的密度。DBSCAN算法采用epsilon（两个点被分组在一起的最大距离）和minSamples（要分组形成聚类的最小点数）。

“应用噪声”部分意味着，该聚类方法能够处理噪声。这意味着不直接与聚类相关的点，可能根据includeNoise配置被丢弃或作为单独的点返回。

$config = new Config([
    'epsilon' => 10.5,
    'minSamples' => 5
]);

$clusters = DensityBasedSpatialClusterer::cluster($cars, $config);

适用于DensityBasedSpatialClusterer的配置参数

epsilon，float（必需）
两个标记之间的最大距离，一个标记被视为另一个标记的邻域。这不应该是最大聚类大小，因为通过多次步骤，聚类可能比epsilon大得多。
minSamples，integer（必需）
一个点的邻域中标记的数量，该点被认为是新聚类的核心点。如果一个点p在epsilon距离内有超过minSamples个邻居，则它有资格成为新聚类的核心点。算法从这个点开始创建新聚类，并重复添加所有在epsilon距离内的点，直到没有更多的点可达。
includeNoise，boolean（可选）
是否包括未达到minSamples阈值的标记。如果为真，所有不在聚类epsilon距离内的标记都将作为单独的聚类包括在内。
distanceFormula，string（可选）
用于计算点之间距离的公式。距离用于确定标记最近的聚类。有效值是DistanceFormula枚举的常量。如果聚类任务对精度很重要，则首选Haversine距离。

所有标记为可选的属性都具有默认值，在配置中指定。请参阅上面的发布配置文件部分以操作这些。

测试

composer test

致谢

感谢Spatie为包开发提供灵感和有用资源。

许可证

MIT许可证（MIT）。有关更多信息，请参阅许可证文件。

emilklindt / laravel-marker-clusterer

维护者

详细信息