README

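Item Similarity: a content-based, schema-less recommendation service

A simple recommendation service that computes the similarity between items.

This is part of my ongoing master's project, so the README will be improved by October.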

Concept

Similarity calculation

The similarity between two items is calculated as follows.

Given the following two JSON documents:

a = {
    "brand": "Addi",
    "model": "Speedy",
    "colors": ["black", "white"],
    "category": "Shoes",
    "size": 42
}
b = {
    "brand": "Prima",
    "model": "Kazak",
    "colors": ["red", "white"],
    "category": "Sweater",
    "sleeves": "long"
}

First, any feature that is not present in both documents is discarded (here, size and sleeves):

a = {
    "brand": "Addi",
    "model": "Speedy",
    "colors": "black,white",
    "category": "Shoes"
}
b = {
    "brand": "Prima",
    "model": "Kazak",
    "colors": "red,white",
    "category": "Sweater"
}

Second, each document is converted into a list in which every value is prefixed with its key:

a = ["brand_Addi", "model_Speedy", "colors_black", "colors_white", "category_Shoes"]
b = ["brand_Prima", "model_Kazak", "colors_red", "colors_white", "category_Sweater"]
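
The two preparation steps can be sketched in Python as follows. This is only an illustration of the idea, not the service's actual code: the function names shared_features and flatten are hypothetical, multi-valued features are assumed to arrive as lists, and the output is sorted purely to make it deterministic.

```python
def shared_features(a, b):
    """Keep only the keys that are present in both documents."""
    common = a.keys() & b.keys()
    return {k: a[k] for k in common}, {k: b[k] for k in common}


def flatten(doc):
    """Turn a document into a sorted list of "key_value" strings."""
    features = []
    for key, value in doc.items():
        # A list-valued feature contributes one entry per element.
        values = value if isinstance(value, list) else [value]
        features.extend(f"{key}_{v}" for v in values)
    return sorted(features)


a = {"brand": "Addi", "model": "Speedy", "colors": ["black", "white"],
     "category": "Shoes", "size": 42}
b = {"brand": "Prima", "model": "Kazak", "colors": ["red", "white"],
     "category": "Sweater", "sleeves": "long"}

a_shared, b_shared = shared_features(a, b)
features_a = flatten(a_shared)
features_b = flatten(b_shared)
print(features_a)
print(features_b)
```

Because size exists only in a and sleeves only in b, neither appears in the resulting lists.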

Finally, a variant of the Tanimoto coefficient is computed:

nA = number of features in A
nB = number of features in B
nAB = number of intersecting features
score = nAB / (nA + nB - nAB)
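
As a worked example, the formula can be applied to the feature lists derived from the two example documents. The sketch below is a minimal Python rendering of the formula as written; the function name tanimoto is hypothetical, and returning 0.0 for two empty lists is an assumption, since the README does not say how that case is handled.

```python
def tanimoto(features_a, features_b):
    """score = nAB / (nA + nB - nAB), computed on sets of feature strings."""
    set_a, set_b = set(features_a), set(features_b)
    n_ab = len(set_a & set_b)
    union = len(set_a) + len(set_b) - n_ab
    # Assumption: two empty feature lists are treated as not similar.
    return n_ab / union if union else 0.0


a = ["brand_Addi", "model_Speedy", "colors_black", "colors_white", "category_Shoes"]
b = ["brand_Prima", "model_Kazak", "colors_red", "colors_white", "category_Sweater"]

# nA = 5, nB = 5, nAB = 1 (only "colors_white" is shared)
score = tanimoto(a, b)
print(round(score, 3))  # 1 / (5 + 5 - 1) = 0.111
```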

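Similarity index

The index is stored in a MongoDB collection with one document per feature. Each document also keeps track of its similarity scores to other documents. Whenever a new record is processed, its similarity to the other documents is calculated and stored, and the same score is then added to those other documents. As a result, when the similarity scores for a document are requested, the final result has already been precomputed.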
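
The write-time precomputation described above can be illustrated with a small in-memory sketch in Python. It does not mirror the real MongoDB schema: the class SimilarityIndex, its methods, and the item ids are all hypothetical, plain dictionaries stand in for the collection, and feature lists are taken as given (the shared-key filtering step is left out for brevity).

```python
from collections import defaultdict


class SimilarityIndex:
    """In-memory stand-in for the index; scores are computed on write."""

    def __init__(self):
        self.features = {}               # item id -> set of feature strings
        self.scores = defaultdict(dict)  # item id -> {other id: score}

    def add(self, item_id, features):
        new = set(features)
        for other_id, other in self.features.items():
            if other_id == item_id:
                continue
            n_ab = len(new & other)
            union = len(new) + len(other) - n_ab
            score = n_ab / union if union else 0.0
            # Store the score on both sides so that reads need no computation.
            self.scores[item_id][other_id] = score
            self.scores[other_id][item_id] = score
        self.features[item_id] = new

    def remove(self, item_id):
        self.features.pop(item_id, None)
        self.scores.pop(item_id, None)
        for others in self.scores.values():
            others.pop(item_id, None)

    def similar(self, item_id):
        """Return (other id, score) pairs, most similar first."""
        pairs = self.scores.get(item_id, {}).items()
        return sorted(pairs, key=lambda kv: kv[1], reverse=True)


index = SimilarityIndex()
index.add("1", ["brand_Addi", "colors_white"])
index.add("2", ["brand_Prima", "colors_white"])
index.add("3", ["brand_Prima", "colors_white", "category_Sweater"])
print(index.similar("2"))  # item "3" ranks ahead of item "1"
```

The trade-off in this design is that every write touches all existing entries, while a read is a simple lookup.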

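API

The index is managed through POST and DELETE requests. Scores are retrieved with GET.

The {index} route prefix allows multiple indexes to be maintained in a single instance.

POST /{index} posts a document to the index and computes its similarity scores.

DELETE /{index} deletes a document.

GET /{index}?itemIds=1,2,3 returns the items similar to the items given in the GET parameter.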
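
A GET request for similar items could be built in Python as sketched below. The base URL http://localhost:8080 and the index name products are assumptions made for illustration, and the helper similar_items_request is hypothetical. Only the route and the itemIds parameter come from the description above; the request is constructed but not sent, and the response format is not covered here.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # assumption: wherever the service is deployed


def similar_items_request(index, item_ids):
    """Build a GET /{index}?itemIds=... request for the given item ids."""
    ids = ",".join(str(i) for i in item_ids)
    # safe="," keeps the comma-separated list readable instead of %2C-encoded.
    query = urlencode({"itemIds": ids}, safe=",")
    url = f"{BASE_URL}/{quote(index, safe='')}?{query}"
    return Request(url, method="GET")


req = similar_items_request("products", [1, 2, 3])  # "products" is a made-up index
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```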

Installation

$ git clone https://github.com/halk/item-similarity
$ cd item-similarity
$ cp config/config.php.dist config/config.php

See recowise-vagrant for configuration details.

Testing

$ cp phpunit.xml.dist phpunit.xml
$ phpunit
