spinettainc/phpw2v

This is a fork of "A PHP implementation of Word2Vec, a popular word embedding algorithm created by Tomas Mikolov and popularized by Radim Řehůřek and Peter Sojka with the Gensim Python library".

0.1.0-alpha 2023-06-11 07:24 UTC

This package is auto-updated.

Last update: 2024-09-13 15:13:06 UTC


README

A fork of the PHP implementation of Word2Vec, a popular word embedding algorithm created by Tomas Mikolov and popularized by Radim Řehůřek and Peter Sojka with the Gensim Python library.

Installation

Install PHPW2V into your project with Composer:

$ composer require spinettainc/phpw2v

Requirements

  • PHP 7.4 or higher

Using PHPW2V

Step 1: Require the Composer autoloader and import PHPW2V at the top of your file

<?php

require __DIR__ . '/vendor/autoload.php';

use PHPW2V\Word2Vec;
use PHPW2V\SoftmaxApproximators\NegativeSampling;

Step 2: Prepare an array of sentences

$sentences = [
    'the fox runs fast',
    'the cat jogged fast',
    'the pug ran fast',
    'the cat runs fast',
    'the dog ran fast',
    'the pug runs fast',
    'the fox ran fast',
    'dogs are our link to paradise',
    'pets are humanizing',
    'a dog is the only thing on earth that loves you more than you love yourself',    
];
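To see which words in these sentences would survive a minimum word count of 2 (the `$minWordCount` value passed to the model during training), the counting can be sketched as follows. This is only an illustration: `buildVocabulary` is a hypothetical helper, and lowercasing plus whitespace splitting is an assumed tokenization, not necessarily what PHPW2V does internally.

```php
<?php

/**
 * Count word occurrences across all sentences and keep the words that
 * appear at least $minWordCount times.
 * Assumption: lowercase + whitespace tokenization (PHPW2V may differ).
 */
function buildVocabulary(array $sentences, int $minWordCount): array
{
    $counts = [];
    foreach ($sentences as $sentence) {
        $tokens = preg_split('/\s+/', strtolower(trim($sentence)), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($tokens as $token) {
            $counts[$token] = ($counts[$token] ?? 0) + 1;
        }
    }

    // Drop rare words, keeping the word => count mapping for the rest.
    return array_filter($counts, fn (int $count): bool => $count >= $minWordCount);
}

$sentences = [
    'the fox runs fast',
    'the cat jogged fast',
    'the pug ran fast',
    'the cat runs fast',
    'the dog ran fast',
    'the pug runs fast',
    'the fox ran fast',
    'dogs are our link to paradise',
    'pets are humanizing',
    'a dog is the only thing on earth that loves you more than you love yourself',
];

$vocabulary = buildVocabulary($sentences, 2);
ksort($vocabulary); // alphabetical order for easier reading
print_r($vocabulary);
```

Under these assumptions, ten words remain (are, cat, dog, fast, fox, pug, ran, runs, the, you); words such as `jogged` or `dogs` occur only once and are dropped.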

Step 3: Train your model and save it for later use

$dimensions     = 150;                  // vector dimension size
$sampling       = new NegativeSampling; // softmax approximator
$minWordCount   = 2;                    // minimum word count
$alpha          = 0.05;                 // learning rate
$window         = 3;                    // window for skip-gram
$epochs         = 500;                  // number of epochs to run
$subsample      = 0.05;                 // subsampling rate


$word2vec = new Word2Vec($dimensions, $sampling, $window, $subsample, $alpha, $epochs, $minWordCount);
$word2vec->train($sentences);
$word2vec->save('my_word2vec_model');
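The `$window` setting controls how far from each word the skip-gram model looks for context words. As a rough sketch of the general technique (not PHPW2V's actual code), the training pairs for one sentence could be generated as below. `skipGramPairs` is a hypothetical helper, and it assumes a fixed symmetric window; many Word2Vec implementations also shrink the window at random per word, which is omitted here.

```php
<?php

/**
 * Generate (center, context) pairs for one tokenized sentence.
 * Assumption: fixed symmetric window of $window words on each side.
 */
function skipGramPairs(array $tokens, int $window): array
{
    $pairs = [];
    $count = count($tokens);

    for ($center = 0; $center < $count; $center++) {
        // Clamp the window to the sentence boundaries.
        $start = max(0, $center - $window);
        $end   = min($count - 1, $center + $window);

        for ($context = $start; $context <= $end; $context++) {
            if ($context !== $center) {
                $pairs[] = [$tokens[$center], $tokens[$context]];
            }
        }
    }

    return $pairs;
}

$pairs = skipGramPairs(explode(' ', 'the fox runs fast'), 3);
foreach ($pairs as [$centerWord, $contextWord]) {
    echo $centerWord, ' -> ', $contextWord, PHP_EOL;
}
```

With a window of 3, every word in a four-word sentence is paired with every other word, so short sentences like the ones above are fully covered.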

Step 4: Load a previously trained model and find the most similar words

$word2vec = new Word2Vec();
$word2vec = $word2vec->load('my_word2vec_model');

$mostSimilar = $word2vec->mostSimilar(['dog']);

This will output:

Array
(
    [fox] => 0.65303660275952
    [pug] => 0.63475600376409
    [you] => 0.63469270773687
    [cat] => 0.28333476473645
    [are] => 0.0086017358485732
    [ran] => -0.016116842526914
    [the] => -0.068253396295047
    [runs] => -0.11967150816883
    [fast] => -0.12999690227979
)
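The scores above fall between -1 and 1, which is consistent with cosine similarity, the usual measure for comparing word vectors. Assuming that is what is being reported (the source does not say so explicitly), the calculation for two vectors can be sketched as follows; `cosineSimilarity` is a hypothetical helper, not part of PHPW2V.

```php
<?php

/**
 * Cosine similarity between two equal-length numeric vectors.
 * Returns 0.0 when either vector has zero length.
 */
function cosineSimilarity(array $a, array $b): float
{
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same length.');
    }

    $dot   = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot   += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }

    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0;
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

// Toy vectors for illustration only, not real model output.
var_dump(cosineSimilarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]));  // same direction
var_dump(cosineSimilarity([1.0, 0.0], [0.0, 1.0]));            // orthogonal
var_dump(cosineSimilarity([1.0, 0.0], [-1.0, 0.0]));           // opposite direction
```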

Step 5: Find similar words using both positive and negative context

$mostSimilar = $word2vec->mostSimilar(['dog'], ['cat']);
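One common way to combine positive and negative words, used by Gensim's `most_similar`, is to add the unit-normalized positive vectors, subtract the unit-normalized negative ones, and rank the remaining words by cosine similarity to the result. Whether PHPW2V does exactly this is an assumption. The sketch below uses made-up two-dimensional vectors, and `unitVector` and `rankByQuery` are hypothetical helpers.

```php
<?php

/** Scale a vector to length 1 (a zero vector is returned unchanged). */
function unitVector(array $v): array
{
    $norm = sqrt(array_sum(array_map(fn ($x) => $x * $x, $v)));

    return $norm > 0.0 ? array_map(fn ($x) => $x / $norm, $v) : $v;
}

/**
 * Rank words by cosine similarity to (sum of positives - sum of negatives).
 * $embeddings maps word => vector; the query words themselves are skipped.
 */
function rankByQuery(array $embeddings, array $positive, array $negative = []): array
{
    $query = array_fill(0, count(reset($embeddings)), 0.0);

    foreach ($positive as $word) {
        foreach (unitVector($embeddings[$word]) as $i => $x) {
            $query[$i] += $x;
        }
    }
    foreach ($negative as $word) {
        foreach (unitVector($embeddings[$word]) as $i => $x) {
            $query[$i] -= $x;
        }
    }
    $query = unitVector($query);

    $scores = [];
    foreach ($embeddings as $word => $vector) {
        if (in_array($word, $positive, true) || in_array($word, $negative, true)) {
            continue;
        }
        // Dot product of two unit vectors equals their cosine similarity.
        $scores[$word] = array_sum(array_map(fn ($x, $y) => $x * $y, $query, unitVector($vector)));
    }

    arsort($scores); // highest similarity first, keys preserved

    return $scores;
}

// Hypothetical 2-D embeddings, not output from a trained model.
$embeddings = [
    'dog' => [1.0, 0.0],
    'cat' => [0.0, 1.0],
    'pug' => [0.9, 0.1],
    'fox' => [0.5, 0.5],
    'the' => [-0.2, -0.1],
];

print_r(rankByQuery($embeddings, ['dog'], ['cat']));
```

In this toy setup `pug` ranks first because it points toward `dog` and away from `cat`, while `fox` sits equally between the two and scores zero.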

To get the word embedding for a single word:

$wordEmbedding = $word2vec->wordVec('dog');
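A retrieved embedding can feed other computations. As one example, a crude sentence vector can be formed by averaging the vectors of its words. The sketch below assumes `wordVec` returns a flat array of floats, which the source does not confirm; `averageVectors` is a hypothetical helper and the input vectors are made up.

```php
<?php

/**
 * Element-wise mean of a list of equal-length vectors.
 */
function averageVectors(array $vectors): array
{
    $vectors = array_values($vectors);
    if ($vectors === []) {
        throw new InvalidArgumentException('At least one vector is required.');
    }

    $sum = array_fill(0, count($vectors[0]), 0.0);
    foreach ($vectors as $vector) {
        foreach (array_values($vector) as $i => $value) {
            $sum[$i] += $value;
        }
    }

    $count = count($vectors);

    return array_map(fn ($total) => $total / $count, $sum);
}

// Stand-ins for values such as $word2vec->wordVec('dog'); not real model output.
$dogVector = [1.0, 2.0];
$pugVector = [3.0, 4.0];

print_r(averageVectors([$dogVector, $pugVector]));
```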