the-don-himself/twitter-graph

An example Twitter graph database implemented with the Gremlin OGM library.

v0.4.0 2020-02-23 12:32 UTC


Last update: 2024-09-05 03:06:30 UTC


README

Sample code for an example Twitter graph database built with The-Don-Himself/gremlin-ogm.

Creating the Schema

Note: not all Gremlin-compatible databases support a schema. If you are using one that doesn't, skip this section.

First we deal with the vertices (in some places I call them vertexes, much like indexes vs. indices).

Vertexes

<?php

namespace TheDonHimself\TwitterGraph\Graph\Vertices;

use JMS\Serializer\Annotation as Serializer;
use TheDonHimself\GremlinOGM\Annotation as Graph;

/**
 *  @Serializer\ExclusionPolicy("all")
 *  @Graph\Vertex(
 *      label="tweets",
 *      indexes={
 *          @Graph\Index(
 *              name="byTweetsIdComposite",
 *              type="Composite",
 *              unique=true,
 *              label_constraint=true,
 *              keys={
 *                  "tweets_id"
 *              }
 *          ),
 *          @Graph\Index(
 *              name="tweetsMixed",
 *              type="Mixed",
 *              label_constraint=true,
 *              keys={
 *                  "tweets_id"       : "DEFAULT",
 *                  "text"            : "TEXT",
 *                  "retweet_count"   : "DEFAULT",
 *                  "created_at"      : "DEFAULT",
 *                  "favorited"       : "DEFAULT",
 *                  "retweeted"       : "DEFAULT",
 *                  "source"          : "STRING"
 *              }
 *          )
 *      }
 *  )
 */
class Tweets
{
    /**
     * @Serializer\Type("integer")
     * @Serializer\Expose
     * @Serializer\Groups({"Default"})
     */
    public $id;

    /**
     * @Serializer\VirtualProperty
     * @Serializer\Expose
     * @Serializer\Type("integer")
     * @Serializer\Groups({"Graph"})
     * @Serializer\SerializedName("tweets_id")
     * @Graph\Id
     * @Graph\PropertyName("tweets_id")
     * @Graph\PropertyType("Long")
     * @Graph\PropertyCardinality("SINGLE")
     */
    public function getVirtualId()
    {
        return $this->getId();
    }

    /**
     * @Serializer\Type("string")
     * @Serializer\Expose
     * @Serializer\Groups({"Default", "Graph"})
     * @Graph\PropertyName("text")
     * @Graph\PropertyType("String")
     * @Graph\PropertyCardinality("SINGLE")
     */
    public $text;

    /**
     * @Serializer\Type("integer")
     * @Serializer\Expose
     * @Serializer\Groups({"Default", "Graph"})
     * @Graph\PropertyName("retweet_count")
     * @Graph\PropertyType("Integer")
     * @Graph\PropertyCardinality("SINGLE")
     */
    public $retweet_count;

    /**
     * @Serializer\Type("boolean")
     * @Serializer\Expose
     * @Serializer\Groups({"Default", "Graph"})
     * @Graph\PropertyName("favorited")
     * @Graph\PropertyType("Boolean")
     * @Graph\PropertyCardinality("SINGLE")
     */
    public $favorited;

    /**
     * @Serializer\Type("boolean")
     * @Serializer\Expose
     * @Serializer\Groups({"Default", "Graph"})
     * @Graph\PropertyName("retweeted")
     * @Graph\PropertyType("Boolean")
     * @Graph\PropertyCardinality("SINGLE")
     */
    public $retweeted;

    /**
     * @Serializer\Type("DateTime<'', '', 'D M d H:i:s P Y'>")
     * @Serializer\Expose
     * @Serializer\Groups({"Default", "Graph"})
     * @Graph\PropertyName("created_at")
     * @Graph\PropertyType("Date")
     * @Graph\PropertyCardinality("SINGLE")
     */
    public $created_at;

    /**
     * @Serializer\Type("string")
     * @Serializer\Expose
     * @Serializer\Groups({"Default", "Graph"})
     * @Graph\PropertyName("source")
     * @Graph\PropertyType("String")
     * @Graph\PropertyCardinality("SINGLE")
     */
    public $source;

    /**
     * @Serializer\Type("TheDonHimself\TwitterGraph\Graph\Vertices\Users")
     * @Serializer\Expose
     * @Serializer\Groups({"Default"})
     */
    public $user;

    /**
     * @Serializer\Type("TheDonHimself\TwitterGraph\Graph\Vertices\Tweets")
     * @Serializer\Expose
     * @Serializer\Groups({"Default"})
     */
    public $retweeted_status;

    /**
     * Get id.
     *
     * @return int
     */
    public function getId()
    {
        return $this->id;
    }
}
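To make the index annotations above a little less abstract, here is a rough PHP sketch of the kind of JanusGraph management script the byTweetsIdComposite index could correspond to. It is an assumption for illustration, not the script gremlin-ogm actually emits; the helper name buildCompositeIndexScript is hypothetical, while makePropertyKey, makeVertexLabel, buildIndex, indexOnly and buildCompositeIndex belong to JanusGraph's management API.

```php
<?php

/**
 * Hypothetical sketch (not gremlin-ogm's actual output) of how the
 * byTweetsIdComposite annotation could map onto a JanusGraph management
 * script: a SINGLE-cardinality property key, the vertex label, and a
 * unique composite index restricted to that label.
 */
function buildCompositeIndexScript(string $label, string $indexName, string $key, string $javaType): string
{
    return implode(' ', [
        'mgmt = graph.openManagement();',
        "k = mgmt.makePropertyKey('{$key}').dataType({$javaType}.class).cardinality(Cardinality.SINGLE).make();",
        "l = mgmt.makeVertexLabel('{$label}').make();",
        // unique() mirrors unique=true; indexOnly(l) mirrors label_constraint=true.
        "mgmt.buildIndex('{$indexName}', Vertex.class).addKey(k).unique().indexOnly(l).buildCompositeIndex();",
        'mgmt.commit();',
    ]);
}

echo buildCompositeIndexScript('tweets', 'byTweetsIdComposite', 'tweets_id', 'Long'), PHP_EOL;
```

The Mixed index (tweetsMixed) is left out on purpose: in JanusGraph it also needs a configured external index backend, which this README does not describe.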

And an edge:

<?php

namespace TheDonHimself\TwitterGraph\Graph\Edges;

use JMS\Serializer\Annotation as Serializer;
use TheDonHimself\GremlinOGM\Annotation as Graph;

/**
 *  @Serializer\ExclusionPolicy("all")
 *  @Graph\Edge(
 *      label="follows",
 *      multiplicity="MULTI"
 *  )
 */
class Follows
{
    /**
     *  @Graph\AddEdgeFromVertex(
     *      targetVertex="users",
     *      uniquePropertyKey="users_id",
     *      methodsForKeyValue={"getUserVertex1Id"}
     *  )
     */
    protected $userVertex1Id;

    /**
     *  @Graph\AddEdgeToVertex(
     *      targetVertex="users",
     *      uniquePropertyKey="users_id",
     *      methodsForKeyValue={"getUserVertex2Id"}
     *  )
     */
    protected $userVertex2Id;

    public function __construct($user1_vertex_id, $user2_vertex_id)
    {
        $this->userVertex1Id = $user1_vertex_id;
        $this->userVertex2Id = $user2_vertex_id;
    }

    /**
     * Get User 1 Vertex ID.
     *
     *
     * @return int
     */
    public function getUserVertex1Id()
    {
        return $this->userVertex1Id;
    }

    /**
     * Get User 2 Vertex ID.
     *
     *
     * @return int
     */
    public function getUserVertex2Id()
    {
        return $this->userVertex2Id;
    }
}

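The beauty of this library is that it only helps you write Gremlin commands; it doesn't stop you from talking to Gremlin directly. For example, with the Follows edge above, if you can pass it a unique identifier such as user_id, house_id or taxi_id, the library will generate the Gremlin command that creates an edge between the two vertices. If you want to add edges some other way, you can simply write the Gremlin command yourself and submit it directly with $graph_connection->send('my awesome gremlin command;').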
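As a sketch of that "write it yourself" route, the hypothetical helper below (not part of gremlin-ogm) builds a raw Gremlin script that adds a follows edge between two users looked up by their users_id. Casting the IDs to integers keeps user input from being spliced into the script as code.

```php
<?php

/**
 * Hypothetical helper (not part of gremlin-ogm): builds a raw Gremlin
 * script that adds a 'follows' edge between two 'users' vertices looked
 * up by their unique 'users_id' property.
 */
function buildFollowsEdgeCommand(int $fromUserId, int $toUserId): string
{
    // Find the source vertex and label it 'a', find the target vertex,
    // then add an edge from 'a' to the current (target) vertex.
    return sprintf(
        "g.V().has('users', 'users_id', %d).as('a')"
        . ".V().has('users', 'users_id', %d)"
        . ".addE('follows').from('a')",
        $fromUserId,
        $toUserId
    );
}

$command = buildFollowsEdgeCommand(622225192, 12345);
echo $command, PHP_EOL;

// With a graph connection as described above, the script could then be
// submitted directly, e.g. $graph_connection->send($command . ';');
```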

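The Follows edge class is very simple: it just creates an edge between two vertices identified by their users_id values. In a real-world example you would probably give the edge properties such as followed_on, via_app or introduced_by. Just add those properties to the class and let the library serialize them for you.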
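If you go the raw-command route instead of annotations, edge properties become extra property() steps on the edge-creating traversal. The sketch below assumes hypothetical helpers gremlinLiteral and withEdgeProperties (neither is part of gremlin-ogm) and escapes string values so they are safe inside single-quoted Gremlin literals.

```php
<?php

/**
 * Hypothetical helper: renders a PHP scalar as a Gremlin/Groovy literal.
 */
function gremlinLiteral($value): string
{
    if (is_bool($value)) {
        return $value ? 'true' : 'false';
    }
    if (is_int($value) || is_float($value)) {
        return (string) $value;
    }
    // Escape backslashes first, then single quotes, for a single-quoted string.
    return "'" . str_replace(['\\', "'"], ['\\\\', "\\'"], (string) $value) . "'";
}

/**
 * Hypothetical helper: appends one property() step per key/value pair.
 */
function withEdgeProperties(string $traversal, array $properties): string
{
    foreach ($properties as $key => $value) {
        $traversal .= sprintf('.property(%s, %s)', gremlinLiteral($key), gremlinLiteral($value));
    }
    return $traversal;
}

$command = withEdgeProperties(
    "g.V().has('users', 'users_id', 1).as('a').V().has('users', 'users_id', 2).addE('follows').from('a')",
    ['followed_on' => '2020-02-23', 'via_app' => 'web', 'introduced_by' => 3]
);
echo $command, PHP_EOL;
```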

Creating the Schema

When creating your vertex and edge classes, refer to the commands in \TheDonHimself\TwitterGraph\Commands, which include:

  • SchemaCheckCommand
  • SchemaCreateCommand
  • PopulateCommand
  • VertexesCountCommand
  • VertexesDropCommand
  • EdgesCountCommand
  • EdgesDropCommand
  • GremlinTraversalCommand

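SchemaCheckCommand runs some checks to make sure you don't have duplicate property names, labels or index names, while SchemaCreateCommand actually walks through your graph classes and sends the Gremlin commands that create them. PopulateCommand fills the graph with data, either from an API, as in the sample TwitterGraph, or from a database if you use Doctrine ORM (RDBMS) and/or Doctrine ODM (MongoDB). GremlinTraversalCommand lets you send Gremlin commands from the CLI, e.g. php bin/graph twittergraph:gremlin:traversal --traversal="g.V().count()".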
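The duplicate check described for SchemaCheckCommand boils down to counting names. The sketch below is only an illustration of that idea, assuming a hypothetical findDuplicateNames function; it is not the library's implementation.

```php
<?php

/**
 * Hypothetical sketch of the kind of check SchemaCheckCommand performs:
 * report every name that is declared more than once.
 *
 * @param string[] $names
 * @return string[] names that occur more than once
 */
function findDuplicateNames(array $names): array
{
    $duplicates = [];
    // array_count_values maps each distinct name to its number of occurrences.
    foreach (array_count_values($names) as $name => $count) {
        if ($count > 1) {
            $duplicates[] = (string) $name;
        }
    }
    return $duplicates;
}

// Index names from the Tweets vertex above, plus a deliberate clash.
$indexNames = ['byTweetsIdComposite', 'tweetsMixed', 'byTweetsIdComposite'];
print_r(findDuplicateNames($indexNames));
```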

Traversing the Graph

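If you know the Gremlin API, this library should feel almost seamless. The most important piece here is TraversalBuilder, from \TheDonHimself\GremlinOGM\Traversal\TraversalBuilder, which returns Gremlin commands ready to be executed. For example, to fetch a Twitter user vertex you could build the following traversal: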

use TheDonHimself\GremlinOGM\Traversal\TraversalBuilder;
....

$user_id = 12345;

$traversalBuilder = new TraversalBuilder();

$command = $traversalBuilder
  ->g()
  ->V()
  ->hasLabel("'users'")
  ->has("'users_id'", "$user_id")
  ->getTraversal();

return $command;

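Mind the quoting: the outer double quotes are PHP string delimiters (and let $user_id interpolate), while the inner single quotes become Gremlin string literals in the generated command.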

Echoing this command outputs the following:

g.V().hasLabel('users').has('users_id', 12345)
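Why the nested quotes? The hypothetical, stripped-down builder below (MiniTraversalBuilder, NOT gremlin-ogm's TraversalBuilder) assumes arguments are copied into the script verbatim, which is consistent with the echoed output above. Under that assumption, a Gremlin string literal has to carry its own single quotes, and omitting them produces a bare identifier.

```php
<?php

/**
 * Hypothetical, minimal fluent builder (not gremlin-ogm's TraversalBuilder):
 * each step is rendered as name(args) and arguments are joined verbatim.
 */
class MiniTraversalBuilder
{
    private array $steps = [];

    // The traversal source is rendered without parentheses: "g".
    public function g(): self
    {
        $this->steps[] = 'g';
        return $this;
    }

    // Any other method name becomes a step; no quoting or escaping is added.
    public function __call(string $step, array $args)
    {
        $this->steps[] = $step . '(' . implode(', ', $args) . ')';
        return $this;
    }

    public function getTraversal(): string
    {
        return implode('.', $this->steps);
    }
}

$user_id = 12345;

$command = (new MiniTraversalBuilder())
    ->g()
    ->V()
    ->hasLabel("'users'")
    ->has("'users_id'", "$user_id")
    ->getTraversal();

echo $command, PHP_EOL;
```

Passing 'users' instead of "'users'" to this sketch would yield hasLabel(users), which Gremlin would read as a variable name rather than a string.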

If you want to use bindings to parameterize the script (highly recommended), you can do this:

use TheDonHimself\GremlinOGM\Traversal\TraversalBuilder;
....

$user_id = 12345;

$traversalBuilder = new TraversalBuilder();

$command = $traversalBuilder
  ->raw('def b = new Bindings(); ')
  ->g()
  ->V()
  ->hasLabel("'users'")
  ->has("'users_id'", "b.of('user_id', $user_id)")
  ->getTraversal();

return $command;

Again, mind the single and double quotes.

Echoing this command outputs the following:

def b = new Bindings(); g.V().hasLabel('users').has('users_id', b.of('user_id', 12345))
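Bindings can also be supplied at the request level instead of inside the script. The sketch below only builds a JSON body for Gremlin Server's HTTP endpoint using its gremlin, bindings and language fields; it sends nothing. The function name buildParameterizedRequest is hypothetical, and whether your provider (or gremlin-ogm's connection) accepts request-level bindings is an assumption you should verify.

```php
<?php

/**
 * Hypothetical sketch: a Gremlin Server HTTP request body that sends the
 * script and its parameter values separately. The script text stays the
 * same for every user, so the server can reuse the compiled script, and
 * the value is never spliced into the Gremlin code.
 */
function buildParameterizedRequest(int $user_id): string
{
    $body = [
        'gremlin'  => "g.V().hasLabel('users').has('users_id', user_id)",
        'bindings' => ['user_id' => $user_id],
        'language' => 'gremlin-groovy',
    ];
    return json_encode($body, JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR);
}

echo buildParameterizedRequest(12345), PHP_EOL;
```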

For the traversal steps that are available, refer to the code in \TheDonHimself\Traversal\Step.

Now let's make it a bit more complex and fetch a user's feed:

$screen_name = 'my_username';

$traversalBuilder = new TraversalBuilder();

$command = $traversalBuilder
  ->g()
  ->V()
  ->hasLabel("'users'")
  ->has("'screen_name'", "'$screen_name'")
  ->union(
    (new TraversalBuilder())->out("'tweeted'")->getTraversal(),
    (new TraversalBuilder())->out("'follows'")->out("'tweeted'")->getTraversal()
  )
  ->order()
  ->by("'created_at'", 'decr')
  ->limit(10)
  ->getTraversal();

return $command;

Echoing this command outputs the following:

g.V().hasLabel('users').has('screen_name', 'my_username').union(out('tweeted'), out('follows').out('tweeted')).order().by('created_at', decr).limit(10)
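The feed query interpolates $screen_name straight into a single-quoted Gremlin literal, so a name containing a quote would break the script. The hypothetical helper below (buildFeedTraversal, not part of gremlin-ogm) assembles the same traversal with the name escaped. The ordering token is a parameter because newer TinkerPop releases deprecate decr in favour of desc; check which one your server accepts.

```php
<?php

/**
 * Hypothetical helper (not part of gremlin-ogm) that assembles the feed
 * traversal shown above from its pieces.
 */
function buildFeedTraversal(string $screenName, int $limit = 10, string $order = 'decr'): string
{
    if (!in_array($order, ['decr', 'desc'], true)) {
        throw new InvalidArgumentException("Unsupported order token: {$order}");
    }

    // Anonymous sub-traversals for union(): own tweets, and tweets of followed users.
    $branches = [
        "out('tweeted')",
        "out('follows').out('tweeted')",
    ];

    // Escape backslashes and single quotes before embedding the name in a literal.
    $quotedName = "'" . str_replace(['\\', "'"], ['\\\\', "\\'"], $screenName) . "'";

    return "g.V().hasLabel('users').has('screen_name', {$quotedName})"
        . '.union(' . implode(', ', $branches) . ')'
        . ".order().by('created_at', {$order})"
        . ".limit({$limit})";
}

echo buildFeedTraversal('my_username'), PHP_EOL;
echo buildFeedTraversal('my_username', 10, 'desc'), PHP_EOL;
```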

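That's all for now. This simple library can do a lot more, so check out the sample TwitterGraph folder. You can get started quickly with a sample graph of your Twitter friends, followers, likes, tweets and retweets by running the twittergraph:populate command shown in the Testing section below. The library ships with a preconfigured read-only Twitter app for this.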

Testing

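I haven't written a test suite yet, but you can exercise the library with the sample Twitter graph that comes preconfigured with it. I have, however, confirmed that the following graph databases work, and I will test more as time and resources allow: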

  • Azure Cosmos DB
  • JanusGraph on Compose
  • Self-hosted JanusGraph

Simply configure any of them in the corresponding yaml file in the root directory, then run

php bin/graph twittergraph:schema:create

and then

php bin/graph twittergraph:populate

Azure Cosmos DB

Note: the schema create command does not work with Cosmos DB.

Example

php bin/graph twittergraph:populate

Populate the Twitter Graph with Data
====================================

 Enter the path to a yaml configuration file or use defaults (JanusGraph, 127.0.0.1:8182 with ssl, no username or password):
 > \path\to\azure-cosmosdb.yaml

 The Twitter Username to Populate:
 > The_Don_Himself

 Perform a Dry Run [false]:
 >

Twitter User @The_Don_Himself Found
Twitter ID : 622225192
Creating Vertexes...
Done! 338 Vertexes Created
Creating Edges...
Done! 367 Edges Created
Graph Populated Successfully!

JanusGraph on Compose

Example

php bin/graph twittergraph:populate

Populate the Twitter Graph with Data
====================================

 Enter the path to a yaml configuration file or use defaults (JanusGraph, 127.0.0.1:8182 with ssl, no username or password):
 > \path\to\janusgraph-compose.yaml

 The Twitter Username to Populate:
 > The_Don_Himself

 Perform a Dry Run [false]:
 >

Twitter User @The_Don_Himself Found
Twitter ID : 622225192
Creating Vertexes...
Done! 338 Vertexes Created
Creating Edges...
Done! 367 Edges Created
Graph Populated Successfully!

Self-hosted JanusGraph

Example

php bin/graph twittergraph:populate

Populate the Twitter Graph with Data
====================================

 Enter the path to a yaml configuration file or use defaults (JanusGraph, 127.0.0.1:8182 with ssl, no username or password):
 > \path\to\janusgraph.yaml

 The Twitter Username to Populate:
 > The_Don_Himself

 Perform a Dry Run [false]:
 >

Twitter User @The_Don_Himself Found
Twitter ID : 622225192
Creating Vertexes...
Done! 338 Vertexes Created
Creating Edges...
Done! 367 Edges Created
Graph Populated Successfully!

GraphQL

You may also be interested in graphql2gremlin, which attempts to establish a standard for converting GraphQL queries into Gremlin traversals.