the-don-himself / twitter-graph
A sample Twitter graph database built with the Gremlin OGM library.
Requires
- php: ^7.2
- abraham/twitteroauth: ^1.0
- symfony/console: ^5.0 || ^4.0 || ^3.0
- symfony/yaml: ^5.0 || ^4.0 || ^3.0
- the-don-himself/gremlin-ogm: ^0.5
README
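Sample code for a Twitter graph database built with the The-Don-Himself/gremlin-ogm library.
Build the Schema
Note: not all Gremlin-compatible databases support a schema. If yours does not, skip this part.
We start with the vertices (I call them vertexes in a few places, much as people write both indexes and indices).
Vertices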
<?php

namespace TheDonHimself\TwitterGraph\Graph\Vertices;

use JMS\Serializer\Annotation as Serializer;
use TheDonHimself\GremlinOGM\Annotation as Graph;

/**
 * @Serializer\ExclusionPolicy("all")
 * @Graph\Vertex(
 *     label="tweets",
 *     indexes={
 *         @Graph\Index(
 *             name="byTweetsIdComposite",
 *             type="Composite",
 *             unique=true,
 *             label_constraint=true,
 *             keys={
 *                 "tweets_id"
 *             }
 *         ),
 *         @Graph\Index(
 *             name="tweetsMixed",
 *             type="Mixed",
 *             label_constraint=true,
 *             keys={
 *                 "tweets_id" : "DEFAULT",
 *                 "text" : "TEXT",
 *                 "retweet_count" : "DEFAULT",
 *                 "created_at" : "DEFAULT",
 *                 "favorited" : "DEFAULT",
 *                 "retweeted" : "DEFAULT",
 *                 "source" : "STRING"
 *             }
 *         )
 *     }
 * )
 */
class Tweets
{
    /**
     * @Serializer\Type("integer")
     * @Serializer\Expose
     * @Serializer\Groups({"Default"})
     */
    public $id;

    /**
     * Exposes the tweet ID to the graph under the property name "tweets_id".
     *
     * @Serializer\VirtualProperty
     * @Serializer\Expose
     * @Serializer\Type("integer")
     * @Serializer\Groups({"Graph"})
     * @Serializer\SerializedName("tweets_id")
     * @Graph\Id
     * @Graph\PropertyName("tweets_id")
     * @Graph\PropertyType("Long")
     * @Graph\PropertyCardinality("SINGLE")
     */
    public function getVirtualId()
    {
        return $this->getId();
    }

    /**
     * @Serializer\Type("string")
     * @Serializer\Expose
     * @Serializer\Groups({"Default", "Graph"})
     * @Graph\PropertyName("text")
     * @Graph\PropertyType("String")
     * @Graph\PropertyCardinality("SINGLE")
     */
    public $text;

    /**
     * @Serializer\Type("integer")
     * @Serializer\Expose
     * @Serializer\Groups({"Default", "Graph"})
     * @Graph\PropertyName("retweet_count")
     * @Graph\PropertyType("Integer")
     * @Graph\PropertyCardinality("SINGLE")
     */
    public $retweet_count;

    /**
     * @Serializer\Type("boolean")
     * @Serializer\Expose
     * @Serializer\Groups({"Default", "Graph"})
     * @Graph\PropertyName("favorited")
     * @Graph\PropertyType("Boolean")
     * @Graph\PropertyCardinality("SINGLE")
     */
    public $favorited;

    /**
     * @Serializer\Type("boolean")
     * @Serializer\Expose
     * @Serializer\Groups({"Default", "Graph"})
     * @Graph\PropertyName("retweeted")
     * @Graph\PropertyType("Boolean")
     * @Graph\PropertyCardinality("SINGLE")
     */
    public $retweeted;

    /**
     * @Serializer\Type("DateTime<'', '', 'D M d H:i:s P Y'>")
     * @Serializer\Expose
     * @Serializer\Groups({"Default", "Graph"})
     * @Graph\PropertyName("created_at")
     * @Graph\PropertyType("Date")
     * @Graph\PropertyCardinality("SINGLE")
     */
    public $created_at;

    /**
     * @Serializer\Type("string")
     * @Serializer\Expose
     * @Serializer\Groups({"Default", "Graph"})
     * @Graph\PropertyName("source")
     * @Graph\PropertyType("String")
     * @Graph\PropertyCardinality("SINGLE")
     */
    public $source;

    /**
     * @Serializer\Type("TheDonHimself\TwitterGraph\Graph\Vertices\Users")
     * @Serializer\Expose
     * @Serializer\Groups({"Default"})
     */
    public $user;

    /**
     * @Serializer\Type("TheDonHimself\TwitterGraph\Graph\Vertices\Tweets")
     * @Serializer\Expose
     * @Serializer\Groups({"Default"})
     */
    public $retweeted_status;

    /**
     * Get id.
     *
     * @return int
     */
    public function getId()
    {
        return $this->id;
    }
}
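The `created_at` property above is typed with the format string `D M d H:i:s P Y`. As a quick sanity check of what that format accepts, the same string can be handed to PHP's `DateTime::createFromFormat`. The timestamp below is an illustrative value in the layout Twitter's REST API uses, not data from this project.

```php
<?php
// Format string copied from the @Serializer\Type annotation on $created_at.
$format = 'D M d H:i:s P Y';

// Illustrative timestamp (an assumption, not taken from the sample graph).
$raw = 'Wed Oct 10 20:19:24 +0000 2018';

$createdAt = DateTime::createFromFormat($format, $raw);
if ($createdAt === false) {
    throw new RuntimeException('Timestamp does not match the expected format');
}

// Print the parsed value in ISO 8601 form.
echo $createdAt->format(DATE_ATOM), PHP_EOL;
```

If the string does not match the format, `createFromFormat` returns `false`, which is why the sketch checks the result before using it.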
Edges
And an edge:
<?php

namespace TheDonHimself\TwitterGraph\Graph\Edges;

use JMS\Serializer\Annotation as Serializer;
use TheDonHimself\GremlinOGM\Annotation as Graph;

/**
 * @Serializer\ExclusionPolicy("all")
 * @Graph\Edge(
 *     label="follows",
 *     multiplicity="MULTI"
 * )
 */
class Follows
{
    /**
     * @Graph\AddEdgeFromVertex(
     *     targetVertex="users",
     *     uniquePropertyKey="users_id",
     *     methodsForKeyValue={"getUserVertex1Id"}
     * )
     */
    protected $userVertex1Id;

    /**
     * @Graph\AddEdgeToVertex(
     *     targetVertex="users",
     *     uniquePropertyKey="users_id",
     *     methodsForKeyValue={"getUserVertex2Id"}
     * )
     */
    protected $userVertex2Id;

    public function __construct($user1_vertex_id, $user2_vertex_id)
    {
        $this->userVertex1Id = $user1_vertex_id;
        $this->userVertex2Id = $user2_vertex_id;
    }

    /**
     * Get User 1 Vertex ID.
     *
     * @return int
     */
    public function getUserVertex1Id()
    {
        return $this->userVertex1Id;
    }

    /**
     * Get User 2 Vertex ID.
     *
     * @return int
     */
    public function getUserVertex2Id()
    {
        return $this->userVertex2Id;
    }
}
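The nice thing about this library is that it only helps you write Gremlin commands; it does not stop you from talking to Gremlin directly. In the Follows edge above, for example, if you can hand it a unique identifier such as user_id, house_id or taxi_id, the library generates the Gremlin command that creates the edge between the two vertices. If you would rather add edges some other way, simply write the Gremlin command yourself and submit it directly with $graph_connection->send('my awesome gremlin command;').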
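As an illustration of writing the command by hand, here is a minimal sketch that links two `users` vertices by their `users_id`. The helper name `buildFollowsCommand` is hypothetical, and the string it produces is ordinary Gremlin, not necessarily what the library itself generates.

```php
<?php
/**
 * Hypothetical helper: builds a Gremlin command that adds a "follows" edge
 * from one users vertex to another, each looked up by its users_id property.
 */
function buildFollowsCommand(int $fromUserId, int $toUserId): string
{
    return sprintf(
        "g.V().hasLabel('users').has('users_id', %d).as('a')"
        . ".V().hasLabel('users').has('users_id', %d)"
        . ".addE('follows').from('a')",
        $fromUserId,
        $toUserId
    );
}

$command = buildFollowsCommand(12345, 67890);
echo $command, PHP_EOL;
// The string could then be submitted with $graph_connection->send($command);
```

The first lookup is labelled `a`, the second lookup becomes the current vertex, and `addE('follows').from('a')` draws the edge from the first user to the second.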
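The Follows edge class is deliberately simple: it only creates an edge between two vertices by user_id. In a real project you would probably also attach properties such as followed_on, via_app or introduced_by to the edge. Just add those properties to the class and let the library serialize them for you.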
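A sketch of what such an edge class might look like follows. It assumes edge properties accept the same `@Serializer` and `@Graph` annotations the `Tweets` vertex uses, which this README does not confirm. The class name `FollowsWithProperties` and the two property names are illustrative, and the namespace declaration is left out so the sketch runs standalone.

```php
<?php
// In the project this class would live in TheDonHimself\TwitterGraph\Graph\Edges.
use JMS\Serializer\Annotation as Serializer;
use TheDonHimself\GremlinOGM\Annotation as Graph;

/**
 * @Serializer\ExclusionPolicy("all")
 * @Graph\Edge(
 *     label="follows",
 *     multiplicity="MULTI"
 * )
 */
class FollowsWithProperties
{
    /**
     * @Graph\AddEdgeFromVertex(
     *     targetVertex="users",
     *     uniquePropertyKey="users_id",
     *     methodsForKeyValue={"getUserVertex1Id"}
     * )
     */
    protected $userVertex1Id;

    /**
     * @Graph\AddEdgeToVertex(
     *     targetVertex="users",
     *     uniquePropertyKey="users_id",
     *     methodsForKeyValue={"getUserVertex2Id"}
     * )
     */
    protected $userVertex2Id;

    /**
     * Assumed annotations, modelled on the Tweets vertex properties.
     *
     * @Serializer\Type("DateTime")
     * @Serializer\Expose
     * @Serializer\Groups({"Default", "Graph"})
     * @Graph\PropertyName("followed_on")
     * @Graph\PropertyType("Date")
     */
    public $followed_on;

    /**
     * Assumed annotations, modelled on the Tweets vertex properties.
     *
     * @Serializer\Type("string")
     * @Serializer\Expose
     * @Serializer\Groups({"Default", "Graph"})
     * @Graph\PropertyName("via_app")
     * @Graph\PropertyType("String")
     */
    public $via_app;

    public function __construct($user1_vertex_id, $user2_vertex_id, DateTime $followed_on, string $via_app)
    {
        $this->userVertex1Id = $user1_vertex_id;
        $this->userVertex2Id = $user2_vertex_id;
        $this->followed_on = $followed_on;
        $this->via_app = $via_app;
    }

    public function getUserVertex1Id()
    {
        return $this->userVertex1Id;
    }

    public function getUserVertex2Id()
    {
        return $this->userVertex2Id;
    }
}

$edge = new FollowsWithProperties(12345, 67890, new DateTime('2018-10-10'), 'web');
echo $edge->via_app, PHP_EOL;
```

The annotations are plain docblock comments as far as PHP is concerned, so the class loads without the library installed; they only take effect once the library and the serializer read them.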
Create the Schema
Once your vertex and edge classes are in place, see the code in \TheDonHimself\TwitterGraph\Commands. The commands include:
- SchemaCheckCommand
- SchemaCreateCommand
- PopulateCommand
- VertexesCountCommand
- VertexesDropCommand
- EdgesCountCommand
- EdgesDropCommand
- GremlinTraversalCommand
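SchemaCheckCommand runs a few checks to make sure you have no duplicate property names, labels or index names, while SchemaCreateCommand actually walks through your graph classes and sends the Gremlin commands that create them. PopulateCommand fills the graph with data, either from an API, as in the sample TwitterGraph, or from a database if you use Doctrine ORM (RDBMS) and/or Doctrine ODM (MongoDB). GremlinTraversalCommand lets you send Gremlin commands from the CLI, for example php bin/graph twittergraph:gremlin:traversal --traversal="g.V().count()".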
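The kind of check SchemaCheckCommand performs can be pictured with a short sketch. This is not the command's actual implementation; `findDuplicateNames` is a hypothetical function, and the index name `byUsersIdComposite` is made up for the example.

```php
<?php
/**
 * Hypothetical helper: returns every name that occurs more than once.
 *
 * @param string[] $names property names, labels or index names
 *
 * @return string[] the duplicated names, each listed once
 */
function findDuplicateNames(array $names): array
{
    // Count how often each name occurs, then keep those seen more than once.
    $counts = array_count_values($names);
    $repeated = array_filter($counts, function (int $count): bool {
        return $count > 1;
    });

    return array_keys($repeated);
}

$indexNames = ['byTweetsIdComposite', 'tweetsMixed', 'byUsersIdComposite', 'tweetsMixed'];
$duplicates = findDuplicateNames($indexNames);

foreach ($duplicates as $name) {
    echo "Duplicate index name: $name", PHP_EOL;
}
```

Running the same function over property names and labels would cover the other two checks the command is described as making.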
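Traversing the Graph
The library makes for an almost seamless transition from the Gremlin API. The most important piece is the TraversalBuilder from \TheDonHimself\GremlinOGM\Traversal\TraversalBuilder, which returns ready-to-execute Gremlin commands. For example, to fetch a Twitter user's vertex you can build the following traversal: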
use TheDonHimself\GremlinOGM\Traversal\TraversalBuilder;
....
$user_id = 12345;
$traversalBuilder = new TraversalBuilder();
$command = $traversalBuilder
    ->g()
    ->V()
    ->hasLabel("'users'")
    ->has("'users_id'", "$user_id")
    ->getTraversal();
return $command;
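Note the combination of single and double quotes: the single quotes inside the PHP string are the ones that end up in the Gremlin command.
Echoing this command shows the following: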
g.V().hasLabel('users').has('users_id', 12345)
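Judging by the output above, the builder's arguments appear to be placed into the Gremlin script as-is, so string values need their own inner quotes and any quote inside the value has to be escaped. A small helper can take care of that. `gremlinString` below is a hypothetical function, not part of the library, and it assumes the Groovy-style single-quoted literals used in these examples.

```php
<?php
/**
 * Hypothetical helper: wraps a PHP string in single quotes so it can be
 * embedded in a Gremlin script as a string literal.
 */
function gremlinString(string $value): string
{
    // Escape backslashes first, then single quotes.
    $escaped = str_replace(['\\', "'"], ['\\\\', "\\'"], $value);

    return "'" . $escaped . "'";
}

echo gremlinString('users_id'), PHP_EOL;
echo gremlinString("O'Brien"), PHP_EOL;
```

With such a helper, a call like `->has(gremlinString('screen_name'), gremlinString($screen_name))` would not need hand-written nested quotes.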
If you want to use bindings to parameterize the script (strongly recommended), you can do so like this:
use TheDonHimself\GremlinOGM\Traversal\TraversalBuilder;
....
$user_id = 12345;
$traversalBuilder = new TraversalBuilder();
$command = $traversalBuilder
    ->raw('def b = new Bindings(); ')
    ->g()
    ->V()
    ->hasLabel("'users'")
    ->has("'users_id'", "b.of('user_id', $user_id)")
    ->getTraversal();
return $command;
Again, note the single and double quotes.
Echoing this command shows the following:
def b = new Bindings(); g.V().hasLabel('users').has('users_id', b.of('user_id', 12345))
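A related but different technique is to keep the value out of the script text entirely and send it alongside the script as a named binding, which Gremlin Server script requests support through their `gremlin` and `bindings` arguments. The sketch below only builds that pair of values. `buildUserLookup` is a hypothetical function, and how the pair is handed to the server depends on the client you use, which is not covered here.

```php
<?php
/**
 * Hypothetical helper: returns a script that refers to a variable named
 * user_id, together with the bindings map that supplies its value.
 */
function buildUserLookup(int $userId): array
{
    return [
        // The script text stays the same for every user.
        'gremlin' => "g.V().hasLabel('users').has('users_id', user_id)",
        // The value travels separately from the script.
        'bindings' => ['user_id' => $userId],
    ];
}

$request = buildUserLookup(12345);
echo json_encode($request), PHP_EOL;
```

Because the script string no longer changes with the user ID, the server can reuse its compiled form across calls.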
To see which traversal steps are available, refer to the code in \TheDonHimself\GremlinOGM\Traversal\Step.
Now let's get a little more complex and fetch a user's feed:
$screen_name = 'my_username';
$traversalBuilder = new TraversalBuilder();
$command = $traversalBuilder
    ->g()
    ->V()
    ->hasLabel("'users'")
    ->has("'screen_name'", "'$screen_name'")
    ->union(
        (new TraversalBuilder())->out("'tweeted'")->getTraversal(),
        (new TraversalBuilder())->out("'follows'")->out("'tweeted'")->getTraversal()
    )
    ->order()
    ->by("'created_at'", 'decr')
    ->limit(10)
    ->getTraversal();
return $command;
Echoing this command shows the following:
g.V().hasLabel('users').has('screen_name', 'my_username').union(out('tweeted'), out('follows').out('tweeted')).order().by('created_at', decr).limit(10)
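To make the intent of this traversal concrete, the sketch below computes the same kind of feed over plain PHP arrays: a user's own tweets plus the tweets of everyone they follow, newest first, capped at a limit. The `buildFeed` function and all of the data are made up for illustration; nothing here touches the graph or the library.

```php
<?php
/**
 * Hypothetical helper: builds a feed from in-memory arrays.
 *
 * @param array $follows map of screen name => list of followed screen names
 * @param array $tweets  list of ['author' => ..., 'text' => ..., 'created_at' => int]
 */
function buildFeed(string $screenName, array $follows, array $tweets, int $limit = 10): array
{
    // Authors whose tweets belong in the feed: the user plus everyone they follow.
    // This mirrors union(out('tweeted'), out('follows').out('tweeted')).
    $authors = array_merge([$screenName], $follows[$screenName] ?? []);

    $feed = array_filter($tweets, function (array $tweet) use ($authors): bool {
        return in_array($tweet['author'], $authors, true);
    });

    // Newest first, mirroring order().by('created_at', decr).
    usort($feed, function (array $a, array $b): int {
        return $b['created_at'] <=> $a['created_at'];
    });

    // Mirrors limit(10).
    return array_slice($feed, 0, $limit);
}

$follows = ['alice' => ['bob']];
$tweets = [
    ['author' => 'alice', 'text' => 'first', 'created_at' => 100],
    ['author' => 'bob', 'text' => 'second', 'created_at' => 200],
    ['author' => 'carol', 'text' => 'unrelated', 'created_at' => 300],
];

$feed = buildFeed('alice', $follows, $tweets);
foreach ($feed as $tweet) {
    echo $tweet['author'], ': ', $tweet['text'], PHP_EOL;
}
```

Here carol's tweet is left out because alice does not follow her, and bob's tweet comes first because it is newer.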
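That's all for now. There is a lot more this simple library can do, so have a look at the sample TwitterGraph folder. You can get up and running quickly with a sample graph of your Twitter friends, followers, likes, tweets and retweets by running the populate command shown under Testing below. The library ships with a preconfigured read-only Twitter app for this purpose.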
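Testing
I have not written a test suite yet, but you can try the library out with the sample Twitter graph that comes preconfigured with it. I have, however, confirmed that it works with the following graph databases, and I will test more as time and resources allow.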
- Azure Cosmos DB
- JanusGraph on Compose
- Self-hosted JanusGraph
Configure any of them in the corresponding yaml file in the root directory, then run:
php bin/graph twittergraph:schema:create
Then:
php bin/graph twittergraph:populate
Azure Cosmos DB
Note: the schema create command does not apply to Cosmos DB.
Example
php bin/graph twittergraph:populate
Populate the Twitter Graph with Data
====================================
Enter the path to a yaml configuration file or use defaults (JanusGraph, 127.0.0.1:8182 with ssl, no username or password):
> \path\to\azure-cosmosdb.yaml
The Twitter Username to Populate:
> The_Don_Himself
Perform a Dry Run [false]:
>
Twitter User @The_Don_Himself Found
Twitter ID : 622225192
Creating Vertexes...
Done! 338 Vertexes Created
Creating Edges...
Done! 367 Edges Created
Graph Populated Successfully!
JanusGraph on Compose
Example
php bin/graph twittergraph:populate
Populate the Twitter Graph with Data
====================================
Enter the path to a yaml configuration file or use defaults (JanusGraph, 127.0.0.1:8182 with ssl, no username or password):
> \path\to\janusgraph-compose.yaml
The Twitter Username to Populate:
> The_Don_Himself
Perform a Dry Run [false]:
>
Twitter User @The_Don_Himself Found
Twitter ID : 622225192
Creating Vertexes...
Done! 338 Vertexes Created
Creating Edges...
Done! 367 Edges Created
Graph Populated Successfully!
Self-hosted JanusGraph
Example
php bin/graph twittergraph:populate
Populate the Twitter Graph with Data
====================================
Enter the path to a yaml configuration file or use defaults (JanusGraph, 127.0.0.1:8182 with ssl, no username or password):
> \path\to\janusgraph.yaml
The Twitter Username to Populate:
> The_Don_Himself
Perform a Dry Run [false]:
>
Twitter User @The_Don_Himself Found
Twitter ID : 622225192
Creating Vertexes...
Done! 338 Vertexes Created
Creating Edges...
Done! 367 Edges Created
Graph Populated Successfully!
GraphQL
You may also be interested in graphql2gremlin, an attempt to establish a standard for translating GraphQL queries into Gremlin traversals.