URL sanitizing, validation, and storage

Type: symfony-bundle

dev-master 2017-03-31 03:57 UTC

This package is not auto-updated.

Last update: 2024-09-29 04:14:25 UTC


README

This bundle provides a robust set of tools for sanitizing, validating, and submitting URLs.

As described below, the main goal is to provide a highly consistent crawling system (higher-level than "parse_url", "curl_exec", and the like).
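
To illustrate why a layer above "parse_url" is useful, here is a minimal sketch in plain PHP. It is not the bundle's code: the function name applyDefaultScheme is hypothetical, and the behaviour only mirrors the idea of a default scheme applied when none is given.

```php
<?php
// Hypothetical helper, not part of the bundle: prepend a default scheme
// when the input has none, so that parse_url() can identify the host.
function applyDefaultScheme(string $url, string $defaultScheme = 'http'): string
{
    // A scheme is a letter followed by letters, digits, "+", "-" or ".", then "://".
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $url) === 1) {
        return $url;
    }

    return $defaultScheme . '://' . ltrim($url, '/');
}

// Without a scheme, parse_url() treats the whole string as a path.
$raw = parse_url('www.google.fr/');

// With the default scheme applied, the host is recognised.
$fixed = parse_url(applyDefaultScheme('www.google.fr/'));
```

Here $raw contains only a "path" entry, while $fixed exposes "scheme", "host", and "path" separately, which is the kind of consistency a sanitizing step is meant to guarantee.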

Symfony installation

You can easily install the bundle in your Symfony Standard Edition project with Composer.

composer require open-actu/url dev-master

Then add the bundle to your AppKernel.

// app/AppKernel.php
$bundles = array(
    // ..
    new OpenActu\UrlBundle\OpenActuUrlBundle(),
    // ..
);

Now add the URL configuration to the main configuration file, 'app/config/config.yml'.

open_actu_url:
    url:
        # ==========================
        # schemes requirement
        # ==========================
        # provides lines for the management of valid URL schemes
        # ==========================
        schemes: [ "http", "https","file" ]
        # ==========================
        # scheme default
        # ==========================
        # this scheme will be used when no scheme is indicated
        # ==========================
        scheme_default: "http"
        # ==========================
        # level exception management
        # ==========================
        # two modes are available: "INFO" and "ERROR"
        # - INFO : this mode stores exceptions in an exception bag. The exceptions can be retrieved with
        #          the "getExceptions" method on the service manager
        # - ERROR: this mode throws a UrlException at the first error detected
        # ==========================
        level_exception: "INFO"
        # ==========================
        # default port requirements
        # ==========================
        # (OPTIONAL) Configuration area to manage port usage.
        # three modes are available: "normal", "forced" and "none"
        # - normal (RECOMMENDED): if the port is the standard port for the current scheme, the port is
        #                         omitted.
        # - forced              : always include the port. If no port is given, the default port for
        #                         the current scheme is used
        # - none                : use the port only if one is given
        port:
            defaults:
                - { scheme: http,port: 80 }
                - { scheme: https,port: 443 }
            mode: "forced"
        # ===========================
        # protocol configuration
        # ===========================
        # (OPTIONAL) Configuration to manage how the remote request is sent
        # - get (DEFAULT) : send the request as a GET query
        # - post          : send the request as a POST query
        protocol:
            method: "get"
            timeout: 10
        # ==========================
        # response management
        # ==========================
        response:
            # purge delay, expressed in the chosen unit
            # to run the purge, add a cron job that calls
            # the command "open-actu:response:purge"
            # accepted units are second, minute, hour, day or month
            purge:
                delay: 1
                unit: hour
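
The three port modes described in the configuration comments can be sketched as follows. This is only an illustration of the documented behaviour, not the bundle's implementation: the function resolvePort and its signature are hypothetical.

```php
<?php
// Hypothetical sketch of the three port modes; not the bundle's implementation.
// $defaults maps a scheme to its standard port, e.g. ['http' => 80].
function resolvePort(string $scheme, ?int $port, string $mode, array $defaults): ?int
{
    $default = $defaults[$scheme] ?? null;

    switch ($mode) {
        case 'normal':
            // Omit the port when it is the standard one for the scheme.
            return $port === $default ? null : $port;
        case 'forced':
            // Always report a port, falling back to the scheme default.
            return $port ?? $default;
        case 'none':
            // Keep the port only when one was given explicitly.
            return $port;
        default:
            throw new InvalidArgumentException("Unknown port mode: $mode");
    }
}

$defaults = ['http' => 80, 'https' => 443];

$normalStandard = resolvePort('http', 80, 'normal', $defaults);
$normalCustom   = resolvePort('http', 8080, 'normal', $defaults);
$forcedMissing  = resolvePort('https', null, 'forced', $defaults);
$noneMissing    = resolvePort('http', null, 'none', $defaults);
```

A null result stands for "no port in the final URL". With the defaults above, only the "forced" mode fills in a port that was not written in the original URL.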

Finally, you must add a mapping type in the doctrine section (again in 'app/config/config.yml').

doctrine:
    dbal:
        ...
        mapping_types:
            enum: string

Congratulations, you can now use the bundle!

Use case

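Below is a basic example of how the bundle can be used. It only works once you have completed the minimum required configuration (entity creation, etc.), as described in the next chapter.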

  ...
  use MyBundle\MyLink;
  ...
  $usm = $this->container->get('open-actu.url_storage.manager');
  $um  = $this->container->get('open-actu.url.manager');

  // Configuration settings
  $um->changePortMode('normal');

  /**
   * Sanitize step: the first thing to do
   */
  $link = $um->sanitize(MyLink::class, "http://www.google.fr/");

  // Push the sanitized link (optional at this stage)
  $usm->push($link);

  if (null !== $link && !$um->hasErrors())
  {
      /**
       * Now we can send the request and receive the response
       */
      $um->send($link);

      /**
       * Set whether the link accepts updates (true here)
       */
      $link->setAcceptUpdate(true);

      /**
       * We can now store the object in the database
       */
      $usm->push($link);
  }
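
The hasErrors() check above relates to the "INFO" exception level from the configuration, where errors are collected instead of thrown. The pattern can be sketched as follows. The class ErrorCollector and its report() method are hypothetical and only illustrate the documented difference between "INFO" and "ERROR". They are not the bundle's API.

```php
<?php
// Hypothetical illustration of the two exception levels; the class and its
// report() method are assumptions, not the bundle's API.
final class ErrorCollector
{
    private $level;
    private $errors = [];

    public function __construct(string $level)
    {
        $this->level = $level;
    }

    public function report(Exception $e): void
    {
        if ($this->level === 'ERROR') {
            // ERROR level: fail at the first error detected.
            throw $e;
        }
        // INFO level: keep the exception for later inspection.
        $this->errors[] = $e;
    }

    public function hasErrors(): bool
    {
        return count($this->errors) > 0;
    }

    public function getExceptions(): array
    {
        return $this->errors;
    }
}

// INFO level: the error is stored and processing continues.
$info = new ErrorCollector('INFO');
$info->report(new RuntimeException('invalid scheme'));
$stored = count($info->getExceptions());

// ERROR level: the same error is thrown immediately.
$strict = new ErrorCollector('ERROR');
$thrown = false;
try {
    $strict->report(new RuntimeException('invalid scheme'));
} catch (RuntimeException $e) {
    $thrown = true;
}
```

With the "INFO" level, calling code is expected to test for errors explicitly, as the example above does before sending the request.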

Features

The documentation sources are stored in the Resources/doc/ folder.

Sanitizing URLs

Working with URL entities