fwidm/dwd-hourly-crawler

Crawls the DWD FTP server to retrieve weather data from German weather stations.

dev-master 2018-02-02 12:20 UTC

This package is not auto-updated.

Last update: 2024-09-27 15:18:13 UTC


README

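This package contains methods for querying the DWD FTP server. You specify the parameters to query, plus a date and a latitude/longitude pair.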

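Data source

All data is retrieved from public datasets provided by the German Weather Service (Deutscher Wetterdienst, DWD).

Public CDC FTP root directory: ftp://ftp-cdc.dwd.de/pub/CDC/

Description

The current implementation takes the user's location at a specific point in time, together with the requested variables. The DWDLib object then initializes a service for each requested variable. Each service holds all relevant paths and parses the data into the corresponding class extending DWDAbstractParameter. Finally, the crawler combines the data of all requested variables and returns it to the caller as an array.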
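The flow described above (one service per requested variable, results merged into a single array) can be sketched in plain PHP. This is a minimal illustration, not the library's code. `VariableService`, `FixedService` and `crawl()` are hypothetical names, the fake service returns fixed data instead of reading from the FTP server, and only the `values` part of the result is built.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a per-variable service: each service knows
// how to produce parsed records for one requested variable.
interface VariableService
{
    /** @return array<int, array<string, mixed>> parsed records */
    public function fetch(DateTimeImmutable $date, float $lat, float $lng): array;
}

// Fake service returning fixed data; a real one would download and parse DWD files.
final class FixedService implements VariableService
{
    public function __construct(private array $records) {}

    public function fetch(DateTimeImmutable $date, float $lat, float $lng): array
    {
        return $this->records;
    }
}

/**
 * Query every requested service and combine the results into one array
 * keyed by variable name (hypothetical helper, 'values' part only).
 *
 * @param array<string, VariableService> $services
 */
function crawl(array $services, DateTimeImmutable $date, float $lat, float $lng): array
{
    $values = [];
    foreach ($services as $name => $service) {
        $values[$name] = $service->fetch($date, $lat, $lng);
    }
    return ['values' => $values];
}
```

Keeping each variable behind its own service means adding a new variable only requires a new service, while the combining step stays unchanged.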

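Features

  • Query recent hourly data from the public DWD FTP server for all regions of Germany
    • Available parameters
  • Includes a safe-query option that queries several of the nearest stations to obtain a result
  • Parses the output into separate objects that contain all data from the files, plus a short description of every parameter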
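The safe-query idea (try several of the nearest stations until one delivers data) could look roughly like this. The sketch assumes stations are plain arrays with `id`, `lat` and `lng` keys, uses the haversine formula for the distance in kilometres, and takes a hypothetical `$fetch` callback that returns `null` when a station has no data. The library's actual station lookup and distance calculation may differ.

```php
<?php
declare(strict_types=1);

// Great-circle distance in kilometres between two WGS84 points (haversine formula).
function haversineKm(float $lat1, float $lng1, float $lat2, float $lng2): float
{
    $earthRadiusKm = 6371.0; // mean Earth radius
    $dLat = deg2rad($lat2 - $lat1);
    $dLng = deg2rad($lng2 - $lng1);
    $a = sin($dLat / 2) ** 2
        + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLng / 2) ** 2;
    return 2 * $earthRadiusKm * asin(min(1.0, sqrt($a)));
}

/**
 * Try the $maxStations nearest stations in order of distance and return the
 * first non-null result of the hypothetical $fetch callback, or null.
 *
 * @param array<int, array{id: string, lat: float, lng: float}> $stations
 * @param callable(array): ?array $fetch
 */
function safeQuery(array $stations, float $lat, float $lng, callable $fetch, int $maxStations = 3): ?array
{
    // Annotate every station with its distance to the query point.
    foreach ($stations as &$station) {
        $station['distance'] = haversineKm($lat, $lng, $station['lat'], $station['lng']);
    }
    unset($station);
    usort($stations, fn(array $a, array $b): int => $a['distance'] <=> $b['distance']);

    // Fall back to the next-nearest station whenever one returns no data.
    foreach (array_slice($stations, 0, $maxStations) as $station) {
        $result = $fetch($station);
        if ($result !== null) {
            return $result + ['station_id' => $station['id'], 'distance' => $station['distance']];
        }
    }
    return null;
}
```

Limiting the fallback to a few stations keeps the number of FTP requests bounded while still tolerating stations with gaps in their data.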

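Fixes

  • Changed how slashes are handled in FTP paths: the script failed on Windows because OS-dependent backslashes ended up in the FTP path.
    • Only local files now use OS-dependent slashes.
  • Allow users to change the base directory of the output via a constructor argument.
    • This is done by passing a parameter when creating the DWDLib instance.
  • Allow users to split the queried variables from DWD's predefined groups into single variables.
  • Added the distance from the station to the query point
  • Added Fractal support
  • Added experimental support for finding data faster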
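The slash fix comes down to keeping two kinds of paths apart. FTP paths always use `/`, while local paths (such as the optional base directory passed to the `DWDLib` constructor) use PHP's `DIRECTORY_SEPARATOR`. The helpers `ftpPath()` and `localPath()` below are hypothetical and only illustrate the distinction. The example FTP path is the hourly air temperature directory referenced in the output further down.

```php
<?php
declare(strict_types=1);

// FTP paths always use forward slashes, regardless of the local OS.
// Leading/trailing slashes of each segment are trimmed to avoid "//".
function ftpPath(string ...$segments): string
{
    return implode('/', array_map(fn(string $s): string => trim($s, '/\\'), $segments));
}

// Local paths use the OS-dependent separator (backslash on Windows).
function localPath(string ...$segments): string
{
    return implode(DIRECTORY_SEPARATOR, $segments);
}
```

Building FTP paths with a fixed `/` makes them identical on every OS, so only local file handling needs to care about the platform.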

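To-do

  • Cache the nearest stations for a single crawl job (currently not possible, because each variable may have different active stations)
  • Change the code to check whether the query date is older than or equal to the last checked date; otherwise, skip the query
  • When querying old data, possibly disable the check whether a station is active; this is important for solar devices
    • Possibly rewrite the "active" check to test whether the query date lies within the station's active period
  • Add an option to enable logging via the constructor
  • Add an option to set the radius around a given point within which active stations are searched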
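Two of the to-do items, checking whether the query date lies within a station's active period and limiting stations to a radius around the query point, might be sketched as follows. Both functions are hypothetical. The `distance` key is assumed to be in kilometres, which this README does not state explicitly.

```php
<?php
declare(strict_types=1);

// Hypothetical "active" check: a station is usable for a query if the
// query date lies within its [from, until] period (inclusive).
function isActiveAt(DateTimeInterface $from, DateTimeInterface $until, DateTimeInterface $query): bool
{
    return $from <= $query && $query <= $until;
}

/**
 * Hypothetical radius option: keep only stations whose precomputed
 * 'distance' (assumed km) is within $radiusKm, reindexing the result.
 */
function withinRadius(array $stations, float $radiusKm): array
{
    return array_values(array_filter($stations, fn(array $s): bool => $s['distance'] <= $radiusKm));
}
```

Checking the period instead of a single "active" flag would let historical queries use stations that have since been closed.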

Example

Using the library is simple:

//Coordinates we want to query
$coordinates = new Coordinate(48.3751, 8.9801);
//Point in time to query (assumption: a DateTime instance is expected)
$date = new DateTime('2017-09-16 22:00:00');
//Use default folders
$dwdLib = new DWDLib();

//OR: set output of the downloaded files to <dir>/storage/...
$dwdLib = new DWDLib("storage");

//set up which parameters you need
$param = new DWDHourlyParameters();
$param->addAirTemperature()->addCloudiness()->addPrecipitation()->addPressure()->addSoilTemperature()->addSun()->addWind()/*->add...*/;
// EITHER:
$out = $dwdLib->getHourlyByInterval($param, $date, $coordinates->getLat(), $coordinates->getLng());
// OR: to get all data for one day
$out = $dwdLib->getHourlyDataByDay($param, $date, $coordinates->getLat(), $coordinates->getLng());

The output is an array with the keys values => weather parameters and stations => weather stations.

DWD parameter "groups"

Get the parameters as JSON

/*
 * Print all retrieved items in the 'values' part => weather parameters as JSON
 */
foreach ($out['values'] as $key => $obj) {
    print "obj=$key<br>";
    //EITHER: iterate to convert single items
    foreach ($obj as $value) {
        /* @var $value DWDAbstractParameter */
        //toResource returns Fractal's ResourceAbstract, which can be converted to an array or JSON
        prettyPrint(FractalWrapper::toJson(FractalWrapper::toResource($value, new ParameterTransformer()), JSON_PRETTY_PRINT));
    }
    //OR: use the Fractal wrapper to convert the whole group at once
    $collection = FractalWrapper::toResource($obj, new ParameterTransformer());
    prettyPrint(FractalWrapper::toJson($collection, JSON_PRETTY_PRINT));
}

Output

{
    "data": {
        "station_id": 2074,
        "description": {
            "qualityLevel": "QN_9: quality level - refer to ftp:\/\/ftp-cdc.dwd.de\/pub\/CDC\/observations_germany\/climate\/hourly\/air_temperature\/recent\/DESCRIPTION_obsgermany_climate_hourly_tu_recent_en.pdf",
            "temperature2m": "TT_TU: temperature in 2m height - in degrees Celsius.",
            "relativeHumidity": "RF_TU: relative humidity in percent.",
            "temperature2mUnit": "C",
            "relativeHumidityUnit": "%"
        },
        "classification": "Temperature",
        "distance": 8.651701,
        "lon": "8.9801",
        "lat": "48.3751",
        "date": "2017-09-16T22:00:00+00:00",
        "2m_temperature": "6.5",
        "2m_temperature_unit": "C",
        "relative_humidity": "96.0",
        "relative_humidity_unit": "%"
    }
}

Stations

Get the stations as JSON

/*
 * Print all stations as json
 */
foreach ($out['stations'] as $key => $obj) {
    print "obj=$key<br>";
    /* @var $obj \FWidm\DWDHourlyCrawler\Model\DWDStation */
    prettyPrint(FractalWrapper::toJson(FractalWrapper::toResource($obj,new StationTransformer()),JSON_PRETTY_PRINT));
    
}
//or use fractal wrapper if you want to convert everything
$collection=FractalWrapper::toResource($out['stations'] ,new StationTransformer());
prettyPrint(FractalWrapper::toJson($collection,JSON_PRETTY_PRINT));

Output

{
    "data": {
        "id": "02074",
        "from": "2004-06-01T09:27:45+00:00",
        "until": "2017-11-28T09:27:45+00:00",
        "name": "Hechingen",
        "state": "Baden-W\u00fcrttemberg",
        "height": "522",
        "lon": "8.9801",
        "lat": "48.3751",
        "active": true
    }
}

Compact parameters

It is also possible to convert these "grouped" parameters into single-variable objects:

//$exported: the parameter objects to convert (presumably one group from $out['values'])
$collection=FractalWrapper::toResource($exported,new CompactParameterTransformer());
prettyPrint(FractalWrapper::toJson($collection,JSON_PRETTY_PRINT));

Output

{
    "data": [
        {
            "station_id": 2074,
            "description": {
                "name": "TT_TU: temperature in 2m height - in degrees Celsius.",
                "quality": 3,
                "qualityType": "QN_9: quality level - refer to ftp:\/\/ftp-cdc.dwd.de\/pub\/CDC\/observations_germany\/climate\/hourly\/air_temperature\/recent\/DESCRIPTION_obsgermany_climate_hourly_tu_recent_en.pdf",
                "units": "C"
            },
            "classification": "Temperature",
            "distance": 8.651701,
            "lon": "8.9801",
            "lat": "48.3751",
            "date": "2017-09-16T22:00:00+02:00",
            "value": 6.5,
            "type": "2 metre temperature"
        },
        {
            "station_id": 2074,
            "description": {
                "name": "RF_TU: relative humidity in percent.",
                "quality": 3,
                "qualityType": "QN_9: quality level - refer to ftp:\/\/ftp-cdc.dwd.de\/pub\/CDC\/observations_germany\/climate\/hourly\/air_temperature\/recent\/DESCRIPTION_obsgermany_climate_hourly_tu_recent_en.pdf",
                "units": "%"
            },
            "classification": "Temperature",
            "distance": 8.651701,
            "lon": "8.9801",
            "lat": "48.3751",
            "date": "2017-09-16T22:00:00+02:00",
            "value": 96,
            "type": "relative humidity in percent"
        }
    ]
}
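Conceptually, the compact conversion splits one grouped record into one record per variable while copying the shared fields (station, date, location). The plain-array sketch below illustrates that idea with values from the grouped output above. `toCompact()` is a hypothetical helper, not `CompactParameterTransformer`, and its field names (`type`, `value`, `units`) only roughly mirror the compact output.

```php
<?php
declare(strict_types=1);

/**
 * Split one grouped record (several measured variables sharing station,
 * date and location) into one record per variable.
 *
 * @param array<string, mixed>  $group     shared fields plus the measured values and units
 * @param array<string, string> $variables value key => unit key
 */
function toCompact(array $group, array $variables): array
{
    // Shared fields: everything that is neither a value key nor a unit key.
    $shared = array_diff_key($group, $variables, array_flip($variables));
    $compact = [];
    foreach ($variables as $valueKey => $unitKey) {
        $compact[] = $shared + [
            'type'  => $valueKey,
            'value' => (float) $group[$valueKey], // values arrive as strings, e.g. "6.5"
            'units' => $group[$unitKey],
        ];
    }
    return $compact;
}
```

One record per variable is easier to store or chart generically, at the cost of repeating the shared station fields in every record.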