keboola / table-backend-utils
该包允许从多个云存储导入文件到Snowflake
Requires
- php: ^8.1
- ext-json: *
- ext-odbc: *
- ext-pdo: *
- doctrine/dbal: ^3.3
- google/cloud-bigquery: ^1.23
- keboola/common-exceptions: ^1
- keboola/php-datatypes: ^7.6
- keboola/php-utils: ^4.1
- keboola/retry: ^0.5.0
Requires (Dev)
- dev-main
- 2.8.1
- 2.8.0
- 2.7.0
- 2.6.1
- 2.6.0
- 2.5.2
- 2.5.1
- 2.5.0
- 2.4.0
- 2.3.0
- 2.2.5
- 2.2.4
- 2.2.3
- 2.2.2
- 2.2.1
- 2.2.0
- 2.1.0
- 2.0.1
- 2.0.0
- 1.15.1
- 1.15.0
- v1.14.1
- v1.14.0
- v1.13.0
- v1.12.0
- v1.11.0
- v1.10.0
- v1.9.0
- v1.8.0
- v1.7.0
- v1.6.0
- v1.5.0
- v1.4.2
- v1.4.1
- v1.4.0
- v1.3.0
- v1.2.4
- v1.2.3
- v1.2.2
- v1.2.1
- v1.2.0
- v1.1.0
- v1.0.1
- v1.0.0
- v0.20.0
- v0.19.0
- 0.18.3
- v0.18.2
- v0.18.1
- v0.18.0
- v0.17.0
- v0.16.1
- v0.16.0
- v0.15.1
- v0.15.0
- v0.14.0
- v0.13.0
- v0.12.0
- v0.11.0
- v0.10.0
- v0.9.1
- v0.9.0
- v0.8.1
- v0.8.0
- v0.7.0
- v0.6.0
- v0.5.1
- v0.5.0
- v0.4.3
- v0.4.2
- v0.4.1
- v0.4.0
- v0.3.2
- v0.3.1
- v0.3.0
- v0.2.0
- v0.1
- dev-dependabot/github_actions/dot-github/workflows/actions/download-artifact-4.1.7
- dev-zajca-big-256
- dev-zajca-ct-1642
- dev-zajca-ct-950-ignore-columns
- dev-zajca-fix-distinct-on-nonnative-tables-2
- dev-zajca-fix-wrong-tests
- dev-martinj-db-import-export-terraform-fix
- dev-martinj-fix-phpstan
- dev-CT-933-add-release-development-branch-tag-to-able-to-require-in-other-lib-in-process-of-programming
- dev-martinj-ct-1361-default-value-for-null-conversion-in-bq
- dev-jirka-ct-1271-add-alter-command
- dev-jirka-1271-new-command
- dev-jirka-ct-1331-add-protobuf-for-table-info-in-preview
- dev-zajca-ct-1301
- dev-CT-1169-put-column-definition-endpoint
- dev-erik-metadata-backend
- dev-BIG-208-create-table-definition-sql-injection
- dev-martin-GCP-445
- dev-CT-905-dont-run-build-if-create-tag-on-master-branch
- dev-adamvyborny-CM-727-php-datatypes-oracle
- dev-backup-cache
- dev-jirka-big-193-convert-load-exception
- dev-martin-build-ecr
- dev-zajca-tag.php
- dev-zajca-big-185
- dev-big-160-update-common
- dev-jirka-big-167-too-many-requests-exception
- dev-zajca-big-171
- dev-zajca-big-169
- dev-zajca-big-170
- dev-big-160
- dev-jirka-ct-1084-add-table-type-bq-td
- dev-BIG-126-external-buckets
- dev-BIG-126-external-buckets-2
- dev-big-153-roman-improve-type-hint
- dev-big-153-runtime-options
- dev-revert-84-zajca-ct-1118
- dev-zajca-BIG-155-ASCII
- dev-zajca-BIG-157
- dev-zajca-big-142
- dev-zajca-ct-1118
- dev-zajca-ct-1118-no-bc
- dev-zajca-ct-1128-1
- dev-zajca-fix-zero-length
- dev-zajca-new-err-code
- dev-jirka-ct-910-external-tables
- dev-zajca-kbc-1003
- dev-CM-569-ondra
- dev-disable-td
- dev-jirka-ct-924-re-enable-exasol-start-stop
- dev-zajca-CT-666-snflk-null
- dev-php81
- dev-CT-950-ignore_timestamp
- dev-ct-835-fixx-export-null
- dev-ct-843-fix-numeric-value-is-empty-string
- dev-roman-finish-release
- dev-CT-843-null-import
- dev-PST-631_SNFLK-add-missing-types
- dev-roman-add-release-phase
- dev-add-ie-lib-repo
- dev-test-build
- dev-CT-844
- dev-roman-testing-layers
- dev-odbc-test
- dev-roman-depending-build
- dev-roman-improve-docker
- dev-roman-add-workflows-ie-lib
- dev-roman-adopt-ie-lib
- dev-roman-add-insturction-about-adopt-repo
- dev-roman-fix-deprecating-warnings-in-table-utils
- dev-CT-807-php8
- dev-add-ci-to-php-table-backend-utils
- dev-add-php-table-backend-utils
- dev-roman-add-ie-lib
- dev-roman-split-php-table-backend-utils
- dev-zajca-kbc-2902
- dev-KBC-2942-add-php-table-backend-utils-to-the-monorepo
- dev-roman-fix-exa-version
- dev-zajca-showgrantsontable
- dev-zajca-fix-exasol-pk-create-table
- dev-zajca-init-auth
- dev-zajca-init
This package is auto-updated.
Last update: 2024-09-03 22:03:20 UTC
README
用于表格后端(Snowflake|Synapse|Redshift)之间共享的常用功能。
接口
数据库
Keboola\TableBackendUtils\Database\DatabaseReflectionInterface
interface DatabaseReflectionInterface { public function getUsersNames(?string $like = null): array; public function getRolesNames(?string $like = null): array; }
模式
Keboola\TableBackendUtils\Schema\SchemaReflectionInterface
用于检索模式信息的函数
interface SchemaReflectionInterface { public function getTablesNames(): array; public function getViewsNames(): array; }
表
Keboola\TableBackendUtils\Table\TableReflectionInterface
用于检索表信息的函数
interface TableReflectionInterface { public function getColumnsNames(): array; public function getColumnsDefinitions(): ColumnCollection; public function getRowsCount(): int; public function getPrimaryKeysNames(): array; public function getTableStats(): TableStatsInterface; public function isTemporary(): bool; public function getDependentViews(): array; }
Keboola\TableBackendUtils\Table\TableQueryBuilderInterface
用于处理表的查询
interface TableQueryBuilderInterface { public const TIMESTAMP_COLUMN_NAME = '_timestamp'; public function getDropTableCommand(string $schemaName, string $tableName): string; public function getRenameTableCommand(string $schemaName, string $sourceTableName, string $newTableName): string; public function getTruncateTableCommand(string $schemaName, string $tableName): string; public function getCreateTempTableCommand( string $schemaName, string $tableName, ColumnCollection $columns ): string; public function getCreateTableCommand( string $schemaName, string $tableName, ColumnCollection $columns, array $primaryKeys = [] ): string; public function getCreateTableCommandFromDefinition( TableDefinitionInterface $definition, bool $definePrimaryKeys = self::CREATE_TABLE_WITHOUT_PRIMARY_KEYS ): string; }
Keboola\TableBackendUtils\Table\TableStatsInterface
表统计信息
interface TableStatsInterface { public function getDataSizeBytes(): int; public function getRowsCount(): int; }
列
Keboola\TableBackendUtils\Column\ColumnInterface
表列定义
interface ColumnInterface { public function getColumnName(): string; public function getColumnDefinition(): Keboola\Datatype\Definition\DefinitionInterface; public static function createGenericColumn(string $columnName): self; }
视图
Keboola\TableBackendUtils\View\ViewReflectionInterface
用于检索视图信息的函数
interface ViewReflectionInterface { public function getDependentViews(): array; }
认证
Keboola\TableBackendUtils\Auth\UserReflectionInterface
interface UserReflectionInterface { public function endAllSessions(): void; public function getAllSessionIds(): array; }
Keboola\TableBackendUtils\Auth\GrantQueryBuilderInterface
interface GrantQueryBuilderInterface { public function getGrantSql(GrantOptionsInterface $options): string; public function getRevokeSql(RevokeOptionsInterface $options): string; }
开发
准备
设置环境变量
cp .env.dist .env
设置 TEST_PREFIX=
环境变量
SYNAPSE
在Azure门户或使用CLI创建synapse服务器。
设置环境变量
SYNAPSE_UID=UID SYNAPSE_PWD=xxxx SYNAPSE_DATABASE=synapse_db SYNAPSE_SERVER=<synapse>.database.windows.net
Teradata
在AWS/Azure上准备Teradata服务器并设置以下属性。见
为测试创建新数据库
CREATE DATABASE <nick>_utils_tests FROM dbc AS PERMANENT = 1e8, SPOOL = 1e8;
设置环境变量
TERADATA_HOST= TERADATA_PORT=1025 TERADATA_USERNAME= TERADATA_PASSWORD= TERADATA_DATABASE=
AWS
为了下载TD odbc驱动程序,创建一个可以访问包含驱动程序包的S3存储桶的用户。
设置环境变量
AWS_ACCESS_KEY_ID= AWS_SECRET_ACCESS_KEY=
Exasol
在本地机器上运行docker中的Exasol(对于此情况,.env已预先配置)
docker compose up -d exasol
在其他地方运行Exasol服务器并设置环境变量
EXASOL_HOST= EXASOL_USERNAME= EXASOL_PASSWORD=
问题
-
在较慢的机器上,Exasol可能会占用大量资源。运行
docker compose -f docker compose.yml -f docker compose.limits.yml up exasol
限制内存和CPU -
如果您遇到错误
exadt::ERROR: the private interface with address '<ip>/16' either does not exist or is down.
https://github.com/exasol/docker-db/issues/45 Exasol保存了您(容器)当前的IP地址,而docker daemon正在更改默认的桥接范围。有两种方法可以修复此问题
- 每次重启时将您当前的IP范围设置为exasol
docket-compose run --rm exasol exaconf modify-node -n 11 -p '<ip>/16'
这必须每次在IP地址不匹配时完成。 - 设置docker默认桥接到某个固定范围:编辑或创建
/etc/docker/daemon.json
并设置{"bip":"172.0.0.1/24"}
(或适合您的不同范围)
Snowflake
准备Snowflake访问凭据
CREATE ROLE "KEBOOLA_CI_TABLE_UTILS"; CREATE DATABASE "KEBOOLA_CI_TABLE_UTILS"; GRANT ALL PRIVILEGES ON DATABASE "KEBOOLA_CI_TABLE_UTILS" TO ROLE "KEBOOLA_CI_TABLE_UTILS"; GRANT USAGE ON WAREHOUSE "DEV" TO ROLE "KEBOOLA_CI_TABLE_UTILS"; CREATE USER "KEBOOLA_CI_TABLE_UTILS" PASSWORD = 'my_secret_password' DEFAULT_ROLE = "KEBOOLA_CI_TABLE_UTILS"; GRANT ROLE "KEBOOLA_CI_TABLE_UTILS" TO USER "KEBOOLA_CI_TABLE_UTILS";
设置环境变量
SNOWFLAKE_HOST=
SNOWFLAKE_PORT=443
SNOWFLAKE_USER=KEBOOLA_CI_TABLE_UTILS
SNOWFLAKE_PASSWORD='my_secret_password
SNOWFLAKE_DATABASE=KEBOOLA_CI_TABLE_UTILS
SNOWFLAKE_WAREHOUSE=
Bigquery
要准备后端,您可以使用 Terraform模板。您必须具有组织中的 resourcemanager.folders.create
权限。
# run in provisioning/local/BigQuery folder terraform init terraform apply -var organization_id=<your-org-id> -var backend_prefix=<your_backend_prefix> -var billing_account_id=<billing_account_id> # and enter name for your backend prefix for example your name, all resources will create with this prefx
在Terraform apply结束后,转到由Terraform创建的文件夹中的服务项目。
- 转到新创建的服务项目,项目ID列在Terraform调用末尾。(service_project_id)
- 点击IAM & Admin
- 在左侧面板中选择服务帐户
- 点击服务帐户的电子邮件(只有一个)
- 在顶部选择密钥和添加密钥 => 创建新密钥
- 选择密钥类型JSON
- 点击创建按钮,文件将自动下载
- 将密钥转换为字符串
awk -v RS= '{$1=$1}1' <key_file>.json >> .env
- 将.env文件最后一行的内容设置为变量
BQ_KEY_FILE
设置环境变量
BQ_KEY_FILE=<the content of the downloaded json key file>
测试
使用以下命令运行测试。
docker compose run --rm dev composer tests
单元测试和功能测试可以单独运行。
#unit test docker compose run --rm dev composer tests-unit #functional test docker compose run --rm dev composer tests-functional
代码质量检查
#phplint docker compose run --rm dev composer phplint #phpcs docker compose run --rm dev composer phpcs #phpcbf docker compose run --rm dev composer phpcbf #phpstan docker compose run --rm dev composer phpstan
完整的CI工作流程
此命令将运行所有检查,加载固定数据,并运行测试。
docker compose run --rm dev composer ci
Teradata连接
您可以使用以下方法获取对Teradata的连接以进行查询。
\Keboola\TableBackendUtils\Connection\Teradata\TeradataConnection::getConnection([ 'host' => getenv('TERADATA_HOST'), 'user' => getenv('TERADATA_USERNAME'), 'password' => getenv('TERADATA_PASSWORD'), 'port' => getenv('TERADATA_PORT'), 'dbname' => '', ]);
如果您想通过PHPStorm DataGrip或其他数据库客户端进行连接,请在客户端设置连接时删除端口号。否则,测试连接将失败。
许可
MIT许可,请参阅LICENSE文件。