ClickHouse
The ClickHouse scraper executes a query and creates a catalog item for each returned row. The example below creates a new `Order` config for each row, with `order_id` as the name and the row, serialized as JSON, as the config.
clickhouse-scraper.yaml
```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: clickhouse-scraper
spec:
  clickhouse:
    - query: |
        SELECT
          concat('ORD-', toString(10000 + number)) as order_id,
          ['Electronics', 'Clothing', 'Books', 'Home', 'Sports'][rand() % 5 + 1] as category,
          ['New York', 'London', 'Tokyo', 'Paris', 'Sydney'][rand() % 5 + 1] as city,
          round((rand() % 50000) / 100, 2) as amount,
          ['completed', 'pending', 'cancelled'][rand() % 3 + 1] as status,
          toDateTime('2024-01-01 00:00:00') + toIntervalSecond(rand() % 31536000) as order_date
        FROM numbers(1000)
      type: Order
      id: $.order_id
      transform:
        # full: true
        expr: "[config].toJSON()"
```
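For illustration, a single row returned by this query would be stored as a JSON config similar to the following (the values shown are hypothetical, since the query generates random data):

```json
{
  "order_id": "ORD-10042",
  "category": "Books",
  "city": "Tokyo",
  "amount": 123.45,
  "status": "completed",
  "order_date": "2024-06-15 08:30:00"
}
```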
Objects in external object stores such as Azure Blob Storage or AWS S3 can also be queried:
azure-blob-storage-scraper.yaml
```yaml
apiVersion: mission-control.flanksource.com/v1
kind: Connection
metadata:
  name: azure-prod
spec:
  azure:
    clientID:
      valueFrom:
        secretKeyRef:
          name: azure-creds
          key: CLIENT_ID
    clientSecret:
      valueFrom:
        secretKeyRef:
          name: azure-creds
          key: CLIENT_SECRET
    tenantID:
      valueFrom:
        secretKeyRef:
          name: azure-creds
          key: TENANT_ID
---
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: audit-logs
spec:
  clickhouse:
    - query: SELECT * FROM azureBlobStorage(azure_prod_audit_data);
      type: AuditRecord
      id: $.user_id
      transform:
        expr: "[config].toJSON()"
      azureBlobStorage:
        connection: azure-prod
        account: acme-logs
        container: audit
        path: audit-data.csv
        collection: azure_prod_audit_data
```
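An AWS S3 source can be wired up along the same lines. The sketch below is an assumption-laden variant of the Azure example above: the bucket name, connection name, and the use of a named collection with ClickHouse's `s3()` table function are illustrative, not confirmed by these docs.

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: s3-audit-logs
spec:
  clickhouse:
    # Assumes the S3 source is exposed as a named collection,
    # mirroring the azureBlobStorage example above.
    - query: SELECT * FROM s3(aws_prod_audit_data);
      type: AuditRecord
      id: $.user_id
      transform:
        expr: "[config].toJSON()"
      awsS3:
        connection: aws-prod      # hypothetical pre-existing AWS connection
        bucket: acme-audit-logs   # hypothetical bucket
        path: audit-data.csv
```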
An instance of a ClickHouse server needs to be deployed for this scraper to function. Mission Control can deploy an instance by setting `config-db.clickhouse.enabled: true` in the Helm chart. An external ClickHouse server can also be used via the `clickhouseURL` parameter.
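For example, a scraper pointed at an external server might look like this (the host, credentials, and database are placeholders):

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: external-clickhouse
spec:
  clickhouse:
    - clickhouseURL: clickhouse://scraper:secret@clickhouse.example.com:9000/default
      query: SELECT * FROM orders
      type: Order
      id: $.order_id
```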
Scraper
Field | Description | Scheme | Required |
---|---|---|---|
logLevel | Specify the level of logging. | string | |
schedule | Specify the interval to scrape in cron format. Defaults to every 60 minutes. | string | |
full | Set to true to extract changes from scraped configurations. Defaults to false. | bool | |
retention | Settings for retaining changes, analysis and scraped items | Retention | |
clickhouse | Specifies the list of SQL configurations to scrape. | []Clickhouse | |
Clickhouse
Field | Description | Scheme |
---|---|---|
id* | A deterministic or natural id for the resource | |
query* | Query to execute | |
type* | The config type, e.g. Order | |
awsS3 | AWS S3 configuration for accessing buckets | AWSS3 |
azureBlobStorage | Azure Blob Storage configuration for accessing storage | AzureBlobStorage |
clickhouseURL | ClickHouse connection URL in the format `clickhouse://<user>:<password>@<host>:<port>/<database>?param1=value1&param2=value2` | |
class | | |
createFields | Identify the created time for a resource (if different to scrape time). If multiple fields are specified, the first non-empty value will be used | |
deleteFields | Identify when a config item was deleted. If multiple fields are specified, the first non-empty value will be used | |
format | Format of config item e.g. | |
items | Extract multiple config items from this array | |
labels | Labels for each config item. | |
name | | |
properties | Custom templatable properties for the scraped config items. | |
tags | Tags for each config item. Max allowed: 5 | |
timestampFormat | Format to parse timestamps in | Go time format |
transform | Transform configs after they've been scraped | |
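As a sketch of how createFields, deleteFields, and timestampFormat fit together (the column names created_at and deleted_at are assumptions, not from the docs):

```yaml
spec:
  clickhouse:
    - query: SELECT order_id, status, created_at, deleted_at FROM orders
      type: Order
      id: $.order_id
      createFields:
        - $.created_at   # hypothetical column holding the creation time
      deleteFields:
        - $.deleted_at   # hypothetical column holding the deletion time
      timestampFormat: "2006-01-02 15:04:05"  # Go reference-time layout
```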
AWSS3
Field | Description | Scheme |
---|---|---|
bucket | AWS S3 bucket name | |
endpoint | Custom endpoint URL for AWS S3 | |
path | Path within the S3 bucket | |
connection | The connection url to use, mutually exclusive with | |
accessKey | Access Key ID | |
secretKey | Secret Access Key | |
region | The AWS region | |
skipTLSVerify | Skip TLS verify when connecting to AWS | |
AzureBlobStorage
Field | Description | Scheme |
---|---|---|
collectionName* | Name of the collection in ClickHouse. See Named Collections | |
account | Azure storage account name | |
container | Azure Blob Storage container name | |
endpointSuffix | Azure endpoint suffix | |
path | Path within the container | |
connection | The connection url to use, mutually exclusive with | |
tenantId* | The Azure Active Directory tenant ID | |
subscriptionId* | The Azure subscription ID | |
clientId | The Azure client/application ID | |
clientSecret | The Azure client/application secret | |