Clickhouse

The ClickHouse scraper executes a query and creates a catalog item for each returned row.

The example below creates an Order config item for each row, identified by its order_id, with the row serialized as JSON as the config.

clickhouse-scraper.yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: clickhouse-scraper
spec:
  clickhouse:
    - query: |
        SELECT
          concat('ORD-', toString(10000 + number)) as order_id,
          ['Electronics', 'Clothing', 'Books', 'Home', 'Sports'][rand() % 5 + 1] as category,
          ['New York', 'London', 'Tokyo', 'Paris', 'Sydney'][rand() % 5 + 1] as city,
          round((rand() % 50000) / 100, 2) as amount,
          ['completed', 'pending', 'cancelled'][rand() % 3 + 1] as status,
          toDateTime('2024-01-01 00:00:00') + toIntervalSecond(rand() % 31536000) as order_date
        FROM numbers(1000)
      type: Order
      id: $.order_id
      transform:
        #full: true
        expr: "[config].toJSON()"

Objects in external storage such as Azure Blob Storage or AWS S3 can also be queried.

azure-blob-storage-scraper.yaml
apiVersion: mission-control.flanksource.com/v1
kind: Connection
metadata:
  name: azure-prod
spec:
  azure:
    clientID:
      valueFrom:
        secretKeyRef:
          name: azure-creds
          key: CLIENT_ID
    clientSecret:
      valueFrom:
        secretKeyRef:
          name: azure-creds
          key: CLIENT_SECRET
    tenantID:
      valueFrom:
        secretKeyRef:
          name: azure-creds
          key: TENANT_ID
---
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: audit-logs
spec:
  clickhouse:
    - query: SELECT * FROM azureBlobStorage(azure_prod_audit_data);
      type: AuditRecord
      id: $.user_id
      transform:
        expr: "[config].toJSON()"
      azureBlobStorage:
        connection: azure-prod
        account: acme-logs
        container: audit
        path: audit-data.csv
        collection: azure_prod_audit_data
info

A ClickHouse server must be deployed for this scraper to function.

Mission Control can deploy an instance when config-db.clickhouse.enabled: true is set in the Helm chart.

An external ClickHouse server can also be used via the clickhouseURL parameter.
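
As a minimal sketch of the external-server option, the ScrapeConfig below sets clickhouseURL on the scraper entry, following the URL format listed in the Clickhouse field table. The scraper name, user, password, host, port, database, and query are placeholders assumed for illustration, not values from a real deployment.

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: external-clickhouse-scraper # hypothetical name
spec:
  clickhouse:
    # Placeholder URL: substitute your own user, password, host, port and database
    - clickhouseURL: clickhouse://scraper:changeme@clickhouse.example.com:9000/default
      query: SELECT number AS order_id FROM numbers(10)
      type: Order
      id: $.order_id
```

Because the URL embeds a password, avoid committing a real value to source control.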

Scraper

| Field | Description | Scheme | Required |
| --- | --- | --- | --- |
| logLevel | Specify the level of logging. | string | |
| schedule | Specify the interval to scrape in cron format. Defaults to every 60 minutes. | string | |
| full | Set to true to extract changes from scraped configurations. Defaults to false. | bool | |
| retention | Settings for retaining changes, analysis and scraped items | Retention | |
| clickhouse | Specifies the list of SQL configurations to scrape. | []Clickhouse | |
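
The scraper-level fields sit directly under spec, alongside the clickhouse list. The sketch below assumes a standard five-field cron expression is accepted for schedule and that debug is a valid logLevel; both values, the scraper name, and the query are illustrative assumptions.

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: clickhouse-scheduled # hypothetical name
spec:
  # Assumed cron expression: run at minute 0 of every second hour
  schedule: "0 */2 * * *"
  # Assumed level name; check which levels your deployment accepts
  logLevel: debug
  clickhouse:
    - query: SELECT number AS order_id FROM numbers(10)
      type: Order
      id: $.order_id
```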

Clickhouse

| Field | Description | Scheme |
| --- | --- | --- |
| id* | A deterministic or natural id for the resource | string or JSONPath |
| query* | Query to execute | string |
| type* | e.g. File::Host, File::Tomcat, File::Pom | string or JSONPath |
| awsS3 | AWS S3 configuration for accessing buckets | AWSS3 |
| azureBlobStorage | Azure Blob Storage configuration for accessing storage | AzureBlobStorage |
| clickhouseURL | ClickHouse connection URL in format: clickhouse://<user>:<password>@<host>:<port>/<database>?param1=value1&param2=value2 | string |
| class | | string or JSONPath |
| createFields | Identify the created time for a resource (if different to scrape time). If multiple fields are specified, the first non-empty value will be used | []string or []JSONPath |
| deleteFields | Identify when a config item was deleted. If multiple fields are specified, the first non-empty value will be used | []string or []JSONPath |
| format | Format of config item e.g. xml, properties. Defaults to JSON | string |
| items | Extract multiple config items from this array | JSONPath |
| labels | Labels for each config item. | map[string]string |
| name | | string or JSONPath |
| properties | Custom templatable properties for the scraped config items. | []ConfigProperty |
| tags | Tags for each config item. Max allowed: 5 | []ConfigTag |
| timestampFormat | Format to parse timestamps in createFields and deleteFields. Defaults to RFC3339 | Go time format |
| transform | Transform configs after they've been scraped | Transform |
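
To illustrate how several of these fields might combine, the sketch below sets name, createFields, timestampFormat, and labels on a single scraper entry. The scraper name, label, and query are hypothetical. The query converts the date to a string with toString so that it matches the Go layout given in timestampFormat; how a native DateTime column is serialized is not covered here, so treat that pairing as an assumption.

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: orders-with-metadata # hypothetical name
spec:
  clickhouse:
    - query: |
        SELECT
          concat('ORD-', toString(10000 + number)) AS order_id,
          toString(toDateTime('2024-01-01 00:00:00') + toIntervalSecond(number)) AS order_date
        FROM numbers(10)
      type: Order
      id: $.order_id
      name: $.order_id
      # Take the created time from the row instead of the scrape time
      createFields:
        - $.order_date
      # Go reference-time layout matching the "YYYY-MM-DD hh:mm:ss" string produced above
      timestampFormat: "2006-01-02 15:04:05"
      labels:
        source: clickhouse # hypothetical label
```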

AWSS3

| Field | Description | Scheme |
| --- | --- | --- |
| bucket | AWS S3 bucket name | string |
| endpoint | Custom endpoint URL for AWS S3 | string |
| path | Path within the S3 bucket | string |
| connection | The connection url to use, mutually exclusive with accessKey and secretKey | Connection |
| accessKey | Access Key ID | EnvVar |
| secretKey | Secret Access Key | EnvVar |
| region | The AWS region | string |
| endpoint | Custom AWS Endpoint to use | string |
| skipTLSVerify | Skip TLS verify when connecting to AWS | boolean |

AzureBlobStorage

| Field | Description | Scheme |
| --- | --- | --- |
| collectionName* | Name of the collection in ClickHouse. See Named Collections | string |
| account | Azure storage account name | string |
| container | Azure Blob Storage container name | string |
| endpointSuffix | Azure endpoint suffix | string |
| path | Path within the container | string |
| connection | The connection url to use, mutually exclusive with tenantId, subscriptionId, clientId, and clientSecret | Connection |
| tenantId* | The Azure Active Directory tenant ID | |
| subscriptionId* | The Azure subscription ID | EnvVar |
| clientId | The Azure client/application ID | EnvVar |
| clientSecret | The Azure client/application secret | EnvVar |
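
As a rough sketch of the alternative to connection, the entry below supplies the Azure credentials inline, reusing the valueFrom.secretKeyRef form from the Connection example. It assumes tenantId accepts the same EnvVar shape as the other credential fields, and it uses the collectionName spelling from the table, whereas the earlier example writes collection; verify which one your version expects. The scraper name and the SUBSCRIPTION_ID secret key are hypothetical.

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: audit-logs-inline-creds # hypothetical name
spec:
  clickhouse:
    - query: SELECT * FROM azureBlobStorage(azure_prod_audit_data);
      type: AuditRecord
      id: $.user_id
      azureBlobStorage:
        # Field name as listed in the table; the earlier example spells it "collection"
        collectionName: azure_prod_audit_data
        account: acme-logs
        container: audit
        path: audit-data.csv
        # Inline credentials, used instead of a connection (the two are mutually exclusive)
        tenantId:
          valueFrom:
            secretKeyRef:
              name: azure-creds
              key: TENANT_ID
        subscriptionId:
          valueFrom:
            secretKeyRef:
              name: azure-creds
              key: SUBSCRIPTION_ID # hypothetical secret key
        clientId:
          valueFrom:
            secretKeyRef:
              name: azure-creds
              key: CLIENT_ID
        clientSecret:
          valueFrom:
            secretKeyRef:
              name: azure-creds
              key: CLIENT_SECRET
```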