ClickHouse

The ClickHouse scraper executes a query and creates a catalog item for each returned row.

The example below creates an `Order` config item for each row. It uses `order_id` as the item's ID and the full row, serialized as JSON, as the config.

clickhouse-scraper.yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: clickhouse-scraper
spec:
  clickhouse:
    - query: |
        SELECT 
            concat('ORD-', toString(10000 + number)) as order_id,
            ['Electronics', 'Clothing', 'Books', 'Home', 'Sports'][rand() % 5 + 1] as category,
            ['New York', 'London', 'Tokyo', 'Paris', 'Sydney'][rand() % 5 + 1] as city,
            round((rand() % 50000) / 100, 2) as amount,
            ['completed', 'pending', 'cancelled'][rand() % 3 + 1] as status,
            toDateTime('2024-01-01 00:00:00') + toIntervalSecond(rand() % 31536000) as order_date
        FROM numbers(1000)
      type: Order
      id: $.order_id
      transform:
        expr: "[config].toJSON()"

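Objects in external storage such as Azure Blob Storage or AWS S3 can also be queried. The example below reads a CSV file from Azure Blob Storage using credentials from a Mission Control `Connection`.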

azure-blob-storage-scraper.yaml
apiVersion: mission-control.flanksource.com/v1
kind: Connection
metadata:
  name: azure-prod
spec:
  azure:
    clientID:
      valueFrom:
        secretKeyRef:
          name: azure-creds
          key: CLIENT_ID
    clientSecret:
      valueFrom:
        secretKeyRef:
          name: azure-creds
          key: CLIENT_SECRET
    tenantID:
      valueFrom:
        secretKeyRef:
          name: azure-creds
          key: TENANT_ID
---
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: audit-logs
spec:
  clickhouse:
    - query: SELECT * FROM azureBlobStorage(azure_prod_audit_data);
      type: AuditRecord
      id: $.user_id
      transform:
        expr: "[config].toJSON()"
      azureBlobStorage:
        connection: azure-prod
        account: acme-logs
        container: audit
        path: audit-data.csv
        collection: azure_prod_audit_data

info

A ClickHouse server instance must be available for this scraper to work.

Mission Control can deploy an instance when `config-db.clickhouse.enabled: true` is set in the Helm chart.

Alternatively, an external ClickHouse server can be used by setting the `clickhouseURL` parameter.
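
The dotted key above corresponds to the following nesting in a Helm values file. This is a minimal sketch that assumes the Mission Control chart nests the config-db values under `config-db`, as the key implies. Any other settings for the bundled server are not shown.

```yaml
# values.yaml for the Mission Control Helm chart (sketch)
config-db:
  clickhouse:
    enabled: true  # ask the chart to deploy a bundled ClickHouse server
```

Equivalently, the same value can be passed on the command line with `--set config-db.clickhouse.enabled=true` to `helm install` or `helm upgrade`.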

Scraper

Field	Description	Scheme
`logLevel`	Specify the level of logging.	`string`
`schedule`	Specify the interval to scrape in cron format. Defaults to every 60 minutes.	`string`
`full`	Set to `true` to extract changes from scraped configurations. Defaults to `false`.	`bool`
`retention`	Settings for retaining changes, analysis and scraped items.	`Retention`
`clickhouse`	Specifies the list of ClickHouse query configurations to scrape.	`[]Clickhouse`
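
The fields in this table sit directly under `spec`, next to the `clickhouse` list. The fields in the tables below go on each entry of that list. The following is a minimal sketch; the scraper name, the cron expression and the simplified query are illustrative assumptions.

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: clickhouse-orders        # hypothetical name
spec:
  schedule: "*/30 * * * *"       # scraper-level: run every 30 minutes (cron format)
  clickhouse:                    # scraper-level: list of query configurations
    - query: |
        SELECT concat('ORD-', toString(10000 + number)) AS order_id
        FROM numbers(10)
      type: Order                # per-query fields start here
      id: $.order_id
      transform:
        expr: "[config].toJSON()"
```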

Clickhouse

Field	Description	Scheme
`id*`	A deterministic or natural id for the resource	`string` or JSONPath
`query*`	SQL query to execute against ClickHouse	`string`
`type*`	Type of the config item, e.g. `Order`, `AuditRecord`	`string` or JSONPath
`awsS3`	AWS S3 configuration for accessing buckets	AWSS3
`azureBlobStorage`	Azure Blob Storage configuration for accessing storage	AzureBlobStorage
`clickhouseURL`	ClickHouse connection URL in the format `clickhouse://<user>:<password>@<host>:<port>/<database>?param1=value1&param2=value2`	`string`
`class`		`string` or JSONPath
`createFields`	Identify the created time for a resource (if different from the scrape time). If multiple fields are specified, the first non-empty value will be used	`[]string` or []JSONPath
`deleteFields`	Identify when a config item was deleted. If multiple fields are specified, the first non-empty value will be used	`[]string` or []JSONPath
`format`	Format of config item e.g. `xml`, `properties`. Defaults to `JSON`	`string`
`items`	Extract multiple config items from this array	JSONPath
`labels`	Labels for each config item.	`map[string]string`
`name`	Name of the config item	`string` or JSONPath
`properties`	Custom templatable properties for the scraped config items.	`[]ConfigProperty`
`tags`	Tags for each config item. Max allowed: 5.	`[]ConfigTag`
`timestampFormat`	Format to parse timestamps in `createFields` and `deleteFields`. Defaults to `RFC3339`	Go time format
`transform`	Transform configs after they've been scraped	`Transform`
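
The following sketch combines several of the optional fields above with an external server set through `clickhouseURL`. The host, credentials, database `shop`, table `orders` and its columns are all hypothetical. The query casts `created_at` to a string, on the assumption that the value reaches `createFields` as text. `timestampFormat` is therefore set to the Go reference-time layout that matches ClickHouse's default `DateTime` text output.

```yaml
apiVersion: configs.flanksource.com/v1
kind: ScrapeConfig
metadata:
  name: external-orders          # hypothetical name
spec:
  clickhouse:
    # Hypothetical external server; credentials are inline here only for brevity
    - clickhouseURL: clickhouse://reader:changeme@clickhouse.example.com:9000/shop
      query: |
        SELECT
            order_id,
            status,
            toString(created_at) AS created_at
        FROM orders
      type: Order
      id: $.order_id
      name: $.order_id
      createFields:
        - $.created_at
      # Go layout matching toString(DateTime) output, e.g. 2024-01-31 13:45:00
      timestampFormat: "2006-01-02 15:04:05"
      labels:
        source: clickhouse
      transform:
        expr: "[config].toJSON()"
```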

AWSS3

Field	Description	Scheme
`bucket`	AWS S3 bucket name	`string`
`endpoint`	Custom endpoint URL for AWS S3	`string`
`path`	Path within the S3 bucket	`string`
`connection`	The connection url to use, mutually exclusive with `accessKey` and `secretKey`	Connection
`accessKey`	Access Key ID	EnvVar
`secretKey`	Secret Access Key	EnvVar
`region`	The AWS region	`string`
`skipTLSVerify`	Skip TLS verify when connecting to AWS	`boolean`
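
The fragment below sketches how the `AWSS3` fields might be filled in using static keys instead of a `connection`. Only the `awsS3` block is shown; it belongs on a `clickhouse` entry next to `query`, `type` and `id`. The bucket, path, region and the `aws-creds` secret are hypothetical. The `valueFrom.secretKeyRef` form for the `EnvVar` fields mirrors the Azure `Connection` example above.

```yaml
# Goes on a spec.clickhouse[] entry, next to query/type/id
awsS3:
  bucket: acme-logs              # hypothetical bucket
  path: audit/audit-data.csv     # hypothetical object path within the bucket
  region: us-east-1
  accessKey:                     # static keys; mutually exclusive with connection
    valueFrom:
      secretKeyRef:
        name: aws-creds          # hypothetical Kubernetes secret
        key: AWS_ACCESS_KEY_ID
  secretKey:
    valueFrom:
      secretKeyRef:
        name: aws-creds
        key: AWS_SECRET_ACCESS_KEY
```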

AzureBlobStorage

Field	Description	Scheme
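`collectionName*`	Name of the ClickHouse named collection that the query references, e.g. `azure_prod_audit_data` in `azureBlobStorage(azure_prod_audit_data)`. See the ClickHouse Named Collections documentation	`string`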
`account`	Azure storage account name	`string`
`container`	Azure Blob Storage container name	`string`
`endpointSuffix`	Azure endpoint suffix	`string`
`path`	Path within the container	`string`
`connection`	The connection url to use, mutually exclusive with `tenantId`, `subscriptionId`, `clientId`, and `clientSecret`	Connection
`tenantId*`	The Azure Active Directory tenant ID	EnvVar
`subscriptionId*`	The Azure subscription ID	EnvVar
`clientId`	The Azure client/application ID	EnvVar
`clientSecret`	The Azure client/application secret	EnvVar
