ClickHouse OSS quick start
In this quick start tutorial, we'll get you set up with OSS ClickHouse in 8 easy steps. You'll download an appropriate binary for your OS, learn to run the ClickHouse server, and use the ClickHouse client to create a table, insert data into it, and run a query to select that data.
Download ClickHouse
ClickHouse runs natively on Linux, FreeBSD and macOS, and runs on Windows via
the Windows Subsystem for Linux (WSL). The simplest way to download ClickHouse
locally is to run `curl https://clickhouse.com/ | sh`. The script determines
whether your operating system is supported, then downloads an appropriate
ClickHouse binary.
We recommend running this command from a new, empty subdirectory, because the first time the ClickHouse server runs, it creates some configuration files in the directory where the binary is located.
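When the download finishes, the script prints how to run the binary (`./clickhouse`) and suggests a command to install it.
At this stage, you can ignore the prompt to run the install command.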
For Mac users: if you get errors saying that the developer of the binary cannot be verified, see "Fix the Developer Verification Error in MacOS".
Start the server
Run `./clickhouse server` to start the ClickHouse server.
You should see the terminal fill up with log output. This is expected: in
ClickHouse, the default logging level is set to `trace` rather than `warning`.
Start the client
Use clickhouse-client to connect to your ClickHouse service. Open a new
terminal, change to the directory where your `clickhouse` binary is saved, and
run `./clickhouse client`.
When the client connects to your service running on localhost, you should see a smiling face (`:)`) at the end of the prompt.
Insert data
You can use the familiar INSERT INTO TABLE command with ClickHouse, but it is
important to understand that each insert into a MergeTree table causes what we
call a part to be created in storage. These parts later get merged in the
background by ClickHouse. A part is a physical file (or directory) on disk that
stores a portion of the table's data. This is different from a partition, which
is a logical division of a table's data that is created using a partition key.
With ClickHouse, try to bulk insert many rows at a time (tens of thousands or even millions at once) to minimize the number of parts that need to be merged in the background.
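To make the effect of batching concrete, here is a toy Python model (not ClickHouse code) of the rule that each insert creates new parts. As a simplifying assumption, it counts one part for each distinct partition value among the rows of an insert, and it ignores background merges and other server-side details. The helper names `batches` and `parts_created` are hypothetical.

```python
from itertools import islice
from typing import Callable, Hashable, Iterable, Iterator, List, TypeVar

Row = TypeVar("Row")


def batches(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most batch_size rows (one list per INSERT)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    while batch := list(islice(it, batch_size)):
        yield batch


def parts_created(
    rows: Iterable[Row],
    batch_size: int,
    partition_key: Callable[[Row], Hashable] = lambda row: None,
) -> int:
    """Toy model: each INSERT creates one new part per distinct partition in its batch.

    With the default partition_key, every row falls into the same partition,
    so each INSERT creates exactly one part.
    """
    return sum(
        len({partition_key(row) for row in batch})
        for batch in batches(rows, batch_size)
    )


rows = list(range(100_000))
print(parts_created(rows, batch_size=1))       # 100000: one INSERT per row
print(parts_created(rows, batch_size=10_000))  # 10: ten bulk INSERTs
# Hypothetical partition key with two values: each batch touches both partitions.
print(parts_created(rows, batch_size=10_000, partition_key=lambda r: r % 2))  # 20
```

The difference between the first two results is the reason to batch: fewer, larger inserts leave far fewer parts for ClickHouse to merge later.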
In this guide, we won't worry about that just yet. Insert a few rows of data into your table with an `INSERT INTO ... VALUES` statement.
Query your new table
You can write a SELECT query just like you would with any SQL database.
The client returns the response in a nicely formatted table.
Insert your own data
The next step is to get your own data into ClickHouse. We have lots of table functions and integrations for ingesting data. There are examples for each of the sources listed below, or you can check out our Integrations page for a long list of technologies that integrate with ClickHouse.
- S3
- GCS
- Web
- Local
- PostgreSQL
- MySQL
- ODBC/JDBC
- Message Queues
- Data Lakes
- Other
S3
Use the s3 table function to
read files from S3. Because it's a table function, its result is a table
that can be:
- used as the source of a SELECT query (allowing you to run ad-hoc queries and leave your data in S3), or
- inserted into a MergeTree table (when you are ready to move your data into ClickHouse)
An ad-hoc query reads the files in place, for example `SELECT count() FROM s3(...)`.
Moving the data into a ClickHouse table uses `INSERT INTO nyc_taxi SELECT * FROM s3(...)`,
where nyc_taxi is a MergeTree table.
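If you generate the ad-hoc query and the load statement from a script, a small Python sketch like the following renders both forms of the `s3(url, format)` call. It only builds SQL text for you to paste into clickhouse-client. The bucket URL is a hypothetical placeholder, `Parquet` is just one example of a format name, and the helper names are hypothetical. For the INSERT to succeed, the columns returned by `s3` must line up with the columns of `nyc_taxi`.

```python
def sql_string(value: str) -> str:
    """Render value as a ClickHouse string literal, escaping backslashes and quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def s3_call(url: str, fmt: str) -> str:
    """Render an s3(url, format) table-function call."""
    return f"s3({sql_string(url)}, {sql_string(fmt)})"


def adhoc_query(url: str, fmt: str) -> str:
    """Query the files in place, leaving the data in S3."""
    return f"SELECT count() FROM {s3_call(url, fmt)}"


def load_query(table: str, url: str, fmt: str) -> str:
    """Copy the files' rows into an existing MergeTree table."""
    return f"INSERT INTO {table} SELECT * FROM {s3_call(url, fmt)}"


# Hypothetical bucket and path; the * wildcard matches every Parquet file under trips/.
URL = "https://my-bucket.s3.amazonaws.com/trips/*.parquet"
print(adhoc_query(URL, "Parquet"))
print(load_query("nyc_taxi", URL, "Parquet"))
```

The first line printed is `SELECT count() FROM s3('https://my-bucket.s3.amazonaws.com/trips/*.parquet', 'Parquet')`; the second wraps the same `s3` call in `INSERT INTO nyc_taxi SELECT * FROM ...`.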
View our collection of AWS S3 documentation pages for lots more details and examples of using S3 with ClickHouse.
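GCS
The s3 table function used for
reading data in AWS S3 also works on files in Google Cloud Storage.
Pass it an HTTPS URL for the objects, such as one beginning with `https://storage.googleapis.com/`, instead of an S3 URL.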
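Google Cloud tooling commonly refers to objects with `gs://bucket/object` URIs, while the `s3` table function expects an HTTPS URL. A hypothetical helper that rewrites one into the path-style `https://storage.googleapis.com/bucket/object` form might look like this; the bucket and object names are placeholders, and the helper does no percent-encoding, so it assumes object names that are already URL-safe.

```python
def gcs_https_url(gs_uri: str) -> str:
    """Rewrite gs://bucket/object as https://storage.googleapis.com/bucket/object."""
    prefix = "gs://"
    if not gs_uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {gs_uri!r}")
    bucket, _, obj = gs_uri[len(prefix):].partition("/")
    if not bucket or not obj:
        raise ValueError(f"expected gs://bucket/object, got {gs_uri!r}")
    return f"https://storage.googleapis.com/{bucket}/{obj}"


# Placeholder bucket and path; wildcards such as *.csv pass through unchanged.
print(gcs_https_url("gs://my-bucket/events/*.csv"))
# https://storage.googleapis.com/my-bucket/events/*.csv
```

The resulting URL is what you would pass as the first argument of `s3`, followed by the format name.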
Find more details on the s3 table function page.
Web
The url table function reads
files accessible from the web.
Find more details on the url table function page.
Local
Use the file table function to
read a local file. For simplicity, copy the file to the user_files directory
(which is found in the directory where you downloaded the ClickHouse binary).
ClickHouse infers the names and data types of your columns by analyzing a large batch of rows. If ClickHouse cannot determine the file format from the filename, you can specify it as the second argument of `file()`.
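The copy step and the optional format argument can be sketched in Python as follows. It assumes, as described above, that `user_files` sits in the directory holding the ClickHouse binary and that the name passed to `file()` is resolved relative to that directory. The helper `stage_local_file`, the sample file, and the temporary directories are hypothetical stand-ins.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def stage_local_file(src: Path, clickhouse_dir: Path, fmt: Optional[str] = None) -> str:
    """Copy src into <clickhouse_dir>/user_files and return a query that reads it with file()."""
    user_files = clickhouse_dir / "user_files"
    user_files.mkdir(parents=True, exist_ok=True)  # may already exist
    dest = Path(shutil.copy(src, user_files / src.name))
    args = f"'{dest.name}'"  # assumes the file name contains no single quotes
    if fmt is not None:
        # Only needed when ClickHouse can't determine the format from the filename.
        args += f", '{fmt}'"
    return f"SELECT * FROM file({args}) LIMIT 10"


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "events.csv"  # hypothetical sample file
    sample.write_text("id,name\n1,alpha\n2,beta\n")
    clickhouse_dir = Path(tmp) / "clickhouse"  # stand-in for the binary's directory
    print(stage_local_file(sample, clickhouse_dir))
    print(stage_local_file(sample, clickhouse_dir, fmt="CSVWithNames"))
```

This prints `SELECT * FROM file('events.csv') LIMIT 10`, then the same query with `'CSVWithNames'` as the second argument.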
View the file table function
docs page for more details.
PostgreSQL
Use the postgresql table function
to read data from a table in PostgreSQL.
View the postgresql table function
docs page for more details.
MySQL
Use the mysql table function
to read data from a table in MySQL.
View the mysql table function
docs page for more details.
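The postgresql and mysql table functions take the same five leading arguments: `'host:port'`, database, table, user, and password, and the postgresql function also accepts an optional schema as a sixth argument. A small Python sketch that renders either call is below; the helper name `remote_table` and every connection value are hypothetical placeholders.

```python
from typing import Optional


def sql_string(value: str) -> str:
    """Render value as a ClickHouse string literal, escaping backslashes and quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def remote_table(function: str, host: str, port: int, database: str, table: str,
                 user: str, password: str, schema: Optional[str] = None) -> str:
    """Render a postgresql(...) or mysql(...) table-function call."""
    if function not in ("postgresql", "mysql"):
        raise ValueError(f"unsupported table function: {function}")
    args = [f"{host}:{port}", database, table, user, password]
    if schema is not None:
        if function != "postgresql":
            raise ValueError("only postgresql takes a schema argument")
        args.append(schema)
    return f"{function}({', '.join(sql_string(a) for a in args)})"


# Hypothetical connection details.
pg = remote_table("postgresql", "localhost", 5432, "shop", "orders", "reader", "secret")
my = remote_table("mysql", "localhost", 3306, "shop", "orders", "reader", "secret")
print(f"SELECT * FROM {pg} LIMIT 10")
print(f"SELECT * FROM {my} LIMIT 10")
```

The first query printed reads from `postgresql('localhost:5432', 'shop', 'orders', 'reader', 'secret')`; the second uses the same argument order with `mysql`.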
ODBC/JDBC
ClickHouse can read data from any ODBC or JDBC data source.
View the odbc table function
and the jdbc table function docs
pages for more details.
Message Queues
Message queues can stream data into ClickHouse using the corresponding table engine, including:
- Kafka: integrate with Kafka using the Kafka table engine
- Amazon MSK: integrate with Amazon Managed Streaming for Apache Kafka (MSK)
- RabbitMQ: integrate with RabbitMQ using the RabbitMQ table engine
Data Lakes
ClickHouse has table functions to read data from the following sources:
- Hadoop: integrate with Apache Hadoop using the hdfs table function
- Hudi: read from existing Apache Hudi tables in S3 using the hudi table function
- Iceberg: read from existing Apache Iceberg tables in S3 using the iceberg table function
- DeltaLake: read from existing Delta Lake tables in S3 using the deltaLake table function
Other
Check out our long list of ClickHouse integrations to find how to connect your existing frameworks and data sources to ClickHouse.
Explore
- Check out our Core Concepts section to learn some of the fundamentals of how ClickHouse works under the hood.
- Check out the Advanced Tutorial which takes a much deeper dive into the key concepts and capabilities of ClickHouse.
- Continue your learning by taking our free on-demand training courses at the ClickHouse Academy.
- We have a list of example datasets with instructions on how to insert them.
- If your data is coming from an external source, view our collection of integration guides for connecting to message queues, databases, pipelines and more.
- If you are using a UI/BI visualization tool, view the user guides for connecting a UI to ClickHouse.
- The user guide on primary keys covers everything you need to know about primary keys and how to define them.
