We typically see five primary methods for getting data into Timescale, depending on your needs and use case:
Bulk Upload and Migration
Importing data from .csv
Migrating from an existing PostgreSQL database
Migrating from another time-series database like InfluxDB
Inserting Data
Client drivers (such as JDBC, Python, and Node.js)
Message queues, specifically Kafka
NOTE: Before you get started, make sure that you have set up your Managed Service for TimescaleDB instance and can connect to it with a SQL client such as psql or pgAdmin.
1. Importing data from .csv
If you have a dataset stored in a .csv file, you can import it into an empty TimescaleDB table or a “hypertable”. A hypertable is a Timescale-specific abstraction of a single continuous table across all space and time partitions, but it can still be queried with standard SQL.
1.1 Define the schema for your new table
We need to define the schema for a new table before we insert data into it. Creating this empty table requires knowledge of the schema of the data in the file, but is otherwise the same as creating any new table. Our example is a database named new_db and a data file named old_db.csv.
TIP: We’ve created a quick tool to automatically create your PostgreSQL or TimescaleDB schema, found here on pgschema.com
First, you’ll need to use psql (or your preferred client/UI) to connect to the database. If you choose to use psql, first check whether you already have it installed:
psql --version
Next, connecting to Managed Service for TimescaleDB is as simple as running:
psql -h [HOSTNAME] -p [PORT] -U [USERNAME] -W -d [DATABASENAME]
You can find your service URL from within the cloud portal, on the “Overview” tab.
Next, let’s create a database in which we will store our new table.
CREATE DATABASE new_db;
\c new_db;
This allows us to connect to the new database we just created.
Finally, we create a new table with a defined schema that maps to our CSV file.
-- Assuming the data file's columns are time, location, temperature
CREATE TABLE conditions (
time TIMESTAMPTZ NOT NULL,
location text NOT NULL,
temperature DOUBLE PRECISION NULL
);
1.2 Convert the new table into a hypertable
Now that we have a SQL table, we can (optionally) convert it into a hypertable for better performance.
In Managed Service for TimescaleDB, any new database you create needs to have the TimescaleDB extension loaded before you can create a hypertable in it.
CREATE EXTENSION timescaledb;
\dx
The \dx command lists the extensions currently installed in your database. You should see timescaledb listed there.
Then, convert the table you created earlier into a hypertable using the create_hypertable function.
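For example, for the conditions table created above, partitioned on its time column:

SELECT create_hypertable('conditions', 'time');

You can keep querying and inserting into conditions with standard SQL; TimescaleDB manages the underlying partitions for you.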
1.3 Inserting data into the hypertable
Recommended: Using timescaledb-parallel-copy
To bulk insert data into the new table, we recommend using our open-source Go program, which can speed up large data migrations by running multiple COPY operations concurrently. If you have TimescaleDB installed locally, your package manager should have installed timescaledb-parallel-copy. If not, you can go to our GitHub to install it separately.
Here’s an example of how to use timescaledb-parallel-copy with 4 workers:
timescaledb-parallel-copy \
--connection '<service_url>' \
--table taxi_data \
--file ~/Downloads/jan_1.csv \
--workers 4 \
--copy-options "CSV" \
--skip-header
Again, the service URL can be found in the cloud portal UI and should look something like this:
postgres://tsdbadmin:[email protected]:20457/defaultdb?sslmode=require
In addition to parallelizing the workload, the tool also offers flags to improve the copy experience. See the repo on GitHub for full details.
TIP: We recommend not setting the number of workers higher than the number of available CPU cores on either your client machine or the server. Above that, the workers tend to compete with each other for resources and reduce the performance improvements.
Using PostgreSQL's COPY
Although we recommend our open-source Go program for better bulk insert performance, we can also use PostgreSQL's bulk insert command COPY to copy data from the .csv file into our new database:
psql '<service_url>/new_db?sslmode=require' -c "\copy conditions FROM old_db.csv WITH (FORMAT CSV, HEADER)"
This method is straightforward and requires no extra tools, but for large datasets it can be impractical and time-consuming because COPY is single-threaded. For a faster method that can utilize more of the CPU, use the previous method.
2. Client Drivers: JDBC, Python, and Node.js
2.1 Python / psycopg
2.1.1 Installing and connecting psycopg
Before we begin, ensure that you have the psycopg2 library installed; it is a popular PostgreSQL adapter for Python. If not, you can run the following in your terminal:
pip install psycopg2
In our examples, we will be using Python 3.6 on a Mac or Linux OS. If you are running an earlier version or are on a Windows machine, the commands should still be similar. With everything set up, let’s dive into connecting to your cloud database.
import psycopg2
from psycopg2.extras import RealDictCursor

url = "<service_url>"
db_conn = psycopg2.connect(url)
c = db_conn.cursor(cursor_factory=RealDictCursor)
c.execute("SELECT 1 = 1")
result = c.fetchone()
Again, the service URL can be found in the cloud portal UI and should look something like this:
postgres://tsdbadmin:[email protected]:20457/defaultdb?sslmode=require
To create a table that fits your dataset, we suggest using psql or your preferred SQL client to first define the schema. After the schema is defined, you can use Python to query it.
TIP: We’ve created a quick tool to automatically create your PostgreSQL or TimescaleDB schema, found here on pgschema.com
2.1.2 Inserting data
With our table created, it’s time to load data into the database.
A common way of loading data into a Postgres table is to issue an INSERT command on the table. The INSERT command requires the name of the table to insert into and the sequence of values to insert, as in the sketch below.
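Here is a minimal sketch, assuming the conditions table and old_db.csv file from section 1 (the service URL is a placeholder):

import csv
import psycopg2

conn = psycopg2.connect("<service_url>")
cur = conn.cursor()

with open('old_db.csv') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for time, location, temperature in reader:
        # One parameterized INSERT per row; psycopg2 fills in the %s placeholders.
        cur.execute(
            "INSERT INTO conditions (time, location, temperature) VALUES (%s, %s, %s)",
            (time, location, temperature),
        )

conn.commit()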
Many of our insert examples are written in Python. You can view them on our GitHub: https://github.com/timescale/examples.
One specific example using time series data is a sample dataset on air quality data:
https://github.com/timescale/examples/blob/master/air-quality/airquality_ingest.py
While INSERT statements accomplish the task of loading in our data, they are actually not the most efficient way of doing it. As you can see, we had to loop through every single row of the file just to insert it into the database. Luckily for us, Postgres has a command specifically for loading files into tables.
Recommended: using the COPY command
The Postgres command to load files directly into tables is called COPY. It takes in a file (like a CSV) and automatically loads it into a Postgres table. Instead of building the query and then running it through execute() as with INSERT, psycopg2 has a method written solely for this operation.
The method to load a file into a table is called copy_from. Like the execute() method, it is attached to the Cursor object. However, it differs quite a bit from the execute() method due to its parameters.
copy_from requires a file object to load (without the header), the name of the table it should load into, and a delimiter (the keyword argument sep). After running commit(), the data is transferred into the table. This is the most efficient, and recommended, way to load CSV files into a Postgres table.
This is how we use copy_from() to load our file instead of looping INSERT commands:
import psycopg2

conn = psycopg2.connect("<service_url>")
cur = conn.cursor()

with open('user_accounts.csv', 'r') as f:
    # Notice that we don't need the csv module.
    next(f)  # Skip the header row.
    cur.copy_from(f, 'users', sep=',')

conn.commit()
2.2. JDBC Driver
Two prerequisites you will need:
Install the Maven dependency for the PostgreSQL JDBC driver (you can also download the latest driver from postgresql.org):
<dependency>
    <groupId>org.postgresql</groupId>
    <artifactId>postgresql</artifactId>
    <version>42.2.0</version>
</dependency>
Find your service CA certificate, which you can copy from the MST console's service page by clicking the "view CA certificate" button.
From there, input the following:
package pg;

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public final class Connect {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("jdbc.url", "jdbc:postgresql://pg-3b8d4ed6-myfirstcloudhub.aivencloud.com:20985/defaultdb");
        props.put("user", "avnadmin");
        props.put("password", "nr0dfnswz36xs9pi");
        props.put("ssl", "true");
        props.put("sslmode", "verify-ca");
        props.put("sslrootcert", "/path/to/ca.pem");
        try {
            Connection c = DriverManager.getConnection(props.getProperty("jdbc.url"), props);
            System.out.println("Success");
            c.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
2.3. ODBC Driver
Check out this link to psqlODBC for the official ODBC driver for PostgreSQL.
For reference, you will also need your service URL from within the cloud portal, on the “Overview” tab.
Additionally, if you need your service CA certificate, you can copy from the MST console's service page by clicking the "view CA certificate" button.
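As a rough illustration (the driver name and keyword spellings depend on how psqlODBC is registered on your system, and the credentials below simply reuse the example service URL shown earlier), a DSN-less ODBC connection string looks something like this:

Driver={PostgreSQL Unicode};Server=pg-2ba8739c.a.timescaledb.io;Port=20457;Database=defaultdb;Uid=tsdbadmin;Pwd=zgbz2mgx3bhlke6u;sslmode=require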
2.4 Node.js
TIP: We recommend using the Sequelize library, but you could also try the node-postgres library.
Overall, you should consult our docs, link here, to see a quick start for Node.
One key change to note - whenever you see the connection URL for the database, as below:
'postgres://user:[email protected]:5432/dbname'
You'll need to plug in your service URL from the Timescale cloud UI instead.
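As a minimal sketch with the node-postgres library (npm install pg), using the example service URL format from above where your own URL should go:

const { Client } = require('pg')

async function main() {
  // Replace the connection string with the service URL from the cloud portal.
  const client = new Client({
    connectionString: 'postgres://tsdbadmin:[email protected]:20457/defaultdb?sslmode=require',
  })
  await client.connect()
  const res = await client.query('SELECT now()')
  console.log(res.rows)
  await client.end()
}

main()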
3. Message queues, specifically Kafka
The examples below are useful for inserting data into Timescale from a message queue, not for monitoring your Timescale instance using a message queue. For information on how to send data from Timescale to a message queue, check out this link.
3.1 PostgreSQL’s Kafka Connector
One popular method of ingesting data into TimescaleDB is through the use of the PostgreSQL connector with Kafka Connect. The connector is designed to work with Kafka Connect and to be deployed to a Kafka Connect runtime service. Its purpose is to ingest change events from PostgreSQL databases (i.e., TimescaleDB).
The deployed connector will monitor one or more schemas within a TimescaleDB server and write all change events to Kafka topics, which can be independently consumed by one or more clients. Kafka Connect can be run in distributed mode to provide fault tolerance, ensuring the connectors are running and continually keeping up with changes in the database.
TIP: The PostgreSQL connector can also be used as a library without Kafka or Kafka Connect, enabling applications and services to directly connect to TimescaleDB and obtain the ordered change events. This approach requires the application to record the progress of the connector so that upon restart, the connector can continue where it left off. This approach may be useful for less critical use cases. However, for production use cases, it’s recommended that you use this connector with Kafka and Kafka Connect.
To start using the PostgreSQL connector, visit the GitHub page. If you are interested in an alternative method to ingest data from Kafka to TimescaleDB, you can download the StreamSets Data Collector and get started with this tutorial.
4. Migrating from an existing database instance to Managed Service for TimescaleDB
Depending on where your data is currently stored, the steps to migrate it to Timescale cloud are slightly different:
Same database: If you want to set up Timescale in the same database and same PostgreSQL instance as your stored data, follow these instructions.
Different database: If you want to migrate data from a different database or a different PostgreSQL instance altogether, follow these instructions. This process uses the popular utility pg_dump (see the sketch at the end of this section).
One prerequisite you will need for either step is to set up and connect your preferred SQL client such as psql, pgAdmin, DBeaver, etc.
First, you’ll need to use psql (or your preferred client/UI) to connect to the database. If you choose to use psql, first check whether you already have it installed:
psql --version
Next, connecting to MST is as simple as running:
psql -h [HOSTNAME] -p [PORT] -U [USERNAME] -W -d [DATABASENAME]
You can find your service URL from within the cloud portal, on the “Overview” tab.
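Once you can connect, the dump-and-restore itself typically boils down to a pg_dump from the source database followed by a pg_restore into your Managed Service for TimescaleDB service. The sketch below uses placeholder URLs; the exact flags, plus the timescaledb_pre_restore()/timescaledb_post_restore() steps needed around hypertables, are covered in the instructions linked above.

pg_dump -Fc -v -f dump.bak '<old_database_url>'
pg_restore -d '<service_url>' --no-owner -v dump.bak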
5. Migrating from InfluxDB
If you want to migrate data from InfluxDB, follow these instructions to use our tool called "Outflux".
Outflux is an open-source tool for batch migrating data from InfluxDB to TimescaleDB. Anyone who is currently running an InfluxDB instance can migrate their workload to TimescaleDB with a single command: outflux migrate. You must also have TimescaleDB installed and a means to connect to it. With Outflux, users can pipe exported data directly into TimescaleDB, and Outflux manages schema discovery, validation, and creation.
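For example, here is a sketch of migrating a single measurement; the database name benchmark, the measurement cpu, and both connection details are placeholders for your own:

outflux migrate benchmark cpu \
--input-server=http://localhost:8086 \
--output-conn='<service_url>'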