JDBC replicating tool from scratch

I had simple task to perform — I need to replicate DB from one server to other.

Quite simple, as I thought :)

Let’s take some tool, which can handle all actions I need. And which can be configured as I need.

But, really, I don’t need at #slave all tables. I have to replicate only few tables.  Actually, I need to replicate part of data from #master tables, according to specific requirements.  And in one table I need replication to do only insert but not modify/delete records. And table names in #master and #slave not necessarily must be equals. Oh, almost forget, #master is PostgreSQL, and #slave is MySQL. And #slave in near future could be replaced by either PostgreSQL or Oracle.

Well, I was in need of pretty smart replication tool, which have to be able to do all tasks.  No tool available in world (which I searched) doesn’t met all of the requirements. So, I had to write one.

1. Technology. I’ve chose plain JDBC, because of good support by DB makers, and because my tool doesn’t really care which exactly type is there Database. I can easily replicate from PostgreSQL,MySQL,Oracle, etc. to any of these without changing single line in code, only change meta-info in configuration file.

2. Caveat: MySQL column names.  My config file contains info about type of DB, mainly because of MySQL. In PostgreSQL I can write such queries:

SELECT COST FROM SOME_TABLE;

Without worrying that cost is reserved word. But in MySQL, I should escape column names:

INSERT INTO `COST` …

3. Caveat: MySQL Timestamp which can be ‘0000-00-00 00:00:00′. Normal bases and programs knew sane dates, NULL dates. MySQL knew one more — it’s Jesus birthday I assume, ‘0000-00-00 00:00:00′. This is crazy and should be avoided.  For Java application, add such string to connection URL:

?zeroDateTimeBehavior=convertToNull

This caveat contains one more bug, even with this fix. Select from table with timestamps returns all rows, for example 5 of them. 3 of them contains valid timestamps (2014-…), and 2 more contains zero dates, which converts to NULL.  Most horrible thing is count(*) returns 3, not 5 :(

I hope this behavior will get fix soon.

4. Caveat: MySQL timestamps ot like to be NULL. When table created, if not specified, Timestamp fields get default properties: Not Null, default value current_timestamp for first, and zero timestamp to all others timestamps. If you want to replicate to MySQL, you should mark all timestamps as nullable (of course, if your application required this).

5. Caveat:  Boolean mapping in MySQL. It’s mapped only to Tinyint(1). If you create tinyint without size, you’ll get Tinyint(4), which is not treated as Boolean.

6. Features: JDBC allowed me create simple ways to replicate data, even not knowing column types, sizes of tables. Of course, order and types of tables must be synced while #slave will be created. But anyway, it’s pretty flexible:

preparedStatement = connection.prepareStatement(«select * from » + name);
preparedStatement.execute();

ResultSetMetaData setMetaData = resultSet.getMetaData();

PreparedStatement statement = connection2.prepareStatement(«Insert into » + name + »  values (?,?,?,?,?)»);

/* count of question marks = setMetaData.getColumnCount() */

resultSet = preparedStatement.getResultSet();

while (resultSet.next())
{

for (int i = 1; i <= setMetaData.getColumnCount(); i++)
{
Object o = resultSet.getObject(i);

statement.setObject(i, o); // yes, that simple.
}

With JDBC you don’t have to worry about any types of SQL Injections — they are simply not working. More, you don’t have to bother escaping string, because you can use code statement.setObject(1, «Any » / \ ‘ \ » ‘.;'; line»);

Of course real replication tool not as trivial as example, it checked for exists data, deleted from master data, need for update, additional conditions.

Anyway, JDBC provides very good and solid ways to interchange data between any types of databases.