Friday, 7 February 2014

Data Type Mapping Between SQL Server and Teradata

The other day I was loading data from SQL Server into Teradata, so I needed to check the corresponding data types between the two systems. I put together a small comparison table with both data definitions.

I referred to these resources: 
  • MS SQL Server data types / http://www.w3schools.com/sql/sql_datatypes.asp 
  • Teradata data types
    • http://mant.upol.cz/sashtml/accdb/z1250040.htm 
    • https://www.aquaclusters.com/app/home/project/public/aquadatastudio/wikibook/Teradata-1-34/page/6/Data-Types

Data Type Conversion from SQL Server to Teradata

nvarchar => nvarchar

  • nvarchar [SQL] – Variable width Unicode string. Maximum 4,000 characters.
  • VARCHAR(n) [TD] specifies a varying-length column of length n for character string data. The maximum for n is 64,000. VARCHAR is also known as CHARACTER VARYING.

nvarchar(max) => varchar(4000)

  • nvarchar(max) [SQL] Variable width Unicode string. Maximum 536,870,912 characters 
  • VARCHAR(4000) [TD]


tinyint => byteint

  • tinyint [SQL] Allows whole numbers from 0 to 255. 1 byte.
  • BYTEINT [TD] specifies a single-byte signed binary integer. Values can range from -128 to +127.

smallint => smallint

  • smallint [SQL] Allows whole numbers between -32,768 and 32,767. 2 bytes.
  • SMALLINT [TD] specifies a small integer. Values can range from -32,768 through +32,767.

int => int / integer

  • int [SQL] Allows whole numbers between -2,147,483,648 and 2,147,483,647. 4 bytes.
  • INT / INTEGER [TD] specifies a large integer. Values can range from -2,147,483,648 through +2,147,483,647.

numeric => decimal

  • numeric(18, 3) [SQL] Fixed precision and scale numbers.
    Allows numbers from -10^38 + 1 to 10^38 - 1.
    The p parameter indicates the maximum total number of digits that can be stored (both to the left and to the right of the decimal point). p must be a value from 1 to 38. Default is 18.
    The s parameter indicates the maximum number of digits stored to the right of the decimal point. s must be a value from 0 to p. Default value is 0.
  • DECIMAL(n,m) or NUMERIC(n,m) [TD] specifies a packed-decimal number. n is the total number of digits (precision). m is the number of digits to the right of the decimal point (scale). The range for precision is 1 through 18. The range for scale is 0 through n. If m is omitted, 0 is assigned and n can also be omitted. Omitting both n and m results in the default DECIMAL(5,0). DECIMAL is also known as NUMERIC.
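
For example, a numeric(18, 3) column can keep the same precision and scale on the Teradata side, since 18 digits is within the DECIMAL range described above. The table and column names below are made up for illustration:

    -- SQL Server (source)
    CREATE TABLE dbo.sales_src (
        amount numeric(18, 3)   -- up to 18 digits, 3 of them after the decimal point
    );

    -- Teradata (target)
    CREATE TABLE sales_tgt (
        amount DECIMAL(18, 3)   -- same precision and scale
    );

According to the reference quoted above, precision tops out at 18, so a SQL Server column declared with a higher precision would need extra attention.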

ntext => long varchar

  • ntext [SQL] Variable width Unicode string. Maximum 2 GB of text data.
  • LONG VARCHAR [TD] (mapping not verified)


bigint => bigint

  • bigint [SQL] Allows whole numbers between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807
  • BIGINT [TD] -9,223,372,036,854,775,808 <= BIGINT <= 9,223,372,036,854,775,807

smalldatetime => timestamp

  • smalldatetime [SQL] From January 1, 1900 to June 6, 2079 with an accuracy of 1 minute. 4 bytes.
  • TIMESTAMP [TD] (I hope)

datetime2 => timestamp / http://wiki.ispirer.com/sqlways/sql-server/data-types/datetime2
datetime => timestamp / http://wiki.ispirer.com/sqlways/sql-server/data-types/datetime
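
Putting the whole list together, a table using these types might be re-declared in Teradata roughly as follows. This is only a sketch: the table and column names are made up, the CHARACTER SET UNICODE clause and the TIMESTAMP precisions are my own assumptions rather than part of the mappings above, and very long nvarchar(max)/ntext values will not fit into VARCHAR(4000)/LONG VARCHAR without truncation.

    -- SQL Server (source)
    CREATE TABLE dbo.customer_src (
        id        bigint,
        status    tinyint,
        qty       smallint,
        batch_no  int,
        name      nvarchar(200),
        note      nvarchar(max),
        descr     ntext,
        amount    numeric(18, 3),
        created   smalldatetime,
        modified  datetime2
    );

    -- Teradata (target), following the mappings above
    CREATE TABLE customer_tgt (
        id        BIGINT,
        status    BYTEINT,
        qty       SMALLINT,
        batch_no  INTEGER,
        name      VARCHAR(200) CHARACTER SET UNICODE,   -- assumption: keep Unicode data
        note      VARCHAR(4000) CHARACTER SET UNICODE,  -- nvarchar(max) cut down to 4,000 characters
        descr     LONG VARCHAR,                         -- ntext, per the note above
        amount    DECIMAL(18, 3),
        created   TIMESTAMP(0),                         -- smalldatetime has 1-minute accuracy
        modified  TIMESTAMP(6)                          -- datetime2 keeps fractional seconds
    );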






Friday, 3 January 2014

MySQL data insert test

Recently I needed to duplicate a huge portion of data (about 8 GB) in MySQL for testing purposes, so I was wondering how to do it in the fastest possible way using the mysqldump utility.
In mysqldump there are two ways to export data (illustrated just below):
  • normal inserts = one INSERT command contains only one record
  • extended inserts = one INSERT command combines many records (the default mysqldump behaviour)
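
To make the difference concrete, the dump of a hypothetical table t looks roughly like this in the two modes (real dumps also contain LOCK TABLES and other statements, omitted here):

    -- normal inserts: one record per INSERT command
    INSERT INTO `t` VALUES (1,'a');
    INSERT INTO `t` VALUES (2,'b');
    INSERT INTO `t` VALUES (3,'c');

    -- extended inserts: many records combined into one INSERT command
    INSERT INTO `t` VALUES (1,'a'),(2,'b'),(3,'c');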
I tested importing 2M records (approx. 800 MB) and here are the results:

Normal INSERTs

  • Export
    • /usr/local/mysql/bin/mysqldump --extended-insert=FALSE --user=user --password=password -P3906 -hserver db table | gzip -9 -c > dump_table_no-extended.gz
  • Import
    • zcat dump_table_no-extended.gz | /usr/local/mysql/bin/mysql -h server --user=user --password=password -P3906 db
  • Result
    • 19 min

Extended INSERTs

  • Export
    • /usr/local/mysql/bin/mysqldump --extended-insert --user=user --password=password -P3906 -hserver db table | gzip -9 -c > dump_table_extended.gz
  • Import
    • zcat dump_table_extended.gz | /usr/local/mysql/bin/mysql -h server --user=user --password=password -P3906 db
  • Result
    • 1:26 min

Disable keys

Further, I tested whether there is any improvement when keys are disabled during the import, since MySQL rebuilds the index tree every time it inserts a new record.

Extended INSERTs + DISABLE KEYS

  • Export
    • /usr/local/mysql/bin/mysqldump --extended-insert --no-data --user=user --password=password -P3906 -hserver db table | gzip -9 -c > dump_table_extended-structure.gz
    • /usr/local/mysql/bin/mysqldump --extended-insert --no-create-info --user=user --password=password -P3906 -hserver db table | gzip -9 -c > dump_table_extended-data.gz
  • Import
    • zcat dump_table_extended-structure.gz | /usr/local/mysql/bin/mysql -h server --user=user --password=password -P3906 db
    • ALTER TABLE table DISABLE KEYS;
    • zcat dump_table_extended-data.gz | /usr/local/mysql/bin/mysql -h server --user=user --password=password -P3906 db
    • ALTER TABLE table ENABLE KEYS;
  • Result
    • 1:20 min

Note

By default, mysqldump generates only one INSERT command per table, resulting in one (very long) line of inserted data for each table that gets dumped. So I wondered whether it is OK to import a huge data set using extended INSERTs, because of possible memory issues. Fortunately, mysqldump has a limit on the size of one INSERT, set by the net_buffer_length option, which is 16,384 bytes. This ensures that a big table is actually dumped into many extended INSERT commands.
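
If you want to check the related limits yourself, the server-side values can be read from the mysql console. Note that this shows the server variables, while mysqldump has its own --net_buffer_length option, so treat this only as a rough sanity check:

    -- server-side communication buffer (default 16384 bytes)
    SHOW VARIABLES LIKE 'net_buffer_length';

    -- hard upper limit for the size of a single statement the server accepts
    SHOW VARIABLES LIKE 'max_allowed_packet';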

Conclusion

The best and simplest way is to use the default mysqldump settings, i.e. with extended inserts.
Further, I found that it is better to uncompress the file, open the mysql console and use "source dump_table.sql". It's faster than using "zcat dump_table.gz | mysql .....".