Custom loading/exporting via CLI (Command Line Interface)
For low-traffic cases, a simple JDBC connection suffices.
For high-volume traffic, the bulk loader handles data staged in files (e.g., 1-minute or 5-minute batches of traffic).
There are no specific restrictions or configuration for frequency – the size of the data chunk for bulk loading is decided by the developer of the loader.
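Since the chunk size is left to the loader developer, the staging step can be sketched as below. This is a minimal, hypothetical illustration (the function name, file naming, and CSV format are assumptions, not IRIS conventions): incoming records are written into fixed-size files that a bulk loader would then pick up.

```python
# Minimal sketch: staging incoming records into fixed-size chunk files
# for a bulk loader. Chunk size and file naming are illustrative choices
# made by the loader developer, not IRIS defaults.
import os

def stage_chunks(records, chunk_size, out_dir):
    """Write records into numbered chunk files of at most chunk_size rows."""
    paths = []
    for i in range(0, len(records), chunk_size):
        path = os.path.join(out_dir, f"chunk_{i // chunk_size:05d}.csv")
        with open(path, "w") as f:
            f.write("\n".join(records[i:i + chunk_size]) + "\n")
        paths.append(path)
    return paths
```

Each resulting file would then be handed to the CLI bulk loader in a separate invocation.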
Data Access Mechanism
A simple SQL interface is provided for short-running queries.
A CLI with a suite of loaders is provided for higher data volumes.
Data export is done by CLI commands, producing file-based results.
Standard FTP is used in both directions (import and export), together with the CLI loaders.
A simple Oracle/IRIS export tool is provided.
Export/import tools can be customized for other DBMSs on request.
Performance Consideration of IRIS DB
Maximum performance achieved to date (SK Telecom deployment):
Number of nodes – 35 nodes (including active/standby masters)
Each node: 256 GB RAM, 36 TB of disk, 12-core CPU, 2 x 1 Gbps NICs
1 x 10G switch, 3 racks
Daily traffic: 80 billion records per day
Performance depends on the number of nodes.
Per node – inserting 8.6 billion records (or 2.5 TB) per day (at 100% CPU utilization)
Duplication level (usually 2)
Incoming traffic volume (affects CPU & network capacity)
Which kinds of real-time computation are required
How many summary operations per minute / 5 minutes
How long the data should be retained in the system (90 days, 6 months, etc.)
By default, all data is compressed (ratio 50%~70%, roughly doubling the available space).
Types of incoming queries
Time-ranged selections with complex joins & filtering
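The sizing factors above can be combined into a back-of-envelope calculation. The sketch below uses only figures quoted in this section (8.6 billion records or 2.5 TB per node per day, duplication level 2, ~50% compression); the helper names and the assumption that ingest capacity scales linearly with node count are mine, not IRIS documentation.

```python
# Back-of-envelope sizing sketch using the figures quoted above.
import math

PER_NODE_RECORDS_PER_DAY = 8.6e9   # quoted per-node insert capacity
DUPLICATION = 2                    # each record is stored twice

def data_nodes_needed(records_per_day):
    # Every record is written DUPLICATION times across the cluster,
    # so total write load is records_per_day * DUPLICATION.
    return math.ceil(records_per_day * DUPLICATION / PER_NODE_RECORDS_PER_DAY)

def storage_needed_tb(daily_tb, retention_days, compression_ratio=0.5):
    # Compressed, duplicated footprint for the whole retention window.
    return daily_tb * retention_days * DUPLICATION * compression_ratio
```

For example, 80 billion records per day needs about 19 data nodes of pure ingest capacity on this model, and retaining 2.5 TB/day for 90 days needs about 225 TB of disk.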
Typical Setup for PoC
Depends on the PoC requirements and the hardware available for the PoC.
A typical configuration would be:
5 nodes (1 master, 4 data)
4 data nodes, each with a 12-core CPU, 64 GB RAM, 2 x 2 TB HDD, 1 Gbps network
All packaged in a 4U chassis
1 external 1 Gbps switch with a 10G uplink
Accepting 2 billion records per day (roughly 640 GB per day)
An on-premise cluster packaged in a 4U chassis is provided for PoC purposes.
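The PoC numbers above are internally consistent, which is worth checking when adapting them: 2 billion records at ~640 GB/day implies about 320 bytes per record, and with duplication 2 and ~50% compression the 4 data nodes (2 x 2 TB HDD each) hold roughly 25 days of data. All constants below come from the configuration above; only the arithmetic is added.

```python
# Sanity-check sketch for the PoC figures above.
RECORDS_PER_DAY = 2e9
RAW_GB_PER_DAY = 640

bytes_per_record = RAW_GB_PER_DAY * 1e9 / RECORDS_PER_DAY  # ~320 bytes/record
stored_gb_per_day = RAW_GB_PER_DAY * 2 * 0.5   # duplication 2, ~50% compression
total_capacity_gb = 4 * 2 * 2000               # 4 data nodes x 2 x 2 TB HDD
retention_days = total_capacity_gb / stored_gb_per_day
```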
Data Management in IRIS DB
Data Duplication & Recovery
Users may choose the duplication level, normally 2 (the Hadoop default is 3).
All replicas are stored on different nodes.
A disk-level failure (or even a server-level failure) does not cause service downtime, as long as the failure is within the duplication tolerance.
Disk/server failures are recovered by H/W replacement plus additional data-recovery commands.
Data recovery is a manual operation, not automatic.
The necessary command-line toolset is provided.
Recovery time depends on the amount of data affected.
Network bandwidth is the deciding factor for recovery time – 1 Gbps internal bandwidth within the cluster.
Missing data (lost through a disk/H/W failure) is identified via the location management table.
Location data is stored on the master node.
Location data can be rebuilt from scratch from the data nodes (even if the master node fails completely – though this is time-consuming).
The master node is secured by active/stand-by duplication.
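Since recovery is bandwidth-bound on the 1 Gbps inner-cluster link, a rough recovery-time estimate can be sketched as below. The 80% link-efficiency factor is my assumption for illustration; real recovery time also depends on disk I/O and CPU load, as the section above notes.

```python
# Rough recovery-time sketch: re-copying the replicas lost with a failed
# disk is limited by the 1 Gbps inner-cluster link. The efficiency factor
# is an illustrative assumption, not an IRIS figure.
def recovery_hours(lost_tb, link_gbps=1.0, efficiency=0.8):
    # link_gbps / 8 -> GB/s, x 3600 -> GB/hour, derated by efficiency
    gb_per_hour = link_gbps / 8 * 3600 * efficiency
    return lost_tb * 1000 / gb_per_hour
```

On this model, re-replicating 2 TB over a single 1 Gbps link takes around five and a half hours.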
Data Model of IRIS DB
It is a distributed, sharded/partitioned, shared-nothing database.
By default, all tables are potentially regarded as big tables.
A big table is partitioned across the distributed nodes.
JOINs between big tables are not allowed.
For JOIN-like operations over big tables, a special interface with open-source adaptors for Hadoop & Spark is provided, enabling any kind of Map/Reduce job.
The special-purpose ‘global table’ requires user specification at creation time.
‘Global tables’ are small tables duplicated on all data nodes within the cluster.
A JOIN is allowed between a big table and a global table.
Global tables usually contain lookup data, configuration, or similar.
Users are required to specify the ‘partition key(s)’.
IRIS-DB is mostly the same as a traditional database, apart from its partitioned nature.
Users may create tables with conventional CREATE SQL commands,
with a ‘HINT’ as the partitioning-key specification.
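The shape of such a statement can be sketched as below. The exact IRIS HINT syntax is product-specific and not given in this document, so the `/*+ PARTITION KEY(...) */` comment form here is a placeholder assumption, as are the table and column names; only the idea (a conventional CREATE plus a partition-key hint) comes from the text above.

```python
# Hypothetical sketch: a conventional CREATE statement carrying a
# partition-key hint. The '/*+ PARTITION KEY(...) */' form is an
# illustrative placeholder, NOT confirmed IRIS syntax.
def create_big_table_sql(table, columns, partition_key):
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    return (f"CREATE TABLE {table} ({cols}) "
            f"/*+ PARTITION KEY({partition_key}) */")

sql = create_big_table_sql(
    "cdr", [("call_id", "BIGINT"), ("ts", "TIMESTAMP")], "call_id")
```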
Support for IRIS DB
Documentation & Support
Documentation & support are handled by support engineers at Mobigen in Seoul.
Support and documentation are currently mostly in Korean.
English or other languages will require some additional work.
Specific business conditions need to be identified before support in languages other than Korean can be provided.
Documents are mostly for developers & system administrators.
For developers, a user guide on SQL and tool commands is provided.
For administrators, a user guide for system administration & data management is provided.
Most support issues recently raised by customers are:
Questions about specific SQL syntax for specific cases
Data management issues (data expiration & storage management)
Scale-out planning issues (how many new nodes are required for the expected traffic)
Troubles related to H/W failures – mainly DISK failures
H/W failures are handled by H/W support engineers from the OEM manufacturer.
S/W failures are handled by Mobigen engineers.
Network Issues of IRIS DB
Normally a 10 Gbps uplink and a 1 Gbps inner-cluster network are used.
If inner connections wider than 1 Gbps are required, two 1 Gbps lines are bonded to make 2 Gbps of bandwidth.
An L2 switch is added as needed as the number of data nodes grows.
For higher bandwidth, ‘direct access’ is supported.
In ‘direct access mode’, an external client may access not only the master node but also the data nodes directly.
Direct access is controlled by the CLI commands or the client JDBC library.
For direct access mode, the data nodes must be visible outside the cluster.
This requires their IP addresses to be reachable from outside.
From the external view, the whole cluster is regarded as a single machine, under the control of the JDBC library or CLI commands.
The JDBC client library and CLI commands access the (active) master node first, and access the data nodes if necessary (in direct mode).
If the active master node shuts down, the stand-by node is activated; from the external viewpoint the downtime is observed as a session close. Reconnecting may lead to a connection to the secondary master. This retry is performed by the library.
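The client-side retry described above can be sketched as follows. This is not the actual IRIS JDBC library logic (which is not shown in this document); `connect` here is a stand-in callable for the real connection routine, and the retry shape (try each known master in turn, treating a dropped session as "try the next") is an assumption based on the behaviour described above.

```python
# Sketch of the failover retry described above: try the active master
# first; if the session drops, reconnect, possibly landing on the
# stand-by (now active) master. `connect` is a stand-in for the real
# JDBC/CLI connection call.
def connect_with_failover(masters, connect, rounds=2):
    """Try each master in turn; a ConnectionError means 'try the next'."""
    last_err = None
    for _ in range(rounds):
        for host in masters:
            try:
                return connect(host)
            except ConnectionError as err:
                last_err = err
    raise last_err
```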
public/iris-faq_eng.txt · Last modified: 2018/08/10 16:17 by jhnam