
Data Access for IRIS DB

Data Interfaces

  • Standard SQL interface, JDBC Interface
  • Custom loading/exporting via CLI (Command Line Interface)

Data Format

  • For low traffic: a simple JDBC connection suffices.
  • For high-volume traffic: the bulk loader can load data from files, each holding 1 minute or 5 minutes of traffic.
  • There is no specific restriction or configuration for loading frequency: the developer of the loader decides the size of each bulk-loading chunk.
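
As an illustration of how a loader developer might cut incoming traffic into 1-minute or 5-minute files, the following is a minimal sketch in Python. It is not part of the IRIS toolset: the function names and the file naming scheme are hypothetical, and the window length is assumed to divide 60 evenly.

```python
from collections import defaultdict
from datetime import datetime


def window_start(ts: datetime, minutes: int) -> datetime:
    """Round a timestamp down to the start of its N-minute window.

    Assumes `minutes` divides 60 evenly (for example 1 or 5).
    """
    floored_minute = ts.minute - ts.minute % minutes
    return ts.replace(minute=floored_minute, second=0, microsecond=0)


def group_by_window(records, minutes=5):
    """Group (timestamp, payload) pairs by window start.

    Each group corresponds to one file handed to the bulk loader.
    """
    chunks = defaultdict(list)
    for ts, payload in records:
        chunks[window_start(ts, minutes)].append(payload)
    return dict(chunks)


def chunk_filename(start: datetime, minutes: int) -> str:
    """Build a file name for one chunk (hypothetical naming scheme)."""
    return f"traffic_{start:%Y%m%d_%H%M}_{minutes}m.csv"
```

Each resulting group can then be written to its own file and passed to whichever CLI loader is in use.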

Data Access Mechanism

  • A simple SQL interface is provided for short-running queries.
  • A CLI with a suite of loaders is provided for higher data volumes.
  • Data export is done by CLI commands, which write the result to a file.
  • Standard FTP is used in both directions, for importing and exporting, with the help of the CLI loaders.
  • A simple Oracle/IRIS export tool is provided.
    • Export/import tools can be customized for other DBMSs on request.

Performance Considerations of IRIS DB

Performance Limitations

  • Maximum performance achieved to date (SK Telecom deployment):
    • Number of nodes: 35 (including active/standby masters)
      • Each node: 256 GB RAM, 36 TB of disk, 12 CPU cores, 2 x 1 Gbps network links
      • 1 x 10G switch, 3 racks
    • Daily traffic: 80 billion records per day
  • Performance depends on the number of nodes
    • Per node: 8.6 billion records (about 2.5 TB) inserted per day, at 100% CPU utilization
    • Sizing factors
      • Duplication level, i.e. the number of copies of the data (usually 2)
      • Incoming traffic volume (affects CPU and network capacity)
      • Which kinds of real-time computation are required
        • How many summary operations per minute or per 5 minutes
      • How long the data should be retained in the system (90 days, 6 months, etc.)
        • By default, all data is compressed (ratio 50% to 70%), roughly doubling the available space.
      • Types of incoming queries
        • Time-range selections, or complex joining and filtering queries

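
The sizing factors above can be combined into a rough back-of-the-envelope estimate. The following Python sketch is only an illustration, not an official sizing tool. It assumes that every copy of a record costs one insert on some node, that the compression ratio means the fraction of space saved, and that nodes should run below a target CPU level. The function names and the `target_cpu` and `space_saved` parameters are hypothetical.

```python
import math

# Per-node insert capacity quoted in the text (at 100% CPU utilization).
PER_NODE_RECORDS_PER_DAY = 8.6e9


def nodes_for_ingest(records_per_day, duplication=2, target_cpu=0.5):
    """Estimate data nodes needed to absorb the daily insert load.

    Assumption: each copy of a record is inserted once, so the
    effective load is records_per_day * duplication.
    """
    effective_load = records_per_day * duplication
    return math.ceil(effective_load / (PER_NODE_RECORDS_PER_DAY * target_cpu))


def storage_bytes(records_per_day, bytes_per_record, retention_days,
                  duplication=2, space_saved=0.5):
    """Estimate on-disk bytes after duplication and compression.

    Assumption: space_saved is the fraction removed by compression
    (0.5 means compressed data takes half the raw size).
    """
    raw = records_per_day * bytes_per_record * retention_days * duplication
    return raw * (1 - space_saved)
```

For example, `nodes_for_ingest(80e9, duplication=2, target_cpu=1.0)` returns 19 under these assumptions. Real sizing also depends on the query and summary workload, which this sketch does not model.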
Typical Setup for PoC

  • Depends on the PoC requirements and the hardware available for the PoC
  • A typical configuration would be:
    • 5 nodes (1 master, 4 data)
    • 4 data nodes, each with a 12-core CPU, 64 GB RAM, 2 x 2 TB HDD, and a 1 Gbps network link
    • All packaged in a 4U chassis
    • 1 external Gbps switch with a 10G uplink
    • Accepts 2 billion records per day (roughly 640 GB per day)
  • An on-premise cluster packaged in a 4U chassis is provided for PoC purposes.
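
To relate the daily PoC figures to link speeds, they can be converted to per-second averages. The short Python calculation below is a sketch that assumes 1 GB = 10^9 bytes and a perfectly even load over the day; real traffic has peaks above these averages.

```python
SECONDS_PER_DAY = 86_400


def per_second(daily_total: float) -> float:
    """Convert a per-day total into a per-second average."""
    return daily_total / SECONDS_PER_DAY


records_per_s = per_second(2e9)      # about 23,000 records per second
bytes_per_s = per_second(640e9)      # about 7.4 MB per second
megabits_per_s = bytes_per_s * 8 / 1e6   # about 59 Mbps, cluster-wide
bytes_per_record = 640e9 / 2e9       # 320 bytes per record on average
```

Under these assumptions the average ingest rate is well below a single 1 Gbps link, before duplication traffic between nodes is counted.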

Data Management in IRIS DB

Data Duplication & Recovery

  • Users may choose the duplication level, normally 2. (The Hadoop default is 3.)
  • The copies of each piece of data are stored on different nodes.
  • A disk-level failure (or a server-level failure) does not cause service downtime, as long as the number of failures is within the tolerance given by the duplication level.
  • A disk or server failure is recovered by replacing the hardware and then running data recovery commands.
    • Data recovery is a manual operation, not automatic.
    • The necessary command-line toolset is provided.
    • Recovery time depends on the amount of data affected.
      • Network bandwidth is the deciding factor for recovery time: the internal bandwidth within the cluster is 1 Gbps.
  • Missing data (lost through disk or hardware failure) is identified using the location management table.
    • Location data is stored on the master node.
    • Location data can be rebuilt from scratch from the data nodes, even if the master node has failed completely, although this is time-consuming.
    • The master node is protected by active/standby duplication.
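
Because network bandwidth is the deciding factor, a lower bound on recovery time follows from the amount of data to be copied back. The Python sketch below illustrates the arithmetic; the `efficiency` parameter (the usable fraction of the link) is a hypothetical allowance for protocol overhead and competing traffic, not a measured IRIS value.

```python
def recovery_hours(data_bytes: float, link_gbps: float = 1.0,
                   efficiency: float = 0.7) -> float:
    """Estimate the hours needed to re-copy lost data over one link.

    data_bytes: amount of data to restore (for example one failed disk).
    link_gbps:  internal link speed in gigabits per second.
    efficiency: assumed usable fraction of the link (hypothetical).
    """
    usable_bytes_per_s = link_gbps * 1e9 / 8 * efficiency
    return data_bytes / usable_bytes_per_s / 3600
```

For instance, restoring a full 2 TB disk over a fully usable 1 Gbps link (`recovery_hours(2e12, 1.0, 1.0)`) takes roughly 4.4 hours; bonded links or several source nodes would change the estimate.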

Data Model of IRIS DB

Data Model

  • It is a distributed, sharded/partitioned, shared-nothing database.
  • By default, every table is treated as a potentially big table.
    • A big table is partitioned across the distributed nodes.
    • JOINs between big tables are not allowed.
      • For JOIN-like operations over big tables, we provide a special interface with open-source adaptors for Hadoop and Spark, which enables all kinds of Map/Reduce jobs.
    • A special-purpose ‘global table’ must be declared as such by the user at creation time.
      • ‘Global tables’ are small tables that are duplicated on all data nodes within the cluster.
      • JOIN is allowed between a big table and a global table.
      • Global tables usually contain lookup data for configuration or similar.
  • Users are required to specify partition keys.
  • Apart from its partitioned nature, IRIS-DB is mostly the same as a traditional database.
  • Users may create tables with conventional SQL CREATE commands.
    • A ‘HINT’ is used to specify the partitioning key.
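
The following Python sketch illustrates why a JOIN between a big table and a global table can run locally on each node in a shared-nothing design. It is a conceptual model only, not IRIS code: hash-based placement, the neighbour-node replica rule, and all function names are assumptions made for illustration, and the actual partitioning scheme may differ.

```python
import zlib


def node_for(partition_key: str, num_nodes: int) -> int:
    """Map a partition key to a data node with a stable hash (assumed scheme)."""
    return zlib.crc32(partition_key.encode("utf-8")) % num_nodes


def replica_nodes(partition_key: str, num_nodes: int, duplication: int = 2):
    """Pick `duplication` distinct nodes for one partition (assumed rule)."""
    if duplication > num_nodes:
        raise ValueError("duplication level cannot exceed the number of nodes")
    first = node_for(partition_key, num_nodes)
    return [(first + i) % num_nodes for i in range(duplication)]


def local_join(big_rows, global_table, key):
    """Join one big-table partition with the node's copy of a global table.

    No data leaves the node, because every node holds the full global table.
    """
    return [
        {**row, **global_table[row[key]]}
        for row in big_rows
        if row[key] in global_table
    ]
```

A JOIN between two big tables has no such shortcut, since matching rows may sit on different nodes; that is the case the Hadoop and Spark adaptors are meant for.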

Support for IRIS DB

Documentation & Support

  • Documentation and support are provided by support engineers at Mobigen, Seoul.
    • Support and documentation are currently mostly in Korean.
    • English and other languages require some more work.
    • Specific business conditions need to be identified before support in languages other than Korean is provided.
  • Documents are mostly for developers & system administrators.
    • For developers, a user guide on SQL and tool commands is provided.
    • For administrators, a user guide for system administration & data management is provided.
  • Most support issues recently raised by customers are:
    • Questions about the SQL syntax for specific cases
    • Data management issues (data expiration and storage management)
    • Scale-out planning issues (how many new nodes are required for the expected traffic)
    • Problems related to H/W failures, such as disk failure
      • H/W failures are handled by H/W support engineers from the OEM manufacturer.
      • S/W failures are handled by Mobigen engineers.

Network Issues of IRIS DB

Network Restrictions

  • Normally, a 10 Gbps uplink and a 1 Gbps intra-cluster network are used.
    • If intra-cluster connections need more than 1 Gbps, two 1 Gbps lines are bonded to give 2 Gbps of bandwidth.
  • L2 switches are added as required as the number of data nodes grows.
  • For higher bandwidth, a ‘direct access’ mode is supported.
    • In ‘direct access’ mode, an external client may access not only the master nodes but also the data nodes directly.
    • Direct access is controlled by the CLI command or the client JDBC library.
    • For direct access mode, the data nodes must be visible outside the cluster.
      • This requires their IP addresses to be reachable from outside.
  • From the external view, the whole cluster is seen as a single machine, under the control of the JDBC library or CLI commands.
    • The JDBC client library and CLI commands access the (active) master node first, and access the data nodes if necessary (in direct mode).
    • If the active master node is shut down, the standby node is activated; from the external viewpoint, the downtime is observed as a session close. A reconnection may then lead to a connection to the secondary master. This retry is done by the library.
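
The reconnection behaviour described above is handled inside the client library, so applications do not need to implement it. Purely to illustrate the idea, the Python sketch below tries a list of master addresses in order until one accepts the connection. The `connect` callable, the address names, and the retry parameters are hypothetical and do not reflect the actual library interface.

```python
import time


def connect_with_failover(masters, connect, retries=3, delay_s=1.0):
    """Try each master address in order, for up to `retries` rounds.

    masters: addresses of the active and standby masters (hypothetical).
    connect: callable that opens a session or raises ConnectionError.
    """
    last_error = None
    for attempt in range(retries):
        for address in masters:
            try:
                return connect(address)
            except ConnectionError as exc:
                last_error = exc  # remember the failure, try the next master
        if attempt < retries - 1:
            time.sleep(delay_s)  # wait before the next round
    raise ConnectionError(f"no master reachable: {last_error}")
```

With this pattern, a session closed by a master shutdown is followed by a new session on whichever master is active at reconnection time.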

public/iris-faq_eng.txt · Last modified: 2018/08/10 16:17 by jhnam
