HBase

Architecture

Overview

  • HBase is a type of NoSQL database.
  • HBase is a distributed database.
  • Strictly speaking, HBase is more a "data store" than a "database", because it lacks many RDBMS features such as typed columns, secondary indexes, triggers, and advanced query languages.
  • HBase provides many features that support linear and modular scaling.
  • Strongly consistent reads/writes: HBase is not an "eventually consistent" data store. This makes it well suited to tasks such as high-speed counter aggregation.
  • Automatic sharding: HBase tables are distributed across the cluster as regions, and regions are automatically split and redistributed as the data grows.
  • Automatic RegionServer failover.
  • Hadoop/HDFS integration: HBase supports HDFS as its distributed file system.
  • MapReduce: HBase supports massively parallelized processing via MapReduce, with HBase used as both source and sink.
  • Java Client API: HBase provides an easy-to-use Java API for programmatic access.
  • Thrift/REST API: HBase supports Thrift and REST for non-Java front ends.
  • Block Cache and Bloom Filters: HBase supports a block cache and Bloom filters for high-volume query optimization.
  • Operational Management: in addition to JMX metrics, HBase provides built-in web pages for operational insight.

Catalog Tables

The catalog table hbase:meta exists as an HBase table and is filtered out of the HBase shell's list command. It is nevertheless a table just like any other.

hbase:meta

The hbase:meta table keeps a list of all regions in the system, and the location of hbase:meta is stored in ZooKeeper.

Key
  • Region key of the format ([table],[region start key],[region id])
Value
  • info:regioninfo: the serialized HRegionInfo instance for this region
  • info:server: the server:port of the RegionServer hosting this region
  • info:serverstartcode: the start time of the RegionServer process hosting this region
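
To make the role of the start key in the meta key concrete, here is a minimal in-memory sketch of a region lookup. It is not the HBase client implementation: the class `MetaLookupSketch`, the server names, and the use of Java strings instead of byte arrays are all assumptions made for illustration, and the model covers the regions of a single table only.

```java
import java.util.TreeMap;

// Simplified stand-in for the hbase:meta entries of ONE table (hypothetical model).
class MetaLookupSketch {
    // region start key -> "host:port" of the hosting RegionServer (made-up values)
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String server) {
        regionsByStartKey.put(startKey, server);
    }

    // The region responsible for a row is the one with the greatest start key <= row key.
    String locate(String rowKey) {
        return regionsByStartKey.floorEntry(rowKey).getValue();
    }

    // Three regions: ["", "g"), ["g", "p"), ["p", end of table)
    static MetaLookupSketch example() {
        MetaLookupSketch meta = new MetaLookupSketch();
        meta.addRegion("", "rs1.example.com:16020"); // first region has an empty start key
        meta.addRegion("g", "rs2.example.com:16020");
        meta.addRegion("p", "rs3.example.com:16020");
        return meta;
    }
}
```

With the example regions, `locate("hbase")` resolves to the region starting at "g", because "g" is the greatest start key that does not sort after "hbase".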

Data Model

Data is stored in tables, which have rows and columns. The terminology overlaps with RDBMS, but the analogy is not an exact one. It is more helpful to think of an HBase table as a multi-dimensional map.

Table

A table consists of multiple rows.

Column

A column consists of a column family and a column qualifier, delimited by a colon (:).

Column Family

A column family physically colocates a set of columns and their values, often for performance reasons. Each column family has a set of storage properties, such as whether its values should be cached in memory, how its data is compressed, and how its row keys are encoded.

Column Qualifier

A column qualifier is added to a column family to provide the index for a given piece of data. Given a column family "content", a column qualifier might be "content:html" or "content:pdf". Although column families are fixed at table creation, column qualifiers are mutable and may differ greatly between rows.

Cell

A cell is the combination of row, column family, and column qualifier. It contains a value and a timestamp, and the timestamp represents the value's version.

  • Row key -> Column Family -> Column -> Version: Value
{row, column, version}
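
The "multi-dimensional map" view can be sketched with nested sorted maps. This is a toy model under stated assumptions, not how HBase stores data: the class `CellMapSketch` is hypothetical, keys are Java strings rather than byte arrays, and a column is written as a single "family:qualifier" string.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Toy model: row key -> "family:qualifier" -> timestamp (newest first) -> value.
class CellMapSketch {
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, TreeMap<Long, String>>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // All stored versions of one column in one row, or null if there are none.
    private TreeMap<Long, String> versionsOf(String row, String column) {
        TreeMap<String, TreeMap<Long, String>> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }

    // Newest version of the cell, or null if the cell does not exist.
    String getLatest(String row, String column) {
        TreeMap<Long, String> versions = versionsOf(row, column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Value stored at exactly {row, column, version}, or null.
    String get(String row, String column, long timestamp) {
        TreeMap<Long, String> versions = versionsOf(row, column);
        return versions == null ? null : versions.get(timestamp);
    }
}
```

Each `put` with a new timestamp adds a version instead of overwriting, which is why a full cell address needs all three coordinates.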

Timestamp

A timestamp is written alongside each value and is the identifier for a given version of that value. By default, the timestamp is the time on the RegionServer when the data was written, but you can specify a different timestamp when you put data into the cell.

Row Key

A row key is an uninterpreted sequence of bytes, so it does not have to be human-readable.

Row

Reads and writes on a single row are atomic. Rows are sorted lexicographically by row key, with the lowest key appearing first in the table. The empty byte array denotes both the start and the end of a table's key space. A table is dynamically partitioned into ranges of row keys (called tablets in Bigtable and regions in HBase).
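
Byte-wise lexicographic order is not numeric order, which matters when numbers appear in row keys. The following sketch compares keys as unsigned bytes using the standard library (`Arrays.compareUnsigned` needs Java 9 or later). The class `RowKeyOrderSketch` is hypothetical, and HBase uses its own byte comparator rather than this code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RowKeyOrderSketch {
    // Compares two keys byte by byte as unsigned values; on a shared prefix the shorter key sorts first.
    static int compare(String a, String b) {
        return Arrays.compareUnsigned(
                a.getBytes(StandardCharsets.UTF_8),
                b.getBytes(StandardCharsets.UTF_8));
    }

    // Returns a sorted copy of the given keys in byte-wise lexicographic order.
    static String[] sorted(String... keys) {
        String[] copy = keys.clone();
        Arrays.sort(copy, RowKeyOrderSketch::compare);
        return copy;
    }
}
```

Sorting "row2", "row10", "row1" this way yields row1, row10, row2. Zero-padding the numeric part (row01, row02, row10) is one common way to make the byte order match the numeric order.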

Conceptual View

| Row Key | Time Stamp | ColumnFamily contents | ColumnFamily anchor | ColumnFamily people |
|---|---|---|---|---|
| "com.cnn.www" | t9 | | anchor:cnnsi.com = "CNN" | |
| "com.cnn.www" | t8 | | anchor:my.look.ca = "CNN.com" | |
| "com.cnn.www" | t6 | contents:html = "…" | | |
| "com.cnn.www" | t5 | contents:html = "…" | | |
| "com.cnn.www" | t3 | contents:html = "…" | | |
| "com.example.www" | t5 | contents:html = "…" | | people:author = "John Doe" |

Physical View

| Row Key | Time Stamp | ColumnFamily anchor |
|---|---|---|
| "com.cnn.www" | t9 | anchor:cnnsi.com = "CNN" |
| "com.cnn.www" | t8 | anchor:my.look.ca = "CNN.com" |

| Row Key | Time Stamp | ColumnFamily contents |
|---|---|---|
| "com.cnn.www" | t6 | contents:html = "…" |
| "com.cnn.www" | t5 | contents:html = "…" |
| "com.cnn.www" | t3 | contents:html = "…" |

Namespace

A namespace is a logical grouping of tables, similar to a database in an RDBMS. This abstraction lays the groundwork for the following multi-tenancy features.

  • Quota Management: restricts the amount of resources a namespace can consume.
  • Namespace Security Administration: provides another level of security administration for tenants.
  • Region server groups: a namespace or table can be pinned to a subset of RegionServers, which guarantees a coarse level of isolation.

Data Model Operations

  • The four primary data model operations are Get, Put, Scan, and Delete. Operations are applied through Table instances.
  • Row-level operations that modify data are guaranteed to be atomic.
  • Typically, a Table instance is created only once, when the application starts. (In current client versions the object to create once and share is the Connection; Table instances obtained from it are lightweight.)

Put Method

Single Puts
  • To create a Put instance, you must provide a row key.
  • A row in HBase is identified by a unique row key, and the row key is stored as a Java byte array.
  • Use add() to add a column (addColumn() in newer client versions).
  • Use has() to check whether a specific cell exists.
  • To access the configuration files from client code, use the HBaseConfiguration class.
KeyValue class
  • Represents all the information for one specific cell rather than for a whole row.
  • Like a coordinate system, the row key, column (family and qualifier), and timestamp point to one location in a three-dimensional space.
  • Mainly used for comparing, validating, and copying key data.
  • Uses byte arrays to minimize storage space, store data efficiently, and allow fast data operations.
List of puts
  • Batches several operations together and processes them in one call.
  • With list-based puts, you cannot control the order in which the operations are applied on the server side.
  • If the insertion order must be guaranteed, split the list into smaller batches and explicitly flush the write cache (the client-side write buffer) after each one.

Get Method

Single gets
  • A Get targets one specific row, but within that row there is no restriction on which columns or cells are retrieved.
  • The number of versions defaults to 1, so only the latest value is returned. It can be changed with setMaxVersions() (readVersions() in newer client versions).
Result class
  • When data is read with get(), the call returns an instance of the Result class containing all cells that match the Get criteria.
  • Result provides access to every column family, column qualifier, and timestamp returned by the server.
List of Gets
  • Used to request multiple rows in a single call.
  • The call either returns an array of Result instances with the same size as the list of Get instances, or throws an exception. There is no partial result.
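
The default of one version per read, and the effect of raising the limit, can be illustrated with a small model of a single cell. This is only a sketch of the behavior described above: `VersionReadSketch` and its methods are hypothetical and do not call the HBase client.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Toy model of one cell's version history (hypothetical, not the HBase API).
class VersionReadSketch {
    // timestamp -> value, iterated newest first
    private final TreeMap<Long, String> versions = new TreeMap<Long, String>(Comparator.reverseOrder());

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Returns up to maxVersions values, newest first.
    List<String> read(int maxVersions) {
        List<String> out = new ArrayList<>();
        for (String value : versions.values()) {
            if (out.size() == maxVersions) break;
            out.add(value);
        }
        return out;
    }

    // Mirrors the default described above: only the latest version.
    List<String> read() {
        return read(1);
    }
}
```

Asking for more versions than exist simply returns all stored versions.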

Delete Method

Single Deletes
  • To create a Delete instance, you must provide the row key of the row to delete.
  • In older client versions, an optional RowLock parameter lets you supply your own lock when you want to modify the same row more than once.
  • You can delete an entire family together with all of its columns, and you can specify a timestamp.
List of Deletes
  • Works similarly to a list of puts.
  • Note that the order in which data is deleted on the remote servers cannot be guaranteed.
  • When Table.delete(deletes) runs, the operations that failed remain in deletes. Handle exceptions with a try/catch block.

HBase Schema Design

  • A table has only one index: the row key.
  • Rows are sorted lexicographically by row key.
  • All row-level operations are atomic.
  • Reads and writes should be evenly distributed across the key space.
  • Typically, a single row holds all the information about an entity.
  • Related entities are stored in adjacent rows.
  • Empty columns take up no space, so it is fine for a very large number of columns to be empty in most rows.

Rowkey Design

Hotspotting

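  • Salting: insert a randomly assigned prefix at the front of the row key.
  • Hashing: useful when several values are combined into one row key, and yields a value of predictable length.
  • Reversing the Key: for keys such as domain names, reversing the components (for example, com.cnn.www) places data from similar rows next to each other and improves compression. Reversing a fixed-width or numeric key so that its fastest-changing part comes first spreads writes, at the cost of row ordering.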
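
A minimal sketch of two of these techniques follows. Note the swap: the list above describes a random salt, while this sketch derives the bucket prefix from the key's hash so that a reader can recompute it, which puts it closer to the hashing approach. The class `RowKeySpreadSketch`, the two-digit prefix format, and the bucket count are assumptions for illustration.

```java
import java.util.Locale;

class RowKeySpreadSketch {
    // Prefixes the key with a bucket number derived from its hash (deterministic, so reads can recompute it).
    static String salt(String rowKey, int buckets) {
        int bucket = Math.floorMod(rowKey.hashCode(), buckets);
        return String.format(Locale.ROOT, "%02d-%s", bucket, rowKey);
    }

    // Reverses the key so that its last (fastest-changing) characters come first.
    static String reverse(String rowKey) {
        return new StringBuilder(rowKey).reverse().toString();
    }
}
```

Sequential keys such as 20240131 and 20240132 become 13104202 and 23104202 after reversal, so they no longer share a long common prefix. The trade-off in both cases is that range scans over the original key order become harder.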

Monotonically Increasing Row Keys/Timeseries Data

If keys are assigned in monotonically increasing order, most of the traffic can end up concentrated on a limited number of nodes, because the newest users tend to be the most active. In that situation, consider using reversed IDs so that the data is distributed across all nodes.

Try to minimize row and column sizes

  • Column Families: keep names as short as possible (for example, d instead of data).
  • Attributes: although myVeryImportantAttribute is easier to read, it is better to store a short name such as via.
  • Rowkey Length: keep keys as short as is reasonable (consider the trade-offs).
  • Byte Patterns: a long takes 8 bytes, but storing the same number as a string can take nearly 3x as many bytes.
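
The size difference in the Byte Patterns item can be checked with the standard library alone. The class `LongEncodingSketch` is a hypothetical helper; an HBase client would normally use its own byte utilities for the binary form.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class LongEncodingSketch {
    // Fixed-width binary form: always 8 bytes, big-endian.
    static byte[] asBinary(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Decimal text form: one byte per digit (plus one for a minus sign).
    static byte[] asText(long value) {
        return Long.toString(value).getBytes(StandardCharsets.UTF_8);
    }
}
```

For Long.MAX_VALUE, the binary form is 8 bytes and the decimal text form is 19 bytes.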

Reverse Timestamps

A common problem in database processing is quickly finding the most recent version of a value. Using a reversed timestamp as part of the key is a technique that addresses a special case of this problem (for example, by subtracting the timestamp from the maximum Long value).
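
As a sketch of the idea, the key below appends Long.MAX_VALUE minus the timestamp to an entity identifier, so that newer entries for the same entity sort first in byte order. The key layout, the class `ReverseTimestampSketch`, and the example identifier are assumptions; timestamps are assumed to be non-negative, and `Arrays.compareUnsigned` needs Java 9 or later.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ReverseTimestampSketch {
    // Assumed layout: <entity id bytes><8-byte big-endian (Long.MAX_VALUE - timestamp)>
    static byte[] rowKey(String entityId, long timestampMillis) {
        byte[] id = entityId.getBytes(StandardCharsets.UTF_8);
        long reversed = Long.MAX_VALUE - timestampMillis;
        return ByteBuffer.allocate(id.length + Long.BYTES).put(id).putLong(reversed).array();
    }

    // True if key a sorts before key b in unsigned byte-wise order.
    static boolean sortsBefore(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b) < 0;
    }
}
```

Because the newest entry for an entity has the smallest reversed value, a scan starting at the entity's prefix reaches it first.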

Rowkeys and ColumnFamilies

Rowkeys are scoped to ColumnFamilies. The same rowkey can therefore exist in every ColumnFamily of a table without collision.

Immutability of Rowkeys

A row key cannot be changed. The only way to "change" it is to delete the row and insert it again.

Apache HBase Coprocessors

The coprocessor framework provides a mechanism for running your own custom code directly on the RegionServers that manage your data.

Overview

Whereas an RDBMS uses SQL queries, HBase fetches data with a Get or a Scan. And whereas an RDBMS uses a WHERE clause, HBase uses Filters to fetch only the relevant data.

In this model the data is fetched first and the computation runs on the client. The paradigm works well for "small data" of a few thousand rows and a handful of columns. With billions of rows and millions of columns, however, moving that much data across the network creates a bottleneck, and the client must be powerful enough and have enough memory to handle it.

If the business computation code is packaged as a coprocessor instead, it runs on the RegionServer, in the same location as the data, and returns only the result to the client.

Coprocessor Analogies

Triggers and Stored Procedure

An observer coprocessor is similar to a trigger in an RDBMS: it runs custom code before or after a specific event occurs. An endpoint coprocessor is similar to a stored procedure in an RDBMS.

MapReduce

MapReduce works on the principle of moving the computation to the location of the data. Coprocessors work on the same principle.

AOP

If you are familiar with Aspect Oriented Programming (AOP), you can think of a coprocessor as applying advice: it intercepts a request and runs custom code before the request is passed on to its final destination.

Coprocessor Implementation Overview

  1. Implement one of the Coprocessor interfaces.
  2. Load the coprocessor, either statically (from the configuration) or dynamically (using HBase Shell).
  3. Call the coprocessor from client-side code.

Types of Coprocessors

Observer Coprocessors

Observer coprocessors are triggered before or after a specific event occurs.

  • RegionObserver: observes events on a region, such as Get and Put operations.
  • RegionServerObserver: observes events related to the operation of a RegionServer, such as starting, stopping, merges, commits, and rollbacks.
  • MasterObserver: observes events related to the HBase Master, such as table creation, deletion, and modification.
  • WalObserver: observes events related to writes to the Write-Ahead Log (WAL).
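
The before/after hook pattern behind observers can be shown with a toy store. This is deliberately not the HBase coprocessor API: `PutObserver`, `ObservedStore`, and the boolean veto are hypothetical, and the real observer interfaces have different method signatures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical hook interface, loosely modeled on the pre/post idea of observers.
interface PutObserver {
    // Runs before the write; in this toy, returning false vetoes the write.
    boolean prePut(String row, String value);

    // Runs after a write has been applied.
    void postPut(String row, String value);
}

// Toy store that invokes registered observers around each put.
class ObservedStore {
    private final Map<String, String> data = new HashMap<>();
    private final List<PutObserver> observers = new ArrayList<>();

    void register(PutObserver observer) {
        observers.add(observer);
    }

    // Returns true if the write was applied, false if an observer vetoed it.
    boolean put(String row, String value) {
        for (PutObserver observer : observers) {
            if (!observer.prePut(row, value)) {
                return false;
            }
        }
        data.put(row, value);
        for (PutObserver observer : observers) {
            observer.postPut(row, value);
        }
        return true;
    }

    String get(String row) {
        return data.get(row);
    }
}
```

A pre-hook is a natural place for validation or access checks, and a post-hook for auditing, which is the same split the trigger analogy suggests.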

Endpoint Coprocessor

An endpoint coprocessor lets you run a computation directly at the location of the data.
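
The benefit can be sketched with a toy aggregation: each region computes a partial sum next to its own data, and the client merges one number per region instead of receiving every row. `RegionSketch` and `EndpointClientSketch` are hypothetical classes for illustration and do not use the HBase endpoint API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy region holding a slice of the table's values (hypothetical model).
class RegionSketch {
    private final List<Long> values = new ArrayList<>();

    RegionSketch(long... initial) {
        for (long v : initial) {
            values.add(v);
        }
    }

    // "Endpoint": runs where the data lives and returns only a partial sum.
    long sum() {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // Client-side alternative for comparison: ships every value to the caller.
    List<Long> scanAll() {
        return new ArrayList<>(values);
    }
}

class EndpointClientSketch {
    // The client merges one partial result per region.
    static long totalViaEndpoint(List<RegionSketch> regions) {
        long total = 0;
        for (RegionSketch region : regions) {
            total += region.sum();
        }
        return total;
    }
}
```

With two regions, the endpoint approach transfers two numbers, while the scan-based approach transfers every stored value before any arithmetic happens.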
