Hypertable architecture


HyperUser

May 24, 2013, 5:33:19 PM
to hyperta...@googlegroups.com
I am creating a simple setup.

4 machines, easy enough to manage manually, so I didn't use Capistrano.

SYS-01 - 4 CPUs (8 cores each), 128 GB RAM
Hadoop HDFS - namenode, datanode, jobtracker, tasktracker
Hypertable - Master, Hyperspace, RangeServer(?)

SYS-02 - 4 CPUs (8 cores each), 128 GB RAM
Hadoop HDFS - secondary namenode, datanode 
Hypertable - Hyperspace

SYS-03 - 4 CPUs (8 cores each), 128 GB RAM
Hadoop HDFS - datanode
Hypertable - Hyperspace

SYS-04 - 4 CPUs (8 cores each), 128 GB RAM
Hadoop HDFS - datanode
Hypertable - Hyperspace


The objective is to run clustered Hypertable on all four systems. SYS-01 is the Hypertable master. All four servers run a single database that is shared across all of them. I am not looking for any redundancy of data.

Which Hypertable processes should I run on the other servers?
Do I need a RangeServer? Which servers should I run it on?
Can I use each machine for read/write queries, or do I have to go through the master?
Do I need to run a ThriftBroker on all servers?




HdfsBroker.fs.default.name=hdfs://SYS-01:9000
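
(For reference: in a multi-node setup the same hypertable.cfg would normally also name the Hyperspace replica hosts, one line per replica. A minimal sketch assuming the host names from this thread; verify the property names against the Hypertable configuration reference for your version:)

# /opt/hypertable/0.9.7.6/conf/hypertable.cfg -- sketch
HdfsBroker.fs.default.name=hdfs://SYS-01:9000
# one line per Hyperspace replica
Hyperspace.Replica.Host=SYS-01
Hyperspace.Replica.Host=SYS-02
Hyperspace.Replica.Host=SYS-03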



root@SYS-01:/opt/hypertable/0.9.7.6/bin# ./start-all-servers.sh hadoop
DFS broker: available file descriptors: 1024
Started DFS Broker (hadoop)
Started Hyperspace
Started Hypertable.Master
/proc/sys/vm/swappiness = 60
Started Hypertable.RangeServer
Started ThriftBroker
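
(The log above comes from start-all-servers.sh, which starts every service on SYS-01. On nodes that should not run the full stack, the per-service scripts in the same bin directory can start just what is needed; a sketch, assuming the standard scripts shipped with 0.9.7.6:)

# On SYS-02 .. SYS-04 -- start only the per-node services (sketch)
/opt/hypertable/0.9.7.6/bin/start-dfsbroker.sh hadoop
/opt/hypertable/0.9.7.6/bin/start-hyperspace.sh
/opt/hypertable/0.9.7.6/bin/start-rangeserver.sh
/opt/hypertable/0.9.7.6/bin/start-thriftbroker.sh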




Capfile (though it's not in use):

set :source_machine, "SYS-01"
set :install_dir,  "/opt/hypertable"
set :hypertable_version, "0.9.7.6"
set :default_pkg, "/tmp/hypertable-0.9.7.6-linux-x86_64.deb"
set :default_dfs, "hadoop"
set :default_distro, "cdh3"
set :default_config, "/opt/hypertable/dev-hypertable.cfg"

role :source, "SYS-01"
role :master, "SYS-01"
role :hyperspace, "SYS-01", "SYS-02", "SYS-03", "SYS-04"
role :slave,  "SYS-01", "SYS-02", "SYS-03", "SYS-04"
role :localhost, "SYS-01"
role :thriftbroker_additional, "SYS-01", "SYS-02", "SYS-03", "SYS-04"
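
(If the Capfile were in use, deployment would typically be driven by the Capistrano tasks defined in the Capfile that ships with Hypertable, roughly as below; the task names here are assumptions, so check them against your copy of the Capfile:)

# Sketch -- typical deployment flow with Hypertable's Capfile (task names assumed)
cap install_package   # push the .deb named in :default_pkg to every role
cap dist              # distribute the installation to all machines
cap start             # start Hyperspace, master, RangeServers, ThriftBrokers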



root@SYS-01:/hyperspace# sudo -u hdfs hadoop dfsadmin -report
Configured Capacity: 2377018916864 (2.16 TB)
Present Capacity: 2174557945856 (1.98 TB)
DFS Remaining: 2174556585984 (1.98 TB)
DFS Used: 1359872 (1.3 MB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 4 (4 total, 0 dead)

Decommission Status : Normal
Configured Capacity: 594254729216 (553.44 GB)
DFS Used: 65536 (64 KB)
Non DFS Used: 50588594176 (47.11 GB)
DFS Remaining: 543666069504(506.33 GB)
DFS Used%: 0%
DFS Remaining%: 91.49%
Last contact: Fri May 24 17:23:18 EDT 2013


Decommission Status : Normal
Configured Capacity: 594254729216 (553.44 GB)
DFS Used: 73728 (72 KB)
Non DFS Used: 50589900800 (47.12 GB)
DFS Remaining: 543664754688(506.33 GB)
DFS Used%: 0%
DFS Remaining%: 91.49%
Last contact: Fri May 24 17:23:17 EDT 2013


Decommission Status : Normal
Configured Capacity: 594254729216 (553.44 GB)
DFS Used: 45056 (44 KB)
Non DFS Used: 50589016064 (47.11 GB)
DFS Remaining: 543665668096(506.33 GB)
DFS Used%: 0%
DFS Remaining%: 91.49%
Last contact: Fri May 24 17:23:18 EDT 2013


Decommission Status : Normal
Configured Capacity: 594254729216 (553.44 GB)
DFS Used: 1175552 (1.12 MB)
Non DFS Used: 50693459968 (47.21 GB)
DFS Remaining: 543560093696(506.23 GB)
DFS Used%: 0%
DFS Remaining%: 91.47%
Last contact: Fri May 24 17:23:17 EDT 2013


HyperUser

May 24, 2013, 5:39:00 PM
to hyperta...@googlegroups.com
Here is a picture of this.
[Attachment: HS-sys.jpg]

ddorian

May 25, 2013, 5:57:12 AM
to hyperta...@googlegroups.com
> Which Hypertable processes should I run on the other servers?
On each datanode you must have a RangeServer.

> Do I need a RangeServer? Which servers should I run it on?
See above. When you query, the data comes from the RangeServers; the same goes for writes.

> Can I use each machine for read/write queries, or do I have to go through the master?
The ThriftBroker knows which server to query based on the row key. It gets this information (where each range lives) from the master. (A minimal client sketch follows at the end of this message.)

> Do I need to run a ThriftBroker on all servers?
Run a ThriftBroker on each application server.

> I am not looking for any redundancy of data.
You can control that in the configuration file, at table creation, etc.
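
(To illustrate the ThriftBroker answer above: a minimal sketch using Hypertable's Python Thrift bindings. The port (15867) and the call names follow the 0.9.x client examples, but treat them as assumptions and verify against your installation:)

# Sketch: the application talks to the ThriftBroker on its own machine;
# the broker locates the right RangeServer for each row key.
from hypertable.thriftclient import ThriftClient

client = ThriftClient("localhost", 15867)   # default ThriftBroker port per 0.9.x docs
ns = client.namespace_open("/")             # open the root namespace
res = client.hql_query(ns, "SHOW TABLES")   # broker routes the query for us
print(res.results)                          # HQL results come back as strings
client.namespace_close(ns)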

HyperUser

May 25, 2013, 7:23:55 PM
to hyperta...@googlegroups.com
In conclusion, I will run these processes:
SYS-01 - master, datanode/Hyperspace, ThriftBroker and RangeServer
SYS-02 - datanode/Hyperspace, ThriftBroker and RangeServer
SYS-03 - datanode/Hyperspace, ThriftBroker and RangeServer
SYS-04 - datanode/Hyperspace, ThriftBroker and RangeServer

All servers will be used to add and read data for an external application.

ddorian

May 26, 2013, 7:20:55 AM
to hyperta...@googlegroups.com
Why 4 Hyperspace replicas? 3 are enough.

Which ThriftBroker does the application connect to?
Is the application on another server? (If so, install a ThriftBroker on that server and remove it from the others.)

As for running the master and a RangeServer together, I don't know whether that's a good idea or not.

Out of curiosity, what kind of disk setup do these servers have, or are you looking for in-memory data only?



HyperUser

May 26, 2013, 3:51:38 PM
to hyperta...@googlegroups.com
The application, DFS, and Hypertable will run on all four machines.
The application shall be able to use all four machines for read and write queries. I don't have a load balancer at this time, so I am load balancing manually by design, pointing application instances at individual servers' IPs.
All machines together form one DFS cluster of around 2 TB. Each machine has 500 GB in RAID 10 with 10k RPM drives.
All machines form one distributed database cluster (Hyperspace), with data stored in DFS across all servers: one Hyperspace across all servers, one namespace, and multiple tables that can be read and written from all machines. I am not looking for any replication or redundancy of data.

SYS-01 - master; DFS broker; Hyperspace process to join the cluster; DFS datanode, hence a RangeServer process; ThriftBroker process for reads/writes
SYS-02 - DFS broker; Hyperspace process to join the cluster; DFS datanode, hence a RangeServer process; ThriftBroker process for reads/writes
SYS-03 - DFS broker; Hyperspace process to join the cluster; DFS datanode, hence a RangeServer process; ThriftBroker process for reads/writes
SYS-04 - DFS broker; Hyperspace process to join the cluster; DFS datanode, hence a RangeServer process; ThriftBroker process for reads/writes

Does it sound right? 

dorian i

May 26, 2013, 4:05:29 PM
to hyperta...@googlegroups.com
So the application connects to the ThriftBroker on localhost?
Usually RAID is not used with Hypertable/HBase/HDFS/Cassandra. And RAID is itself a kind of redundancy and availability.
If you remove RAID and set replication=2, you can recover the data even after losing a machine. If you just remove the RAID, you will get a faster cluster.
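
(For reference, the cluster-wide HDFS default replication is set in hdfs-site.xml; a minimal sketch, with Hypertable's per-table option discussed later in the thread as the finer-grained alternative:)

<!-- hdfs-site.xml (sketch): cluster-wide default block replication -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>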




HyperUser

May 26, 2013, 4:16:53 PM
to hyperta...@googlegroups.com
Yes, the applications run locally on all four machines along with Hadoop DFS and Hypertable. For writing data I manually use the local IP to avoid delay. For reading I will again use the local IP, but it will still see data written from the other machines since it is a single, shared database.

How do I use replication? That is interesting; I am not particular about using RAID. In fact, if I avoid RAID 10, it will actually double my usable disk space. I chose RAID 10 since it has good read/write performance. I could do RAID 0, as just one flat drive, which also has good read/write performance.

dorian i

May 26, 2013, 4:29:45 PM
to hyperta...@googlegroups.com
Don't do RAID at all. You instruct Hypertable on the replication level: via a cluster-wide default (in the config file; I think the default is 3), at table creation (see the HQL reference page), or per access group (same as at table creation).
Hypertable tells HDFS to replicate the data, and HDFS does it.
The easiest option is at table creation time (a sketch follows below). Also, don't do RAID 0 (if you lose one disk, you lose the data on all disks).
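
(A minimal HQL sketch of the table-creation option. REPLICATION appears in the HQL reference as a storage option, but the table name and column here are made up, and the exact option placement should be verified against the HQL reference for your version:)

-- Sketch: ask HDFS for 2 replicas of this table's files (hypothetical table)
CREATE TABLE REPLICATION=2 metrics (
  value
);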

HyperUser

May 26, 2013, 4:39:31 PM
to hyperta...@googlegroups.com
Hmmm. The server hardware has a RAID controller; I have to create a RAID 0, 1, 5, or 10 volume. If I make RAID 0, each machine will have a single drive of 1.2 TB, which will give me more usable space. It won't let me use individual disks, so I can't avoid RAID.

My application will have a lot of read/write load, and I don't want to lose performance at the disk level. I would prefer some replication in Hypertable if I skip RAID 10.

dorian i

May 27, 2013, 10:45:39 AM
to hyperta...@googlegroups.com
There has to be a way to disable the RAID controller.
If not, then don't use RAID 5.
Use RAID 0 only if losing the whole machine's data when one disk fails is absolutely not a problem.
Use RAID 1 for mirroring, or RAID 10 for mirroring plus striping.