S*******e posts: 525 | 1 How do you upgrade your Hadoop clusters? Here is my question:
Rolling Upgrade Hadoop Cluster Question
In our company, one of the main Hadoop clusters (HDP) has about 600 nodes. It
is upgraded almost monthly, on top of other maintenance. Every upgrade takes
anywhere from hours to a couple of days, and all apps running on the cluster
have to be shut down. I cannot imagine that clusters doing such important work
at other companies get interrupted this often and for this long. I asked: why
don't we do rolling upgrades? Below is the answer from one of our main
architects. Is it accurate? How are upgrades handled at your company?
================================================
Regarding rolling upgrades, I want to be careful that everyone understands
what happens during this process. Up to 12 nodes per hour are upgraded to the
next version of HDP, so with each passing hour the capacity of the cluster is
reduced by the number of nodes being upgraded. When the cluster gets to
roughly 75% complete, a restart is required for most of the services. The core
services (MapReduce, HDFS, NameNode HA, ResourceManager HA, ZooKeeper, and
Hive HA if it is configured) are handled without downtime; Spark, Kafka,
Storm, and the other services are not covered by the zero-downtime rolling
upgrade. Express upgrade has allowed our team to upgrade the clusters much
faster: the last upgrade of the cluster took 5 hours. I believe the two days
and four hours of downtime you cited above is not the actual HDP downtime.
That figure likely covers the entire maintenance window, which includes Ambari
upgrades, HDP upgrades, stopping jobs, sanity checks, and restarting all of
the jobs to catch up with batch processing. I would suggest that your team
watch for the notifications that will be sent out and stop your jobs when the
upgrade is about to execute, which would be on Saturday morning. Another
notification will be sent out when the upgrade completes, and you will then be
able to start your jobs again. | S***s posts: 104 | 2 Rolling upgrades put a
lot of extra load on the system from data rebalancing.
If the system is already busy, that will cause problems. It is best to build
up extra headroom first, so that the added load from the rolling upgrade does
not affect the existing workload.
【Quoting S*******e's post above】
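The architect's figures can be sanity-checked with quick arithmetic. A rough sketch: the 12-nodes-per-hour rate and the ~600-node size come from the posts above, while the assumption that only the batch currently being upgraded is out of service (upgraded nodes rejoin immediately) is a simplification of mine.

```python
# Rough model of a rolling upgrade: nodes leave service one batch at a
# time, so capacity dips but never drops to zero.
NODES = 600   # cluster size (from the post above)
RATE = 12     # nodes upgraded per hour (from the post above)
BATCH = RATE  # assume one batch of nodes is out of service per hour

hours = NODES / RATE                    # total wall-clock time
min_capacity = (NODES - BATCH) / NODES  # worst-case fraction still serving

print(f"rolling upgrade takes ~{hours:.0f} hours")      # ~50 hours
print(f"capacity never drops below {min_capacity:.0%}") # 98%
```

So the trade-off in this thread is roughly 50 hours with ~98% of capacity online (rolling) versus 5 hours of full downtime (express).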
| w***g posts: 5958 | 3 Just passing through: we have been running 0.21.0 from back then to this day.
【Quoting S*******e's post above】
| a*****s posts: 1121 | 4 Most companies run their hardware at full capacity,
and a big cluster means even more programs running on it. To find headroom,
the only option is to mine the historical records for windows with few
applications, and it is not clear the whole upgrade can fit inside such a
window. I also don't know whether Hortonworks has ever tested rolling upgrades
at large scale.
【Quoting S***s's post above】
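One way to act on the "find a quiet window in the history" idea is a simple sliding-window minimum over hourly application counts (e.g. pulled from YARN ResourceManager metrics). A minimal sketch; the `history` numbers and the 6-hour window are invented for illustration only.

```python
# Find the quietest N-hour window in a day of hourly application counts.
# The load numbers below are made up for illustration.
history = [90, 85, 80, 40, 20, 15, 10, 12, 30, 60, 75, 88,
           95, 92, 90, 85, 70, 55, 50, 45, 40, 70, 85, 92]  # apps per hour

WINDOW = 6  # hours of maintenance we think we need

def quietest_window(loads, width):
    """Return (start_hour, total_load) of the lowest-load window."""
    best_start = min(range(len(loads) - width + 1),
                     key=lambda i: sum(loads[i:i + width]))
    return best_start, sum(loads[best_start:best_start + width])

start, load = quietest_window(history, WINDOW)
print(f"quietest {WINDOW}h window starts at hour {start} (total load {load})")
```

Whether a full rolling upgrade fits in such a window is exactly the open question in the post above; this only tells you where to try.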