Comparing Hamake with Other Workflow Engines (Cascading, Oozie, Azkaban)

Following the introduction to Hamake, this post compares it with Cascading, Oozie, and Azkaban.

Original: http://code.google.com/p/hamake/wiki/HamakeComparisonWithOtherWorkflowEngines

//======================================================================
The table below compares Hamake with similar workflow engines for Hadoop (Oozie, Azkaban, Cascading) on several key features. Although all of these systems can be used to solve similar problems, they differ significantly in design, philosophy, target users, usage scenarios, and so on, so this feature-by-feature comparison is in no way conclusive. Use it as a guideline, but read each system's own documentation to better understand which one suits your problem.

Feature	Hamake	Oozie	Azkaban	Cascading
workflow description language	XML	XML (xPDL-based)	text file with key/value pairs	Java API
dependency mechanism	data-driven	explicit	explicit	explicit
requires a Servlet/JSP container	no	yes	yes	no
workflow progress tracking	console/log messages	web page	web page	Java API
scheduling a Hadoop job at a given time	no	yes	yes	yes
execution model	command-line utility	daemon	daemon	API
runs Pig Latin scripts	yes	yes	yes	yes
event notification	no	no	no	yes
requires installation	no	yes	yes	no
supported Hadoop versions	0.18+	0.20+	currently unknown	0.18+
retries	no	at workflow node level	yes	yes
runs arbitrary commands	yes	yes	yes	yes
runs on Amazon EMR	yes	no	currently unknown	yes

From the FAQ:

What is the difference between Hamake and Cascading?

In short: Cascading is an API, while Hamake is a utility. Some differences:
– Hamake does not require any custom programming; it helps automate running your existing Hadoop tasks and Pig scripts.
– We found Hamake especially suitable for incremental processing of datasets.
– You can use Hamake to automate tasks written in other languages, for example through Hadoop Streaming.
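Incremental processing of this kind can be illustrated with a make-style freshness check. The Python sketch below is an assumption about how such a check might work, not Hamake's actual implementation: `stale_inputs` is a hypothetical helper, and modification times are passed in as plain dictionaries instead of being read from HDFS. An input is (re)processed only when its output is missing or older than the input, so data that was already processed is skipped.

```python
from typing import Dict, List, Optional


def stale_inputs(input_mtimes: Dict[str, float],
                 output_mtimes: Dict[str, float],
                 output_for: Dict[str, str]) -> List[str]:
    """Return the inputs that need (re)processing, in sorted order.

    input_mtimes:  input path -> modification time
    output_mtimes: output path -> modification time (absent = not produced yet)
    output_for:    input path -> output path a task would produce from it
    (All three mappings are hypothetical stand-ins for filesystem metadata.)
    """
    stale = []
    for path in sorted(input_mtimes):
        out_time: Optional[float] = output_mtimes.get(output_for[path])
        # Re-run only if the output does not exist or predates its input.
        if out_time is None or out_time < input_mtimes[path]:
            stale.append(path)
    return stale


if __name__ == "__main__":
    inputs = {"logs/day1": 100.0, "logs/day2": 200.0, "logs/day3": 300.0}
    outputs = {"counts/day1": 150.0, "counts/day2": 180.0}
    mapping = {p: p.replace("logs/", "counts/") for p in inputs}
    # day1 is up to date, day2's output is older than its input, day3 has no output.
    print(stale_inputs(inputs, outputs, mapping))  # ['logs/day2', 'logs/day3']
```

Adding a new daily input therefore triggers work only for that day, which is the property that makes a data-driven tool convenient for growing datasets.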

How does Hamake differ from Oozie and Azkaban?

Oozie and Azkaban are server-side systems that must be installed and run as a service. Hamake is a lightweight client-side utility that requires no installation and has a very simple syntax for defining workflows. Most importantly, Hamake is built on dataflow programming principles: the execution sequence of your Hadoop tasks is controlled by the data.
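To make "controlled by the data" concrete, here is a minimal Python sketch, assuming each task simply declares the paths it reads and writes. The `execution_order` function and the task table are hypothetical and are not Hamake's syntax or API. No dependency is declared explicitly: an edge exists only because one task produces a path that another task consumes, and the run order follows from a topological sort with the standard-library `graphlib` module (Python 3.9+). An engine with explicit dependencies, such as Oozie or Azkaban, would instead have you list those edges yourself.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Hypothetical task table: task name -> (input paths, output paths).
Tasks = Dict[str, Tuple[Set[str], Set[str]]]


def execution_order(tasks: Tasks) -> List[str]:
    """Derive a run order purely from which task produces which path."""
    # Map every output path to the task that writes it.
    producer = {path: name
                for name, (_, outputs) in tasks.items()
                for path in outputs}
    # A task's predecessors are the producers of its inputs; inputs that no
    # task produces (e.g. raw source data) add no edge.
    graph = {name: {producer[p] for p in inputs if p in producer}
             for name, (inputs, _) in tasks.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    tasks = {
        "report": ({"/data/counts"}, {"/data/report"}),
        "count": ({"/data/clean"}, {"/data/counts"}),
        "clean": ({"/data/raw"}, {"/data/clean"}),
    }
    # Declaration order does not matter; the data flow fixes the sequence.
    print(execution_order(tasks))  # ['clean', 'count', 'report']
```

If the declared paths form a cycle, `static_order()` raises `graphlib.CycleError`, which is a natural place for such a tool to report a broken workflow.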

//======================================================================

Apollo89.com

Mr. Apollo's assorted experiences..
