3rd Oct, 2018
The Distiller offers an alternative to existing systems such as Apache Airflow and Luigi for running batch Data Science jobs with dependencies between scripts. It differs mainly in its local view of dependencies and input data, and in its modular, reusable approach built on stills, pipes, parameters and age requirements. Features such as data drivers and a built-in scheduler that satisfies the age requirements further reduce the complexity for the user. Apache Airflow and Luigi have been developed over a longer period by more people and are feature-rich systems; components such as a UI, in particular, make them easier to monitor and control. Nevertheless, the Distiller has conceptual advantages, and the choice of system ultimately comes down to taste and personal preference. Its modularity allows code to be reused across projects and makes it easier to collaborate with others, both when building pipelines of scripts and when making use of existing ones. The age requirements and on-demand dependency exploration keep the focus on the current work, without the need to have the whole pipeline in mind.
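To make the idea of age requirements and on-demand dependency exploration more concrete, here is a minimal Python sketch. It is not the Distiller's API: the names `Step`, `Runner`, `needs` and `max_age` are hypothetical and chosen only for illustration, and the current time is passed in explicitly so the example stays deterministic. Each step declares only its direct dependencies, together with how old their results may be; everything further upstream is explored only when a result is requested.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    """Hypothetical stand-in for one script in a pipeline (not the Distiller's API)."""
    name: str
    run: Callable[[List[str]], str]
    # Each entry is (direct upstream step, maximum acceptable age in seconds).
    needs: List[Tuple["Step", float]] = field(default_factory=list)


class Runner:
    """Re-runs a step only when its cached result is older than the caller allows."""

    def __init__(self) -> None:
        self.cache: Dict[str, Tuple[float, str]] = {}  # name -> (finished_at, result)
        self.log: List[str] = []  # names of steps that were actually executed

    def get(self, step: Step, max_age: float, now: float) -> str:
        cached = self.cache.get(step.name)
        if cached is not None and now - cached[0] <= max_age:
            return cached[1]  # fresh enough: reuse without touching upstream steps
        # Stale or missing: explore direct dependencies on demand, then run.
        inputs = [self.get(dep, age, now) for dep, age in step.needs]
        result = step.run(inputs)
        self.cache[step.name] = (now, result)
        self.log.append(step.name)
        return result


raw = Step("raw", lambda inputs: "raw-data")
features = Step("features", lambda inputs: f"features({inputs[0]})", needs=[(raw, 3600.0)])
model = Step("model", lambda inputs: f"model({inputs[0]})", needs=[(features, 600.0)])

runner = Runner()
first = runner.get(model, max_age=0.0, now=0.0)     # nothing cached: all three steps run
second = runner.get(model, max_age=0.0, now=900.0)  # features is too old, raw is still fresh
print(runner.log)
```

On the second request only `features` and `model` are executed again, because the cached `raw` result is still within the one-hour limit that `features` asks for. The author of `model` never has to mention `raw` at all, which is the sense in which the dependencies stay local.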
Go to GitHub