Data Engineer - Hadoop at Catawiki (Amsterdam, Netherlands)

Job Description

Our Data Story

With thousands of active lots every day, hundreds of thousands of daily bids, millions of users, and countless marketing campaigns, the Catawiki platform generates vast amounts of data every day. This data is collected and stored in the data warehouse, which is used extensively by our team of 10 analysts and data scientists. There are also hundreds of active internal users of our web interface to the data warehouse.

As Catawiki grows and produces more data, our Analysts would like to perform even more sophisticated analyses, our Data Scientists want more data, and yes, Product Owners want more dashboards! With this in mind, Catawiki is building a new distributed data platform using tools from the Hadoop ecosystem that can accommodate the rapidly growing amount of data and the requirements of analysts, data scientists, and product owners.

We have started streaming event data and writing data pipelines in Apache Airflow that transform this data with Apache Spark. Catawiki now needs more Data Engineers to help us bring even more data sources into our platform and to transform and organize this data for analysis and for consumption by other applications. It's a big challenge, but one that we're really excited about solving.

What you will do

Evolve our current infrastructure into a distributed system and help build scalable data pipelines using the Hadoop ecosystem. You will work in cooperation with our Software Engineers, DevOps Engineers, Data Scientists and Product Managers to:

- Explore new ways of transforming and analyzing data, and continuously expand and improve the performance of our data pipelines.
- Build prototypes quickly and determine their worth to the business and within the infrastructure before iterating on and improving them.
- Work closely with Data Scientists and Product Managers to decide how best to structure and store data so that it is easily accessible to business users.
- Evaluate and develop highly distributed Big Data solutions.
- Advance our software architecture and tool set to meet growing business requirements for performance and data quality.

Who you are

- A Data Engineer who likes to experiment with and explore new tools and technologies.
- You are familiar with tools in the Hadoop ecosystem, including Spark, Kafka, Hive or similar.
- You are a Software Engineer with experience in modern backend web technologies.
- You know how to design and build low-maintenance, high-performing ETL processes and data pipelines.
- You can communicate an idea clearly at various levels of abstraction, depending on the audience.
- You have professional experience with relational databases: reading, writing and optimizing complex statements.
- We have a strong preference for someone experienced with Python rather than Java.