Responsibilities
- Design, develop, and maintain scalable data pipelines for collecting, processing, and storing large volumes of data.
- Collaborate with data scientists and analysts to understand their data needs and deliver solutions that meet them.
- Optimize and enhance existing data pipelines to improve performance and scalability.
- Develop and maintain data warehouse solutions, ensuring data accuracy, consistency, and accessibility.
- Implement and enforce data governance policies, ensuring data security, privacy, and compliance with industry standards.
- Monitor and troubleshoot data pipelines, identifying and resolving issues to ensure data availability and reliability.
- Work with cloud platforms (e.g., AWS, GCP, Azure) to manage and optimize data storage and processing resources.
- Document data engineering processes and best practices, and provide guidance to other team members.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or a related field; a Master’s degree is a plus.
- 0-3 years of experience as a Data Engineer or in a similar role.
- Proficiency in programming languages such as Python, Java, or Scala.
- Experience with ETL tools and frameworks (e.g., Apache Airflow, Talend, Informatica).
- Strong knowledge of SQL and experience with relational databases (e.g., MySQL, PostgreSQL).
- Familiarity with big data technologies such as Hadoop, Spark, or Kafka.
- Experience with cloud platforms (AWS, GCP, Azure) and their data services (e.g., Redshift, BigQuery, Azure Data Lake).
- Knowledge of data modeling, data warehousing, and database design.
- Strong problem-solving skills and the ability to work independently or in a team.
- Excellent communication skills and the ability to collaborate with cross-functional teams.