Posted: March 19, 2025 · Location: India · Job type: Full-time
  • Experience: 8+ years

Job Description

Experience: Scala (mandatory) + Spark + AWS

Responsibilities:
  • Develop and maintain scalable ETL pipelines using Spark (with Scala) to process large datasets in a cloud-based environment.
  • Implement data transformation logic in Spark for batch and real-time processing, ensuring efficient handling of structured and unstructured data.
  • Integrate data from various sources (e.g., databases, cloud storage, APIs) into a centralized data warehouse or data lake, leveraging Spark and other relevant technologies.
  • Implement data validation, cleansing, and quality checks within the data pipeline to ensure the accuracy, consistency, and integrity of non-sensitive data.
  • Optimize Spark jobs for better performance, reduced processing time, and cost efficiency, applying best practices for cluster and resource management.
  • Act as the PySpark/Databricks lead for the NextGen migration project.
  • Convert complex requirements from NextGen into PySpark notebooks using DataFrames and Spark SQL.
  • Integrate Airflow with Databricks notebooks and troubleshoot data-mismatch issues arising during the migration.