Data Engineer | Noida
We are seeking a motivated and detail-oriented Data Engineer to design, develop, and maintain scalable data pipelines and big data solutions. The ideal candidate has hands-on experience with Scala, Apache Spark, Hadoop, SQL, and modern data engineering practices. You will work closely with data scientists, software engineers, business analysts, and product teams to build reliable data platforms that enable analytics, reporting, and business intelligence.
In this role, you will be responsible for developing high-performance data pipelines, optimizing large-scale data processing workflows, ensuring data quality, and supporting enterprise data initiatives. This position offers an excellent opportunity to work with modern big data technologies and cloud-ready data architectures.
Key Responsibilities
Data Pipeline Development
- Design, develop, test, deploy, and maintain scalable data pipelines using Scala, Apache Spark, Hadoop, and SQL.
- Build efficient ETL/ELT workflows for processing structured and unstructured data.
- Develop reusable and optimized data processing components.
- Automate data ingestion, transformation, and loading processes.
- Ensure high-quality, reliable, and maintainable data pipelines.
Big Data Engineering
- Process large-scale datasets using distributed computing frameworks.
- Optimize Spark jobs for performance, scalability, and resource utilization.
- Work with Hadoop ecosystem components for distributed data storage and processing.
- Support batch and near real-time data processing workflows.
- Monitor and improve big data platform performance.
Database & Data Management
- Design and optimize SQL queries for efficient data retrieval.
- Work with relational databases such as MySQL and PostgreSQL.
- Ensure data integrity, consistency, and accuracy across systems.
- Support database optimization and performance tuning.
- Manage data storage and lifecycle processes.
Collaboration & Solution Delivery
- Collaborate with cross-functional teams to understand business and technical requirements.
- Translate business needs into scalable data engineering solutions.
- Work closely with data scientists and analysts to prepare datasets for analytics and machine learning.
- Participate in Agile development processes and sprint planning.
- Maintain technical documentation and implementation records.
Performance & Quality
- Troubleshoot complex issues related to data processing, storage, and retrieval.
- Ensure data platform scalability, security, reliability, and availability.
- Perform data validation and quality assurance activities.
- Optimize data workflows for improved efficiency and reduced processing time.
- Follow data engineering best practices and coding standards.
Required Skills
Data Engineering
- Strong experience in data engineering concepts and practices.
- Hands-on experience with Scala, Apache Spark, Hadoop, and SQL.
- Understanding of distributed computing and large-scale data processing.
- Experience developing ETL/ELT pipelines.
- Knowledge of data modeling and data architecture principles.
Big Data Technologies
- Hadoop Ecosystem
- HDFS
- MapReduce
- Apache Spark
- Distributed Data Processing
- Batch Processing
Databases
- MySQL
- PostgreSQL
- SQL Query Optimization
- Database Performance Tuning
- Data Modeling
Programming & Development
- Scala
- SQL
- Python (Preferred)
- Version Control with Git
- Linux Basics
Professional Skills
- Strong analytical and problem-solving skills.
- Excellent communication and collaboration abilities.
- Ability to work independently and in Agile teams.
- Strong attention to detail and commitment to quality.
- Willingness to learn emerging data engineering technologies.
Preferred Skills
- Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform.
- Familiarity with Apache Kafka or streaming technologies.
- Knowledge of Airflow or workflow orchestration tools.
- Experience with Docker and Kubernetes.
- Exposure to data warehousing concepts.
- Understanding of CI/CD pipelines and DevOps practices.
- Basic knowledge of machine learning data pipelines.
Technologies & Tools
Big Data
- Apache Spark
- Hadoop
- HDFS
- MapReduce
Programming
- Scala
- SQL
- Python
Databases
- MySQL
- PostgreSQL
Cloud & DevOps
- AWS (Preferred)
- Azure (Preferred)
- Docker
- Kubernetes
Development Tools
- Git
- Linux
- IntelliJ IDEA
- Maven
Education
- Bachelor’s degree in Computer Science, Information Technology, Data Science, Software Engineering, or a related field.
- B.Tech / BE in Information Technology, Computer Science, Electronics, or equivalent disciplines preferred.
- MCA, M.Tech, M.Sc. (Computer Science, Data Science, Information Technology), or equivalent qualifications are an added advantage.