Original Post
**Innovation Hub Overview**

Jefferies is creating a Technology Innovation Hub in Pune, a greenfield opportunity to build the systems that power global markets. As our first India technology center, this hub brings together hands-on builders who engineer the platforms behind Jefferies’ growth across capital markets, investment banking, and institutional securities. We’re scaling toward an elite team of 500 engineers while maintaining the agility, ownership, and meritocratic spirit that defines Jefferies. From cloud and data to AI, risk, and core business technologies, teams in Pune will lead high-impact work with a global mandate.

**Team Overview**

**IT Infrastructure Technology**

Jefferies’ IT Infrastructure Technology team builds and runs the core technology backbone that enables the firm to operate globally. It covers enterprise infrastructure engineering and operations while driving modernization initiatives such as Cloud Adoption and Network Resiliency. The group supports critical platforms including networking, end-user computing, cloud, databases, server/Unix engineering, communications, and global data centers (including low-latency colocations near exchanges).

**Role summary**

Build and operate scalable, reliable data pipelines and infrastructure on AWS to power analytics, reporting, and data-driven decision-making. As a Cloud Data Engineer, you will design and implement data ingestion, transformation, and orchestration workflows, optimize data storage and processing, and ensure data quality and governance. You will partner with Analytics, Data Platform, ML Engineering, and business stakeholders to deliver high-quality data products that meet business needs. You will collaborate with Analytics Engineers, Data Analysts, ML Engineers, Data Platform Engineers, Cloud Security, Database, and business stakeholders across all divisions. Strong teamwork, customer service orientation, and the ability to translate business requirements into technical solutions are essential. Experience working in Agile teams using Jira and Confluence is expected.
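To make the day-to-day work concrete, here is a minimal, illustrative PySpark sketch of the kind of ingestion-and-transformation step described above. The bucket names, paths, and columns are hypothetical, and in practice a job like this would typically run on AWS Glue or EMR with the appropriate S3 connectors configured rather than as a standalone script.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical locations; real jobs would read these from configuration.
RAW_PATH = "s3://example-raw-bucket/trades/"
CURATED_PATH = "s3://example-curated-bucket/trades/"

spark = SparkSession.builder.appName("trades-ingest").getOrCreate()

# Ingest raw CSV drops from an upstream source.
raw = spark.read.csv(RAW_PATH, header=True, inferSchema=True)

# Basic cleansing and typing: drop records missing the key, normalize dates and amounts.
curated = (
    raw.dropna(subset=["trade_id"])
       .withColumn("trade_date", F.to_date("trade_date", "yyyy-MM-dd"))
       .withColumn("notional", F.col("notional").cast("double"))
)

# Write partitioned Parquet so downstream query engines can read it cheaply.
curated.write.mode("overwrite").partitionBy("trade_date").parquet(CURATED_PATH)
```

Writing the curated output as partitioned Parquet keeps it inexpensive to query from Athena or Redshift Spectrum and easy to reprocess one partition at a time.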
**Key responsibilities**

* Design, build, and maintain scalable data pipelines and ETL/ELT workflows on AWS using services such as Glue, EMR, Lambda, Step Functions, Kinesis, and S3
* Implement data ingestion from diverse sources (databases, APIs, streaming platforms, third-party providers); ensure reliability, performance, and error handling
* Develop and optimize data transformation logic using SQL, Python, PySpark, or similar frameworks; ensure data quality, consistency, and lineage
* Design and implement data storage solutions on AWS: S3 data lakes, Redshift data warehouses, DynamoDB, RDS, and Aurora; optimize for cost, performance, and access patterns
* Build and maintain Infrastructure as Code (Terraform) for data infrastructure; follow team standards for modules, state management, and Terraform Enterprise workflows
* Implement CI/CD pipelines for data workflows using GitHub, Bamboo, GitLab, or similar tools; ensure automated testing, deployment, and monitoring
* Establish data governance and security controls: encryption at rest and in transit, IAM policies, data classification, audit logging, and compliance with regulatory requirements
* Collaborate with Analytics Engineers, Data Analysts, and ML Engineers to understand data requirements and deliver datasets optimized for downstream consumption
* Monitor and troubleshoot data pipeline performance, failures, and data quality issues; implement proactive alerting and remediation (a minimal quality-gate sketch follows this list)
* Partner with Cloud Architecture, Cloud Security, and Database teams to ensure data infrastructure aligns with enterprise standards and best practices
* Document data pipelines, data models, and operational procedures; contribute to the team knowledge base
* Drive automation and toil reduction; leverage GenAI and agentic workflows to improve engineering productivity
* Proficiency using GenAI assistants (ChatGPT, Claude, GitHub Copilot) for SQL generation, pipeline code development, troubleshooting, and documentation
* Ability to implement AI agent-driven automation (agentic workflows) for data engineering tasks (e.g., data quality checks, anomaly detection, pipeline optimization) with enterprise safety controls: human-in-the-loop validation, comprehensive logging and auditability, guardrails to prevent data corruption, rollback mechanisms, and secure credential handling
* Proactive mindset to identify data engineering toil; ship automation that measurably reduces manual work and improves pipeline reliability
* Monitor data pipeline health and SLAs; respond to failures and data quality incidents promptly
* Participate in on-call rotation as needed for critical data workflows
* Conduct post-incident reviews for data pipeline failures; track corrective actions to completion
* Maintain runbooks and operational documentation for data infrastructure
* Continuously improve pipeline performance, cost efficiency, and data quality
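As a hedged illustration of the quality-gate and guardrail responsibilities above, the sketch below shows a simple validation step a pipeline could run before publishing a dataset: every check is logged for auditability, and failures stop the load so a human can review rather than letting suspect data propagate. The column names, thresholds, and logger setup are assumptions for illustration, not firm standards.

```python
import logging

from pyspark.sql import DataFrame, functions as F

logger = logging.getLogger("pipeline.quality")


def run_quality_gate(df: DataFrame, key_column: str, max_null_ratio: float = 0.01) -> None:
    """Refuse to publish a dataset that fails basic checks; log results for audit."""
    total = df.count()
    nulls = df.filter(F.col(key_column).isNull()).count()
    duplicates = total - df.dropDuplicates([key_column]).count()
    null_ratio = nulls / total if total else 1.0

    # Log every check so post-incident reviews have an audit trail.
    logger.info(
        "quality gate: rows=%d nulls=%d duplicates=%d null_ratio=%.4f",
        total, nulls, duplicates, null_ratio,
    )

    # Guardrail: stop before publishing and leave the decision to a human.
    if total == 0 or null_ratio > max_null_ratio or duplicates > 0:
        raise ValueError(
            f"Quality gate failed on {key_column}: {nulls} nulls, "
            f"{duplicates} duplicates out of {total} rows"
        )
```

In a real pipeline this gate would typically be wired into the orchestration layer so a failure raises an alert and blocks downstream consumers until the incident is resolved.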
**Requirements**

* 7+ years in data engineering, data platform, or analytics engineering roles with strong cloud focus
* Deep expertise in AWS data services: S3, Glue, EMR, Redshift, Athena, Lambda, Kinesis, Step Functions, and Lake Formation
* Strong programming skills in Python and SQL; experience with PySpark or similar big data frameworks
* Proficiency building ETL/ELT pipelines at scale; experience with data orchestration tools (Airflow, Step Functions, Prefect); a skeleton DAG is sketched after the Preferred section below
* Solid understanding of data modelling, data warehousing, and dimensional design (star schema, snowflake schema)
* Experience with Infrastructure as Code (Terraform) and CI/CD pipelines for data infrastructure
* Strong understanding of data governance, security, and compliance best practices
* Familiarity with streaming data platforms (Kafka, Kinesis, MSK) is a plus
* Excellent problem-solving skills and attention to data quality and reliability
* Strong communication and collaboration skills; ability to work with technical and non-technical stakeholders

**Preferred**

* AWS Certified Data Analytics or Big Data Specialty certification
* Experience with Snowflake or Databricks for data processing and analytics
* Familiarity with dbt (data build tool) for transformation workflows
* Background in software engineering or site reliability engineering
* Experience with data catalog and metadata management tools (AWS Glue Catalog, Collibra, Alation)
* Knowledge of DataOps practices and data quality frameworks
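For the orchestration experience listed in the requirements, here is a skeleton Apache Airflow 2.x DAG showing how daily extract, transform, and load steps might be sequenced; AWS Step Functions or Prefect would express the same dependencies in their own formats. The DAG id, schedule, and task bodies are placeholders only.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    pass  # e.g., land raw files in S3


def transform():
    pass  # e.g., trigger the Glue/PySpark transformation job


def load():
    pass  # e.g., copy curated data into Redshift


with DAG(
    dag_id="daily_trades_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Linear dependency chain; fan-out patterns adjust these operators.
    extract_task >> transform_task >> load_task
```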
We have been made aware of bad actors falsely claiming to be associated with Jefferies Group soliciting individuals to attend virtual job interviews, complete online tests or courses, and sending fictitious employment offer letters. Please note that any email contact with Jefferies personnel will come from an “@jefferies.com” email address. Further, Jefferies will not notify shortlisted candidates through social media platforms (e.g. WhatsApp or Telegram) or ask candidates to make payment to participate in the hiring process.

#LI-MF1