AI-Powered Data Engineering: Automating ETL Pipelines for Scalable Cloud Analytics

Authors

  • Gnanendra Reddy Muthirevula, Tekvana Inc, USA
  • Ravi Kumar Kota, Topgolf Callaway Brands, USA
  • Debabrata Das, Deloitte Consulting, USA

Keywords:

AI-powered ETL, data engineering automation, cloud analytics, predictive analytics

Abstract

AI has revolutionised data engineering by automating Extract, Transform, Load (ETL) pipelines, significantly enhancing the efficiency, scalability, and reliability of cloud-based analytics. This paper examines the application of artificial intelligence to ETL workflow automation, data ingestion, transformation, and governance across modern cloud data platforms and tools such as Snowflake, Apache Airflow, and SnapLogic. By utilising machine learning algorithms, such systems can optimise schema evolution, anomaly detection, and data quality management, reducing operational overhead and minimising human intervention in data pipelines.
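The abstract names schema evolution and anomaly detection as tasks that automated ETL systems can take over. As a rough illustration only, and not the authors' method, the following Python sketch shows two checks an automated pipeline step might run on each incoming batch: a diff of column-to-type mappings to detect schema changes, and a z-score test on a batch metric such as row count. The z-score is a simple statistical stand-in for the learned anomaly-detection models the paper refers to; all names (`diff_schema`, `safe_to_auto_apply`, `is_anomalous`) and the "additive changes only" policy are hypothetical.

```python
# Hypothetical sketch, not the authors' implementation: two automated checks
# an AI-assisted ETL step might run on each incoming batch.
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class SchemaDiff:
    added: dict[str, str]                     # new column -> type
    removed: dict[str, str]                   # dropped column -> old type
    type_changed: dict[str, tuple[str, str]]  # column -> (old type, new type)


def diff_schema(old: dict[str, str], new: dict[str, str]) -> SchemaDiff:
    """Compare column->type mappings of two batches to detect schema evolution."""
    added = {c: t for c, t in new.items() if c not in old}
    removed = {c: t for c, t in old.items() if c not in new}
    type_changed = {
        c: (old[c], new[c]) for c in old.keys() & new.keys() if old[c] != new[c]
    }
    return SchemaDiff(added, removed, type_changed)


def safe_to_auto_apply(diff: SchemaDiff) -> bool:
    """Assumed policy: only purely additive changes are applied without review."""
    return not diff.removed and not diff.type_changed


def is_anomalous(history: list[float], value: float, threshold: float = 3.0) -> bool:
    """Flag a batch metric whose z-score against past batches exceeds threshold.

    A plain z-score is a statistical stand-in for a learned anomaly detector.
    """
    if len(history) < 2:
        return False  # too little history to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu  # constant history: any change is unusual
    return abs(value - mu) / sigma > threshold


if __name__ == "__main__":
    diff = diff_schema(
        {"id": "int", "amount": "float"},
        {"id": "int", "amount": "str", "region": "str"},
    )
    print(diff.added)                # {'region': 'str'}
    print(diff.type_changed)         # {'amount': ('float', 'str')}
    print(safe_to_auto_apply(diff))  # False
    print(is_anomalous([1000, 1020, 980, 1010], 5000))  # True
```

The additive-only policy reflects a conservative choice: a newly added column rarely breaks downstream consumers, whereas a dropped column or a changed type usually does, so those cases are routed to human review rather than applied automatically.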

Newark Journal of Human-Centric AI and Robotics Interaction

Vol. 3 - 2023

Published

3 August 2023

How to Cite

[1] Gnanendra Reddy Muthirevula, Ravi Kumar Kota, and Debabrata Das, “AI-Powered Data Engineering: Automating ETL Pipelines for Scalable Cloud Analytics”, Newark J. Hum. Centric AI Robot Inter., vol. 3, pp. 182–223, Aug. 2023, Accessed: Dec. 21, 2025. [Online]. Available: https://www.njhcair.org/index.php/publication/article/view/15