BHANU PRAKASH REDDY RELLA

Senior / Staff Data Engineer · AI & ML Data Platforms · IEEE Senior Member

California, USA · (901) 601-8664 · 27rellaprakash@gmail.com · linkedin.com/in/rbhanur · Google Scholar · bhanurella.com

Senior Data Engineer with 10+ years building cloud-native data platforms, streaming pipelines, and AI/ML-ready infrastructure at Meta (Ads), Walmart Global Tech, and TCS (Citibank). Expert in Python ETL on Spark/Hive, Kafka + Structured Streaming, and lakehouse architectures on Databricks, Snowflake, and BigQuery across GCP, AWS, and Azure. Architect of streaming-first systems processing 100B+ records/month and 5+ TB/day, with a track record of 40% faster ETL and 30–80% runtime reduction. Production experience with LangChain-based LLM apps, Kubernetes, and Terraform. IEEE Senior Member; Founder, The Green AI Initiative; patent holder; 22+ publications.

Core Skills

Lakehouse & Warehousing

Databricks (Delta Lake, Unity Catalog, Delta Sharing, Databricks SQL), Snowflake (Snowpipe, Snowpark), BigQuery, Redshift, Synapse, Iceberg, Hive Metastore

Streaming & Distributed

Apache Kafka, Apache Spark, PySpark, Spark Structured Streaming, Spark Streaming, Apache Flink, Hive, Presto/Trino, Hadoop/HDFS

AI/ML & LLM Apps

Feature stores, training-data pipelines, MLflow, MLOps, PyTorch, LangChain, RAG, embedding/semantic search, vector DBs (pgvector, Pinecone), Llama, Claude, OpenAI, Hugging Face, AWS Bedrock

Cloud Platforms

GCP (Dataproc, GCS, Pub/Sub, GKE) · AWS (S3, EMR, Glue, Athena, EKS) · Azure (ADF, ADLS, Synapse, AKS)

DevOps & Infrastructure

Kubernetes (GKE, EKS, AKS, Helm), Terraform, Docker, Git, GitHub Actions, Jenkins, Maven, environment promotion (dev/stage/prod)

Orchestration & Transform

dbt, Apache Airflow, Databricks Workflows, Dagster, Automic, Oozie, Great Expectations

Programming

Python (pandas, NumPy, PySpark), advanced SQL (CTEs, window functions, partition pruning), Scala, Java, Shell

Governance & BI

Data contracts, schema versioning, lineage, anomaly/drift monitoring, PII/PHI, SOX, GDPR · Power BI, Tableau, Looker

Professional Experience

Senior Data Engineer

Jul 2025 — Present

Meta Platforms, Inc. · Menlo Park, CA

Architected Databricks Lakehouse pipelines on Azure & AWS consolidating advertising measurement, engagement, and experimentation datasets — near real-time analytics for 500+ internal stakeholders, improving data availability 35%.
Engineered PySpark & Spark Structured Streaming workflows processing 8B+ ad interaction events monthly — reduced pipeline latency from 4–5 hours to under 20 minutes and improved campaign attribution accuracy.
Developed metadata-driven ETL frameworks (Databricks, Delta Lake, Kafka, Airflow) with schema validation, partition pruning, Z-Ordering, OPTIMIZE/VACUUM, liquid clustering, salting, and reusable PySpark UDFs — reduced operational overhead 40%.
Implemented agentic, AI-assisted data observability using LangChain and RAG over runbooks, metric definitions, and lineage metadata, orchestrating OpenAI, Llama, and Claude models to automate root-cause analysis; cut incident investigation time 55% and saved ~24,000 engineering hours annually.
Built scalable feature-engineering pipelines supporting PyTorch-based ML models for lift measurement and advertiser optimization — accelerating experimentation cycles across business units.
Established enterprise governance with Unity Catalog, Delta Sharing, and automated data-quality controls — reduced critical data-quality incidents 30%; delivered cloud-native infra via Terraform, Kubernetes, and ADLS.

Stack: Databricks · Delta Lake · PySpark · Spark Structured Streaming · Kafka · Airflow · Unity Catalog · PyTorch · OpenAI · LangChain · Terraform · Kubernetes · SQL

Lead Data Engineer

Dec 2022 — Jun 2025

Walmart Global Tech · Sunnyvale, CA

Led modernization of Walmart's supplier analytics platform on Databricks Lakehouse, Apache Iceberg, and Delta Lake — scalable processing of 100B+ records/month, reducing reporting latency from over 24 hours to under 6 hours.
Directed Kafka + Spark Structured Streaming ingesting 5+ TB of supply-chain & inventory data daily — real-time visibility across 1,500+ suppliers serving 240M+ weekly customers globally.
Designed unified enterprise data architecture across Databricks, BigQuery, Snowflake, and GCS — improved accessibility and reduced duplicate data movement across business domains.
Implemented Infrastructure as Code with Terraform & Kubernetes on GCP/Azure — cut environment deployment time from days to under 4 hours.
Optimized PySpark/SQL pipelines via partitioning, caching, Delta optimization, and workload tuning — 35% lower compute cost, up to 80% runtime reduction, improved SLA compliance.
Drove code-quality governance (SonarQube, automated testing, CI/CD) — 10%+ quality-score lift; established dbt ELT frameworks with Great Expectations validation.
Mentored a team of engineers through architecture reviews and cloud-modernization programs — saving ~24,000 engineering hours annually and strengthening engineering best practices.

Stack: Databricks · Delta Lake · Iceberg · Unity Catalog · PySpark · Spark Structured Streaming · Kafka · Kubernetes · Terraform · BigQuery · Snowflake · dbt · Airflow · GCP

Senior Data Engineer

Dec 2020 — Nov 2022

Tata Consultancy Services · Client: Citibank · Irving, TX

Designed cloud-native banking data platforms using Azure Data Factory, ADLS Gen2, Databricks, and Synapse Analytics — scalable ingestion of regulatory and financial datasets for enterprise reporting under SOX/PCI-DSS controls.
Developed Databricks ETL (PySpark + Delta Lake) for high-volume transaction data — reduced end-to-end processing from 8 hours to under 5 hours while improving reliability for risk and compliance workloads.
Migrated legacy warehouses to Snowflake & Azure — 35% lower storage costs and improved query performance for BI users; built reusable ingestion frameworks integrating APIs, databases, and financial apps into Azure Data Lake.
Implemented data quality, lineage, and governance with Great Expectations and cloud-native monitoring — strengthening SOX/PCI-DSS compliance; delivered Power BI dashboards for fraud detection and operational reporting; automated provisioning via Terraform & Azure DevOps.

Stack: Azure Data Factory · ADLS Gen2 · Synapse · Databricks · PySpark · Delta Lake · Snowflake · Terraform · Azure DevOps · Power BI

Data Engineer

Oct 2016 — Nov 2020

Tata Consultancy Services · Client: Citibank · Hyderabad, India

Built enterprise-scale ETL with Python, Spark, Hadoop, Hive, and Kafka processing banking transaction and customer datasets — improving data availability for analytics, compliance, and reporting.
Developed real-time streaming pipelines with Kafka, Spark Streaming, and Apache Flink — reducing batch dependencies and enabling faster operational insights.
Implemented ingestion frameworks using Apache NiFi, Sqoop, and HDFS to automate movement of structured and semi-structured data; optimized Spark workloads via partitioning and resource tuning — 25% faster execution and improved cluster utilization.

Stack: Python · Spark · Spark Streaming · Flink · Kafka · NiFi · Sqoop · HDFS · Hive · Hadoop

Data Analyst

Aug 2015 — Sep 2016

Flipkart · Hyderabad, India

Analyzed customer behavior, sales trends, and product performance (SQL, Excel) to drive merchandising and business-growth initiatives; designed dimensional data models and ETL workflows for e-commerce reporting; built dashboards for inventory, order fulfillment, and customer-engagement metrics.

Leadership, Research & Recognition

IEEE Senior Member (2025, top 10% of IEEE); Secretary, IEEE SCV Computer Society Chapter C16 (2026); Advisory Member, IEEE DataPort; Keynote Speaker, IEEE Cloud Summit 2025; Peer Reviewer for IEEE PAMI, Elsevier JPDC, PLOS ONE.
Founder, The Green AI Initiative — sustainable AI & energy-efficient systems; RAG prototypes on AWS Bedrock. 22+ publications, 1 book, 1 patent (sparse neural architectures), 7 research chapters. Recent: JoVE 2026, Springer IJIT 2026, ACL NLP4DH 2025.

Education

Doctor of Business Administration — Energy-Efficient AI · Golden Gate University

In Progress

M.S., Management Information Systems · University of Memphis

Grade: A+

B.Tech., Electrical & Electronics Engineering · Jawaharlal Nehru Technological University

Certifications

AWS Cloud Practitioner · Azure Data Engineer Associate · Databricks Certified Data Engineer · GCP Professional Data Engineer · Snowflake SnowPro Core · Oracle Java SE 6 Professional · Data Analytics in Technology · Alteryx