BHANU PRAKASH REDDY RELLA

Senior / Staff Data Engineer  ·  AI & ML Data Platforms  ·  IEEE Senior Member

Senior Data Engineer with 10+ years building cloud-native data platforms, streaming pipelines, and AI/ML-ready infrastructure at Meta (Ads), Walmart Global Tech, and TCS (Citibank). Expert in Python ETL on Spark/Hive, Kafka + Structured Streaming, and lakehouse architectures on Databricks, Snowflake, and BigQuery across GCP, AWS, and Azure. Architect of streaming-first systems processing 100B+ records/month and 5+ TB/day, with a track record of 40% faster ETL and 30–80% runtime reduction. Production experience with LangChain-based LLM apps, Kubernetes, and Terraform. IEEE Senior Member; Founder, The Green AI Initiative; patent holder; 22+ publications.

Core Skills
Lakehouse & Warehousing
Databricks (Delta Lake, Unity Catalog, Delta Sharing, Databricks SQL), Snowflake (Snowpipe, Snowpark), BigQuery, Redshift, Synapse, Iceberg, Hive Metastore
Streaming & Distributed
Apache Kafka, Apache Spark, PySpark, Spark Structured Streaming, Spark Streaming, Apache Flink, Hive, Presto/Trino, Hadoop/HDFS
AI/ML & LLM Apps
Feature stores, training-data pipelines, MLflow, MLOps, PyTorch, LangChain, RAG, embedding/semantic search, vector DBs (pgvector, Pinecone), Llama, Claude, OpenAI, Hugging Face, AWS Bedrock
Cloud Platforms
GCP (Dataproc, GCS, Pub/Sub, GKE) · AWS (S3, EMR, Glue, Athena, EKS) · Azure (ADF, ADLS, Synapse, AKS)
DevOps & Infrastructure
Kubernetes (GKE, EKS, AKS, Helm), Terraform, Docker, Git, GitHub Actions, Jenkins, Maven, environment promotion (dev/stage/prod)
Orchestration & Transform
dbt, Apache Airflow, Databricks Workflows, Dagster, Automic, Oozie, Great Expectations
Programming
Python (pandas, NumPy, PySpark), advanced SQL (CTEs, window functions, partition pruning), Scala, Java, Shell
Governance & BI
Data contracts, schema versioning, lineage, anomaly/drift monitoring, PII/PHI, SOX, GDPR · Power BI, Tableau, Looker
Professional Experience
Senior Data Engineer
Jul 2025 — Present
Meta Platforms, Inc. · Menlo Park, CA
  • Architected Databricks Lakehouse pipelines on Azure & AWS consolidating advertising measurement, engagement, and experimentation datasets — near real-time analytics for 500+ internal stakeholders, improving data availability 35%.
  • Engineered PySpark & Spark Structured Streaming workflows processing 8B+ ad interaction events monthly — reduced pipeline latency from 4–5 hours to under 20 minutes and improved campaign attribution accuracy.
  • Developed metadata-driven ETL frameworks (Databricks, Delta Lake, Kafka, Airflow) with schema validation, partition pruning, Z-Ordering, OPTIMIZE/VACUUM, liquid clustering, salting, and reusable PySpark UDFs — reduced operational overhead 40%.
  • Implemented agentic, AI-assisted data observability using LangChain and RAG over runbooks, metric definitions, and lineage metadata, orchestrating OpenAI, Llama, and Claude models to automate root-cause analysis; cut incident investigation time 55% and saved ~24,000 engineering hours annually.
  • Built scalable feature-engineering pipelines supporting PyTorch-based ML models for lift measurement and advertiser optimization — accelerating experimentation cycles across business units.
  • Established enterprise governance with Unity Catalog, Delta Sharing, and automated data-quality controls, reducing critical data-quality incidents 30%; delivered cloud-native infrastructure via Terraform, Kubernetes, and ADLS.
Stack: Databricks · Delta Lake · PySpark · Spark Structured Streaming · Kafka · Airflow · Unity Catalog · PyTorch · OpenAI · LangChain · Terraform · Kubernetes · SQL
Lead Data Engineer
Dec 2022 — Jun 2025
Walmart Global Tech · Sunnyvale, CA
  • Led modernization of Walmart's supplier analytics platform on Databricks Lakehouse, Apache Iceberg, and Delta Lake — scalable processing of 100B+ records/month, reducing reporting latency from over 24 hours to under 6 hours.
  • Directed Kafka + Spark Structured Streaming pipelines ingesting 5+ TB of supply-chain & inventory data daily, providing real-time visibility across 1,500+ suppliers serving 240M+ weekly customers globally.
  • Designed unified enterprise data architecture across Databricks, BigQuery, Snowflake, and GCS — improved accessibility and reduced duplicate data movement across business domains.
  • Implemented Infrastructure as Code with Terraform & Kubernetes on GCP/Azure — cut environment deployment time from days to under 4 hours.
  • Optimized PySpark/SQL pipelines via partitioning, caching, Delta optimization, and workload tuning — 35% lower compute cost, up to 80% runtime reduction, improved SLA compliance.
  • Drove code-quality governance (SonarQube, automated testing, CI/CD) — 10%+ quality-score lift; established dbt ELT frameworks with Great Expectations validation.
  • Mentored a team of engineers through architecture reviews and cloud-modernization programs — saving ~24,000 engineering hours annually and strengthening engineering best practices.
Stack: Databricks · Delta Lake · Iceberg · Unity Catalog · PySpark · Spark Streaming · Kafka · Kubernetes · Terraform · BigQuery · Snowflake · dbt · Airflow · GCP
Senior Data Engineer
Dec 2020 — Nov 2022
Tata Consultancy Services · Client: Citibank · Irving, TX
  • Designed cloud-native banking data platforms using Azure Data Factory, ADLS Gen2, Databricks, and Synapse Analytics — scalable ingestion of regulatory and financial datasets for enterprise reporting under SOX/PCI-DSS controls.
  • Developed Databricks ETL (PySpark + Delta Lake) for high-volume transaction data — reduced end-to-end processing from 8 hours to under 5 hours while improving reliability for risk and compliance workloads.
  • Migrated legacy warehouses to Snowflake & Azure, cutting storage costs 35% and improving query performance for BI users; built reusable ingestion frameworks integrating APIs, databases, and financial apps into Azure Data Lake.
  • Implemented data quality, lineage, and governance with Great Expectations and cloud-native monitoring — strengthening SOX/PCI-DSS compliance; delivered Power BI dashboards for fraud detection and operational reporting; automated provisioning via Terraform & Azure DevOps.
Stack: Azure Data Factory · ADLS Gen2 · Synapse · Databricks · PySpark · Delta Lake · Snowflake · Terraform · Azure DevOps · Power BI
Data Engineer
Oct 2016 — Nov 2020
Tata Consultancy Services · Client: Citibank · Hyderabad, India
  • Built enterprise-scale ETL with Python, Spark, Hadoop, Hive, and Kafka processing banking transaction and customer datasets — improving data availability for analytics, compliance, and reporting.
  • Developed real-time streaming pipelines with Kafka, Spark Streaming, and Apache Flink — reducing batch dependencies and enabling faster operational insights.
  • Implemented ingestion frameworks using Apache NiFi, Sqoop, and HDFS to automate movement of structured and semi-structured data; optimized Spark workloads via partitioning and resource tuning — 25% faster execution and improved cluster utilization.
Stack: Python · Spark · Spark Streaming · Flink · Kafka · NiFi · Sqoop · HDFS · Hive · Hadoop
Data Analyst
Aug 2015 — Sep 2016
Flipkart · Hyderabad, India
  • Analyzed customer behavior, sales trends, and product performance (SQL, Excel) to drive merchandising and business-growth initiatives; designed dimensional data models and ETL workflows for e-commerce reporting, and built dashboards for inventory, order-fulfillment, and customer-engagement metrics.
Leadership, Research & Recognition
Education
Doctor of Business Administration — Energy-Efficient AI · Golden Gate University
In Progress
M.S., Management Information Systems · University of Memphis
Grade: A+
B.Tech., Electrical & Electronics Engineering · Jawaharlal Nehru Technological University
Certifications

AWS Cloud Practitioner · Azure Data Engineer Associate · Databricks Certified Data Engineer · GCP Professional Data Engineer · Snowflake SnowPro Core · Oracle Java SE 6 Professional · Data Analytics in Technology · Alteryx