Role & Responsibilities: Data Warehouse and Business Intelligence Engineering
To lead, oversee, and guide Data Integration, ETL, and Data Pipeline Engineering activities for end-to-end business solutions, ensuring high-performance, scalable, and reliable data movement across on-premises, cloud, and hybrid architectures using batch, API, streaming, or microservices-based approaches. The role is critical to automating, optimizing, and modernizing data integration workflows while ensuring data quality, governance, and observability.
Strategic Leadership & Governance
- Enterprise Data Integration Strategy: Drive end-to-end data pipeline architecture across batch, real-time streaming, API-based, and cloud-native integrations.
- Multi-Cloud & Hybrid Data Architecture: Design scalable, flexible, and fault-tolerant data integration strategies spanning on-prem, Hadoop, and GCP (BigQuery, Cloud Storage, Pub/Sub, Dataflow, Dataproc).
- Vendor & Stakeholder Management: Collaborate with Data Engineers, BI Developers, Cloud Engineers, and Vendor Partners to ensure SLA compliance and optimal data flow management.
Big Data, Hadoop & NoSQL Integration
- Hadoop Ecosystem Mastery: Apply deep expertise in HDFS, Hive, Spark, Impala, HBase, Kafka, Oozie, and Sqoop.
- Optimized Data Processing: Implement distributed computing models for massive-scale ETL & analytics workloads.
- Data Lake & Lakehouse Optimization: Architect data ingestion pipelines for structured, semi-structured, and unstructured data into Delta Lake, Iceberg, or BigQuery (a PySpark ingestion sketch follows this list).
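The ingestion pattern above can be sketched in PySpark: read semi-structured JSON from a landing zone, apply light curation, and append a date-partitioned table. This is a minimal sketch under assumptions; the HDFS paths and column names (`event_ts`, `event_id`) are illustrative, and the Parquet sink can be swapped for Delta Lake or Iceberg where those formats are available on the cluster.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("raw-to-lake-ingest").getOrCreate()

# Read semi-structured JSON from an illustrative HDFS landing path
raw = spark.read.json("hdfs:///landing/events/dt=2024-01-01/")

# Light curation: derive a partition date and drop duplicate events
curated = (
    raw.withColumn("event_date", F.to_date("event_ts"))
       .dropDuplicates(["event_id"])
)

# Append as date-partitioned Parquet; use .format("delta") for Delta Lake
(curated.write
        .mode("append")
        .partitionBy("event_date")
        .parquet("hdfs:///lake/curated/events/"))
```

Partitioning by the derived date column keeps downstream reads pruned to the days they actually need, which matters at the "massive-scale ETL" volumes this role targets.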
API-Based Data Integration
- Microservices & API Integration: Develop high-performance API-based ETL solutions using REST, gRPC, GraphQL, and WebSockets for real-time data exchange (a paginated REST extraction sketch follows this list).
- HBase & NoSQL API Integration: Enable low-latency API access to HBase, Cassandra, and DynamoDB for high-throughput operational analytics.
- Data Federation & Virtualization: Implement Federated Queries and Data Virtualization for seamless cross-platform data access.
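As a minimal sketch of the REST-based extraction pattern, the snippet below pages through a hypothetical endpoint with `requests`; the URL, bearer-token auth, and `page`/`per_page` pagination parameters are assumptions that would come from the actual service contract.

```python
import requests

BASE_URL = "https://api.example.com/v1/orders"  # hypothetical endpoint

def fetch_all(session, page_size=500):
    """Yield every record from a paginated REST endpoint, page by page."""
    page = 1
    while True:
        resp = session.get(
            BASE_URL,
            params={"page": page, "per_page": page_size},
            timeout=30,
        )
        resp.raise_for_status()   # fail fast on HTTP errors
        batch = resp.json()
        if not batch:             # an empty page signals the end
            return
        yield from batch
        page += 1

with requests.Session() as session:
    session.headers.update({"Authorization": "Bearer <token>"})
    for record in fetch_all(session):
        pass  # transform and load into the target store (HBase, BigQuery, etc.)
```

Using a generator keeps memory flat regardless of result size, so the same loader works for both small reference feeds and high-volume operational extracts.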
Real-Time Streaming & Event-Driven Architecture
- Enterprise Streaming Pipelines: Design & optimize Kafka, Flink, Spark Streaming, and Pub/Sub for real-time data ingestion and transformation.
- Event-Driven ETL Pipelines: Enable Change Data Capture (CDC) and event-based data processing for real-time decision-making (a CDC consumer sketch follows this list).
- Kafka Integration: Develop high-throughput, scalable Kafka pipelines with Kafka Connect, Schema Registry, and KSQL.
- HBase Streaming: Leverage HBase + Kafka for low-latency, high-volume event ingestion & querying.
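A minimal consumer for the CDC pattern above, using the `kafka-python` client. The topic name, group id, and Debezium-style `op` field are assumptions about the upstream feed; offsets are committed manually so they only advance after a change is safely applied to the serving store.

```python
import json
from kafka import KafkaConsumer  # kafka-python client

consumer = KafkaConsumer(
    "orders.cdc",                                   # assumed CDC topic
    bootstrap_servers="broker:9092",
    group_id="etl-cdc-sink",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
    enable_auto_commit=False,                       # commit only after apply
)

for msg in consumer:
    event = msg.value
    op = event.get("op")  # Debezium-style: "c"=create, "u"=update, "d"=delete
    if op == "d":
        pass  # delete the row in the serving store (e.g., HBase)
    else:
        pass  # upsert the row in the serving store
    consumer.commit()     # advance the offset once the change is applied
```

In production this sits behind Kafka Connect and Schema Registry as noted above; the manual-commit loop is what gives the pipeline at-least-once delivery into the sink.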
Cloud Data Engineering & GCP Capabilities
- BigQuery Optimization: Leverage partitioning, clustering, and materialized views for cost-effective and high-speed queries (a DDL sketch follows this list).
- ETL & Orchestration: Develop robust ETL/ELT pipelines using Cloud Data Fusion, Apache Beam, Dataflow, and Airflow.
- Hybrid Cloud & On-Prem Integration: Seamlessly integrate Hadoop-based Big Data systems with GCP, on-premises databases, and legacy BI tools.
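The partitioning and clustering point can be made concrete with a short DDL run through the `google-cloud-bigquery` client; the project, dataset, and table schema below are illustrative assumptions.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-analytics-project")  # illustrative project

# Date-partitioned, clustered table: the partition prunes scans by day,
# the cluster key co-locates rows for selective user-level queries.
ddl = """
CREATE TABLE IF NOT EXISTS analytics.events (
  event_id STRING,
  user_id  STRING,
  event_ts TIMESTAMP,
  payload  JSON
)
PARTITION BY DATE(event_ts)
CLUSTER BY user_id
"""

client.query(ddl).result()  # blocks until the DDL job completes
```

Because BigQuery bills most queries by bytes scanned, partition pruning plus clustering is usually the single biggest cost and latency lever available at the table-design stage.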
BI DevOps, Automation & Innovation
- BI DevOps & Continuous Delivery: Implement CI/CD pipelines to accelerate BI feature releases, ETL deployments, and dashboard updates.
- Data Observability & Quality Monitoring: Ensure end-to-end monitoring of data pipelines, anomaly detection, and real-time alerting (a volume anomaly check sketch follows this list).
- AI/ML Integration for BI: Apply predictive analytics and AI-driven insights to enhance business intelligence and reporting.
- Bottleneck Identification & Resolution: Proactively identify and eliminate performance issues in Hadoop clusters, ETL pipelines, and BI reporting layers.
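One simple observability check of the kind described above: compare today's load volume against a rolling baseline and flag outliers. The z-score threshold, minimum history length, and sample counts are illustrative assumptions; in practice the history would come from pipeline run metadata.

```python
import statistics

def is_anomalous(today_count, history, z_threshold=3.0):
    """Flag a load whose row count deviates sharply from recent history."""
    if len(history) < 7:
        return False                          # not enough baseline yet
    mean = statistics.mean(history)
    stdev = statistics.stdev(history) or 1.0  # guard against zero variance
    return abs(today_count - mean) / stdev > z_threshold

# Illustrative: last seven daily row counts vs. today's short load
history = [10_120, 9_980, 10_305, 10_050, 9_870, 10_210, 10_015]
print(is_anomalous(4_200, history))           # True -> raise an alert
```

Checks like this run after each pipeline stage and feed the real-time alerting called out above, catching silent data loss before it reaches BI dashboards.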