Company Description
Inductiv is a private, applied AI lab and product studio that develops AI-first solutions for enterprises. Our work combines a deep understanding of AI algorithms (including GenAI) with expertise in production systems engineering.
Role Description
We are looking for a skilled Data Engineer to design, build, and maintain the data pipelines and infrastructure that power our AI products. As one of our first data engineering hires, you’ll work closely with AI researchers, ML engineers, and backend developers to ensure that our models have access to clean, timely, and reliable data—ranging from structured enterprise records to unstructured text, documents, and logs.
You’ll own the end-to-end data lifecycle: from ingestion and transformation to storage, monitoring, and serving. Your work will directly enable training of LLMs, evaluation of AI agents, and real-time inference systems. If you’re passionate about building data systems that are both robust and intelligent, this role is for you.
Key Responsibilities:
- Design, implement, and maintain scalable, fault-tolerant data pipelines for batch and streaming workloads.
- Build and manage data ingestion systems for diverse sources (APIs, databases, file systems, SaaS platforms, and internal logs).
- Develop transformation logic using modern data processing frameworks (e.g., Spark, dbt, Pandas, Polars, or Ray).
- Architect and optimize data storage solutions—including data lakes (S3, GCS), warehouses (Snowflake, BigQuery, Redshift), and vector databases for AI use cases.
- Ensure data quality, lineage, and observability through monitoring, validation, and alerting systems.
- Collaborate with AI/ML teams to curate, version, and serve datasets for model training, fine-tuning, and evaluation.
- Implement metadata management, cataloging, and governance practices aligned with enterprise security and compliance needs.
- Optimize pipeline performance, cost, and latency—especially for large-scale GenAI data processing (e.g., document parsing, chunking, embedding generation).
- Contribute to CI/CD, infrastructure-as-code (Terraform, CloudFormation), and data testing frameworks.
Qualifications
Required Skills & Experience:
- 3+ years of experience in data engineering or a related role.
- Strong proficiency in Python and SQL; experience with data processing libraries (e.g., Pandas, PySpark, Dask).
- Hands-on experience building ETL/ELT pipelines using tools like Airflow, Prefect, Dagster, or Luigi.
- Deep knowledge of cloud data platforms (AWS, GCP, or Azure)—including object storage, compute, and managed services (e.g., Glue, Dataflow, EMR).
- Experience working with data warehouses (Snowflake, BigQuery, Redshift) and/or data lake table formats (Delta Lake, Iceberg, Hudi).
- Familiarity with infrastructure-as-code (Terraform, Pulumi) and containerization (Docker, Kubernetes).
- Solid understanding of data modeling, schema design, and performance tuning.
- Experience implementing data quality and monitoring solutions (e.g., Great Expectations, Soda, Monte Carlo).
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
Nice-to-Have (Bonus Skills):
- Experience supporting ML/AI workflows, e.g., feature stores, dataset versioning (DVC, lakeFS), or vector data pipelines.
- Familiarity with unstructured data processing (PDFs, HTML, OCR, embeddings) and text preprocessing at scale.
- Knowledge of real-time streaming systems (Kafka, Kinesis, Pub/Sub) and stream processing (Flink, Spark Structured Streaming).
- Exposure to privacy-aware data handling (PII detection, anonymization, GDPR/CCPA compliance).
- Contributions to open-source data tools or a strong public portfolio (GitHub, blog, talks).
- Understanding of MLOps concepts and how data pipelines integrate with model training and serving.
Why Join Inductiv?
- Build the data foundation for cutting-edge AI products that solve real enterprise problems.
- Work alongside world-class AI researchers and engineers in a collaborative, mission-driven environment.
- Own critical infrastructure from day one—with high impact and visibility.
- Tackle complex challenges at the intersection of data engineering, AI scalability, and system reliability.
- Competitive salary, growth opportunities, and a dynamic work environment.
Ready to engineer the data backbone of the next generation of enterprise AI?
At Inductiv, data isn’t just fuel—it’s the architecture of intelligence. If you’re excited to build systems that make AI trustworthy, scalable, and actionable, we’d love to hear from you.