Come and change the world of AI with the Kumo team!
Companies spend millions of dollars to store terabytes of data in data lakehouses, yet leverage only a fraction of it for predictive tasks. This is because traditional machine learning is slow and time-consuming: feature engineering, building training pipelines, and reaching acceptable performance can take months.
At Kumo, we are building a machine learning platform for data lakehouses that enables data scientists to train powerful Graph Neural Network (GNN) models directly on their relational data, using only a few lines of declarative syntax known as the Predictive Query Language. The Kumo platform enables users to build models a dozen times faster and achieve better model accuracy than traditional approaches.
We're seeking intellectually curious and highly motivated Data Engineers to become foundational members of our Machine Learning and Data Platform team.
Your Foundation:
- 1+ years of professional experience at SaaS/enterprise companies
- Strong experience with data ingestion and connectors
- Experience building end-to-end, production-grade data solutions on AWS or GCP
- Experience building scalable ETL pipelines
- Ability to plan effective data storage, security, sharing, and publishing within an organization
- Experience developing batch ingestion and data transformation routines using ETL tools
- Familiarity with AWS services such as S3, Kinesis, EMR, Lambda, Athena, Glue, IAM, and RDS
- Proficiency in several programming languages (Python, Scala, Java)
- Familiarity with orchestration tools such as Temporal, Airflow, or Luigi
- Self-starter mindset, with the ability to structure complex problems and develop solutions
- Excellent communication skills, including the ability to explain the strengths and weaknesses of data and analytics solutions to both technical and senior business stakeholders
Your Extra Special Sauce:
- Deep familiarity with Spark and/or Hive
- Understanding of storage formats such as Parquet, Avro, Arrow, and JSON, and when to use each
- Understanding of schema design trade-offs such as normalization vs. denormalization
- Proficiency in Kubernetes and Terraform
- Azure, ADF, and/or Databricks skills
- Experience integrating, transforming, and consolidating data from various data systems into analytics solutions
- Good understanding of databases, SQL, ETL tools and techniques, data profiling, and data modeling
- Strong communication skills and client-engagement experience
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.