Data Engineer (OnSite)
Detroit Labs OnSite is looking for experienced Data Engineers to join one of our existing development teams in Detroit. Successful applicants will work at a client site within our clients’ teams and play an important role in developing software used by the masses.
As a full-time member of our OnSite team, you’re part of the Detroit Labs family first. You’ll have access to ongoing career development and mentorship, and be part of a great group of people who are passionate about never settling for less than their best. We prioritize your career growth by providing consistent check-ins, learning activities, yearly retros, and support throughout your time here to ensure you are always growing and working on something you are passionate about.
The Data Engineer for Big Data is responsible for the full life cycle of back-end development for a data platform. As a Data Engineer, you will create new data pipelines, database architectures, and ETL processes, and you will observe current practices and recommend the go-to methodology. You will gather requirements, perform vendor and product evaluations, and deliver solutions, training, and documentation. You will also handle the design, development, tuning, deployment, and maintenance of information, advanced data analytics, and physical data persistence technologies.
You will establish the analytic environments required for structured, semi-structured, and unstructured data. You will implement business requirements and business processes, build ETL configurations, create pipelines for the Data Lake and Data Warehouse, research new technologies, and build proofs of concept around them. You will carry out monitoring, tuning, and database performance analysis. You will design and extend data marts, metadata, and data models. You will also ensure all data platform architecture code is maintained in a version control system.
You will be responsible for sharing knowledge with fellow team members, allowing the entire team to grow and become proficient enough to further build out and enhance the data platform.
- Minimum two years' experience with big data tools: Hadoop, Spark, Kafka, NiFi, Hive, Sqoop
- Minimum two years' experience with AWS cloud services: EC2, S3, EMR, RDS, Redshift, Athena, Glue
- Minimum two years' experience with stream-processing systems: Spark Streaming, Kafka Streams, Flink
- Minimum three years' experience with object-oriented/functional scripting languages: Java (preferred), Python, Scala
- Expertise in designing and developing platform components such as caching, messaging, event processing, automation, transformation, and tooling frameworks
- Minimum two years' experience with relational SQL and NoSQL databases such as MySQL, Postgres, Cassandra, and Elasticsearch
- Minimum two years' experience working in a Linux environment
- Demonstrated ability to performance-tune MapReduce jobs
- Demonstrated ability to work independently as well as with a team
- Ability to troubleshoot problems and quickly resolve issues
- Strong communication skills
- Focus on scalability, performance, service robustness, and cost trade-offs
- Design and implement high-volume data ingestion and streaming pipelines using Apache Kafka and Apache Spark
- Create prototypes and proofs of concept for iterative development
- Develop ETL processes to populate a Data Lake with large datasets from a variety of sources
- Create MapReduce programs in Java, and leverage tools like AWS Athena, AWS Glue, and Hive to transform and query large datasets
- Monitor and troubleshoot performance issues on the enterprise data pipelines and the Data Lake
- Follow the design principles and best practices defined by the team for data platform techniques and architecture
This is a full-time salaried role complete with the following benefits:
- Full medical, dental, and vision coverage
- 401k matching
- Coin + Craft yearly personal improvement budget
- Ongoing mentorship and learning opportunities
- Paid vacation
- Monthly outings and events
- Volunteer opportunities
- Yearly Retro