Data Engineer
Austin, Texas
Contract to Hire Position
Job Id #43083 Posted August 27, 2020


  • We are seeking a Data Engineer who is eager to tackle the challenges of processing vast amounts of EHR data originating from multiple sources.
  • You will need to develop a deep understanding of the data and drive efforts to maintain and improve data quality and usability.
  • You should understand the importance and value of writing maintainable, documented, and well-tested code throughout the entire product lifecycle.
  • Above all, you should be curious about what is possible in healthcare with the right tools and infrastructure.

Projects the candidate will be working on:

  • Combine two of the fastest-growing fields on the planet with a culture of performance, collaboration and opportunity and this is what you get.
  • Leading edge technology in an industry that’s improving the lives of millions.
  • Here, innovation isn’t about another gadget, it’s about making health care data available wherever and whenever people need it, safely and reliably.
  • There’s no room for error. Join us and start doing your life’s best work.(sm)

Primary Responsibilities:

  • Design and maintain data pipelines and services using best practices for data management and governance
  • Deploy machine learning and NLP applications in production
  • Work with EHR data across teams of ETL and NLP engineers, data scientists, researchers and clinicians to provide data services that meet high data quality control standards
  • You’ll be rewarded and recognized for your performance in an environment that will challenge you and give you clear direction on what it takes to succeed in your role as well as provide development for other roles you may be interested in

Team and Team size:

  • Small core NLP Team
  • 13 core team members (data scientists, project manager, medical informaticists), with support from 12 clinical annotators integrated into the team via a vendor contractor

Top Responsibilities:

  • Programming experience, including solid Python experience, following software engineering best practices
  • Experience building and maintaining data pipelines and data assets
  • Experience with distributed data processing frameworks such as Spark or MapReduce
  • Experience as an individual contributor, hands-on developer, non-manager role executing on engineering projects as a primary job responsibility
  • Demonstrated knowledge of data management best practices
  • Prioritization skills; ability to manage ad-hoc requests in parallel with ongoing projects

Software tools/skills:

  • This data engineer will maintain, and if necessary re-architect, our data pipelines, which ingest notes from a set of text files delivered to us on a shared drive, move them over to HDFS, do some normalization (convert HTML to plain text, etc.) and load them into Hive tables
  • Sqoop some CDR tables (like MDM) from Oracle
  • Schedule and run various NLP “apps” developed by data scientists
  • The engineer will interface with the ProdOps team (which delivers the notes to us as text files), with the CDR BE team (NLP2Panther) and with others such as dCDR and Life Sciences engineering.
  • The engineer will also be responsible for good data management practices (for example, making sure we can efficiently retire data from H-groups that need to be retired).
  • Currently the main technologies we are using are Spark, Hadoop, Hive, Luigi, Python (and a little bit of Scala) and the platform we use is the on-prem Hadoop cluster. We need to make sure the candidate is solid with at least some of these technologies, and follows good engineering practices (such as testing, code reviews and putting in place monitoring systems like dashboards or alerts).
  • Additional skills that would be good to have: cloud (since there is a push for OA to move to AWS, nothing says we will stay on the on-prem cluster forever), and Elasticsearch (we need to build and keep up to date Elastic indices that allow users external to our group to search the notes).
  • Familiarity with containers might also be good to have.
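
For candidates who want a concrete picture of the normalization step mentioned above (converting HTML notes to plain text before loading them into Hive), here is a minimal sketch in Python using only the standard library. It is an illustration, not the team's actual code: the function name `html_to_text`, the helper class, and the choice of which tags count as block-level are all assumptions made for this example.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes from HTML, skipping <script> and <style> bodies."""

    _SKIP = {"script", "style"}
    # Hypothetical choice of tags that should introduce a line break.
    _BLOCK = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1
        elif tag in self._BLOCK:
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self._BLOCK:
            self._parts.append("\n")

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._parts.append(data)

    def text(self):
        return "".join(self._parts)


def html_to_text(raw: str) -> str:
    """Return the visible text of an HTML note, one non-empty line per block."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    # Collapse runs of whitespace (including non-breaking spaces) per line.
    lines = (re.sub(r"\s+", " ", line).strip() for line in parser.text().split("\n"))
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    note = "<p>Patient&nbsp;reports <b>mild</b> pain.</p><script>x()</script>"
    print(html_to_text(note))
```

In a Spark pipeline, a function like this would typically be applied per note (for example through a UDF or a map over an RDD) before the result is written to a Hive table; the exact wiring depends on the existing jobs.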


  • Python
  • Spark
  • Data pipeline experience

Preferred Qualifications:

  • Experience running machine learning or NLP applications at scale
  • Experience with data pipeline frameworks such as Airflow, Luigi or Oozie
  • Experience with search engines (Elasticsearch or Solr)
  • Experience with cloud-based computing (AWS or Azure)
  • Experience with Scala, in particular with Spark Scala API
  • Familiarity with EHR data and standards (HL7 or FHIR)
  • Experience with HBase or other non-relational databases
  • Experience with code and process documentation
  • Experience with explaining, educating, presenting and/or training non-engineers on engineering concepts and processes
  • Experience with continuous integration and delivery
  • Experience with ETL

Horizontal is proud to be an Equal Opportunity and Affirmative Action Employer. We seek to provide employment opportunities to talented, qualified candidates regardless of race, color, sex/gender including gender identity and/or expression, national origin, religion, sexual orientation, disability, marital status, citizen status, veteran status, or any other protected classification under federal, state or local law.

In addition, Horizontal will provide reasonable accommodations for qualified individuals with disabilities. If you need to request a reasonable accommodation in order to complete the application or interview process, please contact

All applicants applying must be legally authorized to work in the country of employment.


Horizontal is proud to be an Equal Employment Opportunity/Affirmative Action Employer providing a drug-free workplace.

