Data Engineer
On-site
Job Description
Support a program that leverages integrated discrete technologies to support massive data processing, storage, modeling, and analytics over several thousand unique data sources, in order to perform threat identification and analysis and to support efforts to meet tactical and strategic goals. The data platform capability comprises the technologies and systems, the data, data processing and modeling, and use of the data via data science and querying of the data corpus and model(s) to derive insights. The data platform capability serves as the backbone for other capabilities (e.g., web applications) to accelerate its operations.
Responsibilities
Develop and use data processing technologies (e.g., Python, Spark, Java, SQL, Jenkins, PyPI, Terraform, Cloudera, Elasticsearch, Pentaho, Apache NiFi, Apache Hop) to perform data processing and to develop, validate, and use methodologies to support analytic requirements in clustered computing environments.
a. Support downstream systems and capabilities of external Customer organizations dependent on the data platform via various approaches, to include application programming interfaces (APIs).
b. On an ongoing basis, develop integration plans that capitalize on new data processing, modeling, and storage technologies, including cloud environments.
c. Evaluate data collections to assess their potential value-add to the Customer's data platform and make recommendations to the Customer.
d. Generate assessments about data, support activities to perform data acquisition and engineering, and enable the processing of data so it is integrated into data platform systems for maximum value.
e. Perform and support data modeling and engineering activities for integration of new data into the data platform's data corpus, refining existing models and intermediate models to address deficiencies and defects and, with Customer oversight, creating new models and data feeds to support existing and new analytic methodologies.
Job Requirements
Required Skills
- Python
- Spark
- Java
- SQL
- Jenkins
- PyPI
- Terraform
- Cloudera
- Elasticsearch
- Pentaho
- Apache NiFi
- Apache Hop
- Ability to perform data processing and to develop, validate, and use methodologies to support analytic requirements in clustered computing environments.
- Ability to perform and support data modeling and engineering activities for integration of new data into the data platform's data corpus, refining existing models and intermediate models to address deficiencies and defects and, with Customer oversight, creating new models and data feeds to support existing and new analytic methodologies.
Desired Skills
- Demonstrated experience using Enterprise Control Language (ECL) and the LexisNexis High-Performance Computing Cluster (HPCC) platform.
- Demonstrated experience performing All-Source data analysis to provide analytic support to the Sponsor.
- Demonstrated experience developing custom algorithms to support analytic requirements against massive data stores supporting the Sponsor.
- Demonstrated experience directly supporting the Sponsor by performing technical analysis using massive data processing systems.
- Demonstrated experience writing cables.
- Demonstrated experience planning and coordinating program activities such as installation and upgrading of hardware and software, utilization of cloud services, programming or systems design development, modification of IT networks, or implementation of Internet and intranet sites.
- Demonstrated experience deploying web applications to a cloud managed environment to include DevOps and security configuration management.
- Demonstrated experience developing, implementing, and maintaining cloud infrastructure services such as EC2, ELB, RDS, S3, and VPC.
- Demonstrated experience planning, coordinating, and executing the required activities to support documentation to meet the Sponsor’s data compliance requirements (e.g., legal, data policy).
- Degree(s)
- Undergraduate degree in mathematics, computer science, engineering, or similar scientific or technical discipline.
- Graduate degree in computer science, information systems, engineering, or another scientific or technical discipline.
- Degree or equivalent in CS, MIS, Economics, Physics, Genetics, or an Engineering-related field, especially one related to supercomputing.