Our major public sector client is currently recruiting for a Data Wrangler on a 6-month contract basis, paying £550 per day inside IR35.
The Role: Data Wrangler
Contract Length: 6 months
Location: Fully remote (WFH)
Pay Rate to Candidate: up to £550 per day
Minimum Requirement: we will not consider anyone who does not have essential experience in Databricks, Python, and R/RStudio
It is expected that the following general outputs/outcomes will be delivered:
1. Continued support for TRE customers, including ongoing support for:
• A safe setting that holds data securely such that individual-level data cannot be exported without explicit permission.
• A security design with the facility for independent oversight, such that audit reports can be reviewed by a patient/public oversight group and made public.
• Systems to allow secure remote access by researchers to carry out analysis with the ability to keep track of researcher activity.
• A transparent and controlled process for adding users to a shared analytical environment/collaboration space, with levels of access control fully in place.
• A collaborative environment with the ability to share queries, code and result sets within collaboration projects, regardless of organisational boundaries.
• The ability to bring one's own/local datasets into the research environment and use them alongside national datasets.
• The availability of a standard set of analytical tools.
• The ability to support the export of data, subject to permissions.
2. Delivery of additional datasets as defined by the Infection and AMR TRE roadmap
3. Collaboration and cross-TRE support for other customers as required, working alongside other TRE colleagues to prioritise and deliver new customer requirements.
The following activities are required from a Data Wrangler:
1. Curate data from multiple datasets and prepare it for analysis by others via a dashboard presentation
2. Organise a working research structure within the TRE service environment for practical and easy use, supporting users undertaking research
3. Carry out technical validation checks on the linked data sources (e.g. duplicates, linkage errors)
4. Identify appropriate existing code lists and algorithms and apply them to derive a set of priority variables from the linked datasets
5. Write, organise and curate support documentation for the linked data resources (e.g. data dictionaries, variable mapping tables, data access process documentation, Git repositories)
6. Anticipate, communicate and solve any potential problems that may arise with data curation for various research projects and use cases
7. Be the point of contact for researchers and clinicians to address queries about how to work with the linked data resources
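As a rough illustration of activity 3, the kind of technical validation check involved (duplicate and linkage-error detection) might look like the minimal sketch below in plain Python. The field names (`patient_id`) and sample records are hypothetical; in the TRE itself such checks would more likely be written against PySpark DataFrames in Databricks.

```python
from collections import Counter

def find_duplicates(records, key):
    """Return key values that occur more than once (possible duplicates)."""
    counts = Counter(r[key] for r in records)
    return {k: n for k, n in counts.items() if n > 1}

def find_unlinked(left, right, key):
    """Return key values present in `left` but absent from `right`
    (candidate linkage errors between two linked datasets)."""
    right_keys = {r[key] for r in right}
    return sorted({r[key] for r in left} - right_keys)

# Hypothetical sample data, for illustration only.
admissions = [{"patient_id": "A1"}, {"patient_id": "A2"}, {"patient_id": "A2"}]
demographics = [{"patient_id": "A1"}]

print(find_duplicates(admissions, "patient_id"))               # {'A2': 2}
print(find_unlinked(admissions, demographics, "patient_id"))   # ['A2']
```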
The main tool used by researchers within the TRE is Databricks (hosted on AWS), an analysis platform that uses Python and a bespoke version of SQL in a notebook-style coding interface. Experience in Python (especially Jupyter notebooks) and SQL is preferred; experience with other Apache tools and software, such as Apache Spark and PySpark, would also be of interest, as would familiarity with R, the tidyverse, and ODBC database connections. We will not consider anyone who does not have the requisite experience in Databricks and R/RStudio.
The Data Wrangler role requires significant data management and manipulation expertise, with a background in one of bioinformatics, biostatistics, computer science, mathematics or statistics, along with knowledge of commonly used terminologies in health data, such as ICD-10 and SNOMED. The successful candidate will be experienced in preparing data extracts for analysis by others, working closely with end users. They will have strong organisational skills, an ability to work accurately with attention to detail, and the ability to work independently and organise their own workload.
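To give a flavour of working with code lists against terminologies such as ICD-10 (activity 4 above), the sketch below flags records whose diagnosis code matches a code list. The field names, flag name and sample rows are invented for the example; real TRE work would apply curated, validated code lists, typically in PySpark or SQL rather than plain Python.

```python
def derive_flag(records, code_field, code_prefixes, flag_name):
    """Add a boolean flag to each record when its code starts with any
    prefix in the supplied (hypothetical) code list."""
    prefixes = tuple(code_prefixes)
    for r in records:
        r[flag_name] = str(r.get(code_field, "")).startswith(prefixes)
    return records

# Invented example: flag sepsis-related ICD-10 codes (A40/A41 prefixes).
rows = [{"diag_code": "A41.9"}, {"diag_code": "J18.1"}]
derive_flag(rows, "diag_code", ["A40", "A41"], "sepsis_flag")
print([r["sepsis_flag"] for r in rows])  # [True, False]
```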