Data Engineer - Web Scraping

United States, Alpharetta · 50000.00 - 75000.00 USD / Year · Job Posted May 27, 2026

Job Responsibility

  • Creating and managing website scraping configurations on web scraping tools
  • Monitoring scraping configurations for potential errors and blockages
  • Monitoring data being scraped to identify potential issues and blockages
  • Coordinating with stakeholders to understand scraping task requirements and reporting issues
  • Preparing and sharing periodic scraping activity reports with stakeholders

Requirements

  • Minimum 3 years of experience working as a data collection and quality engineer
  • Should have hands-on experience working with third-party web scraping tools
  • Should have hands-on experience working with open-source web scraping libraries (e.g., Scrapy, Selenium)
  • Should be familiar with programming languages like Python and Go
  • Should be fluent in web technology concepts like HTML, DOM, CSS, and XPath
  • Should be familiar with the usage of regular expressions for data selection and cleaning purposes
  • Should be familiar with Windows and Linux operating systems and general networking concepts
  • Should be comfortable with speaking and understanding English
  • Should have good written and verbal communication skills
  • Should be a good team player with an ownership mindset

Similar Jobs for Data Engineer - Web Scraping

8 matching positions

Web Scraping / Data Acquisition Engineer

Wissen Technology is hiring for Web Scraping / Data Acquisition Engineer. We are...
Location: India, Mumbai
Salary: Not provided
Company: Wissen
Expiration Date: Until further notice
Requirements
  • Strong hands-on experience with Python
  • Proven experience in web scraping and crawler development
  • Proficiency with browser automation tools: Playwright, Scrapy, or equivalent
  • Experience with PDF extraction tools (pdfplumber, PyMuPDF, Apache Tika, etc.)
  • Strong understanding of HTML parsing, pagination handling, and automated file downloads
  • Knowledge of anti-bot techniques (rate limiting, proxy handling, session rotation)
  • Experience processing structured and semi-structured documents
Job Responsibility
  • Design and develop web crawlers to extract data from public websites
  • Crawl listing pages and extract case metadata (case title, number, court, date, etc.)
  • Download judgments and maintain structured PDF/document storage
  • Build automated pipelines to monitor websites and detect new judgments
  • Extract structured data from documents and HTML pages
  • Store data in structured formats suitable for downstream processing or search
  • Handle pagination, anti-bot measures, and data cleaning workflows
  • Maintain scrapers for reliability, accuracy, and long-term scalability
Employment type: Full-time

Software Engineer – Web Data Extraction & API Development

Sybrant Technologies has been at the forefront of transforming its customers int...
Salary: Not provided
Company: Sybrant Technologies
Expiration Date: Until further notice
Requirements
  • Strong proficiency in Python
  • Hands-on experience with web scraping tools (Requests, BeautifulSoup, Selenium, Scrapy)
  • Good understanding of HTML, DOM structure, XPath, and CSS selectors
  • Experience building REST APIs using FastAPI, Flask, or Django
  • Solid knowledge of SQL and relational databases (MySQL / PostgreSQL)
  • Experience handling proxies, cookies, headers, rate limits, and sessions
  • Familiarity with Git and basic CI/CD workflows
Job Responsibility
  • Develop and maintain web scraping scripts using Python (Requests, BeautifulSoup, Selenium, Scrapy)
  • Automate extraction workflows to ensure reliable and repeatable data collection
  • Handle anti-scraping mechanisms such as CAPTCHAs, rotating proxies, headers, and session management
  • Clean, transform, and load extracted data into internal databases
  • Design and build REST APIs to expose processed data from the database
  • Optimize scraping workflows for performance, reliability, and error handling
  • Monitor scraping jobs, troubleshoot failures, and ensure data freshness
  • Maintain documentation for scraping logic, API endpoints, and workflows
  • Collaborate with product and data teams to understand evolving data requirements

Web Scraping Engineer II

We are seeking a Web Scraping Engineer to join our growing engineering team. In ...
Location: India
Salary: Not provided
Company: YipitData
Expiration Date: Until further notice
Requirements
  • Effective communication in English with both technical and non-technical stakeholders
  • 3+ years of experience with web scraping frameworks (e.g., Selenium, Playwright, or Puppeteer)
  • Strong understanding of HTTP, RESTful APIs, HTML parsing, browser rendering, and TLS/SSL mechanics
  • Expertise in advanced fingerprinting and evasion strategies (e.g., browser fingerprint spoofing, request signature manipulation)
  • Deep experience managing cookies, headers, session states, and proxy rotations, including the deployment of both residential and data center proxies
  • Experience with logging, metrics, and alerting to ensure high availability
  • Troubleshooting skills to optimize scraper performance for efficiency, reliability, and scalability
Job Responsibility
  • Refactor and Maintain Web Scrapers: Overhaul existing scraping scripts to improve reliability, maintainability, and efficiency
  • Implement best coding practices (clean code, modular architecture, code reviews, etc.) to ensure quality and sustainability
  • Implement Advanced Scraping Techniques: Utilize sophisticated fingerprinting methods (cookies, headers, user-agent rotation, proxies) to avoid detection and blocking
  • Handle dynamic content, navigate complex DOM structures, and manage session/cookie lifecycles effectively
  • Collaborate with Cross-Functional Teams: Work closely with analysts and other stakeholders to gather requirements, align on targets, and ensure data quality
  • Provide support, documentation, and best practices to internal stakeholders to ensure effective use of our web scraped data in critical reporting workflows
  • Monitor and Troubleshoot: Develop robust monitoring solutions and alerting frameworks to quickly identify and address failures
  • Continuously evaluate scraper performance, proactively diagnosing bottlenecks and scaling issues
  • Drive Continuous Improvement: Propose new tooling, methodologies, and technologies to enhance our scraping capabilities and processes
  • Stay up to date with industry trends, evolving bot-detection tactics, and novel approaches to web data extraction
What we offer
  • Our compensation package includes comprehensive benefits, perks, and a competitive salary
  • We care about your personal life and we mean it. We offer vacation time, parental leave, team events, learning reimbursement, and more!
  • Your growth at YipitData is determined by the impact that you are making, not by tenure, unnecessary facetime, or office politics. Everyone at YipitData is empowered to learn, self-improve, and master their skills in an environment focused on ownership, respect, and trust
Employment type: Full-time

Data Engineer

Our client is seeking a skilled Data Engineer to support the design, development...
Location: United States, Miami
Salary: Not provided
Company: Robert Half
Expiration Date: Until further notice
Requirements
  • Bachelor’s degree in Computer Science, Information Systems, Engineering or related field
  • 3+ years of experience in a Data Engineer or similar role
  • Strong hands-on experience with SQL, Python and Snowflake
  • Experience developing and maintaining ETL workflows
  • Knowledge of data modeling concepts and best practices
  • Experience with Selenium for web automation or web scraping support
  • Strong analytical, problem-solving and troubleshooting skills
  • Ability to work independently and collaboratively in a team environment
Job Responsibility
  • Design, build and maintain scalable ETL pipelines to support data integration and reporting needs
  • Develop and optimize complex queries using SQL
  • Use Python to support data processing, transformation and automation tasks
  • Work within Snowflake to manage, transform and optimize cloud-based data solutions
  • Assist with automation efforts, including Selenium web scripting for web-based data extraction and process automation
  • Create and maintain logical and physical data models
  • Ensure data quality, integrity and consistency across multiple data sources
  • Collaborate with business stakeholders, analysts and technical teams to gather requirements and deliver data solutions
  • Monitor and troubleshoot data workflows and automation scripts
  • Document processes, workflows and technical specifications
What we offer
  • medical, vision, dental, and life and disability insurance
  • 401(k) plan
Employment type: Full-time

Senior Software Engineer - Data Acquisition

Join TxODDS as a Senior Software Engineer and help build scalable, high-performa...
Location: United Kingdom, London
Salary: Not provided
Company: TXODDS
Expiration Date: Until further notice
Requirements
  • Strong experience with at least one core programming language (e.g. Python, Java, Scala)
  • Hands-on experience with Kubernetes, container orchestration, and Docker
  • Experience working with distributed systems and event‑driven technologies (e.g. Kafka)
  • Solid understanding of networking fundamentals (HTTP, APIs)
  • Experience with relational and NoSQL databases
  • Strong Git skills and familiarity with modern development practices (code reviews, testing, CI/CD)
  • Comfort working in a Linux/Unix command-line environment
  • Experience designing and debugging software from inception to deployment
  • Excellent problem‑solving skills and a proactive approach to improving systems and processes
  • Strong communication and collaboration skills, and the ability to work effectively across teams
Job Responsibility
  • Developing, testing, and deploying high‑quality software that processes data from diverse sources
  • Building, improving, and maintaining distributed systems and data pipelines (including Kafka-based services)
  • Deploying and supporting containerised workloads running in Kubernetes environments
  • Creating and maintaining clear, accurate documentation for the systems you build
  • Validating and monitoring data quality using internal tools and processes
  • Supporting data‑gathering workflows, including those involving web‑scraping or automated data acquisition
  • Investigating and resolving data‑related issues escalated from the Client Services team
  • Participating in an out‑of‑hours on‑call rotation to support critical data acquisition systems
  • Sharing knowledge widely and contributing to a positive, collaborative team culture
  • Mentoring junior engineers and helping raise the overall technical bar
What we offer
  • Competitive benefits package tailored to your location
Employment type: Full-time

Software Engineer – Web Crawling

Woflow is a technology startup creating products and solutions to support a high...
Salary: Not provided
Company: Woflow
Expiration Date: Until further notice
Requirements
  • 3+ years of experience in software engineering with a focus on web crawling and data extraction
  • Strong expertise in Node.js (preferred) for web crawling applications
  • Deep understanding of HTML, JavaScript, and reverse engineering techniques
  • Hands-on experience with Playwright, Puppeteer, and Cheerio for automation and scraping
  • Knowledge of security and performance best practices related to web crawling
Job Responsibility
  • Develop, enhance, and maintain web crawlers and scraping infrastructure
  • Optimize scraping techniques to handle anti-bot mechanisms, performance, and security challenges
  • Collaborate with a geographically distributed team to identify and resolve issues
  • Ensure high availability, efficiency, and reliability of crawling operations
  • Integrate AI solutions to enhance automation and data extraction accuracy
What we offer
  • Unlimited PTO
  • Comprehensive medical, dental, and vision insurance plans
  • STD, LTD, AD&D, and life insurance coverage
  • Free membership to TalkSpace, Teladoc and Health Advocate
  • Free annual membership to One Medical in participating regions
  • 401(k) retirement plan with company matching
  • Pre-tax commuter benefits
  • Free equipment: laptop and home office stipend
Employment type: Full-time

Python Data Engineer

Arthur Lawrence is looking for a Python Data Engineer for one of our clients in Hous...
Location: United States, Houston
Salary: Not provided
Company: Arthur Lawrence
Expiration Date: Until further notice
Requirements
  • 7+ years of professional Python development
  • Strong knowledge of OOP, design patterns, and SOA
  • Hands-on experience in data engineering, data pipeline development, and web scraping (Requests, BeautifulSoup, Selenium)
  • Oracle PL/SQL expertise, including stored procedures
  • Bachelor’s degree in Computer Science, MIS, or related field
  • Agile/Scrum experience
Employment type: Full-time

Data Engineer - Python

We are seeking a talented and motivated Python Data Engineer to join our global ...
Location: United States, Houston
Salary: Not provided
Company: Robert Half
Expiration Date: Until further notice
Requirements
  • 6+ years of professional Python development experience at an enterprise level
  • Bachelor's degree in Computer Science, MIS, or a related technical field
  • Proven experience building and maintaining data pipelines and ETL processes
  • Proficiency with web scraping tools and techniques (e.g., Requests, BeautifulSoup, Selenium)
  • Hands-on experience with Oracle PL/SQL, including stored procedures
  • Strong knowledge of object-oriented programming, design patterns, and SOA
  • Familiarity with Agile/Scrum methodologies and modern version control and issue tracking tools
  • Experience with Python libraries such as Pandas and NumPy
  • Excellent written and verbal communication skills
Job Responsibility
  • Develop modular and reusable Python components to connect external data sources with internal systems and databases
  • Work directly with business stakeholders to translate analytical requirements into technical implementations
  • Ensure the integrity and maintainability of the central Python codebase by adhering to existing design standards and best practices
  • Maintain and improve the in-house Python ETL toolkit, contributing to the standardization and consolidation of data engineering workflows
  • Partner with global team members to ensure efficient coordination and delivery
  • Actively participate in the internal Python development community and support ongoing business development initiatives with technical expertise
What we offer
  • medical
  • vision
  • dental
  • life and disability insurance
  • company 401(k) plan