Wissen Technology is hiring a Web Scraping / Data Acquisition Engineer with 3–7 years of experience to build robust data extraction pipelines that collect legal data from public websites. The role involves designing crawlers to extract court judgments, tribunal orders, and regulatory decisions, storing structured metadata, and automating monitoring for new content. The ideal candidate has strong Python skills, hands-on web scraping experience, and the ability to handle large volumes of documents and structured data.
Job Responsibilities:
Design and develop web crawlers to extract data from public websites
Crawl listing pages and extract case metadata (case title, number, court, date, etc.)
Download judgments and maintain structured PDF/document storage
Build automated pipelines to monitor websites and detect new judgments
Extract structured data from documents and HTML pages
Store data in structured formats suitable for downstream processing or search
Handle pagination, anti-bot measures, and data cleaning workflows
Maintain scrapers for reliability, accuracy, and long-term scalability
Requirements:
Strong hands-on experience with Python
Proven experience in web scraping and crawler development
Proficiency with crawling frameworks and browser automation tools: Scrapy, Playwright, or equivalent
Experience with PDF extraction tools (pdfplumber, PyMuPDF, Apache Tika, etc.)
Strong understanding of HTML parsing, pagination handling, and automated file downloads
Knowledge of techniques for handling anti-bot measures (rate limiting, proxy handling, session rotation)
Experience processing structured and semi-structured documents
Nice to have:
Experience with large-scale crawlers or distributed scraping
Working experience with document datasets and text-heavy systems
Familiarity with Apache Tika / advanced PDF extraction
Experience with AWS S3 for storing large volumes of raw documents
Exposure to Elasticsearch or search indexing systems
Experience with Kafka / AWS MSK for event-driven pipelines
Background in legal, regulatory, or compliance datasets