About Our Crawler

Transparency in how we collect job listing data

What We Crawl

Formulate aggregates publicly available job listings from pharmaceutical and biotechnology companies. We crawl:

  • Public ATS (Applicant Tracking System) APIs — Greenhouse, Lever, Ashby, Workday, SmartRecruiters, and 20+ other platforms
  • Company career pages that are publicly accessible

Our Crawler Identity

Our crawler identifies itself with the following User-Agent string:

Formulate/1.0 (+https://formulate.io/about/crawler)
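
If you want to spot our requests in your own server logs, the check below is a minimal sketch of one way to do it. The function name and the decision to accept any version number (not only 1.0) are assumptions of this example, not part of our crawler. Because any client can send this header, a match suggests a request came from us but does not prove it.

```python
import re

# Matches the product token "Formulate/<version>" at the start of the header.
# Accepting versions other than 1.0 is an assumption of this sketch.
FORMULATE_UA = re.compile(r"^Formulate/\d+(\.\d+)*(\s|$)")


def is_formulate_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header looks like the Formulate crawler."""
    return bool(FORMULATE_UA.match(user_agent.strip()))


if __name__ == "__main__":
    print(is_formulate_crawler("Formulate/1.0 (+https://formulate.io/about/crawler)"))
    print(is_formulate_crawler("Mozilla/5.0 (X11; Linux x86_64)"))
```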

How We Crawl

  • robots.txt compliance: We respect robots.txt directives on all websites. If your robots.txt disallows our crawler, we will not access your site.
  • Rate limiting: We limit requests to approximately 1 to 2 requests per second per domain to minimize impact on your servers.
  • Caching: We cache robots.txt responses for 24 hours to reduce repeated requests.
  • Crawl frequency: Most sites are crawled once daily. Lower-priority sites are crawled weekly or every two weeks.
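
The rules above can be illustrated with a short sketch. This is not our production code. The class name, method names, and the 0.5-second interval (chosen to give at most about 2 requests per second) are hypothetical, and the robots.txt parser is the one from Python's standard library, which may differ in detail from the parser we run.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

ROBOTS_TOKEN = "Formulate"          # product token from our User-Agent string
ROBOTS_TTL_SECONDS = 24 * 60 * 60   # robots.txt is cached for 24 hours
MIN_INTERVAL_SECONDS = 0.5          # assumed pacing: at most ~2 requests/second


class DomainPolicy:
    """Hypothetical per-domain state: cached robots.txt rules plus pacing."""

    def __init__(self, robots_txt: str, now: float) -> None:
        self.parser = RobotFileParser()
        self.parser.parse(robots_txt.splitlines())
        self.fetched_at = now
        self.last_request_at: Optional[float] = None

    def robots_expired(self, now: float) -> bool:
        # After 24 hours the cached rules should be fetched again.
        return now - self.fetched_at >= ROBOTS_TTL_SECONDS

    def allowed(self, url: str) -> bool:
        # A disallowed URL is never requested.
        return self.parser.can_fetch(ROBOTS_TOKEN, url)

    def seconds_until_next_request(self, now: float) -> float:
        # How long to wait so requests to this domain stay under the limit.
        if self.last_request_at is None:
            return 0.0
        return max(0.0, self.last_request_at + MIN_INTERVAL_SECONDS - now)

    def record_request(self, now: float) -> None:
        self.last_request_at = now
```

A caller would check allowed() before each request, sleep for seconds_until_next_request(), and rebuild the policy once robots_expired() returns True. Timestamps are passed in explicitly so the logic can be exercised without a real clock or network.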

What We Store

From job listings, we extract and store:

  • Job title, department, and location
  • Job description and requirements
  • Salary information (when publicly listed)
  • Application URL (we link directly to your ATS)

We do not store applicant data, resumes, or personal information from job seekers.

Opt Out

If you would like your company's listings removed from Formulate:

  1. robots.txt: Add the following to your robots.txt to block our crawler:
    User-agent: Formulate
    Disallow: /
  2. Email: Contact us at crawler@formulate.io and we will remove your listings within 48 hours.
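
Before deploying the robots.txt rule, you may want to confirm it does what you expect. The sketch below uses Python's standard-library parser as a stand-in for ours, so treat the result as a sanity check rather than a guarantee. The helper name, the example URL, and the "OtherBot" agent are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


OPT_OUT = """\
User-agent: Formulate
Disallow: /
"""

CAREERS_URL = "https://example.com/careers/123"  # hypothetical listing URL

print(can_crawl(OPT_OUT, "Formulate", CAREERS_URL))  # False: our crawler is blocked
print(can_crawl(OPT_OUT, "OtherBot", CAREERS_URL))   # True: other agents are unaffected
```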

Contact

For questions about our crawling practices, email crawler@formulate.io.