About Our Crawler | Formulate

What We Crawl

Formulate aggregates publicly available job listings from pharmaceutical and biotechnology companies. We crawl:

Public ATS (Applicant Tracking System) APIs: Greenhouse, Lever, Ashby, Workday, SmartRecruiters, and 20+ other platforms
Company career pages that are publicly accessible

Our crawler identifies itself with the following User-Agent string:

Formulate/1.0 (+https://formulatesearch.com/about/crawler)

robots.txt compliance: We respect robots.txt directives on all websites. If your robots.txt disallows our crawler, we will not access your site.
Rate limiting: We limit our crawler to approximately 1-2 requests per second per domain to minimize impact on your servers.
Caching: We cache robots.txt responses for 24 hours to reduce repeated requests.
Crawl frequency: Most sites are crawled once daily. Lower-priority sites are crawled weekly or bi-weekly.
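
The behavior described above can be illustrated with a minimal Python sketch. It is not our production code: the `PolitenessPolicy` class, the `fetch_robots` callable, and the 0.5-second minimum interval (roughly 2 requests per second) are hypothetical names and values chosen for illustration. It uses only the standard library's `urllib.robotparser`.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "Formulate/1.0 (+https://formulatesearch.com/about/crawler)"
ROBOTS_TTL_SECONDS = 24 * 60 * 60   # cache robots.txt for 24 hours
MIN_INTERVAL_SECONDS = 0.5          # assumption: about 2 requests per second per domain


class PolitenessPolicy:
    """Hypothetical helper: robots.txt checks with a 24-hour cache plus per-domain pacing."""

    def __init__(self, fetch_robots, clock=time.monotonic):
        self._fetch_robots = fetch_robots  # callable: domain -> robots.txt text
        self._clock = clock                # injectable so the sketch can be tested
        self._robots = {}                  # domain -> (fetched_at, parser)
        self._last_request = {}            # domain -> time of the last request

    def _parser_for(self, domain):
        now = self._clock()
        cached = self._robots.get(domain)
        # Re-fetch only when nothing is cached or the cached copy has expired.
        if cached is None or now - cached[0] >= ROBOTS_TTL_SECONDS:
            parser = RobotFileParser()
            parser.parse(self._fetch_robots(domain).splitlines())
            cached = (now, parser)
            self._robots[domain] = cached
        return cached[1]

    def allowed(self, domain, url):
        """Return True if robots.txt permits our User-Agent to fetch the URL."""
        return self._parser_for(domain).can_fetch(USER_AGENT, url)

    def wait_time(self, domain):
        """Seconds to wait before the next request to this domain."""
        last = self._last_request.get(domain)
        if last is None:
            return 0.0
        return max(0.0, MIN_INTERVAL_SECONDS - (self._clock() - last))

    def record_request(self, domain):
        """Note that a request to this domain was just sent."""
        self._last_request[domain] = self._clock()
```

A crawl loop built on this sketch would call `allowed()` before each request, sleep for `wait_time()`, and then call `record_request()` once the request is sent.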

From job listings, we extract and store:

We do not store applicant data, resumes, or personal information from job seekers.

If you would like your company's listings removed from Formulate:

robots.txt: Add the following to your robots.txt to block our crawler:
```
User-agent: Formulate
Disallow: /
```
Email: Contact us at crawler@formulatesearch.com and we will remove your listings within 48 hours.
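
If you choose the robots.txt option, you may want to confirm that the rule behaves as intended before deploying it. The following is a minimal sketch using Python's standard `urllib.robotparser`. The `is_allowed` helper, the `example.com` URL, and the `SomeOtherBot/2.0` agent are hypothetical and used only for illustration. Python's parser may differ in small ways from our crawler's own matching, so treat the result as a sanity check.

```python
from urllib.robotparser import RobotFileParser

# The rule shown above, exactly as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Formulate
Disallow: /
"""

FORMULATE_UA = "Formulate/1.0 (+https://formulatesearch.com/about/crawler)"


def is_allowed(robots_txt, user_agent, url):
    """Hypothetical helper: parse robots.txt text and test one URL for one agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder; substitute a URL from your own careers site.
formulate_ok = is_allowed(ROBOTS_TXT, FORMULATE_UA, "https://example.com/careers/123")
other_ok = is_allowed(ROBOTS_TXT, "SomeOtherBot/2.0", "https://example.com/careers/123")

print("Formulate allowed:", formulate_ok)
print("Other crawler allowed:", other_ok)
```

With the rule above, the first check should report that Formulate is blocked while the second shows that crawlers with other User-Agent names are unaffected.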

For questions about our crawling practices, email crawler@formulatesearch.com.