We are expanding our engineering team. We're looking for someone who is excited about our non-profit, open data mission, proficient with Python (and hopefully also some Java), and proficient with cloud systems such as Spark/PySpark. You should be willing to learn the rest: crawling, parsing, indexing, etc.
I'm the CTO at the Common Crawl Foundation, which has a 17-year-old, 8-petabyte crawl and archive of the web. Our open dataset has been cited in nearly 10,000 research papers and is the most-used dataset in the AWS Open Data program. Our organization is also very active in the open source community.