The history of Zyte
Joanne O’Flynn meets with Pablo Hoffman and Shane Evans to find out what inspired them to set up web crawling company Zyte....
Skinfer: A tool for inferring JSON schemas
Imagine that you have a lot of samples for a certain kind of data in JSON format. Maybe you want to have a better feel of it, know which fields appear in all records, which appear only in some and wha...
Handling JavaScript in Scrapy with Splash
A common roadblock when developing spiders is dealing with sites that use a heavy amount of JavaScript. Many modern websites run entirely on JavaScript and require scripts to be run in order for the p...
Zyte crawls the deep web
Zyte is participating in Memex, an ambitious DARPA project that tackles the huge challenge of crawling, indexing, and making sense of areas of the Deep Web, that is, web content not being indexed by t...
New changes to our Scrapy cloud platform
We are proud to announce some exciting changes we've introduced this week. These changes bring a much more pleasant user experience, and several new features including the addition of Portia to our pl...
Introducing ScrapyRT: An API for Scrapy spiders
We’re proud to announce our new open source project, ScrapyRT! ScrapyRT, short for Scrapy Real Time, allows you to extract data from a single web page via an API using your existing Scrapy spiders....
Looking back at 2014
One year ago we were looking back at the great 2013 we had and realized we would have quite a big challenge in front of us in order to have as much growth as we had during last year....
XPath tips from the web scraping trenches
In the context of web scraping, XPath is a nice tool to have in your belt, as it allows you to write specifications of document locations more flexibly than CSS selectors....
Introducing data reviews
One of the things that takes more time when building a spider is reviewing the scraped data and making sure it conforms to the requirements and expectations of your client or team....