WHAT IS A WEB SCRAPER?
A web scraper is a program that extracts data from a website and manipulates it into a “more usable” format. Scrapers are built for a specific web-site and purpose. Properly done, a scraper will emulate a user’s on-line behaviour whilst disguising its origin and true identity.
Web scrapers are also known as crawling bots, data collectors, data extractors, spiders, web crawlers and web-site rippers.
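To make the idea concrete, here is a minimal sketch of the "extract, then reshape" step using only the Python standard library. The HTML fragment, the column names and the `CellExtractor` and `html_to_csv` helpers are all made up for illustration; a real scraper would download the page from the target site and would be tailored to that site's markup.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page fragment standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><td class="name">Widget</td><td class="price">$4.50</td></tr>
  <tr><td class="name">Gadget</td><td class="price">$12.00</td></tr>
</table>
"""


class CellExtractor(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])          # start a new row
        elif tag == "td":
            self._in_cell = True
            self.rows[-1].append("")      # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def html_to_csv(html):
    """Turn the sample table into CSV text with a header row."""
    parser = CellExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "price"])    # assumed column names
    for name, price in parser.rows:
        writer.writerow([name, price.lstrip("$")])  # drop the currency symbol
    return out.getvalue()


CSV_TEXT = html_to_csv(SAMPLE_HTML)
print(CSV_TEXT)
```

The resulting CSV text can be opened in a spreadsheet or loaded into a database, which is what "more usable" usually means in practice.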
HOW MUCH DOES IT COST TO WRITE AND RUN A WEB SCRAPER?
The cost to scrape a site depends on numerous factors, including:
- Number of fields to be scraped
- Number of pages, and different page formats
- Navigation structure
- Complexity of the data clean-up
- Existence of anti-spider protection mechanisms, or limitations of the rate at which the site can be scraped
- Quality of the target site’s code
- Whether the project is one-off or ongoing
As a ballpark indication, a very simple scraper costs around AUD $450, whilst a complicated, protected site with millions of pages can cost around AUD $5,000. Running costs are highly dependent on the volume of data downloaded and the degree of manual intervention required. Scraper maintenance is required when the target site changes its page structure or introduces new technologies, and is typically billed on a time-to-fix basis. Prices are discounted for multi-site or long-term ongoing projects.
FIXED PRICE OR PER-HOUR?
We offer both types of billing structures:
- Fixed price is typically used for simple, accurately specified projects, where job requirements are not expected to change over the life of the project.
- Per-Hour billing is preferred for complex projects, or those on an agile development path where requirements adapt in response to changed circumstances.
DO YOU OFFER DISCOUNTS?
Prices are discounted for multi-site or long-term ongoing projects.
CAN WE OBTAIN THE WEB SCRAPER’S SOURCE CODE?
We can provide you with the spider source code at no extra charge; however, you may need to install additional software and/or modify the scraper’s code to suit your system.
CAN I RUN THE WEB SCRAPERS FROM MY OWN SERVER?
Not recommended, but yes you can. You may need to install additional software, and/or modify the scraper’s code to suit your system.
TYPICAL WORKFLOW FOR A WEB SCRAPING PROJECT?
- Contact us with an overview of your project requirements.
- We call (or Skype) you back to discuss your project and better understand your requirements.
- We will review the sites with your intentions in mind and then provide a written quote. This takes approx. 1 day.
- Sign a mutual non-disclosure agreement, if required.
- Once we receive your signed Letter of Engagement, we commence the development of the web scraper(s).
- Once completed, scrapers are tested and reviewed. We also provide you with a sample data extract for your review and approval.
- Once approved, we run the scrapers, providing feedback at pre-agreed intervals.
- We complete any post-production data processing, including parsing, standardisation, normalisation and de-duplication.
- We deliver the data to you in the agreed format.
- Once the deliverables are approved, we submit our invoice, which is due for payment within 14 days.
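As an illustration of the post-production step in the workflow above, the following is a minimal sketch of standardisation and de-duplication in Python. The field names (`name`, `email`, `phone`), the sample records and the choice of email as the de-duplication key are assumptions made for this example only; the actual rules depend on each project's data.

```python
import re


def standardise(record):
    """Tidy one scraped record: collapse whitespace, fix case, strip phone punctuation."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
        "phone": re.sub(r"\D", "", record["phone"]),  # keep digits only
    }


def deduplicate(records, key="email"):
    """Keep the first record seen for each key value, preserving order."""
    seen = set()
    unique = []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            unique.append(rec)
    return unique


# Hypothetical raw scrape output: the first two rows are the same person.
RAW = [
    {"name": "  jane   doe ", "email": "Jane@Example.com ", "phone": "(02) 5550 1234"},
    {"name": "Jane Doe", "email": "jane@example.com", "phone": "02 5550 1234"},
    {"name": "john smith", "email": "john@example.com", "phone": "0400-555-123"},
]

CLEAN = deduplicate([standardise(r) for r in RAW])
for row in CLEAN:
    print(row)
```

Standardising before de-duplicating matters here: the two "Jane Doe" rows only compare as equal once case and stray whitespace have been removed.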
PAYMENT MECHANISMS?
We prefer payment via direct deposit into our bank account. PayPal is acceptable; however, we add the (somewhat expensive) PayPal fees to your invoice. Please note that we do not have credit card facilities.
WHAT ARE YOUR PAYMENT TERMS?
Payment is expected within 14 days from the date of invoice.
HOW DO WE ACCESS THE SCRAPED DATA?
We can provide you with all the data, or just the changes since the last scrape, in your required format via email, a shared Dropbox folder or by uploading directly to your Amazon S3 account.
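For readers wondering what "just the changes since the last scrape" might look like, here is a minimal sketch in Python. It assumes each record has a stable identifier (the hypothetical `sku-…` keys below) and that both scrapes are held as dictionaries keyed by that identifier; the function name and the sample data are illustrative only.

```python
def diff_scrapes(previous, current):
    """Compare two scrapes keyed by record id and report what changed."""
    added = {k: v for k, v in current.items() if k not in previous}
    removed = {k: v for k, v in previous.items() if k not in current}
    changed = {
        k: v for k, v in current.items()
        if k in previous and previous[k] != v
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical results from two consecutive scrapes of the same site.
LAST_SCRAPE = {
    "sku-1": {"price": "4.50"},
    "sku-2": {"price": "12.00"},
    "sku-3": {"price": "7.25"},
}
THIS_SCRAPE = {
    "sku-1": {"price": "4.50"},
    "sku-2": {"price": "11.00"},
    "sku-4": {"price": "3.10"},
}

DELTA = diff_scrapes(LAST_SCRAPE, THIS_SCRAPE)
print(DELTA)
```

A delta like this is usually far smaller than the full data set, which keeps deliveries quick when a site is scraped on a regular schedule.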
DO YOU PROVIDE A FULLY MANAGED WEB SCRAPING SOLUTION?
All of our projects are fully managed. We handle the entire web scraping process so that you receive your freshly-harvested data, in the right format, without the fuss and hassle.
WHAT OTHER SERVICES DO YOU PROVIDE?
We are a division of The Data Group and can provide you with a full suite of data analysis, data cleansing, data mining and data warehousing services.
ARE YOU RELATED TO THE DATA-SCRAPING GROUP?
Yes. The Web Scraping Group and the Data Scraping Group are divisions of the same Australian company, Net Assets (Australia) Pty Ltd t/a The Data Group. We have a couple of websites on the internet in order to optimise our traffic from Google. But behind the scenes, it’s the same team.
IS WEB SCRAPING LEGAL?
Surprisingly, the largest web scraping company in the world is … Google! They have built an incredibly profitable business from the analysis and aggregation of the internet’s data. Millions of organisations actually encourage Google to scrape their websites on a daily basis so that they can get ranked in Google’s search engine. So the first conclusion is that web scraping can be a good, and even desirable, process.
In contrast, some people have scraped a web site and set up an identical competing business based on the unmodified harvested data. These copy-cats sometimes get sued. So the second conclusion is that web scraping can be unethical and, in some cases, illegal.
There is also an argument that if data is on the web, it is in the public domain and can be used by anyone. However, copyright also needs to be considered.
Considering the complexity of these issues, some of our clients have sought specialist advice on their proposed projects. From the feedback we have received, the majority of these legal opinions have been inconclusive.
We value client confidentiality and discretion. We also scrape web sites in a highly anonymous manner that is impossible to trace back to us (and you). We can covertly get the data for you, but what happens thereafter depends on what you do with it.
If in doubt, please obtain your own legal advice.