
How to use Crawler and API keys?

Send a POST request to the following address: http://s.infatica.io:5063.

JSON request format
{
  "user_key": "atFiN7r7ZspBalGNvjJV",
  "URLS": [
    {
      "URL": "https://www.google.com/search?q=download+youtube+videos&ie=utf-8&num=20&oe=utf-8&hl=us&gl=us",
      "Headers": {
        "Connection": "keep-alive",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:71.0) Gecko/20100101 Firefox/71.0",
        "Upgrade-Insecure-Requests": "1"
      },
      "userId": "ID-0"
    }
  ]
}

With the following attributes:

  • user_key – Hash key for API interactions; available in the billing area of your personal account;
  • URLS – Array containing all planned downloads;
  • URL – Download link;
  • Headers – List of headers sent with the request; additional headers (e.g. Cookie, Accept, and others) are also accepted;
  • Required headers: Connection, User-Agent, Upgrade-Insecure-Requests;
  • userId – Unique identifier within a single request; the response contains the same userId attribute so each result can be matched to its URL;
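
For reference, here is a minimal sketch of sending the same single-URL request from code with Python's requests library (our illustration; any HTTP client will do). YOUR_USER_KEY is a placeholder for the key from your personal account:

import requests

# Endpoint and JSON fields as described above; YOUR_USER_KEY is a placeholder.
payload = {
    "user_key": "YOUR_USER_KEY",
    "URLS": [
        {
            "URL": "https://www.google.com/search?q=download+youtube+videos&ie=utf-8&num=20&oe=utf-8&hl=us&gl=us",
            "Headers": {
                "Connection": "keep-alive",
                "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:71.0) Gecko/20100101 Firefox/71.0",
                "Upgrade-Insecure-Requests": "1"
            },
            "userId": "ID-0"
        }
    ]
}

response = requests.post("http://s.infatica.io:5063", json=payload)
print(response.json())
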
Sample request for 4 URLs
{
  "user_key": "atFiN7r7ZspBalGNvjJV",
  "URLS": [
    {
      "URL": "https://www.google.com/search?q=download+youtube+videos&ie=utf-8&num=20&oe=utf-8&hl=us&gl=us",
      "Headers": {
        "Connection": "keep-alive",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:71.0) Gecko/20100101 Firefox/71.0",
        "Upgrade-Insecure-Requests": "1"
      },
      "userId": "ID-0"
    },
    {
      "URL": "https://www.google.com/search?q=download+scratch&ie=utf-8&num=20&oe=utf-8&hl=us&gl=us",
      "Headers": {
        "Connection": "keep-alive",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:71.0) Gecko/20100101 Firefox/71.0",
        "Upgrade-Insecure-Requests": "1"
      },
      "userId": "ID-1"
    },
    {
      "URL": "https://www.amazon.de/dp/B07F66M9RB",
      "Headers": {
        "Connection": "keep-alive",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:71.0) Gecko/20100101 Firefox/71.0",
        "Upgrade-Insecure-Requests": "1"
      },
      "userId": "ID-2"
    },
    {
      "URL": "https://www.amazon.de/dp/B07VNFXXPQ/ref=sbl_dpx_B07ZYDGFSV_0?th=1",
      "Headers": {
        "Connection": "keep-alive",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:71.0) Gecko/20100101 Firefox/71.0",
        "Upgrade-Insecure-Requests": "1"
      },
      "userId": "ID-3"
    }
  ]
}
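
Listing many URLs by hand quickly becomes tedious; the sketch below (our illustration, not part of the API) builds the same kind of payload from a plain list of URLs, assigning sequential userId values so every response entry can be matched back to its URL:

import json

# Shared headers reused for every URL; the same required set as in the examples above.
HEADERS = {
    "Connection": "keep-alive",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:71.0) Gecko/20100101 Firefox/71.0",
    "Upgrade-Insecure-Requests": "1"
}

def build_payload(user_key, urls):
    # One entry per URL, with a sequential userId ("ID-0", "ID-1", ...).
    return {
        "user_key": user_key,
        "URLS": [
            {"URL": url, "Headers": HEADERS, "userId": f"ID-{i}"}
            for i, url in enumerate(urls)
        ]
    }

payload = build_payload("YOUR_USER_KEY", [
    "https://www.amazon.de/dp/B07F66M9RB",
    "https://www.amazon.de/dp/B07VNFXXPQ/ref=sbl_dpx_B07ZYDGFSV_0?th=1"
])
print(json.dumps(payload, indent=2))
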
Response format

For each userId in the request, the response contains a status code and, for successfully processed URLs, a link to the downloaded result file:
{
"ID-0":{ "status":996,"link":"" },
"ID-1":{ "status":200,"link":"http://s.infatica.io:5063/162012.txt" },
"ID-2":{ "status":200,"link":"http://s.infatica.io:5063/162013.txt" },
"ID-3":{ "status":null,"link":"" }
}
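
A minimal handling sketch (our illustration, assuming the response shape shown above) downloads the result file for every entry with status 200 and a non-empty link:

import requests

def save_results(result):
    # result is the parsed JSON response: {userId: {"status": ..., "link": ...}, ...}
    for user_id, entry in result.items():
        if entry["status"] == 200 and entry["link"]:
            page = requests.get(entry["link"])
            with open(f"{user_id}.txt", "wb") as f:
                f.write(page.content)
        else:
            print(f"{user_id}: not retrieved (status {entry['status']})")

save_results({
    "ID-0": {"status": 996, "link": ""},
    "ID-1": {"status": 200, "link": "http://s.infatica.io:5063/162012.txt"}
})
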
Recommended settings for optimal scraper performance
  • Use multiple concurrent threads, but no more than 10;
  • Send one request per thread;
  • Send several URLs per request, but no more than 1,000 (see the sketch below).
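
Putting these recommendations together, the sketch below (an illustration under the assumptions above, not an official client) splits a URL list into batches of at most 1,000 and posts each batch in its own thread, with at most 10 threads running at once:

from concurrent.futures import ThreadPoolExecutor
import requests

ENDPOINT = "http://s.infatica.io:5063"
HEADERS = {
    "Connection": "keep-alive",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:71.0) Gecko/20100101 Firefox/71.0",
    "Upgrade-Insecure-Requests": "1"
}

def post_batch(user_key, batch):
    # One request per thread, carrying up to 1,000 URLs.
    payload = {
        "user_key": user_key,
        "URLS": [{"URL": u, "Headers": HEADERS, "userId": f"ID-{i}"} for i, u in enumerate(batch)]
    }
    return requests.post(ENDPOINT, json=payload).json()

def crawl(user_key, urls, batch_size=1000, threads=10):
    # Split the URL list into batches and send them concurrently,
    # respecting the limits of 1,000 URLs per request and 10 threads.
    batches = [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda b: post_batch(user_key, b), batches))
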

How to perform tests: We've attached a sample Postman collection (scraper.example) that sends 1,000 URLs in a single request. You can open 5 Postman windows (which gives you 5 threads) and import this collection in each window.

Performance test results when sending queries to google.com:

  • 1 request with 100 URLs (100 URLs in total): 36 seconds;
  • 5 requests with 100 URLs (500 URLs in total): 48 seconds;
  • 1 request with 1,000 URLs (1,000 URLs in total): 50 seconds;
  • 5 requests with 1,000 URLs (5,000 URLs in total): 90 seconds.

This shows that 5 concurrent threads can reach a throughput of about 50 URLs per second (5,000 URLs in 90 seconds is roughly 55 URLs per second). With 10 concurrent threads, the throughput can reach 100-120 URLs per second.