Python Parser API documentation
Learn how the Python Parser API works and how to integrate it into your app. Examples are provided in cURL, JavaScript, and Python.
Getting started
Data scraping and parsing endpoint.
Sync API
The Sync API returns an immediate result through our proxy.
If you intend to keep the same IP address across multiple requests, pass the session_number
(integer) parameter. A session lasts 60 seconds.
If you want to define the geolocation of your session, set the country_code (string) parameter when creating the session. Allowed country codes are 'us', 'uk', 'fr', 'de', 'jp', 'cn', and 'ru'.
You can use the mobile flag (boolean) to switch the user agent to mobile mode.
You may also turn on following redirects (301 code) with the follow_redirect
parameter, and retry URL-not-found (404) results with the retry_404
parameter.
You are also free to pass your own set of headers with the request.
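As a sketch of how session pinning can be used (the endpoint and parameters are from this doc; the sync_body helper is illustrative, not part of the API): any two requests sent with the same session_number within 60 seconds should go out through the same proxy IP.

```python
import json

# Hypothetical helper: build the JSON body for the Sync API
# (endpoint http://scrape.infatica.io:9000/). Reusing the same
# session_number within 60 seconds keeps the same proxy IP.
def sync_body(url, api_key, session_number, country_code="us"):
    return json.dumps({
        "url": url,
        "api_key": api_key,
        "session_number": session_number,
        "country_code": country_code,
    })

body = sync_body("http://httpbin.org/ip", "xxxxxx", session_number=7)
# requests.get("http://scrape.infatica.io:9000/", data=body) would then
# route this and any later request carrying session_number=7 through
# the same proxy IP for the 60-second session lifetime.
print(body)
```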
Query Parameters
Name | Description | Example | Options |
---|---|---|---|
url (string, required) | Destination url to retrieve (url-encoded) | {"url":"google.com"} | default=null |
api_key (string, required) | Python Parser API key | {"api_key":"0de32912321"} | default=null |
mobile (bool, optional) | User-Agent type (true for mobile, false for desktop) | {"mobile": true} | default=false |
follow_redirect (bool, optional) | Allow request to follow redirects (301 code) | {"follow_redirect": true} | default=false |
retry_404 (bool, optional) | Retry with another proxy if a 404 is returned | {"retry_404": true} | default=false |
country_code (str, optional) | Proxy country code (geolocation) | {"country_code": "fr"} | default=null; options: us, uk, de, fr, cn, jp, ru |
session_number (int, optional) | Proxy session number | {"session_number": 31} | default=0 |
render_js (bool, optional) | Render JavaScript on the page | {"render_js": true} | default=false |
Returns
Status code | Description | Example |
---|---|---|
200 (Success) | Request successful. Returns JSON with headers and html fields | {"headers": {}, "html": ""} |
401 (Unauthorized) | API key is missing or wrong | {"error": "API key is missing or wrong"} |
422 (Unprocessable Entity) | Error in query parameters | {"error": "Wrong query"} |
504 (Timeout) | Site returned timeout after 3 attempts to reach it | {"error": "Timeout"} |
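A minimal client-side sketch of handling these status codes (the handle_response helper is illustrative; the codes and response shapes come from the table above):

```python
import json

# Map the documented status codes to client-side actions (illustrative).
def handle_response(status_code, raw_content):
    content = json.loads(raw_content)
    if status_code == 200:
        return content["html"]                  # scraped page HTML
    if status_code in (401, 422):
        raise ValueError(content.get("error"))  # fix the API key or query
    if status_code == 504:
        return None                             # site timed out; retry later
    raise RuntimeError(f"unexpected status {status_code}")

html = handle_response(200, '{"headers": {}, "html": "<html></html>"}')
```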
In the examples below, replace xxxxxx in 'api_key': 'xxxxxx' with your Infatica API key.
Curl
curl -X GET "http://scrape.infatica.io:9000/" -H "Content-Type: application/json" -d '{"api_key": "xxxxxx", "url": "https://www.google.com"}'
Python
import requests
import json
req = requests.get('http://scrape.infatica.io:9000/', data = json.dumps({
'url': 'TARGET_URL',
'api_key': 'xxxxxx',
'mobile': True,
'country_code': 'uk',
'session_number': 55
}), headers = {
'user_header_1': 'header1_value',
'user_header_2': 'header2_value'
})
content = json.loads(req.content)
print(content)
JavaScript / Node.js
const axios = require('axios')
const options = {
method: 'GET',
responseType: 'json',
data: {
url: 'TARGET_URL',
api_key: 'xxxxxx',
mobile: true,
country_code: 'uk',
session_number: 55
},
url: 'http://scrape.infatica.io:9000'
}
axios(options)
.then((result) => {
console.log(result)
})
.catch((err) => {
console.error(err)
})
Async API
The Async API lets you queue multiple time-consuming requests and receive the results as soon as they are ready.
Payload parameters
Name | Description | Example | Options |
---|---|---|---|
url (string, required) | Destination url to retrieve (url-encoded) | {"url":"google.com"} | default=null |
api_key (string, required) | Python Parser API key | {"api_key":"0de32912321"} | default=null |
mobile (bool, optional) | User-Agent type (true for mobile, false for desktop) | {"mobile": true} | default=false |
follow_redirect (bool, optional) | Allow request to follow redirects (301 code) | {"follow_redirect": true} | default=false |
retry_404 (bool, optional) | Retry with another proxy if a 404 is returned | {"retry_404": true} | default=false |
country_code (str, optional) | Proxy country code (geolocation) | {"country_code": "fr"} | default=null; options: us, uk, de, fr, cn, jp, ru |
session_number (int, optional) | Proxy session number | {"session_number": 31} | default=0 |
render_js (bool, optional) | Render JavaScript on the page | {"render_js": true} | default=false |
Returns
Status code | Description | Example |
---|---|---|
200 (Success) | Request accepted. Returns JSON with the id of the queued job | {"id": "result_id"} |
401 (Unauthorized) | API key is missing or wrong | {"error": "API key is missing or wrong"} |
422 (Unprocessable Entity) | Error in query parameters | {"error": "Wrong query"} |
504 (Timeout) | Site returned timeout after 3 attempts to reach it | {"error": "Timeout"} |
Curl
curl -X POST -H "Content-Type: application/json" -d '{"api_key": "xxxxxx", "url": "http://httpbin.org/ip"}' "http://scrape.infatica.io:9000/job"
Python
import requests
import json
req = requests.post('http://scrape.infatica.io:9000/job', data = json.dumps({
'url': 'TARGET_URL',
'api_key': 'xxxxxx',
'mobile': True,
'country_code': 'uk',
'session_number': 55
}), headers = {
'user_header_1': 'header1_value',
'user_header_2': 'header2_value'
})
content = json.loads(req.content)
print(content)
JavaScript / Node.js
const axios = require('axios')
const options = {
method: 'POST',
responseType: 'json',
data: {
url: 'TARGET_URL',
api_key: 'xxxxxx',
mobile: true,
country_code: 'uk',
session_number: 55
},
url: 'http://scrape.infatica.io:9000/job'
}
axios(options)
.then((result) => {
console.log(result)
})
.catch((err) => {
console.error(err)
})
Async API - receiving results
GET http://scrape.infatica.io:9000/job/<job_id>
Path Parameters
Name | Description | Example | Options |
---|---|---|---|
job_id (string, required) | Job ID | http://scrape.infatica.io:9000/job/0de32912321 | default=null |
Payload Parameters
Name | Description | Example | Options |
---|---|---|---|
api_key (string, required) | Python Parser API key | {"api_key":"0de32912321"} | default=null |
Returns
Status code | Description | Example |
---|---|---|
200 (Success) | Request successful. Returns JSON with the job status; once the job has finished, the result includes the headers and html fields | { "status": "running", "statusUrl": "http://scrape.infatica.io:9000/job/0962a8a0-5f1a-4e14-bf8c-5efcc18f1953", "url": "http://httpbin.org/ip" } |
401 (Unauthorized) | API key is missing or wrong | {"error": "API key is missing or wrong"} |
422 (Unprocessable Entity) | Error in query parameters | {"error": "Wrong query"} |
504 (Timeout) | Site returned timeout after 3 attempts to reach it | {"error": "Timeout"} |
Curl
curl -X GET -H "Content-Type: application/json" -d '{"api_key": "xxxxxx"}' "http://scrape.infatica.io:9000/job/<job_id>"
Python
import requests
import json
req = requests.get('http://scrape.infatica.io:9000/job/<job_id>', data = json.dumps({
'api_key': 'xxxxxx'}))
content = json.loads(req.content)
print(content)
JavaScript / Node.js
const axios = require('axios')
const options = {
method: 'GET',
responseType: 'json',
data: {
api_key: 'xxxxxx',
},
url: 'http://scrape.infatica.io:9000/job/<job_id>'
}
axios(options)
.then((result) => {
console.log(result)
})
.catch((err) => {
console.error(err)
})
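Putting the two async endpoints together, a typical client submits a job and then polls the status URL until the job leaves the "running" state. A minimal sketch (the fetch_async and job_finished helpers are illustrative; the endpoints, fields, and "running" status are from the examples above):

```python
import json
import time

import requests

BASE = "http://scrape.infatica.io:9000"

def job_finished(job_status):
    """True once the job has left the 'running' state (per the examples above)."""
    return job_status.get("status") != "running"

def fetch_async(url, api_key, poll_interval=2, timeout=60):
    """Submit a job to /job, then poll /job/<job_id> until it is ready."""
    job = requests.post(
        f"{BASE}/job",
        data=json.dumps({"url": url, "api_key": api_key}),
    ).json()
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = requests.get(
            f"{BASE}/job/{job['id']}",
            data=json.dumps({"api_key": api_key}),
        ).json()
        if job_finished(status):
            return status
        time.sleep(poll_interval)
    raise TimeoutError("job did not finish in time")
```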