A lightweight, dependency-free PHP class that acts as a wrapper for the Crawlbase API.
Choose a way of installing:
- Use Packagist PHP package manager.
- Download the project from GitHub and save it into your project so you can require it:
require_once('crawlbase-php/src/[class].php')
First initialize the CrawlingAPI class. You can get your free token here.
$api = new Crawlbase\CrawlingAPI(['token' => 'YOUR_TOKEN']);
Pass the URL that you want to scrape, plus any options from the ones available in the API documentation.
$api->get(string $url, array $options = []);
Example:
$response = $api->get('https://www.facebook.com/britneyspears');
if ($response->statusCode === 200) {
echo $response->body;
}
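The example above only handles a 200 response. As a minimal sketch of one way to cope with transient failures, the helper below retries a request a few times. `fetchWithRetry` and its parameters are hypothetical and not part of the Crawlbase library, and the stand-in closure only simulates `$api->get(...)` so the snippet runs without a token.

```php
<?php
// Hypothetical helper (not part of the Crawlbase library): calls $request
// until it returns a response whose statusCode is 200, or attempts run out.
function fetchWithRetry(callable $request, int $maxAttempts = 3, int $delaySeconds = 0): ?object
{
    $response = null;
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $response = $request();
        if ($response->statusCode === 200) {
            return $response;
        }
        // Optional pause between attempts; skipped after the final attempt.
        if ($attempt < $maxAttempts && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
    }
    // Return the last failed response so the caller can inspect it.
    return $response;
}

// Stand-in for $api->get(...): fails twice, then succeeds.
$calls = 0;
$fakeGet = function () use (&$calls): object {
    $calls++;
    return (object) [
        'statusCode' => $calls < 3 ? 503 : 200,
        'body' => $calls < 3 ? '' : '<html>ok</html>',
    ];
};

$result = fetchWithRetry($fakeGet);
echo $result->statusCode . ' after ' . $calls . ' attempts' . PHP_EOL;
```

With the real class, the callable would simply wrap the call shown above, for example `function () use ($api) { return $api->get('https://www.facebook.com/britneyspears'); }`.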
You can pass any options from the Crawlbase API.
Example:
$response = $api->get('https://www.reddit.com/r/pics/comments/5bx4bx/thanks_obama/', [
'user_agent' => 'Mozilla/5.0 (Windows NT 6.2; rv:20.0) Gecko/20121202 Firefox/30.0',
'format' => 'json'
]);
if ($response->statusCode === 200) {
echo $response->body;
}
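If several requests share the same options, it may help to keep them in one place. The following is a small sketch, assuming a hypothetical `withDefaults` helper that is not part of the library; it only builds the options array that would be passed as the second argument to `$api->get()`.

```php
<?php
// Hypothetical helper: merges shared default options with per-request ones.
// Per-request values win when the same key appears in both arrays.
function withDefaults(array $defaults, array $overrides = []): array
{
    return array_merge($defaults, $overrides);
}

// Option names are taken from the example above.
$defaults = [
    'user_agent' => 'Mozilla/5.0 (Windows NT 6.2; rv:20.0) Gecko/20121202 Firefox/30.0',
    'format' => 'json',
];

$options = withDefaults($defaults, ['format' => 'html']);
print_r($options);
```

The resulting array can then be passed along unchanged, as in `$api->get($url, $options)`.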
Optionally, set the store parameter to true to store a copy of the API response in the Crawlbase Cloud Storage.
Example:
$response = $api->get('https://www.reddit.com/r/pics/comments/5bx4bx/thanks_obama/', [
'store' => true
]);
if ($response->statusCode === 200) {
echo 'storage url: ' . $response->headers->storage_url . PHP_EOL;
}
Pass the URL that you want to scrape and the data that you want to send, which can be either an array or a string, plus any options from the ones available in the API documentation.
$api->post(string $url, array|string $data, array $options = []);
Example:
$response = $api->post('https://producthunt.com/search', ['text' => 'example search']);
if ($response->statusCode === 200) {
echo $response->body;
}
You can send the data as application/json instead of x-www-form-urlencoded by setting the post_content_type option to json.
$response = $api->post('https://httpbin.org/post', json_encode(['some_json' => 'with some value']), ['post_content_type' => 'json']);
if ($response->statusCode === 200) {
echo $response->body;
}
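The two POST styles above differ only in how the data is encoded and whether `post_content_type` is set. As an illustrative sketch, a hypothetical `preparePost` helper (not part of the library) can build both arguments in one place:

```php
<?php
// Hypothetical helper: returns [$data, $options] ready to pass to $api->post().
// When $asJson is true, array data is JSON-encoded and the post_content_type
// option described above is set to 'json'. Otherwise both are left untouched.
function preparePost($data, array $options = [], bool $asJson = false): array
{
    if ($asJson) {
        if (is_array($data)) {
            $data = json_encode($data);
        }
        $options['post_content_type'] = 'json';
    }
    return [$data, $options];
}

[$data, $options] = preparePost(['some_json' => 'with some value'], [], true);
echo $data . PHP_EOL;
print_r($options);
```

The call itself would then read `$api->post('https://httpbin.org/post', $data, $options)`.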
Pass the URL that you want to scrape and the data that you want to send, which can be either an array or a string, plus any options from the ones available in the API documentation.
$api->put(string $url, array|string $data, array $options = []);
Example:
$response = $api->put('https://producthunt.com/search', ['text' => 'example search']);
if ($response->statusCode === 200) {
echo $response->body;
}
If you need to scrape a website built with a JavaScript framework such as React, Angular, or Vue, pass your JavaScript token and use the same calls. Note that only ->get is available for JavaScript requests, not ->post.
$api = new Crawlbase\CrawlingAPI(['token' => 'YOUR_JAVASCRIPT_TOKEN']);
$response = $api->get('https://www.nfl.com');
if ($response->statusCode === 200) {
echo $response->body;
}
In the same way, you can pass additional JavaScript options.
$response = $api->get('https://www.freelancer.com', ['page_wait' => 5000]);
if ($response->statusCode === 200) {
echo $response->body;
}
You can always get the original status and the Crawlbase status from the response. Read the Crawlbase documentation to learn more about these statuses.
$response = $api->get('https://craiglist.com');
echo $response->headers->original_status . PHP_EOL;
echo $response->headers->pc_status . PHP_EOL;
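To show how the two statuses might be combined, here is a rough sketch with a hypothetical `describeStatuses` helper. It assumes that a value of 200 in both headers means success and that the header values may arrive as strings; check the Crawlbase documentation for the authoritative meaning of each code. The stub object only mimics the response shape used above.

```php
<?php
// Hypothetical helper: summarises the two status headers of a response.
// Assumption: 200 in both original_status and pc_status means success.
function describeStatuses(object $response): string
{
    // Missing headers fall back to 0 so they are reported as failures.
    $original = (int) ($response->headers->original_status ?? 0);
    $crawlbase = (int) ($response->headers->pc_status ?? 0);

    if ($crawlbase !== 200) {
        return 'Crawlbase reported status ' . $crawlbase;
    }
    if ($original !== 200) {
        return 'Target site responded with status ' . $original;
    }
    return 'ok';
}

// Stub standing in for the object returned by $api->get(...).
$stub = (object) [
    'headers' => (object) ['original_status' => '404', 'pc_status' => '200'],
];

echo describeStatuses($stub) . PHP_EOL;
```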
First initialize the ScraperAPI class. You can get your free token here. Please note that only some websites are supported; check the API documentation for more information.
$api = new Crawlbase\ScraperAPI(['token' => 'YOUR_TOKEN']);
Pass the URL that you want to scrape, plus any options from the ones available in the API documentation.
Example:
$response = $api->get('https://www.amazon.com/DualSense-Wireless-Controller-PlayStation-5/dp/B08FC6C75Y/');
echo 'status code: ' . $response->statusCode . PHP_EOL;
if ($response->statusCode === 200) {
var_dump($response->json); // Will print scraped Amazon details
}
First initialize the LeadsAPI class. You can get your free token here.
$api = new Crawlbase\LeadsAPI(['token' => 'YOUR_TOKEN']);
Pass the domain where you want to search for leads.
Example:
$response = $api->getFromDomain('target.com');
if ($response->statusCode === 200) {
foreach ($response->json->leads as $key => $lead) {
echo $lead->email . PHP_EOL;
}
}
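The loop above prints every lead as it comes. As a small sketch of post-processing, the hypothetical `extractLeadEmails` helper below collects the addresses into a de-duplicated list. It assumes the `$response->json->leads` shape used in the example, and the stub data is invented purely for illustration.

```php
<?php
// Hypothetical helper: returns the unique, lower-cased email addresses found
// in $response->json->leads. Leads without an email are skipped.
function extractLeadEmails(object $response): array
{
    $emails = [];
    foreach ($response->json->leads ?? [] as $lead) {
        if (!empty($lead->email)) {
            $emails[] = strtolower($lead->email);
        }
    }
    // array_unique keeps the first occurrence; array_values reindexes from 0.
    return array_values(array_unique($emails));
}

// Stub standing in for the object returned by $api->getFromDomain(...).
$stub = (object) [
    'statusCode' => 200,
    'json' => (object) [
        'leads' => [
            (object) ['email' => 'Jane@example.com'],
            (object) ['email' => 'jane@example.com'],
            (object) ['email' => 'sam@example.com'],
            (object) ['name' => 'no email here'],
        ],
    ],
];

foreach (extractLeadEmails($stub) as $email) {
    echo $email . PHP_EOL;
}
```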
Initialize with your Screenshots API token and call the get method.
$api = new Crawlbase\ScreenshotsAPI(['token' => 'YOUR_TOKEN']);
$response = $api->get('https://www.apple.com');
echo 'success: ' . $response->headers->success . PHP_EOL;
echo 'remaining requests: ' . $response->headers->remaining_requests . PHP_EOL;
file_put_contents('apple.jpg', $response->body);
Alternatively, you can specify a callback that automatically saves the file to the temporary folder:
$api = new Crawlbase\ScreenshotsAPI(['token' => 'YOUR_TOKEN']);
$response = $api->get('https://www.apple.com', [
'callback' => function($filepath) {
echo 'filepath: ' . $filepath . PHP_EOL;
}
]);
echo 'success: ' . $response->headers->success . PHP_EOL;
echo 'remaining requests: ' . $response->headers->remaining_requests . PHP_EOL;
Or specify a file path via the saveToPath option:
$api = new Crawlbase\ScreenshotsAPI(['token' => 'YOUR_TOKEN']);
$response = $api->get('https://www.apple.com', [
'saveToPath' => 'apple.jpg',
'callback' => function($filepath) {
echo 'filepath: ' . $filepath . PHP_EOL;
}
]);
echo 'success: ' . $response->headers->success . PHP_EOL;
echo 'remaining requests: ' . $response->headers->remaining_requests . PHP_EOL;
Note that the $api->get($url, $options) method accepts an options array.
Initialize the Storage API using your private token.
$api = new Crawlbase\StorageAPI(['token' => 'YOUR_TOKEN']);
Pass the URL that you want to get from Crawlbase Storage.
$response = $api->get('https://www.apple.com');
echo 'status code: ' . $response->statusCode . PHP_EOL;
if ($response->statusCode === 200) {
echo 'body: ' . $response->body . PHP_EOL;
echo 'original status: ' . $response->headers->original_status . PHP_EOL;
echo 'crawlbase status: ' . $response->headers->pc_status . PHP_EOL;
echo 'rid: ' . $response->headers->rid . PHP_EOL;
echo 'url: ' . $response->headers->url . PHP_EOL;
echo 'stored date: ' . $response->headers->stored_at . PHP_EOL;
}
Alternatively, you can use the RID:
$response = $api->get('RID_REPLACE');
echo 'status code: ' . $response->statusCode . PHP_EOL;
if ($response->statusCode === 200) {
echo 'body: ' . $response->body . PHP_EOL;
echo 'original status: ' . $response->headers->original_status . PHP_EOL;
echo 'crawlbase status: ' . $response->headers->pc_status . PHP_EOL;
echo 'rid: ' . $response->headers->rid . PHP_EOL;
echo 'url: ' . $response->headers->url . PHP_EOL;
echo 'stored date: ' . $response->headers->stored_at . PHP_EOL;
}
Note: Either the RID or the URL must be sent. Each is optional on its own, but you must send one of the two.
Delete request
To delete a storage item from your storage area, use the correct RID
if ($api->delete('RID_REPLACE')) {
echo 'delete success' . PHP_EOL;
echo 'status code: ' . $api->response->statusCode . PHP_EOL;
} else {
echo 'delete failed' . PHP_EOL;
echo 'status code: ' . $api->response->statusCode . PHP_EOL;
}
Bulk request
To make a bulk request, send the list of RIDs as an array:
$items = $api->bulk(['RID1', 'RID2', 'RID3', ...]);
foreach ($items as $item) {
echo 'body: ' . $item->body . PHP_EOL;
echo 'stored at: ' . $item->stored_at . PHP_EOL;
echo 'original status: ' . $item->original_status . PHP_EOL;
echo 'crawlbase status: ' . $item->pc_status . PHP_EOL;
echo 'rid: ' . $item->rid . PHP_EOL;
echo 'url: ' . $item->url . PHP_EOL;
echo PHP_EOL;
}
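For long RID lists it may be worth splitting the work into several bulk calls. The sketch below assumes a hypothetical `bulkInBatches` helper and an arbitrary default batch size of 100; neither comes from the library, and any real per-request limit should be taken from the Crawlbase documentation. The stand-in closure simulates `$api->bulk(...)` so the snippet runs on its own.

```php
<?php
// Hypothetical helper: splits $rids into batches and calls $bulk once per
// batch, collecting all returned items into a single list.
// The default batch size of 100 is an assumption, not a documented limit.
function bulkInBatches(callable $bulk, array $rids, int $batchSize = 100): array
{
    $items = [];
    foreach (array_chunk($rids, max(1, $batchSize)) as $batch) {
        foreach ($bulk($batch) as $item) {
            $items[] = $item;
        }
    }
    return $items;
}

// Stand-in for $api->bulk(...): returns one object per requested RID.
$batchesSeen = 0;
$fakeBulk = function (array $batch) use (&$batchesSeen): array {
    $batchesSeen++;
    return array_map(function ($rid) {
        return (object) ['rid' => $rid, 'body' => 'body of ' . $rid];
    }, $batch);
};

$items = bulkInBatches($fakeBulk, ['RID1', 'RID2', 'RID3', 'RID4', 'RID5'], 2);
echo count($items) . ' items in ' . $batchesSeen . ' batches' . PHP_EOL;
```

With the real class, the callable would wrap the call shown above, for example `function (array $batch) use ($api) { return $api->bulk($batch); }`.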
RIDs request
To request a bulk list of RIDs from your storage area
$rids = $api->rids();
foreach ($rids as $rid) {
echo $rid . PHP_EOL;
}
You can also specify a limit as a parameter
$rids = $api->rids(10);
To get the total number of documents in your storage area
$totalCount = $api->totalCount();
echo 'total count: ' . $totalCount . PHP_EOL;
If you have questions or need help using the library, please open an issue or contact us.
Copyright 2024 Crawlbase