randomstate/camelot-php

A PHP wrapper for Camelot, the Python PDF table extraction library.

Installation

composer require randomstate/camelot-php

Usage

The package closely follows the Camelot CLI API. By default, each extracted table is returned as a plain CSV string. If you need to parse CSV strings, we recommend the league/csv package (https://csv.thephpleague.com/).

<?php

use RandomState\Camelot\Camelot;
use League\Csv\Reader;

$tables = Camelot::lattice('/path/to/my/file.pdf')
       ->extract();

$csv = Reader::createFromString($tables[0]);
$allRecords = $csv->getRecords();
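
If adding league/csv is not an option, the CSV string can also be parsed with PHP's built-in functions. The following is a minimal sketch, assuming each entry of $tables is a CSV string as described above. The sample string and the parseCsvString() helper are hypothetical and only there for illustration; the string is not real Camelot output.

```php
<?php

// Hypothetical CSV string standing in for $tables[0]; not real Camelot output.
$tableCsv = "Name,Qty\n\"Widget, large\",2\nGadget,5\n";

/**
 * Parse a CSV string into an array of rows using only built-in functions.
 * Reading from an in-memory stream lets fgetcsv() handle quoted fields,
 * including quoted fields that contain commas or line breaks.
 */
function parseCsvString(string $csv): array
{
    $stream = fopen('php://memory', 'r+');
    fwrite($stream, $csv);
    rewind($stream);

    $rows = [];
    while (($row = fgetcsv($stream, null, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        $rows[] = $row;
    }
    fclose($stream);

    return $rows;
}

$rows = parseCsvString($tableCsv);
print_r($rows);
```

Every value comes back as a string, so numeric columns need an explicit cast if you want numbers.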

Advanced Processing

Saving / Extracting

Note: no Camelot operations are run until you call one of these methods.

$camelot->extract(); // uses temporary files and automatically collects the contents of each extracted table for you
$camelot->save('/path/to/my-file.csv'); // mirrors Camelot's behaviour and saves files as /path/to/my-file-page-*-table-*.csv
$camelot->plot(); // useful for debugging; plots in a separate window (see Visual Debugging below)

// Set the output format (CSV is the default)
$camelot->json();
$camelot->csv();
$camelot->html();
$camelot->excel();
$camelot->sqlite();
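
Because save() writes one file per table using the page-*-table-* naming scheme described above, you may want to collect those files afterwards. The sketch below derives a glob pattern from the path passed to save(). The savedTablePattern() helper is hypothetical and not part of this package; it only assumes the naming scheme stated above.

```php
<?php

/**
 * Build a glob pattern matching the files written for a given save path,
 * assuming the "<name>-page-*-table-*.<ext>" naming scheme.
 */
function savedTablePattern(string $savePath): string
{
    $info = pathinfo($savePath);
    $ext = isset($info['extension']) ? '.' . $info['extension'] : '';

    return $info['dirname'] . '/' . $info['filename'] . '-page-*-table-*' . $ext;
}

$pattern = savedTablePattern('/path/to/my-file.csv');
echo $pattern, PHP_EOL; // /path/to/my-file-page-*-table-*.csv

// glob() returns the matching files; fall back to an empty array if none match.
$files = glob($pattern) ?: [];
sort($files, SORT_NATURAL); // natural order keeps page-10 after page-2
```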

$camelot->pages('1,2,3-4,8-end'); // pages to process: single pages, ranges, and "end" for the last page
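
To make the page string format concrete, the sketch below expands a specification like the one above into explicit page numbers. This is only an illustration of how such a string reads. The expandPages() helper and the $pageCount parameter are hypothetical; the wrapper itself presumably hands the string to Camelot unchanged.

```php
<?php

/**
 * Expand a page specification such as "1,2,3-4,8-end" into page numbers.
 * $pageCount is the total number of pages; "end" resolves to it.
 */
function expandPages(string $spec, int $pageCount): array
{
    $pages = [];
    foreach (explode(',', $spec) as $part) {
        $part = trim($part);
        if (strpos($part, '-') !== false) {
            // A range such as "3-4" or "8-end"
            [$from, $to] = explode('-', $part, 2);
            $last = (trim($to) === 'end') ? $pageCount : (int) $to;
            for ($page = (int) $from; $page <= $last; $page++) {
                $pages[] = $page;
            }
        } else {
            // A single page such as "1"
            $pages[] = (int) $part;
        }
    }

    // Drop duplicates and return the pages in ascending order.
    $pages = array_values(array_unique($pages));
    sort($pages);

    return $pages;
}

print_r(expandPages('1,2,3-4,8-end', 10)); // pages 1, 2, 3, 4, 8, 9, 10
```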

$camelot->password('my-pass'); // password for reading encrypted PDFs

$camelot->stream()->processBackgroundLines(); // also process lines drawn in the page background

$camelot->plot(); // Visual Debugging: plots the result in a separate window

<?php

use RandomState\Camelot\Camelot;
use RandomState\Camelot\Areas;

// Table areas: extract only from the given table boundaries
Camelot::stream('my-file.pdf')
    ->inAreas(
        Areas::from($xTopLeft, $yTopLeft, $xBottomRight, $yBottomRight)
            // ->add($xTopLeft2, $yTopLeft2, $xBottomRight2, $yBottomRight2)
            // ->add($xTopLeft3, $yTopLeft3, $xBottomRight3, $yBottomRight3)
    );

<?php

use RandomState\Camelot\Camelot;
use RandomState\Camelot\Areas;

// Table regions: look for tables only within the given page regions
Camelot::stream('my-file.pdf')
    ->inRegions(
        Areas::from($xTopLeft, $yTopLeft, $xBottomRight, $yBottomRight)
            // ->add($xTopLeft2, $yTopLeft2, $xBottomRight2, $yBottomRight2)
            // ->add($xTopLeft3, $yTopLeft3, $xBottomRight3, $yBottomRight3)
    );
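
Camelot itself documents area coordinates in PDF coordinate space, where the origin is the bottom-left corner of the page, so the top-left y value is larger than the bottom-right one. If your measurements come from a tool whose origin is the top-left corner, the y values need flipping first. The sketch below shows that conversion. The toPdfSpace() helper is hypothetical, and it assumes that the wrapper passes coordinates through to Camelot unchanged and that all values are in PDF points.

```php
<?php

/**
 * Convert a box measured from the top-left corner of the page into
 * PDF coordinate space (origin at the bottom-left corner).
 * Returns [xTopLeft, yTopLeft, xBottomRight, yBottomRight].
 */
function toPdfSpace(float $left, float $top, float $right, float $bottom, float $pageHeight): array
{
    return [
        $left,
        $pageHeight - $top,    // distance from the bottom edge to the top of the box
        $right,
        $pageHeight - $bottom, // distance from the bottom edge to the bottom of the box
    ];
}

// Example: a US Letter page is 792 points tall.
[$xTopLeft, $yTopLeft, $xBottomRight, $yBottomRight] = toPdfSpace(50, 100, 500, 300, 792);
print_r([$xTopLeft, $yTopLeft, $xBottomRight, $yBottomRight]);
```

The resulting four values are in the order Areas::from() takes them in the examples above.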

$camelot->stream()->setColumnSeparators($x1, $x2 /* , ... */); // x coordinates of the column separators

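$camelot->split(); // split text that spans several cells along the separators

$camelot->flagSize(); // flag superscripts and subscripts

$camelot->strip("\n"); // characters to strip from the extracted text

$camelot->setEdgeTolerance(500); // improve guessed table areas

$camelot->setRowTolerance(15); // improve guessed table rows

$camelot->lineScale(20); // detect shorter lines

$camelot->shiftText('r', 'b'); // shift text in spanning cells

$camelot->copyTextSpanningCells('r', 'b'); // copy text in spanning cells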

License

MIT. Use at your own risk; we accept no liability for how this code is used.
