The original package was developed by Anthony Casagrande and can be downloaded at https://github.com/BirdAPI This is a forked package that I will continue maintaining in the foreseeable future. I will try to maintain a strongly modularized design so when something is broken anyone can quickly repair it. All contributions are very welcome.
Google Search API is a python based library for searching various functionalities of google. It uses screen scraping to retrieve the results, and thus is unreliable if the way google's web pages are returned change in the future. This package is currently under heavy refactoring so changes in the user interface should be expected for the time being.
Disclaimer: This software uses screen scraping to retrieve search results from google.com, and therefore this software may stop working at any given time. Use this software at your own risk. I assume no responsibility for how this software API is used by others.
Table of Contents generated with DocToc
All methods are currently functioning and returning its primary target data. Although, some of the secondary data that is supposed to be collected in the result objects is not yet working.
Redesign of the package is still a work in progress. After completed, I will attempt to repair the gathering of secondary data. Contributions are welcome!
The repo is structured like a package, so it can be installed from pip using github clone url. From command line type:
pip install Google-Search-API
To upgrade the package if you have already installed it:
pip install Google-Search-API --upgrade
Please note that you should also install Firefox browser in order to use images search.
You could also just download or clone the repo and import the package from Google-Search-API folder.
import os
os.chdir("C:\Path_where_repo_is")
from google import google
You can search google web in the following way:
from google import google
num_page = 3
search_results = google.search("This is my query", num_page)
search_results
will contain a list of GoogleResult
objects. num_page parameter is optional (default is 1 page)
GoogleResult:
self.name # The title of the link
self.link # The external link
self.google_link # The google link
self.description # The description of the link
self.thumb # The link to a thumbnail of the website (NOT implemented yet)
self.cached # A link to the cached version of the page
self.page # What page this result was on (When searching more than one page)
self.index # What index on this page it was on
self.number_of_results # The total number of results the query returned
Description text parsing has some encoding problems to be resolved. Only google link of the search is being parsed right now, parse the external link is an implementation priority.
Attempts to search google calculator for the result of an expression. Returns a CalculatorResult
if successful or None
if it fails.
from google import google
google.calculate("157.3kg in grams")
CalculatorResult
value = None # Result value (eg. 157300.0)
from_value = None # Initial value (eg. 157.3)
unit = None # Result unit (eg. u'grams') (NOT implemented yet)
from_unit = None # Initial unit (eg. u'kilograms') (NOT implemented yet)
expr = None # Initial expression (eg. u'157.3 grams') (NOT implemented yet)
result = None # Result expression (eg. u'157300 kilograms') (NOT implemented yet)
fullstring = None # Result unit (eg. u'157.3 kilograms = 157300 grams') (NOT implemented yet)
Parsing of the units must be implemented. The rest of the data members of CalculatorResult can be build from the values and units of the calculation.
Searches google images for a list of images. Image searches can be filtered to produce better results. Image searches can be downloaded.
Image search uses the selenium & the Firefox driver, therefor you MUST have Firefox installed to use it.
Perform a google image search on "banana" and filter it:
from google import google, images
options = images.ImageOptions()
options.image_type = images.ImageType.CLIPART
options.larger_than = images.LargerThan.MP_4
options.color = "green"
results = google.search_images("banana", options)
Sample Result:
{'domain': 'shop.tradedoubler.com',
'filesize': None,
'format': None,
'height': '2000',
'index': 15,
'link': 'http://tesco.scene7.com/is/image/tesco/210-8446_PI_1000013MN%3Fwid%3D2000%26hei%3D2000',
'name': None,
'page': 1,
'site': 'http://shop.tradedoubler.com/shop/uk-01/a/2058674/productName/banana/sortBy/price/sortReverse/false',
'thumb': 'https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcS8JPH_bgyvvyf5X67k32ZZYjf9MlWlxHIEXXxi91TVrNafpokI',
'thumb_height': '199px',
'thumb_width': '199px',
'width': '2000'}
filesize is to be implemented. format works, but sometimes the link of the image doesn't show the format. Google images right now seems to not have a names, so the method for that is not implemented.
Filter options:
ImageOptions:
image_type # face, body, clipart, line drawing
size_category # large, small, icon
larger_than # the well known name of the smallest image size you want
exact_width # the exact width of the image you want
exact_height # the exact height of the image you want
color_type # color, b&w, specific
color # blue, green, red
Enums of values that can be used to filter image searches:
class ImageType:
NONE = None
FACE = "face"
PHOTO = "photo"
CLIPART = "clipart"
LINE_DRAWING = "lineart"
class SizeCategory:
NONE = None
ICON = "i"
LARGE = "l"
MEDIUM = "m"
SMALL = "s"
LARGER_THAN = "lt"
EXACTLY = "ex"
class LargerThan:
NONE = None
QSVGA = "qsvga" # 400 x 300
VGA = "vga" # 640 x 480
SVGA = "svga" # 800 x 600
XGA = "xga" # 1024 x 768
MP_2 = "2mp" # 2 MP (1600 x 1200)
MP_4 = "4mp" # 4 MP (2272 x 1704)
MP_6 = "6mp" # 6 MP (2816 x 2112)
MP_8 = "8mp" # 8 MP (3264 x 2448)
MP_10 = "10mp" # 10 MP (3648 x 2736)
MP_12 = "12mp" # 12 MP (4096 x 3072)
MP_15 = "15mp" # 15 MP (4480 x 3360)
MP_20 = "20mp" # 20 MP (5120 x 3840)
MP_40 = "40mp" # 40 MP (7216 x 5412)
MP_70 = "70mp" # 70 MP (9600 x 7200)
class ColorType:
NONE = None
COLOR = "color"
BLACK_WHITE = "gray"
SPECIFIC = "specific"
class License:
NONE = None
REUSE = "fc"
REUSE_WITH_MOD = "fmc"
REUSE_NON_COMMERCIAL = "f"
REUSE_WITH_MOD_NON_COMMERCIAL = "fm"
You can download a list of images.
images.download(image_results, path = "path/to/download/images")
Path is an optional argument, if you don't specify a path, images will be downloaded to an "images" folder inside the working directory.
If you want to download a large list of images, the previous method could be slow. A better method using multithreading is provided for this case.
images.fast_download(image_results, path = "path/to/download/images", threads=12)
You may change the number of threads, 12 is the number that has offered the best speed after a number of informal tests that I've done.
Convert between one currency and another using google calculator. Results are real time and can change at any time based on the current exchange rate according to google.
Convert 5 US Dollars to Euros using the official 3 letter currency acronym (ISO 4217):
from google import google
euros = google.convert_currency(5.0, "USD", "EUR")
print "5.0 USD = {0} EUR".format(euros)
5.0 USD = 3.82350692 EUR
Convert 1000 Japanese Yen to US Dollars:
yen = google.convert_currency(1000, "yen", "us dollars")
print "1000 yen = {0} us dollars".format(yen)
1000 yen = 12.379 us dollars
Instead you can get the exchange rate which returns what 1 from_currency
equals in to_currency
and do your own math:
rate = google.exchange_rate("dollars", "pesos")
print "dollars -> pesos exchange rate = {0}".format(rate)
dollars -> pesos exchange rate = 13.1580679
Perform your own math. The following 2 statements are equal:
5.0 * google.exchange_rate("USD", "EUR")
google.convert_currency(5.0, "USD", "EUR")
As a side note, convert_currency
is always more accurate than performing your own math on exchange_rate
because of possible rounding errors. However if you have more than one value to convert it is best to call exchange_rate
and cache the result to use for multiple calculations instead of querying the google server for each one.
All contributions are very welcome! As you have seen, there is still some methods that are not implemented. The structure of the package is intended to facilitate that you can contribute implementing or improving any method without changing other code.
Other interesting things that you may do is to build a good command line interface for the package. You can also take a look to the TODO list
For all contributions, we intend to follow the Google Python Style Guide