Scraping Factory

A scraping library for retrieving data from useful pages, such as Amazon wishlists.

API

The library exposes the following functions to scrape data and manage spiders:

scrape(SPIDER_NAME, URL): scrapes the given URL using the spider named by SPIDER_NAME.
spiders(): lists all spiders found by the library.
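
As a minimal sketch of how the two calls can be combined, the helper below checks that a spider is listed before scraping with it. It assumes that spiders() returns an iterable of spider names, which this README does not state. scrape_checked is a hypothetical helper, not part of the library.

```python
def scrape_checked(sf, spider_name, url):
    """Scrape url with spider_name, failing early if the spider is not listed.

    sf is the imported scraper_factory module (or any object exposing
    spiders() and scrape()). Assumption: spiders() yields spider names.
    """
    available = list(sf.spiders())
    if spider_name not in available:
        raise ValueError(
            "unknown spider {!r}; available: {}".format(spider_name, available)
        )
    return sf.scrape(spider_name, url)
```

With the library installed, this could be called as scrape_checked(SF, 'amazon-wishlist', URL) after import scraper_factory as SF.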

Custom Spiders

Custom spiders are supported, as long as:

They are implemented as a class that inherits from BaseSpider.
The spider file is located either in scraper_factory/spiders or in a custom directory, with the environment variable $SPIDER_PATH set to that directory.
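
One way to point the library at a custom location from Python is to set $SPIDER_PATH before using it. The sketch below only sets the variable. When the library actually reads it (at import or at call time) is not stated here, so calling the helper before importing scraper_factory is the safer assumption. use_spider_dir is a hypothetical helper, not part of the library.

```python
import os
from pathlib import Path


def use_spider_dir(directory):
    """Set $SPIDER_PATH to an existing directory and return its resolved path."""
    path = Path(directory).expanduser().resolve()
    if not path.is_dir():
        raise NotADirectoryError(str(path))
    # The README names SPIDER_PATH as the variable the library looks at.
    os.environ["SPIDER_PATH"] = str(path)
    return path
```

The same effect can be had from the shell by exporting SPIDER_PATH before starting Python.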

Usage example

>>> import scraper_factory as SF
>>> SF.scrape('amazon-wishlist', 'https://www.amazon.com/hz/wishlist/ls/24XY9873RPAYN')
[{
    'id': 'I1MZVK8RDPYK8P',
    'title': 'AmazonBasics Heavy Weight Ruled Lined Index Cards, White, 3x5 Inch Card, 100-Count - AMZ63500',
    'byline': None,
    'price': None,
    'link': 'https://www.amazon.com/dp/B06XSRLP51/',
    'img': 'https://images-na.ssl-images-amazon.com/images/I/71i7LVTzpsL._SS135_.jpg'
}, {
    'id': 'I14TUJ6TADACU5',
    'title': "Women's Walking Shoes Sock Sneakers - Mesh Slip On Air Cushion Lady Girls Modern Jazz Dance Easy Shoes Platform Loafers",
    'byline': None,
    'price': None,
    'link': 'https://www.amazon.com/dp/B07MWCDJ9X/',
    'img': 'https://images-na.ssl-images-amazon.com/images/I/61sHA7o-bxL._SS135_.jpg'
}, {
    'id': 'I3C97JA2JR06PN',
    'title': 'Tenergy Redigrill\xa0Smoke-Less Infrared Grill, Indoor Grill, Heating\xa0Electric Tabletop Grill, Non-Stick Easy to Clean\xa0BBQ Grill, for Party/Home, ETL Certified',
    'byline': None,
    'price': '$179.99',
    'link': 'https://www.amazon.com/dp/B07BZ412HT/',
    'img': 'https://images-na.ssl-images-amazon.com/images/I/41uGvSPg-ML._SS135_.jpg'
}, {
    'id': 'I1C7RJI2H0VWZ7',
    'title': 'Shelf Liners for Wire Shelf Liner Set of 4 - Graphite (14-Inch-by-36-Inch)',
    'byline': None,
    'price': '$29.99',
    'link': 'https://www.amazon.com/dp/B01N9V4A9A/',
    'img': 'https://images-na.ssl-images-amazon.com/images/I/71Lg6J7sGHL._SS135_.jpg'
},
...]
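
Since the result is a plain list of dictionaries, it can be post-processed with ordinary Python. The sketch below keeps only the items that have a price and sums them, assuming prices are strings of the form '$179.99' as in the output above; other currencies or formats are not handled. priced_items is a hypothetical helper, and the sample data is trimmed to the id and price fields shown above.

```python
from decimal import Decimal


def priced_items(items):
    """Return (id, Decimal price) pairs for items whose 'price' is not None."""
    pairs = []
    for item in items:
        price = item.get("price")
        if price is None:
            continue  # unpriced items are skipped
        # Assumption: prices look like '$179.99' (optional thousands commas).
        pairs.append((item["id"], Decimal(price.lstrip("$").replace(",", ""))))
    return pairs


sample = [
    {"id": "I1MZVK8RDPYK8P", "price": None},
    {"id": "I3C97JA2JR06PN", "price": "$179.99"},
    {"id": "I1C7RJI2H0VWZ7", "price": "$29.99"},
]

pairs = priced_items(sample)
total = sum((price for _, price in pairs), Decimal("0"))
print(pairs)
print(total)  # 209.98
```

Decimal is used instead of float so that the sum of currency amounts stays exact.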

Installation

Latest release through PyPI:

$ pip install scraper_factory

Development version:

$ git clone git@github.com:machinia/scraper-factory.git
$ cd scraper-factory
$ pip install -e .
