Merge pull request #415 from OliverCullimore/409-update-dev-mode-remove-outputs
robbrad authored Nov 3, 2023
2 parents 599f617 + 99d5812 commit b9d04f7
Showing 108 changed files with 746 additions and 5,821 deletions.
75 changes: 33 additions & 42 deletions CONTRIBUTING.md
@@ -13,7 +13,6 @@
+ [Common Functions](#common-functions)
* [Additional files](#additional-files)
+ [Input JSON file](#input-json-file)
+ [Output JSON file](#output-json-file)
+ [Feature file](#feature-file)
* [Testing](#testing)
+ [Behave (Integration Testing)](#behave--integration-testing-)
@@ -90,23 +89,49 @@ There are a few different options for scraping, and you are free to choose which
## Developing
To get started, you will first need to fork this repository and set up your own working environment before you can start developing.

Once your environment is ready, create a new branch from your master/main branch and then create a new .py file within the `uk_bin_collection\councils` directory. The new .py file will be used in the CLI to call the parser, so be sure to pick a sensible name - e.g. CheshireEastCouncil.py is called with:
Once your environment is ready, create a new branch from your master/main branch, create a new .py file within the `uk_bin_collection\councils` directory, and then use development mode to generate the input.json entry. The new .py file will be used in the CLI to call the parser, so be sure to pick a sensible name - e.g. CheshireEastCouncil.py is called with:
```
python collect_data.py CheshireEastCouncil <web-url>
```

To simplify things somewhat, a [template](https://github.com/robbrad/UKBinCollectionData/blob/master/uk_bin_collection/uk_bin_collection/councils/council_class_template/councilclasstemplate.py) file has been created - open this file, copy the contents to your new .py file and start from there. You are pretty much free to approach the scraping however you would like, but please ensure that:
- Your scraper returns a dictionary made up of the key "bins" and a value that is a list of bin types and collection dates (see [outputs folder](https://github.com/robbrad/UKBinCollectionData/tree/master/uk_bin_collection/tests/outputs) for examples).
- Your scraper returns a dictionary made up of the key "bins" and a value that is a list of bin types and collection dates. An example of this can be seen below.
- Any dates or times are formatted to standard UK formats (see [below](#common-functions))
<details>
<summary>Output Example</summary>

```json
{
    "bins": [
        {
            "type": "Empty Standard Mixed Recycling",
            "collectionDate": "29/07/2022"
        },
        {
            "type": "Empty Standard Garden Waste",
            "collectionDate": "29/07/2022"
        },
        {
            "type": "Empty Standard General Waste",
            "collectionDate": "05/08/2022"
        }
    ]
}
```
</details>
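
As a rough illustration of the output requirement above, the sketch below builds the `{"bins": [...]}` dictionary from a list of scraped values and formats each date as `dd/mm/YYYY`. The `DATE_FORMAT` constant and the `build_output` helper are assumptions made for this example only; they are not taken from the project's common functions, which may already provide an equivalent.

```python
from datetime import datetime

# Assumed UK date format, defined locally for this sketch.
DATE_FORMAT = "%d/%m/%Y"


def build_output(collections):
    """Turn (bin type, date) pairs into the {"bins": [...]} structure."""
    bins = [
        {"type": bin_type, "collectionDate": when.strftime(DATE_FORMAT)}
        for bin_type, when in collections
    ]
    return {"bins": bins}


# Hypothetical scraped values, chosen to match the example output.
example = build_output(
    [
        ("Empty Standard Mixed Recycling", datetime(2022, 7, 29)),
        ("Empty Standard General Waste", datetime(2022, 8, 5)),
    ]
)
print(example)
```

Keeping the formatting in one place means every entry uses the same date style, whatever format the council's website returns.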

### Kwargs
UKBCD has two mandatory parameters when it runs - the name of the parser (sans .py) and the URL from which to scrape. However, developers can also get the following data using `kwargs`:

| Parameter | Prompt | Notes | kwargs.get |
|--------------|----------------------|------------------------------------------|--------------------------|
| UPRN | `-u` or `--uprn` | | `kwargs.get('uprn')` |
| House number | `-n` or `--number` | Sometimes called PAON | `kwargs.get('paon')` |
| Postcode | `-p` or `--postcode` | Needs to be wrapped in quotes on the CLI | `kwargs.get('postcode')` |
| Parameter | Prompt | Notes | kwargs.get |
|-----------------------------------------|--------------------------|-------------------------------------------------------------|------------------------------|
| UPRN (Unique Property Reference Number) | `-u` or `--uprn` | | `kwargs.get('uprn')` |
| USRN (Unique Street Reference Number) | `-us` or `--usrn` | | `kwargs.get('usrn')` |
| House number | `-n` or `--number` | Sometimes called PAON | `kwargs.get('paon')` |
| Postcode | `-p` or `--postcode` | Needs to be wrapped in quotes on the CLI | `kwargs.get('postcode')` |
| Skip Get URL | `-s` or `--skip_get_url` | | `kwargs.get('skip_get_url')` |
| URL for remote Selenium web driver | `-w` or `--web_driver` | Needs to be wrapped in quotes on the CLI | `kwargs.get('web_driver')` |
| Development Mode | `-d` or `--dev_mode` | Create/update council's entry in the input.json on each run | `kwargs.get('dev_mode')` |
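
The table's `kwargs.get` column can be read as in the minimal sketch below. `read_options` is a hypothetical helper written for this example, and the UPRN and postcode values are made up; the only behaviour relied on is that `kwargs.get` returns `None` for any parameter that was not supplied.

```python
def read_options(**kwargs):
    """Collect the optional CLI values a scraper might need (hypothetical helper)."""
    return {
        "uprn": kwargs.get("uprn"),
        "postcode": kwargs.get("postcode"),
        "paon": kwargs.get("paon"),
        # kwargs.get returns None when the flag was not supplied,
        # so bool() turns a missing flag into False.
        "skip_get_url": bool(kwargs.get("skip_get_url")),
    }


# Made-up values, standing in for what the CLI would pass through.
options = read_options(uprn="100012345678", postcode="AB1 2CD")
print(options)
```

Using `kwargs.get` rather than `kwargs[...]` avoids a `KeyError` when a user omits an optional parameter, which lets the scraper decide how to handle the missing value.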

These parameters are useful if you're using something like the requests module and need to take additional user information into the request, such as:
```commandline
@@ -144,7 +169,6 @@ Please feel free to contribute to this library as you see fit - added functions
In order for your scraper to work with the project's testing suite, some additional files need to be provided or
modified:
- [ ] [Input JSON file](#input-json-file)
- [ ] [Output JSON file](#output-json-file)
- [ ] [Feature file](#feature-file)

**Note:** from here on, anything containing `<council_name>` should be replaced with the scraper's name.
@@ -186,39 +210,6 @@ recommended - the council's address is usually a good one).
```
</details>

### Output JSON file
| Type | File location |
|------|---------------------------------------------------------------------------|
| Add | `UKBinCollectionData/uk_bin_collection/tests/outputs/<council_name>.json` |

A sample of what the scraper outputs should be provided in the [outputs](https://github.com/robbrad/UKBinCollectionData/blob/master/uk_bin_collection/tests/outputs/)
folder. This can be taken from your development environment's console or a CLI. Please only include the "bins" data.

Adding the `-d` or `--dev_mode` parameter to your CLI command enables development mode which creates/updates the Output JSON file for the council automatically for you on each run

<details>
<summary>Example</summary>

```json
{
    "bins": [
        {
            "type": "Empty Standard Mixed Recycling",
            "collectionDate": "29/07/2022"
        },
        {
            "type": "Empty Standard Garden Waste",
            "collectionDate": "29/07/2022"
        },
        {
            "type": "Empty Standard General Waste",
            "collectionDate": "05/08/2022"
        }
    ]
}
```
</details>

### Feature file
| Type | File location |
|--------|-----------------------------------------------------------------------------------------|
4 changes: 2 additions & 2 deletions custom_components/uk_bin_collection/config_flow.py
@@ -29,7 +29,7 @@ async def get_council_schema(self, council=str) -> vol.Schema:
if self.councils_data is None:
self.councils_data = await self.get_councils_json()
council_schema = vol.Schema({})
if ("SKIP_GET_URL" not in self.councils_data[council] or
if ("skip_get_url" not in self.councils_data[council] or
"custom_component_show_url_field" in self.councils_data[council]):
council_schema = council_schema.extend(
{vol.Required("url", default=""): cv.string}
@@ -102,7 +102,7 @@ async def async_step_council(self, user_input=None):

if user_input is not None:
# Set additional options
if "SKIP_GET_URL" in self.councils_data[self.data["council"]]:
if "skip_get_url" in self.councils_data[self.data["council"]]:
user_input["skip_get_url"] = True
user_input["url"] = self.councils_data[self.data["council"]]["url"]
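
The key rename in this file matters because dictionary membership tests in Python are case-sensitive. The sketch below assumes, as the change suggests, that a council's entry stores the flag under the lowercase key `skip_get_url`; the entry itself is a hypothetical stand-in, not copied from input.json.

```python
# Hypothetical council entry; only the key names matter here.
council_entry = {"url": "https://example.com", "skip_get_url": True}

# Membership tests compare keys exactly, including case.
upper_found = "SKIP_GET_URL" in council_entry
lower_found = "skip_get_url" in council_entry

print(upper_found)  # False
print(lower_found)  # True
```

Under that assumption, the uppercase check would never match, so the URL field would always be shown and the skip option would never be set.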

@@ -1,4 +1,4 @@
Feature: Test each council output matches expected results in /outputs
Feature: Test each council output matches expected results

Scenario Outline: Validate Council Output
Given the council: <council>