Update dev mode & remove JSON outputs #415

Merged
CONTRIBUTING.md: 75 changes (33 additions, 42 deletions)
@@ -13,7 +13,6 @@
+ [Common Functions](#common-functions)
* [Additional files](#additional-files)
+ [Input JSON file](#input-json-file)
+ [Output JSON file](#output-json-file)
+ [Feature file](#feature-file)
* [Testing](#testing)
+ [Behave (Integration Testing)](#behave--integration-testing-)
@@ -90,23 +89,49 @@ There are a few different options for scraping, and you are free to choose which
## Developing
To get started, first you will need to fork this repository and set up your own working environment before you can start developing.

Once your environment is ready, create a new branch from your master/main branch and then create a new .py file within the `uk_bin_collection\councils` directory. The new .py file will be used in the CLI to call the parser, so be sure to pick a sensible name - e.g. CheshireEastCouncil.py is called with:
Once your environment is ready, create a new branch from your master/main branch, create a new .py file within the `uk_bin_collection\councils` directory, and then use development mode to generate the input.json entry. The new .py file will be used in the CLI to call the parser, so be sure to pick a sensible name - e.g. CheshireEastCouncil.py is called with:
```
python collect_data.py CheshireEastCouncil <web-url>
```

To simplify things somewhat, a [template](https://github.com/robbrad/UKBinCollectionData/blob/master/uk_bin_collection/uk_bin_collection/councils/council_class_template/councilclasstemplate.py) file has been created - open this file, copy the contents to your new .py file and start from there. You are pretty much free to approach the scraping however you would like, but please ensure that:
- Your scraper returns a dictionary made up of the key "bins" and a value that is a list of bin types and collection dates (see [outputs folder](https://github.com/robbrad/UKBinCollectionData/tree/master/uk_bin_collection/tests/outputs) for examples).
- Your scraper returns a dictionary made up of the key "bins" and a value that is a list of bin types and collection dates. An example of this can be seen below.
- Any dates or times are formatted to standard UK formats (see [below](#common-functions))
<details>
<summary>Output Example</summary>

```json
{
"bins": [
{
"type": "Empty Standard Mixed Recycling",
"collectionDate": "29/07/2022"
},
{
"type": "Empty Standard Garden Waste",
"collectionDate": "29/07/2022"
},
{
"type": "Empty Standard General Waste",
"collectionDate": "05/08/2022"
}
]
}
```
</details>
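
For reference, here is a minimal Python sketch of assembling that structure, with dates rendered in the DD/MM/YYYY format shown above. It is illustrative only and is not the project's template:

```python
# Minimal sketch of the expected return value: a dict with a "bins" key holding
# a list of {"type": ..., "collectionDate": "DD/MM/YYYY"} entries.
from datetime import datetime

def make_bin_entry(bin_type: str, collection_date: datetime) -> dict:
    return {
        "type": bin_type,
        "collectionDate": collection_date.strftime("%d/%m/%Y"),  # UK date format
    }

data = {
    "bins": [
        make_bin_entry("Empty Standard Mixed Recycling", datetime(2022, 7, 29)),
        make_bin_entry("Empty Standard General Waste", datetime(2022, 8, 5)),
    ]
}
```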

### Kwargs
UKBCD has two mandatory parameters when it runs - the name of the parser (sans .py) and the URL from which to scrape. However, developers can also get the following data using `kwargs`:

| Parameter | Prompt | Notes | kwargs.get |
|--------------|----------------------|------------------------------------------|--------------------------|
| UPRN | `-u` or `--uprn` | | `kwargs.get('uprn')` |
| House number | `-n` or `--number` | Sometimes called PAON | `kwargs.get('paon')` |
| Postcode | `-p` or `--postcode` | Needs to be wrapped in quotes on the CLI | `kwargs.get('postcode')` |
| Parameter | Prompt | Notes | kwargs.get |
|-----------------------------------------|--------------------------|-------------------------------------------------------------|------------------------------|
| UPRN (Unique Property Reference Number) | `-u` or `--uprn` | | `kwargs.get('uprn')` |
| USRN (Unique Street Reference Number) | `-us` or `--usrn` | | `kwargs.get('usrn')` |
| House number | `-n` or `--number` | Sometimes called PAON | `kwargs.get('paon')` |
| Postcode | `-p` or `--postcode` | Needs to be wrapped in quotes on the CLI | `kwargs.get('postcode')` |
| Skip Get URL | `-s` or `--skip_get_url` | | `kwargs.get('skip_get_url')` |
| URL for remote Selenium web driver | `-w` or `--web_driver` | Needs to be wrapped in quotes on the CLI | `kwargs.get('web_driver')` |
| Development Mode | `-d` or `--dev_mode` | Create/update council's entry in the input.json on each run | `kwargs.get('dev_mode')` |
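
Inside a parser, these values surface through `kwargs.get`. A minimal sketch follows; the class and method shape are assumptions loosely modelled on the linked council class template, not taken from this diff:

```python
# Hedged sketch: reading the optional CLI parameters via kwargs inside a parser.
# Only the kwargs keys come from the table above; the class/method shape is assumed.
class ExampleCouncil:
    def parse_data(self, page: str, **kwargs) -> dict:
        user_uprn = kwargs.get("uprn")          # -u / --uprn
        user_postcode = kwargs.get("postcode")  # -p / --postcode
        user_paon = kwargs.get("paon")          # -n / --number
        web_driver = kwargs.get("web_driver")   # -w / --web_driver

        bins: list[dict] = []
        # ... scrape the council site here using the values above ...
        return {"bins": bins}
```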

These parameters are useful if you're using something like the requests module and need to pass additional user information into the request, such as:
```commandline
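# Hypothetical invocation only - the collapsed example from the original file may differ.
# A UPRN and quoted postcode are passed through, with -d so dev mode updates input.json.
python collect_data.py CheshireEastCouncil "https://example-council.gov.uk" -u 100012345678 -p "CW1 2AB" -d
```
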
@@ -144,7 +169,6 @@ Please feel free to contribute to this library as you see fit - added functions
In order for your scraper to work with the project's testing suite, some additional files need to be provided or
modified:
- [ ] [Input JSON file](#input-json-file)
- [ ] [Output JSON file](#output-json-file)
- [ ] [Feature file](#feature-file)

**Note:** from here on, anything containing `<council_name>` should be replaced with the scraper's name.
@@ -186,39 +210,6 @@ recommended - the council's address is usually a good one).
```
</details>

### Output JSON file
| Type | File location |
|------|---------------------------------------------------------------------------|
| Add | `UKBinCollectionData/uk_bin_collection/tests/outputs/<council_name>.json` |

A sample of what the scraper outputs should be provided in the [outputs](https://github.com/robbrad/UKBinCollectionData/blob/master/uk_bin_collection/tests/outputs/)
folder. This can be taken from your development environment's console or a CLI. Please only include the "bins" data.

Adding the `-d` or `--dev_mode` parameter to your CLI command enables development mode, which creates/updates the Output JSON file for the council automatically on each run.

<details>
<summary>Example</summary>

```json
{
"bins": [
{
"type": "Empty Standard Mixed Recycling",
"collectionDate": "29/07/2022"
},
{
"type": "Empty Standard Garden Waste",
"collectionDate": "29/07/2022"
},
{
"type": "Empty Standard General Waste",
"collectionDate": "05/08/2022"
}
]
}
```
</details>

### Feature file
| Type | File location |
|--------|-----------------------------------------------------------------------------------------|
custom_components/uk_bin_collection/config_flow.py: 4 changes (2 additions, 2 deletions)
@@ -29,7 +29,7 @@ async def get_council_schema(self, council=str) -> vol.Schema:
if self.councils_data is None:
self.councils_data = await self.get_councils_json()
council_schema = vol.Schema({})
if ("SKIP_GET_URL" not in self.councils_data[council] or
if ("skip_get_url" not in self.councils_data[council] or
"custom_component_show_url_field" in self.councils_data[council]):
council_schema = council_schema.extend(
{vol.Required("url", default=""): cv.string}
@@ -102,7 +102,7 @@ async def async_step_council(self, user_input=None):

if user_input is not None:
# Set additional options
if "SKIP_GET_URL" in self.councils_data[self.data["council"]]:
if "skip_get_url" in self.councils_data[self.data["council"]]:
user_input["skip_get_url"] = True
user_input["url"] = self.councils_data[self.data["council"]]["url"]

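
For context, the lowercase keys checked above come from each council's entry in input.json, which development mode now maintains. A rough, hypothetical sketch of such an entry, expressed as a Python dict for illustration; only `url` and `skip_get_url` are actually referenced by the config flow shown here:

```python
# Hypothetical per-council entry as the config flow would read it.
# Key names other than "url" and "skip_get_url" are assumptions for illustration.
councils_data = {
    "CheshireEastCouncil": {
        "url": "https://example-council.gov.uk/bin-collections",  # hypothetical URL
        "skip_get_url": True,  # lowercase key, matching the updated checks above
    }
}
```
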
@@ -1,4 +1,4 @@
Feature: Test each council output matches expected results in /outputs
Feature: Test each council output matches expected results

Scenario Outline: Validate Council Output
Given the council: <council>