Skip to content

Commit

Permalink
Merge branch 'dev'
Browse files Browse the repository at this point in the history
  • Loading branch information
mikeym88 committed Jul 14, 2020
2 parents 25f5e95 + 1528557 commit 3ea0944
Show file tree
Hide file tree
Showing 8 changed files with 525 additions and 94 deletions.
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
## SQLite3 DB ignore
/**/*.db
/**/*.db-journal

## .xlsx ignore
/**/*.xlsx
Expand Down Expand Up @@ -98,7 +99,6 @@ ENV/
### VirtualEnv template
# Virtualenv
# http://iamzed.com/2009/05/07/a-primer-on-virtualenv/
.Python
[Bb]in
[Ii]nclude
[Ll]ib
Expand Down
119 changes: 116 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,118 @@
# PatentsView API mining
# PatentsView API wrapper

This project calls the PatentsView API.
This project is a wrapper for the PatentsView API.

* [PatentsView Glossary](http://www.patentsview.org/api/glossary.html) provides a description of the variables.
* [PatentsView Glossary](http://www.patentsview.org/api/glossary.html) provides a description of the variables.

## Important Notes:

* Use <https://dev.patentsview.org> and not <https://www.patentsview.org/>; the former is laxer in terms of accepting input
* Naming is finicky (more so on the second one), even spacing and other character affect the search results (see below).

## Remarks about the data

* Patent Numbers are alphanumeric (they can include letters)
* PatentsView only includes information about the patent at issue. It does not include changes to patent information after the patent has been issued.
* This means that if the company changes name, it won't be reflected in the patent. Example: if "International Business Machines" renames itself to "IBM", patents issued to "International Business Machines" will still be issued to "International Business Machines" (and not "IBM").
* As an example: `NETFLIX, INC.` has an `assignee_key_id` of `17594` and an `assignee_id` of `org_2lAuxOpAtNMvtTxhuLmX`; `NETFLIX.COM, INC.` on the other hand an `assignee_key_id` of `org_UNHkzir8tY7NlQrOJKT4` and an `assignee_id` of `363028`. (This of course assumes `NETFLIX, INC.` and `NETFLIX.COM, INC.` are the same company, which is highly probable).
* The same applies for acquisitions. Example: Company A has patent *X*; once company B acquires company A, patent *X* would still show that it is assigned to company *A*.
* Probably the same thing holds if a company acquires certain patents of another company.
* The patents can be assigned to organizations (as opposed to individuals). This is indicated by the 'assignee organization' field returned by the API.
* The assignee organizations (i.e. companies) are distinguished by name. Each organization name is a 'separate' company.
* This means that a patent can be assigned to "IBM", "IBM Inc.", "International Business Machines".
* Different organization names have different `assignee_id`s and `assignee_key_id`s (see `NETFLIX` example above).
* **Different endpoints behave differently**: particularly <https://www.patentsview.org/> and <https://dev.patentsview.org>
* **Naming is finicky on the first one**: If you search for `Abbott Laboratories` or for `ABBOTT LABORATORIES`,
you will get the same results. If you search for `ABBOTT Laboratories`, `Abbott LABORATORIES`,
or `abbott laboratories`, you will get nothing.
* The second one seems to work better, but you still have to replace the carriage return and line break characters.

## Adding companies

Create an Microsoft Excel spreadsheet (`.xlsx` file) with the following structure:

<table>
<thead>
<tr>
<th>Firm ID</th>
<th>Firm Name</th>
<th>Alternative names </th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
<tr>
<th>ID</th>
<th>Name 1</th>
<th>Name 2</th>
<th>Name 3</th>
<th>Name 4</th>
<th>...</th>
<th>Name X</th>
</tr>
</thead>
<tbody>
<tr>
<td>ID2</td>
<td>Company 2 Primary Name / Name 1</td>
<td>Name 2</td>
<td>Name 3</td>
<td>Name 4</td>
<td>...</td>
<td>Name X</td>
</tr>
<tr>
<td>ID1</td>
<td>Company 1 Primary Name / Name 1</td>
<td>Name 2</td>
<td>Name 3</td>
<td>Name 4</td>
<td>...</td>
<td>Name X</td>
</tr>
</tbody>
</table>


## Database Structure

Here is an Entity Relationship Diagram (ERD) of the database structure.

![Entity Relationship Diagram (ERD) of the database structure](images/patents_view_table.png)

## Using SQL to Select Patents

The `sql` folder has some SQL scripts that might come in handy.
As an example, here is SQL query that selects patents between two dates:

```
SELECT
p.patent_number as "Patent Number",
p.patent_title as "Patent Title",
-- p.company_id as "Company ID",
c.name as "Company Name",
-- p.company_alternate_name_id as "Alternate Name ID",
an.name as "Company Name Listed on Patent",
p.year,
p.grant_date as "Grant Date",
p.uspc_class as "USPC Classes"
FROM
patents as p
JOIN
companies as c
ON
p.company_id = c.id
LEFT JOIN
alternate_company_names as an
ON
p.company_alternate_name_id = an.id
WHERE
p.grant_date > DATE("2006-01-03") AND
p.grant_date < DATE("2010-06-13");
```

## Software Applications

* [DbVisualizer](https://www.dbvis.com/) was used to generate the graphs
* [DB Browser for SQLite](https://sqlitebrowser.org/) was used to look at the data and execute SQL queries
Binary file added images/patents_view_table.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
9 changes: 9 additions & 0 deletions sql/select_cited_patent_numbers_not_populated.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
/*
* Use this script to find which cited patents need to be added to the Patents table.
*/
SELECT
cited_patent_number
FROM
cited_patents
WHERE
cited_patent_number NOT IN (SELECT DISTINCT patent_number FROM patents);
28 changes: 28 additions & 0 deletions sql/select_cited_patents.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
SELECT
DISTINCT
p.patent_number as "Citing Patent Number"
,co.name as "Citing Company"
,cp.cited_patent_number as "Cited Patent Number"
,pp.patent_title as "Cited Patent Title"
,pp.year as "Year"
,pp.grant_date as "Grant Date"
,pp.uspc_class as "USPC Class"
FROM
patents as p
LEFT JOIN
companies as co
ON
co.id = p.company_id
JOIN
cited_patents as cp
ON
p.patent_number = cp.citing_patent_number
LEFT JOIN
patents as pp
ON
cp.cited_patent_number = pp.patent_number
-- Uncomment the following 2 lines if you to filter by patent_number (or something else of your choosing)
--WHERE
-- p.patent_number = "10001497"
ORDER BY
p.patent_number ASC
27 changes: 27 additions & 0 deletions sql/select_patents_between_two_dates.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
/*
* Use this SQL query to select patents between two dates.
* Uncomment the lines to retrieve the company ID and the alternate name ID.
*/
SELECT
p.patent_number as "Patent Number",
p.patent_title as "Patent Title",
-- p.company_id as "Company ID",
c.name as "Company Name",
-- p.company_alternate_name_id as "Alternate Name ID",
an.name as "Company Name Listed on Patent",
p.year,
p.grant_date as "Grant Date",
p.uspc_class as "USPC Classes"
FROM
patents as p
JOIN
companies as c
ON
p.company_id = c.id
LEFT JOIN
alternate_company_names as an
ON
p.company_alternate_name_id = an.id
WHERE
p.grant_date > DATE("2006-01-03") AND
p.grant_date < DATE("2010-06-13");
Loading

0 comments on commit 3ea0944

Please sign in to comment.