Women Inventors Data in R

Data on Women Inventors from the USPTO

Paul Oldham https://github.com/wipo-analytics (Manchester Institute of Innovation Research)
2022-03-15

Introduction

In this post we describe how to create a set of datasets on Women Inventors using the bulk data from the USPTO PatentsView website. This builds on the previous post on how to download bulk patent data from the USPTO for use in R.

The data on women inventors at the USPTO is the result of work by the team at the USPTO PatentsView service to classify male and female inventors in USPTO data. You can read more about this on the PatentsView site here. The Office of the Chief Economist produced a report on Women and Innovation that can be accessed here and the methodology can be accessed here.

We will be working with a set of large files, and the results of this work are available for download into R from an Open Science Framework repository. The data has also been bundled into the WomenInventorR package that can be accessed here. To follow the steps below, install the patentsviewdata package from GitHub:

# install.packages("devtools")
# install the patentsviewdata package from GitHub and load it
devtools::install_github("poldham/patentsviewdata")
library(patentsviewdata)

We will also be using the tidyverse to wrangle the data.

The data files

We will be working with US granted patents data from PatentsView.

The patentsviewdata package allows us to import the metadata for the granted patents download page on the PatentsView website, including the URLs for the files stored on AWS. You can visit that page at PatentsView to explore its contents. As explained in a previous post, creating this table in R makes it easier to access and download the data we want.

grants <- patentsviewdata::pv_meta("grant")

This table lists a large number of files and we need only seven of them, which we select with filter() (the first six are shown below).

library(tidyverse)

# keep only the seven tables used in this post
fns <- grants %>% 
  filter(file_name %in% c("assignee", "inventor", "ipcr", "location", "patent",
                          "patent_assignee", "patent_inventor"))
url zip_name
https://s3.amazonaws.com/data.patentsview.org/download/assignee.tsv.zip assignee.tsv.zip
https://s3.amazonaws.com/data.patentsview.org/download/inventor.tsv.zip inventor.tsv.zip
https://s3.amazonaws.com/data.patentsview.org/download/ipcr.tsv.zip ipcr.tsv.zip
https://s3.amazonaws.com/data.patentsview.org/download/location.tsv.zip location.tsv.zip
https://s3.amazonaws.com/data.patentsview.org/download/patent.tsv.zip patent.tsv.zip
https://s3.amazonaws.com/data.patentsview.org/download/patent_assignee.tsv.zip patent_assignee.tsv.zip

We can download these files using the pv_download() function that was developed in an earlier post and is now included in the patentsviewdata package. This function downloads the file from the URL and saves it in a named destination folder (which is created if it doesn't exist). Because we are dealing with large files, we set the timeout to something sensible (600 seconds, or 10 minutes, by default). You may need to adjust this depending on your internet connection.

# download inventor.tsv.zip (the second file in fns) into a folder called grants
pv_download(fns$url[2], dest = "grants", timeout = 600)
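As a rough illustration of the steps just described, here is a minimal base R sketch of a download helper. The name download_to_folder() is hypothetical and this is not how patentsviewdata implements pv_download(); it simply creates the destination folder if needed, raises R's timeout option (in seconds) for the duration of the call and saves the file under its original name.

```r
# Hypothetical helper sketching the download steps; NOT the patentsviewdata
# implementation of pv_download().
download_to_folder <- function(url, dest, timeout = 600) {
  # create the destination folder if it doesn't already exist
  if (!dir.exists(dest)) dir.create(dest, recursive = TRUE)
  # temporarily raise R's download timeout (seconds) for large files
  old <- options(timeout = timeout)
  on.exit(options(old), add = TRUE)
  # save the file under its original name, e.g. grants/inventor.tsv.zip
  destfile <- file.path(dest, basename(url))
  utils::download.file(url, destfile = destfile, mode = "wb", quiet = TRUE)
  invisible(destfile)
}

# usage (downloads a large file):
# download_to_folder(fns$url[2], dest = "grants", timeout = 600)
```

Setting mode = "wb" explicitly ensures zip files are written in binary mode on Windows.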

Files can be imported directly from .zip files using pv_import() (with the occasional exception of patent.tsv.zip). If you save the metadata table we created above and pass its path to meta_path, the function will check that the number of rows matches the count on the PatentsView data download page.

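# save the metadata table so that pv_import() can check the row count
save(grants, file = "grants/grants.rda")

pv_import(path = "grants/inventor.tsv.zip", dest = grants, meta_path = "grants/grants.rda")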
id name_first name_last male_flag attribution_status
00008e0f-bdce-11ea-8a73-121df0c29c1e Paramjit S. Tappia 1 1
0000n6xqianutadbzbgzwled7 Eva K. Mudráné 0 1
0000n8nqsxhrztn7djlxou00k Muamer Zukic 1 1
0001c325-bd4c-11ea-8a73-121df0c29c1e Eneman Geert NA 99
0001f65b-bd4c-11ea-8a73-121df0c29c1e Fernando GONZALEZ-ZALBA 1 1
00020b9f-bd4c-11ea-8a73-121df0c29c1e Sergey RESHANOV 1 1

Note that these files can take some time to download because they are large. The patent file in particular can be difficult to import directly from a zip file in R.
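The row count check described above is a useful habit even outside pv_import(). A minimal sketch, assuming you know the expected number of rows (for example from the PatentsView download page); check_rows() is a hypothetical helper, not part of patentsviewdata.

```r
# Hypothetical helper: compare the rows imported with the count published
# on the PatentsView download page and warn if they differ.
check_rows <- function(df, expected) {
  n <- nrow(df)
  if (n != expected) {
    warning(sprintf("Imported %s rows but expected %s; the file may be truncated.",
                    format(n, big.mark = ","), format(expected, big.mark = ",")))
  }
  invisible(n == expected)
}

# usage, with the expected count noted for the patent_inventor table below:
# check_rows(patent_inventor, 19111181)
```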

Inventor Data

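In the first step we import the inventor table. This contains a column called male_flag (which we rename gender_flag) that allows us to divide the data into women and men inventors. Where no gender was attributed, male_flag is NA (see the row with an attribution_status of 99 in the output above).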

library(tidyverse)

# download the inventor table into the grants folder
pv_download(fns$url[2], dest = "grants", timeout = 600)

inventor <- pv_import("grants/inventor.tsv.zip")

# male_flag: 0 = woman, 1 = man (NA where no gender was attributed)
women_inventors <- inventor %>% 
  rename(gender_flag = male_flag) %>% 
  filter(gender_flag == 0) 

male_inventors <- inventor %>% 
  rename(gender_flag = male_flag) %>% 
  filter(gender_flag == 1) 
id name_first name_last gender_flag attribution_status
0000n6xqianutadbzbgzwled7 Eva K. Mudráné 0 1
000f0k6brgval6kr9agzjlgcg Lynda M. Salvetti 0 1
000m3sct572dqfs1pd41c7ayq Bohumila Zapletal 0 1
00141444-3b10-11eb-a3cd-121df0c29c1e Chen Linyong 0 1
001d857b-3b10-11eb-a3cd-121df0c29c1e Sun-Il Mho 0 1
001wwao8tcfowqif3g924pk7j Jill Kathleen Sestini 0 1
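One detail worth checking: dplyr's filter() drops rows where the condition evaluates to NA, so inventors with a missing flag fall into neither table. A small sketch on made-up rows (the values are illustrative only) shows this and how to keep the unattributed group explicitly.

```r
library(dplyr)

# toy inventor rows mirroring the columns shown above (values are made up)
toy_inventor <- tibble(
  id = c("a", "b", "c", "d"),
  male_flag = c(0, 1, NA, 1),
  attribution_status = c(1, 1, 99, 1)
) %>%
  rename(gender_flag = male_flag)

toy_women <- toy_inventor %>% filter(gender_flag == 0)
toy_men <- toy_inventor %>% filter(gender_flag == 1)
# rows with a missing flag are dropped by both filters above; keep them explicitly
toy_unattributed <- toy_inventor %>% filter(is.na(gender_flag))

# count() gives a quick overview of all three groups, including NA
toy_inventor %>% count(gender_flag)
```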

Next we want to link our women inventor data to the main patent table. To do that, we have to go through the ids in an intermediate table.

Linking Inventors to the Granted Patents Table

Before we download the main patent file (granted patents), we need the patent_inventor table to make the link to our women inventors. This table contains the inventor id (inventor_id), the patent document id (patent_id) and a location_id.

We can read in the patent_inventor table and take a look.

# download the patent_inventor table (expect 19,111,181 rows)
pv_download(fns$url[7], dest = "grants", timeout = 600)

patent_inventor <- pv_import("grants/patent_inventor.tsv.zip")
patent_id inventor_id location_id
10000000 fl:j_ln:marron-6 e310ff76-cb8e-11eb-9615-121df0c29c1e
10000001 fl:h_ln:yu-295 fca7dc3f-cb8f-11eb-9615-121df0c29c1e
10000001 fl:s_ln:lee-753 fbf41696-cb8f-11eb-9615-121df0c29c1e
10000002 fl:d_ln:choi-70 ff698db4-cb8e-11eb-9615-121df0c29c1e
10000002 fl:d_ln:kim-329 ff698db4-cb8e-11eb-9615-121df0c29c1e
10000002 fl:s_ln:kim-845 ff698db4-cb8e-11eb-9615-121df0c29c1e

Next we keep only the rows of patent_inventor whose inventor_id appears in the id column of the women_inventors table we created above.

# flag rows where the inventor is a woman, then keep only those rows
women_patent_id <- patent_inventor %>% 
  mutate(women = inventor_id %in% women_inventors$id) %>% 
  filter(women)
patent_id inventor_id location_id women
10000001 fl:s_ln:lee-753 fbf41696-cb8f-11eb-9615-121df0c29c1e TRUE
10000002 fl:y_ln:kim-91 ff698db4-cb8e-11eb-9615-121df0c29c1e TRUE
10000010 fl:l_ln:saxton-1 3d4f3e1d-cb8e-11eb-9615-121df0c29c1e TRUE
10000018 fl:l_ln:fang-10 de8e30d6-cb8e-11eb-9615-121df0c29c1e TRUE
10000019 fl:k_ln:ferguson-2 92db3aae-cb90-11eb-9615-121df0c29c1e TRUE
10000019 fl:m_ln:vargas-1 92db3aae-cb90-11eb-9615-121df0c29c1e TRUE

Note that the outcome of this is a table that allows us to link to the patent table, so it is an intermediate object that just contains identifiers.
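The same filtering can also be written as a semi-join, which keeps the rows of patent_inventor that have a match in women_inventors without adding any columns. A sketch on toy data (ids are made up); it is an alternative to the %in% approach used above and does not create the women column.

```r
library(dplyr)

# toy versions of the two tables (ids are made up)
toy_patent_inventor <- tibble(
  patent_id = c("1", "1", "2", "3"),
  inventor_id = c("w1", "m1", "w2", "m2")
)
toy_women_inventors <- tibble(id = c("w1", "w2"))

# approach used in the post: flag, then filter
via_in <- toy_patent_inventor %>%
  mutate(women = inventor_id %in% toy_women_inventors$id) %>%
  filter(women)

# semi_join() keeps matching rows of x and only the columns of x
via_semi <- toy_patent_inventor %>%
  semi_join(toy_women_inventors, by = c("inventor_id" = "id"))
```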

Linking to the Patent table

The patent table is large and particularly troublesome to read in. Reading it directly from the .zip file sometimes fails, so it can be best to unzip it first, as we have done here.

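# download the patent table (expect 7,814,196 rows)
pv_download(fns$url[5], dest = "grants", timeout = 600)

# unzip first: importing patent.tsv.zip directly sometimes fails
unzip("grants/patent.tsv.zip", exdir = "grants")

patent <- pv_import("grants/patent.tsv")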

We now use the patent_id in the women_patent_id table to filter the granted patents table to those involving women. It is possible to do this directly with patent %>% filter(.$id %in% women_patent_id$patent_id) but creating a “women” column to use in filtering can also be a useful device for helping to keep track of what you have done.

# flag patents with at least one woman inventor, then keep only those
women_granted <- patent %>% 
  mutate(women = id %in% women_patent_id$patent_id) %>% 
  filter(women)
# A tibble: 6 × 10
  id       type    number country date       kind  num_claims filename
  <chr>    <chr>   <chr>  <chr>   <date>     <chr>      <dbl> <chr>   
1 10000001 utility 10000… US      2018-06-19 B2            12 ipg1806…
2 10000002 utility 10000… US      2018-06-19 B2             9 ipg1806…
3 10000010 utility 10000… US      2018-06-19 B2            20 ipg1806…
4 10000018 utility 10000… US      2018-06-19 B2            13 ipg1806…
5 10000019 utility 10000… US      2018-06-19 B2            11 ipg1806…
6 10000024 utility 10000… US      2018-06-19 B2            25 ipg1806…
# … with 2 more variables: withdrawn <dbl>, women <lgl>

We now have a data.frame with over 1.4 million granted patents where women appear as an inventor.
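Because a patent can list several women inventors, women_patent_id has more rows than there are patents, while women_granted has one row per patent. A quick consistency check, sketched here on toy data, is to compare the number of distinct patent ids with the number of rows in the filtered patent table.

```r
library(dplyr)

toy_women_patent_id <- tibble(
  patent_id = c("10000019", "10000019", "10000024"),  # two women on one patent
  inventor_id = c("k", "m", "x")
)
toy_patent <- tibble(id = c("10000019", "10000024", "10000030"))

toy_women_granted <- toy_patent %>%
  filter(id %in% toy_women_patent_id$patent_id)

# one row per patent, so these two counts should agree
n_distinct(toy_women_patent_id$patent_id)
nrow(toy_women_granted)
```

If the two counts differ on the real data, some patent ids in the link table have no match in the patent table.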

Separating out Patent Texts

This data.frame is somewhat large because it includes the titles and abstracts of the patent documents. In practice, for text mining or named entity recognition tasks we will normally want only the text fields and the id. So, let's separate these out into two sets.

womens_texts <- women_granted %>% 
  select(id, title, abstract)
# A tibble: 2 × 3
  id       title                                              abstract
  <chr>    <chr>                                              <chr>   
1 10000001 Injection molding machine and mold thickness cont… The inj…
2 10000002 Method for manufacturing polymer film and co-extr… The pre…

The main patent data frame

We now create an easier-to-handle patent data frame by dropping the title and abstract fields.

women_granted <- women_granted %>% 
  select(-title, -abstract)
id type number country date kind num_claims filename withdrawn women
10000001 utility 10000001 US 2018-06-19 B2 12 ipg180619.xml 0 TRUE
10000002 utility 10000002 US 2018-06-19 B2 9 ipg180619.xml 0 TRUE
10000010 utility 10000010 US 2018-06-19 B2 20 ipg180619.xml 0 TRUE
10000018 utility 10000018 US 2018-06-19 B2 13 ipg180619.xml 0 TRUE
10000019 utility 10000019 US 2018-06-19 B2 11 ipg180619.xml 0 TRUE
10000024 utility 10000024 US 2018-06-19 B2 25 ipg180619.xml 0 TRUE

Note that the patent grant table includes a link to the original XML file (the filename column). However, we no longer need to work with the XML for the full texts of US patents because the PatentsView team have converted them to tables to make analysis easier. The main text field datasets (briefsum, description or specification, and claims) are available as tables from the data download page. It is therefore unnecessary to work with the XML unless you enjoy suffering.

What Technology Areas are Women Active In?

The patent system uses detailed classification systems consisting of alphanumeric codes (known as symbols), with the International Patent Classification (IPC) and the more detailed Cooperative Patent Classification (CPC) as the main classifications. The classifications are hierarchical and proceed from the section (e.g. A) down to the sub-group level. For analytics purposes (when presenting to an audience) we will typically use the sub-class level. The IPC table that we will import divides each classification into its units (section, class, sub-class, group and sub-group), so to make life easier we will create a sub-class column (e.g. G06F) and a main group column (e.g. G06F/17) to work with. We will minimise the table by selecting only the fields we are likely to use. However, if we were seeking to explore individual areas of technology we would want to keep the full group and sub-group data available.

To make this a bit more understandable, we add a description column from a short description table known as the short IPC (ipc_short in the patentr package), created by Paul Oldham and Stephen Hall.

# download the IPC table (ipcr.tsv.zip)
pv_download(fns$url[3], dest = "grants", timeout = 600)

ipc <- pv_import("grants/ipcr.tsv.zip") %>% 
  # section + class + subclass, e.g. "G" "06" "F" -> "G06F"
  unite(sub_class, c("section", "ipc_class", "subclass"), sep = "", remove = FALSE) %>% 
  # sub-class + main group, e.g. "G06F/17"
  unite(group, c("sub_class", "main_group"), sep = "/", remove = FALSE) %>% 
  select(uuid, patent_id, section, ipc_class, sub_class, group) %>% 
  # add short descriptions (requires the patentr package)
  left_join(patentr::ipc_short, by = c("sub_class" = "code"))
uuid patent_id section ipc_class sub_class group description level
00005z3qh82fwpo5r1oupwpr3 6864832 G 01 G01S G01S/013 RADIO DIRECTION-FINDING subclass
0000662nssr53hdo3lp92sz26 9954111 H 01 H01L H01L/27 SEMICONDUCTOR DEVICES subclass
00008u9j3g8oivqtuc1dqayb1 10048897 G 06 G06F G06F/12 ELECTRIC DIGITAL DATA PROCESSING subclass
00008v5gnw215cdjozwehxqky 10694566 H 4 H4W H4W/4 NA NA
0000hj3ytmy8g9l2qa5x1hta5 D409748 D 24 D2404 D2404/NA NA NA
0000k4cvm77w3i6k6cdmxnye5 7645556 G 03 G03F G03F/7 PHOTOMECHANICAL PRODUCTION OF TEXTURED/PATTERNED SURFACES subclass
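To see how the two unite() calls build the sub_class and group columns, here is a sketch on two made-up rows. It also shows where values such as D2404/NA in the output above come from: by default unite() pastes a missing value as the string "NA" (na.rm = FALSE).

```r
library(dplyr)
library(tidyr)

# toy IPC rows (values are illustrative)
toy_ipc <- tibble(
  patent_id = c("9954111", "D409748"),
  section = c("H", "D"),
  ipc_class = c("01", "24"),
  subclass = c("L", "04"),
  main_group = c("27", NA)
)

toy_ipc <- toy_ipc %>%
  # "H" + "01" + "L" -> "H01L"
  unite(sub_class, c("section", "ipc_class", "subclass"), sep = "", remove = FALSE) %>%
  # "H01L" + "27" -> "H01L/27"; a missing main_group becomes "/NA"
  unite(group, c("sub_class", "main_group"), sep = "/", remove = FALSE)
```

Passing na.rm = TRUE to the second unite() would give D2404 instead of D2404/NA; which is preferable is a judgment call.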

Once we have our table we can start counting things.

ipc_count <- ipc %>% 
  count(sub_class, sort = TRUE)

head(ipc_count)
sub_class n
G06F 1101726
H01L 804604
A61K 660071
H04N 481911
A61B 438263
H04L 422091

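These sub-classes may not mean much as codes, but they are the key to understanding areas of technology. Note that, as written, the ipc table covers all granted patents rather than only those with women inventors, so these counts describe US patent grants as a whole rather than the areas where women are most active. You can find out more about the top result, G06F, by visiting the IPC website.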

We can do the same at the main group level to get a more detailed picture.

ipc_group <- ipc %>% 
  count(group, sort = TRUE)

head(ipc_group)
group n
A61K/31 279462
H01L/21 246294
G06F/17 193136
G06F/3 187070
H01L/29 170410
H04L/12 159821

Here we could look up A61K/31 to see what this code encompasses.

The description column from the short IPC table already provides the means to visualise this data with readable labels rather than codes, and we invite you to experiment.
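To restrict the sub-class counts to patents with at least one woman inventor, the ipc table can be filtered with the patent ids in women_patent_id before counting. A sketch on toy data, assuming dplyr; with the real tables you would substitute ipc and women_patent_id. Counting distinct patent and sub-class pairs avoids counting a patent twice when it carries the same sub-class more than once.

```r
library(dplyr)

toy_ipc <- tibble(
  patent_id = c("1", "1", "2", "3", "3"),
  sub_class = c("G06F", "G06F", "A61K", "G06F", "H01L")
)
toy_women_patent_id <- tibble(patent_id = c("1", "1", "3"))  # patent 1 has two women

women_ipc_count <- toy_ipc %>%
  # keep classifications for patents with at least one woman inventor;
  # semi_join() never duplicates rows of x, even if patent ids repeat in y
  semi_join(toy_women_patent_id, by = "patent_id") %>%
  # count each patent once per sub-class
  distinct(patent_id, sub_class) %>%
  count(sub_class, sort = TRUE)
```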

Who do women inventors work for?

To answer this question we need to obtain the assignee (applicant) data. We need two tables to make the link to our women inventors. The assignee table contains ids and details for individual applicants and organisations, and the patent_assignee table contains the ids that link the tables together.

# download the assignee table
pv_download(fns$url[1], dest = "grants", timeout = 600)

assignee <- pv_import("grants/assignee.tsv.zip") 
id type name_first name_last organization
00002ded-cef9-4c06-ad0c-0fee8891a8ed 2 NA NA Butterick Company, Inc.
00002ed6-a81c-4adf-afa3-e91961107dca 3 NA NA Conros Corporation
000055d3-0d65-4d07-8d0a-8939b578b0e1 3 NA NA Chungbuk National University
0000591f-7548-49ee-a4ae-fca3b0c10b1c 3 NA NA TELEVIC CONFERENCE NV
00007585-cd5c-46d6-96ea-09042748a550 3 NA NA ACES INGENIEURGESELLSCHAFT MBH
00011034-bfa0-442a-b9e4-20a27cdedc2d 5 John Van Der Greft NA
# download the patent_assignee table
pv_download(fns$url[6], dest = "grants", timeout = 600)

patent_assignee <- pv_import("grants/patent_assignee.tsv.zip") 
patent_id assignee_id location_id
10000000 ca78627d-f6e7-48f4-add1-2782e15befc3 a0fda6be-cb8e-11eb-9615-121df0c29c1e
10000001 9038760c-b1a6-485a-8655-7909dd8d75f4 e8ec5703-cb8f-11eb-9615-121df0c29c1e
10000002 71bc12c0-f21e-48f8-bfc8-2599959c1c9e 60542c0c-cb8e-11eb-9615-121df0c29c1e
10000003 a8eb651f-81bd-4b1e-b905-4167ad40e94f f853ec68-cb90-11eb-9615-121df0c29c1e
10000004 177b3796-e4aa-47dc-a0b3-6d8abf2e2e66 f2aaa1f3-09bd-11ec-893a-12de62d610b1
10000005 fbdde58f-6f05-455f-9fd2-2cb3b5aaf8b0 ff405b60-cb8e-11eb-9615-121df0c29c1e
library(tidyverse)

# rename the location ids before joining so inventor and assignee locations stay distinct
women_assignees <- women_patent_id %>% 
  rename(inventor_location_id = location_id) %>% 
  left_join(patent_assignee, by = "patent_id") %>% 
  rename(assignee_location_id = location_id)  %>% 
  left_join(assignee, by = c("assignee_id" = "id"))

# note that women_assignees (1,913,654 rows) is longer than women_patent_id. Because the
# join is on patent_id, a patent with more than one assignee produces one row per
# inventor-assignee pair; this is worth bearing in mind before counting.
patent_id inventor_id inventor_location_id women assignee_id assignee_location_id type name_first name_last organization
10000001 fl:s_ln:lee-753 fbf41696-cb8f-11eb-9615-121df0c29c1e TRUE 9038760c-b1a6-485a-8655-7909dd8d75f4 e8ec5703-cb8f-11eb-9615-121df0c29c1e 3 NA NA LS MTRON LTD.
10000002 fl:y_ln:kim-91 ff698db4-cb8e-11eb-9615-121df0c29c1e TRUE 71bc12c0-f21e-48f8-bfc8-2599959c1c9e 60542c0c-cb8e-11eb-9615-121df0c29c1e 3 NA NA KOLON INDUSTRIES, INC.
10000010 fl:l_ln:saxton-1 3d4f3e1d-cb8e-11eb-9615-121df0c29c1e TRUE d870922b-3e71-4293-bd2f-c14039ee5441 d2219935-cb90-11eb-9615-121df0c29c1e 2 NA NA Xerox Corporation
10000018 fl:l_ln:fang-10 de8e30d6-cb8e-11eb-9615-121df0c29c1e TRUE d334323f-ec20-4264-a7a6-691ce4f9bd98 9eecf551-cb8f-11eb-9615-121df0c29c1e 2 NA NA Apple Inc.
10000019 fl:k_ln:ferguson-2 92db3aae-cb90-11eb-9615-121df0c29c1e TRUE ab3a2f59-ddfc-4b36-beda-9f40b9fe7cb4 ee4d846b-cb8e-11eb-9615-121df0c29c1e 2 NA NA THE BOEING COMPANY
10000019 fl:m_ln:vargas-1 92db3aae-cb90-11eb-9615-121df0c29c1e TRUE ab3a2f59-ddfc-4b36-beda-9f40b9fe7cb4 ee4d846b-cb8e-11eb-9615-121df0c29c1e 2 NA NA THE BOEING COMPANY
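The growth in rows follows from how left_join() works: each row of women_patent_id is repeated once for every assignee listed on that patent, and rows with no matching assignee are kept with NA. A toy sketch (ids are made up) shows both effects. With dplyr 1.1.0 or later, a join where keys repeat on both sides also triggers a many-to-many warning, which can be silenced with relationship = "many-to-many" once you have confirmed that this is expected.

```r
library(dplyr)

toy_women_patent_id <- tibble(
  patent_id = c("A", "B"),
  inventor_id = c("w1", "w2")
)
toy_patent_assignee <- tibble(
  patent_id = c("A", "A", "C"),  # patent A has two assignees; B has none
  assignee_id = c("org1", "org2", "org3")
)

# w1 appears twice (once per assignee of A); w2 is kept with a missing assignee
toy_women_assignees <- toy_women_patent_id %>%
  left_join(toy_patent_assignee, by = "patent_id")
```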

Location Data

There are two types of location data available to us: inventor locations and applicant (assignee) locations. While the geocoding is unlikely to be perfect, it provides opportunities to create maps and other forms of analysis on the global distribution of women inventors and the organisations they work for.

We will start by obtaining the locations for the women inventors.

# download the location table
pv_download(fns$url[4], dest = "grants", timeout = 600)

location <- pv_import("grants/location.tsv.zip") 

# country names, ISO 2-letter codes and regions from countrycode,
# keeping the first entry for each country name
country <- countrycode::codelist_panel %>% 
  janitor::clean_names() %>% 
  mutate(duplicated = duplicated(country_name_en)) %>% 
  filter(duplicated == FALSE) %>%
  select(country_name_en, iso2c, region) %>% 
  rename(country_name = country_name_en)

women_location <- left_join(women_patent_id, location, by = c("location_id" = "id")) %>% 
  # drop missing country codes (no case of NA standing for Namibia, ISO code "NA", was found)
  drop_na(country) %>% 
  # add country names and regions (1,802,405 rows, a small drop from women_patent_id)
  left_join(country, by = c("country" = "iso2c"))
patent_id inventor_id location_id women city state country latitude longitude county state_fips county_fips country_name region
10000001 fl:s_ln:lee-753 fbf41696-cb8f-11eb-9615-121df0c29c1e TRUE Gunpo-si NA KR 37.3421 126.9210 NA NA NA South Korea East Asia & Pacific
10000002 fl:y_ln:kim-91 ff698db4-cb8e-11eb-9615-121df0c29c1e TRUE Yongin-si NA KR 37.2284 127.2040 NA NA NA South Korea East Asia & Pacific
10000010 fl:l_ln:saxton-1 3d4f3e1d-cb8e-11eb-9615-121df0c29c1e TRUE Walworth NY US 43.1392 -77.2722 NA 36 NA United States North America
10000018 fl:l_ln:fang-10 de8e30d6-cb8e-11eb-9615-121df0c29c1e TRUE Los Altos CA US 37.3674 -122.1090 Santa Clara 6 6085 United States North America
10000019 fl:k_ln:ferguson-2 92db3aae-cb90-11eb-9615-121df0c29c1e TRUE Woodinville WA US 47.7501 -122.1670 King 53 53033 United States North America
10000019 fl:m_ln:vargas-1 92db3aae-cb90-11eb-9615-121df0c29c1e TRUE Woodinville WA US 47.7501 -122.1670 King 53 53033 United States North America

Note that this table provides latitude and longitude coordinates that can be used in R to create maps.
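As a starting point for mapping the inventor locations, here is a minimal sketch using ggplot2 (an assumption: the post does not prescribe a mapping package). It aggregates records to unique coordinates and plots them as points sized by count. The toy rows reuse coordinates from the table above; with real data you would pass women_location. Each row is a patent-inventor record, so the counts are records rather than distinct people.

```r
library(dplyr)
library(ggplot2)

toy_location <- tibble(
  city = c("Gunpo-si", "Walworth", "Woodinville", "Woodinville"),
  latitude = c(37.3421, 43.1392, 47.7501, 47.7501),
  longitude = c(126.9210, -77.2722, -122.1670, -122.1670)
)

# one point per unique coordinate pair, with the number of records there
loc_counts <- toy_location %>%
  count(longitude, latitude, name = "records")

p <- ggplot(loc_counts, aes(x = longitude, y = latitude, size = records)) +
  geom_point(alpha = 0.5) +
  coord_quickmap() +
  labs(x = "Longitude", y = "Latitude", size = "Inventor records")

# print(p) draws the plot
```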

We now do the same for the applicants (assignees) data.

women_assignees_location <- left_join(women_assignees, location, by = c("assignee_location_id" = "id")) %>% 
  # drop missing country codes (no case of NA standing for Namibia, ISO code "NA", was found)
  drop_na(country) %>% 
  # add country names and regions
  left_join(country, by = c("country" = "iso2c"))
patent_id inventor_id inventor_location_id women assignee_id assignee_location_id type name_first name_last organization city state country latitude longitude county state_fips county_fips country_name region
10000001 fl:s_ln:lee-753 fbf41696-cb8f-11eb-9615-121df0c29c1e TRUE 9038760c-b1a6-485a-8655-7909dd8d75f4 e8ec5703-cb8f-11eb-9615-121df0c29c1e 3 NA NA LS MTRON LTD. Anyang-si NA KR 37.4036 126.9280 NA NA NA South Korea East Asia & Pacific
10000002 fl:y_ln:kim-91 ff698db4-cb8e-11eb-9615-121df0c29c1e TRUE 71bc12c0-f21e-48f8-bfc8-2599959c1c9e 60542c0c-cb8e-11eb-9615-121df0c29c1e 3 NA NA KOLON INDUSTRIES, INC. Gwacheon-si NA KR 37.4343 127.0040 NA NA NA South Korea East Asia & Pacific
10000010 fl:l_ln:saxton-1 3d4f3e1d-cb8e-11eb-9615-121df0c29c1e TRUE d870922b-3e71-4293-bd2f-c14039ee5441 d2219935-cb90-11eb-9615-121df0c29c1e 2 NA NA Xerox Corporation Norwalk CT US 41.0958 -73.4205 Fairfield 9 9001 United States North America
10000018 fl:l_ln:fang-10 de8e30d6-cb8e-11eb-9615-121df0c29c1e TRUE d334323f-ec20-4264-a7a6-691ce4f9bd98 9eecf551-cb8f-11eb-9615-121df0c29c1e 2 NA NA Apple Inc. Cupertino CA US 37.3094 -122.0610 Santa Clara 6 6085 United States North America
10000019 fl:k_ln:ferguson-2 92db3aae-cb90-11eb-9615-121df0c29c1e TRUE ab3a2f59-ddfc-4b36-beda-9f40b9fe7cb4 ee4d846b-cb8e-11eb-9615-121df0c29c1e 2 NA NA THE BOEING COMPANY Chicago IL US 41.8338 -87.6718 Cook 17 17031 United States North America
10000019 fl:m_ln:vargas-1 92db3aae-cb90-11eb-9615-121df0c29c1e TRUE ab3a2f59-ddfc-4b36-beda-9f40b9fe7cb4 ee4d846b-cb8e-11eb-9615-121df0c29c1e 2 NA NA THE BOEING COMPANY Chicago IL US 41.8338 -87.6718 Cook 17 17031 United States North America
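The comments about Namibia in the location code point to a general pitfall: Namibia's ISO 2-letter code is "NA", which readr treats as a missing value by default. A sketch with readr::read_tsv() on a small temporary file (values made up) shows the difference; we do not assume that pv_import() exposes an equivalent na argument.

```r
library(readr)

# write a tiny tab-separated file with a Namibian location (values made up)
tmp <- tempfile(fileext = ".tsv")
writeLines(c("id\tcity\tcountry", "1\tSeoul\tKR", "2\tWindhoek\tNA"), tmp)

# default: the strings "" and "NA" are read as missing values
default_read <- read_tsv(tmp, col_types = cols(.default = col_character()))

# treat only empty fields as missing, so "NA" survives as Namibia's code
kept_read <- read_tsv(tmp, na = "", col_types = cols(.default = col_character()))
```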

Extra Data

In performing an analysis we will often want to place the data in its wider context. Thus, the data on women inventors that we have just created is a subset of the wider data on US patent grants. We have preserved some of this context by retaining the data on male inventors. We can finish up by thinking about other types of data that would be useful.

When placing patent activity involving women inventors in context, it is clearly desirable to retain the patent table so that we can examine overall trends in patent grants. However, it makes sense to drop the text fields (title and abstract) that make this a difficult file to work with.

patent_trends <- patent %>% 
  select(-title, -abstract)
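With patent_trends retained, one simple measure is the share of granted patents per year that involve at least one woman inventor. A sketch on toy data, assuming the date column is a Date (as in the tibble output above); it illustrates one measure only and is not the USPTO methodology.

```r
library(dplyr)

toy_trends <- tibble(
  id = c("1", "2", "3", "4", "5"),
  date = as.Date(c("2018-06-19", "2018-06-19", "2018-06-26", "2019-01-01", "2019-01-08"))
)
toy_women_ids <- c("1", "3", "5")  # e.g. women_granted$id

share_by_year <- toy_trends %>%
  mutate(
    year = as.integer(format(date, "%Y")),
    women = id %in% toy_women_ids
  ) %>%
  group_by(year) %>%
  summarise(grants = n(), women_grants = sum(women), share = women_grants / grants)
```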

The womeninventoR Package

The code above describes the process used to create the data tables for women inventors that we have bundled into the womeninventoR data package. You can access the data package here.

Conclusion

The release of large scale data on women inventors in the United States represents a significant achievement on the part of the USPTO PatentsView team led by Christina Jones. This dataset deserves to be more widely known and offers rich opportunities for exploration by the R and wider data science community.

Exercises

Based on the data that we have imported we can now start asking questions that move from basic to advanced approaches.

  1. Who are the top women inventors in the United States (based on the count of patent grants)?
  2. Distinct people may share the same name (known as lumping). Can you see any evidence of this in the data? What other data fields (possibly in other tables) might assist with addressing lumped names?
  3. Who are the top applicant organisations in the United States?
  4. What are the top technology areas where women inventors have received patent grants?
  5. What is the trend over time in patent grants involving women inventors relative to the overall trend in patent grants?
  6. What are the top countries represented in the women inventor location data (by inventor and by organisation)?
  7. What are the main locations in the United States where women inventors are located? Visualise the data at the state and city/town level.
  8. A significant proportion of US patents involving women as inventors are listed for countries outside the United States. Create a global map that allows the data to be visualised and filtered by country.
  9. The USPTO Office of the Chief Economist report here focuses on two measures: a) the share of granted patents held by women; and b) the women inventor rate (WIR), which is the share of women among all inventor-patentees for a given period of time. Can you reproduce this approach, taking into account the information in the methodology here?
  10. What improvements would you make to the representation of the data on women inventors in the Progress and Potential report?
  11. The availability of text data provides opportunities to engage in text-based topic modelling (e.g. following the tidytext approach popularised by Julia Silge and David Robinson).
  12. What opportunities exist for modelling women inventor data in R (for example using the tidymodels framework)? Is it possible to produce forecasts for trends in women inventors over time, and what factors would affect the ability to forecast this type of data?

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/wipo-analytics, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Oldham (2022, March 15). WIPO Patent Analytics: Women Inventors Data in R. Retrieved from https://wipo-analytics.github.io/posts/2022-03-15-women-inventors/

BibTeX citation

@misc{oldham2022women,
  author = {Oldham, Paul},
  title = {WIPO Patent Analytics: Women Inventors Data in R},
  url = {https://wipo-analytics.github.io/posts/2022-03-15-women-inventors/},
  year = {2022}
}