Global Innovation Index Database
Jack Gregory
06 December 2022
Why did we choose MariaDB and AWS?
How does the giidb project fit within the larger gii repository? How is it organized?
The profiles repository contains the following folders.
Folder | Description |
---|---|
build | Contains the main <giidb_build.R> script along with other support scripts. |
data | Contains a backup of all data in the GIIDB in csv and rds formats. |
src | Contains all source scripts, where the names are directly related to the constituent functions. |
How to rebuild the GIIDB on AWS RDS?
The following slides describe how to build the GIIDB from scratch on AWS RDS:
gii.czhncdmgvotw.eu-central-2.rds.amazonaws.com
3306
admin
...
How to connect to and interact with the GIIDB?
The following slides describe how to connect to the GIIDB hosted remotely on an AWS RDS instance of MariaDB.
For this purpose, we typically rely on two programs:
Important
Note that under no circumstances should GIIDB credentials (i.e. <config.yml>) be committed to the remote gii repository!
odbc
instructions are only applicable to Windows.config
& RMariaDB
gii
repository. If you have not yet received a <config.yml> file, please contact the database administrator.odbc
gii.czhncdmgvotw.eu-central-2.rds.amazonaws.com
3306
gii.czhncdmgvotw.eu-central-2.rds.amazonaws.com
3306
gii
How is the GIIDB structured?
The following slides summarize the structure and contents of the GIIDB.
It includes the following:
GIIDB entity relationship diagram (ERD), where data flows from top to bottom. Header colors signify the specific group for each table:
GIIDB entity relationship diagram (ERD), where data flows from top to bottom. Header colors signify the specific group for each table, where views are defined by lighter colors:
GIIDB entity relationship diagram (ERD), where data flows from top to bottom. Header colors signify the specific group for each table, where views are defined by lighter colors:
GIIDB dendograph, where the root is the GIIDB, labeled “gii.” The second and third levels identify Tables and Views. The fourth level defines the columns in each table. Here, the colors represent the R-equivalent data type:
How is the GIIDB rebuilt and backed up? How is a new collaborator added?
The following slides describe how to operate the GIIDB from R:
build/build_giidb.R
# (1) PREAMBLE --------------------------------------------
## Initiate
## ... Packages
install.packages(
setdiff(c("tidyverse","glue","here","config","DBI","RMariaDB"),
rownames(installed.packages()))
)
library(tidyverse)
library(glue)
library(here)
library(config)
library(DBI)
library(RMariaDB)
## ... Functions
source(here::here("src/gii_tools.R"))
source(here::here("src/sql_tools.R"))
gii_attributes()
write_output()
build/build_giidb.R
# (2) AWS CONNECT -----------------------------------------
## (2a) Connect via config & RMariaDB
## Get credentials
db <- config::get("admin")
## Connect
con <- DBI::dbConnect(RMariaDB::MariaDB(),
host = db$host,
port = db$port,
dbname = db$dbname,
user = db$user,
password = db$password)
## (2b) Connect via odbc
con <- DBI::dbConnect(odbc::odbc(), "gii_admin")
config
& RMariaDB
; or,odbc
.build/build_giidb.R
> data frame with 0 columns and 0 rows
create_user()
> jg successfully created
> lrl successfully created
> swv successfully created
> vb successfully created
build/build_giidb.R
# (3) BUILD GIIDB -----------------------------------------
## (3c) Create gii users, if necessary
## Initiate function
source(here::here("01_giidb/src/sql_user.R"))
## Define users
l.user <- list("jg","lrl","swv","vb")
## Create users
purrr::walk(l.user, ~create_user(con, .x))
## (3d) Create gii tables, if necessary
## Define users
user <- paste0("'admin','", paste(unlist(l.user), collapse="','"), "'")
## Create tables
source(here::here("01_giidb/src/sql_tbl_code.R"))
source(here::here("01_giidb/src/sql_tbl_economy.R"))
source(here::here("01_giidb/src/sql_tbl_indicator.R"))
source(here::here("01_giidb/src/sql_tbl_index.R"))
source(here::here("01_giidb/src/sql_tbl_model.R"))
source(here::here("01_giidb/src/sql_tbl_audit.R"))
source(here::here("01_giidb/src/sql_tbl_raw.R"))
populate_gii()
> Read data
> Write data
> ... code_xwalk [11]
> ... economy_id [750]
> ... email_id [303]
> ... index_id [1,068]
> ... indicator_id [789]
> ... audit [282]
> ... economy [1,854]
> ... email_map [441]
> ... indicator_audit [118,278]
> ... indicator_prep [1,281,111]
> ... outliers [4,227]
> ... raw_BrandVal [14,802]
> ... raw_CorpIAs [3,666]
> ... raw_QSRank [687]
> ... raw_RDCompExp [7,500]
build/build_giidb.R
# (4) POPULATE GIIDB --------------------------------------
## (4c) Populate GII output data from GII objects
## Read previous GII objects
l.GII <- list(here::here("01_giidb/data/GII2020.rds"),
here::here("01_giidb/data/GII2021_Galapagos_FINAL_indnames.rds"),
here::here("01_giidb/data/GII2022.rds")) |>
purrr::map(~readRDS(.x) |> gii_attributes())
## Read previous JRC objects
l.JRC <- list(NULL,
here::here("01_giidb/data/GII2021 JRC audit.xlsx"),
here::here("01_giidb/data/GII2022 JRC audit.xlsx")) |>
purrr::map({
~tryCatch({
readxl::read_excel(.x, sheet = "Table4 -Intervals", skip = 1) |>
dplyr::select(ECONOMY_NAME = ...1,
Index = Interval...3,
Inputs = Interval...6,
Outputs = Interval...9) |>
## Create ISO3 var
dplyr::mutate(ISO3 = countrycode::countrycode(ECONOMY_NAME,
origin="country.name",
destination="iso3c")) |>
dplyr::select(-ECONOMY_NAME) |>
## Transpose to long format
tidyr::pivot_longer(cols=-ISO3, names_to="CODE", values_to="INTERVAL") |>
## Separate INTERVAL column into lower and upper ranks
dplyr::mutate(GIIYR = as.numeric(stringr::str_extract(.x, "(?<=GII)\\d{4}")),
INTERVAL = gsub("(\\[|\\])", "", INTERVAL)) |>
tidyr::separate(INTERVAL, into=c("RANK_LB","RANK_UB"), sep=", ") |>
dplyr::mutate_at(vars(starts_with("RANK")), as.integer)
},
error=function(e) {
return(NULL)
}
)
})
## Write model output to MariaDB gii.model
purrr::map2(l.GII, l.JRC, ~write_output(con, .x, .y))
gii.model
.> GII Report Year = 2020
> Collect data ... GII object ... economy_id
> Insert into gii.model successful [58,650]
>
> GII Report Year = 2021
> Collect data ... GII object ... economy_id
> Insert into gii.model successful [64,467]
>
> GII Report Year = 2022
> Collect data ... GII object ... economy_id
> Insert into gii.model successful [44,352]
build/giidb_backup.R
# (1) PREAMBLE --------------------------------------------
# (2) AWS CONNECT -----------------------------------------
# (3) BACKUP GIIDB ----------------------------------------
## (3a) Import GIIDB data
## Collect table names
l.tbl <- DBI::dbGetQuery(con, DBI::SQL("
SHOW FULL TABLES FROM `gii`;
")) |>
dplyr::rename(TABLE = Tables_in_gii,
TYPE = Table_type) |>
dplyr::filter(TYPE!="VIEW") |>
dplyr::filter(TABLE!="model") |>
dplyr::pull(TABLE)
gii.model
build/giidb_backup.R
# (3) BACKUP GIIDB ----------------------------------------
## (3a) Import GIIDB data
## Collect table names
l.tbl <- DBI::dbGetQuery(con, DBI::SQL("
SHOW FULL TABLES FROM `gii`;
")) |>
dplyr::rename(TABLE = Tables_in_gii,
TYPE = Table_type) |>
dplyr::filter(TYPE!="VIEW") |>
dplyr::filter(TABLE!="model") |>
dplyr::pull(TABLE)
## Collect table data
l.data <- purrr::map(l.tbl, ~read_tbl(con, .x)) |>
purrr::set_names(nm=l.tbl)
read_tbl()
\(\rightarrow\) <../src/sql_tools.R>> audit
> code_xwalk
> economy
> economy_id
> email_id
> email_map
> index_id
> indicator_audit
> indicator_id
> indicator_prep
> outliers
> raw_BrandVal
> raw_CorpIAs
> raw_QSRank
> raw_RDCompExp
build/giidb_backup.R
# (3) BACKUP GIIDB ----------------------------------------
## (3a) Import GIIDB data
## Collect table names
l.tbl <- DBI::dbGetQuery(con, DBI::SQL("
SHOW FULL TABLES FROM `gii`;
")) |>
dplyr::rename(TABLE = Tables_in_gii,
TYPE = Table_type) |>
dplyr::filter(TYPE!="VIEW") |>
dplyr::filter(TABLE!="model") |>
dplyr::pull(TABLE)
## Collect table data
l.data <- purrr::map(l.tbl, ~read_tbl(con, .x)) |>
purrr::set_names(nm=l.tbl)
## Drop primary keys
keys <- l.data |>
purrr::imap(~{
col <- .y |>
stringr::str_replace_all(c("_id$"="", "s$"="")) |>
stringr::str_to_upper()
if (any(names(.x)==col)) {
col
} else {
NULL
}
}) |>
purrr::compact() |>
unlist()
l.data <- l.data |>
purrr::map(~{
col_drop <- intersect(names(.x), keys)
if (length(col_drop)>0) {
dplyr::select(.x, -dplyr::all_of(col_drop))
} else {
.x
}
})
build/giidb_backup.R
# (3) BACKUP GIIDB ----------------------------------------
## (3c) Clean tables
## Merge ID columns
l.data <- purrr::imap(
l.data,
~{
if (any(names(l.xlsx)==.y)) {
.x |>
dplyr::left_join(dplyr::select(l.xlsx[[.y]], GIIYR, CODE, ID),
by=c("GIIYR","CODE")) |>
dplyr::relocate(ID, .after=CODE)
} else {
.x
}
})
## Escape xml characters
l.data$email_id <- l.data$email_id |>
dplyr::mutate(dplyr::across(dplyr::everything(),
.fns=~stringr::str_replace_all(., c("<"="[", ">"="]"))))
build/giidb_backup.R
# (4) EXPORT XLSX -----------------------------------------
## (4b) Build worksheets
purrr::iwalk(
l.data,
~{
cat(.y, "\n", sep="")
## Add worksheet
if (any(names(wb)==.y)) openxlsx::removeWorksheet(wb, .y)
openxlsx::addWorksheet(wb, sheetName=.y)
## Add table
openxlsx::writeData(wb, .y, .x, colNames=TRUE, startRow=1)
openxlsx::addFilter(wb, .y, cols=1:ncol(.x), rows=1)
openxlsx::setColWidths(wb, .y, cols=1:ncol(.x), widths=15)
## Hide ID column
if (any(names(.x)=="ID")) {
openxlsx::setColWidths(wb, .y, cols=grep("^ID$", names(.x)), hidden=TRUE)
}
## Freeze pane
openxlsx::freezePane(wb, .y, firstRow=TRUE)
## Add styles
openxlsx::addStyle(wb, .y, cols=1:ncol(.x), rows=1,
style=openxlsx::createStyle(textDecoration="bold"))
}
)
> audit
> code_xwalk
> economy
...
> raw_CorpIAs
> raw_QSRank
> raw_RDCompExp
add_user()
function calls create_user()
and adds the username to the set of valid USER
s for each table.build/giidb_add_user.R
# (1) PREAMBLE --------------------------------------------
# (2) AWS CONNECT -----------------------------------------
# (3) ADD USER --------------------------------------------
## (3a) Initiate functions
source(here::here("01_giidb/src/sql_user.R"))
source(here::here("01_giidb/src/sql_user_add.R"))
create_user()
add_user()
add_user()
function calls create_user()
and adds the username to the set of valid USER
s for each table.build/giidb_add_user.R
# (1) PREAMBLE --------------------------------------------
# (2) AWS CONNECT -----------------------------------------
# (3) ADD USER --------------------------------------------
## (3a) Initiate functions
source(here::here("01_giidb/src/sql_user.R"))
source(here::here("01_giidb/src/sql_user_add.R"))
## (3b) Set schema options
## Permit system-versioned tables to be altered
DBI::dbGetQuery(con, DBI::SQL("
SET @@system_versioning_alter_history = 1;
"))
> data frame with 0 columns and 0 rows
add_user()
function calls create_user()
and adds the username to the set of valid USER
s for each table.build/giidb_add_user.R
# (1) PREAMBLE --------------------------------------------
# (2) AWS CONNECT -----------------------------------------
# (3) ADD USER --------------------------------------------
## (3a) Initiate functions
source(here::here("01_giidb/src/sql_user.R"))
source(here::here("01_giidb/src/sql_user_add.R"))
## (3b) Set schema options
## Permit system-versioned tables to be altered
DBI::dbGetQuery(con, DBI::SQL("
SET @@system_versioning_alter_history = 1;
"))
## (3c) Add new user
user <- "jg"
add_user(con, user)
# (4) AWS DISCONNECT --------------------------------------
USER
column in all tables.> Create user ... jg successfully created
> Alter tables
> ... audit
> ... code_xwalk
> ... economy
> ... economy_id
> ... email_id
> ... email_map
> ... index_id
> ... indicator_audit
> ... indicator_id
> ... indicator_prep
> ... model
> ... outliers
> ... raw_BrandVal
> ... raw_CorpIAs
> ... raw_QSRank
> ... raw_RDCompExp
> All tables successfully altered
Let’s run some code.