GIIDB

Global Innovation Index Database

Jack Gregory

06 December 2022

Overview

Project Summary

Purpose

  • A cloud storage platform that forms the basis of an integrated, reproducible, redundant, standardized and documented GII workflow.
  • It provides the “single source of truth” for all data associated with the GII.

Description

  • MariaDB RDBMS hosted on AWS RDS and written in R & SQL.
  • The GIIDB relies on batch processing, whereby “batches” of data are piped into storage once per report cycle.

Uses

  • Data collection & cleaning
  • Data audits
  • GII Model
  • All GII outputs (incl. Profiles, Briefs, Presentations, etc.)

Outline

  1. Design decisions
  2. Project structure
  3. Database build
  4. Tools
  5. Database structure
  6. Database operation
  7. Tutorial

1. Design decisions

Why did we choose MariaDB and AWS?

Historic data management

Problem

  • Prior to the current data pipline, the GII relied on a disjointed set of spreadsheets, R and Stata code.
  • It was (near) impossible to reproduce outputs, let alone provide materials for an audit.

Solution \(\longrightarrow\) RDBMS in the cloud

  • Systematized method for storing relevant data
  • Local and remote access for multiple users with different access rights
  • Multiple and simultaneous queries
  • Scalable when additional data becomes available

Why MariaDB?

  • Open source RDBMS
  • Functionally similar to MySQL, leveraging current skills within the GII team
  • Compatible with MySQL Workbench
  • Proactive development, where MySQL tends to be slower due to Oracle prioritizing proprietary DBMSs
  • Temporal tables (i.e. system versioned), which are not available in MySQL or PostgreSQL

Why AWS?

  • Common platform across WIPO
  • Most advanced cloud platform
  • Most comprehensive documentation
  • Relational DBaaS with MariaDB
  • Easy integration with GitHub

2. Project structure

How does the giidb project fit within the larger gii repository? How is it organized?

GII workflow

GII workflow

GIIDB repository

  • The gii repository contains the GII workflow from data to analysis to outputs.
  • The giidb stores the code necessary to rebuild the GIIDB.
  • The giidb project is stored in a separate repository and added to the gii as a subrepository.

Repository structure

The profiles repository contains the following folders.

Folder Description
build Contains the main <giidb_build.R> script along with other support scripts.
data Contains a backup of all data in the GIIDB in csv and rds formats.
src Contains all source scripts, where the names are directly related to the constituent functions.

3. Database build

How to rebuild the GIIDB on AWS RDS?

Outline

The following slides describe how to build the GIIDB from scratch on AWS RDS:

  1. Create an AWS account.
  2. Go to RDS service.
  3. Create RDS parameter group.
  4. Create RDS database.
  5. Amend RDS database security.

a. Create an AWS account

  • Sign up for an AWS account here.
  • It requires a credit card and mobile number.

b. Go to RDS service

  • Go to the RDS service using the “Services” menu.
  • Click “Database”.
  • Click “RDS”.

  • The RDS dashboard appears as below.

c. Create RDS parameter group

  • Go to the RDS service.
  • Select “Parameter groups” from the lefthand menu.
  • Select “Create parameter group” at the top right.

  • Under “Parameter group family”, select the latest MariaDB version.
  • Enter a “Group name” and “Description”.
  • Click “Create”.

  • The new parameter group now appears in the list.
  • From the “Parameter group actions” menu, select “Edit”.

  • Under “Parameters”, search for “log_bin_trust_function_creators”.
  • Update its value to 1.
  • This allows R to add/modify triggers within MariaDB tables on AWS RDS.

  • Under “Parameters”, search for “max_allowed_packet”.
  • Update its value to the maximum allowed.
  • This allows R to transfer large amounts of data to MariaDB tables on AWS RDS.
  • Click “Save changes”.

d. Create RDS database

  • Go to the RDS service.
  • Select “Databases” from the lefthand menu.
  • Select “Create database” at the top right.

  • Go to the RDS service.
  • Select “Databases” from the lefthand menu.
  • Select “Create database” at the top right.
  • Under “Engine Options”, select “MariaDB”.
  • Under “Version”, select the latest version of MariaDB.

  • Under “Templates”, select “Free tier”.

  • Under “DB cluster identifier”, enter the desired name (e.g., “gii”).
  • Under “Master username”, enter the desired administrator name (e.g., “admin”).
  • Select “Auto generate a password”.

  • Under “DB instance class”, select “Burstable classes …”.
  • From the drop down menu, there will only be two options, either: “db.t2.micro” or “db.t3.micro”.
  • For our purposes, it is not consequential which one is chosen.

  • Under “Storage type”, select “General Pupose SSD …”.
  • Under “Allowed storage”, select “20”.
  • Unselect “Enable storage autoscaling”.

  • Keep the defaults.

  • Select “Don’t connect to an EC2 compute resource”.
  • Under “Public access”, select “Yes”.
  • Keep the defaults for all other options.

  • Keep the defaults.

  • Keep the defaults.

  • Under “Initial database name”, enter the desired name (e.g., “gii”).
  • Under “DB parameter group”, select “gii”.
  • Keep the defaults for all other options.

  • Select “Enable automated backups”.
  • Under “Backup retention period”, select the desired number of days (e.g., 14).
  • Under “Backup window”, select “Choose a window”.
  • Under “Start time” and “Duration”, select the desired options.
  • Under “DB parameter group”, select “gii”.
  • Select “Copy tags to snapshots”.

  • Unselect “Enable encryption”.
  • Keep the defaults for all other options.

  • Select “Enable auto minor version upgrade”.
  • Under “Maintenance window”, select “Choose a window”.
  • Under “Start day”, “Start time” and “Duration”, select the desired options.
  • Keep the defaults for all other options.
  • Finally, click “Create database”.

  • The database now appears in the list.
  • The blue banner signifies that AWS is “creating” the database.
  • Click “View credential details”, to access and copy the “admin” password.

  • The green banner signifies that AWS has successfully “creatied” the database, and it is now “available”.
  • Click the “gii” database link, to access its “Endpoint” and “Port”.
  • These parameters, in addition to the username and password, are required to connect to the remote database.

e. Amend RDS database security

  • Click the “gii” database link.

  • Select the “Connectivity & security” tab.
  • Under “Security group rules”, click one of the “Security group” links.

  • Select the “Inbound rules” tab.
  • Click the “Edit inbound rules” button on the right.

  • Click the “Add rule” button.
  • Under “Type”, select “All traffic” from the dropdown menu.
  • Under “Source”, select “Anywhere-IPv4”.
  • This allows remote users to freely access the AWS RDS database.
  • Click “Save rules”.

  • The inbound rule now appears in the list.

Finish

  • The AWS RDS remote database is now operational.
  • To access it from MySQL Workbench or RStudio, you will need the following parameters:
    • Endpoint = gii.czhncdmgvotw.eu-central-2.rds.amazonaws.com
    • Port = 3306
    • Username = admin
    • Password = ...

4. Tools

How to connect to and interact with the GIIDB?

Outline

The following slides describe how to connect to the GIIDB hosted remotely on an AWS RDS instance of MariaDB.

For this purpose, we typically rely on two programs:

  1. RStudio
  2. MySQL Workbench

Warning

Important

Note that under no circumstances should GIIDB credentials (i.e. <config.yml>) be committed to the remote gii repository!

a. RStudio

  • Within RStudio, there are two methods to connect to a DBMS:
  • The former requires minimal setup, but lacks an in-built connection to the database through RStudio.
  • The latter requires a more complicated setup, but allows an in-built connection to the database through RStudio.
  • Note that the odbc instructions are only applicable to Windows.

Connect via config & RMariaDB

  • Ensure your <config.yml> file is located in the root folder of gii repository. If you have not yet received a <config.yml> file, please contact the database administrator.
  • Use the code below to connect to the GIIDB.
## Install necessary packages
pkgs <- c("config","DBI","RMariaDB")
lapply(pkgs, require, character.only = TRUE)

## Connect via config & RMariaDB
db <- config::get()
con <- DBI::dbConnect(RMariaDB::MariaDB(),
                      host = db$host,
                      port = db$port,
                      dbname = db$dbname,
                      user = db$user,
                      password = db$password)

Connect via odbc

  • Run the installation program.
  • Step through the installation wizard accepting all defaults.

  • Run “ODBC Data Sources (64-bit)” by typing it into the Windows search bar.

  • In the dialogue that appears, click on the “System DSN” tab.
  • Click on the “Add” button.

  • In the dialogue that appears, select the “MariaDB ODBC * Driver”.
  • Click the “Finish” button.

  • In the dialogue that appears, enter an ODBC connection name.
  • If desired, enter a description of the connection and click the “Next” button.
  • Note that the name will be needed in the final step.

  • Enter the following details along with your username and password provided in <config.yml>:
    • Server Name = gii.czhncdmgvotw.eu-central-2.rds.amazonaws.com
    • Port = 3306
  • Click the “Test DSN” button and ensure the connection is successful.
  • In the “Database” dropdown menu, select “gii” and click the “Next” button.
  • Step through the remainder of the setup wizard by accepting all defaults.

  • You should now see a System Data Source with a “MariaDB ODBC * Driver” in the displayed table.
  • Click on the “OK” button.

  • Run RStudio.
  • Enter your ODBC connection name as the second parameter in the DBI::dbConnect() function.
  • Run the code.
## Install necessary packages
library(odbc)

## Connect via odbc
con <- DBI::dbConnect(odbc::odbc(), "<ODBC connection name>")

b. MySQL Workbench

  • MySQL Workbench is freeware and the GUI for the MySQL relational database management system (RDBMS).
  • It also operates with MariaDB and recognizes most of its features.
  • It is an optional tool; necessary primarily for the database adminstrators.
  • The remainder of this section describes how to
    • Install MySQL Workbench; and,
    • Connect it to the GIIDB.

Install

  • On the subsequent page, click the “Download” button next to “Windows (x86, 64-bit), MSI Installer.”
  • Download the installer program.

  • Run the installer.
  • Step through the installation wizard accepting all defaults.

Connect

  • Run “MySQL Workbench”.
  • At the start up screen, next to “MySQL Connections”, click on the plus symbol.

  • In the dialogue that appears, enter an “Connection Name” to identify the database.
  • Ensure the “Connection Method” is set to “Standard (TCP/IP)”.
  • Under the “Parameters” tab, enter the following details along with your username and password provided in <config.yml>:
    • Hostame = gii.czhncdmgvotw.eu-central-2.rds.amazonaws.com
    • Port = 3306
    • Default Schema = gii
  • Click the “Test Connection” button and ensure the connection is successful.
  • Click the “OK” button.

  • You should now see a tile titled with the “Connection Name” you previously entered.
  • Click the tile.
  • Ignore the warning dialogue and you should be connected to the GIIDB.

5. Database structure

How is the GIIDB structured?

Outline

The following slides summarize the structure and contents of the GIIDB.

It includes the following:

  • A data map, summarizing the relationships between GIIDB tables; and,
  • A data tree, outlining the table and column structure of the GIIDB.

Data map

GIIDB entity relationship diagram (ERD), where data flows from top to bottom. Header colors signify the specific group for each table:

  • Identifiers
  • Audits
  • Model preparation and inputs
  • Model outputs

Data map

GIIDB entity relationship diagram (ERD), where data flows from top to bottom. Header colors signify the specific group for each table, where views are defined by lighter colors:

  • Identifiers / View
  • Audits / View
  • Model preparation and inputs / View
  • Model outputs

Data map

GIIDB entity relationship diagram (ERD), where data flows from top to bottom. Header colors signify the specific group for each table, where views are defined by lighter colors:

  • Identifiers / View
  • Audits / View
  • Model preparation and inputs / View
  • Model outputs
  • Raw data

Data tree

GIIDB dendograph, where the root is the GIIDB, labeled “gii.” The second and third levels identify Tables and Views. The fourth level defines the columns in each table. Here, the colors represent the R-equivalent data type:

  • Character
  • Factor
  • Datetime
  • Integer
  • Numeric
  • Other

6. Database operation

How is the GIIDB rebuilt and backed up? How is a new collaborator added?

Outline

The following slides describe how to operate the GIIDB from R:

  1. Build GIIDB
  2. Backup GIIDB
  3. Add user

a. Build GIIDB

  • The code is found in <build/giidb.R>.
  • This script builds the GIIDB in the location of your choosing.
  • It relies on an updated GIIDB backup in the <data/> folder.
  • In particular, the <data/GIIDB.xlsx> file discussed below.

a. Build GIIDB

Preamble

Code

build/build_giidb.R
# (1) PREAMBLE --------------------------------------------

## Initiate
## ... Packages
install.packages(
  setdiff(c("tidyverse","glue","here","config","DBI","RMariaDB"),
          rownames(installed.packages()))
  )
library(tidyverse)
library(glue)
library(here)
library(config)
library(DBI)
library(RMariaDB)

## ... Functions
source(here::here("src/gii_tools.R"))
source(here::here("src/sql_tools.R"))

Purpose

  • Import all necessary packages.
  • Import all necessary source scripts.
  • <../src/gii_tools.R> \(\rightarrow\) gii_attributes()
  • <../src/sql_tools.R> \(\rightarrow\) write_output()

a. Build GIIDB

AWS Connect

Code

build/build_giidb.R
# (2) AWS CONNECT -----------------------------------------

## (2a) Connect via config & RMariaDB
## Get credentials
db <- config::get("admin")

## Connect
con <- DBI::dbConnect(RMariaDB::MariaDB(),
                      host = db$host,
                      port = db$port,
                      dbname = db$dbname,
                      user = db$user,
                      password = db$password)

## (2b) Connect via odbc
con <- DBI::dbConnect(odbc::odbc(), "gii_admin")

Purpose

  • Connect to the remote GIIDB using either:
    • config & RMariaDB; or,
    • odbc.
  • One of these blocks should be commented out at all times.
  • The code is setup for “admin”; however, it can be easily adapted to any GIIDB user.

a. Build GIIDB

Build GIIDB

Code

build/build_giidb.R
# (3) BUILD GIIDB -----------------------------------------

## (3a) Create gii schema, if necessary
# source(here::here("01_giidb/sql_db_gii.R"))

Purpose

  • If necessary, create the GIIDB schema.
  • Since we create the schema in AWS directly, this line is not typically required.
  • It remains in case the GIIDB is built outside of AWS.

a. Build GIIDB

Build GIIDB

Code

build/build_giidb.R
# (3) BUILD GIIDB -----------------------------------------

## (3a) Create gii schema, if necessary
# source(here::here("01_giidb/sql_db_gii.R"))

## (3b) Set schema options
## Permit system-versioned tables to be altered
DBI::dbGetQuery(con, DBI::SQL("
    SET @@system_versioning_alter_history = 1;
  "))

Purpose

  • Allow system-versioned tables to be altered.
  • Otherwise, all code that transforms the database from this point will fail.

Result

> data frame with 0 columns and 0 rows

a. Build GIIDB

Build GIIDB

Code

build/build_giidb.R
# (3) BUILD GIIDB -----------------------------------------

## (3c) Create gii users, if necessary
## Initiate function
source(here::here("01_giidb/src/sql_user.R"))

## Define users
l.user <- list("jg","lrl","swv","vb")

## Create users
purrr::walk(l.user, ~create_user(con, .x))

Purpose

  • Import necessary source script.
  • <src/sql_user.R> \(\rightarrow\) create_user()
  • Define a list of usernames.
  • Iteratively create GIIDB user accounts.

Result

> jg successfully created
> lrl successfully created
> swv successfully created
> vb successfully created

a. Build GIIDB

Build GIIDB

Code

build/build_giidb.R
# (3) BUILD GIIDB -----------------------------------------

## (3c) Create gii users, if necessary
## Initiate function
source(here::here("01_giidb/src/sql_user.R"))

## Define users
l.user <- list("jg","lrl","swv","vb")

## Create users
purrr::walk(l.user, ~create_user(con, .x))

## (3d) Create gii tables, if necessary
## Define users
user <- paste0("'admin','", paste(unlist(l.user), collapse="','"), "'")

## Create tables
source(here::here("01_giidb/src/sql_tbl_code.R"))
source(here::here("01_giidb/src/sql_tbl_economy.R"))
source(here::here("01_giidb/src/sql_tbl_indicator.R"))
source(here::here("01_giidb/src/sql_tbl_index.R"))
source(here::here("01_giidb/src/sql_tbl_model.R"))
source(here::here("01_giidb/src/sql_tbl_audit.R"))
source(here::here("01_giidb/src/sql_tbl_raw.R"))

Purpose

  • Convert list of user names to a string.
  • Create all GIIDB tables.

a. Build GIIDB

Build GIIDB

Code

build/build_giidb.R
# (3) BUILD GIIDB -----------------------------------------

## (3e) Create gii triggers, if necessary
source(here::here("01_giidb/src/sql_trigger.R"))

## (3f) Create gii views, if necessary
source(here::here("01_giidb/src/sql_view.R"))

Purpose

  • Create all GIIDB triggers.
  • Create all GIIDB views.

a. Build GIIDB

Populate GIIDB

Code

build/build_giidb.R
# (4) POPULATE GIIDB --------------------------------------

## (4a) Initiate functions
source(here::here("01_giidb/src/sql_data_write.R"))

Purpose

  • Import necessary source script.
  • <src/sql_data_write.R> \(\rightarrow\) populate_gii()

a. Build GIIDB

Populate GIIDB

Code

build/build_giidb.R
# (4) POPULATE GIIDB --------------------------------------

## (4a) Initiate functions
source(here::here("01_giidb/src/sql_data_write.R"))

## (4b) Populate GII input data
populate_gii(con, here::here("01_giidb/data/GIIDB.xlsx"))

Purpose

  • Populate the GIIDB.

Result

> Read data
> Write data
>   ... code_xwalk  [11]
>   ... economy_id  [750]
>   ... email_id  [303]
>   ... index_id  [1,068]
>   ... indicator_id  [789]
>   ... audit  [282]
>   ... economy  [1,854]
>   ... email_map  [441]
>   ... indicator_audit  [118,278]
>   ... indicator_prep  [1,281,111]
>   ... outliers  [4,227]
>   ... raw_BrandVal  [14,802]
>   ... raw_CorpIAs  [3,666]
>   ... raw_QSRank  [687]
>   ... raw_RDCompExp  [7,500]

a. Build GIIDB

Populate GIIDB

Code

build/build_giidb.R
# (4) POPULATE GIIDB --------------------------------------

## (4c) Populate GII output data from GII objects
## Read previous GII objects
l.GII <- list(here::here("01_giidb/data/GII2020.rds"),
              here::here("01_giidb/data/GII2021_Galapagos_FINAL_indnames.rds"),
              here::here("01_giidb/data/GII2022.rds")) |>
  purrr::map(~readRDS(.x) |> gii_attributes())

## Read previous JRC objects
l.JRC <- list(NULL,
              here::here("01_giidb/data/GII2021 JRC audit.xlsx"),
              here::here("01_giidb/data/GII2022 JRC audit.xlsx")) |>
  purrr::map({
    ~tryCatch({
      readxl::read_excel(.x, sheet = "Table4 -Intervals", skip = 1) |>
        dplyr::select(ECONOMY_NAME = ...1,
                      Index = Interval...3,
                      Inputs = Interval...6,
                      Outputs = Interval...9) |>
        
        ## Create ISO3 var
        dplyr::mutate(ISO3 = countrycode::countrycode(ECONOMY_NAME, 
                                                      origin="country.name", 
                                                      destination="iso3c")) |>
        dplyr::select(-ECONOMY_NAME) |>
        
        ## Transpose to long format
        tidyr::pivot_longer(cols=-ISO3, names_to="CODE", values_to="INTERVAL") |>
        
        ## Separate INTERVAL column into lower and upper ranks
        dplyr::mutate(GIIYR = as.numeric(stringr::str_extract(.x, "(?<=GII)\\d{4}")),
                      INTERVAL = gsub("(\\[|\\])", "", INTERVAL)) |>
        tidyr::separate(INTERVAL, into=c("RANK_LB","RANK_UB"), sep=", ") |>
        dplyr::mutate_at(vars(starts_with("RANK")), as.integer)
      },
      error=function(e) {
        return(NULL)
      }
    )
  })

## Write model output to MariaDB gii.model
purrr::map2(l.GII, l.JRC, ~write_output(con, .x, .y))

Purpose

  • Import model and JRC data.
  • Write to GIIDB table gii.model.

Result

> GII Report Year = 2020
> Collect data ... GII object ... economy_id 
> Insert into gii.model successful [58,650]
> 
> GII Report Year = 2021
> Collect data ... GII object ... economy_id 
> Insert into gii.model successful [64,467]
> 
> GII Report Year = 2022
> Collect data ... GII object ... economy_id 
> Insert into gii.model successful [44,352]

a. Build GIIDB

AWS Disconnect

Code

build/build_giidb.R
# (5) AWS DISCONNECT --------------------------------------
DBI::dbDisconnect(con)
rm(con)

Purpose

  • Disconnect from GIIDB.

b. Backup GIIDB

  • The code is found in <build/giidb_backup.R>.
  • This script creates a backup of the GIIDB in Excel.
  • The output file is saved to <data/GIIDB.xlsx> and pushed to the giidb repository.

b. Backup GIIDB

Backup GIIDB

Code

build/giidb_backup.R
# (1) PREAMBLE --------------------------------------------

# (2) AWS CONNECT -----------------------------------------

# (3) BACKUP GIIDB ----------------------------------------

## (3a) Import GIIDB data
## Collect table names
l.tbl <- DBI::dbGetQuery(con, DBI::SQL("
    SHOW FULL TABLES FROM `gii`;
  ")) |>
  dplyr::rename(TABLE = Tables_in_gii,
                TYPE = Table_type) |>
  dplyr::filter(TYPE!="VIEW") |>
  dplyr::filter(TABLE!="model") |>
  dplyr::pull(TABLE)

Purpose

  • Query all tables in GIIDB
  • Drop all views
  • Drop gii.model
  • Returns a list of tables

b. Backup GIIDB

Backup GIIDB

Code

build/giidb_backup.R
# (3) BACKUP GIIDB ----------------------------------------

## (3a) Import GIIDB data
## Collect table names
l.tbl <- DBI::dbGetQuery(con, DBI::SQL("
    SHOW FULL TABLES FROM `gii`;
  ")) |>
  dplyr::rename(TABLE = Tables_in_gii,
                TYPE = Table_type) |>
  dplyr::filter(TYPE!="VIEW") |>
  dplyr::filter(TABLE!="model") |>
  dplyr::pull(TABLE)

## Collect table data
l.data <- purrr::map(l.tbl, ~read_tbl(con, .x)) |>
  purrr::set_names(nm=l.tbl)

Purpose

  • Query all GIIDB tables
  • read_tbl() \(\rightarrow\) <../src/sql_tools.R>

Result

> audit 
> code_xwalk 
> economy 
> economy_id 
> email_id 
> email_map 
> index_id 
> indicator_audit 
> indicator_id 
> indicator_prep 
> outliers 
> raw_BrandVal 
> raw_CorpIAs 
> raw_QSRank 
> raw_RDCompExp 

b. Backup GIIDB

Backup GIIDB

Code

build/giidb_backup.R
# (3) BACKUP GIIDB ----------------------------------------

## (3a) Import GIIDB data
## Collect table names
l.tbl <- DBI::dbGetQuery(con, DBI::SQL("
    SHOW FULL TABLES FROM `gii`;
  ")) |>
  dplyr::rename(TABLE = Tables_in_gii,
                TYPE = Table_type) |>
  dplyr::filter(TYPE!="VIEW") |>
  dplyr::filter(TABLE!="model") |>
  dplyr::pull(TABLE)

## Collect table data
l.data <- purrr::map(l.tbl, ~read_tbl(con, .x)) |>
  purrr::set_names(nm=l.tbl)
  
## Drop primary keys
keys <- l.data |>
  purrr::imap(~{
    col <- .y |> 
      stringr::str_replace_all(c("_id$"="", "s$"="")) |>
      stringr::str_to_upper()
    
    if (any(names(.x)==col)) {
      col
    } else {
      NULL
    }
  }) |>
  purrr::compact() |>
  unlist()
  
l.data <- l.data |>
  purrr::map(~{
    col_drop <- intersect(names(.x), keys)

    if (length(col_drop)>0) {
      dplyr::select(.x, -dplyr::all_of(col_drop))
    } else {
      .x
    }
  })

Purpose

  • Drop primary keys as these are automatically added when written to the GIIDB.
  • As a simple rule of thumb, primary key names are standardized as the singular of the respective GIIDB table name.

b. Backup GIIDB

Backup GIIDB

Code

build/giidb_backup.R
# (3) BACKUP GIIDB ----------------------------------------

## (3b) Import backup data
## Import current backup
l.tbl_id <- list("index_id","indicator_id")
l.xlsx <- purrr::map(l.tbl_id, ~readxl::read_xlsx(l.file$xlsx, sheet=.x)) |>
  purrr::set_names(nm=unlist(l.tbl_id))

Purpose

  • Import current backup spreadsheets:
    • “index_id”; and,
    • “indicator_id”.
  • Some legacy data is required for backwards compatibility.

b. Backup GIIDB

Backup GIIDB

Code

build/giidb_backup.R
# (3) BACKUP GIIDB ----------------------------------------

## (3c) Clean tables
## Merge ID columns
l.data <- purrr::imap(
  l.data,
  ~{
    if (any(names(l.xlsx)==.y)) {
      .x |>
        dplyr::left_join(dplyr::select(l.xlsx[[.y]], GIIYR, CODE, ID),
                         by=c("GIIYR","CODE")) |>
        dplyr::relocate(ID, .after=CODE)
    } else {
      .x
    }
  })

## Escape xml characters
l.data$email_id <- l.data$email_id |>
  dplyr::mutate(dplyr::across(dplyr::everything(),
                              .fns=~stringr::str_replace_all(., c("<"="[", ">"="]"))))

Purpose

  • Merge legacy data with current GIIDB tables.
  • Remove xml escape characters from “email_id” table.

b. Backup GIIDB

Export XLSX

  • This section of code uses the openxlsx package.

Code

build/giidb_backup.R
# (4) EXPORT XLSX -----------------------------------------

## (4a) Open workbook
if (fs::is_file(l.file$xlsx)) {
  wb <- openxlsx::loadWorkbook(l.file$xlsx)
} else {
  wb <- openxlsx::createWorkbook()
}

Purpose

  • Open or create an xlsx workbook.

b. Backup GIIDB

Export XLSX

Code

build/giidb_backup.R
# (4) EXPORT XLSX -----------------------------------------

## (4b) Build worksheets
purrr::iwalk(
  l.data,
  ~{
    cat(.y, "\n", sep="")
    
    ## Add worksheet
    if (any(names(wb)==.y)) openxlsx::removeWorksheet(wb, .y)
    openxlsx::addWorksheet(wb, sheetName=.y)
    
    ## Add table
    openxlsx::writeData(wb, .y, .x, colNames=TRUE, startRow=1)
    openxlsx::addFilter(wb, .y, cols=1:ncol(.x), rows=1)
    openxlsx::setColWidths(wb, .y, cols=1:ncol(.x), widths=15)
    
    ## Hide ID column
    if (any(names(.x)=="ID")) {
      openxlsx::setColWidths(wb, .y, cols=grep("^ID$", names(.x)), hidden=TRUE)
    }
    
    ## Freeze pane
    openxlsx::freezePane(wb, .y, firstRow=TRUE)
    
    ## Add styles
    openxlsx::addStyle(wb, .y, cols=1:ncol(.x), rows=1, 
                       style=openxlsx::createStyle(textDecoration="bold"))
  }
)

Purpose

  • For each GIIDB table:
    • Add a new worksheet.
    • Write table data.
    • Hide legacy columns.
    • Freeze pane and format header row.

Result

> audit
> code_xwalk
> economy
...
> raw_CorpIAs
> raw_QSRank
> raw_RDCompExp

b. Backup GIIDB

Export XLSX

Code

build/giidb_backup.R
# (4) EXPORT XLSX -----------------------------------------

## (4c) Save xlsx
openxlsx::saveWorkbook(wb, l.file$xlsx, overwrite=TRUE)

# (5) AWS DISCONNECT --------------------------------------

Purpose

c. Add user

  • The code is found in <build/giidb_add_user.R>.
  • As the name suggest, this script adds a new user to the GIIDB.
  • It can only be run by an administrator with the necessary credentials.
  • The new user name and password must first be added to the <config.yml> stored locally within the administrator’s giidb repository.

c. Add user

Code

build/giidb_add_user.R
# (1) PREAMBLE --------------------------------------------

# (2) AWS CONNECT -----------------------------------------

# (3) ADD USER --------------------------------------------

## (3a) Initiate functions
source(here::here("01_giidb/src/sql_user.R"))
source(here::here("01_giidb/src/sql_user_add.R"))

Purpose

c. Add user

Code

build/giidb_add_user.R
# (1) PREAMBLE --------------------------------------------

# (2) AWS CONNECT -----------------------------------------

# (3) ADD USER --------------------------------------------

## (3a) Initiate functions
source(here::here("01_giidb/src/sql_user.R"))
source(here::here("01_giidb/src/sql_user_add.R"))

## (3b) Set schema options
## Permit system-versioned tables to be altered
DBI::dbGetQuery(con, DBI::SQL("
    SET @@system_versioning_alter_history = 1;
  "))

Purpose

  • Allow system-versioned tables to be altered.
  • Otherwise, all code that transforms the database from this point will fail.

Result

> data frame with 0 columns and 0 rows

c. Add user

Code

build/giidb_add_user.R
# (1) PREAMBLE --------------------------------------------

# (2) AWS CONNECT -----------------------------------------

# (3) ADD USER --------------------------------------------

## (3a) Initiate functions
source(here::here("01_giidb/src/sql_user.R"))
source(here::here("01_giidb/src/sql_user_add.R"))

## (3b) Set schema options
## Permit system-versioned tables to be altered
DBI::dbGetQuery(con, DBI::SQL("
    SET @@system_versioning_alter_history = 1;
  "))

## (3c) Add new user
user <- "jg"
add_user(con, user)

# (4) AWS DISCONNECT --------------------------------------

Purpose

  • Select username, then create user and alter the USER column in all tables.

Result

> Create user ... jg successfully created
> Alter tables
>   ... audit
>   ... code_xwalk
>   ... economy
>   ... economy_id
>   ... email_id
>   ... email_map
>   ... index_id
>   ... indicator_audit
>   ... indicator_id
>   ... indicator_prep
>   ... model
>   ... outliers
>   ... raw_BrandVal
>   ... raw_CorpIAs
>   ... raw_QSRank
>   ... raw_RDCompExp
> All tables successfully altered

7. Tutorial

Let’s run some code.

Thank You

jackgregory@gmail.com