HOTMapper Tutorial
Here you'll learn how to set your environment, create your table, and insert data into it.
Updated with v1.1.
Summary
- 1. Setting your environment
- 2. Adjusting your HOTMapper Settings
- 3. Creating your table definition
- 4. Creating your mapping protocol
- 5. Creating the table in the database
- 6. Inserting the data into the table
- 7. Creating, removing, renaming columns
- 8. Updating the table data
1. Setting your environment
In this section we'll look at how to get MonetDB working with HOTMapper.
Important:
If you are using the MonetDB Apr2019 (11.33.3) version please update to Apr2019-SP1(11.33.11) before proceeding with the tutorial.
Getting MonetDB
You can get your MonetDB installation following their instructions.
Be sure to create and start your farm, if you didn't already, using:
$ monetdbd create /path/to/my-dbfarm
$ monetdbd start /path/to/my-dbfarm
If you didn't set a systemctl command for your farm remember to start it every time you reboot your OS.
To create a new database, and remove it from maintenance mode(to avoid a very high memory usage):
$ monetdb create hotmapper-demo
$ monetdb release hotmapper-demo
Getting HOTMapper dependencies
Let's create a new virtual environment and install the requirements.
Be sure that you have installed pip3
and python 3.5+
.
Install python3-venv
, if you don't have it, using the command bellow.
$ sudo apt-get install python3-venv
Inside the HOTMapper folder execute the following commands:
$ python3 -m venv env
$ source env/bin/activate
$ pip install -r requirements.txt
2. Adjusting your HOTMapper Settings
Now that you have a working MonetDB database it's needed to verify the HOTMapper settings to be sure it'll work with it.
First, using your favourite editor, open the file settings.py
contained inside your HOTMapper folder.
Set the DATABASE variable to yours.
DATABASE = 'hotmapper_demo'
Verify the locations of your mapping protocols folder and table definitions. If you cloned the master repository it should already be correct and the folders already exist.
MAPPING_PROTOCOLS_FOLDER = 'mapping_protocols'
TABLE_DEFINITIONS_FOLDER = 'table_definitions'
If you want to learn more about all settings of HOTMapper, head to our Settings page in the wiki
3. Creating your table definition
The HOTMapper tool uses two files to create a table, a JSON inside the table_definitions folder and a CSV present on the mapping_protocols folder. Both should have as name of the file: "table name + extension". Ex: table_test.json, table_test.csv.
The table_definitions JSON has the objective of storing the table constraints and information about the source of the table and it's description.
Let's create a JSON for a table_test:
- Go to the table_definitions folder and create a file named table_test.json.
- Open the file with your favourite editor and create the base object as shown bellow.
{
"pairing_description": "A string with the description of what this table stores.",
"data_source": "A string with the source of the data.",
"pk": ["A list with the primary key columns."],
"foreign_keys": ["A list with the foreign key columns."]
}
- Let's fill the json with the information of table_test:
{
"pairing_description": "Table with the graduation rates, which represent a percentage of students expected to graduate, used to learn the HOTMapper Tool",
"data_source": "Open data made by Sustainable Development Solutions Network",
"pk": ["cod_country", "year", "cod_sex", "isced"],
"foreign_keys": []
}
If you want to learn more about Table Definitions, click here.
4. Creating your mapping protocol
The map protocol file is used to store the information about the table columns and to make the link between the data from the CSV and the database. This file is stored inside the mapping_protocols folder and uses as name the "table name + .csv"
This CSV must contain the following columns:
- Var. Lab -- An unique identifier for the column, this represents a specific column for the HOTMapper and shouldn't be changed after table creation.
- Novo Rótulo -- A description of this column.
- Nome Banco -- The name of the column in the database.
- Tipo de Dado -- Type of the column in the database.
- *[YEAR] - Eg.: 2010, 2011, ... -- A temporal identifier to help insertion of data from various years, which can have a different mapping from the standard one. It has the name of the column in the data csv.
Let's create a mapping protocol for the table_test:
- First create a file named
table_test.csv
inside the mapping_protocols folder. - Now, in the row number 1, create the following headers for the columns A-G: Var.Lab, Novo Rótulo, Nome Banco,
Tipo de Dado, 2016.
- If you have any doubt, inside your mapping_protocols folder should have a file named empty_map_protocol.csv which you can use as a model.
For this example, we'll insert data with the graduation rates as percentage of students expected to graduate, the file is EAD_GRAD_RATES_2016.csv which is inside the folder: hotmapper/open_data/.
Data taken from the Organisation for Economic Co-operation and Development. (https://stats.oecd.org/Index.aspx?datasetcode=EAG_GRAD_ENTR_RATES#) accessed in 2019-08-26
Let's create the columns.
- First, let's fill the protocol for the cod_country.
- In Var.Lab we'll use as an unique identifier: CODECOUNTRY
- Novo Rótulo: Code of the country
- Nome Banco: cod_country
- Tipo de Dado: VARCHAR(3)
- 2016: COUNTRY
- Now let's follow a similiar formula for the columns: Country, Year, Value, ISC11_LEVEL_CAT, SEX. You should have
a protocol similar with the one bellow.
- Now, lets create a column called sex_id where we'll store a TINYINT representing the sex. 1 = Female, 2 = Male.
If you want to learn more about the Mapping Protocol, click here.
5. Creating the table in the database
To create the table in your database we'll execute the HOTMapper command create
. In the following way:
$ ./manage.py create <table_name>
So, for the table test:
$ ./manage.py create table_test
6. Inserting the data into the table
To insert the CSV data into the table, we'll use the HOTMapper command insert
in the following way:
$ ./manage.py insert <file_location> <table_name> <year> --sep=<optional, csv separator>
For the table test:
$ ./manage.py insert ~/hotmapper/open_data/EAG_GRAD_RATES_2016.csv table_test 2016 --sep="|"
7. Creating, removing, renaming columns
- To create a new column, first add it to the protocol the same way as the others.
- To remove a column, first delete it from the protocol.
- To rename a column, change the "Nome Banco" (name of the column) in the protocol.
After that, execute the HOTMapper command remap
in the following way:
$ ./manage.py remap <table_name>
For the table test:
$ ./manage.py create table_test
Now you'll have to update the data in the table
8. Updating the table data
If your data CSV suffered any modifications or you changed the table by adding, removing or renaming a column, you'll
have to update the data of the table. You can do that using the HOTMapper command update_from_file
in a similar way
to the insert command:
$ ./manage.py update_from_file <file_location> <table_name> <year> --sep=<optional, csv separator>
$ ./manage.py update_from_file ~/hotmapper/open_data/EAG_GRAD_RATES_2016.csv table_test 2016 --sep="|"