How to Run SQL from Jupyter Notebook
Hello everyone, today I will discuss how to create, and read a MySQL table by using Jupyter Notebook. I will create a test remote database and setup our environment in Jupyter Notebook.
Installation of Conda
First you need to install anaconda at this link
After you have installed Anaconda go to your terminal and let us create an environment called mysql, but you can put the name that you like.
conda create -n mysql python==3.7
If you are running anaconda for first time, you should init conda with the shell that you want to work, in this case I choose the cmd.exe
conda init cmd.exe
and then close and open the terminal
conda activate mysql
then in your terminal type the following commands:
conda install ipykernel
python -m ipykernel install --user --name mysql --display-name "Python (mysql)"
Then we install mysql-connector
pip3 install mysql-connector
pip3 install sqlalchemy
pip3 install pandas
In addition if you want to install you local client
conda install -c conda-forge mysqlclient
Creation of testing Free Database.
The first step that we need is the creation of a remote Database, if you have already a Remote MySQL database, you can skip this part. Otherwise you can choose one DB free testing provider.
- Free MySQL Hosting 5mb
- FreeSQLdatabase 5mb
- Free MySQL Hosting 5mb
As I mentioned earlier, best use of these free online MySQL websites would be for testing purposes, or even learning purposes. For this tutorial I have chosen the first one.
It’s pretty easy to use this free MySQL database hosting service. Simply register on it via your email and after verification, you will able to see the Control panel. After that, create a database there and it will send you all the database and server credentials via email. You can then create database either using the PhpMyAdmin or you can use a desktop client to do the same.
After creating your database on its server, you can open Jupyter notebook and lets try to can create tables, schemas, etc.,
How to connect MySQL database in Python
Let’s see how to connect the MySQL database in Python using the ‘MySQL Connector Python’ module.
Arguments required to connect
You need to know the following detail of the MySQL server to perform the connection from Python.
Argument | Description |
---|---|
Username | The username that you use to work with MySQL Server. The default username for the MySQL database is a root. |
Password | Password is given by the user at the time of installing the MySQL server. If you are using root then you won’t need the password. |
Host name | The server name or Ip address on which MySQL is running. if you are running on localhost, then you can use localhost or its IP 127.0.0.0 |
Database name | The name of the database to which you want to connect and perform the operations. |
The CREATE TABLE statement is used to create tables in MYSQL database. Here, you need to specify the name of the table and, definition (name and datatype) of each column.
Syntax
Following is the syntax to create a table in MySQL −
CREATE TABLE table_name(
column1 datatype,
column2 datatype,
column3 datatype,
.....
columnN datatype,
);
Example
Following query creates a table named EMPLOYEE in MySQL with five columns namely, FIRST_NAME, LAST_NAME, AGE, SEX and, INCOME.
mysql> CREATE TABLE EMPLOYEE(
FIRST_NAME CHAR(20) NOT NULL,
LAST_NAME CHAR(20),
AGE INT,
SEX CHAR(1),
INCOME FLOAT
);
Query OK, 0 rows affected (0.42 sec)
The DESC statement gives you the description of the specified table. Using this you can verify if the table has been created or not as shown below −
mysql> Desc Employee;
+------------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+-------+
| FIRST_NAME | char(20) | NO | | NULL | |
| LAST_NAME | char(20) | YES | | NULL | |
| AGE | int(11) | YES | | NULL | |
| SEX | char(1) | YES | | NULL | |
| INCOME | float | YES | | NULL | |
+------------+----------+------+-----+---------+-------+
5 rows in set (0.07 sec)
How to connect to Mysql using python and import the csv file into mysql from Jupyter Notebook
In ordering to keep safe our credentials we save our credentials in a file credentials.json
Once you have received your credentials in your email you can create the following credentials.json file:
{
"host":"sql11.freesqldatabase.com",
"database":"xxxxxx",
"user":"xxxxxxx",
"password":"xxxxxx"
}
Then go to your Jupyter Notebook and open it.
conda activate mysql
jupyter notebook .
and create a new notebook, then we can type:
# Python program to read
# json file
import json
# Opening JSON file
f = open('credentials.json')
# returns JSON object as
# a dictionary
data = json.load(f)
# You can uncommnet to see if you put correct credentials
#for i in data:
# print(data[i])
#You can identify the database you have created
data['host']
'sql11.freesqldatabase.com'
import mysql.connector
import mysql.connector
from mysql.connector import Error
Checking our Mysql connection
connection = mysql.connector.connect(host=data['host'],
database=data['database'],
user=data['user'],
password=data['password'])
try:
connection = mysql.connector.connect(host=data['host'],
database=data['database'],
user=data['user'],
password=data['password'])
if connection.is_connected():
db_Info = connection.get_server_info()
print("Connected to MySQL Server version ", db_Info)
cursor = connection.cursor()
cursor.execute("select database();")
record = cursor.fetchone()
print("You're connected to database: ", record)
except Error as e:
print("Error while connecting to MySQL", e)
finally:
if connection.is_connected():
cursor.close()
connection.close()
print("MySQL connection is closed")
Connected to MySQL Server version 5.5.62-0ubuntu0.14.04.1
You're connected to database: ('sql14543562',)
MySQL connection is closed
Once a database connection is established, you can create tables by passing the CREATE TABLE query to the execute() method.
Create MySQL table from Python
Now you know how to connect to a MySQL server from Python, In this section, we will learn how to create a table in MySQL from Python. Let’s create table ‘Laptop’ under your remote database.
import mysql.connector
try:
connection = mysql.connector.connect(host=data['host'],
database=data['database'],
user=data['user'],
password=data['password'])
mySql_Create_Table_Query = """CREATE TABLE Laptop (
Id int(11) NOT NULL,
Name varchar(250) NOT NULL,
Price float NOT NULL,
Purchase_date Date NOT NULL,
PRIMARY KEY (Id)) """
cursor = connection.cursor()
result = cursor.execute(mySql_Create_Table_Query)
print("Laptop Table created successfully ")
except mysql.connector.Error as error:
print("Failed to create table in MySQL: {}".format(error))
finally:
if connection.is_connected():
cursor.close()
connection.close()
print("MySQL connection is closed")
Failed to create table in MySQL: 1050 (42S01): Table 'Laptop' already exists
MySQL connection is closed
Create MySQL table with Pandas
import pandas as pd
Create an in-memory SQLite database.
from sqlalchemy import create_engine
engine = create_engine('sqlite://', echo=False)
df = pd.DataFrame({'name' : ['User 1', 'User 2', 'User 3']})
df
name | |
---|---|
0 | User 1 |
1 | User 2 |
2 | User 3 |
df.to_sql('users', con=engine)
An sqlalchemy.engine.Connection can also be passed to con:
with engine.begin() as connection:
df1 = pd.DataFrame({'name' : ['User 4', 'User 5']})
df1.to_sql('users', con=connection, if_exists='append')
This is allowed to support operations that require that the same DBAPI connection is used for the entire operation.
df2 = pd.DataFrame({'name' : ['User 6', 'User 7']})
df2.to_sql('users', con=engine, if_exists='append')
engine.execute("SELECT * FROM users").fetchall()
[(0, 'User 1'),
(1, 'User 2'),
(2, 'User 3'),
(0, 'User 4'),
(1, 'User 5'),
(0, 'User 6'),
(1, 'User 7')]
In short, to create a table using python ;
- Import *mysql.connector* package.
- Create a connection object using the *mysql.connector.connect()* method, by passing the user name, password, host (optional default: localhost) and, database (optional) as parameters to it.
- Create a cursor object by invoking the *cursor()* method on the connection object created above.
- Then, execute the CREATE TABLE statement by passing it as a parameter to the execute() method.
Create an MySQL database with Pandas in real Database
First method with mysql.connector
Step 1: Prepare the CSV File
To begin, prepare the CSV file that you’d like to import to MySQL. For example, I prepared a simple CSV file with the following data:
df=pd.read_csv('gpu.csv')
df.head()
Product Name | GPU Chip | Released | Bus | Memory | GPU clock | Memory clock | |
---|---|---|---|---|---|---|---|
0 | GeForce RTX 3060 | GA106 | Jan 12th, 2021 | PCIe 4.0 x16 | 12 GB, GDDR6, 192 bit | 1320 MHz | 1875 MHz |
1 | GeForce RTX 3060 Ti | GA104 | Dec 1st, 2020 | PCIe 4.0 x16 | 8 GB, GDDR6, 256 bit | 1410 MHz | 1750 MHz |
2 | GeForce RTX 3070 | GA104 | Sep 1st, 2020 | PCIe 4.0 x16 | 8 GB, GDDR6, 256 bit | 1500 MHz | 1750 MHz |
3 | GeForce RTX 3080 | GA102 | Sep 1st, 2020 | PCIe 4.0 x16 | 10 GB, GDDR6X, 320 bit | 1440 MHz | 1188 MHz |
4 | Radeon RX 6600 XT | Navi 23 | Jul 30th, 2021 | PCIe 4.0 x8 | 8 GB, GDDR6, 128 bit | 1968 MHz | 2000 MHz |
You can download here
df.columns
Index(['Product Name', 'GPU Chip', 'Released', 'Bus', 'Memory', 'GPU clock',
'Memory clock'],
dtype='object')
import pandas
from pandas.io.json import build_table_schema
build_table_schema(df)
{'fields': [{'name': 'index', 'type': 'integer'},
{'name': 'Product Name', 'type': 'string'},
{'name': 'GPU Chip', 'type': 'string'},
{'name': 'Released', 'type': 'string'},
{'name': 'Bus', 'type': 'string'},
{'name': 'Memory', 'type': 'string'},
{'name': 'GPU clock', 'type': 'string'},
{'name': 'Memory clock', 'type': 'string'}],
'primaryKey': ['index'],
'pandas_version': '0.20.0'}
data_set=df
num_cols = len(data_set.axes[1])
print(num_cols)
7
col_count = [len(l.split(",")) for l in data_set.columns]
print(type(col_count))
<class 'list'>
print(len(col_count))
7
First, I establish the CREATE TABLE command and table name – ‘gpu_data’ – for this example, storing it in an ‘SQL_CREATE_TBL’ variable:
SQL_CREATE_TBL = "CREATE TABLE gpu_data("
for name in range(0, len(col_count)):
column_name=data_set.columns[name].replace(' ', '_')
SQL_CREATE_TBL += "{} TEXT, ".format(column_name)
Let’s view the ‘SQL_CREATE_TBL’ string variable contents once the for loop completes:
SQL_CREATE_TBL
'CREATE TABLE gpu_data(Product_Name TEXT, GPU_Chip TEXT, Released TEXT, Bus TEXT, Memory TEXT, GPU_clock TEXT, Memory_clock TEXT, '
SQL_CREATE_TBL = SQL_CREATE_TBL.rstrip(" ,")
All that’s left to complete the CREATE TABLE statement, is simply append the closing parenthesis and semicolon to the ‘SQL_CREATE_TBL’ variable:
SQL_CREATE_TBL += ");"
SQL_CREATE_TBL
'CREATE TABLE so_data(Product Name TEXT, GPU Chip TEXT, Released TEXT, Bus TEXT, Memory TEXT, GPU clock TEXT, Memory clock TEXT);'
The entire code structure for the dynamic CREATE TABLE statement is shown below:
SQL_CREATE_TBL = "CREATE TABLE gpu_data("
for name in range(0, len(col_count)):
column_name=data_set.columns[name].replace(' ', '_')
SQL_CREATE_TBL += "{} TEXT, ".format(column_name)
SQL_CREATE_TBL = SQL_CREATE_TBL.rstrip(" ,")
SQL_CREATE_TBL += ");"
SQL_CREATE_TBL
'CREATE TABLE gpu_data(Product_Name TEXT, GPU_Chip TEXT, Released TEXT, Bus TEXT, Memory TEXT, GPU_clock TEXT, Memory_clock TEXT);'
SQL_CREATE_TBL = "CREATE TABLE gpu_data("
for name in range(0, len(col_count)):
column_name=data_set.columns[name].replace(' ', '_')
SQL_CREATE_TBL += "{} varchar(255), ".format(column_name)
SQL_CREATE_TBL = SQL_CREATE_TBL.rstrip(" ,")
SQL_CREATE_TBL += ")"
SQL_CREATE_TBL
'CREATE TABLE gpu_data(Product_Name varchar(255), GPU_Chip varchar(255), Released varchar(255), Bus varchar(255), Memory varchar(255), GPU_clock varchar(255), Memory_clock varchar(255))'
Step 2: Connect to the MySQL using Python and create a Database
Create a connection object to connect to MySQL, The connect() constructor creates a connection to the MySQL and returns a MySQLConnection object.
connection = mysql.connector.connect( host=data['host'],
database=data['database'],
user=data['user'],
password=data['password'])
Step 3: Create a table and Import the CSV data into the MySQL table
We will create an gpus table under the our database and insert the records in MySQL with below python code.
cursor.close()
connection.close()
import mysql.connector as msql
from mysql.connector import Error
try:
#conn = mysql.connect(host='localhost', database='employee', user='root', password='root@123')
conn = mysql.connector.connect(host=data['host'],database=data['database'],user=data['user'],password=data['password'])
if conn.is_connected():
cursor = conn.cursor()
cursor.execute("select database();")
record = cursor.fetchone()
print("You're connected to database: ", record)
cursor.execute('DROP TABLE IF EXISTS gpu_data;')
print('Creating table....')
# in the below line please pass the create table statement which you want #to create
cursor.execute(SQL_CREATE_TBL)
print("Table is created....")
#loop through the data frame
for i,row in df.iterrows():
#here %S means string values
sql = "INSERT INTO "+data['database']+".gpu_data VALUES (%s,%s,%s,%s,%s,%s,%s)"
cursor.execute(sql, tuple(row))
print("Record inserted")
# the connection is not auto committed by default, so we must commit to save our changes
conn.commit()
except Error as e:
print("Error while connecting to MySQL", e)
You're connected to database: ('sql11501562',)
Creating table....
Table is created....
Record inserted
Record inserted
.
.
.
Record inserted
Step 4 : Query the Table
Query the table to make sure that our inserted data has been saved correctly.
# Execute query
sql = "SELECT * FROM "+data['database']+".gpu_data"
#sql = "SELECT * FROM employee.employee_data"
cursor.execute(sql)
# Fetch all the records
result = cursor.fetchall()
for i in result:
print(i)
('GeForce RTX 3060', 'GA106', 'Jan 12th, 2021', 'PCIe 4.0 x16', '12 GB, GDDR6, 192 bit', '1320 MHz', '1875 MHz')
('GeForce RTX 3060 Ti', 'GA104', 'Dec 1st, 2020', 'PCIe 4.0 x16', '8 GB, GDDR6, 256 bit', '1410 MHz', '1750 MHz')
.
.
.
('GeForce GTX 980 Ti', 'GM200', 'Jun 2nd, 2015', 'PCIe 3.0 x16', '6 GB, GDDR5, 384 bit', '1000 MHz', '1753 MHz')
('GeForce GTX 1650 SUPER', 'TU116', 'Nov 22nd, 2019', 'PCIe 3.0 x16', '4 GB, GDDR6, 128 bit', '1530 MHz', '1500 MHz')
('Radeon RX 550', 'Lexa', 'Apr 20th, 2017', 'PCIe 3.0 x8', '2 GB, GDDR5, 128 bit', '1100 MHz', '1750 MHz')
Using Pandas SQL Query Method
import mysql.connector
connection = mysql.connector.connect(host=data['host'],
database=data['database'],
user=data['user'],
password=data['password'])
Step 5: Run Pandas SQL Query method
df_remote = pd.read_sql_query(sql, con = connection)
df_remote.head(3)
Product_Name | GPU_Chip | Released | Bus | Memory | GPU_clock | Memory_clock | |
---|---|---|---|---|---|---|---|
0 | GeForce RTX 3060 | GA106 | Jan 12th, 2021 | PCIe 4.0 x16 | 12 GB, GDDR6, 192 bit | 1320 MHz | 1875 MHz |
1 | GeForce RTX 3060 Ti | GA104 | Dec 1st, 2020 | PCIe 4.0 x16 | 8 GB, GDDR6, 256 bit | 1410 MHz | 1750 MHz |
2 | GeForce RTX 3070 | GA104 | Sep 1st, 2020 | PCIe 4.0 x16 | 8 GB, GDDR6, 256 bit | 1500 MHz | 1750 MHz |
cursor.close()
connection.close()
Second method with sqlalchemy
df=pd.read_csv('gpu.csv')
import pandas as pd
from sqlalchemy import create_engine
infile = 'gpu.csv'
db = data['database']
db_tbl_name = 'gpu_data2'
'''
Load a csv file into a dataframe; if csv does not have headers, use the headers arg to create a list of headers; rename unnamed columns to conform to mysql column requirements
'''
def csv_to_df(infile, headers = []):
if len(headers) == 0:
df = pd.read_csv(infile)
else:
df = pd.read_csv(infile, header = None)
df.columns = headers
for r in range(10):
try:
df.rename( columns={'Unnamed: {0}'.format(r):'Unnamed{0}'.format(r)}, inplace=True )
except:
pass
df.columns = df.columns.str.replace(' ', '_')
return df
'''
Create a mapping of df dtypes to mysql data types local
'''
def dtype_mapping_local():
return {'object' : 'TEXT',
'int64' : 'INT',
'float64' : 'FLOAT',
'datetime64' : 'DATETIME',
'bool' : 'TINYINT',
'category' : 'TEXT',
'timedelta[ns]' : 'TEXT'}
'''
Create a mapping of df dtypes to mysql data types remote (not perfect, but close enough)
'''
def dtype_mapping():
return {'object' : 'VARCHAR(255)',
'int64' : 'INT',
'float64' : 'FLOAT',
'datetime64' : 'DATETIME',
'bool' : 'TINYINT',
'category' : 'VARCHAR(255)',
'timedelta[ns]' : 'VARCHAR(255)'}
'''
Create a sqlalchemy engine local
'''
def mysql_engine_local(user = 'root', password = 'abc', host = '127.0.0.1', port = '3306', database = 'a001_db'):
engine = create_engine("mysql://{0}:{1}@{2}:{3}/{4}?charset=utf8".format(user, password, host, port, database))
return engine
'''
Create a sqlalchemy engine remote
'''
def mysql_engine_remote(user = data['user'],
password = data['password'],
host = data['host'],
port = '3306',
database = data['database']):
engine = create_engine("mysql://{0}:{1}@{2}:{3}/{4}?charset=utf8".format(user, password, host, port, database))
return engine
'''
Create a mysql connection from sqlalchemy engine
'''
def mysql_conn(engine):
conn = engine.raw_connection()
return conn
'''
Create sql input for table names and types
'''
def gen_tbl_cols_sql(df):
dmap = dtype_mapping()
sql = "pi_db_uid INT AUTO_INCREMENT PRIMARY KEY"
df1 = df.rename(columns = {"" : "nocolname"})
hdrs = df1.dtypes.index
hdrs_list = [(hdr, str(df1[hdr].dtype)) for hdr in hdrs]
for hl in hdrs_list:
sql += " ,{0} {1}".format(hl[0], dmap[hl[1]])
return sql
'''
Create a mysql table from a df
'''
def create_mysql_tbl_schema(df, conn, db, tbl_name):
tbl_cols_sql = gen_tbl_cols_sql(df)
sql = "USE {0}; CREATE TABLE {1} ({2})".format(db, tbl_name, tbl_cols_sql)
cur = conn.cursor()
cur.execute(sql)
cur.close()
conn.commit()
'''
Write df data to newly create mysql table
'''
def df_to_mysql(df, engine, tbl_name):
df.to_sql(tbl_name, engine, if_exists='replace')
df = csv_to_df(infile)
df.head()
Product_Name | GPU_Chip | Released | Bus | Memory | GPU_clock | Memory_clock | |
---|---|---|---|---|---|---|---|
0 | GeForce RTX 3060 | GA106 | Jan 12th, 2021 | PCIe 4.0 x16 | 12 GB, GDDR6, 192 bit | 1320 MHz | 1875 MHz |
1 | GeForce RTX 3060 Ti | GA104 | Dec 1st, 2020 | PCIe 4.0 x16 | 8 GB, GDDR6, 256 bit | 1410 MHz | 1750 MHz |
2 | GeForce RTX 3070 | GA104 | Sep 1st, 2020 | PCIe 4.0 x16 | 8 GB, GDDR6, 256 bit | 1500 MHz | 1750 MHz |
3 | GeForce RTX 3080 | GA102 | Sep 1st, 2020 | PCIe 4.0 x16 | 10 GB, GDDR6X, 320 bit | 1440 MHz | 1188 MHz |
4 | Radeon RX 6600 XT | Navi 23 | Jul 30th, 2021 | PCIe 4.0 x8 | 8 GB, GDDR6, 128 bit | 1968 MHz | 2000 MHz |
create_mysql_tbl_schema(df, mysql_conn(mysql_engine_remote()), db, db_tbl_name)
df_to_mysql(df, mysql_engine_remote(), db_tbl_name)
connection = mysql.connector.connect( host=data['host'],
database=data['database'],
user=data['user'],
password=data['password'])
# Execute query
sql = "SELECT * FROM "+data['database']+".gpu_data2"
df_remote_2 = pd.read_sql_query(sql, con = connection)
df_remote_2.head(10)
index | Product_Name | GPU_Chip | Released | Bus | Memory | GPU_clock | Memory_clock | |
---|---|---|---|---|---|---|---|---|
0 | 0 | GeForce RTX 3060 | GA106 | Jan 12th, 2021 | PCIe 4.0 x16 | 12 GB, GDDR6, 192 bit | 1320 MHz | 1875 MHz |
1 | 1 | GeForce RTX 3060 Ti | GA104 | Dec 1st, 2020 | PCIe 4.0 x16 | 8 GB, GDDR6, 256 bit | 1410 MHz | 1750 MHz |
2 | 2 | GeForce RTX 3070 | GA104 | Sep 1st, 2020 | PCIe 4.0 x16 | 8 GB, GDDR6, 256 bit | 1500 MHz | 1750 MHz |
3 | 3 | GeForce RTX 3080 | GA102 | Sep 1st, 2020 | PCIe 4.0 x16 | 10 GB, GDDR6X, 320 bit | 1440 MHz | 1188 MHz |
4 | 4 | Radeon RX 6600 XT | Navi 23 | Jul 30th, 2021 | PCIe 4.0 x8 | 8 GB, GDDR6, 128 bit | 1968 MHz | 2000 MHz |
5 | 5 | GeForce RTX 2060 | TU106 | Jan 7th, 2019 | PCIe 3.0 x16 | 6 GB, GDDR6, 192 bit | 1365 MHz | 1750 MHz |
6 | 6 | Radeon RX 6600 | Navi 23 | Oct 13th, 2021 | PCIe 4.0 x8 | 8 GB, GDDR6, 128 bit | 1626 MHz | 1750 MHz |
7 | 7 | GeForce RTX 3050 8 GB | GA106 | Jan 4th, 2022 | PCIe 4.0 x8 | 8 GB, GDDR6, 128 bit | 1552 MHz | 1750 MHz |
8 | 8 | GeForce RTX 3090 Ti | GA102 | Jan 27th, 2022 | PCIe 4.0 x16 | 24 GB, GDDR6X, 384 bit | 1560 MHz | 1313 MHz |
9 | 9 | GeForce RTX 3070 Ti | GA104 | May 31st, 2021 | PCIe 4.0 x16 | 8 GB, GDDR6X, 256 bit | 1575 MHz | 1188 MHz |
connection.close()
# Closing json file
f.close()
You can download this notebook here.
Congratulations! You have learned how to create a MySQL table from a csv file with Jupyter Notebook .
Leave a comment