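Python Cheat Sheet (Basic Syntax, Pandas, and More)

Updated: July 10, 2024, by Heysho

For anyone who has just started learning Python, I thought a list for quickly looking up the key code beginners should know would be handy, so I've put one together below.

It's especially useful if you want to learn efficiently, starting from the basics.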

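Basic Syntax
Data Analysis (Pandas)
Converting Data Types
Grouping
Renaming Columns
Joining Data Tables
Exporting to a CSV File
Creating a Data Table from Specific Columns
Setting the Index Column
Resetting the Index Column
Data Visualization (Matplotlib)
Streamlit
Connecting to Google Sheets
Gemini API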

Basic Syntax

`print(◯◯◯)` ... Display text or variables

Code

print("Hello World")

Result: Hello World

Prints a string.

`◯◯◯ = △△△` ... Variables

Code

name = "Mike"
print(name)

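Assigns a value to a variable whose name you choose. Here the string "Mike" is stored in `name` and then printed.

`input(◯◯◯)` ... Let the user type text

Code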

name = input("What is your name? ")
print("Hello " + name)

`input` lets the user type text, and the result is stored in a variable. Note that `input` always returns a string.

`str`, `int`, `float`, etc. ... Converting data types

Code

num_char = len(input("What is your name? "))
new_num_char = str(num_char)
print("Your name has " + new_num_char + " characters.")

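Here `str` converts the number (an integer) into a string so that it can be joined to the other text with `+`. Other conversion functions include `int` and `float`.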
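
A small sketch of these conversions, using fixed values instead of `input()` so it runs on its own; the variable names are just for this example. `type()` shows which type each value ends up with.

```python
# A string, like the value input() returns
text_number = "42"

as_int = int(text_number)       # "42" -> 42
as_float = float(text_number)   # "42" -> 42.0
back_to_str = str(as_int + 1)   # 43 -> "43"

print(type(text_number), type(as_int), type(as_float), type(back_to_str))
print("Next number: " + back_to_str)

# int() on a float drops the decimal part (it does not round)
print(int(3.9))
```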

`+`, `-`, `*`, `/` ... Arithmetic

Code

height = input("Height (m): ")
weight = input("Weight (kg): ")
weight_as_int = int(weight)
height_as_float = float(height)
# ** is exponentiation, so height_as_float ** 2 is the height squared
bmi = weight_as_int / height_as_float ** 2
# Equivalent: bmi = weight_as_int / (height_as_float * height_as_float)
bmi_as_int = int(bmi)
print(bmi_as_int)

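Operators such as `*` (multiplication), `/` (division), and `**` (exponentiation) let you do calculations. `/` always returns a float, even when both numbers are integers.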
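
The same BMI calculation with fixed example values (1.75 m and 70 kg, chosen only for illustration), plus integer division `//` and the remainder operator `%`, which are also common. A sketch, not part of the original example.

```python
height = 1.75  # meters (example value)
weight = 70    # kilograms (example value)

# ** is evaluated before /, so this is weight / (height squared)
bmi = weight / height ** 2
print(round(bmi, 1))  # rounded to one decimal place
print(int(bmi))       # int() drops the decimal part

# Integer division and remainder
print(7 // 2)  # 3
print(7 % 2)   # 1
```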

`print(f"◯◯◯")` ... f-strings

Code

height = 177
print(f"Your height is {height} cm")

Put an `f` before the opening quotation mark and wrap a variable in `{}` to insert its value into the string.

`if` ... if statements

Code

age = int(input("How old are you? "))
if age >= 20:
    print("You can buy alcohol.")
else:
    print("You are under 20, so you cannot buy alcohol.")

The `if` statement shows a different message depending on whether the condition is met.
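
When you need more than two branches, add `elif`. In this sketch the check is wrapped in a function (the name `alcohol_message` is made up for this example) so it can be tried with fixed ages; the age limit of 20 follows the example above, and the negative-age check is an illustrative addition.

```python
def alcohol_message(age):
    # Conditions are checked from top to bottom; only the first True branch runs
    if age < 0:
        return "Please enter a valid age."
    elif age >= 20:
        return "You can buy alcohol."
    else:
        return "You are under 20, so you cannot buy alcohol."


print(alcohol_message(25))
print(alcohol_message(15))
print(alcohol_message(-1))
```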

`random.randint(◯, ◯)` ... The random module

Code

import random
random_score = random.randint(1, 100)
print(f"Your score is {random_score}")

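By importing the `random` module, you can generate random numbers. `random.randint(1, 100)` returns an integer from 1 to 100, including both 1 and 100.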
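
If you want the same "random" numbers on every run (handy when testing), you can call `random.seed()` first. This sketch also uses `random.choice`, which picks one item from a list; the fruit list is just example data.

```python
import random

random.seed(0)  # fix the seed so the sequence is the same on every run

scores = []
for _ in range(5):
    scores.append(random.randint(1, 100))  # 1 and 100 are both possible
print(scores)

# Pick one item at random from a list
fruit = random.choice(["Apple", "Peach", "Banana"])
print(fruit)
```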

`[◯◯, ◯◯, ◯◯]` ... Lists

Code

programming_languages = ["Python", "Ruby", "JavaScript", "Go", "PHP", "Java"]
print(programming_languages)

This prints `['Python', 'Ruby', 'JavaScript', 'Go', 'PHP', 'Java']`.

If you change it to `print(programming_languages[1])`, only "Ruby" is printed. List indexes start at 0, so index 1 is the second item.
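
A few more everyday list operations, sketched with the same example data: negative indexes count from the end, a slice `[start:stop]` takes a range of items (excluding `stop`), `len()` counts items, and `append()` adds one to the end.

```python
programming_languages = ["Python", "Ruby", "JavaScript", "Go", "PHP", "Java"]

print(programming_languages[0])    # first item: Python
print(programming_languages[-1])   # last item: Java
print(programming_languages[1:3])  # items at index 1 and 2: ['Ruby', 'JavaScript']
print(len(programming_languages))  # number of items: 6

programming_languages.append("Rust")  # add an item to the end
print(programming_languages[-1])      # Rust
```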

`{◯◯◯: ◯◯◯, ◯◯◯: ◯◯◯, ◯◯◯: ◯◯◯}` ... Dictionaries

Code

student_score = {
    "Taro": "86",
    "Mike": "86",
    "Jenny": "90",
    "Sho": "92",
}
mike_score = student_score["Mike"]
print(mike_score)
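
A dictionary maps each key to a value. This sketch shows a few common operations on similar example data: adding or updating a key, `get()` with a default for a key that may be missing (square brackets would raise `KeyError` instead), and looping over key/value pairs with `items()`. Scores are stored as integers here, an illustrative change, so they can be used in calculations.

```python
student_score = {"Taro": 86, "Mike": 86, "Jenny": 90}

student_score["Sho"] = 92   # add a new key
student_score["Taro"] = 88  # update an existing key

# "Ken" is not a key, so get() returns the default value 0
print(student_score.get("Ken", 0))

for name, score in student_score.items():
    print(f"{name}: {score}")

average = sum(student_score.values()) / len(student_score)
print(average)
```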

`for ◯◯◯ in ◯◯◯:` ... for loops

Code

fruits = ["Apple","Peach","Banana"]
for fruit in fruits:
    print(fruit)

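A for loop repeats the same process for each item.

The code above prints Apple, Peach, and Banana, one per line.

Put a list (or another collection) to the right of `in`, and a variable name to the right of `for`. On each pass through the loop, the variable holds the current item.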
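
Two built-ins often used with for loops, shown in a short sketch: `range()` produces a sequence of numbers, and `enumerate()` gives each item's position along with the item.

```python
fruits = ["Apple", "Peach", "Banana"]

# range(1, 6) produces 1, 2, 3, 4, 5 (the end value is excluded)
total = 0
for number in range(1, 6):
    total += number
print(total)  # 15

# enumerate() returns (index, item) pairs; start=1 makes numbering begin at 1
numbered = []
for index, fruit in enumerate(fruits, start=1):
    numbered.append(f"{index}. {fruit}")
print(numbered)
```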

`def ◯◯◯():` ... Functions

Code

def my_function(name):
    print(f"Hello {name}")

my_function("John")

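By writing code inside a `def` block, you can reuse it: afterwards, a single line that calls the function runs all of that code.

The code above prints "Hello John".

Here is one more, more practical example.

Code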

def format_name(f_name, l_name):
    if f_name == "" or l_name == "":
        return "You didn't provide valid inputs."
    formatted_f_name = f_name.title()
    formatted_l_name = l_name.title()
    return f"Result: {formatted_f_name} {formatted_l_name}"
formatted_name = format_name(input("Your first name: "), input("Your last name: "))
print(formatted_name)

This function tidies up a badly formatted name (e.g. miKE jAcKSoN).

When you enter such a name, the function returns it neatly formatted, as in "Result: Mike Jackson".
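
To try the function without typing anything, you can call a copy of `format_name` with fixed strings. `str.title()` capitalizes the first letter of each word and lowercases the rest, which is what fixes a name like miKE jAcKSoN.

```python
def format_name(f_name, l_name):
    # Return an error message if either name is empty
    if f_name == "" or l_name == "":
        return "You didn't provide valid inputs."
    # title() makes the first letter uppercase and the rest lowercase
    formatted_f_name = f_name.title()
    formatted_l_name = l_name.title()
    return f"Result: {formatted_f_name} {formatted_l_name}"


print(format_name("miKE", "jAcKSoN"))  # Result: Mike Jackson
print(format_name("", "Jackson"))      # You didn't provide valid inputs.
```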

`import ◯◯◯` ... Use a variable from another file

another_module.py

another_variable = 12

main.py

import another_module
print(another_module.another_variable)

Create a variable in a separate file, another_module.py, and use it from main.py. You access it as `module_name.variable_name`.
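
`import` works the same way for Python's standard library. Besides `import module`, you can import specific names with `from ... import ...` (with the files above, `from another_module import another_variable` would let main.py use `another_variable` directly), or give a module a shorter alias with `as`. This sketch uses the standard `math` module.

```python
import math                # use as math.sqrt(...)
from math import sqrt, pi  # use the names directly
import math as m           # alias: use as m.floor(...)

print(math.sqrt(16))  # 4.0
print(sqrt(25))       # 5.0
print(round(pi, 2))   # 3.14
print(m.floor(3.7))   # 3
```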

Class (simple version)

Code

class User:
    def __init__(self, user_id, username):
        self.id = user_id
        self.username = username
        self.followers = 0
        self.following = 0
    def follow(self, user):
        user.followers += 1
        self.following += 1

user_1 = User("001", "Taro")
user_2 = User("002", "Hanako")

user_1.follow(user_2)

print(user_1.id)
print(user_1.username)
print(user_1.followers)
print(user_1.following)

print(user_2.id)
print(user_2.username)
print(user_2.followers)
print(user_2.following)

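A class is like a blueprint that bundles data (attributes) together with the functions that work on that data (methods).

In the example above, everything related to a user, such as the ID, follower counts, and the follow action, is grouped in the `User` class.

The code implements an SNS-style follow feature with a class. After `user_1` follows `user_2`, running it prints the following: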

001
Taro
0
1

002
Hanako
1
0

Class (advanced version)

data.py

question_data = [{"category": "Science: Computers",
"type": "boolean",
"difficulty": "medium",
"question": "The HTML5 standard was published in 2014.",
"correct_answer": "True",
"incorrect_answers": ["False"]},
{"category": "Science: Computers",
"type": "boolean",
"difficulty": "medium",
"question": "The first computer bug was formed by faulty wires.",
"correct_answer": "False",
"incorrect_answers": ["True"]
},
{"category": "Science: Computers",
"type": "boolean",
"difficulty": "medium",
"question": "FLAC stands for 'Free Lossless Audio Condenser'.",
"correct_answer": "False",
"incorrect_answers": ["True"]},
{"category": "Science: Computers",
"type": "boolean",
"difficulty": "medium",
"question": "All program codes have to be compiled into an executable file in order to be run. This file can then be executed on any machine.",
"correct_answer": "False",
"incorrect_answers": ["True"]},
{"category": "Science: Computers",
"type": "boolean",
"difficulty": "easy",
"question": "Linus Torvalds created Linux and Git.",
"correct_answer": "True",
"incorrect_answers": ["False"]},
{"category": "Science: Computers",
"type": "boolean",
"difficulty": "hard",
"question": "The IBM PC used an Intel 8008 microprocessor clocked at 4.77 MHz and 8 kilobytes of memory.",
"correct_answer": "False",
"incorrect_answers": ["True"]}
]

main.py

from question_model import Question
from data import question_data
from quiz_brain import QuizBrain

question_bank = []
for question in question_data:
    question_text = question["question"]
    question_answer = question["correct_answer"]
    new_question = Question(question_text, question_answer)
    question_bank.append(new_question)

quiz = QuizBrain(question_bank)

while quiz.still_has_questions():
    quiz.next_question()

print("You've completed the quiz")
print(f"Your final score was: {quiz.score}/{quiz.question_number}")

question_model.py

class Question:
    def __init__(self, q_text, q_answer):
        self.text = q_text
        self.answer = q_answer

quiz_brain.py

class QuizBrain:
    def __init__(self, q_list):
        self.question_number = 0
        self.score = 0
        self.question_list = q_list
    def still_has_questions(self):
        return self.question_number < len(self.question_list)
    def next_question(self):
        current_question = self.question_list[self.question_number]
        self.question_number += 1
        user_answer = input(f"Q.{self.question_number}: {current_question.text} (True/False): ")
        self.check_answer(user_answer, current_question.answer)
    def check_answer(self, user_answer, correct_answer):
        if user_answer.lower() == correct_answer.lower():
            self.score += 1
            print("You got it right!")
        else:
            print("That's wrong.")
        print(f"The correct answer was: {correct_answer}.")
        print(f"Your current score is: {self.score}/{self.question_number}")
        print("\n")

Running main.py starts the quiz program.

Data Analysis (Pandas)

Folder structure

The folder structure is a matter of preference, but I organize mine as follows.

project-name
- data
  - input
    - rawdata1.csv
    - rawdata2.csv
    - rawdata3.csv
  - output
    - preprocess.csv
- ui
- backend
- notebook

`import ◯◯◯ as ◯◯◯` ... Import a library under a short alias

Code

import numpy as np
import pandas as pd

`pd.read_csv("★file path★")` ... Read a CSV file

Code

file = "○○○○/○○○○○○○.csv"
df = pd.read_csv(file)

Set the `file` variable to the path of the file.

If the file is online, you can pass its URL instead.

If you use the folder structure above, code like the following works well as a template.

Code

data_dir = "../data"  # avoid the name dir, which shadows a Python built-in
input_dir = f"{data_dir}/input"
output_dir = f"{data_dir}/output"
customer_file = f"{input_dir}/customer_master.csv"
product_file = f"{input_dir}/product_master.csv"
transaction_file = f"{input_dir}/transaction.csv"
output_file = f"{output_dir}/transaction_processed.csv"
df_customer = pd.read_csv(customer_file)
df_product = pd.read_csv(product_file)
df_transaction = pd.read_csv(transaction_file)

`df.head()`, `df.shape`, etc. ... Check basic information about the data

Code

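# Show only the first 2 rows of the data
df.head(2)
# Check the number of rows and columns (shape is an attribute, so no parentheses)
df.shape
# Check the column names
df.columns
# Check each column and its details (number of non-null values, data type, etc.)
df.info()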
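
A self-contained sketch of reading a CSV and checking its basic information, assuming pandas is installed. To avoid needing a real file, the CSV text is wrapped in `io.StringIO`, which `pd.read_csv` accepts in place of a file path; the column names and values are made up for this example.

```python
import io

import pandas as pd

# Made-up CSV data standing in for a real file
csv_text = """date,store,sales
2020-01-02,1,779400
2020-01-03,2,1230
2020-01-04,1,1992800
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))        # first two rows
print(df.shape)          # (rows, columns) -> (3, 3)
print(list(df.columns))  # ['date', 'store', 'sales']
df.info()                # column names, non-null counts, and dtypes
```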

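`df['★column name★'].max()`, `.min()`, etc. ... Check the values in a column

Code

# View the values in a column
df['sales']
# Maximum value in the column
df['sales'].max()
# Minimum value in the column
df['sales'].min()
# Unique values in the column
df['sales'].unique()
# Number of unique values in the column
df['sales'].nunique()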
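
A small example with made-up data, assuming pandas is installed, showing the column methods above. Calling a method such as `max()` on the whole DataFrame returns one result per column instead, and `value_counts()` counts how often each value appears.

```python
import pandas as pd

df = pd.DataFrame({"store": [1, 2, 1, 3], "sales": [100, 250, 100, 400]})

print(df["sales"].max())      # 400
print(df["sales"].min())      # 100
print(df["sales"].unique())   # [100 250 400]
print(df["sales"].nunique())  # 3

# On the whole DataFrame: the maximum of every column
print(df.max())

# How many rows there are for each store
print(df["store"].value_counts())
```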

`df.drop(["unneeded column"], axis=1, inplace=True)` ... Delete unneeded columns (axis=1 means columns)

Code

df.drop(["quantity"],axis=1,inplace=True)

`df.isnull().sum()` ... Count the missing values in each column

Code

df.isnull().sum()

Result (the number of missing values per column):

オーダーID 0
オーダー日 0
顧客ID 3
商品ID 3
数量 0
売上 0
出荷日 0
キャンセル日 13665
ステータス 0

`df.loc[df['★column name★'].isnull()]` ... View the rows with missing values

Code

df.loc[df['顧客ID'].isnull()]

Result:

	オーダーID	オーダー日	顧客ID	商品ID	売上	出荷日	キャンセル日	ステータス
237	201902090006-1	2019-02-09	NaN	NaN	999999	2019-02-17	NaN	テスト
239	201902090008-1	2019-02-09	NaN	NaN	999999	2019-02-15	NaN	テスト
240	201902090000-1	2019-02-09	NaN	NaN	999999	2019-02-15	NaN	テスト

`loc` selects the rows of a data table that match a condition.

You'll use it often, so it's worth remembering.
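
A sketch of a few more `loc` patterns, assuming pandas is installed. The column names (`order_id`, `customer_id`, `sales`) and values are made up for this example: conditions can be combined with `&` (and) or `|` (or), each wrapped in parentheses, and a column label after the comma limits which columns are returned.

```python
import pandas as pd

df = pd.DataFrame({
    "order_id": ["A1", "A2", "A3", "A4"],
    "customer_id": ["C1", None, "C2", None],
    "sales": [1200, 999999, 800, 999999],
})

# Rows where customer_id is missing
missing = df.loc[df["customer_id"].isnull()]
print(missing)

# Two conditions combined with &; each one needs its own parentheses
big_known = df.loc[(df["sales"] > 1000) & (df["customer_id"].notnull())]
print(big_known)

# Rows matching the condition, but only the order_id column
print(df.loc[df["sales"] == 999999, "order_id"])
```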

Next, delete the rows with missing values using the following code.

Code

df.dropna(subset=['商品ID'],inplace=True)

Then check that the missing values are gone.

Code

df.isnull().sum()

`df.duplicated(subset=['★column name★']).sum()` ... Find and remove duplicate data

Code

df.duplicated(subset=['オーダーID']).sum()

This counts the duplicated rows.

The result comes back as a number, such as "261".

If there are any, look at the duplicated rows themselves.

Code

df.loc[df.duplicated(subset=['オーダーID'],keep=False)].sort_values(['オーダーID'])

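`keep=False` marks every copy of a duplicated order ID (not only the second and later ones), and sorting by オーダーID puts the copies next to each other. You get a result like this: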

	オーダーID	オーダー日	顧客ID	商品ID	数量	売上	出荷日	キャンセル日	ステータス
10865	202203010000-1	2022-03-01	C00000340	SW-CWWM	1	18400	2022-03-05	NaN	配達済み
11126	202203010000-1	2022-03-01	C00000340	SW-CWWM	1	18400	2022-03-05	NaN	配達済み
11142	202203010007-1	2022-03-01	C00000314	SW-CWWM	1	18400	2022-03-05	NaN	配達済み

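Remove duplicates

Next, remove the duplicated rows. By default, `drop_duplicates` keeps the first occurrence of each order ID and drops the rest.

Code

df.drop_duplicates(subset=['オーダーID'], inplace=True)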

Check that the count is now 0.

Code

df.duplicated(subset = ['オーダーID']).sum()
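
A small end-to-end sketch of the duplicate check and removal, assuming pandas is installed; the `order_id` and `sales` columns and their values are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "order_id": ["1001", "1001", "1002", "1003", "1003"],
    "sales": [18400, 18400, 5000, 7000, 7000],
})

# The second copies of 1001 and 1003 count as duplicates
print(df.duplicated(subset=["order_id"]).sum())  # 2

# keep="first" (the default) keeps the first row of each duplicate group
deduped = df.drop_duplicates(subset=["order_id"], keep="first")
print(deduped)
print(deduped.duplicated(subset=["order_id"]).sum())  # 0
```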

Converting Data Types

`pd.to_datetime(df['★column name★'], format='%Y-%m-%d')` ... Convert a date column from "object" (strings) to a datetime type

Code

df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

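Date values are often stored as strings. In that case, you need to convert the column to a datetime type.

Use `pd.to_datetime` to do this.

You can also do the conversion when reading the CSV file, with the `parse_dates` argument.

Code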

file = "○○○○/○○○○○○○.csv"
df = pd.read_csv(file, parse_dates=["date"])
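
A sketch of the conversion on a small made-up DataFrame, assuming pandas is installed. After conversion, the `.dt` accessor gives parts of each date, such as the year or month, which is what the grouping example in the next section relies on.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2019-12-30", "2020-01-02", "2020-01-03"],
    "sales": [500, 779400, 1230],
})
# At this point the date values are plain strings
print(df["date"].dtype)

df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")
print(df["date"].dtype)  # now a datetime64 dtype

# The .dt accessor only works on datetime columns
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
print(df)
```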

Grouping

`df.groupby(df['★column name★'].dt.year).sum()` ... Group rows

Code

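# numeric_only=True sums only the numeric columns (recent pandas versions raise an error when summing datetime columns)
df_yearly = df.groupby(df['date'].dt.year).sum(numeric_only=True)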

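Use this when you want to group rows, for example to see yearly totals. `.dt.year` only works once the column has been converted to a datetime type.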
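
A sketch with made-up data, assuming pandas is installed. `numeric_only=True` restricts the sum to numeric columns (here only `sales`), and `agg()` lets you name the column and the calculation explicitly, which also works when grouping by more than one key.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2019-05-01", "2019-11-20", "2020-02-03", "2020-08-15"]),
    "store": ["A", "B", "A", "B"],
    "sales": [100, 200, 300, 400],
})

# Yearly totals of the numeric columns
df_yearly = df.groupby(df["date"].dt.year).sum(numeric_only=True)
print(df_yearly)

# Yearly totals per store, choosing the column and calculation explicitly
df_yearly_store = df.groupby([df["date"].dt.year, "store"]).agg({"sales": "sum"})
print(df_yearly_store)
```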

Renaming Columns

`df.rename(columns={"★current name★": "★new name★"})` ... Rename columns

Code

df.rename(columns={"sales":"revenue"},inplace=True)

This renames the column from sales to revenue.

Joining Data Tables

`pd.merge(df_1, df_2, on='顧客ID', how='left')` ... Join data tables

Code

df_merge = pd.merge(df_order, df_customer, on = '顧客ID', how='left')

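When you have several tables, it's convenient to create one table that joins them all. `how='left'` keeps every row of the left table and fills the columns from the right table with NaN where there is no match.

After creating it, check for duplicates and missing values again, just to be safe.

Code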

df_merge = pd.merge(df_order, df_customer, on = '顧客ID', how='left')
df_merge = pd.merge(df_merge,df_item,on='商品ID', how='left')
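
A small sketch comparing `how="left"` and `how="inner"`, assuming pandas is installed; the tables, column names, and values are made up for this example. Order 3 has no matching customer, so the left join keeps it with NaN while the inner join drops it, which is why checking missing values after a join is worthwhile.

```python
import pandas as pd

df_order = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": ["C1", "C2", "C9"]})
df_customer = pd.DataFrame({"customer_id": ["C1", "C2"], "name": ["Taro", "Hanako"]})

# how="left": keep every order; unmatched customers get NaN
df_merge = pd.merge(df_order, df_customer, on="customer_id", how="left")
print(df_merge)

# how="inner": keep only rows found in both tables
df_inner = pd.merge(df_order, df_customer, on="customer_id", how="inner")
print(df_inner)

# Count missing values introduced by the join
print(df_merge.isnull().sum())
```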

Exporting to a CSV File

`df.to_csv("★file path★", index=False)` ... Export data to a CSV file

Code

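df_merge.to_csv("../data/order_processed.csv",index=False)

`index=False` keeps the DataFrame's index from being written to the file as an extra column.

Creating a Data Table from Specific Columns

`df[["★column 1★", "★column 2★"]]` ... Create a data table made up of only the columns you choose

Code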

df[["transaction_date", "sales"]]

Result:

transaction_date	sales
2020-01-02	779400
2020-01-03	1230
2020-01-04	1992800
2020-01-05	231400
2020-01-06	4749100

Setting the Index Column

`df.set_index('★column 1★', inplace=True)` ... Make a column the index column (the leftmost column)

Code

df.set_index('transaction_date', inplace=True)

Resetting the Index Column

`df.reset_index()` ... Reset the index column (the leftmost column) to the default 0, 1, 2, ... numbering; the old index becomes a regular column

Code

df_weekly_sales_2019 = df_daily_sales_detail[df_daily_sales_detail['year'] == 2019].groupby("week").agg({"sales":"sum"}).reset_index()
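
A sketch of the round trip on made-up data, assuming pandas is installed. Without `inplace=True`, both `set_index` and `reset_index` return a new DataFrame, and `reset_index(drop=True)` discards the old index instead of turning it back into a column.

```python
import pandas as pd

df = pd.DataFrame({"transaction_date": ["2020-01-02", "2020-01-03"], "sales": [779400, 1230]})

# transaction_date becomes the index, so rows can be looked up by date
df_indexed = df.set_index("transaction_date")
print(df_indexed.loc["2020-01-03", "sales"])  # 1230

# The index moves back into a normal column and rows are numbered 0, 1, ...
df_back = df_indexed.reset_index()
print(list(df_back.columns))  # ['transaction_date', 'sales']
print(list(df_back.index))    # [0, 1]

# drop=True throws the old index away instead of keeping it as a column
df_dropped = df_indexed.reset_index(drop=True)
print(df_dropped)
```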

Data Visualization (Matplotlib)

Check the sales trend

Code

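import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(30,6))
# Total the numeric columns per date, then turn date back into a regular column
df_date_index = df.groupby("date").sum(numeric_only=True).reset_index()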
sns.lineplot(x="date",y="sales",data=df_date_index)
plt.show()

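Sales trend chart: when analyzing e-commerce data, this is the first chart I personally look at.

`.reset_index()` is needed because `groupby` turns the date column into the index; resetting it makes `date` a regular column again so it can be used for the x-axis.

Check the sales trend by segment

Code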

plt.figure(figsize=(30,6))
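df_date_store = df.groupby(["date","store"]).sum(numeric_only=True).reset_index()
# One line per store; label= adds a legend entry for each
sns.lineplot(x="date",y="sales",data=df_date_store[df_date_store["store"]==1], label="store 1")
sns.lineplot(x="date",y="sales",data=df_date_store[df_date_store["store"]==2], label="store 2")
sns.lineplot(x="date",y="sales",data=df_date_store[df_date_store["store"]==3], label="store 3")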
plt.show()

Sales trend by store: the sales trend is shown segmented by store.

Streamlit

Limit the app to three uses and require a password

Code


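import streamlit as st

# Placeholder: replace with your own password (for example, read it from st.secrets)
password_key = "your-password"

# Initialize the session state on the first run
if 'authenticated' not in st.session_state:
    st.session_state['authenticated'] = False
if 'usage_count' not in st.session_state:
    st.session_state['usage_count'] = 0

# Only show the login form if the user is not authenticated
if not st.session_state['authenticated']:
    password_placeholder = st.empty()
    login_button_placeholder = st.empty()
    password = password_placeholder.text_input("Enter your password", type="password")
    if login_button_placeholder.button('Login'):
        if password == password_key:
            st.session_state['authenticated'] = True
            st.session_state['usage_count'] = 0 # Reset the usage count upon new login
            password_placeholder.empty()
            login_button_placeholder.empty()
            st.success("Login successful!")
        else:
            st.error('Wrong password')
# If authenticated, show the main page content
if st.session_state['authenticated']:
    # Define a maximum number of uses
    max_uses = 3
    if st.session_state['usage_count'] < max_uses:
        # Main Contents Start from here -------------------------------
        # Count one use each time the main action runs (the "Run" button is illustrative)
        if st.button('Run'):
            st.session_state['usage_count'] += 1
    else:
        st.error("You have reached your maximum usage limit.")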

Connecting to Google Sheets

Read a file

Code


# Mount Google Drive and authenticate (Google Colab)
from google.colab import drive
drive.mount('/content/drive')
from google.colab import auth
auth.authenticate_user()
import gspread
import pandas as pd
from google.auth import default
creds, _ = default()
gc = gspread.authorize(creds)
# Open Google Sheet (file)
filename = "Google Spreadsheet File Name"
ss = gc.open(filename)
# Open the worksheet (tab) by its name
ws = ss.worksheet("sheet1")
# Get every cell as a list of rows; the first row becomes the header
rows = ws.get_all_values()
df = pd.DataFrame.from_records(rows[1:], columns=rows[0])

Gemini API

The code below is for Google Colab.

Get an API key from Google AI Studio, then open the "Secrets" menu on the left side of your Colab notebook and add the key under the name `GOOGLE_API_KEY`.

Load the API key

Code

!pip install -q -U google-generativeai
import google.generativeai as genai
from google.colab import userdata
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)

You can now use the Gemini API with your key.

Basic usage

Code

from google.generativeai import GenerativeModel
from IPython.display import Markdown
prompt = "Enter your prompt here" # @param {type:"string"}
model = GenerativeModel('gemini-pro')
response = model.generate_content(prompt)
Markdown(response.text)

This is the simplest form.

Set parameters on the model

Code

from google.generativeai import GenerativeModel, GenerationConfig
from IPython.display import Markdown
prompt = "Enter your prompt here" # @param {type:"string"}
config = GenerationConfig(
    max_output_tokens = 300,
    temperature = 0.75,
    top_p = 0.5,
    top_k = 100
)

model = GenerativeModel('gemini-pro', generation_config=config)
response = model.generate_content(prompt)
Markdown(response.text)

Give the model a persona

Code

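from google.generativeai import GenerativeModel
from IPython.display import Markdown
model = GenerativeModel('gemini-pro')
prompt = "Enter your question here" # @param {type:"string"}

# Example input/output pairs that show the model how to answer
prompt_parts =[
    "input: Who are you?",
    "output: I'm Taro, an SEO consultant.",
    "input: How old are you?",
    "output: I'm 40.",
    "input: What do you do for a living?",
    "output: I'm an SEO consultant.",
    "input: " + prompt,
    "output: ",
]

response = model.generate_content(prompt_parts)
Markdown(response.text)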

Keep a conversation history so you can chat

Code

from google.generativeai import GenerativeModel
from IPython.display import Markdown
model = GenerativeModel('gemini-pro')
chat = model.start_chat(history=[])
prompt = "Enter your message here" # @param {type:"string"}
response = chat.send_message(prompt)
Markdown(response.text)

Display the response one character at a time

Code

import time
from google.generativeai import GenerativeModel
prompt = "Briefly explain what advertising is." # @param{type:"string"}

model = GenerativeModel('gemini-pro')
# stream=True returns the response in chunks as it is generated
response = model.generate_content(prompt, stream=True)

for item in response:
    for c in item.text:
        print(c, end="")
        time.sleep(0.1)

Send multiple prompts at once

Code

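from google.generativeai import GenerativeModel
from IPython.display import Markdown
model = GenerativeModel('gemini-pro')

# One user message made up of several parts (prompts)
message = [
    {
        'role':'user',
        'parts': ["Explain how digital marketing works so that a 5-year-old could understand it",
                  "Explain how SEO works so that a 5-year-old could understand it",
                  "Explain how advertising works so that a 5-year-old could understand it"]
    }
]
response = model.generate_content(message)

Markdown(response.text)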

Set parameters on the prompt

Code

from google.generativeai import GenerativeModel, GenerationConfig
from IPython.display import Markdown
prompt = "Enter your prompt here" # @param{type:"string"}

model = GenerativeModel('gemini-pro')
config = GenerationConfig(
    max_output_tokens=300,
    temperature=0.75,
    top_p=0.3,
    top_k=100
)

response = model.generate_content(
    prompt,
    generation_config=config)

Markdown(response.text)

To set parameters for each prompt individually, pass them to `generate_content()` instead of to the model.

Configure safety settings

Code

from google.generativeai import GenerativeModel
from IPython.display import Markdown

safety_setting =[
    {
        "category": "HARM_CATEGORY_HARASSMENT",
        "threshold": "BLOCK_ONLY_HIGH"
    },
    {
        "category": "HARM_CATEGORY_HATE_SPEECH",
        "threshold":"BLOCK_MEDIUM_AND_ABOVE"
    },
    {
        "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
        "threshold":"BLOCK_LOW_AND_ABOVE"
    },
    {
        "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
        "threshold":"BLOCK_NONE"
    },
]

model = GenerativeModel(
    'gemini-pro',
    safety_settings=safety_setting)
prompt = "Enter your prompt here" # @param{type:"string"}

response = model.generate_content(prompt)
Markdown(response.text)

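Here the safety settings are applied to the model. To apply them to a single prompt instead, pass `safety_settings` to `generate_content()`.

Generate text from an image with Gemini Pro Vision

Code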

import google.generativeai as genai
from google.colab import userdata

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)

from google.generativeai import GenerativeModel

model = GenerativeModel('gemini-pro-vision')
