Using Python to run a Yahoo! image search, show the results with imgcat, and copy the chosen URL to the clipboard

- Introduction -

In recent years chat tools have developed remarkably, and communication through them has become common both within groups and within companies.

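Images are an essential part of chat communication because they allow high-context exchanges.
When everyone shares the same background knowledge, posting "a panel from a famous manga" or "an image that satirizes the situation at hand" often gets a point across more effectively than plain text.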

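This article presents a Python image-search script, run from xonsh, that streamlines the steps needed to find a well-known image and paste it into a chat: searching, choosing, and copying.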

In short, the goal is to do everything from the image search to copying the URL inside the console, as shown below.
[Screenshot: searching for images and copying a URL in the console]
The script: imgsearch_on_xonsh · GitHub

- Choosing the image search engine -

Google tends to ban you quickly if you simply crawl it, and its API is of little use for image search, so I use Yahoo! Image Search instead.

- Getting Yahoo! image search results -

First, a Python script that runs a Yahoo! image search for a given word and collects the result URLs.
It parses the HTML with BeautifulSoup and extracts only the image URLs.

from mimetypes import guess_extension
from urllib.request import urlopen, Request
from urllib.parse import quote
from bs4 import BeautifulSoup

def _request(url):
    # Send the request and return (body bytes, Content-Type); (None, None) on failure
    req = Request(url)
    try:
        with urlopen(req, timeout=5) as p:
            b_content = p.read()
            mime = p.getheader('Content-Type')
    except Exception:
        return None, None
    return b_content, mime

def _yahoo_img_search(word):
    # Return the list of image URLs found in the Yahoo! image search results
    url = 'http://image.search.yahoo.co.jp/search?n=60&p={}&search.x=1'.format(quote(word))
    byte_content, _ = _request(url)
    if byte_content is None:
        return []
    structured_page = BeautifulSoup(byte_content.decode('UTF-8'), 'html.parser')
    img_link_elems = structured_page.find_all('a', attrs={'target': 'imagewin'})
    # Remove duplicates while keeping the original order (dicts preserve insertion order)
    hrefs = [e.get('href') for e in img_link_elems if e.get('href')]
    img_urls = list(dict.fromkeys(hrefs))
    return img_urls

print(_yahoo_img_search('hoge piyo'))

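That gives us the URLs of the image results.
This alone is already usable, but I want to pick one and copy it interactively in the console, so let's go a bit further.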
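To check the extraction logic offline, or without installing BeautifulSoup, the same idea (collect the `href` of every `<a target="imagewin">`, then drop duplicates in order) can be sketched with the standard library's `html.parser`. The class and function names and the HTML snippet are made up for illustration; the real Yahoo! result markup may differ and has likely changed since this was written.

```python
from html.parser import HTMLParser


class ImageLinkParser(HTMLParser):
    """Collect href values of <a target="imagewin"> tags, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag != 'a':
            return
        attr_map = dict(attrs)
        href = attr_map.get('href')
        if attr_map.get('target') == 'imagewin' and href:
            self.hrefs.append(href)


def extract_image_urls(html):
    parser = ImageLinkParser()
    parser.feed(html)
    parser.close()
    # Drop duplicates but keep the first-seen order (dicts preserve insertion order)
    return list(dict.fromkeys(parser.hrefs))


# A hypothetical snippet; the real result page looks different
sample = '''
<a target="imagewin" href="http://example.com/a.jpg">1</a>
<a target="imagewin" href="http://example.com/b.png">2</a>
<a href="http://example.com/not-an-image-link">x</a>
<a target="imagewin" href="http://example.com/a.jpg">dup</a>
'''
print(extract_image_urls(sample))
# ['http://example.com/a.jpg', 'http://example.com/b.png']
```

`dict.fromkeys` does the same job as the `seen`/`seen_add` idiom: it keeps the first occurrence of each URL and preserves order.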

- Saving the images locally from their URLs -

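Next, download the images from the retrieved URLs and save them locally for now.

Since the end goal is to work inside the console, I assume 10 images.

Downloading is the heaviest part, so I parallelize it with multiprocessing.

Using the functions from the previous script: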

import os
import sys
from PIL import Image
from multiprocessing import Pool, cpu_count

def _save_img(t):
    # Take "id\turl", save the image under /tmp/img named by its id, return the path ('' on failure)
    img_id, url = t.split('\t', 1)
    img, mime = _request(url)
    if mime is None or img is None:
        return ''
    # Choose the file extension from the Content-Type
    ext = guess_extension(mime.split(';')[0].strip())
    if ext in ('.jpe', '.jpeg'):
        ext = '.jpg'
    if not ext:
        return ''
    # Save
    result_file = os.path.join('/tmp/img', img_id + ext)
    with open(result_file, mode='wb') as f:
        f.write(img)
    # To print progress from a multiprocessing worker, write to stdout and flush explicitly
    sys.stdout.write('.')
    sys.stdout.flush()
    return result_file


def _img_d(word):
    # Create /tmp/img if it does not exist yet
    os.makedirs('/tmp/img', exist_ok=True)
    # Get the image URLs for the word and keep the first 10
    t = _yahoo_img_search(word)
    if len(t) < 10:
        print('Not Found 10 IMG.')
        return [], []
    t = t[:10]
    # Prefix each URL with its id
    urls = [str(i) + '\t' + x for i, x in enumerate(t)]
    # Download the images in parallel (at least one worker, even on a single-CPU machine)
    with Pool(max(1, cpu_count() - 1)) as p:
        a = p.map(_save_img, urls)
    print('saved images.')
    return a, t

The script above saves the Yahoo! image search results under /tmp/img.
Clean them up afterwards.
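Downloading is mostly waiting on the network, so threads are another option for the parallel download step. A thread pool also avoids a possible pitfall of process pools: under the "spawn" start method (the default on macOS since Python 3.8), worker processes must import `__main__`, which can fail for functions defined interactively, for example in an xonsh session. Below is a minimal sketch with `concurrent.futures`; `download_all` and `ext_for` are hypothetical names, and `fetch` is assumed to behave like `_request` above, returning `(bytes or None, content_type or None)`.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from mimetypes import guess_extension


def ext_for(mime):
    """Map a Content-Type header to a file extension; '' if it is not an image."""
    if not mime:
        return ''
    mime = mime.split(';')[0].strip().lower()
    if not mime.startswith('image/'):
        return ''  # e.g. an HTML error page served instead of an image
    ext = guess_extension(mime) or ''
    # Older Pythons may return '.jpe' for image/jpeg, so normalize JPEG variants
    return '.jpg' if ext in ('.jpe', '.jpeg') else ext


def download_all(urls, fetch, out_dir, max_workers=8):
    """Download urls with a thread pool; return saved paths in input order ('' on failure)."""
    os.makedirs(out_dir, exist_ok=True)

    def save(item):
        i, url = item
        body, mime = fetch(url)
        ext = ext_for(mime)
        if body is None or not ext:
            return ''
        path = os.path.join(out_dir, str(i) + ext)
        with open(path, 'wb') as f:
            f.write(body)
        return path

    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        # Executor.map yields results in the same order as its inputs
        return list(ex.map(save, enumerate(urls)))
```

With the functions from the first script this would be called as `download_all(urls, _request, '/tmp/img')`. The content-type check also keeps non-image responses from being saved and later failing in `Image.open`.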

- Tiling the images and showing them with imgcat via xonsh -

To display images at the prompt, use imgcat.
I use iTerm2, so I use the imgcat from the following site:
www.iterm2.com

If you want to use imgcat on Linux, see the following:
github.com
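For context on how imgcat works with iTerm2: it writes the image into the terminal using iTerm2's inline images escape sequence (`ESC ] 1337 ; File=[arguments] : <base64 data> BEL`). The sketch below builds that sequence in Python. It is not the imgcat script itself, `iterm2_inline_image` and `show` are hypothetical names, it only works in terminals that implement this protocol, and it omits the extra wrapping that tmux requires.

```python
import base64
import os
import sys


def iterm2_inline_image(data, name='image'):
    """Build the iTerm2 inline-image escape sequence (OSC 1337 File=...) for raw image bytes."""
    b64_name = base64.b64encode(name.encode('utf-8')).decode('ascii')  # the name is base64-encoded too
    b64_data = base64.b64encode(data).decode('ascii')
    # ESC ] 1337 ; File=name=...;size=...;inline=1 : <base64 data> BEL
    return '\033]1337;File=name={};size={};inline=1:{}\a'.format(b64_name, len(data), b64_data)


def show(path):
    """Print the image file at path inline (iTerm2 only)."""
    with open(path, 'rb') as f:
        data = f.read()
    sys.stdout.write(iterm2_inline_image(data, os.path.basename(path)))
    sys.stdout.write('\n')
    sys.stdout.flush()
```

In iTerm2, `show('/tmp/h.jpg')` should display roughly what `imgcat /tmp/h.jpg` does.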

Each image saved by the previous script is loaded and resized to 250×250.
The images are tiled five per row into /tmp/h.jpg, which is then passed to imgcat.
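The tiling comes down to simple index arithmetic: image `j` goes to column `j % 5` and row `j // 5`, so its top-left corner is at `(250 * (j % 5), 250 * (j // 5))`. Here is a small standard-library sketch that generalizes this to any tile count and row width. The helper names `tile_offsets` and `canvas_size` are hypothetical and not part of the original script.

```python
def tile_offsets(n, cols=5, size=250):
    """Top-left (x, y) paste position for each of n square tiles, filled row by row."""
    return [((j % cols) * size, (j // cols) * size) for j in range(n)]


def canvas_size(n, cols=5, size=250):
    """(width, height) of a canvas that fits n tiles with `cols` tiles per row."""
    rows = -(-n // cols)  # ceiling division
    return (min(n, cols) * size, rows * size)


print(tile_offsets(10)[4], tile_offsets(10)[5])  # (1000, 0) (0, 250)
print(canvas_size(10))                           # (1250, 500)
```

For 10 images this yields the 1250×500 canvas with two rows of five.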

After imgcat shows the tiled image, the script waits for a number and copies the image URL for that number to the clipboard with pbcopy.

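def _imgs(word):
    # Join the arguments into one query, then search and download the images
    word = ' '.join(word)
    paths, urls = _img_d(word)
    if not paths or not urls:
        print('Bad input.')
    else:
        # Resize each image and tile them 2 rows x 5 columns
        img = Image.new('RGB', (250 * 5, 250 * 2))
        for j, path in enumerate(paths):
            if not path:
                continue  # download failed: leave this tile black
            try:
                im = Image.open(path).resize((250, 250))
            except OSError:
                continue  # not a readable image
            img.paste(im, (250 * (j % 5), 250 * (j // 5)))
        img.save("/tmp/h.jpg")
        # Show it with imgcat (xonsh subprocess mode)
        imgcat /tmp/h.jpg
        # Wait for the user's choice (numbers are 1-based)
        img_num = input('image number(1~10) : ')
        try:
            n = int(img_num)
        except ValueError:
            n = 0
        # Pipe the matching URL to the clipboard with a shell command
        if 1 <= n <= len(urls):
            echo -n @(urls[n - 1]) | pbcopy
        else:
            print('Bad input.')

aliases['imgs'] = _imgs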
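The last step relies on xonsh's subprocess syntax (`echo -n @(...) | pbcopy`). In plain Python, the same "type a 1-based number, copy that URL" step could be sketched with `subprocess`. `pick_url` and `copy_to_clipboard` are hypothetical names, and `pbcopy` exists only on macOS (on Linux a tool such as xclip would be needed instead).

```python
import subprocess


def pick_url(urls, answer):
    """Map a 1-based number typed by the user to a URL; None if the input is invalid."""
    try:
        n = int(answer.strip())
    except ValueError:
        return None
    if 1 <= n <= len(urls):
        return urls[n - 1]
    return None


def copy_to_clipboard(text):
    """Copy text to the macOS clipboard via pbcopy (like `echo -n text | pbcopy`)."""
    subprocess.run(['pbcopy'], input=text.encode('utf-8'), check=True)
```

Typical use: `url = pick_url(urls, input('image number(1~10) : '))`, then `copy_to_clipboard(url)` when `url` is not `None`. Converting from 1-based input to a 0-based index (`n - 1`) and range-checking it keeps inputs like `0` from silently selecting the last URL.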