Scraping Images from a Weibo Homepage

#!/usr/bin/env python
# encoding: utf-8
'''
@author: JHC 
@license: None
@contact: JHC000abc@gmail.com
@file: start.py
@time: 2022/07/06/ 17:06
@desc: since_id is sometimes empty on the first request; running the script again a few times usually gets past it
'''
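import os
import re
from threading import Thread

import requests


# SUB is the login session cookie copied from a logged-in browser.
# Replace it with your own value; it stops working once the session expires.
cookies = {
    'SUB': '_2A25Pz5bbDeRhGeFJ7VEV9i_FzjWIHXVsvI8TrDV8PUNbmtAKLW2hkW9Nf2FO4AI2fqiHmH8XvBCKkwDRmTSZsiW3',
}

headers = {
    'accept-language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
}

def down_load_pic(url):
    '''
    Downloader: fetch one image and save it under ./pic_src/
    '''
    res = requests.get(url, timeout=10)
    res.raise_for_status()  # do not save an HTTP error page as an image
    # Use the last path segment as the file name, dropping any query string
    name = url.split("?")[0].split("/")[-1]
    # Create the output folder if it does not exist yet
    os.makedirs("./pic_src", exist_ok=True)
    with open("./pic_src/{}".format(name), mode="wb") as fp:
        fp.write(res.content)
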
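def get_weibo_pic(since_id):
    '''
    Crawl every page after the first one through the album AJAX endpoint
    '''
    params = {
        # page_id is hard-coded to one particular account's album;
        # change it to match the homepage you are crawling
        'page_id': '1004061646239802',
        'ajax_call': '1',
        "since_id": "{}-1".format(since_id)
    }

    response = requests.get('https://weibo.com/p/aj/album/loading', params=params,
                            cookies=cookies, headers=headers, timeout=10)
    # The response contains escaped HTML; strip the backslashes once before matching
    text = response.text.replace("\\", "")
    src = re.findall('<img class="photo_pict" src="(.*?)"/></a>', text)
    for i in src:
        pic_url = "https:" + i
        t1 = Thread(target=down_load_pic, args=(pic_url,))
        t1.start()
        # Alternatively, only log the image URLs instead of downloading them:
        # with open("./weibo.txt", "a", encoding="utf-8") as fp:
        #     fp.write("https:" + i + "\n")
    # since_id of the next page; an empty list means this was the last page
    since_id = re.findall('&since_id=(.*?)-1', text)
    print("since_id = ", since_id)
    if since_id != []:
        get_weibo_pic(since_id[0])
    else:
        print("Finished crawling!!!")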


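def get_first_msg(url):
    '''
    Crawl the first page (the homepage URL entered by the user)
    '''
    response = requests.get(url, cookies=cookies, headers=headers, timeout=10)
    text = response.text.replace("\\", "")
    since_id = re.findall('&since_id=(.*?)-1', text)
    src = re.findall('<img class="photo_pict" src="(.*?)"/></a>', text)
    for i in src:
        pic_url = "https:" + i
        t2 = Thread(target=down_load_pic, args=(pic_url,))
        t2.start()

    # since_id is sometimes missing from the first page; return None
    # instead of raising IndexError so the caller can report it
    return since_id[0] if since_id else None


if __name__ == "__main__":
    url = input("Enter the Weibo homepage URL to crawl: ")
    since_id = get_first_msg(url)
    if since_id is None:
        print("No since_id found on the first page; please run the script again.")
    else:
        get_weibo_pic(since_id)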
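
The script pages through the album by calling `get_weibo_pic` recursively, once per page, so a very long album could in principle run into Python's recursion limit. The sketch below is one possible way to express the same pagination as a plain loop. It reuses the script's two regular expressions; `parse_album_page`, `crawl_album` and the injected `fetch` callable are hypothetical names introduced here so the logic can be tried without network access, and the stop-on-repeated-`since_id` guard is an added assumption, not something the original does.

```python
import re
from typing import Callable, List, Optional, Tuple

# Same patterns as the script (applied after stripping backslashes).
IMG_RE = re.compile(r'<img class="photo_pict" src="(.*?)"/></a>')
SINCE_RE = re.compile(r'&since_id=(.*?)-1')


def parse_album_page(text: str) -> Tuple[List[str], Optional[str]]:
    """Return (absolute image URLs, next since_id or None) for one page."""
    clean = text.replace("\\", "")
    urls = ["https:" + src for src in IMG_RE.findall(clean)]
    ids = SINCE_RE.findall(clean)
    return urls, (ids[0] if ids else None)


def crawl_album(fetch: Callable[[Optional[str]], str]) -> List[str]:
    """Follow since_id page by page until it is missing or repeats.

    Hypothetical contract: fetch(None) returns the first page's text,
    fetch(since_id) returns the text of the page for that since_id.
    """
    all_urls: List[str] = []
    seen = set()
    since_id: Optional[str] = None
    while True:
        urls, next_id = parse_album_page(fetch(since_id))
        all_urls.extend(urls)
        if next_id is None or next_id in seen:
            break  # last page reached, or the server keeps returning the same id
        seen.add(next_id)
        since_id = next_id
    return all_urls
```

To connect it to the script, `fetch` could call `requests.get` on the homepage URL when given `None`, and the `https://weibo.com/p/aj/album/loading` endpoint with the same `params` otherwise. The loop keeps the call stack flat no matter how many pages the album has.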

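Every image in the script gets its own `Thread`, so a single album page can start many downloads at once. A minimal sketch of a bounded alternative uses `concurrent.futures.ThreadPoolExecutor` from the standard library. The helper names `filename_from_url` and `download_all`, the default of 8 workers and the injected `save(url, path)` callable are hypothetical choices made here, not part of the original script; `save` is where the actual `requests.get` call and file write would go.

```python
import os
import posixpath
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List
from urllib.parse import urlsplit


def filename_from_url(url: str) -> str:
    """Last path segment of the URL, ignoring any ?query or #fragment."""
    name = posixpath.basename(urlsplit(url).path)
    return name or "unnamed"  # hypothetical fallback for URLs ending in "/"


def download_all(urls: Iterable[str],
                 save: Callable[[str, str], None],
                 out_dir: str = "./pic_src",
                 max_workers: int = 8) -> List[str]:
    """Run save(url, path) for every URL on a fixed-size thread pool.

    Returns the target paths in input order. An exception raised inside
    save() is re-raised here when that task's result is collected.
    """
    url_list = list(urls)  # allow generators, which can only be iterated once
    os.makedirs(out_dir, exist_ok=True)
    paths = [os.path.join(out_dir, filename_from_url(u)) for u in url_list]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(save, u, p) for u, p in zip(url_list, paths)]
        for f in futures:
            f.result()  # propagate errors from worker threads
    return paths
```

A fixed pool caps the number of simultaneous connections, which may also make the crawler less likely to be rate limited, and collecting each `Future`'s result surfaces download errors that a bare `Thread` would only print to stderr.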
Compiled by 极客之家 (Geek Home). Article link: https://www.bmabk.com/index.php/post/156917.html

Posted by 飞熊bm