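When I'm not scrolling Douyin, I like browsing Bilibili. But when I find an interesting video there and want to save it, there's no download button. Life is a process of running into problems and solving them: if you won't let me download it, then I'm going to download it anyway. Whatever you can see, a crawler can scrape, so I'll grab it with a crawler.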
Today's "victim" is the No. 1 video in Bilibili's dance-category ranking: https://www.bilibili.com/video/BV1c341187m9
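First, some analysis. Open DevTools (Inspect) and capture the network traffic. Because the video is loaded asynchronously, filter for XHR requests. Clear the captured requests, then start the video, and a lot of new requests appear. Which ones do we need? Look for something distinctive: video files are large, and Bilibili serves video and audio as separate streams, so we are looking for two requests, one for the video and one for the audio.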

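Many of the captured requests have long strings of digits in their names, some of them differ from each other, and their responses are unreadable. Looking closely at the URLs, they use an unfamiliar extension, .m4s. A quick search shows that M4S files are media segments used for HTML5 (DASH) streaming playback, and a segment can contain either video or audio. So these are the media requests we want. There are two kinds: one whose file name ends in 30077 before the extension, and one ending in 30280.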
```
https://xy221x131x191x56xy.mcdn.bilivideo.cn:4483/upgcxcode/95/78/439527895/439527895-1-30077.m4s?e=ig8euxZM2rNcNbdlhoNvNC8BqJIzNbfqXBvEqxTEto8BTrNvN0GvT90W5JZMkX_YN0MvXg8gNEV4NC8xNEV4N03eN0B5tZlqNxTEto8BTrNvNeZVuJ10Kj_g2UB02J0mN0B5tZlqNCNEto8BTrNvNC7MTX502C8f2jmMQJ6mqF2fka1mqx6gqj0eN0B599M=&uipk=5&nbs=1&deadline=1636727944&gen=playurlv2&os=mcdn&oi=3748183839&trid=00015abe094f6dc94938917e8895505bead4u&platform=pc&upsig=df692a5e07e5ceb77c354bf38dea609f&uparams=e,uipk,nbs,deadline,gen,os,oi,trid,platform&mcdnid=9001331&mid=671157361&bvc=vod&nettype=0&orderid=0,3&agrr=0&bw=181576&logo=A0000100
https://xy221x131x191x56xy.mcdn.bilivideo.cn:4483/upgcxcode/95/78/439527895/439527895_nb2-1-30280.m4s?e=ig8euxZM2rNcNbdlhoNvNC8BqJIzNbfqXBvEqxTEto8BTrNvN0GvT90W5JZMkX_YN0MvXg8gNEV4NC8xNEV4N03eN0B5tZlqNxTEto8BTrNvNeZVuJ10Kj_g2UB02J0mN0B5tZlqNCNEto8BTrNvNC7MTX502C8f2jmMQJ6mqF2fka1mqx6gqj0eN0B599M=&uipk=5&nbs=1&deadline=1636727944&gen=playurlv2&os=mcdn&oi=3748183839&trid=00015abe094f6dc94938917e8895505bead4u&platform=pc&upsig=f9b8eb9e4545f2d0d244804b9e65e780&uparams=e,uipk,nbs,deadline,gen,os,oi,trid,platform&mcdnid=9001331&mid=671157361&bvc=vod&nettype=0&orderid=0,3&agrr=0&bw=40218&logo=A0000100
```
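Telling the two requests apart, and noticing when a copied link has expired, can both be done from the URL itself. This is a minimal sketch, assuming the file name always ends in -<stream id>.m4s and that the deadline query parameter is a Unix timestamp in seconds (both inferred from the URLs above, not documented by Bilibili). `segment_info` is a hypothetical helper name:

```python
import re
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs


def segment_info(url):
    """Return the stream ID and expiry time of a captured .m4s URL.

    Assumption: the path ends in '-<id>.m4s' (e.g. ...-1-30077.m4s) and
    'deadline' is a Unix timestamp; neither is an official, documented format.
    """
    parsed = urlparse(url)
    # The stream ID is the number right before the .m4s extension
    match = re.search(r'-(\d+)\.m4s$', parsed.path)
    if match is None:
        raise ValueError(f'not an .m4s segment URL: {url}')
    # parse_qs returns a list of values for each query key
    deadline = int(parse_qs(parsed.query)['deadline'][0])
    return {
        'stream_id': int(match.group(1)),
        'expires': datetime.fromtimestamp(deadline, tz=timezone.utc),
    }
```

For the URLs above, deadline=1636727944 works out to 2021-11-12 14:39:04 UTC, so those exact links have long since stopped working and have to be captured again.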
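Now that we've found the requests, we request those two URLs. Requests like these are usually protected against hotlinking: the server checks which page the request came from, so we have to set a referer in the headers. You get the Referer the same way you get the User-Agent (from the request headers shown in DevTools), except that it must be taken from the request you are actually going to make. Also note the deadline parameter in the query string, which appears to be an expiry timestamp: copied URLs only work for a limited time, after which you need to capture fresh ones.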
```python
import requests


def get_page(url_30280, url_30077, headers):
    # Download both streams; the referer/user-agent headers get us past the anti-hotlinking check
    response1 = requests.get(url_30280, headers=headers).content
    response2 = requests.get(url_30077, headers=headers).content
    # print(response1, response2)
    with open('B站视频1.mp4', 'wb') as f:
        f.write(response1)
    with open('B站视频2.mp4', 'wb') as f:
        f.write(response2)


def main():
    url_30280 = 'https://xy221x131x191x56xy.mcdn.bilivideo.cn:4483/upgcxcode/95/78/439527895/439527895_nb2-1-30280.m4s?e=ig8euxZM2rNcNbdlhoNvNC8BqJIzNbfqXBvEqxTEto8BTrNvN0GvT90W5JZMkX_YN0MvXg8gNEV4NC8xNEV4N03eN0B5tZlqNxTEto8BTrNvNeZVuJ10Kj_g2UB02J0mN0B5tZlqNCNEto8BTrNvNC7MTX502C8f2jmMQJ6mqF2fka1mqx6gqj0eN0B599M=&uipk=5&nbs=1&deadline=1636727944&gen=playurlv2&os=mcdn&oi=3748183839&trid=00015abe094f6dc94938917e8895505bead4u&platform=pc&upsig=f9b8eb9e4545f2d0d244804b9e65e780&uparams=e,uipk,nbs,deadline,gen,os,oi,trid,platform&mcdnid=9001331&mid=671157361&bvc=vod&nettype=0&orderid=0,3&agrr=0&bw=40218&logo=A0000100'
    url_30077 = 'https://xy221x131x191x56xy.mcdn.bilivideo.cn:4483/upgcxcode/95/78/439527895/439527895-1-30077.m4s?e=ig8euxZM2rNcNbdlhoNvNC8BqJIzNbfqXBvEqxTEto8BTrNvN0GvT90W5JZMkX_YN0MvXg8gNEV4NC8xNEV4N03eN0B5tZlqNxTEto8BTrNvNeZVuJ10Kj_g2UB02J0mN0B5tZlqNCNEto8BTrNvNC7MTX502C8f2jmMQJ6mqF2fka1mqx6gqj0eN0B599M=&uipk=5&nbs=1&deadline=1636727944&gen=playurlv2&os=mcdn&oi=3748183839&trid=00015abe094f6dc94938917e8895505bead4u&platform=pc&upsig=df692a5e07e5ceb77c354bf38dea609f&uparams=e,uipk,nbs,deadline,gen,os,oi,trid,platform&mcdnid=9001331&mid=671157361&bvc=vod&nettype=0&orderid=0,3&agrr=0&bw=181576&logo=A0000100'
    header = {
        # The page the request "comes from"; checked by the anti-hotlinking rule
        'referer': 'https://www.bilibili.com/video/BV1c341187m9',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36'
    }
    get_page(url_30280, url_30077, header)


if __name__ == '__main__':
    main()
```
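`get_page` reads each response fully into memory with `.content`, which is fine for a short clip but wasteful for the large files video streams tend to be. Below is a sketch of a streaming alternative. It swaps requests for the standard library's urllib.request so it has no dependencies; `download` and its parameters are hypothetical names, and the 403 behaviour mentioned in the comment is the usual way a hotlinking check rejects a request, not something confirmed for Bilibili specifically:

```python
import shutil
import urllib.request


def download(url, path, referer, user_agent='Mozilla/5.0', chunk_size=64 * 1024):
    """Stream url to path chunk by chunk instead of buffering the whole body.

    If the server rejects the request (e.g. 403 for a missing or wrong
    Referer), urlopen raises urllib.error.HTTPError before any file is created.
    """
    request = urllib.request.Request(url, headers={
        'Referer': referer,        # page the request claims to come from
        'User-Agent': user_agent,  # browser identification copied from DevTools
    })
    with urllib.request.urlopen(request, timeout=30) as response, open(path, 'wb') as f:
        # Copy the response body to disk in fixed-size chunks
        shutil.copyfileobj(response, f, chunk_size)
    return path
```

Usage would look like download(url_30280, 'B站视频1.mp4', 'https://www.bilibili.com/video/BV1c341187m9'), once for each of the two captured URLs.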
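Run it and the contents of both requests are saved. Opening them shows that the first file, B站视频1.mp4 (from the 30280 request), contains only audio, and the second, B站视频2.mp4 (from the 30077 request), contains only video.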
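Instead of opening each file to see what it holds, ffprobe (which ships with ffmpeg) can list the stream types. A minimal sketch, assuming the ffprobe binary is on PATH; `stream_types` and `parse_stream_types` are hypothetical helper names:

```python
import subprocess


def parse_stream_types(ffprobe_output):
    """Turn ffprobe's one-codec-type-per-line output into a set like {'audio'}."""
    return {line.strip() for line in ffprobe_output.splitlines() if line.strip()}


def stream_types(path):
    # Print only each stream's codec_type, one per line, without headers
    result = subprocess.run(
        ['ffprobe', '-v', 'error', '-show_entries', 'stream=codec_type',
         '-of', 'csv=p=0', path],
        capture_output=True, text=True, check=True,
    )
    return parse_stream_types(result.stdout)
```

For the files above, stream_types('B站视频1.mp4') should report only audio and stream_types('B站视频2.mp4') only video.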
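Rename the audio file's extension to .mp3, giving B站视频1.mp3. This only changes the name, not the encoding; the merge step below should still accept it because moviepy reads files through ffmpeg, which identifies the format from the file's contents.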
Next we merge the two files. The moviepy module can combine them into one complete file:
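```python
# moviepy 1.x API; moviepy 2.x drops moviepy.editor and renames set_audio to with_audio
from moviepy.editor import VideoFileClip, AudioFileClip

video = VideoFileClip('B站视频2.mp4')   # video-only file
audio = AudioFileClip('B站视频1.mp3')   # audio-only file
movie = video.set_audio(audio)           # attach the audio track to the video
movie.write_videofile('B站视频.mp4')
```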
After it runs, open the newly generated file and you can watch the video with sound.
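moviepy's write_videofile decodes and re-encodes every frame, which can be slow. Since the two downloads are already complete video and audio streams, an alternative (not the method used in this post) is to call ffmpeg directly and only remux them. A sketch, assuming the ffmpeg binary is on PATH; `merge_command` and `merge` are hypothetical names:

```python
import subprocess


def merge_command(video_path, audio_path, out_path):
    """Build an ffmpeg command that muxes the two streams without re-encoding."""
    return [
        'ffmpeg', '-y',      # overwrite the output file if it exists
        '-i', video_path,    # input 0: video-only file
        '-i', audio_path,    # input 1: audio-only file
        '-map', '0:v:0',     # first video stream of input 0
        '-map', '1:a:0',     # first audio stream of input 1
        '-c', 'copy',        # copy both streams as-is
        out_path,
    ]


def merge(video_path, audio_path, out_path):
    subprocess.run(merge_command(video_path, audio_path, out_path), check=True)
```

With the file names used here that would be merge('B站视频2.mp4', 'B站视频1.mp3', 'B站视频.mp4'). Because nothing is re-encoded, the output keeps the original quality and is produced much faster.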
The complete code:
```python
import requests
# moviepy 1.x API; moviepy 2.x drops moviepy.editor and renames set_audio to with_audio
from moviepy.editor import VideoFileClip, AudioFileClip


def get_page(url_30280, url_30077, headers):
    # 30280 is the audio-only stream, 30077 the video-only stream
    response1 = requests.get(url_30280, headers=headers).content
    response2 = requests.get(url_30077, headers=headers).content
    # print(response1, response2)
    # Save the audio directly under the .mp3 name (the bytes themselves are not converted)
    with open('B站视频1.mp3', 'wb') as f:
        f.write(response1)
    with open('B站视频2.mp4', 'wb') as f:
        f.write(response2)
    # Load the video-only file
    video = VideoFileClip('B站视频2.mp4')
    # Load the audio-only file
    audio = AudioFileClip('B站视频1.mp3')
    # Attach the audio track to the video and write the result
    movie = video.set_audio(audio)
    movie.write_videofile('B站视频.mp4')


def main():
    url_30280 = 'https://xy221x131x191x56xy.mcdn.bilivideo.cn:4483/upgcxcode/95/78/439527895/439527895_nb2-1-30280.m4s?e=ig8euxZM2rNcNbdlhoNvNC8BqJIzNbfqXBvEqxTEto8BTrNvN0GvT90W5JZMkX_YN0MvXg8gNEV4NC8xNEV4N03eN0B5tZlqNxTEto8BTrNvNeZVuJ10Kj_g2UB02J0mN0B5tZlqNCNEto8BTrNvNC7MTX502C8f2jmMQJ6mqF2fka1mqx6gqj0eN0B599M=&uipk=5&nbs=1&deadline=1636727944&gen=playurlv2&os=mcdn&oi=3748183839&trid=00015abe094f6dc94938917e8895505bead4u&platform=pc&upsig=f9b8eb9e4545f2d0d244804b9e65e780&uparams=e,uipk,nbs,deadline,gen,os,oi,trid,platform&mcdnid=9001331&mid=671157361&bvc=vod&nettype=0&orderid=0,3&agrr=0&bw=40218&logo=A0000100'
    url_30077 = 'https://xy221x131x191x56xy.mcdn.bilivideo.cn:4483/upgcxcode/95/78/439527895/439527895-1-30077.m4s?e=ig8euxZM2rNcNbdlhoNvNC8BqJIzNbfqXBvEqxTEto8BTrNvN0GvT90W5JZMkX_YN0MvXg8gNEV4NC8xNEV4N03eN0B5tZlqNxTEto8BTrNvNeZVuJ10Kj_g2UB02J0mN0B5tZlqNCNEto8BTrNvNC7MTX502C8f2jmMQJ6mqF2fka1mqx6gqj0eN0B599M=&uipk=5&nbs=1&deadline=1636727944&gen=playurlv2&os=mcdn&oi=3748183839&trid=00015abe094f6dc94938917e8895505bead4u&platform=pc&upsig=df692a5e07e5ceb77c354bf38dea609f&uparams=e,uipk,nbs,deadline,gen,os,oi,trid,platform&mcdnid=9001331&mid=671157361&bvc=vod&nettype=0&orderid=0,3&agrr=0&bw=181576&logo=A0000100'
    header = {
        'referer': 'https://www.bilibili.com/video/BV1c341187m9',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36'
    }
    get_page(url_30280, url_30077, header)


if __name__ == '__main__':
    main()
```