This repository has been archived by the owner on Jan 9, 2022. It is now read-only.

NetEase Open Course crawl is incomplete #24

Open
TonySue2000 opened this issue Dec 6, 2018 · 12 comments

Comments

@TonySue2000

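Course link: http://open.163.com/special/opencourse/daishu.html
The course has 35 lessons, but only the first 10 are crawled, as shown:
image
I picked other courses at random, and again only the first 10 lessons were crawled.
I then pressed F12 on the course page to investigate, as shown:
image
In the course list, the lessons after the first ten have to be expanded manually, and the page source on the right also shows that the earlier and later lessons are not stored in the same place. The maintainers may have overlooked this, which would explain the incomplete crawl.
I hope a maintainer can find the time to fix this. Many thanks.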
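
To illustrate the suspected cause, here is a minimal sketch of collecting lesson links from every matching list on a page rather than only the first one. The markup, the `ul` tag, the `lessons` class name, and the function names are all hypothetical; the real open.163.com page structure is only visible in the screenshots above and may differ.

```python
from html.parser import HTMLParser


class LessonLinkCollector(HTMLParser):
    """Collect <a href> values found inside every matching container."""

    def __init__(self, container_tag, container_class):
        super().__init__()
        self.container_tag = container_tag      # hypothetical, e.g. "ul"
        self.container_class = container_class  # hypothetical, e.g. "lessons"
        self.depth = 0   # > 0 while we are inside a matching container
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == self.container_tag:
            classes = (attrs.get("class") or "").split()
            # Enter a matching container, or track nesting inside one.
            if self.depth or self.container_class in classes:
                self.depth += 1
        elif tag == "a" and self.depth and attrs.get("href"):
            if attrs["href"] not in self.links:
                self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == self.container_tag and self.depth:
            self.depth -= 1


def collect_lesson_links(html, container_tag="ul", container_class="lessons"):
    """Return lesson links from all matching containers, in page order."""
    parser = LessonLinkCollector(container_tag, container_class)
    parser.feed(html)
    return parser.links
```

With this approach a second, initially collapsed list is treated the same as the visible one, so lessons 11 onward would not be dropped as long as both lists share the assumed class.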

@SigureMo
Contributor

SigureMo commented Dec 6, 2018

Review open_163
I found and fixed this a couple of days ago; your copy probably hasn't been updated :joy:

@TonySue2000
Author

image
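After updating, it fails with the error shown above.
The upper part of the output captures all the videos (up to the final review), but the lower part stops after lesson 19.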

@SigureMo
Contributor

SigureMo commented Dec 9, 2018

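On Windows the video_url of lesson 19 does get resolved (the video is fairly large, so I did not download it to check whether it plays), but on Mac and Linux it resolves to a string of '\x10\x10\x10', which causes the crash at this point. I don't have much time at the moment; if this is urgent, I suggest resolving the course links on Windows for now.

The URL resolved for this lesson is indeed different from the others: it is a video under /open-movie/ (all the others are under /movie/). That should not affect cross-platform behavior, though. I don't know what the problem is yet; I'll look into it when I have time :joy:~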

@SigureMo
Contributor

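I just re-examined the problem and found that it has nothing to do with the platform (and of course it shouldn't). It was simply that my Windows setup defaults to sd quality, and since I was only using that Windows machine remotely at the time, I didn't take this into account.

The hd flv link for this course indeed cannot be resolved. Following the order of primary keys shd, hd, sd and secondary keys mp4, flv, the hd flv video is tried first, and that happens to be the one that cannot be resolved. With the sd parameter, the sd mp4 video is tried first, and that one happens to resolve. This made me think it worked on Windows but not on Linux... how silly of me.

I roughly reworked the logic: when a link cannot be resolved, the crawler now automatically switches to another mode (quality and format).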
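
The fallback described above might look roughly like the sketch below. This is not the code in open_163.py: `resolve` is a hypothetical callable standing in for whatever decrypts a video URL, and treating any result that does not start with `http` as unresolved (which would cover a string of '\x10' bytes) is an assumption.

```python
from itertools import product

QUALITIES = ("shd", "hd", "sd")   # primary keys, as listed in the comment
FORMATS = ("mp4", "flv")          # secondary keys, as listed in the comment


def candidate_modes(preferred_quality, preferred_format):
    """Yield the preferred (quality, format) pair first, then all others."""
    first = (preferred_quality, preferred_format)
    yield first
    for mode in product(QUALITIES, FORMATS):
        if mode != first:
            yield mode


def is_usable(url):
    """Assumed check: a resolved link must look like an HTTP(S) URL."""
    return isinstance(url, str) and url.startswith(("http://", "https://"))


def resolve_with_fallback(resolve, preferred_quality="hd", preferred_format="flv"):
    """Try each mode in turn; return (quality, format, url) or None."""
    for quality, fmt in candidate_modes(preferred_quality, preferred_format):
        try:
            url = resolve(quality, fmt)  # hypothetical resolver callable
        except Exception:
            continue  # this mode is missing or failed; try the next one
        if is_usable(url):
            return quality, fmt, url
    return None
```

Returning `None` instead of raising leaves it to the caller to decide whether to skip the lesson or abort the whole crawl.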

@TonySue2000
Author

image
Uh... index out of range???
I have already updated; I'm using @SigureMo's fork.

@SigureMo
Contributor

SigureMo commented Dec 17, 2018

Which course link?

@TonySue2000
Author

Still the first link I used:
http://open.163.com/special/opencourse/daishu.html

@SigureMo
Contributor

I tested on Ubuntu Server and there were no problems at all...

@TonySue2000
Author

PS C:\Users\47999\Desktop\course-crawler-master> python mooc.py http://open.163.com/special/opencourse/daishu.html
Traceback (most recent call last):
  File "C:\Users\47999\Desktop\course-crawler-master\mooc\open_163.py", line 11, in <module>
    from Crypto.Cipher import AES
ModuleNotFoundError: No module named 'Crypto'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "mooc.py", line 107, in <module>
    main()
  File "mooc.py", line 84, in main
    from mooc import open_163
  File "C:\Users\47999\Desktop\course-crawler-master\mooc\open_163.py", line 13, in <module>
    from crypto.Cipher import AES # pip install pycryptodome
  File "C:\Users\47999\AppData\Local\Programs\Python\Python37-32\lib\site-packages\crypto\Cipher\__init__.py", line 27, in <module>
    from Crypto.Cipher._mode_ecb import _create_ecb_cipher
ModuleNotFoundError: No module named 'Crypto'
PS C:\Users\47999\Desktop\course-crawler-master> pip install Crypto
Requirement already satisfied: Crypto in c:\users\47999\appdata\local\programs\python\python37-32\lib\site-packages (1.4.1)
Requirement already satisfied: shellescape in c:\users\47999\appdata\local\programs\python\python37-32\lib\site-packages (from Crypto) (3.4.1)
Requirement already satisfied: Naked in c:\users\47999\appdata\local\programs\python\python37-32\lib\site-packages (from Crypto) (0.1.31)
Requirement already satisfied: requests in c:\users\47999\appdata\local\programs\python\python37-32\lib\site-packages (from Naked->Crypto) (2.21.0)
Requirement already satisfied: pyyaml in c:\users\47999\appdata\local\programs\python\python37-32\lib\site-packages (from Naked->Crypto) (3.13)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in c:\users\47999\appdata\local\programs\python\python37-32\lib\site-packages (from requests->Naked->Crypto) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\47999\appdata\local\programs\python\python37-32\lib\site-packages (from requests->Naked->Crypto) (2018.11.29)
Requirement already satisfied: idna<2.9,>=2.5 in c:\users\47999\appdata\local\programs\python\python37-32\lib\site-packages (from requests->Naked->Crypto) (2.8)
Requirement already satisfied: urllib3<1.25,>=1.21.1 in c:\users\47999\appdata\local\programs\python\python37-32\lib\site-packages (from requests->Naked->Crypto) (1.24.1)
PS C:\Users\47999\Desktop\course-crawler-master> pip install pycryptodome
Requirement already satisfied: pycryptodome in c:\users\47999\appdata\local\programs\python\python37-32\lib\site-packages (3.7.2)
Today I switched to Windows and, of all things, ran into a dependency problem???
But I do have the Crypto package installed...

@SigureMo
Contributor

> Traceback (most recent call last):
>   File "mooc.py", line 107, in <module>
>     main()
>   File "mooc.py", line 84, in main
>     from mooc import open_163
>   File "C:\Users\47999\Desktop\course-crawler-master\mooc\open_163.py", line 13, in <module>
>     from crypto.Cipher import AES # pip install pycryptodome
>   File "C:\Users\47999\AppData\Local\Programs\Python\Python37-32\lib\site-packages\crypto\Cipher\__init__.py", line 27, in <module>
>     from Crypto.Cipher._mode_ecb import _create_ecb_cipher
> ModuleNotFoundError: No module named 'Crypto'

On Windows, don't use Crypto; use pycryptodome...

@TonySue2000
Author

> Traceback (most recent call last):
>   File "mooc.py", line 107, in <module>
>     main()
>   File "mooc.py", line 84, in main
>     from mooc import open_163
>   File "C:\Users\47999\Desktop\course-crawler-master\mooc\open_163.py", line 13, in <module>
>     from crypto.Cipher import AES # pip install pycryptodome
>   File "C:\Users\47999\AppData\Local\Programs\Python\Python37-32\lib\site-packages\crypto\Cipher\__init__.py", line 27, in <module>
>     from Crypto.Cipher._mode_ecb import _create_ecb_cipher
> ModuleNotFoundError: No module named 'Crypto'
>
> On Windows, don't use Crypto; use pycryptodome...

As the second- and third-to-last lines of my output show, I have already installed it...

@SigureMo
Contributor

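After installing and uninstalling N times, I found that installing crypto first and then pycryptodome leaves pycryptodome broken as well. I'm not sure why yet, but uninstalling both and then reinstalling only pycryptodome fixes it: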

pip uninstall pycryptodome crypto
pip install pycryptodome

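As I recall, back when I installed crypto first and then pycryptodome, crypto worked, but I never tested that carefully. It now looks like only Crypto works (on both *nix and Windows). I'll keep an eye on this and make some changes to the try/except. Thanks for finding the problem ╰( ´・ω・)つ──☆✿✿✿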
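
One possible shape for the reworked import is sketched below. It is not the project's actual change. It relies only on the fact that pycryptodome provides the `Crypto` package, while the separate `crypto` distribution on PyPI is an unrelated tool; on a case-insensitive filesystem the two directory names may collide, which would be consistent with the traceback above. `load_aes` and `REINSTALL_HINT` are hypothetical names, and the `import_module` parameter exists only so the function can be exercised without pycryptodome installed.

```python
import importlib

REINSTALL_HINT = (
    "Could not import Crypto.Cipher.AES. If both 'crypto' and "
    "'pycryptodome' are installed, try:\n"
    "    pip uninstall pycryptodome crypto\n"
    "    pip install pycryptodome"
)


def load_aes(import_module=importlib.import_module):
    """Return the Crypto.Cipher.AES module, or raise ImportError with a hint."""
    try:
        # Only the capitalized package name is tried; no lowercase fallback.
        return import_module("Crypto.Cipher.AES")
    except ImportError as exc:
        raise ImportError(REINSTALL_HINT) from exc
```

Calling `AES = load_aes()` near the top of a module would then surface the reinstall instructions directly instead of a chained traceback.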
