forked from TILPJ/data_donhwi
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.html
380 lines (314 loc) · 21.2 KB
/
README.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
<!DOCTYPE html><html><head>
<title>README</title>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" href="file:///c:\Users\howsi\.vscode\extensions\shd101wyy.markdown-preview-enhanced-0.5.21\node_modules\@shd101wyy\mume\dependencies\katex\katex.min.css">
<style>
/**
* prism.js Github theme based on GitHub's theme.
* @author Sam Clarke
*/
code[class*="language-"],
pre[class*="language-"] {
color: #333;
background: none;
font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace;
text-align: left;
white-space: pre;
word-spacing: normal;
word-break: normal;
word-wrap: normal;
line-height: 1.4;
-moz-tab-size: 8;
-o-tab-size: 8;
tab-size: 8;
-webkit-hyphens: none;
-moz-hyphens: none;
-ms-hyphens: none;
hyphens: none;
}
/* Code blocks */
pre[class*="language-"] {
padding: .8em;
overflow: auto;
/* border: 1px solid #ddd; */
border-radius: 3px;
/* background: #fff; */
background: #f5f5f5;
}
/* Inline code */
:not(pre) > code[class*="language-"] {
padding: .1em;
border-radius: .3em;
white-space: normal;
background: #f5f5f5;
}
.token.comment,
.token.blockquote {
color: #969896;
}
.token.cdata {
color: #183691;
}
.token.doctype,
.token.punctuation,
.token.variable,
.token.macro.property {
color: #333;
}
.token.operator,
.token.important,
.token.keyword,
.token.rule,
.token.builtin {
color: #a71d5d;
}
.token.string,
.token.url,
.token.regex,
.token.attr-value {
color: #183691;
}
.token.property,
.token.number,
.token.boolean,
.token.entity,
.token.atrule,
.token.constant,
.token.symbol,
.token.command,
.token.code {
color: #0086b3;
}
.token.tag,
.token.selector,
.token.prolog {
color: #63a35c;
}
.token.function,
.token.namespace,
.token.pseudo-element,
.token.class,
.token.class-name,
.token.pseudo-class,
.token.id,
.token.url-reference .token.variable,
.token.attr-name {
color: #795da3;
}
.token.entity {
cursor: help;
}
.token.title,
.token.title .token.punctuation {
font-weight: bold;
color: #1d3e81;
}
.token.list {
color: #ed6a43;
}
.token.inserted {
background-color: #eaffea;
color: #55a532;
}
.token.deleted {
background-color: #ffecec;
color: #bd2c00;
}
.token.bold {
font-weight: bold;
}
.token.italic {
font-style: italic;
}
/* JSON */
.language-json .token.property {
color: #183691;
}
.language-markup .token.tag .token.punctuation {
color: #333;
}
/* CSS */
code.language-css,
.language-css .token.function {
color: #0086b3;
}
/* YAML */
.language-yaml .token.atrule {
color: #63a35c;
}
code.language-yaml {
color: #183691;
}
/* Ruby */
.language-ruby .token.function {
color: #333;
}
/* Markdown */
.language-markdown .token.url {
color: #795da3;
}
/* Makefile */
.language-makefile .token.symbol {
color: #795da3;
}
.language-makefile .token.variable {
color: #183691;
}
.language-makefile .token.builtin {
color: #0086b3;
}
/* Bash */
.language-bash .token.keyword {
color: #0086b3;
}
/* highlight */
pre[data-line] {
position: relative;
padding: 1em 0 1em 3em;
}
pre[data-line] .line-highlight-wrapper {
position: absolute;
top: 0;
left: 0;
background-color: transparent;
display: block;
width: 100%;
}
pre[data-line] .line-highlight {
position: absolute;
left: 0;
right: 0;
padding: inherit 0;
margin-top: 1em;
background: hsla(24, 20%, 50%,.08);
background: linear-gradient(to right, hsla(24, 20%, 50%,.1) 70%, hsla(24, 20%, 50%,0));
pointer-events: none;
line-height: inherit;
white-space: pre;
}
pre[data-line] .line-highlight:before,
pre[data-line] .line-highlight[data-end]:after {
content: attr(data-start);
position: absolute;
top: .4em;
left: .6em;
min-width: 1em;
padding: 0 .5em;
background-color: hsla(24, 20%, 50%,.4);
color: hsl(24, 20%, 95%);
font: bold 65%/1.5 sans-serif;
text-align: center;
vertical-align: .3em;
border-radius: 999px;
text-shadow: none;
box-shadow: 0 1px white;
}
pre[data-line] .line-highlight[data-end]:after {
content: attr(data-end);
top: auto;
bottom: .4em;
}html body{font-family:"Helvetica Neue",Helvetica,"Segoe UI",Arial,freesans,sans-serif;font-size:16px;line-height:1.6;color:#333;background-color:#fff;overflow:initial;box-sizing:border-box;word-wrap:break-word}html body>:first-child{margin-top:0}html body h1,html body h2,html body h3,html body h4,html body h5,html body h6{line-height:1.2;margin-top:1em;margin-bottom:16px;color:#000}html body h1{font-size:2.25em;font-weight:300;padding-bottom:.3em}html body h2{font-size:1.75em;font-weight:400;padding-bottom:.3em}html body h3{font-size:1.5em;font-weight:500}html body h4{font-size:1.25em;font-weight:600}html body h5{font-size:1.1em;font-weight:600}html body h6{font-size:1em;font-weight:600}html body h1,html body h2,html body h3,html body h4,html body h5{font-weight:600}html body h5{font-size:1em}html body h6{color:#5c5c5c}html body strong{color:#000}html body del{color:#5c5c5c}html body a:not([href]){color:inherit;text-decoration:none}html body a{color:#08c;text-decoration:none}html body a:hover{color:#00a3f5;text-decoration:none}html body img{max-width:100%}html body>p{margin-top:0;margin-bottom:16px;word-wrap:break-word}html body>ul,html body>ol{margin-bottom:16px}html body ul,html body ol{padding-left:2em}html body ul.no-list,html body ol.no-list{padding:0;list-style-type:none}html body ul ul,html body ul ol,html body ol ol,html body ol ul{margin-top:0;margin-bottom:0}html body li{margin-bottom:0}html body li.task-list-item{list-style:none}html body li>p{margin-top:0;margin-bottom:0}html body .task-list-item-checkbox{margin:0 .2em .25em -1.8em;vertical-align:middle}html body .task-list-item-checkbox:hover{cursor:pointer}html body blockquote{margin:16px 0;font-size:inherit;padding:0 15px;color:#5c5c5c;background-color:#f0f0f0;border-left:4px solid #d6d6d6}html body blockquote>:first-child{margin-top:0}html body blockquote>:last-child{margin-bottom:0}html body hr{height:4px;margin:32px 0;background-color:#d6d6d6;border:0 none}html body table{margin:10px 0 15px 0;border-collapse:collapse;border-spacing:0;display:block;width:100%;overflow:auto;word-break:normal;word-break:keep-all}html body table th{font-weight:bold;color:#000}html body table td,html body table th{border:1px solid #d6d6d6;padding:6px 13px}html body dl{padding:0}html body dl dt{padding:0;margin-top:16px;font-size:1em;font-style:italic;font-weight:bold}html body dl dd{padding:0 16px;margin-bottom:16px}html body code{font-family:Menlo,Monaco,Consolas,'Courier New',monospace;font-size:.85em !important;color:#000;background-color:#f0f0f0;border-radius:3px;padding:.2em 0}html body code::before,html body code::after{letter-spacing:-0.2em;content:"\00a0"}html body pre>code{padding:0;margin:0;font-size:.85em !important;word-break:normal;white-space:pre;background:transparent;border:0}html body .highlight{margin-bottom:16px}html body .highlight pre,html body pre{padding:1em;overflow:auto;font-size:.85em !important;line-height:1.45;border:#d6d6d6;border-radius:3px}html body .highlight pre{margin-bottom:0;word-break:normal}html body pre code,html body pre tt{display:inline;max-width:initial;padding:0;margin:0;overflow:initial;line-height:inherit;word-wrap:normal;background-color:transparent;border:0}html body pre code:before,html body pre tt:before,html body pre code:after,html body pre tt:after{content:normal}html body p,html body blockquote,html body ul,html body ol,html body dl,html body pre{margin-top:0;margin-bottom:16px}html body kbd{color:#000;border:1px solid #d6d6d6;border-bottom:2px solid #c7c7c7;padding:2px 4px;background-color:#f0f0f0;border-radius:3px}@media print{html body{background-color:#fff}html body h1,html body h2,html body h3,html body h4,html body h5,html body h6{color:#000;page-break-after:avoid}html body blockquote{color:#5c5c5c}html body pre{page-break-inside:avoid}html body table{display:table}html body img{display:block;max-width:100%;max-height:100%}html body pre,html body code{word-wrap:break-word;white-space:pre}}.markdown-preview{width:100%;height:100%;box-sizing:border-box}.markdown-preview .pagebreak,.markdown-preview .newpage{page-break-before:always}.markdown-preview pre.line-numbers{position:relative;padding-left:3.8em;counter-reset:linenumber}.markdown-preview pre.line-numbers>code{position:relative}.markdown-preview pre.line-numbers .line-numbers-rows{position:absolute;pointer-events:none;top:1em;font-size:100%;left:0;width:3em;letter-spacing:-1px;border-right:1px solid #999;-webkit-user-select:none;-moz-user-select:none;-ms-user-select:none;user-select:none}.markdown-preview pre.line-numbers .line-numbers-rows>span{pointer-events:none;display:block;counter-increment:linenumber}.markdown-preview pre.line-numbers .line-numbers-rows>span:before{content:counter(linenumber);color:#999;display:block;padding-right:.8em;text-align:right}.markdown-preview .mathjax-exps .MathJax_Display{text-align:center !important}.markdown-preview:not([for="preview"]) .code-chunk .btn-group{display:none}.markdown-preview:not([for="preview"]) .code-chunk .status{display:none}.markdown-preview:not([for="preview"]) .code-chunk .output-div{margin-bottom:16px}.scrollbar-style::-webkit-scrollbar{width:8px}.scrollbar-style::-webkit-scrollbar-track{border-radius:10px;background-color:transparent}.scrollbar-style::-webkit-scrollbar-thumb{border-radius:5px;background-color:rgba(150,150,150,0.66);border:4px solid rgba(150,150,150,0.66);background-clip:content-box}html body[for="html-export"]:not([data-presentation-mode]){position:relative;width:100%;height:100%;top:0;left:0;margin:0;padding:0;overflow:auto}html body[for="html-export"]:not([data-presentation-mode]) .markdown-preview{position:relative;top:0}@media screen and (min-width:914px){html body[for="html-export"]:not([data-presentation-mode]) .markdown-preview{padding:2em calc(50% - 457px + 2em)}}@media screen and (max-width:914px){html body[for="html-export"]:not([data-presentation-mode]) .markdown-preview{padding:2em}}@media screen and (max-width:450px){html body[for="html-export"]:not([data-presentation-mode]) .markdown-preview{font-size:14px !important;padding:1em}}@media print{html body[for="html-export"]:not([data-presentation-mode]) #sidebar-toc-btn{display:none}}html body[for="html-export"]:not([data-presentation-mode]) #sidebar-toc-btn{position:fixed;bottom:8px;left:8px;font-size:28px;cursor:pointer;color:inherit;z-index:99;width:32px;text-align:center;opacity:.4}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] #sidebar-toc-btn{opacity:1}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc{position:fixed;top:0;left:0;width:300px;height:100%;padding:32px 0 48px 0;font-size:14px;box-shadow:0 0 4px rgba(150,150,150,0.33);box-sizing:border-box;overflow:auto;background-color:inherit}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc::-webkit-scrollbar{width:8px}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc::-webkit-scrollbar-track{border-radius:10px;background-color:transparent}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc::-webkit-scrollbar-thumb{border-radius:5px;background-color:rgba(150,150,150,0.66);border:4px solid rgba(150,150,150,0.66);background-clip:content-box}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc a{text-decoration:none}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc ul{padding:0 1.6em;margin-top:.8em}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc li{margin-bottom:.8em}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .md-sidebar-toc ul{list-style-type:none}html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .markdown-preview{left:300px;width:calc(100% - 300px);padding:2em calc(50% - 457px - 150px);margin:0;box-sizing:border-box}@media screen and (max-width:1274px){html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .markdown-preview{padding:2em}}@media screen and (max-width:450px){html body[for="html-export"]:not([data-presentation-mode])[html-show-sidebar-toc] .markdown-preview{width:100%}}html body[for="html-export"]:not([data-presentation-mode]):not([html-show-sidebar-toc]) .markdown-preview{left:50%;transform:translateX(-50%)}html body[for="html-export"]:not([data-presentation-mode]):not([html-show-sidebar-toc]) .md-sidebar-toc{display:none}
/* Please visit the URL below for more information: */
/* https://shd101wyy.github.io/markdown-preview-enhanced/#/customize-css */
</style>
</head>
<body for="html-export">
<div class="mume markdown-preview ">
<h1 class="mume-header" id="%EC%84%A4%EC%B9%98%ED%95%9C-%ED%8C%A8%ED%82%A4%EC%A7%80">설치한 패키지</h1>
<ul>
<li>python 3.8.10</li>
<li>django 3.2.3</li>
<li>requests</li>
<li>beautifulsoup4</li>
<li>psycopg2-binary</li>
<li>python-dotenv</li>
</ul>
<h1 class="mume-header" id="%ED%99%98%EA%B2%BD">환경</h1>
<p>python-dotenv로 환경변수를 분리해서 관리(secret_key, database 정보)</p>
<h1 class="mume-header" id="69-%EC%88%98%EC%A0%95%ED%95%9C-%EB%B6%80%EB%B6%84">(6/9) 수정한 부분</h1>
<p>conf > <a href="http://settings.py">settings.py</a></p>
<ul>
<li>import os 추가</li>
<li>secret key 부분 수정</li>
<li>installed_apps 부분을 django_apps와 project_apps로 구분함</li>
<li>time_zone을 Asia/Seoul로 변경</li>
<li>그 외 동일</li>
</ul>
<p>clipper > <a href="http://admin.py">admin.py</a></p>
<ul>
<li>models를 admin 사이트에 등록</li>
</ul>
<p>clipper > <a href="http://models.py">models.py</a></p>
<ul>
<li>Site 모델 추가 (site 테이블의 데이터는 admin 페이지에서 밀어넣었습니다)</li>
<li>Course 모델에 site foreign 필드 추가</li>
<li>Chapter, Section 모델에서 content -> name으로 필드명 변경</li>
<li>Section 모델 name 필드를 Textfield -> CharField로 변경</li>
</ul>
<p><a href="http://manage.py">manage.py</a></p>
<ul>
<li>python-dotenv로 분리한 .env 파일을 불러오기 위한 로직 추가</li>
</ul>
<p>.env 파일 추가 (secret key와 db 정보가 들어있습니다)</p>
<p><img src="https://user-images.githubusercontent.com/80886445/121350172-4c111580-c965-11eb-814e-00062ef2e1d4.png" alt="스크린샷 2021-06-09 오후 8 57 22"></p>
<p><a href="http://start.py">start.py</a> 파일 수정</p>
<ul>
<li>'_' 레포에 스크래핑 샘플 폴더에 제가 구현해 놓았던 <a href="http://main.py">main.py</a> 파일을 이름을 바꾸고 안에 로직을 추가했습니다)</li>
</ul>
<p>inflearn_save.py 파일 추가</p>
<ul>
<li><a href="http://inflearn.py">inflearn.py</a> 파일에서 스크랩 후 최종으로 리턴된 리스트를 넘겨받아 DB에 저장하는 파일입니다)</li>
<li><a href="http://inflean.py">inflean.py</a> 파일에 db에 저장하는 로직까지 넣기에는 몸집이 커질것 같아 파일을 분리했습니다</li>
</ul>
<p><a href="http://inflearn.py">inflearn.py</a> 파일은 수정하지 않았습니당</p>
<h1 class="mume-header" id="%EC%8B%A4%ED%96%89-%EB%B0%A9%EB%B2%95">실행 방법</h1>
<p><a href="http://start.py">start.py</a> 파일을 실행하거나, 터미널에서 python <a href="http://start.py">start.py</a> 를 입력하여 실행합니다</p>
<hr>
<h1 class="mume-header" id="%EB%B3%80%EA%B2%BD-%EC%82%AC%ED%95%AD%EB%8A%90%EB%A6%BF%EB%8A%90%EB%A6%BF-2021-06-13">변경 사항(느릿느릿 2021-06-13)</h1>
<ul>
<li><a href="http://start.py">start.py</a> -> start_clipper.py<br>
cli 명령 예시:</li>
</ul>
<pre data-role="codeBlock" data-info="linux" class="language-linux"><code>>> python start_clipper.py -n 인프런
</code></pre><ul>
<li>inflearn_save.py -> clipper/inflearn_save.py</li>
</ul>
<blockquote>
<p>Course 중복 저장 방지<br>
Constructor 이용한 DB 저장코드 통일(<a href="https://docs.djangoproject.com/en/3.2/topics/db/examples/many_to_one/">https://docs.djangoproject.com/en/3.2/topics/db/examples/many_to_one/</a>)</p>
</blockquote>
<ul>
<li><a href="http://inflearn.py">inflearn.py</a> -> clipper/inflearn.py</li>
</ul>
<blockquote>
<p>requests.compat.urljoin 추가함.(url manipulator)<br>
URL 분리<br>
tested at 2021-06-13 and found some errors like</p>
</blockquote>
<pre data-role="codeBlock" data-info class="language-"><code>psycopg2.errors.StringDataRightTruncation: value too long for type character varying(200)
DataError: value too long for type character varying(200)
</code></pre><hr>
<h1 class="mume-header" id="%EB%B3%80%EA%B2%BD-%EC%82%AC%ED%95%AD%EB%8A%90%EB%A6%BF%EB%8A%90%EB%A6%BF-2021-06-14">변경 사항(느릿느릿 2021-06-14)</h1>
<ul>
<li>inflearn_save.py -> course_save.py 로 공용 세이버로 만듦</li>
<li>nomadcoders courses 카테고리 저장 확인 <code>python start_clipper.py -n nomad</code></li>
</ul>
<hr>
<h1 class="mume-header" id="%EB%B3%80%EA%B2%BD-%EC%82%AC%ED%95%AD%EB%8A%90%EB%A6%BF%EB%8A%90%EB%A6%BF-2021-06-16">변경 사항(느릿느릿 2021-06-16)</h1>
<ul>
<li>유데미의 thumbnail link는 길이가 디폴트값 200을 넘는게 많다. <strong>urlfield의 max_length=500 으로 변경</strong>하고 makemigrations, migrate.</li>
<li>유데미는 영어강의도 수집했다. (페이지10 이하로만)</li>
<li>유데미 개발 카테고리 이외에 IT, 디자인, 음악 등 컴퓨터 프로그램 관련 강의를 중심으로 추가해 봤다.</li>
<li>get_soup_from_page 메서드에 화면 스크롤 액션을 추가함.(유데미 강의 목록의 썸네일 이미지 링크를 온전히 페이지에 렌더링시키기 위함.)</li>
<li>한 페이지당 크롤링 소요 시간은 WAIT * 2 이상이며 한 강의당 1.0625 페이지이므로 강의 1000개를 한 번에 스크래이핑 하는데 소요되는 시간은 최소 WAIT*2125 = 10625초 = 177분 = 3시간 이다.</li>
<li>병렬 스크래이핑 구현을 고민해봐야 할듯.</li>
</ul>
<hr>
<h1 class="mume-header" id="references">References</h1>
<ul>
<li>Xpath cheatsheet : <a href="https://devhints.io/xpath#indexing">https://devhints.io/xpath#indexing</a></li>
<li>장고 모델 URLField : <a href="https://docs.djangoproject.com/en/3.2/ref/models/fields/">https://docs.djangoproject.com/en/3.2/ref/models/fields/</a></li>
</ul>
</div>
</body></html>