Add files via upload

Duckling 2023-01-23 22:30:56 +06:00 committed by GitHub
parent 300b5e9f0d
commit e613c14408
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
7 changed files with 41 additions and 16 deletions

@ -1,6 +1,7 @@
---
label: Devtools detector
order: 3
order: 997
icon: /static/tool.png
---
**TL;DR**: You are going to get fucked by sites detecting your devtools, the easiest bypass for this is using [a web sniffer extension](https://chrome.google.com/webstore/detail/web-sniffer/ndfgffclcpdbgghfgkmooklaendohaef?hl=en)
@ -98,6 +99,4 @@ At line 23
// Bypasses
pref("devtools.console.bypass", true);
pref("devtools.debugger.bypass", true);
```
### Next up: [Why your requests fail](disguising_your_scraper)
```


@ -1,6 +1,7 @@
---
label: Disguising your scrapers
order: 4
order: 996
icon: /static/incognito.png
---
# Disguising your scrapers
@ -189,6 +190,4 @@ print(bypassed_response.hcaptcha_token)
Keep in mind that if there is no ribbon/token, there is no way of reasonably accessing it.
In any case, this is how you, as a decent developer, handle the response properly.
### Next up: [Finding video links](finding_video_links)
In any case, this is how you, as a decent developer, handle the response properly.


@ -1,6 +1,7 @@
---
label: Finding video links
order: 5
order: 995
icon: codescan
---
# Finding video links


@ -0,0 +1,25 @@
---
label: Getting Started
order: 1000
icon: milestone
---
# Requests-based scraping tutorial
You want to start scraping? Well, this guide will teach you, and not some baby Selenium scraping. This guide only uses raw requests and has examples in both Python and Kotlin. Only basic programming knowledge in one of those languages is required to follow along.
If you find any aspect of this guide confusing, please open an issue about it and I will try to improve things.
If you do not know programming at all, then this guide will __not__ help you; learn programming first! Real scraping cannot be done by copy-pasting with a vague understanding.
[!badge variant="light" text="Step 0"] [Starting scraping from zero](starting)
[!badge variant="light" text="Step 1"] [Properly scraping JSON APIs often found on sites](using_apis)
[!badge variant="light" text="Step 2"] [Evading developer tools detection when scraping](devtools_detectors)
[!badge variant="light" text="Step 3"] [Why your requests fail and how to fix them](disguising_your_scraper)
[!badge variant="light" text="Step 4"] [Finding links and scraping videos](finding_video_links)
Once you've read and understood the concepts behind scraping, take a look at [a provider for CloudStream](https://github.com/recloudstream/cloudstream-extensions/blob/master/VidstreamBundle/src/main/kotlin/com/lagradost/VidEmbedProvider.kt#L4). I added tons of comments to make every aspect of writing CloudStream providers clear. Even if you're not planning on contributing to CloudStream, looking at the code may help.

devs/scraping/index.yml

@ -0,0 +1,3 @@
icon: /static/scraper.png
label: Scraping
expanded: false


@ -1,8 +1,8 @@
---
label: Starting
order: 1
order: 999
icon: rocket
---
Scraping is just downloading a webpage and getting the wanted information from it.
As a start you can scrape the README.md
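The two steps named above (download, then extract) can be sketched in Python with only the standard library. This is a minimal illustration, not the guide's own code: `download`, `extract_first_heading`, and the sample text are hypothetical names and data, and the download helper assumes the page is UTF-8.

```python
import re
from typing import Optional
from urllib.request import urlopen


def download(url: str) -> str:
    """Step 1: download a page and return its body as text (assumes UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def extract_first_heading(markdown: str) -> Optional[str]:
    """Step 2: pull out the wanted information, here the first '# ' heading."""
    match = re.search(r"^# (.+)$", markdown, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


# Offline sample standing in for a downloaded README.md (made-up content).
sample = "# Example project\n\nSome description text.\n"
print(extract_first_heading(sample))
```

To try it against a real page, pass the result of `download(...)` with a raw README URL of your choice to `extract_first_heading`.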
@ -219,5 +219,4 @@ fun main() {
val description = descriptionRegex.find(response.text)?.groups?.get(1)?.value
println(description)
}
```
### Next up: [Properly scraping JSON apis](../using_apis.md)
```


@ -1,6 +1,7 @@
---
label: Using APIs
order: 2
order: 998
icon: /static/api.png
---
### About
@ -168,5 +169,3 @@ One thing to note is that you don't need to add all of the json key/value pairs
### Note
Even though we set `DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES` to `false`, it will still error on missing properties. <br/>
If a JSON response may or may not include some info, make those properties nullable in the structure you build.
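The note above is about Jackson on Kotlin. For readers following along in Python, a rough analogue of the same idea is sketched below, assuming a hand-written dataclass: unknown keys are dropped, optional keys default to `None`, and a missing required key still raises an error. `Episode`, `from_json`, and the payload are hypothetical and not taken from any real API.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class Episode:
    # Hypothetical structure for illustration only.
    title: str                          # required: a missing key raises TypeError
    description: Optional[str] = None   # may be absent in the JSON, so it is nullable


def from_json(data: Dict[str, Any]) -> Episode:
    """Build an Episode, ignoring any keys the dataclass does not declare."""
    known = {f.name for f in fields(Episode)}
    return Episode(**{key: value for key, value in data.items() if key in known})


# "views" is unknown and ignored; "description" is missing and becomes None.
payload = json.loads('{"title": "Pilot", "views": 10}')
episode = from_json(payload)
print(episode)
```

Leaving out `title` makes `from_json` raise `TypeError`, which mirrors the "still errors on missing properties" behavior described above.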
### Next up: [Evading developer tools detection](devtools_detectors)