mirror of https://github.com/recloudstream/csdocs.git
synced 2024-08-14 22:46:50 +00:00

Add files via upload

parent 300b5e9f0d
commit e613c14408

7 changed files with 41 additions and 16 deletions

@@ -1,6 +1,7 @@
 ---
 label: Devtools detector
-order: 3
+order: 997
+icon: /static/tool.png
 ---
 
 **TL;DR**: You are going to get fucked by sites detecting your devtools. The easiest bypass for this is using [a web sniffer extension](https://chrome.google.com/webstore/detail/web-sniffer/ndfgffclcpdbgghfgkmooklaendohaef?hl=en).

@@ -98,6 +99,4 @@ At line 23
 // Bypasses
 pref("devtools.console.bypass", true);
 pref("devtools.debugger.bypass", true);
 ```
-
-### Next up: [Why your requests fail](disguising_your_scraper)

@@ -1,6 +1,7 @@
 ---
 label: Disguising your scrapers
-order: 4
+order: 996
+icon: /static/incognito.png
 ---
 
 # Disguising your scrapers

@@ -189,6 +190,4 @@ print(bypassed_response.hcaptcha_token)
 
 Keep in mind that if there is no ribbon/token, there is no way of reasonably accessing it.
 
 In any case, this is how you, as a decent developer, handle the response properly.
-
-### Next up: [Finding video links](finding_video_links)

@@ -1,6 +1,7 @@
 ---
 label: Finding video links
-order: 5
+order: 995
+icon: codescan
 ---
 
 # Finding video links

25 devs/scraping/gettingstarted.md (Normal file)
@@ -0,0 +1,25 @@
+---
+label: Getting Started
+order: 1000
+icon: milestone
+---
+
+# Requests-based scraping tutorial
+
+You want to start scraping? Well, this guide will teach you, and not through some baby Selenium scraping. This guide only uses raw requests and has examples in both Python and Kotlin. Only basic programming knowledge in one of those languages is required to follow along.
+
+If you find any aspect of this guide confusing, please open an issue about it and I will try to improve things.
+
+If you do not know programming at all, then this guide will __not__ help you. Learn programming first! Real scraping cannot be done by copy-pasting with a vague understanding.
+
+[!badge variant="light" text="Step 0"] [Starting scraping from zero](starting)
+
+[!badge variant="light" text="Step 1"] [Properly scraping JSON APIs often found on sites](using_apis)
+
+[!badge variant="light" text="Step 2"] [Evading developer tools detection when scraping](devtools_detectors)
+
+[!badge variant="light" text="Step 3"] [Why your requests fail and how to fix them](disguising_your_scraper)
+
+[!badge variant="light" text="Step 4"] [Finding links and scraping videos](finding_video_links)
+
+Once you've read and understood the concepts behind scraping, take a look at [a provider for CloudStream](https://github.com/recloudstream/cloudstream-extensions/blob/master/VidstreamBundle/src/main/kotlin/com/lagradost/VidEmbedProvider.kt#L4). I added tons of comments to make every aspect of writing CloudStream providers clear. Even if you're not planning on contributing to CloudStream, looking at the code may help.

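The getting-started page says the guide uses only raw requests rather than a browser driver such as Selenium. Below is a minimal Python sketch of that idea, assuming only the standard library's `urllib.request`. The URL and User-Agent string are placeholders. `build_request` and `fetch_text` are hypothetical helper names. Nothing is sent over the network unless you call `fetch_text` yourself.

```python
import urllib.request

# Placeholder values for illustration only; substitute the page you are scraping.
URL = "https://example.com/"
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url: str) -> urllib.request.Request:
    """Build a plain GET request that sends a browser-like User-Agent (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download the page and decode the body, assuming UTF-8 when no charset is declared."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    req = build_request(URL)
    # urllib stores header names via str.capitalize(), hence "User-agent".
    print(req.get_method(), req.full_url, req.get_header("User-agent"))
```

Keeping request construction separate from sending makes it easy to inspect or tweak headers before hitting the site. That is usually where "why your requests fail" problems start.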
3 devs/scraping/index.yml (Normal file)
@@ -0,0 +1,3 @@
+icon: /static/scraper.png
+label: Scraping
+expanded: false

@@ -1,8 +1,8 @@
 ---
 label: Starting
-order: 1
+order: 999
+icon: rocket
 ---
-
 Scraping is just downloading a webpage and getting the wanted information from it.
 As a start you can scrape the README.md
 

@@ -219,5 +219,4 @@ fun main() {
 val description = descriptionRegex.find(response.text)?.groups?.get(1)?.value
 println(description)
 }
 ```
-### Next up: [Properly scraping JSON apis](../using_apis.md)

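The Kotlin line `descriptionRegex.find(response.text)?.groups?.get(1)?.value` returns the first capture group, or null when nothing matches. Here is a rough Python equivalent using `re.search` and an explicit `None` check. The pattern `DESCRIPTION_REGEX` and the sample HTML are hypothetical, because the original regex is not shown in this diff. The point of the sketch is the null-safe handling, not the exact pattern.

```python
import re
from typing import Optional

# Hypothetical pattern: captures the content of a <meta name="description"> tag.
# The regex used in the original Kotlin example is not shown in this diff.
DESCRIPTION_REGEX = re.compile(
    r'<meta\s+name="description"\s+content="([^"]*)"', re.IGNORECASE
)


def extract_description(html: str) -> Optional[str]:
    """Return the first capture group, or None when the pattern does not match.

    Mirrors Kotlin's `regex.find(text)?.groups?.get(1)?.value`.
    """
    match = DESCRIPTION_REGEX.search(html)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = '<head><meta name="description" content="A tiny test page"></head>'
    print(extract_description(sample))           # A tiny test page
    print(extract_description("<head></head>"))  # None
```

Returning `None` instead of raising lets the caller decide whether a missing description is an error, the same way the Kotlin `?.` chain does.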
@@ -1,6 +1,7 @@
 ---
 label: Using APIs
-order: 2
+order: 998
+icon: /static/api.png
 ---
 
 ### About

@@ -168,5 +169,3 @@ One thing to note is that you don't need to add all of the json key/value pairs
 ### Note
 Even though we set `DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES` to `false`, it will still error on missing properties. <br/>
 If the JSON may or may not include some info, make those properties nullable in the structure you build.
-
-### Next up: [Evading developer tools detection](devtools_detectors)

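The note is about Kotlin and Jackson. As a rough Python analog (an assumption for illustration, not the guide's own code), here is the same idea with `json` and a dataclass. Unknown keys are dropped, similar to `FAIL_ON_UNKNOWN_PROPERTIES` set to `false`. Keys that may be missing get an `Optional` type with a `None` default. A required key that is missing still raises an error. The `Episode` class and its fields are hypothetical.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Episode:
    # Hypothetical fields for illustration.
    title: str                     # assumed to always be present, so no default
    episode: Optional[int] = None  # may be missing, so nullable with a default
    poster: Optional[str] = None   # may be missing, so nullable with a default


def parse_episode(raw: str) -> Episode:
    data = json.loads(raw)
    known = {f.name for f in fields(Episode)}
    # Drop unknown keys; a missing required key still raises TypeError.
    return Episode(**{k: v for k, v in data.items() if k in known})


if __name__ == "__main__":
    print(parse_episode('{"title": "Pilot", "extra": "ignored"}'))
    # Episode(title='Pilot', episode=None, poster=None)
```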