oceancholic/eXtractor


                '||' '|'   .                             .                   
          ....    || |   .||.  ... ..   ....     ....  .||.    ...   ... ..  
        .|...||    ||     ||    ||' '' '' .||  .|   ''  ||   .|  '|.  ||' '' 
        ||        | ||    ||    ||     .|' ||  ||       ||   ||   ||  ||     
         '|...' .|   ||.  '|.' .||.    '|..'|'  '|...'  '|.'  '|..|' .||.    

!!! Attention !!!

Using this script violates the X Terms of Service and may result in the PERMANENT SUSPENSION of your account. Using your personal account with this script is NOT RECOMMENDED!!! Create a disposable account.
YOU HAVE BEEN WARNED!!!

This is not meant to be a production-grade script, just a weekend project to make a free alternative to the X API.
The X web app loads tweets and their replies in a weird way: it renders only a certain number of tweets at a time, depending on their height (images, GIFs, etc.), to keep AI scrapers from harvesting data from the site. Advertisements add some complexity too.


I tried to find an optimal scroll size and wait time to minimize errors and extract the most tweets in the shortest possible time, yet it is a lot slower (about 220 seconds per ~200 posts) than pulling data from the legit X API. There may still be errors and there is plenty of room for improvement, so contributions, collaborations, and ideas are very welcome!
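
The scroll-and-wait idea can be sketched roughly as follows. This is not the script's actual loop, only a minimal illustration under assumptions: `extract_visible` is a hypothetical callback that returns `(item_id, item)` pairs for whatever is currently rendered, and the `scroll_px`, `wait_s`, and `max_idle_rounds` defaults are placeholder values, not the tuned ones.

```python
import time


def scroll_and_collect(driver, extract_visible, limit=200, scroll_px=1200,
                       wait_s=1.0, max_idle_rounds=5):
    """Scroll step by step and gather unique items until `limit` is reached.

    `driver` is any object with Selenium's `execute_script` method.
    `extract_visible` is a hypothetical callback (an assumption of this sketch).
    """
    seen = {}
    idle_rounds = 0
    while len(seen) < limit and idle_rounds < max_idle_rounds:
        before = len(seen)
        for item_id, item in extract_visible(driver):
            # Posts are re-rendered as they enter and leave the viewport,
            # so the same post can show up in several rounds; keep the first.
            seen.setdefault(item_id, item)
        # Count rounds that produced nothing new so the loop can give up.
        idle_rounds = idle_rounds + 1 if len(seen) == before else 0
        driver.execute_script("window.scrollBy(0, arguments[0]);", scroll_px)
        time.sleep(wait_s)  # give the page time to render the next batch
    return list(seen.values())[:limit]
```

A smaller `scroll_px` lowers the chance of skipping posts but costs more rounds, which is the trade-off behind the timing figure above.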

The license file is for the awesome Selenium library.


install & requirements

The eXtractor script expects a Linux OS, the Chromium browser, and a matching ChromeDriver installed at
"/usr/local/bin/chromedriver".
Please see the Selenium documentation for instructions.

The Python Selenium bindings can be installed with:

    pip3 install selenium
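
To check that the pieces fit together, a driver can be started against that ChromeDriver path with a few lines of Selenium 4. This is a sketch, not the script's own setup code: `chrome_arguments` and `make_driver` are hypothetical helper names, and the window size and the `--headless=new` switch (understood by recent Chromium builds) are assumptions.

```python
CHROMEDRIVER_PATH = "/usr/local/bin/chromedriver"  # location named in this README


def chrome_arguments(headless=False):
    """Return Chromium command-line switches (an assumed set for this sketch)."""
    args = ["--window-size=1280,1024"]
    if headless:
        args.append("--headless=new")
    return args


def make_driver(headless=False, driver_path=CHROMEDRIVER_PATH):
    """Start Chromium through the ChromeDriver binary at `driver_path`."""
    # Imported lazily so chrome_arguments() stays usable without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    return webdriver.Chrome(service=Service(executable_path=driver_path),
                            options=options)
```

If `make_driver()` raises an error about the driver executable or a version mismatch, the ChromeDriver at that path most likely does not match the installed Chromium.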
  

Usage :

eXtractor has 4 modes of operation:

  • Search For Hashtags and Keywords
  • Get Profile data from supplied profile link
  • Get Replies to a Tweet from supplied tweet link
  • Get News from the Explore tab

There are also 3 methods of login:

  • using a credential file (format below)
  • entering your credentials manually
  • cookies

Using the Credential File

The credential file consists of 3 lines:

  1. First line is the username without '@'
  2. Second line is the email address
  3. Third line is the password in plain text

Example Credential File

    shitposter_2024
    email@address.com
    v3ry53Cre7P@s5W0rD!
  

Supply the path to your credential file with the -c or --credential flag, along with one of the mode flags (--search, --profile, --replies, or --news; see below):

    python3 eXtractor.py -c myFile.txt
  

Using a credential file is useful when you manage multiple accounts for scraping data. Please be aware that keeping credentials in a plain-text file is not a good practice, which is another good reason to use a "disposable account" with this script.
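
For reference, the three-line format can be read with a few lines of Python. This is only a sketch of the format described above, not the script's own parser; `Credentials` and `read_credentials` are hypothetical names.

```python
from pathlib import Path
from typing import NamedTuple


class Credentials(NamedTuple):
    username: str
    email: str
    password: str


def read_credentials(path):
    """Read username, email, and password from the first three lines of `path`."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    if len(lines) < 3:
        raise ValueError("credential file must have 3 lines: username, email, password")
    # Be forgiving if the username was written with a leading '@'.
    username = lines[0].strip().lstrip("@")
    email = lines[1].strip()
    password = lines[2]  # kept as-is: whitespace may be part of a password
    return Credentials(username, email, password)
```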


Manually Enter

If you do not have a saved cookie and didn't provide a credential file, the script will kindly ask for your credentials interactively. (P.S. The password you type won't be shown on the command line.)
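
A hidden password prompt like this is typically done with the standard library's `getpass` module. The sketch below assumes that approach; `prompt_credentials` and the prompt texts are hypothetical, not taken from the script.

```python
import getpass


def prompt_credentials(ask=input, ask_secret=getpass.getpass):
    """Ask for username, email, and password; the password is not echoed."""
    username = ask("Username (without @): ").strip().lstrip("@")
    email = ask("Email: ").strip()
    password = ask_secret("Password: ")  # getpass hides the typed characters
    return username, email, password
```

The two prompt functions are parameters only so the helper can be exercised without a terminal.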


Cookies

Once you successfully log in with one of the above methods, the script saves your cookies into a file named "xitter" in the same directory. (There is no special reason to call the cookie file "xitter", so feel free to change it for your personal enjoyment.) The next time you use the script you don't need to supply a credential file; it will read the cookies and authorize.
However, you usually don't want your auth cookies lying around unencrypted, so it's a good idea to delete them when you are finished. There is a catch, though: if you don't use cookies, X will constantly notify you that "There was a Login from a new device....", and this is the main reason I had the urge to add the cookie feature.
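
Saving and restoring cookies with Selenium comes down to `get_cookies()` and `add_cookie()`. The sketch below assumes the cookies are stored as JSON, which may differ from the format the script actually writes; `save_cookies` and `load_cookies` are hypothetical names.

```python
import json
import os

COOKIE_FILE = "xitter"  # file name mentioned in this README


def save_cookies(driver, path=COOKIE_FILE):
    """Write the session cookies to `path` as JSON."""
    # A newly created file gets 0600 permissions (readable by the owner only).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(driver.get_cookies(), handle)


def load_cookies(driver, path=COOKIE_FILE):
    """Add saved cookies to the session; return False if there is no cookie file."""
    if not os.path.exists(path):
        return False
    with open(path, encoding="utf-8") as handle:
        for cookie in json.load(handle):
            driver.add_cookie(cookie)
    return True
```

Selenium only accepts a cookie for the domain the browser is currently on, so the driver has to open a page on x.com before `load_cookies` is called.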

Searching


Searching is done with the -s or --search flag followed by a search term. If you are going to search for a hashtag, write your term in quotes, like "#TwitterRocksElonSucks" (otherwise the shell treats # as the start of a comment); keywords do not require quotes.

Additionally, you can cap or increase the number of results extracted by adding the -n flag followed by a number. The default is ~200.

Another option is the -t or --top flag. If you provide this flag, the script searches "Top Tweets"; otherwise you'll get the latest ones (beware of porn bots).

Results are saved in JSON format in the same directory.
If you get bored while eXtractor is busy doing its thing, just hit CTRL+C to exit; it will save the tweets already downloaded before shutting down.

  python3 eXtractor.py -s "#viraltweets" -n 1000 -t
  ^
  --this will search top 1000 tweets with #viraltweets hashtag. 
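
The save-on-CTRL+C behaviour can be pictured as catching `KeyboardInterrupt` around the collection loop and writing the JSON file in a `finally` block. This is a sketch of the general pattern, not the script's code; `collect_and_save` is a hypothetical name, and `items` stands for any iterator that yields extracted tweets.

```python
import itertools
import json


def collect_and_save(items, path, limit=200):
    """Consume `items` until `limit` or CTRL+C, then write what was gathered."""
    collected = []
    try:
        for item in itertools.islice(items, limit):
            collected.append(item)
    except KeyboardInterrupt:
        print(f"\nInterrupted, saving {len(collected)} items...")
    finally:
        # Runs on normal completion and on CTRL+C alike.
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(collected, handle, ensure_ascii=False, indent=2)
    return collected
```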

Profile Data

If you need information from a certain profile, such as description, location, or join date, use the -p or --profile flag followed by a profile link. The number and top flags are irrelevant in this mode.

  python3 eXtractor.py -p https://x.com/debian

Replies

When you need to get the responses to a certain tweet or unroll a thread, use the -r or --replies flag followed by a tweet link. The replies mode is not recursive, so it only gets the first level of replies. You can provide the -n flag to change the number of replies to get (default ~200).

Whatever number you provide, it only returns actual first-level replies.

Replies are saved in the same directory in JSON format, same as the search function, with a "replies" prefix in the file name.

  python3 eXtractor.py -r https://x.com/AnonymousUK2022/status/1825436338683781120
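
Profile and tweet links share the same shape, so a link can be checked before the browser is even started. The helper below is a hypothetical sketch using only the standard library; whether the script validates links this way is an assumption.

```python
from urllib.parse import urlparse

X_HOSTS = {"x.com", "www.x.com", "twitter.com", "www.twitter.com"}


def parse_x_link(url):
    """Split a link into (username, post_id); post_id is None for profile links."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in X_HOSTS:
        raise ValueError(f"not an X link: {url}")
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) == 1:
        return parts[0], None  # e.g. https://x.com/debian
    if len(parts) >= 3 and parts[1] == "status" and parts[2].isdigit():
        return parts[0], parts[2]  # e.g. https://x.com/<user>/status/<id>
    raise ValueError(f"unrecognized X link: {url}")
```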

News

In this mode the script gets the news posted on the Explore tab. It only gets the headline, time, and category of each news item.

  python3 eXtractor.py --news -n 50
  ^
  --this will get the top 50 news items from the Explore tab.

Headless

You can also provide a --headless flag to run without opening a browser window.
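
Taken together, the flags described in this README map onto an `argparse` definition along these lines. This is a reconstruction from the documentation, not the script's actual parser: the `dest` name `number`, the help texts, and treating the four modes as mutually exclusive are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the flags documented in this README."""
    parser = argparse.ArgumentParser(prog="eXtractor.py")
    parser.add_argument("-c", "--credential", metavar="FILE",
                        help="path to a 3-line credential file")
    parser.add_argument("-n", dest="number", type=int, default=200,
                        help="number of results to extract (default ~200)")
    parser.add_argument("-t", "--top", action="store_true",
                        help="search top tweets instead of the latest ones")
    parser.add_argument("--headless", action="store_true",
                        help="run without opening a browser window")
    # Assumption: exactly one mode of operation is chosen per run.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-s", "--search", metavar="TERM")
    mode.add_argument("-p", "--profile", metavar="LINK")
    mode.add_argument("-r", "--replies", metavar="LINK")
    mode.add_argument("--news", action="store_true")
    return parser
```

For example, `build_parser().parse_args(["-s", "#viraltweets", "-n", "1000", "-t"])` yields `search="#viraltweets"`, `number=1000`, and `top=True`.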


Notes :


  • Why I wrote this script is in the banner, and I believe gestures like these, made without expectations, strengthen the ties between us; we are stronger together in this mad world. At least try to be a good person. If it doesn't make sense to you, just ignore it.
  • Despite trying to be a good person, I think it was a nonsensical and ridiculous decision by X to hide the read API behind a paywall. So maybe this helps put pressure on X decision makers to revert their changes. People get creative when they are constrained. Read some history!!!
  • Finally, as I said before, this script is a work in progress; I just barely got it working, so any help and ideas are welcome. Stay safe...
