-
Hi there,

I visit a site that uses JavaScript to send a request to the server's back end, using authentication that can't be bypassed. I need to read the response body, which is JSON. I imagined this would be fairly straightforward, but I honestly have not been able to solve it.

My idea was to access the response body using the network method get_response_body, but Nodriver just hangs without raising an exception no matter what I try. I must be fundamentally misunderstanding Nodriver's interaction with CDP. Any pointers would be much appreciated.

Here is what I have so far:

    async def scrape_data(self, example: str) -> dict[str, Any]:
        await self._main_tab.send(nodriver.cdp.network.enable())
        self._main_tab.add_handler(nodriver.cdp.network.ResponseReceived, self.receive_handler)
        await self._main_tab.get(f"{self._manager.config.url_base}{example}&locale=en_US")
        await asyncio.sleep(3600)

    async def receive_handler(self, event: nodriver.cdp.network.ResponseReceived):
        if "example.cdn.com" in event.response.url and event.type_ == nodriver.cdp.network.ResourceType.FETCH:
            body, base = self._main_tab.send(nodriver.cdp.network.get_response_body(request_id=event.request_id))
-
I want to preface this by saying idk wtf I'm doing. I also want to clarify what you are doing: you want to modify the browser object source code to start collecting network responses (like the Network tab in the Chrome DevTools panel)?

Then I'm lost on the condition in the if statement in receive_handler. I don't know what types ResponseReceived can be, but it seems like under that condition you want to return the body of the HTTP request. I also don't get what … Also, I assume you probably want a network.disable() somewhere, right? (I get that it may be missing because the code is unfinished, just asking.)
-
I managed to solve it. I had to create an asyncio task to make it work. Here is my current quick and dirty solution:

    async def scrape_data(self, example: str) -> dict[str, Any]:
        # Network event handlers
        await self._main_tab.send(nodriver.cdp.network.enable())
        self._main_tab.add_handler(nodriver.cdp.network.ResponseReceived, self._receive_handler)
        await self._main_tab.get(f"{self._manager.config.url_base}{example}&locale=en_US")

        # Check that the correct response is received by polling until _receive_task is set or a timeout occurs
        timeout = 10
        start_time = asyncio.get_running_loop().time()
        while self._receive_task is None:
            await asyncio.sleep(0.2)
            if asyncio.get_running_loop().time() - start_time > timeout:
                raise ScrapingError("Timeout waiting for response to create _receive_task.")

        # Wait for the response body to be fetched
        try:
            await asyncio.wait_for(self._receive_task, timeout=timeout)
        except asyncio.TimeoutError:
            raise ScrapingError("Timeout waiting for response body.")

        # Reset before next scrape and return data
        data = self._json_data
        self._reset_scrape()
        return data

    async def _receive_handler(self, event: nodriver.cdp.network.ResponseReceived):
        if "example.cdn.com" in event.response.url and event.type_ == nodriver.cdp.network.ResourceType.FETCH:
            # Start the response body fetching in a separate task
            self._receive_task = asyncio.create_task(self._handle_response(event.request_id))

    async def _handle_response(self, request_id: nodriver.cdp.network.RequestId):
        # Create the command (a generator) and send it through the tab
        response_body_command = nodriver.cdp.network.get_response_body(request_id=request_id)
        response = await self._main_tab.send(response_body_command)
        body, is_base64_encoded = response

        # Check encoding and decode if necessary
        if is_base64_encoded:
            body = base64.b64decode(body)

        # Load the JSON and check if it is stored in a list
        data = json.loads(body)
        if isinstance(data, list):
            data = data[0]
        self._json_data = data

The implementation can obviously be streamlined. But I take it that sending a CDP command with Nodriver's send method from inside an event handler only works if the command is sent from a separate asyncio task, so that the handler itself returns right away. Hope it may be of use to someone.
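One way the polling loop could be streamlined is to have the spawned task resolve an asyncio.Future that the scrape method awaits directly. The following is only a rough sketch of that pattern, not tested against Nodriver itself: FakeTab, Scraper, decode_body and the "getResponseBody:..." command string are hypothetical stand-ins so the example runs offline, and in real code the handler would be registered with add_handler and the command would come from nodriver.cdp.network.get_response_body.

```python
from __future__ import annotations

import asyncio
import base64
import json
from typing import Any


def decode_body(body: str, is_base64_encoded: bool) -> Any:
    """Decode a (body, base64Encoded) pair into parsed JSON."""
    raw = base64.b64decode(body) if is_base64_encoded else body
    return json.loads(raw)


class FakeTab:
    """Hypothetical stand-in for a Nodriver tab (assumption, not the real API)."""

    async def send(self, command: str) -> tuple[str, bool]:
        await asyncio.sleep(0.01)  # simulate the CDP round trip
        payload = json.dumps([{"id": 1}]).encode()
        return base64.b64encode(payload).decode(), True


class Scraper:
    def __init__(self, tab: FakeTab) -> None:
        self._tab = tab
        self._tasks: set[asyncio.Task] = set()  # keep references so tasks are not garbage collected
        self._result: asyncio.Future[Any] | None = None

    async def on_response(self, request_id: str) -> None:
        # Return immediately; the body is fetched in a separate task.
        task = asyncio.create_task(self._fetch_body(request_id))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)

    async def _fetch_body(self, request_id: str) -> None:
        try:
            body, is_b64 = await self._tab.send(f"getResponseBody:{request_id}")
            data = decode_body(body, is_b64)
        except Exception as exc:
            # Surface failures to the waiting scrape() call instead of losing them.
            if self._result is not None and not self._result.done():
                self._result.set_exception(exc)
            return
        if self._result is not None and not self._result.done():
            self._result.set_result(data)

    async def scrape(self, timeout: float = 10.0) -> Any:
        self._result = asyncio.get_running_loop().create_future()
        # In real code, navigating the tab would trigger on_response via the event handler.
        await self.on_response("req-1")
        return await asyncio.wait_for(self._result, timeout)
```

With a future, a single asyncio.wait_for covers both "no matching response arrived" and "the body fetch took too long", and an exception raised while fetching the body reaches the caller instead of disappearing inside the task.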