Running Splash on EC2

Submitted 3 years, 8 months ago
Ticket #104
Views 358
Language/Framework Python
Priority Urgent
Status Closed

I have deployed a Scrapy project on an EC2 instance. I need Splash for scraping some websites. Splash works fine on my local machine, but on EC2 it returns a 504 timeout error. I have changed max_timeout to 3600, but the result is the same.

The EC2 instance has 2 GB of RAM.

Any help or suggestion is appreciated.

Submitted on Aug 09, 20

2 Answers

@Shovon, did you check the EC2 outbound policy and make sure it doesn't have any restrictions? You could try pinging the website from your EC2 instance to see whether you get a response. If you don't, you need to allow the host in the AWS firewall.
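The connectivity check suggested above could be sketched in Python as follows. Note the assumptions: this opens a TCP connection rather than sending an ICMP ping, since many sites drop ping traffic while still serving HTTP, and the helper name `can_connect` and its defaults are hypothetical, chosen only for illustration.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; it is not part of Scrapy or Splash.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers DNS failures, refused connections, and timeouts.
        return False
```

Running `can_connect("example.com", 443)` on the EC2 instance (with the real target host in place of the placeholder `example.com`) and getting `False` would point towards an outbound rule or DNS problem rather than a Splash problem.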

Submitted 3 years, 8 months ago

Yes.

I also have some websites in the scraping project that don't need Splash, and they are working fine.

- shovon 3 years, 8 months ago

Can you share your splash settings on here or github?

- Vengat 3 years, 8 months ago

https://github.com/scrapy-plugins/scrapy-splash/issues/28 Check this. Try adjusting the concurrency or setting a timeout in the request params.
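To illustrate what "a timeout in the request params" can look like, here is a minimal sketch that builds a request URL for Splash's `render.html` HTTP endpoint with the `timeout`, `wait`, and `images` arguments. The base address `http://localhost:8050`, the helper name `build_render_url`, and the default values are assumptions for illustration. The 90-second ceiling is likewise assumed to match the Splash default; a per-request `timeout` above the server's `--max-timeout` is expected to be rejected, so the sketch checks it up front.

```python
from urllib.parse import urlencode

# Assumed Splash address; adjust the host and port for your deployment.
SPLASH_BASE = "http://localhost:8050"


def build_render_url(target_url, timeout=60, wait=2, images=True,
                     max_timeout=90, splash_base=SPLASH_BASE):
    """Build a render.html URL with an explicit per-request timeout.

    Hypothetical helper; `max_timeout` should mirror the value the
    Splash server was started with (assumed here to be 90 seconds).
    """
    if timeout > max_timeout:
        raise ValueError(
            f"timeout={timeout} exceeds the server's max timeout of {max_timeout}"
        )
    params = {
        "url": target_url,      # page that Splash should render
        "timeout": timeout,     # overall render timeout, in seconds
        "wait": wait,           # seconds to wait after the page loads
        "images": int(images),  # 1 downloads images, 0 skips them
    }
    return f"{splash_base}/render.html?{urlencode(params)}"
```

With scrapy-splash, the same keys would typically go into the `args` dictionary of a `SplashRequest` instead of a hand-built URL. Keeping `images` enabled matters here because the images are the reason Splash is being used, but it also makes each render heavier on a 2 GB instance.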

- Vengat 3 years, 8 months ago

I set the timeout to 90; same problem.

And I need the images. I am only running Splash for the images.

- shovon 3 years, 8 months ago

So what I understand is that once Splash is enabled, even small sites don't work, right? Can you share the code details so we can investigate further?

- Vengat 3 years, 8 months ago

Hello everyone, the problem was solved using this package:

https://github.com/TeamHG-Memex/aquarium

Submitted 3 years, 8 months ago

Excellent. Can you share more detail on how this package resolved the issue, so that it would be helpful for everyone?

- Vengat 3 years, 8 months ago

Can you close the ticket?

- Vengat 3 years, 8 months ago

