We did talk about handing over to something like PhantomJS for indexing this sort of site, but it seemed like it would become a never-ending maintenance job, because it would be difficult to generalise between sites.
Maybe we should maintain a reference list of uncrawlable sites in the repo?
I think they could be easily retrieved from the resulting index, by finding the domains that have only 1-2 pages with a 200 response. Anyway, I'm recording it here because I ran into it, and as a reference for future testing.
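A rough sketch of that heuristic, in case it helps. It assumes the index can be exported as `(url, status)` pairs; that export format, the function name `suspect_domains`, and the `max_ok_pages` threshold are all made up for illustration and are not part of the crawler.

```python
from collections import Counter
from urllib.parse import urlparse


def suspect_domains(pages, max_ok_pages=2):
    """Return domains with at most `max_ok_pages` pages that answered 200.

    `pages` is an iterable of (url, status) tuples. This is a hypothetical
    export of the index, not the crawler's real data model.
    """
    # Count only successfully fetched pages, grouped by host name.
    ok_counts = Counter(
        urlparse(url).netloc for url, status in pages if status == 200
    )
    # Domains where the crawl stopped after one or two pages are candidates
    # for "links were not discoverable" (for example, JavaScript navigation).
    return sorted(d for d, n in ok_counts.items() if n <= max_ok_pages)


if __name__ == "__main__":
    sample = [
        ("https://guidelines.canceraustralia.gov.au/", 200),
        ("https://example.org/", 200),
        ("https://example.org/a", 200),
        ("https://example.org/b", 200),
    ]
    print(suspect_domains(sample))
```

Domains with no 200 responses at all are not reported, since those are more likely plain fetch failures. Small sites that really do have only one or two pages would also show up, so the output is a list of candidates to check by hand.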
https://guidelines.canceraustralia.gov.au/
- relies on JavaScript; links are not in href="" attributes, so the crawler doesn't see them
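To illustrate the failure mode, here is a minimal sketch of an href-only link extractor using Python's standard `html.parser`. It is not the crawler's actual code, and the sample markup is invented for illustration rather than copied from the site above.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects non-empty href values from <a> tags, like a simple crawler."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip missing or empty href values; there is nothing to follow.
            if name == "href" and value:
                self.hrefs.append(value)


def extract_hrefs(html):
    parser = HrefCollector()
    parser.feed(html)
    return parser.hrefs


if __name__ == "__main__":
    # Hypothetical markup: a normal link, then a JavaScript-driven one.
    print(extract_hrefs('<a href="/about">About</a>'))
    print(extract_hrefs('<a href="" onclick="loadSection(3)">Section 3</a>'))
```

The second call finds no links, because the navigation target only exists inside the `onclick` handler. An extractor like this would stop after the landing page on such a site.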