Does not respect robots.txt or rel="nofollow" directives #29
Comments
Hi, thanks for raising this. We are running some new code and this is obviously a bug!
Hi Chris. Please call Sharyn Clarkson, Assistant Secretary, Online Services Branch, Department of Finance. Her number is xxxxxxxxx. She is expecting your call. Nathan Wall
I have spoken to Sharyn, and redacted her number from your post @nathan-w.
Thanks Chris!
Well that was exciting. I'll circulate a post-incident report (through Sharyn) on Monday afternoon or Tuesday, after gathering facts and then consulting our DTA masters. As well as fixing the robots.txt bug, we might need to create some new throttle features. The current throttle says "don't hit the same domain name more than once per {DNS_THROTTLE} seconds". We assumed that would suffice to stop us DoSing any servers, but we didn't think hard enough about very large multi-tenancy virtual hosts, where many domain names are served by the same infrastructure. I think we might need two more throttle rules:
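For reference, the existing per-domain rule quoted above could be sketched roughly as follows. This is only an illustration of the rule as described in this comment, not the crawler's actual code: the `DomainThrottle` class, its `allow` method and the injectable `clock` are hypothetical names, and `min_interval` stands in for `{DNS_THROTTLE}`.

```python
import time
from urllib.parse import urlsplit


class DomainThrottle:
    """Allow at most one request per domain name every `min_interval` seconds.

    Hypothetical sketch: `min_interval` plays the role of {DNS_THROTTLE}.
    """

    def __init__(self, min_interval, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock  # injectable so the rule can be tested without sleeping
        self._last_hit = {}  # domain name -> time of the last allowed request

    def allow(self, url):
        """Return True (and record the hit) if the URL's domain may be fetched now."""
        domain = urlsplit(url).hostname
        now = self.clock()
        last = self._last_hit.get(domain)
        if last is not None and now - last < self.min_interval:
            return False  # this domain name was hit too recently
        self._last_hit[domain] = now
        return True
```

Because the key is the domain name, two different names that resolve to the same shared host are throttled independently, which is the gap with multi-tenancy hosting described above.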
Is there a publicly available list of domains hosted by GovCMS?
@nathan-w, is it the case that all GovCMS sites have something like this in their HTML Head?
If so, I think we could make a "govCMS detector" and self-maintain a list of govCMS sites.
Chris - please email your contact details to [email protected] - I'd like to take this conversation private while options are explored.
Note: apparently all govCMS sites share a Google Analytics key, so perhaps that could also be used for "govCMS detection".
I guess this issue may be closed now, as the fix has been deployed and has been working for some time already.
This behaviour is causing performance issues on sites that use dynamic URLs to serve up filtered content.
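As an illustration of the check being asked for, here is a minimal sketch using Python's standard `urllib.robotparser`. It is not taken from the crawler's code: the helper names `build_parser` and `may_fetch`, the user agent string and the example rules are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Build a parser from the text of a robots.txt file that was already fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser, user_agent, url):
    """Return True only if robots.txt permits `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for a site that serves filtered content from dynamic URLs.
EXAMPLE_ROBOTS_TXT = "User-agent: *\nDisallow: /search\n"
```

A crawler would call `may_fetch` before queueing each URL, so that a disallowed dynamic path such as `/search?q=...` is never requested.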