-
-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fails to remove trailing amp URL markup #23
Comments
Okay, so this is a bug in the extension. It should URL decode the URL to get “?amp” instead of “%3famp”. After fixing that, it should get the right pages and redirect to them to the right location.
The extension redirects these URLs to the canonical HTML location when the page loads. What problem are you seeing with these ones? It’s not the same issue as with the others. |
The latter ones I didn't have a problem with---I just noticed that they had amp-related suffixes. When I load the target of the redirect (which succeeds), the server rewrites the URL again, removing the |
Hm. So the AMP Cache URLs shouldn’t have been urlencoded, and decoding them when they’re not encoded can lead to unexpected results. They’re not urlencoded on the publishers AMP page nor when served by Google or Bing. Where did these URLs come from? Did someone send them to you? Using what app? or did you get them from a search result page? If I can find out where they came from, I can figure out where things went wrong and what the underlying problem is. |
AMP redirection happens in two steps: the first step is to redirect off from an AMP Viewer or AMP Cache (e.g. Google, Bing, Baidu). This is done my rewriting URLs that match a pseudo-standard pattern. This redirects you to the publishers AMP page. As you’ve noticed, every publisher applies their own patterns to determine the URL for their AMP versions of pages. The extension can’t blindly chop off suffixes like “/amp/” because it can’t know whether that refers to the AMP tech or e.g. the page is about an electronic amplifier The second steps is parsing the AMP page to find the non-AMP version of that page. Lastly, you’re redirected to the non-AMP version. It’s a convoluted journey, but it gets there in the end. Complications like this is why many web developers feel strongly against AMP in the first place. There should be only one web page and one URL for any given document. |
These URLs came to me via slack. I do not know how the people on the other end got them to be the way they are android vs ios, copy/paste vs an OS share button, etc. But I can inquire if it'd help. I'm with you on anti-AMP sentiments, which is why I was delighted to find this extension in the first place! |
amp2html/scripts/redirector.js Lines 21 to 36 in 088d9f7
|
I’ve attempted a fix. It’s not great, see code comments above. I’ll release it in a few days after some more testing. You can help with testing. Disable the current extension. Download and unzip the test version, and load it temporarily in Firefox or Chrome/Edge. |
Here are two examples which redirect to the wrong page:
https://www.google.com/amp/s/www.etonline.com/the-bachelorette-season-16-episode-10-Tayshia-Adams-men-tell-all-2020-12-14-live-updates%3famp
https://www.google.com/amp/s/www.washingtonpost.com/nation/2020/12/09/idaho-coronavirus-protest-homes/%3foutputType=amp
The issue is the trailing parts of the URL. After redirect, the first example's URL has an
%3famp
at the end which, when removed, gives the correct page,https://www.etonline.com/the-bachelorette-season-16-episode-10-Tayshia-Adams-men-tell-all-2020-12-14-live-updates
The second example's URL has an extra
?outputType=amp/
which, when removed, gives the correct page,https://www.washingtonpost.com/nation/2020/12/09/idaho-coronavirus-protest-homes/?outputType=amp/
Some other similar examples such as
https://www.google.com/amp/s/nypost.com/2020/12/07/queens-boy-steals-family-car-drives-away-with-sister-to-nj/amp/
https://www.google.com/amp/s/m.jpost.com/omg/former-israeli-space-security-chief-says-aliens-exist-humanity-not-ready-651405/amp
enjoy server-side redirects to the content of 'interest' (again: no real interest, just an example).
This may be a server-side problem---but if half of servers have a wrong implementation, it's something this extension could help with.
The text was updated successfully, but these errors were encountered: