Special characters in page names #19

Closed
cdruee opened this issue Dec 23, 2014 · 6 comments

@cdruee

cdruee commented Dec 23, 2014

Special characters in page names cause some trouble, although in page text they seem to have been fixed by #4:

  • Umlauts (äöü): caused the program to exit with an error. I managed to fix that with the following patch:
111c111
<             with gzip.open(atticpath, "wb") as f:
---
>             with gzip.open(atticpath.encode("utf-8"), "wb") as f:
  • Round brackets: pages with round brackets in the name are missing, and links to them appear untranslated.
  • Slashes (/): pages with forward slashes in the name are missing. If the slash is surrounded by spaces, the portion after the last slash (including a leading underscore) appears as the link text. If the slash is not surrounded by spaces, the link appears untranslated.
  • Colons: they are neither removed nor escaped, so the pages end up in odd namespaces (we use MAC addresses in page names to store hardware information).
@projectgus
Owner

Thanks for reporting this, CMDIUB. I've merged your fix for the crash caused by umlauts or other non-ASCII characters in page names.

I'm not sure what is best to do about the other things. When you say the pages are "missing", do you mean the brackets/slashes go missing, or the pages don't convert at all?

Regarding colons, I'm really not sure what to do about this at all. Do you have any suggestions? I think you can change the namespace delimiter character in Dokuwiki, so in your case you might want to use '-' or something for namespacing instead of ':'. But I'm not sure how much work it will be to expose that as a yamdwe option.

Angus

@cdruee
Author

cdruee commented Jan 12, 2015

Hi,

I had to dig a little deeper into the matter to find out what I wanted to be different, and then started trying some changes in the code. You will find them in my fork of yamdwe, in the branch called "uwm". I hope that works, because I am totally new to GitHub and to git itself (I have mostly used CVS so far).
The things I needed to change for my wiki to transfer correctly are:

  • The heading levels were converted incorrectly (= -> ===== instead of = -> ======), which caused the lowest-level headlines to be treated as plain text.
  • MediaWiki page titles containing a colon or a slash were assumed to denote a namespace link, but MediaWiki allows both to occur as literal characters. Now every title containing such characters is checked against the actual list of namespaces to decide whether it points into a namespace. Literal colons and slashes are converted into underscores.
  • In a localized wiki, file links were not recognized if they were prefixed with the localized canonical prefix (the localized aliases were recognized).
  • Links to MediaWiki file pages were not converted. Apparently file pages (showing metadata, versions and so on) don't exist in DokuWiki (at least not in vanilla DokuWiki), so I decided to convert them into links to the actual file (revisions and metadata are accessible through the media explorer).
  • A leading or trailing special character (for example, a bracket) produced a leading or trailing underscore in the page name, which is invalid.
  • Alternative (piped) internal link names were not honored (at least after the above changes), leading to invalid links.
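The heading problem in the first item is an off-by-one in the level mapping: MediaWiki's top level is `=`, DokuWiki's top level is `======`, and DokuWiki has only five levels, so a lowest-level heading mapped to a single `=` is no longer a heading. A minimal sketch of the corrected mapping, assuming one heading per line; `convert_heading` is a hypothetical name, not the code in the "uwm" branch:

```python
import re

# Hypothetical helper (not yamdwe's actual code): one MediaWiki heading line in,
# one DokuWiki heading line out; anything that is not a heading passes through.
HEADING_RE = re.compile(r"^(={1,6})\s*(.*?)\s*\1\s*$")

def convert_heading(line):
    m = HEADING_RE.match(line)
    if not m:
        return line
    level = len(m.group(1))   # MediaWiki: a single '=' is the top level
    level = min(level, 5)     # DokuWiki has only 5 levels; clamp deeper ones
    marks = "=" * (7 - level) # DokuWiki: '======' is top level, '==' is lowest
    return "%s %s %s" % (marks, m.group(2), marks)
```

With this mapping `= Top =` becomes `====== Top ======` and `===== Low =====` becomes `== Low ==`; clamping level 6 to level 5 is an assumption about how to handle headings DokuWiki cannot represent.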
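The namespace check and the underscore handling from the second and fifth items could look roughly like the sketch below. It is not the code in the "uwm" branch: `NAMESPACES` is a hypothetical stand-in for the namespace list read from the wiki, and collapsing every run of characters other than word characters, `.` and `-` into one underscore is an assumption about which characters should be converted.

```python
import re

# Hypothetical stand-in for the namespace names fetched from the MediaWiki site.
NAMESPACES = {"Category", "File", "Help", "Template"}

def title_to_pagename(title, namespaces=NAMESPACES):
    """Sketch: convert a MediaWiki title into a DokuWiki page name."""
    prefix, sep, rest = title.partition(":")
    if sep and prefix in namespaces:
        ns = prefix + ":"     # a real namespace: keep the colon as separator
    else:
        ns, rest = "", title  # otherwise every colon is a literal character
    # Literal ':' and '/', spaces and brackets all become a single '_'
    name = re.sub(r"[^\w.\-]+", "_", rest)
    # A leading or trailing '_' (e.g. from "(Foo)") is invalid, so strip it
    return ns + name.strip("_")
```

For example, a MAC-address title such as `00:1a:2b:3c:4d:5e` becomes `00_1a_2b_3c_4d_5e`, while `Help:Editing` keeps its namespace. Case is left unchanged here, even though DokuWiki normally lowercases page IDs itself.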
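The link fixes from the third, fourth and sixth items can be sketched in the same spirit. Several assumptions apply: `convert_link` is hypothetical; `FILE_PREFIXES` assumes a German wiki (canonical `File`/`Image` plus localized `Datei`/`Bild`); and turning a file-page link into DokuWiki media syntax with the `?linkonly` parameter is one possible target, not necessarily what the branch does. The page name itself is passed through unchanged.

```python
# Hypothetical prefix set for a German wiki: canonical names plus localized ones.
FILE_PREFIXES = {"file", "image", "datei", "bild"}

def convert_link(inner, file_prefixes=FILE_PREFIXES):
    """Sketch: convert the inside of a MediaWiki [[...]] link to DokuWiki."""
    target, _, label = inner.partition("|")  # honor piped link text
    target, label = target.strip(), label.strip()
    # [[:File:x]] links to the file page instead of embedding the file
    is_page_link = target.startswith(":")
    prefix, sep, name = target.lstrip(":").partition(":")
    if sep and prefix.strip().lower() in file_prefixes:
        media = ":" + name.strip().replace(" ", "_")
        if is_page_link:
            # DokuWiki has no file pages, so link to the file itself instead
            media += "?linkonly"
        return "{{%s|%s}}" % (media, label) if label else "{{%s}}" % media
    return "[[%s|%s]]" % (target, label) if label else "[[%s]]" % target
```

Matching the prefix case-insensitively against a set that includes both canonical and localized names is what lets `Datei:` and `File:` be treated alike.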

Now everything seems to work fine for my project.
Cheers,
Clemens

@cdruee
Author

cdruee commented Jan 12, 2015

One more:

  • Percent-encoded (URL-encoded) characters in internal links were not recognized. For example, "%20" became "20" instead of being decoded to " " (a space).
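Sequences such as `%20` are percent-encoding, which Python's standard `urllib.parse.unquote` decodes. A minimal sketch, assuming link targets are UTF-8 encoded (the default for `unquote`); `decode_link_target` is a hypothetical name:

```python
from urllib.parse import unquote

def decode_link_target(target):
    """Decode percent-encoded characters in a MediaWiki link target.

    unquote() reads the escapes as UTF-8 by default, so a multi-byte
    sequence such as "%C3%BC" becomes the single character "ü"."""
    return unquote(target)
```

The order presumably matters: decoding has to happen before special characters are replaced, otherwise the `%` is dropped as a special character and only `20` survives, which would match the symptom described.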

@cdruee
Author

cdruee commented Jan 12, 2015

I just created a pull request. Unfortunately, it shows some back-and-forth changes, including debug messages, typos, etc., which make the actual changes harder to read. I apologize for that.

@projectgus
Owner

Thanks for all this, @CDMIUB. Sounds great. I'm out of town this week, but I'll hopefully get a chance to look at the PR soon. Glad you got your wiki imported OK (I assume you did?)

@projectgus
Owner

(Closing as remaining issues are discussed in #21)
