An Essay on Auto-Converting Titles to Paths like WordPress

When we create apps like blogs, articles or news, we often need to generate a nice, SEO-friendly path for the details page that contains the title. This looks easy, but I spent over a day on this simple challenge and would like to share what I worked out.

Here's a super-short video explaining why this is important…

So the challenge is converting all kinds of titles to paths. Just replacing bad characters doesn't come close to a useful solution. Let's look at some common issues:

  1. Umlauts: simply dropping them would mangle the words, so "große Küchengeräte" should result in a URL like "grosse-kuechengeraete"
  2. Multiple "bad" characters: naively replacing each one turns "We +/- love this" into something like "We-----love-this"
  3. Leading / trailing spaces or special characters: "Learn Grunt (200)" should NOT result in "Learn-Grunt-200-"
  4. …especially in combination with path characters (if you allow them), which can produce something like "catalog/-best-mixer-ever"
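To see how quickly these issues appear, here is a deliberately naive version that lowercases the title and replaces every character outside a-z and 0-9 with "-". The name naiveSlug is hypothetical and not part of 2sxc; it only reproduces issues 1 to 3:

```javascript
// Deliberately naive: replace every character outside a-z and 0-9 with "-".
// Shown only to reproduce the problems listed above, not as a solution.
function naiveSlug(title) {
  return title.toLowerCase().replace(/[^a-z0-9]/g, "-");
}

console.log(naiveSlug("große Küchengeräte")); // → gro-e-k-chenger-te
console.log(naiveSlug("We +/- love this"));   // → we-----love-this
console.log(naiveSlug("Learn Grunt (200)"));  // → learn-grunt--200-
```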

I needed to get this worked out because 2sxc 8.3.5 provides a new input field called "string-url-path", which auto-fills from one or more other fields. The designer can configure it to fill from "[Title]" or from more advanced masks like "[Category]/[Title]", and everything else must happen automatically.
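Filling such a mask could look roughly like this minimal sketch, assuming the field values arrive as a plain object. fillMask is a hypothetical helper, not the 2sxc API. It removes slashes inside each value first, so that only the mask itself can introduce path separators; stripping backslashes as well is an assumption here, since a later conversion turns backslashes into slashes. A missing field is assumed to become an empty string.

```javascript
// Hypothetical helper, not the 2sxc implementation.
// Fills placeholders like [Category] or [Title] from a plain object.
// Slashes and backslashes inside each value are removed (assumption: both),
// so path separators can only come from the mask itself.
function fillMask(mask, fields) {
  return mask.replace(/\[(\w+)\]/g, (placeholder, name) => {
    const hasField = Object.prototype.hasOwnProperty.call(fields, name);
    const value = hasField ? String(fields[name]) : ""; // missing field -> ""
    return value.replace(/[\/\\]/g, "");
  });
}

const merged = fillMask("[Category]/[Title]", {
  Category: "Blog",
  Title: "We +/- love this",
});
console.log(merged); // → Blog/We +- love this
```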

If you're interested in the code, check out the JavaScript code on GitHub. But here's a short explanation of what I did:

  1. Before even starting, get the fields like Category and Title and remove any slashes inside each one. The reason is that the final result may contain slashes (because a mask like category/name can have a slash), but a slash inside an individual value could cause trouble.
  2. Merge the values based on the mask (like [Category]/[Title])
  3. Lowercase everything
  4. Latinize everything - I created an Angular service which does this for me, converting 1000+ "bad" characters like "áűőú" or "ǽ" to simpler characters. If you want to use it, you can find my AngularJS latinize-text-service here
  5. Neutralize apostrophe-s combinations, turning "Daniel's cat" into "Daniels cat", because I don't want it to end up as "daniel-s-cat" in the URL. At the same time I don't want to alter "she said 'super'" just because it contains an apostrophe followed by an s
  6. Convert all backslashes \ to forward slashes /
  7. Replace all unwanted characters including spaces with "-"
  8. Remove duplicate "-" and duplicate "/" in case they were created by previous conversions
  9. Replace all side-by-side combinations of "-" and "/", which previous conversions easily generate, with a plain "/". This keeps something like "(beta) Learn Gulp (200)" from resulting in a "blog/-beta-learn-gulp-200-" URL
  10. Finally, trim leading and trailing "-" characters
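Steps 3 to 10 can be sketched as a single chain of replacements. This is a minimal sketch, not the code on GitHub: it assumes steps 1 and 2 already produced one merged string, and instead of the full latinize service it uses a small German mapping (ä, ö, ü, ß) plus Unicode NFD decomposition to strip remaining accents, which covers far fewer characters than the 1000+ mentioned above. The names latinize and toUrlPath are hypothetical, and also matching the typographic apostrophe ’ in step 5 is an assumption.

```javascript
// Minimal sketch of steps 3-10, assuming steps 1-2 already merged the fields.
// Hypothetical stand-in for the latinize service: German transliterations
// plus Unicode decomposition, far smaller than a 1000+ character table.
const GERMAN_MAP = { "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss" };

function latinize(text) {
  return text
    .normalize("NFC")                           // compose so "ü" is one code point
    .replace(/[äöüß]/g, (ch) => GERMAN_MAP[ch]) // German transliterations
    .normalize("NFD")                           // split other accents off base letters
    .replace(/[\u0300-\u036f]/g, "");           // drop combining marks: "é" -> "e"
}

function toUrlPath(merged) {
  return latinize(merged.toLowerCase())         // steps 3 and 4
    .replace(/([a-z0-9])['’]s\b/g, "$1s")       // step 5: "daniel's" -> "daniels"
    .replace(/\\/g, "/")                        // step 6: backslash -> slash
    .replace(/[^a-z0-9\/]/g, "-")               // step 7: everything else -> "-"
    .replace(/-+/g, "-")                        // step 8: collapse "-" runs...
    .replace(/\/+/g, "/")                       // ...and "/" runs
    .replace(/[-\/]*\/[-\/]*/g, "/")            // step 9: "/-", "-/", "-/-" -> "/"
    .replace(/^-+|-+$/g, "");                   // step 10: trim "-" at both ends
}

console.log(toUrlPath("große Küchengeräte"));           // → grosse-kuechengeraete
console.log(toUrlPath("blog/(beta) Learn Gulp (200)")); // → blog/beta-learn-gulp-200
console.log(toUrlPath("Daniel's cat"));                 // → daniels-cat
console.log(toUrlPath("she said 'super'"));             // → she-said-super
```

Step 5 only fires when the apostrophe directly follows a letter or digit, which is why the quote in "she said 'super'" is left to step 7. The step 9 pattern treats any run of "-" and "/" that contains at least one slash as a single separator.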

Usually you want to do this in JavaScript, because the UI should show the result immediately and potentially let the user override the generated URL, ideally also trapping their input to prevent them from adding invalid characters. So feel free to use my code. Suggestions are also welcome.
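For trapping the user's own edits to the URL, a lighter filter can run on every change. This is a hypothetical sketch (sanitizeTypedPath is not from 2sxc): it lowercases, turns whitespace into "-", silently drops other characters (including umlauts, since it does not latinize), collapses repeats and removes a leading "-" or "/", but deliberately keeps a trailing "-" or "/" so the user can continue typing.

```javascript
// Hypothetical filter for a user-editable URL field, applied on every change.
// It does not latinize; characters outside a-z, 0-9, "-" and "/" are dropped.
// A trailing "-" or "/" is kept on purpose: the user may still be typing.
function sanitizeTypedPath(value) {
  return value
    .toLowerCase()
    .replace(/\\/g, "/")           // backslashes become path separators
    .replace(/\s+/g, "-")          // whitespace becomes "-"
    .replace(/[^a-z0-9\/-]/g, "")  // drop everything else
    .replace(/-+/g, "-")           // collapse repeated "-"
    .replace(/\/+/g, "/")          // collapse repeated "/"
    .replace(/^[-\/]+/, "");       // no leading "-" or "/"
}

// In a browser this could be wired to a (hypothetical) input element, e.g.:
// input.addEventListener("input", () => { input.value = sanitizeTypedPath(input.value); });
console.log(sanitizeTypedPath("Best Mixer!"));          // → best-mixer
console.log(sanitizeTypedPath("catalog\\best mixer-")); // → catalog/best-mixer-
```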

Love from Switzerland,
Daniel


Daniel Mettler grew up in the jungles of Indonesia and is founder and CEO of 2sic internet solutions in Switzerland and Liechtenstein, a 20-person web specialist with over 800 DNN projects since 1999. He is also chief architect of 2sxc (see GitHub), an open source module for creating attractive content and DNN Apps.
