8

Is there a short phrase meaning "mapping text through near-homoglyphs that are intentionally less similar looking, specifically when used to write controversial things (swear words, the k!n&, etc.)"?

I was writing about a controversial topic and wanted to munge some text to evade search engines and AI bots. Inspired by obfuscated swear words I found on the Internet, I wrote the following letter-mapping script.

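#!/usr/bin/python3
import sys

# Require an argument, and a file name after -f
if len(sys.argv) < 2 or (sys.argv[1] == '-f' and len(sys.argv) < 3):
    sys.exit("obf ['text' | -f file]")
if sys.argv[1] == '-f':
    with open(sys.argv[2]) as f:
        src = f.read()
else:
    src = sys.argv[1]

# obfuscation algorithm: swap each mapped lowercase letter, keep everything else
subs = {'s': '$', 'a': '@', 'h': '#', 'i': '!', 'w': 'ω',
        'o': 'ø', 'e': '€', 't': '†', 'm': 'Պ', 'c': '©', 'g': '&'}
print(''.join(subs.get(ch, ch) for ch in src))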

What is this type of letter mapping called? At first, I thought it might be homoglyphs, but it seems that homoglyphs are too similar-looking to fool current search engines. What do we call it when we intentionally make the text a bit harder to read?

One example is the title of this question¹ (if it doesn't get banned for not being good English). As another example, when I run the aforementioned controversial text through the above script, I get:

W€ #øld †#€$€ †ru†#$ †ø b€ $€lf-€v!d€n†, †#@† @ll Պ€n @r€ ©r€@†€d €qu@l, †#@† †#€y @r€ €ndøω€d by †#€!r Cr€@†ør ω!†# ©€r†@!n un@l!€n@bl€ R!&#†$, †#@† @Պøn& †#€$€ @r€ L!f€, L!b€r†y @nd †#€ pur$u!† øf H@pp!n€$$. T#@† †ø $€©ur€ †#€$€ r!&#†$, Gøv€rnՊ€n†$ @r€ !n$†!†u†€d @Պøn& M€n, d€r!v!n& †#€!r ju$† pøω€r$ frøՊ †#€ ©øn$€n† øf †#€ &øv€rn€d, T#@† ω#€n€v€r @ny FørՊ øf Gøv€rnՊ€n† b€©øՊ€$ d€$†ru©†!v€ øf †#€$€ €nd$, !† !$ †#€ R!&#† øf †#€ P€øpl€ †ø @l†€r ør †ø @bøl!$# !†, @nd †ø !n$†!†u†€ n€ω Gøv€rnՊ€n†, l@y!n& !†$ føund@†!øn øn $u©# pr!n©!pl€$ @nd ør&@n!z!n& !†$ pøω€r$ !n $u©# førՊ, @$ †ø †#€Պ $#@ll $€€Պ Պø$† l!k€ly †ø €ff€©† †#€!r S@f€†y @nd H@pp!n€$$. Prud€n©€, !nd€€d, ω!ll d!©†@†€ †#@† Gøv€rnՊ€n†$ løn& €$†@bl!$#€d $#øuld nø† b€ ©#@n&€d før l!&#† @nd †r@n$!€n† ©@u$€$; @nd @©©ørd!n&ly @ll €xp€r!€n©€ #@†# $#€ωn †#@† Պ@nk!nd @r€ Պør€ d!$pø$€d †ø $uff€r, ω#!l€ €v!l$ @r€ $uff€r@bl€ †#@n †ø r!&#† †#€Պ$€lv€$ by @bøl!$#!n& †#€ førՊ$ †ø ω#!©# †#€y @r€ @©©u$†øՊ€d. Bu† ω#€n @ løn& †r@!n øf @bu$€$ @nd u$urp@†!øn$, pur$u!n& !nv@r!@bly †#€ $@Պ€ Obj€©† €v!n©€$ @ d€$!&n †ø r€du©€ †#€Պ und€r @b$ølu†€ D€$pø†!$Պ, !† !$ †#€!r r!&#†, !† !$ †#€!r du†y, †ø †#røω øff $u©# Gøv€rnՊ€n†, @nd †ø prøv!d€ n€ω Gu@rd$ før †#€!r fu†ur€ $€©ur!†y. Su©# #@$ b€€n †#€ p@†!€n† $uff€r@n©€ øf †#€$€ Cøløn!€$; @nd $u©# !$ nøω †#€ n€©€$$!†y ω#!©# ©øn$†r@!n$ †#€Պ †ø @l†€r †#€!r førՊ€r Sy$†€Պ$ øf Gøv€rnՊ€n†. T#€ #!$†øry øf †#€ pr€$€n† K!n& øf Gr€@† Br!†@!n !$ @ #!$†øry øf r€p€@†€d !njur!€$ @nd u$urp@†!øn$, @ll #@v!n& !n d!r€©† øbj€©† †#€ €$†@bl!$#Պ€n† øf @n @b$ølu†€ Tyr@nny øv€r †#€$€ S†@†€$. Tø prøv€ †#!$, l€† F@©†$ b€ $ubՊ!††€d †ø @ ©@nd!d ωørld.

Does English provide a succinct way to specify this obfuscation method?


¹ The title is "What is this obfuscation method called?", obfuscated in the way the question describes.

14
  • 5
    One problem with the lengthy example is that it is so nearly obfuscated that I can't be bothered to figure out what it says. I can pick out a few obvious words, and get that it is the Declaration of Independence. I could write a 'reverse' tool that undoes it, but why don't you just use a cipher? Commented 22 hours ago
  • 2
    You're on the right track with "homoglyph". Wikipedia calls them "quasi-homoglyphs", since they resemble, but are not sufficiently identical to, other characters. Commented 21 hours ago
  • 1
    Responsible? You posted a big chunk of unreadable guff, where one sentence would suffice. No, I didn't get the context until looking at your link, and only then did I recognise a few keywords from the Declaration of a has-been country. To me, it isn't "slightly obfuscated" but "unreadable." Which is why I suggested using a cipher. Commented 21 hours ago
  • 3
    @Weather Regarding 'To me, it isn't "slightly obfuscated" but "unreadable"': "unreadable" is relative here. The question does state that we intentionally make the text a bit harder to read, so the whole idea is that the meaning takes a while to sink in. In the end you did recognise a few keywords from the Declaration, so it's somewhat "unreadable" but not "entirely unreadable". Commented 21 hours ago
  • 3
    My robot has no problem reading your text. Presumably, it won't have any trouble with any other 1-to-1 mapping either. So, I don't think you're going to fool the machines. Commented 21 hours ago
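
To illustrate the point the comments make about a 'reverse' tool and 1-to-1 mappings, here is a minimal sketch of how the substitution could be undone. It assumes the same eleven-symbol table as the script in the question; the names FORWARD, REVERSE, and deobfuscate are hypothetical and not part of the original script. It also assumes the source text did not already contain any of the substitute symbols (a literal '&' or '!' would be wrongly turned into a letter).

```python
# Hypothetical reverse tool: FORWARD, REVERSE, and deobfuscate are
# illustrative names, not taken from the original script.
FORWARD = {'s': '$', 'a': '@', 'h': '#', 'i': '!', 'w': 'ω',
           'o': 'ø', 'e': '€', 't': '†', 'm': 'Պ', 'c': '©', 'g': '&'}

# The table is one-to-one, so swapping keys and values inverts it.
REVERSE = str.maketrans({sub: letter for letter, sub in FORWARD.items()})


def deobfuscate(text: str) -> str:
    """Replace each substitute symbol with the lowercase letter it stands for."""
    return text.translate(REVERSE)


print(deobfuscate("W€ #øld †#€$€ †ru†#$"))  # We hold these truths
```

Because the table only covers lowercase letters, capitals pass through unchanged in both directions, which is why the sample text keeps its original capital letters.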

1 Answer

17

It looks like an extreme version of Leet, or Leetspeak, in which text characters are replaced by numbers or punctuation. Specifically, the Wikipedia article mentions a "casual" form of Leet using letter-by-letter substitutions similar to those postulated in the question:

the primary strategy is to use quasi-homoglyphs, symbols that closely resemble (to varying degrees) the letters for which they stand.

By this definition, quasi-homoglyphs may be less resemblant of the original than ordinary homoglyphs, allowing for a lower level of intelligibility.

1
  • [may resemble less] Commented 2 hours ago
