Infopost | 2023.12.20

Mass Effect star map with routes

The saga of my subsurface web linker feature request has finally hit the MVP milestone. After some groundwork I'm now producing recommendations for similar content from other voices. It's still pretty foundational though; I've only indexed about 10k pages, my recommendation algorithm is unsophisticated, and I haven't run the recommender on anything but this month's posts.

But that's a story for another time.

The reading, research, and influx of links from my partner in crime meant I was left with a trove of neat things to quote, link, and otherwise regurgitate. Some of it is thematically close to my link recommender project, so I'll start there.

Marginalia website similarity

Marginalia similarity search

I briefly thought my feature request had already been implemented when I happened upon a 2022 post from Hacker News mainstay Marginalia.

Marginalia This is a write-up about an experiment from a few months ago, in how to find websites that are similar to each other. Website similarity is useful for many things, including discovering new websites to crawl, as well as suggesting similar websites in the Marginalia Search random exploration mode.

The approach chosen was to use the link graph look for websites that are linked to from the same websites. This turned out to work remarkably well.

Already we diverge: the Marginalia guy went with a PageRank style of linking using a massive graph of hrefs. I've been working toward content-focused similarity since two people can write about wood preservation or Slay The Spire without linking to each other - indeed, the lack of peer discovery is part of the problem.

That said, based on his ability to do math and create useful terabyte databases, I will defer to the wisdom of the Marginalia approach, at the very least as an excellent parallel approach. And creating networks of peers makes a lot of sense: I used to link to Connie and KO and Heidi, and when I find interesting stuff on the greater internet I often quote it and link it (e.g. right now). Following links is a big step up from blogrolls and links to friends.

I'm curious what my own peer graph would look like. Based on Rob's content analysis, my top linked domains are Reddit and Flickr. This is an accurate reflection of my interests in that I like photography and (decreasingly) democratized content with (decreasingly) robust discussion. But also my site is nothing like either of these and Flickr is only a frequent href from this site because I used to use it for image hosting.

Marginalia In plain English, this service looks at which websites link to a particular target website, and then it ranks websites that are popular among those linking websites using a method commonly used in recommendation algorithms.

In technical jargon, it reinterprets the incident edges in the adjacency matrix as sparse high dimensional vector, and uses cosine similarity to find the nearest neighbors nodes within this feature-space.

To complete my book report on Marginalia search, similarity is scored using vector math.
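
To make the vector math a little less abstract, here's a minimal sketch of the idea. This is not Marginalia's code: the domain names and the toy link graph are made up, and I'm assuming each site can be represented as the set of sites that link to it (a sparse binary vector), with neighbors ranked by cosine similarity.

```python
import math

# Hypothetical toy link graph: linking site -> sites it links to.
OUTBOUND = {
    "alpha.example": {"woodshop.example", "spire.example"},
    "beta.example": {"woodshop.example", "spire.example", "photos.example"},
    "gamma.example": {"photos.example"},
    "delta.example": {"woodshop.example", "photos.example"},
}


def inbound_sets(outbound):
    """Invert the graph: target site -> set of sites that link to it."""
    inbound = {}
    for source, targets in outbound.items():
        for target in targets:
            inbound.setdefault(target, set()).add(source)
    return inbound


def cosine(a, b):
    """Cosine similarity of two binary vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def similar_sites(target, outbound):
    """Rank other sites by how much their inbound links overlap with target's."""
    inbound = inbound_sets(outbound)
    target_linkers = inbound.get(target, set())
    scores = [
        (other, cosine(target_linkers, linkers))
        for other, linkers in inbound.items()
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

With this toy data, `similar_sites("woodshop.example", OUTBOUND)` ranks `spire.example` ahead of `photos.example`, because every site linking to the spire site also links to the woodshop. Note that none of the page content is consulted, only who links to whom.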


Marginalia As a whole the feature shares a lot of similarity with how you would construct a recommendation algorithm of the type "other shoppers also bought", and in doing so also exposes how creepy they can be. You can't build a recommendation engine without building a tool for profiling. It's largely the same thing.

If you for example point the website explorer to the fringes of politics, it will map that web-space with terrifying accuracy.

I initially read this and thought, "how could this be used to profile web surfers clicking on recommended links?". Then I realized he was referring to the blogger/publisher/content creator being involuntarily associated with peers - guilt by association.

Six degrees Wikipedia
Source. This is a Wikipedia a->b tool that is conceptually similar to Marginalia's similarity graph.

The guilt by association thing is a reasonable concern. Naively, you can pretty easily distance yourself from unsavory content by simply not linking to unsavory content. But unless you thoroughly vet the authors of your links, you're bound to become a second- or third-order associate of something unpleasant. And, of course, hrefs don't understand amusement or sarcasm. If I post a link to /r/phunware and say "hey, check out these assholes", the link won't capture that context. Similarly, while I enjoy a well-written greentext, I'm not really into most of what /b has to offer.

Then there are inbound links, over which a site administrator has no control.

Marginalia's neighbors

Note again how few of those websites are actually indexed by Marginalia. Only those websites with 'MS' links are! The rest are inferred from the data. On the one hand it's fascinating and cool, on the other it's deeply troubling: If I can create such a map on PC in my living room, imagine what might be accomplished with a datacenter.

You might think "Well what's the problem? QAnon deserves all the scrutiny, give them nowhere to hide!". Except this sort of tool could conceivably work just as well for mapping democracy advocates in Hong Kong, Putin-critics in Russia, gay people in Uganda, and so forth.

To be fair, I'm sure this capability exists in more than a few places.

Another perspective

Japanese web site art Mauchuu

There was a neat post that made it to HN about an internet subculture that has historically had the same concerns expressed in the Marginalia post. We'll get to that part in a moment; first I'll let the author get a few words in about the state of the indieweb/old web/smallweb/personal web:

bikobatanari The Personal Web, to many people, only exists in a select few places. It could be solely sites on Blogspot, or Neocities, or some other adjacent platform, and that to them is the "Personal Web". However, once you've exhausted these places and found the sites that you find interesting, it's extremely difficult to figure out where to go next - to go to some unknown territory that you don't even know exists.

For myself, I've browsed Neocities for what seems like four years now as of writing this. I've seen many sites come and go - some plenty interesting, and others not at all. And even now, with plenty of sites that I don't recognize, I've become rather jaded. It's hard for me to find sites that pique my interest anymore - and if they do, it's hard to find them actually being updated or not be completely barren. All of this is what led me to going on excursions to places that not many people have gone to.

This is difficult in and of itself since funnily enough, Neocities users tend to link to only Neocities users and no one else. Despite many of its users being against walled gardens, it ironically became one itself.

So Bikobatanari fired up translation software and (as an outsider) embarked upon an expedition into the doujin web.

bikobatanari From what I've seen, [doujin sites are] structured by various search engines whose sole purpose is to index personal Japanese sites and nothing else; by "index", what is really meant is people register their own personal websites onto the engine - sort of like a glorified link directory. Its scope is even narrower than that of Neocities and other hosting platforms because sites with more formal contexts (such as business sites) are not even allowed in these spaces.

This got me curious then: how differently do people over in the East Asian sphere (primarily Japan) handle personal websites compared to the West?

Site design and SEO implications? You had my curiosity, now you have my attention.

Japanese web site art Mauchuu

bikobatanari Something that I've noticed in general is that the personal sites over there tend to be very creations/product focused. That is, their sole purpose is to show off things that they've made, rather than embody some sort of persona.

Even the site topic distribution makes this evident. The front page of a search engine that specializes in doujin sites called [Yorozulink when romanized] has sectioned off registered sites into categories, and the visual arts trumps practically every other category. An overwhelming majority of these sites' admins post illustrations, lots of them post their own mangas (original or derived from an existing series), write novels and stories, and indulge in a lot of other creative hobbies. Personal diaries and blogs do exist, but I don't think it's as ubiquitous there as it is compared to the West.

Ah, doujin sites are like portfolios and the community has its own indie search engines.

bikobatanari Usually the design is all coded by hand, and templates are, in a way, frowned upon. But relating to the creations-focused philosophy that a lot of these sites adhere to, the design of many of these sites are actually rather... tame. Minimalistic, even. Portfolio-like. Designs that showcases their work rather than ones that potentially take away attention from it. This is despite the fact that they're actually not portfolio sites.

Okay, not portfolios. Semi-portfolios. The author goes on to discuss the guestbook protocol, which is very much the opposite of the like/retweet/rant aesthetic of social media sites. And it doesn't stop there:

bikobatanari A term which I've encountered quite a bit on Japanese personal sites [translates to] simply "search avoidance".

Essentially, there are plenty of personal sites that go out of their way to make sure their space doesn't get spotted or picked up by search engines; and not the search engines that index these types of sites mind you (like the ones I linked to in the beginning of this article) - they're explicitly talking about search engines like Google, Bing, Yahoo!, etc.

These sites will have a disclaimer saying that their site "avoids search", and more often than not they will also add an additional disclaimer saying that their site is not allowed to be linked on SNS (basically their shorthand for social media).

The anti-timeline

Worn F5 key missing sticker ergo keyboard

Amy Hoy [In the 90s], we didn't have platforms or feeds or social networks or... blogs.

We had homepages.

From Bikobatanari I found another post with some old web/doujin overlap. The striking (yet obvious) observation made therein is that blogs are the internet's true villain:

Amy Hoy Homepages had a timeless quality, an index of interesting or useful or relevant things about a topic or about a person. You didn't reload a homepage every day in pursuit of novelty. (That's what Netscape's What's Cool was for!)

Chronological content was in the minority.

The Internet at the time was largely populated by academics, professionals, and college students. Not everyone had the desire to publish their angsty poetry, sexcapades, or surfing habits on a daily basis; the other limiter on chrono-content was the sheer time and energy it required. Diarying was a helluva lot of work. First you had to have something to say, then write, edit it, format it, add clip art, edit your index.html, edit any prev/next links, check those links, and lastly, upload the files.

This sent me into a vortex of personal crisis: have I forgotten my roots? From 2000 until 2007 this site was a time-agnostic photo portfolio with handwritten HTML and the occasional angsty sexcapade. Argh, I already did it again, talking about 2000-2007.

Amy Hoy And once you've had a taste of effortless updates, it's awfully hard to go back to manual everything.

So they didn't.

And neither did thousands of their peers. It just simply wasn't worth it. The inertia was too strong.

The old web, the cool web, the weird web, the hand-organized web... died.

And the damn reverse chronology bias - once called into creation, it hungers eternally - sought its next victim. Myspace. Facebook. Twitter. Instagram. Pinterest, of all things. Today these social publishing tools are beginning to buck reverse chronological sort; they're introducing algorithm sort, to surface content not by time posted but by popularity, or expected interactions, based on individual and group history. There is even less control than ever before.

Bikobatanari's take on this:

bikobatanari Another issue which holds for platforms with this type of structure (especially IG, Twitter, and Tumblr) is that looking back through another person's archived works is an absolute chore. If you want to look for a particular piece of work in someone's account, have fun wading through years of work in reverse chronological order just to find it. Because of this, people just end up resigning to have content spoonfed to them through the feed, as opposed to searching for all of the hidden gems that have long since disappeared from the public eye. It's a real shame, because there are possibly plenty of great works that will not be seen ever again because it's such a nightmare digging through all of this stuff just to find something specific.

They're not wrong. I think I've subconsciously seen this in my own site and worked toward creating alternative means to navigate it. E.g. the Slay The Spire page shows other titles sorted (generally) by similarity. Navbar-linked pages present indexes to travel posts and lists. It's not quite a 90s or doujin site, but it doesn't require eight hours of scrolling to see something from a year ago (@Twitter).

Bikobatanari seems to have had the same crisis:

bikobatanari In a way, this article has influenced my website's structure in some way. If you were here to see my website a few months ago, you would find that my articles page used to be entirely in reverse chronological order. It wasn't until November that I started categorizing my articles into separate topics, and I think that small little change has done wonders for both myself and for those who want to read something more specific to their interests.

The more I think about it, the more I see that the rise of chronologically ordered content for all of these platforms has impacted content creation in a way which I think is detrimental. Not only has it affected a piece of content's lifespan and long-term influence, but it has also normalized a structure which doesn't suit the majority of content in the first place.

In defense of the feed model I should say two things:
  1. It reduces the frequency with which you see something twice.
  2. Even timeless things have a birthday. It's not a bad thing, so long as its identity isn't inseparable from a moment in time.

bikobatanari I do want to note that the feed itself isn't bad. It has its uses. Blogs and journals work perfectly fine with a feed. The main gripe that I have with it is that with the normalization of using social media as a platform for content creation, the feed became the structure which everything was forced into, regardless of what type of content it is.

Yeah, I will continue to use this format and continue to provide navigational wormholes. And I'll think long and hard about adding a recency variable to my external link recommender.
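
For the curious, here's roughly what a recency variable could look like. This is a sketch of one option, not what the recommender actually does: it assumes a similarity score between 0 and 1, and the function name, the one-year half-life, and the 25% weight are all made-up placeholders.

```python
from datetime import date


def recency_weighted_score(similarity, published, today,
                           half_life_days=365.0, recency_weight=0.25):
    """Blend a content-similarity score with an exponential recency decay.

    recency_weight=0 ignores age entirely (the timeless homepage view);
    recency_weight=1 lets the whole score decay with age (the feed view).
    """
    age_days = max((today - published).days, 0)
    # Freshness is 1.0 on publication day and halves every half_life_days.
    freshness = 0.5 ** (age_days / half_life_days)
    return similarity * ((1.0 - recency_weight) + recency_weight * freshness)
```

Keeping the weight small means an old page loses at most a quarter of its score, so a great post from years ago can still outrank a mediocre one from last week.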

Some good old-fashioned ranting

Fallout Nerd Rage perk

The real value of the web's fringe sites is that you occasionally come upon something profoundly entertaining. And so apropos of nothing, here is the experience of some random dude at a couple of tech tradeshows.

Note: these quotes and the post itself are best experienced with a mental picture of Lewis Black in your head.

Ludic One of the keynote speakers runs a major customer loyalty program, which as a non-specialist I believe is code for "we sell all your purchasing data in the hopes that people who can't do math don't realize our rewards are worth like $200 over your entire lifespan". If you are a specialist, I will accept corrections but also, I dunno, fuck you on principle I guess. You might not deserve that, but it's a Monday and I just had to go through standup.

This person breathlessly took the stage and spoke happily about how they've had almost 10% year-on-year growth because of the crippling increases in rent and groceries driving the working class to seek savings wherever they could.

Very cool and normal, and also fuck you on principle, even if it isn't a Monday.

They then continued to talk about the thrill of seeing that family finally purchase that vacuum cleaner that was always aspirational.

Again, fuck you, and also I hope you fall down a flight of stairs. I swear to God, I can't even imagine what kind of defective software you have to be running in your brain to be that tone-deaf, but I was deeply concerned to see this is what our bajillionaire class is doing. It's a super concerning blend of being a complete sellout and too goddamn stupid to even hide it well. How hard is it to get on a stage without sounding like a Disney villain?

I normally try to tie together the block quotes with some commentary, but in this case I'm just interrupting.

Ludic I think the worst part was realizing that this didn't flag for some people in the audience, even techies. Some part of their brain just turned off and went "10% year-on-year growth? That's money. And look how important that person on the stage is! I wish I got attention!"

Silicon Valley HBO show Disrupt SF conference middle out

On the tradeshow floor:

Ludic I am not sure how to describe this rationally so I'm not going to try, but the air felt like someone had been operating some grease-filled humidifier, and I think this hit me because I walked in and immediately saw the event was sponsored by some dipshit crypto application. The funny thing is that rather than having blind hatred, I read Mastering Ethereum for a bit because it would have been so convenient if I could actually just print money by finding some crypto use case that I'd be morally okay with, and I just couldn't. So rather than blind hatred, my hatred has intense visual acuity.

Ludic We were then approached by a guy, who we will call Henry, that immediately blasted us with totally unsolicited advice on how to get our own business off the ground... he makes sure that we have his cards as we spend twenty minutes trying to extricate ourselves.

He seems like he learned his social skills from Dale Carnegie, which is forgivable, but he thinks he's better than us, which is possibly true but not forgivable.

Thank you, internet. Thank you, blogosphere.

He has another good post on corporate decision-making, "The Failed Commodification Of Technical Work":

Ludic That's right, there's a whole genre of corporate fanfiction out there. Was it useful to read? Yes. Does it miss some of the real barriers to organizations improving? Yes, which I should talk about in another article. Was it cringe-inducing at points? Hell yes.

Moment of zen

Afterburner Tengen Nintendo game cartridge
Source. After Burner and RBI Baseball are on my desert island list.

At the crossroads of early console gaming and DRM there's a story about why unlicensed games were rumored to blow up your Nintendo.

Nicole Express The other thing that makes Tengen stand out is how they broke the Nintendo's lockout chip. Modern consoles maintain their lockout using cryptography. But in the 1980's, that would get your console classified as a munition, and the NES' 1.7MHz CPU would struggle to implement anything regardless. Plus, cryptographic locks had no legal force at the time; this wouldn't be the case in the US until the 1998 Digital Millennium Copyright Act.

So instead, Nintendo developed a small microcontroller, which implements a program of sending random numbers back and forth. The microcontroller in the console compares with one in the cartridge, and if their numbers don't match the expected pattern, it resets the console every second, preventing gameplay. The chip is configured so the program can't be dumped easily, and if you do dump it, the program is protected by copyright law.

Here's Camerica's 1992 release Micro Machines. You might notice some circuitry in the corner; what this is is actually something we've covered on this blog before, a charge pump that produces a negative voltage from the console's 5V input. When the console turns on, a negative voltage spike is sent down the reset line of the lockout chip, frying it long enough to break its program and cause it not to reset the console.
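
To picture the handshake, here's a toy model. It is emphatically not the real lockout chip program: the number generator, the seed, and every name below are invented for illustration. It only captures the shape of the scheme described above, where the console-side chip and the cartridge-side chip step through the same sequence, a mismatch means a reset, and a stunned console-side chip never gets to check anything.

```python
def stream(seed):
    """Hypothetical stand-in for the number sequence both chips step through."""
    state = seed
    while True:
        state = (state * 5 + 3) % 16  # toy generator, not the real algorithm
        yield state


def console_boots(cartridge_stream, lock_chip_running=True, seed=7, rounds=8):
    """Return True if the console is allowed to keep running."""
    if not lock_chip_running:
        # Modeled on the voltage-spike trick: a stunned lock chip never checks.
        return True
    if cartridge_stream is None:
        return False  # nothing in the cartridge answers the console-side chip
    expected = stream(seed)
    for _ in range(rounds):
        if next(cartridge_stream) != next(expected):
            return False  # mismatch: the lock chip would reset the console
    return True
```

In this model a cartridge whose chip runs the matching sequence boots (`console_boots(stream(7))`), one with a different sequence or no chip at all does not, and a cartridge with no chip still boots if the console-side chip has been knocked out (`console_boots(None, lock_chip_running=False)`).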

Related - internal

Some posts from this site with similar content.



Links pages, webrings, and search.

Feature request

I just want to link other people's blogs.

Small web

Kagi sees an opportunity to index the indieweb.

Related - external

Risky click advisory: these links are produced algorithmically from a crawl of the subsurface web (and some select mainstream web). I haven't personally looked at them or checked them for quality, decency, or sanity. None of these links are promoted, sponsored, or affiliated with this site. For more information, see this post.

The Empty Hall Of Smiling Assassins - Ludicity

The Great Software Stagnation - Alarming Development

Software is eating the world. But progress in software technology itself largely stalled around 1996. Here's what we had then, in chronological order: LISP, Algol, Basic, APL, Unix, C, SQL, O...

Shellsharks IndieMark Score

Using IndieMark leveling to determine IndieWeb-ness

Created 2024.06 from an index of 271,867 pages.