WebMemex: your digital memory extension, as a browser extension

  • Store web pages
  • Create pages & links
  • Organise your personal web
Archiving WebMemex

Sun, 24 Nov 2024 15:45:00 GMT

With no movement in this project for several years, I figured it’s time to consider it over, archive this blog, dehydrate its content before bitrot consumes it. But first, I’ll give an overview of my WebMemex-related efforts of the past years for any curious visitor or future archeologist.

The mindmapping browser

Freshly graduated in 2016, I tried to create a mindmap-like read/write web browser for growing one’s digital memory; a memex-ish tool based on world-wide web technology. See:

This first demo was a browser that was itself built as a web app, (ab)using <iframe> elements. It relied on a proxy to avoid cross-origin restrictions, and to insert a script into each page that captured link clicks inside the frame (because clicked links should open in a new node, creating a path).

By browsing you would build a graph of visited web pages, connected by the links you followed. You could also create links between any two pages, to add your own. And you could create notes, which could be regarded as tiny, user-writable web pages themselves, and could thus be attached to any web page by linking them.

The most similar tool out there was The Brain. Jerry Michalski, who publishes many years of accumulated thoughts and findings as Jerry’s Brain, had convinced me that a directed graph is a powerful structure for organising thoughts: it allows building hierarchies while also allowing multi-categorisation. So I started with this; other organisation methods could still be added later.

The demo elicited encouraging responses, and hopefully inspired some people; e.g. Mozilla’s browser.html experiment reportedly borrowed some ideas. While merely a proof of concept, the WebMemex was already somewhat functional. I used it intensively for a while myself (until I lost my data multiple times), some friends gave it a try, and for a short period hypertext inventor Ted Nelson was its most enthusiastic adopter (until presumably giving up on it too).

Given the limitations of the iframe-based approach, and as building a new browser was beyond my abilities, in order to make something practical I decided to make a browser extension to incrementally add the desired features to existing browsers: see the first WebMemex blog post. I got a small grant from SIDN fonds for this work.

Snapshotting pages

When the browser becomes a knowledge management tool, accumulating web pages and notes to function as your personal memory extension, storing only the addresses of web pages is not sufficient — the web appeared too ephemeral. Web pages behave rather like living beings that are not easily stored, applications that often rely on their back-ends to remain operational. To treat web pages as pages again, as documents that you can store on your digital shelf, I needed a module for snapshotting web pages; not finding any, I started to build it myself. This became the freeze-dry javascript module, which packs the whole snapshotting logic into one simple but highly configurable function.

To keep focus, the WebMemex browser extension got stripped down to a simple web page snapshotting tool based on freeze-dry, with full-text search through one’s snapshotted pages. After a brief collaboration, a more feature-rich fork of the extension by Oliver Sauter and his team developed into WorldBrain’s Memex.

With a grant from the Prototype Fund in 2018, I improved freeze-dry and created a proof of concept for personal web archives: your collection of snapshots is your archive, which can be queried via the Memento protocol like any other web archive. To prototype this I made a small Nextcloud app.

Web annotation

An important goal of the WebMemex project was to enable highlighting and annotating the content of pages; e.g. linking between corroborating or conflicting statements. Creating links should not be limited to the publisher of a page; browsers lack a highlighter and pencil.

To ensure that annotations made in one tool are also readable in another, we need standard formats for them; both to be able to share annotations between people, and to avoid content dying along with the software that made it. The W3C Web Annotation Data Model was an attempt in this direction, but the standards were both broad and vague and got little uptake.

After earlier work with web annotations (e.g. an internship at Hypothes.is in 2014), I made some efforts on web annotation again in 2020–2022, with an NGI0 grant from NLnet Foundation (and then I ended up working at NLnet). Instead of building annotation support directly into the WebMemex browser extension, I wanted to work on more widely reusable tooling for the web annotation standards. I contributed to the modules of the ever-incubating Apache Annotator, and created a proof of concept specification and implementation for standards-based ‘annotation feeds’, Web Annotation Discovery. This 3-minute screencast conveys the idea.

Relatedly, in 2022 the work on URL Fragment Text Directives (first called ‘scroll-to-text’) by the Chromium team caught my eye. I had made the highly similar quoteurl back in 2016, and a Precise links browser extension in 2017, with the hope that links to arbitrary content would some day become a standard. While I’m not fond of a dominant browser effectively changing the nature of URLs with little input from others, this particular change seemed helpful. I made a line-by-line implementation of the spec, text-fragments-ts, and tried to improve the spec at the edges, hoping that it would finally allow web links to refer to arbitrary phrases within a document.
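For illustration, a link using such a fragment text directive appends a directive after `#:~:` to an ordinary URL (the target address here is just an example):

```
https://example.com/article.html#:~:text=arbitrary%20phrases
```

A supporting browser will scroll to and highlight the first occurrence of the quoted text, without the page's author having had to put an anchor there.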

Now what

The aspirations behind the WebMemex are ever present, and will perhaps revive some day in some form, but this website can be turned into past tense now. Most likely I will not attempt to build an all-in-one WebMemex knowledge management tool, though time permitting I might maintain and improve some technical enablers, such as freeze-dry.

Over the last few years, I have come to realise that building the user-facing application, while difficult by itself, is not the primary challenge; the more important goal is to grow an ecosystem of standards and modules, and also establish the concepts, that power a wide range of interoperating knowledge management tools.

It helps to see your whole computer as your memex, and every application is part of it; highlighting, annotating or linking a fragment of text or audio should be as ubiquitous as copy-paste. And by cross-referencing between people’s knowledge bases, these then form a web of memexes — somewhat like the original idea behind the world wide web, which then went another way.

Now to archiving this blog. Instead of keeping the software that currently powers it (the Ghost CMS) running, I will turn the website into static HTML files along with their subresources (images, stylesheets, scripts), which can then be hosted by any simple web server. (Wiser people would have used a static website generator from the beginning.) I could simply run wget --recursive, or take this opportunity to put the WebMemex into action.

Personal web archives; a report

Sun, 09 Dec 2018 22:55:37 GMT

My goal over the past spring and summer was to enable people to store web pages and create their personal web archives. Hereby an overdue report of six months of undisturbed coding.

I will start with an overview of the approach and architectural design as it evolved along the way, then elaborate on what has been implemented thus far.

Approach

In this project, the angle to archiving differed a bit from established projects, such as the widely respected Internet Archive. These archives conceptualise the web as a global living library that (desperately) needs a permanent record to counteract its continuous decay and keep the previous content of web sites available. While such a versioned repository of nearly the whole web is a very important and useful endeavour, I have tried to address three other aspects:

  • Firstly, for more resilient and self-organised archiving, it would be great if people can own and keep the copies of the documents they care about, thereby creating a highly redundant, localised, distributed storage.
  • Secondly, acknowledging the web consists not only of public, static pages, people should be able to archive the web as it looked from their perspective, including private or personalised views (accordingly, archived content should be private by default).
  • Thirdly, I'd like to take into account the fact that many web 'pages' are in fact more like an application than a document, influenced by user interaction and other resources it fetches from the internet.

With these aspects in mind, the goal has not been to crawl and capture web pages with all their complexity and interactivity, but rather to enable people to store any page exactly as they see it in their browser. Not recording the web as if it were a living thing, but just making web pages more durable objects.

Even a personalised, ephemeral view that was generated on the fly (e.g. a hotel booking confirmation) is considered a document that one could store just like a paper letter. And any interactive application (e.g. a zoomable map, or simply a page with pop-ups) is considered a sequence of document variations, any of which can be stored as a simple, non-interactive document by snapshotting the application's current presentation.

By keeping viewed documents as you browse, you could page by page grow your personal web archive; a personal selection of things worth keeping, not unlike one's collection of books on a shelf.

Now an interesting additional concept is that the snapshots of web pages could themselves again be made part of the web. You could then access your archive with your browser, which could let you access both the live and archived versions of a page when visiting a link or bookmark. Moreover you could share parts of your archive on the web, so others could access them too.

Hopefully, the fragility of the web could be somewhat compensated for by keeping your own copies of valuable pages and querying others for theirs, instead of relying on the original publishers to keep hosting them forever.

Designing for individuals

Throughout this project (which, to be clear, is far from finished), the design philosophy has been to think bottom-up: primarily, the purpose is to create tools for the needs of individuals. Then, only as a secondary, long-term objective, a sizeable distributed web archive might emerge from that through people's collective behaviour.

Another strong intention has been to let people use the various means they may already have, mixing and matching different tools as it suits them. Some may want to save web pages using our tool, but want to do full-text search using their usual desktop search tool. They may want to share snapshots by email, rather than via their personal archive.

The challenge is to not design a monolith catering for pre-envisioned scenarios, but make primitive tools that are easily composable with others to create different workflows. While working on this project, I realised the desired functionality could nicely be divided in three independent but complementary tools:

  1. a browser extension for storing web pages,
  2. a web archive server hosting such snapshots,
  3. a(nother) browser extension to query such archives.

Tool 1: Storing a web page as a file

To start with, it may be best to forget the idea of growing a personal archive, and just wonder how a browser (extension) could let an individual keep a single page. As explained, the idea is to capture the page's state at the moment the user chooses to save it, but another question is how to deliver this capture in a useful form.

As the principal unit of knowledge management on computers is a file, having each snapshot consist of a single file would enable people to use decades of existing software for organising them, sending them to others, making back-ups, and so on.

A file feels almost tangible when compared to an object that is kept in the custom database of the application that created it (it's sad how the latter is becoming the norm). This is even more so if the file can be opened by various existing applications. Since most people's computers, including 'smart'phones, can display an HTML file, this format felt like the best choice for this project (even though formats like WARC or MHTML may seem more likely choices for web archives). All required resources (images, stylesheets…), as well as any metadata (date of snapshot, the page's URL…), will have to be stored inside the file itself. Luckily, with HTML this is possible; more on that further below.

An encouraging observation is that saving static snapshots of web pages as files is something people already do, often with the limited means that are commonly available: making screenshots or saving them as PDFs. Even if it is a low-quality copy that lost much of the page's layout, structure and links, people simply want to keep hold of what they saw, locally. When asking around among journalists, academics, techies, and other frequent users of the web, it keeps amazing me how many of them do this. Saving a page as an HTML file, thus retaining its natural format, seems a small but useful step forward.

Save Page As...

Time for a small intermezzo, as when discussing the creation of a browser extension that just saves pages as HTML files, a savvy reader probably realises such a feature exists already: browsers have always had a button to save the currently viewed page.

But almost nobody uses this feature, because it has been mostly dysfunctional since medieval times (perhaps back then it worked fine, because the pages were primitive too). At least in Firefox and Chromium, saving a page results in an HTML file, next to a folder with (some of) its subresources. In most situations, the result is…

  • hard to move around, or e.g. attach to an email, as the file and folder must remain together.
  • of bad quality: often not all subresources are stored, so pages look malformed as they lack some images or stylesheets.
  • unreliable: some subresources are not captured, but are still fetched from their original source, thus falsely suggesting they were stored when first viewing the snapshot.
  • unpredictable: when viewed, the page's scripts are re-executed in the already rendered page, which might lead to.. anything really, depending on whether the script was written with presumptions about the page content.
  • possibly insecure: executing scripts might lead to security problems, or leak private information to the internet.
  • lastly, the saved file has no notion of where it came from, dissociating it from its original URL.

Boring as it sounds, while I started by thinking how to create a decentralised web archive, I slowly realised the most valuable step towards it might just be to fix the 'Save Page As...' button. Hopefully browsers will one day give this forgotten feature some love again, and until then browser extensions can help fill the gap.

Tool 2: Hosting an archive

Assuming we have mostly solved the problem of saving web pages, such that people have the ability to turn them into tangible files, we can build the next level of functionality. We can make a tool that interprets a bunch of snapshot files as one's personal web archive.

This archive would be a static web server hosting one's snapshots. Moreover, it would support Memento, a protocol supported by many archives to enable users to query the archive with a URL and a date, and receive a redirect to the snapshot of that URL closest to that date (if any exists).
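The core decision a Memento server makes — given a URL and a date, find the stored snapshot closest to that date — can be sketched in a few lines. This is only an illustration of the selection logic, not the protocol implementation itself; the snapshot list and its field names are made up for the example:

```javascript
// Given snapshots of one URL, pick the one whose datetime is closest to
// the requested date — the choice a Memento TimeGate makes before
// redirecting the client to that snapshot.
function closestSnapshot(snapshots, requestedDate) {
  const target = new Date(requestedDate).getTime()
  let best = null
  for (const snap of snapshots) {
    const distance = Math.abs(new Date(snap.datetime).getTime() - target)
    if (best === null || distance < best.distance) {
      best = { snap, distance }
    }
  }
  return best && best.snap // null if the archive has no snapshot at all
}

// Example: two snapshots of the same page, queried for 20 Aug 2018.
const snapshots = [
  { url: '/archive/1.html', datetime: 'Sat, 18 Aug 2018 18:02:20 GMT' },
  { url: '/archive/2.html', datetime: 'Sun, 09 Dec 2018 22:55:37 GMT' },
]
closestSnapshot(snapshots, 'Mon, 20 Aug 2018 00:00:00 GMT')
// → the snapshot of 18 Aug 2018
```

In the actual protocol the client sends the date in an `Accept-Datetime` request header, and the server answers with a redirect to the chosen snapshot.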

The archive could be running locally on one's own computer and be solely for individual use, or on a remote server with the option to also provide other people access. Besides the basic features, it could provide (human or programming) interfaces to, for example, add new snapshots to the archive, control their access, or perform full-text search on them.

Tool 3: Querying the archive(s)

With the browser's augmented capability to save web pages, and an archive server for providing access to the snapshots, the remaining piece of the puzzle returns our attention to the browser, in order to make it automatically connect to and make use of the archive. This could again be achieved with a browser extension; either combined with the other one, or developed and packaged independently (modularity!).

Whenever the user is about to visit a web page, the browser could fetch it from the live web as usual, but now also query the archive for previous snapshots of it. It could show a page's most recent snapshot as a fallback whenever the live page is unavailable; or, if preferred, even show the snapshot by default, for 'offline-first' browsing.

Beyond one's own personal archive, one could connect to multiple archives, which can be any archives that support Memento, and query any or all of them to find a snapshot. This gets us back to the vision of a distributed, emergent, collective web of archives.

Implementation

The above described the desired architecture as my idea of it solidified during the project. The current implementation still lags a little behind the target, but the basics are there. I implemented the first two tools that were described: the browser extension that makes snapshots of pages, and a minimal form of the archive server that can host them. I did not implement the third tool yet, but one can use already existing Memento clients, such as the Memento Time Travel browser extension, to get the basic functionality.

The browser extension

As I took off from the already existing WebMemex code, the browser extension I developed does more than just produce a snapshot file. It also saves these files in its internal storage, so that it can access and search through them, to provide some of the personal archive features already.

However, the storage inside a browser extension is a silo, inaccessible from one's file manager or other applications. It is not obvious to a user how to do basic things like making a back-up of the files or moving them to another computer; I would have to reinvent those wheels specifically for the extension. These issues were strong motivators to come up with the above architecture design consisting of multiple, composable tools.

The main task of the browser extension should be snapshotting a web page, and most of the effort has gone into coding this part as well as possible. Because there are so many cases where snapshotting a web page would be useful, this code has been developed as a separate, reusable module called freeze-dry (I actually started on freeze-dry last year, but rewrote it from scratch).

The WebMemex browser extension is just one possible application that can be built around freeze-dry. It could also be integrated into browsers (e.g. replace the "Save Page As..." button!), be used for automatic capturing of pages in headless browsers, and could even be used on a virtual dom like jsdom.

I like to see code modularisation as the pursuit of composability at a second level: while users can mix and match compatible tools, developers can mix and match modules to easily make a large variety of those tools.

How freeze-dry works

As it is the main deliverable of the project, I will elaborate a bit on the internals of freeze-dry. At first glance, saving a web page seems a simple task. It just has to read the current DOM as an HTML string, so a good first approximation to freeze-dry would be:

const freezeDry = document => document.documentElement.outerHTML

But, as one user expressed in a spontaneous email: "I was surprised and saddened by just how much work it takes to do what freeze-dry does!". It gets more complex because to get a full snapshot of the page, freeze-dry also has to fetch and inline external images, stylesheets, etcetera as data: urls (i.e. base64-encoded strings); and recurse into the stylesheets, inlining their fonts, images, and imported stylesheets; recurse into iframes; convert relative URLs into absolute ones; remove scripts and event handlers; ...add a couple more steps like these, and we easily end up with over 2k lines of code (for full details on how freeze-dry works, look at the source code).
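Two of these steps can be illustrated in isolation. This is a simplified sketch, not freeze-dry's actual code, which handles many more resource types and edge cases:

```javascript
// Turn a subresource's bytes into a data: URL, so it can be inlined into
// the snapshot instead of being fetched from the network when viewed.
function toDataUrl(content, mimeType) {
  const base64 = Buffer.from(content).toString('base64')
  return `data:${mimeType};base64,${base64}`
}

// Make a relative URL absolute, so the snapshot no longer depends on the
// address it was saved from.
function absoluteUrl(relativeUrl, documentUrl) {
  return new URL(relativeUrl, documentUrl).href
}

toDataUrl('body { color: red }', 'text/css')
// → 'data:text/css;base64,Ym9keSB7IGNvbG9yOiByZWQgfQ=='
absoluteUrl('../style.css', 'https://example.com/main/page.html')
// → 'https://example.com/style.css'
```

Applying transformations like these to every `<img>`, `<link>`, `@import`, nested frame, and so on — and doing it robustly — is where the 2k lines go.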

The result of freeze-drying is a snapshot of the page in the form of a single string of HTML that is..:

  • self-contained: all subresources, such as images and stylesheets, are inside the snapshot (which can be several megabytes).
  • context-free: not tied to its original location, i.e. it has no relative URLs.
  • static: scripts are removed, to obtain a document one can view, reuse, annotate, and perhaps even edit without surprises.

Put the resulting string in a file, and your freeze-dried page can be preserved for many years to come.

Beyond the required changes to preserve the page, freeze-dry tries to make as few changes as possible. One exception to this is that it adds two pieces of metadata to the page, to remember its original URL and the moment the snapshot was made. As I could not find an existing standard for expressing this information, I chose to embed the equivalent HTTP headers as they are defined for the Memento protocol, so a snapshot will contain two lines like this:

<meta http-equiv="Memento-Datetime" content="Sat, 18 Aug 2018 18:02:20 GMT">
<link rel="original" href="https://example.com/main/page.html">

One other small addition to the HTML tree is made in order to ensure the snapshot is self-contained: freeze-dry adds a <meta> tag with a Content Security Policy (CSP), that instructs browsers not to fetch or connect to any outside resources which the snapshot might accidentally still refer to (e.g. if freeze-dry neglected to inline some yet unknown or non-standard type of subresource). This CSP solves two issues: firstly, internet connectivity could pose a security or privacy issue, especially as one would expect to be viewing a document locally. But, perhaps more importantly, the CSP ensures reliability: one can be sure a snapshot will still look the same when one is offline, or when the links to its subresources have rotted away.
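For illustration, such a CSP tag could look roughly like this. The exact policy below is my assumption for the sketch, not freeze-dry's actual output — the idea is simply to forbid all network fetches while still allowing the inlined data: resources:

```html
<meta http-equiv="Content-Security-Policy"
      content="default-src 'none'; img-src data:; media-src data:; style-src data: 'unsafe-inline'; font-src data:">
```

With `default-src 'none'` as the baseline, any subresource that slipped through un-inlined simply fails to load, rather than silently reaching out to the network.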

Personal archive server

To provide the second tool, which enables people to host their snapshots in a personal web archive, I decided to prototype a simple web server that can easily be run on people's usual personal servers. Unfortunately, personal servers are not a usual thing at all. I found however that NextCloud comes close enough to be used for building a proof of concept.

I made two small NextCloud apps:

  • raw allows using NextCloud as a static web server, because by itself NextCloud does not serve user files raw: it only provides them as a download or displays them within its own user interface (where, for an HTML file, it only shows the source code).
  • memento makes the server speak the Memento protocol, so one can ask it for a snapshot of a given URL near a given date. Upon receiving a Memento query, the app searches (linearly, for now) through all HTML files one has stored inside their NextCloud, and looks if any of them are snapshots of the requested URL (i.e., if it has the requested URL in a <link rel="original" ...> tag). If there are multiple matches, it looks at the snapshot dates to pick the closest match. It then redirects the client to the 'raw' app to serve that snapshot.
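The matching step of the memento app can be sketched as follows. This is a simplified JavaScript illustration (the real app is a NextCloud app, and parsing HTML with regular expressions is only acceptable in a sketch); the function name is mine:

```javascript
// Check whether an HTML file is a snapshot, by looking for the two
// metadata tags that freeze-dry embeds; return its original URL and
// snapshot date, or null if the file is not a snapshot.
function snapshotInfo(html) {
  const original = html.match(/<link rel="original" href="([^"]+)">/)
  const datetime = html.match(/<meta http-equiv="Memento-Datetime" content="([^"]+)">/)
  if (!original || !datetime) return null
  return { originalUrl: original[1], datetime: new Date(datetime[1]) }
}

const snapshot = `
<meta http-equiv="Memento-Datetime" content="Sat, 18 Aug 2018 18:02:20 GMT">
<link rel="original" href="https://example.com/main/page.html">
`
snapshotInfo(snapshot)
// → original URL 'https://example.com/main/page.html' and date 18 Aug 2018 18:02:20 GMT
```

The app runs a check like this over each stored HTML file, keeps the files whose original URL matches the query, and picks the one with the closest date.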

While this is a very minimal prototype of a personal archive server, I do like that it follows the ideal of modularity and composability. For instance, the archive is quite independent from the browser extension that saves the pages (their only coupling being the particular metadata format of the snapshots, and support for other snapshot formats could easily be added). Likewise, to query the server, any Memento client should work. To get snapshots onto the server, any of NextCloud's options can be used, such as syncing with a local folder or dragging and dropping snapshot files onto its web interface.

Way forward

Much work remains on the way to personal web archiving, but I hope this project has made some baby steps forward; small contributions towards changing the way we use, along with the way we conceptualise, the web. I would be very satisfied if some more people would get used to thinking of…

  • …documents as being different from services/applications, with web pages often being an awkward mix.
  • …web pages as being fetched from the web, instead of being viewed on the web.
  • …web pages as being storable, tangible, ownable things.
  • …pages stored on their computer as still being part of the web.
  • …fetching a page as a way to get the latest edition, when their own version may be outdated.
  • …fetching a page from anybody who has it, rather than always from the original publisher.

Of course, much of the web today won't fit these concepts, but that is exactly the first point: we should distinguish between live, interactive services (e.g. a web shop or SaaS application) and web pages that are primarily a document which we can refer to by its URL and expect to remain available. If we keep developing the web mainly for the living services, but meanwhile keep using the web as if it were a global library of interconnected literature, we would remain stuck in the digital dark age where knowledge decays within a decade.

Work ahead

I will wrap up with some thoughts on what may be worth working on going forward, besides the obvious steps to improve freeze-dry and related tools. Perhaps most of all, my hope is to increase the composability of tools in general, to give users more freedom in choosing their tools and workflows.

Insistence on modularity and composability may yield fruits in the larger development ecosystem, but it may not be the easiest way to get quick and visible results. For example, I would have liked to support direct communication between the browser extension and archive server, to store newly created snapshots directly on the server instead of inside the extension, but did not find a satisfactory way to do this without resorting to a custom or obscure protocol.

In fact, the initial reason to try NextCloud was that it supports WebDAV, a simple protocol enabling us to PUT files onto a server, though I did not get around to implementing that in the client. Also, WebDAV would be only part of the solution, as aspects such as configuration and authentication also need to be addressed to provide a smooth user experience. UX for composable tools is hard.

To work together, the browser extension and the server apps largely rely on the capabilities of the platforms around them. And that is probably the way to go. To make convenient, composable tools, most effort has to be put not into the tools themselves but into the platforms they run on and their protocols and interfaces.

For example, we could improve the browser so that one can drag a file directly from within the browser to one's server, instead of having to first save it in a folder. Or we could even support automating such workflows with something like webintents (but, another insight during this project: first make things possible, then make them efficient).

I could come up with a whole bunch of other things that would be helpful for this project and others like it. I will just list a couple of broad directions for inspiration:

  • Support Memento in browsers, servers, and other applications.
  • Support for reading MHTML and WARC files in browsers, or develop another self-contained web package format.
  • Improve caching in browsers, changing the UI to differentiate between a local cached version and the 'live' online version.
  • Make web pages as documents instead of applications; make them usable without javascript.
  • Invent and define a limited form of javascript that can be reasoned about and archived reliably.
  • Make personal, read/write web servers with standardised protocols. WebDAV is a primitive start; RemoteStorage adds some features to it; Solid might take things further still.
  • Work on 'named data' approaches to the web; on projects like IPFS and Dat & Beaker, so documents can be fetched from any source, and without needing to trust the source.
  • Last but not least, work out ways to protect people's privacy while querying others for resources.

While for the moment I am otherwise occupied, I hope to contribute to some of these directions in the future, and that whoever has read all the way to the end of this post may feel an increased motivation to do so too.

Related recommended reading:

Das eigene Webarchiv

Sun, 08 Apr 2018 21:58:56 GMT

tl;dr: back to work, with support from the Prototype Fund, to spend 6 months on personal web archiving.

It has been silent around this project for a long time, while I was distracted with other things, ran out of funds, and needed a viable plan to move things forward. This week, the project gets an impulse again, as it will be supported by the Prototype Fund for the next six months under the project title "Das eigene Webarchiv".

As the title suggests (eigen = own), the focus will be even more than before on the ability to archive web pages for personal use. Like in the original idea of the memex, such use would ideally include adding notes and links among things you have read, thus organising them by your associations; and this is still the long term vision. But to organise items one cannot own, to annotate pages that may change or disappear… it all feels like building on quicksand. So let's first build this foundation, in order to work around the web's inability to retain the documents we care about.

This sub-mission feels partly like a continuation of the work done so far, and partly as a new project building on lessons learnt from that. Either way, the plan for the coming months is to work on technologies and tools for web archiving; and to combine their features in a browser extension that enables you to…

  1. store a web page as you are visiting it. The core task here is improving freeze-dry, to snapshot a page as well as possible and bundle it with its dependencies (e.g. images and stylesheets).

  2. browse pages from your archive. Whenever you visit a webpage, you can choose to see previously saved versions; you could even choose to browse offline-first by default.

An explicit goal is to avoid creating a silo that locks archived pages up inside your browser. Rather, the idea is to be part of an ecosystem of composable tools, so you could access or even edit pages using other applications, thereby making them truly yours.

To this end, pages will be stored on a web server of your choice; possibly just running on your own computer and only accessible by you, but nevertheless speaking webby protocols (exactly which protocols is still to be determined). Part of the project plan is to experiment with this architecture to add social features, that will enable you to…

  1. share saved pages with others. By storing your archive on an internet-connected web server (aka "the cloud"), you can easily make archived pages, or a whole archive, available to others. You can snapshot a page and give me the link to your snapshot.

  2. browse using other people's archives. Besides your own archive, you could query the archives shared by others about their versions of pages, using the Memento protocol. Your friends' archives, your own archive, and big ones like the Internet Archive; each will just be another repository of documents too precious to lose, queryable in the same manner.

These four features will form the core work of the coming months; of course more will be needed to make this a convenient tool: comparing different versions, full-text search through the archive(s), and perhaps more. How exactly things will work is to be found out as we go!

]]>
<![CDATA[Keep all the gems]]>https://blog.webmemex.org/2017/07/10/keep-all-the-gems/5a80342259ee1b0001e4aebdMon, 10 Jul 2017 01:04:01 GMTKeep all the gems

TL;DR: Browser extension has been released, with as its first main feature: local web page preservation.

If my newspaper is important to me, I can put it in a drawer and read it again at any moment. However, if I find an important page on the web, storing it is never easy, and I remain dependent on others to provide it when I need it. If the original source stops serving it, it may simply be lost — almost as if a book would disappear when its author retires.

For growing a personal web — a digital library on your own computer — keeping hold of things you have seen is thus an essential feature. While the behaviour of interactive web pages (web apps) is often hard or even impossible to store, simply storing a page as I currently see it would already be great. No scripts, interactivity, or remote dependencies; just a static, preserved, freeze-dried web page.

While other WebMemex features are still in heavy development, freeze-drying pages works reasonably well now, and seems useful enough by itself to deliver it to people. So if you have a recent version of Firefox or Chromium/Chrome, go get it:

Install in Firefox
Install in Chromium/Chrome

Once installed, the extension provides a button for capturing the page:

Keep all the gems

The icon currently depends on your browser, because this project still needs a logo. (any graphic designers reading this?)

All pages you have stored are listed in the memory overview (tip: Ctrl+Y should open it), where you can search through them:

Keep all the gems

You can also search directly from the browser's location bar, by starting with an m:

Keep all the gems

Click an item in the memory overview to see the stored page, or export it as a single HTML file.

Keep all the gems

For now, that's about all it does. There is not much novelty yet (it is similar to e.g. SingleFile or Scrapbook), but it provides the basis for developing the really interesting features next: editing and creating pages, creating links to organise and browse your web by your associations, and sharing your personal web on the world wide web.

So, many things still to add, fix and improve. Any help with that — with code, design or communication — is always welcome!

]]>
<![CDATA[Progress update]]>https://blog.webmemex.org/2017/04/24/progress-update/5a80342259ee1b0001e4aebcMon, 24 Apr 2017 09:49:00 GMTProgress update

A summary of project progress over the last two months.

TL;DR: Many contributed features, now refocussing on core functionality, and unlicensing the code.

New features

With the help of many friendly and motivated contributors, a whole bunch of features and improvements have been realised:

  • The design of the overview got an overhaul (thanks Shivang!).
  • You can quickly run a search in the browser's location bar (or omnibar, awesomebar, or whatever browsers call it nowadays) (thanks Rohan!).
  • Search can be restricted to a chosen period in your memory (thanks Raj!).
  • PDFs you viewed are now also full-text searchable (thanks Anil!).
  • Small but pleasant, search shows an animation while busy (thanks Shivang!), and tells you explicitly if there are no results (thanks Chaitya!).
  • An options page has been added (thanks Aquib!), soon allowing you to configure extension behaviour.
  • A start was made on a heuristic deduplication system, to work around the transient nature of URLs and lack of versioning in web pages.
  • Code was organised and cleaned up, helped by using css-modules (thanks Aquib!) and code linters (thanks Shivang!).
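As an illustration of the kind of heuristic such a deduplication system could start with: normalising URLs before comparing them already catches many trivially different addresses for the same page. A toy sketch, with an illustrative (not the project's actual) list of parameters to strip:

```javascript
// Normalise a URL so that trivially different addresses of the same page
// compare equal. The list of tracking parameters to strip is illustrative.
const TRACKING_PARAMS = ['utm_source', 'utm_medium', 'utm_campaign', 'ref'];

function normalizeUrl(urlString) {
  const url = new URL(urlString);
  url.hash = '';                        // fragments address parts of one page
  for (const param of TRACKING_PARAMS) url.searchParams.delete(param);
  url.searchParams.sort();              // parameter order should not matter
  if (url.pathname.endsWith('/') && url.pathname !== '/') {
    url.pathname = url.pathname.slice(0, -1);  // trailing-slash variants
  }
  return url.href;
}

console.log(normalizeUrl('https://example.com/post/?utm_source=x#intro')
         === normalizeUrl('https://example.com/post'));
// → true
```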

One more thank-you is due to Oliver, who has been the main motivator getting all these people to help on this project, through his WorldBrain project.

Increasing focus

With the wave of enthusiastic contributors also came the lesson that reviewing and cleanly integrating multiple people's code can take a lot of effort, especially when these people are new to the project. I started to understand the need to develop fewer different things at a time, even when in theory others are doing a big part of the work. We have added a couple of features that are each nice to have, but we have also drifted away from the development roadmap.

My plan is now to direct focus more towards the core functionality, the essence of which I came to understand better: creating and owning your personal piece of web. That means expanding the possibilities of the browser, to enable you to save, create and edit pages and links, rather than just view them. And complementary to that, you need the ability to browse and search through your own web and to share parts of it with others.

Of course, all such functionality would really become great when it fits well into your existing workflows. For example, it could enable you to add your local files to your personal web, to import your browser's existing bookmarks, and a hundred other things you can imagine. We should definitely keep such features on the roadmap, but rather than tackling them all at the same time, it may be good to first get the basics working and get a functional extension out there.

Current development

So, back to working on the core. A few of the things that are being worked on at this moment:

  • Freeze-drying visited web pages to store them in your personal web.
  • Browsing through locally stored pages like any other web page.
  • Faster full-text search through your web.
  • Settings to choose which websites to remember automatically, and which to ignore.
  • The ability to edit your pages.
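As a rough illustration of how full-text search over stored pages can work (not necessarily how this extension implements it), an inverted index maps each word to the pages that contain it:

```javascript
// A toy inverted index: maps each word to the set of page ids containing it.
// Real full-text search adds stemming, ranking, and on-disk persistence.
function buildIndex(pages) {
  const index = new Map();
  for (const [id, text] of Object.entries(pages)) {
    for (const word of text.toLowerCase().match(/\w+/g) || []) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(id);
    }
  }
  return index;
}

// Return ids of pages containing every word of the query.
function search(index, query) {
  const sets = query.toLowerCase().split(/\s+/)
    .map(word => index.get(word) || new Set());
  return [...sets.reduce((a, b) => new Set([...a].filter(id => b.has(id))))];
}

const index = buildIndex({
  p1: 'The memex stores web pages',
  p2: 'Browsers forget pages quickly',
});
console.log(search(index, 'pages memex'));  // contains only 'p1'
```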

Besides this, many small things are being worked on to make a coherent whole, fix quirks and improve performance, in order to make a release soon. Stay tuned.

Public domain dedication

One more note to close with: all code in the project repo is now published under the Unlicense, which is a rough equivalent of the more succinct "do whatever the fuck you want with this code"-licence. Its purpose is to undo copyright, by explicitly waiving all such exclusive rights the authors may have been automatically granted.

Copyright is a vague, complex, internationally inconsistent, outdated system of laws. When effectively the only purpose of a licence (e.g. MIT/BSD/ISC) is to legally force people to mention your name when they reuse your work, it feels like overkill to pull in the whole bulky legal system. Attribution can be a matter of honour, not law.

Hopefully waiving copyrights helps code reuse. If you copy a few lines of code from this project, you can decide whether you consider it worth attributing, rather than having to work out what a dozen legal frameworks require of you. No need to end up rewriting code to be legally on the safe side, nor to manage and carry around a bunch of copyright notices for every snippet you borrowed from elsewhere.

So go clone our repo and grab whatever you like!

]]>
<![CDATA[Try our alpha-prerelease-0.0.1]]>

https://blog.webmemex.org/2017/02/21/try-our-alpha-prerelease-0-0-1/5a80342259ee1b0001e4aebbTue, 21 Feb 2017 13:52:28 GMTTry our alpha-prerelease-0.0.1

Release early and often, right? It is half-baked, undercooked, and somewhat sluggish, but here is the first (pre)release of the WebMemex browser extension. Consider it a teaser for what is to come.

What does it do?

It currently captures (the text of) web pages you visit, lets you search through them and reread them; all on your computer, even when offline. For example, to find again where the heck it was that you read about that topic before:

Try our alpha-prerelease-0.0.1

It also provides a first glimpse of the possibility to create links: you can select quotes in web pages and remember them:

Try our alpha-prerelease-0.0.1

Try it out

You are welcome to give it a spin already in Chromium or Chrome:

I want it.

Download the extension via the button above (ignore the security warning) and then drag the file onto your list of extensions (in the menu Tools → Extensions), as shown below. Then just open a new tab!

Try our alpha-prerelease-0.0.1

For convenience it may be made available in Google's Web Store too at some point, but it seems healthy to not depend completely on a single central authority.

Be warned that your knowledge stored using this prerelease may become unusable in a future release, as the data model is still changing.

What about Firefox?

Unfortunately and surprisingly, Firefox nowadays gives users even less freedom than Chromium/Chrome regarding extensions, as it requires every extension to be signed by Mozilla. Our extension is still awaiting their manual review.

Update: it got through review, find it here.

For jailbroken Firefox users, the unsigned add-on can be found here. Be warned that it is (even) less tested. Also, highlighting a quote in its context does not yet work, and due to browser differences the memory overview cannot show up in every new tab, so click the add-on's button (the green puzzle icon) or try pressing Ctrl+Y to open it.

]]>
<![CDATA[Thanks for the funds!]]>

https://blog.webmemex.org/2017/02/13/thanks-for-the-funds/5a80342259ee1b0001e4aebaMon, 13 Feb 2017 18:14:51 GMTThanks for the funds!

The kind SIDN fonds just donated €8k for the initial development of the WebMemex browser extension, to pursue the roadmap until May. A big thanks for believing in the mission!

Over the last few weeks, this development has been progressing quietly but steadily. Much of it has been technical structuring, such as creating a simple data model that represents pages separately from visits to those pages, as groundwork for the cool features to build next.
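That separation can be sketched as two record types, where many visit records point at one page record, so revisiting a page does not duplicate its stored content. The field names here are illustrative, not the extension's actual schema:

```javascript
// Pages and visits as separate records: many visits reference one page.
const pages = new Map();   // pageId → page data
const visits = [];         // chronological log of visit records

function recordVisit(url, content, timestamp) {
  // Naive identity: one page per URL. A smarter model would compare content.
  let pageId = [...pages.keys()].find(id => pages.get(id).url === url);
  if (pageId === undefined) {
    pageId = `page-${pages.size + 1}`;
    pages.set(pageId, { url, content });
  }
  visits.push({ pageId, timestamp });
  return pageId;
}

recordVisit('https://example.com/', '<html>…</html>', 1);
recordVisit('https://example.com/', '<html>…</html>', 2);
console.log(pages.size, visits.length);  // → 1 2
```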

On the project roadmap, we are slowly moving into phase two now. There are still many improvements to the browsing overview waiting, but it works just well enough to start thinking about the next stage: features that allow users to create notes, take quotes and snapshots from pages, and link things together to actively organise their knowledge.

Many forthcoming features have now been described in GitHub issues and project boards, to open up the design & development process and ease participation. Be welcome over there to follow the progress or contribute to it; there is plenty to do.

Some liveliness is already emanating from the deliberate collaboration with the people of the WorldBrain project. Their direction overlaps so much with ours that we decided to avoid inventing the wheel twice and build on the same code base instead. The plan starts with porting features from their existing extension, such as the ability to import browser bookmarks and history, so we will get the best of both projects combined.

As said, there is still a lot to do. At least, a bit of funding helps doing it.

]]>
<![CDATA[First code published]]>

https://blog.webmemex.org/2017/01/16/first-code-published/5a80342259ee1b0001e4aeb8Mon, 16 Jan 2017 10:09:00 GMTFirst code published

On Github, for now. Currently it logs the pages you visit, shows them in a list, and lets you filter them using a full-text index. Not much yet, but it's a start!

]]>
<![CDATA[Giving your browser a memory — the roadmap]]>https://blog.webmemex.org/2017/01/05/roadmap/5a80342259ee1b0001e4aeb7Thu, 05 Jan 2017 01:00:00 GMT

Web browsers suffer chronic amnesia. They excel at quickly accessing remote knowledge, but are of no use in keeping or managing it — bookmarks suck, browser history is a joke, hoarding dozens of tabs is grossly inadequate, and other tools are often too cumbersome. We handle much more information than our brains were designed for, so we need better ways to augment our intellect.

This project is about repurposing the browser towards its original purpose: managing information. The approach is to build a browser extension that lets you grow your own web of knowledge, with pages you have read, thoughts you wrote down and connections you made. From a user's perspective, it is a memory extension, giving you a digital memory that you can browse and search through with your associations, and that you can even share with others.

If you know me even a little, you probably heard this idea many times, and may have seen the experiments I made over the past year or so. The goal is now to take all the great feedback received and lessons learnt, and prototype a practical solution for organising our digital minds. Experimental still, but usable and useful, and hopefully inspiring for others in turn.

The project philosophy is to consider many of the desired features as separate projects, each adding some functionality to the browser, while together forming a coherent user experience. The plan is to start with quickly developing the bare essentials of each feature in the next few months, borrowing from existing work where available, and then improve them in parallel. The work will be grouped in three phases, each covering a group of features.

1. Overview & recall

The first phase aims to provide you a timeline overview of your memory, displaying the documents you have read and the paths you took to get to them, and letting you browse and search through them. So you can safely close those tabs and still find things again, and then recall not just one document, but its whole context.

In a sense, this overview will unify the functionality now provided by browser history, bookmarks (which are just highlights in your history), and even tabs (which are your most recent history). At some point, it could perhaps replace the browser interface altogether. For now, it will remain simpler, and will be split into four components:

  • a logger, that collects the browsing activity
  • an archiver to store the documents (because links rot)
  • an in-browser search engine to find relevant documents
  • the viewer that arranges and displays the overview

2. Create & organise

The next step is to enable you to create links and notes, so you can organise your memory and turn it into your personal web. A bit like a mind map of your memory, with your thoughts and associations.

You could say we are tightly integrating a private wiki into your browser, for easily jotting down your thoughts, and adding quotes from and links to your best reads. For the first prototype, the main features to develop are:

  • a hypertext note editor
  • drag&drop support to easily create links to (selections within) documents
  • integration into the overview interface

3. Share & sync

With the ability to create your personal web locally in your browser, the next step is to synchronise your web with your server (or service provider), so you can publish notes and links on the world wide web.

You can then also edit the same web on multiple devices, and perhaps even sync it with friends to form a collective memory. The main components in this phase are:

  • a web server
  • data synchronisation
  • a fancy viewer for browsing a published web

This roadmap may sound ambitious, and it is. But let's first start with building a proof of concept and testing how that works out. The plan is to cover phase one and push out a first browser extension already in one month, follow with phase two in March, and push through phase three in April/May. And after that? Let's see by then. :)

Great you made it (or skipped?) down to the end. This write-up sketched the project direction, and is to be the first of a stream of regular updates, initiated to involve all of you who would like to follow the progress, be among the first test-users, and perhaps contribute to its development. So feel free to subscribe to updates, and do not hesitate to leave a message or pop in on IRC if you would like to share any expertise, code, designs, or ideas!

Now, let's code.

]]>