The Myst Graph is generated by DeMystify, now an open source Go program: github.com/glthr/DeMystify. This third article focuses on the implementation details and shares suggestions for further enhancement. As the article evolves, it will be updated based on reader feedback and any questions that require clarification. (An email address is shared for this purpose at the end of the article.)

Converting the Original HyperCard Files

My initial attempt to implement a parser for HyperCard files proved more challenging than expected. Creating a parser based on the unofficial file format standard (and its helpful addendum) would have been the best approach but would have required a significant effort for a single use—the Myst HyperCard files only need to be converted once, after all.

Fortunately, stackimport, a C++ library, demonstrated its ability to reliably convert HyperCard files into XML (and PBM) files. DeMystify leverages this ability and therefore requires converting the original files beforehand into XML files. The path to these files must be provided as a command-line argument. I do not provide these files in the repository as they may be copyrighted: converting the original source code is left as an exercise to the reader.

There is an opportunity for improvement here and an interesting challenge for contributors. The conversion process could be further streamlined by creating a Go-based HyperCard converter library that can generate files or even data structures directly usable by other programs.

Parsing the Converted HyperCard Files

Stacks

Demystify parser package translates these XML files into a comprehensive “proto graph” representation of HyperCard objects and their relationships. The parser logic begins by processing each stack, resulting in a slice of stack objects that contain essential information such as the stack’s name (i.e., its Age), processed HyperTalk scripts, and a mapping of card IDs to their corresponding names (if applicable). Each stack is also associated with a list of cards that are part of it. To illustrate this concept, let’s take a look at an example:

. . .
<card id="8277" file="card_8277.xml" marked="false" name="" owner="2702" />
<card id="8776" file="card_8776.xml" marked="true" name="qbow" owner="2702" />
. . .

In this example, two cards, 8277 and 8776, are part of the stack. One is marked; another is not (this information is not used for the Myst Graph). And 8776 was named “qbow” by its creator. Because the card files do not contain names, the mapping 8776 → “qbow” needs to be provided to the cards parser.

Cards

After the stacks, the cards are processed. The goal is to generate a card object that contains a card ID, a reference to the stack to which the card belongs, its original name (if any) by using—as just explained—the IDs-names map populated during the analysis of the stacks, the parsed scripts, and the backgrounds—corresponding to the name of the image featured by the card. The image name is consistently present in the <content> tag having ID 1. For instance, the asset embedded in the following card is “Caldera 4-N”:

. . .
<content>
    <layer>background</layer>
    <id>1</id>
    <text>Caldera 4-N</text>
</content>
<content>
    <layer>background</layer>
    <id>7</id>
    <text>card id 31046</text>
</content>

Furthermore, the card is internally named following the pattern “{stack}:{ID}”, where “{stack}” represents the ID of the stack containing the card, and “{ID}” is the unique identifier of the card, as explained in the first article.

The preprocessing of HyperTalk scripts is, at this stage, minimal. The scripts, in the <script></script> tags, are split by line breaks. Similarly to SQL, scripts are commented out with --. When a line is prefixed this way, the command is identified as being disabled.

References to Other Cards

Subsequently, the relationships between stacks and cards, and between cards, are inferred from the analysis of the scripts. Currently, this analysis relies on regex pattern matching to identify the links between elements. As explained in the previous articles, this analysis is not granular: it does not consider the gameplay or the scripts as a whole. It suffices for a card to point to another to identify a link between them.

HyperTalk provides a convenient mechanism for determining when a card references another card. When they belong to the same stack, the script contains a go to card id {ID} (where “to” is optional). When a card refers to another stack, the pattern is go to card id {ID} of stack “{stack name}” or go to card “{card name}” of stack “{stack name}”.

There is also a more subtle push/pop card mechanism. When a card is pushed onto the stack (push command), it is stored in memory as a reference. A pop operation is, at least the way it has been used for Myst, equivalent to a go to card id {ID}, with the ID of the card that had been stored in memory. This duality characterizes two classes of relationships.

When the first card is pushed (push card), the second card just backtracks to the first.

However, when the first card pushes another card (push card id {ID} of stack “{stack name}”), a more complex transitive relationship emerges, making the player navigate from the first card to the pushed card through an intermediate card (which “pops” to the target card).

For instance, card 100894 from Channelwood Age contains this script:

. . .
push card id 46439 of stack "Myst"
go to card id 44018 of stack "Myst"
. . .

It pushes card 46439 from a different stack, then opens card 44018 belonging to the Myst Island.

The latter embeds the following script:

on mouseUp
htlock "nobw"
htlock true
vs up
pop card
end mouseUp

It just pops itself (line 5). But instead of backtracking to Channelwood Age, it goes to the card on the top of the stack: card 44018. Therefore, the path becomes:

Channelwood:100894 (push Myst:46439; go to Myst:44018) → Myst:44018 (intermediary card: library up; pop) → Myst:46439 (bookshelf).

The script analyzer also identifies the cards that feature pages using the following patterns:

put "{digit},A,0" into ALL_Page for blue pages (A is for Achenar)
put "{digit},S,0" into ALL_Page for red pages (S is for Sirrus)
put "Atrus" into ALL_Page for the white page

Last, if a card refers to a card that cannot be found, a new virtual card is created. This just means that the virtual card existed up until some point in the stack and was then deleted for the release of the game.

This initial approach may appear rudimentary but is well suited for the HyperTalk grammar and the way the Myst creators have used it. A welcome evolution would be to make the analysis more comprehensive to detect additional associations (typically capturing the navigation from page to page in a book, handled by image substitution). Also, perhaps, encode the gameplay constraints into the edges.

First Layer of Abstraction (or: Creation of the Proto Graph)

The last action of the parser is to transform stacks and cards into node objects that contain attributes (stack or card; contains colored pages; virtual) and edge objects, also containing attributes (intra-Age or cross-Ages; disabled; self-reference; etc.) and pointing to the two nodes they link. At this point, the resulting structure is not yet a proper graph but rather a preliminary representation of the data that will eventually be used to generate an actual graph.

Note that the parser ensures that the edges are unique. A card can refer several times to another card (i.e., clicking on different parts of the first card leads to the same target card). I decided, for simplification purposes, to merge edges having the same attributes and directionality. If DeMystify is updated to encode the gameplay into the edges, then it would make sense to unmerge these edges.

Also, the parser translates the properties of the cards and links into node and edge attributes. For a node, the base attribute is either stack or card. Then, if it contains a page, an additional attribute is the color of the page (can be cumulative: both blue and red). Edges can be disabled (they initially existed, but the developers disabled it by commenting it out in the script), notImplemented (the link connects to a node that refers to a card that does not exist in the stack), intraAge or crossAge (depending on whether they refer or not to a node from a different Age). An edge can have special attributes, such as a backtracking attribute (A to B, then B to A). Suppose the edge is restrictively transitive (see the figure on the right of the Transitivity section of article 1) because the transitivity relationship includes three nodes (A to B to C). In that case, the parser distinguishes between the “tail” of the transitivity and its “head”. The tail is the edge between the initial node and the intermediary node (A to B); the head is the edge between the intermediary node and the target node (B to C). This distinction is helpful for the rendering of the graph (no arrowhead on the tail section of the transitive edge).

Once this process is done, the parser returns an object containing a “bag” of nodes and edges. The next step involves transforming these components into a graph.

Second Layer of Abstraction: Creation of the Myst Graph

DeMystify leverages the Gonum library to create an instance of a graph. Due to the directionality of the edges, a directed graph is necessary. Moreover, since multiple edges can coexist between two nodes for a specific direction (even when identical edges are merged, they may differ in attributes) and self-loops should also be supported, a multigraph is required. Additionally, I had to consider backtracking edges, which should be excluded during distance calculations. To accomplish this, I utilized weighted edges with regular edges assigned a value of 1 and backtracking edges given an extremely large positive value (positive infinity).

To satisfy these constraints, after evaluating alternative solutions, I determined that multi.WeightedDirectedGraph from Gonum was the most suitable choice. This graph type effectively accommodates both directed and multigraph structures, as well as weighted edges with distinct properties.

Creating the weighted directed multigraph is straightforward, thanks to the parser’s preprocessing. Nodes are added as-is, while edges default to a weight of 1. On the other hand, backtracking edges are assigned to the math.Inf(1) value to distinguish them from regular edges. Disabled edges are skipped altogether and stored in a map for future rendering.

With the graph now created, we can analyze it.

Three types of analyses are performed: nodes, components, and paths.

Node analysis. The node analyzer identifies key nodes based on their in-degree and out-degree. It, therefore, identifies nodes with the most incoming edges (by iterating over the nodes, counting the number of in-degree edges, then comparing with the current maximum), with the most outgoing edges, source nodes (no incoming edges and a nonzero number of outgoing edges), sink nodes (no outgoing edges and a nonzero number of incoming edges), and isolated nodes (no outgoing nor incoming edges).
Components analysis. This analysis aims to identify the graph components (that is, subgraphs not connected to others). The general idea is to iterate over all nodes and, for each node, traverse (using BFS) the subgraph to which it belongs. By keeping a “map” (in fact, a slice, for making the outcome of the process deterministic) associating the subgraphs with the nodes that compose them, it is possible to identify the components.
Path analysis. This analysis examines two types of paths: the shortest paths and the longest shortest paths (i.e., most separated nodes). For the computation of the shortest path, Gonum provides a valuable resource: dijkstraFrom: “DijkstraFrom returns a shortest-path tree for a shortest path from u to all nodes in the graph g. If the graph does not implement Weighted, UniformCost is used”. This function uses the Dijkstra algorithm under the hood, taking into consideration the weighted edge values in the graph. As a result, backtracking edges are de facto excluded.

When implementing DeMystify, I experimented with other conceptual tools, such as the identification of bridges. But for the purpose of the initial version of the Myst Graph, I found that focusing on just these three dimensions would provide interesting information without making it too complex. It provides a way to categorize the nodes from a graph theory perspective, identify the isolated subgraphs, and compute varieties of paths. I would be very interested in the readers’ findings in addition to these dimensions.

At this point, the Graph Myst is analyzed. It is now time to visually render it.

Rendering the Graph

At the start of this project, I wanted to use the DOT language due to its “natural” affinity for representing graphs. Initially, I used the Gonum DOT library. Then, when things became too complex for this library, I assessed other libraries. Unfortunately, none of them were flexible enough to support the requirements: I had no choice but to develop a custom encoding mechanism using string manipulation techniques. This aspect of DeMystify lacks elegance and was a source of frustration during its implementation.

From the generated DOT object, I wanted to keep things simple for the first graph version, so I decided to generate a static file. At first, I tried generating PNG images, but that quickly became impractical because of the sheer size of the graph. Creating a PDF file proved to be a good solution, also having the benefit of being searchable. At first, I did it manually, using the DOT file as input. After experimenting with different CLI tools, I found that Neato generated an acceptable output. As for generating the DOT object, I wanted to use a library to generate the PDF à la Neato. Unfortunately, I did not find any: consequently and unfortunately, DeMystify must externally call the Neato binary.

There are several areas of improvement. First and ideally, the rendering process should be refactored to use libraries. This process would be unglamorous, as the behavior would remain unnoticeable, make DeMystify more robust. Second, it would be interesting to generate a dynamic graph. From a visual perspective, I would like to view alternative renderings from contributors. On Hacker News, a commenter suggested rendering the actual images as nodes. While I (really) like this suggestion, I wonder how feasible that would be, particularly from a legal perspective (Myst not being open source).

Conclusion

As it stands, the initial prototyping phase focused on quickly demonstrating a working graph representation, though significant improvements are needed. Currently, the project lacks unit tests and relies on external dependencies (manual HyperCard file conversion and binary calls), generating a static graph instead of a dynamic one, etc.

Because this is a personal project with limited time commitment, I am actively seeking contributions from the open-source community. Your expertise and passion could dramatically improve DeMystify. I believe that with collaborative effort, DeMystify can truly reach its full potential. A fourth article suggests possible directions for the project.

Please contact me at myst.graph@glthr.com to discuss potential collaborations, ask about implementation questions, or share your own generated graph (I would reference it in the fourth article with your permission). I am excited to see how the community will shape and evolve this project: feel free to fork the repository and contribute your ideas.

The Myst Graph, 3: Creating the Graph Using DeMystify