Saturday, February 11, 2012

On Map Size (or: I accidentally giant worlds)

When it comes to deciding how large you want your game world to be, the Roguelike format offers, as in so many areas, an almost overwhelming selection of choices.
Historically, Rogue and it's early descendants such as (Net)Hack confined their maps to the typical terminal screen size of 80x20, whereas many of the more recent RL's offer scrolling maps, whose size may vary from a few terminal sizes to technically infinite.

The advantage of having a small map of 80x20 is obvious. You can generate a level on the fly with litte or no delay, you can store the entire level to file and retrieve it later and you don't need much memory on runtime. It's also a very simple and straightforward implementation, and pretty much sufficient for a reasonable dungeon level.

The next step would be a scrolling dungeon map of, say, 150x100. A fair bit more to explore per level, but still condensed enough to be fun. Angband is a good example of this. The maps are commonly still very much run of the mill dungeon maps (a bit streched out), but the programmer gains a fair amount of flexibility at the cost of only slightly more complex code. For example: If you don't limit your map size to the size of the "viewport", the actual area of map you see on screen at any time, you can free up screen real estate for the UI by reducing the viewport size. If you keep with the practice of generating the entire level map when the player first enters the dungeon level, all that's required for this is some tweaks to the render() call to only render the stuff that's in the viewport.

This approach of "front-loading" the map generation, keeping the entirety of it in memory at all times and rendering only the visible tiles has the advantage of being blazing fast at runtime (since there are no time-consuming CPU or HDD I/O operations), but there is of course a limit to it - the memory size.



Arguably, if you could limit memory usage per tile to a byte, on a modern system with, say, 4 GB of RAM that's a whopping 63245x63245 tiles until your memory is full. But that's pretty unrealistic:
Firstly, consider the scenario in which you'd need a single large map. First thing that comes to mind would be a large, hopefully diverse and interesting world with different biomes and geographical features etc.. Obviously, if the world is flat and has only one z-level, you ain't gonna get any geographical features whatsoever. Also consider that with storing information relating to biomes, temperature and humidity, you'll likely need more than a byte for every tile, depending on how you choose to store your information.
Assuming this more realistic scenario of taking maybe 10 bytes per tile and having about 100 z-levels, you end up filling the 4GB RAM with just 2000x2000x100 tiles. 2000 tiles on each side of the world is... not that impressive.
Secondly, if you want a persistent world (and I assume you would, because, hey, whats the point of making a giant map if you don't get to explore it properly?) and you just write the memory into a file when saving, you'd be greedily claiming a fairly sizeable chunk of your user's hard drive (Okay, you could compress it, but then it's going to take even longer to save and load).

So, how CAN you handle truly large worlds? 

Disclaimer: The following is a description of how I handle the pseudo-infinite maps in SH2RL. It took me about a week of tinkering to come with this system, so I claim in no way that this is the best or even a very good system. But it's pretty robust so far and build to scale nicely (upwards).

Chop it up! By dividing the map into smaller chunks or "cells" (as I've chosen to call them in my code), you limit the amount of data in memory to what you actually need.
In SH2RL, the player is always inside a cell, which is in turn surrounded by another cell in every direction. That means that 27 (3x3x3) cells total are loaded. Currently the cell size is 200x200x20, that means that an area of 600x600x60 tiles is present in memory at any given time. I currently use a ushort (or UInt16) structure to represent each tile (in lieu of any biomes/djikstra maps/etc), which is 16 bit or 2 bytes, so total memory usage for tiles only comes to ~43 MB. That's pretty reasonable.


Now on to the generation. Generating an infinite world on startup would, by definition, take an infinite amount of time (and disk space). So, not really practical. BUT - You've probably heard the term "map seed" associated with Roguelikes (or Minecraft). While they are also used with the approach I explained above, they really shine with infinite worlds.
A map seed is commonly the seed used to initialize the PRNG (Pseudo-Random Number Generator). The PRNG is an integral part of any map generator. It assigns a seemingly random number to a set of coordinates, based on it’s seed. This association is consistant, and that’s the really important thing. If I give the PRNG a set of coordinates and a fixed seed, it will always return the same number for the same coordinates. The PRNG is also used for noise generation (Perlin or Simplex), which is also an important component in map generation. For SH2RL I use the libtcod default PRNG, a Complementary-Multiply-With-Carry generator, which is faster and “more random” (trust me, I’m a professional *cough*) than the old favorite Mersenne Twister-PRNGs.

As I said above, the player is always surrounded by cells. As soon as he enters a different one than he was in the previous turn, the Map object (which holds everything currently in memory) shifts the loaded cells in the appropriate direction and loads new ones if necessary.
The “skeleton” of the cells, the tilemaps, i.e. the actual tile-by-tile maps of the cells are then generated on the fly. Every tile is defined exactly by its coordinates in the 3D-space and for every tile, the PRNG (or more accurately, the noise function) generates a number from the coordinates and the map seed, which then defines the characteristics of the tile.
Therefore, the program creates the exact same cell every time it is loaded/generated again, which means I don’t have to save anything but the map seed (and the cell coordinates), and still have a persistent world to explore.

However, this system alone has an imporant limitation. It doesn’t allow the player to change tiles, because once the cell is unloaded and (re)generated again, any previous changes are just ignored. So, to allow for advanced player-map interaction, changes to tiles are stored in the programs/profiles SQLite DB, retrieved after the cell’s generation when they override the generated tiles.

This approach of keeping only the necessary tiles in memory, generating the bulk of a cell on demand and storing only the changes is the database is memory-friendly, consumes a minimum of hard drive space and delegates most of the work to the CPU. To reduce the amount of lag when entering a new cell, the cell generation is multithreaded and asynchronous, which means that for ~5-7 seconds after entering a new cell, movement/rendering is a bit stuttered (up to 150-200ms/frame from ~35), but there is no huge “lockup” every time a new cell is entered.
It still lacks some security features (because it’s theoretically possible to “see” a tile that hasn’t been loaded yet, it sometimes crashes the program), and it’s still rather slow on older single-core systems, but it works for now.



Thursday, January 26, 2012

Improving cell loading times by being not stupid

A very important function for every roguelike is the one that retrieves the character/fore color/back color of a tile, which could be saved in a large instance of a Tile-class, or, in the case of SH2RL, in a byte which represents a certain large instance of a Tile-class  ("conversion" is handled with a Byte - Tile dictionary). This function is called several thousand times in the rendering pipeline. In SH2RL's case with a default viewport of 100*80, a whopping 6076 times! (I actually had to reduce the viewport's size, because with 150*80 and 12000 calls the rendering took more than 100ms, which was noticeably "laggy").

While implementing the "GetTileFromCell"-function the first time I just hacked in the first thing that I thought of:

1. Retrieve the unique ID of the cell at the given absolute coordinates (abs_x, abs_y, abs_z)
2. Loop through the three-dimensional array of cells currently loaded and look for an ID match
3. Once the match is found, call the cells "GetTile"-function and return it's value

Now, when I tested the "game" with this function, the following occurred:



Notice that the drawing time was ~62ms if the player was in the upper left corner of the "beige" cell (which is pretty damn long already) but with ~114ms almost 100% higher if the player was in the lower right corner of the "light green" cell, and consequently unacceptably long.
I immediately suspected my GetTileFromCell-function was to blame for this wierd discrepancy in drawing time, because it was basicially the only function called in the rendering process (except for the actual printing function).
This is how the function looked:

private Tile getTileFromCells(int abs_x, int abs_y, int abs_z)
{
    Cell relevantCell;
    int relCellID;


    relCellID = wm.GetCellIDFromCoordinates(abs_x, abs_y, abs_z);


    for (int x = 0; x < 3; x++)
    {
        for (int y = 0; y < 3; y++)
        {
            for (int z = 0; z < 3; z++)
            {
                if (cells[x, y, z].CellID == relCellID)
                {
                    relevantCell = cells[x, y, z];

                    return relevantCell.GetTile(abs_x, abs_y, abs_z);
                }
            }
        }
    }

    return null;
}

As you can see, if the coordinates in question were, say, in the cell at [1,1,1], this function would have to loop though 13 other cells first. Also, the "GetCellIDFromCoordinates" probably took some time as well. Normally, this would not be a problem and the function totally sufficient (if ineffective), but if you call it several thousand times per second, every bit of performance counts.

After some thinking (and facepalming) I came up with this function:

private Tile getTileFromCells(int abs_x, int abs_y, int abs_z)
{
    int x = 0, y = 0, z = 0;
    x = (int)((abs_x - cells[0, 0, 0].X) / WorldMap.CELL_WIDTH);
    y = (int)((abs_y - cells[0, 0, 0].Y) / WorldMap.CELL_HEIGHT);
    z = (int)((abs_z - cells[0, 0, 0].Z) / WorldMap.CELL_DEPTH);

    return cells[x, y, z].GetTile(abs_x, abs_y, abs_z);
                     
}        

The solution was really damn simple (As is often the case when you re-evaluate your code. When you actually write it, it's always the most complicated shit you come up with). All it does is substract the absolute coordinates of the leftmost (X), uppermost (Y), lowest (Z) cell from the requested absolute coordinates (which results in the coordinates relative to the current Map) and then divides the results by the width/height/depth of a cell. This produces the index of the cell holding the tile at the requested coordinates. After implementing this, the performance has doubled (for top left of cell: 60ms -> 30ms) to quadrupled (for bottom right of cell 120ms -> 30ms) and is no longer dependant on the players position.

SUCCESS!





First time I felt the need to do an awkward victory dance, which is not documented here. Probably for the better.