How to Read vgagrX.dat files and groundXo.dat files
---------------------------------------------------
document by ccexplore
clarification on 4-bit bitmaps by Simon

This document assumes that you know about the general .DAT decompression algorithm. The vgagrX.dat files are compressed with that, so the first step you'd want to take is to decompress them. You will end up with 2 sections of decompressed data. The groundXo.dat files are not compressed.

groundXo.dat
------------

This file is vital to interpreting vgagrX.dat, since groundXo.dat describes the characteristics of the bitmaps contained in vgagrX.dat.

You will note that all groundXo.dat files have the same size. They all follow this organization:

  OBJECT_INFO[16]
  TERRAIN_INFO[64]
  PALETTES

In other words, first come 16 slots for describing 16 objects. The object IDs in the .LVL file really refer to this array. Then come 64 slots for describing 64 terrain pieces. Again, the terrain IDs in the .LVL file really refer to this array. Finally comes a bunch of palettes. I'll now describe each chunk in detail:

OBJECT_INFO
-----------

An OBJECT_INFO slot is 28 bytes in size, so the OBJECT_INFO array takes up a total of 28*16 = 448 bytes. The structure of a single OBJECT_INFO is as follows:

  WORD animation_flags;
  BYTE start_animation_frame_index;
  BYTE end_animation_frame_index;
  BYTE width;
  BYTE height;
  WORD animation_frame_data_size;
  WORD mask_offset_from_image;
  WORD unknown1;
  WORD unknown2;
  WORD trigger_left;
  WORD trigger_top;
  BYTE trigger_width;
  BYTE trigger_height;
  BYTE trigger_effect_id;
  WORD animation_frames_base_loc;
  WORD preview_image_index;
  WORD unknown3;
  BYTE trap_sound_effect_id;

First of all, a word on WORD and BYTE in groundXo.dat. Unlike the .LVL file format, WORDs in groundXo.dat are stored little-endian. This means that the lower byte is stored before the upper byte, the reverse of the way you'd normally write down the value in binary or hex.
For example, the decimal value 4660 when written in hex is 0x1234. When stored as 2 bytes in little-endian format, the 0x34 byte comes first, followed by 0x12. This little-endian format is native to the Intel instruction set, so if you read the bytes in this order directly into a 16-bit int variable on such a machine, the int variable will contain the correct value.

So, an explanation of each field. Actually I haven't verified many of them, so some of the information below might be incomplete or inaccurate, but anyhow:

width, height, animation_frames_base_loc, animation_frame_data_size, mask_offset_from_image, start_animation_frame_index, end_animation_frame_index, preview_image_index:

I have to explain all these fields at once because they are interrelated. Width and height of course are the width and height of a single animation frame bitmap.

animation_frames_base_loc is the offset from the start of the data section in vgagrX.dat which contains the bitmaps for objects. Remember that the object bitmaps are always in the second data section of vgagrX.dat. animation_frames_base_loc points to the start of all the bitmap data for the object in question.

animation_frame_data_size is how many bytes an individual animation frame bitmap takes. To be more precise, it specifies the offset in bytes from one animation frame bitmap to the next one. In other words:

  frame #0 is the animation_frame_data_size bytes starting at animation_frames_base_loc
  frame #1 is the animation_frame_data_size bytes starting at animation_frames_base_loc + 1 * animation_frame_data_size
  frame #2 is the animation_frame_data_size bytes starting at animation_frames_base_loc + 2 * animation_frame_data_size

And so forth.

start_animation_frame_index refers to which frame to start off the animation from, and end_animation_frame_index refers to the last animation frame of the object.
Animation frames are numbered from 0 up, so end_animation_frame_index in effect tells us how many animation frames total there are for this object. For continuously looping animation, when you get to the last frame of animation it goes back to animation frame #0. For triggered animations, when you get to the last frame of animation, on the next frame it will display the start_animation_frame_index-th frame (I think; or maybe it's just frame #0) and animation stops there until it's triggered again.

preview_image_index points to the animation frame used when rendering the level preview you see at the level info screen (the one that also tells you the title, how many out and % to save) right before the game starts the level. (It is a full-size animation frame despite being used for the level preview; the level preview shrinks everything down automatically.)

Finally, mask_offset_from_image. Each frame of animation really consists of 2 bitmaps. One is the 16-color bitmap you expect. However, in order to represent the concept of transparent pixels, a separate monochrome mask bitmap is also necessary. The mask bitmap specifies which pixels are really the object's pixels, and which pixels are instead the background's pixels and so should be transparent. Each 1 bit in the mask stands for a solid pixel, and each 0 means transparent background.

mask_offset_from_image tells you the offset from the start of whatever animation frame you are in to the corresponding mask. So for example, for frame #0, the 16-color bitmap is located at animation_frames_base_loc, so the corresponding mask for frame #0 is at animation_frames_base_loc + mask_offset_from_image. Similarly, for frame #5, the 16-color bitmap is located at animation_frames_base_loc + 5 * animation_frame_data_size, so the corresponding mask bitmap for frame #5 is at animation_frames_base_loc + 5 * animation_frame_data_size + mask_offset_from_image.
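
To make the offset arithmetic above concrete, here is a minimal Python sketch that unpacks one OBJECT_INFO slot and computes where a given frame's 16-color bitmap and mask start. The names ObjectInfo, read_object_info, frame_image_offset and frame_mask_offset are made up for this illustration and do not come from the game or any tool; the field order is simply the listing given earlier, read as little-endian with no padding.

```python
import struct
from collections import namedtuple

# Field order follows the OBJECT_INFO listing; "<" selects little-endian
# with no padding, so one record is exactly 28 bytes.
OBJECT_INFO_FORMAT = "<HBBBBHHHHHHBBBHHHB"
OBJECT_INFO_SIZE = struct.calcsize(OBJECT_INFO_FORMAT)

# Hypothetical container type for this sketch.
ObjectInfo = namedtuple("ObjectInfo", [
    "animation_flags",
    "start_animation_frame_index",
    "end_animation_frame_index",
    "width",
    "height",
    "animation_frame_data_size",
    "mask_offset_from_image",
    "unknown1",
    "unknown2",
    "trigger_left",
    "trigger_top",
    "trigger_width",
    "trigger_height",
    "trigger_effect_id",
    "animation_frames_base_loc",
    "preview_image_index",
    "unknown3",
    "trap_sound_effect_id",
])


def read_object_info(ground_data, slot):
    """Unpack OBJECT_INFO slot `slot` (0..15) from the bytes of groundXo.dat."""
    fields = struct.unpack_from(
        OBJECT_INFO_FORMAT, ground_data, slot * OBJECT_INFO_SIZE)
    return ObjectInfo._make(fields)


def frame_image_offset(info, frame):
    """Offset of the 16-color bitmap of `frame` in the object data section."""
    return info.animation_frames_base_loc + frame * info.animation_frame_data_size


def frame_mask_offset(info, frame):
    """Offset of the mask bitmap that belongs to `frame`."""
    return frame_image_offset(info, frame) + info.mask_offset_from_image
```

Both offsets are relative to the start of the second decompressed section of vgagrX.dat, as described above. A slot whose width or height is 0 is most likely unused and can be skipped.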

With these fields, you now have more than enough information to display the objects of a level for level-editor purposes. The remaining fields contain additional information the game uses:

animation_flags:

Most of this WORD seems to be unused, and I _think_ the unused bits are all 0s. Bit 0 appears to be set to 0 for animations that loop continuously (e.g. water, the coal pits, the spinning trap of death), and to 1 for animations that only play when the object is "triggered" in some fashion (e.g. most of the traps). I'm not sure what it does for objects that don't animate, but I suppose objects that don't animate can be interpreted as a continuous loop animation with a single animation frame. I think bit 1 might also do something but I'm not sure.

trigger_left, trigger_top, trigger_width, trigger_height, trigger_effect_id:

These parameters describe the rectangular trigger area for the object and the trigger effect. The trigger effect of the object only takes place when the lemming is within the rectangular trigger area. Trigger effects can be things like "exit", "drowned", or "death-by-trap". You'd notice for example that to exit, it is not enough for the lemming to merely reach the edges of the exit bitmap; he has to go to somewhere in the middle before the "exit" effect takes place. The trigger area of the exit object determines where within the object bitmap the effect of the object takes place.

That being said, beware that the numbers are not stored directly. If I use left, top, width and height to denote the true location and dimensions of the trigger area, this is how they relate:

  left = trigger_left * 4
  top = trigger_top * 4 - 4
  width = trigger_width * 4
  height = trigger_height * 4

At least I think this is how they work, although I'm slightly unsure about the "- 4" in the formula for top. Note that the location (left, top) is relative to the position of the top-left corner of the object's bitmap.
Actually, to be more precise, it is relative to the position of the top-left corner of the object's bitmap with the x and y coordinates of the object's location rounded down to the closest multiple of 4. For objects the x coordinate needs to be a multiple of 8 anyway, but you can specify any y coordinate. This is why in LemEdit you'll find that sometimes the exits "don't work" unless you lower their position--this happens if Y is not a multiple of 4. The problem stems from the fact that the game uses a map with a resolution of only 4 pixels to keep track of the various trigger areas.

One odd thing about the trigger area is that you cannot actually specify a trigger_width or trigger_height of 0. It appears that a 0 in those fields is actually treated as a value of 256. Instead, if you want to specify a null trigger area, you specify a small trigger area (say 4x4), and set the trigger_effect_id to a value that corresponds to "no effect". Indeed, for all objects that are merely decorative, the values are always set as follows:

  trigger_left = 0
  trigger_top = 0
  trigger_width = 1
  trigger_height = 1
  trigger_effect_id = 0

Finally, the trigger_effect_id. This indicates the effect imposed on the lemming who moves into the trigger area of the object. It can have the following values:

  0: no effect
  1: exits level
  4: triggered trap
  5: drown
  6: immediate disintegration
  7: one-way wall, left
  8: one-way wall, right
  9: steel area (not actually used by any objects)

"triggered trap" refers to the class of traps where the trap does not animate continuously, but rather only animates when a lemming triggers it. This includes things like the beartrap, the 10-ton trap, etc. The key characteristic of this kind of trap is that while the trap animation is playing, other lemmings can pass through the trap unharmed. Also of note is that the trap animation graphics actually include the lemming itself. So in fact the game immediately removes the killed lemming from the game when the trap is triggered.
The lemming you see while the death animation is playing is actually not the real one, but just part of the animation graphics of the object.

"immediate disintegration" refers to the other type of trap, where the lemming "disintegrates" upon touching the trigger area. The main differences are that the death animation is not part of the object animation, and that more than 1 lemming can be killed by the trap at once. This type of trap includes things like the spinning-trap-of-death in the "pink" graphics set, the fire-shooting traps in the "hell" graphics set, the coal pit trap in the "hell" graphics set, etc.

"drown" of course refers to the water and lava traps, where upon touching the trigger area, the lemming is drowned. The logical effect is basically the same as "immediate disintegration", just that the death animation is different.

You might note that there isn't an "entrance" effect. What gives? Actually if you think about it, entrances don't follow the "trigger effect" model. Entrances don't take effect by having a lemming go through them. So in fact the game hard-codes entrances to always be the first object id in any graphics set, and the location within the entrance bitmap where the lemmings come out from is also hardcoded (but unfortunately I don't have the values at hand). In the graphics sets the game supplies, the entrance object's trigger_effect_id is set to 0.

trap_sound_effect_id:

This describes the sound effect played when a trap of trigger_effect_id = 4 is triggered. It does not apply to other trigger_effect_id's: the game takes over for objects whose trigger_effect_id is not 4 and so you don't get to select the sound effect.
Here is a list of values (in hex) of the sound effects recognized by PC Lemmings/CustLemm:

  00 = no sound
  01 = skill select (the sound you get when you click on one of the skill icons at the bottom of the screen)
  02 = entrance opening (sounds like "boing")
  03 = level intro (the "let's go" sound)
  04 = the sound you get when you assign a skill to a lemming
  05 = the "oh no" sound when a lemming is about to explode
  06 = sound effect of the electrode trap and zap trap
  07 = sound effect of the rock squishing trap, pillar squishing trap, and spikes trap
  08 = the "aargh" sound when the lemming falls down too far and splatters
  09 = sound effect of the rope trap and slicer trap
  0A = sound effect when a basher/miner/digger hits steel
  0B = (not sure where used in game)
  0C = sound effect of a lemming explosion
  0D = sound effect of the spinning-trap-of-death, coal pits, and fire shooters (when a lemming touches the object and dies)
  0E = sound effect of the 10-ton trap
  0F = sound effect of the bear trap
  10 = sound effect of a lemming exiting
  11 = sound effect of a lemming dropping into water and drowning
  12 = sound effect for the last 3 bricks a builder is laying down

And finally there are the "unknownX" words. I don't believe they are used in the EGA/VGA versions of the game, but they might be involved in the Tandy and CGA versions of the game. So I didn't really bother investigating those fields, but I've noted that:

  unknown1 always seems to equal mask_offset_from_image
  unknown2 always seems to equal unknown1 / 2

Not all 16 slots of OBJECT_INFO are used. An unused slot is set to all 0s. You can probably detect an unused slot by checking whether the bitmap width or height is 0.

TERRAIN_INFO
------------

A TERRAIN_INFO is 8 bytes, so the TERRAIN_INFO array takes up a total of 8*64 = 512 bytes.
The structure of a TERRAIN_INFO slot is as follows:

  BYTE width;
  BYTE height;
  WORD image_loc;
  WORD mask_loc;
  WORD unknown1;

Thankfully, since terrain pieces are basically just static bitmaps, there is far less metadata associated with them compared with OBJECT_INFO.

width and height should be self-explanatory. image_loc and mask_loc refer to offsets from the start of the data section. The data section in question is the first one in vgagrX.dat (recall that the second one contains the objects' bitmaps). At offset image_loc is the 16-color bitmap for this terrain piece, and at offset mask_loc is the mask bitmap for this terrain piece.

unknown1 is probably related to the CGA/Tandy graphics which we don't care about.

Again, not all 64 slots of TERRAIN_INFO are used. An unused slot is set to all 0s, and you can probably detect an unused slot by checking whether width or height is 0.

PALETTES
--------

Because the game runs in a 320x200x16 color graphics mode, to get decent graphics it needs to modify the palette to a custom set of colors tailored for the graphics set in question. For example, to get all those shades of blue in the "crystal" graphics set, many of the palette entries will have to be set to shades of blue, and similarly for the other graphics sets where you'd notice a dominant color. (Indeed, it's probably the limit of 16 colors that causes each graphics set to tend towards a particular dominant color.)

It turns out CGA and Tandy graphics modes cannot use palettes, so the palettes contained in groundXo.dat are for the EGA and VGA modes. An EGA palette entry takes up a single byte, while a VGA palette entry takes up 3 bytes. I'll notate the types as EGA_PAL_ENTRY and VGA_PAL_ENTRY respectively. (I'll explain how to interpret a palette entry later.)
And so the structure of the PALETTES section goes as follows:

  EGA_PAL_ENTRY ega_custom[8];
  EGA_PAL_ENTRY ega_standard[8];
  EGA_PAL_ENTRY ega_preview[8];
  VGA_PAL_ENTRY vga_custom[8];
  VGA_PAL_ENTRY vga_standard[8];
  VGA_PAL_ENTRY vga_preview[8];

First of all, the concept of a palette goes as follows. In a 16-color mode, each pixel has a value from 0 to 15. This value indexes into a 16-entry physical palette. The physical palette specifies the actual color to display for each of the values 0 to 15. So although you only get to display 16 distinct colors at once, you get some flexibility regarding the set of colors to use.

But it turns out the Lemmings game imposes further restrictions on the palette. It splits the 16-entry physical palette into two halves. The lower half, which corresponds to pixel values 0 to 7, is fixed by the game and always uses a fixed set of colors. This is even though the PALETTES section appears to contain an ega_standard[8] and a vga_standard[8]; those palette entries are actually not used by the game, as far as I know.

The upper half, corresponding to pixel values 8 to 15, is the part that can vary from graphics set to graphics set, because the values used are read from entries in the PALETTES section. In particular, suppose we're in VGA mode (the story is analogous for EGA mode). Then when rendering the level preview, the game uses the 8 entries in the vga_preview array for the upper half of the physical palette. When rendering the actual level (e.g. what you'd expect to see in a level editor), the game uses the 8 entries in the vga_custom array instead. In practice, though, I believe that in all the graphics sets the game supplies, vga_custom and vga_preview are identical.

Finally, how to interpret a palette entry. First of all, I assume you understand the concept of RGB for specifying a color. If not, look it up.

In a VGA_PAL_ENTRY, the first byte specifies the red component, the second byte the green, and the third byte the blue.
Although a byte is used for each color component, only the lower 6 bits of each byte are actually used. This is because the physical hardware of VGA only supports that many bits per color component. So (0x3F, 0x00, 0x00) gives you the brightest red you can get, (0x00, 0x3F, 0x00) the brightest green, and (0x00, 0x00, 0x3F) the brightest blue. (0x3F, 0x3F, 0x3F) would be the brightest white. Since nowadays graphics cards are capable of 24-bit color, the conversion from a VGA_PAL_ENTRY to a 24-bit RGB color (where each color component gets 8 bits instead of only 6) is simply 6-bit-component-value * 255 / 63 for each color component. (This sentence mainly applies only if you're planning to write a Windows version of the level editor.)

In an EGA_PAL_ENTRY, you get only 2 bits per color component. So only the lower 6 bits of the 8 bits in an EGA_PAL_ENTRY are meaningful. The bits are organized as follows, from the highest bit to the lowest:

  (unused) (unused) RH GH BH RL GL BL

"RH" means the higher bit of the red component, "RL" means the lower bit of the red component, etc. So for example, the 4 levels of red from brightest to darkest (with darkest being actually the color black rather than red) would have the values 0x24, 0x20, 0x04, and 0x00 respectively (translate them into binary to see how they involve the RH and RL bits).

Ah yes, before I forget, here is the fixed, lower half of the physical palette the game uses, specified in the VGA_PAL_ENTRY format:

  {0x00, 0x00, 0x00}, /* black */
  {0x10, 0x10, 0x38}, /* blue, used for the lemmings' bodies */
  {0x00, 0x2C, 0x00}, /* green, used for hair */
  {0x3C, 0x34, 0x34}, /* white, used for skin */
  {0x2C, 0x2C, 0x00}, /* dirty yellow, used in the skill panel */
  {0x3C, 0x08, 0x08}, /* red, used in the nuke icon */
  {0x20, 0x20, 0x20}, /* gray, used in the skill panel */
  (variable)

Ok, so I lied slightly when I said that the lower half is fixed. In actuality, only the first 7 entries are fixed. But the last one is not copied from vga_standard[7] as you'd expect.
Instead, I believe the game always copies from vga_preview[0] or vga_custom[0] (whichever is used in the context in question) to the "variable" entry. So you don't get an additional color to specify, since it is just a duplicate of one of your pickable colors.

An interesting thing to note about the "variable" color above is that it is the color used to render the mini-map you get at the lower-right corner when you are playing a level, and also for rendering the bricks of a builder.

VGAGRx.DAT
----------

Recall once more that the first section of decompressed data is for the bitmaps and masks of the terrain pieces, while the second section is for the bitmaps and masks of the interactive objects. The bitmap is a 4-bit planar bitmap, and the mask is a monochrome bitmap. The width, height, and location of all bitmaps are specified in GROUNDxo.DAT as explained before.

These 4-bit bitmaps, i.e. 16-color bitmaps, are stored component-wise. There are 4 monochrome bitmaps next to each other for one 16-color bitmap. First comes a monochrome bitmap that describes the first bit of each pixel, then another monochrome bitmap which describes all second bits, etc. These single-bit bitmaps are stored in little-endian order, i.e. the first bitmap adds 1 to each color, the second 2, then 4, then 8. This means that the bits for a single 4-bit pixel are scattered throughout the whole bitmap data.

Here's a curiosity about the terrain graphics. The 16-color bitmaps for the terrain only make use of the 8 graphics-set-specific colors, numbered 8 through 15. Since the terrain uses a color >= 8 for each solid pixel and color 0, i.e. black, for air, the last plane (the one that adds 8 to a pixel) is equal to the mask plane. If you actually look at the terrain metadata in groundXo.dat, you will see that the mask location is indeed identical to the last plane's location. E.g.
the crystal set's terrain piece 0 has a size of 0x20 by 0x20 pixels, which means its 4-bit bitmap takes 0x20 * 0x20 * 4 bits = 0x1000 bits = 0x200 bytes of storage space, but the mask location is only 0x180 bytes after the image location. I believe the game nonetheless doesn't infer the mask from the black pixels, but instead reads the mask information independently. I also believe that the game reads all four planes to get the color of a terrain piece, i.e. I don't think it reads just the first three planes and then adds 8 to each solid pixel. After all, it must read four-plane bitmaps anyway when drawing the special objects; they use all 16 colors.

It's possible that VGAGRx.DAT contains additional bitmaps not related to interactive objects or terrain pieces, but I don't know much about that. Experiment and find out whether there are any gaps and, if so, whether they contain any interesting graphics. Note that just because there are additional graphics doesn't necessarily mean that the game will use them.

MAIN.DAT probably contains the remaining graphics the game actually uses, but I currently don't know anything about that file. However, if there are any bitmaps in there, I can guarantee you they will be stored as planar bitmaps, or as monochrome bitmaps if they are masks (e.g. the masks used for digging, bashing, explosion, etc.). So with some experimentation you might be able to discover some of the other bitmaps the game uses.
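
As a recap of the plane layout and the palette conversion described above, here is a rough Python sketch that turns a planar bitmap into per-pixel color indices and a VGA_PAL_ENTRY into 24-bit RGB. The function names are invented for this illustration. Two details are assumptions that the text above does not state: that each row of a plane is padded to a whole number of bytes, and that the leftmost pixel of each byte is stored in the highest bit (the usual convention for EGA/VGA planar data).

```python
def decode_planar(data, image_loc, width, height, planes=4):
    """Return rows of pixel values built from `planes` monochrome bitmaps.

    Plane 0 contributes 1 to a pixel, plane 1 contributes 2, then 4, then 8.
    """
    stride = (width + 7) // 8        # bytes per row of one plane (assumed)
    plane_size = stride * height     # bytes taken by one whole plane
    pixels = [[0] * width for _ in range(height)]
    for plane in range(planes):
        base = image_loc + plane * plane_size
        for y in range(height):
            for x in range(width):
                byte = data[base + y * stride + x // 8]
                # Leftmost pixel in the top bit of the byte (assumed).
                if byte & (0x80 >> (x % 8)):
                    pixels[y][x] |= 1 << plane
    return pixels


def decode_mask(data, mask_loc, width, height):
    """A mask is a single monochrome plane: 1 = solid, 0 = transparent."""
    return decode_planar(data, mask_loc, width, height, planes=1)


def vga_to_rgb24(entry):
    """Scale a (r, g, b) VGA_PAL_ENTRY from 6 bits to 8 bits per component."""
    return tuple((component & 0x3F) * 255 // 63 for component in entry)
```

For a terrain piece you would call decode_planar with image_loc and decode_mask with mask_loc on the first decompressed section; for an object frame, use the frame and mask offsets on the second section. The pixel values then index the 16-entry physical palette described in the PALETTES section.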