Listing the code pages for gdb ‘set target-charset’

August 21, 2020 C/C++ development and debugging. No comments , , , , , ,

I wanted to display some internal state as an IBM-1141 codepage, but didn’t know the name to use.  I knew that EBCDIC-US could be used for IBM-1047, but gdb didn’t like ibm-1147:

(gdb) set target-charset EBCDIC-US
(gdb) p (char *)0x7ffbb7b58088
$2 = 0x7ffbb7b58088 "{Jim       ;012}", ' ' <repeats 104 times>
(gdb) set target-charset ibm-1141
Undefined item: "ibm-1141".

I’d either didn’t know or had forgotten that we can get a list of the supported codepages. The help shows this:

(gdb) help set target-charset
Set the target character set.
The `target character set' is the one used by the program being debugged.
GDB translates characters and strings between the host and target
character sets as needed.
To see a list of the character sets GDB supports, type `set target-charset'<TAB>

I had to hit tab twice, but after doing so, I see:

(gdb) set target-charset 
Display all 200 possibilities? (y or n)
1026               866                ARABIC7            CP-HU              CP1129             CP1158             CP1371             CP4517             CP856              CP903
1046               866NAV             ARMSCII-8          CP037              CP1130             CP1160             CP1388             CP4899             CP857              CP904
1047               869                ASCII              CP038              CP1132             CP1161             CP1390             CP4909             CP860              CP905
10646-1:1993       874                ASMO-708           CP1004             CP1133             CP1162             CP1399             CP4971             CP861              CP912
10646-1:1993/UCS4  8859_1             ASMO_449           CP1008             CP1137             CP1163             CP273              CP500              CP862              CP915
437                8859_2             BALTIC             CP1025             CP1140             CP1164             CP274              CP5347             CP863              CP916
500                8859_3             BIG-5              CP1026             CP1141             CP1166             CP275              CP737              CP864              CP918
500V1              8859_4             BIG-FIVE           CP1046             CP1142             CP1167             CP278              CP770              CP865              CP920
850                8859_5             BIG5               CP1047             CP1143             CP1250             CP280              CP771              CP866              CP921
851                8859_6             BIG5-HKSCS         CP1070             CP1144             CP1251             CP281              CP772              CP866NAV           CP922
852                8859_7             BIG5HKSCS          CP1079             CP1145             CP1252             CP282              CP773              CP868              CP930
855                8859_8             BIGFIVE            CP1081             CP1146             CP1253             CP284              CP774              CP869              CP932
856                8859_9             BRF                CP1084             CP1147             CP1254             CP285              CP775              CP870              CP933
857                904                BS_4730            CP1089             CP1148             CP1255             CP290              CP803              CP871              CP935
860                ANSI_X3.110        CA                 CP1097             CP1149             CP1256             CP297              CP813              CP874              CP936
861                ANSI_X3.110-1983   CN                 CP1112             CP1153             CP1257             CP367              CP819              CP875              CP937
862                ANSI_X3.4          CN-BIG5            CP1122             CP1154             CP1258             CP420              CP850              CP880              CP939
863                ANSI_X3.4-1968     CN-GB              CP1123             CP1155             CP1282             CP423              CP851              CP891              CP949
864                ANSI_X3.4-1986     CP-AR              CP1124             CP1156             CP1361             CP424              CP852              CP901              CP950
865                ARABIC             CP-GR              CP1125             CP1157             CP1364             CP437              CP855              CP902              auto
*** List may be truncated, max-completions reached. ***

There’s my ibm-1141 in there, but masquerading as CP1141, so I’m able to view my data in that codepage, and lookup the value of characters of interest in 1141:

(gdb) set target-charset CP1141
(gdb) p (char *)0x7ffbb7b58088
$3 = 0x7ffbb7b58088 "äJim       ;012ü", ' ' <repeats 104 times>
(gdb) p /x '{'
$4 = 0x43
(gdb) p /x '}
Unmatched single quote.
(gdb) p /x '}'
$5 = 0xdc
(gdb) p /x *(char *)0x7ffbb7b58088
$6 = 0xc0

I’m able to conclude that the buffer in question appears to be in CP1047, not CP1141 (the first character, which is supposed to be ‘{‘ doesn’t have the CP1141 value of ‘{‘).

gdb pretty print of structures

March 9, 2017 C/C++ development and debugging. No comments , , , ,

Here’s a nice little gdb trick for displaying structure contents in a less compact format

(gdb) set print pretty on
(gdb) p dd[0]
$4 = {
  jfcb = {
    datasetName = "PJOOT.NVS1", ' ' <repeats 34 times>,
    vols = {"<AAAiW", "\000\000\000\000\000", "\000\000\000\000\000", "\000\000\000\000\000", "\000\000\000\000\000"},
  block_size = 800,
  device_class = 32 '\040',
  device_type = 15 '\017',
  disp_normal = 8 '\010',
  disp_cond = 8 '\010',
  volsers = 0x7fb71801ecd6 "<AAAiW",

compare this to the dense default

(gdb) set print pretty on
(gdb) p dd[0]
$5 = {jfcb = {datasetName = "PJOOT.NVS1", ... vols = {"<AAAiW", "\000\000\000\000\000", "\000\000\000\000\000", "\000\000\000\000\000", "\000\000\000\000\000"}, block_size = 800, block_size_limit = 0, device_class = 32 '\040', device_type = 15 '\017', disp_normal = 8 '\010', disp_cond = 8 '\010', volsers = 0x7fb71801ecd6 "<AAAiW"}

For really big structures (this one actually is, but I’ve pruned a bunch of stuff), this makes the structure print display a whole lot more readable. Additionally, if you combine this with ‘(gdb) set logging on’, then with pretty print enabled you can prune the output by line easily to see just what you want.

peeking into relocation of function static in shared library

February 27, 2017 C/C++ development and debugging. 2 comments , , , , , , , ,

Here’s GUI (TUI) output of a function with a static variable access:

B+ |0x7ffff7616800 <st32>           test   %edi,%edi                                                                                                                     |
   |0x7ffff7616802 <st32+2>         je     0x7ffff7616811 <st32+17>                                                                                                      |
   |0x7ffff7616804 <st32+4>         mov    %edi,%eax                                                                                                                     |
   |0x7ffff7616806 <st32+6>         bswap  %eax                                                                                                                          |
   |0x7ffff7616808 <st32+8>         mov    %eax,0x200852(%rip)        # 0x7ffff7817060 <st32.yst32>                                                                        |
   |0x7ffff761680e <st32+14>        mov    %edi,%eax                                                                                                                     |
   |0x7ffff7616810 <st32+16>        retq                                                                                                                                 |
  >|0x7ffff7616811 <st32+17>        mov    0x200849(%rip),%edi        # 0x7ffff7817060 <st32.yst32>                                                                        |
   |0x7ffff7616817 <st32+23>        bswap  %edi                                                                                                                          |
   |0x7ffff7616819 <st32+25>        mov    %edi,%eax                                                                                                                     |
   |0x7ffff761681b <st32+27>        retq                                                                                                                                 |
   |0x7ffff761681c <_fini>          sub    $0x8,%rsp                                                                                                                     |
   |0x7ffff7616820 <_fini+4>        add    $0x8,%rsp                                                                                                                     |
   |0x7ffff7616824 <_fini+8>        retq                                                                                                                                 |
   |0x7ffff7616825                  add    %al,(%rcx)                                                                                                                    |
   |0x7ffff7616827 <x16+1>          add    (%rcx),%al                                                                                                                    |

The associated code is:

int st32( int v ) {
    static int yst32 = 0x1a2b3c4d;

    if ( v ) {
        yst32 = v;

    return yst32;

The object code dump (prior to relocation) just has zeros in the offset for the variable:

$ objdump -d | grep -A12 '<st32>'
0000000000000050 <st32>:
  50:   85 ff                   test   %edi,%edi
  52:   74 0d                   je     61 <st32+0x11>
  54:   89 f8                   mov    %edi,%eax
  56:   0f c8                   bswap  %eax
  58:   89 05 00 00 00 00       mov    %eax,0x0(%rip)        # 5e <st32+0xe>
  5e:   89 f8                   mov    %edi,%eax
  60:   c3                      retq   
  61:   8b 3d 00 00 00 00       mov    0x0(%rip),%edi        # 67 <st32+0x17>
  67:   0f cf                   bswap  %edi
  69:   89 f8                   mov    %edi,%eax
  6b:   c3                      retq   

The linker has filled in the real offsets in question, and the dynamic loader has collaborated to put the data segment in the desired location.

The observant reader may notice bwsap instructions in the listings above that don’t make sense for x86_64 code. That is because this code is compiled with an LLVM pass that performs byte swapping at load and store points, making it big endian in a limited fashion.

The book Linkers and Loaders has some nice explanation of how relocation works, but I wanted to see the end result first hand in the debugger. It turned out that my naive expectation that the sum of $rip and the constant relocation factor is the address of the global variable (actually static in this case) is incorrect. Check that out in the debugger:

(gdb) p /x 0x200849+$rip
$1 = 0x7ffff781705a

(gdb) x/10 $1
0x7ffff781705a <gy+26>: 0x22110000      0x2b1a4433      0x00004d3c      0x00000000
0x7ffff781706a: 0x00000000      0x00000000      0x00000000      0x30350000
0x7ffff781707a: 0x20333236      0x64655228

My magic value 0x1a2b3c4d looks like it is 6 bytes into the $rip + 0x200849 location that the disassembly appears to point to, and that is in fact the case:

(gdb) x/10 $1+6
0x7ffff7817060 <st32.yst32>:      0x4d3c2b1a      0x00000000      0x00000000      0x00000000
0x7ffff7817070 <y32>:   0x00000000      0x00000000      0x32363035      0x52282033
0x7ffff7817080: 0x48206465      0x34207461

My guess was the mysterious offset of 6 required to actually find this global address was the number of bytes in the MOV instruction, and sure enough that MOV is 6 bytes long:

(gdb) disassemble /r
Dump of assembler code for function st32:
   0x00007ffff7616800 <+0>:     85 ff   test   %edi,%edi
   0x00007ffff7616802 <+2>:     74 0d   je     0x7ffff7616811 <st32+17>
   0x00007ffff7616804 <+4>:     89 f8   mov    %edi,%eax
   0x00007ffff7616806 <+6>:     0f c8   bswap  %eax
   0x00007ffff7616808 <+8>:     89 05 52 08 20 00       mov    %eax,0x200852(%rip)        # 0x7ffff7817060 <st32.yst32>
   0x00007ffff761680e <+14>:    89 f8   mov    %edi,%eax
   0x00007ffff7616810 <+16>:    c3      retq
=> 0x00007ffff7616811 <+17>:    8b 3d 49 08 20 00       mov    0x200849(%rip),%edi        # 0x7ffff7817060 <st32.yst32>
   0x00007ffff7616817 <+23>:    0f cf   bswap  %edi
   0x00007ffff7616819 <+25>:    89 f8   mov    %edi,%eax
   0x00007ffff761681b <+27>:    c3      retq
End of assembler dump.

So, it appears that the %rip reference in the disassembly is really the value of the instruction pointer after the instruction executes, which is curious.

Note that this 4 byte relocation requires the shared library code segment and the shared library data segment be separated by no more than 4G. The linux dynamic loader has put all the segments back to back so that this is the case. This can be seen from /proc/PID/maps for the process:

$ ps -ef | grep maindl
pjoot    17622 17582  0 10:50 pts/3    00:00:00 /home/pjoot/workspace/pass/global/maindl

$ grep libglob /proc/17622/maps
7ffff7616000-7ffff7617000 r-xp 00000000 fc:00 2492653                    /home/pjoot/workspace/pass/global/
7ffff7617000-7ffff7816000 ---p 00001000 fc:00 2492653                    /home/pjoot/workspace/pass/global/
7ffff7816000-7ffff7817000 r--p 00000000 fc:00 2492653                    /home/pjoot/workspace/pass/global/
7ffff7817000-7ffff7818000 rw-p 00001000 fc:00 2492653                    /home/pjoot/workspace/pass/global/

We’ve got a read-execute mmap region, where the code lies, and a read-write mmap region for the data. There’s a read-only segment which I presume is for read only global variables (my shared lib has one such variable and we have one page worth of space allocated for read only memory).

I wonder what the segment that has none of the read, write, nor execute permissions set is?

gdb set target-charset

January 9, 2017 C/C++ development and debugging. No comments , , ,

I was looking for a way to convert ASCII and EBCDIC strings in gdb debugging sessions and was experimenting with gdb python script extensions. I managed to figure out how to add my own command that read a gdb variable, and print it out, but it failed when I tried to run a character conversion function. In the process of debugging that char encoding error, I found that there’s a built in way to do exactly what I wanted to do:

(gdb) p argv[0]
$16 = 0x7fd8fbda0108 "\323\326\303\301\323\305\303\326"
(gdb) set target-charset EBCDIC-US
(gdb) p argv[0]
$17 = 0x7fd8fbda0108 "LOCALECO"
(gdb) set target-charset ASCII
(gdb) p argv[0]
$18 = 0x7fd8fbda0108 "\323\326\303\301\323\305\303\326"

gdb TUI mode debugging

June 9, 2016 C/C++ development and debugging. 1 comment , , ,

I recently watched Greg Law’s “I’ll change your opinion on GDB” talk from CPPCON 2015, where he gave a nice example of how to use the gdb TUI (text user interface) mode. I’d recently encountered the gdb –tui option, but thought that you had to have the foresight to remember to invoke gdb with –tui (which I usually didn’t, but also didn’t always want to since it’s redraw was flaky in a putty terminal session).

It turns out that there are command line keystrokes to enable TUI dynamically when you want it (or turn it off when you don’t). That makes this TUI a much nicer feature!

The main key commands of interest are:

– C-x a, turn on (or off) TUI mode.
– C-x 2. Use two windows, or cycle to the next layout.
– C-x o. Change active window.
– C-L.  Redraw screen.
– C-x p. command history (since up and down move the cursor)

Here’s an example how things look after turning on TUI (C-x a):

Screen Shot 2016-06-09 at 7.42.59 PM

After turning on two window mode (C-X 2):

Screen Shot 2016-06-09 at 7.44.05 PM

This has a source view, an assembly view, and the output of ‘info registers’.  You can actually get a register window by cycling through the window layouts once (C-x 2 again) :


Screen Shot 2016-06-09 at 7.44.40 PM

Some of the previous times that I’d tried ‘gdb –tui’ explicitly, I’d found it hit redraw issues, but didn’t really want to detach and reattach the debugger just to change the TUI mode.  It sounds like such issues can be dealt with either by turning off TUI mode (C-x a), or using redraw (C-L).