First attempt at reversing a firmware

Postby treg » December 24th, 2014, 4:47 am


I am currently trying to learn how to reverse engineer firmware. I started with revision 1.1.7 of the rapid-i keyboard firmware. SpriteTM already provides some help:
- He gives the deciphering/patching/re-enciphering code
- He says that the firmware is loaded at address 0x2400

So I loaded the decoded firmware into Ida Pro, found the relocated interrupt table at 0x2400 (yes, not so hard :) ) and started to annotate the code while reading the datasheet.

What I have not been able to solve by myself so far:

- I have to manually annotate all the hardcoded addresses. I do have the .h files downloaded from Holtek's website, but I could not find how to make IDA use them to replace direct values with named constants.
- I don't know where to start looking. Should I trace the whole code from the start (spending hours documenting the initialization of all the peripherals), or jump directly to some part? I wanted to look for the code that switches the LEDs, then see where it is called to find the lighting-mode handling code, but I did not find that code. All the code that writes to ports writes to port E and does not seem to do PWM. I also found this in ROM:
ROM:00004C44                     DCD 0x4001E000
ROM:00004C48                     DCD 0x4001C000
ROM:00004C4C                     DCD 0x4001D000
ROM:00004C50                     DCD 0x4001B000
ROM:00004C54                     DCD 0x4001A000

I guess this table (which contains the addresses of all the ports) is used somewhere to iterate over the LEDs, but I could not find where.
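One way to start hunting for the code that uses that table, sketched in Python over a text listing exported from IDA (the listing format and all function names here are assumptions): parse the DCD words, sanity-check that they look like peripheral base addresses, then look for any other line that mentions the table's own address, since that is how code iterating over it would load its base. The 0x40000000-0x5FFFFFFF peripheral range is the usual Cortex-M memory map, not something confirmed for this chip.

```python
import re

# One word of a pointer table as IDA prints it, e.g. "ROM:00004C44   DCD 0x4001E000"
DCD_LINE = re.compile(r"^ROM:([0-9A-Fa-f]{8})\s+DCD\s+0x([0-9A-Fa-f]+)$")

def parse_dcd_table(listing):
    """Return (location, value) pairs for every DCD word found in the listing text."""
    entries = []
    for line in listing.splitlines():
        m = DCD_LINE.match(line.strip())
        if m:
            entries.append((int(m.group(1), 16), int(m.group(2), 16)))
    return entries

def looks_like_port_table(values, stride=0x1000):
    """True if all values lie in the assumed peripheral region and are distinct
    multiples of `stride` apart (assumption: one 4 KB block per port)."""
    if not values or not all(0x40000000 <= v < 0x60000000 for v in values):
        return False
    ordered = sorted(values)
    return all(b != a and (b - a) % stride == 0 for a, b in zip(ordered, ordered[1:]))

def find_mentions(listing, address):
    """Lines that refer to `address` as 0x... or via an IDA auto-label (off_XXXX, dword_XXXX),
    excluding the line that defines the address itself."""
    pattern = re.compile(rf"(?:0x|_){address:X}\b", re.IGNORECASE)
    own_line = f"ROM:{address:08X}"
    return [line.strip() for line in listing.splitlines()
            if pattern.search(line) and not line.strip().upper().startswith(own_line)]
```

If nothing refers to 0x4C44 directly, try the addresses just before it: the table may be the tail of a larger structure whose start is what the code actually loads.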

Of course, I do not have a JTAG adapter (nor do I want to void the warranty yet) to debug the code.

Do you have any hints?

Thank you,
Posts: 14
Joined: February 12th, 2011, 6:03 am

Re: First attempt at reversing a firmware

Postby bandersnatch » December 27th, 2014, 12:12 am

Firstly: "Respect!" A quick shot of praise for your willingness to dive fearlessly into the unknown.
Your commitment to discovery is obvious from the fact that you have already researched
the general tools (e.g. IdaPro) and the specific data sheets for the task at hand, and also
from the nature of your questions.

I understand the dilemma you face at present... How to proceed???
Here are a few thoughts (naturally, all only MY humble opinions!)...

Reverse engineering is a weird combination of logical passive code analysis, logical active analysis/tracing/logging
of running code, and totally illogical, wild, intuitive guesswork & brutal code hacking
("hacking" in the old-school sense of "quick & nasty programming" to try out an idea).

Which mix of these you lean on depends on your personality and experience.
Some people have an amazing talent for "sniffing out" critical code regions just by looking at the
disassembled code, others dive straight into the debugger, brutally hack all RETs with breakpoints
and hammer the return key until an LED changes color, and others carefully analyse the passive structure
of the program before starting a dynamic analysis.

You will need to experiment with various different methods & develop your own style.
Whatever your style, you need to keep a few things in mind:

1/ Your goal: What are you actually trying to do?
- Always keep your end goal in the back of your mind
- It is easy to get distracted with irrelevant information gathering
- There is no point in analyzing the keyboard matrix scanner code when you just want
to control the LEDs (!)

2/ Your tools: What is the best tool for your immediate problem?
- Don't get stuck using a single tool
- IdaPro is awesome for disassembly & debugging but this is only part of the picture
- A database system (MySQL & Toad, MS-Access (don't laugh!), LotusNotes (don't laugh even harder!) etc.)
is a powerful tool for automatically documenting thousands of functions, mapping the structure
of the code and finding commonly used entry points and tables
- Try to get the code actually running:
- In IdaPro: You may need to patch multiple IO read functions before the code will actually
run... This can be frustrating, especially in the initialization code, and you might
run into checksum problems if the program verifies its own checksum

- Use a logic simulator to mirror your hardware
- (don't laugh!) A keyboard only has a CPU, a key matrix and LED outputs. You only need to find the
IO pins on the CPU that are connected to the keys & LEDs. This is sometimes easier than patching
the IdaPro code to compensate for the missing hardware...

- Use visualisation tools
- Use a "TreeView" structure for the function call structure
- Automate the analysis process wherever possible
- Part of the fun of reverse engineering is writing your own tools
- If you find yourself performing the same task more than 10 times (insert your own patience limit here!)
then think about writing your own program to automate the task.

- Use "cross-pollination"
- Use the data from one tool as the input for another tool
- The classic example is to analyze the IdaPro disassembler output using a database system

ETC. The main point is to always use the right tool for the job...

3/ Your approach

The two basic approaches are
"Documentation" vs "Experimentation" or ("Passive" vs "Active" reverse engineering..)

"Passive" analysis means analyzing the controller firmware with a disassembler, a hex editor
and other analysis and visualisation tools. Reverse engineering almost always begins with
this type of analysis, which provides a lot of useful information but this is usually not enough.

The problem is that most modern microcontrollers no longer use simple linear execution of code
stored in the firmware (like in the bad old days).
The usual approach these days is that the CPU executes a small piece of "bootloader" code
in the firmware that performs the following tasks:

- Performs a checksum-based integrity check of the firmware
- A pain in the neck for modders 'cos they have to first identify & disable the checksum routine
before they can even run the code with a debugger..
- (Possibly but not always) checks the hardware.
- (Possibly but not always) dynamically decrypts/loads program & data segments into the system RAM
- Initializes an "interrupt vector" table defining the entry point addresses of the functions to
be called for each different "interrupt" event
- I use the term "interrupt vector" in the broadest sense here (apologies to the purists!).
There are usually several tables e.g.
- The true hardware & software "interrupt vector" tables for the pre-defined HW & SW interrupts of the CPU
- "timer vector" tables defining the respective functions to be executed regularly when various timers expire
- "program vector" tables defining the respective functions to be executed for specific program events
etc. etc.
- Initializes various system variables and sets various timers
- Then exits & does nothing. Everything is now "event driven" via the timers and "interrupt" tables.

(apologies to the purists: Yeah, I know that this is oversimplified and not always true!).
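On this particular chip the true hardware "interrupt vector" table is easy to decode by hand. Assuming an ARM Cortex-M part (the DCD listing and the 0x4001xxxx peripheral addresses in the first post point that way, but check the datasheet), the table is a flat array of little-endian 32-bit words: word 0 is the initial stack pointer, word 1 the reset handler, then the fault/system handlers, then one entry per peripheral IRQ. Handler addresses have bit 0 set to mark Thumb code. The sketch assumes the decoded image starts with the table and is mapped at 0x2400 as SpriteTM says; the function names are hypothetical.

```python
import struct

LOAD_BASE = 0x2400  # load address given by SpriteTM for this firmware

# Names of the first 16 Cortex-M (ARMv7-M) vector table slots.
CORE_VECTORS = [
    "initial_SP", "Reset", "NMI", "HardFault", "MemManage", "BusFault",
    "UsageFault", "reserved7", "reserved8", "reserved9", "reserved10",
    "SVCall", "DebugMonitor", "reserved13", "PendSV", "SysTick",
]

def decode_vectors(image, count=16):
    """Decode the first `count` little-endian 32-bit entries of the vector table."""
    words = struct.unpack_from(f"<{count}I", image, 0)
    decoded = []
    for i, word in enumerate(words):
        name = CORE_VECTORS[i] if i < len(CORE_VECTORS) else f"IRQ{i - 16}"
        # Slot 0 is a stack pointer, not code. Every other slot is a handler
        # address with the Thumb bit (bit 0) set, so clear it to get the code address.
        decoded.append((name, word if i == 0 else word & ~1))
    return decoded

def file_offset(address):
    """Convert a runtime address into an offset in the decoded firmware file."""
    return address - LOAD_BASE
```

The reset entry is the natural starting point for the "top down" pass described further on. Unused vectors usually all point to one shared default handler, so the few distinct addresses (SysTick and the timer IRQs especially, given the event-driven structure described above) tend to be the interesting ones.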

The main point is that there is often no "main loop" in the traditional sense.
This type of "asynchronous event-driven" software is difficult to passively analyze
'cos the firmware code does not reflect the "runtime" structure of the program.

You can still discover heaps about the program through passive analysis but
somewhere along the way you always seem to end up needing to actually run the program.

From your post, it seems that you have already started a passive analysis,
so here are a few tips for this.

Passive analysis
By "passive" I mean analysing the disassembled code / hex tables / string resources / io addresses
using IdaPro & other tools

- Documenting any obvious interrupt vectors / lookup tables / IO addresses is a good idea but will only take you so far
- DEFINITELY ask specific questions on how to use these aspects of Ida Pro...
...your knowledge of the tool directly affects your effectiveness with the tool

- Don't get distracted by too much detail... only document the stuff you are directly dealing with

- In my experience, the documentation/labelling of interrupt vectors / lookup tables / IO addresses / functions grows
dynamically and almost "organically" as a result of the exploration process..

- document all locations where the CPU IO registers (with pins to the outside world) are read or written
- Use a unique name embedding the specific register/pin and direction
- DON'T try to "understand" the code... just document the LOCATIONS at first

- document all locations where CPU INTERRUPT and TIMER registers are written
- Use a unique name embedding the specific register
- DON'T try to "understand" the code... just document the LOCATIONS at first

- The sheer amount of code can be daunting and a passive analysis of the code will also only take you so far
but you should first try to get an overview of the general program structure...

- First try to identify the various logically separate functions in the code as "black box" functions
- DON'T GO TOO DEEP in the call tree at first....
- DON'T try to "understand" the code... just document the STRUCTURE at first
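The "document all locations where the CPU IO registers are read or written" step above can be done as a first pass over a text dump of the disassembly, and it can borrow names from the vendor's C headers mentioned in the first post. A minimal sketch, assuming the listing shows literal-pool loads as `LDR R0, =0x4001E000` and that the header defines are simple sums of constants and earlier names (anything fancier is skipped). The `IOREF_...` naming, the peripheral range and all function names are assumptions for illustration.

```python
import re

# --- names from the vendor header -------------------------------------------
DEFINE = re.compile(r"^\s*#define\s+(\w+)\s+(.+?)\s*(?://.*|/\*.*)?$")
CAST = re.compile(r"\(\s*\w+\s*\*\s*\)")            # pointer casts like (HT_GPIO_TypeDef *)
TOKEN = re.compile(r"(0x[0-9A-Fa-f]+|\d+)[UuLl]*|([A-Za-z_]\w*)|(\+)|(\S)")

def parse_defines(header_text):
    """Resolve '#define NAME expr' where expr is a '+' sum of integer literals and
    earlier names. Function-like macros, shifts, etc. are silently skipped."""
    values = {}
    for line in header_text.splitlines():
        m = DEFINE.match(line)
        if not m:
            continue
        name, expr = m.groups()
        expr = CAST.sub(" ", expr).replace("(", " ").replace(")", " ")
        total, expect_term, ok = 0, True, True
        for tok in TOKEN.finditer(expr):
            literal, ident, plus, _other = tok.groups()
            if expect_term and literal:
                total += int(literal, 16 if literal[:2].lower() == "0x" else 10)
                expect_term = False
            elif expect_term and ident in values:
                total += values[ident]
                expect_term = False
            elif not expect_term and plus:
                expect_term = True
            else:                                   # unknown name or operator: give up
                ok = False
                break
        if ok and not expect_term:
            values[name] = total
    return values

def names_by_value(values):
    """Invert NAME -> value into value -> [NAME, ...] for labelling literals."""
    by_value = {}
    for name, value in values.items():
        by_value.setdefault(value, []).append(name)
    return by_value

# --- IO references in the listing -------------------------------------------
LISTING_LINE = re.compile(r"^ROM:([0-9A-Fa-f]+)\s+(.*)$")
LITERAL = re.compile(r"=0x([0-9A-Fa-f]+)\b")

def io_references(listing, names=None, lo=0x40000000, hi=0x60000000):
    """Return (location, address, label) for each line loading a literal address
    in the assumed peripheral region [lo, hi)."""
    names = names or {}
    found = []
    for line in listing.splitlines():
        m = LISTING_LINE.match(line.strip())
        if not m:
            continue
        location = int(m.group(1), 16)
        code = m.group(2).split(";")[0]             # ignore IDA comments
        for lit in LITERAL.finditer(code):
            address = int(lit.group(1), 16)
            if lo <= address < hi:
                register = names.get(address, [f"{address:08X}"])[0]
                found.append((location, address, f"IOREF_{register}_at_{location:X}"))
    return found
```

Given a hypothetical header define `HT_GPIOE_BASE` that resolves to 0x4001E000, `io_references(listing, names_by_value(parse_defines(header)))` would label a load at 0x2F10 as `IOREF_HT_GPIOE_BASE_at_2F10`; without a header it falls back to the raw address. The direction (read or write) still has to come from the `LDR`/`STR` instructions that follow, which this first pass does not attempt. Pushing the names back into IDA (for example as an enum applied to the operands, or as names in a peripheral segment you create yourself) is a separate step not shown here.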

There are different ways of doing this & you will probably need to use more than one

- "Top down" starting at the target addresses specified in the interrupt vector / powerup vector table

Start at the highest level & document all the CALLs & conditional JMPs (JEQ, JNE, etc.) you find...
- Just give each call a Name describing:
- the originating place for the call, a destination name for the call, and the type of the call (e.g. TOPLEVEL_CALL_01, TOPLEVEL_CALL_02 etc.)
- The idea here is that your names should contain implicit embedded information about the named object.
This will help you later 'cos it allows you to immediately identify the context, destination and call type (CALL, JEQ, JNE, JMP etc.)
from the name without having to look up the code...

- You can also document the parameters pushed onto the stack if you want, but this is not essential for the first pass. We are only
trying to get an idea of the program structure at the moment..
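The naming scheme above is easy to generate mechanically from a text dump. A minimal sketch, assuming an IDA text listing of ARM/Thumb code with one instruction per line and branch targets shown as IDA's `sub_`/`loc_`/`locret_` auto-labels. It maps BL/BLX to CALL, B to JMP and B<cond> to the JEQ/JNE style names used here; returns such as `BX LR` and `POP {...,PC}`, and branches through a register, are not labelled. Function names are hypothetical.

```python
import re

CONDITIONS = {"EQ", "NE", "CS", "HS", "CC", "LO", "MI", "PL",
              "VS", "VC", "HI", "LS", "GE", "LT", "GT", "LE"}
# "ROM:<addr>  <mnemonic>  ... sub_XXXX / loc_XXXX / locret_XXXX"
BRANCH_LINE = re.compile(
    r"^ROM:([0-9A-Fa-f]+)\s+([A-Za-z.]+)\s+.*?\b(?:sub|loc|locret)_([0-9A-Fa-f]+)\b")

def branch_kind(mnemonic):
    """Map an ARM/Thumb branch mnemonic to the CALL/JMP/Jcc naming used above."""
    op = mnemonic.upper().split(".")[0]      # drop .W / .N width suffixes
    if op in ("BL", "BLX"):
        return "CALL"
    if op == "B":
        return "JMP"
    if op in ("CBZ", "CBNZ"):
        return op
    if op.startswith("B") and op[1:] in CONDITIONS:
        return "J" + op[1:]                  # BEQ -> JEQ, BNE -> JNE, ...
    return None

def label_branches(listing, context="TOPLEVEL"):
    """Return (location, name) for every call/branch, e.g. TOPLEVEL_CALL_01_to_2F10."""
    counters, labels = {}, []
    for line in listing.splitlines():
        m = BRANCH_LINE.match(line.strip())
        kind = branch_kind(m.group(2)) if m else None
        if kind is None:
            continue
        counters[kind] = counters.get(kind, 0) + 1
        target = int(m.group(3), 16)
        labels.append((int(m.group(1), 16),
                       f"{context}_{kind}_{counters[kind]:02d}_to_{target:X}"))
    return labels
```

Each kind gets its own counter, so the names carry context, type and destination exactly as suggested, and the last JMP in the list is usually where the "dead end" discussion below begins.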

Keep grinding through the highest level code, labelling all calls until you reach a dead end..
- "Dead end" usually means an unconditional JMP of some kind.
- I use the term "dead end" 'cos the top level code usually ends with a JMP to an address indexed through
a jump vector table that is loaded somewhere in the previous code.
With a passive analysis you cannot follow a jump vector that is created dynamically.
- The top level code might actually finally jump to a known address that you can follow (yippee!!)
- If so, continue the labelling process from the new location
- The top level code might also end with a RET (return) statement (eh... we are not in a function???)
- This just means that an address was shoved onto (inserted into!) the stack somewhere in the
code above

Whatever.... somewhere along the way, the "top level" code will lead you to a dead end.
You now know the (abstract) call structure of the top level code.
If you are REALLY lucky, you might find a loop in the top level code.
- GREAT! You have something that you can analyse with breakpoints...
As previously mentioned, this is unlikely. The top level code of embedded systems usually
just initializes interrupt vectors, timers, counters, global variables and then jumps
to the main scanning loop at some obscure location that is currently unknown to us..

OK. You have now labelled the call structure of the "top level" code. What next??....
You can of course repeat the procedure for each of the destinations discovered and
keep diving down into each level, deeper & deeper...
This is fun for a while but you will eventually become overwhelmed/bored/depressed with the
sheer quantity of functions.

You now need to get smart & automate the process using IdaPro plugins, third-party tools
or by writing your own tool.
One trick is to dump the disassembly to a text file and analyse the text
file with a "depth-first" tree search using a Python/C#/Visual Basic/Java program
(take your pick!!!!) and a database of some kind (MySQL/MSAccess/LotusNotes, take your pick!!!!).
For each call found, create a new DB record containing:
- The name of the calling context
- A generated name for the called code
- The address of the called code
- The disassembled text of the called code (if you want)

This will automatically document the call structure of everything called by the top-level
code (A disassembly starting from the powerup vector).
For visualisation you can also use this information to load a "TreeControl" showing the
call structure (fun but not always helpful)..
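The depth-first documentation run described above might look like this, with Python's built-in sqlite3 module standing in for MySQL/Access. A minimal sketch: it assumes you have already turned the dump into a call graph (address -> list of called addresses, e.g. from the BL targets), and it stores the caller name, a generated callee name, the callee address and the depth. The disassembled-text column is left out for brevity, and the table and function names are hypothetical.

```python
import sqlite3

def document_call_tree(graph, root, db, root_name="TOPLEVEL"):
    """Depth-first walk of `graph` (address -> list of called addresses) from `root`,
    storing one row per call edge. Returns the generated address -> name map."""
    db.execute("CREATE TABLE IF NOT EXISTS calls ("
               "caller TEXT, callee_name TEXT, callee_addr INTEGER, depth INTEGER)")
    names = {root: root_name}
    visited = set()
    stack = [(root, 0)]
    while stack:
        addr, depth = stack.pop()
        if addr in visited:
            continue
        visited.add(addr)
        targets = graph.get(addr, [])
        for i, target in enumerate(targets, start=1):
            # The first caller processed names the callee, e.g. TOPLEVEL_CALL_02
            names.setdefault(target, f"{names[addr]}_CALL_{i:02d}")
            db.execute("INSERT INTO calls VALUES (?, ?, ?, ?)",
                       (names[addr], names[target], target, depth + 1))
        # Push in reverse so the first call in the listing is explored first
        stack.extend((t, depth + 1) for t in reversed(targets))
    db.commit()
    return names
```

Once the rows are in, questions like "which functions are called from the most places?" become one query: `SELECT callee_name, COUNT(*) FROM calls GROUP BY callee_name ORDER BY COUNT(*) DESC`.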

SUPERFICIAL TOP DOWN from other start locations
Once your tool is working you can try running it on disassembled code from other
locations in the program. The problem here is that without a "dynamic" (runtime) analysis
you cannot be sure where the other code segments in the firmware start.

- Check the interrupt vector table for the CPU used & try disassembling from there..
- Just disassemble everything without worrying about weird errors & run your analysis
program on anything that looks sensible..
- Just guess & play...

Somewhere along the way you will get fed up with gathering data & will want a change of scenery.
This is a highly personal decision. You do not need to analyze the entire structure of the
firmware. What usually happens is that you discover calls to functions containing the IO
operations that you previously documented and things suddenly start to make sense..

Whatever, at some point you have gathered tons of information on the general structure of the program.
WITHOUT trying to understand anything.
You now have a general view of the "functional" building blocks in the code and
a rough idea of their calling structure...
(lots of stuff is still missing or wrong but we still have a lot of useful information)

Now is the time to perform an initial analysis of the structure.
This is where a database system is really useful. The idea is to get a general idea
of what the functions do, how often they are called and identify critical sections
of the code...

IO operations are a great place to start 'cos you can directly map the pins of an IO register
to physical pins on the CPU & the devices connected to them (keyboard matrix, LED drivers etc.)
- Find all the functions containing the IO & interrupt operations
& re-label all names AND references with an IO_IN_xxxx, IO_OUT_xxxx or IO_INOUT_xxxx prefix accordingly
- examine the actual IO registers that are read & written & use a multimeter to find out what is connected
to the corresponding IO pins on the chip
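The relabelling step can be automated once you know, for each function, which addresses it reads and writes. Collecting those (direction, address) pairs from the listing is the hard part and is not shown; on ARM it means following the register that holds a port base loaded by `LDR Rx, =...` into the later `LDR`/`STR` instructions. The sketch assumes that list already exists, treats 0x40000000-0x5FFFFFFF as the peripheral region (the usual Cortex-M map, check your datasheet) and assumes 4 KB per port, as the DCD table in the first post suggests. All names are hypothetical.

```python
PERIPH_LO, PERIPH_HI = 0x40000000, 0x60000000   # assumed peripheral region

def io_prefix(accesses):
    """accesses: list of ("R" | "W", address). Pick a prefix from the directions
    seen on peripheral addresses; plain RAM accesses are ignored."""
    directions = {op for op, address in accesses if PERIPH_LO <= address < PERIPH_HI}
    if directions == {"R"}:
        return "IO_IN_"
    if directions == {"W"}:
        return "IO_OUT_"
    if directions == {"R", "W"}:
        return "IO_INOUT_"
    return ""

def ports_touched(accesses, block=0x1000):
    """Peripheral base addresses touched, assuming one `block`-sized window per port."""
    return sorted({address & ~(block - 1) for _, address in accesses
                   if PERIPH_LO <= address < PERIPH_HI})

def relabel(functions):
    """Map each function name to its prefixed name (unchanged if it does no IO)."""
    return {name: io_prefix(accesses) + name for name, accesses in functions.items()}
```

`ports_touched()` gives you the list of ports to go and probe with the multimeter, and the IO_OUT_ functions that only touch one port are the first candidates for LED code.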

If the IO functions are general & you cannot directly identify the register being written to
then search for all calls to one specific IO function (preferably a WRITE) & look for the common code
setting registers or pushing values onto the stack before the function is called...

Hint: LED IO registers are usually only accessed via a WRITE and are not READ
(purists: Yeah I know, not true with external bidirectional multiplexors,
but keyboards usually have as few components as possible)
IO functions that only WRITE to a port are a good place to look for LEDs.
Try to identify the value written, either in the IO function or in the code calling the IO function.
Look for literal single-bit operations that set or clear a single bit in a value written
to an IO register
(e.g. "REG01 OR 0x01" = bit 0 ON, "REG01 AND 0xFE" = bit 0 OFF, counting bits from 0 as the datasheets do)
Loops writing an incrementing number to an IO register are probably writing to
the output pins for scanning the keyboard.
Each pass through the loop will then also have a corresponding read operation from a
different IO register to see if a key in the current scan line is pressed
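A quick way to spot the single-bit set/clear operations mentioned above in a dump is to test the immediate operand: a "set" mask has exactly one 1 bit, and a "clear" mask used with AND has exactly one 0 bit within the register width. On ARM/Thumb the clear often shows up as `BIC Rd, Rn, #mask` with the single-bit mask itself rather than as AND with its complement, so `single_bit_set` covers that case too. A minimal sketch; the function names are hypothetical.

```python
def single_bit_set(mask):
    """Bit number if `mask` (as used with OR/ORR or BIC) has exactly one bit set, else None."""
    if mask > 0 and mask & (mask - 1) == 0:   # clearing the lowest set bit leaves zero
        return mask.bit_length() - 1
    return None

def single_bit_clear(mask, width=32):
    """Bit number if `mask` (as used with AND) clears exactly one bit of a `width`-bit register."""
    return single_bit_set(~mask & ((1 << width) - 1))
```

Run over the immediates of every ORR/BIC/AND that feeds an IO write, this gives a short list of (port, bit) pairs that can be matched against the LED pins you found with the multimeter.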

You get the idea. Use a combination of hardware analysis/knowledge, guess what the function
might be doing & check your hypothesis by examining the calling code.
DONT GET STUCK. If the code makes no sense then just move on somewhere else (!)

You can use the same technique to identify:
- the code that sends characters from the keyboard out the serial port to the computer...
- the code that responds to special keys that are not in the normal keyboard matrix
- code responding/writing to other hardware IO pins..

Keep the high level processes performed by the program in mind:
- Scan the keyboard matrix
- Translate the pressed key into a keycode
- Send the keycode out the serial port

- Timers are used (e.g.) for:
- Scanning the keyboard
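To recognise the scan loop in the disassembly it can help to write down what it probably looks like at a higher level. The sketch below simulates the first two processes in Python: drive one row at a time, read the column bits back, then translate each (row, column) into a keycode through a lookup table. The 3x3 matrix, the `FakeMatrix` stand-in for the GPIO pins, active-high column bits (real boards are often active-low) and the use of USB HID usage IDs (0x04 = a, 0x05 = b, ...) in the table are all assumptions for illustration, not taken from this firmware.

```python
# Hypothetical 3x3 key matrix -> USB HID usage IDs
KEYMAP = [
    [0x04, 0x05, 0x06],   # a, b, c
    [0x07, 0x08, 0x09],   # d, e, f
    [0x0A, 0x0B, 0x0C],   # g, h, i
]

class FakeMatrix:
    """Stand-in for the real GPIO pins so the scan can run without hardware."""
    def __init__(self, held):
        self.held, self.row = set(held), None

    def drive_row(self, row):
        self.row = row                      # the "write an incrementing number" part

    def read_columns(self):
        # One bit per column of the currently driven row that has a key held
        return sum(1 << c for r, c in self.held if r == self.row)

def scan_matrix(drive_row, read_columns, rows, cols):
    """Drive each row in turn and read back the column bits."""
    pressed = []
    for row in range(rows):
        drive_row(row)
        bits = read_columns()
        for col in range(cols):
            if bits & (1 << col):
                pressed.append((row, col))
    return pressed

def to_keycodes(pressed, keymap=KEYMAP):
    """Translate (row, column) positions into keycodes via the lookup table."""
    return [keymap[r][c] for r, c in pressed]
```

In the firmware the same shape shows up as a loop with a write to one port, a read from another and a table lookup indexed by row and column, usually called from a timer handler rather than a main loop.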

FIND commonly used code segments
- Sort the function names according to their calling frequency
- We are looking for the most commonly used entry points
- Error handlers often use the same exit vector

- Scan the function code for indexed data operations & look for common offsets...
E.g. LDAI (REG01+1): Load accumulator indirect from (location pointed to by Register01)+1
Look for commonly used literal & calculated indexes: (REG01+1),(REG01+2), (REG01+REG02)
REG01 probably contains the base address of a table and the literal value or (e.g.) REG02
value is the index into the table. Climb up the call tree until you find the place where
REG01 is set. If you are lucky you might find the address of the table & can deduce the
purpose of the function from the contents of the table (e.g. key matrix-->keycode conversion).
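Once you have found where REG01 gets its base address, you can dump the table straight from the decoded image and look at its contents. A minimal sketch, assuming the image is mapped at 0x2400 (so file offset = address - 0x2400) and that entries are little-endian; `read_table` is a hypothetical helper. The same function reads the DCD port table from the first post with `fmt="I"`.

```python
import struct

LOAD_BASE = 0x2400  # assumption: decoded image is mapped at 0x2400

def read_table(image, table_addr, count, fmt="B"):
    """Read `count` little-endian entries of struct format `fmt` ("B" byte, "H" halfword,
    "I" word) starting at runtime address `table_addr`."""
    offset = table_addr - LOAD_BASE
    size = struct.calcsize("<" + fmt)
    return [struct.unpack_from("<" + fmt, image, offset + i * size)[0]
            for i in range(count)]
```

A byte table whose values sit mostly in the USB HID keyboard usage range (0x04 = a, 0x05 = b, ...) would be a good hint that you have found the key matrix to keycode conversion.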

OK. Nuff said for now, I have already probably overloaded you with detail
but these are really only a few basic ideas to get you started with passive analysis.
You can push this MUCH further but somewhere along the way it is no longer worth the effort
& you will need to move on to active analysis to start having fun again (!)

I hope this helps you to make some progress.
Let us know how you get on..
Posts: 150
Joined: September 17th, 2014, 12:06 pm
