In the past I have worked on several projects aimed at performance tuning slow IDM systems. To most people this sounds like a highly technical chore, but I have found that it often isn’t. Most of the time, problems stem from a misunderstanding of the system’s inner workings, from over-engineering or from system bloat. So how do you get to any one of these root causes? I am going to give you five tips on how to find the root cause of an IDM performance problem with little to no technical knowledge.
In short, these tips are, in no particular order:
- Question everything.
- Find out what the software was designed to do.
- Documentation, Documentation, Documentation.
- Finding peak loads.
- Sneak peek at the content.
1: Question everything.
It might come as a surprise, but sometimes the problem can be traced simply by asking a few questions. I usually interview four people in different positions to get a cross-comparison of what they think the problem might be. Besides some technical people, I usually include an architect or functional administrator and an end user. The main reason for interviewing non-technical users is to find out whether they experience a problem at all, and during which activity. Even if their answers seem to be way off track, record them anyway. You might just run into a systemic problem regarding the maintenance of multiple systems. For example, if you are conducting your research in an environment with a lot of Java applications and you find out that the end user experiences problems with most of these applications, the problem might lie in Java tuning rather than in the IDM system itself.
Also, while you are interviewing people, don’t forget to ask them what they think the IDM system should be doing. I know of multiple cases in which end users didn’t actually know what the system was doing and adopted a “how hard can it be?” stance. This in turn creates a negative frame of mind in which everything about the system is seen as wrong, including its performance.
2: Find out what the software was designed to do.
Finding out what an IDM system is supposed to do sounds like a meaningless task. An IDM system is designed to manage users, right? So why would you want to know? You want to know what the system was designed to do so you can check whether it is currently blatantly exceeding those limits. If you are looking at a system which was designed to handle a thousand users, for example, but is currently handling a million, you might have found the root cause. The landscape in which an IDM system sits is almost always an evolving one. Systems come and go, user populations change over time and specifications will change. It is therefore important that the IDM system is updated regularly. If old connectors are not removed, users keep virtual accounts on systems which are long gone. This creates a digital anchor that will slow the IDM system down in the future. So take a good look at the initial specifications for the system and at how well it has been kept up to date. Which brings me to my next point:
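If you can get a plain export of accounts per target system, spotting that digital anchor takes very little scripting. The sketch below is a minimal illustration. It assumes a hypothetical export of (user, target system) pairs and a hand-maintained set of systems that are still in use. Neither format comes from any particular IDM product.

```python
from collections import Counter

# Hypothetical export: one (user, target system) pair per account managed by the IDM system.
accounts = [
    ("alice", "HR"),
    ("alice", "LegacyMail"),
    ("bob", "HR"),
    ("bob", "CRM"),
    ("carol", "LegacyMail"),
]

# Hypothetical set of systems still in use, taken from up-to-date infrastructure documentation.
active_systems = {"HR", "CRM"}

def orphaned_accounts(accounts, active_systems):
    """Return the accounts whose target system is no longer part of the landscape."""
    return [(user, system) for user, system in accounts if system not in active_systems]

orphans = orphaned_accounts(accounts, active_systems)
# Group the orphans per retired system to see which old connector weighs the most.
per_system = Counter(system for _, system in orphans)
print(f"{len(orphans)} of {len(accounts)} accounts point at retired systems")
print(dict(per_system))
```

A large share of accounts pointing at retired systems is a hint that old connectors were never cleaned up.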
3: Documentation, Documentation, Documentation.
Having an IDM system change alongside the rest of your infrastructure is fine, as long as you document what is going on. So ask for whatever documentation is available. Check whether it is still up to date and what kind of solutions have been chosen. If most of the solutions are custom code or custom configurations, there needs to be proper documentation. If the problem originates in the configuration or the code base, it is almost always in the custom-built parts of the system. If your system has been subjected to five years of quick fixes and no one bothered to write them down, you will at some point run into a performance problem.
A good way to tell whether your system has been affected by a lot of undocumented changes is to ask the technical administrators how they experience their workload. If administrators have a hard time coping with the day-to-day tasks of administering the IDM server, chances are that they have taken certain ‘short cuts’ when implementing new features. Documentation is almost universally one of the first corners to be cut. Although it is understandable that corners get cut under pressure, it does create problems down the line, which ironically adds to the work pressure the administrators already feel. This is especially true if the chosen solution is not a standard one and its documentation is handed down in stories from administrator to administrator, losing bits of information with every transfer.
4: Finding peak loads.
A peak load can best be described as a sudden, large amount of work which the IDM system has to process. In general there are two kinds of peak loads: data-driven peak loads and user-driven peak loads.
Data-driven peak loads are usually caused by a large influx of new users or by the mass deprovisioning of users. Think of a school, for example. At the start of the school year all new students are registered and processed. This means that the IDM system suddenly receives a workload which it might need several days to process. During this time, the IDM system will feel sluggish and slow to the end users. So how would you resolve this? In the case of a data-driven peak load, the key is workload throttling. For example, you let the IDM system process users at full capacity outside of work hours and you limit the influx of work items during the workday. If you couple this with a policy of pre-registration, you can start processing users earlier without the end users experiencing any performance issues. Spreading out the workload improves the performance the end users experience.
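For readers comfortable with a little scripting, the throttling idea can be sketched in a few lines. This is a minimal, product-neutral sketch. It assumes you control the job that feeds work items, such as newly registered students, into the IDM system. The business hours, the batch sizes and the `batch_size` and `release` helpers are all hypothetical. You would need to tune them to your own environment and to what your IDM product supports.

```python
from datetime import datetime, time

# Hypothetical settings: adjust to your organisation's working hours and the system's capacity.
WORKDAY_START = time(8, 0)
WORKDAY_END = time(18, 0)
DAYTIME_BATCH = 50       # work items released per cycle while end users are active
OFF_HOURS_BATCH = 1000   # work items released per cycle at night and at weekends

def batch_size(now: datetime) -> int:
    """Return how many queued work items may be released to the IDM system in this cycle."""
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    in_work_hours = WORKDAY_START <= now.time() < WORKDAY_END
    return DAYTIME_BATCH if is_weekday and in_work_hours else OFF_HOURS_BATCH

def release(queue: list, now: datetime) -> list:
    """Remove the next batch from the front of the queue and return it."""
    n = batch_size(now)
    batch = queue[:n]
    del queue[:n]
    return batch

# Hypothetical start-of-year intake of 3000 students.
backlog = [f"student-{i}" for i in range(3000)]
monday_morning = datetime(2024, 9, 2, 10, 0)
monday_night = datetime(2024, 9, 2, 22, 0)
print(len(release(backlog, monday_morning)), len(backlog))
print(len(release(backlog, monday_night)), len(backlog))
```

If `release` is called once per scheduling cycle, it hands the IDM system small batches during office hours and large ones outside them. The same backlog is then spread over time instead of arriving all at once. Pre-registration would simply let this queue start filling earlier.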
The second type of peak load is the user-driven peak load: the moment at which a lot of users need to interact with the IDM system at the same time. When you are dealing with this kind of performance problem, there are usually two distinct causes. Either the endpoint the users are accessing is not performing as intended, or there are simply too many users. You can diagnose this by looking at the number of sessions the IDM server is trying to handle. So how do you solve this problem? In most cases you are not allowed to deny service to users on your system, so throttling users is not an option. Your best bet is usually to scale up the server on which the IDM system is running or to introduce load balancing. Either way, you need to contact your supplier to find out which solution they support best.
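Counting sessions does not require special tooling if the server can export when each session started and ended. The sketch below assumes such an export exists; the session records shown are made up. It uses a simple sweep over start and end events to find the highest number of concurrent sessions and the moment that number was first reached.

```python
from datetime import datetime

# Hypothetical (start, end) session records exported from the IDM server.
sessions = [
    (datetime(2024, 9, 2, 8, 55), datetime(2024, 9, 2, 9, 20)),
    (datetime(2024, 9, 2, 9, 0), datetime(2024, 9, 2, 9, 10)),
    (datetime(2024, 9, 2, 9, 5), datetime(2024, 9, 2, 9, 30)),
    (datetime(2024, 9, 2, 9, 40), datetime(2024, 9, 2, 9, 45)),
]

def peak_concurrency(sessions):
    """Return (max concurrent sessions, moment it was first reached) via a sweep over events."""
    events = []
    for start, end in sessions:
        events.append((start, 1))   # a session opens
        events.append((end, -1))    # a session closes
    # Sort by time; at equal timestamps, closes (-1) are processed before opens (+1).
    events.sort()
    current = peak = 0
    peak_at = None
    for moment, delta in events:
        current += delta
        if current > peak:
            peak, peak_at = current, moment
    return peak, peak_at

print(peak_concurrency(sessions))
```

Run this per day or per hour and compare the peaks with the moments users report slowness. That comparison helps separate "too many users" from "an endpoint that is slow even when it is quiet".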
5: Sneak peek at the content.
The last of my tips is quite simple: log in and take a quick look. Get a feeling for the contents of the IDM system. Try to determine whether the system is littered with users, whether a role explosion has happened and whether there are a lot of errors in the error logs. Basically, get an idea of the state the system is in now. I have seen engineers break out a lot of tooling without knowing the state of the system, only to spend days finding out that the IDM system was logging far too much. You can easily prevent this by taking a good look around and scrutinizing anything that seems strange. It is usually worth taking a moment to consider what a certain setting might mean and why it was enabled or disabled.
So there you have it: my five tips on performance tuning an IDM system without going into technical details. I would like to hear your thoughts. Have I missed anything, or do you agree with this list?
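A quick count of log levels is often enough to tell whether a system is logging far too much. The sketch below assumes plain-text log lines that contain a level keyword such as DEBUG or ERROR. The sample lines and the regular expression are hypothetical, since every IDM product has its own log format.

```python
import re
from collections import Counter

# Hypothetical log excerpt; real IDM products each use their own format.
LOG_LINES = [
    "2024-09-02 09:00:01 DEBUG Evaluating role membership for user alice",
    "2024-09-02 09:00:01 DEBUG Evaluating role membership for user bob",
    "2024-09-02 09:00:02 INFO Provisioning run started",
    "2024-09-02 09:00:02 DEBUG Attribute mapping loaded for connector HR",
    "2024-09-02 09:00:03 ERROR Connector LegacyMail: connection refused",
]

# Assumed pattern: the level appears as a separate upper-case word in each line.
LEVEL_PATTERN = re.compile(r"\b(TRACE|DEBUG|INFO|WARNING|WARN|ERROR)\b")

def level_counts(lines):
    """Count log lines per level; lines without a recognizable level count as UNKNOWN."""
    counts = Counter()
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        counts[match.group(1) if match else "UNKNOWN"] += 1
    return counts

counts = level_counts(LOG_LINES)
# Share of lines that are only useful while actively troubleshooting.
verbose_share = (counts["DEBUG"] + counts["TRACE"]) / sum(counts.values())
print(counts.most_common())
print(f"{verbose_share:.0%} of lines are DEBUG/TRACE")
```

If DEBUG or TRACE lines dominate a production log, it is worth checking whether verbose logging was left enabled after an earlier troubleshooting session.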