Optimize Logins v2

by Ray Davis, CTP

Supercharge Citrix Logins: a collection of tips from the field

I wanted to take the time to list the optimizations I try to follow, where I can, when helping clients tune images and make logins faster. These tips and tricks are gathered from a collection of EUC sources I follow. I can't take credit for them; this blog is an attempt to put them all in one place for the community. Many folks have blogs that go deep into this. One who always comes to mind is James Rankin, whose hat tricks I have been following for many years. His “The ultimate guide to Windows logon time optimizations” series is great, and I recommend you read it. As I go through the optimizations, please note that some of this is my opinion, based on my experience and on help from others in the EUC community. I also understand that every environment is different: some of these may or may not apply, and some folks may not agree with them. I still try to use all I can within the control I'm given in each situation. Remember that these are helpful tips, not a list of changes to start making right away. Take your time and test, test, and test. I did not focus on storage, since SSD or NVMe storage is something you would ideally want in any VDI environment.

  1. UEM Tool

It is beneficial to have a UEM tool that provides system optimizations for CPU, memory, and I/O. Citrix WEM alone has a “magic formula” (simplified a lot here): by setting four options, you get a more scalable image, which means you get more out of the hypervisor in terms of CPU cycles, CPU wait time, and CPU responsiveness. Memory management helps because it optimizes the working set of processes and clamps usage if needed. The next question folks ask is: what about disk I/O or disk latency? Sure, that could become an issue, but saturating disks rated at 13k-18k IOPS on 3-6 Gbps links is very unlikely. With today's technology, I don't run into disk constraints the way I did 6-8 years ago, although it can still happen.
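To put the IOPS figures above in context, here is a small worked calculation: the link bandwidth needed is IOPS × I/O size × 8 bits. This is a sketch; the 4, 8, and 64 KiB I/O sizes are illustrative assumptions, not measurements from any environment. At small block sizes, even 18k IOPS needs well under a 6 Gbit/s link, while 64 KiB blocks would exceed it, so the I/O profile matters as much as the headline IOPS number.

```python
def iops_to_gbps(iops: int, io_size_kib: float) -> float:
    """Return the link bandwidth in Gbit/s needed to sustain `iops` at a fixed I/O size."""
    bytes_per_second = iops * io_size_kib * 1024
    return bytes_per_second * 8 / 1e9


if __name__ == "__main__":
    # Illustrative I/O sizes (assumptions); measure your own profile before drawing conclusions.
    for iops in (13_000, 18_000):
        for size_kib in (4, 8, 64):
            print(f"{iops:>6} IOPS @ {size_kib:>2} KiB -> {iops_to_gbps(iops, size_kib):5.2f} Gbit/s")
```

For example, 18,000 IOPS at 4 KiB works out to roughly 0.59 Gbit/s.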

  2. Tuning GPO

GPO is an essential part of the login. There is nothing wrong with the older mindset of Group Policy Preferences (GPP) and client-side extensions, logon scripts, item-level targeting, and WMI filters; they work very well. But they also add a lot of overhead, so if user performance is the key, they need to go away, or you need to be open to changing them. This is the #1 thing I've cleaned up at many companies: move these to a UEM tool.
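Before moving settings out of GPP, logon scripts, and WMI filters, it helps to measure which client-side extensions (CSEs) cost the most at logon. The sketch below assumes you have exported the message text of the per-CSE completion events from the Microsoft-Windows-GroupPolicy/Operational log (Event ID 5016, for example via Get-WinEvent) into a list of strings, and that the messages read like “Completed <name> Extension Processing in <n> milliseconds.” That wording is an assumption, so check it against your own logs and adjust the pattern. The function totals time per CSE and lists the slowest first.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed message format of the per-CSE completion event; adjust if your logs differ.
CSE_PATTERN = re.compile(
    r"Completed (?P<cse>.+?) Extension Processing in (?P<ms>\d+) milliseconds",
    re.IGNORECASE,
)


def slowest_extensions(messages: Iterable[str]) -> List[Tuple[str, int]]:
    """Sum processing time per client-side extension; return (name, total_ms), slowest first."""
    totals: Dict[str, int] = defaultdict(int)
    for message in messages:
        match = CSE_PATTERN.search(message)
        if match:  # lines that don't match the assumed format are ignored
            totals[match.group("cse")] += int(match.group("ms"))
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample messages, for illustration only.
    sample = [
        "Completed Group Policy Registry Extension Processing in 94 milliseconds.",
        "Completed Group Policy Drive Maps Extension Processing in 1312 milliseconds.",
        "Completed Group Policy Scripts Extension Processing in 2650 milliseconds.",
    ]
    for name, ms in slowest_extensions(sample):
        print(f"{ms:>6} ms  {name}")
```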

  3. GPO functional vs. monolithic

Number 2 leads me to number 3: get rid of functional GPOs and move to a monolithic layout. In my experience, too many single-setting GPOs make logins slow. One or two main GPOs make GPO processing a lot better. Yes, each one will contain a lot of settings, but it processes faster. The blog I lean on for this is by Trent. He works for ControlUp, and I occasionally talk to him about custom ControlUp script-based actions. He is very sharp and has helped me many times. Another person on this list is James Rankin.

  4. Loopback processing

GPO loopback processing is something I have seen done wrong in so many places, whether in a Citrix XenApp/XenDesktop (XA/XD) or a plain RDSH environment. Ideally, you want loopback in Replace mode, because you do not want GPOs from other OUs applying. This can be a hot topic, because your OUs might be laid out with users in one OU carrying user policies and computers in another OU carrying computer policies. But for my last 15 years, the approach has been computer GPOs: if you want user settings applied, enable loopback and set it to Replace, not Merge. Taking this approach may mean adding GPOs or reorganizing OUs. This is debatable, and some may not agree.
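To make the Normal/Merge/Replace distinction concrete, here is a simplified Python model. It is my own illustration and ignores security filtering, WMI filters, enforced links, and blocked inheritance. In Merge mode, the user's own GPOs are processed first and the user settings from GPOs linked to the computer's OU are applied afterwards, so the computer side wins conflicts; in Replace mode, the user's own GPOs are skipped entirely. The mode itself is set with Computer Config > Admin Templates > System > Group Policy > Configure user Group Policy loopback processing mode. The GPO contents below are hypothetical.

```python
from typing import Dict, List

Settings = Dict[str, str]


def effective_user_settings(user_path_gpos: List[Settings],
                            computer_path_gpos: List[Settings],
                            mode: str = "normal") -> Settings:
    """Return the user-side settings that win under a given loopback mode.

    Each list holds the user-configuration settings of GPOs in processing order;
    later GPOs overwrite earlier ones ("last writer wins").
    """
    if mode == "normal":
        ordered = user_path_gpos                        # no loopback
    elif mode == "merge":
        ordered = user_path_gpos + computer_path_gpos   # computer-OU GPOs applied last, so they win
    elif mode == "replace":
        ordered = computer_path_gpos                    # user's own OU GPOs are ignored
    else:
        raise ValueError(f"unknown loopback mode: {mode!r}")
    result: Settings = {}
    for gpo in ordered:
        result.update(gpo)
    return result


if __name__ == "__main__":
    # Hypothetical GPOs: one linked to the users' OU, one linked to the VDA computers' OU.
    users_ou = [{"Wallpaper": "corp.jpg", "MapDrive": "H:"}]
    vda_ou = [{"Wallpaper": "none"}]
    for mode in ("normal", "merge", "replace"):
        print(mode, effective_user_settings(users_ou, vda_ou, mode))
```

Note how Replace drops the drive mapping from the users' OU entirely; that is exactly why the approach may require adding settings to the VDA-linked GPOs.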

  5. Computer GPO over user GPO

One crucial piece: always choose a computer GPO setting when one is available. Suppose you have a user setting and a computer setting that do the same thing; go with the computer setting. It applies at startup, so it doesn't add to logon time. You might be thinking that you have specific settings that must apply to users. Yeah, I get that. But again, use a UEM tool and get away from what I listed in #2. Also keep nested groups to a minimum, or logins will be impacted. Again, not every setup can do this, depending on the environment's complexity.
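Every group a user belongs to, directly or through nesting, ends up in their access token and is evaluated during logon (for example by GPO security filtering and item-level targeting), so it helps to see how far the nesting goes. This is a sketch that assumes you have already exported group memberships into a dictionary mapping each group to the groups it is itself a member of; the group names are hypothetical. A breadth-first walk returns every reachable group and the nesting depth (shortest path to each group), and tolerates circular nesting.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def expand_groups(direct: List[str], member_of: Dict[str, List[str]]) -> Tuple[Set[str], int]:
    """Return all groups reachable through nesting, plus the deepest nesting level reached.

    `direct` holds the user's direct memberships (depth 1); `member_of` maps a group
    to the groups it belongs to. Already-seen groups are skipped, so cycles are safe.
    """
    seen: Set[str] = set()
    max_depth = 0
    queue = deque((group, 1) for group in direct)
    while queue:
        group, depth = queue.popleft()
        if group in seen:
            continue
        seen.add(group)
        max_depth = max(max_depth, depth)
        for parent in member_of.get(group, []):
            queue.append((parent, depth + 1))
    return seen, max_depth


if __name__ == "__main__":
    # Hypothetical memberships, including a circular nest between All-Staff and VDI-Users.
    nesting = {"Sales": ["All-Staff"], "All-Staff": ["VDI-Users"], "VDI-Users": ["All-Staff"]}
    groups, depth = expand_groups(["Sales"], nesting)
    print(sorted(groups), "depth:", depth)
```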

  6. Asynchronous GPO processing

Make sure asynchronous GPO processing is enabled:

  1. Always wait for the network at computer startup and logon: Disabled
     (Computer Config > Admin Templates > System > Logon)
  2. Allow asynchronous user Group Policy processing when logging on through Remote Desktop Services: Enabled
     (Computer Config > Admin Templates > System > Group Policy)

Background reading from James Rankin:

  • How to get the fastest possible Citrix logon times – JAMES-RANKIN.COM
  • Make Citrix logons use asynchronous user Group Policy processing mode – JAMES-RANKIN.COM
  • The ultimate guide to Windows logon time optimizations – part #4 – JAMES-RANKIN.COM

  7. OS optimization

It's essential to tune the image with Windows OS optimizations, such as Citrix Optimizer, plus the bolt-on templates from the Citrix Optimizer community marketplace for third-party applications such as Edge, Chrome, Office, etc. For a comparison of tools, see VMware OSOT vs Citrix Optimizer: Optimizer Smackdown | GO-EUC.

  1. Citrix_Optimizer_Community_Template_Marketplace/templates at master · ryancbutler/Citrix_Optimizer_Community_Template_Marketplace · GitHub
  2. Creating a custom template for Citrix Optimizer – Dennis Span

  8. Minimize applications at startup

Remove all applications from startup except the key elements, for example the ControlUp (CU) agent, the UEM agent, and AV. Sysinternals Autoruns helps with this. Nothing needs to run from the HKLM Run or RunOnce keys. If something needs to run at startup, launch it with a UEM tool and call it a day.
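If you want a scripted check alongside Autoruns, here is a Python sketch that reads the machine-wide Run and RunOnce keys with the standard-library winreg module (Windows only; run it with 64-bit Python so paths are not redirected) and flags every entry that does not match an allowlist. The allowlist strings are hypothetical placeholders; substitute the executables of your own CU, UEM, and AV agents. It only reads the registry and changes nothing.

```python
import sys
from typing import Dict, Iterable

RUN_KEYS = (
    r"SOFTWARE\Microsoft\Windows\CurrentVersion\Run",
    r"SOFTWARE\Microsoft\Windows\CurrentVersion\RunOnce",
    r"SOFTWARE\WOW6432Node\Microsoft\Windows\CurrentVersion\Run",
)


def read_hklm_run_entries() -> Dict[str, str]:
    """Return {key\\value_name: command} from the HKLM Run/RunOnce keys (Windows only)."""
    import winreg  # standard library, only available on Windows
    entries: Dict[str, str] = {}
    for path in RUN_KEYS:
        try:
            key = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path)
        except OSError:
            continue  # key not present on this image
        with key:
            for i in range(winreg.QueryInfoKey(key)[1]):  # [1] = number of values
                name, data, _type = winreg.EnumValue(key, i)
                entries[f"{path}\\{name}"] = str(data)
    return entries


def entries_to_review(entries: Dict[str, str], allowlist: Iterable[str]) -> Dict[str, str]:
    """Keep only entries whose command line does not mention any allowlisted agent."""
    allowed = [a.lower() for a in allowlist]
    return {name: cmd for name, cmd in entries.items()
            if not any(a in cmd.lower() for a in allowed)}


if __name__ == "__main__" and sys.platform == "win32":
    # Hypothetical allowlist placeholders: replace with your real agent executables.
    for name, cmd in entries_to_review(read_hklm_run_entries(), ["cu_agent", "uem_agent", "av_agent"]).items():
        print(f"{name}: {cmd}")
```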

  9. Application tuning

In my experience, this can be a daunting task. Many companies run custom software: some of it built in-house, some from third parties, and some used universally across many companies. In any case, refer to the vendor documentation where possible. Most, but not all, vendors have guides on applying best practices in RDSH/XenApp/VDI.

As an example, here are some that come to mind. I am sure there are many more.

  10. Active Setup

Active Setup is another legacy hook that Microsoft kept around. Remove the Active Setup StubPath entries from the registry: they run at logon and delay userinit and the shell from loading. I have details and data on this I can provide.

  1. Citrix TechZone highlights this in its best practices for deploying Google Chrome. Although this post isn't about Chrome, it gives you an idea.
  2. Tech Paper: Deploying Google Chrome | Citrix Tech Zone
  3. Preferred method: add this to Citrix Optimizer
    1. Creating a custom template for Citrix Optimizer – Dennis Span
  4. Another method: run James Rankin's script

@echo off
:: Deletes the StubPath value from every Active Setup component in both registry views,
:: so Active Setup no longer runs per-user commands at logon.

echo Querying and deleting 64-bit StubPath values...
setlocal EnableDelayedExpansion
:: Native (64-bit) view: list every component subkey, then look for a 'StubPath' value
set KEY="HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Active Setup\Installed Components"
set FND=find /i %KEY%
for /f "tokens=*" %%a in ('reg query %KEY% /s^|%FND%') do (
  set SP=N
  rem Flag the component if it has a StubPath value
  for /f "tokens=*" %%b in ('reg query "%%a"^|find /i " STUBPATH"^|find "REG_"') do (
    set SP=Y
  )
  rem If a StubPath value was found, delete it
  if "!SP!" EQU "Y" reg delete "%%a" /V STUBPATH /F
)
endlocal

echo Querying and deleting 32-bit StubPath values...
setlocal EnableDelayedExpansion
:: WOW64 (32-bit) view: same logic against the Wow6432Node copy of the key
set KEY="HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Active Setup\Installed Components"
set FND=find /i %KEY%
for /f "tokens=*" %%a in ('reg query %KEY% /s^|%FND%') do (
  set SP=N
  rem Flag the component if it has a StubPath value
  for /f "tokens=*" %%b in ('reg query "%%a"^|find /i " STUBPATH"^|find "REG_"') do (
    set SP=Y
  )
  rem If a StubPath value was found, delete it
  if "!SP!" EQU "Y" reg delete "%%a" /V STUBPATH /F
)
endlocal
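Deleting StubPath values is not easily undone, so you may want to audit first (and take a `reg export` of both keys for rollback). Below is a read-only Python sketch, my own and not part of James Rankin's script, that lists the StubPath values the batch script above would delete. It uses the standard-library winreg module, so the reader only works on Windows; the filtering function is pure and can be tested anywhere.

```python
import sys
from typing import Dict, List, Tuple

ACTIVE_SETUP_KEYS = (
    r"SOFTWARE\Microsoft\Active Setup\Installed Components",
    r"SOFTWARE\WOW6432Node\Microsoft\Active Setup\Installed Components",
)


def stubpaths_from_tree(tree: Dict[str, Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (component_key, StubPath data) pairs; registry value names are case-insensitive."""
    found = []
    for component, values in sorted(tree.items()):
        for name, data in values.items():
            if name.lower() == "stubpath":
                found.append((component, data))
    return found


def read_active_setup_tree() -> Dict[str, Dict[str, str]]:
    """Read every Installed Components subkey and its values from HKLM (Windows only, read-only)."""
    import winreg  # standard library, Windows only; use 64-bit Python to avoid redirection
    tree: Dict[str, Dict[str, str]] = {}
    for root in ACTIVE_SETUP_KEYS:
        try:
            parent = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, root)
        except OSError:
            continue  # e.g. no WOW6432Node on a 32-bit OS
        with parent:
            for i in range(winreg.QueryInfoKey(parent)[0]):  # [0] = number of subkeys
                name = winreg.EnumKey(parent, i)
                values: Dict[str, str] = {}
                try:
                    with winreg.OpenKey(parent, name) as key:
                        for j in range(winreg.QueryInfoKey(key)[1]):
                            value_name, data, _type = winreg.EnumValue(key, j)
                            values[value_name] = str(data)
                except OSError:
                    continue  # skip components we cannot read
                tree[f"HKLM\\{root}\\{name}"] = values
    return tree


if __name__ == "__main__" and sys.platform == "win32":
    for component, stub in stubpaths_from_tree(read_active_setup_tree()):
        print(f"{component}: {stub}")
```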

  11. PVS vDisk maintenance (if PVS is used)

Do offline PVS vDisk maintenance: yes, you should defragment the VHDX. It doesn't matter how fast your storage is; fragmentation will occur, and in my experience it reduces performance by 20-40%. There are ways to do this with automation and without downtime. The more versions you create, the more fragmentation builds up. I have blogs on this if you are interested.

  12. Extra VDA image tweaks

Sometimes I like to squeeze more out of the optimizations. Being in the community means learning from many talented folks with many tricks. Here is another blog I go through to see where it can help. Make sure you understand what each tweak is doing. If you implement them, keep a list of the optimizations you apply; it gives you and your peers a form of source control, which helps in supporting the image and environment.

a. This key has been around since 2012/Win8 days. I still implement it today.

b. Another little nugget I stumbled on was DisableAcrylicBackgroundOnLogon

  • HKLM\SOFTWARE\Policies\Microsoft\Windows\System
    • DisableAcrylicBackgroundOnLogon
      • DWORD Value: 1 (Enabled)
    • GPO method: Computer Config > Admin Templates > System > Logon > Show clear logon background: Enabled

c. Remove extra UWP/AppX packages

d. Windows Welcome screen spinning waiting, or slow
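If you bake registry tweaks such as DisableAcrylicBackgroundOnLogon into the image with a script rather than GPO, generating a .reg file keeps them reviewable and easy to track in source control. This is a minimal Python sketch that renders DWORD values into the .reg format accepted by `reg import`; the key and value come from item b above, and the helper name is my own.

```python
from typing import Dict

ACRYLIC_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows\System"


def build_reg_file(key: str, dwords: Dict[str, int]) -> str:
    """Render a .reg file that sets the given DWORD values under a single key."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in dwords.items():
        # .reg files write DWORDs as 8 lowercase hex digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # regedit itself exports UTF-16 .reg files; if you save this output, consider
    # open(path, "w", encoding="utf-16", newline="") to keep the CRLF line endings.
    print(build_reg_file(ACRYLIC_KEY, {"DisableAcrylicBackgroundOnLogon": 1}))
```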

  13. Finalizing or sealing

Remember that finalizing, or sealing, the image is very important. I have been using BIS-F (Base Image Script Framework) by Matthias Schlimm for many years now, and I have a good working relationship with him from my CTA experience. If you already have your own image sealing scripts, no problem: you can combine them with BIS-F, and the results are even better.

Here are some key elements I always use in my golden image:

  • Disable IPv6
  • Run DelProf2
  • Run CCleaner
  • Run an AV scan (depends on the AV product at times)
  • Configure CTX optimization
  • Configure Citrix PVS target (sets my write cache drive for me)
  • Run a defrag ([Issue]: Defrag not performed, not defined based on DiskMode VDAPrivate · Issue #369 · EUCweb/BIS-F · GitHub)
  • Run .NET optimizations
  • Rebuild performance counters
  • Enable WinSxS optimization with a max of 480 minutes (execute on base disk only)
  • Disable “Delete AllUsersStartMenu content”. I do this because it prompts you, and I have seen folks say yes without reading the message.
  • Remove ghost devices (be careful, and understand what this does)
  • Configure desktop shortcut
  • Shut down the base image after sealing
  • If using FSLogix App Masking: “Copy FSLogix rules (*.frx), assignments (*.fxa) and URL (*.xml) from central share during Device Personalization on System Startup”. You can use GPP, but I prefer this approach.
  • Azure AD (if used): PREP: Azure AD leave doesn't work · Issue #330 · EUCweb/BIS-F · GitHub
  • Rearm MS Office once (evaluate this for your environment)
  • Rearm MS Windows once (evaluate this for your environment)
  • Enable RDP support (allows you to run BIS-F within an RDP session)
  • Configure logging to a UNC path
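To illustrate what the App Masking copy step does, here is a Python sketch of the same idea: copy the rule (.frx), assignment (.fxa), and .xml files from a central share into the local rules folder at startup. This is my own illustration, not the BIS-F implementation. The share path is a hypothetical placeholder, and `C:\Program Files\FSLogix\Apps\Rules` is the usual FSLogix rules folder, which you should verify on your build.

```python
import shutil
import sys
from pathlib import Path
from typing import List

RULE_PATTERNS = ("*.frx", "*.fxa", "*.xml")


def copy_rule_files(source: Path, destination: Path) -> List[str]:
    """Copy App Masking rule, assignment, and XML files; return the copied file names."""
    destination.mkdir(parents=True, exist_ok=True)
    copied: List[str] = []
    for pattern in RULE_PATTERNS:
        for rule_file in sorted(source.glob(pattern)):
            shutil.copy2(rule_file, destination / rule_file.name)  # copy2 keeps timestamps
            copied.append(rule_file.name)
    return copied


if __name__ == "__main__" and sys.platform == "win32":
    share = Path(r"\\fileserver\fslogix-rules")              # hypothetical central share
    rules_dir = Path(r"C:\Program Files\FSLogix\Apps\Rules")  # verify on your FSLogix build
    print(copy_rule_files(share, rules_dir))
```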

Here are some key elements I configure through GPO for the VDAs, including but not limited to:

  • Configure Citrix WEM
  • VDA configuration: “Delay Citrix Desktop Service”. Besides the purpose of the delay itself, this helps when you modify the list of DDCs
  • Configure the page file

  14. Bake GPO in the image or use GPMC

This is another hot topic that I have discussed many times with the community. I say it depends on the setup and the use case. Baking the GPO into the image gives the best processing and login times. Managing it from GPMC in AD seems better operationally: make a change, and it applies at the next background refresh (every 90 minutes, with a random offset of up to 30 minutes), or run gpupdate /force remotely and you are mostly done. If you bake it into the image, GPO processing is super fast, but the downside is that you have to crack the image open for any GPO change. Depending on the setting, a reboot may also be needed before the change is reflected in the HKLM policy hive.
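The trade-off above is partly about propagation time. With the default background refresh of 90 minutes plus a random offset of 0 to 30 minutes, a GPMC change can take up to two hours to reach a VDA after its last refresh (not counting AD replication). This tiny Python sketch computes that window; the defaults are parameters because the refresh interval itself can be changed with the “Set Group Policy refresh interval for computers” setting.

```python
from datetime import datetime, timedelta
from typing import Tuple


def next_refresh_window(last_refresh: datetime,
                        interval_min: int = 90,
                        max_offset_min: int = 30) -> Tuple[datetime, datetime]:
    """Return the earliest and latest time of the next background Group Policy refresh."""
    earliest = last_refresh + timedelta(minutes=interval_min)
    latest = earliest + timedelta(minutes=max_offset_min)
    return earliest, latest


if __name__ == "__main__":
    start, end = next_refresh_window(datetime(2024, 1, 1, 9, 0))
    print(f"next refresh between {start:%H:%M} and {end:%H:%M}")
```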

  15. Good profile management

Profile containers seem to be everyone's go-to here, but that is not always the right answer; in my humble opinion, UPM is still great. The FSLogix Office Container is geared around Office 365 and roaming the container's search database. You can keep that data in the profile container or split it out into an Office container. On Server 2019/Windows 10 multi-session and later, do not enable search roaming in the FSLogix ADMX settings: Windows now handles per-user search natively, and enabling it will cause issues. This is in the FSLogix docs, and I'm sure you know it as well. I did a webinar about a year ago, and my advice was to be careful with exclusions: they are not treated the way they were in the UPM days. Citrix Profile Management's profile container (not classic file-based UPM) is also excellent and is giving FSLogix a run for its money. The well-respected James Kindon has broken this down very nicely.

  16. Shrink scripts / deduplication / exclusions

Jim Moyle is an FSLogix genius, and he preaches this all the time: yes, you will need a shrink script to shrink the VHDX files. When I did this, I ran Jim Moyle's script weekly. Another addition: if you host the containers on Windows Server, enable Data Deduplication. I have also written a blog showing the savings from deduplication and shrink scripts.

When you configure exclusions, be aware that the first login will impact the PVS write cache. In today's deployments, the typical choice is cache in RAM with overflow on disk. I wish there were a magic number or T-shirt size that fit everyone (maybe there is, and I've been living under a rock). The overflow disk is the D: drive created when you use the XenDesktop Setup Wizard from PVS, or by your automation. The older rule of thumb was to start with 256-512 MB of RAM cache for desktop operating systems and 2-4 GB for server operating systems. Anyway, in my testing, this impact only happened on the first login, when the profile was created, and did not happen again. Exclusions do not make the VHDX mount faster and play no part in making logins faster. I used to use 20 GB drives for disk overflow, but that no longer seems to cut it for today's applications; however, this depends on the environment in most cases. FSLogix 2210 introduced a compaction feature. I have only used it in a lab; it seemed to work well, but I still stick with Jim's script for now. Matthias Schlimm released a blog giving a great inside look at what is going on, and I suggest you read it.
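To illustrate the idea of shrinking only where it pays off, here is a hypothetical selection rule in Python. It is not Jim Moyle's script or the FSLogix 2210 compaction logic; the 1 GiB and 5% thresholds are assumptions, and gathering each container's file size and in-guest used space (for example by mounting the VHDX) is left out.

```python
from dataclasses import dataclass
from typing import List

GIB = 1024 ** 3


@dataclass
class ContainerDisk:
    path: str
    file_bytes: int   # size of the .vhdx file on the share
    used_bytes: int   # space actually used inside the guest file system


def shrink_candidates(disks: List[ContainerDisk],
                      min_reclaim_gib: float = 1.0,
                      min_ratio: float = 0.05) -> List[ContainerDisk]:
    """Return disks whose reclaimable space clears both thresholds, largest reclaim first."""
    picked = []
    for disk in disks:
        reclaimable = max(disk.file_bytes - disk.used_bytes, 0)
        if reclaimable >= min_reclaim_gib * GIB and reclaimable >= min_ratio * disk.file_bytes:
            picked.append(disk)
    return sorted(picked, key=lambda d: d.file_bytes - d.used_bytes, reverse=True)


if __name__ == "__main__":
    # Hypothetical containers for illustration.
    sample = [ContainerDisk("user1.vhdx", 30 * GIB, 10 * GIB),
              ContainerDisk("user2.vhdx", 10 * GIB, 9 * GIB + GIB // 2)]
    for disk in shrink_candidates(sample):
        print(disk.path, round((disk.file_bytes - disk.used_bytes) / GIB, 1), "GiB reclaimable")
```

Skipping disks with little to gain keeps the weekly shrink window short, which matters when the containers have to be unmounted to be processed.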

  17. AV exclusions

Making sure the proper AV exclusions are in place is extremely important. I would also verify the exclusions in the registry if the AV product exposes them there; most do, from what I have seen.

  18. Tuning RDS/Virtual Apps TSFairShare

Fair Share technologies for CPU resources were introduced in Windows Server 2008 R2. Remote Desktop Services (RDS) servers, Windows 10 Enterprise multi-session, and Windows 11 Enterprise multi-session use Fair Share technology to manage resources, and RDS builds on it to add features for allocating network bandwidth and disk resources. Fair Share technologies are enabled by default, but you can disable them using Windows PowerShell and WMI. I would disable these settings to get the best user experience, but make sure to test this beforehand. In a VirtualExpo session in March 2023, you can see that this indeed helped login and application launch times.

These registry keys exist for CPU, disk, and network, and all are enabled by default:

  • Disk: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TSFairShare\Disk
  • Network: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TSFairShare\NetFS
  • CPU: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Quota System
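If you want to toggle these per component and keep an easy rollback, one option is to generate a .reg file for each state. This Python sketch assumes the value names EnableFairShare (Disk and NetFS keys) and EnableCpuQuota (Quota System key), as they are commonly documented; verify them against Microsoft's Fair Share documentation and test on a single VDA first.

```python
from typing import Dict, Tuple

# Assumed value names, as commonly documented for Fair Share; verify before use.
FAIR_SHARE_VALUES: Dict[str, Tuple[str, str]] = {
    "disk": (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TSFairShare\Disk", "EnableFairShare"),
    "network": (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TSFairShare\NetFS", "EnableFairShare"),
    "cpu": (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Quota System", "EnableCpuQuota"),
}


def fair_share_reg(enabled: Dict[str, bool]) -> str:
    """Render a .reg file that turns each listed Fair Share component on (1) or off (0)."""
    lines = ["Windows Registry Editor Version 5.00"]
    for component, on in enabled.items():
        key, value_name = FAIR_SHARE_VALUES[component]
        lines += ["", f"[{key}]", f'"{value_name}"=dword:{int(on):08x}']
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(fair_share_reg({"disk": False, "network": False, "cpu": False}))  # disable all
    print(fair_share_reg({"disk": True, "network": True, "cpu": True}))     # rollback file
```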

  19. Hardware layer

Understanding the CPU architecture is another good topic to pay attention to. In my experience, most places now have SSD or NVMe for storage. The hypervisors I see are Nutanix and VMware. Nutanix has a wonderful HCI solution, and VMware offers both an HCI solution and a traditional 3-tier layout on hardware like Cisco UCS, Dell PowerEdge, etc. Whatever flavor you are running, it is vital to understand VM sizing for the workloads. The answer to what size is mostly “it depends,” but you can follow Citrix Tech Zone guidance on scalability:

 “On older chips, such as Broadwell and Haswell, Intel connected processors using a ring-based architecture. But as the number of cores increased, access latency increased and bandwidth per core diminished so Intel would mitigate this by splitting the chip into two halves and adding a second ring to reduce distances. And this invisible split was something that needed to be factored into CVAD SSS to provide optimal results. This has been referred to in the past as “NUMA” or Non-Uniform Memory Access. And the leading guidance was to ensure that you are sizing CVA VMs as large as possible but not crossing NUMA nodes, sub-NUMA clusters or rings at the same time. If you sized your CVA VMs too large and they effectively spanned NUMA nodes or rings, it can lead to NUMA “thrashing” by accessing non-local resources and this would yield reduced SSS. Fast-forward to today and Intel has moved from a ring-based architecture to a mesh-based architecture. And this new mesh architecture introduced in Skylake does not have the same limitations as before where we have to split chips, divide cores or add rings. And this changes the way we size CVA servers in particular. So it’s important to understand the specific chip that is being used in the hardware you purchase and how the underlying microprocessor architecture is designed and constructed”

I see this a lot: a client or company throws more CPU at things, hoping it will speed up the back-end workloads. Sure, there are times it will help, but I pay close attention to two metrics here. CPU wait time and CPU ready time are both terms used in CPU scheduling and resource management, and in a virtualized environment they mean different things.

CPU wait time: the time a virtual CPU is not running because the guest has nothing ready to run, for example because it is idle or waiting for an I/O operation to complete. Example: a virtual machine did get scheduled, but its processors have nothing to process, so the CPU simply waits while the VM's scheduled time ticks by.

CPU ready time: the time a virtual CPU is ready to run but cannot be scheduled on a physical CPU because the host's cores are busy with other work. Example: the virtual machine was ready, but could not get scheduled to run on a physical CPU. Basically, CPU ready means the guest is waiting on the host; CPU wait means the host is waiting on the guest.

In summary, high CPU ready time is a sign of contention on the host, and adding vCPUs to VMs can make it worse rather than better, while CPU wait time reflects the guest itself being idle or blocked.
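To judge CPU ready in practice, vCenter performance charts report it as a summation in milliseconds per sample interval. The commonly used conversion is ready % = ready ms / (interval s × 1000) × 100; for a VM-level value, divide by the vCPU count to get a per-vCPU average. The 20-second default matches vCenter's real-time chart interval (the past-day chart uses 300-second samples); verify the interval against the chart you are reading. This is a small Python sketch of that arithmetic.

```python
def cpu_ready_percent(ready_summation_ms: float, interval_s: int = 20, vcpus: int = 1) -> float:
    """Convert a CPU ready summation (ms per sample) into an average percentage per vCPU."""
    return ready_summation_ms / (interval_s * 1000) * 100 / vcpus


if __name__ == "__main__":
    # Hypothetical sample: 2,000 ms of ready time in a 20 s real-time sample on a 4-vCPU VM.
    print(f"{cpu_ready_percent(2000, vcpus=4):.1f}% ready per vCPU")
```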

Design Decision: Single Server Scalability | Citrix Tech Zone

  20. Choosing the suitable provisioning method

MCS or PVS

MCS considerations

PVS Considerations

This concludes the tips and tricks. Remember, this was more of a catch-all blog, linking to and summarizing what many EUC folks use to optimize logins. Please let me know if I missed something you believe would be helpful, and I'll update the blog to include it.