Optimize Logins v2 – World of EUC

by Ray Davis, CTP

Supercharge Citrix Logins: a collection of tips from the field

I wanted to take the time to list the optimizations I try to follow, where I can, when helping clients tune images and make logins faster. These tips and tricks are gathered from a collection of EUC sources I follow. I can’t take any credit for them; this blog is an attempt to put it all in one place for the community. There are many folks out there with blogs that go deep into this. One that always comes to mind is James Rankin. I have been following his hat tricks for many years. He has a great “Ultimate guide to Windows Login time” series, and I recommend you read it. As I go through and list the optimizations, please note that some of this is my opinion, based on my own experience and on help from others in the EUC community. I also understand that each environment is different: some of these may not apply, and some people may not agree with them. I still try to use everything I can within the control I am given in each situation. As you read this, remember that these are helpful tips, not a reason to go out and start changing things right away. Take your time and test, test, and test. I did not focus on the storage aspect because, ideally, SSD or NVMe storage is what you would want in any VDI environment.

UEM Tool

It would be beneficial to obtain a UEM tool with system optimizations for CPU, memory, and I/O. Citrix WEM alone has a magic formula (simplified a lot): by setting four options, you get a more scalable approach for the images, which means you get more out of the hypervisor in terms of CPU cycles, CPU wait time, and CPU response. Memory management can be beneficial because it uses working set optimization to clamp memory usage if needed. The next question folks ask is: what about the disk I/O or disk latency that could occur? Sure, that could happen, but with 13k-18k IOPS per disk at 3 Gbps-6 Gbps it is very unlikely. These days, I don’t run into disk constraints the way I did 6-8 years ago. But it can still happen.

Tuning GPO

GPO is an essential part. There is nothing wrong with the older mindset around GPP and client-side extensions, login scripts, item-level targeting, and WMI filters. They work very well, but they also add a lot of overhead. Ideally, to get the best user experience, they need to go away, or you need to be open to changing them, if user performance is the key. This is the #1 thing I’ve cleaned up at many companies: you move these to a UEM tool.

GPO Functional vs. Monolithic

The GPO cleanup above leads me to the next point: get rid of functional GPOs and move to a monolithic layout. Too many single-setting GPOs will make logins slow, in my experience. One or two main GPOs will make GPO processing a lot better. Yes, each will contain a lot of settings, but it processes faster. The gentleman behind the blog on this is Trent. He works for ControlUp, and I occasionally talk to him about custom ControlUp script-based actions. He is very sharp and has helped me many times. Another name on this list is James Rankin.

Loopback Processing

GPO loopback processing is something I have seen done wrong in so many places, in Citrix XA/XD and even plain RDSH environments. Ideally, you want loopback in Replace mode. You do not want GPOs from other OUs applying. This can be a hot topic because you might have your OUs laid out so that users are in one OU with user policies and computers are in another OU with computer policies. But in my last 15 years, the approach has been computer GPOs, and if you want user settings applied, you enable loopback and set it to Replace, not Merge. Taking this approach may mean adding GPOs or reorganizing OUs. This is a debatable point, and some may not agree.

Computer GPO over user GPO

One crucial piece: choose a computer GPO whenever one is available. Suppose you have a user GPO and a computer GPO that do the same thing. Go with the computer GPO. It applies at startup, which makes logon processing faster. You might be thinking that you have specific settings that apply to users. Yeah, I get that. But again, use a UEM tool and get away from what I listed under Tuning GPO. Keep nested groups to a minimum, or logins will be impacted. But again, not every setup will be able to do this, depending on the environment’s complexity.

The not-so-hidden tax of granular Group based application presentation with Citrix WEM (jkindon.com)

Asynchronous GPO processing

Ensure you have asynchronous GPO processing turned on:

“Always wait for the network at computer startup and logon”: Disabled
Computer Config > Admin Templates > System > Logon > Always wait for the network at computer startup and logon: Disabled
“Allow asynchronous user Group Policy processing when logging on through Remote Desktop Services”: Enabled
Computer Config > Admin Templates > System > Group Policy > Allow asynchronous user Group Policy processing when logging on through Remote Desktop Services: Enabled
How to get the fastest possible Citrix logon times – JAMES-RANKIN.COM
Make Citrix logons use asynchronous user Group Policy processing mode – JAMES-RANKIN.COM
The ultimate guide to Windows logon time optimizations – part #4 – JAMES-RANKIN.COM
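
To confirm on a VDA that both policies actually landed, a quick read-only check can help. This is only a minimal sketch: the registry value names (`SyncForegroundPolicy` and `ProcessTSUserLogonAsync`) are the ones I understand these two policies to write, so treat them as assumptions and verify against your own ADMX templates.

```powershell
# Read-only check of the two policy-backed values.
# The value names below are assumptions; confirm them against your ADMX templates.
$checks = @(
    @{
        Path = 'HKLM:\SOFTWARE\Policies\Microsoft\Windows NT\CurrentVersion\Winlogon'
        Name = 'SyncForegroundPolicy'      # "Always wait for the network...": want 0
        Want = 0
    },
    @{
        Path = 'HKLM:\SOFTWARE\Policies\Microsoft\Windows\System'
        Name = 'ProcessTSUserLogonAsync'   # "Allow asynchronous user Group Policy...": want 1
        Want = 1
    }
)

foreach ($c in $checks) {
    $item   = Get-ItemProperty -Path $c.Path -Name $c.Name -ErrorAction SilentlyContinue
    $actual = if ($item) { $item.($c.Name) } else { $null }

    [pscustomobject]@{
        Name   = $c.Name
        Wanted = $c.Want
        Actual = if ($null -eq $actual) { '(not set)' } else { $actual }
    }
}
```

A value reported as "(not set)" usually just means the policy is Not Configured on that machine.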

OS optimization

Apply Windows OS optimizations with a tool such as Citrix Optimizer, plus the bolt-ons from the Citrix marketplace for third-party applications such as Edge, Chrome, Office, etc. It’s essential to tune the image.

VMware OSOT vs Citrix Optimizer Optimizer Smackdown | GO-EUC

Minimize Applications at Startup

Remove all applications from startup, except for the key elements. Examples would be the CU agent, the UEM agent, and AV. Autoruns helps here. Nothing else needs to run from the HKLM Run or RunOnce keys. If something does need to run at startup, use a UEM tool to launch it and call it a day.
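
If you want a quick inventory before opening Autoruns, something like this read-only sketch lists what is sitting in the machine-wide Run and RunOnce keys. It only covers those four keys (an assumption about where to look first); scheduled tasks, services, and per-user keys are out of scope, which is where Autoruns is still the better tool.

```powershell
# Read-only: list auto-start entries from the machine-wide Run/RunOnce keys.
$runKeys = @(
    'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Run',
    'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\RunOnce',
    'HKLM:\SOFTWARE\WOW6432Node\Microsoft\Windows\CurrentVersion\Run',
    'HKLM:\SOFTWARE\WOW6432Node\Microsoft\Windows\CurrentVersion\RunOnce'
)

foreach ($key in $runKeys) {
    if (-not (Test-Path $key)) { continue }

    # Get-Item returns a RegistryKey object, so the value names can be walked directly.
    $item = Get-Item -Path $key
    foreach ($name in $item.GetValueNames()) {
        [pscustomobject]@{
            Key     = $key
            Name    = $name
            Command = $item.GetValue($name)
        }
    }
}
```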

Application tuning

In my experience, this can be a daunting task. Many companies have custom software for the business. Some is in-house, some is third-party, and some is used universally across many companies. In any case, try to reference the documentation where possible. Most vendors, but not all, will have guides on applying best practices in RDSH/XenApp/VDI.

As an example, here are some that come to mind. There are many more I am sure.

Microsoft Edge
- Tech Paper: Deployment Guide Microsoft Edge | Citrix Tech Zone

Google Chrome
- Tech Paper: Deploying Google Chrome | Citrix Tech Zone

Microsoft 365 with CVAD
- Deployment Guide: Microsoft 365 with Citrix Virtual Apps and Desktops

Active Setup

Active Setup is another legacy hook from MS that they kept around. Remove the Active Setup keys (the StubPath entries) from the Registry; they bog down userinit and delay the shell from loading. I have details and data on this I can provide.

Citrix TechZone highlights this in their best practices for deploying Google Chrome. Although the topic here isn’t Chrome, it gives you an idea.
Tech Paper: Deploying Google Chrome | Citrix Tech Zone
Preferred method – Add this into Citrix Optimizer
1. Creating a custom template for Citrix Optimizer – Dennis Span
Another method – Run James Rankin’s script

setlocal EnableDelayedExpansion

echo Querying and deleting STUBPATH values in the native (64-bit) registry view...
rem Queries the Registry and searches for specific strings. In this case 'STUBPATH'.
set KEY="HKEY_LOCAL_MACHINE\Software\Microsoft\Active Setup\Installed Components"
set FND=find /i %KEY%

for /f "tokens=*" %%a in ('reg query %KEY% /s^|%FND%') do (
    set SP=N
    for /f "tokens=*" %%b in ('reg query "%%a"^|find /i " STUBPATH"^|find "REG_"') do (
        set SP=Y
    )
    rem If the subkey holds a STUBPATH value, delete that value only.
    if "!SP!" EQU "Y" reg delete "%%a" /V STUBPATH /F
)

endlocal
setlocal EnableDelayedExpansion

echo Querying and deleting STUBPATH values in the 32-bit (Wow6432Node) registry view...
rem Same logic as above, this time against the Wow6432Node branch.
set KEY="HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Active Setup\Installed Components"
set FND=find /i %KEY%

for /f "tokens=*" %%a in ('reg query %KEY% /s^|%FND%') do (
    set SP=N
    for /f "tokens=*" %%b in ('reg query "%%a"^|find /i " STUBPATH"^|find "REG_"') do (
        set SP=Y
    )
    rem If the subkey holds a STUBPATH value, delete that value only.
    if "!SP!" EQU "Y" reg delete "%%a" /V STUBPATH /F
)

endlocal
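
Before deleting anything, it may be worth capturing what is there so you can put an entry back if an application turns out to need it. The following is a read-only sketch, not part of James Rankin’s script; it simply reports every component that carries a StubPath value in both registry branches.

```powershell
# Read-only inventory of Active Setup components that carry a StubPath value.
$roots = @(
    'HKLM:\SOFTWARE\Microsoft\Active Setup\Installed Components',
    'HKLM:\SOFTWARE\WOW6432Node\Microsoft\Active Setup\Installed Components'
)

foreach ($root in $roots) {
    if (-not (Test-Path $root)) { continue }

    Get-ChildItem -Path $root | ForEach-Object {
        $stub = $_.GetValue('StubPath')
        if ($stub) {
            [pscustomobject]@{
                Component = $_.PSChildName
                Name      = $_.GetValue('')   # default value, usually the friendly name
                StubPath  = $stub
                Root      = $root
            }
        }
    }
}
```

Piping the output to `Export-Csv` gives you a simple before-state record to keep alongside the image documentation.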

PVS vDisk maintenance (if PVS is used)

PVS offline vDisk maintenance: yes, it helps to defrag the VHDX. It doesn’t matter how fast your storage is. Disk fragmentation will occur, reducing performance by 20-40%, in my experience. There are ways to do this without downtime and with automation. The more versions you create, the more it happens. I have blogs on this if you are interested.

How I Run a Defrag on a PVS Target vDisk (mycugc.org)

Extra VDA Image tweaks

Sometimes I like to squeeze more out of the optimizations. Being in the community means many talented folks have many tricks. Here is another blog I go through to see where it can help. I encourage you to make sure you understand what each tweak is doing. If you implement any of them, it would be a good idea to keep a list of the optimizations you are running. It will give you a form of source control for yourself and your peers, helping support the image/environment.

Citrix Virtual Delivery Agent (VDA) Post Install Script | J House Consulting – DevOps, Microsoft, Citrix & Desktop Virtualisation (VDI) Specialist – +61 413 441 846

a. This key has been around since 2012/Win8 days. I still implement it today.

[HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Serialize]
"StartupDelayInMSec"=dword:00000000
Add this into GPO or WEM (this helps Citrix Director get the correct times).
Optimize Logon Times – Part 1: Citrix Director – xenappblog
Reduce Citrix logon times by up to 75% – JGSpiers.com
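
If you would rather script it than import a .reg file, here is a minimal sketch. Because the key lives under HKEY_CURRENT_USER, it has to run in the user’s context (a logon action in your UEM tool, for example); running it elevated as an admin only changes the admin’s own profile.

```powershell
# Runs in the user's context: creates the Serialize key if missing and sets the delay to 0.
$path = 'HKCU:\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Serialize'

if (-not (Test-Path $path)) {
    New-Item -Path $path -Force | Out-Null
}

New-ItemProperty -Path $path -Name 'StartupDelayInMSec' -PropertyType DWord -Value 0 -Force | Out-Null

# Show the result so it can be logged or eyeballed.
Get-ItemProperty -Path $path -Name 'StartupDelayInMSec' |
    Select-Object StartupDelayInMSec
```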

b. Another little nugget I stumbled on was DisableAcrylicBackgroundOnLogon

HKLM\SOFTWARE\Policies\Microsoft\Windows\System
- DisableAcrylicBackgroundOnLogon
  - DWORD Value: 1 (Enabled)
- GPO method – Computer Config > Admin Templates > System > Logon > Show clear logon background = Enabled

c. Remove extra UWP/AppX packages

Get-AppxProvisionedPackage -online | Out-GridView -PassThru | Remove-AppxProvisionedPackage -online
https://james-rankin.com/articles/how-to-remove-uwp-apps-on-windows-10-v1803/
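
The Out-GridView one-liner is interactive, which is fine for a one-off. For a repeatable image build, I find it helps to review a list first. The sketch below is read-only and assumes a keep-list; the two package names in `$keep` are only placeholders, so adjust them to what your users actually need. Run it from an elevated prompt.

```powershell
# Hypothetical keep-list: replace with the apps your users actually need.
$keep = @('Microsoft.WindowsCalculator', 'Microsoft.WindowsStore')

# Read-only: list provisioned packages that are NOT on the keep-list.
Get-AppxProvisionedPackage -Online |
    Where-Object { $keep -notcontains $_.DisplayName } |
    Sort-Object DisplayName |
    Select-Object DisplayName, Version, PackageName
```

Once the list looks right, the same filtered pipeline can feed `Remove-AppxProvisionedPackage -Online`, as in the one-liner above.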

d. Windows Welcome screen spinning, waiting, or slow

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System
Value = DelayedDesktopSwitchTimeout
Type = DWORD
Data = 5 or 1
I always test this key in each environment.
Windows 10 VDA exhibits slow logon at Welcome screen (citrix.com)
The ultimate guide to Windows logon time optimizations, part #6 – JAMES-RANKIN.COM
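
Since I test both values, a small helper makes it easy to flip between them. This is a sketch under the assumption that the value is a plain DWORD and that 1 and 5 are the only data you want to allow; the function name is my own. It needs an elevated session.

```powershell
function Set-DelayedDesktopSwitchTimeout {
    param(
        # Only the two values mentioned above are accepted.
        [ValidateSet(1, 5)]
        [int]$Value = 5
    )

    $path = 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System'
    $name = 'DelayedDesktopSwitchTimeout'

    # Capture the previous data (null if the value did not exist).
    $before = (Get-ItemProperty -Path $path -Name $name -ErrorAction SilentlyContinue).$name

    New-ItemProperty -Path $path -Name $name -PropertyType DWord -Value $Value -Force | Out-Null

    [pscustomobject]@{ Name = $name; Before = $before; After = $Value }
}

# Example: set the value to 5 and report what it was before.
Set-DelayedDesktopSwitchTimeout -Value 5
```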

Finalizing or Sealing

Remember that finalizing or sealing the image is very important. I have been using BIS-F by Matthias Schlimm for many years now. I have a good working relationship with him from my CTA experience. This is another critical element. If you already have your own image sealing scripts, no problem. We can combine them, and the results are even better.

Base Image Script Framework ( BIS-F) 6.1 (eucweb.com)

Here are some key elements I always use in my Golden Image

Disable IPv6
Run DelProf2
Run CCleaner
Run AV Scan ( it depends on the AV product at times)
Configuration CTX Optimization
Configure Citrix PVS Target ( Set my Write Cache drive for me)
Run a Defrag ([Issue]: Defrag not performed, not defined based on DiskMode VDAPrivate · Issue #369 · EUCweb/BIS-F · GitHub)
Run .NET Optimizations
Rebuild Performance Counter
Enable WinSxS optimization with a Max of 480 minutes( Execute on base Disk only)
Disable “Delete allUsersStartMenu Content”. I do this because it will prompt you, and I have seen folks say yes without reading the message.
Remove ghost devices (be careful, and understand this)
Configure Desktop shortcut
Shutdown Base Image after sealing
If using FSLogix AppMasking, “Copy FSLogix rules (*.frx), assignments (*.fxa) and URL (*.xml) from central share during Device Personalization on System Startup” You can use GPP, but this approach I like more.
Azure AD (If using this) PREP: Azure AD leave doesn’t work · Issue #330 · EUCweb/BIS-F · GitHub
Rearm MS Office once ( you need to evaluate this for your environment)
Rearm MS Windows once ( you need to evaluate this for your environment)
Enable RDP support ( allows you to execute BIS-F within a RDP session)
Configure logging to a UNC path

Here are some key elements I run on the GPO for the VDAs but not limited to:

Configure Citrix WEM
VDA configuration: “Delay Citrix Desktop Service” (this helps when you modify the list of DDCs, in addition to the purpose of the delay itself)
Configure Page file

Bake GPO in Image or use GPMC

This is another hot topic that I have had many conversations about with the community. I say it depends on the setup and the use case. Managing the GPO from GPMC in AD seems more convenient: make the change and it is applied within 90 minutes plus an offset of up to 30 minutes, or run a GPUpdate /force remotely and you are mostly done. But if you bake the GPO into the image, GPO processing is super-fast and you get the best logins. The downside is that you have to crack the image open for any GPO change. Unless it is a Computer GPO, a reboot may be needed to reflect the HKLM\policy hive.
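
For the remote refresh route, this is roughly how I would sketch it with the GroupPolicy PowerShell module (part of RSAT/GPMC). The machine names are hypothetical, and the targets need the firewall rules for remote scheduled-task management open, so treat this as a starting point rather than a finished script.

```powershell
# Requires the GroupPolicy module (RSAT / GPMC) on the admin workstation.
Import-Module GroupPolicy

# Hypothetical VDA names: replace with your own list or an AD query.
$vdas = @('VDA01', 'VDA02')

foreach ($vda in $vdas) {
    # Schedules a forced Group Policy refresh on the remote machine with no random delay.
    Invoke-GPUpdate -Computer $vda -Force -RandomDelayInMinutes 0
}
```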

Good profile management

Profile containers seem to be everyone’s go-to here, but that is not always the case. UPM is still great, in my humble opinion. The FSLogix Office Container is geared around Office 365 and roaming the container’s search database. You can keep that data in the profile container or split it out into an Office container. With Server 2019/Win10 multi-session and above, do not set the search to roam anymore in the ADMX file for the GPO. Windows natively does this now, and it will cause issues if you do. It’s in the FSLogix docs, as I’m sure you already know. I did a webinar about one year ago, and the advice I gave was to be careful with exclusions. Exclusions are not treated as they were in the UPM days. Citrix Profile Containers (not classic UPM) are also a great option, and they are giving FSLogix a run for its money. The well-respected James Kindon has broken this down very nicely.

Shrink Scripts /Deduplication/Exclusions

Jim Moyle is an FSLogix genius, and he preaches this all the time. Yes, you will need a shrink script to shrink the VHDX. When I did this, I ran Jim Moyle’s script weekly. Another addition: if you use a Windows server to host the containers, enable Data Deduplication. I have also written a blog showing the savings and the shrink scripts.

When you do exclusions, be aware that the first login will impact the PVS write cache. In today’s deployments, the usual choice is write cache in RAM with overflow to disk. I wish there were a magical number or a T-shirt size that would fit all. (Maybe there is, and I have been living under a rock.) The disk overflow would be the D drive that gets created when you use the XenDesktop wizard from PVS, or your automation. The older rule of thumb was to start with 256-512 MB for desktop operating systems and 2-4 GB for server operating systems. Anyway, from my testing, the impact only happens on the first login, at profile creation, and does not happen again. Exclusions do not make the VHDX mount faster, and they play no part in making logins faster. I used to use 20 GB drives for disk overflow, but it seems that just isn’t cutting it for today’s applications. However, this is environment-based in most cases.

FSLogix 2210 now has a compaction feature. I have only used it in a lab setup. It seemed to work well, but I still stick with Jim’s script for now. Matthias Schlimm released a blog giving great insight into what is going on. I suggest you read it.

AV Exclusions

Making sure the proper AV exclusions are in place is extremely important. I would also verify and check in the Registry if the AV product allows it. Most do, from what I have seen.

https://docs.citrix.com/en-us/workspace-environment-management/service/system-requirements.html#antivirus-exclusions

Tech Paper: Endpoint Security, Antivirus, and Antimalware Best Practices (citrix.com)

3 Biggest Mistakes AVD Admins Make (Easy, Simple Fix) – YouTube

Tuning RDS/Virtual Apps TSFairShare

Fair Share technologies for CPU resources were introduced in Windows Server 2008 R2. Remote Desktop Services (RDS) servers, Windows 10 Enterprise multi-session, and Windows 11 Enterprise multi-session use Fair Share technology to manage resources. RDS builds on the Fair Share technologies to add features for allocating network bandwidth and disk resources. Fair Share technologies are enabled by default, but you can disable them using Windows PowerShell and WMI. I would disable these settings to get the best user experience. Make sure to test this beforehand. At VirtualExpo in March 2023, you could see that this indeed helped login and application launch times.

Rory Monaghan on Twitter: “Application launch time went from 40 seconds to 20 seconds when changing the TSFairShare setting. Great tip! I haven’t had this come up before. #VirtualExpo https://t.co/GdKGqvZA42” / Twitter

Fair Share technologies are enabled by default in Remote Desktop Services – Windows Server | Microsoft Learn
Slow application on Citrix / RDS – TSFairShare – Wedel IT
Disable fair sharing in Windows Server – Ryslander.com
CTP Bart Jacobs talks about this as well here: QuickPost #0004: Disable DFSS (cloudsparkle.be)

These registry keys exist for CPU, disk, and network, all enabled by default.

- Disk: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TSFairShare\Disk
- Network: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TSFairShare\NetFS
- CPU: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Quota System
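
As a rough sketch of the registry route: the snippet below sets the fair-share switches to 0 and reports the previous data. The value names (`EnableFairShare` for disk and network, `EnableCpuQuota` for CPU) and the spelling of the CPU key as "Quota System" with a space are my understanding of the Microsoft article linked above, so verify them there before using this. Run it elevated, test first, and plan for a restart before judging the result.

```powershell
# Assumed value names; verify against the Microsoft Learn article before use.
$settings = @(
    @{ Path = 'HKLM:\SYSTEM\CurrentControlSet\Services\TSFairShare\Disk';  Name = 'EnableFairShare' },
    @{ Path = 'HKLM:\SYSTEM\CurrentControlSet\Services\TSFairShare\NetFS'; Name = 'EnableFairShare' },
    @{ Path = 'HKLM:\SYSTEM\CurrentControlSet\Control\Session Manager\Quota System'; Name = 'EnableCpuQuota' }
)

foreach ($s in $settings) {
    if (-not (Test-Path $s.Path)) {
        Write-Warning "Missing key, skipped: $($s.Path)"
        continue
    }

    # Capture the previous data so the change can be rolled back.
    $before = (Get-ItemProperty -Path $s.Path -Name $s.Name -ErrorAction SilentlyContinue).($s.Name)

    Set-ItemProperty -Path $s.Path -Name $s.Name -Value 0 -Type DWord

    [pscustomobject]@{ Key = $s.Path; Value = $s.Name; Before = $before; After = 0 }
}
```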

Hardware Layer

Understanding the CPU architecture is another good topic to pay attention to. In my experience, most places now have SSD or NVMe for storage. The hypervisors that I see are Nutanix and VMware. Nutanix has a wonderful HCI solution, and VMware offers both an HCI solution and a traditional 3-tier layout for things like UCS, PowerEdge boxes, etc. Whatever flavor you are running, it is vital to understand the VM sizing for the workloads. The answer to “what size?” is mostly “it depends.” However, you can follow the guidance from Tech Zone on scalability.

“On older chips, such as Broadwell and Haswell, Intel connected processors using a ring-based architecture. But as the number of cores increased, access latency increased and bandwidth per core diminished so Intel would mitigate this by splitting the chip into two halves and adding a second ring to reduce distances. And this invisible split was something that needed to be factored into CVAD SSS to provide optimal results. This has been referred to in the past as “NUMA” or Non-Uniform Memory Access. And the leading guidance was to ensure that you are sizing CVA VMs as large as possible but not crossing NUMA nodes, sub-NUMA clusters or rings at the same time. If you sized your CVA VMs too large and they effectively spanned NUMA nodes or rings, it can lead to NUMA “thrashing” by accessing non-local resources and this would yield reduced SSS. Fast-forward to today and Intel has moved from a ring-based architecture to a mesh-based architecture. And this new mesh architecture introduced in Skylake does not have the same limitations as before where we have to split chips, divide cores or add rings. And this changes the way we size CVA servers in particular. So it’s important to understand the specific chip that is being used in the hardware you purchase and how the underlying microprocessor architecture is designed and constructed”

I do see this a lot: a client/company throwing more CPU at things, hoping it will speed up the back-end workloads. Sure, there are times it will help. But I try to pay heavy attention to two metrics. CPU wait time and CPU ready time are both terms used in the context of CPU scheduling and resource management on the hypervisor.

CPU wait time: the amount of time a virtual CPU spends in a wait state, either idle or waiting for something such as an input/output (I/O) operation to complete. Example: a virtual machine did get scheduled, but its processors have nothing to process, so the CPU simply waits while the scheduled time for the virtual machine ticks by.

CPU ready time: the amount of time a virtual machine spends in the ready queue, ready to run but unable to do so because the physical CPU is busy executing something else. Example: a virtual machine was ready but could not get scheduled to run on the physical CPU.

In summary, CPU ready means the guest is waiting on the host, and CPU wait means the host is waiting on the guest.
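
On VMware, the performance charts report CPU ready as a summation in milliseconds, which is hard to read at a glance. A small helper like the one below converts it to a percentage using the commonly documented formula: ready ms divided by the sample interval in ms, times 100. The function name, the 20-second real-time interval default, and dividing by vCPU count to get a per-vCPU average are assumptions of this sketch.

```powershell
function Get-CpuReadyPercent {
    param(
        [Parameter(Mandatory)]
        [double]$ReadySummationMs,        # CPU ready summation from the chart, in ms
        [double]$IntervalSeconds = 20,    # real-time charts sample every 20 seconds
        [int]$VCpuCount = 1               # divide by vCPUs for a per-vCPU average
    )

    $percent = ($ReadySummationMs / ($IntervalSeconds * 1000)) * 100
    [math]::Round($percent / $VCpuCount, 2)
}

# Example: 1000 ms of ready time in a 20 s sample on a 1-vCPU VM is 5 percent.
Get-CpuReadyPercent -ReadySummationMs 1000
```

What counts as "too high" depends on the workload, so compare the number against your own baseline rather than a fixed threshold.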

Design Decision: Single Server Scalability | Citrix Tech Zone

Choosing a suitable provisioning method

MCS or PVS

Design Decision: Single Server Scalability | Citrix Tech Zone

MCS considerations

Machine Creation Services (MCS) Storage Considerations (citrix.com)
- https://docs.citrix.com/en-us/citrix-virtual-apps-desktops/install-configure/machine-catalogs-create.html#mcs-storage-considerations

PVS Considerations

This concludes the tips and tricks. Remember, this was more of a catch-all source blog showing links and summarizing what many of the EUC folks use to optimize logins. Please let me know if I missed something you believe can be helpful, and I’ll update the blog to include it.