Shruthi Jaisimha

Milpitas

Summary

Silicon Architecture Engineer specializing in SoC firmware development for Intel's high-performance Xeon server portfolio. Experienced across the full product lifecycle, from product concept (PC) to Production Release Qualification (PRQ), consistently delivering advanced power management algorithms and high-priority features with significant, quantifiable performance and revenue impact. Recognized for meeting demanding technical and executive-level quality standards through architectural planning and cross-functional leadership.

Overview

4 years of professional experience
2 certifications

Work History

Silicon Architecture Engineer

Intel
06.2021 - Current
  • Achieved a breakthrough SVOS boot milestone within 18 hours on a single unit of CWF-AP (Intel's first Xeon processor on the 18A process), a technical feat recognized by Intel's CEO.
  • Executed a strategic directive from the DCAI Executive VP by engineering firmware features for a key GNR-AP customer, significantly improving NUMA workload performance and reducing LLC and local-memory access latencies.
  • Designed and implemented advanced power management algorithms that maximized SoC performance per watt. Engineered a solution that saves up to 2 W by dynamically lowering the VCCDDRA FIVR supply based on DIMM configuration.
  • Drove Production Release Qualification (PRQ) of the GNR-AP and SRF-AP Xeon server products while meeting time-to-market (TTM) targets, owning SoC services firmware across the full product lifecycle from product concept to PRQ.
  • Led development of per-Resource Monitoring ID (RMID) energy consumption reporting in core telemetry, a critical feature that helps Cloud Service Providers (CSPs) meet sustainability goals and contributes to substantial Xeon revenue growth.
  • Developed and deployed post-silicon survivability firmware and delivered timely hardware workarounds, partnering with key internal stakeholders (Architecture, Pre-Si, Emulation, and Post-Si teams) to rapidly resolve critical hardware bugs and technical dependencies.
  • Owned the architecture, design, development, and unit testing of comprehensive SoC services firmware powering Intel's high-performance server portfolio.
  • Added significant business-unit value by developing custom SKUs and rapidly integrating late-stage features.

Education

Master of Science - Electrical and Computer Engineering

University of Illinois
Chicago, USA

Bachelor of Science - Electronics and Communications Engineering

SJBIT
Bengaluru, India

Publications

Compute-in-Memory Upside Down: A Learning Operator Co-Design Perspective for Scalability

  • This paper introduces a novel model-hardware co-design for compute-in-SRAM deep learning that eliminates complex mixed-signal peripherals (e.g., parallel DACs), significantly reducing implementation complexity. Learning operators are co-designed to fit SRAM constraints, making the approach DAC-free even for multi-bit precision DNNs. The framework also pairs synergistically with Bayesian inference, achieving comparable accuracy with a much smaller network and thereby minimizing the footprint cost that is crucial for compute-in-SRAM applications.

Design of Energy Efficient and Size Reduced SRAM Cell

  • To improve the energy efficiency of semiconductor memories, targeted dynamic power consumption, which constitutes 80% of total power. Proposed a novel 4T asymmetric SRAM cell in 45nm technology that achieved a 98.58% reduction in dynamic power consumption and 30.86% less area than existing 4T SRAM cells.

Gabor Features for Single Sample Face Recognition on Multicolor Space Domain

  • Developed a modified face recognition scheme based on a hybrid color model, Gabor feature extraction, and PCA to address illumination variation and small-sample-size challenges. The algorithm achieved 100% recognition accuracy on the MUCT Database (trained on a single image per subject) and 87.43% on the FEI Database, outperforming existing methods despite illumination and expression variations.

Projects

THESIS - Energy-Efficient eDRAM Compute-in-Memory Architecture using Multiplication-Free Bitwise Operators for DNN Acceleration

  • Pioneered an Ultra-Compact eDRAM-Based Compute-in-Memory (CiM) Framework: Developed a novel CiM macro using single-transistor eDRAM cells augmented with two control transistors, providing higher density and lower leakage power than traditional SRAM cells for both storage and processing.
  • Implemented a Multiplication-Free CiM Operator: Eliminated the need for Digital-to-Analog Converters (DACs) for multi-bit precision operations at the bitline level and relaxed the precision demands on the Analog-to-Digital Converters (ADCs), significantly reducing circuit complexity and power overhead.
  • Architected a Structured Macro Design: Structured the framework into μ-arrays and μ-channels, where each μ-array stores one weight channel. Neural Network weights are stored across columns, and weight bitplanes are arranged across rows for efficient bitwise operations.
  • Enabled Bitwise Multiplication within the Cell: Integrated two control transistors into the eDRAM bitcell, allowing the cell to perform bitwise multiplication by conditionally charging or discharging based on the stored weight bit and the inverted input fed through control signals.
  • Optimized Analog-to-Digital Conversion: Employed 4-bit Flash ADCs and Transmission Gate MUXes for bitline summation and analog-to-digital conversion, followed by a digital shift-and-add operation to produce the final output. The Flash ADC was identified as the major power consumer, accounting for ∼64% of total power.
  • Validated Robustness and Linearity: Demonstrated high linearity between the inputs and the SL. Under up to 35mV of process variation, two-sigma values remained well within bounds and the variability statistics followed a Gaussian distribution, confirming the architecture's stability and reliability.

Dynamic Thread Clustering DRAM Scheduler for Optimized Latency and Throughput in Multi-Core Systems

  • Designed and implemented a Multi-Metric Thread Clustering Scheduler that dynamically categorizes concurrent threads into Bandwidth and Latency clusters based on Misses Per Kilo Instruction (MPKI). This method strategically balances DRAM throughput and thread-level fairness.
  • Developed an innovative composite "Niceness Metric" (Bank Level Parallelism - Row Buffer Locality) to manage thread priorities within the Bandwidth Cluster, reshuffling priorities every 800 cycles to ensure fair resource allocation among memory-intensive applications.
  • Engineered a Hybrid Scheduling Core that extends the First-Ready, First-Come, First-Served (FR-FCFS) baseline with both a robust High/Low Water Mark Write-Drain policy to prevent queue overflow and a load-aware Batching Policy to manage core request injection.
  • Improved Row Buffer Hit Rate and Reduced Latency by integrating an Aggressive Precharge mechanism that monitors recent column accesses and a comprehensive auto-precharge check, minimizing row contention and rapidly preparing banks for new requests.
  • Built a Real-Time Profiling and Adaptation System that recalculates all per-thread statistics (MPKI, Bank Level Parallelism, Row Buffer Locality) over periodic time quanta, enabling the scheduler to adapt cluster assignments dynamically to shifting application memory behavior.

Fault Injection Attack Analysis and Implementation on AES Cryptography using Clock and Power Glitching

  • Implemented the Advanced Encryption Standard (AES) Algorithm on an FPGA board to establish a hardware-based cryptographic system for security testing and analysis.
  • Executed Advanced Side-Channel Fault Injection Attacks against the cryptographic hardware by strategically injecting clock and power glitches to induce errors and observe fault propagation.
  • Successfully Demonstrated a Cryptographic Break of the AES implementation by using hardware fault injection techniques (clock and power glitches), proving vulnerability to physical attacks.
  • Developed Python scripts to control the fault injection hardware and automate the entire attack process, from fault injection to data capture and analysis.

45nm Custom Design of an 8-bit MAC Datapath with Integrated SRAM for Neural Network Acceleration

  • Designed and optimized a foundational 8-bit Multiply-Accumulate (MAC) datapath for neural network acceleration, tuning performance and power consumption for critical operations.
  • Implemented the core arithmetic units, designing an 8-bit adder and a 4-bit multiplier as the fundamental components of the high-speed MAC unit.
  • Developed an on-chip memory solution by integrating the design of a 32x32 SRAM array to store intermediate data, ensuring fast access and tight integration with the custom MAC unit.
  • Completed the physical design (PD) of the entire MAC unit and memory subsystem in 45nm CMOS technology using Cadence EDA tools, demonstrating proficiency in industry-standard process and tool flows.

Certification

SWIFT Clean Code Certification