Publications
2024
MoQ: Mixture-of-format Activation Quantization for Communication-efficient AI Inference System
NeurIPS 2024 Workshop on Machine Learning with new Compute Paradigms, 2024
2019
Increased Fault-Tolerance and Real-Time Performance Resiliency for Stream Processing Workloads through Redundancy
2019 IEEE International Conference on Services Computing (SCC), 51-55, 2019
Computational requirements for real-time ptychographic image reconstruction
Applied Optics 58 (7), B19-B27, 2019
2018
Reducing Tail Latencies While Improving Resiliency to Timing Errors for Stream Processing Workloads
2018 IEEE International Conference on Services Computing (SCC), 2018
Pacer: Automated Feedback-Based Vertical Elasticity for Heterogeneous Soft Real-Time Workloads
2018 IEEE/ACM 11th International Conference on Utility and Cloud Computing …, 2018
2017
A Comparison of System Performance on a Private OpenStack Cloud and Amazon EC2
10th IEEE International Conference on Cloud Computing (IEEE Cloud), 2017
Dynamically Improving Resiliency to Timing Errors for Stream Processing Workloads
The 18th International Conference on Parallel and Distributed Computing …, 2017
Computer system and network security
CRC Press, 2017
Load Balancing for Minimizing Deadline Misses and Total Runtime for Connected Car System in Fog Computing
15th International Symposium on Parallel and Distributed Processing with …, 2017
2016
Hypervisor Performance Analysis for Real-Time Workloads
2016 IEEE High Performance Extreme Computing Conference (HPEC), 2016
Automated Demand-Based Vertical Elasticity for Heterogeneous Real-Time Workloads
9th Annual IEEE International Conference on Cloud Computing, 2016
Reducing Data Movement with Approximate Computing Techniques
IEEE International Conference on Rebooting Computing, 2016
2015
Supporting high performance molecular dynamics in virtualized clusters using IOMMU, SR-IOV, and GPUDirect
ACM SIGPLAN Notices 50 (7), 31-38, 2015
Heterogeneous Cloud Computing: The Way Forward
IEEE Computer, 59-61, 2015
2014
Energy performance of FPGAs on PERFECT suite kernels
2014 IEEE High Performance Extreme Computing Conference (HPEC), 1-6, 2014
Dynamic runtime optimizations for systems of heterogeneous architectures
2014 IEEE High Performance Extreme Computing Conference (HPEC), 1-6, 2014
Bridging the Virtualization Performance Gap for HPC using SR-IOV for InfiniBand
7th IEEE International Conference on Cloud Computing, 2014
GPU-Passthrough Performance: A Comparison of KVM, Xen, VMWare ESXi, and LXC for CUDA and OpenCL Applications
7th IEEE International Conference on Cloud Computing, 2014
Evaluating GPU passthrough in Xen for high performance cloud computing
2014 IEEE International Parallel & Distributed Processing Symposium …, 2014
2013
Implementation of kernels on the Maestro processor
2013 IEEE Aerospace Conference, 1-6, 2013
2012
Implementation of FFT and CRBLASTER on the Maestro processor
2012 IEEE Aerospace Conference, 1-6, 2012
Integrating high performance file systems in a cloud computing environment
2012 SC Companion: High Performance Computing, Networking Storage and …, 2012
2011
Heterogeneous cloud computing
Proceedings of the Workshop on Parallel Programming on Accelerator Clusters …, 2011
Programming models and development software for a space-based many-core processor
2011 IEEE Fourth International Conference on Space Mission Challenges for …, 2011
FFTW and complex ambiguity function performance on the Maestro processor
2011 Aerospace Conference, 1-8, 2011
Software-based fault tolerance for the Maestro many-core processor
2011 Aerospace Conference, 1-12, 2011
Design and development of a run-time monitor for multi-core architectures in cloud computing
Sensors 11 (4), 3595-3610, 2011
Heterogeneous cloud computing
P. Eads et al., 378-385, 2011
2010
Opportunities for concurrent dynamic analysis with explicit inter-core communication
Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for …, 2010
Algorithm Classes for Architecture Research (ACAR)
Final Report, 2010
2009
MPI performance analysis and optimization on Tile64/Maestro
Proceedings of Workshop on Multi-core Processors for Space—Opportunities …, 2009
2008
Tera-Op Reliable Intelligently Adaptive Processing System (TRIPS) Implementation
AFRL-RI-WPTR-2008-1529, The University of Texas at Austin, 2008
Advanced Microprocessor Architectures
High Performance Embedded Computing Handbook, 499-521, 2008
2007
Preliminary study toward intelligent run-time resource management techniques for large multi-core architectures
Proceedings of the 2007 Workshop on High Performance Embedded Computing (HPEC07), 2007
A voltage and resource synthesis technique for energy-aware real-time systems
13th IEEE International Conference on Embedded and Real-Time Computing …, 2007
Evaluation of Stream Virtual Machine on Raw Processor
2007 IEEE International Parallel and Distributed Processing Symposium, 1-8, 2007
2006
CEARCH: Cognition enabled architecture
Proceedings of the Tenth Annual High Performance Embedded Computing Workshop …, 2006
Design and evaluation of a hierarchical decoupled architecture
The Journal of Supercomputing 38, 237-259, 2006
CEARCH: Cognitive Enabled ARCHitectures
Proceedings of the 10th Annual High Performance Embedded Computing Workshop …, 2006
2003
Dynamic power management of heterogeneous systems
Proceedings International Parallel and Distributed Processing Symposium, 8 pp., 2003
HiDISC: A decoupled architecture for data-intensive applications
Proceedings International Parallel and Distributed Processing Symposium, 8 pp., 2003
Robust Highly-Connected Direct Interconnection Network Topologies
PDPTA, 995-1000, 2003
2002
A stream processor development platform
Proceedings. IEEE International Conference on Computer Design: VLSI in …, 2002
A power-aware, satellite-based parallel signal processing scheme
Power Aware Computing, 243-259, 2002
An optimal voltage synthesis technique for a power-efficient satellite application
Proceedings of the 39th annual Design Automation Conference, 492-497, 2002
Dynamic power management of multiprocessor systems
Proceedings 16th International Parallel and Distributed Processing Symposium …, 2002
A fast resource synthesis technique for energy-efficient real-time systems
23rd IEEE Real-Time Systems Symposium (RTSS 2002), 225-234, 2002
2001
Power-aware design synthesis techniques for distributed real-time systems
Proceedings of the ACM SIGPLAN workshop on Languages, compilers and tools …, 2001
Implementations of Real-time Data Intensive Applications on PIM-based Multiprocessor Systems
IPDPS, 99, 2001
A PIM-based multiprocessor system
Proceedings 15th International Parallel and Distributed Processing Symposium …, 2001
PIM- and stream processor-based processing for radar signal applications
Proceedings of the 3rd Workshop on Media and Streaming Processors, 77-85, 2001
Efficient Algorithms for Fixed-Point Arithmetic Operations In An Embedded PIM
University of Southern California/Information Sciences Institute, 2001
2000
A communication scheduling algorithm for multi-FPGA systems
Proceedings 2000 IEEE Symposium on Field-Programmable Custom Computing …, 2000
Programming and development environments for configurable computing systems
2000 IEEE Aerospace Conference. Proceedings (Cat. No. 00TH8484) 5, 487-497, 2000
A high-performance, hierarchical decoupled architecture
Proceedings of MEDEA Workshop, 2000
1998
SLAAC: a distributed architecture for adaptive computing
Proceedings. IEEE Symposium on FPGAs for Custom Computing Machines (Cat. No …, 1998
1997
HiDISC: A Decoupled Architecture for the Future
University of Southern California, 1997
HiDISC: A High-Performance Hierarchical, Decoupled Architecture
Ph.D. Thesis, University of Southern California, 1997
HiDISC: A high-performance hierarchical decoupled computer architecture
University of Southern California, 1997
1996
Improving the performance of loop-based programs using a prefetch processor
Submitted to the 24th Annual International Symposium on Computer Architecture, 1996